From 60af9643728ce669e4695f7b2658f8d276a7fcd9 Mon Sep 17 00:00:00 2001
From: Peter Dillinger
Date: Thu, 12 Nov 2020 20:45:02 -0800
Subject: [PATCH] Experimental (production candidate) SST schema for Ribbon filter (#7658)

Summary:
Added experimental public API for Ribbon filter: NewExperimentalRibbonFilterPolicy(). This experimental API will take a "Bloom equivalent" bits per key, and configure the Ribbon filter for the same FP rate as Bloom would have but ~30% space savings. (Note: optimize_filters_for_memory is not yet implemented for Ribbon filter. That can be added with no effect on schema.)

Internally, the Ribbon filter is configured using a "one_in_fp_rate" value, which is 1 over desired FP rate. For example, use 100 for 1% FP rate. I'm expecting this will be used in the future for configuring Bloom-like filters, as I expect people to more commonly hold constant the filter accuracy and change the space vs. time trade-off, rather than hold constant the space (per key) and change the accuracy vs. time trade-off, though we might make that available.

### Benchmarking

```
$ ./filter_bench -impl=2 -quick -m_keys_total_max=200 -average_keys_per_filter=100000 -net_includes_hashing
Building...
Build avg ns/key: 34.1341
Number of filters: 1993
Total size (MB): 238.488
Reported total allocated memory (MB): 262.875
Reported internal fragmentation: 10.2255%
Bits/key stored: 10.0029
----------------------------
Mixed inside/outside queries...
Single filter net ns/op: 18.7508
Random filter net ns/op: 258.246
Average FP rate %: 0.968672
----------------------------
Done. (For more info, run with -legend or -help.)

$ ./filter_bench -impl=3 -quick -m_keys_total_max=200 -average_keys_per_filter=100000 -net_includes_hashing
Building...
Build avg ns/key: 130.851
Number of filters: 1993
Total size (MB): 168.166
Reported total allocated memory (MB): 183.211
Reported internal fragmentation: 8.94626%
Bits/key stored: 7.05341
----------------------------
Mixed inside/outside queries...
Single filter net ns/op: 58.4523
Random filter net ns/op: 363.717
Average FP rate %: 0.952978
----------------------------
Done. (For more info, run with -legend or -help.)
```

168.166 / 238.488 = 0.705 -> 29.5% space reduction

130.851 / 34.1341 = 3.83x construction time for this Ribbon filter vs. latest Bloom filter (could make that as little as about 2.5x for less space reduction)

### Working around a hashing "flaw"

bloom_test discovered a flaw in the simple hashing applied in StandardHasher when num_starts == 1 (num_slots == 128), showing an excessively high FP rate. The problem is that when many entries, on the order of number of hash bits or kCoeffBits, are associated with the same start location, the correlation between the CoeffRow and ResultRow (for efficiency) can lead to a solution that is "universal," or nearly so, for entries mapping to that start location. (Normally, variance in start location breaks the effective association between CoeffRow and ResultRow; the same value for CoeffRow is effectively different if start locations are different.)

Without kUseSmash and with num_starts > 1 (thus num_starts ~= num_slots), this flaw should be completely irrelevant. Even with 10M slots, the chances of a single slot having just 16 (or more) entries map to it--not enough to cause an FP problem, which would be local to that slot if it happened--is 1 in millions. This spreadsheet formula shows that: =1/(10000000*(1 - POISSON(15, 1, TRUE)))

As kUseSmash==false (the setting for Standard128RibbonBitsBuilder) is intended for CPU efficiency of filters with many more entries/slots than kCoeffBits, a very reasonable work-around is to disallow num_starts==1 when !kUseSmash, by making the minimum non-zero number of slots 2*kCoeffBits. This is the work-around I've applied. This also means that the new Ribbon filter schema (Standard128RibbonBitsBuilder) is not space-efficient for fewer than a few hundred entries.
Because of this, I have made it fall back on constructing a Bloom filter, under existing schema, when that is more space efficient for small filters. (We can change this in the future if we want.)

TODO: better unit tests for this case in ribbon_test, and probably update StandardHasher for kUseSmash case so that it can scale nicely to small filters.

### Other related changes

* Add Ribbon filter to stress/crash test
* Add Ribbon filter to filter_bench as -impl=3
* Add option string support, as in "filter_policy=experimental_ribbon:5.678;" where 5.678 is the Bloom equivalent bits per key.
* Rename internal mode BloomFilterPolicy::kAuto to kAutoBloom
* Add a general BuiltinFilterBitsBuilder::CalculateNumEntry based on binary searching CalculateSpace (inefficient), so that subclasses (especially experimental ones) don't have to provide an efficient implementation inverting CalculateSpace.
* Minor refactor FastLocalBloomBitsBuilder for new base class XXH3pFilterBitsBuilder shared with new Standard128RibbonBitsBuilder, which allows the latter to fall back on Bloom construction in some extreme cases.
* Mostly updated bloom_test for Ribbon filter, though a test like FullBloomTest::Schema is a next TODO to ensure schema stability (in case this becomes production-ready schema as it is).
* Add some APIs to ribbon_impl.h for configuring Ribbon filters. Although these are reasonably covered by bloom_test, TODO more unit tests in ribbon_test
* Added a "tool" FindOccupancyForSuccessRate to ribbon_test to get data for constructing the linear approximations in GetNumSlotsFor95PctSuccess.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7658

Test Plan: Some unit tests updated but other testing is left TODO. This is considered experimental but laying down schema compatibility as early as possible in case it proves production-quality. Also tested in stress/crash test.
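
For reference, the spreadsheet formula quoted under "Working around a hashing flaw" can be reproduced outside a spreadsheet. The following is a minimal sketch, not part of this patch: `PoissonUpperTail` and `OneInChance` are hypothetical helper names, and the model (each slot independently receiving a Poisson(1) number of entries) is the same simplifying assumption the formula makes. It sums the tail terms directly rather than computing `1 - CDF`, because the tail is on the order of 1e-14 and the subtraction would lose most of its precision in double arithmetic.

```cpp
#include <cmath>

// Hypothetical helper (not from the patch): P(X > k) for X ~ Poisson(lambda).
// Summing the tail terms directly avoids the cancellation error of 1 - CDF.
double PoissonUpperTail(int k, double lambda) {
  // First tail term e^-lambda * lambda^(k+1) / (k+1)!, computed in log space.
  // std::lgamma(k + 2.0) equals log((k+1)!).
  double term =
      std::exp(-lambda + (k + 1) * std::log(lambda) - std::lgamma(k + 2.0));
  double sum = 0.0;
  // 200 terms is far more than needed; later terms underflow harmlessly.
  for (int i = k + 1; i < k + 1 + 200; ++i) {
    sum += term;
    term *= lambda / (i + 1);  // ratio of the Poisson pmf at i+1 to that at i
  }
  return sum;
}

// Hypothetical helper mirroring =1/(num_slots*(1 - POISSON(k, lambda, TRUE))):
// the reciprocal of the expected number of slots with more than k entries.
double OneInChance(double num_slots, int k, double lambda) {
  return 1.0 / (num_slots * PoissonUpperTail(k, lambda));
}
```

Calling `OneInChance(10000000.0, 15, 1.0)` uses the same inputs as the formula (10M slots, 16 or more entries, mean of one entry per slot) and should land in the low millions, in line with the "1 in millions" estimate above.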

Reviewed By: jay-zhuang

Differential Revision: D24899349

Pulled By: pdillinger

fbshipit-source-id: 9715f3e6371c959d923aea8077c9423c7a9f82b8
---
 HISTORY.md                                 |   3 +
 db/db_bloom_filter_test.cc                 |  12 +-
 db_stress_tool/db_stress_common.h          |   1 +
 db_stress_tool/db_stress_gflags.cc         |   3 +
 db_stress_tool/db_stress_test_base.cc      |  13 +-
 include/rocksdb/filter_policy.h            |  20 ++
 options/options_test.cc                    |  18 +-
 table/block_based/filter_policy.cc         | 355 +++++++++++++++++++--
 table/block_based/filter_policy_internal.h |  23 +-
 tools/db_crashtest.py                      |   1 +
 util/bloom_test.cc                         |  20 +-
 util/filter_bench.cc                       |   7 +-
 util/ribbon_impl.h                         | 195 ++++++++++-
 util/ribbon_test.cc                        |  87 ++++-
 14 files changed, 696 insertions(+), 62 deletions(-)

diff --git a/HISTORY.md b/HISTORY.md
index c765695bc8..854e1b126a 100644
--- a/HISTORY.md
+++ b/HISTORY.md
@@ -22,6 +22,9 @@
 * The dictionary compression settings specified in `ColumnFamilyOptions::compression_opts` now additionally affect files generated by flush and compaction to non-bottommost level. Previously those settings at most affected files generated by compaction to bottommost level, depending on whether `ColumnFamilyOptions::bottommost_compression_opts` overrode them. Users who relied on dictionary compression settings in `ColumnFamilyOptions::compression_opts` affecting only the bottommost level can keep the behavior by moving their dictionary settings to `ColumnFamilyOptions::bottommost_compression_opts` and setting its `enabled` flag.
 * When the `enabled` flag is set in `ColumnFamilyOptions::bottommost_compression_opts`, those compression options now take effect regardless of the value in `ColumnFamilyOptions::bottommost_compression`. Previously, those compression options only took effect when `ColumnFamilyOptions::bottommost_compression != kDisableCompressionOption`.
Now, they additionally take effect when `ColumnFamilyOptions::bottommost_compression == kDisableCompressionOption` (such a setting causes bottommost compression type to fall back to `ColumnFamilyOptions::compression_per_level` if configured, and otherwise fall back to `ColumnFamilyOptions::compression`). +### New Features +* An EXPERIMENTAL new Bloom alternative that saves about 30% space compared to Bloom filters, with about 3-4x construction time and similar query times is available using NewExperimentalRibbonFilterPolicy. + ## 6.14 (10/09/2020) ### Bug fixes * Fixed a bug after a `CompactRange()` with `CompactRangeOptions::change_level` set fails due to a conflict in the level change step, which caused all subsequent calls to `CompactRange()` with `CompactRangeOptions::change_level` set to incorrectly fail with a `Status::NotSupported("another thread is refitting")` error. diff --git a/db/db_bloom_filter_test.cc b/db/db_bloom_filter_test.cc index 3d4bd3a9d3..191d72060e 100644 --- a/db/db_bloom_filter_test.cc +++ b/db/db_bloom_filter_test.cc @@ -514,24 +514,24 @@ INSTANTIATE_TEST_CASE_P( ::testing::Values( std::make_tuple(BFP::kDeprecatedBlock, false, test::kDefaultFormatVersion), - std::make_tuple(BFP::kAuto, true, test::kDefaultFormatVersion), - std::make_tuple(BFP::kAuto, false, test::kDefaultFormatVersion))); + std::make_tuple(BFP::kAutoBloom, true, test::kDefaultFormatVersion), + std::make_tuple(BFP::kAutoBloom, false, test::kDefaultFormatVersion))); INSTANTIATE_TEST_CASE_P( FormatDef, DBBloomFilterTestWithParam, ::testing::Values( std::make_tuple(BFP::kDeprecatedBlock, false, test::kDefaultFormatVersion), - std::make_tuple(BFP::kAuto, true, test::kDefaultFormatVersion), - std::make_tuple(BFP::kAuto, false, test::kDefaultFormatVersion))); + std::make_tuple(BFP::kAutoBloom, true, test::kDefaultFormatVersion), + std::make_tuple(BFP::kAutoBloom, false, test::kDefaultFormatVersion))); INSTANTIATE_TEST_CASE_P( FormatLatest, DBBloomFilterTestWithParam, 
::testing::Values( std::make_tuple(BFP::kDeprecatedBlock, false, test::kLatestFormatVersion), - std::make_tuple(BFP::kAuto, true, test::kLatestFormatVersion), - std::make_tuple(BFP::kAuto, false, test::kLatestFormatVersion))); + std::make_tuple(BFP::kAutoBloom, true, test::kLatestFormatVersion), + std::make_tuple(BFP::kAutoBloom, false, test::kLatestFormatVersion))); #endif // ROCKSDB_VALGRIND_RUN TEST_F(DBBloomFilterTest, BloomFilterRate) { diff --git a/db_stress_tool/db_stress_common.h b/db_stress_tool/db_stress_common.h index 6952c16b3a..40a1e653cf 100644 --- a/db_stress_tool/db_stress_common.h +++ b/db_stress_tool/db_stress_common.h @@ -144,6 +144,7 @@ DECLARE_bool(enable_write_thread_adaptive_yield); DECLARE_int32(reopen); DECLARE_double(bloom_bits); DECLARE_bool(use_block_based_filter); +DECLARE_bool(use_ribbon_filter); DECLARE_bool(partition_filters); DECLARE_bool(optimize_filters_for_memory); DECLARE_int32(index_type); diff --git a/db_stress_tool/db_stress_gflags.cc b/db_stress_tool/db_stress_gflags.cc index eeb97dca3a..155c9cc74b 100644 --- a/db_stress_tool/db_stress_gflags.cc +++ b/db_stress_tool/db_stress_gflags.cc @@ -375,6 +375,9 @@ DEFINE_bool(use_block_based_filter, false, "use block based filter" "instead of full filter for block based table"); +DEFINE_bool(use_ribbon_filter, false, + "Use Ribbon filter instead of Bloom filter"); + DEFINE_bool(partition_filters, false, "use partitioned filters " "for block-based table"); diff --git a/db_stress_tool/db_stress_test_base.cc b/db_stress_tool/db_stress_test_base.cc index da1098e274..94082bd713 100644 --- a/db_stress_tool/db_stress_test_base.cc +++ b/db_stress_tool/db_stress_test_base.cc @@ -22,11 +22,14 @@ namespace ROCKSDB_NAMESPACE { StressTest::StressTest() : cache_(NewCache(FLAGS_cache_size)), compressed_cache_(NewLRUCache(FLAGS_compressed_cache_size)), - filter_policy_(FLAGS_bloom_bits >= 0 - ? FLAGS_use_block_based_filter - ? 
NewBloomFilterPolicy(FLAGS_bloom_bits, true) - : NewBloomFilterPolicy(FLAGS_bloom_bits, false) - : nullptr), + filter_policy_( + FLAGS_bloom_bits >= 0 + ? FLAGS_use_ribbon_filter + ? NewExperimentalRibbonFilterPolicy(FLAGS_bloom_bits) + : FLAGS_use_block_based_filter + ? NewBloomFilterPolicy(FLAGS_bloom_bits, true) + : NewBloomFilterPolicy(FLAGS_bloom_bits, false) + : nullptr), db_(nullptr), #ifndef ROCKSDB_LITE txn_db_(nullptr), diff --git a/include/rocksdb/filter_policy.h b/include/rocksdb/filter_policy.h index 3cd85a2260..7829db14e6 100644 --- a/include/rocksdb/filter_policy.h +++ b/include/rocksdb/filter_policy.h @@ -212,4 +212,24 @@ class FilterPolicy { // trailing spaces in keys. extern const FilterPolicy* NewBloomFilterPolicy( double bits_per_key, bool use_block_based_builder = false); + +// An EXPERIMENTAL new Bloom alternative that saves about 30% space +// compared to Bloom filters, with about 3-4x construction time and +// similar query times. For example, if you pass in 10 for +// bloom_equivalent_bits_per_key, you'll get the same 0.95% FP rate +// as Bloom filter but only using about 7 bits per key. (This +// way of configuring the new filter is considered experimental +// and/or transitional, so is expected to go away.) +// +// Ribbon filters are ignored by previous versions of RocksDB, as if +// no filter was used. +// +// Note: this policy can generate Bloom filters in some cases. +// For very small filters (well under 1KB), Bloom fallback is by +// design, as the current Ribbon schema is not optimized to save vs. +// Bloom for such small filters. Other cases of Bloom fallback should +// be exceptional and log an appropriate warning. 
+extern const FilterPolicy* NewExperimentalRibbonFilterPolicy( + double bloom_equivalent_bits_per_key); + } // namespace ROCKSDB_NAMESPACE diff --git a/options/options_test.cc b/options/options_test.cc index 5aa035fd27..ba39b622f0 100644 --- a/options/options_test.cc +++ b/options/options_test.cc @@ -862,10 +862,11 @@ TEST_F(OptionsTest, GetBlockBasedTableOptionsFromString) { ASSERT_EQ(new_opt.format_version, 5U); ASSERT_EQ(new_opt.whole_key_filtering, true); ASSERT_TRUE(new_opt.filter_policy != nullptr); - const BloomFilterPolicy& bfp = - dynamic_cast(*new_opt.filter_policy); - EXPECT_EQ(bfp.GetMillibitsPerKey(), 4567); - EXPECT_EQ(bfp.GetWholeBitsPerKey(), 5); + const BloomFilterPolicy* bfp = + dynamic_cast(new_opt.filter_policy.get()); + EXPECT_EQ(bfp->GetMillibitsPerKey(), 4567); + EXPECT_EQ(bfp->GetWholeBitsPerKey(), 5); + EXPECT_EQ(bfp->GetMode(), BloomFilterPolicy::kAutoBloom); // unknown option Status s = GetBlockBasedTableOptionsFromString( @@ -919,6 +920,15 @@ TEST_F(OptionsTest, GetBlockBasedTableOptionsFromString) { new_opt.cache_index_and_filter_blocks); ASSERT_EQ(table_opt.filter_policy, new_opt.filter_policy); + // Experimental Ribbon filter policy + ASSERT_OK(GetBlockBasedTableOptionsFromString( + config_options, table_opt, "filter_policy=experimental_ribbon:5.678;", + &new_opt)); + ASSERT_TRUE(new_opt.filter_policy != nullptr); + bfp = dynamic_cast(new_opt.filter_policy.get()); + EXPECT_EQ(bfp->GetMillibitsPerKey(), 5678); + EXPECT_EQ(bfp->GetMode(), BloomFilterPolicy::kStandard128Ribbon); + // Check block cache options are overwritten when specified // in new format as a struct. ASSERT_OK(GetBlockBasedTableOptionsFromString( diff --git a/table/block_based/filter_policy.cc b/table/block_based/filter_policy.cc index 31eb6b90df..a7ab907d4b 100644 --- a/table/block_based/filter_policy.cc +++ b/table/block_based/filter_policy.cc @@ -7,26 +7,82 @@ // Use of this source code is governed by a BSD-style license that can be // found in the LICENSE file. 
See the AUTHORS file for names of contributors. +#include "rocksdb/filter_policy.h" + #include #include -#include "rocksdb/filter_policy.h" - #include "rocksdb/slice.h" #include "table/block_based/block_based_filter_block.h" -#include "table/block_based/full_filter_block.h" #include "table/block_based/filter_policy_internal.h" +#include "table/block_based/full_filter_block.h" #include "third-party/folly/folly/ConstexprMath.h" #include "util/bloom_impl.h" #include "util/coding.h" #include "util/hash.h" +#include "util/ribbon_impl.h" namespace ROCKSDB_NAMESPACE { +int BuiltinFilterBitsBuilder::CalculateNumEntry(const uint32_t bytes) { + int cur = 1; + // Find overestimate + while (CalculateSpace(cur) <= bytes && cur * 2 > cur) { + cur *= 2; + } + // Change to underestimate less than factor of two from answer + cur /= 2; + // Binary search + int delta = cur / 2; + while (delta > 0) { + if (CalculateSpace(cur + delta) <= bytes) { + cur += delta; + } + delta /= 2; + } + return cur; +} + namespace { +Slice FinishAlwaysFalse(std::unique_ptr* /*buf*/) { + // Missing metadata, treated as zero entries + return Slice(nullptr, 0); +} + +// Base class for filter builders using the XXH3 preview hash, +// also known as Hash64 or GetSliceHash64. +class XXH3pFilterBitsBuilder : public BuiltinFilterBitsBuilder { + public: + ~XXH3pFilterBitsBuilder() override {} + + virtual void AddKey(const Slice& key) override { + uint64_t hash = GetSliceHash64(key); + // Especially with prefixes, it is common to have repetition, + // though only adjacent repetition, which we want to immediately + // recognize and collapse for estimating true filter space + // requirements. 
+ if (hash_entries_.empty() || hash != hash_entries_.back()) { + hash_entries_.push_back(hash); + } + } + + protected: + // For delegating between XXH3pFilterBitsBuilders + void SwapEntriesWith(XXH3pFilterBitsBuilder* other) { + std::swap(hash_entries_, other->hash_entries_); + } + + // A deque avoids unnecessary copying of already-saved values + // and has near-minimal peak memory use. + std::deque hash_entries_; +}; + +// #################### FastLocalBloom implementation ################## // +// ############## also known as format_version=5 Bloom filter ########## // + // See description in FastLocalBloomImpl -class FastLocalBloomBitsBuilder : public BuiltinFilterBitsBuilder { +class FastLocalBloomBitsBuilder : public XXH3pFilterBitsBuilder { public: // Non-null aggregate_rounding_balance implies optimize_filters_for_memory explicit FastLocalBloomBitsBuilder( @@ -43,13 +99,6 @@ class FastLocalBloomBitsBuilder : public BuiltinFilterBitsBuilder { ~FastLocalBloomBitsBuilder() override {} - virtual void AddKey(const Slice& key) override { - uint64_t hash = GetSliceHash64(key); - if (hash_entries_.empty() || hash != hash_entries_.back()) { - hash_entries_.push_back(hash); - } - } - virtual Slice Finish(std::unique_ptr* buf) override { size_t num_entry = hash_entries_.size(); std::unique_ptr mutable_buf; @@ -294,9 +343,6 @@ class FastLocalBloomBitsBuilder : public BuiltinFilterBitsBuilder { // See BloomFilterPolicy::aggregate_rounding_balance_. If nullptr, // always "round up" like historic behavior. std::atomic* aggregate_rounding_balance_; - // A deque avoids unnecessary copying of already-saved values - // and has near-minimal peak memory use. 
- std::deque hash_entries_; }; // See description in FastLocalBloomImpl @@ -341,6 +387,213 @@ class FastLocalBloomBitsReader : public FilterBitsReader { const uint32_t len_bytes_; }; +// ##################### Ribbon filter implementation ################### // + +// Implements concept RehasherTypesAndSettings in ribbon_impl.h +struct Standard128RibbonRehasherTypesAndSettings { + // These are schema-critical. Any change almost certainly changes + // underlying data. + static constexpr bool kIsFilter = true; + static constexpr bool kFirstCoeffAlwaysOne = true; + static constexpr bool kUseSmash = false; + using CoeffRow = ROCKSDB_NAMESPACE::Unsigned128; + using Hash = uint64_t; + using Seed = uint32_t; + // Changing these doesn't necessarily change underlying data, + // but might affect supported scalability of those dimensions. + using Index = uint32_t; + using ResultRow = uint32_t; + // Save a conditional in Ribbon queries + static constexpr bool kAllowZeroStarts = false; +}; + +using Standard128RibbonTypesAndSettings = + ribbon::StandardRehasherAdapter; + +class Standard128RibbonBitsBuilder : public XXH3pFilterBitsBuilder { + public: + explicit Standard128RibbonBitsBuilder(double desired_one_in_fp_rate, + int bloom_millibits_per_key, + Logger* info_log) + : desired_one_in_fp_rate_(desired_one_in_fp_rate), + info_log_(info_log), + bloom_fallback_(bloom_millibits_per_key, nullptr) { + assert(desired_one_in_fp_rate >= 1.0); + } + + // No Copy allowed + Standard128RibbonBitsBuilder(const Standard128RibbonBitsBuilder&) = delete; + void operator=(const Standard128RibbonBitsBuilder&) = delete; + + ~Standard128RibbonBitsBuilder() override {} + + virtual Slice Finish(std::unique_ptr* buf) override { + // More than 2^30 entries (~1 billion) not supported + if (hash_entries_.size() >= (size_t{1} << 30)) { + ROCKS_LOG_WARN(info_log_, "Too many keys for Ribbon filter: %llu", + static_cast(hash_entries_.size())); + SwapEntriesWith(&bloom_fallback_); + 
assert(hash_entries_.empty()); + return bloom_fallback_.Finish(buf); + } + if (hash_entries_.size() == 0) { + // Save a conditional in Ribbon queries by using alternate reader + // for zero entries added. + return FinishAlwaysFalse(buf); + } + uint32_t num_entries = static_cast(hash_entries_.size()); + uint32_t num_slots = BandingType::GetNumSlotsFor95PctSuccess(num_entries); + num_slots = SolnType::RoundUpNumSlots(num_slots); + + uint32_t entropy = 0; + if (num_entries > 0) { + entropy = Lower32of64(hash_entries_.front()); + } + size_t len = SolnType::GetBytesForOneInFpRate( + num_slots, desired_one_in_fp_rate_, /*rounding*/ entropy); + size_t len_with_metadata = len + 5; + + // Use Bloom filter when it's better for small filters + if (num_slots < 1024 && bloom_fallback_.CalculateSpace(static_cast( + num_entries)) < len_with_metadata) { + SwapEntriesWith(&bloom_fallback_); + assert(hash_entries_.empty()); + return bloom_fallback_.Finish(buf); + } + + BandingType banding; + bool success = banding.ResetAndFindSeedToSolve( + num_slots, hash_entries_.begin(), hash_entries_.end(), + /*starting seed*/ entropy & 255, /*seed mask*/ 255); + if (!success) { + ROCKS_LOG_WARN(info_log_, + "Too many re-seeds (256) for Ribbon filter, %llu / %llu", + static_cast(hash_entries_.size()), + static_cast(num_slots)); + SwapEntriesWith(&bloom_fallback_); + assert(hash_entries_.empty()); + return bloom_fallback_.Finish(buf); + } + hash_entries_.clear(); + + uint32_t seed = banding.GetOrdinalSeed(); + assert(seed < 256); + + std::unique_ptr mutable_buf(new char[len_with_metadata]); + + SolnType soln(mutable_buf.get(), len_with_metadata); + soln.BackSubstFrom(banding); + uint32_t num_blocks = soln.GetNumBlocks(); + // This should be guaranteed: + // num_entries < 2^30 + // => (overhead_factor < 2.0) + // num_entries * overhead_factor == num_slots < 2^31 + // => (num_blocks = num_slots / 128) + // num_blocks < 2^24 + assert(num_blocks < 0x1000000U); + + // See 
BloomFilterPolicy::GetBloomBitsReader re: metadata + // -2 = Marker for Standard128 Ribbon + mutable_buf[len] = static_cast(-2); + // Hash seed + mutable_buf[len + 1] = static_cast(seed); + // Number of blocks, in 24 bits + // (Along with bytes, we can derive other settings) + mutable_buf[len + 2] = static_cast(num_blocks & 255); + mutable_buf[len + 3] = static_cast((num_blocks >> 8) & 255); + mutable_buf[len + 4] = static_cast((num_blocks >> 16) & 255); + + Slice rv(mutable_buf.get(), len_with_metadata); + *buf = std::move(mutable_buf); + return rv; + } + + uint32_t CalculateSpace(const int num_entries) override { + // NB: the BuiltinFilterBitsBuilder API presumes len fits in uint32_t. + uint32_t num_slots = + NumEntriesToNumSlots(static_cast(num_entries)); + uint32_t ribbon = static_cast( + SolnType::GetBytesForOneInFpRate(num_slots, desired_one_in_fp_rate_, + /*rounding*/ 0) + + /*metadata*/ 5); + // Consider possible Bloom fallback for small filters + if (num_slots < 1024) { + uint32_t bloom = bloom_fallback_.CalculateSpace(num_entries); + return std::min(bloom, ribbon); + } else { + return ribbon; + } + } + + double EstimatedFpRate(size_t num_entries, + size_t len_with_metadata) override { + uint32_t num_slots = + NumEntriesToNumSlots(static_cast(num_entries)); + SolnType fake_soln(nullptr, len_with_metadata); + fake_soln.ConfigureForNumSlots(num_slots); + return fake_soln.ExpectedFpRate(); + } + + private: + using TS = Standard128RibbonTypesAndSettings; + using SolnType = ribbon::SerializableInterleavedSolution; + using BandingType = ribbon::StandardBanding; + + static uint32_t NumEntriesToNumSlots(uint32_t num_entries) { + uint32_t num_slots1 = BandingType::GetNumSlotsFor95PctSuccess(num_entries); + return SolnType::RoundUpNumSlots(num_slots1); + } + + // A desired value for 1/fp_rate. For example, 100 -> 1% fp rate. 
+ double desired_one_in_fp_rate_; + + // For warnings, or can be nullptr + Logger* info_log_; + + // For falling back on Bloom filter in some exceptional cases and + // very small filter cases + FastLocalBloomBitsBuilder bloom_fallback_; +}; + +class Standard128RibbonBitsReader : public FilterBitsReader { + public: + Standard128RibbonBitsReader(const char* data, size_t len_bytes, + uint32_t num_blocks, uint32_t seed) + : soln_(const_cast(data), len_bytes) { + soln_.ConfigureForNumBlocks(num_blocks); + hasher_.SetOrdinalSeed(seed); + } + + // No Copy allowed + Standard128RibbonBitsReader(const Standard128RibbonBitsReader&) = delete; + void operator=(const Standard128RibbonBitsReader&) = delete; + + ~Standard128RibbonBitsReader() override {} + + bool MayMatch(const Slice& key) override { + uint64_t h = GetSliceHash64(key); + return soln_.FilterQuery(h, hasher_); + } + + virtual void MayMatch(int num_keys, Slice** keys, bool* may_match) override { + std::array hashes; + for (int i = 0; i < num_keys; ++i) { + hashes[i] = GetSliceHash64(*keys[i]); + // FIXME: batched get optimization + } + for (int i = 0; i < num_keys; ++i) { + may_match[i] = soln_.FilterQuery(hashes[i], hasher_); + } + } + + private: + using TS = Standard128RibbonTypesAndSettings; + ribbon::SerializableInterleavedSolution soln_; + ribbon::StandardHasher hasher_; +}; + +// ##################### Legacy Bloom implementation ################### // + using LegacyBloomImpl = LegacyLocalityBloomImpl; class LegacyBloomBitsBuilder : public BuiltinFilterBitsBuilder { @@ -595,11 +848,13 @@ const std::vector BloomFilterPolicy::kAllFixedImpls = { kLegacyBloom, kDeprecatedBlock, kFastLocalBloom, + kStandard128Ribbon, }; const std::vector BloomFilterPolicy::kAllUserModes = { kDeprecatedBlock, - kAuto, + kAutoBloom, + kStandard128Ribbon, }; BloomFilterPolicy::BloomFilterPolicy(double bits_per_key, Mode mode) @@ -616,6 +871,15 @@ BloomFilterPolicy::BloomFilterPolicy(double bits_per_key, Mode mode) // point are 
interpreted accurately. millibits_per_key_ = static_cast(bits_per_key * 1000.0 + 0.500001); + // For now configure Ribbon filter to match Bloom FP rate and save + // memory. (Ribbon bits per key will be ~30% less than Bloom bits per key + // for same FP rate.) + desired_one_in_fp_rate_ = + 1.0 / BloomMath::CacheLocalFpRate( + bits_per_key, + FastLocalBloomImpl::ChooseNumProbes(millibits_per_key_), + /*cache_line_bits*/ 512); + // For better or worse, this is a rounding up of a nudged rounding up, // e.g. 7.4999999999999 will round up to 8, but that provides more // predictability against small arithmetic errors in floating point. @@ -700,7 +964,7 @@ FilterBitsBuilder* BloomFilterPolicy::GetBuilderWithContext( // one exhaustive switch without (risky) recursion for (int i = 0; i < 2; ++i) { switch (cur) { - case kAuto: + case kAutoBloom: if (context.table_options.format_version < 5) { cur = kLegacyBloom; } else { @@ -733,6 +997,9 @@ FilterBitsBuilder* BloomFilterPolicy::GetBuilderWithContext( } return new LegacyBloomBitsBuilder(whole_bits_per_key_, context.info_log); + case kStandard128Ribbon: + return new Standard128RibbonBitsBuilder( + desired_one_in_fp_rate_, millibits_per_key_, context.info_log); } } assert(false); @@ -780,13 +1047,20 @@ FilterBitsReader* BloomFilterPolicy::GetFilterBitsReader( if (raw_num_probes < 1) { // Note: < 0 (or unsigned > 127) indicate special new implementations // (or reserved for future use) - if (raw_num_probes == -1) { - // Marker for newer Bloom implementations - return GetBloomBitsReader(contents); + switch (raw_num_probes) { + case 0: + // Treat as zero probes (always FP) + return new AlwaysTrueFilter(); + case -1: + // Marker for newer Bloom implementations + return GetBloomBitsReader(contents); + case -2: + // Marker for Ribbon implementations + return GetRibbonBitsReader(contents); + default: + // Reserved (treat as zero probes, always FP, for now) + return new AlwaysTrueFilter(); } - // otherwise - // Treat as zero probes 
(always FP) for now. - return new AlwaysTrueFilter(); } // else attempt decode for LegacyBloomBitsReader @@ -824,6 +1098,29 @@ FilterBitsReader* BloomFilterPolicy::GetFilterBitsReader( log2_cache_line_size); } +FilterBitsReader* BloomFilterPolicy::GetRibbonBitsReader( + const Slice& contents) const { + uint32_t len_with_meta = static_cast(contents.size()); + uint32_t len = len_with_meta - 5; + + assert(len > 0); // precondition + + uint32_t seed = static_cast(contents.data()[len + 1]); + uint32_t num_blocks = static_cast(contents.data()[len + 2]); + num_blocks |= static_cast(contents.data()[len + 3]) << 8; + num_blocks |= static_cast(contents.data()[len + 4]) << 16; + if (num_blocks < 2) { + // Not supported + // num_blocks == 1 is not used because num_starts == 1 is problematic + // for the hashing scheme. num_blocks == 0 is unused because there's + // already a concise encoding of an "always false" filter. + // Return something safe: + return new AlwaysTrueFilter(); + } + return new Standard128RibbonBitsReader(contents.data(), len, num_blocks, + seed); +} + // For newer Bloom filter implementations FilterBitsReader* BloomFilterPolicy::GetBloomBitsReader( const Slice& contents) const { @@ -890,7 +1187,7 @@ const FilterPolicy* NewBloomFilterPolicy(double bits_per_key, if (use_block_based_builder) { m = BloomFilterPolicy::kDeprecatedBlock; } else { - m = BloomFilterPolicy::kAuto; + m = BloomFilterPolicy::kAutoBloom; } assert(std::find(BloomFilterPolicy::kAllUserModes.begin(), BloomFilterPolicy::kAllUserModes.end(), @@ -898,6 +1195,12 @@ const FilterPolicy* NewBloomFilterPolicy(double bits_per_key, return new BloomFilterPolicy(bits_per_key, m); } +extern const FilterPolicy* NewExperimentalRibbonFilterPolicy( + double bloom_equivalent_bits_per_key) { + return new BloomFilterPolicy(bloom_equivalent_bits_per_key, + BloomFilterPolicy::kStandard128Ribbon); +} + FilterBuildingContext::FilterBuildingContext( const BlockBasedTableOptions& _table_options) : 
table_options(_table_options) {} @@ -908,6 +1211,7 @@ Status FilterPolicy::CreateFromString( const ConfigOptions& /*options*/, const std::string& value, std::shared_ptr* policy) { const std::string kBloomName = "bloomfilter:"; + const std::string kExpRibbonName = "experimental_ribbon:"; if (value == kNullptrString || value == "rocksdb.BuiltinBloomFilter") { policy->reset(); #ifndef ROCKSDB_LITE @@ -924,6 +1228,11 @@ Status FilterPolicy::CreateFromString( policy->reset( NewBloomFilterPolicy(bits_per_key, use_block_based_builder)); } + } else if (value.compare(0, kExpRibbonName.size(), kExpRibbonName) == 0) { + double bloom_equivalent_bits_per_key = + ParseDouble(trim(value.substr(kExpRibbonName.size()))); + policy->reset( + NewExperimentalRibbonFilterPolicy(bloom_equivalent_bits_per_key)); } else { return Status::NotFound("Invalid filter policy name ", value); #else diff --git a/table/block_based/filter_policy_internal.h b/table/block_based/filter_policy_internal.h index 783373b262..457a3b2060 100644 --- a/table/block_based/filter_policy_internal.h +++ b/table/block_based/filter_policy_internal.h @@ -29,6 +29,10 @@ class BuiltinFilterBitsBuilder : public FilterBitsBuilder { // return >= the num_entry passed in. virtual uint32_t CalculateSpace(const int num_entry) = 0; + // A somewhat expensive but workable default implementation + // using binary search on CalculateSpace + int CalculateNumEntry(const uint32_t bytes) override; + // Returns an estimate of the FP rate of the returned filter if // `keys` keys are added and the filter returned by Finish is `bytes` // bytes. @@ -64,10 +68,12 @@ class BloomFilterPolicy : public FilterPolicy { // FastLocalBloomImpl. // NOTE: TESTING ONLY as this mode does not check format_version kFastLocalBloom = 2, - // Automatically choose from the above (except kDeprecatedBlock) based on + // A Bloom alternative saving about 30% space for ~3-4x construction + // CPU time. See ribbon_alg.h and ribbon_impl.h. 
+ kStandard128Ribbon = 3, + // Automatically choose between kLegacyBloom and kFastLocalBloom based on // context at build time, including compatibility with format_version. - // NOTE: This is currently the only recommended mode that is user exposed. - kAuto = 100, + kAutoBloom = 100, }; // All the different underlying implementations that a BloomFilterPolicy // might use, as a mode that says "always use this implementation." @@ -115,8 +121,12 @@ class BloomFilterPolicy : public FilterPolicy { int GetMillibitsPerKey() const { return millibits_per_key_; } // Essentially for testing only: legacy whole bits/key int GetWholeBitsPerKey() const { return whole_bits_per_key_; } + // Testing only + Mode GetMode() const { return mode_; } private: + // Bits per key settings are for configuring Bloom filters. + // Newer filters support fractional bits per key. For predictable behavior // of 0.001-precision values across floating point implementations, we // round to thousandths of a bit (on average) per key. @@ -127,6 +137,10 @@ class BloomFilterPolicy : public FilterPolicy { // behavior with format_version < 5 just in case.) int whole_bits_per_key_; + // For configuring Ribbon filter: a desired value for 1/fp_rate. For + // example, 100 -> 1% fp rate. + double desired_one_in_fp_rate_; + // Selected mode (a specific implementation or way of selecting an // implementation) for building new SST filters. 
Mode mode_; @@ -147,6 +161,9 @@ class BloomFilterPolicy : public FilterPolicy { // For newer Bloom filter implementation(s) FilterBitsReader* GetBloomBitsReader(const Slice& contents) const; + + // For Ribbon filter implementation(s) + FilterBitsReader* GetRibbonBitsReader(const Slice& contents) const; }; } // namespace ROCKSDB_NAMESPACE diff --git a/tools/db_crashtest.py b/tools/db_crashtest.py index b32f212274..722593caf5 100644 --- a/tools/db_crashtest.py +++ b/tools/db_crashtest.py @@ -100,6 +100,7 @@ default_params = { "mock_direct_io": False, "use_full_merge_v1": lambda: random.randint(0, 1), "use_merge": lambda: random.randint(0, 1), + "use_ribbon_filter": lambda: random.randint(0, 1), "verify_checksum": 1, "write_buffer_size": 4 * 1024 * 1024, "writepercent": 35, diff --git a/util/bloom_test.cc b/util/bloom_test.cc index 2c671794a1..4eab70280c 100644 --- a/util/bloom_test.cc +++ b/util/bloom_test.cc @@ -381,7 +381,8 @@ class FullBloomTest : public testing::TestWithParam<BloomFilterPolicy::Mode> { case BloomFilterPolicy::kFastLocalBloom: return for_fast_local_bloom; case BloomFilterPolicy::kDeprecatedBlock: - case BloomFilterPolicy::kAuto: + case BloomFilterPolicy::kAutoBloom: + case BloomFilterPolicy::kStandard128Ribbon: /* N/A */; } // otherwise @@ -473,7 +474,7 @@ TEST_P(FullBloomTest, FullVaryingLengths) { } Build(); - ASSERT_LE(FilterSize(), + EXPECT_LE(FilterSize(), (size_t)((length * 10 / 8) + CACHE_LINE_SIZE * 2 + 5)); // All added keys must match @@ -488,7 +489,7 @@ TEST_P(FullBloomTest, FullVaryingLengths) { fprintf(stderr, "False positives: %5.2f%% @ length = %6d ; bytes = %6d\n", rate*100.0, length, static_cast<int>(FilterSize())); } - ASSERT_LE(rate, 0.02); // Must not be over 2% + EXPECT_LE(rate, 0.02); // Must not be over 2% if (rate > 0.0125) mediocre_filters++; // Allowed, but not too often else @@ -498,10 +499,14 @@ TEST_P(FullBloomTest, FullVaryingLengths) { fprintf(stderr, "Filters: %d good, %d mediocre\n", good_filters, mediocre_filters); } -
ASSERT_LE(mediocre_filters, good_filters/5); + EXPECT_LE(mediocre_filters, good_filters / 5); } TEST_P(FullBloomTest, OptimizeForMemory) { + if (GetParam() == BloomFilterPolicy::kStandard128Ribbon) { + // TODO Not yet implemented + return; + } char buffer[sizeof(int)]; for (bool offm : {true, false}) { table_options_.optimize_filters_for_memory = offm; @@ -596,6 +601,10 @@ inline uint32_t SelectByCacheLineSize(uint32_t for64, uint32_t for128, // ability to read filters generated using other cache line sizes. // See RawSchema. TEST_P(FullBloomTest, Schema) { + if (GetParam() == BloomFilterPolicy::kStandard128Ribbon) { + // TODO ASAP to ensure schema stability + return; + } char buffer[sizeof(int)]; // Use enough keys so that changing bits / key by 1 is guaranteed to @@ -974,7 +983,8 @@ TEST_P(FullBloomTest, CorruptFilters) { INSTANTIATE_TEST_CASE_P(Full, FullBloomTest, testing::Values(BloomFilterPolicy::kLegacyBloom, - BloomFilterPolicy::kFastLocalBloom)); + BloomFilterPolicy::kFastLocalBloom, + BloomFilterPolicy::kStandard128Ribbon)); } // namespace ROCKSDB_NAMESPACE diff --git a/util/filter_bench.cc b/util/filter_bench.cc index 7aaf30a73d..3761dce756 100644 --- a/util/filter_bench.cc +++ b/util/filter_bench.cc @@ -80,7 +80,8 @@ DEFINE_bool(new_builder, false, DEFINE_uint32(impl, 0, "Select filter implementation. Without -use_plain_table_bloom:" - "0 = full filter, 1 = block-based filter. With " + "0 = legacy full Bloom filter, 1 = block-based Bloom filter, " + "2 = format_version 5 Bloom filter, 3 = Ribbon128 filter. 
With " "-use_plain_table_bloom: 0 = no locality, 1 = locality."); DEFINE_bool(net_includes_hashing, false, @@ -306,9 +307,9 @@ void FilterBench::Go() { throw std::runtime_error( "Block-based filter not currently supported by filter_bench"); } - if (FLAGS_impl > 2) { + if (FLAGS_impl > 3) { throw std::runtime_error( - "-impl must currently be 0 or 2 for Block-based table"); + "-impl must currently be 0, 2, or 3 for Block-based table"); } } diff --git a/util/ribbon_impl.h b/util/ribbon_impl.h index ee81d6a1f5..aec1b29c23 100644 --- a/util/ribbon_impl.h +++ b/util/ribbon_impl.h @@ -179,11 +179,11 @@ class StandardHasher { // this function) when number of slots is roughly 10k or larger. // // The best values for these smash weights might depend on how - // densely you're packing entries, but this seems to work well for - // 2% overhead and roughly 50% success probability. + // densely you're packing entries, and also kCoeffBits, but this + // seems to work well for roughly 95% success probability. // - constexpr auto kFrontSmash = kCoeffBits / 3; - constexpr auto kBackSmash = kCoeffBits / 3; + constexpr Index kFrontSmash = kCoeffBits / 4; + constexpr Index kBackSmash = kCoeffBits / 4; Index start = FastRangeGeneric(h, num_starts + kFrontSmash + kBackSmash); start = std::max(start, kFrontSmash); start -= kFrontSmash; @@ -265,11 +265,16 @@ class StandardHasher { // This is not so much "critical path" code because it can be done in // parallel (instruction level) with memory lookup. // - // There is no evidence that ResultRow needs to be independent from - // CoeffRow, so we draw from the same bits computed for CoeffRow, - // which are reasonably independent from Start. (Inlining and common - // subexpression elimination with GetCoeffRow should make this + // ResultRow bits only needs to be independent from CoeffRow bits if + // many entries might have the same start location, where "many" is + // comparable to number of hash bits or kCoeffBits. 
If !kUseSmash + // and num_starts > kCoeffBits, it is safe and efficient to draw from + // the same bits computed for CoeffRow, which are reasonably + // independent from Start. (Inlining and common subexpression + // elimination with GetCoeffRow should make this // a single shared multiplication in generated code.) + // + // TODO: fix & test the kUseSmash case with very small num_starts Hash a = h * kCoeffAndResultFactor; // The bits here that are *most* independent of Start are the highest // order bits (as in Knuth multiplicative hash). To make those the @@ -432,6 +437,7 @@ class StandardBanding : public StandardHasher<TypesAndSettings> { StandardBanding(Index num_slots = 0, Index backtrack_size = 0) { Reset(num_slots, backtrack_size); } + void Reset(Index num_slots, Index backtrack_size = 0) { if (num_slots == 0) { // Unusual (TypesAndSettings::kAllowZeroStarts) or "uninitialized" @@ -456,6 +462,7 @@ class StandardBanding : public StandardHasher<TypesAndSettings> { } EnsureBacktrackSize(backtrack_size); } + void EnsureBacktrackSize(Index backtrack_size) { if (backtrack_size > backtrack_size_) { backtrack_.reset(new Index[backtrack_size]); @@ -601,6 +608,54 @@ class StandardBanding : public StandardHasher<TypesAndSettings> { return false; } + // ******************************************************************** + // Static high-level API + + // Based on data from FindOccupancyForSuccessRate in ribbon_test, + // returns a number of slots for a given number of entries to add + // that should have roughly 95% or better chance of successful + // construction per seed. Does NOT do rounding for InterleavedSoln; + // call RoundUpNumSlots for that. + // + // num_to_add should not exceed roughly 2/3rds of the maximum value + // of the Index type to avoid overflow.
+ static Index GetNumSlotsFor95PctSuccess(Index num_to_add) { + if (num_to_add == 0) { + return 0; + } + double factor = GetFactorFor95PctSuccess(num_to_add); + Index num_slots = static_cast<Index>(num_to_add * factor); + assert(num_slots >= num_to_add); + return num_slots; + } + + // Based on data from FindOccupancyForSuccessRate in ribbon_test, + // given a number of entries to add, returns a space overhead factor + // (slots divided by num_to_add) that should have roughly 95% or better + // chance of successful construction per seed. Does NOT do rounding for + // InterleavedSoln; call RoundUpNumSlots for that. + // + // The reason that num_to_add is needed is that Ribbon filters of a + // particular CoeffRow size do not scale infinitely. + static double GetFactorFor95PctSuccess(Index num_to_add) { + double log2_num_to_add = std::log(num_to_add) * 1.442695; + if (kCoeffBits == 64) { + if (TypesAndSettings::kUseSmash) { + return 1.02 + std::max(log2_num_to_add - 8.5, 0.0) * 0.009; + } else { + return 1.05 + std::max(log2_num_to_add - 11.0, 0.0) * 0.009; + } + } else { + // Currently only support 64 and 128 + assert(kCoeffBits == 128); + if (TypesAndSettings::kUseSmash) { + return 1.01 + std::max(log2_num_to_add - 10.0, 0.0) * 0.0042; + } else { + return 1.02 + std::max(log2_num_to_add - 12.0, 0.0) * 0.0042; + } + } + } + protected: // TODO: explore combining in a struct std::unique_ptr<CoeffRow[]> coeff_rows_; @@ -759,6 +814,19 @@ class SerializableInterleavedSolution { // ******************************************************************** // High-level API + void ConfigureForNumBlocks(Index num_blocks) { + if (num_blocks == 0) { + PrepareForNumStarts(0); + } else { + PrepareForNumStarts(num_blocks * kCoeffBits - kCoeffBits + 1); + } + } + + void ConfigureForNumSlots(Index num_slots) { + assert(num_slots % kCoeffBits == 0); + ConfigureForNumBlocks(num_slots / kCoeffBits); + } + template <typename BandingStorage> void BackSubstFrom(const BandingStorage& bs) { if (TypesAndSettings::kAllowZeroStarts &&
bs.GetNumStarts() == 0) { @@ -805,7 +873,7 @@ class SerializableInterleavedSolution { // Note: Ignoring smash setting; still close enough in that case double lower_portion = - (upper_start_block_ * kCoeffBits * 1.0) / num_starts_; + (upper_start_block_ * 1.0 * kCoeffBits) / num_starts_; // Each result (solution) bit (column) cuts FP rate in half. Weight that // for upper and lower number of bits (columns). @@ -813,7 +881,112 @@ class SerializableInterleavedSolution { (1.0 - lower_portion) * std::pow(0.5, upper_num_columns_); } + // ******************************************************************** + // Static high-level API + + // Round up to a number of slots supported by this structure. Note that + // this must be taken into account for the banding if this + // solution layout/storage is to be used. + static Index RoundUpNumSlots(Index num_slots) { + // Must be multiple of kCoeffBits + Index corrected = (num_slots + kCoeffBits - 1) / kCoeffBits * kCoeffBits; + + // Do not use num_starts==1 unless kUseSmash, because the hashing + // might not be equipped for stacking up so many entries on a + // single start location. + if (!TypesAndSettings::kUseSmash && corrected == kCoeffBits) { + corrected += kCoeffBits; + } + return corrected; + } + + // Compute the number of bytes for a given number of slots and desired + // FP rate. Since desired FP rate might not be exactly achievable, + // rounding_bias32==0 means to always round toward lower FP rate + // than desired (more bytes); rounding_bias32==max uint32_t means always + // round toward higher FP rate than desired (fewer bytes); other values + // act as a proportional threshold or bias between the two. + static size_t GetBytesForFpRate(Index num_slots, double desired_fp_rate, + uint32_t rounding_bias32) { + return InternalGetBytesForFpRate(num_slots, desired_fp_rate, + 1.0 / desired_fp_rate, rounding_bias32); + } + + // The same, but specifying desired accuracy as 1.0 / FP rate, or + // one_in_fp_rate.
E.g. desired_one_in_fp_rate=100 means 1% FP rate. + static size_t GetBytesForOneInFpRate(Index num_slots, + double desired_one_in_fp_rate, + uint32_t rounding_bias32) { + return InternalGetBytesForFpRate(num_slots, 1.0 / desired_one_in_fp_rate, + desired_one_in_fp_rate, rounding_bias32); + } + protected: + static size_t InternalGetBytesForFpRate(Index num_slots, + double desired_fp_rate, + double desired_one_in_fp_rate, + uint32_t rounding_bias32) { + assert(TypesAndSettings::kIsFilter); + if (TypesAndSettings::kAllowZeroStarts && num_slots == 0) { + // Unusual. Zero starts presumes no keys added -> always false (no FPs) + return 0U; + } + // Must be rounded up already. + assert(RoundUpNumSlots(num_slots) == num_slots); + + if (desired_one_in_fp_rate > 1.0 && desired_fp_rate < 1.0) { + // Typical: less than 100% FP rate + if (desired_one_in_fp_rate <= static_cast<ResultRow>(-1)) { + // Typical: Less than maximum result row entropy + ResultRow rounded = static_cast<ResultRow>(desired_one_in_fp_rate); + int lower_columns = FloorLog2(rounded); + double lower_columns_fp_rate = std::pow(2.0, -lower_columns); + double upper_columns_fp_rate = std::pow(2.0, -(lower_columns + 1)); + // Floating point don't let me down! + assert(lower_columns_fp_rate >= desired_fp_rate); + assert(upper_columns_fp_rate <= desired_fp_rate); + + double lower_portion = (desired_fp_rate - upper_columns_fp_rate) / + (lower_columns_fp_rate - upper_columns_fp_rate); + // Floating point don't let me down!
+ assert(lower_portion >= 0.0); + assert(lower_portion <= 1.0); + + double rounding_bias = (rounding_bias32 + 0.5) / double{0x100000000}; + assert(rounding_bias > 0.0); + assert(rounding_bias < 1.0); + + // Note: Ignoring smash setting; still close enough in that case + Index num_starts = num_slots - kCoeffBits + 1; + // Lower upper_start_block means lower FP rate (higher accuracy) + Index upper_start_block = static_cast<Index>( + (lower_portion * num_starts + rounding_bias) / kCoeffBits); + Index num_blocks = num_slots / kCoeffBits; + assert(upper_start_block < num_blocks); + + // Start by assuming all blocks use lower number of columns + Index num_segments = num_blocks * static_cast<Index>(lower_columns); + // Correct by 1 each for blocks using upper number of columns + num_segments += (num_blocks - upper_start_block); + // Total bytes + return num_segments * sizeof(CoeffRow); + } else { + // one_in_fp_rate too big, thus requested FP rate is smaller than + // supported. Use max number of columns for minimum supported FP rate. + return num_slots * sizeof(ResultRow); + } + } else { + // Effectively asking for 100% FP rate, or NaN etc.
+ if (TypesAndSettings::kAllowZeroStarts) { + // Zero segments + return 0U; + } else { + // One segment (minimum size, maximizing FP rate) + return sizeof(CoeffRow); + } + } + } + void InternalConfigure() { const Index num_blocks = GetNumBlocks(); Index num_segments = GetNumSegments(); @@ -842,11 +1015,11 @@ class SerializableInterleavedSolution { data_len_ = num_segments * sizeof(CoeffRow); } + char* const data_; + size_t data_len_; Index num_starts_ = 0; Index upper_num_columns_ = 0; Index upper_start_block_ = 0; - char* const data_; - size_t data_len_; }; } // namespace ribbon diff --git a/util/ribbon_test.cc b/util/ribbon_test.cc index 00dda42a07..9067c9719f 100644 --- a/util/ribbon_test.cc +++ b/util/ribbon_test.cc @@ -14,12 +14,36 @@ #ifndef GFLAGS uint32_t FLAGS_thoroughness = 5; +bool FLAGS_find_occ = false; +double FLAGS_find_next_factor = 1.414; +double FLAGS_find_success = 0.95; +double FLAGS_find_delta_start = 0.01; +double FLAGS_find_delta_end = 0.0001; +double FLAGS_find_delta_shrink = 0.99; +uint32_t FLAGS_find_min_slots = 128; +uint32_t FLAGS_find_max_slots = 12800000; #else #include "util/gflags_compat.h" using GFLAGS_NAMESPACE::ParseCommandLineFlags; // Using 500 is a good test when you have time to be thorough. // Default is for general RocksDB regression test runs. DEFINE_uint32(thoroughness, 5, "iterations per configuration"); + +// Options for FindOccupancyForSuccessRate, which is more of a tool +// than a test. 
+DEFINE_bool(find_occ, false, + "whether to run the FindOccupancyForSuccessRate tool"); +DEFINE_double(find_next_factor, 1.414, + "factor to next num_slots for FindOccupancyForSuccessRate"); +DEFINE_double(find_success, 0.95, + "target success rate for FindOccupancyForSuccessRate"); +DEFINE_double(find_delta_start, 0.01, " for FindOccupancyForSuccessRate"); +DEFINE_double(find_delta_end, 0.0001, " for FindOccupancyForSuccessRate"); +DEFINE_double(find_delta_shrink, 0.99, " for FindOccupancyForSuccessRate"); +DEFINE_uint32(find_min_slots, 128, + "minimum number of slots for FindOccupancyForSuccessRate"); +DEFINE_uint32(find_max_slots, 12800000, + "maximum number of slots for FindOccupancyForSuccessRate"); #endif // GFLAGS template @@ -44,6 +68,11 @@ struct StandardKeyGen { return *this; } + StandardKeyGen& operator+=(uint64_t i) { + id_ += i; + return *this; + } + const std::string& operator*() { // Use multiplication to mix things up a little in the key ROCKSDB_NAMESPACE::EncodeFixed64(&str_[str_.size() - 8], @@ -81,6 +110,11 @@ struct SmallKeyGen { return *this; } + SmallKeyGen& operator+=(uint64_t i) { + id_ += i; + return *this; + } + const std::string& operator*() { ROCKSDB_NAMESPACE::EncodeFixed64(&str_[str_.size() - 8], id_); return str_; @@ -325,8 +359,8 @@ TYPED_TEST(RibbonTypeParamTest, CompactnessAndBacktrackAndFpRate) { Index num_slots = static_cast<Index>(num_to_add * kFactor); if (test_interleaved) { - // Round to nearest multiple of kCoeffBits - num_slots = ((num_slots + kCoeffBits / 2) / kCoeffBits) * kCoeffBits; + // Round to supported number of slots + num_slots = InterleavedSoln::RoundUpNumSlots(num_slots); // Re-adjust num_to_add to get as close as possible to kFactor num_to_add = static_cast<Index>(num_slots / kFactor); } @@ -839,6 +873,55 @@ TEST(RibbonTest, PhsfBasic) { } } +// Not a real test, but a tool used to build GetNumSlotsFor95PctSuccess +TYPED_TEST(RibbonTypeParamTest, FindOccupancyForSuccessRate) { + IMPORT_RIBBON_TYPES_AND_SETTINGS(TypeParam); +
IMPORT_RIBBON_IMPL_TYPES(TypeParam); + using KeyGen = typename TypeParam::KeyGen; + + if (!FLAGS_find_occ) { + fprintf(stderr, "Tool disabled during unit test runs\n"); + return; + } + + KeyGen cur("blah", 0); + + Banding banding; + Index num_slots = InterleavedSoln::RoundUpNumSlots(FLAGS_find_min_slots); + while (num_slots < FLAGS_find_max_slots) { + double factor = 0.95; + double delta = FLAGS_find_delta_start; + while (delta > FLAGS_find_delta_end) { + Index num_to_add = static_cast<Index>(factor * num_slots); + KeyGen end = cur; + end += num_to_add; + bool success = banding.ResetAndFindSeedToSolve(num_slots, cur, end, 0, 0); + cur = end; // fresh keys + if (success) { + factor += delta * (1.0 - FLAGS_find_success); + factor = std::min(factor, 1.0); + } else { + factor -= delta * FLAGS_find_success; + factor = std::max(factor, 0.0); + } + delta *= FLAGS_find_delta_shrink; + fprintf(stderr, + "slots: %u log2_slots: %g target_success: %g ->overhead: %g\r", + static_cast<unsigned>(num_slots), + std::log(num_slots * 1.0) / std::log(2.0), FLAGS_find_success, + 1.0 / factor); + } + fprintf(stderr, "\n"); + + num_slots = std::max( + num_slots + 1, static_cast<Index>(num_slots * FLAGS_find_next_factor)); + num_slots = InterleavedSoln::RoundUpNumSlots(num_slots); + } +} + +// TODO: unit tests for configuration APIs +// TODO: unit tests for small filter FP rates + int main(int argc, char** argv) { ::testing::InitGoogleTest(&argc, argv); #ifdef GFLAGS