mirror of
https://github.com/facebook/rocksdb.git
synced 2024-11-26 07:30:54 +00:00
f63428bcc7
Summary:
This is in part a refactoring / simplification to set up for "decoupled" partitioned filters and in part to fix an intentional regression for a correctness fix in https://github.com/facebook/rocksdb/issues/12872. Basically, we are taking out some complexity of the filter block builders, and pushing part of it (simultaneous de-duplication of prefixes and whole keys) into the filter bits builders, where it is more efficient by operating on hashes (rather than copied keys).
Previously, the FullFilterBlockBuilder had a somewhat fragile and confusing set of conditions under which it would keep a copy of the most recent prefix and most recent whole key, along with some other state that is essentially redundant. Now we just track (always) the previous prefix in the PartitionedFilterBlockBuilder, to deal with the boundary prefix Seek filtering problem. (Btw, the next PR will optimize this away since BlockBasedTableReader already tracks the previous key.) And to deal with the problem of de-duplicating both whole keys and prefixes going into a single filter, we add a new function to FilterBitsBuilder that has that extra de-duplication capabilty, which is relatively efficient because we only have to cache an extra 64-bit hash, not a copied key or prefix. (The API of this new function is somewhat awkward to avoid a small CPU regression in some cases.)
Also previously, there was awkward logic split between FullFilterBlockBuilder and PartitionedFilterBlockBuilder to deal with some things specific to partitioning. And confusing names like Add vs. AddKey. FullFilterBlockBuilder is much cleaner and simplified now.
The splitting of PartitionedFilterBlockBuilder::MaybeCutAFilterBlock into DecideCutAFilterBlock and CutAFilterBlock is to address what would have been a slight performance regression in some cases. The split allows for more intruction-level parallelism by reducing unnecessary control dependencies.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12931
Test Plan:
existing tests (with some minor updates)
Also manually ported over the pre-broken regression test described in
https://github.com/facebook/rocksdb/issues/12870 and ran it (passed).
Performance:
Here we validate that an entire series of recent related PRs are a net improvement in aggregate. "Before" is with these PRs reverted: https://github.com/facebook/rocksdb/issues/12872 #12911 https://github.com/facebook/rocksdb/issues/12874 #12867 https://github.com/facebook/rocksdb/issues/12903 #12904. "After" includes this PR (and all
of those, with base revision
|
||
---|---|---|
.. | ||
adaptive | ||
block_based | ||
cuckoo | ||
plain | ||
block_fetcher.cc | ||
block_fetcher.h | ||
block_fetcher_test.cc | ||
cleanable_test.cc | ||
compaction_merging_iterator.cc | ||
compaction_merging_iterator.h | ||
format.cc | ||
format.h | ||
get_context.cc | ||
get_context.h | ||
internal_iterator.h | ||
iter_heap.h | ||
iterator.cc | ||
iterator_wrapper.h | ||
merger_test.cc | ||
merging_iterator.cc | ||
merging_iterator.h | ||
meta_blocks.cc | ||
meta_blocks.h | ||
mock_table.cc | ||
mock_table.h | ||
multiget_context.h | ||
persistent_cache_helper.cc | ||
persistent_cache_helper.h | ||
persistent_cache_options.h | ||
sst_file_dumper.cc | ||
sst_file_dumper.h | ||
sst_file_reader.cc | ||
sst_file_reader_test.cc | ||
sst_file_writer.cc | ||
sst_file_writer_collectors.h | ||
table_builder.h | ||
table_factory.cc | ||
table_iterator.h | ||
table_properties.cc | ||
table_properties_internal.h | ||
table_reader.h | ||
table_reader_bench.cc | ||
table_test.cc | ||
two_level_iterator.cc | ||
two_level_iterator.h | ||
unique_id.cc | ||
unique_id_impl.h |