rocksdb/table
Peter Dillinger f63428bcc7 Optimize, simplify filter block building (fix regression) (#12931)
Summary:
This is in part a refactoring / simplification to set up for "decoupled" partitioned filters and in part to fix an intentional regression for a correctness fix in https://github.com/facebook/rocksdb/issues/12872. Basically, we are taking out some complexity of the filter block builders, and pushing part of it (simultaneous de-duplication of prefixes and whole keys) into the filter bits builders, where it is more efficient by operating on hashes (rather than copied keys).

Previously, the FullFilterBlockBuilder had a somewhat fragile and confusing set of conditions under which it would keep a copy of the most recent prefix and most recent whole key, along with some other state that is essentially redundant. Now we just track (always) the previous prefix in the PartitionedFilterBlockBuilder, to deal with the boundary prefix Seek filtering problem. (Btw, the next PR will optimize this away since BlockBasedTableReader already tracks the previous key.) And to deal with the problem of de-duplicating both whole keys and prefixes going into a single filter, we add a new function to FilterBitsBuilder that has that extra de-duplication capabilty, which is relatively efficient because we only have to cache an extra 64-bit hash, not a copied key or prefix. (The API of this new function is somewhat awkward to avoid a small CPU regression in some cases.)

Also previously, there was awkward logic split between FullFilterBlockBuilder and PartitionedFilterBlockBuilder to deal with some things specific to partitioning. And confusing names like Add vs. AddKey. FullFilterBlockBuilder is much cleaner and simplified now.

The splitting of PartitionedFilterBlockBuilder::MaybeCutAFilterBlock into DecideCutAFilterBlock and CutAFilterBlock is to address what would have been a slight performance regression in some cases. The split allows for more intruction-level parallelism by reducing unnecessary control dependencies.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12931

Test Plan:
existing tests (with some minor updates)

Also manually ported over the pre-broken regression test described in
 https://github.com/facebook/rocksdb/issues/12870 and ran it (passed).

Performance:
Here we validate that an entire series of recent related PRs are a net improvement in aggregate. "Before" is with these PRs reverted: https://github.com/facebook/rocksdb/issues/12872 #12911 https://github.com/facebook/rocksdb/issues/12874 #12867 https://github.com/facebook/rocksdb/issues/12903 #12904. "After" includes this PR (and all
of those, with base revision 16c21af). Simultaneous test script designed to maximally depend on SST construction efficiency:

```
for PF in 0 1; do for PS in 0 8; do for WK in 0 1; do [ "$PS" == "$WK" ] || (for I in `seq 1 20`; do TEST_TMPDIR=/dev/shm/rocksdb2 ./db_bench -benchmarks=fillrandom -num=10000000 -disable_wal=1 -write_buffer_size=30000000 -memtablerep=vector -allow_concurrent_memtable_write=0 -bloom_bits=10 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0 -partition_index_and_filters=$PF -prefix_size=$PS -whole_key_filtering=$WK 2>&1 | grep micros/op; done) | awk '{ t += $5; c++; print } END { print 1.0 * t / c }'; echo "Was -partition_index_and_filters=$PF -prefix_size=$PS -whole_key_filtering=$WK"; done; done; done) | tee results
```

Showing average ops/sec of "after" vs. "before"

```
-partition_index_and_filters=0 -prefix_size=0 -whole_key_filtering=1
935586 vs. 928176 (+0.79%)
-partition_index_and_filters=0 -prefix_size=8 -whole_key_filtering=0
930171 vs. 926801 (+0.36%)
-partition_index_and_filters=0 -prefix_size=8 -whole_key_filtering=1
910727 vs. 894397 (+1.8%)
-partition_index_and_filters=1 -prefix_size=0 -whole_key_filtering=1
929795 vs. 922007 (+0.84%)
-partition_index_and_filters=1 -prefix_size=8 -whole_key_filtering=0
921924 vs. 917285 (+0.51%)
-partition_index_and_filters=1 -prefix_size=8 -whole_key_filtering=1
903393 vs. 887340 (+1.8%)
```

As one would predict, the most improvement is seen in cases where we have optimized away copying the whole key.

Reviewed By: jowlyzhang

Differential Revision: D61138271

Pulled By: pdillinger

fbshipit-source-id: 427cef0b1465017b45d0a507bfa7720fa20af043
2024-08-14 15:13:16 -07:00
..
adaptive
block_based Optimize, simplify filter block building (fix regression) (#12931) 2024-08-14 15:13:16 -07:00
cuckoo
plain Add CompactForTieringCollector to support automatically trigger compaction for tiering use case (#12760) 2024-06-18 10:51:29 -07:00
block_fetcher.cc Add ticker stats for read corruption retries (#12923) 2024-08-12 15:32:07 -07:00
block_fetcher.h Fix stale memory access with FSBuffer and tiered sec cache (#12712) 2024-05-30 12:33:58 -07:00
block_fetcher_test.cc
cleanable_test.cc
compaction_merging_iterator.cc Fix possible double-free on TruncatedRangeDelIterator (#12805) 2024-06-24 11:51:16 -07:00
compaction_merging_iterator.h Fix possible double-free on TruncatedRangeDelIterator (#12805) 2024-06-24 11:51:16 -07:00
format.cc Fix AddressSanitizer container-overflow (#12722) 2024-06-04 09:41:53 -07:00
format.h
get_context.cc
get_context.h
internal_iterator.h
iter_heap.h
iterator.cc Add Iterator property "rocksdb.iterator.is-value-pinned" (#12659) 2024-05-15 19:11:52 -07:00
iterator_wrapper.h
merger_test.cc
merging_iterator.cc Fix possible double-free on TruncatedRangeDelIterator (#12805) 2024-06-24 11:51:16 -07:00
merging_iterator.h Fix possible double-free on TruncatedRangeDelIterator (#12805) 2024-06-24 11:51:16 -07:00
meta_blocks.cc Add some checks at property block creation side (#12898) 2024-07-31 13:28:17 -07:00
meta_blocks.h Add some checks at property block creation side (#12898) 2024-07-31 13:28:17 -07:00
mock_table.cc
mock_table.h
multiget_context.h
persistent_cache_helper.cc
persistent_cache_helper.h
persistent_cache_options.h
sst_file_dumper.cc Support read timestamp in ldb (#12641) 2024-05-13 15:43:12 -07:00
sst_file_dumper.h
sst_file_reader.cc Remove extra semi colon from internal_repo_rocksdb/repo/table/sst_file_reader.cc 2024-05-22 07:14:52 -07:00
sst_file_reader_test.cc Add support in SstFileReader to get a raw table iterator (#12385) 2024-04-02 21:23:06 -07:00
sst_file_writer.cc
sst_file_writer_collectors.h Add CompactForTieringCollector to support automatically trigger compaction for tiering use case (#12760) 2024-06-18 10:51:29 -07:00
table_builder.h Add CompactForTieringCollector to support automatically trigger compaction for tiering use case (#12760) 2024-06-18 10:51:29 -07:00
table_factory.cc
table_iterator.h Add support in SstFileReader to get a raw table iterator (#12385) 2024-04-02 21:23:06 -07:00
table_properties.cc
table_properties_internal.h
table_reader.h Fix TSAN-reported data race with uncache_aggressiveness (#12753) 2024-06-11 16:53:13 -07:00
table_reader_bench.cc
table_test.cc Fix wrong padded bytes being used to generate file checksum (#12598) 2024-04-30 15:38:53 -07:00
two_level_iterator.cc
two_level_iterator.h
unique_id.cc
unique_id_impl.h