rocksdb/db
Peter Dillinger 280b9f371a Fix auto_prefix_mode performance with partitioned filters (#10012)
Summary:
Essentially refactored the RangeMayExist implementation in
FullFilterBlockReader to FilterBlockReaderCommon so that it applies to
partitioned filters as well. (The function is not called for the
block-based filter case.) RangeMayExist is essentially a series of checks
around a possible PrefixMayExist, and I'm confident those checks should
be the same for partitioned as for full filters. (I think it's likely
that bugs remain in those checks, but this change is overall a simplifying
one.)

Added auto_prefix_mode support to db_bench

Other small fixes as well

Fixes https://github.com/facebook/rocksdb/issues/10003

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10012

Test Plan:
Expanded unit test that uses statistics to check for filter
optimization, fails without the production code changes here

Performance: populate two DBs with
```
TEST_TMPDIR=/dev/shm/rocksdb_nonpartitioned ./db_bench -benchmarks=fillrandom -num=10000000 -disable_wal=1 -write_buffer_size=30000000 -bloom_bits=16 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0 -prefix_size=8
TEST_TMPDIR=/dev/shm/rocksdb_partitioned ./db_bench -benchmarks=fillrandom -num=10000000 -disable_wal=1 -write_buffer_size=30000000 -bloom_bits=16 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0 -prefix_size=8 -partition_index_and_filters
```

Observe no measurable change in non-partitioned performance
```
TEST_TMPDIR=/dev/shm/rocksdb_nonpartitioned ./db_bench -benchmarks=seekrandom[-X1000] -num=10000000 -readonly -bloom_bits=16 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0 -prefix_size=8 -auto_prefix_mode -cache_index_and_filter_blocks=1 -cache_size=1000000000 -duration 20
```
Before: seekrandom [AVG 15 runs] : 11798 (± 331) ops/sec
After: seekrandom [AVG 15 runs] : 11724 (± 315) ops/sec

Observe big improvement with partitioned (also supported by bloom use statistics)
```
TEST_TMPDIR=/dev/shm/rocksdb_partitioned ./db_bench -benchmarks=seekrandom[-X1000] -num=10000000 -readonly -bloom_bits=16 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0 -prefix_size=8 -partition_index_and_filters -auto_prefix_mode -cache_index_and_filter_blocks=1 -cache_size=1000000000 -duration 20
```
Before: seekrandom [AVG 12 runs] : 2942 (± 57) ops/sec
After: seekrandom [AVG 12 runs] : 7489 (± 184) ops/sec

Reviewed By: siying

Differential Revision: D36469796

Pulled By: pdillinger

fbshipit-source-id: bcf1e2a68d347b32adb2b27384f945434e7a266d
2022-05-19 13:09:03 -07:00
..
blob Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
compaction Track SST unique id in MANIFEST and verify (#9990) 2022-05-19 11:04:21 -07:00
db_impl Track SST unique id in MANIFEST and verify (#9990) 2022-05-19 11:04:21 -07:00
arena_wrapped_db_iter.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
arena_wrapped_db_iter.h Cleanup includes in dbformat.h (#8930) 2021-09-29 04:04:40 -07:00
builder.cc Track SST unique id in MANIFEST and verify (#9990) 2022-05-19 11:04:21 -07:00
builder.h CompactionIterator sees consistent view of which keys are committed (#9830) 2022-04-14 11:11:04 -07:00
c.cc Port the batched version of MultiGet() to RocksDB's C API (#9952) 2022-05-12 18:17:36 -07:00
c_test.c Port the batched version of MultiGet() to RocksDB's C API (#9952) 2022-05-12 18:17:36 -07:00
column_family.cc Use std::numeric_limits<> (#9954) 2022-05-05 13:08:21 -07:00
column_family.h Meta-internal folly integration with F14FastMap (#9546) 2022-04-13 07:34:01 -07:00
column_family_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
compact_files_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
comparator_db_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
convenience.cc Specify largest_seqno in VerifyChecksum (#9919) 2022-05-02 10:22:08 -07:00
corruption_test.cc Update WAL corruption test so that it fails without fix (#9942) 2022-05-11 16:12:55 -07:00
cuckoo_table_db_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
db_basic_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
db_block_cache_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
db_bloom_filter_test.cc Rewrite memory-charging feature's option API (#9926) 2022-05-17 15:01:51 -07:00
db_compaction_filter_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
db_compaction_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
db_dynamic_level_test.cc Remove deprecated API AdvancedColumnFamilyOptions::soft_rate_limit/hard_rate_limit (#9452) 2022-01-27 13:01:09 -08:00
db_encryption_test.cc Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
db_filesnapshot.cc Use std::numeric_limits<> (#9954) 2022-05-05 13:08:21 -07:00
db_flush_test.cc Use std::numeric_limits<> (#9954) 2022-05-05 13:08:21 -07:00
db_info_dumper.cc Printing IO Error in DumpDBFileSummary (#9940) 2022-05-04 10:19:53 -07:00
db_info_dumper.h Add a DB Session ID (#6959) 2020-06-15 10:47:02 -07:00
db_inplace_update_test.cc fix a bug, c api, if enable inplace_update_support, and use create sn… (#9471) 2022-03-21 12:04:33 -07:00
db_io_failure_test.cc Enable a few unit tests to use custom Env objects (#9087) 2021-11-08 11:05:59 -08:00
db_iter.cc Remove dead code (#9825) 2022-04-11 10:26:55 -07:00
db_iter.h Remove dead code (#9825) 2022-04-11 10:26:55 -07:00
db_iter_stress_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
db_iter_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
db_iterator_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
db_kv_checksum_test.cc Use std::numeric_limits<> (#9954) 2022-05-05 13:08:21 -07:00
db_log_iter_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
db_logical_block_size_cache_test.cc Attempt to deflake DBLogicalBlockSizeCacheTest.CreateColumnFamilies (#9516) 2022-03-04 11:35:28 -08:00
db_memtable_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
db_merge_operand_test.cc Fix GetMergeOperands() heap-use-after-free on flushed memtable (#9805) 2022-04-05 12:26:36 -07:00
db_merge_operator_test.cc Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
db_options_test.cc Track SST unique id in MANIFEST and verify (#9990) 2022-05-19 11:04:21 -07:00
db_properties_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
db_range_del_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
db_rate_limiter_test.cc Set Read rate limiter priority dynamically and pass it to FS (#9996) 2022-05-18 19:41:44 -07:00
db_secondary_test.cc Track SST unique id in MANIFEST and verify (#9990) 2022-05-19 11:04:21 -07:00
db_sst_test.cc Rewrite memory-charging feature's option API (#9926) 2022-05-17 15:01:51 -07:00
db_statistics_test.cc Bytes read stat for VerifyChecksum() and VerifyFileChecksums() APIs (#8741) 2021-09-07 13:28:29 -07:00
db_table_properties_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
db_tailing_iter_test.cc Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
db_test.cc Revert "Bugfix/fix manual flush blocking bug (#9893)" (#9992) 2022-05-13 12:31:30 -07:00
db_test2.cc Fix auto_prefix_mode performance with partitioned filters (#10012) 2022-05-19 13:09:03 -07:00
db_test_util.cc Rewrite memory-charging feature's option API (#9926) 2022-05-17 15:01:51 -07:00
db_test_util.h Rewrite memory-charging feature's option API (#9926) 2022-05-17 15:01:51 -07:00
db_universal_compaction_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
db_wal_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
db_with_timestamp_basic_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
db_with_timestamp_compaction_test.cc Use the comparator from the sst file table properties in sst_dump_tool (#9491) 2022-02-08 12:15:35 -08:00
db_write_buffer_manager_test.cc Enable a few unit tests to use custom Env objects (#9087) 2021-11-08 11:05:59 -08:00
db_write_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
dbformat.cc Track per-SST user-defined timestamp information in MANIFEST (#9092) 2021-11-10 10:49:04 -08:00
dbformat.h Use std::numeric_limits<> (#9954) 2022-05-05 13:08:21 -07:00
dbformat_test.cc Enable a few unit tests to use custom Env objects (#9087) 2021-11-08 11:05:59 -08:00
deletefile_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
error_handler.cc Expand auto recovery to background read errors (#9679) 2022-03-15 14:45:34 -07:00
error_handler.h Expand auto recovery to background read errors (#9679) 2022-03-15 14:45:34 -07:00
error_handler_fs_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
event_helpers.cc Fix a race condition in WAL tracking causing DB open failure (#9715) 2022-03-23 19:41:31 -07:00
event_helpers.h Add a listener callback for end of auto error recovery (#9244) 2021-12-08 14:30:57 -08:00
experimental.cc Track SST unique id in MANIFEST and verify (#9990) 2022-05-19 11:04:21 -07:00
external_sst_file_basic_test.cc Track SST unique id in MANIFEST and verify (#9990) 2022-05-19 11:04:21 -07:00
external_sst_file_ingestion_job.cc Track SST unique id in MANIFEST and verify (#9990) 2022-05-19 11:04:21 -07:00
external_sst_file_ingestion_job.h Track SST unique id in MANIFEST and verify (#9990) 2022-05-19 11:04:21 -07:00
external_sst_file_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
fault_injection_test.cc Fix a bug causing duplicate trailing entries in WritableFile (buffered IO) (#9236) 2021-12-13 09:00:36 -08:00
file_indexer.cc
file_indexer.h Use std::numeric_limits<> (#9954) 2022-05-05 13:08:21 -07:00
file_indexer_test.cc
filename_test.cc fixing issue #8345 RocksDB does not work when using UNC network paths (#9384) 2022-03-30 15:55:31 -07:00
flush_job.cc Track SST unique id in MANIFEST and verify (#9990) 2022-05-19 11:04:21 -07:00
flush_job.h Set Write rate limiter priority dynamically and pass it to FS (#9988) 2022-05-18 00:41:41 -07:00
flush_job_test.cc Set Write rate limiter priority dynamically and pass it to FS (#9988) 2022-05-18 00:41:41 -07:00
flush_scheduler.cc
flush_scheduler.h Include C++ standard library headers instead of C compatibility headers (#8068) 2021-03-19 12:09:47 -07:00
forward_iterator.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
forward_iterator.h Fast path for detecting unchanged prefix_extractor (#9407) 2022-01-21 11:37:46 -08:00
forward_iterator_bench.cc Remove using namespace (#9369) 2022-01-12 09:31:12 -08:00
history_trimming_iterator.h Add OpenAndTrimHistory API to support trimming data with specified timestamp (#9410) 2022-03-11 16:13:23 -08:00
import_column_family_job.cc Track SST unique id in MANIFEST and verify (#9990) 2022-05-19 11:04:21 -07:00
import_column_family_job.h New stable, fixed-length cache keys (#9126) 2021-12-16 17:15:13 -08:00
import_column_family_test.cc Track SST unique id in MANIFEST and verify (#9990) 2022-05-19 11:04:21 -07:00
internal_stats.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
internal_stats.h Expose the amount of garbage in live blob files as a dedicated DB property (#9835) 2022-04-13 13:36:30 -07:00
job_context.h CompactionIterator sees consistent view of which keys are committed (#9830) 2022-04-14 11:11:04 -07:00
kv_checksum.h fix compile errors in db/kv_checksum.h (#9173) 2021-11-16 10:20:50 -08:00
listener_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
log_format.h Add record to set WAL compression type if enabled (#9556) 2022-02-17 16:19:31 -08:00
log_reader.cc Integrate WAL compression into log reader/writer. (#9642) 2022-03-09 15:49:53 -08:00
log_reader.h Integrate WAL compression into log reader/writer. (#9642) 2022-03-09 15:49:53 -08:00
log_test.cc Integrate WAL compression into log reader/writer. (#9642) 2022-03-09 15:49:53 -08:00
log_writer.cc Integrate WAL compression into log reader/writer. (#9642) 2022-03-09 15:49:53 -08:00
log_writer.h Integrate WAL compression into log reader/writer. (#9642) 2022-03-09 15:49:53 -08:00
logs_with_prep_tracker.cc
logs_with_prep_tracker.h Include C++ standard library headers instead of C compatibility headers (#8068) 2021-03-19 12:09:47 -07:00
lookup_key.h Cleanup includes in dbformat.h (#8930) 2021-09-29 04:04:40 -07:00
malloc_stats.cc Replace most typedef with using= (#8751) 2021-09-07 11:31:59 -07:00
malloc_stats.h
manual_compaction_test.cc Remove using namespace (#9369) 2022-01-12 09:31:12 -08:00
memtable.cc Use std::numeric_limits<> (#9954) 2022-05-05 13:08:21 -07:00
memtable.h Meta-internal folly integration with F14FastMap (#9546) 2022-04-13 07:34:01 -07:00
memtable_list.cc Encode min_log_number_to_keep and delete_wals_before in one version edit (#9766) 2022-03-31 20:00:52 -07:00
memtable_list.h Fix various spelling errors still found in code (#9653) 2022-05-05 19:45:32 -07:00
memtable_list_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
merge_context.h Add Merge Operator support to WriteBatchWithIndex (#8135) 2021-05-10 12:50:25 -07:00
merge_helper.cc Support readahead during compaction for blob files (#9187) 2021-11-19 17:53:47 -08:00
merge_helper.h Support readahead during compaction for blob files (#9187) 2021-11-19 17:53:47 -08:00
merge_helper_test.cc Support readahead during compaction for blob files (#9187) 2021-11-19 17:53:47 -08:00
merge_operator.cc
merge_test.cc Make the Env class Customizable (#9293) 2022-01-04 16:45:49 -08:00
obsolete_files_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
options_file_test.cc No elide constructors (#7798) 2020-12-23 16:55:53 -08:00
output_validator.cc Cleanup includes in dbformat.h (#8930) 2021-09-29 04:04:40 -07:00
output_validator.h Cleanup includes in dbformat.h (#8930) 2021-09-29 04:04:40 -07:00
perf_context_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
periodic_work_scheduler.cc Fix a timer crash caused by invalid memory management (#9656) 2022-03-12 11:45:56 -08:00
periodic_work_scheduler.h Fix a timer crash caused by invalid memory management (#9656) 2022-03-12 11:45:56 -08:00
periodic_work_scheduler_test.cc Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
pinned_iterators_manager.h Replace most typedef with using= (#8751) 2021-09-07 11:31:59 -07:00
plain_table_db_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
pre_release_callback.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
prefix_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
range_del_aggregator.cc In ParseInternalKey(), include corrupt key info in Status (#7515) 2020-10-28 10:12:58 -07:00
range_del_aggregator.h Fix some typos in comments (#8066) 2021-03-25 21:18:08 -07:00
range_del_aggregator_bench.cc Cleanup multiple implementations of VectorIterator (#8901) 2021-10-06 07:48:31 -07:00
range_del_aggregator_test.cc Cleanup multiple implementations of VectorIterator (#8901) 2021-10-06 07:48:31 -07:00
range_tombstone_fragmenter.cc Added memtable garbage statistics (#8411) 2021-06-18 04:57:27 -07:00
range_tombstone_fragmenter.h Added memtable garbage statistics (#8411) 2021-06-18 04:57:27 -07:00
range_tombstone_fragmenter_test.cc Cleanup multiple implementations of VectorIterator (#8901) 2021-10-06 07:48:31 -07:00
read_callback.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
repair.cc Track SST unique id in MANIFEST and verify (#9990) 2022-05-19 11:04:21 -07:00
repair_test.cc Track SST unique id in MANIFEST and verify (#9990) 2022-05-19 11:04:21 -07:00
snapshot_checker.h Use STATIC_AVOID_DESTRUCTION for static objects with non-trivial destructors (#9958) 2022-05-17 09:39:22 -07:00
snapshot_impl.cc
snapshot_impl.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
table_cache.cc Improve the precision of row entry charge in row_cache (#9337) 2022-05-09 12:27:38 -07:00
table_cache.h Fast path for detecting unchanged prefix_extractor (#9407) 2022-01-21 11:37:46 -08:00
table_properties_collector.cc Apply sample_for_compression to all block-based tables (#8105) 2021-03-25 15:00:45 -07:00
table_properties_collector.h Track each SST's timestamp information as user properties (#9093) 2021-11-19 11:37:06 -08:00
table_properties_collector_test.cc Improve / clean up meta block code & integrity (#9163) 2021-11-18 11:43:44 -08:00
transaction_log_impl.cc Add checks to GetUpdatesSince (#9459) 2022-04-14 17:12:16 -07:00
transaction_log_impl.h Add checks to GetUpdatesSince (#9459) 2022-04-14 17:12:16 -07:00
trim_history_scheduler.cc
trim_history_scheduler.h
version_builder.cc Use std::numeric_limits<> (#9954) 2022-05-05 13:08:21 -07:00
version_builder.h Fast path for detecting unchanged prefix_extractor (#9407) 2022-01-21 11:37:46 -08:00
version_builder_test.cc Track SST unique id in MANIFEST and verify (#9990) 2022-05-19 11:04:21 -07:00
version_edit.cc Track SST unique id in MANIFEST and verify (#9990) 2022-05-19 11:04:21 -07:00
version_edit.h Track SST unique id in MANIFEST and verify (#9990) 2022-05-19 11:04:21 -07:00
version_edit_handler.cc Fix a race condition in WAL tracking causing DB open failure (#9715) 2022-03-23 19:41:31 -07:00
version_edit_handler.h Fixed manifest_dump issues when printing keys and values containing null characters (#8378) 2021-06-10 12:55:20 -07:00
version_edit_test.cc Track SST unique id in MANIFEST and verify (#9990) 2022-05-19 11:04:21 -07:00
version_set.cc Track SST unique id in MANIFEST and verify (#9990) 2022-05-19 11:04:21 -07:00
version_set.h Track SST unique id in MANIFEST and verify (#9990) 2022-05-19 11:04:21 -07:00
version_set_test.cc Track SST unique id in MANIFEST and verify (#9990) 2022-05-19 11:04:21 -07:00
version_util.h Add manifest fix-up utility for file temperatures (#9683) 2022-03-18 16:35:51 -07:00
wal_edit.cc Always track WAL obsoletion (#7759) 2020-12-09 16:02:12 -08:00
wal_edit.h Use std::numeric_limits<> (#9954) 2022-05-05 13:08:21 -07:00
wal_edit_test.cc Always track WAL obsoletion (#7759) 2020-12-09 16:02:12 -08:00
wal_manager.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
wal_manager.h Add checks to GetUpdatesSince (#9459) 2022-04-14 17:12:16 -07:00
wal_manager_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
write_batch.cc Use std::numeric_limits<> (#9954) 2022-05-05 13:08:21 -07:00
write_batch_base.cc
write_batch_internal.h Support WBWI for keys having timestamps (#9603) 2022-02-22 14:23:01 -08:00
write_batch_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
write_callback.h
write_callback_test.cc Move slow valgrind tests behind -DROCKSDB_FULL_VALGRIND_RUN (#8475) 2021-07-07 11:14:05 -07:00
write_controller.cc Revamp WriteController (#8064) 2021-03-18 09:47:31 -07:00
write_controller.h Set Write rate limiter priority dynamically and pass it to FS (#9988) 2022-05-18 00:41:41 -07:00
write_controller_test.cc Revamp WriteController (#8064) 2021-03-18 09:47:31 -07:00
write_thread.cc Rate-limit automatic WAL flush after each user write (#9607) 2022-03-08 13:19:39 -08:00
write_thread.h Rate-limit automatic WAL flush after each user write (#9607) 2022-03-08 13:19:39 -08:00