rocksdb/util
Peter Dillinger 39f5846ec7 Much better stats for seeks and prefix filtering (#11460)
Summary:
We want to know more about opportunities for better range filters, and the effectiveness of our own range filters. Currently the stats are very limited, essentially logging just hits and misses against prefix filters for range scans in BLOOM_FILTER_PREFIX_* without tracking the false positive rate. Perhaps confusingly, when prefix filters are used for point queries, the stats are currently going into the non-PREFIX tickers.

This change does several things:
* Introduce new stat tickers for seeks and related filtering, \*LEVEL_SEEK\*
  * Most importantly, allows us to see opportunities for range filtering. Specifically, we can count how many times a seek in an SST file accesses at least one data block, and how many times at least one value() is then accessed. If a data block was accessed but no value(), we can generally assume that the key(s) seen was(were) not of interest so could have been filtered with the right kind of filter, avoiding the data block access.
  * We can get the same level of detail when a filter (for now, prefix Bloom/ribbon) is used, or not. Specifically, we can infer a false positive rate for prefix filters (not available before) from the seek "false positive" rate: when a data block is accessed but no value() is called. (There can be other explanations for a seek false positive, but in typical iterator usage it would indicate a filter false positive.)
  * For efficiency, I wanted to avoid making additional calls to the prefix extractor (or key comparisons, etc.), which would be required if we wanted to more precisely detect filter false positives. I believe that instrumenting value() is the best balance of efficiency vs. accurately measuring what we are often interested in.
  * The stats are divided between last level and non-last levels, to help understand potential tiered storage use cases.
* The old BLOOM_FILTER_PREFIX_* stats have a different meaning: no longer referring to iterators but to point queries using prefix filters. BLOOM_FILTER_PREFIX_TRUE_POSITIVE is added for computing the prefix false positive rate on point queries, which can be due to filter false positives as well as different keys with the same prefix.
* Similarly, the non-PREFIX BLOOM_FILTER stats are now for whole key filtering only.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11460

Test Plan:
unit tests updated, including updating many to pop the stat value since last read to improve test
readability and maintainability.

Performance test shows a consistent small improvement with these changes, both with clang and with gcc. CPU profile indicates that RecordTick is using less CPU, and this makes sense at least for a high filter miss rate. Before, we were recording two ticks per filter miss in iterators (CHECKED & USEFUL) and now recording just one (FILTERED).

Create DB with
```
TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=fillrandom -num=10000000 -disable_wal=1 -write_buffer_size=30000000 -bloom_bits=8 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0 -prefix_size=8
```
And run simultaneous before&after with
```
TEST_TMPDIR=/dev/shm ./db_bench -readonly -benchmarks=seekrandom[-X1000] -num=10000000 -bloom_bits=8 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0 -prefix_size=8 -seek_nexts=1 -duration=20 -seed=43 -threads=8 -cache_size=1000000000 -statistics
```
Before: seekrandom [AVG 275 runs] : 189680 (± 222) ops/sec;   18.4 (± 0.0) MB/sec
After: seekrandom [AVG 275 runs] : 197110 (± 208) ops/sec;   19.1 (± 0.0) MB/sec

Reviewed By: ajkr

Differential Revision: D46029177

Pulled By: pdillinger

fbshipit-source-id: cdace79a2ea548d46c5900b068c5b7c3a02e5822
2023-05-19 15:25:49 -07:00
..
aligned_buffer.h remove dependency on options.h for port_posix.h andport_win.h (#11214) 2023-02-13 02:21:38 -08:00
async_file_reader.cc Return any errors returned by ReadAsync to the MultiGet caller (#11171) 2023-02-02 16:35:27 -08:00
async_file_reader.h Add some missing headers (#10519) 2022-08-11 12:45:50 -07:00
autovector.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
autovector_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
bloom_impl.h Simplify detection of x86 CPU features (#11419) 2023-05-09 22:25:45 -07:00
bloom_test.cc Fix an uninitialized variable warning for g++ 12.2.0 (#10995) 2022-11-30 19:27:28 -08:00
build_version.cc.in Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
cast_util.h More refactoring ahead of footer & meta changes (#9240) 2021-12-10 08:13:26 -08:00
channel.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
cleanable.cc Fix compile error in Clang 13 (#10033) 2022-05-28 00:15:28 -07:00
coding.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
coding.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
coding_lean.h New stable, fixed-length cache keys (#9126) 2021-12-16 17:15:13 -08:00
coding_test.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
compaction_job_stats_impl.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
comparator.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
compression.cc Fix bug in WAL streaming uncompression (#11198) 2023-02-08 12:05:49 -08:00
compression.h Change internal headers with duplicate names (#11408) 2023-05-17 11:27:09 -07:00
compression_context_cache.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
compression_context_cache.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
concurrent_task_limiter_impl.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
concurrent_task_limiter_impl.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
core_local.h remove dependency on options.h for port_posix.h andport_win.h (#11214) 2023-02-13 02:21:38 -08:00
coro_utils.h Add some missing headers (#10519) 2022-08-11 12:45:50 -07:00
crc32c.cc Simplify detection of x86 CPU features (#11419) 2023-05-09 22:25:45 -07:00
crc32c.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
crc32c_arm64.cc Minimal RocksJava compliance with Java 8 language level (EB 1046) (#10951) 2023-05-17 19:44:24 -07:00
crc32c_arm64.h Manual interventions for clang-format util/ (#10870) 2022-10-26 12:08:20 -07:00
crc32c_ppc.c Fix Compilation on ppc64le using Clang 11 (#7713) 2020-12-01 11:21:44 -08:00
crc32c_ppc.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
crc32c_ppc_asm.S Fix Compilation on ppc64le using Clang 11 (#7713) 2020-12-01 11:21:44 -08:00
crc32c_ppc_constants.h Remove PATENTS text from a few straggler files (#5326) 2019-05-21 16:22:35 -07:00
crc32c_test.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
data_structure.cc Improve SmallEnumSet (#11178) 2023-02-08 20:14:57 -08:00
defer.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
defer_test.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
distributed_mutex.h remove dependency on options.h for port_posix.h andport_win.h (#11214) 2023-02-13 02:21:38 -08:00
duplicate_detector.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
dynamic_bloom.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
dynamic_bloom.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
dynamic_bloom_test.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
fastrange.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
file_checksum_helper.cc Group rocksdb.sst.read.micros stat by IOActivity flush and compaction (#11288) 2023-04-21 09:07:18 -07:00
file_checksum_helper.h remove dependency on options.h for port_posix.h andport_win.h (#11214) 2023-02-13 02:21:38 -08:00
file_reader_writer_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
filelock_test.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
filter_bench.cc Use user-provided ReadOptions for metadata block reads more often (#11208) 2023-04-04 16:53:14 -07:00
gflags_compat.h Fix gflags_compat.h (#11346) 2023-04-03 10:41:00 -07:00
hash.cc Experimental support for SST unique IDs (#8990) 2021-10-18 23:32:01 -07:00
hash.h Experimental support for SST unique IDs (#8990) 2021-10-18 23:32:01 -07:00
hash128.h Upgrade xxhash, add Hash128 (#8634) 2021-08-20 18:41:51 -07:00
hash_containers.h Meta-internal folly integration with F14FastMap (#9546) 2022-04-13 07:34:01 -07:00
hash_map.h Change HashMap::Insert()'s value to a const reference (#6567) 2020-03-20 14:59:54 -07:00
hash_test.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
heap.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
heap_test.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
kv_map.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
log_write_bench.cc Add a SystemClock class to capture the time functions of an Env (#7858) 2021-01-25 22:09:11 -08:00
math.h Simplify detection of x86 CPU features (#11419) 2023-05-09 22:25:45 -07:00
math128.h Derive cache keys from SST unique IDs (#10394) 2022-08-12 13:49:49 -07:00
murmurhash.cc Manual interventions for clang-format util/ (#10870) 2022-10-26 12:08:20 -07:00
murmurhash.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
mutexlock.h remove dependency on options.h for port_posix.h andport_win.h (#11214) 2023-02-13 02:21:38 -08:00
ppc-opcode.h Remove PATENTS text from a few straggler files (#5326) 2019-05-21 16:22:35 -07:00
random.cc remove dependency on options.h for port_posix.h andport_win.h (#11214) 2023-02-13 02:21:38 -08:00
random.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
random_test.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
rate_limiter.cc Change internal headers with duplicate names (#11408) 2023-05-17 11:27:09 -07:00
rate_limiter_impl.h Change internal headers with duplicate names (#11408) 2023-05-17 11:27:09 -07:00
rate_limiter_test.cc Change internal headers with duplicate names (#11408) 2023-05-17 11:27:09 -07:00
repeatable_thread.h Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
repeatable_thread_test.cc Print stack traces on frozen tests in CI (#10828) 2022-10-18 00:35:35 -07:00
ribbon_alg.h Use only ASCII in source files (#10164) 2022-06-15 14:44:43 -07:00
ribbon_config.cc Refine Ribbon configuration, improve testing, add Homogeneous (#7879) 2021-02-26 08:50:42 -08:00
ribbon_config.h Refine Ribbon configuration, improve testing, add Homogeneous (#7879) 2021-02-26 08:50:42 -08:00
ribbon_impl.h Account Bloom/Ribbon filter construction memory in global memory limit (#9073) 2021-11-18 09:42:20 -08:00
ribbon_test.cc util/ribbon_test.cc: avoid ambiguous reversed operator error in c++20 (#11371) 2023-04-12 13:24:34 -07:00
set_comparator.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
single_thread_executor.h Add some missing headers (#10519) 2022-08-11 12:45:50 -07:00
slice.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
slice_test.cc Improve SmallEnumSet (#11178) 2023-02-08 20:14:57 -08:00
slice_transform_test.cc Much better stats for seeks and prefix filtering (#11460) 2023-05-19 15:25:49 -07:00
status.cc Merge operator failed subcode (#11231) 2023-02-17 10:58:46 -08:00
stderr_logger.cc Fix an import issue in fbcode. (#10604) 2022-08-29 21:09:36 -07:00
stderr_logger.h Fix an import issue in fbcode. (#10604) 2022-08-29 21:09:36 -07:00
stop_watch.h Change internal headers with duplicate names (#11408) 2023-05-17 11:27:09 -07:00
string_util.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
string_util.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
thread_guard.h Introduce a ThreadGuard class and use it in ExternalSSTFileTest.PickedLevelBug (#8112) 2021-03-25 22:08:58 -07:00
thread_list_test.cc Group rocksdb.sst.read.micros stat by IOActivity flush and compaction (#11288) 2023-04-21 09:07:18 -07:00
thread_local.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
thread_local.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
thread_local_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
thread_operation.h Group rocksdb.sst.read.micros stat by IOActivity flush and compaction (#11288) 2023-04-21 09:07:18 -07:00
threadpool_imp.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
threadpool_imp.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
timer.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
timer_queue.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
timer_queue_test.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
timer_test.cc Print stack traces on frozen tests in CI (#10828) 2022-10-18 00:35:35 -07:00
udt_util.h Add support in log writer and reader for a user-defined timestamp size record (#11433) 2023-05-11 17:26:19 -07:00
user_comparator_wrapper.h Make UserComparatorWrapper not Customizable (#10837) 2022-10-21 12:27:50 -07:00
vector_iterator.h Make InternalKeyComparator not configurable (#10342) 2022-07-14 10:09:31 -07:00
work_queue.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
work_queue_test.cc Print stack traces on frozen tests in CI (#10828) 2022-10-18 00:35:35 -07:00
xxhash.cc Manual interventions for clang-format util/ (#10870) 2022-10-26 12:08:20 -07:00
xxhash.h Upgrade xxhash.h to latest dev (#11098) 2023-01-19 12:07:50 -08:00
xxph3.h Manual interventions for clang-format util/ (#10870) 2022-10-26 12:08:20 -07:00