rocksdb/tools
Peter Dillinger dd23e84cad Re-implement GetApproximateMemTableStats for skip lists (#13047)
Summary:
GetApproximateMemTableStats() could return some bad results with the standard skip list memtable. See this new db_bench test showing the dismal distribution of results when the actual number of entries in range is 1000:

```
$ ./db_bench --benchmarks=filluniquerandom,approximatememtablestats,readrandom --value_size=1 --num=1000000 --batch_size=1000
...
filluniquerandom :       1.391 micros/op 718915 ops/sec 1.391 seconds 1000000 operations;   11.7 MB/s
approximatememtablestats :       3.711 micros/op 269492 ops/sec 3.711 seconds 1000000 operations;
Reported entry count stats (expected 1000):
Count: 1000000 Average: 2344.1611  StdDev: 26587.27
Min: 0  Median: 965.8555  Max: 835273
Percentiles: P50: 965.86 P75: 1610.77 P99: 12618.01 P99.9: 74991.58 P99.99: 830970.97
------------------------------------------------------
[       0,       1 ]   131344  13.134%  13.134% ###
(       1,       2 ]      115   0.011%  13.146%
(       2,       3 ]      106   0.011%  13.157%
(       3,       4 ]      190   0.019%  13.176%
(       4,       6 ]      214   0.021%  13.197%
(       6,      10 ]      522   0.052%  13.249%
(      10,      15 ]      748   0.075%  13.324%
(      15,      22 ]     1002   0.100%  13.424%
(      22,      34 ]     1948   0.195%  13.619%
(      34,      51 ]     3067   0.307%  13.926%
(      51,      76 ]     4213   0.421%  14.347%
(      76,     110 ]     5721   0.572%  14.919%
(     110,     170 ]    11375   1.137%  16.056%
(     170,     250 ]    17928   1.793%  17.849%
(     250,     380 ]    36597   3.660%  21.509% #
(     380,     580 ]    77882   7.788%  29.297% ##
(     580,     870 ]   160193  16.019%  45.317% ###
(     870,    1300 ]   210098  21.010%  66.326% ####
(    1300,    1900 ]   167461  16.746%  83.072% ###
(    1900,    2900 ]    78678   7.868%  90.940% ##
(    2900,    4400 ]    47743   4.774%  95.715% #
(    4400,    6600 ]    17650   1.765%  97.480%
(    6600,    9900 ]    11895   1.190%  98.669%
(    9900,   14000 ]     4993   0.499%  99.168%
(   14000,   22000 ]     2384   0.238%  99.407%
(   22000,   33000 ]     1966   0.197%  99.603%
(   50000,   75000 ]     2968   0.297%  99.900%
(  570000,  860000 ]      999   0.100% 100.000%

readrandom   :       1.967 micros/op 508487 ops/sec 1.967 seconds 1000000 operations;    8.2 MB/s (1000000 of 1000000 found)
```

Perhaps the only good thing to say about the old implementation was that it was fast, though apparently not that fast.

I've implemented a much more robust and reasonably fast new version of the function. It's still logarithmic but with some larger constant factors. The standard deviation from true count is around 20% or less, and roughly the CPU cost of two memtable point look-ups. See code comments for detail.

```
$ ./db_bench --benchmarks=filluniquerandom,approximatememtablestats,readrandom --value_size=1 --num=1000000 --batch_size=1000
...
filluniquerandom :       1.478 micros/op 676434 ops/sec 1.478 seconds 1000000 operations;   11.0 MB/s
approximatememtablestats :       2.694 micros/op 371157 ops/sec 2.694 seconds 1000000 operations;
Reported entry count stats (expected 1000):
Count: 1000000 Average: 1073.5158  StdDev: 197.80
Min: 608  Median: 1079.9506  Max: 2176
Percentiles: P50: 1079.95 P75: 1223.69 P99: 1852.36 P99.9: 1898.70 P99.99: 2176.00
------------------------------------------------------
(     580,     870 ]   134848  13.485%  13.485% ###
(     870,    1300 ]   747868  74.787%  88.272% ###############
(    1300,    1900 ]   116536  11.654%  99.925% ##
(    1900,    2900 ]      748   0.075% 100.000%

readrandom   :       1.997 micros/op 500654 ops/sec 1.997 seconds 1000000 operations;    8.1 MB/s (1000000 of 1000000 found)
```

We can already see that the distribution of results is dramatically better and wonderfully normal-looking, with relative standard deviation around 20%. The function is also FASTER, at least with these parameters. Let's look how this behavior generalizes, first *much* larger range:

```
$ ./db_bench --benchmarks=filluniquerandom,approximatememtablestats,readrandom --value_size=1 --num=1000000 --batch_size=30000
filluniquerandom :       1.390 micros/op 719654 ops/sec 1.376 seconds 990000 operations;   11.7 MB/s
approximatememtablestats :       1.129 micros/op 885649 ops/sec 1.129 seconds 1000000 operations;
Reported entry count stats (expected 30000):
Count: 1000000 Average: 31098.8795  StdDev: 3601.47
Min: 21504  Median: 29333.9303  Max: 43008
Percentiles: P50: 29333.93 P75: 33018.00 P99: 43008.00 P99.9: 43008.00 P99.99: 43008.00
------------------------------------------------------
(   14000,   22000 ]      408   0.041%   0.041%
(   22000,   33000 ]   749327  74.933%  74.974% ###############
(   33000,   50000 ]   250265  25.027% 100.000% #####

readrandom   :       1.894 micros/op 528083 ops/sec 1.894 seconds 1000000 operations;    8.5 MB/s (989989 of 1000000 found)
```

This is *even faster* and relatively *more accurate*, with relative standard deviation closer to 10%. Code comments explain why. Now let's look at smaller ranges. Implementation quirks or conveniences:
* When actual number in range is >= 40, the minimum return value is 40.
* When the actual is <= 10, it is guaranteed to return that actual number.
```
$ ./db_bench --benchmarks=filluniquerandom,approximatememtablestats,readrandom --value_size=1 --num=1000000 --batch_size=75
...
filluniquerandom :       1.417 micros/op 705668 ops/sec 1.417 seconds 999975 operations;   11.4 MB/s
approximatememtablestats :       3.342 micros/op 299197 ops/sec 3.342 seconds 1000000 operations;
Reported entry count stats (expected 75):
Count: 1000000 Average: 75.1210  StdDev: 15.02
Min: 40  Median: 71.9395  Max: 256
Percentiles: P50: 71.94 P75: 89.69 P99: 119.12 P99.9: 166.68 P99.99: 229.78
------------------------------------------------------
(      34,      51 ]    38867   3.887%   3.887% #
(      51,      76 ]   550554  55.055%  58.942% ###########
(      76,     110 ]   398854  39.885%  98.828% ########
(     110,     170 ]    11353   1.135%  99.963%
(     170,     250 ]      364   0.036%  99.999%
(     250,     380 ]        8   0.001% 100.000%

readrandom   :       1.861 micros/op 537224 ops/sec 1.861 seconds 1000000 operations;    8.7 MB/s (999974 of 1000000 found)

$ ./db_bench --benchmarks=filluniquerandom,approximatememtablestats,readrandom --value_size=1 --num=1000000 --batch_size=25
...
filluniquerandom :       1.501 micros/op 666283 ops/sec 1.501 seconds 1000000 operations;   10.8 MB/s
approximatememtablestats :       5.118 micros/op 195401 ops/sec 5.118 seconds 1000000 operations;
Reported entry count stats (expected 25):
Count: 1000000 Average: 26.2392  StdDev: 4.58
Min: 25  Median: 28.4590  Max: 72
Percentiles: P50: 28.46 P75: 31.69 P99: 49.27 P99.9: 67.95 P99.99: 72.00
------------------------------------------------------
(      22,      34 ]   928936  92.894%  92.894% ###################
(      34,      51 ]    67960   6.796%  99.690% #
(      51,      76 ]     3104   0.310% 100.000%

readrandom   :       1.892 micros/op 528595 ops/sec 1.892 seconds 1000000 operations;    8.6 MB/s (1000000 of 1000000 found)

$ ./db_bench --benchmarks=filluniquerandom,approximatememtablestats,readrandom --value_size=1 --num=1000000 --batch_size=10
...
filluniquerandom :       1.642 micros/op 608916 ops/sec 1.642 seconds 1000000 operations;    9.9 MB/s
approximatememtablestats :       3.042 micros/op 328721 ops/sec 3.042 seconds 1000000 operations;
Reported entry count stats (expected 10):
Count: 1000000 Average: 10.0000  StdDev: 0.00
Min: 10  Median: 10.0000  Max: 10
Percentiles: P50: 10.00 P75: 10.00 P99: 10.00 P99.9: 10.00 P99.99: 10.00
------------------------------------------------------
(       6,      10 ]  1000000 100.000% 100.000% ####################

readrandom   :       1.805 micros/op 554126 ops/sec 1.805 seconds 1000000 operations;    9.0 MB/s (1000000 of 1000000 found)
```

Remarkably consistent.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/13047

Test Plan: new db_bench test for both performance and accuracy (see above); added to crash test; unit test updated.

Reviewed By: cbi42

Differential Revision: D63722003

Pulled By: pdillinger

fbshipit-source-id: cfc8613c085e87c17ecec22d82601aac2a5a1b26
2024-10-02 14:25:50 -07:00
..
advisor Fix lint issues after enable BLACK (#10717) 2022-09-21 13:37:51 -07:00
block_cache_analyzer Block cache analyzer: Calculate miss ratio for each caller (#10823) 2024-01-10 14:02:14 -08:00
dump internal_repo_rocksdb (435146444452818992) (#12115) 2023-12-01 11:15:17 -08:00
CMakeLists.txt
Dockerfile
analyze_txn_stress_test.sh
auto_sanity_test.sh
backup_db.sh Revamp check_format_compatible.sh (#8012) 2021-03-02 11:42:27 -08:00
benchmark.sh optimize file size statistics in benchmark script (#12363) 2024-02-21 15:45:18 -08:00
benchmark_ci.py Remove NUMA setting for benchmark-linux (#11180) 2023-02-02 15:15:09 -08:00
benchmark_compare.sh Fix file modes (#10815) 2022-10-13 09:00:37 -07:00
benchmark_leveldb.sh
blob_dump.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
check_all_python.py Enable BLACK for internal_repo_rocksdb (#10710) 2022-09-20 17:47:52 -07:00
check_format_compatible.sh Update HISTORY.md, version.h, and the format compatibility check script for the 9.7 release (#13027) 2024-09-20 19:19:06 -07:00
db_bench.cc Add (& fix) some simple source code checks (#8821) 2021-09-07 21:19:27 -07:00
db_bench_tool.cc Re-implement GetApproximateMemTableStats for skip lists (#13047) 2024-10-02 14:25:50 -07:00
db_bench_tool_test.cc Group SST write in flush, compaction and db open with new stats (#11910) 2023-12-29 15:29:23 -08:00
db_crashtest.py Steps toward making IDENTITY file obsolete (#13019) 2024-09-19 14:05:21 -07:00
db_repl_stress.cc Prefer static_cast in place of most reinterpret_cast (#12308) 2024-02-07 10:44:11 -08:00
db_sanity_test.cc Remove 'virtual' when implied by 'override' (#12319) 2024-01-31 13:14:42 -08:00
dbench_monitor
generate_random_db.sh
ingest_external_sst.sh
io_tracer_parser.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
io_tracer_parser_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
io_tracer_parser_tool.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
io_tracer_parser_tool.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
ldb.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
ldb_cmd.cc Add an option to dump wal seqno gaps (#13014) 2024-09-18 17:48:18 -07:00
ldb_cmd_impl.h Add an option to dump wal seqno gaps (#13014) 2024-09-18 17:48:18 -07:00
ldb_cmd_test.cc Remove `bottommost_temperature` (#12389) 2024-02-27 14:48:00 -08:00
ldb_test.py Add an option to dump wal seqno gaps (#13014) 2024-09-18 17:48:18 -07:00
ldb_tool.cc Add LDB command and option for follower instances (#12682) 2024-05-28 23:21:32 -07:00
pflag
reduce_levels_test.cc Make option `level_compaction_dynamic_level_bytes` true by default (#11525) 2023-06-15 21:12:39 -07:00
regression_test.sh Fix regression script for async_io benchmarks (#11462) 2023-05-22 15:32:12 -07:00
restore_db.sh Revamp check_format_compatible.sh (#8012) 2021-03-02 11:42:27 -08:00
rocksdb_dump_test.sh
run_blob_bench.sh add exe and script path check (#11621) 2023-07-19 12:05:24 -07:00
run_flash_bench.sh
run_leveldb.sh
sample-dump.dmp
simulated_hybrid_file_system.cc Group SST write in flush, compaction and db open with new stats (#11910) 2023-12-29 15:29:23 -08:00
simulated_hybrid_file_system.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
sst_dump.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
sst_dump_test.cc Allow SstFileReader to verify number of entries in SST files (#12418) 2024-03-12 11:05:20 -07:00
sst_dump_tool.cc Augment sst_dump tool to verify num_entries in table property (#12322) 2024-02-01 14:35:03 -08:00
trace_analyzer.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
trace_analyzer_test.cc internal_repo_rocksdb (435146444452818992) (#12115) 2023-12-01 11:15:17 -08:00
trace_analyzer_tool.cc Trace analyzer: replace number with enumeration type (#10827) 2023-12-27 10:38:53 -08:00
trace_analyzer_tool.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
verify_random_db.sh Fix some bugs in verify_random_db.sh (#10112) 2022-06-03 16:35:13 -07:00
write_external_sst.sh Revamp check_format_compatible.sh (#8012) 2021-03-02 11:42:27 -08:00
write_stress.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
write_stress_runner.py Enable BLACK for internal_repo_rocksdb (#10710) 2022-09-20 17:47:52 -07:00