Peter Dillinger
dd23e84cad
Re-implement GetApproximateMemTableStats for skip lists (#13047)
Summary:
GetApproximateMemTableStats() could return some bad results with the standard skip list memtable. See this new db_bench test showing the dismal distribution of results when the actual number of entries in range is 1000:
```
$ ./db_bench --benchmarks=filluniquerandom,approximatememtablestats,readrandom --value_size=1 --num=1000000 --batch_size=1000
...
filluniquerandom : 1.391 micros/op 718915 ops/sec 1.391 seconds 1000000 operations; 11.7 MB/s
approximatememtablestats : 3.711 micros/op 269492 ops/sec 3.711 seconds 1000000 operations;
Reported entry count stats (expected 1000):
Count: 1000000 Average: 2344.1611 StdDev: 26587.27
Min: 0 Median: 965.8555 Max: 835273
Percentiles: P50: 965.86 P75: 1610.77 P99: 12618.01 P99.9: 74991.58 P99.99: 830970.97
------------------------------------------------------
[ 0, 1 ] 131344 13.134% 13.134% ###
( 1, 2 ] 115 0.011% 13.146%
( 2, 3 ] 106 0.011% 13.157%
( 3, 4 ] 190 0.019% 13.176%
( 4, 6 ] 214 0.021% 13.197%
( 6, 10 ] 522 0.052% 13.249%
( 10, 15 ] 748 0.075% 13.324%
( 15, 22 ] 1002 0.100% 13.424%
( 22, 34 ] 1948 0.195% 13.619%
( 34, 51 ] 3067 0.307% 13.926%
( 51, 76 ] 4213 0.421% 14.347%
( 76, 110 ] 5721 0.572% 14.919%
( 110, 170 ] 11375 1.137% 16.056%
( 170, 250 ] 17928 1.793% 17.849%
( 250, 380 ] 36597 3.660% 21.509% #
( 380, 580 ] 77882 7.788% 29.297% ##
( 580, 870 ] 160193 16.019% 45.317% ###
( 870, 1300 ] 210098 21.010% 66.326% ####
( 1300, 1900 ] 167461 16.746% 83.072% ###
( 1900, 2900 ] 78678 7.868% 90.940% ##
( 2900, 4400 ] 47743 4.774% 95.715% #
( 4400, 6600 ] 17650 1.765% 97.480%
( 6600, 9900 ] 11895 1.190% 98.669%
( 9900, 14000 ] 4993 0.499% 99.168%
( 14000, 22000 ] 2384 0.238% 99.407%
( 22000, 33000 ] 1966 0.197% 99.603%
( 50000, 75000 ] 2968 0.297% 99.900%
( 570000, 860000 ] 999 0.100% 100.000%
readrandom : 1.967 micros/op 508487 ops/sec 1.967 seconds 1000000 operations; 8.2 MB/s (1000000 of 1000000 found)
```
Perhaps the only good thing to say about the old implementation was that it was fast, though apparently not that fast.
I've implemented a much more robust and reasonably fast new version of the function. It's still logarithmic but with some larger constant factors. The standard deviation from the true count is around 20% or less, and the CPU cost is roughly that of two memtable point look-ups. See code comments for detail.
```
$ ./db_bench --benchmarks=filluniquerandom,approximatememtablestats,readrandom --value_size=1 --num=1000000 --batch_size=1000
...
filluniquerandom : 1.478 micros/op 676434 ops/sec 1.478 seconds 1000000 operations; 11.0 MB/s
approximatememtablestats : 2.694 micros/op 371157 ops/sec 2.694 seconds 1000000 operations;
Reported entry count stats (expected 1000):
Count: 1000000 Average: 1073.5158 StdDev: 197.80
Min: 608 Median: 1079.9506 Max: 2176
Percentiles: P50: 1079.95 P75: 1223.69 P99: 1852.36 P99.9: 1898.70 P99.99: 2176.00
------------------------------------------------------
( 580, 870 ] 134848 13.485% 13.485% ###
( 870, 1300 ] 747868 74.787% 88.272% ###############
( 1300, 1900 ] 116536 11.654% 99.925% ##
( 1900, 2900 ] 748 0.075% 100.000%
readrandom : 1.997 micros/op 500654 ops/sec 1.997 seconds 1000000 operations; 8.1 MB/s (1000000 of 1000000 found)
```
We can already see that the distribution of results is dramatically better and wonderfully normal-looking, with relative standard deviation around 20%. The function is also FASTER, at least with these parameters. Let's look at how this behavior generalizes, first with a *much* larger range:
```
$ ./db_bench --benchmarks=filluniquerandom,approximatememtablestats,readrandom --value_size=1 --num=1000000 --batch_size=30000
filluniquerandom : 1.390 micros/op 719654 ops/sec 1.376 seconds 990000 operations; 11.7 MB/s
approximatememtablestats : 1.129 micros/op 885649 ops/sec 1.129 seconds 1000000 operations;
Reported entry count stats (expected 30000):
Count: 1000000 Average: 31098.8795 StdDev: 3601.47
Min: 21504 Median: 29333.9303 Max: 43008
Percentiles: P50: 29333.93 P75: 33018.00 P99: 43008.00 P99.9: 43008.00 P99.99: 43008.00
------------------------------------------------------
( 14000, 22000 ] 408 0.041% 0.041%
( 22000, 33000 ] 749327 74.933% 74.974% ###############
( 33000, 50000 ] 250265 25.027% 100.000% #####
readrandom : 1.894 micros/op 528083 ops/sec 1.894 seconds 1000000 operations; 8.5 MB/s (989989 of 1000000 found)
```
This is *even faster* and relatively *more accurate*, with relative standard deviation closer to 10%. Code comments explain why. Now let's look at smaller ranges. Implementation quirks or conveniences:
* When the actual number in range is >= 40, the minimum return value is 40.
* When the actual number in range is <= 10, the function is guaranteed to return that exact number.
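The two guarantees above can be written as a small predicate, which may be handy when sanity-checking reported counts. This is only a sketch in Python: the name `consistent_with_stated_guarantees` is hypothetical, and it encodes just the two rules stated in this message, not the RocksDB implementation. The sample pairs assume that each queried range in the benchmark runs holds exactly the expected number of entries.

```python
def consistent_with_stated_guarantees(actual, reported):
    """Check a reported count against the two rules stated in this message.

    Hypothetical helper for illustration; not part of RocksDB or db_bench.
    """
    if actual <= 10:
        # Small ranges: the function is guaranteed to return the exact count.
        return reported == actual
    if actual >= 40:
        # Larger ranges: the estimate never drops below 40.
        return reported >= 40
    # Between 11 and 39 entries, this message states no guarantee.
    return True


# (actual, reported) pairs; the first three use Min values from the
# db_bench runs in this message, the last is a made-up violation.
samples = [(75, 40), (10, 10), (1000, 608), (10, 9)]
for actual, reported in samples:
    ok = consistent_with_stated_guarantees(actual, reported)
    print(f"actual={actual} reported={reported} consistent={ok}")
```

Only the last, made-up pair fails the check; the minimums reported by the benchmark runs are all consistent with the stated rules.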
```
$ ./db_bench --benchmarks=filluniquerandom,approximatememtablestats,readrandom --value_size=1 --num=1000000 --batch_size=75
...
filluniquerandom : 1.417 micros/op 705668 ops/sec 1.417 seconds 999975 operations; 11.4 MB/s
approximatememtablestats : 3.342 micros/op 299197 ops/sec 3.342 seconds 1000000 operations;
Reported entry count stats (expected 75):
Count: 1000000 Average: 75.1210 StdDev: 15.02
Min: 40 Median: 71.9395 Max: 256
Percentiles: P50: 71.94 P75: 89.69 P99: 119.12 P99.9: 166.68 P99.99: 229.78
------------------------------------------------------
( 34, 51 ] 38867 3.887% 3.887% #
( 51, 76 ] 550554 55.055% 58.942% ###########
( 76, 110 ] 398854 39.885% 98.828% ########
( 110, 170 ] 11353 1.135% 99.963%
( 170, 250 ] 364 0.036% 99.999%
( 250, 380 ] 8 0.001% 100.000%
readrandom : 1.861 micros/op 537224 ops/sec 1.861 seconds 1000000 operations; 8.7 MB/s (999974 of 1000000 found)
$ ./db_bench --benchmarks=filluniquerandom,approximatememtablestats,readrandom --value_size=1 --num=1000000 --batch_size=25
...
filluniquerandom : 1.501 micros/op 666283 ops/sec 1.501 seconds 1000000 operations; 10.8 MB/s
approximatememtablestats : 5.118 micros/op 195401 ops/sec 5.118 seconds 1000000 operations;
Reported entry count stats (expected 25):
Count: 1000000 Average: 26.2392 StdDev: 4.58
Min: 25 Median: 28.4590 Max: 72
Percentiles: P50: 28.46 P75: 31.69 P99: 49.27 P99.9: 67.95 P99.99: 72.00
------------------------------------------------------
( 22, 34 ] 928936 92.894% 92.894% ###################
( 34, 51 ] 67960 6.796% 99.690% #
( 51, 76 ] 3104 0.310% 100.000%
readrandom : 1.892 micros/op 528595 ops/sec 1.892 seconds 1000000 operations; 8.6 MB/s (1000000 of 1000000 found)
$ ./db_bench --benchmarks=filluniquerandom,approximatememtablestats,readrandom --value_size=1 --num=1000000 --batch_size=10
...
filluniquerandom : 1.642 micros/op 608916 ops/sec 1.642 seconds 1000000 operations; 9.9 MB/s
approximatememtablestats : 3.042 micros/op 328721 ops/sec 3.042 seconds 1000000 operations;
Reported entry count stats (expected 10):
Count: 1000000 Average: 10.0000 StdDev: 0.00
Min: 10 Median: 10.0000 Max: 10
Percentiles: P50: 10.00 P75: 10.00 P99: 10.00 P99.9: 10.00 P99.99: 10.00
------------------------------------------------------
( 6, 10 ] 1000000 100.000% 100.000% ####################
readrandom : 1.805 micros/op 554126 ops/sec 1.805 seconds 1000000 operations; 9.0 MB/s (1000000 of 1000000 found)
```
Remarkably consistent.
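The consistency can be checked arithmetically from the Average and StdDev lines reported above. The sketch below is Python and uses only numbers copied from the db_bench output in this message. The helper names `relative_spread` and `relative_bias` are assumptions made for illustration, as is the choice to divide by the expected count rather than the reported average.

```python
# (expected entries in range, reported Average, reported StdDev),
# copied from the db_bench output in this message.
OLD_RUN = (1000, 2344.1611, 26587.27)
NEW_RUNS = [
    (1000, 1073.5158, 197.80),
    (30000, 31098.8795, 3601.47),
    (75, 75.1210, 15.02),
    (25, 26.2392, 4.58),
    (10, 10.0000, 0.00),
]


def relative_spread(expected, stddev):
    """Hypothetical helper: reported StdDev as a fraction of the expected count."""
    return stddev / expected


def relative_bias(expected, average):
    """Hypothetical helper: how far the reported Average sits from the expected count."""
    return (average - expected) / expected


def report(label, runs):
    # Print one summary line per benchmark run.
    for expected, average, stddev in runs:
        print(
            f"{label} expected={expected:>6} "
            f"bias={relative_bias(expected, average):+.1%} "
            f"spread={relative_spread(expected, stddev):.1%}"
        )


report("old", [OLD_RUN])
report("new", NEW_RUNS)
```

With these inputs, every run of the new implementation has a spread of roughly 20% of the expected count or less. The old implementation's spread at 1000 entries is more than 26 times the expected count.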
Pull Request resolved: https://github.com/facebook/rocksdb/pull/13047
Test Plan: new db_bench test for both performance and accuracy (see above); added to crash test; unit test updated.
Reviewed By: cbi42
Differential Revision: D63722003
Pulled By: pdillinger
fbshipit-source-id: cfc8613c085e87c17ecec22d82601aac2a5a1b26
2024-10-02 14:25:50 -07:00