rocksdb/tools
Changyu Bi 9d77bf8f7b Fragment memtable range tombstone in the write path (#10380)
Summary:
- Currently, each read fragments the memtable range tombstones (https://github.com/facebook/rocksdb/issues/4808). This PR explores fragmenting memtable range tombstones in the write path instead, so that reads can use the cached fragmented tombstone list without paying any fragmentation cost; see the sketch after this list. This PR only does the caching for immutable memtables, right before a memtable is added to the immutable memtable list. The fragmentation is done without holding the DB mutex to minimize its performance impact.
- db_bench is updated to print the number of range deletions executed, if any.
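
For illustration, here is a minimal self-contained C++ sketch of the fragmentation idea (the `RangeTombstone`/`Fragment` types and the `FragmentTombstones` helper are hypothetical stand-ins for illustration, not the actual `FragmentedRangeTombstoneList` code in `db/range_tombstone_fragmenter.h`): overlapping tombstones are split at every endpoint into disjoint fragments, each carrying the sequence numbers of the tombstones covering it, so a point lookup reduces to one binary search over a sorted, immutable list.

```
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins for the internal types (illustration only).
struct RangeTombstone {
  std::string start, end;  // DeleteRange over keys in [start, end)
  uint64_t seq;            // sequence number of the write
};

struct Fragment {
  std::string start, end;      // disjoint interval
  std::vector<uint64_t> seqs;  // covering tombstones, newest first
};

// Split overlapping tombstones at every endpoint into disjoint
// fragments. This is the one-time work the PR moves out of the read
// path: once built, lookups binary-search the sorted fragment list.
// (Quadratic scan here for brevity; the real fragmenter uses a more
// efficient sweep.)
std::vector<Fragment> FragmentTombstones(
    const std::vector<RangeTombstone>& ts) {
  std::vector<std::string> bounds;
  for (const auto& t : ts) {
    bounds.push_back(t.start);
    bounds.push_back(t.end);
  }
  std::sort(bounds.begin(), bounds.end());
  bounds.erase(std::unique(bounds.begin(), bounds.end()), bounds.end());

  std::vector<Fragment> out;
  for (size_t i = 0; i + 1 < bounds.size(); ++i) {
    Fragment f{bounds[i], bounds[i + 1], {}};
    for (const auto& t : ts) {
      // A tombstone covers the fragment iff it spans the whole interval.
      if (t.start <= f.start && f.end <= t.end) {
        f.seqs.push_back(t.seq);
      }
    }
    if (!f.seqs.empty()) {
      std::sort(f.seqs.rbegin(), f.seqs.rend());  // newest first
      out.push_back(std::move(f));
    }
  }
  return out;
}
```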

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10380

Test Plan:
- CI; added asserts in various places to check that a fragmented range tombstone list is constructed when it should be.
- Benchmark: as this PR only optimizes the immutable memtable path, the number of writes in the benchmark is chosen such that an immutable memtable is created and the range tombstones are in that memtable (the large --max_write_buffer_number and --min_write_buffer_number_to_merge values keep memtables in the immutable list instead of flushing them).

```
single thread:
./db_bench --benchmarks=fillrandom,readrandom --writes_per_range_tombstone=1 --max_write_buffer_number=100 --min_write_buffer_number_to_merge=100 --writes=500000 --reads=100000 --max_num_range_tombstones=100

multi thread:
./db_bench --benchmarks=fillrandom,readrandom --writes_per_range_tombstone=1 --max_write_buffer_number=100 --min_write_buffer_number_to_merge=100 --writes=15000 --reads=20000 --threads=32 --max_num_range_tombstones=100
```
Commit 99cdf16464 is included in the benchmark results. It was an earlier attempt in which tombstones were fragmented on each write operation. Reader threads shared the fragmented list through a shared_ptr, which slowed down multi-threaded read performance, as seen in the benchmark results.
Results are averaged over 5 runs.
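
To see why the earlier attempt hurt multi-threaded reads, here is a schematic comparison (assumed shapes for illustration, not the actual RocksDB code): copying a `shared_ptr` on every read makes all reader threads contend on a single atomic reference count, whereas the post-PR design builds the list once when the memtable becomes immutable, so readers never touch a refcount.

```
#include <memory>

struct FragmentedList { /* sorted fragments... */ };

// 99cdf16464 (schematic): the list can be replaced by a concurrent
// write, so each read takes a shared_ptr copy. With 32 reader threads,
// the control block's atomic refcount ping-pongs between cores.
std::shared_ptr<FragmentedList> cached;
void ReadOld() {
  std::shared_ptr<FragmentedList> local = std::atomic_load(&cached);
  (void)local;  // ... binary-search *local ...
}

// Post-PR (schematic): the list is built once, without the DB mutex,
// right before the memtable joins the immutable list, and is never
// mutated afterwards. Publication piggybacks on the memtable-switch
// synchronization, so readers use a plain pointer with no per-read
// atomic traffic.
const FragmentedList* frozen = nullptr;  // set once at "freeze" time
void ReadNew() {
  const FragmentedList* local = frozen;
  (void)local;  // ... binary-search *local ...
}
```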

Single thread result:
| Max # tombstones | fillrandom: main (micros/op) | fillrandom: 99cdf16464 | fillrandom: post-PR | readrandom: main (micros/op) | readrandom: 99cdf16464 | readrandom: post-PR |
| ------------- | ------------- |------------- |------------- |------------- |------------- |------------- |
| 0    |6.68     |6.57     |6.72     |4.72     |4.79     |4.54     |
| 1    |6.67     |6.58     |6.62     |5.41     |4.74     |4.72     |
| 10   |6.59     |6.5      |6.56     |7.83     |4.69     |4.59     |
| 100  |6.62     |6.75     |6.58     |29.57    |5.04     |5.09     |
| 1000 |6.54     |6.82     |6.61     |320.33   |5.22     |5.21     |

32-thread result: note that "Max # tombstones" is per thread.
| Max # tombstones | fillrandom: main (micros/op) | fillrandom: 99cdf16464 | fillrandom: post-PR | readrandom: main (micros/op) | readrandom: 99cdf16464 | readrandom: post-PR |
| ------------- | ------------- |------------- |------------- |------------- |------------- |------------- |
| 0    |234.52   |260.25   |239.42   |5.06     |5.38     |5.09     |
| 1    |236.46   |262.0    |231.1    |19.57    |22.14    |5.45     |
| 10   |236.95   |263.84   |251.49   |151.73   |21.61    |5.73     |
| 100  |268.16   |296.8    |280.13   |2308.52  |22.27    |6.57     |

Reviewed By: ajkr

Differential Revision: D37916564

Pulled By: cbi42

fbshipit-source-id: 05d6d2e16df26c374c57ddcca13a5bfe9d5b731e
2022-08-05 12:02:33 -07:00
advisor
block_cache_analyzer Use std::numeric_limits<> (#9954) 2022-05-05 13:08:21 -07:00
dump
analyze_txn_stress_test.sh
auto_sanity_test.sh
backup_db.sh
benchmark.sh Revert "Add a blob-specific cache priority (#10309)" (#10434) 2022-07-29 07:18:15 -07:00
benchmark_ci.py Run new benchmark script in branch. (#10303) 2022-07-25 14:44:10 -07:00
benchmark_compare.sh Run new benchmark script in branch. (#10303) 2022-07-25 14:44:10 -07:00
benchmark_leveldb.sh
blob_dump.cc
check_all_python.py
check_format_compatible.sh Post 7.5 branch cut changes (#10376) 2022-07-18 12:58:04 -07:00
CMakeLists.txt
db_bench.cc
db_bench_tool.cc Fragment memtable range tombstone in the write path (#10380) 2022-08-05 12:02:33 -07:00
db_bench_tool_test.cc Support prepopulating/warming the blob cache (#10298) 2022-07-17 07:13:59 -07:00
db_crashtest.py Add CompressedSecondaryCache into stress test (#10442) 2022-08-01 11:01:03 -07:00
db_repl_stress.cc
db_sanity_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
dbench_monitor
Dockerfile
generate_random_db.sh
ingest_external_sst.sh
io_tracer_parser.cc
io_tracer_parser_test.cc
io_tracer_parser_tool.cc
io_tracer_parser_tool.h
ldb.cc
ldb_cmd.cc ldb to display public unique id and dump work with key range (#10417) 2022-07-26 20:40:18 -07:00
ldb_cmd_impl.h Support single delete in ldb (#9469) 2022-05-10 16:37:19 -07:00
ldb_cmd_test.cc Add blob source to retrieve blobs in RocksDB (#10198) 2022-06-20 20:58:11 -07:00
ldb_test.py Make it possible to enable blob files starting from a certain LSM tree level (#10077) 2022-06-02 20:04:33 -07:00
ldb_tool.cc Default try_load_options to true when DB is specified (#9937) 2022-05-04 08:49:46 -07:00
pflag
reduce_levels_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
regression_test.sh regression_test.sh: kill very old db_bench (and more) (#10441) 2022-08-02 09:16:17 -07:00
restore_db.sh
rocksdb_dump_test.sh
run_blob_bench.sh Support prepopulating/warming the blob cache (#10298) 2022-07-17 07:13:59 -07:00
run_flash_bench.sh
run_leveldb.sh
sample-dump.dmp
simulated_hybrid_file_system.cc
simulated_hybrid_file_system.h
sst_dump.cc
sst_dump_test.cc
sst_dump_tool.cc Support using ZDICT_finalizeDictionary to generate zstd dictionary (#9857) 2022-05-20 12:09:09 -07:00
trace_analyzer.cc
trace_analyzer_test.cc Support read rate-limiting in SequentialFileReader (#9973) 2022-05-24 10:28:57 -07:00
trace_analyzer_tool.cc Support read rate-limiting in SequentialFileReader (#9973) 2022-05-24 10:28:57 -07:00
trace_analyzer_tool.h
verify_random_db.sh Fix some bugs in verify_random_db.sh (#10112) 2022-06-03 16:35:13 -07:00
write_external_sst.sh
write_stress.cc
write_stress_runner.py