rocksdb/db/db_impl
Changyu Bi 9d77bf8f7b Fragment memtable range tombstone in the write path (#10380)
Summary:
- Currently, every read fragments the memtable range tombstones (https://github.com/facebook/rocksdb/issues/4808). This PR explores fragmenting memtable range tombstones in the write path instead, so that reads can use the cached fragmented tombstone list without paying any fragmentation cost. The caching is done only for immutable memtables, right before a memtable is added to the immutable memtable list. The fragmentation is done without holding the mutex to minimize its performance impact.
- db_bench is updated to print the number of range deletions executed, if there are any.
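
The idea in the first bullet can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: `RangeTombstone`, `Fragment`, `FragmentTombstones`, and `ToyMemTable` are hypothetical names chosen for illustration, and the fragmentation routine is a deliberately simple quadratic version. It only shows the shape of the approach: overlapping tombstones are split into non-overlapping fragments once, when the memtable becomes immutable, and every later read does a binary search over the cached result.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, simplified types; these are NOT RocksDB's actual classes.
struct RangeTombstone {
  std::string start;  // inclusive
  std::string end;    // exclusive
  uint64_t seq;       // sequence number of the DeleteRange
};

struct Fragment {
  std::string start;           // inclusive
  std::string end;             // exclusive
  std::vector<uint64_t> seqs;  // covering tombstones, newest first
};

// Split possibly overlapping tombstones into sorted, non-overlapping
// fragments. Simple O(boundaries * tombstones) version for illustration.
std::vector<Fragment> FragmentTombstones(
    const std::vector<RangeTombstone>& tombstones) {
  std::vector<std::string> bounds;
  for (const auto& t : tombstones) {
    assert(t.start < t.end);
    bounds.push_back(t.start);
    bounds.push_back(t.end);
  }
  std::sort(bounds.begin(), bounds.end());
  bounds.erase(std::unique(bounds.begin(), bounds.end()), bounds.end());

  std::vector<Fragment> out;
  for (size_t i = 0; i + 1 < bounds.size(); ++i) {
    Fragment f{bounds[i], bounds[i + 1], {}};
    for (const auto& t : tombstones) {
      // A tombstone covers the fragment if it spans the whole interval.
      if (t.start <= f.start && f.end <= t.end) f.seqs.push_back(t.seq);
    }
    if (f.seqs.empty()) continue;  // gap between tombstones
    std::sort(f.seqs.begin(), f.seqs.end(), std::greater<uint64_t>());
    out.push_back(std::move(f));
  }
  return out;
}

// Toy memtable: fragments once when it becomes immutable, then serves reads
// from the cached fragments.
class ToyMemTable {
 public:
  void AddRangeTombstone(RangeTombstone t) {
    assert(!immutable_);
    tombstones_.push_back(std::move(t));
  }

  // Called once, before the memtable is handed to readers as immutable.
  void MarkImmutable() {
    fragments_ = FragmentTombstones(tombstones_);
    immutable_ = true;
  }

  // Largest sequence number of a tombstone covering `key`, or 0 if none.
  uint64_t MaxCoveringSeq(const std::string& key) const {
    assert(immutable_);
    // First fragment whose start is strictly greater than key.
    auto it = std::upper_bound(
        fragments_.begin(), fragments_.end(), key,
        [](const std::string& k, const Fragment& f) { return k < f.start; });
    if (it == fragments_.begin()) return 0;
    --it;
    return key < it->end ? it->seqs.front() : 0;
  }

  size_t NumFragments() const { return fragments_.size(); }

 private:
  std::vector<RangeTombstone> tombstones_;
  std::vector<Fragment> fragments_;  // cache, valid once immutable_ is true
  bool immutable_ = false;
};
```

In this sketch, adding tombstones `[a, e)` at sequence 5 and `[c, g)` at sequence 10 and then calling `MarkImmutable()` yields three fragments: `[a, c)`, `[c, e)`, and `[e, g)`. Reads never re-run the fragmentation, which is the cost the PR aims to remove from the read path.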

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10380

Test Plan:
- CI; added asserts in various places to check whether a fragmented range tombstone list should have been constructed.
- Benchmark: as this PR only optimizes the immutable memtable path, the number of writes in the benchmark is chosen such that an immutable memtable is created and range tombstones are in that memtable.

```
single thread:
./db_bench --benchmarks=fillrandom,readrandom --writes_per_range_tombstone=1 --max_write_buffer_number=100 --min_write_buffer_number_to_merge=100 --writes=500000 --reads=100000 --max_num_range_tombstones=100

multi-thread:
./db_bench --benchmarks=fillrandom,readrandom --writes_per_range_tombstone=1 --max_write_buffer_number=100 --min_write_buffer_number_to_merge=100 --writes=15000 --reads=20000 --threads=32 --max_num_range_tombstones=100
```
Commit 99cdf16464 is included in the benchmark results. It was an earlier attempt in which tombstones were fragmented on each write operation. Reader threads shared the fragmented list through a shared_ptr, which slows down multi-threaded read performance, as the benchmark results show.
Results are averaged over 5 runs.

Single-thread results:
| Max # tombstones | fillrandom micros/op (main) | fillrandom micros/op (99cdf16464) | fillrandom micros/op (Post PR) | readrandom micros/op (main) | readrandom micros/op (99cdf16464) | readrandom micros/op (Post PR) |
| ------------- | ------------- |------------- |------------- |------------- |------------- |------------- |
| 0    |6.68     |6.57     |6.72     |4.72     |4.79     |4.54     |
| 1    |6.67     |6.58     |6.62     |5.41     |4.74     |4.72     |
| 10   |6.59     |6.5      |6.56     |7.83     |4.69     |4.59     |
| 100  |6.62     |6.75     |6.58     |29.57    |5.04     |5.09     |
| 1000 |6.54     |6.82     |6.61     |320.33   |5.22     |5.21     |

32-thread results (note that "Max # tombstones" is per thread):
| Max # tombstones | fillrandom micros/op (main) | fillrandom micros/op (99cdf16464) | fillrandom micros/op (Post PR) | readrandom micros/op (main) | readrandom micros/op (99cdf16464) | readrandom micros/op (Post PR) |
| ------------- | ------------- |------------- |------------- |------------- |------------- |------------- |
| 0    |234.52   |260.25   |239.42   |5.06     |5.38     |5.09     |
| 1    |236.46   |262.0    |231.1    |19.57    |22.14    |5.45     |
| 10   |236.95   |263.84   |251.49   |151.73   |21.61    |5.73     |
| 100  |268.16   |296.8    |280.13   |2308.52  |22.27    |6.57     |

Reviewed By: ajkr

Differential Revision: D37916564

Pulled By: cbi42

fbshipit-source-id: 05d6d2e16df26c374c57ddcca13a5bfe9d5b731e
2022-08-05 12:02:33 -07:00
compacted_db_impl.cc Return "invalid argument" when read timestamp is too old (#10109) 2022-06-06 14:36:22 -07:00
compacted_db_impl.h Add API for writing wide-column entities (#10242) 2022-06-25 15:30:47 -07:00
db_impl.cc Fragment memtable range tombstone in the write path (#10380) 2022-08-05 12:02:33 -07:00
db_impl.h Avoid allocations/copies for large GetMergeOperands() results (#10458) 2022-08-04 00:42:13 -07:00
db_impl_compaction_flush.cc Deflake DBWALTest.RaceInstallFlushResultsWithWalObsoletion (#10456) 2022-08-04 12:14:28 -07:00
db_impl_debug.cc Do not hold mutex when write keys if not necessary (#7516) 2022-07-21 13:35:36 -07:00
db_impl_experimental.cc Remove unused fields from FileMetaData (temporarily) (#10443) 2022-08-01 17:56:13 -07:00
db_impl_files.cc Do not hold mutex when write keys if not necessary (#7516) 2022-07-21 13:35:36 -07:00
db_impl_open.cc Fragment memtable range tombstone in the write path (#10380) 2022-08-05 12:02:33 -07:00
db_impl_readonly.cc Fragment memtable range tombstone in the write path (#10380) 2022-08-05 12:02:33 -07:00
db_impl_readonly.h Add API for writing wide-column entities (#10242) 2022-06-25 15:30:47 -07:00
db_impl_secondary.cc Fragment memtable range tombstone in the write path (#10380) 2022-08-05 12:02:33 -07:00
db_impl_secondary.h Update code comment and logging for secondary instance (#10260) 2022-07-05 10:09:44 -07:00
db_impl_write.cc Fragment memtable range tombstone in the write path (#10380) 2022-08-05 12:02:33 -07:00