rocksdb/db
Changyu Bi 229297d1b8 Refactor AddRangeDels() + consider range tombstone during compaction file cutting (#11113)
Summary:
A second attempt after https://github.com/facebook/rocksdb/issues/10802, with bug fixes and refactoring. This PR updates compaction logic to take range tombstones into account when determining whether to cut the current compaction output file (https://github.com/facebook/rocksdb/issues/4811). Before this change, only point keys were considered, and range tombstones could cause large compactions. For example, if the current compaction output is a range tombstone [a, b) and two point keys y and z, all three would be added to the same file, which may then overlap with too many files in the next level and cause a large compaction in the future. This PR also includes ajkr's effort to simplify the logic that adds range tombstones to compaction output files in `AddRangeDels()` ([https://github.com/facebook/rocksdb/issues/11078](https://github.com/facebook/rocksdb/pull/11078#issuecomment-1386078861)).

The main change is for `CompactionIterator` to emit range tombstone start keys, which are then processed by `CompactionOutputs`. A new class, `CompactionMergingIterator`, replaces `MergingIterator` under `CompactionIterator` so that range tombstone start keys can be emitted. Further improvements after this PR include cutting the compaction output at some grandparent boundary key (instead of the next output key) when cutting within a range tombstone, to reduce overlap with grandparents.
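To illustrate why emitting range tombstone start keys matters for file cutting, here is a minimal, simplified sketch. It is not RocksDB code: `GrandparentFile`, `CutOutputFiles`, and the overlap rule (start a new output file once the current file would overlap more than `max_overlap_bytes` of grandparent data) are hypothetical stand-ins for the real logic in `CompactionOutputs`, and keys are plain strings rather than internal keys.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of a file in the grandparent level (output level + 1).
struct GrandparentFile {
  std::string smallest;
  std::string largest;
  std::size_t bytes;
};

// Assigns each key in `stream` (sorted ascending) to an output file.
// A new file is started when adding the key would make the current file's
// key range overlap more than `max_overlap_bytes` of grandparent data.
// This is an illustrative rule only, not RocksDB's actual heuristic.
std::vector<std::vector<std::string>> CutOutputFiles(
    const std::vector<std::string>& stream,
    const std::vector<GrandparentFile>& grandparents,
    std::size_t max_overlap_bytes) {
  assert(std::is_sorted(stream.begin(), stream.end()));
  std::vector<std::vector<std::string>> files;
  for (const std::string& key : stream) {
    if (!files.empty()) {
      const std::string& file_start = files.back().front();
      // Sum the sizes of grandparent files intersecting [file_start, key].
      std::size_t overlap = 0;
      for (const GrandparentFile& g : grandparents) {
        if (g.largest >= file_start && g.smallest <= key) {
          overlap += g.bytes;
        }
      }
      if (overlap <= max_overlap_bytes) {
        files.back().push_back(key);  // Key still fits in the current file.
        continue;
      }
    }
    // First key overall, or the overlap limit was exceeded: cut here.
    files.push_back(std::vector<std::string>{key});
  }
  return files;
}
```

Under these assumptions, take grandparent files [a, c], [d, f], [g, k] of 100 bytes each, [y, z] of 10 bytes, and a limit of 150 bytes. If the cutting logic sees only the point keys y and z, it keeps them in one file and never notices that the tombstone [a, b) stretches that file's range back to a. If the stream also contains the tombstone start key a, the overlap is evaluated from a, and the sketch cuts before y, producing two files.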

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11113

Test Plan:
* added unit test in db_range_del_test
* crash test with a small key range: `python3 tools/db_crashtest.py blackbox --simple --max_key=100 --interval=600 --write_buffer_size=262144 --target_file_size_base=256 --max_bytes_for_level_base=262144 --block_size=128 --value_size_mult=33 --subcompactions=10 --use_multiget=1 --delpercent=3 --delrangepercent=2 --verify_iterator_with_expected_state_one_in=2 --num_iterations=10`

Reviewed By: ajkr

Differential Revision: D42655709

Pulled By: cbi42

fbshipit-source-id: 8367e36ef5640e8f21c14a3855d4a8d6e360a34c
2023-02-22 12:28:18 -08:00
blob Refactor AddRangeDels() + consider range tombstone during compaction file cutting (#11113) 2023-02-22 12:28:18 -08:00
compaction Refactor AddRangeDels() + consider range tombstone during compaction file cutting (#11113) 2023-02-22 12:28:18 -08:00
db_impl fix -Wrange-loop-analysis in Apple clang version 12.0.0 (clang-1200.0.32.29) (#11240) 2023-02-22 05:44:03 -08:00
wide Add a new MultiGetEntity API (#11222) 2023-02-15 09:34:17 -08:00
arena_wrapped_db_iter.cc Ran clang-format on db/ directory (#10910) 2022-11-02 14:34:24 -07:00
arena_wrapped_db_iter.h Ran clang-format on db/ directory (#10910) 2022-11-02 14:34:24 -07:00
builder.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
builder.h Include estimated bytes deleted by range tombstones in compensated file size (#10734) 2022-12-29 13:28:24 -08:00
c.cc add c api to set option fail_if_not_bottommost_level (#11158) 2023-02-21 10:52:09 -08:00
c_test.c Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
column_family.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
column_family.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
column_family_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
compact_files_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
comparator_db_test.cc Ran clang-format on db/ directory (#10910) 2022-11-02 14:34:24 -07:00
convenience.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
corruption_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
cuckoo_table_db_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
db_basic_test.cc Return any errors returned by ReadAsync to the MultiGet caller (#11171) 2023-02-02 16:35:27 -08:00
db_block_cache_test.cc Put Cache and CacheWrapper in new public header (#11192) 2023-02-09 12:12:02 -08:00
db_bloom_filter_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
db_compaction_filter_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
db_compaction_test.cc Allow canceling manual compaction while waiting for conflicting compaction (#11165) 2023-01-31 16:57:49 -08:00
db_dynamic_level_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
db_encryption_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
db_filesnapshot.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
db_flush_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
db_info_dumper.cc Ran clang-format on db/ directory (#10910) 2022-11-02 14:34:24 -07:00
db_info_dumper.h Add a DB Session ID (#6959) 2020-06-15 10:47:02 -07:00
db_inplace_update_test.cc Fix in-place updates for value types other than kTypeValue (#10254) 2022-06-27 16:37:09 -07:00
db_io_failure_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
db_iter.cc Fix an assertion failure in DBIter::SeekToLast() when user-defined timestamp is enabled (#11223) 2023-02-21 11:57:58 -08:00
db_iter.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
db_iter_stress_test.cc Ran clang-format on db/ directory (#10910) 2022-11-02 14:34:24 -07:00
db_iter_test.cc Ran clang-format on db/ directory (#10910) 2022-11-02 14:34:24 -07:00
db_iterator_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
db_kv_checksum_test.cc Print stack traces on frozen tests in CI (#10828) 2022-10-18 00:35:35 -07:00
db_log_iter_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
db_logical_block_size_cache_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
db_memtable_test.cc Add support for wide-column point lookups (#10540) 2022-08-19 11:51:12 -07:00
db_merge_operand_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
db_merge_operator_test.cc Merge operator failed subcode (#11231) 2023-02-17 10:58:46 -08:00
db_options_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
db_properties_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
db_range_del_test.cc Refactor AddRangeDels() + consider range tombstone during compaction file cutting (#11113) 2023-02-22 12:28:18 -08:00
db_rate_limiter_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
db_readonly_with_timestamp_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
db_secondary_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
db_sst_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
db_statistics_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
db_table_properties_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
db_tailing_iter_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
db_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
db_test2.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
db_test_util.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
db_test_util.h Put Cache and CacheWrapper in new public header (#11192) 2023-02-09 12:12:02 -08:00
db_universal_compaction_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
db_wal_test.cc Simplify TEST_F(DBWALTest, FixSyncWalOnObseletedWalWithNewManifestCausingMissingWAL) (#11186) 2023-02-06 16:10:03 -08:00
db_with_timestamp_basic_test.cc Fix an assertion failure in DBIter::SeekToLast() when user-defined timestamp is enabled (#11223) 2023-02-21 11:57:58 -08:00
db_with_timestamp_compaction_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
db_with_timestamp_test_util.cc Add timestamp support to DBImplReadOnly (#10004) 2022-05-19 18:39:41 -07:00
db_with_timestamp_test_util.h Add timestamp support to DBImplReadOnly (#10004) 2022-05-19 18:39:41 -07:00
db_write_buffer_manager_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
db_write_test.cc Attempt fix flaky DBWriteTest.LockWALInEffect (#11209) 2023-02-09 09:21:55 -08:00
dbformat.cc Remove copying of range tombstones keys in iterator (#10878) 2022-11-28 19:27:22 -08:00
dbformat.h Refactor AddRangeDels() + consider range tombstone during compaction file cutting (#11113) 2023-02-22 12:28:18 -08:00
dbformat_test.cc Ran clang-format on db/ directory (#10910) 2022-11-02 14:34:24 -07:00
deletefile_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
error_handler.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
error_handler.h Ran clang-format on db/ directory (#10910) 2022-11-02 14:34:24 -07:00
error_handler_fs_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
event_helpers.cc Remove FactoryFunc from LoadXXXObject (#11203) 2023-02-17 12:54:07 -08:00
event_helpers.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
experimental.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
external_sst_file_basic_test.cc Deprecate write_global_seqno and default to false (#11179) 2023-02-03 13:00:04 -08:00
external_sst_file_ingestion_job.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
external_sst_file_ingestion_job.h Add missing range conflict check between file ingestion and RefitLevel() (#10988) 2022-12-29 15:05:36 -08:00
external_sst_file_test.cc Move ExternalSSTTestEnv to FileSystemWrapper (#11139) 2023-01-27 14:51:39 -08:00
fault_injection_test.cc Ran clang-format on db/ directory (#10910) 2022-11-02 14:34:24 -07:00
file_indexer.cc Ran clang-format on db/ directory (#10910) 2022-11-02 14:34:24 -07:00
file_indexer.h Ran clang-format on db/ directory (#10910) 2022-11-02 14:34:24 -07:00
file_indexer_test.cc Ran clang-format on db/ directory (#10910) 2022-11-02 14:34:24 -07:00
filename_test.cc Ran clang-format on db/ directory (#10910) 2022-11-02 14:34:24 -07:00
flush_job.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
flush_job.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
flush_job_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
flush_scheduler.cc
flush_scheduler.h Include C++ standard library headers instead of C compatibility headers (#8068) 2021-03-19 12:09:47 -07:00
forward_iterator.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
forward_iterator.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
forward_iterator_bench.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
history_trimming_iterator.h Refactor AddRangeDels() + consider range tombstone during compaction file cutting (#11113) 2023-02-22 12:28:18 -08:00
import_column_family_job.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
import_column_family_job.h Add some missing headers (#10519) 2022-08-11 12:45:50 -07:00
import_column_family_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
internal_stats.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
internal_stats.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
job_context.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
kv_checksum.h Add memtable per key-value checksum (#10281) 2022-08-12 13:51:32 -07:00
listener_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
log_format.h Add record to set WAL compression type if enabled (#9556) 2022-02-17 16:19:31 -08:00
log_reader.cc Fix bug in WAL streaming uncompression (#11198) 2023-02-08 12:05:49 -08:00
log_reader.h Ran clang-format on db/ directory (#10910) 2022-11-02 14:34:24 -07:00
log_test.cc Fix bug in WAL streaming uncompression (#11198) 2023-02-08 12:05:49 -08:00
log_writer.cc Add manual_wal_flush, FlushWAL() to stress/crash test (#10698) 2022-09-30 15:48:33 -07:00
log_writer.h Ran clang-format on db/ directory (#10910) 2022-11-02 14:34:24 -07:00
logs_with_prep_tracker.cc
logs_with_prep_tracker.h Ran clang-format on db/ directory (#10910) 2022-11-02 14:34:24 -07:00
lookup_key.h Ran clang-format on db/ directory (#10910) 2022-11-02 14:34:24 -07:00
malloc_stats.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
malloc_stats.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
manual_compaction_test.cc Ran clang-format on db/ directory (#10910) 2022-11-02 14:34:24 -07:00
memtable.cc Add a new MultiGetEntity API (#11222) 2023-02-15 09:34:17 -08:00
memtable.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
memtable_list.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
memtable_list.h Fix memtable-only iterator regression (#10705) 2022-09-21 09:49:31 -07:00
memtable_list_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
merge_context.h Ran clang-format on db/ directory (#10910) 2022-11-02 14:34:24 -07:00
merge_helper.cc Refactor AddRangeDels() + consider range tombstone during compaction file cutting (#11113) 2023-02-22 12:28:18 -08:00
merge_helper.h Add API to limit blast radius of merge operator failure (#11092) 2023-01-20 14:40:30 -08:00
merge_helper_test.cc Basic Support for Merge with user-defined timestamp (#10819) 2022-10-31 22:28:58 -07:00
merge_operator.cc Ran clang-format on db/ directory (#10910) 2022-11-02 14:34:24 -07:00
merge_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
obsolete_files_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
options_file_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
output_validator.cc Cleanup includes in dbformat.h (#8930) 2021-09-29 04:04:40 -07:00
output_validator.h Cleanup includes in dbformat.h (#8930) 2021-09-29 04:04:40 -07:00
perf_context_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
periodic_task_scheduler.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
periodic_task_scheduler.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
periodic_task_scheduler_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
pinned_iterators_manager.h Avoid allocations/copies for large GetMergeOperands() results (#10458) 2022-08-04 00:42:13 -07:00
plain_table_db_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
post_memtable_callback.h Snapshots with user-specified timestamps (#9879) 2022-06-10 16:07:03 -07:00
pre_release_callback.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
prefix_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
range_del_aggregator.cc Refactor AddRangeDels() + consider range tombstone during compaction file cutting (#11113) 2023-02-22 12:28:18 -08:00
range_del_aggregator.h Refactor AddRangeDels() + consider range tombstone during compaction file cutting (#11113) 2023-02-22 12:28:18 -08:00
range_del_aggregator_bench.cc Improve FragmentTombstones() speed by lazily initializing seq_set_ (#10848) 2022-10-25 11:33:04 -07:00
range_del_aggregator_test.cc Refactor AddRangeDels() + consider range tombstone during compaction file cutting (#11113) 2023-02-22 12:28:18 -08:00
range_tombstone_fragmenter.cc Improve FragmentTombstones() speed by lazily initializing seq_set_ (#10848) 2022-10-25 11:33:04 -07:00
range_tombstone_fragmenter.h Refactor AddRangeDels() + consider range tombstone during compaction file cutting (#11113) 2023-02-22 12:28:18 -08:00
range_tombstone_fragmenter_test.cc snapshots of FragmentedRangeTombstoneList must in ascending order (#11046) 2022-12-19 15:06:22 -08:00
read_callback.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
repair.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
repair_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
seqno_time_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
seqno_to_time_mapping.cc Add option preserve_internal_time_seconds to preserve the time info (#10747) 2022-10-07 18:49:40 -07:00
seqno_to_time_mapping.h Add option preserve_internal_time_seconds to preserve the time info (#10747) 2022-10-07 18:49:40 -07:00
snapshot_checker.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
snapshot_impl.cc Ran clang-format on db/ directory (#10910) 2022-11-02 14:34:24 -07:00
snapshot_impl.h Ran clang-format on db/ directory (#10910) 2022-11-02 14:34:24 -07:00
table_cache.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
table_cache.h Major Cache refactoring, CPU efficiency improvement (#10975) 2023-01-11 14:20:40 -08:00
table_cache_sync_and_async.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
table_properties_collector.cc Ran clang-format on db/ directory (#10910) 2022-11-02 14:34:24 -07:00
table_properties_collector.h Fix an assertion failure in TimestampTablePropertiesCollector for empty output (#11015) 2022-12-05 13:46:27 -08:00
table_properties_collector_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
transaction_log_impl.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
transaction_log_impl.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
trim_history_scheduler.cc
trim_history_scheduler.h Ran clang-format on db/ directory (#10910) 2022-11-02 14:34:24 -07:00
version_builder.cc Major Cache refactoring, CPU efficiency improvement (#10975) 2023-01-11 14:20:40 -08:00
version_builder.h Sort L0 files by newly introduced epoch_num (#10922) 2022-12-13 13:29:37 -08:00
version_builder_test.cc Include estimated bytes deleted by range tombstones in compensated file size (#10734) 2022-12-29 13:28:24 -08:00
version_edit.cc Include estimated bytes deleted by range tombstones in compensated file size (#10734) 2022-12-29 13:28:24 -08:00
version_edit.h Put Cache and CacheWrapper in new public header (#11192) 2023-02-09 12:12:02 -08:00
version_edit_handler.cc Sort L0 files by newly introduced epoch_num (#10922) 2022-12-13 13:29:37 -08:00
version_edit_handler.h Sort L0 files by newly introduced epoch_num (#10922) 2022-12-13 13:29:37 -08:00
version_edit_test.cc Include estimated bytes deleted by range tombstones in compensated file size (#10734) 2022-12-29 13:28:24 -08:00
version_set.cc Refactor AddRangeDels() + consider range tombstone during compaction file cutting (#11113) 2023-02-22 12:28:18 -08:00
version_set.h Add a new MultiGetEntity API (#11222) 2023-02-15 09:34:17 -08:00
version_set_sync_and_async.h Merge operator failed subcode (#11231) 2023-02-17 10:58:46 -08:00
version_set_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
version_util.h Allow manifest fix-up without requiring prior state (#10796) 2022-10-10 17:59:17 -07:00
wal_edit.cc Do not hold mutex when write keys if not necessary (#7516) 2022-07-21 13:35:36 -07:00
wal_edit.h Do not hold mutex when write keys if not necessary (#7516) 2022-07-21 13:35:36 -07:00
wal_edit_test.cc Do not hold mutex when write keys if not necessary (#7516) 2022-07-21 13:35:36 -07:00
wal_manager.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
wal_manager.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
wal_manager_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
write_batch.cc Add API to limit blast radius of merge operator failure (#11092) 2023-01-20 14:40:30 -08:00
write_batch_base.cc
write_batch_internal.h Ran clang-format on db/ directory (#10910) 2022-11-02 14:34:24 -07:00
write_batch_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
write_callback.h
write_callback_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
write_controller.cc Revamp WriteController (#8064) 2021-03-18 09:47:31 -07:00
write_controller.h Ran clang-format on db/ directory (#10910) 2022-11-02 14:34:24 -07:00
write_controller_test.cc Ran clang-format on db/ directory (#10910) 2022-11-02 14:34:24 -07:00
write_thread.cc Ensure LockWAL() stall cleared for UnlockWAL() return (#11172) 2023-02-03 12:08:37 -08:00
write_thread.h Ensure LockWAL() stall cleared for UnlockWAL() return (#11172) 2023-02-03 12:08:37 -08:00