rocksdb/db
Yu Zhang 071a146fa0 Add support for range deletion when user timestamps are not persisted (#12254)
Summary:
For the user defined timestamps in memtable only feature, some special handling for range deletion blocks are needed since both the key (start_key) and the value (end_key) of a range tombstone can contain user-defined timestamps. Handling for the key is taken care of in the same way as the other data blocks in the block based table. This PR adds the special handling needed for the value (end_key) part. This includes:

1) On the write path, when L0 SST files are first created from flush, user-defined timestamps are removed from an end key of a range tombstone. There are places where it's logically removed (replaced with a min timestamp) because there is still logic with the running comparator that expects a user key that contains timestamp. And in the block based builder, it is eventually physically removed before persisted in a block.

2) On the read path, when range deletion block is being read, we artificially pad a min timestamp to the end key of a range tombstone in `BlockBasedTableReader`.

3) For file boundary `FileMetaData.largest`, we artificially pad a max timestamp to it if it contains a range deletion sentinel. Anytime when range deletion end_key is used to update file boundaries, it's using max timestamp instead of the range tombstone's actual timestamp to mark it as an exclusive end. d69628e6ce/db/dbformat.h (L923-L935)
This max timestamp is removed when in memory `FileMetaData.largest` is persisted into Manifest, we pad it back when it's read from Manifest while handling related `VersionEdit` in `VersionEditHandler`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12254

Test Plan: Added unit test and enabled this feature combination's stress test.

Reviewed By: cbi42

Differential Revision: D52965527

Pulled By: jowlyzhang

fbshipit-source-id: e8315f8a2c5268e2ae0f7aec8012c266b86df985
2024-01-29 11:37:34 -08:00
..
blob Refactor FilePrefetchBuffer code (#12097) 2024-01-05 09:29:01 -08:00
compaction Add support for range deletion when user timestamps are not persisted (#12254) 2024-01-29 11:37:34 -08:00
db_impl Remove unnecessary, confusing 'extern' (#12300) 2024-01-29 10:38:08 -08:00
wide internal_repo_rocksdb (4372117296613874540) (#12117) 2023-12-04 11:17:32 -08:00
arena_wrapped_db_iter.cc Invalidate threadlocal SV before incrementing `super_version_number_` (#11848) 2023-09-18 09:37:40 -07:00
arena_wrapped_db_iter.h Remove unnecessary, confusing 'extern' (#12300) 2024-01-29 10:38:08 -08:00
builder.cc Add support for range deletion when user timestamps are not persisted (#12254) 2024-01-29 11:37:34 -08:00
builder.h Remove unnecessary, confusing 'extern' (#12300) 2024-01-29 10:38:08 -08:00
c.cc Remove unnecessary, confusing 'extern' (#12300) 2024-01-29 10:38:08 -08:00
c_test.c Allow setting Stderr Logger via C API (#12262) 2024-01-25 12:36:40 -08:00
column_family.cc Detect compaction pressure at lower debt ratios (#12236) 2024-01-15 22:41:18 -08:00
column_family.h Remove unnecessary, confusing 'extern' (#12300) 2024-01-29 10:38:08 -08:00
column_family_test.cc Deflake ColumnFamilyTest.WriteStallSingleColumnFamily (#12294) 2024-01-25 14:40:18 -08:00
compact_files_test.cc Add missing status check when compiling with `ASSERT_STATUS_CHECKED=1` (#11686) 2023-08-09 15:46:44 -07:00
comparator_db_test.cc Enforce status checking after Valid() returns false for IteratorWrapper (#11975) 2023-10-18 09:38:38 -07:00
convenience.cc Group SST write in flush, compaction and db open with new stats (#11910) 2023-12-29 15:29:23 -08:00
convenience_impl.h Group rocksdb.sst.read.micros stat by different user read IOActivity + misc (#11444) 2023-08-08 17:26:50 -07:00
corruption_test.cc Make option `level_compaction_dynamic_level_bytes` true by default (#11525) 2023-06-15 21:12:39 -07:00
cuckoo_table_db_test.cc Make option `level_compaction_dynamic_level_bytes` true by default (#11525) 2023-06-15 21:12:39 -07:00
db_basic_test.cc Add some asserts in FilePickerMultiGet for debugging (#12241) 2024-01-16 17:08:58 -08:00
db_block_cache_test.cc Enforce status checking after Valid() returns false for IteratorWrapper (#11975) 2023-10-18 09:38:38 -07:00
db_bloom_filter_test.cc Enforce status checking after Valid() returns false for IteratorWrapper (#11975) 2023-10-18 09:38:38 -07:00
db_clip_test.cc Support Clip DB to KeyRange (#11379) 2023-05-18 13:25:01 -07:00
db_compaction_filter_test.cc Enforce status checking after Valid() returns false for IteratorWrapper (#11975) 2023-10-18 09:38:38 -07:00
db_compaction_test.cc Deflake `DBCompactionTest.BottomPriCompactionCountsTowardConcurrencyLimit` (#12289) 2024-01-25 10:37:11 -08:00
db_dynamic_level_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
db_encryption_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
db_filesnapshot.cc Remove the default force behavior for `EnableFileDeletion` API (#12001) 2023-11-10 14:35:54 -08:00
db_flush_test.cc Fix leak or crash on failure in automatic atomic flush (#12176) 2023-12-26 11:04:25 -08:00
db_info_dumper.cc Automated modernization (#12210) 2024-01-05 11:53:57 -08:00
db_info_dumper.h
db_inplace_update_test.cc
db_io_failure_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
db_iter.cc Fix bug in auto_readahead_size that returned wrong key (#12229) 2024-01-16 11:30:36 -08:00
db_iter.h Remove unnecessary, confusing 'extern' (#12300) 2024-01-29 10:38:08 -08:00
db_iter_stress_test.cc Automated modernization (#12210) 2024-01-05 11:53:57 -08:00
db_iter_test.cc Automated modernization (#12210) 2024-01-05 11:53:57 -08:00
db_iterator_test.cc Automated modernization (#12210) 2024-01-05 11:53:57 -08:00
db_kv_checksum_test.cc Automated modernization (#12210) 2024-01-05 11:53:57 -08:00
db_log_iter_test.cc Remove the default force behavior for `EnableFileDeletion` API (#12001) 2023-11-10 14:35:54 -08:00
db_logical_block_size_cache_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
db_memtable_test.cc
db_merge_operand_test.cc Make option `level_compaction_dynamic_level_bytes` true by default (#11525) 2023-06-15 21:12:39 -07:00
db_merge_operator_test.cc Fix the handling of wide-column base values in the max_successive_merges logic (#11913) 2023-10-02 16:25:25 -07:00
db_options_test.cc Automated modernization (#12210) 2024-01-05 11:53:57 -08:00
db_properties_test.cc Automated modernization (#12210) 2024-01-05 11:53:57 -08:00
db_range_del_test.cc Automated modernization (#12210) 2024-01-05 11:53:57 -08:00
db_rate_limiter_test.cc Enforce status checking after Valid() returns false for IteratorWrapper (#11975) 2023-10-18 09:38:38 -07:00
db_readonly_with_timestamp_test.cc Enforce status checking after Valid() returns false for IteratorWrapper (#11975) 2023-10-18 09:38:38 -07:00
db_secondary_test.cc Automated modernization (#12210) 2024-01-05 11:53:57 -08:00
db_sst_test.cc Automated modernization (#12210) 2024-01-05 11:53:57 -08:00
db_statistics_test.cc Fix double counting of BYTES_WRITTEN ticker (#12111) 2023-12-08 17:12:11 -08:00
db_table_properties_test.cc Automated modernization (#12210) 2024-01-05 11:53:57 -08:00
db_tailing_iter_test.cc Automated modernization (#12210) 2024-01-05 11:53:57 -08:00
db_test.cc Print additional information when flaky test DBTestWithParam.ThreadStatusSingleCompaction fails (#12268) 2024-01-23 10:07:06 -08:00
db_test2.cc Rate-limit un-ratelimited flush/compaction code paths (#12290) 2024-01-25 12:00:15 -08:00
db_test_util.cc Automated modernization (#12210) 2024-01-05 11:53:57 -08:00
db_test_util.h test: WritableFile derived class: add missing GetFileSize() override (#11726) 2023-09-29 15:58:08 -07:00
db_universal_compaction_test.cc Automated modernization (#12210) 2024-01-05 11:53:57 -08:00
db_wal_test.cc Automated modernization (#12210) 2024-01-05 11:53:57 -08:00
db_with_timestamp_basic_test.cc Add support for range deletion when user timestamps are not persisted (#12254) 2024-01-29 11:37:34 -08:00
db_with_timestamp_compaction_test.cc Automated modernization (#12210) 2024-01-05 11:53:57 -08:00
db_with_timestamp_test_util.cc
db_with_timestamp_test_util.h
db_write_buffer_manager_test.cc Add missing status check when compiling with `ASSERT_STATUS_CHECKED=1` (#11686) 2023-08-09 15:46:44 -07:00
db_write_test.cc Automated modernization (#12210) 2024-01-05 11:53:57 -08:00
dbformat.cc Add support for range deletion when user timestamps are not persisted (#12254) 2024-01-29 11:37:34 -08:00
dbformat.h Add support for range deletion when user timestamps are not persisted (#12254) 2024-01-29 11:37:34 -08:00
dbformat_test.cc Logically strip timestamp during flush (#11557) 2023-06-29 15:50:50 -07:00
deletefile_test.cc Automated modernization (#12210) 2024-01-05 11:53:57 -08:00
error_handler.cc Consolidate stats recording in error handler (#11992) 2024-01-22 14:57:30 -08:00
error_handler.h Consolidate stats recording in error handler (#11992) 2024-01-22 14:57:30 -08:00
error_handler_fs_test.cc Quarantine files in a limbo state after a manifest error (#12030) 2023-11-11 08:11:11 -08:00
event_helpers.cc Fix/cleanup SeqnoToTimeMapping (#12253) 2024-01-19 21:50:38 -08:00
event_helpers.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
experimental.cc Group SST write in flush, compaction and db open with new stats (#11910) 2023-12-29 15:29:23 -08:00
external_sst_file_basic_test.cc Fix bug of newer ingested data assigned with an older seqno (#12257) 2024-01-24 11:21:05 -08:00
external_sst_file_ingestion_job.cc Skip searching through lsm tree for a target level when files overlap (#12284) 2024-01-24 23:30:08 -08:00
external_sst_file_ingestion_job.h Add missing range conflict check between file ingestion and RefitLevel() (#10988) 2022-12-29 15:05:36 -08:00
external_sst_file_test.cc Skip searching through lsm tree for a target level when files overlap (#12284) 2024-01-24 23:30:08 -08:00
fault_injection_test.cc Group SST write in flush, compaction and db open with new stats (#11910) 2023-12-29 15:29:23 -08:00
file_indexer.cc Simplify conditional judgment (#11580) 2023-07-03 09:41:48 -07:00
file_indexer.h Ran clang-format on db/ directory (#10910) 2022-11-02 14:34:24 -07:00
file_indexer_test.cc Ran clang-format on db/ directory (#10910) 2022-11-02 14:34:24 -07:00
filename_test.cc Ran clang-format on db/ directory (#10910) 2022-11-02 14:34:24 -07:00
flush_job.cc Rate-limit un-ratelimited flush/compaction code paths (#12290) 2024-01-25 12:00:15 -08:00
flush_job.h Rate-limit un-ratelimited flush/compaction code paths (#12290) 2024-01-25 12:00:15 -08:00
flush_job_test.cc Rate-limit un-ratelimited flush/compaction code paths (#12290) 2024-01-25 12:00:15 -08:00
flush_scheduler.cc
flush_scheduler.h
forward_iterator.cc Add an interface to provide support for underlying FS to pass their own buffer during reads (#11324) 2023-06-23 11:48:49 -07:00
forward_iterator.h Ignore async_io ReadOption if FileSystem doesn't support it (#11296) 2023-03-17 14:57:09 -07:00
forward_iterator_bench.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
history_trimming_iterator.h Refactor AddRangeDels() + consider range tombstone during compaction file cutting (#11113) 2023-02-22 12:28:18 -08:00
import_column_family_job.cc Group SST write in flush, compaction and db open with new stats (#11910) 2023-12-29 15:29:23 -08:00
import_column_family_job.h Support to create a CF by importing multiple non-overlapping CFs (#11378) 2023-06-15 12:25:04 -07:00
import_column_family_test.cc internal_repo_rocksdb (4372117296613874540) (#12117) 2023-12-04 11:17:32 -08:00
internal_stats.cc Log pending compaction bytes in a couple places (#12267) 2024-01-23 09:14:59 -08:00
internal_stats.h Remove unnecessary, confusing 'extern' (#12300) 2024-01-29 10:38:08 -08:00
job_context.h Quarantine files in a limbo state after a manifest error (#12030) 2023-11-11 08:11:11 -08:00
kv_checksum.h Block per key-value checksum (#11287) 2023-04-25 12:08:23 -07:00
listener_test.cc internal_repo_rocksdb (4372117296613874540) (#12117) 2023-12-04 11:17:32 -08:00
log_format.h Add support in log writer and reader for a user-defined timestamp size record (#11433) 2023-05-11 17:26:19 -07:00
log_reader.cc internal_repo_rocksdb (4372117296613874540) (#12117) 2023-12-04 11:17:32 -08:00
log_reader.h switch to use RocksDB UnorderedMap (#11507) 2023-06-05 13:36:26 -07:00
log_test.cc Group SST write in flush, compaction and db open with new stats (#11910) 2023-12-29 15:29:23 -08:00
log_writer.cc Group SST write in flush, compaction and db open with new stats (#11910) 2023-12-29 15:29:23 -08:00
log_writer.h Group SST write in flush, compaction and db open with new stats (#11910) 2023-12-29 15:29:23 -08:00
logs_with_prep_tracker.cc
logs_with_prep_tracker.h Ran clang-format on db/ directory (#10910) 2022-11-02 14:34:24 -07:00
lookup_key.h Ran clang-format on db/ directory (#10910) 2022-11-02 14:34:24 -07:00
malloc_stats.cc internal_repo_rocksdb (4372117296613874540) (#12117) 2023-12-04 11:17:32 -08:00
malloc_stats.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
manual_compaction_test.cc internal_repo_rocksdb (4372117296613874540) (#12117) 2023-12-04 11:17:32 -08:00
memtable.cc Group SST write in flush, compaction and db open with new stats (#11910) 2023-12-29 15:29:23 -08:00
memtable.h Remove unnecessary, confusing 'extern' (#12300) 2024-01-29 10:38:08 -08:00
memtable_list.cc Group SST write in flush, compaction and db open with new stats (#11910) 2023-12-29 15:29:23 -08:00
memtable_list.h Remove unnecessary, confusing 'extern' (#12300) 2024-01-29 10:38:08 -08:00
memtable_list_test.cc Lightweight verification of MANIFEST file after close on shutdown (#12174) 2023-12-28 18:25:29 -08:00
merge_context.h Ran clang-format on db/ directory (#10910) 2022-11-02 14:34:24 -07:00
merge_helper.cc Eliminate some code duplication in MergeHelper (#12121) 2023-12-05 14:07:42 -08:00
merge_helper.h Eliminate some code duplication in MergeHelper (#12121) 2023-12-05 14:07:42 -08:00
merge_helper_test.cc Basic Support for Merge with user-defined timestamp (#10819) 2022-10-31 22:28:58 -07:00
merge_operator.cc Add helper methods WideColumnsHelper::{Has,Get}DefaultColumn (#11813) 2023-09-11 16:32:32 -07:00
merge_test.cc internal_repo_rocksdb (4372117296613874540) (#12117) 2023-12-04 11:17:32 -08:00
obsolete_files_test.cc internal_repo_rocksdb (4372117296613874540) (#12117) 2023-12-04 11:17:32 -08:00
options_file_test.cc internal_repo_rocksdb (4372117296613874540) (#12117) 2023-12-04 11:17:32 -08:00
output_validator.cc
output_validator.h
perf_context_test.cc Deflake `PerfContextTest.CPUTimer` (#12252) 2024-01-19 10:13:52 -08:00
periodic_task_scheduler.cc Turn the default Timer in PeriodicTaskScheduler into a leaky Meyers singleton (#12128) 2023-12-08 10:34:07 -08:00
periodic_task_scheduler.h Improve efficiency of create_missing_column_families, light refactor (#11920) 2023-10-04 14:14:22 -07:00
periodic_task_scheduler_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
pinned_iterators_manager.h
plain_table_db_test.cc Remove unnecessary, confusing 'extern' (#12300) 2024-01-29 10:38:08 -08:00
post_memtable_callback.h
pre_release_callback.h
prefix_test.cc internal_repo_rocksdb (4372117296613874540) (#12117) 2023-12-04 11:17:32 -08:00
range_del_aggregator.cc internal_repo_rocksdb (4372117296613874540) (#12117) 2023-12-04 11:17:32 -08:00
range_del_aggregator.h Add new Iterator API Refresh(const snapshot*) (#10594) 2023-09-15 10:44:43 -07:00
range_del_aggregator_bench.cc
range_del_aggregator_test.cc Refactor AddRangeDels() + consider range tombstone during compaction file cutting (#11113) 2023-02-22 12:28:18 -08:00
range_tombstone_fragmenter.cc Add support for range deletion when user timestamps are not persisted (#12254) 2024-01-29 11:37:34 -08:00
range_tombstone_fragmenter.h Add support for range deletion when user timestamps are not persisted (#12254) 2024-01-29 11:37:34 -08:00
range_tombstone_fragmenter_test.cc snapshots of FragmentedRangeTombstoneList must in ascending order (#11046) 2022-12-19 15:06:22 -08:00
read_callback.h
repair.cc Group SST write in flush, compaction and db open with new stats (#11910) 2023-12-29 15:29:23 -08:00
repair_test.cc internal_repo_rocksdb (4372117296613874540) (#12117) 2023-12-04 11:17:32 -08:00
seqno_time_test.cc Fix UB/crash in new SeqnoToTimeMapping::CopyFromSeqnoRange (#12293) 2024-01-25 11:27:15 -08:00
seqno_to_time_mapping.cc Fix UB/crash in new SeqnoToTimeMapping::CopyFromSeqnoRange (#12293) 2024-01-25 11:27:15 -08:00
seqno_to_time_mapping.h Fix UB/crash in new SeqnoToTimeMapping::CopyFromSeqnoRange (#12293) 2024-01-25 11:27:15 -08:00
snapshot_checker.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
snapshot_impl.cc Ran clang-format on db/ directory (#10910) 2022-11-02 14:34:24 -07:00
snapshot_impl.h Ran clang-format on db/ directory (#10910) 2022-11-02 14:34:24 -07:00
table_cache.cc internal_repo_rocksdb (4372117296613874540) (#12117) 2023-12-04 11:17:32 -08:00
table_cache.h Fix row cache falsely return kNotFound when timestamp enabled (#11816) 2023-09-20 11:34:38 -07:00
table_cache_sync_and_async.h Fix StopWatch bug; Remove setting `record_read_stats` (#11474) 2023-05-25 10:16:58 -07:00
table_properties_collector.cc Ran clang-format on db/ directory (#10910) 2022-11-02 14:34:24 -07:00
table_properties_collector.h Allow TablePropertiesCollectorFactory to return null collector (#12129) 2023-12-11 12:02:56 -08:00
table_properties_collector_test.cc Remove unnecessary, confusing 'extern' (#12300) 2024-01-29 10:38:08 -08:00
transaction_log_impl.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
transaction_log_impl.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
trim_history_scheduler.cc
trim_history_scheduler.h Ran clang-format on db/ directory (#10910) 2022-11-02 14:34:24 -07:00
version_builder.cc Fix blob files not reclaimed after deleting all SSTs (#12235) 2024-01-16 11:15:23 -08:00
version_builder.h Block per key-value checksum (#11287) 2023-04-25 12:08:23 -07:00
version_builder_test.cc Fix blob files not reclaimed after deleting all SSTs (#12235) 2024-01-16 11:15:23 -08:00
version_edit.cc internal_repo_rocksdb (4372117296613874540) (#12117) 2023-12-04 11:17:32 -08:00
version_edit.h Remove unnecessary, confusing 'extern' (#12300) 2024-01-29 10:38:08 -08:00
version_edit_handler.cc Add support for range deletion when user timestamps are not persisted (#12254) 2024-01-29 11:37:34 -08:00
version_edit_handler.h Remove VersionEdit's friends pattern (#12024) 2023-11-01 12:04:11 -07:00
version_edit_test.cc internal_repo_rocksdb (4372117296613874540) (#12117) 2023-12-04 11:17:32 -08:00
version_set.cc Log pending compaction bytes in a couple places (#12267) 2024-01-23 09:14:59 -08:00
version_set.h Remove unnecessary, confusing 'extern' (#12300) 2024-01-29 10:38:08 -08:00
version_set_sync_and_async.h Add helper methods WideColumnsHelper::{Has,Get}DefaultColumn (#11813) 2023-09-11 16:32:32 -07:00
version_set_test.cc Group SST write in flush, compaction and db open with new stats (#11910) 2023-12-29 15:29:23 -08:00
version_util.h Group SST write in flush, compaction and db open with new stats (#11910) 2023-12-29 15:29:23 -08:00
wal_edit.cc
wal_edit.h
wal_edit_test.cc
wal_manager.cc Replace push_back by emplace_back in wal manager (#10805) 2023-12-27 10:40:33 -08:00
wal_manager.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
wal_manager_test.cc Group SST write in flush, compaction and db open with new stats (#11910) 2023-12-29 15:29:23 -08:00
write_batch.cc Group SST write in flush, compaction and db open with new stats (#11910) 2023-12-29 15:29:23 -08:00
write_batch_base.cc
write_batch_internal.h Set default cf ts sz for a reused transaction (#11685) 2023-08-09 13:49:42 -07:00
write_batch_test.cc Stubs for piping write time (#12043) 2023-11-09 15:58:07 -08:00
write_callback.h
write_callback_test.cc internal_repo_rocksdb (4372117296613874540) (#12117) 2023-12-04 11:17:32 -08:00
write_controller.cc
write_controller.h Ran clang-format on db/ directory (#10910) 2022-11-02 14:34:24 -07:00
write_controller_test.cc Ran clang-format on db/ directory (#10910) 2022-11-02 14:34:24 -07:00
write_stall_stats.cc Fix initialization-order-fiasco in write_stall_stats.cc (#11355) 2023-04-05 14:42:31 -07:00
write_stall_stats.h Remove unnecessary, confusing 'extern' (#12300) 2024-01-29 10:38:08 -08:00
write_thread.cc Ensure LockWAL() stall cleared for UnlockWAL() return (#11172) 2023-02-03 12:08:37 -08:00
write_thread.h Group SST write in flush, compaction and db open with new stats (#11910) 2023-12-29 15:29:23 -08:00