rocksdb/db
Changyu Bi 4384dd5eee Support ingesting SST files generated by a live DB (#12750)
Summary:
... to enable use cases like using RocksDB to merge sort data for ingestion. A new file ingestion option `IngestExternalFileOptions::allow_db_generated_files` is introduced to allows users to ingest SST files generated by live DBs instead of SstFileWriter. For now this only works if the SST files being ingested have zero as their largest sequence number AND do not overlap with any data in the DB (so we can assign seqno 0 which matches the seqno of all ingested keys).

The feature is marked the option as experimental for now.

Main changes needed to enable this:
- ignore CF id mismatch during ingestion
- ignore the missing external file version table property

Rest of the change is mostly in new unit tests.

A previous attempt is in https://github.com/facebook/rocksdb/issues/5602.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12750

Test Plan: - new unit tests

Reviewed By: ajkr, jowlyzhang

Differential Revision: D58396673

Pulled By: cbi42

fbshipit-source-id: aae513afad7b1ff5d4faa48104df5f384926bf03
2024-07-19 16:14:54 -07:00
..
blob Fixed MultiGet() error handling to not skip blob dereference (#12597) 2024-04-29 14:18:42 -07:00
compaction fix: Round-Robin pri under leveled compaction allows subcompactions b… (#12843) 2024-07-08 12:25:11 -07:00
db_impl Support ingesting SST files generated by a live DB (#12750) 2024-07-19 16:14:54 -07:00
wide Fix the output of ldb dump_wal for PutEntity records (#12677) 2024-05-20 17:04:14 -07:00
arena_wrapped_db_iter.cc Fix possible double-free on TruncatedRangeDelIterator (#12805) 2024-06-24 11:51:16 -07:00
arena_wrapped_db_iter.h Fix possible double-free on TruncatedRangeDelIterator (#12805) 2024-06-24 11:51:16 -07:00
attribute_group_iterator_impl.cc MultiCfIterator - AttributeGroupIter Impl & CoalescingIter Optimization (#12534) 2024-04-16 08:45:38 -07:00
attribute_group_iterator_impl.h MultiCfIterator - AttributeGroupIter Impl & CoalescingIter Optimization (#12534) 2024-04-16 08:45:38 -07:00
builder.cc Add CompactForTieringCollector to support automatically trigger compaction for tiering use case (#12760) 2024-06-18 10:51:29 -07:00
builder.h Add CompactForTieringCollector to support automatically trigger compaction for tiering use case (#12760) 2024-06-18 10:51:29 -07:00
c.cc Create C API function to iterate over WriteBatch for custom Column Families (#12718) 2024-07-09 12:05:08 -07:00
c_test.c Create C API function to iterate over WriteBatch for custom Column Families (#12718) 2024-07-09 12:05:08 -07:00
coalescing_iterator.cc MultiCfIterator - AttributeGroupIter Impl & CoalescingIter Optimization (#12534) 2024-04-16 08:45:38 -07:00
coalescing_iterator.h MultiCfIterator - AttributeGroupIter Impl & CoalescingIter Optimization (#12534) 2024-04-16 08:45:38 -07:00
column_family.cc Change the behavior of manual flush to not retain UDT (#12737) 2024-06-13 13:18:10 -07:00
column_family.h Fix manual flush hanging on waiting for no stall for UDT in memtable … (#12771) 2024-06-14 13:37:37 -07:00
column_family_test.cc Fix manual flush hanging on waiting for no stall for UDT in memtable … (#12771) 2024-06-14 13:37:37 -07:00
compact_files_test.cc Prevent data block compression with BlockBasedTableOptions::block_align (#12592) 2024-04-26 20:05:30 -07:00
comparator_db_test.cc Run internal cpp modernizer on RocksDB repo (#12398) 2024-03-04 10:08:32 -08:00
convenience.cc Inject more errors to more files in stress test (#12713) 2024-06-19 08:42:00 -07:00
convenience_impl.h
corruption_test.cc Prefer static_cast in place of most reinterpret_cast (#12308) 2024-02-07 10:44:11 -08:00
cuckoo_table_db_test.cc Run internal cpp modernizer on RocksDB repo (#12398) 2024-03-04 10:08:32 -08:00
db_basic_test.cc DeleteRange() return NotSupported if row_cache is configured (#12512) 2024-04-29 16:33:13 -07:00
db_block_cache_test.cc Fix rare failure in DBBlockCacheTypeTest.Uncache (#12775) 2024-06-14 20:50:36 -07:00
db_bloom_filter_test.cc Fix major bug with prefixes, SeekForPrev, and partitioned filters (#12872) 2024-07-17 14:08:35 -07:00
db_clip_test.cc
db_compaction_filter_test.cc Replace ScopedArenaIterator with ScopedArenaPtr<InternalIterator> (#12470) 2024-03-22 13:40:42 -07:00
db_compaction_test.cc Fix crash in CompactFiles() of conflict range under preclude_last_level_data_seconds > 0 (#12628) 2024-05-13 13:12:06 -07:00
db_dynamic_level_test.cc Prefer static_cast in place of most reinterpret_cast (#12308) 2024-02-07 10:44:11 -08:00
db_encryption_test.cc
db_filesnapshot.cc Avoid unnecessary work in internal calls to GetSortedWalFiles (#12831) 2024-07-01 23:29:02 -07:00
db_flush_test.cc Refactor SyncWAL and SyncClosedLogs for code sharing (#12707) 2024-05-30 14:53:13 -07:00
db_follower_test.cc Implement obsolete file deletion (GC) in follower (#12657) 2024-05-17 19:13:33 -07:00
db_info_dumper.cc Don't log an error when an auxiliary dir is missing (#12326) 2024-02-05 10:26:41 -08:00
db_info_dumper.h
db_inplace_update_test.cc
db_io_failure_test.cc Retry DB::Open upon a corruption detected while reading the MANIFEST (#12518) 2024-04-18 17:36:33 -07:00
db_iter.cc Add Iterator property "rocksdb.iterator.is-value-pinned" (#12659) 2024-05-15 19:11:52 -07:00
db_iter.h Follow ups for TimedPut and write time property (#12455) 2024-03-21 10:00:15 -07:00
db_iter_stress_test.cc Automated modernization (#12210) 2024-01-05 11:53:57 -08:00
db_iter_test.cc Add initial support for TimedPut API (#12419) 2024-03-14 15:44:55 -07:00
db_iterator_test.cc Add Iterator property "rocksdb.iterator.is-value-pinned" (#12659) 2024-05-15 19:11:52 -07:00
db_kv_checksum_test.cc Fix locking for ColumnFamilyOptions::inplace_update_support (#12624) 2024-05-08 08:30:12 -07:00
db_log_iter_test.cc Disable flaky part of TransactionLogIteratorCheckWhenArchive (#12423) 2024-03-12 12:54:53 -07:00
db_logical_block_size_cache_test.cc
db_memtable_test.cc fix DeleteRange+memtable_insert_with_hint_prefix_extractor interaction (#12558) 2024-04-22 20:13:58 -07:00
db_merge_operand_test.cc Add ContinueCallback to GetMergeOperands() (#12438) 2024-03-15 12:25:49 -07:00
db_merge_operator_test.cc
db_options_test.cc Prefer static_cast in place of most reinterpret_cast (#12308) 2024-02-07 10:44:11 -08:00
db_properties_test.cc GetAggregatedIntProperty accumulates property once per block cache (#12755) 2024-06-18 10:46:55 -07:00
db_range_del_test.cc Fail DeleteRange() early when row_cache is configured (#12710) 2024-05-29 15:03:15 -07:00
db_rate_limiter_test.cc Fix db_rate_limiter_test for win (#12816) 2024-07-01 16:14:19 -07:00
db_readonly_with_timestamp_test.cc Enforce status checking after Valid() returns false for IteratorWrapper (#11975) 2023-10-18 09:38:38 -07:00
db_secondary_test.cc Prefer static_cast in place of most reinterpret_cast (#12308) 2024-02-07 10:44:11 -08:00
db_sst_test.cc Fix delete obsolete files on recovery not rate limited (#12590) 2024-05-01 12:26:54 -07:00
db_statistics_test.cc Fix double counting of BYTES_WRITTEN ticker (#12111) 2023-12-08 17:12:11 -08:00
db_table_properties_test.cc Run internal cpp modernizer on RocksDB repo (#12398) 2024-03-04 10:08:32 -08:00
db_tailing_iter_test.cc Prefer static_cast in place of most reinterpret_cast (#12308) 2024-02-07 10:44:11 -08:00
db_test.cc Disable "uncache" behavior in DB shutdown (#12751) 2024-06-11 15:57:40 -07:00
db_test2.cc Rename, deprecate LogFile and VectorLogPtr (#12695) 2024-05-28 09:24:49 -07:00
db_test_util.cc Support ingesting SST files generated by a live DB (#12750) 2024-07-19 16:14:54 -07:00
db_test_util.h Support ingesting SST files generated by a live DB (#12750) 2024-07-19 16:14:54 -07:00
db_universal_compaction_test.cc Prefer static_cast in place of most reinterpret_cast (#12308) 2024-02-07 10:44:11 -08:00
db_wal_test.cc Fix possible crash in failure to sync some WALs (#12789) 2024-06-21 12:56:21 -07:00
db_with_timestamp_basic_test.cc Deprecate some variants of Get and MultiGet (#12327) 2024-02-16 09:21:06 -08:00
db_with_timestamp_compaction_test.cc Prefer static_cast in place of most reinterpret_cast (#12308) 2024-02-07 10:44:11 -08:00
db_with_timestamp_test_util.cc
db_with_timestamp_test_util.h
db_write_buffer_manager_test.cc Prefer static_cast in place of most reinterpret_cast (#12308) 2024-02-07 10:44:11 -08:00
db_write_test.cc Disallow memtable flush and sst ingest while WAL is locked (#12652) 2024-05-21 10:17:34 -07:00
dbformat.cc Add EntryType for TimedPut (#12669) 2024-05-16 15:18:12 -07:00
dbformat.h Support read timestamp in ldb (#12641) 2024-05-13 15:43:12 -07:00
dbformat_test.cc
deletefile_test.cc Add an option to wait for purge in WaitForCompact (#12520) 2024-04-17 17:33:27 -07:00
error_handler.cc Fix a bug where OnErrorRecoveryBegin() is not called before auto-recovery (#12860) 2024-07-15 17:00:14 -07:00
error_handler.h Remove the return value of SetBGError() (#12792) 2024-06-26 18:17:05 -07:00
error_handler_fs_test.cc Remove the return value of SetBGError() (#12792) 2024-06-26 18:17:05 -07:00
event_helpers.cc Fix race condition between event listener and error handler (#12803) 2024-06-24 11:45:28 -07:00
event_helpers.h
experimental.cc Run internal cpp modernizer on RocksDB repo (#12398) 2024-03-04 10:08:32 -08:00
external_sst_file_basic_test.cc Disallow memtable flush and sst ingest while WAL is locked (#12652) 2024-05-21 10:17:34 -07:00
external_sst_file_ingestion_job.cc Support ingesting SST files generated by a live DB (#12750) 2024-07-19 16:14:54 -07:00
external_sst_file_ingestion_job.h Use extended file boundary for key range overlap check during file ingestion (#12735) 2024-06-04 13:39:51 -07:00
external_sst_file_test.cc Support ingesting SST files generated by a live DB (#12750) 2024-07-19 16:14:54 -07:00
fault_injection_test.cc FaultInjectionTestFS follow-up and clean-up (#12861) 2024-07-15 10:28:34 -07:00
file_indexer.cc
file_indexer.h
file_indexer_test.cc
filename_test.cc
flush_job.cc Add CompactForTieringCollector to support automatically trigger compaction for tiering use case (#12760) 2024-06-18 10:51:29 -07:00
flush_job.h Add CompactForTieringCollector to support automatically trigger compaction for tiering use case (#12760) 2024-06-18 10:51:29 -07:00
flush_job_test.cc Follow ups for TimedPut and write time property (#12455) 2024-03-21 10:00:15 -07:00
flush_scheduler.cc
flush_scheduler.h
forward_iterator.cc Support returning write unix time in iterator property (#12428) 2024-03-15 15:37:37 -07:00
forward_iterator.h Support returning write unix time in iterator property (#12428) 2024-03-15 15:37:37 -07:00
forward_iterator_bench.cc
history_trimming_iterator.h
import_column_family_job.cc Fix a corruption bug in CreateColumnFamilyWithImport() (#12602) 2024-05-06 11:01:38 -07:00
import_column_family_job.h
import_column_family_test.cc Fix a corruption bug in CreateColumnFamilyWithImport() (#12602) 2024-05-06 11:01:38 -07:00
internal_stats.cc GetAggregatedIntProperty accumulates property once per block cache (#12755) 2024-06-18 10:46:55 -07:00
internal_stats.h GetAggregatedIntProperty accumulates property once per block cache (#12755) 2024-06-18 10:46:55 -07:00
job_context.h Support returning write unix time in iterator property (#12428) 2024-03-15 15:37:37 -07:00
kv_checksum.h
listener_test.cc internal_repo_rocksdb (4372117296613874540) (#12117) 2023-12-04 11:17:32 -08:00
log_format.h
log_reader.cc Fix recycled WAL detection when wal_compression is enabled (#12643) 2024-05-22 15:34:37 -07:00
log_reader.h Enable recycle_log_file_num option for point in time recovery (#12403) 2024-03-21 12:29:35 -07:00
log_test.cc Fix recycled WAL detection when wal_compression is enabled (#12643) 2024-05-22 15:34:37 -07:00
log_writer.cc Inject more errors to more files in stress test (#12713) 2024-06-19 08:42:00 -07:00
log_writer.h Ensure Close() before LinkFile() for WALs in Checkpoint (#12734) 2024-06-12 11:48:45 -07:00
logs_with_prep_tracker.cc
logs_with_prep_tracker.h
lookup_key.h
malloc_stats.cc Prefer static_cast in place of most reinterpret_cast (#12308) 2024-02-07 10:44:11 -08:00
malloc_stats.h
manual_compaction_test.cc internal_repo_rocksdb (4372117296613874540) (#12117) 2023-12-04 11:17:32 -08:00
memtable.cc Fix assertion failure in ConstructFragmentedRangeTombstones() (#12796) 2024-06-22 11:31:16 -07:00
memtable.h Fix assertion failure in ConstructFragmentedRangeTombstones() (#12796) 2024-06-22 11:31:16 -07:00
memtable_list.cc Fix possible double-free on TruncatedRangeDelIterator (#12805) 2024-06-24 11:51:16 -07:00
memtable_list.h Support returning write unix time in iterator property (#12428) 2024-03-15 15:37:37 -07:00
memtable_list_test.cc Add initial support for TimedPut API (#12419) 2024-03-14 15:44:55 -07:00
merge_context.h Add ContinueCallback to GetMergeOperands() (#12438) 2024-03-15 12:25:49 -07:00
merge_helper.cc Add initial support for TimedPut API (#12419) 2024-03-14 15:44:55 -07:00
merge_helper.h Eliminate some code duplication in MergeHelper (#12121) 2023-12-05 14:07:42 -08:00
merge_helper_test.cc
merge_operator.cc
merge_test.cc Prefer static_cast in place of most reinterpret_cast (#12308) 2024-02-07 10:44:11 -08:00
multi_cf_iterator_impl.h Fix heap-use-after-free in MultiCfIteratorImpl (#12784) 2024-06-21 11:56:10 -07:00
multi_cf_iterator_test.cc Fix IteratorsConsistentView tests (#12582) 2024-04-25 14:06:46 -07:00
obsolete_files_test.cc Remove the force mode for EnableFileDeletions API (#12337) 2024-02-13 18:36:25 -08:00
options_file_test.cc internal_repo_rocksdb (4372117296613874540) (#12117) 2023-12-04 11:17:32 -08:00
output_validator.cc Removed check_flush_compaction_key_order (#12311) 2024-01-31 16:30:26 -08:00
output_validator.h Removed check_flush_compaction_key_order (#12311) 2024-01-31 16:30:26 -08:00
perf_context_test.cc Add write_memtable_time to perf level kEnableWait (#12394) 2024-02-29 15:08:26 -08:00
periodic_task_scheduler.cc Remove extra semi colon from instagram/ranking/mezql/shots/parser/fast/Token.cpp 2024-03-04 06:32:50 -08:00
periodic_task_scheduler.h
periodic_task_scheduler_test.cc Prefer static_cast in place of most reinterpret_cast (#12308) 2024-02-07 10:44:11 -08:00
pinned_iterators_manager.h Prefer static_cast in place of most reinterpret_cast (#12308) 2024-02-07 10:44:11 -08:00
plain_table_db_test.cc Prefer static_cast in place of most reinterpret_cast (#12308) 2024-02-07 10:44:11 -08:00
post_memtable_callback.h
pre_release_callback.h
prefix_test.cc internal_repo_rocksdb (4372117296613874540) (#12117) 2023-12-04 11:17:32 -08:00
range_del_aggregator.cc Replace ScopedArenaIterator with ScopedArenaPtr<InternalIterator> (#12470) 2024-03-22 13:40:42 -07:00
range_del_aggregator.h Replace ScopedArenaIterator with ScopedArenaPtr<InternalIterator> (#12470) 2024-03-22 13:40:42 -07:00
range_del_aggregator_bench.cc
range_del_aggregator_test.cc
range_tombstone_fragmenter.cc Add support for range deletion when user timestamps are not persisted (#12254) 2024-01-29 11:37:34 -08:00
range_tombstone_fragmenter.h Fix compile errors in C++23 (#12106) 2024-05-28 15:33:57 -07:00
range_tombstone_fragmenter_test.cc
read_callback.h
repair.cc Add CompactForTieringCollector to support automatically trigger compaction for tiering use case (#12760) 2024-06-18 10:51:29 -07:00
repair_test.cc internal_repo_rocksdb (4372117296613874540) (#12117) 2023-12-04 11:17:32 -08:00
seqno_time_test.cc Add initial support for TimedPut API (#12419) 2024-03-14 15:44:55 -07:00
seqno_to_time_mapping.cc Add CompactForTieringCollector to support automatically trigger compaction for tiering use case (#12760) 2024-06-18 10:51:29 -07:00
seqno_to_time_mapping.h Add CompactForTieringCollector to support automatically trigger compaction for tiering use case (#12760) 2024-06-18 10:51:29 -07:00
snapshot_checker.h Remove 'virtual' when implied by 'override' (#12319) 2024-01-31 13:14:42 -08:00
snapshot_impl.cc
snapshot_impl.h
table_cache.cc Fix possible double-free on TruncatedRangeDelIterator (#12805) 2024-06-24 11:51:16 -07:00
table_cache.h Fix possible double-free on TruncatedRangeDelIterator (#12805) 2024-06-24 11:51:16 -07:00
table_cache_sync_and_async.h Fix kBlockCacheTier read when merge-chain base value is in a blob file (#12462) 2024-03-21 12:38:53 -07:00
table_properties_collector.cc
table_properties_collector.h Add CompactForTieringCollector to support automatically trigger compaction for tiering use case (#12760) 2024-06-18 10:51:29 -07:00
table_properties_collector_test.cc Add CompactForTieringCollector to support automatically trigger compaction for tiering use case (#12760) 2024-06-18 10:51:29 -07:00
transaction_log_impl.cc Rename, deprecate LogFile and VectorLogPtr (#12695) 2024-05-28 09:24:49 -07:00
transaction_log_impl.h Rename, deprecate LogFile and VectorLogPtr (#12695) 2024-05-28 09:24:49 -07:00
trim_history_scheduler.cc
trim_history_scheduler.h
version_builder.cc Fix blob files not reclaimed after deleting all SSTs (#12235) 2024-01-16 11:15:23 -08:00
version_builder.h
version_builder_test.cc Fix blob files not reclaimed after deleting all SSTs (#12235) 2024-01-16 11:15:23 -08:00
version_edit.cc Support ingesting SST files generated by a live DB (#12750) 2024-07-19 16:14:54 -07:00
version_edit.h Remove unnecessary, confusing 'extern' (#12300) 2024-01-29 10:38:08 -08:00
version_edit_handler.cc Implement obsolete file deletion (GC) in follower (#12657) 2024-05-17 19:13:33 -07:00
version_edit_handler.h Fix version edit dump in json (#12703) 2024-05-28 16:44:25 -07:00
version_edit_test.cc internal_repo_rocksdb (4372117296613874540) (#12117) 2023-12-04 11:17:32 -08:00
version_set.cc Fix race between manifest error recovery and file ingestion (#12871) 2024-07-19 10:37:51 -07:00
version_set.h Support pro-actively erasing obsolete block cache entries (#12694) 2024-06-07 08:57:11 -07:00
version_set_sync_and_async.h
version_set_test.cc Implement obsolete file deletion (GC) in follower (#12657) 2024-05-17 19:13:33 -07:00
version_util.h Group SST write in flush, compaction and db open with new stats (#11910) 2023-12-29 15:29:23 -08:00
wal_edit.cc
wal_edit.h
wal_edit_test.cc
wal_manager.cc Avoid unnecessary work in internal calls to GetSortedWalFiles (#12831) 2024-07-01 23:29:02 -07:00
wal_manager.h Avoid unnecessary work in internal calls to GetSortedWalFiles (#12831) 2024-07-01 23:29:02 -07:00
wal_manager_test.cc Group SST write in flush, compaction and db open with new stats (#11910) 2023-12-29 15:29:23 -08:00
write_batch.cc Fail DeleteRange() early when row_cache is configured (#12710) 2024-05-29 15:03:15 -07:00
write_batch_base.cc
write_batch_internal.h Add initial support for TimedPut API (#12419) 2024-03-14 15:44:55 -07:00
write_batch_test.cc Replace ScopedArenaIterator with ScopedArenaPtr<InternalIterator> (#12470) 2024-03-22 13:40:42 -07:00
write_callback.h
write_callback_test.cc Add public API WriteWithCallback to support custom callbacks (#12603) 2024-05-31 19:30:19 -07:00
write_controller.cc
write_controller.h
write_controller_test.cc
write_stall_stats.cc
write_stall_stats.h Remove unnecessary, confusing 'extern' (#12300) 2024-01-29 10:38:08 -08:00
write_thread.cc Add public API WriteWithCallback to support custom callbacks (#12603) 2024-05-31 19:30:19 -07:00
write_thread.h Add public API WriteWithCallback to support custom callbacks (#12603) 2024-05-31 19:30:19 -07:00