rocksdb/db
Yu Zhang 509947ce2c Quarantine files in a limbo state after a manifest error (#12030)
Summary:
Part of the procedure for handling a manifest IO error is to disable file deletion so that files in a limbo state do not get deleted prematurely. This is not ideal because: 1) not every VersionEdit whose commit encounters such an error contains file updates, so disabling file deletion is sometimes unnecessary; 2) `EnableFileDeletion` has a force mode that could let other threads accidentally disrupt this procedure during recovery; 3) disabling file deletion as a whole is less efficient than precisely tracking the impacted files and protecting them from premature deletion. This PR replaces this mechanism with tracking such files in `ErrorHandler` and quarantining them from deletion.

These are the types of files being actively tracked in quarantine in this PR:
1) new table files and blob files from a background job
2) the old manifest file, when the creation of the CURRENT file for the new manifest file that immediately follows it ends in an unclear state. The current handling is not sufficient to ensure the old manifest file is kept in case it is needed.

Note that WAL files are not part of the quarantine because `min_log_number_to_keep` is a safe mechanism: it is only updated after successful manifest commits, so it already prevents this premature deletion issue.

We track these files' file numbers because they share the same file number space.
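
The quarantine idea described above can be sketched as a set of file numbers that the deletion path consults before removing anything. This is a minimal illustration under stated assumptions, not the code in this PR: `FileQuarantine`, `FilterDeletable`, and `QuarantineExample` are hypothetical names, and the real tracking lives in `ErrorHandler` with its own interface and locking.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical illustration only; not RocksDB's actual ErrorHandler API.
// Table, blob, and manifest files share one file number space, so a single
// set of file numbers is enough to identify every quarantined file.
class FileQuarantine {
 public:
  // Record files whose manifest commit ended in an unclear state.
  void Add(const std::vector<uint64_t>& file_numbers) {
    quarantined_.insert(file_numbers.begin(), file_numbers.end());
  }

  // Drop all entries, e.g. once recovery succeeds and the files' fate is known.
  void Clear() { quarantined_.clear(); }

  bool Contains(uint64_t file_number) const {
    return quarantined_.count(file_number) > 0;
  }

 private:
  std::unordered_set<uint64_t> quarantined_;
};

// Return only the deletion candidates that are not quarantined,
// preserving the order of the input.
std::vector<uint64_t> FilterDeletable(const FileQuarantine& quarantine,
                                      const std::vector<uint64_t>& candidates) {
  std::vector<uint64_t> deletable;
  for (uint64_t number : candidates) {
    if (!quarantine.Contains(number)) {
      deletable.push_back(number);
    }
  }
  return deletable;
}

// Small self-check showing the intended behavior.
inline void QuarantineExample() {
  FileQuarantine quarantine;
  quarantine.Add({12, 15});  // files left in limbo by a failed commit

  const std::vector<uint64_t> candidates = {10, 12, 15, 20};
  const std::vector<uint64_t> expected = {10, 20};
  assert(FilterDeletable(quarantine, candidates) == expected);

  quarantine.Clear();  // after a successful recovery
  assert(FilterDeletable(quarantine, candidates) == candidates);
}
```

Unlike disabling file deletion globally, a filter of this kind leaves unrelated obsolete files free to be purged while the limbo files stay on disk until recovery resolves their state.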

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12030

Test Plan: Modified existing unit tests

Reviewed By: ajkr

Differential Revision: D51036774

Pulled By: jowlyzhang

fbshipit-source-id: 84ef26271fbbc888ef70da5c40fe843bd7038716
2023-11-11 08:11:11 -08:00
blob Enforce status checking after Valid() returns false for IteratorWrapper (#11975) 2023-10-18 09:38:38 -07:00
compaction Quarantine files in a limbo state after a manifest error (#12030) 2023-11-11 08:11:11 -08:00
db_impl Quarantine files in a limbo state after a manifest error (#12030) 2023-11-11 08:11:11 -08:00
wide AttributeGroups - PutEntity Implementation (#11977) 2023-11-06 16:52:51 -08:00
arena_wrapped_db_iter.cc Invalidate threadlocal SV before incrementing super_version_number_ (#11848) 2023-09-18 09:37:40 -07:00
arena_wrapped_db_iter.h Add new Iterator API Refresh(const snapshot*) (#10594) 2023-09-15 10:44:43 -07:00
builder.cc Refactor, clean up, fixes, and more testing for SeqnoToTimeMapping (#11905) 2023-09-29 11:21:59 -07:00
builder.h Group rocksdb.sst.read.micros stat by IOActivity flush and compaction (#11288) 2023-04-21 09:07:18 -07:00
c.cc Add rocksdb_options_set_cf_paths (#11151) 2023-11-10 11:36:11 -08:00
c_test.c Add rocksdb_options_set_cf_paths (#11151) 2023-11-10 11:36:11 -08:00
column_family.cc Clean up WriteBatchWithIndexInternal a bit (#11930) 2023-10-09 15:25:35 -07:00
column_family.h Clean up WriteBatchWithIndexInternal a bit (#11930) 2023-10-09 15:25:35 -07:00
column_family_test.cc Fix race in options taking effect (#11929) 2023-10-12 10:05:23 -07:00
compact_files_test.cc Add missing status check when compiling with ASSERT_STATUS_CHECKED=1 (#11686) 2023-08-09 15:46:44 -07:00
comparator_db_test.cc Enforce status checking after Valid() returns false for IteratorWrapper (#11975) 2023-10-18 09:38:38 -07:00
convenience.cc Group rocksdb.sst.read.micros stat by different user read IOActivity + misc (#11444) 2023-08-08 17:26:50 -07:00
convenience_impl.h Group rocksdb.sst.read.micros stat by different user read IOActivity + misc (#11444) 2023-08-08 17:26:50 -07:00
corruption_test.cc Make option level_compaction_dynamic_level_bytes true by default (#11525) 2023-06-15 21:12:39 -07:00
cuckoo_table_db_test.cc Make option level_compaction_dynamic_level_bytes true by default (#11525) 2023-06-15 21:12:39 -07:00
db_basic_test.cc Quarantine files in a limbo state after a manifest error (#12030) 2023-11-11 08:11:11 -08:00
db_block_cache_test.cc Enforce status checking after Valid() returns false for IteratorWrapper (#11975) 2023-10-18 09:38:38 -07:00
db_bloom_filter_test.cc Enforce status checking after Valid() returns false for IteratorWrapper (#11975) 2023-10-18 09:38:38 -07:00
db_clip_test.cc Support Clip DB to KeyRange (#11379) 2023-05-18 13:25:01 -07:00
db_compaction_filter_test.cc Enforce status checking after Valid() returns false for IteratorWrapper (#11975) 2023-10-18 09:38:38 -07:00
db_compaction_test.cc Mark more files for periodic compaction during offpeak (#12031) 2023-11-06 11:43:59 -08:00
db_dynamic_level_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
db_encryption_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
db_filesnapshot.cc Remove the default force behavior for EnableFileDeletion API (#12001) 2023-11-10 14:35:54 -08:00
db_flush_test.cc Enforce status checking after Valid() returns false for IteratorWrapper (#11975) 2023-10-18 09:38:38 -07:00
db_info_dumper.cc Log host name (#11776) 2023-08-31 08:39:09 -07:00
db_info_dumper.h
db_inplace_update_test.cc
db_io_failure_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
db_iter.cc Stubs for piping write time (#12043) 2023-11-09 15:58:07 -08:00
db_iter.h Fix various failures in auto_readahead_size (#11884) 2023-10-02 17:47:24 -07:00
db_iter_stress_test.cc
db_iter_test.cc Enforce status checking after Valid() returns false for IteratorWrapper (#11975) 2023-10-18 09:38:38 -07:00
db_iterator_test.cc Enforce status checking after Valid() returns false for IteratorWrapper (#11975) 2023-10-18 09:38:38 -07:00
db_kv_checksum_test.cc
db_log_iter_test.cc Remove the default force behavior for EnableFileDeletion API (#12001) 2023-11-10 14:35:54 -08:00
db_logical_block_size_cache_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
db_memtable_test.cc
db_merge_operand_test.cc Make option level_compaction_dynamic_level_bytes true by default (#11525) 2023-06-15 21:12:39 -07:00
db_merge_operator_test.cc Fix the handling of wide-column base values in the max_successive_merges logic (#11913) 2023-10-02 16:25:25 -07:00
db_options_test.cc Mark more files for periodic compaction during offpeak (#12031) 2023-11-06 11:43:59 -08:00
db_properties_test.cc Remove the default force behavior for EnableFileDeletion API (#12001) 2023-11-10 14:35:54 -08:00
db_range_del_test.cc Enforce status checking after Valid() returns false for IteratorWrapper (#11975) 2023-10-18 09:38:38 -07:00
db_rate_limiter_test.cc Enforce status checking after Valid() returns false for IteratorWrapper (#11975) 2023-10-18 09:38:38 -07:00
db_readonly_with_timestamp_test.cc Enforce status checking after Valid() returns false for IteratorWrapper (#11975) 2023-10-18 09:38:38 -07:00
db_secondary_test.cc Enforce status checking after Valid() returns false for IteratorWrapper (#11975) 2023-10-18 09:38:38 -07:00
db_sst_test.cc Rate limiting stale sst files' deletion during recovery (#12016) 2023-10-28 09:50:52 -07:00
db_statistics_test.cc Add missing status check when compiling with ASSERT_STATUS_CHECKED=1 (#11686) 2023-08-09 15:46:44 -07:00
db_table_properties_test.cc Make option level_compaction_dynamic_level_bytes true by default (#11525) 2023-06-15 21:12:39 -07:00
db_tailing_iter_test.cc Enforce status checking after Valid() returns false for IteratorWrapper (#11975) 2023-10-18 09:38:38 -07:00
db_test.cc Enforce status checking after Valid() returns false for IteratorWrapper (#11975) 2023-10-18 09:38:38 -07:00
db_test2.cc Remove the default force behavior for EnableFileDeletion API (#12001) 2023-11-10 14:35:54 -08:00
db_test_util.cc Enforce status checking after Valid() returns false for IteratorWrapper (#11975) 2023-10-18 09:38:38 -07:00
db_test_util.h test: WritableFile derived class: add missing GetFileSize() override (#11726) 2023-09-29 15:58:08 -07:00
db_universal_compaction_test.cc Mark more files for periodic compaction during offpeak (#12031) 2023-11-06 11:43:59 -08:00
db_wal_test.cc Quarantine files in a limbo state after a manifest error (#12030) 2023-11-11 08:11:11 -08:00
db_with_timestamp_basic_test.cc Enforce status checking after Valid() returns false for IteratorWrapper (#11975) 2023-10-18 09:38:38 -07:00
db_with_timestamp_compaction_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
db_with_timestamp_test_util.cc
db_with_timestamp_test_util.h
db_write_buffer_manager_test.cc Add missing status check when compiling with ASSERT_STATUS_CHECKED=1 (#11686) 2023-08-09 15:46:44 -07:00
db_write_test.cc Add missing status check when compiling with ASSERT_STATUS_CHECKED=1 (#11686) 2023-08-09 15:46:44 -07:00
dbformat.cc Add documentation to some formatting util functions (#11674) 2023-08-14 22:04:18 -07:00
dbformat.h Add documentation to some formatting util functions (#11674) 2023-08-14 22:04:18 -07:00
dbformat_test.cc Logically strip timestamp during flush (#11557) 2023-06-29 15:50:50 -07:00
deletefile_test.cc Enforce status checking after Valid() returns false for IteratorWrapper (#11975) 2023-10-18 09:38:38 -07:00
error_handler.cc Quarantine files in a limbo state after a manifest error (#12030) 2023-11-11 08:11:11 -08:00
error_handler.h Quarantine files in a limbo state after a manifest error (#12030) 2023-11-11 08:11:11 -08:00
error_handler_fs_test.cc Quarantine files in a limbo state after a manifest error (#12030) 2023-11-11 08:11:11 -08:00
event_helpers.cc Fix for RecoverFromRetryableBGIOError starting with recovery_in_prog_ false (#11991) 2023-10-31 16:13:36 -07:00
event_helpers.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
experimental.cc Record the persist_user_defined_timestamps flag in manifest (#11515) 2023-06-21 21:49:01 -07:00
external_sst_file_basic_test.cc Add missing status check in ExternalSstFileIngestionJob and ImportColumnFamilyJob (#12042) 2023-11-06 07:41:36 -08:00
external_sst_file_ingestion_job.cc Add missing status check in ExternalSstFileIngestionJob and ImportColumnFamilyJob (#12042) 2023-11-06 07:41:36 -08:00
external_sst_file_ingestion_job.h
external_sst_file_test.cc Enforce status checking after Valid() returns false for IteratorWrapper (#11975) 2023-10-18 09:38:38 -07:00
fault_injection_test.cc Add missing status check when compiling with ASSERT_STATUS_CHECKED=1 (#11686) 2023-08-09 15:46:44 -07:00
file_indexer.cc Simplify conditional judgment (#11580) 2023-07-03 09:41:48 -07:00
file_indexer.h
file_indexer_test.cc
filename_test.cc
flush_job.cc Refactor, clean up, fixes, and more testing for SeqnoToTimeMapping (#11905) 2023-09-29 11:21:59 -07:00
flush_job.h Refactor, clean up, fixes, and more testing for SeqnoToTimeMapping (#11905) 2023-09-29 11:21:59 -07:00
flush_job_test.cc Quarantine files in a limbo state after a manifest error (#12030) 2023-11-11 08:11:11 -08:00
flush_scheduler.cc
flush_scheduler.h
forward_iterator.cc Add an interface to provide support for underlying FS to pass their own buffer during reads (#11324) 2023-06-23 11:48:49 -07:00
forward_iterator.h Ignore async_io ReadOption if FileSystem doesn't support it (#11296) 2023-03-17 14:57:09 -07:00
forward_iterator_bench.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
history_trimming_iterator.h Refactor AddRangeDels() + consider range tombstone during compaction file cutting (#11113) 2023-02-22 12:28:18 -08:00
import_column_family_job.cc Mark more files for periodic compaction during offpeak (#12031) 2023-11-06 11:43:59 -08:00
import_column_family_job.h Support to create a CF by importing multiple non-overlapping CFs (#11378) 2023-06-15 12:25:04 -07:00
import_column_family_test.cc Support to create a CF by importing multiple non-overlapping CFs (#11378) 2023-06-15 12:25:04 -07:00
internal_stats.cc add property "rocksdb.obsolete-sst-files-size" (#11533) 2023-06-13 15:52:45 -07:00
internal_stats.h add property "rocksdb.obsolete-sst-files-size" (#11533) 2023-06-13 15:52:45 -07:00
job_context.h Quarantine files in a limbo state after a manifest error (#12030) 2023-11-11 08:11:11 -08:00
kv_checksum.h Block per key-value checksum (#11287) 2023-04-25 12:08:23 -07:00
listener_test.cc Add missing status check when compiling with ASSERT_STATUS_CHECKED=1 (#11686) 2023-08-09 15:46:44 -07:00
log_format.h Add support in log writer and reader for a user-defined timestamp size record (#11433) 2023-05-11 17:26:19 -07:00
log_reader.cc Fix dead loop with kSkipAnyCorruptedRecords mode selected in some cases (#11955) (#11979) 2023-10-25 09:16:24 -07:00
log_reader.h switch to use RocksDB UnorderedMap (#11507) 2023-06-05 13:36:26 -07:00
log_test.cc switch to use RocksDB UnorderedMap (#11507) 2023-06-05 13:36:26 -07:00
log_writer.cc switch to use RocksDB UnorderedMap (#11507) 2023-06-05 13:36:26 -07:00
log_writer.h switch to use RocksDB UnorderedMap (#11507) 2023-06-05 13:36:26 -07:00
logs_with_prep_tracker.cc
logs_with_prep_tracker.h
lookup_key.h
malloc_stats.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
malloc_stats.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
manual_compaction_test.cc Enforce status checking after Valid() returns false for IteratorWrapper (#11975) 2023-10-18 09:38:38 -07:00
memtable.cc Integrate FullMergeV3 into the query and compaction paths (#11858) 2023-09-19 17:27:04 -07:00
memtable.h Add an option to trigger flush when the number of range deletions reach a threshold (#11358) 2023-08-02 19:58:56 -07:00
memtable_list.cc Fix a bug with atomic_flush that causes DB to stuck after a flush failure (#11872) 2023-09-22 16:43:50 -07:00
memtable_list.h Give retry flushes their own functions (#11903) 2023-10-02 16:26:24 -07:00
memtable_list_test.cc Quarantine files in a limbo state after a manifest error (#12030) 2023-11-11 08:11:11 -08:00
merge_context.h
merge_helper.cc Fix the handling of wide-column base values in the max_successive_merges logic (#11913) 2023-10-02 16:25:25 -07:00
merge_helper.h Fix the handling of wide-column base values in the max_successive_merges logic (#11913) 2023-10-02 16:25:25 -07:00
merge_helper_test.cc
merge_operator.cc Add helper methods WideColumnsHelper::{Has,Get}DefaultColumn (#11813) 2023-09-11 16:32:32 -07:00
merge_test.cc Introduce a wide column aware MergeOperator API (#11807) 2023-09-11 12:13:58 -07:00
obsolete_files_test.cc Remove the default force behavior for EnableFileDeletion API (#12001) 2023-11-10 14:35:54 -08:00
options_file_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
output_validator.cc
output_validator.h
perf_context_test.cc Enforce status checking after Valid() returns false for IteratorWrapper (#11975) 2023-10-18 09:38:38 -07:00
periodic_task_scheduler.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
periodic_task_scheduler.h Improve efficiency of create_missing_column_families, light refactor (#11920) 2023-10-04 14:14:22 -07:00
periodic_task_scheduler_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
pinned_iterators_manager.h
plain_table_db_test.cc Enforce status checking after Valid() returns false for IteratorWrapper (#11975) 2023-10-18 09:38:38 -07:00
post_memtable_callback.h
pre_release_callback.h
prefix_test.cc Enforce status checking after Valid() returns false for IteratorWrapper (#11975) 2023-10-18 09:38:38 -07:00
range_del_aggregator.cc Improve documentation for MergingIterator (#11161) 2023-03-03 12:17:30 -08:00
range_del_aggregator.h Add new Iterator API Refresh(const snapshot*) (#10594) 2023-09-15 10:44:43 -07:00
range_del_aggregator_bench.cc
range_del_aggregator_test.cc Refactor AddRangeDels() + consider range tombstone during compaction file cutting (#11113) 2023-02-22 12:28:18 -08:00
range_tombstone_fragmenter.cc
range_tombstone_fragmenter.h Add new Iterator API Refresh(const snapshot*) (#10594) 2023-09-15 10:44:43 -07:00
range_tombstone_fragmenter_test.cc
read_callback.h
repair.cc Quarantine files in a limbo state after a manifest error (#12030) 2023-11-11 08:11:11 -08:00
repair_test.cc Respect cutoff timestamp during flush (#11599) 2023-07-26 16:25:06 -07:00
seqno_time_test.cc Use manifest to persist pre-allocated seqnos (#11995) 2023-10-23 09:20:59 -07:00
seqno_to_time_mapping.cc Bootstrap, pre-populate seqno_to_time_mapping (#11922) 2023-10-06 08:21:21 -07:00
seqno_to_time_mapping.h Bootstrap, pre-populate seqno_to_time_mapping (#11922) 2023-10-06 08:21:21 -07:00
snapshot_checker.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
snapshot_impl.cc
snapshot_impl.h
table_cache.cc Fix row cache falsely return kNotFound when timestamp enabled (#11816) 2023-09-20 11:34:38 -07:00
table_cache.h Fix row cache falsely return kNotFound when timestamp enabled (#11816) 2023-09-20 11:34:38 -07:00
table_cache_sync_and_async.h Fix StopWatch bug; Remove setting record_read_stats (#11474) 2023-05-25 10:16:58 -07:00
table_properties_collector.cc
table_properties_collector.h
table_properties_collector_test.cc Group rocksdb.sst.read.micros stat by IOActivity flush and compaction (#11288) 2023-04-21 09:07:18 -07:00
transaction_log_impl.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
transaction_log_impl.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
trim_history_scheduler.cc
trim_history_scheduler.h
version_builder.cc Add unit test for default temperature (#11722) 2023-08-21 12:14:03 -07:00
version_builder.h Block per key-value checksum (#11287) 2023-04-25 12:08:23 -07:00
version_builder_test.cc Mark more files for periodic compaction during offpeak (#12031) 2023-11-06 11:43:59 -08:00
version_edit.cc Support switching on / off UDT together with in-Memtable-only feature (#11623) 2023-07-26 20:16:32 -07:00
version_edit.h Quarantine files in a limbo state after a manifest error (#12030) 2023-11-11 08:11:11 -08:00
version_edit_handler.cc Remove VersionEdit's friends pattern (#12024) 2023-11-01 12:04:11 -07:00
version_edit_handler.h Remove VersionEdit's friends pattern (#12024) 2023-11-01 12:04:11 -07:00
version_edit_test.cc Support switching on / off UDT together with in-Memtable-only feature (#11623) 2023-07-26 20:16:32 -07:00
version_set.cc Quarantine files in a limbo state after a manifest error (#12030) 2023-11-11 08:11:11 -08:00
version_set.h Quarantine files in a limbo state after a manifest error (#12030) 2023-11-11 08:11:11 -08:00
version_set_sync_and_async.h Add helper methods WideColumnsHelper::{Has,Get}DefaultColumn (#11813) 2023-09-11 16:32:32 -07:00
version_set_test.cc Quarantine files in a limbo state after a manifest error (#12030) 2023-11-11 08:11:11 -08:00
version_util.h Quarantine files in a limbo state after a manifest error (#12030) 2023-11-11 08:11:11 -08:00
wal_edit.cc
wal_edit.h
wal_edit_test.cc
wal_manager.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
wal_manager.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
wal_manager_test.cc Quarantine files in a limbo state after a manifest error (#12030) 2023-11-11 08:11:11 -08:00
write_batch.cc AttributeGroups - PutEntity Implementation (#11977) 2023-11-06 16:52:51 -08:00
write_batch_base.cc
write_batch_internal.h Set default cf ts sz for a reused transaction (#11685) 2023-08-09 13:49:42 -07:00
write_batch_test.cc Stubs for piping write time (#12043) 2023-11-09 15:58:07 -08:00
write_callback.h
write_callback_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
write_controller.cc
write_controller.h
write_controller_test.cc
write_stall_stats.cc Fix initialization-order-fiasco in write_stall_stats.cc (#11355) 2023-04-05 14:42:31 -07:00
write_stall_stats.h Fix initialization-order-fiasco in write_stall_stats.cc (#11355) 2023-04-05 14:42:31 -07:00
write_thread.cc Ensure LockWAL() stall cleared for UnlockWAL() return (#11172) 2023-02-03 12:08:37 -08:00
write_thread.h Ensure LockWAL() stall cleared for UnlockWAL() return (#11172) 2023-02-03 12:08:37 -08:00