rocksdb/db
Cheng Chang efe827baf0 Always track WAL obsoletion (#7759)
Summary:
Currently, when a WAL becomes obsolete after flushing, if VersionSet::WalSet does not contain the WAL, we do not track the WAL obsoletion event in MANIFEST.

But consider this case:
* WAL 10 is synced, and LogAndApply is called with a VersionEdit to record this WAL addition event in MANIFEST, but the VersionEdit has not been applied to WalSet yet because its corresponding ManifestWriter is still pending in the write queue;
* Because that ManifestWriter is blocked, its LogAndApply call waits on a condition variable and releases the db mutex, so another LogAndApply can proceed to enqueue other VersionEdits concurrently;
* Now a flush happens and WAL 10 becomes obsolete. Although WalSet does not contain WAL 10 yet, we should still call LogAndApply to enqueue a VersionEdit indicating the obsoletion of WAL 10;
* Otherwise, once the queued edit indicating the addition of WAL 10 is logged to MANIFEST, if the DB crashes and reopens, WAL 10 might already have been removed from disk while still being listed in MANIFEST.

This PR changes the behavior to: always `LogAndApply` any WAL addition or obsoletion event, regardless of the ordering issues caused by concurrency, but when applying the edits to `WalSet`, do not add WALs that are already obsolete. With this approach, the logical events of WAL addition and obsoletion are always tracked in MANIFEST, so we can inspect the MANIFEST and see all previous WAL events. Some of those events are deliberately ignored when they are applied, because of concurrency issues such as the case above or the case in https://github.com/facebook/rocksdb/pull/7725.
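
The rule above (log every event, but skip additions of WALs that are already obsolete when applying) can be sketched roughly as follows. This is a simplified illustration only, not the code in this PR: `WalEvent`, `ToyWalSet`, `Replay`, and `CheckWal10Scenario` are hypothetical names, the MANIFEST is modeled as a plain vector of events, and obsolete WALs are remembered in a set, which is an assumption about one possible bookkeeping scheme rather than a description of what `WalSet` does.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical, simplified model of the idea. None of these names are
// RocksDB's actual types or methods.
enum class WalEventType { kAddition, kObsoletion };

struct WalEvent {
  WalEventType type;
  uint64_t wal_number;
};

class ToyWalSet {
 public:
  // Applies one event that has already been recorded in the toy MANIFEST.
  void Apply(const WalEvent& event) {
    if (event.type == WalEventType::kObsoletion) {
      // Remember the obsoletion even if the WAL was never added here.
      obsolete_.insert(event.wal_number);
      live_.erase(event.wal_number);
    } else if (obsolete_.count(event.wal_number) == 0) {
      live_.insert(event.wal_number);
    }
    // Otherwise: an addition for an already obsolete WAL is ignored.
  }

  bool Contains(uint64_t wal_number) const {
    return live_.count(wal_number) > 0;
  }

 private:
  std::set<uint64_t> live_;      // WALs currently tracked
  std::set<uint64_t> obsolete_;  // WALs known to be obsolete
};

// Rebuilds the set by replaying every recorded event in order,
// similar in spirit to recovering from MANIFEST on reopen.
inline ToyWalSet Replay(const std::vector<WalEvent>& manifest) {
  ToyWalSet wal_set;
  for (const WalEvent& event : manifest) {
    wal_set.Apply(event);
  }
  return wal_set;
}

// Checks that WAL 10 is not tracked after replay, whichever order the
// addition and obsoletion events were recorded in.
inline void CheckWal10Scenario() {
  const std::vector<WalEvent> addition_first = {
      {WalEventType::kAddition, 10}, {WalEventType::kObsoletion, 10}};
  const std::vector<WalEvent> obsoletion_first = {
      {WalEventType::kObsoletion, 10}, {WalEventType::kAddition, 10}};
  assert(!Replay(addition_first).Contains(10));
  assert(!Replay(obsoletion_first).Contains(10));
}
```

In this toy model both events stay in the recorded history, so the full sequence of WAL events remains inspectable, while the replayed set never ends up tracking a WAL that has already been marked obsolete.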

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7759

Test Plan: make check

Reviewed By: pdillinger

Differential Revision: D25423089

Pulled By: cheng-chang

fbshipit-source-id: 9cb9a7fbc1875bf954f2a42f9b6cfd6d49a7b21c
2020-12-09 16:02:12 -08:00
blob Add blob support to DBIter (#7731) 2020-12-04 21:29:38 -08:00
compaction Change ErrorHandler methods to return const Status& (#7539) 2020-12-07 20:11:35 -08:00
db_impl Always track WAL obsoletion (#7759) 2020-12-09 16:02:12 -08:00
arena_wrapped_db_iter.cc Add blob support to DBIter (#7731) 2020-12-04 21:29:38 -08:00
arena_wrapped_db_iter.h Add blob support to DBIter (#7731) 2020-12-04 21:29:38 -08:00
builder.cc Add full_history_ts_low_ to FlushJob (#7655) 2020-11-12 18:44:34 -08:00
builder.h Add full_history_ts_low_ to FlushJob (#7655) 2020-11-12 18:44:34 -08:00
c.cc Remove unused includes (#7604) 2020-10-28 23:22:27 -07:00
c_test.c Add getters to the C API for env, universal compaction options and fifo compaction options (#7501) 2020-10-16 11:04:01 -07:00
column_family.cc Write min_log_number_to_keep to MANIFEST during atomic flush under 2 phase commit (#7570) 2020-12-03 19:22:24 -08:00
column_family.h Add full_history_ts_low to column family (#7740) 2020-12-05 14:18:22 -08:00
column_family_test.cc Integrated blob garbage collection: relocate blobs (#7694) 2020-11-23 21:08:22 -08:00
compact_files_test.cc Replace reinterpret_cast with static_cast_with_check (#7067) 2020-07-02 19:25:41 -07:00
compacted_db_impl.cc Periodically flush info log out of application buffer (#7488) 2020-10-01 19:14:14 -07:00
compacted_db_impl.h
comparator_db_test.cc Remove unused includes (#7604) 2020-10-28 23:22:27 -07:00
convenience.cc
corruption_test.cc Port corruption test to use custom env (#7699) 2020-11-20 18:40:24 -08:00
cuckoo_table_db_test.cc Fix a recovery corner case (#7621) 2020-11-07 22:23:27 -08:00
db_basic_test.cc Do not track obsolete WALs in MANIFEST even if they are synced (#7725) 2020-12-08 10:58:04 -08:00
db_block_cache_test.cc Expand effect of dictionary settings in `ColumnFamilyOptions::compression_opts` (#7619) 2020-11-02 19:21:11 -08:00
db_bloom_filter_test.cc Experimental (production candidate) SST schema for Ribbon filter (#7658) 2020-11-12 20:46:14 -08:00
db_compaction_filter_test.cc In ParseInternalKey(), include corrupt key info in Status (#7515) 2020-10-28 10:12:58 -07:00
db_compaction_test.cc Integrated blob garbage collection: relocate blobs (#7694) 2020-11-23 21:08:22 -08:00
db_dynamic_level_test.cc Add a host location property to TableProperties (#7479) 2020-10-19 11:38:48 -07:00
db_encryption_test.cc Fix many tests to run with MEM_ENV and ENCRYPTED_ENV; Introduce a MemoryFileSystem class (#7566) 2020-10-27 10:33:09 -07:00
db_filesnapshot.cc Fix checkpoint file deletion race with avoid_unnecessary_blocking_io (#7369) 2020-09-10 22:35:25 -07:00
db_flush_test.cc Write min_log_number_to_keep to MANIFEST during atomic flush under 2 phase commit (#7570) 2020-12-03 19:22:24 -08:00
db_info_dumper.cc Make FileType Public and Replace kLogFile with kWalFile (#7580) 2020-10-22 17:06:20 -07:00
db_info_dumper.h Add a DB Session ID (#6959) 2020-06-15 10:47:02 -07:00
db_inplace_update_test.cc Whole DBTest to skip fsync (#7274) 2020-08-17 18:42:25 -07:00
db_io_failure_test.cc Whole DBTest to skip fsync (#7274) 2020-08-17 18:42:25 -07:00
db_iter.cc Add blob support to DBIter (#7731) 2020-12-04 21:29:38 -08:00
db_iter.h Add blob support to DBIter (#7731) 2020-12-04 21:29:38 -08:00
db_iter_stress_test.cc Add blob support to DBIter (#7731) 2020-12-04 21:29:38 -08:00
db_iter_test.cc Add blob support to DBIter (#7731) 2020-12-04 21:29:38 -08:00
db_iterator_test.cc Add further tests to ASSERT_STATUS_CHECKED (1) (#7679) 2020-12-08 15:55:04 -08:00
db_log_iter_test.cc Whole DBTest to skip fsync (#7274) 2020-08-17 18:42:25 -07:00
db_logical_block_size_cache_test.cc
db_memtable_test.cc Add further tests to ASSERT_STATUS_CHECKED (1) (#7679) 2020-12-08 15:55:04 -08:00
db_merge_operand_test.cc Add further tests to ASSERT_STATUS_CHECKED (1) (#7679) 2020-12-08 15:55:04 -08:00
db_merge_operator_test.cc Add further tests to ASSERT_STATUS_CHECKED (1) (#7679) 2020-12-08 15:55:04 -08:00
db_options_test.cc Always apply bottommost_compression_opts when enabled (#7633) 2020-11-11 20:32:28 -08:00
db_properties_test.cc Fix many tests to run with MEM_ENV and ENCRYPTED_ENV; Introduce a MemoryFileSystem class (#7566) 2020-10-27 10:33:09 -07:00
db_range_del_test.cc Fix a recovery corner case (#7621) 2020-11-07 22:23:27 -08:00
db_sst_test.cc Fix many tests to run with MEM_ENV and ENCRYPTED_ENV; Introduce a MemoryFileSystem class (#7566) 2020-10-27 10:33:09 -07:00
db_statistics_test.cc Add a new stats level to exclude tickers (#7329) 2020-09-04 23:25:03 -07:00
db_table_properties_test.cc DBTablePropertiesTest often times out in internal test infra (#7639) 2020-11-06 14:25:14 -08:00
db_tailing_iter_test.cc Whole DBTest to skip fsync (#7274) 2020-08-17 18:42:25 -07:00
db_test.cc Fix many tests to run with MEM_ENV and ENCRYPTED_ENV; Introduce a MemoryFileSystem class (#7566) 2020-10-27 10:33:09 -07:00
db_test2.cc Fix unit test failure ppc64le in travis (#7752) 2020-12-07 10:24:33 -08:00
db_test_util.cc Add further tests to ASSERT_STATUS_CHECKED (1) (#7679) 2020-12-08 15:55:04 -08:00
db_test_util.h Fix many tests to run with MEM_ENV and ENCRYPTED_ENV; Introduce a MemoryFileSystem class (#7566) 2020-10-27 10:33:09 -07:00
db_universal_compaction_test.cc Fix many tests to run with MEM_ENV and ENCRYPTED_ENV; Introduce a MemoryFileSystem class (#7566) 2020-10-27 10:33:09 -07:00
db_wal_test.cc Fix assertion failure in bg flush (#7362) 2020-12-02 09:31:14 -08:00
db_with_timestamp_basic_test.cc Make CompactRange and GetApproximateSizes work with timestamp (#7684) 2020-12-02 13:00:53 -08:00
db_with_timestamp_compaction_test.cc Whole DBTest to skip fsync (#7274) 2020-08-17 18:42:25 -07:00
db_write_test.cc Fix many tests to run with MEM_ENV and ENCRYPTED_ENV; Introduce a MemoryFileSystem class (#7566) 2020-10-27 10:33:09 -07:00
dbformat.cc Make CompactRange and GetApproximateSizes work with timestamp (#7684) 2020-12-02 13:00:53 -08:00
dbformat.h Make CompactRange and GetApproximateSizes work with timestamp (#7684) 2020-12-02 13:00:53 -08:00
dbformat_test.cc Remove unused includes (#7604) 2020-10-28 23:22:27 -07:00
deletefile_test.cc Make FileType Public and Replace kLogFile with kWalFile (#7580) 2020-10-22 17:06:20 -07:00
error_handler.cc Change ErrorHandler methods to return const Status& (#7539) 2020-12-07 20:11:35 -08:00
error_handler.h Change ErrorHandler methods to return const Status& (#7539) 2020-12-07 20:11:35 -08:00
error_handler_fs_test.cc Add kManifestWriteNoWAL to BackgroundErrorReason to handle Flush IO Error when WAL is disabled (#7693) 2020-12-02 18:24:01 -08:00
event_helpers.cc Status check enforcement for error_handler_fs_test (#7342) 2020-10-02 16:41:13 -07:00
event_helpers.h Pass SST file checksum information through OnTableFileCreated (#7108) 2020-08-25 10:46:11 -07:00
experimental.cc
external_sst_file_basic_test.cc Fix many tests to run with MEM_ENV and ENCRYPTED_ENV; Introduce a MemoryFileSystem class (#7566) 2020-10-27 10:33:09 -07:00
external_sst_file_ingestion_job.cc Updated GenerateOneFileChecksum to use requested_checksum_func_name (#7586) 2020-10-28 16:47:12 -07:00
external_sst_file_ingestion_job.h Store FSSequentialFilePtr object in SequenceFileReader (#7190) 2020-08-18 16:20:54 -07:00
external_sst_file_test.cc Do not use ASSERT_OK in child threads in ExternalSstFileTest.PickedLevelBug (#7754) 2020-12-07 17:37:17 -08:00
fault_injection_test.cc Remove unused includes (#7604) 2020-10-28 23:22:27 -07:00
file_indexer.cc
file_indexer.h
file_indexer_test.cc
filename_test.cc Remove unused includes (#7604) 2020-10-28 23:22:27 -07:00
flush_job.cc Fix assertion failure in bg flush (#7362) 2020-12-02 09:31:14 -08:00
flush_job.h Fix assertion failure in bg flush (#7362) 2020-12-02 09:31:14 -08:00
flush_job_test.cc Write min_log_number_to_keep to MANIFEST during atomic flush under 2 phase commit (#7570) 2020-12-03 19:22:24 -08:00
flush_scheduler.cc
flush_scheduler.h
forward_iterator.cc Add further tests to ASSERT_STATUS_CHECKED (1) (#7679) 2020-12-08 15:55:04 -08:00
forward_iterator.h
forward_iterator_bench.cc
import_column_family_job.cc In ParseInternalKey(), include corrupt key info in Status (#7515) 2020-10-28 10:12:58 -07:00
import_column_family_job.h Store FSSequentialFilePtr object in SequenceFileReader (#7190) 2020-08-18 16:20:54 -07:00
import_column_family_test.cc Fix many tests to run with MEM_ENV and ENCRYPTED_ENV; Introduce a MemoryFileSystem class (#7566) 2020-10-27 10:33:09 -07:00
internal_stats.cc Handling misuse of snprintf return value (#7686) 2020-12-07 13:43:55 -08:00
internal_stats.h Introduce BlobFileCache and add support for blob files to Get() (#7540) 2020-10-15 13:04:47 -07:00
job_context.h
listener_test.cc Remove unused includes (#7604) 2020-10-28 23:22:27 -07:00
log_format.h
log_reader.cc Fix kPointInTimeRecovery handling of truncated WAL (#7701) 2020-11-30 18:11:38 -08:00
log_reader.h Real fix for race in backup custom checksum checking (#7309) 2020-08-26 10:39:20 -07:00
log_test.cc Fix kPointInTimeRecovery handling of truncated WAL (#7701) 2020-11-30 18:11:38 -08:00
log_writer.cc Fail recovery when MANIFEST record checksum mismatch (#6996) 2020-06-18 10:09:12 -07:00
log_writer.h
logs_with_prep_tracker.cc
logs_with_prep_tracker.h
lookup_key.h
malloc_stats.cc
malloc_stats.h
manual_compaction_test.cc
memtable.cc Exclude timestamp from prefix extractor (#7668) 2020-12-01 14:07:15 -08:00
memtable.h Return `Status` from `MemTable` mutation functions (#7656) 2020-11-23 16:29:04 -08:00
memtable_list.cc Always track WAL obsoletion (#7759) 2020-12-09 16:02:12 -08:00
memtable_list.h Write min_log_number_to_keep to MANIFEST during atomic flush under 2 phase commit (#7570) 2020-12-03 19:22:24 -08:00
memtable_list_test.cc Write min_log_number_to_keep to MANIFEST during atomic flush under 2 phase commit (#7570) 2020-12-03 19:22:24 -08:00
merge_context.h
merge_helper.cc In ParseInternalKey(), include corrupt key info in Status (#7515) 2020-10-28 10:12:58 -07:00
merge_helper.h In ParseInternalKey(), include corrupt key info in Status (#7515) 2020-10-28 10:12:58 -07:00
merge_helper_test.cc
merge_operator.cc
merge_test.cc Perform post-flush updates of memtable list in a callback (#6069) 2020-10-26 18:23:01 -07:00
obsolete_files_test.cc Remove unused includes (#7604) 2020-10-28 23:22:27 -07:00
options_file_test.cc Add more tests to ASSERT_STATUS_CHECKED (#7367) 2020-09-16 15:48:07 -07:00
output_validator.cc Use NPHash64 in more places (#7632) 2020-11-10 23:42:13 -08:00
output_validator.h Use NPHash64 in more places (#7632) 2020-11-10 23:42:13 -08:00
perf_context_test.cc Fix many tests to run with MEM_ENV and ENCRYPTED_ENV; Introduce a MemoryFileSystem class (#7566) 2020-10-27 10:33:09 -07:00
periodic_work_scheduler.cc Remove unused includes (#7604) 2020-10-28 23:22:27 -07:00
periodic_work_scheduler.h Periodically flush info log out of application buffer (#7488) 2020-10-01 19:14:14 -07:00
periodic_work_scheduler_test.cc Fix many tests to run with MEM_ENV and ENCRYPTED_ENV; Introduce a MemoryFileSystem class (#7566) 2020-10-27 10:33:09 -07:00
pinned_iterators_manager.h
plain_table_db_test.cc Remove unused includes (#7604) 2020-10-28 23:22:27 -07:00
pre_release_callback.h
prefix_test.cc Fix prefix_test for status check (#7495) 2020-10-02 17:01:15 -07:00
range_del_aggregator.cc In ParseInternalKey(), include corrupt key info in Status (#7515) 2020-10-28 10:12:58 -07:00
range_del_aggregator.h In ParseInternalKey(), include corrupt key info in Status (#7515) 2020-10-28 10:12:58 -07:00
range_del_aggregator_bench.cc
range_del_aggregator_test.cc
range_tombstone_fragmenter.cc
range_tombstone_fragmenter.h
range_tombstone_fragmenter_test.cc
read_callback.h Get() with timestamp should respect snapshot (#7227) 2020-08-14 19:20:58 -07:00
repair.cc Add full_history_ts_low_ to FlushJob (#7655) 2020-11-12 18:44:34 -08:00
repair_test.cc add Status check assertions for repair_test (#7455) 2020-09-29 16:30:08 -07:00
snapshot_checker.h
snapshot_impl.cc
snapshot_impl.h
table_cache.cc Fix MultiGet unable to query timestamp data issue (#7589) 2020-11-03 09:45:41 -08:00
table_cache.h Store FSRandomAccessPtr object in RandomAccessFileReader (#7192) 2020-08-27 11:21:52 -07:00
table_properties_collector.cc In ParseInternalKey(), include corrupt key info in Status (#7515) 2020-10-28 10:12:58 -07:00
table_properties_collector.h
table_properties_collector_test.cc Bring the Configurable options together (#5753) 2020-09-14 17:01:01 -07:00
transaction_log_impl.cc Store FSSequentialFilePtr object in SequenceFileReader (#7190) 2020-08-18 16:20:54 -07:00
transaction_log_impl.h Store FileSystemPtr object that contains FileSystem ptr (#7180) 2020-08-12 17:31:23 -07:00
trim_history_scheduler.cc
trim_history_scheduler.h
version_builder.cc Enforce status check for corruption_test (#7453) 2020-10-02 22:11:00 -07:00
version_builder.h
version_builder_test.cc Remove unused includes (#7604) 2020-10-28 23:22:27 -07:00
version_edit.cc Add full_history_ts_low to column family (#7740) 2020-12-05 14:18:22 -08:00
version_edit.h Add full_history_ts_low to column family (#7740) 2020-12-05 14:18:22 -08:00
version_edit_handler.cc Add full_history_ts_low to column family (#7740) 2020-12-05 14:18:22 -08:00
version_edit_handler.h Refactor with VersionEditHandler (#6581) 2020-11-11 08:00:14 -08:00
version_edit_test.cc Add full_history_ts_low to column family (#7740) 2020-12-05 14:18:22 -08:00
version_set.cc Refactor ProcessManifestWrites a little bit (#7751) 2020-12-08 02:37:38 -08:00
version_set.h Always track WAL obsoletion (#7759) 2020-12-09 16:02:12 -08:00
version_set_test.cc Add full_history_ts_low to column family (#7740) 2020-12-05 14:18:22 -08:00
wal_edit.cc Always track WAL obsoletion (#7759) 2020-12-09 16:02:12 -08:00
wal_edit.h Always track WAL obsoletion (#7759) 2020-12-09 16:02:12 -08:00
wal_edit_test.cc Always track WAL obsoletion (#7759) 2020-12-09 16:02:12 -08:00
wal_manager.cc Make FileType Public and Replace kLogFile with kWalFile (#7580) 2020-10-22 17:06:20 -07:00
wal_manager.h Store FileSystemPtr object that contains FileSystem ptr (#7180) 2020-08-12 17:31:23 -07:00
wal_manager_test.cc Make FileType Public and Replace kLogFile with kWalFile (#7580) 2020-10-22 17:06:20 -07:00
write_batch.cc Return `Status` from `MemTable` mutation functions (#7656) 2020-11-23 16:29:04 -08:00
write_batch_base.cc
write_batch_internal.h
write_batch_test.cc Add further tests to ASSERT_STATUS_CHECKED (1) (#7679) 2020-12-08 15:55:04 -08:00
write_callback.h
write_callback_test.cc Divide WriteCallbackTest.WriteWithCallbackTest (#7037) 2020-06-30 12:31:30 -07:00
write_controller.cc
write_controller.h
write_controller_test.cc
write_thread.cc Fix StallWrite crash with mixed of slowdown/no_slowdown writes (#7508) 2020-10-06 12:44:20 -07:00
write_thread.h Remove the status.PermitUncheckedError() from WriteGroup Destructor (#7555) 2020-10-14 10:47:58 -07:00