rocksdb/db/db_impl
Hui Xiao 2f76ab150d Fix missing WAL in new manifest by rolling over the WAL deletion record from prev manifest (#10892)
Summary:
**Context**
`Options::track_and_verify_wals_in_manifest = true` verifies that each WAL tracked in the manifest is actually present in the WAL folder. If one is not, a corruption error "Missing WAL with log number" is thrown.

`DB::SyncWAL()` called at a specific time (i.e., at `TEST_SYNC_POINT("FindObsoleteFiles::PostMutexUnlock")`) can record in a new manifest the WAL addition of a WAL file whose WAL deletion was already recorded in the previous manifest.
That WAL deletion record is not rolled over to the new manifest, so the new manifest gives the illusion that the WAL was never deleted and should be present at DB open/reopen.
- Such a WAL deletion record can be caused by flushing the memtable associated with that WAL, and the actual deletion of the WAL file can happen in `PurgeObsoleteFiles()`.

As a consequence, upon `DB::Reopen()`, this WAL file may already be deleted while the manifest still has its WAL addition record, which causes a false alarm of corruption "Missing WAL with log number" to be thrown.

**Summary**
This PR fixes the false alarm by rolling over the WAL deletion record from the previous manifest, i.e., by adding that record to the new manifest.
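
The effect of rolling over the deletion record can be illustrated with a toy model. This is only a sketch under stated assumptions, not RocksDB code: `WalEdit`, `WalState`, `Replay`, `StartNewManifest`, and `AllTrackedWalsPresent` are hypothetical names, and the model assumes that a deletion record acts as a "minimum WAL number to keep" watermark, so that a later addition of a WAL below the watermark is ignored on replay.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Toy model only; none of these types are RocksDB classes.
enum class WalEditType { kAddition, kDeletion };

struct WalEdit {
  WalEditType type;
  // kAddition: the WAL being tracked.
  // kDeletion: assumed watermark; WALs with a smaller number are obsolete.
  uint64_t log_number;
};

using Manifest = std::vector<WalEdit>;

struct WalState {
  uint64_t min_wal_to_keep = 0;
  std::set<uint64_t> alive;
};

// Replays the WAL records of a manifest in order.
WalState Replay(const Manifest& manifest) {
  WalState state;
  for (const WalEdit& edit : manifest) {
    if (edit.type == WalEditType::kDeletion) {
      if (edit.log_number > state.min_wal_to_keep) {
        state.min_wal_to_keep = edit.log_number;
      }
      state.alive.erase(state.alive.begin(),
                        state.alive.lower_bound(edit.log_number));
    } else if (edit.log_number >= state.min_wal_to_keep) {
      state.alive.insert(edit.log_number);
    }
    // An addition below the watermark is ignored (assumption of this model).
  }
  return state;
}

// Writes the initial records of a new manifest from the state of the old one.
// With roll_over_deletion == false the watermark is lost, which models the
// behavior described under "Context".
Manifest StartNewManifest(const Manifest& old_manifest,
                          bool roll_over_deletion) {
  const WalState state = Replay(old_manifest);
  Manifest fresh;
  if (roll_over_deletion && state.min_wal_to_keep > 0) {
    fresh.push_back({WalEditType::kDeletion, state.min_wal_to_keep});
  }
  for (uint64_t log_number : state.alive) {
    fresh.push_back({WalEditType::kAddition, log_number});
  }
  return fresh;
}

// Models the check: every WAL tracked as alive must exist on disk.
bool AllTrackedWalsPresent(const WalState& state,
                           const std::set<uint64_t>& on_disk) {
  for (uint64_t log_number : state.alive) {
    if (on_disk.count(log_number) == 0) {
      return false;  // would surface as "Missing WAL with log number"
    }
  }
  return true;
}
```

In this model, take an old manifest that adds WALs 5 and 6 and then records a deletion with watermark 6, while only WAL 6 remains on disk. If a late addition of WAL 5 is appended to a new manifest built without the rollover, replay tracks WAL 5 as alive and the check fails. With the rollover, the same late addition falls below the watermark and the check passes.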

**Test**
- `make check`
- Added new unit test `TEST_F(DBWALTest, FixSyncWalOnObseletedWalWithNewManifestCausingMissingWAL)` that failed before the fix and passes after it
- [Ongoing] CI stress test with aggressive values as in https://github.com/facebook/rocksdb/pull/10761, which is how this false alarm first surfaced, to confirm that the false alarm disappears
- [Ongoing] Regular CI stress test to confirm that the fix does not break anything

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10892

Reviewed By: ajkr

Differential Revision: D40778965

Pulled By: hx235

fbshipit-source-id: a512364bfdeb0b1a55c171890e60d856c528f37f
2022-11-29 14:14:43 -08:00
..
compacted_db_impl.cc Run clang-format on some files in db/db_impl directory (#10869) 2022-10-25 13:49:09 -07:00
compacted_db_impl.h Run clang-format on some files in db/db_impl directory (#10869) 2022-10-25 13:49:09 -07:00
db_impl.cc Prevent iterating over range tombstones beyond `iterate_upper_bound` (#10966) 2022-11-23 14:27:14 -08:00
db_impl.h Basic Support for Merge with user-defined timestamp (#10819) 2022-10-31 22:28:58 -07:00
db_impl_compaction_flush.cc Revert PR 10777 "Fix FIFO causing overlapping seqnos in L0 files due to overla…" (#10999) 2022-11-29 10:56:42 -08:00
db_impl_debug.cc Add manual_wal_flush, FlushWAL() to stress/crash test (#10698) 2022-09-30 15:48:33 -07:00
db_impl_experimental.cc Remove unused fields from FileMetaData (temporarily) (#10443) 2022-08-01 17:56:13 -07:00
db_impl_files.cc Fix missing WAL in new manifest by rolling over the WAL deletion record from prev manifest (#10892) 2022-11-29 14:14:43 -08:00
db_impl_open.cc Run clang-format on some files in db/db_impl directory (#10869) 2022-10-25 13:49:09 -07:00
db_impl_readonly.cc Skip swaths of range tombstone covered keys in merging iterator (2022 edition) (#10449) 2022-09-02 09:51:19 -07:00
db_impl_readonly.h Run clang-format on some files in db/db_impl directory (#10869) 2022-10-25 13:49:09 -07:00
db_impl_secondary.cc Run clang-format on some files in db/db_impl directory (#10869) 2022-10-25 13:49:09 -07:00
db_impl_secondary.h Run clang-format on some files in db/db_impl directory (#10869) 2022-10-25 13:49:09 -07:00
db_impl_write.cc Basic Support for Merge with user-defined timestamp (#10819) 2022-10-31 22:28:58 -07:00