rocksdb/file
Peter Dillinger 98393f0139 Fix Checkpoint hard link of inactive but unsynced WAL (#12731)
Summary:
Background: there is one active WAL file but there can be
several more WAL files in various states. Those other WALs are always
in a "flushed" state but could be on the `logs_` list not yet fully
synced. We currently allow any WAL that is not the active WAL to be
hard-linked when creating a Checkpoint, as although it might still be
open for write, we are not appending any more data to it.

The problem is that a created Checkpoint is supposed to be fully synced
on return of that function, and a hard-linked WAL in the state described
above might not be fully synced. (Through some prudence in https://github.com/facebook/rocksdb/issues/10083,
it would synced if using track_and_verify_wals_in_manifest=true.)

The fix is a step toward a long term goal of removing the need to query
the filesystem to determine WAL files and their state. (I consider it
dubious any time we independently read from or query metadata from a
file we have open for writing, as this makes us more susceptible to
FileSystem deficiencies or races.) More specifically:
* Detect which WALs might not be fully synced, according to our DBImpl
  metadata, and prevent hard linking those (with `trim_to_size=true`
  from `GetLiveFilesStorageInfo()`. And while we're at it, use our known
  flushed sizes for those WALs.
* To avoid a race between that and GetSortedWalFiles(), track a maximum
  needed WAL number for the Checkpoint/GetLiveFilesStorageInfo.
* Because of the level of consistency provided by those two, we no
  longer need to consider syncing as part of the FlushWAL in
  GetLiveFilesStorageInfo. (We determine the max WAL number consistent
  with the manifest file size, while holding DB mutex. Should make
  track_and_verify_wals_in_manifest happy.) This makes the premise of
  test PutRaceWithCheckpointTrackedWalSync obsolete (sync point callback
  no longer hit) so the test is removed, with crash test as backstop for
  related issues. See https://github.com/facebook/rocksdb/issues/10185

Stacked on https://github.com/facebook/rocksdb/issues/12729

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12731

Test Plan:
Expanded an existing test, which now fails before fix.
Also long runs of blackbox_crash_test with amplified checkpoint frequency.

Reviewed By: cbi42

Differential Revision: D58199629

Pulled By: pdillinger

fbshipit-source-id: 376e55f4a2b082cd2adb6408a41209de14422382
2024-06-05 17:40:09 -07:00
..
delete_scheduler.cc Fix delete obsolete files on recovery not rate limited (#12590) 2024-05-01 12:26:54 -07:00
delete_scheduler.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
delete_scheduler_test.cc Prefer static_cast in place of most reinterpret_cast (#12308) 2024-02-07 10:44:11 -08:00
file_prefetch_buffer.cc Change ReadAsync callback API to remove const from FSReadRequest (#11649) 2024-02-16 09:14:55 -08:00
file_prefetch_buffer.h Change ReadAsync callback API to remove const from FSReadRequest (#11649) 2024-02-16 09:14:55 -08:00
file_util.cc Fix Checkpoint hard link of inactive but unsynced WAL (#12731) 2024-06-05 17:40:09 -07:00
file_util.h Fix delete obsolete files on recovery not rate limited (#12590) 2024-05-01 12:26:54 -07:00
filename.cc Group SST write in flush, compaction and db open with new stats (#11910) 2023-12-29 15:29:23 -08:00
filename.h Remove unnecessary, confusing 'extern' (#12300) 2024-01-29 10:38:08 -08:00
line_file_reader.cc Support read rate-limiting in SequentialFileReader (#9973) 2022-05-24 10:28:57 -07:00
line_file_reader.h Support read rate-limiting in SequentialFileReader (#9973) 2022-05-24 10:28:57 -07:00
prefetch_test.cc Prefer static_cast in place of most reinterpret_cast (#12308) 2024-02-07 10:44:11 -08:00
random_access_file_reader.cc default_write_temperature option (#12388) 2024-02-28 14:36:13 -08:00
random_access_file_reader.h Change ReadAsync callback API to remove const from FSReadRequest (#11649) 2024-02-16 09:14:55 -08:00
random_access_file_reader_test.cc internal_repo_rocksdb (4372117296613874540) (#12117) 2023-12-04 11:17:32 -08:00
read_write_util.cc Run Clang format on file folder (#10860) 2022-10-24 18:34:52 -07:00
read_write_util.h Remove unnecessary, confusing 'extern' (#12300) 2024-01-29 10:38:08 -08:00
readahead_file_info.h Reuse internal auto readhead_size at each Level (expect L0) for Iterations (#9056) 2021-11-10 16:20:04 -08:00
readahead_raf.cc Change internal headers with duplicate names (#11408) 2023-05-17 11:27:09 -07:00
readahead_raf.h Make StringEnv, StringSink, StringSource use FS classes (#7786) 2021-01-04 16:01:01 -08:00
sequence_file_reader.cc Retry DB::Open upon a corruption detected while reading the MANIFEST (#12518) 2024-04-18 17:36:33 -07:00
sequence_file_reader.h Retry DB::Open upon a corruption detected while reading the MANIFEST (#12518) 2024-04-18 17:36:33 -07:00
sst_file_manager_impl.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
sst_file_manager_impl.h Fix delete obsolete files on recovery not rate limited (#12590) 2024-05-01 12:26:54 -07:00
writable_file_writer.cc Sync WAL during db Close() (#12556) 2024-05-20 17:33:43 -07:00
writable_file_writer.h FaultInjectionTestFS read unsynced data by default (#12729) 2024-06-04 15:25:23 -07:00