Fix race between manifest error recovery and file ingestion (#12871)

Summary:
This PR fixes an assertion failure in `DBImpl::ResumeImpl` - `assert(!versions_->descriptor_log_)`. In `VersionSet`, `descriptor_log_` has a pointer to the current MANIFEST writer. When there's an error updating the manifest, `descriptor_log_` is reset, and the error recovery thread checks `io_status()` in `VersionSet` and attempts to write a new MANIFEST. If another DB manipulation happens at the same time (like external file ingestion, column family manipulation etc), it calls `LogAndApply`, which also attempts to write a new MANIFEST. The assertion in `ResumeImpl` might fail in this case since the other MANIFEST writer may have updated `descriptor_log_`. To prevent the assertion, this fix updates both `io_status_` and `descriptor_log_` while holding the DB mutex.

The other option would have been to simply remove the assert. But I think its important to have it to ensure the invariant that `io_status_` is cleared if the MANIFEST is written successfully, and this fix makes things easier to reason about.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12871

Test Plan: Existing tests and crash test

Reviewed By: hx235

Differential Revision: D59926947

Pulled By: anand1976

fbshipit-source-id: af9ad18da3e29fc62c7ec2e30e0738aa33d4e5f1
This commit is contained in:
anand76 2024-07-19 10:37:51 -07:00 committed by Facebook GitHub Bot
parent de6d0e5ec3
commit 0fca5e31b4
2 changed files with 11 additions and 6 deletions

View File

@ -5508,6 +5508,7 @@ Status VersionSet::ProcessManifestWrites(
Status s;
IOStatus io_s;
IOStatus manifest_io_status;
std::unique_ptr<log::Writer> new_desc_log_ptr;
{
FileOptions opt_file_opts = fs_->OptimizeForManifestWrite(file_options_);
mu->Unlock();
@ -5536,6 +5537,7 @@ Status VersionSet::ProcessManifestWrites(
}
}
log::Writer* raw_desc_log_ptr = descriptor_log_.get();
if (s.ok() && new_descriptor_log) {
// This is fine because everything inside of this block is serialized --
// only one thread can be here at the same time
@ -5557,11 +5559,11 @@ Status VersionSet::ProcessManifestWrites(
db_options_->listeners, nullptr,
tmp_set.Contains(FileType::kDescriptorFile),
tmp_set.Contains(FileType::kDescriptorFile)));
descriptor_log_.reset(
new_desc_log_ptr.reset(
new log::Writer(std::move(file_writer), 0, false));
raw_desc_log_ptr = new_desc_log_ptr.get();
s = WriteCurrentStateToManifest(write_options, curr_state,
wal_additions, descriptor_log_.get(),
io_s);
wal_additions, raw_desc_log_ptr, io_s);
assert(s == io_s);
}
if (!io_s.ok()) {
@ -5607,7 +5609,7 @@ Status VersionSet::ProcessManifestWrites(
}
++idx;
#endif /* !NDEBUG */
io_s = descriptor_log_->AddRecord(write_options, record);
io_s = raw_desc_log_ptr->AddRecord(write_options, record);
if (!io_s.ok()) {
s = io_s;
manifest_io_status = io_s;
@ -5617,7 +5619,7 @@ Status VersionSet::ProcessManifestWrites(
if (s.ok()) {
io_s =
SyncManifest(db_options_, write_options, descriptor_log_->file());
SyncManifest(db_options_, write_options, raw_desc_log_ptr->file());
manifest_io_status = io_s;
TEST_SYNC_POINT_CALLBACK(
"VersionSet::ProcessManifestWrites:AfterSyncManifest", &io_s);
@ -5650,7 +5652,7 @@ Status VersionSet::ProcessManifestWrites(
if (s.ok()) {
// find offset in manifest file where this version is stored.
new_manifest_file_size = descriptor_log_->file()->GetFileSize();
new_manifest_file_size = raw_desc_log_ptr->file()->GetFileSize();
}
if (first_writer.edit_list.front()->IsColumnFamilyDrop()) {
@ -5696,6 +5698,7 @@ Status VersionSet::ProcessManifestWrites(
// Append the old manifest file to the obsolete_manifest_ list to be deleted
// by PurgeObsoleteFiles later.
if (s.ok() && new_descriptor_log) {
descriptor_log_ = std::move(new_desc_log_ptr);
obsolete_manifests_.emplace_back(
DescriptorFileName("", manifest_file_number_));
}
@ -5773,6 +5776,7 @@ Status VersionSet::ProcessManifestWrites(
// corrupted. So we need to force the next version update to start a
// new manifest file.
descriptor_log_.reset();
new_desc_log_ptr.reset();
// If manifest operations failed, then we know the CURRENT file still
// points to the original MANIFEST. Therefore, we can safely delete the
// new MANIFEST.

View File

@ -0,0 +1 @@
Fixed a race between error recovery due to manifest sync or write failure and external SST file ingestion. Both attempt to write a new manifest file, which causes an assertion failure.