rocksdb/env
Cheng Chang 5e794b0841 Fix a recovery corner case (#7621)
Summary:
Consider the following sequence of events:

1. The DB flushed an SST with file number N, appended the corresponding entry to the MANIFEST, and tried to sync the MANIFEST.
2. Syncing the MANIFEST failed and the DB crashed.
3. The DB tried to recover with this MANIFEST. At that point, the MANIFEST contained no entry for the newly flushed SST. Therefore, RocksDB replayed the WAL and tried to flush to an SST file reusing the same file number N. This failed because the file system does not support overwrite. The DB then deleted this file.
4. The DB crashed again.
5. The DB tried to recover. When it read the MANIFEST, there was an entry referencing N.sst. This probably happened because the append in step 1 finally reached the MANIFEST and became visible. Since N.sst had been deleted in step 3, recovery failed.

It is possible that the N.sst created in step 1 is valid. Step 3 would still fail, since the MANIFEST was not synced properly in steps 1 and 2, but deleting N.sst makes it impossible for the DB to recover even if the remaining part of the MANIFEST has been appended and is visible by step 5.

After this PR, in step 3, a new MANIFEST is created immediately after recovering from the old MANIFEST. We then find that N.sst is not referenced in the new MANIFEST, so we delete it, and we do not reuse N as a file number. In step 5, since the new MANIFEST does not contain N.sst, the recovery failure no longer occurs.
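The sequence above can be illustrated with a small toy model. This is only a sketch under simplifying assumptions, not RocksDB code: `ToyDb`, `ToyManifest`, and every function below are hypothetical names, the unsynced append is modeled as a `pending` list that may surface after a crash, and file number allocation is not modeled at all.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <vector>

// Toy model only. None of these types or functions exist in RocksDB.
struct ToyManifest {
  std::vector<int> visible;  // SST numbers a reader sees right now
  std::vector<int> pending;  // appended but never synced; may surface later
};

struct ToyDb {
  std::set<int> sst_on_disk;  // SST files that exist on disk
  ToyManifest manifest;       // the MANIFEST currently in use
};

// Steps 1-2: flush N.sst, append to the MANIFEST, fail the sync, crash.
void FlushThenCrash(ToyDb& db, int n) {
  db.sst_on_disk.insert(n);
  db.manifest.pending.push_back(n);
}

// Models the late write: the unsynced append becomes visible after a crash.
void PendingAppendSurfaces(ToyDb& db) {
  ToyManifest& m = db.manifest;
  m.visible.insert(m.visible.end(), m.pending.begin(), m.pending.end());
  m.pending.clear();
}

// Recovery can succeed only if every referenced SST still exists.
bool ManifestIsConsistent(const ToyDb& db) {
  for (int n : db.manifest.visible) {
    if (db.sst_on_disk.count(n) == 0) return false;
  }
  return true;
}

// Step 3, old behavior: keep the old MANIFEST and reuse file number N.
void RecoverOld(ToyDb& db, int n) {
  assert(db.sst_on_disk.count(n) == 1);  // the write collides with N.sst
  db.sst_on_disk.erase(n);               // cleanup deletes N.sst
}

// Step 3, new behavior: switch to a fresh MANIFEST first, then delete
// every SST that the fresh MANIFEST does not reference.
void RecoverNew(ToyDb& db) {
  ToyManifest fresh;
  fresh.visible = db.manifest.visible;  // pending appends stay behind
  db.manifest = fresh;
  std::set<int> kept;
  for (int n : db.sst_on_disk) {
    if (std::find(fresh.visible.begin(), fresh.visible.end(), n) !=
        fresh.visible.end()) {
      kept.insert(n);
    }
  }
  db.sst_on_disk = kept;
}

// Runs steps 1-5 and reports whether the second recovery (step 5) succeeds.
bool SecondRecoverySucceeds(bool use_new_behavior) {
  const int n = 2;
  ToyDb db;
  db.sst_on_disk = {1};
  db.manifest.visible = {1};
  FlushThenCrash(db, n);  // steps 1-2
  if (use_new_behavior) {
    RecoverNew(db);       // step 3 after this PR
  } else {
    RecoverOld(db, n);    // step 3 before this PR
  }
  PendingAppendSurfaces(db);        // step 4: crash, late append shows up
  return ManifestIsConsistent(db);  // step 5
}
```

In this model, `SecondRecoverySucceeds(false)` returns false because the old MANIFEST ends up referencing a deleted file, while `SecondRecoverySucceeds(true)` returns true because the late append lands in a MANIFEST that is no longer in use.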

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7621

Test Plan:
1. Some tests are updated because they assumed that a new MANIFEST is created after WAL recovery.
2. A new unit test is added in db_basic_test to simulate step 3.

Reviewed By: riversand963

Differential Revision: D24668144

Pulled By: cheng-chang

fbshipit-source-id: 90d7487fbad2bc3714f5ede46ea949895b15ae3b
2020-11-07 22:23:27 -08:00
composite_env_wrapper.h Add AppendWithVerify and PositionedAppendWithVerify to Env and FileSystem (#7419) 2020-09-23 19:02:26 -07:00
env.cc Remove unused includes (#7604) 2020-10-28 23:22:27 -07:00
env_basic_test.cc Fix many tests to run with MEM_ENV and ENCRYPTED_ENV; Introduce a MemoryFileSystem class (#7566) 2020-10-27 10:33:09 -07:00
env_chroot.cc Make env*_test work with ASSERT_STATUS_CHECKED (#7176) 2020-07-28 22:59:48 -07:00
env_chroot.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
env_encryption.cc Changes to EncryptedEnv public API (#7279) 2020-09-15 17:14:10 -07:00
env_encryption_ctr.h Changes to EncryptedEnv public API (#7279) 2020-09-15 17:14:10 -07:00
env_hdfs.cc fix build with 'USE_HDFS' on windows (#6950) 2020-06-12 16:21:50 -07:00
env_posix.cc Remove unused includes (#7604) 2020-10-28 23:22:27 -07:00
env_test.cc Require only one `Logger::Logv()` implementation (#7605) 2020-10-28 10:00:51 -07:00
file_system.cc Remove duplicate colon in Status message (#7041) 2020-08-06 15:18:04 -07:00
file_system_tracer.cc Update IOTrace operations in stackable_db.h (#7514) 2020-10-14 10:16:15 -07:00
file_system_tracer.h Update IOTrace operations in stackable_db.h (#7514) 2020-10-14 10:16:15 -07:00
fs_posix.cc Remove unused includes (#7604) 2020-10-28 23:22:27 -07:00
io_posix.cc Remove unused includes (#7604) 2020-10-28 23:22:27 -07:00
io_posix.h Add AppendWithVerify and PositionedAppendWithVerify to Env and FileSystem (#7419) 2020-09-23 19:02:26 -07:00
io_posix_test.cc Status check enforcement for io_posix_test and options_settable_test (#6857) 2020-05-19 19:22:28 -07:00
mock_env.cc Fix a recovery corner case (#7621) 2020-11-07 22:23:27 -08:00
mock_env.h Fix many tests to run with MEM_ENV and ENCRYPTED_ENV; Introduce a MemoryFileSystem class (#7566) 2020-10-27 10:33:09 -07:00
mock_env_test.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00