rocksdb/file
Yanqin Jin 08721293ea Fix a bug causing duplicate trailing entries in WritableFile (buffered IO) (#9236)
Summary:
`db_stress` is a user of `FaultInjectionTestFS`. After injecting a write error, `db_stress` probabilistically determins
data drop (https://github.com/facebook/rocksdb/blob/6.27.fb/db_stress_tool/db_stress_test_base.cc#L2615:L2619).

In some of our recent runs of `db_stress`, we found duplicate trailing entries corresponding to file trivial move in
the MANIFEST, causing the recovery to fail, because the file move operation is not idempotent: you cannot delete a
file from a given level twice.

Investigation suggests that data buffering in both `WritableFileWriter` and `FaultInjectionTestFS` may be the root cause.

WritableFileWriter buffers data to write in a memory buffer, `WritableFileWriter::buf_`. After each
`WriteBuffered()`/`WriteBufferedWithChecksum()` succeeds, the `buf_` is cleared.

If the underlying file `WritableFileWriter::writable_file_` is opened in buffered IO mode, then `FaultInjectionTestFS`
buffers data written for each file until next file sync. After an injected error, user of `FaultInjectionFS` can
choose to drop some or none of previously buffered data. If `db_stress` does not drop any unsynced data, then
such data will still exist in the `FaultInjectionTestFS`'s buffer.

Existing implementation of `WritableileWriter::WriteBuffered()` does not clear `buf_` if there is an error. This may lead
to the data being buffered two copies: one in `WritableFileWriter`, and another in `FaultInjectionTestFS`.
We also know that the `WritableFileWriter` of MANIFEST file will close upon an error.  During `Close()`, it will flush the
content in `buf_`. If no write error is injected to `FaultInjectionTestFS` this time, then we end up with two copies of the
data appended to the file.

To fix, we clear the `WritableFileWriter::buf_` upon failure as well. We focus this PR on files opened in non-direct mode.

This PR includes a unit test to reproduce a case when write error injection
to `WritableFile` can cause duplicate trailing entries.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9236

Test Plan: make check

Reviewed By: zhichao-cao

Differential Revision: D33033984

Pulled By: riversand963

fbshipit-source-id: ebfa5a0db8cbf1ed73100528b34fcba543c5db31
2021-12-13 09:00:36 -08:00
..
delete_scheduler.cc Skip directory fsync for filesystem btrfs (#8903) 2021-11-03 12:21:27 -07:00
delete_scheduler.h Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
delete_scheduler_test.cc Use SST file manager to track blob files as well (#8037) 2021-03-17 20:44:49 -07:00
file_prefetch_buffer.cc Fix bug in rocksdb internal automatic prefetching (#9234) 2021-11-30 22:53:10 -08:00
file_prefetch_buffer.h Fix bug in rocksdb internal automatic prefetching (#9234) 2021-11-30 22:53:10 -08:00
file_util.cc Support readahead during compaction for blob files (#9187) 2021-11-19 17:53:47 -08:00
file_util.h Allow WAL dir to change with db dir (#8582) 2021-07-30 12:16:44 -07:00
filename.cc Skip directory fsync for filesystem btrfs (#8903) 2021-11-03 12:21:27 -07:00
filename.h Add (Live)FileStorageInfo API (#8968) 2021-10-16 10:04:32 -07:00
line_file_reader.cc Replace Status with IOStatus in the backupable_db (#8820) 2021-09-15 15:09:48 -07:00
line_file_reader.h Replace Status with IOStatus in the backupable_db (#8820) 2021-09-15 15:09:48 -07:00
prefetch_test.cc Fix bug in rocksdb internal automatic prefetching (#9234) 2021-11-30 22:53:10 -08:00
random_access_file_reader.cc Add listener API that notifies on IOError (#9177) 2021-11-18 17:11:19 -08:00
random_access_file_reader.h Add listener API that notifies on IOError (#9177) 2021-11-18 17:11:19 -08:00
random_access_file_reader_test.cc use the pointer directly (#8095) 2021-03-26 21:31:16 -07:00
read_write_util.cc Move old files to warm tier in FIFO compactions (#8310) 2021-08-09 12:51:14 -07:00
read_write_util.h Refactor: add LineFileReader and Status::MustCheck (#8026) 2021-03-09 20:12:38 -08:00
readahead_file_info.h Reuse internal auto readhead_size at each Level (expect L0) for Iterations (#9056) 2021-11-10 16:20:04 -08:00
readahead_raf.cc Make StringEnv, StringSink, StringSource use FS classes (#7786) 2021-01-04 16:01:01 -08:00
readahead_raf.h Make StringEnv, StringSink, StringSource use FS classes (#7786) 2021-01-04 16:01:01 -08:00
sequence_file_reader.cc Add file operation callbacks to SequentialFileReader (#8982) 2021-10-05 10:51:59 -07:00
sequence_file_reader.h Add listener API that notifies on IOError (#9177) 2021-11-18 17:11:19 -08:00
sst_file_manager_impl.cc Cleanup includes in dbformat.h (#8930) 2021-09-29 04:04:40 -07:00
sst_file_manager_impl.h Use SST file manager to track blob files as well (#8037) 2021-03-17 20:44:49 -07:00
writable_file_writer.cc Fix a bug causing duplicate trailing entries in WritableFile (buffered IO) (#9236) 2021-12-13 09:00:36 -08:00
writable_file_writer.h Add listener API that notifies on IOError (#9177) 2021-11-18 17:11:19 -08:00