rocksdb/monitoring
Akanksha Mahajan ae82d91492 Remove corrupted WAL files in kPointRecoveryMode with avoid_flush_during_recovery set true (#9634)
Summary:
1) For a non-TransactionDB with avoid_flush_during_recovery = true, RocksDB
avoids flushing data from the WAL to L0 for all column families where possible. As a
result, not all column families can increase their log_numbers, and
min_log_number_to_keep won't change.
2) For a TransactionDB (.allow_2pc), even with the flush, there may be old WAL files that must not be deleted because they can contain data of uncommitted transactions, so min_log_number_to_keep won't change.

If we persist a new MANIFEST with
advanced log_numbers for some column families, then during a second
crash after persisting the MANIFEST, RocksDB will see some column
families' log_numbers larger than the corrupted WAL's number, and the "column family inconsistency" error will be hit, causing recovery to fail.
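The failure condition above can be sketched as follows. This is a minimal illustration, not RocksDB's code: `FamiliesAheadOfCorruptedWal` and its parameters are hypothetical names, and the map simply stands in for the per-column-family log_numbers recorded in the MANIFEST.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper (not a RocksDB API): returns the names of column
// families whose recorded log_number is already past the corrupted WAL.
// A non-empty result corresponds to the "column family inconsistency" case.
std::vector<std::string> FamiliesAheadOfCorruptedWal(
    const std::map<std::string, uint64_t>& cf_log_numbers,
    uint64_t corrupted_wal_number) {
  std::vector<std::string> ahead;
  for (const auto& entry : cf_log_numbers) {
    // entry.first is the column family name, entry.second its log_number.
    if (entry.second > corrupted_wal_number) {
      ahead.push_back(entry.first);
    }
  }
  return ahead;
}
```

With log_numbers {default: 12, meta: 8} and a corrupted WAL numbered 10, only "default" is reported, which is the situation a second crash would trip over.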

As a solution,
1. WALs whose numbers are larger than the
corrupted WAL's number and smaller than the new WAL's number will be moved to the archive folder.
2. Currently, RocksDB DB::Open() may create and write to two new MANIFEST files even before recovery succeeds. This PR buffers the edits in a structure and writes to a new MANIFEST after recovery is successful.
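The two steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the PR's implementation: `WalsToArchive` and `BufferedEdits` are hypothetical names, WALs are represented only by their numbers, and edits are plain strings rather than real version edits.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: picks the WAL numbers strictly between the corrupted
// WAL and the newly created WAL; these are the ones moved to the archive.
std::vector<uint64_t> WalsToArchive(const std::vector<uint64_t>& wal_numbers,
                                    uint64_t corrupted_wal_number,
                                    uint64_t new_wal_number) {
  assert(corrupted_wal_number < new_wal_number);
  std::vector<uint64_t> to_archive;
  for (uint64_t n : wal_numbers) {
    if (n > corrupted_wal_number && n < new_wal_number) {
      to_archive.push_back(n);
    }
  }
  return to_archive;
}

// Hypothetical buffer: collects edits during recovery and releases them
// for persisting only if recovery succeeded.
class BufferedEdits {
 public:
  void Add(std::string edit) { pending_.push_back(std::move(edit)); }

  // Returns the edits to write to the new MANIFEST, or an empty vector
  // when recovery failed. The buffer is empty afterwards in both cases.
  std::vector<std::string> Finish(bool recovery_succeeded) {
    std::vector<std::string> out;
    if (recovery_succeeded) {
      out.swap(pending_);
    }
    pending_.clear();
    return out;
  }

 private:
  std::vector<std::string> pending_;
};
```

For WALs {3, 5, 6, 7, 9} with corrupted WAL 5 and new WAL 9, the sketch selects 6 and 7. Deferring the write until `Finish(true)` mirrors the idea that no new MANIFEST should exist if recovery fails midway.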

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9634

Test Plan:
1. Added new unit tests
2. make crash_test -j

Reviewed By: riversand963

Differential Revision: D34463666

Pulled By: akankshamahajan15

fbshipit-source-id: e233d3af0ed4e2028ca0cf051e5a334a0fdc9d19
2022-04-11 15:39:31 -07:00
..
file_read_sample.h
histogram.cc improve-histogram-performance: remove valueIndexMap_ (#8625) 2021-10-14 14:45:20 -07:00
histogram.h improve-histogram-performance: remove valueIndexMap_ (#8625) 2021-10-14 14:45:20 -07:00
histogram_test.cc Make SystemClock into a Customizable Class (#8636) 2021-09-21 09:23:48 -07:00
histogram_windowing.cc
histogram_windowing.h
in_memory_stats_history.cc
in_memory_stats_history.h
instrumented_mutex.cc Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
instrumented_mutex.h Remove explicit padding from CacheAlignedInstrumentedMutex (#9809) 2022-04-05 18:32:05 -07:00
iostats_context.cc Add file temperature related counter and bytes stats to and io_stats (#8710) 2021-10-07 14:58:41 -07:00
iostats_context_imp.h Remove IOSTATS_ADD_IF_POSITIVE() (#8984) 2021-10-01 14:43:00 -07:00
iostats_context_test.cc
perf_context.cc Add a PerfContext counter for secondary cache hits (#8685) 2021-08-20 15:17:30 -07:00
perf_context_imp.h Add file temperature related counter and bytes stats to and io_stats (#8710) 2021-10-07 14:58:41 -07:00
perf_level.cc
perf_level_imp.h
perf_step_timer.h make PerfStepTimer struct smaller by reordering members (#7931) 2021-03-08 21:33:15 -08:00
persistent_stats_history.cc
persistent_stats_history.h
statistics.cc Update stats for Read and ReadAsync in random_access_file_reader for async prefetching (#9810) 2022-04-06 14:26:53 -07:00
statistics.h Add support for building on s390x platform (#8962) 2021-10-22 10:13:15 -07:00
statistics_test.cc Added a default Name method to Statistics (#8918) 2021-09-17 07:25:43 -07:00
stats_history_test.cc Remove corrupted WAL files in kPointRecoveryMode with avoid_flush_during_recovery set true (#9634) 2022-04-11 15:39:31 -07:00
thread_status_impl.cc
thread_status_updater.cc
thread_status_updater.h
thread_status_updater_debug.cc
thread_status_util.cc
thread_status_util.h
thread_status_util_debug.cc