rocksdb/db/error_handler_fs_test.cc
Peter Dillinger 204fcff751 HyperClockCache support for SecondaryCache, with refactoring (#11301)
Summary:
Internally refactors SecondaryCache integration out of LRUCache specifically and into a wrapper/adapter class that works with various Cache implementations. Notably, this relies on separating the notion of async lookup handles from other cache handles, so that HyperClockCache doesn't have to deal with the problem of allocating handles from the hash table for lookups that might fail anyway, and might be on the same key without support for coalescing. (LRUCache's hash table can incorporate previously allocated handles thanks to its pointer indirection.) Specifically, I'm worried about the case in which hundreds of threads try to access the same block and probing in the hash table degrades to linear search on the pile of entries with the same key.

This change is a big step in the direction of supporting stacked SecondaryCaches, but there are obstacles to completing that. Especially, there is no SecondaryCache hook for evictions to pass from one to the next. It has been proposed that evictions be transmitted simply as the persisted data (as in SaveToCallback), but given the current structure provided by the CacheItemHelpers, that would require an extra copy of the block data, because there's intentionally no way to ask for a contiguous Slice of the data (to allow for flexibility in storage). `AsyncLookupHandle` and the re-worked `WaitAll()` should be essentially prepared for stacked SecondaryCaches, but several "TODO with stacked secondaries" issues remain in various places.

It could be argued that the stacking instead be done as a SecondaryCache adapter that wraps two (or more) SecondaryCaches, but at least with the current API that would require an extra heap allocation on SecondaryCache Lookup for a wrapper SecondaryCacheResultHandle that can transfer a Lookup between secondaries. We could also consider trying to unify the Cache and SecondaryCache APIs, though that might be difficult if `AsyncLookupHandle` is kept a fixed struct.

## cache.h (public API)
Moves `secondary_cache` option from LRUCacheOptions to ShardedCacheOptions so that it is applicable to HyperClockCache.

## advanced_cache.h (advanced public API)
* Add `Cache::CreateStandalone()` so that the SecondaryCache support wrapper can use it.
* Add `SetEvictionCallback()` / `eviction_callback_` so that the SecondaryCache support wrapper can use it. Only a single callback is supported for efficiency. If there is ever a need for more than one, hopefully that can be handled with a broadcast callback wrapper.

These are essentially the two "extra" pieces of `Cache` for pulling out specific SecondaryCache support from the `Cache` implementation. I think it's a good trade-off as these are reasonable, limited, and reusable "cut points" into the `Cache` implementations.

* Remove async capability from standard `Lookup()` (getting rid of awkward restrictions on pending Handles) and add `AsyncLookupHandle` and `StartAsyncLookup()`. As noted in the comments, the full struct of `AsyncLookupHandle` is exposed so that it can be stack allocated, for efficiency, though more data is being copied around than before, which could impact performance. (Lookup info -> AsyncLookupHandle -> Handle vs. Lookup info -> Handle)

I could foresee a future in which a Cache internally saves a pointer to the AsyncLookupHandle, which means it's dangerous to allow it to be copyable or even movable. It also means it's not compatible with std::vector (which I don't like requiring as an API parameter anyway), so `WaitAll()` expects any contiguous array of AsyncLookupHandles. I believe this is best for common case efficiency, while behaving well in other cases also. For example, `WaitAll()` has no effect on default-constructed AsyncLookupHandles, which look like a completed cache miss.

## cacheable_entry.h
A couple of functions are obsolete because Cache::Handle can no longer be pending.

## cache.cc
Provides default implementations for new or revamped Cache functions, especially appropriate for non-blocking caches.

## secondary_cache_adapter.{h,cc}
The full details of the Cache wrapper adding SecondaryCache support. Essentially replicates the SecondaryCache handling that was in LRUCache, but obviously refactored. There is a bit of logic duplication, where Lookup() is essentially a manually optimized version of StartAsyncLookup() and Wait(), but it's roughly a dozen lines of code.

## sharded_cache.h, typed_cache.h, charged_cache.{h,cc}, sim_cache.cc
Simply updated for Cache API changes.

## lru_cache.{h,cc}
Carefully remove SecondaryCache logic, implement `CreateStandalone` and eviction handler functionality.

## clock_cache.{h,cc}
Expose existing `CreateStandalone` functionality, add eviction handler functionality. Light refactoring.

## block_based_table_reader*
Mostly re-worked the only usage of async Lookup, which is in BlockBasedTable::MultiGet. Used arrays in place of autovector in some places for efficiency. Simplified some logic by not trying to process some cache results before they're all ready.

Created new function `BlockBasedTable::GetCachePriority()` to reduce some pre-existing code duplication (and avoid making it worse).

Fixed at least one small bug from the prior confusing mixture of async and sync Lookups. In MaybeReadBlockAndLoadToCache(), called by RetrieveBlock(), called by MultiGet() with wait=false, is_cache_hit for the block_cache_tracer entry would not be set to true if the handle was pending after Lookup and before Wait.

## Intended follow-up work
* Figure out if there are any missing stats or block_cache_tracer work in refactored BlockBasedTable::MultiGet
* Stacked secondary caches (see above discussion)
* See if we can make up for the small MultiGet performance regression.
* Study more performance with SecondaryCache
* Items evicted from over-full LRUCache in Release were not being demoted to SecondaryCache, and still aren't to minimize unit test churn. Ideally they would be demoted, but it's an exceptional case so not a big deal.
* Use CreateStandalone for cache reservations (save unnecessary hash table operations). Not a big deal, but worthy cleanup.
* Somehow I got the contract for SecondaryCache::Insert wrong in #10945. (Doesn't take ownership!) That API comment needs to be fixed, but didn't want to mingle that in here.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11301

Test Plan:
## Unit tests
Generally updated to include HCC in SecondaryCache tests, though HyperClockCache has some different, less strict behaviors that leads to some tests not really being set up to work with it. Some of the tests remain disabled with it, but I think we have good coverage without them.

## Crash/stress test
Updated to use the new combination.

## Performance
First, let's check for regression on caches without secondary cache configured. Adding support for the eviction callback is likely to have a tiny effect, but it shouldn't be worrisome. LRUCache could benefit slightly from less logic around SecondaryCache handling. We can test with cache_bench default settings, built with DEBUG_LEVEL=0 and PORTABLE=0.

```
(while :; do base/cache_bench --cache_type=hyper_clock_cache | grep Rough; done) | awk '{ sum += $9; count++; print $0; print "Average: " int(sum / count) }'
```

**Before** this and #11299 (which could also have a small effect), running for about an hour, before & after running concurrently for each cache type:
HyperClockCache: 3168662 (average parallel ops/sec)
LRUCache: 2940127

**After** this and #11299, running for about an hour:
HyperClockCache: 3164862 (average parallel ops/sec) (0.12% slower)
LRUCache: 2940928 (0.03% faster)

This is an acceptable difference IMHO.

Next, let's consider essentially the worst case of new CPU overhead affecting overall performance. MultiGet uses the async lookup interface regardless of whether SecondaryCache or folly are used. We can configure a benchmark where all block cache queries are for data blocks, and all are hits.

Create DB and test (before and after tests running simultaneously):
```
TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=fillrandom -num=30000000 -disable_wal=1 -bloom_bits=16
TEST_TMPDIR=/dev/shm base/db_bench -benchmarks=multireadrandom[-X30] -readonly -multiread_batched -batch_size=32 -num=30000000 -bloom_bits=16 -cache_size=6789000000 -duration 20 -threads=16
```

**Before**:
multireadrandom [AVG    30 runs] : 3444202 (± 57049) ops/sec;  240.9 (± 4.0) MB/sec
multireadrandom [MEDIAN 30 runs] : 3514443 ops/sec;  245.8 MB/sec
**After**:
multireadrandom [AVG    30 runs] : 3291022 (± 58851) ops/sec;  230.2 (± 4.1) MB/sec
multireadrandom [MEDIAN 30 runs] : 3366179 ops/sec;  235.4 MB/sec

So that's roughly a 3% regression, on kind of a *worst case* test of MultiGet CPU. Similar story with HyperClockCache:

**Before**:
multireadrandom [AVG    30 runs] : 3933777 (± 41840) ops/sec;  275.1 (± 2.9) MB/sec
multireadrandom [MEDIAN 30 runs] : 3970667 ops/sec;  277.7 MB/sec
**After**:
multireadrandom [AVG    30 runs] : 3755338 (± 30391) ops/sec;  262.6 (± 2.1) MB/sec
multireadrandom [MEDIAN 30 runs] : 3785696 ops/sec;  264.8 MB/sec

Roughly a 4-5% regression. Not ideal, but not the whole story, fortunately.

Let's also look at Get() in db_bench:

```
TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=readrandom[-X30] -readonly -num=30000000 -bloom_bits=16 -cache_size=6789000000 -duration 20 -threads=16
```

**Before**:
readrandom [AVG    30 runs] : 2198685 (± 13412) ops/sec;  153.8 (± 0.9) MB/sec
readrandom [MEDIAN 30 runs] : 2209498 ops/sec;  154.5 MB/sec
**After**:
readrandom [AVG    30 runs] : 2292814 (± 43508) ops/sec;  160.3 (± 3.0) MB/sec
readrandom [MEDIAN 30 runs] : 2365181 ops/sec;  165.4 MB/sec

That's showing roughly a 4% improvement, perhaps because of the secondary cache code that is no longer part of LRUCache. But weirdly, HyperClockCache is also showing 2-3% improvement:

**Before**:
readrandom [AVG    30 runs] : 2272333 (± 9992) ops/sec;  158.9 (± 0.7) MB/sec
readrandom [MEDIAN 30 runs] : 2273239 ops/sec;  159.0 MB/sec
**After**:
readrandom [AVG    30 runs] : 2332407 (± 11252) ops/sec;  163.1 (± 0.8) MB/sec
readrandom [MEDIAN 30 runs] : 2335329 ops/sec;  163.3 MB/sec

Reviewed By: ltamasi

Differential Revision: D44177044

Pulled By: pdillinger

fbshipit-source-id: e808e48ff3fe2f792a79841ba617be98e48689f5
2023-03-17 20:23:49 -07:00

2863 lines
97 KiB
C++

// Copyright (c) 2011-present, Facebook, Inc. All rights reserved.
// This source code is licensed under both the GPLv2 (found in the
// COPYING file in the root directory) and Apache 2.0 License
// (found in the LICENSE.Apache file in the root directory).
//
// Copyright (c) 2011 The LevelDB Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file. See the AUTHORS file for names of contributors.
#include "db/db_test_util.h"
#include "file/sst_file_manager_impl.h"
#include "port/stack_trace.h"
#include "rocksdb/io_status.h"
#include "rocksdb/sst_file_manager.h"
#include "test_util/sync_point.h"
#include "util/random.h"
#include "utilities/fault_injection_env.h"
#include "utilities/fault_injection_fs.h"
namespace ROCKSDB_NAMESPACE {
class DBErrorHandlingFSTest : public DBTestBase {
public:
DBErrorHandlingFSTest()
: DBTestBase("db_error_handling_fs_test", /*env_do_fsync=*/true) {
fault_fs_.reset(new FaultInjectionTestFS(env_->GetFileSystem()));
fault_env_.reset(new CompositeEnvWrapper(env_, fault_fs_));
}
std::string GetManifestNameFromLiveFiles() {
std::vector<std::string> live_files;
uint64_t manifest_size;
Status s = dbfull()->GetLiveFiles(live_files, &manifest_size, false);
if (!s.ok()) {
return "";
}
for (auto& file : live_files) {
uint64_t num = 0;
FileType type;
if (ParseFileName(file, &num, &type) && type == kDescriptorFile) {
return file;
}
}
return "";
}
std::shared_ptr<FaultInjectionTestFS> fault_fs_;
std::unique_ptr<Env> fault_env_;
};
class ErrorHandlerFSListener : public EventListener {
public:
ErrorHandlerFSListener()
: mutex_(),
cv_(&mutex_),
no_auto_recovery_(false),
recovery_complete_(false),
file_creation_started_(false),
override_bg_error_(false),
file_count_(0),
fault_fs_(nullptr) {}
~ErrorHandlerFSListener() {
file_creation_error_.PermitUncheckedError();
bg_error_.PermitUncheckedError();
new_bg_error_.PermitUncheckedError();
}
void OnTableFileCreationStarted(
const TableFileCreationBriefInfo& /*ti*/) override {
InstrumentedMutexLock l(&mutex_);
file_creation_started_ = true;
if (file_count_ > 0) {
if (--file_count_ == 0) {
fault_fs_->SetFilesystemActive(false, file_creation_error_);
file_creation_error_ = IOStatus::OK();
}
}
cv_.SignalAll();
}
void OnErrorRecoveryBegin(BackgroundErrorReason /*reason*/, Status bg_error,
bool* auto_recovery) override {
bg_error.PermitUncheckedError();
if (*auto_recovery && no_auto_recovery_) {
*auto_recovery = false;
}
}
void OnErrorRecoveryEnd(const BackgroundErrorRecoveryInfo& info) override {
InstrumentedMutexLock l(&mutex_);
recovery_complete_ = true;
cv_.SignalAll();
new_bg_error_ = info.new_bg_error;
}
bool WaitForRecovery(uint64_t /*abs_time_us*/) {
InstrumentedMutexLock l(&mutex_);
while (!recovery_complete_) {
cv_.Wait(/*abs_time_us*/);
}
if (recovery_complete_) {
recovery_complete_ = false;
return true;
}
return false;
}
void WaitForTableFileCreationStarted(uint64_t /*abs_time_us*/) {
InstrumentedMutexLock l(&mutex_);
while (!file_creation_started_) {
cv_.Wait(/*abs_time_us*/);
}
file_creation_started_ = false;
}
void OnBackgroundError(BackgroundErrorReason /*reason*/,
Status* bg_error) override {
if (override_bg_error_) {
*bg_error = bg_error_;
override_bg_error_ = false;
}
}
void EnableAutoRecovery(bool enable = true) { no_auto_recovery_ = !enable; }
void OverrideBGError(Status bg_err) {
bg_error_ = bg_err;
override_bg_error_ = true;
}
void InjectFileCreationError(FaultInjectionTestFS* fs, int file_count,
IOStatus io_s) {
fault_fs_ = fs;
file_count_ = file_count;
file_creation_error_ = io_s;
}
Status new_bg_error() { return new_bg_error_; }
private:
InstrumentedMutex mutex_;
InstrumentedCondVar cv_;
bool no_auto_recovery_;
bool recovery_complete_;
bool file_creation_started_;
bool override_bg_error_;
int file_count_;
IOStatus file_creation_error_;
Status bg_error_;
Status new_bg_error_;
FaultInjectionTestFS* fault_fs_;
};
TEST_F(DBErrorHandlingFSTest, FLushWriteError) {
std::shared_ptr<ErrorHandlerFSListener> listener(
new ErrorHandlerFSListener());
Options options = GetDefaultOptions();
options.env = fault_env_.get();
options.create_if_missing = true;
options.listeners.emplace_back(listener);
options.statistics = CreateDBStatistics();
Status s;
listener->EnableAutoRecovery(false);
DestroyAndReopen(options);
ASSERT_OK(Put(Key(0), "val"));
SyncPoint::GetInstance()->SetCallBack("FlushJob::Start", [&](void*) {
fault_fs_->SetFilesystemActive(false, IOStatus::NoSpace("Out of space"));
});
SyncPoint::GetInstance()->EnableProcessing();
s = Flush();
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kHardError);
SyncPoint::GetInstance()->DisableProcessing();
fault_fs_->SetFilesystemActive(true);
s = dbfull()->Resume();
ASSERT_OK(s);
ASSERT_EQ(1, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_BG_ERROR_COUNT));
ASSERT_EQ(1, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_BG_IO_ERROR_COUNT));
ASSERT_EQ(0, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_BG_RETRYABLE_IO_ERROR_COUNT));
ASSERT_EQ(0, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_AUTORESUME_COUNT));
ASSERT_EQ(0, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_AUTORESUME_RETRY_TOTAL_COUNT));
ASSERT_EQ(0, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_AUTORESUME_SUCCESS_COUNT));
Reopen(options);
ASSERT_EQ("val", Get(Key(0)));
Destroy(options);
}
// All the NoSpace IOError will be handled as the regular BG Error no matter the
// retryable flag is set of not. So the auto resume for retryable IO Error will
// not be triggered. Also, it is mapped as hard error.
TEST_F(DBErrorHandlingFSTest, FLushWriteNoSpaceError) {
std::shared_ptr<ErrorHandlerFSListener> listener(
new ErrorHandlerFSListener());
Options options = GetDefaultOptions();
options.env = fault_env_.get();
options.create_if_missing = true;
options.listeners.emplace_back(listener);
options.max_bgerror_resume_count = 2;
options.bgerror_resume_retry_interval = 100000; // 0.1 second
options.statistics = CreateDBStatistics();
Status s;
listener->EnableAutoRecovery(false);
DestroyAndReopen(options);
IOStatus error_msg = IOStatus::NoSpace("Retryable IO Error");
error_msg.SetRetryable(true);
ASSERT_OK(Put(Key(1), "val1"));
SyncPoint::GetInstance()->SetCallBack(
"BuildTable:BeforeFinishBuildTable",
[&](void*) { fault_fs_->SetFilesystemActive(false, error_msg); });
SyncPoint::GetInstance()->EnableProcessing();
s = Flush();
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kHardError);
SyncPoint::GetInstance()->DisableProcessing();
fault_fs_->SetFilesystemActive(true);
s = dbfull()->Resume();
ASSERT_OK(s);
ASSERT_EQ(1, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_BG_ERROR_COUNT));
ASSERT_EQ(1, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_BG_IO_ERROR_COUNT));
ASSERT_EQ(0, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_BG_RETRYABLE_IO_ERROR_COUNT));
ASSERT_EQ(0, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_AUTORESUME_COUNT));
ASSERT_EQ(0, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_AUTORESUME_RETRY_TOTAL_COUNT));
ASSERT_EQ(0, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_AUTORESUME_SUCCESS_COUNT));
Destroy(options);
}
TEST_F(DBErrorHandlingFSTest, FLushWriteRetryableError) {
std::shared_ptr<ErrorHandlerFSListener> listener(
new ErrorHandlerFSListener());
Options options = GetDefaultOptions();
options.env = fault_env_.get();
options.create_if_missing = true;
options.listeners.emplace_back(listener);
options.max_bgerror_resume_count = 0;
options.statistics = CreateDBStatistics();
Status s;
listener->EnableAutoRecovery(false);
DestroyAndReopen(options);
IOStatus error_msg = IOStatus::IOError("Retryable IO Error");
error_msg.SetRetryable(true);
ASSERT_OK(Put(Key(1), "val1"));
SyncPoint::GetInstance()->SetCallBack(
"BuildTable:BeforeFinishBuildTable",
[&](void*) { fault_fs_->SetFilesystemActive(false, error_msg); });
SyncPoint::GetInstance()->EnableProcessing();
s = Flush();
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kSoftError);
SyncPoint::GetInstance()->DisableProcessing();
fault_fs_->SetFilesystemActive(true);
s = dbfull()->Resume();
ASSERT_OK(s);
ASSERT_EQ(1, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_BG_ERROR_COUNT));
ASSERT_EQ(1, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_BG_IO_ERROR_COUNT));
ASSERT_EQ(1, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_BG_RETRYABLE_IO_ERROR_COUNT));
ASSERT_EQ(0, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_AUTORESUME_COUNT));
ASSERT_EQ(0, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_AUTORESUME_RETRY_TOTAL_COUNT));
ASSERT_EQ(0, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_AUTORESUME_SUCCESS_COUNT));
Reopen(options);
ASSERT_EQ("val1", Get(Key(1)));
ASSERT_OK(Put(Key(2), "val2"));
SyncPoint::GetInstance()->SetCallBack(
"BuildTable:BeforeSyncTable",
[&](void*) { fault_fs_->SetFilesystemActive(false, error_msg); });
SyncPoint::GetInstance()->EnableProcessing();
s = Flush();
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kSoftError);
SyncPoint::GetInstance()->DisableProcessing();
fault_fs_->SetFilesystemActive(true);
s = dbfull()->Resume();
ASSERT_OK(s);
Reopen(options);
ASSERT_EQ("val2", Get(Key(2)));
ASSERT_OK(Put(Key(3), "val3"));
SyncPoint::GetInstance()->SetCallBack(
"BuildTable:BeforeCloseTableFile",
[&](void*) { fault_fs_->SetFilesystemActive(false, error_msg); });
SyncPoint::GetInstance()->EnableProcessing();
s = Flush();
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kSoftError);
SyncPoint::GetInstance()->DisableProcessing();
fault_fs_->SetFilesystemActive(true);
s = dbfull()->Resume();
ASSERT_OK(s);
Reopen(options);
ASSERT_EQ("val3", Get(Key(3)));
Destroy(options);
}
TEST_F(DBErrorHandlingFSTest, FLushWriteFileScopeError) {
std::shared_ptr<ErrorHandlerFSListener> listener(
new ErrorHandlerFSListener());
Options options = GetDefaultOptions();
options.env = fault_env_.get();
options.create_if_missing = true;
options.listeners.emplace_back(listener);
options.max_bgerror_resume_count = 0;
Status s;
listener->EnableAutoRecovery(false);
DestroyAndReopen(options);
IOStatus error_msg = IOStatus::IOError("File Scope Data Loss Error");
error_msg.SetDataLoss(true);
error_msg.SetScope(
ROCKSDB_NAMESPACE::IOStatus::IOErrorScope::kIOErrorScopeFile);
error_msg.SetRetryable(false);
ASSERT_OK(Put(Key(1), "val1"));
SyncPoint::GetInstance()->SetCallBack(
"BuildTable:BeforeFinishBuildTable",
[&](void*) { fault_fs_->SetFilesystemActive(false, error_msg); });
SyncPoint::GetInstance()->EnableProcessing();
s = Flush();
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kSoftError);
SyncPoint::GetInstance()->DisableProcessing();
fault_fs_->SetFilesystemActive(true);
s = dbfull()->Resume();
ASSERT_OK(s);
Reopen(options);
ASSERT_EQ("val1", Get(Key(1)));
ASSERT_OK(Put(Key(2), "val2"));
SyncPoint::GetInstance()->SetCallBack(
"BuildTable:BeforeSyncTable",
[&](void*) { fault_fs_->SetFilesystemActive(false, error_msg); });
SyncPoint::GetInstance()->EnableProcessing();
s = Flush();
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kSoftError);
SyncPoint::GetInstance()->DisableProcessing();
fault_fs_->SetFilesystemActive(true);
s = dbfull()->Resume();
ASSERT_OK(s);
Reopen(options);
ASSERT_EQ("val2", Get(Key(2)));
ASSERT_OK(Put(Key(3), "val3"));
SyncPoint::GetInstance()->SetCallBack(
"BuildTable:BeforeCloseTableFile",
[&](void*) { fault_fs_->SetFilesystemActive(false, error_msg); });
SyncPoint::GetInstance()->EnableProcessing();
s = Flush();
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kSoftError);
SyncPoint::GetInstance()->DisableProcessing();
fault_fs_->SetFilesystemActive(true);
s = dbfull()->Resume();
ASSERT_OK(s);
Reopen(options);
ASSERT_EQ("val3", Get(Key(3)));
// not file scope, but retyrable set
error_msg.SetDataLoss(false);
error_msg.SetScope(
ROCKSDB_NAMESPACE::IOStatus::IOErrorScope::kIOErrorScopeFileSystem);
error_msg.SetRetryable(true);
ASSERT_OK(Put(Key(3), "val3"));
SyncPoint::GetInstance()->SetCallBack(
"BuildTable:BeforeCloseTableFile",
[&](void*) { fault_fs_->SetFilesystemActive(false, error_msg); });
SyncPoint::GetInstance()->EnableProcessing();
s = Flush();
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kSoftError);
SyncPoint::GetInstance()->DisableProcessing();
fault_fs_->SetFilesystemActive(true);
s = dbfull()->Resume();
ASSERT_OK(s);
Reopen(options);
ASSERT_EQ("val3", Get(Key(3)));
Destroy(options);
}
TEST_F(DBErrorHandlingFSTest, FLushWALWriteRetryableError) {
std::shared_ptr<ErrorHandlerFSListener> listener(
new ErrorHandlerFSListener());
Options options = GetDefaultOptions();
options.env = fault_env_.get();
options.create_if_missing = true;
options.listeners.emplace_back(listener);
options.max_bgerror_resume_count = 0;
Status s;
IOStatus error_msg = IOStatus::IOError("Retryable IO Error");
error_msg.SetRetryable(true);
listener->EnableAutoRecovery(false);
SyncPoint::GetInstance()->SetCallBack(
"DBImpl::SyncClosedLogs:Start",
[&](void*) { fault_fs_->SetFilesystemActive(false, error_msg); });
SyncPoint::GetInstance()->EnableProcessing();
CreateAndReopenWithCF({"pikachu, sdfsdfsdf"}, options);
WriteOptions wo = WriteOptions();
wo.disableWAL = false;
ASSERT_OK(Put(Key(1), "val1", wo));
s = Flush();
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kHardError);
SyncPoint::GetInstance()->DisableProcessing();
fault_fs_->SetFilesystemActive(true);
auto cfh = dbfull()->GetColumnFamilyHandle(1);
s = dbfull()->DropColumnFamily(cfh);
s = dbfull()->Resume();
ASSERT_OK(s);
ASSERT_EQ("val1", Get(Key(1)));
ASSERT_OK(Put(Key(3), "val3", wo));
ASSERT_EQ("val3", Get(Key(3)));
s = Flush();
ASSERT_OK(s);
ASSERT_EQ("val3", Get(Key(3)));
Destroy(options);
}
TEST_F(DBErrorHandlingFSTest, FLushWALAtomicWriteRetryableError) {
std::shared_ptr<ErrorHandlerFSListener> listener(
new ErrorHandlerFSListener());
Options options = GetDefaultOptions();
options.env = fault_env_.get();
options.create_if_missing = true;
options.listeners.emplace_back(listener);
options.max_bgerror_resume_count = 0;
options.atomic_flush = true;
Status s;
IOStatus error_msg = IOStatus::IOError("Retryable IO Error");
error_msg.SetRetryable(true);
listener->EnableAutoRecovery(false);
SyncPoint::GetInstance()->SetCallBack(
"DBImpl::SyncClosedLogs:Start",
[&](void*) { fault_fs_->SetFilesystemActive(false, error_msg); });
SyncPoint::GetInstance()->EnableProcessing();
CreateAndReopenWithCF({"pikachu, sdfsdfsdf"}, options);
WriteOptions wo = WriteOptions();
wo.disableWAL = false;
ASSERT_OK(Put(Key(1), "val1", wo));
s = Flush();
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kHardError);
SyncPoint::GetInstance()->DisableProcessing();
fault_fs_->SetFilesystemActive(true);
auto cfh = dbfull()->GetColumnFamilyHandle(1);
s = dbfull()->DropColumnFamily(cfh);
s = dbfull()->Resume();
ASSERT_OK(s);
ASSERT_EQ("val1", Get(Key(1)));
ASSERT_OK(Put(Key(3), "val3", wo));
ASSERT_EQ("val3", Get(Key(3)));
s = Flush();
ASSERT_OK(s);
ASSERT_EQ("val3", Get(Key(3)));
Destroy(options);
}
// The flush error is injected before we finish the table build
TEST_F(DBErrorHandlingFSTest, FLushWritNoWALRetryableError1) {
std::shared_ptr<ErrorHandlerFSListener> listener(
new ErrorHandlerFSListener());
Options options = GetDefaultOptions();
options.env = fault_env_.get();
options.create_if_missing = true;
options.listeners.emplace_back(listener);
options.max_bgerror_resume_count = 0;
options.statistics = CreateDBStatistics();
Status s;
listener->EnableAutoRecovery(false);
DestroyAndReopen(options);
IOStatus error_msg = IOStatus::IOError("Retryable IO Error");
error_msg.SetRetryable(true);
WriteOptions wo = WriteOptions();
wo.disableWAL = true;
ASSERT_OK(Put(Key(1), "val1", wo));
SyncPoint::GetInstance()->SetCallBack(
"BuildTable:BeforeFinishBuildTable",
[&](void*) { fault_fs_->SetFilesystemActive(false, error_msg); });
SyncPoint::GetInstance()->EnableProcessing();
s = Flush();
ASSERT_OK(Put(Key(2), "val2", wo));
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kSoftError);
ASSERT_EQ("val2", Get(Key(2)));
SyncPoint::GetInstance()->DisableProcessing();
fault_fs_->SetFilesystemActive(true);
s = dbfull()->Resume();
ASSERT_OK(s);
ASSERT_EQ("val1", Get(Key(1)));
ASSERT_EQ("val2", Get(Key(2)));
ASSERT_OK(Put(Key(3), "val3", wo));
ASSERT_EQ("val3", Get(Key(3)));
s = Flush();
ASSERT_OK(s);
ASSERT_EQ("val3", Get(Key(3)));
ASSERT_EQ(1, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_BG_ERROR_COUNT));
ASSERT_EQ(1, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_BG_IO_ERROR_COUNT));
ASSERT_EQ(1, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_BG_RETRYABLE_IO_ERROR_COUNT));
ASSERT_EQ(0, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_AUTORESUME_COUNT));
ASSERT_EQ(0, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_AUTORESUME_RETRY_TOTAL_COUNT));
ASSERT_EQ(0, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_AUTORESUME_SUCCESS_COUNT));
Destroy(options);
}
// The retryable IO error is injected before we sync table
TEST_F(DBErrorHandlingFSTest, FLushWriteNoWALRetryableError2) {
std::shared_ptr<ErrorHandlerFSListener> listener(
new ErrorHandlerFSListener());
Options options = GetDefaultOptions();
options.env = fault_env_.get();
options.create_if_missing = true;
options.listeners.emplace_back(listener);
options.max_bgerror_resume_count = 0;
Status s;
listener->EnableAutoRecovery(false);
DestroyAndReopen(options);
IOStatus error_msg = IOStatus::IOError("Retryable IO Error");
error_msg.SetRetryable(true);
WriteOptions wo = WriteOptions();
wo.disableWAL = true;
ASSERT_OK(Put(Key(1), "val1", wo));
SyncPoint::GetInstance()->SetCallBack(
"BuildTable:BeforeSyncTable",
[&](void*) { fault_fs_->SetFilesystemActive(false, error_msg); });
SyncPoint::GetInstance()->EnableProcessing();
s = Flush();
ASSERT_OK(Put(Key(2), "val2", wo));
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kSoftError);
ASSERT_EQ("val2", Get(Key(2)));
SyncPoint::GetInstance()->DisableProcessing();
fault_fs_->SetFilesystemActive(true);
s = dbfull()->Resume();
ASSERT_OK(s);
ASSERT_EQ("val1", Get(Key(1)));
ASSERT_EQ("val2", Get(Key(2)));
ASSERT_OK(Put(Key(3), "val3", wo));
ASSERT_EQ("val3", Get(Key(3)));
s = Flush();
ASSERT_OK(s);
ASSERT_EQ("val3", Get(Key(3)));
Destroy(options);
}
// The retryable IO error is injected before we close the table file
TEST_F(DBErrorHandlingFSTest, FLushWriteNoWALRetryableError3) {
std::shared_ptr<ErrorHandlerFSListener> listener(
new ErrorHandlerFSListener());
Options options = GetDefaultOptions();
options.env = fault_env_.get();
options.create_if_missing = true;
options.listeners.emplace_back(listener);
options.max_bgerror_resume_count = 0;
Status s;
listener->EnableAutoRecovery(false);
DestroyAndReopen(options);
IOStatus error_msg = IOStatus::IOError("Retryable IO Error");
error_msg.SetRetryable(true);
WriteOptions wo = WriteOptions();
wo.disableWAL = true;
ASSERT_OK(Put(Key(1), "val1", wo));
SyncPoint::GetInstance()->SetCallBack(
"BuildTable:BeforeCloseTableFile",
[&](void*) { fault_fs_->SetFilesystemActive(false, error_msg); });
SyncPoint::GetInstance()->EnableProcessing();
s = Flush();
ASSERT_OK(Put(Key(2), "val2", wo));
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kSoftError);
ASSERT_EQ("val2", Get(Key(2)));
SyncPoint::GetInstance()->DisableProcessing();
fault_fs_->SetFilesystemActive(true);
s = dbfull()->Resume();
ASSERT_OK(s);
ASSERT_EQ("val1", Get(Key(1)));
ASSERT_EQ("val2", Get(Key(2)));
ASSERT_OK(Put(Key(3), "val3", wo));
ASSERT_EQ("val3", Get(Key(3)));
s = Flush();
ASSERT_OK(s);
ASSERT_EQ("val3", Get(Key(3)));
Destroy(options);
}
TEST_F(DBErrorHandlingFSTest, ManifestWriteError) {
std::shared_ptr<ErrorHandlerFSListener> listener(
new ErrorHandlerFSListener());
Options options = GetDefaultOptions();
options.env = fault_env_.get();
options.create_if_missing = true;
options.listeners.emplace_back(listener);
Status s;
std::string old_manifest;
std::string new_manifest;
listener->EnableAutoRecovery(false);
DestroyAndReopen(options);
old_manifest = GetManifestNameFromLiveFiles();
ASSERT_OK(Put(Key(0), "val"));
ASSERT_OK(Flush());
ASSERT_OK(Put(Key(1), "val"));
SyncPoint::GetInstance()->SetCallBack(
"VersionSet::LogAndApply:WriteManifest", [&](void*) {
fault_fs_->SetFilesystemActive(false,
IOStatus::NoSpace("Out of space"));
});
SyncPoint::GetInstance()->EnableProcessing();
s = Flush();
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kHardError);
SyncPoint::GetInstance()->ClearAllCallBacks();
SyncPoint::GetInstance()->DisableProcessing();
fault_fs_->SetFilesystemActive(true);
s = dbfull()->Resume();
ASSERT_OK(s);
new_manifest = GetManifestNameFromLiveFiles();
ASSERT_NE(new_manifest, old_manifest);
Reopen(options);
ASSERT_EQ("val", Get(Key(0)));
ASSERT_EQ("val", Get(Key(1)));
Close();
}
TEST_F(DBErrorHandlingFSTest, ManifestWriteRetryableError) {
std::shared_ptr<ErrorHandlerFSListener> listener(
new ErrorHandlerFSListener());
Options options = GetDefaultOptions();
options.env = fault_env_.get();
options.create_if_missing = true;
options.listeners.emplace_back(listener);
options.max_bgerror_resume_count = 0;
Status s;
std::string old_manifest;
std::string new_manifest;
listener->EnableAutoRecovery(false);
DestroyAndReopen(options);
old_manifest = GetManifestNameFromLiveFiles();
IOStatus error_msg = IOStatus::IOError("Retryable IO Error");
error_msg.SetRetryable(true);
ASSERT_OK(Put(Key(0), "val"));
ASSERT_OK(Flush());
ASSERT_OK(Put(Key(1), "val"));
SyncPoint::GetInstance()->SetCallBack(
"VersionSet::LogAndApply:WriteManifest",
[&](void*) { fault_fs_->SetFilesystemActive(false, error_msg); });
SyncPoint::GetInstance()->EnableProcessing();
s = Flush();
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kSoftError);
SyncPoint::GetInstance()->ClearAllCallBacks();
SyncPoint::GetInstance()->DisableProcessing();
fault_fs_->SetFilesystemActive(true);
s = dbfull()->Resume();
ASSERT_OK(s);
new_manifest = GetManifestNameFromLiveFiles();
ASSERT_NE(new_manifest, old_manifest);
Reopen(options);
ASSERT_EQ("val", Get(Key(0)));
ASSERT_EQ("val", Get(Key(1)));
Close();
}
TEST_F(DBErrorHandlingFSTest, ManifestWriteFileScopeError) {
std::shared_ptr<ErrorHandlerFSListener> listener(
new ErrorHandlerFSListener());
Options options = GetDefaultOptions();
options.env = fault_env_.get();
options.create_if_missing = true;
options.listeners.emplace_back(listener);
options.max_bgerror_resume_count = 0;
Status s;
std::string old_manifest;
std::string new_manifest;
listener->EnableAutoRecovery(false);
DestroyAndReopen(options);
old_manifest = GetManifestNameFromLiveFiles();
IOStatus error_msg = IOStatus::IOError("File Scope Data Loss Error");
error_msg.SetDataLoss(true);
error_msg.SetScope(
ROCKSDB_NAMESPACE::IOStatus::IOErrorScope::kIOErrorScopeFile);
error_msg.SetRetryable(false);
ASSERT_OK(Put(Key(0), "val"));
ASSERT_OK(Flush());
ASSERT_OK(Put(Key(1), "val"));
SyncPoint::GetInstance()->SetCallBack(
"VersionSet::LogAndApply:WriteManifest",
[&](void*) { fault_fs_->SetFilesystemActive(false, error_msg); });
SyncPoint::GetInstance()->EnableProcessing();
s = Flush();
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kSoftError);
SyncPoint::GetInstance()->ClearAllCallBacks();
SyncPoint::GetInstance()->DisableProcessing();
fault_fs_->SetFilesystemActive(true);
s = dbfull()->Resume();
ASSERT_OK(s);
new_manifest = GetManifestNameFromLiveFiles();
ASSERT_NE(new_manifest, old_manifest);
Reopen(options);
ASSERT_EQ("val", Get(Key(0)));
ASSERT_EQ("val", Get(Key(1)));
Close();
}
TEST_F(DBErrorHandlingFSTest, ManifestWriteNoWALRetryableError) {
std::shared_ptr<ErrorHandlerFSListener> listener(
new ErrorHandlerFSListener());
Options options = GetDefaultOptions();
options.env = fault_env_.get();
options.create_if_missing = true;
options.listeners.emplace_back(listener);
options.max_bgerror_resume_count = 0;
Status s;
std::string old_manifest;
std::string new_manifest;
listener->EnableAutoRecovery(false);
DestroyAndReopen(options);
old_manifest = GetManifestNameFromLiveFiles();
IOStatus error_msg = IOStatus::IOError("Retryable IO Error");
error_msg.SetRetryable(true);
WriteOptions wo = WriteOptions();
wo.disableWAL = true;
ASSERT_OK(Put(Key(0), "val", wo));
ASSERT_OK(Flush());
ASSERT_OK(Put(Key(1), "val", wo));
SyncPoint::GetInstance()->SetCallBack(
"VersionSet::LogAndApply:WriteManifest",
[&](void*) { fault_fs_->SetFilesystemActive(false, error_msg); });
SyncPoint::GetInstance()->EnableProcessing();
s = Flush();
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kSoftError);
SyncPoint::GetInstance()->ClearAllCallBacks();
SyncPoint::GetInstance()->DisableProcessing();
fault_fs_->SetFilesystemActive(true);
s = dbfull()->Resume();
ASSERT_OK(s);
new_manifest = GetManifestNameFromLiveFiles();
ASSERT_NE(new_manifest, old_manifest);
Reopen(options);
ASSERT_EQ("val", Get(Key(0)));
ASSERT_EQ("val", Get(Key(1)));
Close();
}
TEST_F(DBErrorHandlingFSTest, DoubleManifestWriteError) {
std::shared_ptr<ErrorHandlerFSListener> listener(
new ErrorHandlerFSListener());
Options options = GetDefaultOptions();
options.env = fault_env_.get();
options.create_if_missing = true;
options.listeners.emplace_back(listener);
Status s;
std::string old_manifest;
std::string new_manifest;
listener->EnableAutoRecovery(false);
DestroyAndReopen(options);
old_manifest = GetManifestNameFromLiveFiles();
ASSERT_OK(Put(Key(0), "val"));
ASSERT_OK(Flush());
ASSERT_OK(Put(Key(1), "val"));
SyncPoint::GetInstance()->SetCallBack(
"VersionSet::LogAndApply:WriteManifest", [&](void*) {
fault_fs_->SetFilesystemActive(false,
IOStatus::NoSpace("Out of space"));
});
SyncPoint::GetInstance()->EnableProcessing();
s = Flush();
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kHardError);
fault_fs_->SetFilesystemActive(true);
// This Resume() will attempt to create a new manifest file and fail again
s = dbfull()->Resume();
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kHardError);
fault_fs_->SetFilesystemActive(true);
SyncPoint::GetInstance()->ClearAllCallBacks();
SyncPoint::GetInstance()->DisableProcessing();
// A successful Resume() will create a new manifest file
s = dbfull()->Resume();
ASSERT_OK(s);
new_manifest = GetManifestNameFromLiveFiles();
ASSERT_NE(new_manifest, old_manifest);
Reopen(options);
ASSERT_EQ("val", Get(Key(0)));
ASSERT_EQ("val", Get(Key(1)));
Close();
}
TEST_F(DBErrorHandlingFSTest, CompactionManifestWriteError) {
if (mem_env_ != nullptr) {
ROCKSDB_GTEST_SKIP("Test requires non-mock environment");
return;
}
std::shared_ptr<ErrorHandlerFSListener> listener(
new ErrorHandlerFSListener());
Options options = GetDefaultOptions();
options.env = fault_env_.get();
options.create_if_missing = true;
options.level0_file_num_compaction_trigger = 2;
options.listeners.emplace_back(listener);
Status s;
std::string old_manifest;
std::string new_manifest;
std::atomic<bool> fail_manifest(false);
DestroyAndReopen(options);
old_manifest = GetManifestNameFromLiveFiles();
ASSERT_OK(Put(Key(0), "val"));
ASSERT_OK(Put(Key(2), "val"));
s = Flush();
ASSERT_OK(s);
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->LoadDependency(
// Wait for flush of 2nd L0 file before starting compaction
{{"DBImpl::FlushMemTable:FlushMemTableFinished",
"BackgroundCallCompaction:0"},
// Wait for compaction to detect manifest write error
{"BackgroundCallCompaction:1", "CompactionManifestWriteError:0"},
// Make compaction thread wait for error to be cleared
{"CompactionManifestWriteError:1",
"DBImpl::BackgroundCallCompaction:FoundObsoleteFiles"},
// Wait for DB instance to clear bg_error before calling
// TEST_WaitForCompact
{"SstFileManagerImpl::ErrorCleared", "CompactionManifestWriteError:2"}});
// trigger manifest write failure in compaction thread
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->SetCallBack(
"BackgroundCallCompaction:0", [&](void*) { fail_manifest.store(true); });
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->SetCallBack(
"VersionSet::LogAndApply:WriteManifest", [&](void*) {
if (fail_manifest.load()) {
fault_fs_->SetFilesystemActive(false,
IOStatus::NoSpace("Out of space"));
}
});
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->EnableProcessing();
ASSERT_OK(Put(Key(1), "val"));
// This Flush will trigger a compaction, which will fail when appending to
// the manifest
s = Flush();
ASSERT_OK(s);
TEST_SYNC_POINT("CompactionManifestWriteError:0");
// Clear all errors so when the compaction is retried, it will succeed
fault_fs_->SetFilesystemActive(true);
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->ClearAllCallBacks();
TEST_SYNC_POINT("CompactionManifestWriteError:1");
TEST_SYNC_POINT("CompactionManifestWriteError:2");
s = dbfull()->TEST_WaitForCompact();
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->DisableProcessing();
ASSERT_OK(s);
new_manifest = GetManifestNameFromLiveFiles();
ASSERT_NE(new_manifest, old_manifest);
Reopen(options);
ASSERT_EQ("val", Get(Key(0)));
ASSERT_EQ("val", Get(Key(1)));
ASSERT_EQ("val", Get(Key(2)));
Close();
}
TEST_F(DBErrorHandlingFSTest, CompactionManifestWriteRetryableError) {
std::shared_ptr<ErrorHandlerFSListener> listener(
new ErrorHandlerFSListener());
Options options = GetDefaultOptions();
options.env = fault_env_.get();
options.create_if_missing = true;
options.level0_file_num_compaction_trigger = 2;
options.listeners.emplace_back(listener);
options.max_bgerror_resume_count = 0;
Status s;
std::string old_manifest;
std::string new_manifest;
std::atomic<bool> fail_manifest(false);
DestroyAndReopen(options);
old_manifest = GetManifestNameFromLiveFiles();
IOStatus error_msg = IOStatus::IOError("Retryable IO Error");
error_msg.SetRetryable(true);
ASSERT_OK(Put(Key(0), "val"));
ASSERT_OK(Put(Key(2), "val"));
s = Flush();
ASSERT_OK(s);
listener->OverrideBGError(Status(error_msg, Status::Severity::kHardError));
listener->EnableAutoRecovery(false);
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->LoadDependency(
// Wait for flush of 2nd L0 file before starting compaction
{{"DBImpl::FlushMemTable:FlushMemTableFinished",
"BackgroundCallCompaction:0"},
// Wait for compaction to detect manifest write error
{"BackgroundCallCompaction:1", "CompactionManifestWriteError:0"},
// Make compaction thread wait for error to be cleared
{"CompactionManifestWriteError:1",
"DBImpl::BackgroundCallCompaction:FoundObsoleteFiles"}});
// trigger manifest write failure in compaction thread
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->SetCallBack(
"BackgroundCallCompaction:0", [&](void*) { fail_manifest.store(true); });
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->SetCallBack(
"VersionSet::LogAndApply:WriteManifest", [&](void*) {
if (fail_manifest.load()) {
fault_fs_->SetFilesystemActive(false, error_msg);
}
});
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->EnableProcessing();
ASSERT_OK(Put(Key(1), "val"));
s = Flush();
ASSERT_OK(s);
TEST_SYNC_POINT("CompactionManifestWriteError:0");
TEST_SYNC_POINT("CompactionManifestWriteError:1");
s = dbfull()->TEST_WaitForCompact();
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kHardError);
fault_fs_->SetFilesystemActive(true);
SyncPoint::GetInstance()->ClearAllCallBacks();
SyncPoint::GetInstance()->DisableProcessing();
s = dbfull()->Resume();
ASSERT_OK(s);
new_manifest = GetManifestNameFromLiveFiles();
ASSERT_NE(new_manifest, old_manifest);
Reopen(options);
ASSERT_EQ("val", Get(Key(0)));
ASSERT_EQ("val", Get(Key(1)));
ASSERT_EQ("val", Get(Key(2)));
Close();
}
TEST_F(DBErrorHandlingFSTest, CompactionWriteError) {
std::shared_ptr<ErrorHandlerFSListener> listener(
new ErrorHandlerFSListener());
Options options = GetDefaultOptions();
options.env = fault_env_.get();
options.create_if_missing = true;
options.level0_file_num_compaction_trigger = 2;
options.listeners.emplace_back(listener);
Status s;
DestroyAndReopen(options);
ASSERT_OK(Put(Key(0), "va;"));
ASSERT_OK(Put(Key(2), "va;"));
s = Flush();
ASSERT_OK(s);
listener->OverrideBGError(
Status(Status::NoSpace(), Status::Severity::kHardError));
listener->EnableAutoRecovery(false);
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->LoadDependency(
{{"DBImpl::FlushMemTable:FlushMemTableFinished",
"BackgroundCallCompaction:0"}});
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->SetCallBack(
"BackgroundCallCompaction:0", [&](void*) {
fault_fs_->SetFilesystemActive(false,
IOStatus::NoSpace("Out of space"));
});
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->EnableProcessing();
ASSERT_OK(Put(Key(1), "val"));
s = Flush();
ASSERT_OK(s);
s = dbfull()->TEST_WaitForCompact();
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kHardError);
fault_fs_->SetFilesystemActive(true);
s = dbfull()->Resume();
ASSERT_OK(s);
Destroy(options);
}
TEST_F(DBErrorHandlingFSTest, DISABLED_CompactionWriteRetryableError) {
std::shared_ptr<ErrorHandlerFSListener> listener(
new ErrorHandlerFSListener());
Options options = GetDefaultOptions();
options.env = fault_env_.get();
options.create_if_missing = true;
options.level0_file_num_compaction_trigger = 2;
options.listeners.emplace_back(listener);
options.max_bgerror_resume_count = 0;
Status s;
DestroyAndReopen(options);
IOStatus error_msg = IOStatus::IOError("Retryable IO Error");
error_msg.SetRetryable(true);
ASSERT_OK(Put(Key(0), "va;"));
ASSERT_OK(Put(Key(2), "va;"));
s = Flush();
ASSERT_OK(s);
listener->OverrideBGError(Status(error_msg, Status::Severity::kHardError));
listener->EnableAutoRecovery(false);
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->LoadDependency(
{{"DBImpl::FlushMemTable:FlushMemTableFinished",
"BackgroundCallCompaction:0"}});
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->SetCallBack(
"CompactionJob::OpenCompactionOutputFile",
[&](void*) { fault_fs_->SetFilesystemActive(false, error_msg); });
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->SetCallBack(
"DBImpl::BackgroundCompaction:Finish",
[&](void*) { CancelAllBackgroundWork(dbfull()); });
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->EnableProcessing();
ASSERT_OK(Put(Key(1), "val"));
s = Flush();
ASSERT_OK(s);
s = dbfull()->TEST_GetBGError();
ASSERT_OK(s);
fault_fs_->SetFilesystemActive(true);
SyncPoint::GetInstance()->ClearAllCallBacks();
SyncPoint::GetInstance()->DisableProcessing();
s = dbfull()->Resume();
ASSERT_OK(s);
Destroy(options);
}
TEST_F(DBErrorHandlingFSTest, DISABLED_CompactionWriteFileScopeError) {
std::shared_ptr<ErrorHandlerFSListener> listener(
new ErrorHandlerFSListener());
Options options = GetDefaultOptions();
options.env = fault_env_.get();
options.create_if_missing = true;
options.level0_file_num_compaction_trigger = 2;
options.listeners.emplace_back(listener);
options.max_bgerror_resume_count = 0;
Status s;
DestroyAndReopen(options);
IOStatus error_msg = IOStatus::IOError("File Scope Data Loss Error");
error_msg.SetDataLoss(true);
error_msg.SetScope(
ROCKSDB_NAMESPACE::IOStatus::IOErrorScope::kIOErrorScopeFile);
error_msg.SetRetryable(false);
ASSERT_OK(Put(Key(0), "va;"));
ASSERT_OK(Put(Key(2), "va;"));
s = Flush();
ASSERT_OK(s);
listener->OverrideBGError(Status(error_msg, Status::Severity::kHardError));
listener->EnableAutoRecovery(false);
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->LoadDependency(
{{"DBImpl::FlushMemTable:FlushMemTableFinished",
"BackgroundCallCompaction:0"}});
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->SetCallBack(
"CompactionJob::OpenCompactionOutputFile",
[&](void*) { fault_fs_->SetFilesystemActive(false, error_msg); });
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->SetCallBack(
"DBImpl::BackgroundCompaction:Finish",
[&](void*) { CancelAllBackgroundWork(dbfull()); });
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->EnableProcessing();
ASSERT_OK(Put(Key(1), "val"));
s = Flush();
ASSERT_OK(s);
s = dbfull()->TEST_GetBGError();
ASSERT_OK(s);
fault_fs_->SetFilesystemActive(true);
SyncPoint::GetInstance()->ClearAllCallBacks();
SyncPoint::GetInstance()->DisableProcessing();
s = dbfull()->Resume();
ASSERT_OK(s);
Destroy(options);
}
TEST_F(DBErrorHandlingFSTest, CorruptionError) {
Options options = GetDefaultOptions();
options.env = fault_env_.get();
options.create_if_missing = true;
options.level0_file_num_compaction_trigger = 2;
Status s;
DestroyAndReopen(options);
ASSERT_OK(Put(Key(0), "va;"));
ASSERT_OK(Put(Key(2), "va;"));
s = Flush();
ASSERT_OK(s);
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->LoadDependency(
{{"DBImpl::FlushMemTable:FlushMemTableFinished",
"BackgroundCallCompaction:0"}});
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->SetCallBack(
"BackgroundCallCompaction:0", [&](void*) {
fault_fs_->SetFilesystemActive(false,
IOStatus::Corruption("Corruption"));
});
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->EnableProcessing();
ASSERT_OK(Put(Key(1), "val"));
s = Flush();
ASSERT_OK(s);
s = dbfull()->TEST_WaitForCompact();
ASSERT_EQ(s.severity(),
ROCKSDB_NAMESPACE::Status::Severity::kUnrecoverableError);
fault_fs_->SetFilesystemActive(true);
s = dbfull()->Resume();
ASSERT_NOK(s);
Destroy(options);
}
TEST_F(DBErrorHandlingFSTest, AutoRecoverFlushError) {
if (mem_env_ != nullptr) {
ROCKSDB_GTEST_SKIP("Test requires non-mock environment");
return;
}
std::shared_ptr<ErrorHandlerFSListener> listener(
new ErrorHandlerFSListener());
Options options = GetDefaultOptions();
options.env = fault_env_.get();
options.create_if_missing = true;
options.listeners.emplace_back(listener);
options.statistics = CreateDBStatistics();
Status s;
listener->EnableAutoRecovery();
DestroyAndReopen(options);
ASSERT_OK(Put(Key(0), "val"));
SyncPoint::GetInstance()->SetCallBack("FlushJob::Start", [&](void*) {
fault_fs_->SetFilesystemActive(false, IOStatus::NoSpace("Out of space"));
});
SyncPoint::GetInstance()->EnableProcessing();
s = Flush();
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kHardError);
SyncPoint::GetInstance()->DisableProcessing();
fault_fs_->SetFilesystemActive(true);
ASSERT_EQ(listener->WaitForRecovery(5000000), true);
s = Put(Key(1), "val");
ASSERT_OK(s);
ASSERT_EQ(1, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_BG_ERROR_COUNT));
ASSERT_EQ(1, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_BG_IO_ERROR_COUNT));
ASSERT_EQ(0, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_BG_RETRYABLE_IO_ERROR_COUNT));
ASSERT_EQ(0, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_AUTORESUME_COUNT));
ASSERT_EQ(0, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_AUTORESUME_RETRY_TOTAL_COUNT));
ASSERT_EQ(0, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_AUTORESUME_SUCCESS_COUNT));
Reopen(options);
ASSERT_EQ("val", Get(Key(0)));
ASSERT_EQ("val", Get(Key(1)));
Destroy(options);
}
TEST_F(DBErrorHandlingFSTest, FailRecoverFlushError) {
std::shared_ptr<ErrorHandlerFSListener> listener(
new ErrorHandlerFSListener());
Options options = GetDefaultOptions();
options.env = fault_env_.get();
options.create_if_missing = true;
options.listeners.emplace_back(listener);
Status s;
listener->EnableAutoRecovery();
DestroyAndReopen(options);
ASSERT_OK(Put(Key(0), "val"));
SyncPoint::GetInstance()->SetCallBack("FlushJob::Start", [&](void*) {
fault_fs_->SetFilesystemActive(false, IOStatus::NoSpace("Out of space"));
});
SyncPoint::GetInstance()->EnableProcessing();
s = Flush();
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kHardError);
// We should be able to shutdown the database while auto recovery is going
// on in the background
Close();
DestroyDB(dbname_, options).PermitUncheckedError();
}
TEST_F(DBErrorHandlingFSTest, WALWriteError) {
if (mem_env_ != nullptr) {
ROCKSDB_GTEST_SKIP("Test requires non-mock environment");
return;
}
std::shared_ptr<ErrorHandlerFSListener> listener(
new ErrorHandlerFSListener());
Options options = GetDefaultOptions();
options.env = fault_env_.get();
options.create_if_missing = true;
options.writable_file_max_buffer_size = 32768;
options.listeners.emplace_back(listener);
Status s;
Random rnd(301);
listener->EnableAutoRecovery();
DestroyAndReopen(options);
{
WriteBatch batch;
for (auto i = 0; i < 100; ++i) {
ASSERT_OK(batch.Put(Key(i), rnd.RandomString(1024)));
}
WriteOptions wopts;
wopts.sync = true;
ASSERT_OK(dbfull()->Write(wopts, &batch));
};
{
WriteBatch batch;
int write_error = 0;
for (auto i = 100; i < 199; ++i) {
ASSERT_OK(batch.Put(Key(i), rnd.RandomString(1024)));
}
SyncPoint::GetInstance()->SetCallBack(
"WritableFileWriter::Append:BeforePrepareWrite", [&](void*) {
write_error++;
if (write_error > 2) {
fault_fs_->SetFilesystemActive(false,
IOStatus::NoSpace("Out of space"));
}
});
SyncPoint::GetInstance()->EnableProcessing();
WriteOptions wopts;
wopts.sync = true;
s = dbfull()->Write(wopts, &batch);
ASSERT_EQ(s, s.NoSpace());
}
SyncPoint::GetInstance()->DisableProcessing();
// `ClearAllCallBacks()` is needed in addition to `DisableProcessing()` to
// drain all callbacks. Otherwise, a pending callback in the background
// could re-disable `fault_fs_` after we enable it below.
SyncPoint::GetInstance()->ClearAllCallBacks();
fault_fs_->SetFilesystemActive(true);
ASSERT_EQ(listener->WaitForRecovery(5000000), true);
for (auto i = 0; i < 199; ++i) {
if (i < 100) {
ASSERT_NE(Get(Key(i)), "NOT_FOUND");
} else {
ASSERT_EQ(Get(Key(i)), "NOT_FOUND");
}
}
Reopen(options);
for (auto i = 0; i < 199; ++i) {
if (i < 100) {
ASSERT_NE(Get(Key(i)), "NOT_FOUND");
} else {
ASSERT_EQ(Get(Key(i)), "NOT_FOUND");
}
}
Close();
}
TEST_F(DBErrorHandlingFSTest, WALWriteRetryableError) {
std::shared_ptr<ErrorHandlerFSListener> listener(
new ErrorHandlerFSListener());
Options options = GetDefaultOptions();
options.env = fault_env_.get();
options.create_if_missing = true;
options.writable_file_max_buffer_size = 32768;
options.listeners.emplace_back(listener);
options.paranoid_checks = true;
options.max_bgerror_resume_count = 0;
Random rnd(301);
DestroyAndReopen(options);
IOStatus error_msg = IOStatus::IOError("Retryable IO Error");
error_msg.SetRetryable(true);
// For the first batch, write is successful, require sync
{
WriteBatch batch;
for (auto i = 0; i < 100; ++i) {
ASSERT_OK(batch.Put(Key(i), rnd.RandomString(1024)));
}
WriteOptions wopts;
wopts.sync = true;
ASSERT_OK(dbfull()->Write(wopts, &batch));
};
// For the second batch, the first 2 file Append are successful, then the
// following Append fails due to file system retryable IOError.
{
WriteBatch batch;
int write_error = 0;
for (auto i = 100; i < 200; ++i) {
ASSERT_OK(batch.Put(Key(i), rnd.RandomString(1024)));
}
SyncPoint::GetInstance()->SetCallBack(
"WritableFileWriter::Append:BeforePrepareWrite", [&](void*) {
write_error++;
if (write_error > 2) {
fault_fs_->SetFilesystemActive(false, error_msg);
}
});
SyncPoint::GetInstance()->EnableProcessing();
WriteOptions wopts;
wopts.sync = true;
Status s = dbfull()->Write(wopts, &batch);
ASSERT_TRUE(s.IsIOError());
}
fault_fs_->SetFilesystemActive(true);
SyncPoint::GetInstance()->ClearAllCallBacks();
SyncPoint::GetInstance()->DisableProcessing();
// Data in corrupted WAL are not stored
for (auto i = 0; i < 199; ++i) {
if (i < 100) {
ASSERT_NE(Get(Key(i)), "NOT_FOUND");
} else {
ASSERT_EQ(Get(Key(i)), "NOT_FOUND");
}
}
// Resume and write a new batch, should be in the WAL
ASSERT_OK(dbfull()->Resume());
{
WriteBatch batch;
for (auto i = 200; i < 300; ++i) {
ASSERT_OK(batch.Put(Key(i), rnd.RandomString(1024)));
}
WriteOptions wopts;
wopts.sync = true;
ASSERT_OK(dbfull()->Write(wopts, &batch));
};
Reopen(options);
for (auto i = 0; i < 300; ++i) {
if (i < 100 || i >= 200) {
ASSERT_NE(Get(Key(i)), "NOT_FOUND");
} else {
ASSERT_EQ(Get(Key(i)), "NOT_FOUND");
}
}
Close();
}
TEST_F(DBErrorHandlingFSTest, MultiCFWALWriteError) {
if (mem_env_ != nullptr) {
ROCKSDB_GTEST_SKIP("Test requires non-mock environment");
return;
}
std::shared_ptr<ErrorHandlerFSListener> listener(
new ErrorHandlerFSListener());
Options options = GetDefaultOptions();
options.env = fault_env_.get();
options.create_if_missing = true;
options.writable_file_max_buffer_size = 32768;
options.listeners.emplace_back(listener);
Random rnd(301);
listener->EnableAutoRecovery();
CreateAndReopenWithCF({"one", "two", "three"}, options);
{
WriteBatch batch;
for (auto i = 1; i < 4; ++i) {
for (auto j = 0; j < 100; ++j) {
ASSERT_OK(batch.Put(handles_[i], Key(j), rnd.RandomString(1024)));
}
}
WriteOptions wopts;
wopts.sync = true;
ASSERT_OK(dbfull()->Write(wopts, &batch));
};
{
WriteBatch batch;
int write_error = 0;
// Write to one CF
for (auto i = 100; i < 199; ++i) {
ASSERT_OK(batch.Put(handles_[2], Key(i), rnd.RandomString(1024)));
}
SyncPoint::GetInstance()->SetCallBack(
"WritableFileWriter::Append:BeforePrepareWrite", [&](void*) {
write_error++;
if (write_error > 2) {
fault_fs_->SetFilesystemActive(false,
IOStatus::NoSpace("Out of space"));
}
});
SyncPoint::GetInstance()->EnableProcessing();
WriteOptions wopts;
wopts.sync = true;
Status s = dbfull()->Write(wopts, &batch);
ASSERT_TRUE(s.IsNoSpace());
}
SyncPoint::GetInstance()->DisableProcessing();
// `ClearAllCallBacks()` is needed in addition to `DisableProcessing()` to
// drain all callbacks. Otherwise, a pending callback in the background
// could re-disable `fault_fs_` after we enable it below.
SyncPoint::GetInstance()->ClearAllCallBacks();
fault_fs_->SetFilesystemActive(true);
ASSERT_EQ(listener->WaitForRecovery(5000000), true);
for (auto i = 1; i < 4; ++i) {
// Every CF should have been flushed
ASSERT_EQ(NumTableFilesAtLevel(0, i), 1);
}
for (auto i = 1; i < 4; ++i) {
for (auto j = 0; j < 199; ++j) {
if (j < 100) {
ASSERT_NE(Get(i, Key(j)), "NOT_FOUND");
} else {
ASSERT_EQ(Get(i, Key(j)), "NOT_FOUND");
}
}
}
ReopenWithColumnFamilies({"default", "one", "two", "three"}, options);
for (auto i = 1; i < 4; ++i) {
for (auto j = 0; j < 199; ++j) {
if (j < 100) {
ASSERT_NE(Get(i, Key(j)), "NOT_FOUND");
} else {
ASSERT_EQ(Get(i, Key(j)), "NOT_FOUND");
}
}
}
Close();
}
TEST_F(DBErrorHandlingFSTest, MultiDBCompactionError) {
if (mem_env_ != nullptr) {
ROCKSDB_GTEST_SKIP("Test requires non-mock environment");
return;
}
FaultInjectionTestEnv* def_env = new FaultInjectionTestEnv(env_);
std::vector<std::unique_ptr<Env>> fault_envs;
std::vector<FaultInjectionTestFS*> fault_fs;
std::vector<Options> options;
std::vector<std::shared_ptr<ErrorHandlerFSListener>> listener;
std::vector<DB*> db;
std::shared_ptr<SstFileManager> sfm(NewSstFileManager(def_env));
int kNumDbInstances = 3;
Random rnd(301);
for (auto i = 0; i < kNumDbInstances; ++i) {
listener.emplace_back(new ErrorHandlerFSListener());
options.emplace_back(GetDefaultOptions());
fault_fs.emplace_back(new FaultInjectionTestFS(env_->GetFileSystem()));
std::shared_ptr<FileSystem> fs(fault_fs.back());
fault_envs.emplace_back(new CompositeEnvWrapper(def_env, fs));
options[i].env = fault_envs.back().get();
options[i].create_if_missing = true;
options[i].level0_file_num_compaction_trigger = 2;
options[i].writable_file_max_buffer_size = 32768;
options[i].listeners.emplace_back(listener[i]);
options[i].sst_file_manager = sfm;
DB* dbptr;
char buf[16];
listener[i]->EnableAutoRecovery();
// Setup for returning error for the 3rd SST, which would be level 1
listener[i]->InjectFileCreationError(fault_fs[i], 3,
IOStatus::NoSpace("Out of space"));
snprintf(buf, sizeof(buf), "_%d", i);
ASSERT_OK(DestroyDB(dbname_ + std::string(buf), options[i]));
ASSERT_OK(DB::Open(options[i], dbname_ + std::string(buf), &dbptr));
db.emplace_back(dbptr);
}
for (auto i = 0; i < kNumDbInstances; ++i) {
WriteBatch batch;
for (auto j = 0; j <= 100; ++j) {
ASSERT_OK(batch.Put(Key(j), rnd.RandomString(1024)));
}
WriteOptions wopts;
wopts.sync = true;
ASSERT_OK(db[i]->Write(wopts, &batch));
ASSERT_OK(db[i]->Flush(FlushOptions()));
}
def_env->SetFilesystemActive(false, Status::NoSpace("Out of space"));
for (auto i = 0; i < kNumDbInstances; ++i) {
WriteBatch batch;
// Write to one CF
for (auto j = 100; j < 199; ++j) {
ASSERT_OK(batch.Put(Key(j), rnd.RandomString(1024)));
}
WriteOptions wopts;
wopts.sync = true;
ASSERT_OK(db[i]->Write(wopts, &batch));
ASSERT_OK(db[i]->Flush(FlushOptions()));
}
for (auto i = 0; i < kNumDbInstances; ++i) {
Status s = static_cast<DBImpl*>(db[i])->TEST_WaitForCompact(true);
ASSERT_EQ(s.severity(), Status::Severity::kSoftError);
fault_fs[i]->SetFilesystemActive(true);
}
def_env->SetFilesystemActive(true);
for (auto i = 0; i < kNumDbInstances; ++i) {
std::string prop;
ASSERT_EQ(listener[i]->WaitForRecovery(5000000), true);
ASSERT_OK(static_cast<DBImpl*>(db[i])->TEST_WaitForCompact(true));
EXPECT_TRUE(db[i]->GetProperty(
"rocksdb.num-files-at-level" + std::to_string(0), &prop));
EXPECT_EQ(atoi(prop.c_str()), 0);
EXPECT_TRUE(db[i]->GetProperty(
"rocksdb.num-files-at-level" + std::to_string(1), &prop));
EXPECT_EQ(atoi(prop.c_str()), 1);
}
SstFileManagerImpl* sfmImpl =
static_cast_with_check<SstFileManagerImpl>(sfm.get());
sfmImpl->Close();
for (auto i = 0; i < kNumDbInstances; ++i) {
char buf[16];
snprintf(buf, sizeof(buf), "_%d", i);
delete db[i];
fault_fs[i]->SetFilesystemActive(true);
if (getenv("KEEP_DB")) {
printf("DB is still at %s%s\n", dbname_.c_str(), buf);
} else {
ASSERT_OK(DestroyDB(dbname_ + std::string(buf), options[i]));
}
}
options.clear();
sfm.reset();
delete def_env;
}
TEST_F(DBErrorHandlingFSTest, MultiDBVariousErrors) {
if (mem_env_ != nullptr) {
ROCKSDB_GTEST_SKIP("Test requires non-mock environment");
return;
}
FaultInjectionTestEnv* def_env = new FaultInjectionTestEnv(env_);
std::vector<std::unique_ptr<Env>> fault_envs;
std::vector<FaultInjectionTestFS*> fault_fs;
std::vector<Options> options;
std::vector<std::shared_ptr<ErrorHandlerFSListener>> listener;
std::vector<DB*> db;
std::shared_ptr<SstFileManager> sfm(NewSstFileManager(def_env));
int kNumDbInstances = 3;
Random rnd(301);
for (auto i = 0; i < kNumDbInstances; ++i) {
listener.emplace_back(new ErrorHandlerFSListener());
options.emplace_back(GetDefaultOptions());
fault_fs.emplace_back(new FaultInjectionTestFS(env_->GetFileSystem()));
std::shared_ptr<FileSystem> fs(fault_fs.back());
fault_envs.emplace_back(new CompositeEnvWrapper(def_env, fs));
options[i].env = fault_envs.back().get();
options[i].create_if_missing = true;
options[i].level0_file_num_compaction_trigger = 2;
options[i].writable_file_max_buffer_size = 32768;
options[i].listeners.emplace_back(listener[i]);
options[i].sst_file_manager = sfm;
DB* dbptr;
char buf[16];
listener[i]->EnableAutoRecovery();
switch (i) {
case 0:
// Setup for returning error for the 3rd SST, which would be level 1
listener[i]->InjectFileCreationError(fault_fs[i], 3,
IOStatus::NoSpace("Out of space"));
break;
case 1:
// Setup for returning error after the 1st SST, which would result
// in a hard error
listener[i]->InjectFileCreationError(fault_fs[i], 2,
IOStatus::NoSpace("Out of space"));
break;
default:
break;
}
snprintf(buf, sizeof(buf), "_%d", i);
ASSERT_OK(DestroyDB(dbname_ + std::string(buf), options[i]));
ASSERT_OK(DB::Open(options[i], dbname_ + std::string(buf), &dbptr));
db.emplace_back(dbptr);
}
for (auto i = 0; i < kNumDbInstances; ++i) {
WriteBatch batch;
for (auto j = 0; j <= 100; ++j) {
ASSERT_OK(batch.Put(Key(j), rnd.RandomString(1024)));
}
WriteOptions wopts;
wopts.sync = true;
ASSERT_OK(db[i]->Write(wopts, &batch));
ASSERT_OK(db[i]->Flush(FlushOptions()));
}
def_env->SetFilesystemActive(false, Status::NoSpace("Out of space"));
for (auto i = 0; i < kNumDbInstances; ++i) {
WriteBatch batch;
// Write to one CF
for (auto j = 100; j < 199; ++j) {
ASSERT_OK(batch.Put(Key(j), rnd.RandomString(1024)));
}
WriteOptions wopts;
wopts.sync = true;
ASSERT_OK(db[i]->Write(wopts, &batch));
if (i != 1) {
ASSERT_OK(db[i]->Flush(FlushOptions()));
} else {
ASSERT_TRUE(db[i]->Flush(FlushOptions()).IsNoSpace());
}
}
for (auto i = 0; i < kNumDbInstances; ++i) {
Status s = static_cast<DBImpl*>(db[i])->TEST_WaitForCompact(true);
switch (i) {
case 0:
ASSERT_EQ(s.severity(), Status::Severity::kSoftError);
break;
case 1:
ASSERT_EQ(s.severity(), Status::Severity::kHardError);
break;
case 2:
ASSERT_OK(s);
break;
}
fault_fs[i]->SetFilesystemActive(true);
}
def_env->SetFilesystemActive(true);
for (auto i = 0; i < kNumDbInstances; ++i) {
std::string prop;
if (i < 2) {
ASSERT_EQ(listener[i]->WaitForRecovery(5000000), true);
}
if (i == 1) {
ASSERT_OK(static_cast<DBImpl*>(db[i])->TEST_WaitForCompact(true));
}
EXPECT_TRUE(db[i]->GetProperty(
"rocksdb.num-files-at-level" + std::to_string(0), &prop));
EXPECT_EQ(atoi(prop.c_str()), 0);
EXPECT_TRUE(db[i]->GetProperty(
"rocksdb.num-files-at-level" + std::to_string(1), &prop));
EXPECT_EQ(atoi(prop.c_str()), 1);
}
SstFileManagerImpl* sfmImpl =
static_cast_with_check<SstFileManagerImpl>(sfm.get());
sfmImpl->Close();
for (auto i = 0; i < kNumDbInstances; ++i) {
char buf[16];
snprintf(buf, sizeof(buf), "_%d", i);
fault_fs[i]->SetFilesystemActive(true);
delete db[i];
if (getenv("KEEP_DB")) {
printf("DB is still at %s%s\n", dbname_.c_str(), buf);
} else {
EXPECT_OK(DestroyDB(dbname_ + std::string(buf), options[i]));
}
}
options.clear();
delete def_env;
}
// When Put the KV-pair, the write option is set to disable WAL.
// If retryable error happens in this condition, map the bg error
// to soft error and trigger auto resume. During auto resume, SwitchMemtable
// is disabled to avoid small SST tables. Write can still be applied before
// the bg error is cleaned unless the memtable is full.
TEST_F(DBErrorHandlingFSTest, FLushWritNoWALRetryableErrorAutoRecover1) {
// Activate the FS before the first resume
std::shared_ptr<ErrorHandlerFSListener> listener(
new ErrorHandlerFSListener());
Options options = GetDefaultOptions();
options.env = fault_env_.get();
options.create_if_missing = true;
options.listeners.emplace_back(listener);
options.max_bgerror_resume_count = 2;
options.bgerror_resume_retry_interval = 100000; // 0.1 second
options.statistics = CreateDBStatistics();
Status s;
listener->EnableAutoRecovery(false);
DestroyAndReopen(options);
IOStatus error_msg = IOStatus::IOError("Retryable IO Error");
error_msg.SetRetryable(true);
WriteOptions wo = WriteOptions();
wo.disableWAL = true;
ASSERT_OK(Put(Key(1), "val1", wo));
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->LoadDependency(
{{"RecoverFromRetryableBGIOError:LoopOut",
"FLushWritNoWALRetryableeErrorAutoRecover1:1"}});
SyncPoint::GetInstance()->SetCallBack(
"BuildTable:BeforeFinishBuildTable",
[&](void*) { fault_fs_->SetFilesystemActive(false, error_msg); });
SyncPoint::GetInstance()->EnableProcessing();
s = Flush();
ASSERT_EQ("val1", Get(Key(1)));
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kSoftError);
TEST_SYNC_POINT("FLushWritNoWALRetryableeErrorAutoRecover1:1");
ASSERT_EQ("val1", Get(Key(1)));
ASSERT_EQ("val1", Get(Key(1)));
SyncPoint::GetInstance()->DisableProcessing();
fault_fs_->SetFilesystemActive(true);
ASSERT_EQ(3, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_BG_ERROR_COUNT));
ASSERT_EQ(3, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_BG_IO_ERROR_COUNT));
ASSERT_EQ(3, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_BG_RETRYABLE_IO_ERROR_COUNT));
ASSERT_EQ(1, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_AUTORESUME_COUNT));
ASSERT_LE(0, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_AUTORESUME_RETRY_TOTAL_COUNT));
ASSERT_LE(0, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_AUTORESUME_SUCCESS_COUNT));
HistogramData autoresume_retry;
options.statistics->histogramData(ERROR_HANDLER_AUTORESUME_RETRY_COUNT,
&autoresume_retry);
ASSERT_GE(autoresume_retry.max, 0);
ASSERT_OK(Put(Key(2), "val2", wo));
s = Flush();
// Since auto resume fails, the bg error is not cleand, flush will
// return the bg_error set before.
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kSoftError);
ASSERT_EQ("val2", Get(Key(2)));
// call auto resume
ASSERT_OK(dbfull()->Resume());
ASSERT_OK(Put(Key(3), "val3", wo));
// After resume is successful, the flush should be ok.
ASSERT_OK(Flush());
ASSERT_EQ("val3", Get(Key(3)));
Destroy(options);
}
TEST_F(DBErrorHandlingFSTest, FLushWritNoWALRetryableErrorAutoRecover2) {
// Activate the FS before the first resume
std::shared_ptr<ErrorHandlerFSListener> listener(
new ErrorHandlerFSListener());
Options options = GetDefaultOptions();
options.env = fault_env_.get();
options.create_if_missing = true;
options.listeners.emplace_back(listener);
options.max_bgerror_resume_count = 2;
options.bgerror_resume_retry_interval = 100000; // 0.1 second
options.statistics = CreateDBStatistics();
Status s;
listener->EnableAutoRecovery(false);
DestroyAndReopen(options);
IOStatus error_msg = IOStatus::IOError("Retryable IO Error");
error_msg.SetRetryable(true);
WriteOptions wo = WriteOptions();
wo.disableWAL = true;
ASSERT_OK(Put(Key(1), "val1", wo));
SyncPoint::GetInstance()->SetCallBack(
"BuildTable:BeforeFinishBuildTable",
[&](void*) { fault_fs_->SetFilesystemActive(false, error_msg); });
SyncPoint::GetInstance()->EnableProcessing();
s = Flush();
ASSERT_EQ("val1", Get(Key(1)));
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kSoftError);
SyncPoint::GetInstance()->DisableProcessing();
fault_fs_->SetFilesystemActive(true);
ASSERT_EQ(listener->WaitForRecovery(5000000), true);
ASSERT_EQ("val1", Get(Key(1)));
ASSERT_EQ(1, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_BG_ERROR_COUNT));
ASSERT_EQ(1, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_BG_IO_ERROR_COUNT));
ASSERT_EQ(1, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_BG_RETRYABLE_IO_ERROR_COUNT));
ASSERT_EQ(1, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_AUTORESUME_COUNT));
ASSERT_LE(0, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_AUTORESUME_RETRY_TOTAL_COUNT));
ASSERT_LE(0, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_AUTORESUME_SUCCESS_COUNT));
HistogramData autoresume_retry;
options.statistics->histogramData(ERROR_HANDLER_AUTORESUME_RETRY_COUNT,
&autoresume_retry);
ASSERT_GE(autoresume_retry.max, 0);
ASSERT_OK(Put(Key(2), "val2", wo));
s = Flush();
// Since auto resume is successful, the bg error is cleaned, flush will
// be successful.
ASSERT_OK(s);
ASSERT_EQ("val2", Get(Key(2)));
Destroy(options);
}
// Auto resume fromt the flush retryable IO error. Activate the FS before the
// first resume. Resume is successful
TEST_F(DBErrorHandlingFSTest, FLushWritRetryableErrorAutoRecover1) {
// Activate the FS before the first resume
std::shared_ptr<ErrorHandlerFSListener> listener(
new ErrorHandlerFSListener());
Options options = GetDefaultOptions();
options.env = fault_env_.get();
options.create_if_missing = true;
options.listeners.emplace_back(listener);
options.max_bgerror_resume_count = 2;
options.bgerror_resume_retry_interval = 100000; // 0.1 second
Status s;
listener->EnableAutoRecovery(false);
DestroyAndReopen(options);
IOStatus error_msg = IOStatus::IOError("Retryable IO Error");
error_msg.SetRetryable(true);
ASSERT_OK(Put(Key(1), "val1"));
SyncPoint::GetInstance()->SetCallBack(
"BuildTable:BeforeFinishBuildTable",
[&](void*) { fault_fs_->SetFilesystemActive(false, error_msg); });
SyncPoint::GetInstance()->EnableProcessing();
s = Flush();
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kSoftError);
SyncPoint::GetInstance()->DisableProcessing();
fault_fs_->SetFilesystemActive(true);
ASSERT_EQ(listener->WaitForRecovery(5000000), true);
ASSERT_EQ("val1", Get(Key(1)));
Reopen(options);
ASSERT_EQ("val1", Get(Key(1)));
ASSERT_OK(Put(Key(2), "val2"));
ASSERT_OK(Flush());
ASSERT_EQ("val2", Get(Key(2)));
Destroy(options);
}
// Auto resume fromt the flush retryable IO error and set the retry limit count.
// Never activate the FS and auto resume should fail at the end
TEST_F(DBErrorHandlingFSTest, FLushWritRetryableErrorAutoRecover2) {
// Fail all the resume and let user to resume
std::shared_ptr<ErrorHandlerFSListener> listener(
new ErrorHandlerFSListener());
Options options = GetDefaultOptions();
options.env = fault_env_.get();
options.create_if_missing = true;
options.listeners.emplace_back(listener);
options.max_bgerror_resume_count = 2;
options.bgerror_resume_retry_interval = 100000; // 0.1 second
Status s;
listener->EnableAutoRecovery(false);
DestroyAndReopen(options);
IOStatus error_msg = IOStatus::IOError("Retryable IO Error");
error_msg.SetRetryable(true);
ASSERT_OK(Put(Key(1), "val1"));
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->LoadDependency(
{{"FLushWritRetryableeErrorAutoRecover2:0",
"RecoverFromRetryableBGIOError:BeforeStart"},
{"RecoverFromRetryableBGIOError:LoopOut",
"FLushWritRetryableeErrorAutoRecover2:1"}});
SyncPoint::GetInstance()->SetCallBack(
"BuildTable:BeforeFinishBuildTable",
[&](void*) { fault_fs_->SetFilesystemActive(false, error_msg); });
SyncPoint::GetInstance()->EnableProcessing();
s = Flush();
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kSoftError);
TEST_SYNC_POINT("FLushWritRetryableeErrorAutoRecover2:0");
TEST_SYNC_POINT("FLushWritRetryableeErrorAutoRecover2:1");
fault_fs_->SetFilesystemActive(true);
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->ClearAllCallBacks();
SyncPoint::GetInstance()->DisableProcessing();
ASSERT_EQ("val1", Get(Key(1)));
// Auto resume fails due to FS does not recover during resume. User call
// resume manually here.
s = dbfull()->Resume();
ASSERT_EQ("val1", Get(Key(1)));
ASSERT_OK(s);
ASSERT_OK(Put(Key(2), "val2"));
ASSERT_OK(Flush());
ASSERT_EQ("val2", Get(Key(2)));
Destroy(options);
}
// Auto resume fromt the flush retryable IO error and set the retry limit count.
// Fail the first resume and let the second resume be successful.
TEST_F(DBErrorHandlingFSTest, ManifestWriteRetryableErrorAutoRecover) {
// Fail the first resume and let the second resume be successful
std::shared_ptr<ErrorHandlerFSListener> listener(
new ErrorHandlerFSListener());
Options options = GetDefaultOptions();
options.env = fault_env_.get();
options.create_if_missing = true;
options.listeners.emplace_back(listener);
options.max_bgerror_resume_count = 2;
options.bgerror_resume_retry_interval = 100000; // 0.1 second
Status s;
std::string old_manifest;
std::string new_manifest;
listener->EnableAutoRecovery(false);
DestroyAndReopen(options);
old_manifest = GetManifestNameFromLiveFiles();
IOStatus error_msg = IOStatus::IOError("Retryable IO Error");
error_msg.SetRetryable(true);
ASSERT_OK(Put(Key(0), "val"));
ASSERT_OK(Flush());
ASSERT_OK(Put(Key(1), "val"));
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->LoadDependency(
{{"RecoverFromRetryableBGIOError:BeforeStart",
"ManifestWriteRetryableErrorAutoRecover:0"},
{"ManifestWriteRetryableErrorAutoRecover:1",
"RecoverFromRetryableBGIOError:BeforeWait1"},
{"RecoverFromRetryableBGIOError:RecoverSuccess",
"ManifestWriteRetryableErrorAutoRecover:2"}});
SyncPoint::GetInstance()->SetCallBack(
"VersionSet::LogAndApply:WriteManifest",
[&](void*) { fault_fs_->SetFilesystemActive(false, error_msg); });
SyncPoint::GetInstance()->EnableProcessing();
s = Flush();
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kSoftError);
TEST_SYNC_POINT("ManifestWriteRetryableErrorAutoRecover:0");
fault_fs_->SetFilesystemActive(true);
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->ClearAllCallBacks();
TEST_SYNC_POINT("ManifestWriteRetryableErrorAutoRecover:1");
TEST_SYNC_POINT("ManifestWriteRetryableErrorAutoRecover:2");
SyncPoint::GetInstance()->DisableProcessing();
new_manifest = GetManifestNameFromLiveFiles();
ASSERT_NE(new_manifest, old_manifest);
Reopen(options);
ASSERT_EQ("val", Get(Key(0)));
ASSERT_EQ("val", Get(Key(1)));
Close();
}
TEST_F(DBErrorHandlingFSTest, ManifestWriteNoWALRetryableErrorAutoRecover) {
// Fail the first resume and let the second resume be successful
std::shared_ptr<ErrorHandlerFSListener> listener(
new ErrorHandlerFSListener());
Options options = GetDefaultOptions();
options.env = fault_env_.get();
options.create_if_missing = true;
options.listeners.emplace_back(listener);
options.max_bgerror_resume_count = 2;
options.bgerror_resume_retry_interval = 100000; // 0.1 second
Status s;
std::string old_manifest;
std::string new_manifest;
listener->EnableAutoRecovery(false);
DestroyAndReopen(options);
old_manifest = GetManifestNameFromLiveFiles();
IOStatus error_msg = IOStatus::IOError("Retryable IO Error");
error_msg.SetRetryable(true);
WriteOptions wo = WriteOptions();
wo.disableWAL = true;
ASSERT_OK(Put(Key(0), "val", wo));
ASSERT_OK(Flush());
ASSERT_OK(Put(Key(1), "val", wo));
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->LoadDependency(
{{"RecoverFromRetryableBGIOError:BeforeStart",
"ManifestWriteNoWALRetryableErrorAutoRecover:0"},
{"ManifestWriteNoWALRetryableErrorAutoRecover:1",
"RecoverFromRetryableBGIOError:BeforeWait1"},
{"RecoverFromRetryableBGIOError:RecoverSuccess",
"ManifestWriteNoWALRetryableErrorAutoRecover:2"}});
SyncPoint::GetInstance()->SetCallBack(
"VersionSet::LogAndApply:WriteManifest",
[&](void*) { fault_fs_->SetFilesystemActive(false, error_msg); });
SyncPoint::GetInstance()->EnableProcessing();
s = Flush();
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kSoftError);
TEST_SYNC_POINT("ManifestWriteNoWALRetryableErrorAutoRecover:0");
fault_fs_->SetFilesystemActive(true);
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->ClearAllCallBacks();
TEST_SYNC_POINT("ManifestWriteNoWALRetryableErrorAutoRecover:1");
TEST_SYNC_POINT("ManifestWriteNoWALRetryableErrorAutoRecover:2");
SyncPoint::GetInstance()->DisableProcessing();
new_manifest = GetManifestNameFromLiveFiles();
ASSERT_NE(new_manifest, old_manifest);
Reopen(options);
ASSERT_EQ("val", Get(Key(0)));
ASSERT_EQ("val", Get(Key(1)));
Close();
}
TEST_F(DBErrorHandlingFSTest,
CompactionManifestWriteRetryableErrorAutoRecover) {
std::shared_ptr<ErrorHandlerFSListener> listener(
new ErrorHandlerFSListener());
Options options = GetDefaultOptions();
options.env = fault_env_.get();
options.create_if_missing = true;
options.level0_file_num_compaction_trigger = 2;
options.listeners.emplace_back(listener);
options.max_bgerror_resume_count = 2;
options.bgerror_resume_retry_interval = 100000; // 0.1 second
Status s;
std::string old_manifest;
std::string new_manifest;
std::atomic<bool> fail_manifest(false);
DestroyAndReopen(options);
old_manifest = GetManifestNameFromLiveFiles();
IOStatus error_msg = IOStatus::IOError("Retryable IO Error");
error_msg.SetRetryable(true);
ASSERT_OK(Put(Key(0), "val"));
ASSERT_OK(Put(Key(2), "val"));
ASSERT_OK(Flush());
listener->OverrideBGError(Status(error_msg, Status::Severity::kHardError));
listener->EnableAutoRecovery(false);
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->LoadDependency(
// Wait for flush of 2nd L0 file before starting compaction
{{"DBImpl::FlushMemTable:FlushMemTableFinished",
"BackgroundCallCompaction:0"},
// Wait for compaction to detect manifest write error
{"BackgroundCallCompaction:1", "CompactionManifestWriteErrorAR:0"},
// Make compaction thread wait for error to be cleared
{"CompactionManifestWriteErrorAR:1",
"DBImpl::BackgroundCallCompaction:FoundObsoleteFiles"},
{"CompactionManifestWriteErrorAR:2",
"RecoverFromRetryableBGIOError:BeforeStart"},
// Fail the first resume, before the wait in resume
{"RecoverFromRetryableBGIOError:BeforeResume0",
"CompactionManifestWriteErrorAR:3"},
// Activate the FS before the second resume
{"CompactionManifestWriteErrorAR:4",
"RecoverFromRetryableBGIOError:BeforeResume1"},
// Wait the auto resume be sucessful
{"RecoverFromRetryableBGIOError:RecoverSuccess",
"CompactionManifestWriteErrorAR:5"}});
// trigger manifest write failure in compaction thread
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->SetCallBack(
"BackgroundCallCompaction:0", [&](void*) { fail_manifest.store(true); });
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->SetCallBack(
"VersionSet::LogAndApply:WriteManifest", [&](void*) {
if (fail_manifest.load()) {
fault_fs_->SetFilesystemActive(false, error_msg);
}
});
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->EnableProcessing();
ASSERT_OK(Put(Key(1), "val"));
s = Flush();
ASSERT_OK(s);
TEST_SYNC_POINT("CompactionManifestWriteErrorAR:0");
TEST_SYNC_POINT("CompactionManifestWriteErrorAR:1");
s = dbfull()->TEST_WaitForCompact();
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kHardError);
TEST_SYNC_POINT("CompactionManifestWriteErrorAR:2");
TEST_SYNC_POINT("CompactionManifestWriteErrorAR:3");
fault_fs_->SetFilesystemActive(true);
SyncPoint::GetInstance()->ClearAllCallBacks();
TEST_SYNC_POINT("CompactionManifestWriteErrorAR:4");
TEST_SYNC_POINT("CompactionManifestWriteErrorAR:5");
SyncPoint::GetInstance()->DisableProcessing();
new_manifest = GetManifestNameFromLiveFiles();
ASSERT_NE(new_manifest, old_manifest);
Reopen(options);
ASSERT_EQ("val", Get(Key(0)));
ASSERT_EQ("val", Get(Key(1)));
ASSERT_EQ("val", Get(Key(2)));
Close();
}
TEST_F(DBErrorHandlingFSTest, CompactionWriteRetryableErrorAutoRecover) {
// In this test, in the first round of compaction, the FS is set to error.
// So the first compaction fails due to retryable IO error and it is mapped
// to soft error. Then, compaction is rescheduled, in the second round of
// compaction, the FS is set to active and compaction is successful, so
// the test will hit the CompactionJob::FinishCompactionOutputFile1 sync
// point.
std::shared_ptr<ErrorHandlerFSListener> listener(
new ErrorHandlerFSListener());
Options options = GetDefaultOptions();
options.env = fault_env_.get();
options.create_if_missing = true;
options.level0_file_num_compaction_trigger = 2;
options.listeners.emplace_back(listener);
Status s;
std::atomic<bool> fail_first(false);
std::atomic<bool> fail_second(true);
DestroyAndReopen(options);
IOStatus error_msg = IOStatus::IOError("Retryable IO Error");
error_msg.SetRetryable(true);
ASSERT_OK(Put(Key(0), "va;"));
ASSERT_OK(Put(Key(2), "va;"));
s = Flush();
ASSERT_OK(s);
listener->OverrideBGError(Status(error_msg, Status::Severity::kHardError));
listener->EnableAutoRecovery(false);
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->LoadDependency(
{{"DBImpl::FlushMemTable:FlushMemTableFinished",
"BackgroundCallCompaction:0"},
{"CompactionJob::FinishCompactionOutputFile1",
"CompactionWriteRetryableErrorAutoRecover0"}});
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->SetCallBack(
"DBImpl::BackgroundCompaction:Start",
[&](void*) { fault_fs_->SetFilesystemActive(true); });
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->SetCallBack(
"BackgroundCallCompaction:0", [&](void*) { fail_first.store(true); });
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->SetCallBack(
"CompactionJob::OpenCompactionOutputFile", [&](void*) {
if (fail_first.load() && fail_second.load()) {
fault_fs_->SetFilesystemActive(false, error_msg);
fail_second.store(false);
}
});
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->EnableProcessing();
ASSERT_OK(Put(Key(1), "val"));
s = Flush();
ASSERT_OK(s);
s = dbfull()->TEST_WaitForCompact();
ASSERT_OK(s);
TEST_SYNC_POINT("CompactionWriteRetryableErrorAutoRecover0");
SyncPoint::GetInstance()->ClearAllCallBacks();
SyncPoint::GetInstance()->DisableProcessing();
Destroy(options);
}
TEST_F(DBErrorHandlingFSTest, WALWriteRetryableErrorAutoRecover1) {
std::shared_ptr<ErrorHandlerFSListener> listener(
new ErrorHandlerFSListener());
Options options = GetDefaultOptions();
options.env = fault_env_.get();
options.create_if_missing = true;
options.writable_file_max_buffer_size = 32768;
options.listeners.emplace_back(listener);
options.paranoid_checks = true;
options.max_bgerror_resume_count = 2;
options.bgerror_resume_retry_interval = 100000; // 0.1 second
Status s;
Random rnd(301);
DestroyAndReopen(options);
IOStatus error_msg = IOStatus::IOError("Retryable IO Error");
error_msg.SetRetryable(true);
// For the first batch, write is successful, require sync
{
WriteBatch batch;
for (auto i = 0; i < 100; ++i) {
ASSERT_OK(batch.Put(Key(i), rnd.RandomString(1024)));
}
WriteOptions wopts;
wopts.sync = true;
ASSERT_OK(dbfull()->Write(wopts, &batch));
};
// For the second batch, the first 2 file Append are successful, then the
// following Append fails due to file system retryable IOError.
{
WriteBatch batch;
int write_error = 0;
for (auto i = 100; i < 200; ++i) {
ASSERT_OK(batch.Put(Key(i), rnd.RandomString(1024)));
}
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->LoadDependency(
{{"WALWriteErrorDone", "RecoverFromRetryableBGIOError:BeforeStart"},
{"RecoverFromRetryableBGIOError:BeforeResume0", "WALWriteError1:0"},
{"WALWriteError1:1", "RecoverFromRetryableBGIOError:BeforeResume1"},
{"RecoverFromRetryableBGIOError:RecoverSuccess", "WALWriteError1:2"}});
SyncPoint::GetInstance()->SetCallBack(
"WritableFileWriter::Append:BeforePrepareWrite", [&](void*) {
write_error++;
if (write_error > 2) {
fault_fs_->SetFilesystemActive(false, error_msg);
}
});
SyncPoint::GetInstance()->EnableProcessing();
WriteOptions wopts;
wopts.sync = true;
s = dbfull()->Write(wopts, &batch);
ASSERT_EQ(true, s.IsIOError());
TEST_SYNC_POINT("WALWriteErrorDone");
TEST_SYNC_POINT("WALWriteError1:0");
fault_fs_->SetFilesystemActive(true);
SyncPoint::GetInstance()->ClearAllCallBacks();
TEST_SYNC_POINT("WALWriteError1:1");
TEST_SYNC_POINT("WALWriteError1:2");
}
SyncPoint::GetInstance()->DisableProcessing();
// Data in corrupted WAL are not stored
for (auto i = 0; i < 199; ++i) {
if (i < 100) {
ASSERT_NE(Get(Key(i)), "NOT_FOUND");
} else {
ASSERT_EQ(Get(Key(i)), "NOT_FOUND");
}
}
// Resume and write a new batch, should be in the WAL
{
WriteBatch batch;
for (auto i = 200; i < 300; ++i) {
ASSERT_OK(batch.Put(Key(i), rnd.RandomString(1024)));
}
WriteOptions wopts;
wopts.sync = true;
ASSERT_OK(dbfull()->Write(wopts, &batch));
};
Reopen(options);
for (auto i = 0; i < 300; ++i) {
if (i < 100 || i >= 200) {
ASSERT_NE(Get(Key(i)), "NOT_FOUND");
} else {
ASSERT_EQ(Get(Key(i)), "NOT_FOUND");
}
}
Close();
}
TEST_F(DBErrorHandlingFSTest, WALWriteRetryableErrorAutoRecover2) {
// Fail the first recover and try second time.
std::shared_ptr<ErrorHandlerFSListener> listener(
new ErrorHandlerFSListener());
Options options = GetDefaultOptions();
options.env = fault_env_.get();
options.create_if_missing = true;
options.writable_file_max_buffer_size = 32768;
options.listeners.emplace_back(listener);
options.paranoid_checks = true;
options.max_bgerror_resume_count = 2;
options.bgerror_resume_retry_interval = 100000; // 0.1 second
Status s;
Random rnd(301);
DestroyAndReopen(options);
IOStatus error_msg = IOStatus::IOError("Retryable IO Error");
error_msg.SetRetryable(true);
// For the first batch, write is successful, require sync
{
WriteBatch batch;
for (auto i = 0; i < 100; ++i) {
ASSERT_OK(batch.Put(Key(i), rnd.RandomString(1024)));
}
WriteOptions wopts;
wopts.sync = true;
ASSERT_OK(dbfull()->Write(wopts, &batch));
};
// For the second batch, the first 2 file Append are successful, then the
// following Append fails due to file system retryable IOError.
{
WriteBatch batch;
int write_error = 0;
for (auto i = 100; i < 200; ++i) {
ASSERT_OK(batch.Put(Key(i), rnd.RandomString(1024)));
}
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->LoadDependency(
{{"RecoverFromRetryableBGIOError:BeforeWait0", "WALWriteError2:0"},
{"WALWriteError2:1", "RecoverFromRetryableBGIOError:BeforeWait1"},
{"RecoverFromRetryableBGIOError:RecoverSuccess", "WALWriteError2:2"}});
SyncPoint::GetInstance()->SetCallBack(
"WritableFileWriter::Append:BeforePrepareWrite", [&](void*) {
write_error++;
if (write_error > 2) {
fault_fs_->SetFilesystemActive(false, error_msg);
}
});
SyncPoint::GetInstance()->EnableProcessing();
WriteOptions wopts;
wopts.sync = true;
s = dbfull()->Write(wopts, &batch);
ASSERT_EQ(true, s.IsIOError());
TEST_SYNC_POINT("WALWriteError2:0");
fault_fs_->SetFilesystemActive(true);
SyncPoint::GetInstance()->ClearAllCallBacks();
TEST_SYNC_POINT("WALWriteError2:1");
TEST_SYNC_POINT("WALWriteError2:2");
}
SyncPoint::GetInstance()->DisableProcessing();
// Data in corrupted WAL are not stored
for (auto i = 0; i < 199; ++i) {
if (i < 100) {
ASSERT_NE(Get(Key(i)), "NOT_FOUND");
} else {
ASSERT_EQ(Get(Key(i)), "NOT_FOUND");
}
}
// Resume and write a new batch, should be in the WAL
{
WriteBatch batch;
for (auto i = 200; i < 300; ++i) {
ASSERT_OK(batch.Put(Key(i), rnd.RandomString(1024)));
}
WriteOptions wopts;
wopts.sync = true;
ASSERT_OK(dbfull()->Write(wopts, &batch));
};
Reopen(options);
for (auto i = 0; i < 300; ++i) {
if (i < 100 || i >= 200) {
ASSERT_NE(Get(Key(i)), "NOT_FOUND");
} else {
ASSERT_EQ(Get(Key(i)), "NOT_FOUND");
}
}
Close();
}
// Fail auto resume from a flush retryable error and verify that
// OnErrorRecoveryEnd listener callback is called
TEST_F(DBErrorHandlingFSTest, FLushWritRetryableErrorAbortRecovery) {
// Activate the FS before the first resume
std::shared_ptr<ErrorHandlerFSListener> listener(
new ErrorHandlerFSListener());
Options options = GetDefaultOptions();
options.env = fault_env_.get();
options.create_if_missing = true;
options.listeners.emplace_back(listener);
options.max_bgerror_resume_count = 2;
options.bgerror_resume_retry_interval = 100000; // 0.1 second
Status s;
listener->EnableAutoRecovery(false);
DestroyAndReopen(options);
IOStatus error_msg = IOStatus::IOError("Retryable IO Error");
error_msg.SetRetryable(true);
ASSERT_OK(Put(Key(1), "val1"));
SyncPoint::GetInstance()->SetCallBack(
"BuildTable:BeforeFinishBuildTable",
[&](void*) { fault_fs_->SetFilesystemActive(false, error_msg); });
SyncPoint::GetInstance()->EnableProcessing();
s = Flush();
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kSoftError);
ASSERT_EQ(listener->WaitForRecovery(5000000), true);
ASSERT_EQ(listener->new_bg_error(), Status::Aborted());
SyncPoint::GetInstance()->DisableProcessing();
fault_fs_->SetFilesystemActive(true);
Destroy(options);
}
TEST_F(DBErrorHandlingFSTest, FlushReadError) {
std::shared_ptr<ErrorHandlerFSListener> listener =
std::make_shared<ErrorHandlerFSListener>();
Options options = GetDefaultOptions();
options.env = fault_env_.get();
options.create_if_missing = true;
options.listeners.emplace_back(listener);
options.statistics = CreateDBStatistics();
Status s;
listener->EnableAutoRecovery(false);
DestroyAndReopen(options);
ASSERT_OK(Put(Key(0), "val"));
SyncPoint::GetInstance()->SetCallBack(
"BuildTable:BeforeOutputValidation", [&](void*) {
IOStatus st = IOStatus::IOError();
st.SetRetryable(true);
st.SetScope(IOStatus::IOErrorScope::kIOErrorScopeFile);
fault_fs_->SetFilesystemActive(false, st);
});
SyncPoint::GetInstance()->SetCallBack(
"BuildTable:BeforeDeleteFile",
[&](void*) { fault_fs_->SetFilesystemActive(true, IOStatus::OK()); });
SyncPoint::GetInstance()->EnableProcessing();
s = Flush();
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kSoftError);
SyncPoint::GetInstance()->DisableProcessing();
fault_fs_->SetFilesystemActive(true);
ASSERT_EQ(listener->WaitForRecovery(5000000), true);
ASSERT_EQ(1, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_BG_ERROR_COUNT));
ASSERT_EQ(1, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_BG_IO_ERROR_COUNT));
ASSERT_EQ(1, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_BG_RETRYABLE_IO_ERROR_COUNT));
ASSERT_LE(1, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_AUTORESUME_COUNT));
ASSERT_LE(0, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_AUTORESUME_RETRY_TOTAL_COUNT));
s = dbfull()->TEST_GetBGError();
ASSERT_OK(s);
Reopen(GetDefaultOptions());
ASSERT_EQ("val", Get(Key(0)));
}
TEST_F(DBErrorHandlingFSTest, AtomicFlushReadError) {
std::shared_ptr<ErrorHandlerFSListener> listener =
std::make_shared<ErrorHandlerFSListener>();
Options options = GetDefaultOptions();
options.env = fault_env_.get();
options.create_if_missing = true;
options.listeners.emplace_back(listener);
options.statistics = CreateDBStatistics();
Status s;
listener->EnableAutoRecovery(false);
options.atomic_flush = true;
CreateAndReopenWithCF({"pikachu"}, options);
ASSERT_OK(Put(0, Key(0), "val"));
ASSERT_OK(Put(1, Key(0), "val"));
SyncPoint::GetInstance()->SetCallBack(
"BuildTable:BeforeOutputValidation", [&](void*) {
IOStatus st = IOStatus::IOError();
st.SetRetryable(true);
st.SetScope(IOStatus::IOErrorScope::kIOErrorScopeFile);
fault_fs_->SetFilesystemActive(false, st);
});
SyncPoint::GetInstance()->SetCallBack(
"BuildTable:BeforeDeleteFile",
[&](void*) { fault_fs_->SetFilesystemActive(true, IOStatus::OK()); });
SyncPoint::GetInstance()->EnableProcessing();
s = Flush({0, 1});
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kSoftError);
SyncPoint::GetInstance()->DisableProcessing();
fault_fs_->SetFilesystemActive(true);
ASSERT_EQ(listener->WaitForRecovery(5000000), true);
ASSERT_EQ(1, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_BG_ERROR_COUNT));
ASSERT_EQ(1, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_BG_IO_ERROR_COUNT));
ASSERT_EQ(1, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_BG_RETRYABLE_IO_ERROR_COUNT));
ASSERT_LE(1, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_AUTORESUME_COUNT));
ASSERT_LE(0, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_AUTORESUME_RETRY_TOTAL_COUNT));
s = dbfull()->TEST_GetBGError();
ASSERT_OK(s);
TryReopenWithColumnFamilies({kDefaultColumnFamilyName, "pikachu"},
GetDefaultOptions());
ASSERT_EQ("val", Get(Key(0)));
}
TEST_F(DBErrorHandlingFSTest, AtomicFlushNoSpaceError) {
std::shared_ptr<ErrorHandlerFSListener> listener =
std::make_shared<ErrorHandlerFSListener>();
Options options = GetDefaultOptions();
options.env = fault_env_.get();
options.create_if_missing = true;
options.listeners.emplace_back(listener);
options.statistics = CreateDBStatistics();
Status s;
listener->EnableAutoRecovery(true);
options.atomic_flush = true;
CreateAndReopenWithCF({"pikachu"}, options);
ASSERT_OK(Put(0, Key(0), "val"));
ASSERT_OK(Put(1, Key(0), "val"));
SyncPoint::GetInstance()->SetCallBack("BuildTable:create_file", [&](void*) {
IOStatus st = IOStatus::NoSpace();
fault_fs_->SetFilesystemActive(false, st);
});
SyncPoint::GetInstance()->SetCallBack(
"BuildTable:BeforeDeleteFile",
[&](void*) { fault_fs_->SetFilesystemActive(true, IOStatus::OK()); });
SyncPoint::GetInstance()->EnableProcessing();
s = Flush({0, 1});
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kHardError);
SyncPoint::GetInstance()->DisableProcessing();
fault_fs_->SetFilesystemActive(true);
ASSERT_EQ(listener->WaitForRecovery(5000000), true);
ASSERT_LE(1, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_BG_ERROR_COUNT));
ASSERT_LE(1, options.statistics->getAndResetTickerCount(
ERROR_HANDLER_BG_IO_ERROR_COUNT));
s = dbfull()->TEST_GetBGError();
ASSERT_OK(s);
TryReopenWithColumnFamilies({kDefaultColumnFamilyName, "pikachu"},
GetDefaultOptions());
ASSERT_EQ("val", Get(Key(0)));
}
TEST_F(DBErrorHandlingFSTest, CompactionReadRetryableErrorAutoRecover) {
// In this test, in the first round of compaction, the FS is set to error.
// So the first compaction fails due to retryable IO error and it is mapped
// to soft error. Then, compaction is rescheduled, in the second round of
// compaction, the FS is set to active and compaction is successful, so
// the test will hit the CompactionJob::FinishCompactionOutputFile1 sync
// point.
std::shared_ptr<ErrorHandlerFSListener> listener =
std::make_shared<ErrorHandlerFSListener>();
Options options = GetDefaultOptions();
options.env = fault_env_.get();
options.create_if_missing = true;
options.level0_file_num_compaction_trigger = 2;
options.listeners.emplace_back(listener);
BlockBasedTableOptions table_options;
table_options.no_block_cache = true;
options.table_factory.reset(NewBlockBasedTableFactory(table_options));
Status s;
std::atomic<bool> fail_first(false);
std::atomic<bool> fail_second(true);
Random rnd(301);
DestroyAndReopen(options);
IOStatus error_msg = IOStatus::IOError("Retryable IO Error");
error_msg.SetRetryable(true);
for (int i = 0; i < 100; ++i) {
ASSERT_OK(Put(Key(i), rnd.RandomString(1024)));
}
s = Flush();
ASSERT_OK(s);
listener->OverrideBGError(Status(error_msg, Status::Severity::kHardError));
listener->EnableAutoRecovery(false);
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->LoadDependency(
{{"DBImpl::FlushMemTable:FlushMemTableFinished",
"BackgroundCallCompaction:0"},
{"CompactionJob::FinishCompactionOutputFile1",
"CompactionWriteRetryableErrorAutoRecover0"}});
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->SetCallBack(
"DBImpl::BackgroundCompaction:Start",
[&](void*) { fault_fs_->SetFilesystemActive(true); });
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->SetCallBack(
"BackgroundCallCompaction:0", [&](void*) { fail_first.store(true); });
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->SetCallBack(
"CompactionJob::Run():PausingManualCompaction:2", [&](void*) {
if (fail_first.load() && fail_second.load()) {
fault_fs_->SetFilesystemActive(false, error_msg);
fail_second.store(false);
}
});
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->EnableProcessing();
ASSERT_OK(Put(Key(1), "val"));
s = Flush();
ASSERT_OK(s);
s = dbfull()->TEST_WaitForCompact();
ASSERT_OK(s);
TEST_SYNC_POINT("CompactionWriteRetryableErrorAutoRecover0");
SyncPoint::GetInstance()->ClearAllCallBacks();
SyncPoint::GetInstance()->DisableProcessing();
Reopen(GetDefaultOptions());
}
class DBErrorHandlingFencingTest : public DBErrorHandlingFSTest,
public testing::WithParamInterface<bool> {};
TEST_P(DBErrorHandlingFencingTest, FLushWriteFenced) {
std::shared_ptr<ErrorHandlerFSListener> listener(
new ErrorHandlerFSListener());
Options options = GetDefaultOptions();
options.env = fault_env_.get();
options.create_if_missing = true;
options.listeners.emplace_back(listener);
options.paranoid_checks = GetParam();
Status s;
listener->EnableAutoRecovery(true);
DestroyAndReopen(options);
ASSERT_OK(Put(Key(0), "val"));
SyncPoint::GetInstance()->SetCallBack("FlushJob::Start", [&](void*) {
fault_fs_->SetFilesystemActive(false, IOStatus::IOFenced("IO fenced"));
});
SyncPoint::GetInstance()->EnableProcessing();
s = Flush();
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kFatalError);
ASSERT_TRUE(s.IsIOFenced());
SyncPoint::GetInstance()->DisableProcessing();
fault_fs_->SetFilesystemActive(true);
s = dbfull()->Resume();
ASSERT_TRUE(s.IsIOFenced());
Destroy(options);
}
TEST_P(DBErrorHandlingFencingTest, ManifestWriteFenced) {
std::shared_ptr<ErrorHandlerFSListener> listener(
new ErrorHandlerFSListener());
Options options = GetDefaultOptions();
options.env = fault_env_.get();
options.create_if_missing = true;
options.listeners.emplace_back(listener);
options.paranoid_checks = GetParam();
Status s;
std::string old_manifest;
std::string new_manifest;
listener->EnableAutoRecovery(true);
DestroyAndReopen(options);
old_manifest = GetManifestNameFromLiveFiles();
ASSERT_OK(Put(Key(0), "val"));
ASSERT_OK(Flush());
ASSERT_OK(Put(Key(1), "val"));
SyncPoint::GetInstance()->SetCallBack(
"VersionSet::LogAndApply:WriteManifest", [&](void*) {
fault_fs_->SetFilesystemActive(false, IOStatus::IOFenced("IO fenced"));
});
SyncPoint::GetInstance()->EnableProcessing();
s = Flush();
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kFatalError);
ASSERT_TRUE(s.IsIOFenced());
SyncPoint::GetInstance()->ClearAllCallBacks();
SyncPoint::GetInstance()->DisableProcessing();
fault_fs_->SetFilesystemActive(true);
s = dbfull()->Resume();
ASSERT_TRUE(s.IsIOFenced());
Close();
}
TEST_P(DBErrorHandlingFencingTest, CompactionWriteFenced) {
std::shared_ptr<ErrorHandlerFSListener> listener(
new ErrorHandlerFSListener());
Options options = GetDefaultOptions();
options.env = fault_env_.get();
options.create_if_missing = true;
options.level0_file_num_compaction_trigger = 2;
options.listeners.emplace_back(listener);
options.paranoid_checks = GetParam();
Status s;
DestroyAndReopen(options);
ASSERT_OK(Put(Key(0), "va;"));
ASSERT_OK(Put(Key(2), "va;"));
s = Flush();
ASSERT_OK(s);
listener->EnableAutoRecovery(true);
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->LoadDependency(
{{"DBImpl::FlushMemTable:FlushMemTableFinished",
"BackgroundCallCompaction:0"}});
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->SetCallBack(
"BackgroundCallCompaction:0", [&](void*) {
fault_fs_->SetFilesystemActive(false, IOStatus::IOFenced("IO fenced"));
});
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->EnableProcessing();
ASSERT_OK(Put(Key(1), "val"));
s = Flush();
ASSERT_OK(s);
s = dbfull()->TEST_WaitForCompact();
ASSERT_EQ(s.severity(), ROCKSDB_NAMESPACE::Status::Severity::kFatalError);
ASSERT_TRUE(s.IsIOFenced());
fault_fs_->SetFilesystemActive(true);
s = dbfull()->Resume();
ASSERT_TRUE(s.IsIOFenced());
Destroy(options);
}
TEST_P(DBErrorHandlingFencingTest, WALWriteFenced) {
std::shared_ptr<ErrorHandlerFSListener> listener(
new ErrorHandlerFSListener());
Options options = GetDefaultOptions();
options.env = fault_env_.get();
options.create_if_missing = true;
options.writable_file_max_buffer_size = 32768;
options.listeners.emplace_back(listener);
options.paranoid_checks = GetParam();
Status s;
Random rnd(301);
listener->EnableAutoRecovery(true);
DestroyAndReopen(options);
{
WriteBatch batch;
for (auto i = 0; i < 100; ++i) {
ASSERT_OK(batch.Put(Key(i), rnd.RandomString(1024)));
}
WriteOptions wopts;
wopts.sync = true;
ASSERT_OK(dbfull()->Write(wopts, &batch));
};
{
WriteBatch batch;
int write_error = 0;
for (auto i = 100; i < 199; ++i) {
ASSERT_OK(batch.Put(Key(i), rnd.RandomString(1024)));
}
SyncPoint::GetInstance()->SetCallBack(
"WritableFileWriter::Append:BeforePrepareWrite", [&](void*) {
write_error++;
if (write_error > 2) {
fault_fs_->SetFilesystemActive(false,
IOStatus::IOFenced("IO fenced"));
}
});
SyncPoint::GetInstance()->EnableProcessing();
WriteOptions wopts;
wopts.sync = true;
s = dbfull()->Write(wopts, &batch);
ASSERT_TRUE(s.IsIOFenced());
}
SyncPoint::GetInstance()->DisableProcessing();
fault_fs_->SetFilesystemActive(true);
{
WriteBatch batch;
for (auto i = 0; i < 100; ++i) {
ASSERT_OK(batch.Put(Key(i), rnd.RandomString(1024)));
}
WriteOptions wopts;
wopts.sync = true;
s = dbfull()->Write(wopts, &batch);
ASSERT_TRUE(s.IsIOFenced());
}
Close();
}
INSTANTIATE_TEST_CASE_P(DBErrorHandlingFSTest, DBErrorHandlingFencingTest,
::testing::Bool());
} // namespace ROCKSDB_NAMESPACE
int main(int argc, char** argv) {
ROCKSDB_NAMESPACE::port::InstallStackTraceHandler();
::testing::InitGoogleTest(&argc, argv);
return RUN_ALL_TESTS();
}