Yanqin Jin e66199d848 First step towards handling MANIFEST write error (#6949)
Summary:
This PR provides preliminary support for handling IO errors during MANIFEST writes.
File writes and syncs are not guaranteed to be atomic. If we encounter an IOError while writing to or syncing the MANIFEST file, we cannot be sure about the state of the MANIFEST file: the version edits may or may not have reached the file. During cleanup, if we delete the newly generated SST files referenced by the pending version edit(s), but those edit(s) were actually persisted in the MANIFEST, then the next recovery attempt will replay the version edit(s) and fail, because the SST files have already been deleted.
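The failure mode above can be illustrated with a small, self-contained C++ model. Everything in it (`ToyStorage`, `ToyVersionEdit`, `WriteManifestFails`, `NaiveCleanup`, `Recover`) is hypothetical and far simpler than RocksDB's real version handling; it only assumes that a failed write may or may not have persisted its bytes, and that recovery replays every edit found in the MANIFEST.

```cpp
#include <cassert>  // used by the usage assertions
#include <set>
#include <string>
#include <vector>

// Hypothetical toy model, not RocksDB code: each edit adds one SST file.
struct ToyVersionEdit {
  std::string added_sst;
};

struct ToyStorage {
  std::vector<ToyVersionEdit> manifest;  // records that actually reached the file
  std::set<std::string> sst_files;       // SST files present on disk
};

// Simulates a MANIFEST write/sync that reports an IOError. `persisted` says
// whether the bytes reached the file anyway; the caller cannot observe it.
bool WriteManifestFails(ToyStorage& s, const ToyVersionEdit& e, bool persisted) {
  if (persisted) s.manifest.push_back(e);
  return false;  // the caller only ever sees the error
}

// Naive cleanup after the error: delete the SST named by the pending edit.
void NaiveCleanup(ToyStorage& s, const ToyVersionEdit& e) {
  s.sst_files.erase(e.added_sst);
}

// Recovery replays every edit and fails if a referenced SST file is missing.
bool Recover(const ToyStorage& s) {
  for (const auto& e : s.manifest) {
    if (s.sst_files.count(e.added_sst) == 0) return false;
  }
  return true;
}
```

When the failed write did persist the edit, `NaiveCleanup` followed by `Recover` returns `false`; when it did not, recovery succeeds. Because the caller cannot tell the two cases apart, deleting the SST file is unsafe in general.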
One approach is to truncate the MANIFEST after write/sync error, so that it is safe to delete the SST files. However, file truncation may not be supported on certain file systems. Therefore, we take the following approach.
If an IOError is detected during a MANIFEST write/sync, we disable file deletions for the faulty database. Depending on whether the IOError is retryable (as indicated by the underlying file system), either RocksDB or the application can call `DB::Resume()`, or the application can simply shut down and restart. During `Resume()`, RocksDB tries to switch to a new MANIFEST and write all existing in-memory version state to the new file. If this succeeds, RocksDB may proceed. Once all recovery is complete, file deletions are re-enabled.
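A toy C++ model of this recovery flow follows. `ToyDb`, `ToyStatus`, `InjectManifestError`, and the accessors are hypothetical names for illustration only, not RocksDB APIs (the real public entry point is `DB::Resume()`). The model assumes, as simplifications, that a failed MANIFEST write leaves the in-memory version unchanged and that further MANIFEST writes are rejected until `Resume()` succeeds.

```cpp
#include <cassert>  // used by the usage assertions
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy model, not RocksDB code.
enum class ToyStatus { kOk, kIOError };

class ToyDb {
 public:
  // Test hook: makes the next MANIFEST write/sync report an IOError.
  void InjectManifestError() { fail_next_write_ = true; }

  // Appends an edit to the MANIFEST and installs it in memory only on success.
  ToyStatus LogAndApply(const std::string& edit) {
    if (bg_error_ != ToyStatus::kOk) return bg_error_;  // stopped until Resume()
    if (fail_next_write_) {
      fail_next_write_ = false;
      // MANIFEST contents are now unknown, so no file may be deleted.
      bg_error_ = ToyStatus::kIOError;
      file_deletions_enabled_ = false;
      return bg_error_;
    }
    manifest_.push_back(edit);
    versions_.push_back(edit);
    return ToyStatus::kOk;
  }

  // Rolls over to a new MANIFEST holding a snapshot of the in-memory state,
  // so the possibly partial tail of the old file is never replayed.
  ToyStatus Resume() {
    if (bg_error_ == ToyStatus::kOk) return ToyStatus::kOk;
    ++manifest_number_;
    manifest_ = versions_;
    bg_error_ = ToyStatus::kOk;
    file_deletions_enabled_ = true;  // recovery complete: deletions are safe again
    return ToyStatus::kOk;
  }

  bool file_deletions_enabled() const { return file_deletions_enabled_; }
  int manifest_number() const { return manifest_number_; }
  std::size_t manifest_size() const { return manifest_.size(); }

 private:
  std::vector<std::string> versions_;  // edits installed in memory
  std::vector<std::string> manifest_;  // records in the current MANIFEST file
  int manifest_number_ = 1;
  bool fail_next_write_ = false;
  bool file_deletions_enabled_ = true;
  ToyStatus bg_error_ = ToyStatus::kOk;
};
```

In this model, after one good edit and one failed edit, a successful `Resume()` leaves the database on MANIFEST 2 containing only the edit that was installed in memory. The SST file named by the failed edit is therefore unreferenced by the new MANIFEST, which is why re-enabling deletions at that point is safe.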
Note that multiple threads can call `LogAndApply()` at the same time, though only one of them goes through the process of writing the MANIFEST, possibly batching the version edits of the other threads. When this leading MANIFEST writer finishes, every MANIFEST-writing thread in the batch receives the same IOError. Each of them then calls `ErrorHandler::SetBGError()`, which disables file deletions.
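The batching behavior can be sketched with a single-threaded C++ model. `ToyWriter`, `ToyErrorHandler`, `CommitBatch`, and the `write_ok` flag (standing in for the outcome of the leader's single write+sync) are hypothetical; the real code also has to coordinate concurrent threads, which this sketch omits.

```cpp
#include <cassert>  // used by the usage assertions
#include <string>
#include <vector>

// Hypothetical toy model, not RocksDB code.
enum class ToyStatus { kOk, kIOError };

struct ToyWriter {
  std::string edit;
  ToyStatus status = ToyStatus::kOk;
};

struct ToyErrorHandler {
  int set_bg_error_calls = 0;
  bool file_deletions_enabled = true;

  // Every writer that saw the error reports it; disabling deletions is idempotent.
  void SetBGError(ToyStatus s) {
    if (s == ToyStatus::kOk) return;
    ++set_bg_error_calls;
    file_deletions_enabled = false;
  }
};

// The leader (batch[0]) writes all queued edits with one write+sync;
// `write_ok` is the simulated outcome of that single I/O.
void CommitBatch(std::vector<ToyWriter>& batch, bool write_ok,
                 std::vector<std::string>* manifest, ToyErrorHandler* handler) {
  const ToyStatus s = write_ok ? ToyStatus::kOk : ToyStatus::kIOError;
  if (write_ok) {
    for (const auto& w : batch) manifest->push_back(w.edit);
  }
  // On failure the toy appends nothing, but on a real file system some or all
  // of the bytes may have landed; that uncertainty is the core problem.
  for (auto& w : batch) {
    w.status = s;  // followers inherit the leader's result
    handler->SetBGError(w.status);
  }
}
```

With a failed batch of three writers, all three end up with `kIOError`, `SetBGError` is called three times, and file deletions are disabled after the first call.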

Possible future directions:
- Add an `ErrorContext` structure so that it is easier to pass more information to `ErrorHandler`. Currently, as in this PR, a new `BackgroundErrorReason` has to be added.

Test plan (dev server):
make check
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6949

Reviewed By: anand1976

Differential Revision: D22026020

Pulled By: riversand963

fbshipit-source-id: f3c68a2ef45d9b505d0d625c7c5e0c88495b91c8
2020-06-24 19:07:08 -07:00
.circleci Test CircleCI with CLANG-10 (#7025) 2020-06-24 16:22:49 -07:00
.github/workflows
buckifier
build_tools
cache
cmake
coverage
db First step towards handling MANIFEST write error (#6949) 2020-06-24 19:07:08 -07:00
db_stress_tool Minimize memory internal fragmentation for Bloom filters (#6427) 2020-06-22 13:32:07 -07:00
docs
env Test CircleCI with CLANG-10 (#7025) 2020-06-24 16:22:49 -07:00
examples
file Fix block checksum for >=4GB, refactor (#6978) 2020-06-19 16:18:24 -07:00
hdfs
include/rocksdb First step towards handling MANIFEST write error (#6949) 2020-06-24 19:07:08 -07:00
java
logging
memory
memtable
monitoring
options Test CircleCI with CLANG-10 (#7025) 2020-06-24 16:22:49 -07:00
port
table Test CircleCI with CLANG-10 (#7025) 2020-06-24 16:22:49 -07:00
test_util Remove racially charged terms "whitelist" and "blacklist" (#7008) 2020-06-19 15:27:32 -07:00
third-party
tools Test CircleCI with CLANG-10 (#7025) 2020-06-24 16:22:49 -07:00
trace_replay Fix unity build broken by #7007 (#7024) 2020-06-24 13:40:48 -07:00
util Minimize memory internal fragmentation for Bloom filters (#6427) 2020-06-22 13:32:07 -07:00
utilities Move kNoExpiration to blob_db.h (#7018) 2020-06-23 13:45:06 -07:00
.clang-format
.gitignore
.lgtm.yml
.travis.yml
.watchmanconfig
AUTHORS
CMakeLists.txt
CODE_OF_CONDUCT.md
CONTRIBUTING.md
COPYING
DEFAULT_OPTIONS_HISTORY.md
DUMP_FORMAT.md
HISTORY.md First step towards handling MANIFEST write error (#6949) 2020-06-24 19:07:08 -07:00
INSTALL.md
LANGUAGE-BINDINGS.md
LICENSE.Apache
LICENSE.leveldb
Makefile Remove racially charged terms "whitelist" and "blacklist" (#7008) 2020-06-19 15:27:32 -07:00
README.md
ROCKSDB_LITE.md
TARGETS
USERS.md
Vagrantfile
WINDOWS_PORT.md
appveyor.yml
defs.bzl
issue_template.md
src.mk
thirdparty.inc

README.md

RocksDB: A Persistent Key-Value Store for Flash and RAM Storage


RocksDB is developed and maintained by the Facebook Database Engineering Team. It is built on earlier work on LevelDB by Sanjay Ghemawat (sanjay@google.com) and Jeff Dean (jeff@google.com).

This code is a library that forms the core building block for a fast key-value server, especially suited for storing data on flash drives. It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF). It has multi-threaded compactions, making it especially suitable for storing multiple terabytes of data in a single database.
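To make the LSM tradeoffs concrete, here is a toy in-memory LSM in C++. It is a hypothetical illustration, not RocksDB's implementation or API: there is no WAL, no deletes, no Bloom filters, and rewrite cost is approximated by counting entries rather than bytes. Writes land in a memtable that is flushed to an immutable sorted run when full; a read may have to probe several runs (read amplification), and compaction merges runs by rewriting their entries (write amplification) to bring the run count back down.

```cpp
#include <cassert>  // used by the usage assertions
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy LSM, not RocksDB code.
class ToyLsm {
 public:
  explicit ToyLsm(std::size_t memtable_limit) : limit_(memtable_limit) {}

  void Put(const std::string& key, const std::string& value) {
    memtable_[key] = value;
    if (memtable_.size() >= limit_) Flush();
  }

  // Checks the memtable, then runs from newest to oldest, counting runs probed.
  bool Get(const std::string& key, std::string* value,
           std::size_t* runs_probed) const {
    *runs_probed = 0;
    auto mem = memtable_.find(key);
    if (mem != memtable_.end()) { *value = mem->second; return true; }
    for (auto run = runs_.rbegin(); run != runs_.rend(); ++run) {
      ++*runs_probed;
      auto it = run->find(key);
      if (it != run->end()) { *value = it->second; return true; }
    }
    return false;
  }

  // Merges all runs into one. Iterating oldest first lets newer values win.
  void CompactAll() {
    if (runs_.size() <= 1) return;
    std::map<std::string, std::string> merged;
    for (const auto& run : runs_) {
      for (const auto& kv : run) {
        merged[kv.first] = kv.second;
        ++entries_rewritten_;  // each rewritten entry adds write amplification
      }
    }
    runs_.clear();
    runs_.push_back(std::move(merged));
  }

  std::size_t num_runs() const { return runs_.size(); }
  std::size_t entries_rewritten() const { return entries_rewritten_; }

 private:
  void Flush() {
    runs_.push_back(memtable_);  // becomes an immutable sorted run
    memtable_.clear();
  }

  std::size_t limit_;
  std::map<std::string, std::string> memtable_;
  std::vector<std::map<std::string, std::string>> runs_;  // oldest first
  std::size_t entries_rewritten_ = 0;
};
```

Compacting more eagerly keeps reads cheap at the cost of rewriting more data; compacting lazily does the opposite, and space amplification grows with the stale versions left in older runs.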

Start with example usage here: https://github.com/facebook/rocksdb/tree/master/examples

See the GitHub wiki for more explanation.

The public interface is in include/. Callers should not include or rely on the details of any other header files in this package. Those internal APIs may be changed without warning.

Design discussions are conducted in https://www.facebook.com/groups/rocksdb.dev/ and https://rocksdb.slack.com/

License

RocksDB is dual-licensed under both the GPLv2 (found in the COPYING file in the root directory) and Apache 2.0 License (found in the LICENSE.Apache file in the root directory). You may select, at your option, one of the above-listed licenses.