Anand Ananthabhotla a27fce408e Auto recovery from out of space errors (#4164)
Summary:
This commit implements automatic recovery from a Status::NoSpace() error
during background operations such as write callback, flush and
compaction. The broad design is as follows -
1. Compaction errors are treated as soft errors and don't put the
database in read-only mode. A compaction is delayed until enough free
disk space is available to accommodate the compaction outputs, which is
estimated based on the input size. This means that users can continue to
write, and we rely on the WriteController to delay or stop writes if the
compaction debt becomes too high due to a persistent low disk space
condition.
2. Errors during write callback and flush are treated as hard errors,
i.e. the database is put in read-only mode and goes back to read-write
only after certain recovery actions are taken.
3. Both types of recovery rely on the SstFileManagerImpl to poll for
sufficient disk space. We assume that there is a 1-1 mapping between an
SFM and the underlying OS storage container. For cases where multiple
DBs are hosted on a single storage container, the user is expected to
allocate a single SFM instance and use the same one for all the DBs. If
no SFM is specified by the user, DBImpl::Open() will allocate one, but
this will be one per DB and each DB will recover independently. The
recovery implemented by SFM is as follows -
  a) On the first occurrence of an out of space error during compaction,
  subsequent compactions will be delayed until the disk free space check
  indicates enough available space. The required space is computed as
  the sum of input sizes.
  b) The free space check requirement will be removed once the amount of
  free space is greater than the size reserved by in progress
  compactions when the first error occurred.
  c) If the out of space error is a hard error, a background thread in
  SFM will poll for sufficient headroom before triggering the recovery
  of the database and putting it back in read-write mode. The headroom is
  calculated as the sum of the write_buffer_size of all the DB instances
  associated with the SFM.
4. EventListener callbacks will be called at the start and completion of
automatic recovery. Users can disable the auto recovery in the start
callback, and later initiate it manually by calling DB::Resume().
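
The two free-space checks described in item 3 can be sketched as a toy model in plain C++. This is only an illustration of the arithmetic, not RocksDB code: every function name below is hypothetical, and the use of `>=` for "enough space" is an assumption, since the commit message does not state the exact comparison.

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Toy model only: these names are hypothetical and are not RocksDB APIs.

// Soft-error path: the space a delayed compaction needs is estimated as the
// sum of the sizes of its input files.
inline uint64_t CompactionSpaceNeeded(
    const std::vector<uint64_t>& input_file_sizes) {
  return std::accumulate(input_file_sizes.begin(), input_file_sizes.end(),
                         uint64_t{0});
}

// A compaction may proceed once free space covers the estimated outputs.
// (Assumption: ">=" counts as "enough".)
inline bool CanStartCompaction(uint64_t free_bytes,
                               const std::vector<uint64_t>& input_file_sizes) {
  return free_bytes >= CompactionSpaceNeeded(input_file_sizes);
}

// Hard-error path: the headroom is the sum of write_buffer_size over all DB
// instances that share one SFM.
inline uint64_t RecoveryHeadroom(
    const std::vector<uint64_t>& write_buffer_sizes) {
  return std::accumulate(write_buffer_sizes.begin(), write_buffer_sizes.end(),
                         uint64_t{0});
}

// The polling thread would trigger recovery only once the headroom is free.
inline bool CanRecover(uint64_t free_bytes,
                       const std::vector<uint64_t>& write_buffer_sizes) {
  return free_bytes >= RecoveryHeadroom(write_buffer_sizes);
}
```

For example, with two DBs that each use a 64 MB write buffer, `RecoveryHeadroom` returns 128 MB, so in this model recovery would not start while only 100 MB is free.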

Todo:
1. More extensive testing
2. Add disk full condition to db_stress (follow-on PR)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4164

Differential Revision: D9846378

Pulled By: anand1976

fbshipit-source-id: 80ea875dbd7f00205e19c82215ff6e37da10da4a
2018-09-15 13:43:04 -07:00
buckifier Grab straggler files to explicitly import AutoHeaders 2018-08-28 21:28:55 -07:00
build_tools Release 5.16 (#4298) 2018-08-21 14:43:08 -07:00
cache Support group commits of version edits (#3944) 2018-06-28 12:34:39 -07:00
cmake Search paths provided by intel's "tbbvars.sh". 2018-05-07 14:28:36 -07:00
coverage Remove unused imports, from python scripts. (#4057) 2018-06-26 12:43:04 -07:00
db Auto recovery from out of space errors (#4164) 2018-09-15 13:43:04 -07:00
docs data block hash index blog post 2018-08-29 10:58:10 -07:00
env Auto recovery from out of space errors (#4164) 2018-09-15 13:43:04 -07:00
examples Pin top-level index on partitioned index/filter blocks (#4037) 2018-06-22 15:27:46 -07:00
hdfs Comment out unused variables 2018-03-05 13:13:41 -08:00
include/rocksdb Auto recovery from out of space errors (#4164) 2018-09-15 13:43:04 -07:00
java Remove warnings caused by unused variables in jni (#4345) 2018-09-05 13:42:34 -07:00
memtable Suppress clang analyzer error (#4299) 2018-08-21 16:43:05 -07:00
monitoring Support pragma once in all header files and cleanup some warnings (#4339) 2018-09-05 18:13:31 -07:00
options Add path to WritableFileWriter. (#4039) 2018-08-23 10:12:58 -07:00
port Fix cross-filesystem checkpoint on Windows (#4365) 2018-09-14 10:28:39 -07:00
table Remove sync point from Block destructor (#4370) 2018-09-15 00:12:57 -07:00
third-party Support pragma once in all header files and cleanup some warnings (#4339) 2018-09-05 18:13:31 -07:00
tools Auto recovery from out of space errors (#4164) 2018-09-15 13:43:04 -07:00
util Auto recovery from out of space errors (#4164) 2018-09-15 13:43:04 -07:00
utilities Store the return value of Fsync for check 2018-09-14 13:29:56 -07:00
.clang-format A script that automatically reformat affected lines 2014-01-14 12:21:24 -08:00
.gitignore RocksDB Trace Analyzer (#4091) 2018-08-13 11:44:02 -07:00
.lgtm.yml Create lgtm.yml for LGTM.com C/C++ analysis (#4058) 2018-06-26 12:43:04 -07:00
.travis.yml Add GCC 8 to Travis (#3433) 2018-07-13 10:58:06 -07:00
AUTHORS Update RocksDB Authors File 2017-10-18 14:42:10 -07:00
CMakeLists.txt cmake: allow opting out debug runtime (#4317) 2018-08-27 15:58:59 -07:00
CODE_OF_CONDUCT.md Add Code of Conduct 2017-12-05 18:42:35 -08:00
CONTRIBUTING.md Add Code of Conduct 2017-12-05 18:42:35 -08:00
COPYING Add GPLv2 as an alternative license. 2017-04-27 18:06:12 -07:00
DEFAULT_OPTIONS_HISTORY.md options.delayed_write_rate use the rate of rate_limiter by default. 2017-05-24 09:58:24 -07:00
DUMP_FORMAT.md First version of rocksdb_dump and rocksdb_undump. 2015-06-19 16:24:36 -07:00
HISTORY.md Skip concurrency control during recovery of pessimistic txn (#4346) 2018-09-10 16:57:53 -07:00
INSTALL.md Enable compilation on OpenBSD 2018-03-19 12:30:05 -07:00
LANGUAGE-BINDINGS.md Added PingCaps Rust RocksDB and ObjectiveRocks (#4065) 2018-06-27 15:43:21 -07:00
LICENSE.Apache Change RocksDB License 2017-07-15 16:11:23 -07:00
LICENSE.leveldb Add back the LevelDB license file 2017-07-16 18:42:18 -07:00
Makefile Fix Makefile target 'jtest' on PowerPC (#4357) 2018-09-11 16:37:23 -07:00
README.md Create lgtm.yml for LGTM.com C/C++ analysis (#4058) 2018-06-26 12:43:04 -07:00
ROCKSDB_LITE.md Fix some typos in comments and docs. 2018-03-08 10:27:25 -08:00
TARGETS Lint TARGETS files with buildifier 2018-09-11 14:58:19 -07:00
USERS.md Support range deletion tombstones in IngestExternalFile SSTs (#3778) 2018-07-13 22:43:09 -07:00
Vagrantfile Adding CentOS 7 Vagrantfile & build script 2018-02-26 15:27:17 -08:00
WINDOWS_PORT.md Add GCC 8 to Travis (#3433) 2018-07-13 10:58:06 -07:00
appveyor.yml Upgrade Appveyor to VS2017 2018-02-01 13:57:01 -08:00
issue_template.md Add a template for issues 2017-09-29 11:41:28 -07:00
src.mk Remove trace_analyzer_tool from LIB_SOURCES (#4331) 2018-08-29 21:28:40 -07:00
thirdparty.inc Provide a way to override windows memory allocator with jemalloc for ZSTD 2018-06-04 12:12:48 -07:00

README.md

RocksDB: A Persistent Key-Value Store for Flash and RAM Storage


RocksDB is developed and maintained by the Facebook Database Engineering Team. It is built on earlier work on LevelDB by Sanjay Ghemawat (sanjay@google.com) and Jeff Dean (jeff@google.com).

This code is a library that forms the core building block for a fast key-value server, especially suited for storing data on flash drives. It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF). It has multi-threaded compactions, making it especially suitable for storing multiple terabytes of data in a single database.

Start with example usage here: https://github.com/facebook/rocksdb/tree/master/examples

See the GitHub wiki for more explanation.

The public interface is in include/. Callers should not include or rely on the details of any other header files in this package. Those internal APIs may be changed without warning.

Design discussions are conducted in https://www.facebook.com/groups/rocksdb.dev/

License

RocksDB is dual-licensed under both the GPLv2 (found in the COPYING file in the root directory) and Apache 2.0 License (found in the LICENSE.Apache file in the root directory). You may select, at your option, one of the above-listed licenses.