Go to file
Akanksha Mahajan 596e9008e4 Stall writes in WriteBufferManager when memory_usage exceeds buffer_size (#7898)
Summary:
When WriteBufferManager is shared across DBs and column families
to maintain memory usage under a limit, OOMs have been observed when flush cannot
finish but writes continuously insert to memtables.
In order to avoid OOMs, when memory usage goes beyond buffer_limit_ and DBs tries to write,
this change will stall incoming writers until flush is completed and memory_usage
drops.

Design: Stall condition: When total memory usage exceeds WriteBufferManager::buffer_size_
(memory_usage() >= buffer_size_) WriterBufferManager::ShouldStall() returns true.

DBImpl first block incoming/future writers by calling write_thread_.BeginWriteStall()
(which adds dummy stall object to the writer's queue).
Then DB is blocked on a state State::Blocked (current write doesn't go
through). WBStallInterface object maintained by every DB instance is added to the queue of
WriteBufferManager.

If multiple DBs tries to write during this stall, they will also be
blocked when check WriteBufferManager::ShouldStall() returns true.

End Stall condition: When flush is finished and memory usage goes down, stall will end only if memory
waiting to be flushed is less than buffer_size/2. This lower limit will give time for flush
to complete and avoid continous stalling if memory usage remains close to buffer_size.

WriterBufferManager::EndWriteStall() is called,
which removes all instances from its queue and signal them to continue.
Their state is changed to State::Running and they are unblocked. DBImpl
then signal all incoming writers of that DB to continue by calling
write_thread_.EndWriteStall() (which removes dummy stall object from the
queue).

DB instance creates WBMStallInterface which is an interface to block and
signal DBs during stall.
When DB needs to be blocked or signalled by WriteBufferManager,
state_for_wbm_ state is changed accordingly (RUNNING or BLOCKED).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7898

Test Plan: Added a new test db/db_write_buffer_manager_test.cc

Reviewed By: anand1976

Differential Revision: D26093227

Pulled By: akankshamahajan15

fbshipit-source-id: 2bbd982a3fb7033f6de6153aa92a221249861aae
2021-04-21 13:54:02 -07:00
.circleci Fix unittest no space issue (#8204) 2021-04-20 08:42:28 -07:00
.github/workflows Update clang-format-diff.py path (#7944) 2021-02-09 12:49:38 -08:00
buckifier Make tests "parallel" and "passing ASC" by default (#8146) 2021-04-04 20:10:11 -07:00
build_tools Fix crash test with backup as read-only DB (#8161) 2021-04-06 23:31:51 -07:00
cache Fix some typos in comments (#8066) 2021-03-25 21:18:08 -07:00
cmake Add `find_dependency()` in cmake config file. (#6791) 2020-05-12 21:18:29 -07:00
coverage Find the correct gcov (#6904) 2020-06-01 16:33:05 -07:00
db Stall writes in WriteBufferManager when memory_usage exceeds buffer_size (#7898) 2021-04-21 13:54:02 -07:00
db_stress_tool Revert Ribbon starting level support from #8198 (#8212) 2021-04-20 19:46:40 -07:00
docs Add Blog Post "(Call For Contribution) Make Universal Compaction More Incremental" (#8182) 2021-04-13 13:18:47 -07:00
env Fix unittest no space issue (#8204) 2021-04-20 08:42:28 -07:00
examples make:Fix c header prototypes (#7994) 2021-03-09 20:44:23 -08:00
file Handle rename() failure in non-local FS (#8192) 2021-04-19 18:11:13 -07:00
fuzz Remove Legacy and Custom FileWrapper classes from header files (#7851) 2021-01-28 22:10:32 -08:00
hdfs fix build with 'USE_HDFS' on windows (#6950) 2020-06-12 16:21:50 -07:00
include/rocksdb Stall writes in WriteBufferManager when memory_usage exceeds buffer_size (#7898) 2021-04-21 13:54:02 -07:00
java Handle rename() failure in non-local FS (#8192) 2021-04-19 18:11:13 -07:00
logging Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
memory Use thread-safe `strerror_r()` to get error message (#8087) 2021-03-24 23:07:27 -07:00
memtable Stall writes in WriteBufferManager when memory_usage exceeds buffer_size (#7898) 2021-04-21 13:54:02 -07:00
monitoring Disable IOStatsContext/PerfContext if no thread local (#8117) 2021-04-13 07:56:59 -07:00
options Revert Ribbon starting level support from #8198 (#8212) 2021-04-20 19:46:40 -07:00
plugin Makefile support to statically link external plugin code (#7918) 2021-02-10 08:35:34 -08:00
port Fix Windows strcmp for Unicode (#8190) 2021-04-16 12:11:16 -07:00
table Revert Ribbon starting level support from #8198 (#8212) 2021-04-20 19:46:40 -07:00
test_util Unittest uses unique test db name (#8124) 2021-03-30 16:51:26 -07:00
third-party Fix a compilation error in CircleCI vs2019 CXX20 (#8090) 2021-03-23 10:28:04 -07:00
tools Revert Ribbon starting level support from #8198 (#8212) 2021-04-20 19:46:40 -07:00
trace_replay Add request_id in IODebugContext. (#8045) 2021-04-01 13:14:51 -07:00
util Revert Ribbon starting level support from #8198 (#8212) 2021-04-20 19:46:40 -07:00
utilities Cleanup include (#8208) 2021-04-20 14:57:27 -07:00
.clang-format A script that automatically reformat affected lines 2014-01-14 12:21:24 -08:00
.gitignore gitignore cmake-build-* for CLion integration (#7933) 2021-02-19 13:43:15 -08:00
.lgtm.yml Create lgtm.yml for LGTM.com C/C++ analysis (#4058) 2018-06-26 12:43:04 -07:00
.travis.yml Move arm build from travis to circleci (#8203) 2021-04-19 20:07:02 -07:00
.watchmanconfig Added .watchmanconfig file to rocksdb repo (#5593) 2019-07-19 15:00:33 -07:00
AUTHORS Update RocksDB Authors File 2017-10-18 14:42:10 -07:00
CMakeLists.txt Disable IOStatsContext/PerfContext if no thread local (#8117) 2021-04-13 07:56:59 -07:00
CODE_OF_CONDUCT.md Adopt Contributor Covenant 2019-08-29 23:21:01 -07:00
CONTRIBUTING.md Add Code of Conduct 2017-12-05 18:42:35 -08:00
COPYING Add GPLv2 as an alternative license. 2017-04-27 18:06:12 -07:00
DEFAULT_OPTIONS_HISTORY.md options.delayed_write_rate use the rate of rate_limiter by default. 2017-05-24 09:58:24 -07:00
DUMP_FORMAT.md First version of rocksdb_dump and rocksdb_undump. 2015-06-19 16:24:36 -07:00
HISTORY.md Stall writes in WriteBufferManager when memory_usage exceeds buffer_size (#7898) 2021-04-21 13:54:02 -07:00
INSTALL.md Update installation instructions (#8158) 2021-04-06 16:02:04 -07:00
LANGUAGE-BINDINGS.md Add RestoreDBFromLatestBackup to C API, add new C# package (#7092) 2020-07-08 11:56:41 -07:00
LICENSE.Apache Change RocksDB License 2017-07-15 16:11:23 -07:00
LICENSE.leveldb Add back the LevelDB license file 2017-07-16 18:42:18 -07:00
Makefile Stall writes in WriteBufferManager when memory_usage exceeds buffer_size (#7898) 2021-04-21 13:54:02 -07:00
PLUGINS.md Makefile support to statically link external plugin code (#7918) 2021-02-10 08:35:34 -08:00
README.md Fix the CI badge for ppc64le Jenkins (#7561) 2020-10-16 09:00:56 -07:00
ROCKSDB_LITE.md Fix some typos in comments and docs. 2018-03-08 10:27:25 -08:00
TARGETS Stall writes in WriteBufferManager when memory_usage exceeds buffer_size (#7898) 2021-04-21 13:54:02 -07:00
USERS.md Add Apache Doris to USERS (#7865) 2021-01-19 15:31:56 -08:00
Vagrantfile Adding CentOS 7 Vagrantfile & build script 2018-02-26 15:27:17 -08:00
WINDOWS_PORT.md #5145 , rename port/dirent.h to port/port_dirent.h to avoid compile err when use port dir as header dir output (#5152) 2019-04-04 11:38:19 -07:00
appveyor.yml Remove 2019 from appveyor (#7038) 2020-06-29 14:31:41 -07:00
defs.bzl Make testpilot recognize that these tests have coverage instrumentation 2020-03-20 11:23:23 -07:00
issue_template.md Add Google Group to Issue Template 2020-01-28 14:40:37 -08:00
src.mk Stall writes in WriteBufferManager when memory_usage exceeds buffer_size (#7898) 2021-04-21 13:54:02 -07:00
thirdparty.inc Fix build jemalloc api (#5470) 2019-06-24 17:40:32 -07:00

README.md

RocksDB: A Persistent Key-Value Store for Flash and RAM Storage

CircleCI Status TravisCI Status Appveyor Build status PPC64le Build Status

RocksDB is developed and maintained by Facebook Database Engineering Team. It is built on earlier work on LevelDB by Sanjay Ghemawat (sanjay@google.com) and Jeff Dean (jeff@google.com)

This code is a library that forms the core building block for a fast key-value server, especially suited for storing data on flash drives. It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF). It has multi-threaded compactions, making it especially suitable for storing multiple terabytes of data in a single database.

Start with example usage here: https://github.com/facebook/rocksdb/tree/master/examples

See the github wiki for more explanation.

The public interface is in include/. Callers should not include or rely on the details of any other header files in this package. Those internal APIs may be changed without warning.

Design discussions are conducted in https://www.facebook.com/groups/rocksdb.dev/ and https://rocksdb.slack.com/

License

RocksDB is dual-licensed under both the GPLv2 (found in the COPYING file in the root directory) and Apache 2.0 License (found in the LICENSE.Apache file in the root directory). You may select, at your option, one of the above-listed licenses.