a657ee9a9c
Summary: Consider the following WAL with 4 batch entries prefixed with their sequence at time of memtable insert. [1: BEGIN_PREPARE, PUT, PUT, PUT, PUT, END_PREPARE(a)] [1: BEGIN_PREPARE, PUT, PUT, PUT, PUT, END_PREPARE(b)] [4: COMMIT(a)] [7: COMMIT(b)] The first two batches do not consume any sequence numbers so are both prefixed with seq=1. For 2pc commit, memtable insertion takes place before COMMIT batch is written to WAL. We can see that sequence number consumption takes place between WAL entries giving us the seemingly sparse sequence prefix for WAL entries. This is a valid WAL. Because with 2PC markers one WriteBatch points to another batch containing its inserts a writebatch can consume more or less sequence numbers than the number of sequence consuming entries that it contains. We can see that, given the entries in the WAL, 6 sequence ids were consumed. Yet on recovery the maximum sequence consumed would be 7 + 3 (the number of sequence numbers consumed by COMMIT(b)) So, now upon recovery we must track the actual consumption of sequence numbers. In the provided scenario there will be no sequence gaps, but it is possible to produce a sequence gap. This should not be a problem though. correct? Test Plan: provided test. Reviewers: sdong Subscribers: andrewkr, leveldb, dhruba, hermanlee4 Differential Revision: https://reviews.facebook.net/D57645 |
||
---|---|---|
arcanist_util | ||
build_tools | ||
coverage | ||
db | ||
doc | ||
examples | ||
hdfs | ||
include/rocksdb | ||
java | ||
memtable | ||
port | ||
table | ||
third-party | ||
tools | ||
util | ||
utilities | ||
.arcconfig | ||
.clang-format | ||
.gitignore | ||
.travis.yml | ||
AUTHORS | ||
CMakeLists.txt | ||
CONTRIBUTING.md | ||
DEFAULT_OPTIONS_HISTORY.md | ||
DUMP_FORMAT.md | ||
HISTORY.md | ||
INSTALL.md | ||
LANGUAGE-BINDINGS.md | ||
LICENSE | ||
Makefile | ||
PATENTS | ||
README.md | ||
ROCKSDB_LITE.md | ||
USERS.md | ||
Vagrantfile | ||
WINDOWS_PORT.md | ||
appveyor.yml | ||
src.mk | ||
thirdparty.inc |
README.md
RocksDB: A Persistent Key-Value Store for Flash and RAM Storage
RocksDB is developed and maintained by Facebook Database Engineering Team. It is built on earlier work on LevelDB by Sanjay Ghemawat (sanjay@google.com) and Jeff Dean (jeff@google.com)
This code is a library that forms the core building block for a fast key value server, especially suited for storing data on flash drives. It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF). It has multi-threaded compactions, making it specially suitable for storing multiple terabytes of data in a single database.
Start with example usage here: https://github.com/facebook/rocksdb/tree/master/examples
See the github wiki for more explanation.
The public interface is in include/
. Callers should not include or
rely on the details of any other header files in this package. Those
internal APIs may be changed without warning.
Design discussions are conducted in https://www.facebook.com/groups/rocksdb.dev/