Reid Horuff a657ee9a9c [rocksdb] Recovery path sequence miscount fix
Summary:
Consider the following WAL with 4 batch entries prefixed with their sequence at time of memtable insert.
[1: BEGIN_PREPARE, PUT, PUT, PUT, PUT, END_PREPARE(a)]
[1: BEGIN_PREPARE, PUT, PUT, PUT, PUT, END_PREPARE(b)]
[4: COMMIT(a)]
[7: COMMIT(b)]

The first two batches do not consume any sequence numbers, so both are prefixed with seq=1.
For 2PC commit, memtable insertion takes place before the COMMIT batch is written to the WAL.
We can see that sequence number consumption takes place between WAL entries, giving us the seemingly sparse sequence prefixes for WAL entries.
This is a valid WAL.

Because 2PC markers let one WriteBatch point to another batch that contains its inserts, a WriteBatch can consume more or fewer sequence numbers than the number of sequence-consuming entries it contains.

We can see that, given the entries in the WAL, 6 sequence ids were consumed. Yet on recovery the maximum sequence consumed would be computed as 7 + 3 (3 being the number of sequence numbers consumed by COMMIT(b)).

So, upon recovery, we must now track the actual consumption of sequence numbers.
In the provided scenario there will be no sequence gaps, but it is possible to produce a sequence gap. This should not be a problem though, correct?
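
The miscount described above can be illustrated with a small toy model. This is only a sketch, not RocksDB's recovery code: the names WalRecord, Kind, RecoveryResult and Replay are hypothetical, and each prepared batch is assumed to hold three sequence-consuming entries, which is what the "7 + 3" figure implies. It compares the value obtained by adding the last record's sequence prefix to the entries it applies against a running count of the sequence numbers actually consumed.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical toy model of a WAL record; this is NOT RocksDB's on-disk format.
enum class Kind { kPrepare, kCommit };

struct WalRecord {
  uint64_t seq;     // sequence prefix stored with the record
  Kind kind;        // prepare section or commit marker
  std::string xid;  // transaction name, e.g. "a"
  uint64_t puts;    // sequence-consuming entries (meaningful for kPrepare only)
};

struct RecoveryResult {
  uint64_t naive_next;  // last record's prefix + entries that record applies
  uint64_t consumed;    // sequence numbers actually consumed during replay
};

// Replays the toy WAL. A prepare record consumes nothing; a commit record
// consumes as many sequence numbers as the prepared batch it points to.
RecoveryResult Replay(const std::vector<WalRecord>& wal) {
  std::map<std::string, uint64_t> prepared;  // xid -> pending entry count
  RecoveryResult result{0, 0};
  for (const WalRecord& rec : wal) {
    if (rec.kind == Kind::kPrepare) {
      prepared[rec.xid] = rec.puts;
      result.naive_next = rec.seq;
    } else {
      auto it = prepared.find(rec.xid);
      const uint64_t n = (it == prepared.end()) ? 0 : it->second;
      result.consumed += n;               // track real consumption
      result.naive_next = rec.seq + n;    // prefix-based estimate
      if (it != prepared.end()) prepared.erase(it);
    }
  }
  return result;
}
```

Feeding this the four records from the example (prefixes 1, 1, 4, 7, with three entries assumed per prepared batch) yields a prefix-based estimate of 10 while only 6 sequence numbers were consumed, which is the discrepancy the fix addresses.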

Test Plan: provided test.

Reviewers: sdong

Subscribers: andrewkr, leveldb, dhruba, hermanlee4

Differential Revision: https://reviews.facebook.net/D57645
2016-05-10 14:06:07 -07:00
arcanist_util Have sandcastle run lite_test for every diff 2016-05-06 14:51:20 -07:00
build_tools Have sandcastle run lite_test for every diff 2016-05-06 14:51:20 -07:00
coverage Fix coverage script 2014-11-03 14:53:00 -08:00
db [rocksdb] Recovery path sequence miscount fix 2016-05-10 14:06:07 -07:00
doc Lint everything 2015-11-16 12:56:21 -08:00
examples Adding pin_l0_filter_and_index_blocks_in_cache feature and related fixes. 2016-04-01 10:42:39 -07:00
hdfs using java7 in runtime for hdfs env (#1072) 2016-04-12 00:08:26 -04:00
include/rocksdb [rocksdb] Two Phase Transaction 2016-05-10 14:06:07 -07:00
java Java API - Add missing HEADER_LEVEL logging (#1104) 2016-05-06 15:06:12 -07:00
memtable Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
port Fix multiple issues with WinMmapFile fo sequential writing (#1108) 2016-04-29 16:43:13 -07:00
table BlockBasedTable::Get() not to use prefix bloom if read_options.total_order_seek = true 2016-05-06 10:16:11 -07:00
third-party Fix the build break on Ubuntu 15.10 when gcc 5.2.1 is used 2016-03-15 10:30:10 -07:00
tools Modification of WriteBatch to support two phase commit 2016-05-10 14:06:07 -07:00
util [rocksdb] Recovery path sequence miscount fix 2016-05-10 14:06:07 -07:00
utilities [rocksdb] Recovery path sequence miscount fix 2016-05-10 14:06:07 -07:00
.arcconfig Integrate Jenkins with Phabricator 2015-04-07 11:56:29 -07:00
.clang-format A script that automatically reformat affected lines 2014-01-14 12:21:24 -08:00
.gitignore Ignore db_test2 2016-03-07 15:56:16 -08:00
.travis.yml Split Travis unittests Job 2016-04-13 14:22:29 -07:00
AUTHORS Add AUTHORS file. Fix #203 2014-09-29 10:52:18 -07:00
CMakeLists.txt [rocksdb] Recovery path sequence miscount fix 2016-05-10 14:06:07 -07:00
CONTRIBUTING.md facebook accounts are not required for CLA signers 2014-07-08 05:57:54 -04:00
DEFAULT_OPTIONS_HISTORY.md Release RocksDB 4.8.0 2016-05-02 14:38:04 -07:00
DUMP_FORMAT.md First version of rocksdb_dump and rocksdb_undump. 2015-06-19 16:24:36 -07:00
HISTORY.md Add bottommost_compression option 2016-05-09 15:57:19 -07:00
INSTALL.md Simple changes to support builds for ppc64[le] consistent with X86 2016-01-19 09:08:19 -06:00
LANGUAGE-BINDINGS.md Merge pull request #1056 from facebook/igorcanadi-patch-1 2016-04-04 08:08:52 -07:00
LICENSE Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
Makefile Print memory allocation counters 2016-04-27 16:23:33 -07:00
PATENTS Update Patent Grant. 2015-04-13 10:33:43 +01:00
README.md Replaced "built on on earlier work" by "built on earlier work" in README.md 2014-09-17 01:16:17 -07:00
ROCKSDB_LITE.md Optimistic Transactions 2015-05-29 14:36:35 -07:00
USERS.md Update USERS.md with link to LinkedIn blog post (#1088) 2016-04-22 15:53:32 -07:00
Vagrantfile RocksDB on FreeBSD support 2015-02-26 15:19:17 -08:00
WINDOWS_PORT.md Commit both PR and internal code review changes 2015-07-07 16:58:20 -07:00
appveyor.yml Exclude DBTest.FileCreationRandomFailure as a long running test 2015-11-17 13:54:13 -08:00
src.mk [rocksdb] Recovery path sequence miscount fix 2016-05-10 14:06:07 -07:00
thirdparty.inc Introduce XPRESS compresssion on Windows. (#1081) 2016-04-19 22:54:24 -07:00

README.md

RocksDB: A Persistent Key-Value Store for Flash and RAM Storage

RocksDB is developed and maintained by the Facebook Database Engineering Team. It is built on earlier work on LevelDB by Sanjay Ghemawat (sanjay@google.com) and Jeff Dean (jeff@google.com).

This code is a library that forms the core building block for a fast key-value server, especially suited for storing data on flash drives. It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF). It has multi-threaded compactions, making it especially suitable for storing multiple terabytes of data in a single database.

Start with example usage here: https://github.com/facebook/rocksdb/tree/master/examples

See the GitHub wiki for more explanation.

The public interface is in include/. Callers should not include or rely on the details of any other header files in this package. Those internal APIs may be changed without warning.

Design discussions are conducted in https://www.facebook.com/groups/rocksdb.dev/