Yi Wu b1ad6ebba8 WritePrepared: fix two versions in compaction see different status for released snapshots (#4890)
Summary:
Fix how CompactionIterator::findEarliestVisibleSnapshot handles released snapshots. This fixes the following two scenarios:

Scenario 1:
key1 has two values, v1 and v2. There are two snapshots, s1 and s2, taken after v1 and v2 are committed. Right after compaction outputs v2, s1 is released. Now findEarliestVisibleSnapshot may see that s1 has been released and return the next snapshot, which is s2. That is larger than v2's earliest visible snapshot, which was s1.
The fix: the only place we compare the last snapshot against the current key's snapshot is when we decide whether to compact out a value that is hidden by a later value. In that check, if the current snapshot is larger than the last snapshot, we know the last snapshot has been released, and it is safe to compact out the current key.

Scenario 2:
key1 has two values, v1 and v2. There are two snapshots, s1 and s2, taken after v1 and v2 are committed. During compaction, before we process the key, s1 is released. When compaction processes v2, the snapshot checker may return kSnapshotReleased, and the earliest visible snapshot for v2 becomes s2. When compaction processes v1, the snapshot checker may return kIsInSnapshot (for a WritePrepared transaction, this could be because v1 is still in the commit cache). The two results are then inconsistent.
The fix: remember the set of released snapshots ever reported by the snapshot checker, and ignore them when computing the result of findEarliestVisibleSnapshot.
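The two fixes can be illustrated with a small standalone model. This is only a sketch written from the description above, not the code in CompactionIterator: the names CheckResult, Checker, EarliestVisibleSnapshotFinder, HiddenByNewerVersion and kNoSnapshot are hypothetical, and the snapshot checker is reduced to a plain callback.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <functional>
#include <limits>
#include <set>
#include <utility>
#include <vector>

using SequenceNumber = std::uint64_t;

// Hypothetical sentinel meaning "no live snapshot can see this sequence".
constexpr SequenceNumber kNoSnapshot =
    std::numeric_limits<SequenceNumber>::max();

// Hypothetical stand-in for the answers a snapshot checker can give.
enum class CheckResult { kInSnapshot, kNotInSnapshot, kSnapshotReleased };

// Hypothetical checker callback: is `seq` visible to `snapshot`?
using Checker = std::function<CheckResult(SequenceNumber seq,
                                          SequenceNumber snapshot)>;

// Toy model of the Scenario 2 fix: once a snapshot has been reported as
// released, it is remembered and skipped for every later lookup, so two
// versions of the same key cannot disagree about it.
class EarliestVisibleSnapshotFinder {
 public:
  EarliestVisibleSnapshotFinder(std::vector<SequenceNumber> snapshots,
                                Checker checker)
      : snapshots_(std::move(snapshots)), checker_(std::move(checker)) {
    std::sort(snapshots_.begin(), snapshots_.end());
  }

  // Returns the smallest live snapshot that can see `seq`, or kNoSnapshot.
  SequenceNumber Find(SequenceNumber seq) {
    auto it = std::lower_bound(snapshots_.begin(), snapshots_.end(), seq);
    for (; it != snapshots_.end(); ++it) {
      const SequenceNumber snapshot = *it;
      if (released_.count(snapshot) > 0) {
        continue;  // Reported as released earlier: ignore it consistently.
      }
      const CheckResult result = checker_(seq, snapshot);
      if (result == CheckResult::kInSnapshot) {
        return snapshot;
      }
      if (result == CheckResult::kSnapshotReleased) {
        released_.insert(snapshot);  // Remember for subsequent lookups.
      }
    }
    return kNoSnapshot;
  }

 private:
  std::vector<SequenceNumber> snapshots_;
  Checker checker_;
  std::set<SequenceNumber> released_;
};

// Toy model of the Scenario 1 fix. `last_snapshot` is the earliest visible
// snapshot of the newer version of the key (0 if there was none) and
// `current_snapshot` is that of the version being examined. If the current
// one is larger, the last snapshot must have been released in between, so
// the current version is still hidden by the newer one.
bool HiddenByNewerVersion(SequenceNumber last_snapshot,
                          SequenceNumber current_snapshot) {
  return last_snapshot == current_snapshot ||
         (last_snapshot > 0 && last_snapshot < current_snapshot);
}
```

In this model, a checker that reports s1 as released while v2 is processed but would later answer "in snapshot" for v1 no longer produces diverging results, because the second lookup never consults s1 again.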
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4890

Differential Revision: D13705538

Pulled By: maysamyabandeh

fbshipit-source-id: e577f0d9ee1ff5a6035f26859e56902ecc85a5a4
2019-01-18 17:24:06 -08:00
buckifier Fix skylark incompatible build files in rocksdb 2019-01-07 13:37:40 -08:00
build_tools Fix spelling errors (#4827) 2019-01-02 11:17:57 -08:00
cache Revert "Move MemoryAllocator option from Cache to BlockBasedTableOpti… (#4697) 2018-11-21 11:29:57 -08:00
cmake Make FindZLIB consistent with official definitions (#4823) 2019-01-02 12:49:57 -08:00
coverage Remove unused imports, from python scripts. (#4057) 2018-06-26 12:43:04 -07:00
db WritePrepared: fix two versions in compaction see different status for released snapshots (#4890) 2019-01-18 17:24:06 -08:00
docs Insane line length detected (#4813) 2018-12-21 14:54:34 -08:00
env Introduce a CPU time counter in perf_context (#4741) 2018-12-20 12:03:44 -08:00
examples Pin top-level index on partitioned index/filter blocks (#4037) 2018-06-22 15:27:46 -07:00
hdfs Update all unique/shared_ptr instances to be qualified with namespace std (#4638) 2018-11-09 11:19:58 -08:00
include/rocksdb Use chrono::time_point instead of time_t (#4868) 2019-01-16 09:51:05 -08:00
java Fix typos in comments (#4819) 2018-12-26 09:43:56 -08:00
memtable WriteBufferManger doens't cost to cache if no limit is set (#4695) 2018-11-18 16:55:43 -08:00
monitoring Add a new per level counter for block cache hit (#4796) 2018-12-21 13:20:05 -08:00
options Concurrent task limiter for compaction thread control (#4332) 2018-12-13 13:18:28 -08:00
port Detect if Jemalloc is linked with the binary (#4844) 2019-01-03 16:30:12 -08:00
table fix accounting for range tombstones in TableProperties (#4841) 2019-01-02 15:08:53 -08:00
third-party Support pragma once in all header files and cleanup some warnings (#4339) 2018-09-05 18:13:31 -07:00
tools With ldb --try_load_options and wal_dir doesn't exist, ignore it (#4875) 2019-01-11 16:48:32 -08:00
util Use chrono::time_point instead of time_t (#4868) 2019-01-16 09:51:05 -08:00
utilities WritePrepared: fix two versions in compaction see different status for released snapshots (#4890) 2019-01-18 17:24:06 -08:00
.clang-format
.gitignore RocksDB Trace Analyzer (#4091) 2018-08-13 11:44:02 -07:00
.lgtm.yml Create lgtm.yml for LGTM.com C/C++ analysis (#4058) 2018-06-26 12:43:04 -07:00
.travis.yml Fix printf formatting on MacOS (#4533) 2018-10-19 14:46:09 -07:00
AUTHORS
CMakeLists.txt Remove some components (#4101) 2019-01-10 13:30:09 -08:00
CODE_OF_CONDUCT.md Add Code of Conduct 2017-12-05 18:42:35 -08:00
CONTRIBUTING.md Add Code of Conduct 2017-12-05 18:42:35 -08:00
COPYING
DEFAULT_OPTIONS_HISTORY.md
DUMP_FORMAT.md
HISTORY.md Use chrono::time_point instead of time_t (#4868) 2019-01-16 09:51:05 -08:00
INSTALL.md Update the version of the dependencies used by the RocksJava static build (#4761) 2018-12-18 20:25:43 -08:00
LANGUAGE-BINDINGS.md Added PingCaps Rust RocksDB and ObjectiveRocks (#4065) 2018-06-27 15:43:21 -07:00
LICENSE.Apache
LICENSE.leveldb
Makefile Fix downloaded filename of snappy (#4870) 2019-01-11 10:29:40 -08:00
README.md Create lgtm.yml for LGTM.com C/C++ analysis (#4058) 2018-06-26 12:43:04 -07:00
ROCKSDB_LITE.md Fix some typos in comments and docs. 2018-03-08 10:27:25 -08:00
TARGETS Remove some components (#4101) 2019-01-10 13:30:09 -08:00
USERS.md Adding IOTA Foundation to USERS.MD (#4436) 2018-10-02 10:03:46 -07:00
Vagrantfile Adding CentOS 7 Vagrantfile & build script 2018-02-26 15:27:17 -08:00
WINDOWS_PORT.md Add GCC 8 to Travis (#3433) 2018-07-13 10:58:06 -07:00
appveyor.yml Add RocksJava build to AppVeyor 2019-01-03 10:44:44 -08:00
issue_template.md
src.mk Remove some components (#4101) 2019-01-10 13:30:09 -08:00
thirdparty.inc Provide a way to override windows memory allocator with jemalloc for ZSTD 2018-06-04 12:12:48 -07:00

README.md

RocksDB: A Persistent Key-Value Store for Flash and RAM Storage


RocksDB is developed and maintained by the Facebook Database Engineering Team. It is built on earlier work on LevelDB by Sanjay Ghemawat (sanjay@google.com) and Jeff Dean (jeff@google.com).

This code is a library that forms the core building block for a fast key-value server, especially suited for storing data on flash drives. It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF). It has multi-threaded compactions, making it especially suitable for storing multiple terabytes of data in a single database.

Start with example usage here: https://github.com/facebook/rocksdb/tree/master/examples

See the GitHub wiki for more explanation.

The public interface is in include/. Callers should not include or rely on the details of any other header files in this package. Those internal APIs may be changed without warning.

Design discussions are conducted in https://www.facebook.com/groups/rocksdb.dev/

License

RocksDB is dual-licensed under both the GPLv2 (found in the COPYING file in the root directory) and Apache 2.0 License (found in the LICENSE.Apache file in the root directory). You may select, at your option, one of the above-listed licenses.