Find a file
Abhishek Madan 7528130e38 Cache fragmented range tombstones in BlockBasedTableReader (#4493)
Summary:
This allows tombstone fragmenting to only be performed when the table is opened, and cached for subsequent accesses.

On the same DB used in #4449, running `readrandom` results in the following:
```
readrandom   :       0.983 micros/op 1017076 ops/sec;   78.3 MB/s (63103 of 100000 found)
```

Now that Get performance in the presence of range tombstones is reasonable, I also compared the performance between a DB with range tombstones, "expanded" range tombstones (several point tombstones that cover the same keys the equivalent range tombstone would cover, a common workaround for DeleteRange), and no range tombstones. The created DBs had 5 million keys each, and DeleteRange was called at regular intervals (depending on the total number of range tombstones being written) after 4.5 million Puts. The table below summarizes the results of a `readwhilewriting` benchmark (in order to provide somewhat more realistic results):
```
   Tombstones?    | avg micros/op | stddev micros/op |  avg ops/s   | stddev ops/s
----------------- | ------------- | ---------------- | ------------ | ------------
None              |        0.6186 |          0.04637 | 1,625,252.90 | 124,679.41
500 Expanded      |        0.6019 |          0.03628 | 1,666,670.40 | 101,142.65
500 Unexpanded    |        0.6435 |          0.03994 | 1,559,979.40 | 104,090.52
1k Expanded       |        0.6034 |          0.04349 | 1,665,128.10 | 125,144.57
1k Unexpanded     |        0.6261 |          0.03093 | 1,600,457.50 |  79,024.94
5k Expanded       |        0.6163 |          0.05926 | 1,636,668.80 | 154,888.85
5k Unexpanded     |        0.6402 |          0.04002 | 1,567,804.70 | 100,965.55
10k Expanded      |        0.6036 |          0.05105 | 1,667,237.70 | 142,830.36
10k Unexpanded    |        0.6128 |          0.02598 | 1,634,633.40 |  72,161.82
25k Expanded      |        0.6198 |          0.04542 | 1,620,980.50 | 116,662.93
25k Unexpanded    |        0.5478 |          0.0362  | 1,833,059.10 | 121,233.81
50k Expanded      |        0.5104 |          0.04347 | 1,973,107.90 | 184,073.49
50k Unexpanded    |        0.4528 |          0.03387 | 2,219,034.50 | 170,984.32
```

After a large enough quantity of range tombstones are written, range tombstone Gets can become faster than reading from an equivalent DB with several point tombstones.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4493

Differential Revision: D10842844

Pulled By: abhimadan

fbshipit-source-id: a7d44534f8120e6aabb65779d26c6b9df954c509
2018-10-25 19:26:44 -07:00
buckifier
build_tools
cache
cmake
coverage
db Cache fragmented range tombstones in BlockBasedTableReader (#4493) 2018-10-25 19:26:44 -07:00
docs
env Add missing methods to WritableFileWrapper (#4584) 2018-10-24 12:19:54 -07:00
examples
hdfs
include/rocksdb Add missing methods to WritableFileWrapper (#4584) 2018-10-24 12:19:54 -07:00
java WriteBufferManager JNI fixes (#4579) 2018-10-24 12:40:52 -07:00
memtable
monitoring
options
port
table Cache fragmented range tombstones in BlockBasedTableReader (#4493) 2018-10-25 19:26:44 -07:00
third-party
tools Fix two contrun job failures (#4587) 2018-10-24 20:16:45 -07:00
util Use only "local" range tombstones during Get (#4449) 2018-10-24 12:31:12 -07:00
utilities Set WriteCommitted txn id to commit sequence number (#4565) 2018-10-24 12:21:38 -07:00
.clang-format
.gitignore
.lgtm.yml
.travis.yml
appveyor.yml
AUTHORS
CMakeLists.txt Use only "local" range tombstones during Get (#4449) 2018-10-24 12:31:12 -07:00
CODE_OF_CONDUCT.md
CONTRIBUTING.md
COPYING
DEFAULT_OPTIONS_HISTORY.md
DUMP_FORMAT.md
HISTORY.md
INSTALL.md
issue_template.md
LANGUAGE-BINDINGS.md
LICENSE.Apache
LICENSE.leveldb
Makefile Use only "local" range tombstones during Get (#4449) 2018-10-24 12:31:12 -07:00
README.md
ROCKSDB_LITE.md
src.mk Use only "local" range tombstones during Get (#4449) 2018-10-24 12:31:12 -07:00
TARGETS Use only "local" range tombstones during Get (#4449) 2018-10-24 12:31:12 -07:00
thirdparty.inc
USERS.md
Vagrantfile
WINDOWS_PORT.md

RocksDB: A Persistent Key-Value Store for Flash and RAM Storage

Linux/Mac Build Status Windows Build status PPC64le Build Status

RocksDB is developed and maintained by Facebook Database Engineering Team. It is built on earlier work on LevelDB by Sanjay Ghemawat (sanjay@google.com) and Jeff Dean (jeff@google.com)

This code is a library that forms the core building block for a fast key value server, especially suited for storing data on flash drives. It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF). It has multi-threaded compactions, making it specially suitable for storing multiple terabytes of data in a single database.

Start with example usage here: https://github.com/facebook/rocksdb/tree/master/examples

See the github wiki for more explanation.

The public interface is in include/. Callers should not include or rely on the details of any other header files in this package. Those internal APIs may be changed without warning.

Design discussions are conducted in https://www.facebook.com/groups/rocksdb.dev/

License

RocksDB is dual-licensed under both the GPLv2 (found in the COPYING file in the root directory) and Apache 2.0 License (found in the LICENSE.Apache file in the root directory). You may select, at your option, one of the above-listed licenses.