Andrew Kryczka 6fbe96baf8 Compaction Support for Range Deletion
Summary:
This diff introduces RangeDelAggregator, which takes ownership of iterators
provided to it via AddTombstones(). The tombstones are organized in a two-level
map (snapshot stripe -> begin key -> tombstone). Tombstone creation avoids data
copy by holding Slices returned by the iterator, which remain valid thanks to pinning.
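
As a rough illustration of the two-level organization described above, the following is a minimal sketch and not the actual RangeDelAggregator code. The names `Tombstone`, `StripeMap`, and `TombstoneIndex` are hypothetical, keys are plain `std::string` compared bytewise (the real code holds Slices and uses a comparator), and it assumes a stripe is identified by the smallest snapshot sequence number that is greater than or equal to the tombstone's sequence number.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified tombstone covering user keys in [begin, end).
struct Tombstone {
  std::string begin;
  std::string end;
  uint64_t seq;
};

// Inner level: begin key -> tombstone. A multimap allows several
// tombstones that start at the same key.
using StripeMap = std::multimap<std::string, Tombstone>;

class TombstoneIndex {
 public:
  // One stripe per snapshot, plus a final stripe for everything newer
  // than the last snapshot (keyed by UINT64_MAX).
  explicit TombstoneIndex(const std::vector<uint64_t>& snapshots) {
    for (uint64_t s : snapshots) {
      stripes_[s];
    }
    stripes_[UINT64_MAX];
  }

  void Add(const Tombstone& t) { StripeFor(t.seq).emplace(t.begin, t); }

  // Outer level: the stripe is the first entry whose upper bound is >= seq.
  // The UINT64_MAX entry guarantees that lower_bound always finds one.
  StripeMap& StripeFor(uint64_t seq) {
    return stripes_.lower_bound(seq)->second;
  }

 private:
  std::map<uint64_t, StripeMap> stripes_;
};
```

With snapshots at sequence numbers 10 and 20, a tombstone at sequence number 5 lands in the first stripe, one at 15 in the second, and one at 25 in the final open-ended stripe.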

For compaction, we create a hierarchical range tombstone iterator with structure
matching the iterator over compaction input data. An aggregator based on that
iterator is used by CompactionIterator to determine which keys are covered by
range tombstones. For merge operands, the same aggregator is used by
MergeHelper. Upon finishing each file in the compaction, relevant range tombstones
are added to the output file's range tombstone metablock and file boundaries are
updated accordingly.

To check whether a key is covered by a range tombstone, RangeDelAggregator::ShouldDelete()
considers tombstones in the key's snapshot stripe. When this function is used outside of
compaction, it also checks newer stripes, which can contain covering tombstones. Currently
the intra-stripe check involves a linear scan; however, in the future we plan to collapse ranges
within a stripe such that binary search can be used.
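
The check described above might look roughly like the sketch below. It is an illustration under stated assumptions, not the real ShouldDelete(): the names `RangeTombstone`, `CoveredInStripe`, and `ShouldDeleteSketch` are hypothetical, stripes are plain vectors ordered oldest to newest, keys compare bytewise, a tombstone covers [begin, end), and a tombstone only covers keys with a strictly smaller sequence number.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified tombstone covering user keys in [begin, end).
struct RangeTombstone {
  std::string begin;
  std::string end;
  uint64_t seq;
};

// Linear scan over one stripe, as in the current intra-stripe check.
bool CoveredInStripe(const std::vector<RangeTombstone>& stripe,
                     const std::string& user_key, uint64_t key_seq) {
  for (const auto& t : stripe) {
    if (key_seq < t.seq && t.begin <= user_key && user_key < t.end) {
      return true;
    }
  }
  return false;
}

// stripes[stripe_idx] is the key's own snapshot stripe. check_newer models
// the non-compaction mode, which also looks at newer stripes.
bool ShouldDeleteSketch(
    const std::vector<std::vector<RangeTombstone>>& stripes,
    std::size_t stripe_idx, const std::string& user_key, uint64_t key_seq,
    bool check_newer) {
  const std::size_t last = check_newer ? stripes.size() : stripe_idx + 1;
  for (std::size_t i = stripe_idx; i < last && i < stripes.size(); ++i) {
    if (CoveredInStripe(stripes[i], user_key, key_seq)) {
      return true;
    }
  }
  return false;
}
```

Collapsing overlapping ranges within a stripe, as planned, would leave disjoint sorted ranges, so the inner loop could be replaced by a binary search on the begin key.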

RangeDelAggregator::AddToBuilder() adds all range tombstones in the table's key-range
to a new table's range tombstone meta-block. Since range tombstones may fall in the gap
between files, we may need to extend some files' key-ranges. The strategy is (1) first file
extends as far left as possible and other files do not extend left, (2) all files extend right
until either the start of the next file or the end of the last range tombstone in the gap,
whichever comes first.
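
One possible reading of the two extension rules above is sketched below. This is an interpretation, not the code in this diff: `FileRange`, `Tomb`, and `ExtendBoundaries` are hypothetical names, boundaries are plain user keys compared bytewise (the real boundaries are internal keys), and "the end of the last range tombstone in the gap" is taken to mean the furthest end key among tombstones that reach past the file's largest key and start before the next file.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct FileRange {
  std::string smallest;
  std::string largest;
};

struct Tomb {
  std::string begin;
  std::string end;
};

// files must be sorted and non-overlapping.
void ExtendBoundaries(std::vector<FileRange>& files,
                      const std::vector<Tomb>& tombs) {
  if (files.empty()) return;

  // Rule 1: only the first file extends left, as far as any tombstone starts.
  for (const auto& t : tombs) {
    if (t.begin < files.front().smallest) files.front().smallest = t.begin;
  }

  // Rule 2: every file extends right to the furthest tombstone end in the
  // gap, capped at the start of the next file (no cap for the last file).
  for (std::size_t i = 0; i < files.size(); ++i) {
    const bool is_last = (i + 1 == files.size());
    std::string new_largest = files[i].largest;
    for (const auto& t : tombs) {
      const bool reaches_gap =
          t.end > files[i].largest &&
          (is_last || t.begin < files[i + 1].smallest);
      if (!reaches_gap) continue;
      std::string candidate = t.end;
      if (!is_last && files[i + 1].smallest < candidate) {
        candidate = files[i + 1].smallest;
      }
      if (new_largest < candidate) new_largest = candidate;
    }
    files[i].largest = new_largest;
  }
}
```

For example, with files [c, f] and [m, p] and tombstones [a, d), [e, h), and [o, t), the first file becomes [a, h] and the second becomes [m, t].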

One other notable change is adding release/move semantics to ScopedArenaIterator
such that it can be used to transfer ownership of an arena-allocated iterator, similar to
how unique_ptr is used for malloc'd data.
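
To illustrate what release/move semantics mean for an arena-allocated object, here is a minimal sketch. It is not the actual ScopedArenaIterator: `FakeIterator` and `ScopedArenaPtr` are hypothetical stand-ins. The key difference from unique_ptr is that the wrapper only runs the destructor and never frees memory, because the arena owns the storage.

```cpp
#include <cassert>
#include <new>
#include <utility>

// Hypothetical stand-in for an iterator constructed inside an arena.
// It counts destructor calls through the pointer it is given.
struct FakeIterator {
  explicit FakeIterator(int* d) : destroyed(d) {}
  ~FakeIterator() { ++*destroyed; }
  int* destroyed;
};

class ScopedArenaPtr {
 public:
  explicit ScopedArenaPtr(FakeIterator* it = nullptr) : it_(it) {}

  // Copying would lead to double destruction, so only moves are allowed.
  ScopedArenaPtr(const ScopedArenaPtr&) = delete;
  ScopedArenaPtr& operator=(const ScopedArenaPtr&) = delete;

  ScopedArenaPtr(ScopedArenaPtr&& other) noexcept : it_(other.release()) {}
  ScopedArenaPtr& operator=(ScopedArenaPtr&& other) noexcept {
    if (this != &other) reset(other.release());
    return *this;
  }

  ~ScopedArenaPtr() { reset(nullptr); }

  FakeIterator* get() const { return it_; }

  // Give up ownership without destroying the object.
  FakeIterator* release() {
    FakeIterator* released = it_;
    it_ = nullptr;
    return released;
  }

 private:
  void reset(FakeIterator* it) {
    // Run the destructor only; the arena reclaims the memory later.
    if (it_ != nullptr) it_->~FakeIterator();
    it_ = it;
  }

  FakeIterator* it_;
};
```

After `ScopedArenaPtr b(std::move(a));`, `a.get()` is null and the object is destroyed exactly once, when `b` goes out of scope.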

Depends on D61473

Test Plan: compaction_iterator_test, mock_table, end-to-end tests in D63927

Reviewers: sdong, IslamAbdelRahman, wanning, yhchiang, lightmark

Reviewed By: lightmark

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D62205
2016-10-18 12:04:56 -07:00
arcanist_util Fix arcanist 2016-10-12 20:11:30 -07:00
build_tools Support ZSTD with finalized format 2016-09-06 12:22:16 -07:00
cmake/modules cmake support for linux and osx (#1358) 2016-09-28 11:53:15 -07:00
coverage Fix coverage script 2014-11-03 14:53:00 -08:00
db Compaction Support for Range Deletion 2016-10-18 12:04:56 -07:00
docs Editorial change to README.md 2016-10-12 20:24:50 -07:00
examples Fix typo (#903) 2016-09-14 14:12:31 -07:00
hdfs Fix bug in UnScSigned-off-by: xh931076284 <931076284@qq.com> (#1336) 2016-09-14 10:17:34 -07:00
include/rocksdb add seeforprev in history 2016-10-17 15:34:13 -07:00
java Remove "-Xcheck:jni" from Java tests (#1402) 2016-10-18 09:18:24 -04:00
memtable Add SeekForPrev() to Iterator 2016-09-27 18:20:57 -07:00
port Implement WinRandomRW file and improve code reuse (#1388) 2016-10-13 16:36:34 -07:00
table Compaction Support for Range Deletion 2016-10-18 12:04:56 -07:00
third-party Fix clang analyzer errors 2016-07-08 17:50:51 -07:00
tools Fix a minor bug in the ldb tool that was not selecting the specified (#1399) 2016-10-17 10:40:30 -07:00
util Remove function local statics that interfere with memory pooling (#1392) 2016-10-14 13:09:18 -07:00
utilities Make Lock Info test multiple column families 2016-10-07 15:04:05 -07:00
.arcconfig Integrate Jenkins with Phabricator 2015-04-07 11:56:29 -07:00
.clang-format A script that automatically reformat affected lines 2014-01-14 12:21:24 -08:00
.gitignore Experiments on column-aware encodings 2016-08-01 14:50:19 -07:00
.travis.yml Fix travis 2016-08-31 00:10:49 -07:00
AUTHORS Add AUTHORS file. Fix #203 2014-09-29 10:52:18 -07:00
CMakeLists.txt Compaction Support for Range Deletion 2016-10-18 12:04:56 -07:00
CONTRIBUTING.md facebook accounts are not required for CLA signers 2014-07-08 05:57:54 -04:00
DEFAULT_OPTIONS_HISTORY.md Release RocksDB 4.8.0 2016-05-02 14:38:04 -07:00
DUMP_FORMAT.md First version of rocksdb_dump and rocksdb_undump. 2015-06-19 16:24:36 -07:00
HISTORY.md add seeforprev in history 2016-10-17 15:34:13 -07:00
INSTALL.md Simple changes to support builds for ppc64[le] consistent with X86 2016-01-19 09:08:19 -06:00
LANGUAGE-BINDINGS.md Update LANGUAGE-BINDINGS.md 2016-07-15 13:09:30 +02:00
LICENSE Updated all copyright headers to the new format. 2016-02-09 15:12:00 -08:00
Makefile Minor fixes around Windows 64 Java Artifacts (#1366) 2016-10-03 11:58:08 -07:00
PATENTS Update Patent Grant. 2015-04-13 10:33:43 +01:00
README.md Appveyor badge to show master branch 2016-07-26 13:54:08 -07:00
ROCKSDB_LITE.md Optimistic Transactions 2015-05-29 14:36:35 -07:00
USERS.md Adding Dgraph to list of Users (#1291) 2016-09-12 17:33:44 -07:00
Vagrantfile RocksDB on FreeBSD support 2015-02-26 15:19:17 -08:00
WINDOWS_PORT.md Commit both PR and internal code review changes 2015-07-07 16:58:20 -07:00
appveyor.yml fix vs generator (#1269) 2016-08-10 09:08:13 -07:00
src.mk Compaction Support for Range Deletion 2016-10-18 12:04:56 -07:00
thirdparty.inc Introduce XPRESS compresssion on Windows. (#1081) 2016-04-19 22:54:24 -07:00

README.md

RocksDB: A Persistent Key-Value Store for Flash and RAM Storage


RocksDB is developed and maintained by the Facebook Database Engineering Team. It is built on earlier work on LevelDB by Sanjay Ghemawat (sanjay@google.com) and Jeff Dean (jeff@google.com).

This code is a library that forms the core building block for a fast key-value server, especially suited for storing data on flash drives. It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF). It has multi-threaded compactions, making it especially suitable for storing multiple terabytes of data in a single database.

Start with example usage here: https://github.com/facebook/rocksdb/tree/master/examples

See the GitHub wiki for more explanation.

The public interface is in include/. Callers should not include or rely on the details of any other header files in this package. Those internal APIs may be changed without warning.

Design discussions are conducted in https://www.facebook.com/groups/rocksdb.dev/