Nikhil Benesch 7891af8b53 expose a hook to skip tables during iteration
Summary:
As discussed on the mailing list (["Skipping entire SSTs while iterating"](https://groups.google.com/forum/#!topic/rocksdb/ujHCJVLrHlU)), this patch adds a `table_filter` to `ReadOptions` that allows specifying a callback to be executed during iteration before each table in the database is scanned. The callback is passed the table's properties; the table is scanned iff the callback returns true.

This can be used in conjunction with a `TablePropertiesCollector` to dramatically speed up scans by skipping tables that are known to contain irrelevant data for the scan at hand.
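As a rough illustration of how such a filter might be paired with a collected property, here is a minimal, self-contained sketch. It does not link against RocksDB: `TableProperties` and `ReadOptions` below are stand-in structs that mirror only the fields involved, and the property name `example.ts.max`, the helper `NewerThan`, and the loop in `CountScannedTables` are hypothetical names chosen for this example, not part of the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Stand-in for rocksdb::TableProperties; only the field this sketch needs.
struct TableProperties {
  std::map<std::string, std::string> user_collected_properties;
};

// Stand-in for the part of rocksdb::ReadOptions touched by this patch.
struct ReadOptions {
  std::function<bool(const TableProperties&)> table_filter;
};

// Hypothetical property a collector could record: the largest timestamp
// of any key stored in the table, encoded as a decimal string.
const char kMaxTimestampProp[] = "example.ts.max";

// Builds a filter that keeps only tables that may hold keys newer than `since`.
std::function<bool(const TableProperties&)> NewerThan(uint64_t since) {
  return [since](const TableProperties& props) {
    auto it = props.user_collected_properties.find(kMaxTimestampProp);
    // Without the property we cannot prove the table is irrelevant: scan it.
    if (it == props.user_collected_properties.end()) return true;
    return std::stoull(it->second) > since;
  };
}

// Mimics the check made before each table is scanned. An empty filter
// means every table is scanned, matching the documented default.
std::size_t CountScannedTables(const ReadOptions& opts,
                               const std::vector<TableProperties>& tables) {
  std::size_t scanned = 0;
  for (const auto& table : tables) {
    if (!opts.table_filter || opts.table_filter(table)) ++scanned;
  }
  return scanned;
}

// Usage: one old table, one new table, one table with no recorded property.
void Demo() {
  std::vector<TableProperties> tables(3);
  tables[0].user_collected_properties[kMaxTimestampProp] = "100";
  tables[1].user_collected_properties[kMaxTimestampProp] = "250";
  ReadOptions opts;
  opts.table_filter = NewerThan(200);
  assert(CountScannedTables(opts, tables) == 2);  // tables[0] is skipped
}
```

Treating a missing property as "scan the table" keeps the filter conservative: a table is skipped only when its properties show it cannot contain relevant keys.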

We're using this [downstream in CockroachDB](https://github.com/cockroachdb/cockroach/blob/master/pkg/storage/engine/db.cc#L2009-L2022) already. With this feature, under ideal conditions, we can reduce the time of an incremental backup from hours to seconds.

FYI, the first commit in this PR fixes a segfault that I unfortunately have not figured out how to reproduce outside of CockroachDB. I'm hoping you accept it on the grounds that it is not correct to return 8-byte aligned memory from a call to `malloc` on some 64-bit platforms; one correct approach is to infer the necessary alignment from `std::max_align_t`, as done here. As noted in the first commit message, the bug is tickled by having a `std::function` in `struct ReadOptions`. That is, the following patch alone is enough to cause RocksDB to segfault when run from CockroachDB on Darwin.

```diff
--- a/include/rocksdb/options.h
+++ b/include/rocksdb/options.h
@@ -1546,6 +1546,13 @@ struct ReadOptions {
   // Default: false
   bool ignore_range_deletions;

+  // A callback to determine whether relevant keys for this scan exist in a
+  // given table based on the table's properties. The callback is passed the
+  // properties of each table during iteration. If the callback returns false,
+  // the table will not be scanned.
+  // Default: empty (every table will be scanned)
+  std::function<bool(const TableProperties&)> table_filter;
+
   ReadOptions();
   ReadOptions(bool cksum, bool cache);
 };
```
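The alignment point above can be sketched independently of the arena code, which is not shown in this message. The following is a minimal illustration, assuming only standard C++11; the names `kAlignUnit`, `AlignUp`, `PaddingFor`, and `AlignPointer` are chosen for this example and are not claimed to match the actual patch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Alignment that satisfies every scalar type on this platform. On many
// 64-bit targets this is 16 rather than 8, which is why a hard-coded
// 8-byte unit can be too small.
constexpr std::size_t kAlignUnit = alignof(std::max_align_t);
static_assert((kAlignUnit & (kAlignUnit - 1)) == 0,
              "alignment unit must be a power of two");

// Rounds a byte count up to the next multiple of kAlignUnit.
constexpr std::size_t AlignUp(std::size_t n) {
  return (n + kAlignUnit - 1) & ~(kAlignUnit - 1);
}

// Returns how many bytes must be skipped so that `p` becomes aligned.
inline std::size_t PaddingFor(const void* p) {
  const std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(p);
  const std::size_t mod = addr & (kAlignUnit - 1);
  return mod == 0 ? 0 : kAlignUnit - mod;
}

// Advances `p` to the next kAlignUnit-aligned address within its buffer.
inline unsigned char* AlignPointer(unsigned char* p) {
  unsigned char* aligned = p + PaddingFor(p);
  assert(reinterpret_cast<std::uintptr_t>(aligned) % kAlignUnit == 0);
  return aligned;
}
```

Deriving the unit from `std::max_align_t` lets the compiler supply the platform's requirement instead of relying on a constant tuned for one target.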

/cc danhhz
Closes https://github.com/facebook/rocksdb/pull/2265

Differential Revision: D5054262

Pulled By: yiwu-arbug

fbshipit-source-id: dd6b28f2bba6cb8466250d8c5c542d3c92785476
2017-10-17 22:12:00 -07:00
buckifier rocksdb: make buildable on aarch64 2017-08-13 17:13:54 -07:00
build_tools fix lite build 2017-10-17 08:57:09 -07:00
cache Add -DPORTABLE=1 to MSVC CI build 2017-08-31 16:42:48 -07:00
cmake CMake: Add support for CMake packages 2017-08-28 17:14:37 -07:00
coverage Fix /bin/bash shebangs 2017-08-03 15:56:46 -07:00
db expose a hook to skip tables during iteration 2017-10-17 22:12:00 -07:00
docs Blog post for 5.8 release 2017-09-28 10:14:09 -07:00
env Repair DBs with trailing slash in name 2017-09-22 12:42:22 -07:00
examples Pinnableslice examples and blog post 2017-08-24 12:26:07 -07:00
hdfs Revert "comment out unused parameters" 2017-07-21 18:26:26 -07:00
include/rocksdb expose a hook to skip tables during iteration 2017-10-17 22:12:00 -07:00
java Add OptionsUtil class to java/CMakeLists.txt 2017-10-12 16:57:05 -07:00
memtable Added CPU prefetch for skiplist 2017-10-04 18:12:52 -07:00
monitoring Use RAII instead of pointers in cf_info_map 2017-09-28 14:26:47 -07:00
options fix lite build 2017-10-17 08:57:09 -07:00
port Fix MinGW build 2017-09-19 10:28:26 -07:00
table print more table_options to info log 2017-10-13 14:42:26 -07:00
third-party Revert "comment out unused parameters" 2017-07-21 18:26:26 -07:00
tools db_bench randomtransaction print throughput 2017-10-16 18:42:25 -07:00
util arena: derive alignment unit from std::max_align_t 2017-10-17 11:13:19 -07:00
utilities Blob DB: Store blob index as kTypeBlobIndex in base db 2017-10-17 17:28:11 -07:00
.clang-format A script that automatically reformat affected lines 2014-01-14 12:21:24 -08:00
.gitignore Remove leftover references to phutil_module_cache 2017-08-23 12:12:21 -07:00
.travis.yml fix lite build 2017-10-17 08:57:09 -07:00
AUTHORS Add AUTHORS file. Fix #203 2014-09-29 10:52:18 -07:00
CMakeLists.txt PinnableSlice move assignment 2017-10-12 18:28:24 -07:00
CONTRIBUTING.md Remove the licensing description in CONTRIBUTING.md 2017-07-16 15:57:18 -07:00
COPYING Add GPLv2 as an alternative license. 2017-04-27 18:06:12 -07:00
DEFAULT_OPTIONS_HISTORY.md options.delayed_write_rate use the rate of rate_limiter by default. 2017-05-24 09:58:24 -07:00
DUMP_FORMAT.md First version of rocksdb_dump and rocksdb_undump. 2015-06-19 16:24:36 -07:00
HISTORY.md rate limit auto-tuning 2017-10-04 19:15:01 -07:00
INSTALL.md Default one to rocksdb:x64-windows 2017-09-28 16:12:24 -07:00
LANGUAGE-BINDINGS.md add Erlang to the list of language bindings 2017-08-28 16:43:16 -07:00
LICENSE.Apache Change RocksDB License 2017-07-15 16:11:23 -07:00
LICENSE.leveldb Add back the LevelDB license file 2017-07-16 18:42:18 -07:00
Makefile PinnableSlice move assignment 2017-10-12 18:28:24 -07:00
README.md Appveyor badge to show master branch 2016-07-26 13:54:08 -07:00
ROCKSDB_LITE.md Optimistic Transactions 2015-05-29 14:36:35 -07:00
TARGETS PinnableSlice move assignment 2017-10-12 18:28:24 -07:00
USERS.md Add LogDevice to USERS.md 2017-09-25 15:56:40 -07:00
Vagrantfile Update Vagrant file (test internal phabricator workflow) 2016-10-28 15:39:19 -07:00
WINDOWS_PORT.md Commit both PR and internal code review changes 2015-07-07 16:58:20 -07:00
appveyor.yml Add -DPORTABLE=1 to MSVC CI build 2017-08-31 16:42:48 -07:00
issue_template.md Add a template for issues 2017-09-29 11:41:28 -07:00
src.mk PinnableSlice move assignment 2017-10-12 18:28:24 -07:00
thirdparty.inc Introduce XPRESS compresssion on Windows. (#1081) 2016-04-19 22:54:24 -07:00

README.md

RocksDB: A Persistent Key-Value Store for Flash and RAM Storage


RocksDB is developed and maintained by the Facebook Database Engineering Team. It is built on earlier work on LevelDB by Sanjay Ghemawat (sanjay@google.com) and Jeff Dean (jeff@google.com).

This code is a library that forms the core building block for a fast key-value server, especially suited for storing data on flash drives. It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF). It has multi-threaded compactions, making it especially suitable for storing multiple terabytes of data in a single database.

Start with example usage here: https://github.com/facebook/rocksdb/tree/master/examples

See the GitHub wiki for more explanation.

The public interface is in include/. Callers should not include or rely on the details of any other header files in this package. Those internal APIs may be changed without warning.

Design discussions are conducted in https://www.facebook.com/groups/rocksdb.dev/