Summary:
This diff adds three sets of APIs to RocksDB.
= GetColumnFamilyMetaData =
* This APIs allow users to obtain the current state of a RocksDB instance on one column family.
* See GetColumnFamilyMetaData in include/rocksdb/db.h
= EventListener =
* A virtual class that allows users to implement a set of
call-back functions which will be called when specific
events of a RocksDB instance happens.
* To register EventListener, simply insert an EventListener to ColumnFamilyOptions::listeners
= CompactFiles =
* CompactFiles API inputs a set of file numbers and an output level, and RocksDB
will try to compact those files into the specified level.
= Example =
* Example code can be found in example/compact_files_example.cc, which implements
a simple external compactor using EventListener, GetColumnFamilyMetaData, and
CompactFiles API.
Test Plan:
listener_test
compactor_test
example/compact_files_example
export ROCKSDB_TESTS=CompactFiles
db_test
export ROCKSDB_TESTS=MetaData
db_test
Reviewers: ljin, igor, rven, sdong
Reviewed By: sdong
Subscribers: MarkCallaghan, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D24705
Summary:
Only one more try, I promise.
I talked to Jim and he mentioned that if we include our system includes with -isystem rather than with -I, that signals to the compile that those are system includes and thus no warnings are issued. So I turned our glibc includes into system includes and now we no longer get the warning from there, making us shadow-warning-free!
Test Plan: compiles with both clang and gcc
Reviewers: sdong, yhchiang, rven, ljin
Reviewed By: ljin
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28479
Summary:
So glibc is not -Wshadow-safe, so we need to turn it off :(
error: ‘int sigaction(int, const sigaction*, sigaction*)’ hides
constructor for ‘struct sigaction’
The rest of the changes in this diff is that we include .h files under rocksdb namespace, which is a no-no.
Test Plan: compiles now
Reviewers: ljin, yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28413
Summary: It turns out that -Wshadow has different rules for gcc than clang. Previous commit fixed clang. This commits fixes the rest of the warnings for gcc.
Test Plan: compiles
Reviewers: ljin, yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28131
Summary:
TestMemEnv simulates all Env APIs using in-memory data structures.
We can use it to speed up db_test run, which is now reduced ~7mins when it is
enabled.
We can also add features to simulate power/disk failures in the next
step
TestMemEnv is derived from helper/mem_env
mem_env can not be used for rocksdb since some of its APIs do not give
the same results as env_posix. And its file read/write is not thread safe
Test Plan:
make all -j32
./db_test
./env_mem_test
Reviewers: sdong, yhchiang, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28035
Summary: as title
Test Plan: ./c_test
Reviewers: igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D28119
Summary:
...and fix all the errors :)
Jim suggested turning on -Wshadow because it helped him fix number of critical bugs in fbcode. I think it's a good idea to be -Wshadow clean.
Test Plan: compiles
Reviewers: yhchiang, rven, sdong, ljin
Reviewed By: ljin
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D27711
Summary:
Rename Version::Builder to VersionBuilder and expose its definition to a header.
Make VerisonBuilder not reference Version or ColumnFamilyData, only working with VersionStorageInfo.
Add version_builder_test which has a simple test.
Test Plan: make all check
Reviewers: rven, yhchiang, igor, ljin
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D27969
Summary: Decoupling code that deals with archived log files outside of DBImpl. That will make this code easier to reason about and test. It will also make the code easier to improve, because an improver doesn't have to understand DBImpl code in entirety.
Test Plan: added test
Reviewers: ljin, yhchiang, rven, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D27873
Summary:
Add some helper functions to make sure DB works well for non-default comparators.
Add a test for SimpleSuffixReverseComparator.
Test Plan: Run the new test
Reviewers: ljin, rven, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D27831
Summary:
Make compaction picker easier to test.
The basic idea is to separate a minimum subcomponent of Version to VersionStorageInfo, which just responsible to LSM tree. A stub VersionStorageInfo can then be easily created and passed into compaction picker so that we can check the outputs.
It now passes most tests. Still two things need to be done:
(1) deal with the FIFO compaction's file size.
(2) write an example test to make sure the interface can do the job.
Add a compaction_picker_test to make sure compaction picker codes can be easily unit tested.
Test Plan:
Pass all unit tests and compaction_picker_test
Reviewers: yhchiang, rven, igor, ljin
Reviewed By: ljin
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D27639
Summary:
This diff replaces BlockBasedTable in flush_job_test with TableMock, making it depend on less things and making it closer to an unit test than integration test.
It also introduces a framework to compile mock classes -- Any file named *mock.cc will not be compiled into the build. It will only get compiled into the tests. What way we can mock out most other classes, Version, VersionSet, DBImpl, etc.
Test Plan: flush_job_test
Reviewers: ljin, rven, yhchiang, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D27681
Summary:
Abstract out FlushProcess and take it out of DBImpl.
This also includes taking DeletionState outside of DBImpl.
Currently this diff is only doing the refactoring. Future work includes:
1. Decoupling flush_process.cc, make it depend on less state
2. Write flush_process_test, which will mock out everything that FlushProcess depends on and test it in isolation
Test Plan: make check
Reviewers: rven, yhchiang, sdong, ljin
Reviewed By: ljin
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D27561
Summary: perf_context.get_from_output_files_time is now only set writable DB's DB::Get(). Extend it to MultiGet() and read only DB.
Test Plan:
make all check
Fix perf_context_test and extend it to cover MultiGet(), as long as read-only DB. Run it and watch the results
Reviewers: ljin, yhchiang, igor
Reviewed By: igor
Subscribers: rven, leveldb
Differential Revision: https://reviews.facebook.net/D24207
Summary:
Before this diff, there are two places with rocksdb versions. After the diff:
1. we only have one source of truth for rocksdb version
2. we have a script that we can use to get the version that we can use in other compilations (java, go, etc).
Test Plan: make
Reviewers: yhchiang, sdong, ljin
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D24333
Summary:
I put together a script to assist in the generation of deb's and
rpm's. I've tested that this works on ubuntu via vagrant. I've included the
Vagrantfile here, but I can remove it if it's not useful. The package.sh
script should work on any ubuntu or centos machine, I just added a bit of
logic in there to allow a base Ubuntu or Centos machine to be able to build
RocksDB from scratch.
Example output on Ubuntu 14.04:
```
root@vagrant-ubuntu-trusty-64:/vagrant# ./tools/package.sh
[+] g++-4.7 is already installed. skipping.
[+] libgflags-dev is already installed. skipping.
[+] ruby-all-dev is already installed. skipping.
[+] fpm is already installed. skipping.
Created package {:path=>"rocksdb_3.5_amd64.deb"}
root@vagrant-ubuntu-trusty-64:/vagrant# dpkg --info rocksdb_3.5_amd64.deb
new debian package, version 2.0.
size 17392022 bytes: control archive=1518 bytes.
275 bytes, 11 lines control
2911 bytes, 38 lines md5sums
Package: rocksdb
Version: 3.5
License: BSD
Vendor: Facebook
Architecture: amd64
Maintainer: rocksdb@fb.com
Installed-Size: 83358
Section: default
Priority: extra
Homepage: http://rocksdb.org/
Description: RocksDB is an embeddable persistent key-value store for fast storage.
```
Example output on CentOS 6.5:
```
[root@localhost vagrant]# rpm -qip rocksdb-3.5-1.x86_64.rpm
Name : rocksdb Relocations: /usr
Version : 3.5 Vendor: Facebook
Release : 1 Build Date: Mon 29 Sep 2014 01:26:11 AM UTC
Install Date: (not installed) Build Host: localhost
Group : default Source RPM: rocksdb-3.5-1.src.rpm
Size : 96231106 License: BSD
Signature : (none)
Packager : rocksdb@fb.com
URL : http://rocksdb.org/
Summary : RocksDB is an embeddable persistent key-value store for fast storage.
Description :
RocksDB is an embeddable persistent key-value store for fast storage.
```
Test Plan:
How this gets used is really up to the RocksDB core team. If you
want to actually get this into mainline, you might have to change `make
install` such that it install the RocksDB shared object file as well, which
would require you to link against gflags (maybe?) and that would require some
potential modifications to the script here (basically add a depends on that
package).
Currently, this will install the headers and a pre-compiled statically linked
object file. If that's what you want out of life, than this requires no
modifications.
Reviewers: ljin, yhchiang, igor
Reviewed By: igor
Differential Revision: https://reviews.facebook.net/D24141
Summary:
Intead of passing callback function pointer and its arg on Table::Get()
interface, passing GetContext. This makes the interface cleaner and
possible better perf. Also adding a fast pass for SaveValue()
Test Plan: make all check
Reviewers: igor, yhchiang, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D24057
Summary:
Add make install. If INSTALL_PATH is not set, then rocksdb will be
installed under "/usr/local" directory (/usr/local/include for headers
and /usr/local/lib for library file(s).)
Test Plan:
Develop a simple rocksdb app, called test.cc, and do the followings.
make clean
make static_lib -j32
sudo make install
g++ -std=c++11 test.cc -lrocksdb -lbz2 -lz -o test
./test
sudo make uninstall
make clean
make shared_lib -j32
sudo make install
g++ -std=c++11 test.cc -lrocksdb -lbz2 -lz -o test
./test
make INSTALL_PATH=/tmp/path install
make INSTALL_PATH=/tmp/path uninstall
and make sure things are installed / uninstalled in the specified path.
Reviewers: ljin, sdong, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D23211
Summary:
Introducing WriteController, which is a source of truth about per-DB write delays. Let's define an DB epoch as a period where there are no flushes and compactions (i.e. new epoch is started when flush or compaction finishes). Each epoch can either:
* proceed with all writes without delay
* delay all writes by fixed time
* stop all writes
The three modes are recomputed at each epoch change (flush, compaction), rather than on every write (which is currently the case).
When we have a lot of column families, our current pull behavior adds a big overhead, since we need to loop over every column family for every write. With new push model, overhead on Write code-path is minimal.
This is just the start. Next step is to also take care of stalls introduced by slow memtable flushes. The final goal is to eliminate function MakeRoomForWrite(), which currently needs to be called for every column family by every write.
Test Plan: make check for now. I'll add some unit tests later. Also, perf test.
Reviewers: dhruba, yhchiang, MarkCallaghan, sdong, ljin
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D22791
Summary:
1. Make filter_block.h a base class. Derive block_based_filter_block and full_filter_block. The previous one is the traditional filter block. The full_filter_block is newly added. It would generate a filter block that contain all the keys in SST file.
2. When querying a key, table would first check if full_filter is available. If not, it would go to the exact data block and check using block_based filter.
3. User could choose to use full_filter or tradional(block_based_filter). They would be stored in SST file with different meta index name. "filter.filter_policy" or "full_filter.filter_policy". Then, Table reader is able to know the fllter block type.
4. Some optimizations have been done for full_filter_block, thus it requires a different interface compared to the original one in filter_policy.h.
5. Actual implementation of filter bits coding/decoding is placed in util/bloom_impl.cc
Benchmark: base commit 1d23b5c470
Command:
db_bench --db=/dev/shm/rocksdb --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --write_buffer_size=134217728 --max_write_buffer_number=2 --target_file_size_base=33554432 --max_bytes_for_level_base=1073741824 --verify_checksum=false --max_background_compactions=4 --use_plain_table=0 --memtablerep=prefix_hash --open_files=-1 --mmap_read=1 --mmap_write=0 --bloom_bits=10 --bloom_locality=1 --memtable_bloom_bits=500000 --compression_type=lz4 --num=393216000 --use_hash_search=1 --block_size=1024 --block_restart_interval=16 --use_existing_db=1 --threads=1 --benchmarks=readrandom —disable_auto_compactions=1
Read QPS increase for about 30% from 2230002 to 2991411.
Test Plan:
make all check
valgrind db_test
db_stress --use_block_based_filter = 0
./auto_sanity_test.sh
Reviewers: igor, yhchiang, ljin, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D20979
Summary: In DBImpl::Recover method, while loading memtables, also check if memtables are empty. Use this in DBImplReadonly to determine whether to lookup memtable or not.
Test Plan:
db_test
make check all
Reviewers: sdong, yhchiang, ljin, igor
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D22281
Summary:
Add WriteBatchWithIndex so that a user can query data out of a WriteBatch, to support MongoDB's read-its-own-write.
WriteBatchWithIndex uses a skiplist to store the binary index. The index stores the offset of the entry in the write batch. When searching for a key, the key for the entry is read by read the entry from the write batch from the offset.
Define a new iterator class for querying data out of WriteBatchWithIndex. A user can create an iterator of the write batch for one column family, seek to a key and keep calling Next() to see next entries.
I will add more unit tests if people are OK about this API.
Test Plan:
make all check
Add unit tests.
Reviewers: yhchiang, igor, MarkCallaghan, ljin
Reviewed By: ljin
Subscribers: dhruba, leveldb, xjin
Differential Revision: https://reviews.facebook.net/D21381
Summary: as title
Test Plan:
make all check
ran db_bench and saw seek stats at the end
Reviewers: yhchiang, igor, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D21651