
1847 commits

Author SHA1 Message Date
Igor Canadi 107e08baa7 Use same sorting for all level 0 files
Summary:
We decided that one of the long term goals is to unify level and universal compaction.

As a small first step, I'm unifying level 0 sorting methods.

Previously, we used to sort level 0 files in level compaction by file number and in universal compaction by sequence number.

But it turns out that in level compaction, sorting by file number is exactly the same as sorting by sequence number.

Test Plan:
Ran make check with a bunch of asserts to verify the sorting order is exactly the same.
Also ran make check with this patch.

Reviewers: haobo, yhchiang, ljin, dhruba, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19131
2014-06-20 09:12:14 +02:00
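A minimal C++ sketch of the ordering described above. It uses a simplified, hypothetical L0File struct in place of RocksDB's FileMetaData. Level 0 files are ordered newest first by their largest sequence number. In the sample data, ordering by file number (descending) gives the same result, which is the equivalence the commit relies on for level compaction.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-in for FileMetaData: only the fields needed
// to illustrate the ordering.
struct L0File {
  uint64_t file_number;
  uint64_t smallest_seqno;
  uint64_t largest_seqno;
};

// Newest first by sequence number: a file holding later writes must be
// consulted before older files on a lookup.
bool NewestFirstBySeqNo(const L0File& a, const L0File& b) {
  if (a.largest_seqno != b.largest_seqno) {
    return a.largest_seqno > b.largest_seqno;
  }
  // Deterministic tie-break (an assumption of this sketch).
  return a.file_number > b.file_number;
}

void SortLevel0(std::vector<L0File>* files) {
  std::sort(files->begin(), files->end(), NewestFirstBySeqNo);
}
```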
Haobo Xu 7a9dd5f214 [RocksDB] Make block based table hash index more adaptive
Summary: Currently, RocksDB returns an error if a DB written with a prefix hash index is later opened without a prefix extractor. This is unnecessarily harsh. Without a prefix extractor, we can always fall back to the normal binary search index.

Test Plan: unit test; also manually verified in the LOG that the fallback occurred.

Reviewers: sdong, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19191
2014-06-19 16:40:32 -07:00
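A hedged sketch of the fallback decision. IndexKind and ChooseIndex are hypothetical names. The sketch assumes, as the commit implies, that a table written with the hash index can still be searched with the normal binary search index, so a missing prefix extractor downgrades the index instead of failing the open.

```cpp
#include <cassert>

// Hypothetical names; they illustrate the decision only.
enum class IndexKind { kBinarySearch, kHashSearch };

// The hash index is an extra lookup aid that needs a prefix extractor.
// Without one, fall back to binary search instead of returning an error.
IndexKind ChooseIndex(IndexKind written_with, bool has_prefix_extractor) {
  if (written_with == IndexKind::kHashSearch && !has_prefix_extractor) {
    return IndexKind::kBinarySearch;
  }
  return written_with;
}
```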
Yueh-Hsuan Chiang 4f5ccfd179 Fixed a potential write hang
Summary:
Currently, when something goes wrong in DB::Write() while the write queue
contains more than one element, the design does not appear to clean up
the queue or wake up all the writers, which can make rocksdb
hang on writes.

Test Plan: make all check

Reviewers: sdong, ljin, igor, haobo

Reviewed By: haobo

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19167
2014-06-19 14:53:03 -07:00
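A simplified sketch of the invariant this fix restores, using a hypothetical WriteQueue guarded by a std::mutex. It is not the DB::Write() code. Whatever the status, the leader removes every writer in its group from the queue and signals each one, so no follower is left waiting on a condition that will never be signaled.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <string>

// Hypothetical writer record; an empty status string means OK here.
struct Writer {
  bool done = false;
  std::string status;
  std::condition_variable cv;
};

class WriteQueue {
 public:
  void Push(Writer* w) {
    std::lock_guard<std::mutex> lock(mu_);
    queue_.push_back(w);
  }

  // Called by the leader once its group write finished, successfully or not.
  void FinishGroup(size_t group_size, const std::string& status) {
    std::lock_guard<std::mutex> lock(mu_);
    for (size_t i = 0; i < group_size && !queue_.empty(); ++i) {
      Writer* w = queue_.front();
      queue_.pop_front();
      w->status = status;  // propagate the error to every follower
      w->done = true;
      w->cv.notify_one();
    }
    // Wake the next leader, if any, so the queue keeps draining.
    if (!queue_.empty()) queue_.front()->cv.notify_one();
  }

  size_t size() {
    std::lock_guard<std::mutex> lock(mu_);
    return queue_.size();
  }

 private:
  std::mutex mu_;
  std::deque<Writer*> queue_;
};
```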
Igor Canadi bae495740d Merge pull request #179 from edsrzf/c-api-compaction-filter
Support for compaction filters in the C API
2014-06-19 21:22:46 +02:00
Lei Jin 1ec2d1c69d fix make shared_lib compilation error
Summary: s/class ParsedInternalKey/struct ParsedInternalKey

Test Plan: make shared_lib

Reviewers: igor, yhchiang, sdong, haobo

Reviewed By: haobo

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19173
2014-06-19 10:12:26 -07:00
Lei Jin c4e90c79ed bug fix: iteration over ColumnFamilySet needs to be under mutex
Summary: asan_crash_test is failing with a segfault

Test Plan: running asan_crash_test

Reviewers: sdong, igor

Reviewed By: igor

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19149
2014-06-19 09:31:14 -07:00
Evan Shaw 5363eb8ad4 Add a test for using compaction filters via the C API
2014-06-19 21:46:58 +12:00
Haobo Xu 167738256f [RocksDB] Fix unit test
Summary: fix a bug in D19047, which caused DBTest.RecoverDuringMemtableCompaction to fail.

Test Plan: unit test

Reviewers: sdong, igor

Reviewed By: igor

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19155
2014-06-19 01:37:21 -07:00
Evan Shaw d72313a7fa Add a way to set compaction filter in the C API
2014-06-19 16:31:24 +12:00
Evan Shaw df2701373d Support for compaction filters in the C API
2014-06-19 16:31:17 +12:00
sdong edd47c5104 PlainTable encoding that avoids rewriting the prefix when it is the same as the previous key's
Summary:
Add an encoding feature to PlainTable that saves bytes when consecutive keys share the same prefix.
The data format is documented in table/plain_table_factory.h

Test Plan: Add unit test coverage in plain_table_db_test

Reviewers: yhchiang, igor, dhruba, ljin, haobo

Reviewed By: haobo

Subscribers: nkg-, leveldb

Differential Revision: https://reviews.facebook.net/D18735
2014-06-18 20:41:52 -07:00
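A sketch of the idea behind the encoding, with a made-up record format: 'F' followed by the full key, or 'S' followed by only the suffix when the prefix repeats. This is not the format documented in table/plain_table_factory.h. It assumes a fixed-length prefix and keys at least that long, and keeps records in a vector instead of one length-prefixed buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical encoding; 'F' = full key, 'S' = suffix after a shared prefix.
std::vector<std::string> EncodeKeys(const std::vector<std::string>& keys,
                                    size_t prefix_len) {
  std::vector<std::string> out;
  std::string prev_prefix;
  for (const std::string& key : keys) {
    std::string prefix = key.substr(0, prefix_len);
    if (!out.empty() && prefix == prev_prefix) {
      out.push_back("S" + key.substr(prefix_len));  // prefix not rewritten
    } else {
      out.push_back("F" + key);
      prev_prefix = prefix;
    }
  }
  return out;
}

std::vector<std::string> DecodeKeys(const std::vector<std::string>& records,
                                    size_t prefix_len) {
  std::vector<std::string> keys;
  std::string prev_prefix;
  for (const std::string& rec : records) {
    if (rec[0] == 'F') {
      keys.push_back(rec.substr(1));
      prev_prefix = keys.back().substr(0, prefix_len);
    } else {
      keys.push_back(prev_prefix + rec.substr(1));
    }
  }
  return keys;
}
```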
Haobo Xu 0f0076ed5a [RocksDB] Reduce memory footprint of the blockbased table hash index.
Summary:
Currently, the in-memory hash index of the block-based table uses a precise hash map to track the prefix-to-block-range mapping. In some use cases, especially when the prefix itself is large, the memory overhead becomes a problem. This diff introduces a fixed hash bucket array that does not store the prefix and allows prefix collisions, similar to the PlainTable hash index, in order to reduce memory consumption.
Just a quick draft, still testing and refining.

Test Plan: unit test and shadow testing

Reviewers: dhruba, kailiu, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19047
2014-06-18 18:16:07 -07:00
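A minimal sketch of a prefix hash index that stores no prefixes, in the spirit of the fixed bucket array described above. PrefixBucketIndex and BlockRange are hypothetical. Colliding prefixes share and widen a bucket's block range, so a hit is only a candidate range, while an empty bucket proves the prefix is absent. Memory cost is fixed per bucket regardless of prefix length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

struct BlockRange {
  uint32_t first = UINT32_MAX;  // UINT32_MAX marks an empty bucket
  uint32_t last = 0;
};

class PrefixBucketIndex {
 public:
  explicit PrefixBucketIndex(size_t num_buckets) : buckets_(num_buckets) {}

  void Add(const std::string& prefix, uint32_t block) {
    BlockRange& r = buckets_[Bucket(prefix)];
    if (block < r.first) r.first = block;
    if (block > r.last) r.last = block;
  }

  // false: prefix definitely absent; true: candidate range (may be widened
  // by other prefixes that collide in the same bucket).
  bool Lookup(const std::string& prefix, BlockRange* out) const {
    const BlockRange& r = buckets_[Bucket(prefix)];
    if (r.first == UINT32_MAX) return false;
    *out = r;
    return true;
  }

 private:
  size_t Bucket(const std::string& prefix) const {
    return std::hash<std::string>{}(prefix) % buckets_.size();
  }
  std::vector<BlockRange> buckets_;
};
```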
Igor Canadi 3525aac9e5 Change order of parameters in adaptive table factory
Summary:
This is minor, but if we put the writing table factory as the third parameter, then when we add a new table format we'll end up with this order:
1) block based factory
2) plain table factory
3) output factory
4) new format factory

I think it makes more sense to have output as the first parameter.

Also, fixed a NewAdaptiveTableFactory() call in unit test

Test Plan: unit test

Reviewers: sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19119
2014-06-18 07:04:37 +02:00
sdong 8c265c08f1 HashLinkList to log distribution of number of entries across buckets
Summary: Add two hash linked list parameters for logging the distribution of the number of entries across all buckets, plus a sample row when a single bucket has too many entries.

Test Plan: Turn it on in plain_table_db_test and see the logs.

Reviewers: haobo, ljin

Reviewed By: ljin

Subscribers: leveldb, nkg-, dhruba, yhchiang

Differential Revision: https://reviews.facebook.net/D19095
2014-06-17 17:55:36 -07:00
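A hypothetical helper showing what a distribution of entries across buckets can look like in practice: a histogram from bucket size to bucket count, plus the indexes of overloaded buckets whose sample row would be logged. It is not the HashLinkList code, and the threshold parameter is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Returns "bucket size -> number of buckets with that size" and appends the
// index of every bucket holding more than `threshold` entries to *oversized.
std::map<size_t, size_t> BucketSizeHistogram(
    const std::vector<size_t>& entries_per_bucket, size_t threshold,
    std::vector<size_t>* oversized) {
  std::map<size_t, size_t> histogram;
  for (size_t i = 0; i < entries_per_bucket.size(); ++i) {
    ++histogram[entries_per_bucket[i]];
    if (entries_per_bucket[i] > threshold) oversized->push_back(i);
  }
  return histogram;
}
```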
Yueh-Hsuan Chiang 4bff7a8a87 Merge pull request #177 from nanwu/master
specify the command to install build_tools/mac-install-gflags.sh file in...
2014-06-17 15:52:37 -07:00
nawu b982e65f8b specify the command to install build_tools/mac-install-gflags.sh file in doc 2014-06-17 17:03:21 -05:00
sdong 200e4b4a72 Add a table factory that can read DB with both of PlainTable and BlockBasedTable in it
Summary: The new table factory is for users who want to convert a DB from one table format to the other. With it, a user can open a DB written in one table format and write new files in the other.

Test Plan: add a unit test

Reviewers: haobo, igor

Reviewed By: igor

Subscribers: dhruba, ljin, yhchiang, leveldb

Differential Revision: https://reviews.facebook.net/D19017
2014-06-17 11:49:22 -07:00
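One way such a factory can tell formats apart: each SST file ends with a footer carrying a format-specific magic number, so the reader can dispatch per file while new files are written in a single output format. The sketch reads the last 8 bytes as a little-endian integer. The magic constants and FormatFromFooter are placeholders for illustration, not RocksDB's real values or API.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

constexpr uint64_t kBlockBasedMagic = 0x1111111111111111ull;  // placeholder
constexpr uint64_t kPlainMagic = 0x2222222222222222ull;       // placeholder

enum class Format { kBlockBased, kPlain, kUnknown };

Format FormatFromFooter(const std::string& file_contents) {
  if (file_contents.size() < 8) return Format::kUnknown;
  // Decode the trailing 8 bytes as a little-endian 64-bit integer.
  uint64_t magic = 0;
  for (int i = 0; i < 8; ++i) {
    uint64_t byte = static_cast<unsigned char>(
        file_contents[file_contents.size() - 8 + i]);
    magic |= byte << (8 * i);
  }
  if (magic == kBlockBasedMagic) return Format::kBlockBased;
  if (magic == kPlainMagic) return Format::kPlain;
  return Format::kUnknown;
}
```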
Igor Canadi 4f18bfe376 Merge pull request #176 from bgrainger/mutexrw-unlock
Add separate Read/WriteUnlock methods in MutexRW.
2014-06-17 20:38:06 +02:00
Yueh-Hsuan Chiang e6e259b8ab Enforce max_write_buffer_number >= 2 in SanitizeOptions.
Summary: Enforce max_write_buffer_number >= 2 in SanitizeOptions.

Test Plan: make all check

Reviewers: haobo, sdong, igor, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19077
2014-06-16 16:26:46 -07:00
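A sketch of the sanitization, using a reduced, hypothetical Options struct. Out-of-range values are clamped rather than rejected. A plausible motivation, not stated in the commit, is that with at least two write buffers one memtable can be flushed while another keeps accepting writes.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical, reduced options struct with only the field from the commit.
struct Options {
  int max_write_buffer_number = 2;
};

// Clamp to the minimum instead of failing the open.
Options SanitizeOptions(Options opts) {
  opts.max_write_buffer_number = std::max(opts.max_write_buffer_number, 2);
  return opts;
}
```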
sdong cadc1adffa Refactor: group metadata needed to open an SST file to a separate copyable struct
Summary:
We added multiple fields to FileMetaData recently and are planning to add more.
This refactoring separates out the minimum information needed to access the file. This object is copyable (FileMetaData is not copyable because of its ref counter). I hope this refactoring can enable further improvements:

(1) using it to design a more efficient data structure to speed up read queries.
(2) in the future, when we add information about the storage level, we can easily encode it instead of enlarging this structure, which might expand the memory working set for file metadata.

The definition is the same as the current EncodedFileMetaData used in the two-level iterator, so the logic in the two-level iterator is now easier to understand.

Test Plan: make all check

Reviewers: haobo, igor, ljin

Reviewed By: ljin

Subscribers: leveldb, dhruba, yhchiang

Differential Revision: https://reviews.facebook.net/D18933
2014-06-16 16:10:52 -07:00
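A sketch of the split with hypothetical field sets: the small, trivially copyable FileDescriptor carries what is needed to open the file, while FileMetaData keeps a reference counter. In this sketch the counter is a std::atomic<int>, which is what makes FileMetaData non-copyable; the static_asserts check both properties at compile time.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <type_traits>

// Minimum information needed to open and read an SST file (hypothetical).
struct FileDescriptor {
  uint64_t number = 0;     // file number, which determines the file name
  uint64_t file_size = 0;  // size in bytes
};

// Full metadata; other fields (key range, sequence numbers, stats) omitted.
struct FileMetaData {
  FileDescriptor fd;
  std::atomic<int> refs{0};  // makes the whole struct non-copyable
};

static_assert(std::is_trivially_copyable<FileDescriptor>::value,
              "descriptor should be cheap to copy");
static_assert(!std::is_copy_constructible<FileMetaData>::value,
              "metadata with a ref counter is not copyable");
```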
Bradley Grainger 2d02ec6533 Add separate Read/WriteUnlock methods in MutexRW.
Some platforms, particularly Windows, do not have a single method that can
release both a held reader lock and a held writer lock; instead, a
separate method (ReleaseSRWLockShared or ReleaseSRWLockExclusive) must be
called in each case.

This may also be necessary to back MutexRW with a shared_mutex in C++14;
the current language proposal includes both an unlock() and an
unlock_shared() method.
2014-06-16 15:41:46 -07:00
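A sketch of a reader/writer lock with separate unlock calls, backed by std::shared_timed_mutex (the type C++14 standardized; std::shared_mutex followed in C++17). Like Windows SRW locks, the standard type has distinct unlock() and unlock_shared() calls, so separate ReadUnlock/WriteUnlock methods map onto it directly. The class shape is hypothetical, and TryWriteLock() exists only for the example.

```cpp
#include <cassert>
#include <shared_mutex>

class MutexRW {
 public:
  void ReadLock() { mu_.lock_shared(); }
  void ReadUnlock() { mu_.unlock_shared(); }  // releases a shared hold
  void WriteLock() { mu_.lock(); }
  void WriteUnlock() { mu_.unlock(); }        // releases an exclusive hold
  bool TryWriteLock() { return mu_.try_lock(); }

 private:
  std::shared_timed_mutex mu_;
};
```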
Yueh-Hsuan Chiang 4d913cfbc3 Fix a bug that prevented the LOG file from being created when max_log_file_size is set.
Summary:
Fix a bug that prevented the LOG file from being created when max_log_file_size is set.
This bug is reported in issue #174.

Test Plan:
Add TEST(AutoRollLoggerTest, LogFileExistence).
make auto_roll_logger_test
./auto_roll_logger_test

Reviewers: haobo, sdong, ljin, igor, igor2

Reviewed By: igor2

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D19053
2014-06-16 10:27:42 -07:00
sdong 983c93d731 VersionSet::Get(): Bring back the logic of skipping key range check when there are <=3 level 0 files
Summary:
https://reviews.facebook.net/D17205 removed the logic of skipping the file key range check when there are three or fewer level 0 files. This patch brings it back.

Other than that, add another small optimization to avoid checking all the levels when most higher levels don't have any files.

Test Plan: make all check

Reviewers: ljin

Reviewed By: ljin

Subscribers: yhchiang, igor, haobo, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D19035
2014-06-13 15:51:44 -07:00
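A sketch of the restored shortcut, using a hypothetical FileRange holding each file's user-key range. With three or fewer level 0 files, every file is treated as a candidate without comparing the key against its range; presumably the comparisons are not worth their cost when so few files would be probed anyway.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct FileRange {
  std::string smallest;
  std::string largest;
};

bool ShouldSearchLevel0File(const FileRange& f, const std::string& key,
                            size_t num_level0_files) {
  if (num_level0_files <= 3) return true;  // skip the key range check
  return key >= f.smallest && key <= f.largest;
}
```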
Barnaby a52a4e0952 Update README.md
Moved doc/index.html to wiki
2014-06-13 14:11:10 -07:00
sdong 9202d9b625 Fix sst_dump for PlainTable
Summary: sst_dump currently does not work well with PlainTable. It is unclear when this started, but this patch should fix it.

Test Plan: Run sst_dump against a file that used to fail.

Reviewers: yhchiang, haobo, igor

Reviewed By: igor

Subscribers: dhruba, ljin, leveldb

Differential Revision: https://reviews.facebook.net/D19023
2014-06-12 11:03:03 -07:00
Lei Jin c83b085770 prefetch bloom filter data block for L0 files
Summary: as title

Test Plan:
db_bench
The initial results are very promising. I will post results of complete
runs.

Reviewers: dhruba, haobo, sdong, igor

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D18867
2014-06-12 10:06:18 -07:00
Lei Jin 578cf84ddf give correct metric name for grep in regression script
Summary: as title

Test Plan: Jenkins

Reviewers: igor

Reviewed By: igor

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19005
2014-06-10 16:32:09 -07:00
Lei Jin c4b3817d7c fix regression test
Summary: as title

Test Plan: push

Reviewers: igor

Reviewed By: igor

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D18999
2014-06-10 11:24:24 -07:00
sdong 88a1691a1e BlockBasedTable::PrefixMayMatch(): move the bloom setting check to the beginning of the function
Summary: In BlockBasedTable::PrefixMayMatch() we calculate the prefix even if bloom is not configured. Move the check before the prefix calculation.

Test Plan: make all check

Reviewers: igor, ljin

Reviewed By: ljin

Subscribers: wuj, leveldb, haobo, yhchiang, dhruba

Differential Revision: https://reviews.facebook.net/D18993
2014-06-10 11:14:22 -07:00
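A reduced sketch of the reordering in PrefixMayMatch(). The signature is hypothetical: prefix extraction and the bloom probe are passed in as callables so the test can count calls. The point is that the "no bloom configured" check returns before any prefix is computed.

```cpp
#include <cassert>
#include <functional>
#include <string>

bool PrefixMayMatch(
    const std::string& key, bool bloom_configured,
    const std::function<std::string(const std::string&)>& extract_prefix,
    const std::function<bool(const std::string&)>& bloom_may_contain) {
  if (!bloom_configured) return true;  // checked first: no prefix computed
  return bloom_may_contain(extract_prefix(key));
}
```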
Lei Jin e2d3101cf1 collect metrics for in memory workload get/seek
Summary:
collect in-memory workload get/seek metrics so that we can alert on
regression

Test Plan: ran locally

Reviewers: igor

Reviewed By: igor

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D18969
2014-06-10 09:59:16 -07:00
Lei Jin 77db08f27b fix forward iterator bug
Summary: obvious

Test Plan: db_test

Reviewers: sdong, haobo, igor

Reviewed By: igor

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D18987
2014-06-10 09:57:26 -07:00
sdong 80f409ea37 Clean PlainTableReader's variables for better data locality
Summary:
Clean PlainTableReader's data structures:
(1) inline bloom_ (to do this, change DynamicBloom to allow lazy initialization)
(2) remove from the class some variables only used during initialization
(3) move variables not used in normal read code paths to the end of the class, and reference prefix_extractor directly
(4) make Options a reference.

Test Plan: make all check

Reviewers: haobo, ljin

Reviewed By: ljin

Subscribers: igor, yhchiang, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D18891
2014-06-09 13:53:39 -07:00
Igor Canadi f43c8262c2 Don't compress block bigger than 2GB
Summary: This is a temporary solution to an issue that we have with compression libraries. See task #4453446.

Test Plan: make check doesn't complain :)

Reviewers: haobo, ljin, yhchiang, dhruba, sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D18975
2014-06-09 12:26:09 -07:00
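A sketch of the guard, assuming the underlying problem is 32-bit length parameters in compression APIs (LZ4's C API, for instance, takes int sizes). ShouldTryCompress is a hypothetical name; blocks of 2^31 bytes or more would simply be stored uncompressed.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// True if the block length fits in a signed 32-bit int (below 2 GB).
bool ShouldTryCompress(uint64_t block_size) {
  return block_size <=
         static_cast<uint64_t>(std::numeric_limits<int32_t>::max());
}
```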
sdong ee5a51e6ce sst_dump: still try to print out table properties even if failing to read the file
Summary: Even if the file is corrupted, table properties are usually available to print out. Currently, sst_dump just fails without printing table properties. With this patch, sst_dump still tries to print the table properties.

Test Plan: run sst_dump against multiple scenarios

Reviewers: igor, yhchiang, ljin, haobo

Reviewed By: haobo

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D18981
2014-06-09 11:21:44 -07:00
Yueh-Hsuan Chiang 3e701a654f [Java] Improve documentation for RocksEnv and its C++ resource.
Summary:
Improve documentation for RocksEnv and its C++ resource. Specifically,
the result of RocksEnv::Default() is a singleton, and the ownership
of its C++ resource belongs to rocksdb C++. As a result, calling
dispose() on the return value of RocksEnv::Default() is a no-op.

Test Plan: no code change.

Reviewers: haobo, ankgup87

Reviewed By: ankgup87

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D18945
2014-06-07 17:27:03 -07:00
Igor Canadi 0365eaf12e remove unnecessary printf
2014-06-06 18:27:44 -07:00
Igor Canadi a0191c9dfe Create Missing Column Families
Summary: Provide a convenience option to create column families if they are missing from the DB. Task #4460490

Test Plan: added unit test. also, stress test for some time

Reviewers: sdong, haobo, dhruba, ljin, yhchiang

Reviewed By: yhchiang

Subscribers: yhchiang, leveldb

Differential Revision: https://reviews.facebook.net/D18951
2014-06-06 18:04:56 -07:00
Igor Canadi 99d3eed2fd Write Fast-path for single column family
Summary: We have a perf regression in Write() even with one column family. Add a fast path for a single column family to avoid the regression. See task #4455480

Test Plan: make check

Reviewers: sdong, ljin

Reviewed By: sdong, ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D18963
2014-06-06 17:26:23 -07:00
Yueh-Hsuan Chiang e72b02e3c2 [Java] Add basic Java binding for rocksdb::Env.
Summary: Add basic Java binding for rocksdb::Env.

Test Plan:
make rocksdbjava
make jtest
cd java
./jdb_bench.sh --max_background_compactions=1
./jdb_bench.sh --max_background_compactions=10

Reviewers: sdong, ankgup87, haobo

Reviewed By: haobo

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D18903
2014-06-05 17:09:25 -07:00
sdong b92a19a431 sst_dump: Set dummy prefix extractor for binary search index in block based table
Summary: Currently, sst_dump fails on block-based tables that use the binary search index, because a prefix extractor is required. Add one.

Test Plan: Run it against such a file to make sure it fixes the problem.

Reviewers: yhchiang, kailiu

Reviewed By: kailiu

Subscribers: ljin, igor, dhruba, haobo, leveldb

Differential Revision: https://reviews.facebook.net/D18927
2014-06-05 15:37:23 -07:00
Igor Canadi 5d870717ae Correctly preallocate files in universal compaction
Summary: In universal compaction, MaxFileSizeForLevel is ULLONG_MAX. We've been preallocating files to ULLONG_MAX size all this time :)

Test Plan: make check

Reviewers: dhruba, haobo, ljin, sdong, yhchiang

Reviewed By: yhchiang

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D18915
2014-06-05 13:19:35 -07:00
Yueh-Hsuan Chiang 166cc5b456 Merge pull request #157 from ankgup87/master
[Java] Add RestoreBackupableDB
2014-06-05 09:24:55 -07:00
Ankit Gupta 1a6c1a5ddd Fix build
2014-06-05 13:34:38 +01:00
Ankit Gupta 2fa0a993b8 Fix build
2014-06-05 13:25:08 +01:00
Ankit Gupta 08314321fa Merge branch 'master' of https://github.com/facebook/rocksdb
2014-06-05 13:17:53 +01:00
Igor Canadi 457bae6911 Fix regression test
Summary:
388d2054c7
added an extra line to the db_bench output, breaking regression tests. This diff makes the script more robust and fixes the issue

Test Plan: ran it

Reviewers: ljin, sdong

Reviewed By: sdong

Subscribers: sdong, leveldb

Differential Revision: https://reviews.facebook.net/D18897
2014-06-04 09:59:44 -07:00
Igor Canadi 552c49f0f4 Remove upper bound for rate limiting unit test
2014-06-03 13:58:44 -07:00
Igor Canadi fd27001072 Fix compile errors on Mac
Summary: https://phabricator.fb.com/P11372644

Test Plan: compiles

Reviewers: sdong, ljin

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D18873
2014-06-03 12:28:58 -07:00
sdong df9069d23f In DB::NewIterator(), try to allocate the whole iterator tree in an arena
Summary:
In this patch, try to allocate the whole iterator tree starting from DBIter from an arena
1. ArenaWrappedDBIter is created to serve as the entry point of an iterator tree, with an arena in it.
2. Add an option to create the following iterators from an arena: DBIter, MergingIterator, MemtableIterator, all memtable iterators, all table reader iterators and the two-level iterator.
3. MergeIteratorBuilder is created to incrementally build the tree of internal iterators. It is passed to the memtable list and version set, which add iterators to it.

Limitations:
(1) Only DB::NewIterator() without tailing uses the arena. Other cases, including read-only DB and compactions, still allocate with malloc
(2) Two level iterator itself is allocated in arena, but not iterators inside it.

Test Plan: make all check

Reviewers: ljin, haobo

Reviewed By: haobo

Subscribers: leveldb, dhruba, yhchiang, igor

Differential Revision: https://reviews.facebook.net/D18513
2014-06-02 17:44:57 -07:00
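A self-contained sketch of arena-allocating an iterator tree. Arena, Iter, LeafIter, PairIter and NewInArena are hypothetical stand-ins, not RocksDB classes. Each node is built with placement new in arena memory; tearing the tree down means calling destructors explicitly, and the memory itself is released all at once when the arena is destroyed.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <utility>
#include <vector>

constexpr size_t kArenaBlockSize = 4096;

// Minimal bump-pointer arena.
class Arena {
 public:
  void* Allocate(size_t bytes) {
    const size_t align = alignof(std::max_align_t);
    bytes = (bytes + align - 1) / align * align;  // keep objects aligned
    if (remaining_ < bytes) {
      const size_t block = bytes > kArenaBlockSize ? bytes : kArenaBlockSize;
      blocks_.emplace_back(new char[block]);
      ptr_ = blocks_.back().get();
      remaining_ = block;
    }
    void* result = ptr_;
    ptr_ += bytes;
    remaining_ -= bytes;
    return result;
  }
  size_t num_blocks() const { return blocks_.size(); }

 private:
  std::vector<std::unique_ptr<char[]>> blocks_;
  char* ptr_ = nullptr;
  size_t remaining_ = 0;
};

struct Iter {
  virtual ~Iter() = default;
  virtual int Count() const = 0;  // number of nodes in this subtree
};

struct LeafIter : Iter {
  int Count() const override { return 1; }
};

// Stands in for a merging iterator; its children live in the same arena,
// so they are destroyed in place, never deleted.
struct PairIter : Iter {
  PairIter(Iter* a, Iter* b) : a_(a), b_(b) {}
  ~PairIter() override {
    a_->~Iter();
    b_->~Iter();
  }
  int Count() const override { return 1 + a_->Count() + b_->Count(); }
  Iter* a_;
  Iter* b_;
};

template <typename T, typename... Args>
T* NewInArena(Arena* arena, Args&&... args) {
  return new (arena->Allocate(sizeof(T))) T(std::forward<Args>(args)...);
}
```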
sdong 462796697c dynamic_bloom: replace some divide (remainder) operations with shifts in locality mode, and other improvements
Summary:
This patch changes the meaning of options.bloom_locality: 0 means disable the cache line optimization, and any positive number means use CACHE_LINE_SIZE as the block size (previously, the block size was CACHE_LINE_SIZE*options.bloom_locality). With this change, the divide operations inside a block can be replaced by shifts.

Performance is improved:
https://reviews.facebook.net/P471

Also, improve the basic algorithm in two ways:
(1) make sure num of blocks is an odd number
(2) rotate bytes after every probe in locality mode. Since the divisor is 2^n, without the rotation we would never be able to use all the bits.
False positive rate improvements: https://reviews.facebook.net/P459

Test Plan: make all check

Reviewers: ljin, haobo

Reviewed By: haobo

Subscribers: dhruba, yhchiang, igor, leveldb

Differential Revision: https://reviews.facebook.net/D18843
2014-06-02 17:36:38 -07:00
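A hypothetical cache-line-local Bloom filter that mirrors the points above. It is a sketch, not the dynamic_bloom code, and it assumes 64-byte cache lines. One modulo picks a 512-bit block; the bit inside the block comes from a mask rather than %. The number of blocks is forced to be odd, and the hash is rotated between probes. The rotation amounts are arbitrary choices.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint32_t kLineBits = 64 * 8;  // assumes 64-byte cache lines
constexpr uint32_t kLineShift = 9;      // log2(kLineBits)

class LocalBloom {
 public:
  LocalBloom(uint32_t total_bits, uint32_t num_probes)
      : num_probes_(num_probes) {
    num_blocks_ = (total_bits + kLineBits - 1) / kLineBits;
    if (num_blocks_ % 2 == 0) ++num_blocks_;  // (1) odd number of blocks
    bits_.assign(static_cast<size_t>(num_blocks_) * (kLineBits / 8), 0);
  }

  void Add(uint32_t h) {
    const uint32_t base = BlockBase(h);
    for (uint32_t i = 0; i < num_probes_; ++i) {
      const uint32_t bit = base + (h & (kLineBits - 1));  // mask, not %
      bits_[bit >> 3] |= static_cast<uint8_t>(1u << (bit & 7));
      h = Rotate(h);
    }
  }

  bool MayContain(uint32_t h) const {
    const uint32_t base = BlockBase(h);
    for (uint32_t i = 0; i < num_probes_; ++i) {
      const uint32_t bit = base + (h & (kLineBits - 1));
      if ((bits_[bit >> 3] & (1u << (bit & 7))) == 0) return false;
      h = Rotate(h);
    }
    return true;
  }

  uint32_t num_blocks() const { return num_blocks_; }

 private:
  // The only true modulo: pick a block from a rotated copy of the hash,
  // then multiply by the block size with a shift.
  uint32_t BlockBase(uint32_t h) const {
    return (((h >> 11) | (h << 21)) % num_blocks_) << kLineShift;
  }
  // (2) Rotate rather than shift, so high-order bits keep cycling into the
  // low bits used by the mask on later probes.
  static uint32_t Rotate(uint32_t h) {
    return (h >> kLineShift) | (h << (32 - kLineShift));
  }

  uint32_t num_probes_;
  uint32_t num_blocks_ = 0;
  std::vector<uint8_t> bits_;
};
```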