Commit graph

44 commits

Author SHA1 Message Date
sdong 9a8e3f73ed plain table reader: non-mmap mode to keep two recent buffers
Summary: In plain table reader's non-mmap mode, we only keep the most recent read buffer. However, for binary search, it is likely we come back to a location to read. To avoid one pread in such a case, we keep two read buffers. It should cover most of the cases.

Test Plan:
1. run tests
2. check the optimization works through strace when running
./table_reader_bench -mmap_read=false --num_keys2=1 -num_keys1=5000 -table_factory=plain_table --iterator --through_db

Reviewers: anthony, rven, kradhakrishnan, igor, yhchiang, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D51171
2016-01-08 10:53:57 -08:00
sdong df34aea331 PlainTableReader to support non-mmap mode
Summary:
PlainTableReader now only allows mmap-mode. Add the support to non-mmap mode for more flexibility.
Refactor the codes to move all logic of reading data to PlainTableKeyDecoder, and consolidate the calls to Read() call and ReadVarint32() call. Implement the calls for both of mmap and non-mmap case seperately. For non-mmap mode, make copy of keys in several places when we need to move the buffer after reading the keys.

Test Plan: Add the mode of non-mmap case in plain_table_db_test. Run it in valgrind mode too.

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D47187
2015-09-23 11:41:07 -07:00
Islam AbdelRahman 45e9e4f0bb Refactor NewTableReader to accept TableReaderOptions
Summary:
Refactoring NewTableReader to accept TableReaderOptions
This will make it easier to add new options in the future, for example in this diff https://reviews.facebook.net/D46071

Test Plan: run existing tests

Reviewers: igor, yhchiang, anthony, rven, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D46179
2015-09-11 11:36:33 -07:00
sdong 3d78eb66bb Arena usage to be calculated using malloc_usable_size()
Summary: malloc_usable_size() gets a better estimation of memory usage. It is already used to calculate block cache memory usage. Use it in arena too.

Test Plan: Run all unit tests

Reviewers: anthony, kradhakrishnan, rven, IslamAbdelRahman, yhchiang

Reviewed By: yhchiang

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D43317
2015-08-31 09:39:27 -07:00
Islam AbdelRahman 0d1d9aeebe Block plain_table_db_test in ROCKSDB_LITE
Summary: Block plain_table_db_test in ROCKSDB_LITE since plain table is not supported in ROCKSDB_LITE

Test Plan: plain_table_db_test

Reviewers: igor, sdong, yhchiang

Reviewed By: yhchiang

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D42159
2015-07-20 11:12:02 -07:00
sdong 6e9fbeb27c Move rate_limiter, write buffering, most perf context instrumentation and most random kill out of Env
Summary: We want to keep Env a think layer for better portability. Less platform dependent codes should be moved out of Env. In this patch, I create a wrapper of file readers and writers, and put rate limiting, write buffering, as well as most perf context instrumentation and random kill out of Env. It will make it easier to maintain multiple Env in the future.

Test Plan: Run all existing unit tests.

Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, igor

Reviewed By: igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D42321
2015-07-17 16:58:18 -07:00
Igor Canadi 35ca59364c Don't let flushes preempt compactions
Summary:
When we first started, max_background_flushes was 0 by default and compaction thread was executing flushes (since there was no flush thread). Then, we switched the default max_background_flushes to 1. However, we still support the case where there is no flush thread and flushes are done in compaction. This is making our code a bit more complicated. By not supporting this use-case we can make our code simpler.

We have a special case that when you set max_background_flushes to 0, we
schedule the flush to execute on the compaction thread.

Test Plan: make check (there might be some unit tests that depend on this behavior)

Reviewers: IslamAbdelRahman, yhchiang, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D41931
2015-07-17 12:02:52 -07:00
Igor Sugak 9405b5ef8f rocksdb: Remove #include "util/string_util.h" from util/testharness.h
Summary:
1. Manually deleted #include "util/string_util.h" from util/testharness.h
2.
```
% USE_CLANG=1 make all -j55 -k 2> build.log
% perl -naF: -E 'say $F[0] if /: error:/' build.log | sort -u | xargs sed -i '/#include "util\/testharness.h"/i #include "util\/string_util.h"'
```

Test Plan:
Make sure make all completes with no errors.
```
% make all -j55
```

Reviewers: meyering, igor, sdong

Reviewed By: sdong

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D35493
2015-03-19 17:29:37 -07:00
Igor Sugak b4b69e4f77 rocksdb: switch to gtest
Summary:
Our existing test notation is very similar to what is used in gtest. It makes it easy to adopt what is different.
In this diff I modify existing [[ https://code.google.com/p/googletest/wiki/Primer#Test_Fixtures:_Using_the_Same_Data_Configuration_for_Multiple_Te | test fixture ]] classes to inherit from `testing::Test`. Also for unit tests that use fixture class, `TEST` is replaced with `TEST_F` as required in gtest.

There are several custom `main` functions in our existing tests. To make this transition easier, I modify all `main` functions to fallow gtest notation. But eventually we can remove them and use implementation of `main` that gtest provides.

```lang=bash
% cat ~/transform
#!/bin/sh
files=$(git ls-files '*test\.cc')
for file in $files
do
  if grep -q "rocksdb::test::RunAllTests()" $file
  then
    if grep -Eq '^class \w+Test {' $file
    then
      perl -pi -e 's/^(class \w+Test) {/${1}: public testing::Test {/g' $file
      perl -pi -e 's/^(TEST)/${1}_F/g' $file
    fi
    perl -pi -e 's/(int main.*\{)/${1}::testing::InitGoogleTest(&argc, argv);/g' $file
    perl -pi -e 's/rocksdb::test::RunAllTests/RUN_ALL_TESTS/g' $file
  fi
done
% sh ~/transform
% make format
```

Second iteration of this diff contains only scripted changes.

Third iteration contains manual changes to fix last errors and make it compilable.

Test Plan:
Build and notice no errors.
```lang=bash
% USE_CLANG=1 make check -j55
```
Tests are still testing.

Reviewers: meyering, sdong, rven, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D35157
2015-03-17 14:08:00 -07:00
Igor Sugak 9fd6edf81c rocksdb: Replace ASSERT* with EXPECT* in functions that does not return void value
Summary:
gtest does not use exceptions to fail a unit test by design, and `ASSERT*`s are implemented using `return`. As a consequence we cannot use `ASSERT*` in a function that does not return `void` value ([[ https://code.google.com/p/googletest/wiki/AdvancedGuide#Assertion_Placement | 1]]), and have to fix our existing code. This diff does this in a generic way, with no manual changes.

In order to detect all existing `ASSERT*` that are used in functions that doesn't return void value, I change the code to generate compile errors for such cases.

In `util/testharness.h` I defined `EXPECT*` assertions, the same way as `ASSERT*`, and redefined `ASSERT*` to return `void`. Then executed:

```lang=bash
% USE_CLANG=1 make all -j55 -k 2> build.log
% perl -naF: -e 'print "-- -number=".$F[1]." ".$F[0]."\n" if  /: error:/' \
build.log | xargs -L 1 perl -spi -e 's/ASSERT/EXPECT/g if $. == $number'
% make format
```
After that I reverted back change to `ASSERT*` in `util/testharness.h`. But preserved introduced `EXPECT*`, which is the same as `ASSERT*`. This will be deleted once switched to gtest.

This diff is independent and contains manual changes only in `util/testharness.h`.

Test Plan:
Make sure all tests are passing.
```lang=bash
% USE_CLANG=1 make check
```

Reviewers: igor, lgalanis, sdong, yufei.zhu, rven, meyering

Reviewed By: meyering

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D33333
2015-03-16 20:52:32 -07:00
Yueh-Hsuan Chiang 13de000f07 Add rocksdb::ToString() to address cases where std::to_string is not available.
Summary:
In some environment such as android, the c++ library does not have
std::to_string.  This path adds rocksdb::ToString(), which wraps std::to_string
when std::to_string is not available, and implements std::to_string
in the other case.

Test Plan:
make dbg -j32
./db_test
make clean
make dbg OPT=-DOS_ANDROID -j32
./db_test

Reviewers: ljin, sdong, igor

Reviewed By: igor

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D29181
2014-11-24 20:44:49 -08:00
Igor Canadi 767777c2bd Turn on -Wshorten-64-to-32 and fix all the errors
Summary:
We need to turn on -Wshorten-64-to-32 for mobile. See D1671432 (internal phabricator) for details.

This diff turns on the warning flag and fixes all the errors. There were also some interesting errors that I might call bugs, especially in plain table. Going forward, I think it makes sense to have this flag turned on and be very very careful when converting 64-bit to 32-bit variables.

Test Plan: compiles

Reviewers: ljin, rven, yhchiang, sdong

Reviewed By: yhchiang

Subscribers: bobbaldwin, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D28689
2014-11-11 16:47:22 -05:00
sdong 86de2007b8 Add ComparatorDBTest to test non-default comparators
Summary:
Add some helper functions to make sure DB works well for non-default comparators.
Add a test for SimpleSuffixReverseComparator.

Test Plan: Run the new test

Reviewers: ljin, rven, yhchiang, igor

Reviewed By: igor

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D27831
2014-10-29 15:42:36 -07:00
Lei Jin 5665e5e285 introduce ImmutableOptions
Summary:
As a preparation to support updating some options dynamically, I'd like
to first introduce ImmutableOptions, which is a subset of Options that
cannot be changed during the course of a DB lifetime without restart.

ColumnFamily will keep both Options and ImmutableOptions. Any component
below ColumnFamily should only take ImmutableOptions in their
constructor. Other options should be taken from APIs, which will be
allowed to adjust dynamically.

I am yet to make changes to memtable and other related classes to take
ImmutableOptions in their ctor. That can be done in a seprate diff as
this one is already pretty big.

Test Plan: make all check

Reviewers: yhchiang, igor, sdong

Reviewed By: sdong

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D22545
2014-09-04 16:18:36 -07:00
sdong 1242bfcad7 Add DB property "rocksdb.estimate-table-readers-mem"
Summary:
Add a DB Property "rocksdb.estimate-table-readers-mem" to return estimated memory usage by all loaded table readers, other than allocated from block cache.

Refactor the property codes to allow getting property from a version, with DB mutex not acquired.

Test Plan: Add several checks of this new property in existing codes for various cases.

Reviewers: yhchiang, ljin

Reviewed By: ljin

Subscribers: xjin, igor, leveldb

Differential Revision: https://reviews.facebook.net/D20733
2014-08-06 11:39:46 -07:00
Stanislau Hlebik 9d70cce047 Adding option to save PlainTable index and bloom filter in SST file.
Summary:
Adding option to save PlainTable index and bloom filter in SST file.
If there is no bloom block and/or index block, PlainTableReader builds
new ones. Otherwise PlainTableReader just use these blocks.

Test Plan: make all check

Reviewers: sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19527
2014-07-18 16:58:13 -07:00
Stanislau Hlebik 92d73cbe78 Add PlainTableOptions
Summary:
Since we have a lot of options for PlainTable, add a struct PlainTableOptions
to manage them

Test Plan: make all check

Reviewers: sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D20175
2014-07-18 00:08:38 -07:00
Stanislau Hlebik 30c81e7717 Removing NewTotalOrderPlainTableFactory
Summary:
Seems like NewTotalOrderPlainTableFactory is useless and is semantically incorrect.
Total order mode indicator is prefix_extractor == nullptr,
but NewTotalOrderPlainTableFactory doesn't set it to be nullptr. That's why some tests
in plain_table_db_tests is incorrect.

Test Plan: make all check

Reviewers: sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19587
2014-07-10 11:32:04 -07:00
sdong 9c332aa11a HashLinkList memtable switches a bucket to a skip list to reduce performance outliers
Summary:
In this patch, we enhance HashLinkList memtable to reduce performance outliers when a bucket contains too many entries. We switch to skip list for this case to enable binary search.

Add threshold_use_skiplist parameter to determine when a bucket needs to switch to skip list.

The new data structure is documented in comments in the codes.

Test Plan:
make all check
set threshold_use_skiplist in several tests

Reviewers: yhchiang, haobo, ljin

Reviewed By: yhchiang, ljin

Subscribers: nkg-, xjin, dhruba, yhchiang, leveldb

Differential Revision: https://reviews.facebook.net/D19299
2014-07-01 17:14:15 -07:00
sdong edd47c5104 PlainTable to encode to avoid to rewrite prefix when it is the same as the previous key
Summary:
Add a encoding feature of PlainTable to encode PlainTable's keys to save some bytes for the same prefixes.
The data format is documented in table/plain_table_factory.h

Test Plan: Add unit test coverage in plain_table_db_test

Reviewers: yhchiang, igor, dhruba, ljin, haobo

Reviewed By: haobo

Subscribers: nkg-, leveldb

Differential Revision: https://reviews.facebook.net/D18735
2014-06-18 20:41:52 -07:00
Igor Canadi 3525aac9e5 Change order of parameters in adaptive table factory
Summary:
This is minor, but if we put the writing talbe factory as the third parameter, when we add a new table format, we'll have a situation:
1) block based factory
2) plain table factory
3) output factory
4) new format factory

I think it makes more sense to have output as the first parameter.

Also, fixed a NewAdaptiveTableFactory() call in unit test

Test Plan: unit test

Reviewers: sdong

Reviewed By: sdong

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19119
2014-06-18 07:04:37 +02:00
sdong 8c265c08f1 HashLinkList to log distribution of number of entries aross buckets
Summary: Add two parameters of hash linked list to log distribution of number of entries across all buckets, and a sample row when there are too many entries in one single bucket.

Test Plan: Turn it on in plain_table_db_test and see the logs.

Reviewers: haobo, ljin

Reviewed By: ljin

Subscribers: leveldb, nkg-, dhruba, yhchiang

Differential Revision: https://reviews.facebook.net/D19095
2014-06-17 17:55:36 -07:00
sdong 200e4b4a72 Add a table factory that can read DB with both of PlainTable and BlockBasedTable in it
Summary: The new table factory is used if users want to convert a DB from one table format to the other. A user can use this table to open a DB written using one table format and write new files to another table format.

Test Plan: add a unit test

Reviewers: haobo, igor

Reviewed By: igor

Subscribers: dhruba, ljin, yhchiang, leveldb

Differential Revision: https://reviews.facebook.net/D19017
2014-06-17 11:49:22 -07:00
sdong 80f409ea37 Clean PlainTableReader's variables for better data locality
Summary:
Clean PlainTableReader's data structures:
(1) inline bloom_ (in order to do this, change DynamicBloom to allow lazy initialization)
(2) remove some variables only used when initialization from the class
(3) put variables not used in normal read code paths to the end of the class and reference prefix_extractor directly
(4) make Options a reference.

Test Plan: make all check

Reviewers: haobo, ljin

Reviewed By: ljin

Subscribers: igor, yhchiang, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D18891
2014-06-09 13:53:39 -07:00
sdong 4a7c747064 Revert "Revert "Allow allocating dynamic bloom, plain table indexes and hash linked list from huge page TLB""
And make the default 0 for hash linked list memtable

This reverts commit d69dc64be7.
2014-05-04 13:56:29 -07:00
Igor Canadi d69dc64be7 Revert "Allow allocating dynamic bloom, plain table indexes and hash linked list from huge page TLB"
This reverts commit 7dafa3a1d7.
2014-05-04 08:37:09 -07:00
sdong 7dafa3a1d7 Allow allocating dynamic bloom, plain table indexes and hash linked list from huge page TLB
Summary: Add an option to allocate a piece of memory from huge page TLB. Add options to trigger it in dynamic bloom, plain table indexes andhash linked list hash table.

Test Plan: make all check

Reviewers: haobo, ljin

Reviewed By: haobo

CC: nkg-, dhruba, leveldb, igor, yhchiang

Differential Revision: https://reviews.facebook.net/D18357
2014-04-30 11:02:26 -07:00
Igor Canadi 66f88c43a5 Some fixes as preparation for release 2014-04-30 09:03:24 -07:00
Lei Jin 3995e801ab kill ReadOptions.prefix and .prefix_seek
Summary:
also add an override option total_order_iteration if you want to use full
iterator with prefix_extractor

Test Plan: make all check

Reviewers: igor, haobo, sdong, yhchiang

Reviewed By: haobo

CC: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D17805
2014-04-25 12:21:34 -07:00
sdong 86a0133d05 PlainTableReader to expose index size to users
Summary:
This is a temp solution to expose index sizes to users from PlainTableReader before we persistent them to files.
In this patch, the memory consumption of indexes used by PlainTableReader will be reported as two user defined properties, so that users can monitor them.

Test Plan:
Add a unit test.
make all check`

Reviewers: haobo, ljin

Reviewed By: haobo

CC: nkg-, yhchiang, igor, ljin, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D18195
2014-04-22 19:29:05 -07:00
Igor Canadi 4daea66343 Turn on -Wmissing-prototypes
Summary: Compiling for iOS has by default turned on -Wmissing-prototypes, which causes rocksdb to fail compiling. This diff turns on -Wmissing-prototypes in our compile options and cleans up all functions with missing prototypes.

Test Plan: compiles

Reviewers: dhruba, haobo, ljin, sdong

Reviewed By: ljin

CC: leveldb

Differential Revision: https://reviews.facebook.net/D17649
2014-04-09 21:17:14 -07:00
sdong 5e2db3b434 PlainTableIterator not to store copied key in std::string
Summary:
Move PlainTableIterator's copied key from std::string local buffer to avoid paying the extra costs in std::string related to sharing. Reuse the same buffer class in DbIter. Move the class to dbformat.h.

This patch improves iterator performance significantly. Running this benchmark:

./table_reader_bench --num_keys2=17 --iterator --plain_table --time_unit=nanosecond

The average latency is improved to about 750 nanoseconds from 1100 nanoseconds.

Test Plan:
Add a unit test.
make all check

Reviewers: haobo, ljin

Reviewed By: haobo

CC: igor, yhchiang, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D17547
2014-04-07 19:06:09 -07:00
sdong d50619a559 PlainTableIterator::Seek() shouldn't check bloom filter in total order mode
Summary:
In total order mode, iterator's seek() shouldn't check total order.

Also some cleaning up about checking null for shared pointers. I don't know the behavior before it.

This bug was reported by @igor.

Test Plan: test plain_table_db_test

Reviewers: ljin, haobo, igor

Reviewed By: igor

CC: yhchiang, dhruba, igor, leveldb

Differential Revision: https://reviews.facebook.net/D17391
2014-04-01 15:05:16 -07:00
Lei Jin 8d007b4aaf Consolidate SliceTransform object ownership
Summary:
(1) Fix SanitizeOptions() to also check HashLinkList. The current
dynamic case just happens to work because the 2 classes have the same
layout.
(2) Do not delete SliceTransform object in HashSkipListFactory and
HashLinkListFactory destructor. Reason: SanitizeOptions() enforces
prefix_extractor and SliceTransform to be the same object when
Hash**Factory is used. This makes the behavior strange: when
Hash**Factory is used, prefix_extractor will be released by RocksDB. If
other memtable factory is used, prefix_extractor should be released by
user.

Test Plan: db_bench && make asan_check

Reviewers: haobo, igor, sdong

Reviewed By: igor

CC: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D16587
2014-03-10 12:56:46 -07:00
Kai Liu 2b205b35d8 Disable putting filter block to block cache
Summary: This bug caused server crash issues because the filter block is too big and kept purging out of cache.

Test Plan: Wrote a new unit tests to make sure it works.

Reviewers: dhruba, haobo, igor, sdong

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16221
2014-02-19 15:38:57 -08:00
Kai Liu b2e7ee8b41 Followup code refactor on plain table
Summary:
Fixed most comments in https://reviews.facebook.net/D15429.
Still have some remaining comments left.

Test Plan: make all check

Reviewers: sdong, haobo

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15885
2014-02-13 15:27:59 -08:00
Siying Dong f3ae3d07cc Add more black-box tests for PlainTable and explicitly support total order mode
Summary:
1. Add some more implementation-aware tests for PlainTable
2. move from a hard-coded one index per 16 rows in one prefix to a configurable number. Also, make hash table ratio = 0  means binary search only. Also fixes some divide 0 risks.
3. Explicitly support total order (only use binary search)
4. some code cleaning up.

Test Plan: make all check

Reviewers: haobo, kailiu

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16023
2014-02-12 17:37:22 -08:00
kailiu 4f6cb17bdb First phase API clean up
Summary:
Addressed all the issues in https://reviews.facebook.net/D15447.
Now most table-related modules are hidden from user land.

Test Plan: make check

Reviewers: sdong, haobo, dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15525
2014-02-03 00:30:43 -08:00
kailiu 4e0298f23c Clean up arena API
Summary:
Easy thing goes first. This patch moves arena to internal dir; based
on which, the coming patch will deal with memtable_rep.

Test Plan: make check

Reviewers: haobo, sdong, dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15615
2014-01-30 22:10:10 -08:00
Kai Liu 054c5dda8c Merge branch 'master' into performance
Conflicts:
	db/db_impl.cc
	db/db_test.cc
	db/memtable.cc
	db/version_set.cc
	include/rocksdb/statistics.h
	util/statistics_imp.h
2014-01-23 16:32:49 -08:00
Kai Liu 5e7d5629c7 Fix the valgrind issues 2014-01-03 11:48:31 -08:00
Siying Dong abaf26266d [RocksDB] [Performance Branch] Some Changes to PlainTable format
Summary:
Some changes to PlainTable format:
(1) support variable key length
(2) use user defined slice transformer to extract prefixes
(3) Run some test cases against PlainTable in db_test and table_test

Test Plan: test db_test

Reviewers: haobo, kailiu

CC: dhruba, igor, leveldb, nkg-

Differential Revision: https://reviews.facebook.net/D14457
2013-12-20 12:08:35 -08:00
Siying Dong 718488abc5 Add BloomFilter to PlainTableIterator::Seek()
Summary:
This patch adds a simple bloom filter in PlainTableIterator::Seek()

Test Plan: N/A

Reviewers:

CC:

Task ID: #

Blame Rev:
2013-11-21 22:26:39 -08:00
Siying Dong b59d4d5a50 A Simple Plain Table
Summary:
A Simple plain table format. No block structure. When creating the table reader, scanning the full table to create indexes.

Test Plan:Add unit test

Reviewers:haobo,dhruba,kailiu

CC:

Task ID: #

Blame Rev:
2013-11-20 18:44:22 -08:00