Commit graph

161 commits

Author SHA1 Message Date
Kai Liu 2b205b35d8 Disable putting filter block to block cache
Summary: This bug caused server crash issues because the filter block is too big and kept purging out of cache.

Test Plan: Wrote a new unit tests to make sure it works.

Reviewers: dhruba, haobo, igor, sdong

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16221
2014-02-19 15:38:57 -08:00
kailiu 63690625cd Expose the table properties to application
Summary: Provide a public API for users to access the table properties for each SSTable.

Test Plan: Added a unit tests to test the function correctness under differnet conditions.

Reviewers: haobo, dhruba, sdong

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16083
2014-02-13 16:28:21 -08:00
Kai Liu b2e7ee8b41 Followup code refactor on plain table
Summary:
Fixed most comments in https://reviews.facebook.net/D15429.
Still have some remaining comments left.

Test Plan: make all check

Reviewers: sdong, haobo

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15885
2014-02-13 15:27:59 -08:00
Kai Liu 59cffe02c4 Benchmark table reader wiht nanoseconds
Summary: nanosecnods gave us better view of the performance, especially when some operations are fast so that micro seconds may only reveal less informative results.

Test Plan:
sample output:

    ./table_reader_bench --plain_table --time_unit=nanosecond
    =======================================================================================================
    InMemoryTableSimpleBenchmark:           PlainTable   num_key1:   4096   num_key2:   512   non_empty
    =======================================================================================================
    Histogram (unit: nanosecond):
    Count: 6291456  Average: 475.3867  StdDev: 556.05
    Min: 135.0000  Median: 400.1817  Max: 33370.0000
    Percentiles: P50: 400.18 P75: 530.02 P99: 887.73 P99.9: 8843.26 P99.99: 9941.21
    ------------------------------------------------------
    [     120,     140 )        2   0.000%   0.000%
    [     140,     160 )      452   0.007%   0.007%
    [     160,     180 )    13683   0.217%   0.225%
    [     180,     200 )    54353   0.864%   1.089%
    [     200,     250 )   101004   1.605%   2.694%
    [     250,     300 )   729791  11.600%  14.294% ##
    [     300,     350 )   616070   9.792%  24.086% ##
    [     350,     400 )  1628021  25.877%  49.963% #####
    [     400,     450 )   647220  10.287%  60.250% ##
    [     450,     500 )   577206   9.174%  69.424% ##
    [     500,     600 )  1168585  18.574%  87.999% ####
    [     600,     700 )   506875   8.057%  96.055% ##
    [     700,     800 )   147878   2.350%  98.406%
    [     800,     900 )    42633   0.678%  99.083%
    [     900,    1000 )    16304   0.259%  99.342%
    [    1000,    1200 )     7811   0.124%  99.466%
    [    1200,    1400 )     1453   0.023%  99.490%
    [    1400,    1600 )      307   0.005%  99.494%
    [    1600,    1800 )       81   0.001%  99.496%
    [    1800,    2000 )       18   0.000%  99.496%
    [    2000,    2500 )        8   0.000%  99.496%
    [    2500,    3000 )        6   0.000%  99.496%
    [    3500,    4000 )        3   0.000%  99.496%
    [    4000,    4500 )      116   0.002%  99.498%
    [    4500,    5000 )     1144   0.018%  99.516%
    [    5000,    6000 )     1087   0.017%  99.534%
    [    6000,    7000 )     2403   0.038%  99.572%
    [    7000,    8000 )     9840   0.156%  99.728%
    [    8000,    9000 )    12820   0.204%  99.932%
    [    9000,   10000 )     3881   0.062%  99.994%
    [   10000,   12000 )      135   0.002%  99.996%
    [   12000,   14000 )      159   0.003%  99.998%
    [   14000,   16000 )       58   0.001%  99.999%
    [   16000,   18000 )       30   0.000% 100.000%
    [   18000,   20000 )       14   0.000% 100.000%
    [   20000,   25000 )        2   0.000% 100.000%
    [   25000,   30000 )        2   0.000% 100.000%
    [   30000,   35000 )        1   0.000% 100.000%

Reviewers: haobo, dhruba, sdong

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16113
2014-02-13 13:57:36 -08:00
sdong b5140a0361 Fix table_reader_bench and add it to "make"
Summary: Fix table_reader_bench after some interface changes. Add it to make to avoid future breaking

Test Plan: make table_reader_bench and run it with different options.

Reviewers: kailiu, haobo

Reviewed By: haobo

CC: igor, leveldb

Differential Revision: https://reviews.facebook.net/D16107
2014-02-12 18:31:02 -08:00
Siying Dong f3ae3d07cc Add more black-box tests for PlainTable and explicitly support total order mode
Summary:
1. Add some more implementation-aware tests for PlainTable
2. move from a hard-coded one index per 16 rows in one prefix to a configurable number. Also, make hash table ratio = 0  means binary search only. Also fixes some divide 0 risks.
3. Explicitly support total order (only use binary search)
4. some code cleaning up.

Test Plan: make all check

Reviewers: haobo, kailiu

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16023
2014-02-12 17:37:22 -08:00
kailiu e6b3e3b4db Support prefix seek in UserCollectedProperties
Summary: We'll need the prefix seek support for property aggregation.

Test Plan: make all check

Reviewers: haobo, sdong, dhruba

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15963
2014-02-12 13:14:59 -08:00
kailiu aa734ce9ab Fix a member variables initialization order issue
Summary:
In MacOS, I got issue with `Footer`'s default constructor, which initialized the magic number with some random number instead of 0.
With investigation, I found we forgot to make the kInvalidTableMagicNumber to be static. As a result, kInvalidTableMagicNumber was assgined to `table_magic_number_` before it is initialized (which will be populated with random number).

Test Plan: passed current unit tests; also passed the unit tests for the incoming diff which used the default footer.

Reviewers: yhchiang

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16077
2014-02-11 14:16:46 -08:00
kailiu 745c181e20 Quick fix for table_test failure
Summary:
* Fixed the compression state array size bug.
* Temporarily disable running `DoCompressionTest()` against bzip, which will fail the test.

Test Plan: make && ./table_test

Reviewers: igor

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16065
2014-02-10 17:05:14 -08:00
Igor Canadi 8e634d3ea4 Merge pull request #74 from alberts/lz4
Support for LZ4 compression.
2014-02-10 15:46:56 -08:00
Albert Strasheim df2f92214a Support for LZ4 compression. 2014-02-08 14:15:51 -08:00
Kai Liu 9a270f3f6d Fix the valgrind error in table test. 2014-02-07 22:43:58 -08:00
Kai Liu b8ea5e36b3 Fix incompatible compilation in Linux server 2014-02-07 19:47:48 -08:00
kailiu 161ab42a8a Make table properties shareable
Summary:
We are going to expose properties of all tables to end users through "some" db interface.
However, current design doesn't naturally fit for this need, which is because:

1. If a table presents in table cache, we cannot simply return the reference to its table properties, because the table may be destroy after compaction (and we don't want to hold the ref of the version).
2. Copy table properties is OK, but it's slow.

Thus in this diff, I change the table reader's interface to return a shared pointer (for const table properties), instead a const refernce.

Test Plan: `make check` passed

Reviewers: haobo, sdong, dhruba

Reviewed By: haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15999
2014-02-07 19:26:49 -08:00
Yueh-Hsuan Chiang 3ce8d9a988 Add support for plain table format to sst_dump.
Summary:
This diff enables the command line tool `sst_dump` to work for sst files
under plain table format.  Changes include:
  * In tools/sst_dump.cc:
    - add support for plain table format
    - display prefix_extractor information when --show_properties is on
  * In table/format.cc
    - Now the table magic number of a Footer can be later initialized
      via ReadFooterFromFile().
  * In table/meta_bocks:
    - add function ReadTableMagicNumber() that reads the magic number of
      the specified file.

Minor fixes:
 - remove a duplicate #include in table/table_test.cc
 - fix a commentary typo in include/rocksdb/memtablerep.h
 - fix lint errors.

Test Plan:
Runs sst_dump with both block-based and plain-table format files with
different arguments, specifically those with --show-properties and --from.

* sample output:
  https://reviews.facebook.net/P261

Reviewers: kailiu, sdong, xjin

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15903
2014-02-07 11:15:00 -08:00
kailiu 84f8185fc0 Merge branch 'master' into performance
Conflicts:
	HISTORY.md
	db/db_impl.cc
	db/memtable.cc
2014-02-05 21:21:00 -08:00
kailiu d43ebd8c65 Put table factory back to public api
Summary:
Previous I am too ambitious to hide every detail about table factory
to internal api. However, we cannot pass the compilatoin for external
users since we use table factory as the shared_ptr, which requires
the definition of table factory's destructor.

Test Plan: make check;

Reviewers: sdong, haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15861
2014-02-03 19:51:20 -08:00
Igor Canadi 2966d764cd Fix some 32-bit compile errors
Summary: RocksDB doesn't compile on 32-bit architecture apparently. This is attempt to fix some of 32-bit errors. They are reported here: https://gist.github.com/paxos/8789697

Test Plan: RocksDB still compiles on 64-bit :)

Reviewers: kailiu

Reviewed By: kailiu

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15825
2014-02-03 13:48:30 -08:00
Siying Dong d169b67680 [Performance Branch] PlainTable to encode rows with seqID 0, value type using 1 internal byte.
Summary: In PlainTable, use one single byte to represent 8 bytes of internal bytes, if seqID = 0 and it is value type (which should be common for bottom most files). It is to save 7 bytes for uncompressed cases.

Test Plan: make all check

Reviewers: haobo, dhruba, kailiu

Reviewed By: haobo

CC: igor, leveldb

Differential Revision: https://reviews.facebook.net/D15489
2014-02-03 12:19:30 -08:00
kailiu 4f6cb17bdb First phase API clean up
Summary:
Addressed all the issues in https://reviews.facebook.net/D15447.
Now most table-related modules are hidden from user land.

Test Plan: make check

Reviewers: sdong, haobo, dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15525
2014-02-03 00:30:43 -08:00
kailiu 4e0298f23c Clean up arena API
Summary:
Easy thing goes first. This patch moves arena to internal dir; based
on which, the coming patch will deal with memtable_rep.

Test Plan: make check

Reviewers: haobo, sdong, dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15615
2014-01-30 22:10:10 -08:00
kailiu a5e220f5ef Merge branch 'master' into performance
Conflicts:
	Makefile
	db/db_impl.cc
	db/db_test.cc
	db/memtable_list.cc
	db/memtable_list.h
	table/block_based_table_reader.cc
	table/table_test.cc
	util/cache.cc
	util/coding.cc
2014-01-28 10:35:55 -08:00
Kai Liu 4b51dffcf8 Some refactorings on plain table
Summary:
Plain table has been working well and this is just a nit-picking patch,
which is generated during my coding reading. No real functional changes.
only some changes regarding:

* Improve some comments from the perspective a "new" code reader.
* Change some magic number to constant, which can help us to parameterize them
  in the future.
* Did some style, naming, C++ convention changes.
* Fix warnings from new "arc lint"

Test Plan: make check

Reviewers: sdong, haobo

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15429
2014-01-24 21:28:10 -08:00
Kai Liu 0ab766132b Re-org the table tests
Summary:
We'll divide the table tests into 3 buckets, plain table test, block-based table test and general table feature test.
This diff does no real change and only does the rename and reorg.

Test Plan: run table_test

Reviewers: sdong, haobo, igor, dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15417
2014-01-24 12:46:17 -08:00
Kai Liu 7d991be400 Some small refactorings on table_test
Summary:

Just revise some hard-to-read or unnecessarily verbose code.

Test Plan:

make check
2014-01-24 11:10:32 -08:00
kailiu 66dc033af3 Temporarily disable caching index/filter blocks
Summary:
Mixing index/filter blocks with data blocks resulted in some known
issues.  To make sure in next release our users won't be affected,
we added a new option in BlockBasedTableFactory::TableOption to
conceal this functionality for now.

This patch also introduced a BlockBasedTableReader::OpenOptions,
which avoids the "infinite" growth of parameters in
BlockBasedTableReader::Open().

Test Plan: make check

Reviewers: haobo, sdong, igor, dhruba

Reviewed By: igor

CC: leveldb, tnovak

Differential Revision: https://reviews.facebook.net/D15327
2014-01-24 10:57:15 -08:00
Kai Liu 054c5dda8c Merge branch 'master' into performance
Conflicts:
	db/db_impl.cc
	db/db_test.cc
	db/memtable.cc
	db/version_set.cc
	include/rocksdb/statistics.h
	util/statistics_imp.h
2014-01-23 16:32:49 -08:00
Kai Liu ef602f6275 Misc cleanup on performance branch
Summary:

Did some trivial stuffs:

* Add more comments;
* fix compiler's warning messages (uninitialized variables).
* etc

Test Plan:

make check
2014-01-17 14:26:29 -08:00
Igor Canadi 83681bf9ef Statistics code cleanup
Summary: I'm separating code-cleanup part of https://reviews.facebook.net/D14517. This will make D14517 easier to understand and this diff easier to review.

Test Plan: make check

Reviewers: haobo, kailiu, sdong, dhruba, tnovak

Reviewed By: tnovak

CC: leveldb

Differential Revision: https://reviews.facebook.net/D15099
2014-01-17 12:46:06 -08:00
kailiu 1304d8c8ce Merge branch 'master' into performance
Conflicts:
	Makefile
	db/db_impl.cc
	db/db_impl.h
	db/db_test.cc
	db/memtable.cc
	db/memtable.h
	db/version_edit.h
	db/version_set.cc
	include/rocksdb/options.h
	util/hash_skiplist_rep.cc
	util/options.cc
2014-01-15 23:12:31 -08:00
Igor Canadi 7f3e417f59 Fix memtable construction in tests 2014-01-14 15:36:12 -08:00
Kai Liu 8c4eb71b5d Fix one more valgrind error in table_test 2014-01-03 18:27:33 -08:00
Kai Liu 5e7d5629c7 Fix the valgrind issues 2014-01-03 11:48:31 -08:00
Siying Dong abaf26266d [RocksDB] [Performance Branch] Some Changes to PlainTable format
Summary:
Some changes to PlainTable format:
(1) support variable key length
(2) use user defined slice transformer to extract prefixes
(3) Run some test cases against PlainTable in db_test and table_test

Test Plan: test db_test

Reviewers: haobo, kailiu

CC: dhruba, igor, leveldb, nkg-

Differential Revision: https://reviews.facebook.net/D14457
2013-12-20 12:08:35 -08:00
Siying Dong 28c24de8be [RocksDB Peformance Branch] A bug in PlainTable format
Summary: A bug to fix. IT's already fixed in D14457, but want to check it in sooner to unblock tests

Test Plan: plain_table_db_test

Reviewers: nkg-, haobo

Reviewed By: nkg-

CC: kailiu, leveldb

Differential Revision: https://reviews.facebook.net/D14673
2013-12-13 21:51:16 -08:00
Kai Liu 2e9efcd6d8 Add the property block for the plain table
Summary:
This is the last diff that adds the property block to plain table.
The format resembles that of the block-based table: https://github.com/facebook/rocksdb/wiki/Rocksdb-table-format

  [data block]
  [meta block 1: stats block]
  [meta block 2: future extended block]
  ...
  [meta block K: future extended block]  (we may add more meta blocks in the future)
  [metaindex block]
  [index block: we only have the placeholder here, we can add persistent index block in the future]
  [Footer: contains magic number, handle to metaindex block and index block]
  <end_of_file>

Test Plan: extended existing property block test.

Reviewers: haobo, sdong, dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D14523
2013-12-13 17:18:14 -08:00
Siying Dong 9718c790ec [Performance Branch] Fix a bug of PlainTable when building indexes
Summary:
PlainTable now has a bug of the ordering of indexes for the prefixes in the same bucket. I thought std::map guaranteed key order but it didn't, probably because I didn't use it properly. But seems to me that we don't need to make extra sorting as input prefixes are already sorted. Found by problem by running leaf4 against plain table. Replace the map with a vector. It should performs better too.

After the fix, leaf4 unit tests are passing.

Test Plan:
run plain_table_db_test
Also going to run db_test with plain table in the uncommitted branch.

Reviewers: haobo, kailiu

Reviewed By: haobo

CC: nkg-, leveldb

Differential Revision: https://reviews.facebook.net/D14649
2013-12-12 22:22:35 -08:00
kailiu 551e9428ce Merge branch 'master' into performance 2013-12-06 14:15:42 -08:00
kailiu b1d2de4a40 Fix #26 by putting the implementation of CreateDBStatistics() to a cc file 2013-12-05 22:29:03 -08:00
kailiu 90729f8b23 Extract metaindex block from block-based table
Summary: This change will allow other table to reuse the code for meta blocks.

Test Plan: all existing unit tests passed

Reviewers: dhruba, haobo, sdong

CC: leveldb

Differential Revision: https://reviews.facebook.net/D14475
2013-12-05 16:34:16 -08:00
kailiu e1d92dfd2e Fix a bunch of mac compilation issues in performance branch 2013-12-04 23:00:33 -08:00
Kai Liu 219b35be6a Generalize footer reading from file
Summary:

Generalizing this process will help us to re-use the code for plain table

Test Plan:

ran ./table_test
2013-12-04 16:35:48 -08:00
Kai Liu 5dec7acd91 Introducing the concept of NULL block handle 2013-12-04 15:45:14 -08:00
Kai Liu 3a0e98d558 Parameterize table magic number
Summary:

As we are having different types of tables and they all might share the same structure in block-based table:

[metaindex block]
[index block]
[Footer]

To be able to identify differnt types of tables, we need to parameterize the "magic number" in the `Footer`.

Test Plan:

make check
2013-12-04 15:17:40 -08:00
Siying Dong f040e536e4 [RocksDB Performance Branch] A more customized index in PlainTableReader
Summary:
PlainTableReader to use a more customized hash table. This patch assumes the SST file is smaller than 2GB:
(1) Every bucket uses 32-bit integer
(2) no key is stored in bucket
(3) use the first bit of the bucket value to distinguish it points to the file offset or a second level index.
This index schema fits the use case that most of prefixes have very small number of keys

Test Plan: plain_table_db_test

Reviewers: haobo, kailiu, dhruba

Reviewed By: haobo

CC: nkg-, leveldb

Differential Revision: https://reviews.facebook.net/D14343
2013-12-04 13:43:45 -08:00
Igor Canadi 043fc14c3e Get rid of some shared_ptrs
Summary:
I went through all remaining shared_ptrs and removed the ones that I found not-necessary. Only GenerateCachePrefix() is called fairly often, so don't expect much perf wins.

The ones that are left are accessed infrequently and I think we're fine with keeping them.

Test Plan: make asan_check

Reviewers: dhruba, haobo

Reviewed By: dhruba

CC: leveldb

Differential Revision: https://reviews.facebook.net/D14427
2013-12-03 11:17:58 -08:00
Dhruba Borthakur 96bc3ec297 Memtables should be deleted appropriately in the unit test.
Summary:
Memtables should be deleted appropriately in the unit test.

Test Plan:

Reviewers:

CC:

Task ID: #

Blame Rev:
2013-12-01 21:23:44 -08:00
Kai Liu 1966b63137 Merge branch 'master' into perf 2013-11-27 11:47:40 -08:00
Haobo Xu 5b825d6964 [RocksDB] Use raw pointer instead of shared pointer when passing Statistics object internally
Summary: liveness of the statistics object is already ensured by the shared pointer in DB options. There's no reason to pass again shared pointer among internal functions. Raw pointer is sufficient and efficient.

Test Plan: make check

Reviewers: dhruba, MarkCallaghan, igor

Reviewed By: dhruba

CC: leveldb, reconnect.grayhat

Differential Revision: https://reviews.facebook.net/D14289
2013-11-25 10:38:15 -08:00
Siying Dong dfa1460d88 [For Performance Branch] Bloom filter in PlainTableIterator::Seek() - Update 1
Summary:
Address @haobo's comments in D14277

Test Plan: ./indexed_table_db_test

Reviewers: haobo

CC:

Task ID: #

Blame Rev:
2013-11-21 23:33:45 -08:00