Summary:
Define Block::Iter to be an independent class to be used by block_based_table_reader
When creating data and index iterator, update an existing iterator rather than new one
Thus malloc and free could be reduced
Benchmark,
Base:
commit 76286ee67e
commands:
--db=/dev/shm/rocksdb --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --write_buffer_size=134217728 --max_write_buffer_number=2 --target_file_size_base=33554432 --max_bytes_for_level_base=1073741824 --verify_checksum=false --max_background_compactions=4 --use_plain_table=0 --memtablerep=prefix_hash --open_files=-1 --mmap_read=1 --mmap_write=0 --bloom_bits=10 --bloom_locality=1 --memtable_bloom_bits=500000 --compression_type=lz4 --num=2621440 --use_hash_search=1 --block_size=1024 --block_restart_interval=1 --use_existing_db=1 --threads=1 --benchmarks=readrandom —disable_auto_compactions=1
malloc: 3.30% -> 1.42%
free: 3.59%->1.61%
Test Plan:
make all check
run db_stress
valgrind ./db_test ./table_test
Reviewers: ljin, yhchiang, dhruba, igor, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D20655
Summary:
- Copy the key and value to in-memory hash table during Add operation. Also modified cuckoo_table_reader_test to use this.
- Store only the user_key in in-memory hash table if it is last level file.
- Handle Carryover while chosing unused key in Finish() method in case unused key was never found before Finish() call.
Test Plan:
cuckoo_table_reader_test --enable_perf
cuckoo_table_builder_test
valgrind_check
asan_check
Reviewers: sdong, yhchiang, igor, ljin
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D20715
Summary:
Make StatisticsImpl being able to forward stats to provided statistics
implementation. The main purpose is to allow us to collect internal
stats in the future even when user supplies custom statistics
implementation. It avoids intrumenting 2 sets of stats collection code.
One immediate use case is tuning advisor, which needs to collect some
internal stats, users may not be interested.
Test Plan:
ran db_bench and see stats show up at the end of run
Will run make all check since some tests rely on statistics
Reviewers: yhchiang, sdong, igor
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D20145
Summary:
In block based table's hash index checking, when looking for a key that doesn't exist, there is a high chance that a false block is returned because of hash bucket conflicts. In this revision, another check is done to filter out some of those cases: comparing previous key of the block boundary to see whether the target block is what we are looking for.
In a favored test setting (bloom filter disabled, 8 L0 files), I saw about 80% improvements. In a non-favored test setting (bloom filter enabled, files are all in L1, files are all cached), I see the performance penalty is less than 3%.
Test Plan: make all check
Reviewers: haobo, ljin
Reviewed By: ljin
Subscribers: wuj, leveldb, zagfox, yhchiang
Differential Revision: https://reviews.facebook.net/D20595
Summary:
Contains:
- Implementation of TableReader based on Cuckoo Hashing
- Unittests for CuckooTableReader
- Performance test for TableReader
Test Plan:
make cuckoo_table_reader_test
./cuckoo_table_reader_test
make valgrind_check
make asan_check
Reviewers: yhchiang, sdong, igor, ljin
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D20511
Summary:
Contains the following changes in CuckooTableBuilder:
- Take an extra parameter in constructor to identify last level file.
- Implement a better way to identify if a bucket has been inserted into the tree already during BFS search.
- Minor typos
Test Plan:
make cuckoo_table_builder
./cuckoo_table_builder
make valgrind_check
Reviewers: sdong, igor, yhchiang, ljin
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D20445
Summary:
Modify a functioin TrimAppend in dbformat.h: IterKey. Write a test for it in dbformat_test
Use IterKey in block::Iter to replace std::string to reduce malloc.
Evaluate it using perf record.
malloc: 4.26% -> 2.91%
free: 3.61% -> 3.08%
Test Plan:
make all check
./valgrind db_test dbformat_test
Reviewers: ljin, haobo, yhchiang, dhruba, igor, sdong
Reviewed By: sdong
Differential Revision: https://reviews.facebook.net/D20433
Summary: Fixes some memory leaks in cuckoo_builder_test.cc. This also fixed broken valgrind_check tests
Test Plan:
make valgrind_check
./cuckoo_builder_test
Currently running make check all. I shall update once it is done.
Reviewers: ljin, sdong, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D20385
Summary:
Fixed the following compile error.
./table/cuckoo_table_builder.h:72:22: error: private field 'key_length_' is not used [-Werror,-Wunused-private-field]
const unsigned int key_length_;
^
1 error generated.
Test Plan: make
Reviewers: sdong, ljin, radheshyamb, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D20349
Summary:
Adding option to save PlainTable index and bloom filter in SST file.
If there is no bloom block and/or index block, PlainTableReader builds
new ones. Otherwise PlainTableReader just use these blocks.
Test Plan: make all check
Reviewers: sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D19527
Summary:
Since we have a lot of options for PlainTable, add a struct PlainTableOptions
to manage them
Test Plan: make all check
Reviewers: sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D20175
Summary:
Seems like NewTotalOrderPlainTableFactory is useless and is semantically incorrect.
Total order mode indicator is prefix_extractor == nullptr,
but NewTotalOrderPlainTableFactory doesn't set it to be nullptr. That's why some tests
in plain_table_db_tests is incorrect.
Test Plan: make all check
Reviewers: sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D19587
Summary:
ReadInternalKey() will assign correct value anyway. Initialize it to
true to suppress compiler error reported
https://github.com/facebook/rocksdb/issues/186
Test Plan: I cannot reproduce it but this is obvious
Reviewers: sdong, yhchiang
Reviewed By: yhchiang
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D19467
Summary:
Bloomfilter and hashskiplist's buckets_ allocated by memtable's arena
DynamicBloom: pass arena via constructor, allocate space in SetTotalBits
HashSkipListRep: allocate space of buckets_ using arena.
do not delete it in deconstructor because arena would take care of it.
Several test files are changed.
Test Plan:
make all check
Reviewers: ljin, haobo, yhchiang, sdong
Reviewed By: sdong
Subscribers: igor, dhruba
Differential Revision: https://reviews.facebook.net/D19335
Summary:
As discussed in our internal group, we don't get much use of seek compaction at the moment, while it's making code more complicated and slower in some cases.
This diff removes seek compaction and (hopefully) all code that was introduced to support seek compaction.
There is one test case that relied on didIO information. I'll try to find another way to implement it.
Test Plan: make check
Reviewers: sdong, haobo, yhchiang, ljin, dhruba
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D19161
Summary: Currently, RocksDB returns error if a db written with prefix hash index, is later opened without providing a prefix extractor. This is uncessarily harsh. Without a prefix extractor, we could always fallback to the normal binary index.
Test Plan: unit test, also manually veried LOG that fallback did occur.
Reviewers: sdong, ljin
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D19191
Summary: fix a bug in D19047, which caused DBTest.RecoverDuringMemtableCompaction to fail.
Test Plan: unit test
Reviewers: sdong, igor
Reviewed By: igor
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D19155
Summary:
Add a encoding feature of PlainTable to encode PlainTable's keys to save some bytes for the same prefixes.
The data format is documented in table/plain_table_factory.h
Test Plan: Add unit test coverage in plain_table_db_test
Reviewers: yhchiang, igor, dhruba, ljin, haobo
Reviewed By: haobo
Subscribers: nkg-, leveldb
Differential Revision: https://reviews.facebook.net/D18735
Summary:
Currently, the in-memory hash index of blockbased table uses a precise hash map to track the prefix to block range mapping. In some use cases, especially when prefix itself is big, the memory overhead becomes a problem. This diff introduces a fixed hash bucket array that does not store the prefix and allows prefix collision, which is similar to the plaintable hash index, in order to reduce the memory consumption.
Just a quick draft, still testing and refining.
Test Plan: unit test and shadow testing
Reviewers: dhruba, kailiu, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D19047
Summary:
This is minor, but if we put the writing talbe factory as the third parameter, when we add a new table format, we'll have a situation:
1) block based factory
2) plain table factory
3) output factory
4) new format factory
I think it makes more sense to have output as the first parameter.
Also, fixed a NewAdaptiveTableFactory() call in unit test
Test Plan: unit test
Reviewers: sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D19119
Summary: The new table factory is used if users want to convert a DB from one table format to the other. A user can use this table to open a DB written using one table format and write new files to another table format.
Test Plan: add a unit test
Reviewers: haobo, igor
Reviewed By: igor
Subscribers: dhruba, ljin, yhchiang, leveldb
Differential Revision: https://reviews.facebook.net/D19017
Summary: as title
Test Plan:
db_bench
the initial result is very promising. I will post results of complete
runs
Reviewers: dhruba, haobo, sdong, igor
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D18867
Summary: In BlockBasedTable::PrefixMayMatch() we calculate prefix even if bloom is not config. Move the check before
Test Plan: make all check
Reviewers: igor, ljin
Reviewed By: ljin
Subscribers: wuj, leveldb, haobo, yhchiang, dhruba
Differential Revision: https://reviews.facebook.net/D18993
Summary:
Clean PlainTableReader's data structures:
(1) inline bloom_ (in order to do this, change DynamicBloom to allow lazy initialization)
(2) remove some variables only used when initialization from the class
(3) put variables not used in normal read code paths to the end of the class and reference prefix_extractor directly
(4) make Options a reference.
Test Plan: make all check
Reviewers: haobo, ljin
Reviewed By: ljin
Subscribers: igor, yhchiang, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D18891
Summary: This is a temporary solution to a issue that we have with compression libraries. See task #4453446.
Test Plan: make check doesn't complain :)
Reviewers: haobo, ljin, yhchiang, dhruba, sdong
Reviewed By: sdong
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D18975
Summary:
In this patch, try to allocate the whole iterator tree starting from DBIter from an arena
1. ArenaWrappedDBIter is created when serves as the entry point of an iterator tree, with an arena in it.
2. Add an option to create iterator from arena for following iterators: DBIter, MergingIterator, MemtableIterator, all mem table's iterators, all table reader's iterators and two level iterator.
3. MergeIteratorBuilder is created to incrementally build the tree of internal iterators. It is passed to mem table list and version set and add iterators to it.
Limitations:
(1) Only DB::NewIterator() without tailing uses the arena. Other cases, including readonly DB and compactions are still from malloc
(2) Two level iterator itself is allocated in arena, but not iterators inside it.
Test Plan: make all check
Reviewers: ljin, haobo
Reviewed By: haobo
Subscribers: leveldb, dhruba, yhchiang, igor
Differential Revision: https://reviews.facebook.net/D18513
Summary:
Forward iterator puts everything together in a flat structure instead of
a hierarchy of nested iterators. this should simplify the code and
provide better performance. It also enables more optimization since all
information are accessiable in one place.
Init evaluation shows about 6% improvement
Test Plan: db_test and db_bench
Reviewers: dhruba, igor, tnovak, sdong, haobo
Reviewed By: haobo
Subscribers: sdong, leveldb
Differential Revision: https://reviews.facebook.net/D18795
Summary:
Materialize the hash index to avoid the soaring cpu/flash usage
when initializing the database.
Test Plan: existing unit tests passed
Reviewers: sdong, haobo
Reviewed By: sdong
CC: leveldb
Differential Revision: https://reviews.facebook.net/D18339
Summary:
This diff addresses task #4296714 and rethinks how users provide us with TablePropertiesCollectors as part of Options.
Here's description of task #4296714:
I'm debugging #4295529 and noticed that our count of user properties kDeletedKeys is wrong. We're sharing one single InternalKeyPropertiesCollector with all Table Builders. In LOG Files, we're outputting number of kDeletedKeys as connected with a single table, while it's actually the total count of deleted keys since creation of the DB.
For example, this table has 3155 entries and 1391828 deleted keys.
The problem with current approach that we call methods on a single TablePropertiesCollector for all the tables we create. Even worse, we could do it from multiple threads at the same time and TablePropertiesCollector has no way of knowing which table we're calling it for.
Good part: Looks like nobody inside Facebook is using Options::table_properties_collectors. This means we should be able to painfully change the API.
In this change, I introduce TablePropertiesCollectorFactory. For every table we create, we call `CreateTablePropertiesCollector`, which creates a TablePropertiesCollector for a single table. We then use it sequentially from a single thread, which means it doesn't have to be thread-safe.
Test Plan:
Added a test in table_properties_collector_test that fails on master (build two tables, assert that kDeletedKeys count is correct for the second one).
Also, all other tests
Reviewers: sdong, dhruba, haobo, kailiu
Reviewed By: kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D18579
Summary:
Use autovector in MergingIterator so that if there are 4 or less child iterators in it, iterator wrappers are inline, which is more likely to be cache friendly.
Based on one test run with a shadow traffic of one product, it reduces CPU of MergingIterator::Seek() by half.
Test Plan: make all check
Reviewers: haobo, yhchiang, igor, dhruba
Reviewed By: igor
CC: leveldb
Differential Revision: https://reviews.facebook.net/D18531
Summary:
TLB page allocation errors are now logged to info logs, instead of stderr.
In order to do that, mem table rep's factory functions take a info logger now.
Test Plan: make all check
Reviewers: haobo, igor, yhchiang
Reviewed By: yhchiang
CC: leveldb, yhchiang, dhruba
Differential Revision: https://reviews.facebook.net/D18471
Summary: PlainTableFactory::PlainTableFactory() now has Huge TLB page feature turned on by default. Although it is not a public API (which we always turn the feature off now), our unit tests, like db_test sometimes uses it directly, which causes wrong coverage of codes. This patch fix it to allow unit tests to run with the correct setting
Test Plan: Run db_test and make sure this feature is not on any more.
Reviewers: igor, haobo
Reviewed By: igor
CC: yhchiang, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D18483
Summary: Add an option to allocate a piece of memory from huge page TLB. Add options to trigger it in dynamic bloom, plain table indexes andhash linked list hash table.
Test Plan: make all check
Reviewers: haobo, ljin
Reviewed By: haobo
CC: nkg-, dhruba, leveldb, igor, yhchiang
Differential Revision: https://reviews.facebook.net/D18357
Summary:
As a follow-up diff for https://reviews.facebook.net/D17805, add
optimization to check PrefixMayMatch on Seek()
Test Plan: make all check
Reviewers: igor, haobo, sdong, yhchiang, dhruba
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D17853
Summary:
also add an override option total_order_iteration if you want to use full
iterator with prefix_extractor
Test Plan: make all check
Reviewers: igor, haobo, sdong, yhchiang
Reviewed By: haobo
CC: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D17805
Summary:
This is a temp solution to expose index sizes to users from PlainTableReader before we persistent them to files.
In this patch, the memory consumption of indexes used by PlainTableReader will be reported as two user defined properties, so that users can monitor them.
Test Plan:
Add a unit test.
make all check`
Reviewers: haobo, ljin
Reviewed By: haobo
CC: nkg-, yhchiang, igor, ljin, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D18195
Summary: Added benchmark functionality on the lines of folly/Benchmark.h
Test Plan: Added unit tests
Reviewers: igor, haobo, sdong, ljin, yhchiang, dhruba
Reviewed By: igor
CC: leveldb
Differential Revision: https://reviews.facebook.net/D17973
Summary:
A recent commit e37dd216f9 makes sure hash index can be used when reading existing files. This patch uses another way to achieve the approach:
(1) Currently, always writing kBinarySearch to files, despite of BlockBasedTableOptions.IndexType setting.
(2) When reading a file, read out the field, and make sure it is kBinarySearch, while always use index type by users.
The reason for doing it is, to reserve kHashSearch property on disk to future. If now we write out binary index for both of kHashSearch and kBinarySearch. We have to use a new flag in the future for hash index on disk, otherwise compatibility would break. Also, we want the real index type and type shown in properties block to be consistent.
Test Plan: make all check
Reviewers: haobo, kailiu
Reviewed By: kailiu
CC: igor, ljin, yhchiang, xjin, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D18009
Summary:
With the recent changes, there is no need to check the property block about the index block type.
If user want to use it, they don't really need any disk format change; everything happens in the fly.
Also another team encountered an error while reading the index type from properties.
Test Plan:
ran all the tests
Reviewers: sdong
CC:
Task ID: #
Blame Rev:
Summary:
From 2.6 to 2.7, property block name is renamed from rocksdb.stats to rocksdb.properties. Older properties were not able to be loaded. In 2.8, we seem to have added some logic that uses property block without checking null pointers, which create segment faults.
In this patch, we fix it by:
(1) try rocksdb.stats if rocksdb.properties is not found
(2) add some null checking before consuming rep->table_properties
Test Plan: make sure a file generated in 2.7 couldn't be opened now can be opened.
Reviewers: haobo, igor, yhchiang
Reviewed By: igor
CC: ljin, xjin, dhruba, kailiu, leveldb
Differential Revision: https://reviews.facebook.net/D17961
Summary:
Introducing RocksDBLite! Removes all the non-essential features and reduces the binary size. This effort should help our adoption on mobile.
Binary size when compiling for IOS (`TARGET_OS=IOS m static_lib`) is down to 9MB from 15MB (without stripping)
Test Plan: compiles :)
Reviewers: dhruba, haobo, ljin, sdong, yhchiang
Reviewed By: yhchiang
CC: leveldb
Differential Revision: https://reviews.facebook.net/D17835
Summary:
I was wrong about the "index builder", right now since we create index
by scanning both whole table and index, there is not need to preserve
the whole key as the index key.
I switch back to original way index which is both space efficient and
able to supprot in-fly construction of hash index.
IN this patch, I made minimal change since I'm not sure if we still need
the "pluggable index builder", under current circumstance it is of no use
and kind of over-engineered. But I'm not sure if we can still exploit its
usefulness in the future; otherwise I think I can just burn them with great
vengeance.
Test Plan: unit tests
Reviewers: sdong, haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D17745
Summary: Based on previous patches, this diff eventually provides the end-to-end mechanism for users to specify the hash-index.
Test Plan: Wrote several new unit tests.
Reviewers: sdong, haobo, dhruba
Reviewed By: sdong
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16539
Summary: Compiling for iOS has by default turned on -Wmissing-prototypes, which causes rocksdb to fail compiling. This diff turns on -Wmissing-prototypes in our compile options and cleans up all functions with missing prototypes.
Test Plan: compiles
Reviewers: dhruba, haobo, ljin, sdong
Reviewed By: ljin
CC: leveldb
Differential Revision: https://reviews.facebook.net/D17649
Summary: Our profile shows that in one of the applications, 5% of the CPU costs of PlainTableBuilder::Add() are spent on std::string stacks. By this simple change, we avoid this global reusable string. Also, we avoid another call of file appending, which probably gives another 2%.
Test Plan: make all check
Reviewers: haobo, ljin
Reviewed By: haobo
CC: igor, yhchiang, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D17601
Summary: This will allow us to disable them completely for iOS or for better performance
Test Plan: will run make all check
Reviewers: igor, haobo, dhruba
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D17511
Summary:
Move PlainTableIterator's copied key from std::string local buffer to avoid paying the extra costs in std::string related to sharing. Reuse the same buffer class in DbIter. Move the class to dbformat.h.
This patch improves iterator performance significantly. Running this benchmark:
./table_reader_bench --num_keys2=17 --iterator --plain_table --time_unit=nanosecond
The average latency is improved to about 750 nanoseconds from 1100 nanoseconds.
Test Plan:
Add a unit test.
make all check
Reviewers: haobo, ljin
Reviewed By: haobo
CC: igor, yhchiang, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D17547
Summary:
In total order mode, iterator's seek() shouldn't check total order.
Also some cleaning up about checking null for shared pointers. I don't know the behavior before it.
This bug was reported by @igor.
Test Plan: test plain_table_db_test
Reviewers: ljin, haobo, igor
Reviewed By: igor
CC: yhchiang, dhruba, igor, leveldb
Differential Revision: https://reviews.facebook.net/D17391
Summary:
By constraining the probes within cache line(s), we can improve the
cache miss rate thus performance. This probably only makes sense for
in-memory workload so defaults the option to off.
Numbers and comparision can be found in wiki:
https://our.intern.facebook.com/intern/wiki/index.php/Ljin/rocksdb_perf/2014_03_17#Bloom_Filter_Study
Test Plan: benchmarked this change substantially. Will run make all check as well
Reviewers: haobo, igor, dhruba, sdong, yhchiang
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D17133
Summary:
Previous code had two bugs:
* didn't initialize the table_magic_number_ explicitly -- as a
result a random junk number is stored for table_magic_number_, making
HasInitializedMagicNumber() always return true.
* if condition is inconrrect in set_table_magic_number(), and the return value is not checked.
I replace if-else by a stronger requirement enforced by assert().
Test Plan:
Previous sst_dump failed to work.
After the fix, things back to normal.
Reviewers: yhchiang
CC: haobo, sdong, igor, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D17055
Summary:
I'm cleaning up some code preparing for the big diff review tomorrow. This is the first part of the cleanup.
Changes are mostly cosmetic. The goal is to decrease amount of code difference between columnfamilies and master branch.
This diff also fixes race condition when dropping column family.
Test Plan: Ran db_stress with variety of parameters
Reviewers: dhruba, haobo
Differential Revision: https://reviews.facebook.net/D16833
Summary:
(1) Fix SanitizeOptions() to also check HashLinkList. The current
dynamic case just happens to work because the 2 classes have the same
layout.
(2) Do not delete SliceTransform object in HashSkipListFactory and
HashLinkListFactory destructor. Reason: SanitizeOptions() enforces
prefix_extractor and SliceTransform to be the same object when
Hash**Factory is used. This makes the behavior strange: when
Hash**Factory is used, prefix_extractor will be released by RocksDB. If
other memtable factory is used, prefix_extractor should be released by
user.
Test Plan: db_bench && make asan_check
Reviewers: haobo, igor, sdong
Reviewed By: igor
CC: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D16587
Summary:
Previous we did rough estimation of subindex size, which in worst case may result in array reallocation.
This patch aims to get the exact size and avoid any reallocation.
Test Plan: make all check
Reviewers: sdong, dhruba, haobo
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16125
Summary: The assert was pointless since if if prefix is the same as the whole key, assertion will surely fail. Reason behind is when performing the internal key comparison, if user keys are the same, *key with smaller transaction id* wins.
Test Plan: make -j32 && make check
Reviewers: sdong, dhruba, haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16551
Summary:
this is the key component extracted from diff: https://reviews.facebook.net/D14271
I separate it to a dedicated patch to make the review easier.
Test Plan: added a unit test and passed it.
Reviewers: haobo, sdong, dhruba
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16245
Summary:
Due to a bad merge of D14163 and D14001 before checking in D14001, "direction_ = kForward;" in MergeIterator::Seek() was deleted my mistake (in commit b135d01e7b ). It will generate wrong results or assert failure after the sequence of Prev() (or SeekToLast()), Seek() and Prev().
Fix it
Test Plan: make all check
Reviewers: igor, haobo, dhruba
Reviewed By: igor
CC: yhchiang, i.am.jin.lei, ljin, leveldb
Differential Revision: https://reviews.facebook.net/D16527
Summary:
BinarySearchIndex didn't use unique_ptr to guard the block object nor
delete it in destructor, leading to valgrind failure for "definite
memory leak".
Test Plan:
re-ran the failed valgrind test cases
Summary:
My last diff was developed in MacOS but in devserver environment error occurs.
I dug into the problem and found the way we calcuate approximate data size is pretty out-of-date. We can use table properties to get more accurate results.
Test Plan: ran ./table_test and passed
Reviewers: igor, dhruba, haobo, sdong
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16509
Summary:
This patch introduced a new table options that allows us to make
block-based table's index pluggable.
To support that new features:
* Code has been refacotred to be more flexible and supports this option well.
* More documentation is added for the existing obsecure functionalities.
* Big surgeon on DataBlockReader(), where the logic was really convoluted.
* Other small code cleanups.
The pluggablility will mostly affect development of internal modules
and won't change frequently, as a result I intentionally avoid
heavy-weight patterns (like factory) and try to make it simple.
Test Plan: make all check
Reviewers: haobo, sdong
Reviewed By: sdong
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16395
Summary:
Previous code is too convoluted and I must be drunk for letting
such code to be written without a second thought.
Thanks to the discussion with @sdong, I added the `Options` when
generating the flusher, thus avoiding the tricks.
Just FYI: I resisted to add Options in flush_block_policy.h since I
wanted to avoid cyclic dependencies: FlushBlockPolicy dpends on Options
and Options also depends FlushBlockPolicy... While I appreciate my
effort to prevent it, the old design turns out creating more troubles than
it tried to avoid.
Test Plan: ran ./table_test
Reviewers: sdong
Reviewed By: sdong
CC: sdong, leveldb
Differential Revision: https://reviews.facebook.net/D16503
Summary:
Found some function follows camel style. When naming funciton, we have two styles:
Trivially expose internal data in readonly mode: `all_lower_case()`
Regular function: `CapitalizeFirstLetter()`
I renames these functions.
Test Plan: make -j32
Reviewers: haobo, sdong, dhruba, igor
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16383
Summary:
PlainTable::Next() should pass the error message from ReadKey(). Now it would return a wrong error message.
Also improve the messages of status when failing to read
Test Plan: make all check
Reviewers: ljin, kailiu, haobo
Reviewed By: kailiu
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16365
Summary: This bug caused server crash issues because the filter block is too big and kept purging out of cache.
Test Plan: Wrote a new unit tests to make sure it works.
Reviewers: dhruba, haobo, igor, sdong
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16221
Summary: Provide a public API for users to access the table properties for each SSTable.
Test Plan: Added a unit tests to test the function correctness under differnet conditions.
Reviewers: haobo, dhruba, sdong
Reviewed By: haobo
CC: leveldb
Differential Revision: https://reviews.facebook.net/D16083