Summary:
Something I am working on is going to expand usage of `BlockBasedTableBuilder::Rep::last_key`, but the existing code contract for `IndexBuilder::AddIndexEntry` makes that difficult because it modifies its `last_key` parameter to be the separator value recorded in the index, often something between the two boundary keys.
This change primarily changes the contract of that function and related functions to separate function inputs and outputs, without sacrificing efficiency. For efficiency, a reusable scratch string buffer is provided by the caller, which the callee can use (or not) in returning a result Slice. That should yield a performance improvement as we are reusing a buffer for keys rather than copying into a new one each time in the FindShort* functions, without any additional string copies or conditional branches.
Additional improvements in PartitionedIndexBuilder specifically:
* Reduce string copies by eliminating `sub_index_last_key_` and instead tracking the key for the next partition in a placeholder Entry.
* Simplify code and improve code quality by changing `sub_index_builder_` to unique_ptr.
* Eliminate unnecessary NewFlushBlockPolicy call/object.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12867
Test Plan: existing tests, crash test. Will validate performance along with the change this is setting up.
Reviewed By: anand1976
Differential Revision: D59793119
Pulled By: pdillinger
fbshipit-source-id: 556da75cf13b967511f84702b2713d152f536a07
Summary:
Logically strip the user-defined timestamp when L0 files are created during flush when `AdvancedColumnFamilyOptions.persist_user_defined_timestamps` is false. Logically stripping timestamp here means replacing the original user-defined timestamp with a mininum timestamp, which for now is hard coded to be all zeros bytes.
While working on this, I caught a missing piece on the `BlockBuilder` level for this feature. The current quick path `std::min(buffer_size, last_key_size)` needs a bit tweaking to work for this feature. When user-defined timestamp is stripped during block building, on writing first entry or right after resetting, `buffer` is empty and `buffer_size` is zero as usual. However, in follow-up writes, depending on the size of the stripped user-defined timestamp, and the size of the value, what's in `buffer` can sometimes be smaller than `last_key_size`, leading `std::min(buffer_size, last_key_size)` to truncate the `last_key`. Previous test doesn't caught the bug because in those tests, the size of the stripped user-defined timestamps bytes is smaller than the length of the value. In order to avoid the conditional operation, this PR changed the original trivial `std::min` operation into an arithmetic operation. Since this is a change in a hot and performance critical path, I did the following benchmark to check no observable regression is introduced.
```TEST_TMPDIR=/dev/shm/rocksdb1 ./db_bench -benchmarks=fillseq -memtablerep=vector -allow_concurrent_memtable_write=false -num=50000000```
Compiled with DEBUG_LEVEL=0
Test vs. control runs simulaneous for better accuracy, units = ops/sec
PR vs base:
Round 1: 350652 vs 349055
Round 2: 365733 vs 364308
Round 3: 355681 vs 354475
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11557
Test Plan:
New timestamp specific test added or existing tests augmented, both are parameterized with `UserDefinedTimestampTestMode`:
`UserDefinedTimestampTestMode::kNormal` -> UDT feature enabled, write / read with min timestamp
`UserDefinedTimestampTestMode::kStripUserDefinedTimestamps` -> UDT feature enabled, write / read with min timestamp, set Options.persist_user_defined_timestamps to false.
```
make all check
./db_wal_test --gtest_filter="*WithTimestamp*"
./flush_job_test --gtest_filter="*WithTimestamp*"
./repair_test --gtest_filter="*WithTimestamp*"
./block_based_table_reader_test
```
Reviewed By: pdillinger
Differential Revision: D47027664
Pulled By: jowlyzhang
fbshipit-source-id: e729193b6334dfc63aaa736d684d907a022571f5
Summary:
This patch adds support in `BlockBuilder` to strip user-defined timestamp from the `key` added via `Add(key, value)` and its equivalent APIs. The stripping logic is different when the key is either a user key or an internal key, so the `BlockBuilder` is created with a flag to indicate that. This patch also add support on the read path to APIs `NewIndexIterator`, `NewDataIterator` to support pad a min timestamp.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11472
Test Plan:
Three test modes are added to parameterize existing tests:
UserDefinedTimestampTestMode::kNone -> UDT feature is not enabled
UserDefinedTimestampTestMode::kNormal -> UDT feature enabled, write / read with min timestamp
UserDefinedTimestampTestMode::kStripUserDefinedTimestamps -> UDT feature enabled, write / read with min timestamp, set `persist_user_defined_timestamps` where it applies to false.
The tests read/write with min timestamp so that point read and range scan can correctly read values in all three test modes.
`block_test` are parameterized to run with above three test modes and some additional parameteriazation
```
make all check
./block_test --gtest_filter="P/BlockTest*"
./block_test --gtest_filter="P/IndexBlockTest*"
```
Reviewed By: ltamasi
Differential Revision: D46200539
Pulled By: jowlyzhang
fbshipit-source-id: 59f5d6b584639976b69c2943eba723bd47d9b3c0
Summary:
Ran `find ./db/ -type f | xargs clang-format -i`. Excluded minor changes it tried to make on db/db_impl/. Everything else it changed was directly under db/ directory. Included minor manual touchups mentioned in PR commit history.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/10910
Reviewed By: riversand963
Differential Revision: D40880683
Pulled By: ajkr
fbshipit-source-id: cfe26cda05b3fb9a72e3cb82c286e21d8c5c4174
Summary:
Instead of existing calls to ps from gnu_parallel, call a new wrapper that does ps, looks for unit test like processes, and uses pstack or gdb to print thread stack traces. Also, using `ps -wwf` instead of `ps -wf` ensures output is not cut off.
For security, CircleCI runs with security restrictions on ptrace (/proc/sys/kernel/yama/ptrace_scope = 1), and this change adds a work-around to `InstallStackTraceHandler()` (only used by testing tools) to allow any process from the same user to debug it. (I've also touched >100 files to ensure all the unit tests call this function.)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/10828
Test Plan: local manual + temporary infinite loop in a unit test to observe in CircleCI
Reviewed By: hx235
Differential Revision: D40447634
Pulled By: pdillinger
fbshipit-source-id: 718a4c4a5b54fa0f9af2d01a446162b45e5e84e1
Summary:
InternalKeyComparator is an internal class which is a simple wrapper of Comparator. https://github.com/facebook/rocksdb/pull/8336 made Comparator customizeable. As a side effect, internal key comparator was made configurable too. This introduces overhead to this simple wrapper. For example, every InternalKeyComparator will have an std::vector attached to it, which consumes memory and possible allocation overhead too.
We remove InternalKeyComparator from being customizable by making InternalKeyComparator not a subclass of Comparator.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/10342
Test Plan: Run existing CI tests and make sure it doesn't fail
Reviewed By: riversand963
Differential Revision: D37771351
fbshipit-source-id: 917256ee04b2796ed82974549c734fb6c4d8ccee
Summary:
Allow compaction_job_test, db_io_failure_test, dbformat_test, deletefile_test, and fault_injection_test to use a custom Env object. Also move ```RegisterCustomObjects``` declaration to a header file to simplify things.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9087
Test Plan: Run manually using "buck test rocksdb/src:compaction_job_test_fbcode" etc.
Reviewed By: riversand963
Differential Revision: D32007222
Pulled By: anand1976
fbshipit-source-id: 99af58559e25bf61563dfa95dc46e31fa7375792
Summary:
This is a PR generated **semi-automatically** by an internal tool to remove unused includes and `using` statements.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7604
Test Plan: make check
Reviewed By: ajkr
Differential Revision: D24579392
Pulled By: riversand963
fbshipit-source-id: c4bfa6c6b08da1de186690d37eb73d8fff45aecd
Summary:
Fixes Issue https://github.com/facebook/rocksdb/issues/7497
When allow_data_in_errors db_options is set, log error key details in `ParseInternalKey()`
Have fixed most of the calls. Have few TODOs still pending - because have to make more deeper changes to pass in the allow_data_in_errors flag. Will do those in a separate PR later.
Tests:
- make check
- some of the existing tests that exercise the "internal key too small" condition are: dbformat_test, cuckoo_table_builder_test
- some of the existing tests that exercise the corrupted key path are: corruption_test, merge_helper_test, compaction_iterator_test
Example of new status returns:
- Key too small - `Corrupted Key: Internal Key too small. Size=5`
- Corrupt key with allow_data_in_errors option set to false: `Corrupted Key: '<redacted>' seq:3, type:3`
- Corrupt key with allow_data_in_errors option set to true: `Corrupted Key: '61' seq:3, type:3`
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7515
Reviewed By: ajkr
Differential Revision: D24240264
Pulled By: ramvadiv
fbshipit-source-id: bc48f5d4475ac19d7713e16df37505b31aac42e7
Summary:
When dynamically linking two binaries together, different builds of RocksDB from two sources might cause errors. To provide a tool for user to solve the problem, the RocksDB namespace is changed to a flag which can be overridden in build time.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6433
Test Plan: Build release, all and jtest. Try to build with ROCKSDB_NAMESPACE with another flag.
Differential Revision: D19977691
fbshipit-source-id: aa7f2d0972e1c31d75339ac48478f34f6cfcfb3e
Summary:
Many logging related source files are under util/. It will be more structured if they are together.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5387
Differential Revision: D15579036
Pulled By: siying
fbshipit-source-id: 3850134ed50b8c0bb40a0c8ae1f184fa4081303f
Summary:
There are too many types of files under util/. Some test related files don't belong to there or just are just loosely related. Mo
ve them to a new directory test_util/, so that util/ is cleaner.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5377
Differential Revision: D15551366
Pulled By: siying
fbshipit-source-id: 0f5c8653832354ef8caa31749c0143815d719e2c
Summary:
Do not consider the range tombstone sentinel key as causing 2 adjacent
sstables in a level to overlap. When a range tombstone's end key is the
largest key in an sstable, the sstable's end key is so to a "sentinel"
value that is the smallest key in the next sstable with a sequence
number of kMaxSequenceNumber. This "sentinel" is guaranteed to not
overlap in internal-key space with the next sstable. Unfortunately,
GetOverlappingFiles uses user-keys to determine overlap and was thus
considering 2 adjacent sstables in a level to overlap if they were
separated by this sentinel key. This in turn would cause compactions to
be larger than necessary.
Note that this conflicts with
https://github.com/facebook/rocksdb/pull/2769 and cases
`DBRangeDelTest.CompactionTreatsSplitInputLevelDeletionAtomically` to
fail.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4050
Differential Revision: D8844423
Pulled By: ajkr
fbshipit-source-id: df3f9f1db8f4cff2bff77376b98b83c2ae1d155b
Summary:
to void future bug that caused by the mix of userkey/internalkey
Closes https://github.com/facebook/rocksdb/pull/2084
Differential Revision: D4825889
Pulled By: lightmark
fbshipit-source-id: 28411db
Summary:
The current implementation find the first different byte and try to increment it, if it cannot it return the original key
we can improve this by keep going after the first different byte to find the first non 0xFF byte and increment it
After trying this patch on some logdevice sst files I see decrease in there index block size by 8.5%
Test Plan: existing tests and updated test
Reviewers: yhchiang, andrewkr, sdong
Reviewed By: sdong
Subscribers: andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D56241
Summary:
We were manipulating `const char*` arrays in CompactionJob to
change the sequence number/types of keys. This patch changes
UpdateInternalKey() to use string methods to do the manipulation
and updates all calls accordingly.
Test Plan:
Added test case for UpdateInternalKey() in dbformat_test.
make && make check
Reviewers: sdong, rven, yhchiang, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D41985
Summary:
Our existing test notation is very similar to what is used in gtest. It makes it easy to adopt what is different.
In this diff I modify existing [[ https://code.google.com/p/googletest/wiki/Primer#Test_Fixtures:_Using_the_Same_Data_Configuration_for_Multiple_Te | test fixture ]] classes to inherit from `testing::Test`. Also for unit tests that use fixture class, `TEST` is replaced with `TEST_F` as required in gtest.
There are several custom `main` functions in our existing tests. To make this transition easier, I modify all `main` functions to fallow gtest notation. But eventually we can remove them and use implementation of `main` that gtest provides.
```lang=bash
% cat ~/transform
#!/bin/sh
files=$(git ls-files '*test\.cc')
for file in $files
do
if grep -q "rocksdb::test::RunAllTests()" $file
then
if grep -Eq '^class \w+Test {' $file
then
perl -pi -e 's/^(class \w+Test) {/${1}: public testing::Test {/g' $file
perl -pi -e 's/^(TEST)/${1}_F/g' $file
fi
perl -pi -e 's/(int main.*\{)/${1}::testing::InitGoogleTest(&argc, argv);/g' $file
perl -pi -e 's/rocksdb::test::RunAllTests/RUN_ALL_TESTS/g' $file
fi
done
% sh ~/transform
% make format
```
Second iteration of this diff contains only scripted changes.
Third iteration contains manual changes to fix last errors and make it compilable.
Test Plan:
Build and notice no errors.
```lang=bash
% USE_CLANG=1 make check -j55
```
Tests are still testing.
Reviewers: meyering, sdong, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D35157
Summary:
Modify a functioin TrimAppend in dbformat.h: IterKey. Write a test for it in dbformat_test
Use IterKey in block::Iter to replace std::string to reduce malloc.
Evaluate it using perf record.
malloc: 4.26% -> 2.91%
free: 3.61% -> 3.08%
Test Plan:
make all check
./valgrind db_test dbformat_test
Reviewers: ljin, haobo, yhchiang, dhruba, igor, sdong
Reviewed By: sdong
Differential Revision: https://reviews.facebook.net/D20433
Summary:
Change namespace from leveldb to rocksdb. This allows a single
application to link in open-source leveldb code as well as
rocksdb code into the same process.
Test Plan: compile rocksdb
Reviewers: emayanke
Reviewed By: emayanke
CC: leveldb
Differential Revision: https://reviews.facebook.net/D13287
Summary:
The default compilation process now uses "-Wall" to compile.
Fix all compilation error generated by gcc.
Test Plan: make all check
Reviewers: heyongqiang, emayanke, sheki
Reviewed By: heyongqiang
CC: MarkCallaghan
Differential Revision: https://reviews.facebook.net/D6525
- Replace raw slice comparison with a call to user comparator.
Added test for custom comparators.
- Fix end of namespace comments.
- Fixed bug in picking inputs for a level-0 compaction.
When finding overlapping files, the covered range may expand
as files are added to the input set. We now correctly expand
the range when this happens instead of continuing to use the
old range. For example, suppose L0 contains files with the
following ranges:
F1: a .. d
F2: c .. g
F3: f .. j
and the initial compaction target is F3. We used to search
for range f..j which yielded {F2,F3}. However we now expand
the range as soon as another file is added. In this case,
when F2 is added, we expand the range to c..j and restart the
search. That picks up file F1 as well.
This change fixes a bug related to deleted keys showing up
incorrectly after a compaction as described in Issue 44.
(Sync with upstream @25072954)
* env_chromium.cc should not export symbols.
* Fix MSVC warnings.
* Removed large value support.
* Fix broken reference to documentation file
git-svn-id: https://leveldb.googlecode.com/svn/trunk@24 62dab493-f737-651d-591e-8d6aee1b9529