Summary:
Unit test names, together with other components, are used to create log files
during some internal testing. Overly long names cause infra failure due to file
names being too long.
Look for internal tests.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6352
Differential Revision: D19649307
Pulled By: riversand963
fbshipit-source-id: 6f29de096e33c0eaa87d9c8702f810eda50059e7
Summary:
This PR adds a `rocksdb_options_set_atomic_flush` function to the C API.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6307
Differential Revision: D19451313
Pulled By: ltamasi
fbshipit-source-id: 750495642ef55b1ea7e13477f85c38cd6574849c
Summary:
This makes it easier to call the functions from Rust as otherwise they require mutable types.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6283
Differential Revision: D19349991
Pulled By: wqfish
fbshipit-source-id: e8da7a75efe8cd97757baef8ca844a054f2519b4
Summary:
Fix an error message when CURRENT is not found.
Test plan (dev server)
```
make check
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6264
Differential Revision: D19300699
Pulled By: riversand963
fbshipit-source-id: 303fa206386a125960ecca1dbdeff07422690caf
Summary:
It seems that the C-API doesn't expose the range delete functionality at the moment, so add the API.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6259
Differential Revision: D19290320
Pulled By: pdillinger
fbshipit-source-id: 3f403a4c3446d2042d55f1ece7cdc9c040f40c27
Summary:
Formatter somehow complains some recent lines changed. Apply them to make the formatter happy.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6138
Test Plan: See CI passes.
Differential Revision: D18895950
fbshipit-source-id: 7d1696cf3e3a682bc10a30cdca748a23c6565255
Summary:
Add overrides needed in FilterPolicy wrapper to fix
rocksdb_filterpolicy_create_bloom_full (see issue https://github.com/facebook/rocksdb/issues/6129). Re-enabled
assertion in BloomFilterPolicy::CreateFilter that was being violated.
Expanded c_test to identify Bloom filter implementations by FP counts.
(Without the fix, updated test will trigger assertion and fail otherwise
without the assertion.)
Fixes https://github.com/facebook/rocksdb/issues/6129
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6132
Test Plan: updated c_test, also run under valgrind.
Differential Revision: D18864911
Pulled By: pdillinger
fbshipit-source-id: 08e81d7b5368b08e501cd402ef5583f2650c19fa
Summary:
Adds two missing functions to the C-API:
- `rocksdb_block_based_options_set_data_block_index_type`
- `rocksdb_block_based_options_set_data_block_hash_ratio`
This enables users in other languages to enjoy the new(-ish) feature.
The changes here are partially overlapping with [another PR](https://github.com/facebook/rocksdb/pull/5630) but are more focused on the DataBlock indexing options.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6101
Differential Revision: D18765639
fbshipit-source-id: 4a8947e71b179f26fa1eb83c267dd47ee64ac3b3
Summary:
MyRocks currently sets `max_write_buffer_number_to_maintain` in order to maintain enough history for transaction conflict checking. The effectiveness of this approach depends on the size of memtables. When memtables are small, it may not keep enough history; when memtables are large, this may consume too much memory.
We are proposing a new way to configure memtable list history: by limiting the memory usage of immutable memtables. The new option is `max_write_buffer_size_to_maintain` and it will take precedence over the old `max_write_buffer_number_to_maintain` if they are both set to non-zero values. The new option accounts for the total memory usage of flushed immutable memtables and mutable memtable. When the total usage exceeds the limit, RocksDB may start dropping immutable memtables (which is also called trimming history), starting from the oldest one.
The semantics of the old option actually works both as an upper bound and lower bound. History trimming will start if number of immutable memtables exceeds the limit, but it will never go below (limit-1) due to history trimming.
In order the mimic the behavior with the new option, history trimming will stop if dropping the next immutable memtable causes the total memory usage go below the size limit. For example, assuming the size limit is set to 64MB, and there are 3 immutable memtables with sizes of 20, 30, 30. Although the total memory usage is 80MB > 64MB, dropping the oldest memtable will reduce the memory usage to 60MB < 64MB, so in this case no memtable will be dropped.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5022
Differential Revision: D14394062
Pulled By: miasantreble
fbshipit-source-id: 60457a509c6af89d0993f988c9b5c2aa9e45f5c5
Summary:
`wal_batch.writeBatchPtr.release()` gives up the ownership of the original `WriteBatch`, but there is no new owner, which causes memory leak.
The patch is simple. Removing `release()` prevent ownership change. `std::move` is for speed.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5515
Differential Revision: D16264281
Pulled By: riversand963
fbshipit-source-id: 51c556b7a1c977325c3aa24acb636303847151fa
Summary:
Add C binding for secondary instance as well as unit test.
Test plan (on devserver)
```
$make clean && COMPILE_WITH_ASAN=1 make -j20 all
$./c_test
$make check
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5505
Differential Revision: D16000043
Pulled By: riversand963
fbshipit-source-id: 3361ef6bfdf4ce12438cee7290a0ac203b5250bd
Summary:
Performing unordered writes in rocksdb when unordered_write option is set to true. When enabled the writes to memtable are done without joining any write thread. This offers much higher write throughput since the upcoming writes would not have to wait for the slowest memtable write to finish. The tradeoff is that the writes visible to a snapshot might change over time. If the application cannot tolerate that, it should implement its own mechanisms to work around that. Using TransactionDB with WRITE_PREPARED write policy is one way to achieve that. Doing so increases the max throughput by 2.2x without however compromising the snapshot guarantees.
The patch is prepared based on an original by siying
Existing unit tests are extended to include unordered_write option.
Benchmark Results:
```
TEST_TMPDIR=/dev/shm/ ./db_bench_unordered --benchmarks=fillrandom --threads=32 --num=10000000 -max_write_buffer_number=16 --max_background_jobs=64 --batch_size=8 --writes=3000000 -level0_file_num_compaction_trigger=99999 --level0_slowdown_writes_trigger=99999 --level0_stop_writes_trigger=99999 -enable_pipelined_write=false -disable_auto_compactions --unordered_write=1
```
With WAL
- Vanilla RocksDB: 78.6 MB/s
- WRITER_PREPARED with unordered_write: 177.8 MB/s (2.2x)
- unordered_write: 368.9 MB/s (4.7x with relaxed snapshot guarantees)
Without WAL
- Vanilla RocksDB: 111.3 MB/s
- WRITER_PREPARED with unordered_write: 259.3 MB/s MB/s (2.3x)
- unordered_write: 645.6 MB/s (5.8x with relaxed snapshot guarantees)
- WRITER_PREPARED with unordered_write disable concurrency control: 185.3 MB/s MB/s (2.35x)
Limitations:
- The feature is not yet extended to `max_successive_merges` > 0. The feature is also incompatible with `enable_pipelined_write` = true as well as with `allow_concurrent_memtable_write` = false.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5218
Differential Revision: D15219029
Pulled By: maysamyabandeh
fbshipit-source-id: 38f2abc4af8780148c6128acdba2b3227bc81759
Summary:
There were no C bindings for lowering thread pool priority. This adds those.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5285
Differential Revision: D15290050
Pulled By: siying
fbshipit-source-id: b2ed94d0c39d27434ace2204829a242b53d0d67a
Summary:
Part of compaction cpu goes to processing snapshot list, the larger the list the bigger the overhead. Although the lifetime of most of the snapshots is much shorter than the lifetime of compactions, the compaction conservatively operates on the list of snapshots that it initially obtained. This patch allows the snapshot list to be updated via a callback if the compaction is taking long. This should let the compaction to continue more efficiently with much smaller snapshot list.
For simplicity, to avoid the feature is disabled in two cases: i) When more than one sub-compaction are sharing the same snapshot list, ii) when Range Delete is used in which the range delete aggregator has its own copy of snapshot list.
This fixes the reverted https://github.com/facebook/rocksdb/pull/5099 issue with range deletes.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5278
Differential Revision: D15203291
Pulled By: maysamyabandeh
fbshipit-source-id: fa645611e606aa222c7ce53176dc5bb6f259c258
Summary:
Our daily stress tests are failing after this feature. Reverting temporarily until we figure the reason for test failures.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5269
Differential Revision: D15151285
Pulled By: maysamyabandeh
fbshipit-source-id: e4002b99690a97df30d4b4b58bf0f61e9591bc6e
Summary:
Part of compaction cpu goes to processing snapshot list, the larger the list the bigger the overhead. Although the lifetime of most of the snapshots is much shorter than the lifetime of compactions, the compaction conservatively operates on the list of snapshots that it initially obtained. This patch allows the snapshot list to be updated via a callback if the compaction is taking long. This should let the compaction to continue more efficiently with much smaller snapshot list.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5099
Differential Revision: D15086710
Pulled By: maysamyabandeh
fbshipit-source-id: 7649f56c3b6b2fb334962048150142a3bf9c1a12
Summary:
We found that the behavior of CompactionFilter::IgnoreSnapshots() = false isn't
what we have expected. We thought that snapshot will always be preserved.
However, we just realized that, if no snapshot is created while compaction
starts, and a snapshot is created after that, the data seen from the snapshot
can successfully be dropped by the compaction. This creates a strange behavior
to the feature, which is hard to explain. Like what is documented in code
comment, this feature is not very useful with snapshot anyway. The decision
is to deprecate the feature.
We keep the function to avoid to break users code. However, we will fail
compactions if false is returned.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4954
Differential Revision: D13981900
Pulled By: siying
fbshipit-source-id: 2db8c2c3865acd86a28dca625945d1481b1d1e36
Summary:
Current implementation of `current_over_upper_bound_` fails to take into consideration that keys might be invalid in either base iterator or delta iterator. Calling key() in such scenario will lead to assertion failure and runtime errors.
This PR addresses the bug by adding check for valid keys before calling `IsOverUpperBound()`, also added test coverage for iterate_upper_bound usage in BaseDeltaIterator
Also recommit https://github.com/facebook/rocksdb/pull/4656 (It was reverted earlier due to bugs)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4702
Differential Revision: D13146643
Pulled By: miasantreble
fbshipit-source-id: 6d136929da12d0f2e2a5cea474a8038ec5cdf1d0
Summary:
Currently transaction iterator does not apply `ReadOptions.iterate_upper_bound` when iterating. This PR attempts to fix the problem by having `BaseDeltaIterator` enforcing the upper bound check when iterator state is changed.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4656
Differential Revision: D13039257
Pulled By: miasantreble
fbshipit-source-id: 909eb9f6b4597a4d80418fb139f32ec82c6ec1d1
Summary:
he ratio of num_deletions to num_entries of a level can be useful to determine if a manual compaction needs to be triggered on a level.
Also refer #3980
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4623
Differential Revision: D13045744
Pulled By: sagar0
fbshipit-source-id: 71f3c8e363a8ffd194ec3bb0ed0b69612231f0b3
Summary:
Ran the following commands to recursively change all the files under RocksDB:
```
find . -type f -name "*.cc" -exec sed -i 's/ unique_ptr/ std::unique_ptr/g' {} +
find . -type f -name "*.cc" -exec sed -i 's/<unique_ptr/<std::unique_ptr/g' {} +
find . -type f -name "*.cc" -exec sed -i 's/ shared_ptr/ std::shared_ptr/g' {} +
find . -type f -name "*.cc" -exec sed -i 's/<shared_ptr/<std::shared_ptr/g' {} +
```
Running `make format` updated some formatting on the files touched.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4638
Differential Revision: D12934992
Pulled By: sagar0
fbshipit-source-id: 45a15d23c230cdd64c08f9c0243e5183934338a8
Summary:
Please consider this small PR providing access to the `MemoryUsage::GetApproximateMemoryUsageByType` function in plain C API. Actually I'm working on Go application and now trying to investigate the reasons of high memory consumption (#4313). Go [wrappers](https://github.com/tecbot/gorocksdb) are built on the top of Rocksdb C API. According to the #706, `MemoryUsage::GetApproximateMemoryUsageByType` is considered as the best option to get database internal memory usage stats, but it wasn't supported in C API yet.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4340
Differential Revision: D9655135
Pulled By: ajkr
fbshipit-source-id: a3d2f3f47c143ae75862fbcca2f571ea1b49e14a
Summary:
As you know, almost all compilers support "pragma once" keyword instead of using include guards. To be keep consistency between header files, all header files are edited.
Besides this, try to fix some warnings about loss of data.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4339
Differential Revision: D9654990
Pulled By: ajkr
fbshipit-source-id: c2cf3d2d03a599847684bed81378c401920ca848
Summary:
We want to sample the file I/O issued by RocksDB and report the function calls. This requires us to include the file paths otherwise it's hard to tell what has been going on.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4039
Differential Revision: D8670178
Pulled By: riversand963
fbshipit-source-id: 97ee806d1c583a2983e28e213ee764dc6ac28f7a
Summary:
…ression
For `CompressionType` we have options `compression` and `bottommost_compression`. Thus, to make the compression options consitent with the compression type when bottommost_compression is enabled, we add the bottommost_compression_opts
Closes https://github.com/facebook/rocksdb/pull/3985
Reviewed By: riversand963
Differential Revision: D8385911
Pulled By: zhichao-cao
fbshipit-source-id: 07bc533dd61bcf1cef5927d8d62901c13d38d5fc
Summary:
Top-level index in partitioned index/filter blocks are small and could be pinned in memory. So far we use that by cache_index_and_filter_blocks to false. This however make it difficult to keep account of the total memory usage. This patch introduces pin_top_level_index_and_filter which in combination with cache_index_and_filter_blocks=true keeps the top-level index in cache and yet pinned them to avoid cache misses and also cache lookup overhead.
Closes https://github.com/facebook/rocksdb/pull/4037
Differential Revision: D8596218
Pulled By: maysamyabandeh
fbshipit-source-id: 3a5f7f9ca6b4b525b03ff6bd82354881ae974ad2
Summary:
Add flush_before_backup to rocksdb_backup_engine_create_new_backup. make c api able to control the flush before backup behavior.
Closes https://github.com/facebook/rocksdb/pull/3897
Differential Revision: D8157676
Pulled By: ajkr
fbshipit-source-id: 88998c62f89f087bf8672398fd7ddafabbada505
Summary:
A minor change: I wrapped TransactionLogIterator for the C API.
I needed that for the golang binding.
Closes https://github.com/facebook/rocksdb/pull/3304
Differential Revision: D6628736
Pulled By: miasantreble
fbshipit-source-id: 3374f3c64b1d7b225696b8767090917761e2f30a
Summary:
Currently, the `rocksdb_options_set_ratelimiter` in `c.cc` will change the input to nil, which make it is
not possible to use the shared rate limiter create by `rocksdb_ratelimiter_create` in different rocksdb option.
In this pr, I changed it to shared ptr.
Closes https://github.com/facebook/rocksdb/pull/3758
Differential Revision: D7749740
Pulled By: ajkr
fbshipit-source-id: c6121f8ca75402afdb4b295ce63c2338d253a1b5
Summary:
Fix the following gcc-8 warnings:
- conflicting C language linkage declaration [-Werror]
- writing to an object with no trivial copy-assignment [-Werror=class-memaccess]
- array subscript -1 is below array bounds [-Werror=array-bounds]
Solves https://github.com/facebook/rocksdb/issues/3716
Closes https://github.com/facebook/rocksdb/pull/3736
Differential Revision: D7684161
Pulled By: yiwu-arbug
fbshipit-source-id: 47c0423d26b74add251f1d3595211eee1e41e54a
Summary:
This pull request exposes the interface of PerfContext as C API
Closes https://github.com/facebook/rocksdb/pull/3607
Differential Revision: D7294225
Pulled By: ajkr
fbshipit-source-id: eddcfbc13538f379950b2c8b299486695ffb5e2c
Summary:
Expose read and write options via the C API
Closes https://github.com/facebook/rocksdb/pull/3185
Differential Revision: D6389658
Pulled By: sagar0
fbshipit-source-id: 1848912750329a476805b3cb2f315e7b71f61472
Summary:
SUMMARY
Moves the bytes_per_sync and wal_bytes_per_sync options from immutableoptions to mutable options. Also if wal_bytes_per_sync is changed, the wal file and memtables are flushed.
TEST PLAN
ran make check
all passed
Two new tests SetBytesPerSync, SetWalBytesPerSync check that after issuing setoptions with a new value for the var, the db options have the new value.
Closes https://github.com/facebook/rocksdb/pull/2893
Reviewed By: yiwu-arbug
Differential Revision: D5845814
Pulled By: TheRushingWookie
fbshipit-source-id: 93b52d779ce623691b546679dcd984a06d2ad1bd
Summary:
Added possibility to set save points in transactions and then rollback to them
Closes https://github.com/facebook/rocksdb/pull/2876
Differential Revision: D5825829
Pulled By: yiwu-arbug
fbshipit-source-id: 62168992340bbcddecdaea3baa2a678475d1429d
Summary:
Added some bindings for `OptimisticTransactionDB` in C API
Closes https://github.com/facebook/rocksdb/pull/2823
Differential Revision: D5820672
Pulled By: yiwu-arbug
fbshipit-source-id: 7efd17f619cc0741feddd2050b8fc856f9288350
Summary:
Solves #2632
Added OptimisticTransactionDB to the C API.
Added missing merge operations to Transaction.
Added missing get_for_update operation to transaction
If required I will create tests for this another day.
Closes https://github.com/facebook/rocksdb/pull/2633
Differential Revision: D5600906
Pulled By: yiwu-arbug
fbshipit-source-id: da23e4484433d8f59d471f778ff2ae210e3fe4eb
Summary:
It would be super helpful to not have to recompile rocksdb to get this performance tweak for mechanical disks.
I have signed the CLA.
Closes https://github.com/facebook/rocksdb/pull/2718
Differential Revision: D5606994
Pulled By: yiwu-arbug
fbshipit-source-id: c05e92bad0d03bd38211af1e1ced0d0d1e02f634
Summary:
This reverts the previous commit 1d7048c598, which broke the build.
Did a `git revert 1d7048c`.
Closes https://github.com/facebook/rocksdb/pull/2627
Differential Revision: D5476473
Pulled By: sagar0
fbshipit-source-id: 4756ff5c0dfc88c17eceb00e02c36176de728d06
Summary: This uses `clang-tidy` to comment out unused parameters (in functions, methods and lambdas) in fbcode. Cases that the tool failed to handle are fixed manually.
Reviewed By: igorsugak
Differential Revision: D5454343
fbshipit-source-id: 5dee339b4334e25e963891b519a5aa81fbf627b2
Summary:
Previously sst_file_writer only supports kTypeValue, we need kTypeMerge and kTypeDeletion also as user requested.
Closes https://github.com/facebook/rocksdb/pull/2361
Differential Revision: D5139402
Pulled By: lightmark
fbshipit-source-id: 092a60756d01692539d817a3765ebfd58a8d7f88
Summary:
I've added functions to the C API to support Transactions as requested in #1637 and to support Checkpoint.
I have also added the corresponding tests to c_test.c
For now, the following is omitted:
1. Optimistic Transactions
2. The column family variation of functions
Closes https://github.com/facebook/rocksdb/pull/2236
Differential Revision: D4989510
Pulled By: yiwu-arbug
fbshipit-source-id: 518cb39f76d5e9ec9690d633fcdc014b98958071
Summary:
Replace Options::use_direct_writes with Options::use_direct_io_for_flush_and_compaction
Now if Options::use_direct_io_for_flush_and_compaction = true, we will enable direct io for both reads and writes for flush and compaction job. Whereas Options::use_direct_reads controls user reads like iterator and Get().
Closes https://github.com/facebook/rocksdb/pull/2117
Differential Revision: D4860912
Pulled By: lightmark
fbshipit-source-id: d93575a8a5e780cf7e40797287edc425ee648c19
Summary:
I've added functions to the C API to support WriteBatchWithIndex as requested in #1833.
I've also added unit tests to c_test
I've implemented the WriteBatchWithIndex variation of every function available for regular WriteBatch. And added additional functions unique to WriteBatchWithIndex.
For now, the following is omitted:
1. The ability to create WriteBatchWithIndex's custom batch-only iterator as I'm not sure what its purpose is. It should be possible to add later if anyone wants it.
2. The ability to create the batch with a fallback comparator, since it appears to be unnecessary. I believe the column family comparator will be used for this, meaning those using a custom comparator can just use the column family variations.
Closes https://github.com/facebook/rocksdb/pull/1985
Differential Revision: D4760039
Pulled By: siying
fbshipit-source-id: 393227e
Summary:
The comparator param in SstFileWriter constructor is redundant as it already exists as a field in options. So the current SstFileWriter constructor should be deprecated in favor of a new one which does not take a comparator.
Note that the jni/java apis have not been touched yet.
Closes https://github.com/facebook/rocksdb/pull/1978
Differential Revision: D4685629
Pulled By: sagar0
fbshipit-source-id: 372ce96
Summary:
This option is needed to be enabled for Direct IO
and I cannot think of a reason where we need to disable it
remove it and default it to true
Closes https://github.com/facebook/rocksdb/pull/1944
Differential Revision: D4641088
Pulled By: IslamAbdelRahman
fbshipit-source-id: d7085b9
Summary:
…action
The two options, min_partial_merge_operands and verify_checksums_in_compaction, are not seldom used. Remove them to reduce the total number of options. Also remove them from Java and C interface.
Closes https://github.com/facebook/rocksdb/pull/1902
Differential Revision: D4601219
Pulled By: siying
fbshipit-source-id: aad4cb2
Summary:
Remove disableDataSync, and another similarly named disable_data_sync options.
This is being done to simplify options, and also because the performance gains of this feature can be achieved by other methods.
Closes https://github.com/facebook/rocksdb/pull/1859
Differential Revision: D4541292
Pulled By: sagar0
fbshipit-source-id: 5b3a6ca
Summary:
- rocksdb_property_int (so that we don't have to parse strings)
- and rocksdb_set_options (to allow controlling options via strings)
- a few other missing options exposed
- a documentation comment fix
Closes https://github.com/facebook/rocksdb/pull/1793
Differential Revision: D4456569
Pulled By: yiwu-arbug
fbshipit-source-id: 9f1fac1
Summary:
Allow set SavePoint to WriteBatch in C ABI.
Closes https://github.com/facebook/rocksdb/pull/1698
Differential Revision: D4378556
Pulled By: yiwu-arbug
fbshipit-source-id: afca746
Summary:
Since the backup work as snapshot, we should only copy
the bytes of the wal while we get the alive files.
Closes https://github.com/facebook/rocksdb/pull/1733
Differential Revision: D4373457
Pulled By: ajkr
fbshipit-source-id: 389318f
Summary:
Seem that writebatch delete range can work now, so I add C API for later use.
Btw, can we use this feature in production now?
Closes https://github.com/facebook/rocksdb/pull/1647
Differential Revision: D4314534
Pulled By: ajkr
fbshipit-source-id: e835165
Summary:
Add C API to set base_backgroud_compactions
Closes https://github.com/facebook/rocksdb/pull/1571
Differential Revision: D4245709
Pulled By: yiwu-arbug
fbshipit-source-id: 792c6b8
Summary:
IngestExternalFile is very useful when doing bulk load. This pr expose this API to c so many bindings can benefit from it too.
Closes https://github.com/facebook/rocksdb/pull/1454
Differential Revision: D4113420
Pulled By: yiwu-arbug
fbshipit-source-id: 307c6ae
Summary: To reduce number of options, merge source_compaction_factor, max_grandparent_overlap_bytes and expanded_compaction_factor into max_compaction_bytes.
Test Plan: Add two new unit tests. Run all existing tests, including jtest.
Reviewers: yhchiang, igor, IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D59829
Summary: Extend the option memtable_prefix_bloom_huge_page_tlb_size from just putting memtable bloom filter to huge page to memtable itself too.
Test Plan: Run all existing tests.
Reviewers: IslamAbdelRahman, yhchiang, andrewkr
Reviewed By: andrewkr
Subscribers: leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D60513
Summary: filter_deltes is not a frequently used feature. Remove it.
Test Plan: Run all test suites.
Reviewers: igor, yhchiang, IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D59427
Summary:
memtable_prefix_bloom_probes is not a critical option. Remove it to reduce number of options.
It's easier for users to make mistakes with memtable_prefix_bloom_bits, turn it to memtable_prefix_bloom_bits_ratio
Test Plan: Run all existing tests
Reviewers: yhchiang, igor, IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: gunnarku, yoshinorim, MarkCallaghan, leveldb, andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D59199
Summary: There were a few narrowing conversions that clang didn't like.
Test Plan:
$ make clean && USE_CLANG=1 DISABLE_JEMALLOC=1 TEST_TMPDIR=/dev/shm/rocksdb OPT=-g make -j32 check
Reviewers: IslamAbdelRahman
Reviewed By: IslamAbdelRahman
Subscribers: andrewkr, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D57351
Summary:
This adds a new metablock containing a shared dictionary that is used
to compress all data blocks in the SST file. The size of the shared dictionary
is configurable in CompressionOptions and defaults to 0. It's currently only
used for zlib/lz4/lz4hc, but the block will be stored in the SST regardless of
the compression type if the user chooses a nonzero dictionary size.
During compaction, computes the dictionary by randomly sampling the first
output file in each subcompaction. It pre-computes the intervals to sample
by assuming the output file will have the maximum allowable length. In case
the file is smaller, some of the pre-computed sampling intervals can be beyond
end-of-file, in which case we skip over those samples and the dictionary will
be a bit smaller. After the dictionary is generated using the first file in a
subcompaction, it is loaded into the compression library before writing each
block in each subsequent file of that subcompaction.
On the read path, gets the dictionary from the metablock, if it exists. Then,
loads that dictionary into the compression library before reading each block.
Test Plan: new unit test
Reviewers: yhchiang, IslamAbdelRahman, cyan, sdong
Reviewed By: sdong
Subscribers: andrewkr, yoshinorim, kradhakrishnan, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D52287
Summary:
When a block based table file is opened, if prefetch_index_and_filter is true, it will prefetch the index and filter blocks, putting them into the block cache.
What this feature adds: when a L0 block based table file is opened, if pin_l0_filter_and_index_blocks_in_cache is true in the options (and prefetch_index_and_filter is true), then the filter and index blocks aren't released back to the block cache at the end of BlockBasedTableReader::Open(). Instead the table reader takes ownership of them, hence pinning them, ie. the LRU cache will never push them out. Meanwhile in the table reader, further accesses will not hit the block cache, thus avoiding lock contention.
Test Plan:
'export TEST_TMPDIR=/dev/shm/ && DISABLE_JEMALLOC=1 OPT=-g make all valgrind_check -j32' is OK.
I didn't run the Java tests, I don't have Java set up on my devserver.
Reviewers: sdong
Reviewed By: sdong
Subscribers: andrewkr, dhruba
Differential Revision: https://reviews.facebook.net/D56133
Summary:
When a block based table file is opened, if prefetch_index_and_filter is true, it will prefetch the index and filter blocks, putting them into the block cache.
What this feature adds: when a L0 block based table file is opened, if pin_l0_filter_and_index_blocks_in_cache is true in the options (and prefetch_index_and_filter is true), then the filter and index blocks aren't released back to the block cache at the end of BlockBasedTableReader::Open(). Instead the table reader takes ownership of them, hence pinning them, ie. the LRU cache will never push them out. Meanwhile in the table reader, further accesses will not hit the block cache, thus avoiding lock contention.
When the table reader is destroyed, it releases the pinned blocks (if there were any). This has to happen before the cache is destroyed, so I had to introduce a TableReader::Close(), to guarantee the order of destruction.
Test Plan:
Added two unit tests for this. Existing unit tests run fine (default is pin_l0_filter_and_index_blocks_in_cache=false).
DISABLE_JEMALLOC=1 OPT=-g make all valgrind_check -j32
Mac: OK.
Linux: with D55287 patched in it's OK.
Reviewers: sdong
Reviewed By: sdong
Subscribers: andrewkr, leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D54801
Summary:
When we first started, max_background_flushes was 0 by default and compaction thread was executing flushes (since there was no flush thread). Then, we switched the default max_background_flushes to 1. However, we still support the case where there is no flush thread and flushes are done in compaction. This is making our code a bit more complicated. By not supporting this use-case we can make our code simpler.
We have a special case that when you set max_background_flushes to 0, we
schedule the flush to execute on the compaction thread.
Test Plan: make check (there might be some unit tests that depend on this behavior)
Reviewers: IslamAbdelRahman, yhchiang, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D41931
Summary: It has been around for a while and it looks like it never found any uses in the wild. It's also complicating our compaction_job code quite a bit. We're deprecating it in 3.13, but will put it back in 3.14 if we actually find users that need this feature.
Test Plan: make check
Reviewers: noetzli, yhchiang, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D42405
Summary: Moved convenience.h out of utilities to remove a dependency on utilities in db.
Test Plan: unit tests. Also compiled a link to the old location to verify the _Pragma works.
Reviewers: sdong, yhchiang, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D42201
Summary: This option is guarding the feature implemented 2 and a half years ago: D8991. The feature was enabled by default back then and has been running without issues. There is no reason why any client would turn this feature off. I found no reference in fbcode.
Test Plan: none
Reviewers: sdong, yhchiang, anthony, dhruba
Reviewed By: dhruba
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D42063
Summary: This helps Windows port to format their changes, as discussed. Might have formatted some other codes too becasue last 10 commits include more.
Test Plan: Build it.
Reviewers: anthony, IslamAbdelRahman, kradhakrishnan, yhchiang, igor
Reviewed By: igor
Subscribers: leveldb, dhruba
Differential Revision: https://reviews.facebook.net/D41961
Summary: Make RocksDb build and run on Windows to be functionally
complete and performant. All existing test cases run with no
regressions. Performance numbers are in the pull-request.
Test plan: make all of the existing unit tests pass, obtain perf numbers.
Co-authored-by: Praveen Rao praveensinghrao@outlook.com
Co-authored-by: Sherlock Huang baihan.huang@gmail.com
Co-authored-by: Alex Zinoviev alexander.zinoviev@me.com
Co-authored-by: Dmitri Smirnov dmitrism@microsoft.com
Summary:
This diff update DB::CompactRange to use RangeCompactionOptions instead of using multiple parameters
Old CompactRange is still available but deprecated
Test Plan:
make all check
make rocksdbjava
USE_CLANG=1 make all
OPT=-DROCKSDB_LITE make release
Reviewers: sdong, yhchiang, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D40209
Summary: Block c_test in ROCKSDB_LITE as it's not supported in ROCKSDB_LITE.
Test Plan: c_test
Reviewers: sdong, rven, anthony, kradhakrishnan, IslamAbdelRahman, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D40257
Summary:
For transactions, we are using the memtables to validate that there are no write conflicts. But after flushing, we don't have any memtables, and transactions could fail to commit. So we want to someone keep around some extra history to use for conflict checking. In addition, we want to provide a way to increase the size of this history if too many transactions fail to commit.
After chatting with people, it seems like everyone prefers just using Memtables to store this history (instead of a separate history structure). It seems like the best place for this is abstracted inside the memtable_list. I decide to create a separate list in MemtableListVersion as using the same list complicated the flush/installalflushresults logic too much.
This diff adds a new parameter to control how much memtable history to keep around after flushing. However, it sounds like people aren't too fond of adding new parameters. So I am making the default size of flushed+not-flushed memtables be set to max_write_buffers. This should not change the maximum amount of memory used, but make it more likely we're using closer the the limit. (We are now postponing deleting flushed memtables until the max_write_buffer limit is reached). So while we might use more memory on average, we are still obeying the limit set (and you could argue it's better to go ahead and use up memory now instead of waiting for a write stall to happen to test this limit).
However, if people are opposed to this default behavior, we can easily set it to 0 and require this parameter be set in order to use transactions.
Test Plan: Added a xfunc test to play around with setting different values of this parameter in all tests. Added testing in memtablelist_test and planning on adding more testing here.
Reviewers: sdong, rven, igor
Reviewed By: igor
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D37443
Summary: It is no longer used by the implementation, so we should also remove it from the public API.
Test Plan: make check
Reviewers: sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D34971
Summary:
When using latest clang (3.6 or 3.7/trunck) rocksdb is failing with many errors. Almost all of them are missing override errors. This diff adds missing override keyword. No manual changes.
Prerequisites: bear and clang 3.5 build with extra tools
```lang=bash
% USE_CLANG=1 bear make all # generate a compilation database http://clang.llvm.org/docs/JSONCompilationDatabase.html
% clang-modernize -p . -include . -add-override
% make format
```
Test Plan:
Make sure all tests are passing.
```lang=bash
% #Use default fb code clang.
% make check
```
Verify less error and no missing override errors.
```lang=bash
% # Have trunk clang present in path.
% ROCKSDB_NO_FBCODE=1 CC=clang CXX=clang++ make
```
Reviewers: igor, kradhakrishnan, rven, meyering, sdong
Reviewed By: sdong
Subscribers: dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D34077
Summary:
The "const" attribute does not make sense on a return type,
and provokes a warning/error from gcc -W -Wextra.
Test Plan:
Run "make EXTRA_CXXFLAGS='-W -Wextra'" and see fewer errors.
Reviewers: ljin, sdong, igor.sugak, igor
Reviewed By: igor
Subscribers: dhruba
Differential Revision: https://reviews.facebook.net/D33759
Summary:
Introduces a new class for managing write buffer memory across column
families. We supplement ColumnFamilyOptions::write_buffer_size with
ColumnFamilyOptions::write_buffer, a shared pointer to a WriteBuffer
instance that enforces memory limits before flushing out to disk.
Test Plan: Added SharedWriteBuffer unit test to db_test.cc
Reviewers: sdong, rven, ljin, igor
Reviewed By: igor
Subscribers: tnovak, yhchiang, dhruba, xjin, MarkCallaghan, yoshinorim
Differential Revision: https://reviews.facebook.net/D22581