rocksdb

Commit Graph

Author	SHA1	Message	Date
Peter Dillinger	a1a102ffce	Steps toward deprecating implicit prefix seek, related fixes (#13026 ) Summary: With some new use cases onboarding to prefix extractors/seek/filters, one of the risks is existing iterator code, e.g. for maintenance tasks, being unintentionally subject to prefix seek semantics. This is a longstanding known design flaw with prefix seek, and `prefix_same_as_start` and `auto_prefix_mode` were steps in the direction of making that obsolete. However, we can't just immediately set `total_order_seek` to true by default, because that would impact so much code instantly. Here we add a new DB option, `prefix_seek_opt_in_only` that basically allows users to transition to the future behavior when they are ready. When set to true, all iterators will be treated as if `total_order_seek=true` and then the only ways to get prefix seek semantics are with `prefix_same_as_start` or `auto_prefix_mode`. Related fixes / changes: * Make sure that `prefix_same_as_start` and `auto_prefix_mode` are compatible with (or override) `total_order_seek` (depending on your interpretation). * Fix a bug in which a new iterator after dynamically changing the prefix extractor might mix different prefix semantics between memtable and SSTs. Both should use the latest extractor semantics, which means iterators ignoring memtable prefix filters with an old extractor. And that means passing the latest prefix extractor to new memtable iterators that might use prefix seek. (Without the fix, the test added for this fails in many ways.) Suggested follow-up: * Investigate a FIXME where a MergeIteratorBuilder is created in db_impl.cc. No unit test detects a change in value that should impact correctness. * Make memtable prefix bloom compatible with `auto_prefix_mode`, which might require involving the memtablereps because we don't know at iterator creation time (only seek time) whether an auto_prefix_mode seek will be a prefix seek. * Add `prefix_same_as_start` testing to db_stress Pull Request resolved: https://github.com/facebook/rocksdb/pull/13026 Test Plan: tests updated, added. Add combination of `total_order_seek=true` and `auto_prefix_mode=true` to stress test. Ran `make blackbox_crash_test` for a long while. Manually ran tests with `prefix_seek_opt_in_only=true` as default, looking for unexpected issues. I inspected most of the results and migrated many tests to be ready for such a change (but not all). Reviewed By: ltamasi Differential Revision: D63147378 Pulled By: pdillinger fbshipit-source-id: 1f4477b730683d43b4be7e933338583702d3c25e	2024-09-20 15:54:19 -07:00
Levi Tamasi	54ace7f340	Change the semantics of blob_garbage_collection_force_threshold to provide better control over space amp (#13022 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/13022 Currently, `blob_garbage_collection_force_threshold` applies to the oldest batch of blob files, which is typically only a small subset of the blob files currently eligible for garbage collection. This can result in a form of head-of-line blocking: no GC-triggered compactions will be scheduled if the oldest batch does not currently exceed the threshold, even if a lot of higher-numbered blob files do. This can in turn lead to high space amplification that exceeds the soft bound implicit in the force threshold (e.g. 50% would suggest a space amp of <2 and 75% would imply a space amp of <4). The patch changes the semantics of this configuration threshold to apply to the entire set of blob files that are eligible for garbage collection based on `blob_garbage_collection_age_cutoff`. This provides more intuitive semantics for the option and can provide a better write amp/space amp trade-off. (Note that GC-triggered compactions still pick the same SST files as before, so triggered GC still targets the oldest the blob files.) Reviewed By: jowlyzhang Differential Revision: D62977860 fbshipit-source-id: a999f31fe9cdda313de513f0e7a6fc707424d4a3	2024-09-19 15:47:13 -07:00
Peter Dillinger	98c33cb8e3	Steps toward making IDENTITY file obsolete (#13019 ) Summary: * Set write_dbid_to_manifest=true by default * Add new option write_identity_file (default true) that allows us to opt-in to future behavior without identity file * Refactor related DB open code to minimize code duplication _Recommend hiding whitespace changes for review_ Intended follow-up: add support to ldb for reading and even replacing the DB identity in the manifest. Could be a variant of `update_manifest` command or based on it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/13019 Test Plan: unit tests and stress test updated for new functionality Reviewed By: anand1976 Differential Revision: D62898229 Pulled By: pdillinger fbshipit-source-id: c08b25cf790610b034e51a9de0dc78b921abbcf0	2024-09-19 14:05:21 -07:00
anand76	c21fe1a47f	Add ticker stats for read corruption retries (#12923 ) Summary: Add a couple of ticker stats for corruption retry count and successful retries. This PR also eliminates an extra read attempt when there's a checksum mismatch in a block read from the prefetch buffer. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12923 Test Plan: Update existing tests Reviewed By: jowlyzhang Differential Revision: D61024687 Pulled By: anand1976 fbshipit-source-id: 3a08403580ab244000e0d480b7ee0f5a03d76b06	2024-08-12 15:32:07 -07:00
Zixuan Tan	a97a1f3247	Fix incorrect refillPeriodMicros unit in the document (#12832 ) Summary: The default value for `refillPeriodMicros` is `100 * 1000`, which means 100ms (or 100,000us). The document comments say 100,000ms (equivalent to 100 seconds), which is incorrect and misleading. This PR fixes this typo. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12832 Reviewed By: cbi42 Differential Revision: D59492336 Pulled By: ajkr fbshipit-source-id: c2f55a8b996fe078a1510fcbebaea92ec0075929	2024-07-08 18:08:53 -07:00
Richard Barnes	a8dd15ad41	Fix deprecated dynamic exception in internal_repo_rocksdb/repo/java/rocksjni/kv_helper.h +1 Summary: LLVM has detected a violation of `-Wdeprecated-dynamic-exception-spec`. Dynamic exceptions were removed in C++17. This diff fixes the deprecated instance(s). See [Dynamic exception specification](https://en.cppreference.com/w/cpp/language/except_spec) and [noexcept specifier](https://en.cppreference.com/w/cpp/language/noexcept_spec). Reviewed By: palmje Differential Revision: D58528375 fbshipit-source-id: 130fecd3aa556e4cdb955feea53c442bd9fbc864	2024-06-13 12:41:13 -07:00
Davide Angelocola	cee32c5cce	use nullptr instead of NULL / 0 in rocksdbjni (#12575 ) Summary: While I was trying to understand issue https://github.com/facebook/rocksdb/issues/12503, I found this minor problem. Please have a look adamretter rhubner Pull Request resolved: https://github.com/facebook/rocksdb/pull/12575 Reviewed By: ajkr Differential Revision: D57596055 Pulled By: cbi42 fbshipit-source-id: ee0860bdfbee9364cd30c23957b72a04da6acd45	2024-05-21 12:56:07 -07:00
Peter Dillinger	45c105104b	Set optimize_filters_for_memory by default (#12377 ) Summary: This feature has been around for a couple of years and users haven't reported any problems with it. Not quite related: fixed a technical ODR violation in public header for info_log_level in case DEBUG build status changes. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12377 Test Plan: unit tests updated, already in crash test. Some unit tests are expecting specific behaviors of optimize_filters_for_memory=false and we now need to bake that in. Reviewed By: jowlyzhang Differential Revision: D54129517 Pulled By: pdillinger fbshipit-source-id: a64b614840eadd18b892624187b3e122bab6719c	2024-04-30 08:33:31 -07:00
Andrew Kryczka	b4c520cadc	change default `CompactionOptions::compression` while deprecating it (#12587 ) Summary: I had a TODO to complete `CompactionOptions`'s compression API but never did it: `d610e14f93/db/compaction/compaction_picker.cc (L371-L373)` Without solving that TODO, the API remains incomplete and unsafe. Now, however, I don't think it's worthwhile to complete it. I think we should instead delete the API entirely. This PR deprecates it in preparation for deletion in a future major release. The `ColumnFamilyOptions` settings for compression should be good enough for `CompactFiles()` since they are apparently good enough for every other compaction, including `CompactRange()`. In the meantime, I also changed the default `CompressionType`. Having callers of `CompactFiles()` use Snappy compression by default does not make sense when the default could be to simply use the same compression type that is used for every other compaction. As a bonus, this change makes the default `CompressionType` consistent with the `CompressionOptions` that will be used. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12587 Reviewed By: hx235 Differential Revision: D56619273 Pulled By: ajkr fbshipit-source-id: 1477de49f14b06c72d6f0045616a8ce91d97e66e	2024-04-26 13:03:21 -07:00
Radek Hubner	a8035ebc0b	Fix exception on RocksDB.getColumnFamilyMetaData() (#12474 ) Summary: https://github.com/facebook/rocksdb/issues/12466 reported a bug when `RocksDB.getColumnFamilyMetaData()` is called on an existing database(With files stored on disk). As neilramaswamy mentioned, this was caused by https://github.com/facebook/rocksdb/issues/11770 where the signature of `SstFileMetaData` constructor was changed, but JNI code wasn't updated. This PR fix JNI code, and also properly populate `fileChecksum` on `SstFileMetaData`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12474 Reviewed By: jowlyzhang Differential Revision: D55811808 Pulled By: ajkr fbshipit-source-id: 2ab156f41eaf4a4f30c49e6df421b61e8451230e	2024-04-05 13:55:18 -07:00
Alexey Vinogradov	c4df598b8e	Implement `PerfContex#toString` for the Java API (#12473 ) Summary: I've implemented `PerfContext#toString` for the Java API. See: https://groups.google.com/g/rocksdb/c/qbY_gNhbyAg Pull Request resolved: https://github.com/facebook/rocksdb/pull/12473 Reviewed By: jowlyzhang Differential Revision: D55660871 Pulled By: cbi42 fbshipit-source-id: f0528fba31ac06e16495e4f49b0bafe0dbc1bc61	2024-04-03 14:33:31 -07:00
Radek Hubner	db9eb10b5b	Enable all Java test via CMake (#12446 ) Summary: This PR follows the work done in https://github.com/facebook/rocksdb/issues/11756 and enable all Java test to be run via CMake/Ctest. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12446 Reviewed By: jowlyzhang Differential Revision: D55661635 Pulled By: cbi42 fbshipit-source-id: 3ea49a121a3ba72089632ff43ee7fe4419b08a96	2024-04-03 11:03:11 -07:00
Peter Dillinger	b515a5db3f	Replace ScopedArenaIterator with ScopedArenaPtr<InternalIterator> (#12470 ) Summary: ScopedArenaIterator is not an iterator. It is a pointer wrapper. And we don't need a custom implemented pointer wrapper when std::unique_ptr can be instantiated with what we want. So this adds ScopedArenaPtr<T> to replace those uses. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12470 Test Plan: CI (including ASAN/UBSAN) Reviewed By: jowlyzhang Differential Revision: D55254362 Pulled By: pdillinger fbshipit-source-id: cc96a0b9840df99aa807f417725e120802c0ae18	2024-03-22 13:40:42 -07:00
anand76	98d8a85624	New PerfContext counters for block cache bytes read (#12459 ) Summary: Add PerfContext counters for measuring the cumulative size of blocks found in the block cache. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12459 Reviewed By: ajkr Differential Revision: D55170694 Pulled By: anand1976 fbshipit-source-id: 8cbba76eece116cefce7f00e2fc9d74757661d25	2024-03-21 10:46:46 -07:00
anand76	4868c10b44	Retry block reads on checksum mismatch (#12427 ) Summary: On file systems that support storage level data checksum and reconstruction, retry SST block reads for point lookups, scans, and flush and compaction if there's a checksum mismatch on the initial read. A file system can indicate its support by setting the `FSSupportedOps::kVerifyAndReconstructRead` bit in `SupportedOps`. Tests: Add new unit tests Pull Request resolved: https://github.com/facebook/rocksdb/pull/12427 Reviewed By: ajkr Differential Revision: D55025941 Pulled By: anand1976 fbshipit-source-id: dbd990cb75e03f756c8a66d42956f645c0b6d55e	2024-03-18 16:16:05 -07:00
Yu Zhang	f2546b6623	Support returning write unix time in iterator property (#12428 ) Summary: This PR adds support to return data's approximate unix write time in the iterator property API. The general implementation is: 1) If the entry comes from a SST file, the sequence number to time mapping recorded in that file's table properties will be used to deduce the entry's write time from its sequence number. If no such recording is available, `std::numeric_limits<uint64_t>::max()` is returned to indicate the write time is unknown except if the entry's sequence number is zero, in which case, 0 is returned. This also means that even if `preclude_last_level_data_seconds` and `preserve_internal_time_seconds` can be toggled off between DB reopens, as long as the SST file's table property has the mapping available, the entry's write time can be deduced and returned. 2) If the entry comes from memtable, we will use the DB's sequence number to write time mapping to do similar things. A copy of the DB's seqno to write time mapping is kept in SuperVersion to allow iterators to have lock free access. This also means a new `SuperVersion` is installed each time DB's seqno to time mapping updates, which is originally proposed by Peter in https://github.com/facebook/rocksdb/issues/11928 . Similarly, if the feature is not enabled, `std::numeric_limits<uint64_t>::max()` is returned to indicate the write time is unknown. Needed follow up: 1) The write time for `kTypeValuePreferredSeqno` should be special cased, where it's already specified by the user, so we can directly return it. 2) Flush job can be updated to use DB's seqno to time mapping copy in the SuperVersion. 3) Handle the case when `TimedPut` is called with a write time that is `std::numeric_limits<uint64_t>::max()`. We can make it a regular `Put`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12428 Test Plan: Added unit test Reviewed By: pdillinger Differential Revision: D54967067 Pulled By: jowlyzhang fbshipit-source-id: c795b1b7ec142e09e53f2ed3461cf719833cb37a	2024-03-15 15:37:37 -07:00
Alan Paxton	d9a441113e	JNI get_helper code sharing / multiGet() use efficient batch C++ support (#12344 ) Summary: Implement RAII-based helpers for JNIGet() and multiGet() Replace JNI C++ helpers `rocksdb_get_helper, rocksdb_get_helper_direct`, `multi_get_helper`, `multi_get_helper_direct`, `multi_get_helper_release_keys`, `txn_get_helper`, and `txn_multi_get_helper`. The model is to entirely do away with a single helper, instead a number of utility methods allow each separate JNI `Get()` and `MultiGet()` method to organise their parameters efficiently, then call the underlying C++ `db->Get()`, `db->MultiGet()`, `txn->Get()`, or `txn->MultiGet()` method itself, and use further utilities to retrieve results. Roughly speaking: * get keys into C++ form * Call C++ Get() * get results and status into Java form We achieve a useful performance gain as part of this work; by using the updated C++ multiGet we immediately pick up its performance gains (batch improvements to multiGet C++ were previously implemented, but not until now used by Java/JNI). multiGetBB already uses the batched C++ multiGet(), and all other benchmarks show consistent improvement after the changes: ## Before: ``` Benchmark (columnFamilyTestType) (keyCount) (keySize) (multiGetSize) (valueSize) Mode Cnt Score Error Units MultiGetNewBenchmarks.multiGetBB200 no_column_family 10000 1024 100 256 thrpt 25 5315.459 ± 20.465 ops/s MultiGetNewBenchmarks.multiGetBB200 no_column_family 10000 1024 100 1024 thrpt 25 5673.115 ± 78.299 ops/s MultiGetNewBenchmarks.multiGetBB200 no_column_family 10000 1024 100 4096 thrpt 25 2616.860 ± 46.994 ops/s MultiGetNewBenchmarks.multiGetBB200 no_column_family 10000 1024 100 16384 thrpt 25 1700.058 ± 24.034 ops/s MultiGetNewBenchmarks.multiGetBB200 no_column_family 10000 1024 100 65536 thrpt 25 791.171 ± 13.955 ops/s MultiGetNewBenchmarks.multiGetList10 no_column_family 10000 1024 100 256 thrpt 25 6129.929 ± 94.200 ops/s MultiGetNewBenchmarks.multiGetList10 no_column_family 10000 1024 100 1024 thrpt 25 7012.405 ± 97.886 ops/s MultiGetNewBenchmarks.multiGetList10 no_column_family 10000 1024 100 4096 thrpt 25 2799.014 ± 39.352 ops/s MultiGetNewBenchmarks.multiGetList10 no_column_family 10000 1024 100 16384 thrpt 25 1417.205 ± 22.272 ops/s MultiGetNewBenchmarks.multiGetList10 no_column_family 10000 1024 100 65536 thrpt 25 655.594 ± 13.050 ops/s MultiGetNewBenchmarks.multiGetListExplicitCF20 no_column_family 10000 1024 100 256 thrpt 25 6147.247 ± 82.711 ops/s MultiGetNewBenchmarks.multiGetListExplicitCF20 no_column_family 10000 1024 100 1024 thrpt 25 7004.213 ± 79.251 ops/s MultiGetNewBenchmarks.multiGetListExplicitCF20 no_column_family 10000 1024 100 4096 thrpt 25 2715.154 ± 110.017 ops/s MultiGetNewBenchmarks.multiGetListExplicitCF20 no_column_family 10000 1024 100 16384 thrpt 25 1408.070 ± 31.714 ops/s MultiGetNewBenchmarks.multiGetListExplicitCF20 no_column_family 10000 1024 100 65536 thrpt 25 623.829 ± 57.374 ops/s MultiGetNewBenchmarks.multiGetListRandomCF30 no_column_family 10000 1024 100 256 thrpt 25 6119.243 ± 116.313 ops/s MultiGetNewBenchmarks.multiGetListRandomCF30 no_column_family 10000 1024 100 1024 thrpt 25 6931.873 ± 128.094 ops/s MultiGetNewBenchmarks.multiGetListRandomCF30 no_column_family 10000 1024 100 4096 thrpt 25 2678.253 ± 39.113 ops/s MultiGetNewBenchmarks.multiGetListRandomCF30 no_column_family 10000 1024 100 16384 thrpt 25 1337.384 ± 19.500 ops/s MultiGetNewBenchmarks.multiGetListRandomCF30 no_column_family 10000 1024 100 65536 thrpt 25 625.596 ± 14.525 ops/s ``` ## After: ``` Benchmark (columnFamilyTestType) (keyCount) (keySize) (multiGetSize) (valueSize) Mode Cnt Score Error Units MultiGetBenchmarks.multiGetBB200 no_column_family 10000 1024 100 256 thrpt 25 5191.074 ± 78.250 ops/s MultiGetBenchmarks.multiGetBB200 no_column_family 10000 1024 100 1024 thrpt 25 5378.692 ± 260.682 ops/s MultiGetBenchmarks.multiGetBB200 no_column_family 10000 1024 100 4096 thrpt 25 2590.183 ± 34.844 ops/s MultiGetBenchmarks.multiGetBB200 no_column_family 10000 1024 100 16384 thrpt 25 1634.793 ± 34.022 ops/s MultiGetBenchmarks.multiGetBB200 no_column_family 10000 1024 100 65536 thrpt 25 786.455 ± 8.462 ops/s MultiGetBenchmarks.multiGetBB200 1_column_family 10000 1024 100 256 thrpt 25 5285.055 ± 11.676 ops/s MultiGetBenchmarks.multiGetBB200 1_column_family 10000 1024 100 1024 thrpt 25 5586.758 ± 213.008 ops/s MultiGetBenchmarks.multiGetBB200 1_column_family 10000 1024 100 4096 thrpt 25 2527.172 ± 17.106 ops/s MultiGetBenchmarks.multiGetBB200 1_column_family 10000 1024 100 16384 thrpt 25 1819.547 ± 12.958 ops/s MultiGetBenchmarks.multiGetBB200 1_column_family 10000 1024 100 65536 thrpt 25 803.861 ± 9.963 ops/s MultiGetBenchmarks.multiGetBB200 20_column_families 10000 1024 100 256 thrpt 25 5253.793 ± 28.020 ops/s MultiGetBenchmarks.multiGetBB200 20_column_families 10000 1024 100 1024 thrpt 25 5705.591 ± 20.556 ops/s MultiGetBenchmarks.multiGetBB200 20_column_families 10000 1024 100 4096 thrpt 25 2523.377 ± 15.415 ops/s MultiGetBenchmarks.multiGetBB200 20_column_families 10000 1024 100 16384 thrpt 25 1815.344 ± 11.309 ops/s MultiGetBenchmarks.multiGetBB200 20_column_families 10000 1024 100 65536 thrpt 25 820.792 ± 3.192 ops/s MultiGetBenchmarks.multiGetBB200 100_column_families 10000 1024 100 256 thrpt 25 5262.184 ± 20.477 ops/s MultiGetBenchmarks.multiGetBB200 100_column_families 10000 1024 100 1024 thrpt 25 5706.959 ± 23.123 ops/s MultiGetBenchmarks.multiGetBB200 100_column_families 10000 1024 100 4096 thrpt 25 2520.362 ± 9.170 ops/s MultiGetBenchmarks.multiGetBB200 100_column_families 10000 1024 100 16384 thrpt 25 1789.185 ± 14.239 ops/s MultiGetBenchmarks.multiGetBB200 100_column_families 10000 1024 100 65536 thrpt 25 818.401 ± 12.132 ops/s MultiGetBenchmarks.multiGetList10 no_column_family 10000 1024 100 256 thrpt 25 6978.310 ± 14.084 ops/s MultiGetBenchmarks.multiGetList10 no_column_family 10000 1024 100 1024 thrpt 25 7664.242 ± 22.304 ops/s MultiGetBenchmarks.multiGetList10 no_column_family 10000 1024 100 4096 thrpt 25 2881.778 ± 81.054 ops/s MultiGetBenchmarks.multiGetList10 no_column_family 10000 1024 100 16384 thrpt 25 1599.826 ± 7.190 ops/s MultiGetBenchmarks.multiGetList10 no_column_family 10000 1024 100 65536 thrpt 25 737.520 ± 6.809 ops/s MultiGetBenchmarks.multiGetList10 1_column_family 10000 1024 100 256 thrpt 25 6974.376 ± 10.716 ops/s MultiGetBenchmarks.multiGetList10 1_column_family 10000 1024 100 1024 thrpt 25 7637.440 ± 45.877 ops/s MultiGetBenchmarks.multiGetList10 1_column_family 10000 1024 100 4096 thrpt 25 2820.472 ± 42.231 ops/s MultiGetBenchmarks.multiGetList10 1_column_family 10000 1024 100 16384 thrpt 25 1716.663 ± 8.527 ops/s MultiGetBenchmarks.multiGetList10 1_column_family 10000 1024 100 65536 thrpt 25 755.848 ± 7.514 ops/s MultiGetBenchmarks.multiGetList10 20_column_families 10000 1024 100 256 thrpt 25 6943.651 ± 20.040 ops/s MultiGetBenchmarks.multiGetList10 20_column_families 10000 1024 100 1024 thrpt 25 7679.415 ± 9.114 ops/s MultiGetBenchmarks.multiGetList10 20_column_families 10000 1024 100 4096 thrpt 25 2844.564 ± 13.388 ops/s MultiGetBenchmarks.multiGetList10 20_column_families 10000 1024 100 16384 thrpt 25 1729.545 ± 5.983 ops/s MultiGetBenchmarks.multiGetList10 20_column_families 10000 1024 100 65536 thrpt 25 783.218 ± 1.530 ops/s MultiGetBenchmarks.multiGetList10 100_column_families 10000 1024 100 256 thrpt 25 6944.276 ± 29.995 ops/s MultiGetBenchmarks.multiGetList10 100_column_families 10000 1024 100 1024 thrpt 25 7670.301 ± 8.986 ops/s MultiGetBenchmarks.multiGetList10 100_column_families 10000 1024 100 4096 thrpt 25 2839.828 ± 12.421 ops/s MultiGetBenchmarks.multiGetList10 100_column_families 10000 1024 100 16384 thrpt 25 1730.005 ± 9.209 ops/s MultiGetBenchmarks.multiGetList10 100_column_families 10000 1024 100 65536 thrpt 25 787.096 ± 1.977 ops/s MultiGetBenchmarks.multiGetListExplicitCF20 no_column_family 10000 1024 100 256 thrpt 25 6896.944 ± 21.530 ops/s MultiGetBenchmarks.multiGetListExplicitCF20 no_column_family 10000 1024 100 1024 thrpt 25 7622.407 ± 12.824 ops/s MultiGetBenchmarks.multiGetListExplicitCF20 no_column_family 10000 1024 100 4096 thrpt 25 2927.538 ± 19.792 ops/s MultiGetBenchmarks.multiGetListExplicitCF20 no_column_family 10000 1024 100 16384 thrpt 25 1598.041 ± 4.312 ops/s MultiGetBenchmarks.multiGetListExplicitCF20 no_column_family 10000 1024 100 65536 thrpt 25 744.564 ± 9.236 ops/s MultiGetBenchmarks.multiGetListExplicitCF20 1_column_family 10000 1024 100 256 thrpt 25 6853.760 ± 78.041 ops/s MultiGetBenchmarks.multiGetListExplicitCF20 1_column_family 10000 1024 100 1024 thrpt 25 7360.917 ± 355.365 ops/s MultiGetBenchmarks.multiGetListExplicitCF20 1_column_family 10000 1024 100 4096 thrpt 25 2848.774 ± 13.409 ops/s MultiGetBenchmarks.multiGetListExplicitCF20 1_column_family 10000 1024 100 16384 thrpt 25 1727.688 ± 3.329 ops/s MultiGetBenchmarks.multiGetListExplicitCF20 1_column_family 10000 1024 100 65536 thrpt 25 776.088 ± 7.517 ops/s MultiGetBenchmarks.multiGetListExplicitCF20 20_column_families 10000 1024 100 256 thrpt 25 6910.339 ± 14.366 ops/s MultiGetBenchmarks.multiGetListExplicitCF20 20_column_families 10000 1024 100 1024 thrpt 25 7633.660 ± 10.830 ops/s MultiGetBenchmarks.multiGetListExplicitCF20 20_column_families 10000 1024 100 4096 thrpt 25 2787.799 ± 81.775 ops/s MultiGetBenchmarks.multiGetListExplicitCF20 20_column_families 10000 1024 100 16384 thrpt 25 1726.517 ± 6.830 ops/s MultiGetBenchmarks.multiGetListExplicitCF20 20_column_families 10000 1024 100 65536 thrpt 25 787.597 ± 3.362 ops/s MultiGetBenchmarks.multiGetListExplicitCF20 100_column_families 10000 1024 100 256 thrpt 25 6922.445 ± 10.493 ops/s MultiGetBenchmarks.multiGetListExplicitCF20 100_column_families 10000 1024 100 1024 thrpt 25 7604.710 ± 48.043 ops/s MultiGetBenchmarks.multiGetListExplicitCF20 100_column_families 10000 1024 100 4096 thrpt 25 2848.788 ± 15.783 ops/s MultiGetBenchmarks.multiGetListExplicitCF20 100_column_families 10000 1024 100 16384 thrpt 25 1730.837 ± 6.497 ops/s MultiGetBenchmarks.multiGetListExplicitCF20 100_column_families 10000 1024 100 65536 thrpt 25 794.557 ± 1.869 ops/s MultiGetBenchmarks.multiGetListRandomCF30 no_column_family 10000 1024 100 256 thrpt 25 6918.716 ± 15.766 ops/s MultiGetBenchmarks.multiGetListRandomCF30 no_column_family 10000 1024 100 1024 thrpt 25 7626.692 ± 9.394 ops/s MultiGetBenchmarks.multiGetListRandomCF30 no_column_family 10000 1024 100 4096 thrpt 25 2871.382 ± 72.155 ops/s MultiGetBenchmarks.multiGetListRandomCF30 no_column_family 10000 1024 100 16384 thrpt 25 1598.786 ± 4.819 ops/s MultiGetBenchmarks.multiGetListRandomCF30 no_column_family 10000 1024 100 65536 thrpt 25 748.469 ± 7.234 ops/s MultiGetBenchmarks.multiGetListRandomCF30 1_column_family 10000 1024 100 256 thrpt 25 6922.666 ± 17.131 ops/s MultiGetBenchmarks.multiGetListRandomCF30 1_column_family 10000 1024 100 1024 thrpt 25 7623.890 ± 8.805 ops/s MultiGetBenchmarks.multiGetListRandomCF30 1_column_family 10000 1024 100 4096 thrpt 25 2850.698 ± 18.004 ops/s MultiGetBenchmarks.multiGetListRandomCF30 1_column_family 10000 1024 100 16384 thrpt 25 1727.623 ± 4.868 ops/s MultiGetBenchmarks.multiGetListRandomCF30 1_column_family 10000 1024 100 65536 thrpt 25 774.534 ± 10.025 ops/s MultiGetBenchmarks.multiGetListRandomCF30 20_column_families 10000 1024 100 256 thrpt 25 5486.251 ± 13.582 ops/s MultiGetBenchmarks.multiGetListRandomCF30 20_column_families 10000 1024 100 1024 thrpt 25 4920.656 ± 44.557 ops/s MultiGetBenchmarks.multiGetListRandomCF30 20_column_families 10000 1024 100 4096 thrpt 25 3922.913 ± 25.686 ops/s MultiGetBenchmarks.multiGetListRandomCF30 20_column_families 10000 1024 100 16384 thrpt 25 2873.106 ± 4.336 ops/s MultiGetBenchmarks.multiGetListRandomCF30 20_column_families 10000 1024 100 65536 thrpt 25 802.404 ± 8.967 ops/s MultiGetBenchmarks.multiGetListRandomCF30 100_column_families 10000 1024 100 256 thrpt 25 4817.996 ± 18.042 ops/s MultiGetBenchmarks.multiGetListRandomCF30 100_column_families 10000 1024 100 1024 thrpt 25 4243.922 ± 13.929 ops/s MultiGetBenchmarks.multiGetListRandomCF30 100_column_families 10000 1024 100 4096 thrpt 25 3175.998 ± 7.773 ops/s MultiGetBenchmarks.multiGetListRandomCF30 100_column_families 10000 1024 100 16384 thrpt 25 2321.990 ± 12.501 ops/s MultiGetBenchmarks.multiGetListRandomCF30 100_column_families 10000 1024 100 65536 thrpt 25 1753.028 ± 7.130 ops/s ``` Closes https://github.com/facebook/rocksdb/issues/11518 Pull Request resolved: https://github.com/facebook/rocksdb/pull/12344 Reviewed By: cbi42 Differential Revision: D54809714 Pulled By: pdillinger fbshipit-source-id: bee3b949720abac073bce043b59ce976a11e99eb	2024-03-12 12:42:08 -07:00
Alan Paxton	c4d37da826	Java API - Fix handling of CF handles in DB subclasses (#12417 ) Summary: The most general `open()` method for each of RocksDB, TtlDB, OptimisticTransactionDB and TransactionDB should - ensure the default CF is supplied in the list of descriptors - cache the default CF handle - store open CF handles for automatic close on DB close The `close()` method in each of these DB subclasses should `close()` all the owned CF handles. I can’t find a cleaner way to build some generalised open/close that does this for all DB subclasses, so it exists as cut and paste with variations in the 4 different DB subclasses. Added some slightly paranoid testing that CF handles explicitly referred to as default in a list of CF handles in the general open methods, and the simple open that doesn’t supply a CF, end up reading and writing to the same CF. Prompted by the fact that this code is a bit opaque; the first returned handle is the DB. As part of this, fix the bug where the Java side of `OptimisticsTransactionDB` was not setting up default column family; this was visible when setting up an iterator; add a test to validate that the iterator is OK. A single Java reference to the default column family was not being created in the OptimisticsTransactionDB RocksDB subclass; it should be created in all subclasses. The same problem had previously been fixed for TtlDB. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12417 Reviewed By: ajkr Differential Revision: D54807643 Pulled By: pdillinger fbshipit-source-id: 66f34e56a822a009a8f2018d401cf8940d91aa35	2024-03-12 10:33:27 -07:00
Radek Hubner	583fded565	Fix regression for Javadoc jar build (#12404 ) Summary: https://github.com/facebook/rocksdb/issues/12371 Introduced regression not defining dependency between `create_javadoc` and `rocksdb_javadocs_jar` build targets. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12404 Reviewed By: pdillinger Differential Revision: D54516862 Pulled By: ajkr fbshipit-source-id: 785a99b2caf979395ae0de60e40e7d1b93059adb	2024-03-06 10:33:17 -08:00
Adam Retter	5458eda5f0	Pass build parallelism flag to Docker builds (#12392 ) Summary: Passed the `-j` flag through to builds happening inside Docker containers. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12392 Reviewed By: cbi42 Differential Revision: D54311937 Pulled By: ajkr fbshipit-source-id: 5cf1bfe4b9059cc2d078fb5331812f32cf9e89ab	2024-02-28 12:51:00 -08:00
Adam Retter	99cc36be9b	Correct CMake Javadoc and source jar builds (#12371 ) Summary: Fix some issues introduced in https://github.com/facebook/rocksdb/pull/12199 (CC rhubner) 1. Previous `jar -v -c -f` was not valid command syntax. 2. Javadoc and source Jar files were prefixed `rocksdb-`, now corrected to `rocksdbjni-` pdillinger This needs to be merged to `main` and also `8.11.fb` (to fix the Windows build for the RocksJava release of 8.11.2) please. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12371 Reviewed By: pdillinger, jowlyzhang Differential Revision: D54136834 Pulled By: hx235 fbshipit-source-id: f356f2401042af359ada607e5f0be627418ccd6c	2024-02-27 15:46:12 -08:00
Yu Zhang	2940acac00	Persist table options use_delta_encoding in options file (#11987 ) Summary: This option is used for encoding keys in block based table files. It has been having a default true value since its introduction. Users may not notice this option is not persisted in options file unless they are explicitly setting it to false. If the users expect `Iterator::GetProperty("rocksdb.iterator.is-key-pinned")` to return 1 when setting `ReadOptions.pin_data = true`, they should have noticed loading options file won't work and have work around for this by always explicitly set this option to false for opening DB. This change won't impact those users except that now they can remove their work around. If the users are not relying on key pinning behavior at all and as a result didn't notice the option is not persisted, this change shouldn't have any visible behavior impact either. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11987 Reviewed By: hx235 Differential Revision: D54093238 Pulled By: jowlyzhang fbshipit-source-id: 256a3348c44cf91349034d1f6e242c437b32b9a5	2024-02-23 14:13:28 -08:00
leedonggyu	ca99a8f153	Add function to check if the RocksDB instance is closed or not (#11337 ) Summary: In RocksDb jni threre is no method to know if the instance is closed or not. so when using a closed instance it makes jvm crash. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11337 Reviewed By: jaykorean Differential Revision: D53941387 Pulled By: ajkr fbshipit-source-id: e3e4e6fe48409fa70a312810e467ec0c4ce356ef	2024-02-20 11:36:28 -08:00
anand76	d227276147	Deprecate some variants of Get and MultiGet (#12327 ) Summary: A lot of variants of Get and MultiGet have been added to `include/rocksdb/db.h` over the years. Try to consolidate them by marking variants that don't return timestamps as deprecated. The underlying DB implementation will check and return Status::NotSupported() if it doesn't support returning timestamps and the caller asks for it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12327 Reviewed By: pdillinger Differential Revision: D53828151 Pulled By: anand1976 fbshipit-source-id: e0b5ca42d32daa2739d5f439a729815a2d4ff050	2024-02-16 09:21:06 -08:00
anand76	28c1c15c29	Sync tickers and histograms across C++ and Java (#12355 ) Summary: The RocksDB ticker and histogram statistics were out of sync between the C++ and Java code, with a number of newer stats missing in TickerType.java and HistogramType.java. Also, there were gaps in numbering in portal.h, which could soon become an issue due to the number of tickers and the fact that we're limited to 1 byte in Java. This PR adds the missing stats, and re-numbers all of them. It also moves some stats around to try to group related stats together. Since this will go into a major release, compatibility shouldn't be an issue. This should be automated at some point, since the current process is somewhat error prone. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12355 Reviewed By: jaykorean Differential Revision: D53825324 Pulled By: anand1976 fbshipit-source-id: 298c180872f4b9f1ee54b8bb22f4e280458e7e09	2024-02-15 17:22:03 -08:00
Peter Dillinger	bfd00bba9c	Use format_version=6 by default (#12352 ) Summary: It's in production for a large storage service, and it was initially released 6 months ago (8.6.0). IMHO that's enough room for "easy downgrade" to most any user's previously integrated version, even if they only update a few times a year. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12352 Test Plan: tests updated, including format capatibility test table_test: ApproximateOffsetOfCompressed is affected because adding index block to metaindex adds about 13 bytes to SST files in format_version 6. This test has historically been problematic and one reason is that, apparently, not only could it pass/fail depending on snappy compression version, but also how long your host name is, because of db_host_id. I've cleared that out for the test, which takes care of format_version=6 and hopefully improves long-term reliability. Suggested follow-up: FinishImpl in table_test.cc takes a table_options that is ignored in some cases and might not match the ioptions.table_factory configuration unless the caller is very careful. This should be cleaned up somehow. Reviewed By: anand1976 Differential Revision: D53786884 Pulled By: pdillinger fbshipit-source-id: 1964cbd40d3ab0a821fdc01c458031df716fcf51	2024-02-15 11:23:48 -08:00
Yu Zhang	4bea83aa44	Remove the force mode for EnableFileDeletions API (#12337 ) Summary: There is no strong reason for user to need this mode while on the other hand, its behavior is destructive. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12337 Reviewed By: hx235 Differential Revision: D53630393 Pulled By: jowlyzhang fbshipit-source-id: ce94b537258102cd98f89aa4090025663664dd78	2024-02-13 18:36:25 -08:00
马越	45668a05f5	add unit test for compactRangeWithNullBoundaries java api (#12333 ) Summary: The purpose of this PR is to supplement a set of unit tests for https://github.com/facebook/rocksdb/pull/12328 Pull Request resolved: https://github.com/facebook/rocksdb/pull/12333 Reviewed By: ltamasi Differential Revision: D53553830 Pulled By: cbi42 fbshipit-source-id: d21490f7ce7b30f42807ee37eda455ca6abdd072	2024-02-13 10:48:31 -08:00
Hui Xiao	1a885fe730	Remove deprecated Options::access_hint_on_compaction_start (#11654 ) Summary: Context: `Options::access_hint_on_compaction_start ` is marked deprecated and now ready to be removed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11654 Test Plan: Multiple db_stress runs with pre-PR and post-PR binary randomly to ensure forward/backward compatibility on options `36a5686ec0`?fbclid=IwAR2IcdAUdTvw9O9V5GkHEYJRGMVR9p7Ei-LMa-9qiXlj3z80DxjkxlGnP1E `python3 tools/db_crashtest.py --simple blackbox --interval=30` Reviewed By: cbi42 Differential Revision: D47892459 Pulled By: hx235 fbshipit-source-id: a62f46a0377fe143be7638e218978d5431c15c56	2024-02-05 13:35:19 -08:00
马越	3a287796e3	Fix the problem that wrong Key may be passed when using CompactRange JAVA API (#12328 ) Summary: When using the Rocksdb Java API. When we use Java code to call `db.compactRange (columnFamilyHandle, start, null)` which means we hope to perform range compaction on keys bigger than start. we expected call to the corresponding C++ code : `db->compactRange (columnFamilyHandle, &start, nullptr)` But in reality, what is being called is `db ->compactRange (columnFamilyHandle,start,"")` The problem here is the `null` in Java are not converted to `nullptr`, but rather to `""`, which may result in some unexpected results Pull Request resolved: https://github.com/facebook/rocksdb/pull/12328 Reviewed By: jowlyzhang Differential Revision: D53432749 Pulled By: cbi42 fbshipit-source-id: eeadd19d05667230568668946d2ef1d5b2568268	2024-02-05 11:05:57 -08:00
Peter Dillinger	76c834e441	Remove 'virtual' when implied by 'override' (#12319 ) Summary: ... to follow modern C++ style / idioms. Used this hack: ``` for FILE in `cat my_list_of_files`; do perl -pi -e 'BEGIN{undef $/;} s/ virtual( [^;{]* override)/$1/smg' $FILE; done ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12319 Test Plan: existing tests, CI Reviewed By: jaykorean Differential Revision: D53275303 Pulled By: pdillinger fbshipit-source-id: bc0881af270aa8ef4d0ae4f44c5a6614b6407377	2024-01-31 13:14:42 -08:00
Radek Hubner	1d8c54aeaa	Fix build on OpenBSD i386 (#12142 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12142 Reviewed By: pdillinger Differential Revision: D53150218 Pulled By: ajkr fbshipit-source-id: a4c4d9d22d99e8a82d93d1a7ef37ec5326855cb5	2024-01-29 16:19:59 -08:00
Radek Hubner	f2ddb92750	Fix database open with column family. (#12167 ) Summary: When is RocksDB is opened with Column Family descriptors, the default column family must be set properly. If it was not, then the flush operation will fail. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12167 Reviewed By: ajkr Differential Revision: D53104007 Pulled By: cbi42 fbshipit-source-id: dffa8e34a4b2a438553ee4ea308f3fa2e22e46f7	2024-01-26 09:13:03 -08:00
Radek Hubner	0bf9079d44	Change Java native methods to static (#11882 ) Summary: This should give us some performance benefit calling native C++ code. Closes https://github.com/facebook/rocksdb/issues/4786 See https://github.com/evolvedbinary/jni-benchmarks/ for more info. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11882 Reviewed By: pdillinger Differential Revision: D53066207 Pulled By: ajkr fbshipit-source-id: daedef185215d0d8e791cd85bef598900bcb5bf2	2024-01-25 12:36:30 -08:00
Radek Hubner	46e8c445e7	Generate the same output for cmake rocksdbjava as for make. (#12093 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12093 Reviewed By: pdillinger Differential Revision: D53066122 Pulled By: ajkr fbshipit-source-id: 9fcb037dbbae35091a6e47b695bfa4d643ed7d65	2024-01-25 12:36:03 -08:00
Neil Ramaswamy	4835c11cce	Add native logger support to RocksJava (#12213 ) Summary: ## Overview In this PR, we introduce support for setting the RocksDB native logger through Java. As mentioned in the discussion on the [Google Group discussion](https://groups.google.com/g/rocksdb/c/xYmbEs4sqRM/m/e73E4whJAQAJ), this work is primarily motivated by the JDK 17 [performance regression in JNI thread attach/detach calls](https://bugs.openjdk.org/browse/JDK-8314859): the only existing RocksJava logging configuration call, `setLogger`, invokes the provided logger over the JNI. ## Changes Specifically, these changes add support for the `devnull` and `stderr` native loggers. For the `stderr` logger, we add the ability to prefix every log with a `logPrefix`, so that it becomes possible know which database a particular log is coming from (if multiple databases are in use). The API looks like the following: ```java Options opts = new Options(); NativeLogger stderrNativeLogger = NativeLogger.newStderrLogger( InfoLogLevel.DEBUG_LEVEL, "[my prefix here]"); options.setLogger(stderrNativeLogger); try (final RocksDB db = RocksDB.open(options, ...)) {...} // Cleanup stderrNativeLogger.close() opts.close(); ``` Note that the API to set the logger is the same, via `Options::setLogger` (or `DBOptions::setLogger`). However, it will set the RocksDB logger to be native when the provided logger is an instance of `NativeLogger`. ## Testing Two tests have been added in `NativeLoggerTest.java`. The first test creates both the `devnull` and `stderr` loggers, and sets them on the associated `Options`. However, to avoid polluting the testing output with logs from `stderr`, only the `devnull` logger is actually used in the test. The second test does the same logic, but for `DBOptions`. It is possible to manually verify the `stderr` logger by modifying the tests slightly, and observing that the console indeed gets cluttered with logs from `stderr`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12213 Reviewed By: cbi42 Differential Revision: D52772306 Pulled By: ajkr fbshipit-source-id: 4026895f78f9cc250daf6bfa57427957e2d8b053	2024-01-17 17:51:36 -08:00
Radek Hubner	491e3d4342	Add of javadoc and sources JAR to CMake build. (#12199 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12199 Reviewed By: hx235 Differential Revision: D52542815 Pulled By: ajkr fbshipit-source-id: 0cc30feae01c2e09bcc0371ac2ed7eaf715da4f8	2024-01-10 09:46:00 -08:00
haobo sun	09411e199d	Format async io for Java API (#12192 ) Summary: Format https://github.com/facebook/rocksdb/issues/12184 according to adamretter 's comments. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12192 Reviewed By: cbi42 Differential Revision: D52457427 Pulled By: ajkr fbshipit-source-id: 75b1be5d89687be4e58e618d693a6a120c5efc78	2024-01-02 13:19:08 -08:00
Hui Xiao	06e593376c	Group SST write in flush, compaction and db open with new stats (#11910 ) Summary: ## Context/Summary Similar to https://github.com/facebook/rocksdb/pull/11288, https://github.com/facebook/rocksdb/pull/11444, categorizing SST/blob file write according to different io activities allows more insight into the activity. For that, this PR does the following: - Tag different write IOs by passing down and converting WriteOptions to IOOptions - Add new SST_WRITE_MICROS histogram in WritableFileWriter::Append() and breakdown FILE_WRITE_{FLUSH\|COMPACTION\|DB_OPEN}_MICROS Some related code refactory to make implementation cleaner: - Blob stats - Replace high-level write measurement with low-level WritableFileWriter::Append() measurement for BLOB_DB_BLOB_FILE_WRITE_MICROS. This is to make FILE_WRITE_{FLUSH\|COMPACTION\|DB_OPEN}_MICROS include blob file. As a consequence, this introduces some behavioral changes on it, see HISTORY and db bench test plan below for more info. - Fix bugs where BLOB_DB_BLOB_FILE_SYNCED/BLOB_DB_BLOB_FILE_BYTES_WRITTEN include file failed to sync and bytes failed to write. - Refactor WriteOptions constructor for easier construction with io_activity and rate_limiter_priority - Refactor DBImpl::~DBImpl()/BlobDBImpl::Close() to bypass thread op verification - Build table - TableBuilderOptions now includes Read/WriteOpitons so BuildTable() do not need to take these two variables - Replace the io_priority passed into BuildTable() with TableBuilderOptions::WriteOpitons::rate_limiter_priority. Similar for BlobFileBuilder. This parameter is used for dynamically changing file io priority for flush, see https://github.com/facebook/rocksdb/pull/9988?fbclid=IwAR1DtKel6c-bRJAdesGo0jsbztRtciByNlvokbxkV6h_L-AE9MACzqRTT5s for more - Update ThreadStatus::FLUSH_BYTES_WRITTEN to use io_activity to track flush IO in flush job and db open instead of io_priority ## Test ### db bench Flush ``` ./db_bench --statistics=1 --benchmarks=fillseq --num=100000 --write_buffer_size=100 rocksdb.sst.write.micros P50 : 1.830863 P95 : 4.094720 P99 : 6.578947 P100 : 26.000000 COUNT : 7875 SUM : 20377 rocksdb.file.write.flush.micros P50 : 1.830863 P95 : 4.094720 P99 : 6.578947 P100 : 26.000000 COUNT : 7875 SUM : 20377 rocksdb.file.write.compaction.micros P50 : 0.000000 P95 : 0.000000 P99 : 0.000000 P100 : 0.000000 COUNT : 0 SUM : 0 rocksdb.file.write.db.open.micros P50 : 0.000000 P95 : 0.000000 P99 : 0.000000 P100 : 0.000000 COUNT : 0 SUM : 0 ``` compaction, db oopen ``` Setup: ./db_bench --statistics=1 --benchmarks=fillseq --num=10000 --disable_auto_compactions=1 -write_buffer_size=100 --db=../db_bench Run:./db_bench --statistics=1 --benchmarks=compact --db=../db_bench --use_existing_db=1 rocksdb.sst.write.micros P50 : 2.675325 P95 : 9.578788 P99 : 18.780000 P100 : 314.000000 COUNT : 638 SUM : 3279 rocksdb.file.write.flush.micros P50 : 0.000000 P95 : 0.000000 P99 : 0.000000 P100 : 0.000000 COUNT : 0 SUM : 0 rocksdb.file.write.compaction.micros P50 : 2.757353 P95 : 9.610687 P99 : 19.316667 P100 : 314.000000 COUNT : 615 SUM : 3213 rocksdb.file.write.db.open.micros P50 : 2.055556 P95 : 3.925000 P99 : 9.000000 P100 : 9.000000 COUNT : 23 SUM : 66 ``` blob stats - just to make sure they aren't broken by this PR ``` Integrated Blob DB Setup: ./db_bench --enable_blob_files=1 --statistics=1 --benchmarks=fillseq --num=10000 --disable_auto_compactions=1 -write_buffer_size=100 --db=../db_bench Run:./db_bench --enable_blob_files=1 --statistics=1 --benchmarks=compact --db=../db_bench --use_existing_db=1 pre-PR: rocksdb.blobdb.blob.file.write.micros P50 : 7.298246 P95 : 9.771930 P99 : 9.991813 P100 : 16.000000 COUNT : 235 SUM : 1600 rocksdb.blobdb.blob.file.synced COUNT : 1 rocksdb.blobdb.blob.file.bytes.written COUNT : 34842 post-PR: rocksdb.blobdb.blob.file.write.micros P50 : 2.000000 P95 : 2.829360 P99 : 2.993779 P100 : 9.000000 COUNT : 707 SUM : 1614 - COUNT is higher and values are smaller as it includes header and footer write - COUNT is 3X higher due to each Append() count as one post-PR, while in pre-PR, 3 Append()s counts as one. See https://github.com/facebook/rocksdb/pull/11910/files#diff-32b811c0a1c000768cfb2532052b44dc0b3bf82253f3eab078e15ff201a0dabfL157-L164 rocksdb.blobdb.blob.file.synced COUNT : 1 (stay the same) rocksdb.blobdb.blob.file.bytes.written COUNT : 34842 (stay the same) ``` ``` Stacked Blob DB Run: ./db_bench --use_blob_db=1 --statistics=1 --benchmarks=fillseq --num=10000 --disable_auto_compactions=1 -write_buffer_size=100 --db=../db_bench pre-PR: rocksdb.blobdb.blob.file.write.micros P50 : 12.808042 P95 : 19.674497 P99 : 28.539683 P100 : 51.000000 COUNT : 10000 SUM : 140876 rocksdb.blobdb.blob.file.synced COUNT : 8 rocksdb.blobdb.blob.file.bytes.written COUNT : 1043445 post-PR: rocksdb.blobdb.blob.file.write.micros P50 : 1.657370 P95 : 2.952175 P99 : 3.877519 P100 : 24.000000 COUNT : 30001 SUM : 67924 - COUNT is higher and values are smaller as it includes header and footer write - COUNT is 3X higher due to each Append() count as one post-PR, while in pre-PR, 3 Append()s counts as one. See https://github.com/facebook/rocksdb/pull/11910/files#diff-32b811c0a1c000768cfb2532052b44dc0b3bf82253f3eab078e15ff201a0dabfL157-L164 rocksdb.blobdb.blob.file.synced COUNT : 8 (stay the same) rocksdb.blobdb.blob.file.bytes.written COUNT : 1043445 (stay the same) ``` ### Rehearsal CI stress test Trigger 3 full runs of all our CI stress tests ### Performance Flush ``` TEST_TMPDIR=/dev/shm ./db_basic_bench_pre_pr --benchmark_filter=ManualFlush/key_num:524288/per_key_size:256 --benchmark_repetitions=1000 -- default: 1 thread is used to run benchmark; enable_statistics = true Pre-pr: avg 507515519.3 ns 497686074,499444327,500862543,501389862,502994471,503744435,504142123,504224056,505724198,506610393,506837742,506955122,507695561,507929036,508307733,508312691,508999120,509963561,510142147,510698091,510743096,510769317,510957074,511053311,511371367,511409911,511432960,511642385,511691964,511730908, Post-pr: avg 511971266.5 ns, regressed 0.88% 502744835,506502498,507735420,507929724,508313335,509548582,509994942,510107257,510715603,511046955,511352639,511458478,512117521,512317380,512766303,512972652,513059586,513804934,513808980,514059409,514187369,514389494,514447762,514616464,514622882,514641763,514666265,514716377,514990179,515502408, ``` Compaction ``` TEST_TMPDIR=/dev/shm ./db_basic_bench_{pre\|post}_pr --benchmark_filter=ManualCompaction/comp_style:0/max_data:134217728/per_key_size:256/enable_statistics:1 --benchmark_repetitions=1000 -- default: 1 thread is used to run benchmark Pre-pr: avg 495346098.30 ns 492118301,493203526,494201411,494336607,495269217,495404950,496402598,497012157,497358370,498153846 Post-pr: avg 504528077.20, regressed 1.85%. "ManualCompaction" include flush so the isolated regression for compaction should be around 1.85-0.88 = 0.97% 502465338,502485945,502541789,502909283,503438601,504143885,506113087,506629423,507160414,507393007 ``` Put with WAL (in case passing WriteOptions slows down this path even without collecting SST write stats) ``` TEST_TMPDIR=/dev/shm ./db_basic_bench_pre_pr --benchmark_filter=DBPut/comp_style:0/max_data:107374182400/per_key_size:256/enable_statistics:1/wal:1 --benchmark_repetitions=1000 -- default: 1 thread is used to run benchmark Pre-pr: avg 3848.10 ns 3814,3838,3839,3848,3854,3854,3854,3860,3860,3860 Post-pr: avg 3874.20 ns, regressed 0.68% 3863,3867,3871,3874,3875,3877,3877,3877,3880,3881 ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/11910 Reviewed By: ajkr Differential Revision: D49788060 Pulled By: hx235 fbshipit-source-id: 79e73699cda5be3b66461687e5147c2484fc5eff	2023-12-29 15:29:23 -08:00
haobo sun	2a8b2df383	Add async_io for Java API (#12184 ) Summary: Fixes https://github.com/facebook/rocksdb/issues/12183 Pull Request resolved: https://github.com/facebook/rocksdb/pull/12184 Reviewed By: hx235 Differential Revision: D52421787 Pulled By: ajkr fbshipit-source-id: ad3bdae9be51bef5a208b02ceb08f6feb9fac8e4	2023-12-26 14:33:11 -08:00
Nicolas Pepin-Perreault	5b073a7daa	Access SST full file checksum via RocksDB#getLiveFilesMetadata (#11770 ) Summary: Description This PR passes along the native `LiveFileMetaData#file_checksum` field from the C++ class to the Java API as a copied byte array. If there is no file checksum generator factory set beforehand, then the array will empty. Please advise if you'd rather it be null - an empty array means one extra allocation, but it avoids possible null pointer exceptions. > Note > This functionality complements but does not supersede https://github.com/facebook/rocksdb/issues/11736 It's outside the scope here to add support for Java based `FileChecksumGenFactory` implementations. As a workaround, users can already use the built-in one by creating their initial `DBOptions` via properties: ```java final Properties props = new Properties(); props.put("file_checksum_gen_factory", "FileChecksumGenCrc32cFactory"); try (final DBOptions dbOptions = DBOptions.getDBOptionsFromProps(props); final ColumnFamilyOptions cfOptions = new ColumnFamilyOptions(); final Options options = new Options(dbOptions, cfOptions).setCreateIfMissing(true)) { // do stuff } ``` I wanted to add a better test, but unfortunately there's no available CRC32C implementation available in Java 8 without adding a dependency or adding a JNI helper for RocksDB's own implementation (or bumping the minimum version for tests to Java 9). That said, I understand the test is rather poor, so happy to change it to whatever you'd like. Context To give some context, we replicate RocksDB checkpoints to other nodes. Part of this is verifying the integrity of each file during replication. With a large enough RocksDB, computing the checksum ourselves is prohibitively expensive. Since SST files comprise the bulk of the data, we'd much rather delegate this to RocksDB on file write, and read it back after to compare. It's likely we will provide a follow up to read the file checksum list directly from the manifest without having to open the DB, but this was the easiest first step to get it working for us. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11770 Reviewed By: hx235 Differential Revision: D52420729 Pulled By: ajkr fbshipit-source-id: a873de35a48aaf315e125733091cd221a97b9073	2023-12-26 14:02:36 -08:00
Adam Retter	d8c1ab8b2d	Add Iterator::Refresh(Snapshot) to RocksJava (#12145 ) Summary: Adds the API to RocksJava. Also improves the C++ doc for Iterator::Refresh(Snapshot) Closes https://github.com/facebook/rocksdb/issues/12095 Pull Request resolved: https://github.com/facebook/rocksdb/pull/12145 Reviewed By: hx235 Differential Revision: D52266452 Pulled By: ajkr fbshipit-source-id: 6b72b41672081b966b0c5dd07d9bf151ed009122	2023-12-20 18:03:42 -08:00
akankshamahajan	7b24dec25d	Fix header files to meet Open source requirements (#12164 ) Summary: Same as title Pull Request resolved: https://github.com/facebook/rocksdb/pull/12164 Reviewed By: hx235 Differential Revision: D52302234 Pulled By: akankshamahajan15 fbshipit-source-id: d4724fc944c773242788f5a47d1c7eadbbc5a522	2023-12-19 13:43:17 -08:00
Radek Hubner	f7486ff6a3	Add deletion-triggered compaction to RocksJava (#12028 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12028 Reviewed By: akankshamahajan15 Differential Revision: D52264983 Pulled By: ajkr fbshipit-source-id: 02d08015b4bffac06d889dc1be50a51d03f891b3	2023-12-18 13:43:01 -08:00
anand76	cc069f25b3	Add some compressed and tiered secondary cache stats (#12150 ) Summary: Add statistics for more visibility. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12150 Reviewed By: akankshamahajan15 Differential Revision: D52184633 Pulled By: anand1976 fbshipit-source-id: 9969e05d65223811cd12627102b020bb6d229352	2023-12-15 11:34:08 -08:00
Ludovic Henry	5502f06729	Add support for linux-riscv64 (#12139 ) Summary: Following https://github.com/evolvedbinary/docker-rocksjava/pull/2, we can now build rocksdb on riscv64. I've verified this works as expected with `make rocksdbjavastaticdockerriscv64`. Also fixes https://github.com/facebook/rocksdb/issues/10500 https://github.com/facebook/rocksdb/issues/11994 Pull Request resolved: https://github.com/facebook/rocksdb/pull/12139 Reviewed By: jaykorean Differential Revision: D52128098 Pulled By: akankshamahajan15 fbshipit-source-id: 706d36a3f8a9e990b76f426bc450937a0cd1a537	2023-12-14 11:27:17 -08:00
Radek Hubner	5c5e018943	Fix JNI lazy load regression. (#12133 ) Summary: A small regression that conflicted with PR https://github.com/facebook/rocksdb/pull/12133 was later merged in commit `2296c624fa (diff-26d3ab8a3d764183d0ea3aea834fe481eec2347c334b918ebd7bdce4bcabcc19R35)` This PR addresses that regression. Closes https://github.com/facebook/rocksdb/issues/12132 Pull Request resolved: https://github.com/facebook/rocksdb/pull/12133 Reviewed By: jowlyzhang Differential Revision: D52041736 Pulled By: ltamasi fbshipit-source-id: 33db57035154c833ae00b5d921b17b3be80c8dd7	2023-12-11 11:21:52 -08:00
Alan Paxton	5a063ecd34	Java API consistency between RocksDB.put() , .merge() and Transaction.put() , .merge() (#11019 ) Summary: ### Implement new Java API get()/put()/merge() methods, and transactional variants. The Java API methods are very inconsistent in terms of how they pass parameters (byte[], ByteBuffer), and what variants and defaulted parameters they support. We try to bring some consistency to this. * All APIs should support calls with ByteBuffer parameters. * Similar methods (RocksDB.get() vs Transaction.get()) should support as similar as possible sets of parameters for predictability. * get()-like methods should provide variants where the caller supplies the target buffer, for the sake of efficiency. Allocation costs in Java can be significant when large buffers are repeatedly allocated and freed. ### API Additions 1. RockDB.get implement indirect ByteBuffers. Added indirect ByteBuffers and supporting native methods for get(). 2. RocksDB.Iterator implement missing (byte[], offset, length) variants for key() and value() parameters. 3. Transaction.get() implement missing methods, based on RocksDB.get. Added ByteBuffer.get with and without column family. Added byte[]-as-target get. 4. Transaction.iterator() implement a getIterator() which defaults ReadOptions; as per RocksDB.iterator(). Rationalize support API for this and RocksDB.iterator() 5. RocksDB.merge implement ByteBuffer methods; both direct and indirect buffers. Shadow the methods of RocksDB.put; RocksDB.put only offers ByteBuffer API with explicit WriteOptions. Duplicated this with RocksDB.merge 6. Transaction.merge implement methods as per RocksDB.merge methods. Transaction is already constructed with WriteOptions, so no explicit WriteOptions methods required. 7. Transaction.mergeUntracked implement the same API methods as Transaction.merge except the ones that use assumeTracked, because that’s not a feature of merge untracked. ### Support Changes (C++) The current JNI code in C++ supports multiple variants of methods through a number of helper functions. There are numerous TODO suggestions in the code proposing that the helpers be re-factored/shared. We have taken a different approach for the new methods; we have created wrapper classes `JDirectBufferSlice`, `JDirectBufferPinnableSlice`, `JByteArraySlice` and `JByteArrayPinnableSlice` RAII classes which construct slices from JNI parameters and can then be passed directly to RocksDB methods. For instance, the `Java_org_rocksdb_Transaction_getDirect` method is implemented like this: ``` try { ROCKSDB_NAMESPACE::JDirectBufferSlice key(env, jkey_bb, jkey_off, jkey_part_len); ROCKSDB_NAMESPACE::JDirectBufferPinnableSlice value(env, jval_bb, jval_off, jval_part_len); ROCKSDB_NAMESPACE::KVException::ThrowOnError( env, txn->Get(*read_options, column_family_handle, key.slice(), &value.pinnable_slice())); return value.Fetch(); } catch (const ROCKSDB_NAMESPACE::KVException& e) { return e.Code(); } ``` Notice the try/catch mechanism with the `KVException` class, which combined with RAII and the wrapper classes means that there is no ad-hoc cleanup necessary in the JNI methods. We propose to extend this mechanism to existing JNI methods as further work. ### Support Changes (Java) Where there are multiple parameter-variant versions of the same method, we use fewer or just one supporting native method for all of them. This makes maintenance a bit easier and reduces the opportunity for coding errors mixing up (untyped) object handles. In order to support this efficiently, some classes need to have default values for column families and read options added and cached so that they are not re-constructed on every method call. This PR closes https://github.com/facebook/rocksdb/issues/9776 Pull Request resolved: https://github.com/facebook/rocksdb/pull/11019 Reviewed By: ajkr Differential Revision: D52039446 Pulled By: jowlyzhang fbshipit-source-id: 45d0140a4887e42134d2e56520e9b8efbd349660	2023-12-11 11:03:17 -08:00
Alexander Kiel	6e7701d49b	Fix JavaDoc of setCompactionReadaheadSize (#12090 ) Summary: Recently in https://github.com/facebook/rocksdb/issues/11762 the default of `compaction_readahead_size` changed from 0 to 2 MB. Closes: https://github.com/facebook/rocksdb/issues/12088 Pull Request resolved: https://github.com/facebook/rocksdb/pull/12090 Reviewed By: jaykorean Differential Revision: D51531762 Pulled By: ajkr fbshipit-source-id: a0b7145a1dca95ee90ffa3553f6eeacce6424aee	2023-11-27 11:50:53 -08:00
Changyu Bi	b059c5680e	Add missing copyright header (#12076 ) Summary: Required for open source repo. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12076 Reviewed By: ajkr Differential Revision: D51449839 Pulled By: cbi42 fbshipit-source-id: 4a25a3422880db3f28a2834d966341935db32530	2023-11-19 09:50:59 -08:00

1 2 3 4 5 ...

984 Commits