Commit Graph

949 Commits

Author SHA1 Message Date
Neil Ramaswamy 4835c11cce Add native logger support to RocksJava (#12213)
Summary:
## Overview

In this PR, we introduce support for setting the RocksDB native logger through Java. As mentioned in the discussion on the [Google Group discussion](https://groups.google.com/g/rocksdb/c/xYmbEs4sqRM/m/e73E4whJAQAJ), this work is primarily motivated by the  JDK 17 [performance regression in JNI thread attach/detach calls](https://bugs.openjdk.org/browse/JDK-8314859): the only existing RocksJava logging configuration call, `setLogger`, invokes the provided logger over the JNI.

## Changes

Specifically, these changes add support for the `devnull` and `stderr` native loggers. For the `stderr` logger, we add the ability to prefix every log with a `logPrefix`, so that it becomes possible know which database a particular log is coming from (if multiple databases are in use). The  API looks like the following:

```java
Options opts = new Options();

NativeLogger stderrNativeLogger = NativeLogger.newStderrLogger(
  InfoLogLevel.DEBUG_LEVEL, "[my prefix here]");
options.setLogger(stderrNativeLogger);

try (final RocksDB db = RocksDB.open(options, ...))  {...}

// Cleanup
stderrNativeLogger.close()
opts.close();
```

Note that the API to set the logger is the same, via `Options::setLogger` (or `DBOptions::setLogger`). However, it will set the RocksDB logger to be native when  the provided logger is an instance of `NativeLogger`.

## Testing

Two tests have been added in `NativeLoggerTest.java`. The first test creates both the `devnull` and `stderr` loggers, and sets them on the associated `Options`. However, to avoid polluting the testing output with logs from `stderr`, only the `devnull` logger is actually used in the test. The second test does the same logic, but for `DBOptions`.

It is possible to manually verify the `stderr` logger by modifying the tests slightly, and observing that the console indeed gets cluttered with logs from `stderr`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12213

Reviewed By: cbi42

Differential Revision: D52772306

Pulled By: ajkr

fbshipit-source-id: 4026895f78f9cc250daf6bfa57427957e2d8b053
2024-01-17 17:51:36 -08:00
Radek Hubner 491e3d4342 Add of javadoc and sources JAR to CMake build. (#12199)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12199

Reviewed By: hx235

Differential Revision: D52542815

Pulled By: ajkr

fbshipit-source-id: 0cc30feae01c2e09bcc0371ac2ed7eaf715da4f8
2024-01-10 09:46:00 -08:00
haobo sun 09411e199d Format async io for Java API (#12192)
Summary:
Format https://github.com/facebook/rocksdb/issues/12184  according to adamretter 's comments.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12192

Reviewed By: cbi42

Differential Revision: D52457427

Pulled By: ajkr

fbshipit-source-id: 75b1be5d89687be4e58e618d693a6a120c5efc78
2024-01-02 13:19:08 -08:00
Hui Xiao 06e593376c Group SST write in flush, compaction and db open with new stats (#11910)
Summary:
## Context/Summary
Similar to https://github.com/facebook/rocksdb/pull/11288, https://github.com/facebook/rocksdb/pull/11444, categorizing SST/blob file write according to different io activities allows more insight into the activity.

For that, this PR does the following:
- Tag different write IOs by passing down and converting WriteOptions to IOOptions
- Add new SST_WRITE_MICROS histogram in WritableFileWriter::Append() and breakdown FILE_WRITE_{FLUSH|COMPACTION|DB_OPEN}_MICROS

Some related code refactory to make implementation cleaner:
- Blob stats
   - Replace high-level write measurement with low-level WritableFileWriter::Append() measurement for BLOB_DB_BLOB_FILE_WRITE_MICROS. This is to make FILE_WRITE_{FLUSH|COMPACTION|DB_OPEN}_MICROS  include blob file. As a consequence, this introduces some behavioral changes on it, see HISTORY and db bench test plan below for more info.
   - Fix bugs where BLOB_DB_BLOB_FILE_SYNCED/BLOB_DB_BLOB_FILE_BYTES_WRITTEN include file failed to sync and bytes failed to write.
- Refactor WriteOptions constructor for easier construction with io_activity and rate_limiter_priority
- Refactor DBImpl::~DBImpl()/BlobDBImpl::Close() to bypass thread op verification
- Build table
   - TableBuilderOptions now includes Read/WriteOpitons so BuildTable() do not need to take these two variables
   - Replace the io_priority passed into BuildTable() with TableBuilderOptions::WriteOpitons::rate_limiter_priority. Similar for BlobFileBuilder.
This parameter is used for dynamically changing file io priority for flush, see  https://github.com/facebook/rocksdb/pull/9988?fbclid=IwAR1DtKel6c-bRJAdesGo0jsbztRtciByNlvokbxkV6h_L-AE9MACzqRTT5s for more
   - Update ThreadStatus::FLUSH_BYTES_WRITTEN to use io_activity to track flush IO in flush job and db open instead of io_priority

## Test
### db bench

Flush
```
./db_bench --statistics=1 --benchmarks=fillseq --num=100000 --write_buffer_size=100

rocksdb.sst.write.micros P50 : 1.830863 P95 : 4.094720 P99 : 6.578947 P100 : 26.000000 COUNT : 7875 SUM : 20377
rocksdb.file.write.flush.micros P50 : 1.830863 P95 : 4.094720 P99 : 6.578947 P100 : 26.000000 COUNT : 7875 SUM : 20377
rocksdb.file.write.compaction.micros P50 : 0.000000 P95 : 0.000000 P99 : 0.000000 P100 : 0.000000 COUNT : 0 SUM : 0
rocksdb.file.write.db.open.micros P50 : 0.000000 P95 : 0.000000 P99 : 0.000000 P100 : 0.000000 COUNT : 0 SUM : 0
```

compaction, db oopen
```
Setup: ./db_bench --statistics=1 --benchmarks=fillseq --num=10000 --disable_auto_compactions=1 -write_buffer_size=100 --db=../db_bench

Run:./db_bench --statistics=1 --benchmarks=compact  --db=../db_bench --use_existing_db=1

rocksdb.sst.write.micros P50 : 2.675325 P95 : 9.578788 P99 : 18.780000 P100 : 314.000000 COUNT : 638 SUM : 3279
rocksdb.file.write.flush.micros P50 : 0.000000 P95 : 0.000000 P99 : 0.000000 P100 : 0.000000 COUNT : 0 SUM : 0
rocksdb.file.write.compaction.micros P50 : 2.757353 P95 : 9.610687 P99 : 19.316667 P100 : 314.000000 COUNT : 615 SUM : 3213
rocksdb.file.write.db.open.micros P50 : 2.055556 P95 : 3.925000 P99 : 9.000000 P100 : 9.000000 COUNT : 23 SUM : 66
```

blob stats - just to make sure they aren't broken by this PR
```
Integrated Blob DB

Setup: ./db_bench --enable_blob_files=1 --statistics=1 --benchmarks=fillseq --num=10000 --disable_auto_compactions=1 -write_buffer_size=100 --db=../db_bench

Run:./db_bench --enable_blob_files=1 --statistics=1 --benchmarks=compact  --db=../db_bench --use_existing_db=1

pre-PR:
rocksdb.blobdb.blob.file.write.micros P50 : 7.298246 P95 : 9.771930 P99 : 9.991813 P100 : 16.000000 COUNT : 235 SUM : 1600
rocksdb.blobdb.blob.file.synced COUNT : 1
rocksdb.blobdb.blob.file.bytes.written COUNT : 34842

post-PR:
rocksdb.blobdb.blob.file.write.micros P50 : 2.000000 P95 : 2.829360 P99 : 2.993779 P100 : 9.000000 COUNT : 707 SUM : 1614
- COUNT is higher and values are smaller as it includes header and footer write
- COUNT is 3X higher due to each Append() count as one post-PR, while in pre-PR, 3 Append()s counts as one. See https://github.com/facebook/rocksdb/pull/11910/files#diff-32b811c0a1c000768cfb2532052b44dc0b3bf82253f3eab078e15ff201a0dabfL157-L164

rocksdb.blobdb.blob.file.synced COUNT : 1 (stay the same)
rocksdb.blobdb.blob.file.bytes.written COUNT : 34842 (stay the same)
```

```
Stacked Blob DB

Run: ./db_bench --use_blob_db=1 --statistics=1 --benchmarks=fillseq --num=10000 --disable_auto_compactions=1 -write_buffer_size=100 --db=../db_bench

pre-PR:
rocksdb.blobdb.blob.file.write.micros P50 : 12.808042 P95 : 19.674497 P99 : 28.539683 P100 : 51.000000 COUNT : 10000 SUM : 140876
rocksdb.blobdb.blob.file.synced COUNT : 8
rocksdb.blobdb.blob.file.bytes.written COUNT : 1043445

post-PR:
rocksdb.blobdb.blob.file.write.micros P50 : 1.657370 P95 : 2.952175 P99 : 3.877519 P100 : 24.000000 COUNT : 30001 SUM : 67924
- COUNT is higher and values are smaller as it includes header and footer write
- COUNT is 3X higher due to each Append() count as one post-PR, while in pre-PR, 3 Append()s counts as one. See https://github.com/facebook/rocksdb/pull/11910/files#diff-32b811c0a1c000768cfb2532052b44dc0b3bf82253f3eab078e15ff201a0dabfL157-L164

rocksdb.blobdb.blob.file.synced COUNT : 8 (stay the same)
rocksdb.blobdb.blob.file.bytes.written COUNT : 1043445 (stay the same)
```

###  Rehearsal CI stress test
Trigger 3 full runs of all our CI stress tests

###  Performance

Flush
```
TEST_TMPDIR=/dev/shm ./db_basic_bench_pre_pr --benchmark_filter=ManualFlush/key_num:524288/per_key_size:256 --benchmark_repetitions=1000
-- default: 1 thread is used to run benchmark; enable_statistics = true

Pre-pr: avg 507515519.3 ns
497686074,499444327,500862543,501389862,502994471,503744435,504142123,504224056,505724198,506610393,506837742,506955122,507695561,507929036,508307733,508312691,508999120,509963561,510142147,510698091,510743096,510769317,510957074,511053311,511371367,511409911,511432960,511642385,511691964,511730908,

Post-pr: avg 511971266.5 ns, regressed 0.88%
502744835,506502498,507735420,507929724,508313335,509548582,509994942,510107257,510715603,511046955,511352639,511458478,512117521,512317380,512766303,512972652,513059586,513804934,513808980,514059409,514187369,514389494,514447762,514616464,514622882,514641763,514666265,514716377,514990179,515502408,
```

Compaction
```
TEST_TMPDIR=/dev/shm ./db_basic_bench_{pre|post}_pr --benchmark_filter=ManualCompaction/comp_style:0/max_data:134217728/per_key_size:256/enable_statistics:1  --benchmark_repetitions=1000
-- default: 1 thread is used to run benchmark

Pre-pr: avg 495346098.30 ns
492118301,493203526,494201411,494336607,495269217,495404950,496402598,497012157,497358370,498153846

Post-pr: avg 504528077.20, regressed 1.85%. "ManualCompaction" include flush so the isolated regression for compaction should be around 1.85-0.88 = 0.97%
502465338,502485945,502541789,502909283,503438601,504143885,506113087,506629423,507160414,507393007
```

Put with WAL (in case passing WriteOptions slows down this path even without collecting SST write stats)
```
TEST_TMPDIR=/dev/shm ./db_basic_bench_pre_pr --benchmark_filter=DBPut/comp_style:0/max_data:107374182400/per_key_size:256/enable_statistics:1/wal:1  --benchmark_repetitions=1000
-- default: 1 thread is used to run benchmark

Pre-pr: avg 3848.10 ns
3814,3838,3839,3848,3854,3854,3854,3860,3860,3860

Post-pr: avg 3874.20 ns, regressed 0.68%
3863,3867,3871,3874,3875,3877,3877,3877,3880,3881
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11910

Reviewed By: ajkr

Differential Revision: D49788060

Pulled By: hx235

fbshipit-source-id: 79e73699cda5be3b66461687e5147c2484fc5eff
2023-12-29 15:29:23 -08:00
haobo sun 2a8b2df383 Add async_io for Java API (#12184)
Summary:
Fixes https://github.com/facebook/rocksdb/issues/12183

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12184

Reviewed By: hx235

Differential Revision: D52421787

Pulled By: ajkr

fbshipit-source-id: ad3bdae9be51bef5a208b02ceb08f6feb9fac8e4
2023-12-26 14:33:11 -08:00
Nicolas Pepin-Perreault 5b073a7daa Access SST full file checksum via RocksDB#getLiveFilesMetadata (#11770)
Summary:
**Description**

This PR passes along the native `LiveFileMetaData#file_checksum` field from the C++ class to the Java API as a copied byte array. If there is no file checksum generator factory set beforehand, then the array will empty. Please advise if you'd rather it be null - an empty array means one extra allocation, but it avoids possible null pointer exceptions.

> **Note**
> This functionality complements but does not supersede https://github.com/facebook/rocksdb/issues/11736

It's outside the scope here to add support for Java based `FileChecksumGenFactory` implementations. As a workaround, users can already use the built-in one by creating their initial `DBOptions` via properties:

```java
final Properties props = new Properties();
props.put("file_checksum_gen_factory", "FileChecksumGenCrc32cFactory");

try (final DBOptions dbOptions = DBOptions.getDBOptionsFromProps(props);
     final ColumnFamilyOptions cfOptions = new ColumnFamilyOptions();
     final Options options = new Options(dbOptions, cfOptions).setCreateIfMissing(true)) {
// do stuff
}
```

I wanted to add a better test, but unfortunately there's no available CRC32C implementation available in Java 8 without adding a dependency or adding a JNI helper for RocksDB's own implementation (or bumping the minimum version for tests to Java 9). That said, I understand the test is rather poor, so happy to change it to whatever you'd like.

**Context**

To give some context, we replicate RocksDB checkpoints to other nodes. Part of this is verifying the integrity of each file during replication. With a large enough RocksDB, computing the checksum ourselves is prohibitively expensive. Since SST files comprise the bulk of the data, we'd much rather delegate this to RocksDB on file write, and read it back after to compare.

It's likely we will provide a follow up to read the file checksum list directly from the manifest without having to open the DB, but this was the easiest first step to get it working for us.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11770

Reviewed By: hx235

Differential Revision: D52420729

Pulled By: ajkr

fbshipit-source-id: a873de35a48aaf315e125733091cd221a97b9073
2023-12-26 14:02:36 -08:00
Adam Retter d8c1ab8b2d Add Iterator::Refresh(Snapshot*) to RocksJava (#12145)
Summary:
Adds the API to RocksJava.
Also improves the C++ doc for Iterator::Refresh(Snapshot*)
Closes https://github.com/facebook/rocksdb/issues/12095

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12145

Reviewed By: hx235

Differential Revision: D52266452

Pulled By: ajkr

fbshipit-source-id: 6b72b41672081b966b0c5dd07d9bf151ed009122
2023-12-20 18:03:42 -08:00
akankshamahajan 7b24dec25d Fix header files to meet Open source requirements (#12164)
Summary:
Same as title

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12164

Reviewed By: hx235

Differential Revision: D52302234

Pulled By: akankshamahajan15

fbshipit-source-id: d4724fc944c773242788f5a47d1c7eadbbc5a522
2023-12-19 13:43:17 -08:00
Radek Hubner f7486ff6a3 Add deletion-triggered compaction to RocksJava (#12028)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12028

Reviewed By: akankshamahajan15

Differential Revision: D52264983

Pulled By: ajkr

fbshipit-source-id: 02d08015b4bffac06d889dc1be50a51d03f891b3
2023-12-18 13:43:01 -08:00
anand76 cc069f25b3 Add some compressed and tiered secondary cache stats (#12150)
Summary:
Add statistics for more visibility.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12150

Reviewed By: akankshamahajan15

Differential Revision: D52184633

Pulled By: anand1976

fbshipit-source-id: 9969e05d65223811cd12627102b020bb6d229352
2023-12-15 11:34:08 -08:00
Ludovic Henry 5502f06729 Add support for linux-riscv64 (#12139)
Summary:
Following https://github.com/evolvedbinary/docker-rocksjava/pull/2, we can now build rocksdb on riscv64.

I've verified this works as expected with `make rocksdbjavastaticdockerriscv64`.

Also fixes https://github.com/facebook/rocksdb/issues/10500 https://github.com/facebook/rocksdb/issues/11994

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12139

Reviewed By: jaykorean

Differential Revision: D52128098

Pulled By: akankshamahajan15

fbshipit-source-id: 706d36a3f8a9e990b76f426bc450937a0cd1a537
2023-12-14 11:27:17 -08:00
Radek Hubner 5c5e018943 Fix JNI lazy load regression. (#12133)
Summary:
A small regression that conflicted with PR https://github.com/facebook/rocksdb/pull/12133 was later merged in commit 2296c624fa (diff-26d3ab8a3d764183d0ea3aea834fe481eec2347c334b918ebd7bdce4bcabcc19R35)
This PR addresses that regression. Closes https://github.com/facebook/rocksdb/issues/12132

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12133

Reviewed By: jowlyzhang

Differential Revision: D52041736

Pulled By: ltamasi

fbshipit-source-id: 33db57035154c833ae00b5d921b17b3be80c8dd7
2023-12-11 11:21:52 -08:00
Alan Paxton 5a063ecd34 Java API consistency between RocksDB.put() , .merge() and Transaction.put() , .merge() (#11019)
Summary:
### Implement new Java API get()/put()/merge() methods, and transactional variants.

The Java API methods are very inconsistent in terms of how they pass parameters (byte[], ByteBuffer), and what variants and defaulted parameters they support. We try to bring some consistency to this.
 * All APIs should support calls with ByteBuffer parameters.
 * Similar methods (RocksDB.get() vs Transaction.get()) should support as similar as possible sets of parameters for predictability.
 * get()-like methods should provide variants where the caller supplies the target buffer, for the sake of efficiency. Allocation costs in Java can be significant when large buffers are repeatedly allocated and freed.

### API Additions

 1. RockDB.get implement indirect ByteBuffers. Added indirect ByteBuffers and supporting native methods for get().
 2. RocksDB.Iterator implement missing (byte[], offset, length) variants for key() and value() parameters.
 3. Transaction.get() implement missing methods, based on RocksDB.get. Added ByteBuffer.get with and without column family. Added byte[]-as-target get.
 4. Transaction.iterator() implement a getIterator() which defaults ReadOptions; as per RocksDB.iterator(). Rationalize support API for this and RocksDB.iterator()
 5. RocksDB.merge implement ByteBuffer methods; both direct and indirect buffers. Shadow the methods of RocksDB.put; RocksDB.put only offers ByteBuffer API with explicit WriteOptions. Duplicated this with RocksDB.merge
 6. Transaction.merge implement methods as per RocksDB.merge methods. Transaction is already constructed with WriteOptions, so no explicit WriteOptions methods required.
 7. Transaction.mergeUntracked implement the same API methods as Transaction.merge except the ones that use assumeTracked, because that’s not a feature of merge untracked.

### Support Changes (C++)

The current JNI code in C++ supports multiple variants of methods through a number of helper functions. There are numerous TODO suggestions in the code proposing that the helpers be re-factored/shared.

We have taken a different approach for the new methods; we have created wrapper classes `JDirectBufferSlice`, `JDirectBufferPinnableSlice`, `JByteArraySlice` and `JByteArrayPinnableSlice` RAII classes which construct slices from JNI parameters and can then be passed directly to RocksDB methods. For instance, the `Java_org_rocksdb_Transaction_getDirect` method is implemented like this:

```
  try {
    ROCKSDB_NAMESPACE::JDirectBufferSlice key(env, jkey_bb, jkey_off,
                                              jkey_part_len);
    ROCKSDB_NAMESPACE::JDirectBufferPinnableSlice value(env, jval_bb, jval_off,
                                                        jval_part_len);
    ROCKSDB_NAMESPACE::KVException::ThrowOnError(
        env, txn->Get(*read_options, column_family_handle, key.slice(),
                      &value.pinnable_slice()));
    return value.Fetch();
  } catch (const ROCKSDB_NAMESPACE::KVException& e) {
    return e.Code();
  }
```
Notice the try/catch mechanism with the `KVException` class, which combined with RAII and the wrapper classes means that there is no ad-hoc cleanup necessary in the JNI methods.

We propose to extend this mechanism to existing JNI methods as further work.

### Support Changes (Java)

Where there are multiple parameter-variant versions of the same method, we use fewer or just one supporting native method for all of them. This makes maintenance a bit easier and reduces the opportunity for coding errors mixing up (untyped) object handles.

In  order to support this efficiently, some classes need to have default values for column families and read options added and cached so that they are not re-constructed on every method call.

This PR closes https://github.com/facebook/rocksdb/issues/9776

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11019

Reviewed By: ajkr

Differential Revision: D52039446

Pulled By: jowlyzhang

fbshipit-source-id: 45d0140a4887e42134d2e56520e9b8efbd349660
2023-12-11 11:03:17 -08:00
Alexander Kiel 6e7701d49b Fix JavaDoc of setCompactionReadaheadSize (#12090)
Summary:
Recently in https://github.com/facebook/rocksdb/issues/11762 the default of `compaction_readahead_size` changed from 0 to 2 MB.

Closes: https://github.com/facebook/rocksdb/issues/12088

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12090

Reviewed By: jaykorean

Differential Revision: D51531762

Pulled By: ajkr

fbshipit-source-id: a0b7145a1dca95ee90ffa3553f6eeacce6424aee
2023-11-27 11:50:53 -08:00
Changyu Bi b059c5680e Add missing copyright header (#12076)
Summary:
Required for open source repo.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12076

Reviewed By: ajkr

Differential Revision: D51449839

Pulled By: cbi42

fbshipit-source-id: 4a25a3422880db3f28a2834d966341935db32530
2023-11-19 09:50:59 -08:00
Radek Hubner 2f9ea8193f Add HyperClockCache Java API. (#12065)
Summary:
Fix https://github.com/facebook/rocksdb/issues/11510

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12065

Reviewed By: ajkr

Differential Revision: D51406695

Pulled By: cbi42

fbshipit-source-id: b9e32da5f9bcafb5365e4349f7295be90d5aa7ba
2023-11-16 15:46:31 -08:00
Andrew Kryczka 9202db1867 Consider archived WALs for deletion more frequently (#12069)
Summary:
Fixes https://github.com/facebook/rocksdb/issues/11000.

That issue pointed out that RocksDB was slow to delete archived WALs in case time-based and size-based expiration were enabled, and the time-based threshold (`WAL_ttl_seconds`) was small. This PR prevents the delay by taking into account `WAL_ttl_seconds` when deciding the frequency to process archived WALs for deletion.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12069

Reviewed By: pdillinger

Differential Revision: D51262589

Pulled By: ajkr

fbshipit-source-id: e65431a06ee96f4c599ba84a27d1aedebecbb003
2023-11-15 15:42:28 -08:00
Jay Huh 8b8f6c63ef ColumnFamilyHandle Nullcheck in GetEntity and MultiGetEntity (#12057)
Summary:
- Add missing null check for ColumnFamilyHandle in `GetEntity()`
- `FailIfCfHasTs()` now returns `Status::InvalidArgument()` if `column_family` is null. `MultiGetEntity()` can rely on this for cfh null check.
- Added `DeleteRange` API using Default Column Family to be consistent with other major APIs (This was also causing Java Test failure after the `FailIfCfHasTs()` change)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12057

Test Plan:
- Updated `DBWideBasicTest::GetEntityAsPinnableAttributeGroups` to include null CF case
- Updated `DBWideBasicTest::MultiCFMultiGetEntityAsPinnableAttributeGroups` to include null CF case

Reviewed By: jowlyzhang

Differential Revision: D51167445

Pulled By: jaykorean

fbshipit-source-id: 1c1e44fd7b7df4d2dc3bb2d7d251da85bad7d664
2023-11-13 14:30:04 -08:00
Yu Zhang c6c683a0ca Remove the default force behavior for `EnableFileDeletion` API (#12001)
Summary:
Disabling file deletion can be critical for operations like making a backup, recovery from manifest IO error (for now). Ideally as long as there is one caller requesting file deletion disabled, it should be kept disabled until all callers agree to re-enable it. So this PR removes the default forcing behavior for the `EnableFileDeletion` API, and users need to explicitly pass the argument if they insisted on doing so knowing the consequence of what can be potentially disrupted.

This PR removes the API's default argument value so it will cause breakage for all users that are relying on the default value, regardless of whether the forcing behavior is critical for them.  When fixing this breakage, it's good to check if the forcing behavior is indeed needed and potential disruption is OK.

This PR also makes unit test that do not need force behavior to do a regular enable file deletion.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12001

Reviewed By: ajkr

Differential Revision: D51214683

Pulled By: jowlyzhang

fbshipit-source-id: ca7b1ebf15c09eed00f954da2f75c00d2c6a97e4
2023-11-10 14:35:54 -08:00
马越 19768a923a Add jni Support for API CreateColumnFamilyWithImport (#11646)
Summary:
- Add the following missing options to src/main/java/org/rocksdb/ImportColumnFamilyOptions.java and in java/rocksjni/import_column_family_options.cc in RocksJava.
- Add the struct to src/main/java/org/rocksdb/ExportImportFilesMetaData.java and in java/rocksjni/export_import_files_metadatajni.cc in RocksJava.
- Add New Java API `createColumnFamilyWithImport` to src/main/java/org/rocksdb/RocksDB.java
- Add New Java API `exportColumnFamily` to src/main/java/org/rocksdb/Checkpoint.java

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11646

Test Plan:
- added unit tests for exportColumnFamily in org.rocksdb.CheckpointTest
- added unit tests for createColumnFamilyWithImport to org.rocksdb.ImportColumnFamilyTest

Reviewed By: ajkr

Differential Revision: D50889700

Pulled By: cbi42

fbshipit-source-id: d623b35e445bba62a0d3c007d74352e937678f6c
2023-11-06 07:38:42 -08:00
马越 8e1adab5ce add RocksDB#clipColumnFamily to Java API (#11868)
Summary:
### main change:

- add java clipColumnFamily api in Rocksdb.java
The method signature of the new API is
 ```
 public void clipColumnFamily(final ColumnFamilyHandle columnFamilyHandle, final byte[] beginKey,
      final byte[] endKey)
```
### Test
add unit test RocksDBTest#clipColumnFamily()

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11868

Reviewed By: jaykorean

Differential Revision: D50889783

Pulled By: cbi42

fbshipit-source-id: 7f545171ad9adb9c20bdd92efae2e6bc55d5703f
2023-11-02 08:00:08 -07:00
Radek Hubner b3fd3838d4 Remove build dependencies for java tests. (#12021)
Summary:
Final fix for https://github.com/facebook/rocksdb/issues/12013

- Reverting back changes on CirleCI explicit image declaration.
- Removed CMake dependencies between java classed and java test classes.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12021

Reviewed By: akankshamahajan15

Differential Revision: D50745392

Pulled By: ajkr

fbshipit-source-id: 6a7a1da1e7e4da8da72130c9272915974e10fffc
2023-10-30 10:08:19 -07:00
anand76 e81393e81e Add some stats to observe the usefulness of scan prefetching (#11981)
Summary:
Add stats for better observability of scan prefetching. Its only implemented for sync scan right now. These stats can help inform future improvements in scan prefetching.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11981

Test Plan: Add a new unit test

Reviewed By: akankshamahajan15

Differential Revision: D50516505

Pulled By: anand1976

fbshipit-source-id: cb1cc6cf02df8295930a49c62b11870020df3f97
2023-10-23 14:42:44 -07:00
Jay Huh 2e514e4b4a Fix copyright header (#11986)
Summary:
Add missing copyright header

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11986

Test Plan: N/A

Reviewed By: hx235

Differential Revision: D50461904

Pulled By: jaykorean

fbshipit-source-id: b1b2704890f4a0bb3c9b464b01468e85a5365a8e
2023-10-19 10:30:54 -07:00
Hui Xiao 0836a2b26d New tickers on deletion compactions grouped by reasons (#11957)
Summary:
Context/Summary: as titled

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11957

Test Plan: piggyback on existing tests; fixed a failed test due to adding new stats

Reviewed By: ajkr, cbi42

Differential Revision: D50294310

Pulled By: hx235

fbshipit-source-id: d99b97ebac41efc1bdeaf9ca7a1debd2927d54cd
2023-10-18 18:00:07 -07:00
Radek Hubner 55590012ae Add RocksJava tests to CMake (#11756)
Summary:
Refactor CMake build to allow run java tests via CMake, including Windows.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11756

Reviewed By: hx235

Differential Revision: D50370739

Pulled By: akankshamahajan15

fbshipit-source-id: ae05cc08a0f9bb2a0a4f1ece02c523fb465bb817
2023-10-18 12:51:50 -07:00
Radek Hubner a80e3f6c57 Add keyExists Java API (#11705)
Summary:
Add a new method to check if a key exists in the database. It avoids copying data between C++ and Java code.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11705

Reviewed By: ajkr

Differential Revision: D50370934

Pulled By: akankshamahajan15

fbshipit-source-id: ab2d42213fbebcaff919b0ffbbef9d45e88ca365
2023-10-18 12:46:35 -07:00
Radek Hubner 0bb3a26d89 Lazy load native library in Statistics constructor. (#11953)
Summary:
Should fix https://github.com/facebook/rocksdb/issues/9667 and follow the same appropoach as https://github.com/facebook/rocksdb/issues/11919

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11953

Reviewed By: hx235

Differential Revision: D50307456

Pulled By: ajkr

fbshipit-source-id: 43e7e671e8b04875185b38284cefd4c3e11981fa
2023-10-18 11:01:04 -07:00
Alan Paxton 2296c624fa Perform java static checks in CI (#11221)
Summary:
Integrate pmd on the Java API to catch and report common Java coding problems; fix or suppress a basic set of PMD checks.

Link pmd into java build / CI

Add a pmd dependency to maven
Add a jpmd target to Makefile which runs pmd
Add a workflow to Circle CI which runs pmd
Configure an initial default pmd for CI

Repair lots of very simple PMD reports generated when we apply pmd-rules.xml

Repair or exception for PMD rules in the CloseResource category, which finds unclosed AutoCloseable resources.
We special-case the configuration of CloseResource and use the // NOPMD comment in source the avoid reports where we are the API creating an AutoCloseable, and therefore returning an unclosed resource is correct behaviour.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11221

Reviewed By: akankshamahajan15

Differential Revision: D50369930

Pulled By: jowlyzhang

fbshipit-source-id: a41c36b44b3bab7644df3e9cc16afbdf33b84f6b
2023-10-17 10:04:35 -07:00
Alan Paxton b2fe14817e java API - load block based table config (#10826)
Summary:
Closes https://github.com/facebook/rocksdb/issues/5297

The BlockBasedTableConfig (or more generally, the TableFormatConfig) of ColumnFamilyOptions, isn't being constructed when column family options are loaded. This happens in `OptionsUtil` which implements the loading.

In `OptionsUtil` we add the method `private native static TableFormatConfig readTableFormatConfig(final long nativeHandle_)` which defers to a JNI method which creates a `TableFormatConfig` (specifically a `BlockBasedTableConfig`) for the supplied `ColumnFamilyOptions`, by copying the table format attached to the C++ column family options. A new Java constructor for `BlockBasedTableConfig` is implemented which is called from C++ with the parameters retrieved from the table format, and then returned to the calling `readTableFormatConfig`.

At the Java side in `OptionsUtil`, the new `TableFormatConfig` is added as the `tableFormatConfig_` field of the `ColumnFamilyOptions`.

To support this, the new class `BlockBasedTableOptionsJni` and associated support methods are added to 'portal.h'.

`BloomFilter.java` has a constructor and field added so that the filter in use can be read back and inspected.

`FilterPolicyType.java` implements an enum (shadowed in C++) to support transfer of filter policy information back to Java from being read at the C++ side.

 Tests written to cover the block based table config, and cleaned up and generalised a bit as some of the methods on OptionsUtil weren't tested; and these had their own unique JNI method variants which in turn were never exercised in test.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10826

Reviewed By: ajkr

Differential Revision: D50136247

Pulled By: jowlyzhang

fbshipit-source-id: 39387448147abc574e99f43979d89b0900e5f81d
2023-10-12 09:39:01 -07:00
Levi Tamasi 5e2906c288 Add missing copyright headers to files added in PR 11805 (#11942)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/11942

Reviewed By: akankshamahajan15

Differential Revision: D50188986

fbshipit-source-id: 56a8e72eac085470824e33f2126d2a6ec3880400
2023-10-11 12:56:39 -07:00
Radek Hubner 98ab2d80fa Add PerfContext API in Java (#11805)
Summary:
This PR expose RocksDB C++ API for performance measurement in Java.
It's initial implementation and it doesn't support ```level_to_perf_context```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11805

Reviewed By: akankshamahajan15

Differential Revision: D50128356

Pulled By: ltamasi

fbshipit-source-id: afb35980a89129a30d4a6b4cce12352c9de186b6
2023-10-10 11:07:33 -07:00
Radek Hubner f1aa17c73f Lazy load java native library (#11919)
Summary:
This address https://github.com/facebook/rocksdb/issues/11277. Java native library is not anymore loaded until the code is first used.
It should allow to manually load native library from different location with `RocksDB#loadLibrary(List<java.lang.String>)`

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11919

Reviewed By: jaykorean

Differential Revision: D50103182

Pulled By: ltamasi

fbshipit-source-id: 6090b529c7299b032f4e93cd0c3025a60f58652f
2023-10-10 08:40:07 -07:00
akankshamahajan f65a0379f0 Implement trimming of readhead size when upper bound is specified (#11684)
Summary:
Implement trimming of readahead_size under a new option ReadOptions.auto_readahead_size. It'll trim the readahead_size during prefetching upto iterate_upper_bound offset only when ReadOptions.iterate_upper_bound is set, therefore reducing the prefetching of data beyond upper_bound.
It's enabled for both implicit auto readahead size and when ReadOptions.readahead_size is specified and for sync and async_io.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11684

Test Plan: Added new unit test

Reviewed By: anand1976

Differential Revision: D48479723

Pulled By: akankshamahajan15

fbshipit-source-id: 2b1703579caf779105e836b580866ffd7db076fc
2023-08-18 15:52:04 -07:00
Hui Xiao 9a034801ce Group rocksdb.sst.read.micros stat by different user read IOActivity + misc (#11444)
Summary:
**Context/Summary:**
- Similar to https://github.com/facebook/rocksdb/pull/11288 but for user read such as `Get(), MultiGet(), DBIterator::XXX(), Verify(File)Checksum()`.
   - For this, I refactored some user-facing `MultiGet` calls in `TransactionBase` and various types of `DB` so that it does not call a user-facing `Get()` but `GetImpl()` for passing the `ReadOptions::io_activity` check (see PR conversation)
   - New user read stats breakdown are guarded by `kExceptDetailedTimers` since measurement shows they have 4-5% regression to the upstream/main.

- Misc
   - More refactoring: with https://github.com/facebook/rocksdb/pull/11288, we complete passing `ReadOptions/IOOptions` to FS level. So we can now replace the previously [added](https://github.com/facebook/rocksdb/pull/9424) `rate_limiter_priority` parameter in `RandomAccessFileReader`'s `Read/MultiRead/Prefetch()` with `IOOptions::rate_limiter_priority`
   - Also, `ReadAsync()` call time is measured in `SST_READ_MICRO` now

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11444

Test Plan:
- CI fake db crash/stress test
- Microbenchmarking

**Build** `make clean && ROCKSDB_NO_FBCODE=1 DEBUG_LEVEL=0 make -jN db_basic_bench`
- google benchmark version: 604f6fd3f4
- db_basic_bench_base: upstream
- db_basic_bench_pr: db_basic_bench_base + this PR
- asyncread_db_basic_bench_base: upstream + [db basic bench patch for IteratorNext](https://github.com/facebook/rocksdb/compare/main...hx235:rocksdb:micro_bench_async_read)
- asyncread_db_basic_bench_pr: asyncread_db_basic_bench_base + this PR

**Test**

Get
```
TEST_TMPDIR=/dev/shm ./db_basic_bench_{null_stat|base|pr} --benchmark_filter=DBGet/comp_style:0/max_data:134217728/per_key_size:256/enable_statistics:1/negative_query:0/enable_filter:0/mmap:1/threads:1 --benchmark_repetitions=1000
```

Result
```
Coming soon
```

AsyncRead
```
TEST_TMPDIR=/dev/shm ./asyncread_db_basic_bench_{base|pr} --benchmark_filter=IteratorNext/comp_style:0/max_data:134217728/per_key_size:256/enable_statistics:1/async_io:1/include_detailed_timers:0 --benchmark_repetitions=1000 > syncread_db_basic_bench_{base|pr}.out
```

Result
```
Base:
1956,1956,1968,1977,1979,1986,1988,1988,1988,1990,1991,1991,1993,1993,1993,1993,1994,1996,1997,1997,1997,1998,1999,2001,2001,2002,2004,2007,2007,2008,

PR (2.3% regression, due to measuring `SST_READ_MICRO` that wasn't measured before):
1993,2014,2016,2022,2024,2027,2027,2028,2028,2030,2031,2031,2032,2032,2038,2039,2042,2044,2044,2047,2047,2047,2048,2049,2050,2052,2052,2052,2053,2053,
```

Reviewed By: ajkr

Differential Revision: D45918925

Pulled By: hx235

fbshipit-source-id: 58a54560d9ebeb3a59b6d807639692614dad058a
2023-08-08 17:26:50 -07:00
Hui Xiao 09882a52d6 Prepare for deprecation of Options::access_hint_on_compaction_start (#11658)
Summary:
**Context/Summary:**
After https://github.com/facebook/rocksdb/pull/11631, file hint is not longer needed for compaction read. Therefore we can deprecate `Options::access_hint_on_compaction_start`. As this is a public API change, we should first mark the relevant APIs (including the Java's) deprecated and remove it in next major release 9.0.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11658

Test Plan: No code change

Reviewed By: ajkr

Differential Revision: D47997856

Pulled By: hx235

fbshipit-source-id: 16e015ae7728c224b1caef73143aa9915668f4ac
2023-08-03 17:23:02 -07:00
Vardhan 87a21d08fe Add an option to trigger flush when the number of range deletions reach a threshold (#11358)
Summary:
Add a mutable column family option `memtable_max_range_deletions`. When non-zero, RocksDB will try to flush the current memtable after it has at least `memtable_max_range_deletions` range deletions. Java API is added and crash test is updated accordingly to randomly enable this option.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11358

Test Plan:
* New unit test: `DBRangeDelTest.MemtableMaxRangeDeletions`
* Ran crash test `python3 ./tools/db_crashtest.py whitebox --simple --memtable_max_range_deletions=20` and saw logs showing flushed memtables usually with 20 range deletions.

Reviewed By: ajkr

Differential Revision: D46582680

Pulled By: cbi42

fbshipit-source-id: f23d6fa8d8264ecf0a18d55c113ba03f5e2504da
2023-08-02 19:58:56 -07:00
Yu Zhang c24ef26ca7 Support switching on / off UDT together with in-Memtable-only feature (#11623)
Summary:
Add support to allow enabling / disabling user-defined timestamps feature for an existing column family in combination with the in-Memtable only feature.

To do this, this PR includes:
1) Log the `persist_user_defined_timestamps` option per column family in Manifest to facilitate detecting an attempt to enable / disable UDT. This entry is enforced to be logged in the same VersionEdit as the user comparator name entry.

2) User-defined timestamps related options are validated when re-opening a column family, including user comparator name and the `persist_user_defined_timestamps` flag. These type of settings and settings change are considered valid:
     a) no user comparator change and no effective `persist_user_defined_timestamp` flag change.
     b) switch user comparator to enable UDT provided the immediately effective `persist_user_defined_timestamps` flag
         is false.
     c) switch user comparator to disable UDT provided that the before-change `persist_user_defined_timestamps` is
         already false.
3) when an attempt to enable UDT is detected, we mark all its existing SST files as "having no UDT" by marking its `FileMetaData.user_defined_timestamps_persisted` flag to false and handle their file boundaries `FileMetaData.smallest`, `FileMetaData.largest` by padding a min timestamp.

4) while enabling / disabling UDT feature, timestamp size inconsistency in existing WAL logs are handled to make it compatible with the running user comparator.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11623

Test Plan:
```
make all check
./db_with_timestamp_basic_test --gtest-filter="*EnableDisableUDT*"
./db_wal_test --gtest_filter="*EnableDisableUDT*"
```

Reviewed By: ltamasi

Differential Revision: D47636862

Pulled By: jowlyzhang

fbshipit-source-id: dcd19f67292da3c3cc9584c09ad00331c9ab9322
2023-07-26 20:16:32 -07:00
Changyu Bi bc04ec85db Make option `level_compaction_dynamic_level_bytes` true by default (#11525)
Summary:
after https://github.com/facebook/rocksdb/issues/11321 and https://github.com/facebook/rocksdb/issues/11340 (both included in RocksDB v8.2), migration from `level_compaction_dynamic_level_bytes=false` to `level_compaction_dynamic_level_bytes=true` is automatic by RocksDB and requires no manual compaction from user. Making the option true by default as it has several advantages: 1. better space amplification guarantee (a more stable LSM shape). 2. compaction is more adaptive to write traffic. 3. automatic draining of unneeded levels. Wiki is updated with more detail: https://github.com/facebook/rocksdb/wiki/Leveled-Compaction#option-level_compaction_dynamic_level_bytes-and-levels-target-size.

The PR mostly contains fixes for unit tests as they assumed `level_compaction_dynamic_level_bytes=false`. Most notable change is commit f742be330c and b1928e42b3 which override the default option in DBTestBase to still set `level_compaction_dynamic_level_bytes=false` by default. This helps to reduce the change needed for unit tests. I think this default option override in unit tests is okay since the behavior of `level_compaction_dynamic_level_bytes=true` is tested by explicitly setting this option. Also, `level_compaction_dynamic_level_bytes=false` may be more desired in unit tests as it makes it easier to create a desired LSM shape.

Comment for option `level_compaction_dynamic_level_bytes` is updated to reflect this change and change made in https://github.com/facebook/rocksdb/issues/10057.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11525

Test Plan: `make -j32 J=32 check` several times to try to catch flaky tests due to this option change.

Reviewed By: ajkr

Differential Revision: D46654256

Pulled By: cbi42

fbshipit-source-id: 6b5827dae124f6f1fdc8cca2ac6f6fcd878830e1
2023-06-15 21:12:39 -07:00
Alan Paxton 93e0715fad Implement missing compactrangeoptions from Java API (#10880)
Summary:
Add the following missing options to `src/main/java/org/rocksdb/CompactRangeOptions.java` and in `java/rocksjni/options.cc` in RocksJava.

For the descriptions and API see the C++ file `include/rocksdb/options.h`, specifically the struct `CompactRangeOptions`

* full_history_ts_low
* canceled

We changed the handle to return an object (of class `Java_org_rocksdb_CompactRangeOptions`) containing a `ROCKSDB_NAMESPACE::CompactRangeOptions` at (almost certainly) 0-offset, rather than a raw `ROCKSDB_NAMESPACE::CompactRangeOptions`.

The `Java_org_rocksdb_CompactRangeOptions` contains as supplementary fields objects (std::string, std::atomic<bool>) which are passed as pointers to the `ROCKSDB_NAMESPACE::CompactRangeOptions` and which must therefore live for as long as the `ROCKSDB_NAMESPACE::CompactRangeOptions`. By placing them in a `Java_org_rocksdb_CompactRangeOptions` we achieve this.

Because the field offset of the `ROCKSDB_NAMESPACE::CompactRangeOptions` member is (very probably) 0, casting the handle to ROCKSDB_NAMESPACE::CompactRangeOptions works (i.e. old methods didn’t have to be changed), but really that’s a minefield and the correct answer is to cast to the correct type (Java_org_rocksdb_CompactRangeOptions) and then use the ROCKSDB_NAMESPACE::CompactRangeOptions field in that. So the get/set methods for existing parameters have this change.

Testing
-------
We added unit tests for getting and setting the newly implemented fields to `CompactRangeOptionsTest`

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10880

Reviewed By: ajkr

Differential Revision: D41482476

Pulled By: anand1976

fbshipit-source-id: c70795e790436fb3544655920adf6fca62ed34e2
2023-05-24 11:04:46 -07:00
Hui Xiao 50046869a4 Add `rocksdb.file.read.db.open.micros` (#11455)
Summary:
**Context/Summary:**
`rocksdb.file.read.db.open.micros` is left out in https://github.com/facebook/rocksdb/pull/11288

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11455

Test Plan:
- db bench
Setup: `./db_bench -db=/dev/shm/testdb/ -statistics=true -benchmarks="fillseq" -key_size=32 -value_size=512 -num=5000 -write_buffer_size=655 -target_file_size_base=655 -disable_auto_compactions=false -compression_type=none -bloom_bits=3`
Run:
`./db_bench --bloom_bits=3 --use_existing_db=1 --seed=1682546046158958 --partition_index_and_filters=1 --statistics=1 -db=/dev/shm/testdb/  -benchmarks=readrandom  -key_size=3200 -value_size=512 -num=0 -write_buffer_size=6550000 -disable_auto_compactions=false -target_file_size_base=6550000 -compression_type=none -file_checksum=1 -cache_size=1`

```
rocksdb.sst.read.micros P50 : 3.979798 P95 : 9.738420 P99 : 19.566667 P100 : 39.000000 COUNT : 2360 SUM : 12148
rocksdb.file.read.flush.micros P50 : 0.000000 P95 : 0.000000 P99 : 0.000000 P100 : 0.000000 COUNT : 0 SUM : 0
rocksdb.file.read.compaction.micros P50 : 0.000000 P95 : 0.000000 P99 : 0.000000 P100 : 0.000000 COUNT : 0 SUM : 0
rocksdb.file.read.db.open.micros P50 : 3.979798 P95 : 9.738420 P99 : 19.566667 P100 : 39.000000 COUNT : 2360 SUM : 12148
```

Reviewed By: ajkr

Differential Revision: D45951934

Pulled By: hx235

fbshipit-source-id: 6c88639dc1b10d98ecccc963ce32a8800495f55b
2023-05-18 09:26:29 -07:00
Alan Paxton e110d713e0 Minimal RocksJava compliance with Java 8 language level (EB 1046) (#10951)
Summary:
Apply a small (and automatic) set of IntelliJ Java inspections/repairs to the Java interface to RocksDB Java and its tests.
Partly enabled by the fact that we now (from RocksDB7) require java 8.

Explicit <p> in empty lines in javadoc comments.

Parameters and variables made final where possible.
Anonymous subclasses converted lambdas.

Some tests which previously used other assertion models were converted to assertj, e.g. (assertThat(actual).isEqualTo(expected)

In a very few cases tests were found to be inoperative or broken, and were repaired. No problems with actual RocksDB behaviour were observed.

This PR is intended to replace https://github.com/facebook/rocksdb/pull/9618 - that PR was not merged, and attempts to rebase it have yielded a questionable looking diff, so we choose to go back to square 1 here, and implement a conservative set of changes.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10951

Reviewed By: anand1976

Differential Revision: D45057849

Pulled By: ajkr

fbshipit-source-id: e4ea46bfc80518ae86f37702b03ca9352bc11c3d
2023-05-17 19:44:24 -07:00
Peter Dillinger 206fdea3d9 Change internal headers with duplicate names (#11408)
Summary:
In IDE navigation I find it annoying that there are two statistics.h files (etc.) and often land on the wrong one. Here I migrate several headers to use the blah.h <- blah_impl.h <- blah.cc idiom. Although clang-format wants "blah.h" to be the top include for "blah.cc", I think overall this is an improvement.

No public API changes.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11408

Test Plan: existing tests

Reviewed By: ltamasi

Differential Revision: D45456696

Pulled By: pdillinger

fbshipit-source-id: 809d931253f3272c908cf5facf7e1d32fc507373
2023-05-17 11:27:09 -07:00
Andrew Kryczka 113f3250f1 Add block checksum mismatch ticker stat (#11438)
Summary:
Added a ticker stat, `BLOCK_CHECKSUM_MISMATCH_COUNT`, to count how many block checksum verifications detected a mismatch.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11438

Test Plan: new unit test

Reviewed By: pdillinger

Differential Revision: D45788179

Pulled By: ajkr

fbshipit-source-id: e2b44eba7c23b3e110ebe69eaa78a710dec2590f
2023-05-12 18:16:11 -07:00
Changyu Bi 8827cd0618 Support compacting files to different temperatures in FIFO compaction (#11428)
Summary:
- Add a new option `CompactionOptionsFIFO::file_temperature_age_thresholds` that allows user to specify age thresholds for compacting files to different temperatures. File temperature can be used to store files in different storage media. The new options allows specifying multiple temperature-age pairs. The option uses struct for a temperature-age pair to use the existing parsing functionality to make the option dynamically settable.
- Deprecate the old option `age_for_warm` that was added for a similar purpose.
- Compaction score calculation logic is updated to check if a file needs to be compacted to change its temperature.
- Some refactoring is done in `FIFOCompactionPicker::PickTemperatureChangeCompaction`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11428

Test Plan: adapted unit tests that were for `age_for_warm` to this new option.

Reviewed By: ajkr

Differential Revision: D45611412

Pulled By: cbi42

fbshipit-source-id: 2dc384841f61cc04abb9681e31aa2de0f0b06106
2023-05-11 16:40:59 -07:00
Hui Xiao 151242ce46 Group rocksdb.sst.read.micros stat by IOActivity flush and compaction (#11288)
Summary:
**Context:**
The existing stat rocksdb.sst.read.micros does not reflect each of compaction and flush cases but aggregate them, which is not so helpful for us to understand IO read behavior of each of them.

**Summary**
- Update `StopWatch` and `RandomAccessFileReader` to record `rocksdb.sst.read.micros` and `rocksdb.file.{flush/compaction}.read.micros`
   - Fixed the default histogram in `RandomAccessFileReader`
- New field `ReadOptions/IOOptions::io_activity`; Pass `ReadOptions` through paths under db open, flush and compaction to where we can prepare `IOOptions` and pass it to `RandomAccessFileReader`
- Use `thread_status_util` for assertion in `DbStressFSWrapper` for continuous testing on we are passing correct `io_activity` under db open, flush and compaction

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11288

Test Plan:
- **Stress test**
- **Db bench 1: rocksdb.sst.read.micros COUNT ≈ sum of rocksdb.file.read.flush.micros's and rocksdb.file.read.compaction.micros's.**  (without blob)
     - May not be exactly the same due to `HistogramStat::Add` only guarantees atomic not accuracy across threads.
```
./db_bench -db=/dev/shm/testdb/ -statistics=true -benchmarks="fillseq" -key_size=32 -value_size=512 -num=50000 -write_buffer_size=655 -target_file_size_base=655 -disable_auto_compactions=false -compression_type=none -bloom_bits=3 (-use_plain_table=1 -prefix_size=10)
```
```
// BlockBasedTable
rocksdb.sst.read.micros P50 : 2.009374 P95 : 4.968548 P99 : 8.110362 P100 : 43.000000 COUNT : 40456 SUM : 114805
rocksdb.file.read.flush.micros P50 : 1.871841 P95 : 3.872407 P99 : 5.540541 P100 : 43.000000 COUNT : 2250 SUM : 6116
rocksdb.file.read.compaction.micros P50 : 2.023109 P95 : 5.029149 P99 : 8.196910 P100 : 26.000000 COUNT : 38206 SUM : 108689

// PlainTable
Does not apply
```
- **Db bench 2: performance**

**Read**

SETUP: db with 900 files
```
./db_bench -db=/dev/shm/testdb/ -benchmarks="fillseq" -key_size=32 -value_size=512 -num=50000 -write_buffer_size=655  -disable_auto_compactions=true -target_file_size_base=655 -compression_type=none
```run till convergence
```
./db_bench -seed=1678564177044286 -use_existing_db=true -db=/dev/shm/testdb -benchmarks=readrandom[-X60] -statistics=true -num=1000000 -disable_auto_compactions=true -compression_type=none -bloom_bits=3
```
Pre-change
`readrandom [AVG 60 runs] : 21568 (± 248) ops/sec`
Post-change (no regression, -0.3%)
`readrandom [AVG 60 runs] : 21486 (± 236) ops/sec`

**Compaction/Flush**run till convergence
```
./db_bench -db=/dev/shm/testdb2/ -seed=1678564177044286 -benchmarks="fillseq[-X60]" -key_size=32 -value_size=512 -num=50000 -write_buffer_size=655  -disable_auto_compactions=false -target_file_size_base=655 -compression_type=none

rocksdb.sst.read.micros  COUNT : 33820
rocksdb.sst.read.flush.micros COUNT : 1800
rocksdb.sst.read.compaction.micros COUNT : 32020
```
Pre-change
`fillseq [AVG 46 runs] : 1391 (± 214) ops/sec;    0.7 (± 0.1) MB/sec`

Post-change (no regression, ~-0.4%)
`fillseq [AVG 46 runs] : 1385 (± 216) ops/sec;    0.7 (± 0.1) MB/sec`

Reviewed By: ajkr

Differential Revision: D44007011

Pulled By: hx235

fbshipit-source-id: a54c89e4846dfc9a135389edf3f3eedfea257132
2023-04-21 09:07:18 -07:00
Andrew Kryczka bd80433c73 Set -source 8 in CMAKE_JAVA_COMPILE_FLAGS (#11385)
Summary:
build-windows-vs2022 jobs (e.g., https://app.circleci.com/pipelines/github/facebook/rocksdb/26641/workflows/7d1c58b8-7dd6-4dd6-a222-ecdfb0892c3b/jobs/593583) began failing with:

```
       (CustomBuild target) ->
         CUSTOMBUILD : error : Source option 7 is no longer supported. Use 8 or later. [C:\Users\circleci.PACKER-64370BA5\project\build\java\rocksdbjni_classes.vcxproj]
```

So, this PR tries setting `-source 8` instead.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11385

Reviewed By: ltamasi

Differential Revision: D45058172

Pulled By: ajkr

fbshipit-source-id: b0daa1ea6f576c8417add40bd6c92710d329c44d
2023-04-18 11:15:25 -07:00
Hui Xiao cb58477185 New stat rocksdb.{cf|db}-write-stall-stats exposed in a structural way (#11300)
Summary:
**Context/Summary:**
Users are interested in figuring out what has caused write stall.
- Refactor write stall related stats from property `kCFStats` into its own db property `rocksdb.cf-write-stall-stats` as a map or string. For now, this only contains count of different combination of (CF-scope `WriteStallCause`) + (`WriteStallCondition`)
- Add new `WriteStallCause::kWriteBufferManagerLimit` to reflect write stall caused by write buffer manager
- Add new `rocksdb.db-write-stall-stats`. For now, this only contains `WriteStallCause::kWriteBufferManagerLimit` + `WriteStallCondition::kStopped`

- Expose functions in new class `WriteStallStatsMapKeys` for examining the above two properties returned as map
- Misc: rename/comment some write stall InternalStats for clarity

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11300

Test Plan:
- New UT
- Stress test
`python3 tools/db_crashtest.py blackbox --simple --get_property_one_in=1`
- Perf test: Both converge very slowly at similar rates but post-change has higher average ops/sec than pre-change even though they are run at the same time.
```
./db_bench -seed=1679014417652004 -db=/dev/shm/testdb/ -statistics=false -benchmarks="fillseq[-X60]" -key_size=32 -value_size=512 -num=100000 -db_write_buffer_size=655 -target_file_size_base=655 -disable_auto_compactions=false -compression_type=none -bloom_bits=3
```
pre-change:
```
fillseq [AVG 15 runs] : 1176 (± 732) ops/sec;    0.6 (± 0.4) MB/sec
fillseq      :    1052.671 micros/op 949 ops/sec 105.267 seconds 100000 operations;    0.5 MB/s
fillseq [AVG 16 runs] : 1162 (± 685) ops/sec;    0.6 (± 0.4) MB/sec
fillseq      :    1387.330 micros/op 720 ops/sec 138.733 seconds 100000 operations;    0.4 MB/s
fillseq [AVG 17 runs] : 1136 (± 646) ops/sec;    0.6 (± 0.3) MB/sec
fillseq      :    1232.011 micros/op 811 ops/sec 123.201 seconds 100000 operations;    0.4 MB/s
fillseq [AVG 18 runs] : 1118 (± 610) ops/sec;    0.6 (± 0.3) MB/sec
fillseq      :    1282.567 micros/op 779 ops/sec 128.257 seconds 100000 operations;    0.4 MB/s
fillseq [AVG 19 runs] : 1100 (± 578) ops/sec;    0.6 (± 0.3) MB/sec
fillseq      :    1914.336 micros/op 522 ops/sec 191.434 seconds 100000 operations;    0.3 MB/s
fillseq [AVG 20 runs] : 1071 (± 551) ops/sec;    0.6 (± 0.3) MB/sec
fillseq      :    1227.510 micros/op 814 ops/sec 122.751 seconds 100000 operations;    0.4 MB/s
fillseq [AVG 21 runs] : 1059 (± 525) ops/sec;    0.5 (± 0.3) MB/sec
```
post-change:
```
fillseq [AVG 15 runs] : 1226 (± 732) ops/sec;    0.6 (± 0.4) MB/sec
fillseq      :    1323.825 micros/op 755 ops/sec 132.383 seconds 100000 operations;    0.4 MB/s
fillseq [AVG 16 runs] : 1196 (± 687) ops/sec;    0.6 (± 0.4) MB/sec
fillseq      :    1223.905 micros/op 817 ops/sec 122.391 seconds 100000 operations;    0.4 MB/s
fillseq [AVG 17 runs] : 1174 (± 647) ops/sec;    0.6 (± 0.3) MB/sec
fillseq      :    1168.996 micros/op 855 ops/sec 116.900 seconds 100000 operations;    0.4 MB/s
fillseq [AVG 18 runs] : 1156 (± 611) ops/sec;    0.6 (± 0.3) MB/sec
fillseq      :    1348.729 micros/op 741 ops/sec 134.873 seconds 100000 operations;    0.4 MB/s
fillseq [AVG 19 runs] : 1134 (± 579) ops/sec;    0.6 (± 0.3) MB/sec
fillseq      :    1196.887 micros/op 835 ops/sec 119.689 seconds 100000 operations;    0.4 MB/s
fillseq [AVG 20 runs] : 1119 (± 550) ops/sec;    0.6 (± 0.3) MB/sec
fillseq      :    1193.697 micros/op 837 ops/sec 119.370 seconds 100000 operations;    0.4 MB/s
fillseq [AVG 21 runs] : 1106 (± 524) ops/sec;    0.6 (± 0.3) MB/sec
```

Reviewed By: ajkr

Differential Revision: D44159541

Pulled By: hx235

fbshipit-source-id: 8d29efb70001fdc52d34535eeb3364fc3e71e40b
2023-03-18 09:51:58 -07:00
Hui Xiao bab5f9a6f2 Add new stat rocksdb.table.open.prefetch.tail.read.bytes, rocksdb.table.open.prefetch.tail.{miss|hit} (#11265)
Summary:
**Context/Summary:**
We are adding new stats to measure behavior of prefetched tail size and look up into this buffer

The stat collection is done in FilePrefetchBuffer but only for prefetched tail buffer during table open for now using FilePrefetchBuffer enum. It's cleaner than the alternative of implementing in upper-level call places of FilePrefetchBuffer for table open. It also has the benefit of extensible to other types of FilePrefetchBuffer if needed. See db bench for perf regression concern.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11265

Test Plan:
**- Piggyback on existing test**
**- rocksdb.table.open.prefetch.tail.miss is harder to UT so I manually set prefetch tail read bytes to be small and run db bench.**
```
./db_bench -db=/tmp/testdb -statistics=true -benchmarks="fillseq" -key_size=32 -value_size=512 -num=5000 -write_buffer_size=655 -target_file_size_base=655 -disable_auto_compactions=false -compression_type=none -bloom_bits=3  -use_direct_reads=true
```
```
rocksdb.table.open.prefetch.tail.read.bytes P50 : 4096.000000 P95 : 4096.000000 P99 : 4096.000000 P100 : 4096.000000 COUNT : 225 SUM : 921600
rocksdb.table.open.prefetch.tail.miss COUNT : 91
rocksdb.table.open.prefetch.tail.hit COUNT : 1034
```
**- No perf regression observed in db_bench**

SETUP command: create same db with ~900 files for pre-change/post-change.
```
./db_bench -db=/tmp/testdb -benchmarks="fillseq" -key_size=32 -value_size=512 -num=500000 -write_buffer_size=655360  -disable_auto_compactions=true -target_file_size_base=16777216 -compression_type=none
```
TEST command 60 runs or til convergence: as suggested by anand1976 and akankshamahajan15, vary `seek_nexts` and `async_io` in testing.
```
./db_bench -use_existing_db=true -db=/tmp/testdb -statistics=false -cache_size=0 -cache_index_and_filter_blocks=false -benchmarks=seekrandom[-X60] -num=50000 -seek_nexts={10, 500, 1000} -async_io={0|1} -use_direct_reads=true
```
async io = 0, direct io read = true

  | seek_nexts = 10, 30 runs | seek_nexts = 500, 12 runs | seek_nexts = 1000, 6 runs
-- | -- | -- | --
pre-post change | 4776 (± 28) ops/sec;   24.8 (± 0.1) MB/sec | 288 (± 1) ops/sec;   74.8 (± 0.4) MB/sec | 145 (± 4) ops/sec;   75.6 (± 2.2) MB/sec
post-change | 4790 (± 32) ops/sec;   24.9 (± 0.2) MB/sec | 288 (± 3) ops/sec;   74.7 (± 0.8) MB/sec | 143 (± 3) ops/sec;   74.5 (± 1.6) MB/sec

async io = 1, direct io read = true
  | seek_nexts = 10, 54 runs | seek_nexts = 500, 6 runs | seek_nexts = 1000, 4 runs
-- | -- | -- | --
pre-post change | 3350 (± 36) ops/sec;   17.4 (± 0.2) MB/sec | 264 (± 0) ops/sec;   68.7 (± 0.2) MB/sec | 138 (± 1) ops/sec;   71.8 (± 1.0) MB/sec
post-change | 3358 (± 27) ops/sec;   17.4 (± 0.1) MB/sec  | 263 (± 2) ops/sec;   68.3 (± 0.8) MB/sec | 139 (± 1) ops/sec;   72.6 (± 0.6) MB/sec

Reviewed By: ajkr

Differential Revision: D43781467

Pulled By: hx235

fbshipit-source-id: a706a18472a8edb2b952bac3af40eec803537f2a
2023-03-15 14:02:43 -07:00
Alan Paxton fbd603d04a Reverse wrong order of parameter names for Java WriteBatchWithIndex#iteratorWithBase (#11280)
Summary:
Fix for https://github.com/facebook/rocksdb/issues/11008

`Java_org_rocksdb_WriteBatchWithIndex_iteratorWithBase` takes parameters `(… jlong jwbwi_handle, jlong jcf_handle,
    jlong jbase_iterator_handle, jlong jread_opts_handle)` while `WriteBatchWithIndex.java` declares `private native long iteratorWithBase(final long handle, final long baseIteratorHandle,
      final long cfHandle, final long readOptionsHandle)`.

Luckily the only call to `iteratorWithBase` passes the parameters in the correct order for the implementation `(… cfHandle, baseIteratorHandle …)` This type checks because the types are the same (long words).

The code is currently used correctly, it is just extremely misleading. Swap the names of the 2 parameters in the Java method so that the correct usage is clear.

There already exist test methods which call the API correctly and only succeed because of that. These continue to work.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11280

Reviewed By: cbi42

Differential Revision: D43874798

Pulled By: ajkr

fbshipit-source-id: b59bc930bf579f4e0804f0effd4fb17f4225d60c
2023-03-10 12:26:09 -08:00