11 commits

Author SHA1 Message Date
Levi Tamasi 81765866c4 Update HISTORY/version/format compatibility script for the 8.10 release (#12154)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12154

Reviewed By: jaykorean, akankshamahajan15

Differential Revision: D52216271

Pulled By: ltamasi

fbshipit-source-id: 13bab72802eeec8f6e3544be9ebcd7f725a64d2e
2023-12-15 14:44:23 -08:00
Alan Paxton 5a063ecd34 Java API consistency between RocksDB.put() , .merge() and Transaction.put() , .merge() (#11019)
Summary:
### Implement new Java API get()/put()/merge() methods, and transactional variants.

The Java API methods are very inconsistent in terms of how they pass parameters (byte[], ByteBuffer), and what variants and defaulted parameters they support. We try to bring some consistency to this.
 * All APIs should support calls with ByteBuffer parameters.
 * Similar methods (RocksDB.get() vs Transaction.get()) should support as similar as possible sets of parameters for predictability.
 * get()-like methods should provide variants where the caller supplies the target buffer, for the sake of efficiency. Allocation costs in Java can be significant when large buffers are repeatedly allocated and freed.

### API Additions

 1. RocksDB.get implement indirect ByteBuffers. Added indirect ByteBuffers and supporting native methods for get().
 2. RocksDB.Iterator implement missing (byte[], offset, length) variants for key() and value() parameters.
 3. Transaction.get() implement missing methods, based on RocksDB.get. Added ByteBuffer.get with and without column family. Added byte[]-as-target get.
 4. Transaction.iterator() implement a getIterator() which defaults ReadOptions; as per RocksDB.iterator(). Rationalize support API for this and RocksDB.iterator()
 5. RocksDB.merge implement ByteBuffer methods; both direct and indirect buffers. Shadow the methods of RocksDB.put; RocksDB.put only offers ByteBuffer API with explicit WriteOptions. Duplicated this with RocksDB.merge
 6. Transaction.merge implement methods as per RocksDB.merge methods. Transaction is already constructed with WriteOptions, so no explicit WriteOptions methods required.
 7. Transaction.mergeUntracked implement the same API methods as Transaction.merge except the ones that use assumeTracked, because that’s not a feature of merge untracked.

### Support Changes (C++)

The current JNI code in C++ supports multiple variants of methods through a number of helper functions. There are numerous TODO suggestions in the code proposing that the helpers be re-factored/shared.

We have taken a different approach for the new methods: we have created RAII wrapper classes (`JDirectBufferSlice`, `JDirectBufferPinnableSlice`, `JByteArraySlice` and `JByteArrayPinnableSlice`) which construct slices from JNI parameters and can then be passed directly to RocksDB methods. For instance, the `Java_org_rocksdb_Transaction_getDirect` method is implemented like this:

```
  try {
    ROCKSDB_NAMESPACE::JDirectBufferSlice key(env, jkey_bb, jkey_off,
                                              jkey_part_len);
    ROCKSDB_NAMESPACE::JDirectBufferPinnableSlice value(env, jval_bb, jval_off,
                                                        jval_part_len);
    ROCKSDB_NAMESPACE::KVException::ThrowOnError(
        env, txn->Get(*read_options, column_family_handle, key.slice(),
                      &value.pinnable_slice()));
    return value.Fetch();
  } catch (const ROCKSDB_NAMESPACE::KVException& e) {
    return e.Code();
  }
```
Notice the try/catch mechanism with the `KVException` class, which combined with RAII and the wrapper classes means that there is no ad-hoc cleanup necessary in the JNI methods.

We propose to extend this mechanism to existing JNI methods as further work.
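
As a rough illustration of why this combination needs no ad-hoc cleanup, here is a self-contained sketch that does not use JNI or RocksDB at all. `StatusError`, `ScopedRegion`, `ReleaseCounter` and `GetLike` are hypothetical names invented for this example; they only mimic the shape of `KVException` and the slice wrappers, not their actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a status-carrying exception (not KVException).
class StatusError : public std::runtime_error {
 public:
  explicit StatusError(int code)
      : std::runtime_error("status error"), code_(code) {}
  int code() const { return code_; }
  // Throws when `code` signals failure (any non-zero value in this sketch).
  static void ThrowOnError(int code) {
    if (code != 0) throw StatusError(code);
  }

 private:
  int code_;
};

// Counts releases so that cleanup can be observed from outside.
struct ReleaseCounter {
  int released = 0;
};

// Hypothetical RAII wrapper: owns a buffer, releases it in the destructor.
class ScopedRegion {
 public:
  ScopedRegion(ReleaseCounter& counter, std::size_t len)
      : counter_(counter), data_(len) {
    assert(len > 0);
  }
  ~ScopedRegion() { ++counter_.released; }
  ScopedRegion(const ScopedRegion&) = delete;
  ScopedRegion& operator=(const ScopedRegion&) = delete;
  std::size_t size() const { return data_.size(); }

 private:
  ReleaseCounter& counter_;
  std::vector<char> data_;
};

// Returns the value length on success, or a negative error code on failure.
// Both wrappers are released on either path without explicit cleanup code.
int GetLike(ReleaseCounter& counter, int status_from_store) {
  try {
    ScopedRegion key(counter, 4);
    ScopedRegion value(counter, 16);
    StatusError::ThrowOnError(status_from_store);
    return static_cast<int>(value.size());
  } catch (const StatusError& e) {
    return -e.code();
  }
}
```

On the error path, stack unwinding destroys `value` and `key` before the `catch` block runs, so the method body never has to release anything by hand.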

### Support Changes (Java)

Where there are multiple parameter-variant versions of the same method, we use fewer or just one supporting native method for all of them. This makes maintenance a bit easier and reduces the opportunity for coding errors mixing up (untyped) object handles.

In order to support this efficiently, some classes need to have default values for column families and read options added and cached so that they are not re-constructed on every method call.

This PR closes https://github.com/facebook/rocksdb/issues/9776

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11019

Reviewed By: ajkr

Differential Revision: D52039446

Pulled By: jowlyzhang

fbshipit-source-id: 45d0140a4887e42134d2e56520e9b8efbd349660
2023-12-11 11:03:17 -08:00
Hui Xiao ab15d33566 Update history, version and format testing for 8.8 (#12004)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12004

Reviewed By: cbi42

Differential Revision: D50586984

Pulled By: hx235

fbshipit-source-id: 1480a8c2757340ebf83510557104aaa0e437b3ae
2023-10-24 12:03:07 -07:00
Peter Dillinger 141b872bd4 Improve efficiency of create_missing_column_families, light refactor (#11920)
Summary:
In preparing some seqno_to_time_mapping improvements, I found that some of the wrap-up work for creating column families was unnecessarily repeated in the case of DB::Open with create_missing_column_families. This change fixes that (`CreateColumnFamily()` -> `CreateColumnFamilyImpl()` in `DBImpl::Open()`), motivated by avoiding repeated calls to `RegisterRecordSeqnoTimeWorker()` but with the side benefit of avoiding repeated calls to `WriteOptionsFile()` for each CF.

Also in this change:
* Add a `Status::UpdateIfOk()` function for combining statuses in a common pattern
* Rename `max_time_duration` -> `min_preserve_seconds` (include units as much as possible)
* Improved comments in several places
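
For readers unfamiliar with the "combine statuses" pattern mentioned above, a minimal sketch follows. It uses a toy `MiniStatus` class made up for this example rather than the real `Status` class, and `RunSteps` is likewise hypothetical; the only point is the keep-the-first-failure behavior.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Toy status type for illustration only; an empty message means OK.
class MiniStatus {
 public:
  MiniStatus() = default;
  static MiniStatus OK() { return MiniStatus(); }
  static MiniStatus IOError(std::string msg) {
    return MiniStatus(std::move(msg));
  }
  bool ok() const { return message_.empty(); }
  const std::string& message() const { return message_; }

  // Adopts `s` only while this status is still OK, so the first failure
  // is the one that survives a sequence of updates.
  MiniStatus& UpdateIfOk(MiniStatus s) {
    if (ok()) *this = std::move(s);
    return *this;
  }

 private:
  explicit MiniStatus(std::string msg) : message_(std::move(msg)) {
    assert(!message_.empty());
  }
  std::string message_;
};

// Combines the results of three hypothetical steps into one status.
MiniStatus RunSteps(const MiniStatus& a, const MiniStatus& b,
                    const MiniStatus& c) {
  MiniStatus s;
  s.UpdateIfOk(a);
  s.UpdateIfOk(b);
  s.UpdateIfOk(c);
  return s;
}
```

This replaces the repeated `if (s.ok()) { s = ...; }` idiom with a single call per step.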

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11920

Test Plan: tests added / updated

Reviewed By: jaykorean

Differential Revision: D49919147

Pulled By: pdillinger

fbshipit-source-id: 3d0318c1d070c842c5331da0a5b415caedc104f1
2023-10-04 14:14:22 -07:00
Changyu Bi 49da91ec09 Update files for version 8.8 (#11878)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/11878

Reviewed By: ajkr

Differential Revision: D49568389

Pulled By: cbi42

fbshipit-source-id: b2022735799be9b5e81e03dfb418f8b104632ecf
2023-09-23 11:02:19 -07:00
akankshamahajan 3d67b5e8e5 Lookup ahead in block cache ahead to tune Readaheadsize (#11860)
Summary:
Implement block cache lookup to determine readahead_size during scans. It is enabled only when all three of auto_readahead_size, block_cache and iterate_upper_bound are set.

Design -
1. Whenever there is a cache miss and FilePrefetchBuffer is called, a callback is made to determine readahead_size for that prefetching.
2. The callback iterates over the index and does a block cache lookup for each data block handle until the existing readahead_size is reached. Then it removes the cache-hit data blocks from the end to calculate the optimized readahead_size.
3. Since index_iter_ is moved, it stores block handles in a queue, and uses that queue to get the block handle instead of doing index_iter_->Next().
4. This is for Sync scans. Async scans support is in progress.
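
The trimming in step 2 can be sketched in isolation as below. This is not the code from this change: `BlockInfo` and `TunedReadaheadSize` are hypothetical names, the cache lookup result is passed in as a plain flag, and the sketch assumes the last block walked may cross the readahead limit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One data block as seen through the index (hypothetical simplification).
struct BlockInfo {
  uint64_t size;  // bytes the block occupies in the file
  bool in_cache;  // outcome of a block cache lookup for this block
};

// Returns the number of bytes worth prefetching for `blocks`, which are
// listed in index order starting at the block that missed the cache.
uint64_t TunedReadaheadSize(const std::vector<BlockInfo>& blocks,
                            uint64_t max_readahead) {
  // Walk blocks in index order until the existing readahead size is reached.
  std::size_t count = 0;
  uint64_t total = 0;
  while (count < blocks.size() && total < max_readahead) {
    assert(blocks[count].size > 0);
    total += blocks[count].size;
    ++count;
  }
  // Drop cache hits from the end: reading them again would be wasted I/O.
  while (count > 0 && blocks[count - 1].in_cache) {
    total -= blocks[count - 1].size;
    --count;
  }
  return total;
}
```

Cache hits in the middle of the range are still read, because only trailing hits can be removed without splitting the read into several requests.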

NOTE:
The current limitation is that if Prev is called after Seek and Next, the Prev operation cannot be performed because index_iter_ is already pointing to a different block. In that case it returns "Not supported" with the error message - "auto tuning of readahead size is not supported with Prev op"

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11860

Test Plan:
- Added new unit test
- crash_tests
- Running scans locally to check for any regression

Reviewed By: anand1976

Differential Revision: D49548118

Pulled By: akankshamahajan15

fbshipit-source-id: f1aee409a71b4ad9e5bf3610f43edf30c6630c78
2023-09-22 18:12:08 -07:00
akankshamahajan 5b5b011cdd Avoid double block cache lookup during Seek with async_io option (#11616)
Summary:
With the async_io option, the Seek happens in 2 phases. Phase 1 starts an asynchronous read on a block cache miss, and phase 2 waits for it to complete and finishes the seek. In both phases, BlockBasedTable::NewDataBlockIterator is called, which first looks up the data block in the block cache before looking in the prefetch buffer. This change does the block cache lookup only in the first phase, which saves some CPU.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11616

Test Plan: Added unit test

Reviewed By: jaykorean

Differential Revision: D47477887

Pulled By: akankshamahajan15

fbshipit-source-id: 0355e0a68fc0ea2eb92340ae42735afcdbcbfd79
2023-09-18 11:32:30 -07:00
Andrew Kryczka cf95821fb6 Update for 8.5.fb branch cut (#11642)
Summary:
Updated the main branch for the 8.5.fb branch cut. Also made unreleased_history/release.sh backdate to the last commit instead of the current date in case the release manager is a laggard like myself.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11642

Reviewed By: cbi42

Differential Revision: D47783574

Pulled By: ajkr

fbshipit-source-id: 4e2a80f5ccd542dc7dd0d22dfd7e59cb136325a1
2023-08-02 12:34:11 -07:00
akankshamahajan 749b179c04 Remove reallocation of AlignedBuffer in direct_io sync reads if already aligned (#11600)
Summary:
Remove reallocation of AlignedBuffer in direct_io sync reads in RandomAccessFileReader::Read if buffer passed is already aligned.
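
A minimal sketch of the kind of check that makes the extra buffer unnecessary is shown below. `CanReadInPlace` and `IsPowerOfTwo` are hypothetical helpers written for this example, and the assumption that offset and length must be aligned in addition to the buffer address is ours, not a statement about the exact condition in RandomAccessFileReader::Read.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True when `alignment` is a non-zero power of two.
inline bool IsPowerOfTwo(std::size_t alignment) {
  return alignment != 0 && (alignment & (alignment - 1)) == 0;
}

// True when a direct I/O read could target the caller's buffer directly:
// the buffer address, file offset and length are all multiples of
// `alignment`. Otherwise an intermediate aligned buffer (and a copy out of
// it) would be needed.
bool CanReadInPlace(const void* buf, uint64_t offset, std::size_t len,
                    std::size_t alignment) {
  assert(IsPowerOfTwo(alignment));
  const auto addr = reinterpret_cast<std::uintptr_t>(buf);
  const std::size_t mask = alignment - 1;
  return (addr & mask) == 0 && (offset & mask) == 0 && (len & mask) == 0;
}
```

When the check passes, the copy from the intermediate buffer (the `__memmove_evex_unaligned_erms` cost in the profile below) can be skipped.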

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11600

Test Plan:
Setup: `TEST_TMPDIR=./tmp-db/ ./db_bench -benchmarks=filluniquerandom -disable_auto_compactions=true -target_file_size_base=1048576 -write_buffer_size=1048576 -compression_type=none`
Benchmark: `TEST_TMPDIR=./tmp-db/ perf record ./db_bench --cache_size=8388608 --use_existing_db=true --disable_auto_compactions=true --benchmarks=seekrandom --use_direct_reads=true -use_direct_io_for_flush_and_compaction=true -reads=1000 -seek_nexts=1 -max_auto_readahead_size=131072 -initial_auto_readahead_size=16384 -adaptive_readahead=true -num_file_reads_for_auto_readahead=0`

Perf profile-
Before:
```
8.73% db_bench libc.so.6 [.] __memmove_evex_unaligned_erms
3.34% db_bench [kernel.vmlinux] [k] filemap_get_read_batch
```

After:
```
2.50% db_bench [kernel.vmlinux] [k] filemap_get_read_batch
2.29% db_bench libc.so.6 [.] __memmove_evex_unaligned_erms
```

`make crash_test -j` with direct_io enabled completed successfully locally.

Ran a few benchmarks with direct_io, with seek_nexts varying from 912 to 327680 and different readahead_size parameters; they showed no regression so far.

Reviewed By: ajkr

Differential Revision: D47478598

Pulled By: akankshamahajan15

fbshipit-source-id: 6a48e21cb34696f5d09c22a6311a3a1cb5f9cf33
2023-07-14 20:08:05 -07:00
Peter Dillinger b1b6f87fbe Some small improvements to HyperClockCache (#11601)
Summary:
Stacked on https://github.com/facebook/rocksdb/issues/11572
* Minimize use of std::function and lambdas to minimize chances of
compiler heap-allocating closures (unnecessary stress on allocator). It
appears that converting FindSlot to a template enables inlining the
lambda parameters, avoiding heap allocations.
* Clean up some logic with FindSlot (FIXMEs from https://github.com/facebook/rocksdb/issues/11572)
* Fix handling of rare case of probing all slots, with new unit test.
(Previously Insert would not roll back displacements in that case, which
would kill performance if it were to happen.)
* Add an -early_exit option to cache_bench for gathering memory stats
before deallocation.
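
To illustrate the first bullet, here is a small sketch of a probing helper that takes its predicate as a template parameter instead of a `std::function`. It is not the HyperClockCache code: this `FindSlot` and `FindEmpty` are simplified, hypothetical versions operating on a plain `std::array<int, N>` with linear probing.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Probes up to N slots starting at `start` and returns the index of the
// first slot for which `match` returns true, or N if every slot was probed
// without a match. Because MatchFn is a template parameter, the lambda's
// type is known at compile time and no type-erased closure is created.
template <std::size_t N, typename MatchFn>
std::size_t FindSlot(const std::array<int, N>& slots, std::size_t start,
                     MatchFn match) {
  static_assert(N > 0, "need at least one slot");
  for (std::size_t i = 0; i < N; ++i) {
    const std::size_t idx = (start + i) % N;
    if (match(slots[idx])) return idx;
  }
  return N;  // probed all slots
}

// Usage helper: finds the first empty (zero) slot at or after `start`.
template <std::size_t N>
std::size_t FindEmpty(const std::array<int, N>& slots, std::size_t start) {
  return FindSlot(slots, start, [](int v) { return v == 0; });
}
```

The "probed all slots" return value corresponds to the rare case the third bullet refers to, which the caller has to handle explicitly.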

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11601

Test Plan:
unit test added for probing all slots

## Seeing heap allocations
Run `MALLOC_CONF="stats_print:true" ./cache_bench -cache_type=hyper_clock_cache`
before https://github.com/facebook/rocksdb/issues/11572 vs. after this change. Before, we see this in the
interesting bin statistics:

```
size  nrequests
----  ---------
  32     578460
  64      24340
8192     578460
```
And after:
```
size  nrequests
----  ---------
  32  (insignificant)
  64      24370
8192     579130
```

## Performance test
Build with `make USE_CLANG=1 PORTABLE=0 DEBUG_LEVEL=0 -j32 cache_bench`

Run `./cache_bench -cache_type=hyper_clock_cache -ops_per_thread=5000000`
in before and after configurations, simultaneously:

```
Before: Complete in 33.244 s; Rough parallel ops/sec = 2406442
After:  Complete in 32.773 s; Rough parallel ops/sec = 2441019
```

Reviewed By: jowlyzhang

Differential Revision: D47375092

Pulled By: pdillinger

fbshipit-source-id: 46f0f57257ddb374290a0a38c651764ea60ba410
2023-07-14 16:19:22 -07:00
Peter Dillinger 7a9b264f36 Some fixes to unreleased_history/ (#11504)
Summary:
* Add a "Performance Improvements" section
* Add required copyright headers

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11504

Test Plan: manual

Reviewed By: hx235

Differential Revision: D46405128

Pulled By: pdillinger

fbshipit-source-id: 4f878dfd0170d381d3051a44c13479c860e812c0
2023-06-02 15:55:02 -07:00