Commit Graph

715 Commits

Author SHA1 Message Date
Mike Kolupaev 637e64b9ac Add an option to prevent DB::Open() from querying sizes of all sst files (#6353)
Summary:
When paranoid_checks is on, DBImpl::CheckConsistency() iterates over all sst files and calls Env::GetFileSize() for each of them. As far as I could understand, this is pretty arbitrary and doesn't affect correctness - if filesystem doesn't corrupt fsynced files, the file sizes will always match; if it does, it may as well corrupt contents as well as sizes, and rocksdb doesn't check contents on open.

If there are thousands of sst files, getting all their sizes takes a while. If, on top of that, Env is overridden to use some remote storage instead of local filesystem, it can be *really* slow and overload the remote storage service. This PR adds an option to not do GetFileSize(); instead it does GetChildren() for parent directory to check that all the expected sst files are at least present, but doesn't check their sizes.

We can't just disable paranoid_checks instead because paranoid_checks do a few other important things: make the DB read-only on write errors, print error messages on read errors, etc.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6353

Test Plan: ran the added sanity check unit test. Will try it out in a LogDevice test cluster where the GetFileSize() calls are causing a lot of trouble.

Differential Revision: D19656425

Pulled By: al13n321

fbshipit-source-id: c2c421b367633033760d1f56747bad206d1fbf82
2020-02-04 01:27:26 -08:00
Adam Retter 7242dae7fe Improve RocksJava Comparator (#6252)
Summary:
This is a redesign of the API for RocksJava comparators with the aim of improving performance. It also simplifies the class hierarchy.

**NOTE**: This breaks backwards compatibility for existing 3rd party Comparators implemented in Java... so we need to consider carefully which release branches this goes into.

Previously when implementing a comparator in Java the developer had a choice of subclassing either `DirectComparator` or `Comparator` which would use direct and non-direct byte-buffers resepectively (via `DirectSlice` and `Slice`).

In this redesign there we have eliminated the overhead of using the Java Slice classes, and just use `ByteBuffer`s. The `ComparatorOptions` supplied when constructing a Comparator allow you to choose between direct and non-direct byte buffers by setting `useDirect`.

In addition, the `ComparatorOptions` now allow you to choose whether a ByteBuffer is reused over multiple comparator calls, by setting `maxReusedBufferSize > 0`. When buffers are reused, ComparatorOptions provides a choice of mutex type by setting `useAdaptiveMutex`.

 ---
[JMH benchmarks previously indicated](https://github.com/facebook/rocksdb/pull/6241#issue-356398306) that the difference between C++ and Java for implementing a comparator was ~7x slowdown in Java.

With these changes, when reusing buffers and guarding access to them via mutexes the slowdown is approximately the same. However, these changes offer a new facility to not reuse mutextes, which reduces the slowdown to ~5.5x in Java. We also offer a `thread_local` mechanism for reusing buffers, which reduces slowdown to ~5.2x in Java (closes https://github.com/facebook/rocksdb/pull/4425).

These changes also form a good base for further optimisation work such as further JNI lookup caching, and JNI critical.

 ---
These numbers were captured without jemalloc. With jemalloc, the performance improves for all tests, and the Java slowdown reduces to between 4.8x and 5.x.

```
ComparatorBenchmarks.put                                                native_bytewise  thrpt   25  124483.795 ± 2032.443  ops/s
ComparatorBenchmarks.put                                        native_reverse_bytewise  thrpt   25  114414.536 ± 3486.156  ops/s
ComparatorBenchmarks.put              java_bytewise_non-direct_reused-64_adaptive-mutex  thrpt   25   17228.250 ± 1288.546  ops/s
ComparatorBenchmarks.put          java_bytewise_non-direct_reused-64_non-adaptive-mutex  thrpt   25   16035.865 ± 1248.099  ops/s
ComparatorBenchmarks.put                java_bytewise_non-direct_reused-64_thread-local  thrpt   25   21571.500 ±  871.521  ops/s
ComparatorBenchmarks.put                  java_bytewise_direct_reused-64_adaptive-mutex  thrpt   25   23613.773 ± 8465.660  ops/s
ComparatorBenchmarks.put              java_bytewise_direct_reused-64_non-adaptive-mutex  thrpt   25   16768.172 ± 5618.489  ops/s
ComparatorBenchmarks.put                    java_bytewise_direct_reused-64_thread-local  thrpt   25   23921.164 ± 8734.742  ops/s
ComparatorBenchmarks.put                              java_bytewise_non-direct_no-reuse  thrpt   25   17899.684 ±  839.679  ops/s
ComparatorBenchmarks.put                                  java_bytewise_direct_no-reuse  thrpt   25   22148.316 ± 1215.527  ops/s
ComparatorBenchmarks.put      java_reverse_bytewise_non-direct_reused-64_adaptive-mutex  thrpt   25   11311.126 ±  820.602  ops/s
ComparatorBenchmarks.put  java_reverse_bytewise_non-direct_reused-64_non-adaptive-mutex  thrpt   25   11421.311 ±  807.210  ops/s
ComparatorBenchmarks.put        java_reverse_bytewise_non-direct_reused-64_thread-local  thrpt   25   11554.005 ±  960.556  ops/s
ComparatorBenchmarks.put          java_reverse_bytewise_direct_reused-64_adaptive-mutex  thrpt   25   22960.523 ± 1673.421  ops/s
ComparatorBenchmarks.put      java_reverse_bytewise_direct_reused-64_non-adaptive-mutex  thrpt   25   18293.317 ± 1434.601  ops/s
ComparatorBenchmarks.put            java_reverse_bytewise_direct_reused-64_thread-local  thrpt   25   24479.361 ± 2157.306  ops/s
ComparatorBenchmarks.put                      java_reverse_bytewise_non-direct_no-reuse  thrpt   25    7942.286 ±  626.170  ops/s
ComparatorBenchmarks.put                          java_reverse_bytewise_direct_no-reuse  thrpt   25   11781.955 ± 1019.843  ops/s
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6252

Differential Revision: D19331064

Pulled By: pdillinger

fbshipit-source-id: 1f3b794e6a14162b2c3ffb943e8c0e64a0c03738
2020-02-03 12:30:13 -08:00
Peter Dillinger 90c71aa5d9 Don't download from (unreliable) maven.org (#6348)
Summary:
I set up a mirror of our Java deps on github so we can download
them through github URLs rather than maven.org, which is proving
terribly unreliable from Travis builds.

Also sanitized calls to curl, so they are easier to read and
appropriately fail on download failure.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6348

Test Plan: CI

Differential Revision: D19633621

Pulled By: pdillinger

fbshipit-source-id: 7eb3f730953db2ead758dc94039c040f406790f3
2020-01-30 11:02:08 -08:00
Adam Retter a07a9dc904 Reduce the need to re-download dependencies (#6318)
Summary:
Both changes are related to RocksJava:

1. Allow dependencies that are already present on the host system due to Maven to be reused in Docker builds.

2. Extend the `make clean-not-downloaded` target to RocksJava, so that libraries needed as dependencies for the test suite are not deleted and re-downloaded unnecessarily.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6318

Differential Revision: D19608742

Pulled By: pdillinger

fbshipit-source-id: 25e25649e3e3212b537ac4512b40e2e53dc02ae7
2020-01-29 08:01:56 -08:00
Levi Tamasi b7f1b3e51c Access Maven Central over HTTPS (#6301)
Summary:
As of 1/15/2020, Maven Central does not support plain HTTP. Because of
this, our Travis and AppVeyor builds have started failing during the
assertj download step. This patch will hopefully fix these issues.

See https://blog.sonatype.com/central-repository-moving-to-https
for more info.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6301

Test Plan:
Will monitor the builds. ("I don't always test my changes but when I do,
I do it in production.")

Differential Revision: D19422923

Pulled By: ltamasi

fbshipit-source-id: 76f9a8564a5b66ddc721d705f9cbfc736bf7a97d
2020-01-15 17:54:53 -08:00
Adam Retter 6477075f2c JMH microbenchmarks for RocksJava (#6241)
Summary:
This is the start of some JMH microbenchmarks for RocksJava.

Such benchmarks can help us decide on performance improvements of the Java API.

At the moment, I have only added benchmarks for various Comparator options, as that is one of the first areas where I want to improve performance. I plan to expand this to many more tests.

Details of how to compile and run the benchmarks are in the `README.md`.

A run of these on a XEON 3.5 GHz 4vCPU (QEMU Virtual CPU version 2.5+) / 8GB RAM KVM with Ubuntu 18.04, OpenJDK 1.8.0_232, and gcc 8.3.0 produced the following:

```
# Run complete. Total time: 01:43:17

REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
experiments, perform baseline and negative tests that provide experimental control, make sure
the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
Do not assume the numbers tell you what you want them to tell.

Benchmark                                         (comparatorName)   Mode  Cnt       Score       Error  Units
ComparatorBenchmarks.put                           native_bytewise thrpt   25   122373.920 ±  2200.538  ops/s
ComparatorBenchmarks.put              java_bytewise_adaptive_mutex thrpt   25    17388.201 ±  1444.006  ops/s
ComparatorBenchmarks.put          java_bytewise_non-adaptive_mutex thrpt   25    16887.150 ±  1632.204  ops/s
ComparatorBenchmarks.put       java_direct_bytewise_adaptive_mutex thrpt   25    15644.572 ±  1791.189  ops/s
ComparatorBenchmarks.put   java_direct_bytewise_non-adaptive_mutex thrpt   25    14869.601 ±  2252.135  ops/s
ComparatorBenchmarks.put                   native_reverse_bytewise thrpt   25   116528.735 ±  4168.797  ops/s
ComparatorBenchmarks.put      java_reverse_bytewise_adaptive_mutex thrpt   25    10651.975 ±   545.998  ops/s
ComparatorBenchmarks.put  java_reverse_bytewise_non-adaptive_mutex thrpt   25    10514.224 ±   930.069  ops/s
```

Indicating a ~7x difference between comparators implemented natively (C++) and those implemented in Java. Let's see if we can't improve on that in the near future...
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6241

Differential Revision: D19290410

Pulled By: pdillinger

fbshipit-source-id: 25d44bf3a31de265502ed0c5d8a28cf4c7cb9c0b
2020-01-07 15:46:09 -08:00
Adam Retter e697da0b18 RocksDB#keyMayExist should not assume database values are unicode strings (#6186)
Summary:
Closes https://github.com/facebook/rocksdb/issues/6183
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6186

Differential Revision: D19201281

Pulled By: pdillinger

fbshipit-source-id: 1c96b4ea09e826f91e44b0009eba3de0991d9053
2019-12-20 14:27:58 -08:00
Adam Retter 4d3264e4ab Cleanup deprecation warnings and javadoc (#6218)
Summary:
There are no API changes ;-)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6218

Differential Revision: D19200373

Pulled By: pdillinger

fbshipit-source-id: 58d34b01ea53b75a1eccbd72f8b14d6256a7380f
2019-12-20 13:41:00 -08:00
Peter Dillinger 7da8c067a2 Avoid heading tags in javadocs; fix EnvironmentTest (#6208)
Summary:
Should fix Travis build error that randomly showed up upon
using Java 13 version of javadoc.

    AdvancedColumnFamilyOptionsInterface.java:257: error:
      unexpected heading used: <H2>, compared to implicit preceding heading: <H3>

According to this reference https://bugs.openjdk.java.net/browse/JDK-8220379
it should work to start at h4, but that didn't work, so avoiding
headings should be fine.

Also fix Java EnvironmentTest for JDK13.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/6208

Test Plan: Travis run on PR (don't have Java 13 handy)

Differential Revision: D19163105

Pulled By: pdillinger

fbshipit-source-id: 4a9419cbe7ef780fba771b8a1508e1ea80d17b3e
2019-12-18 13:36:30 -08:00
Jermy Li f453bcb40d Add unit tests for concurrent CF iteration and drop (#6180)
Summary:
improve https://github.com/facebook/rocksdb/issues/6147
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6180

Differential Revision: D19148936

fbshipit-source-id: f691c9879fd51d54e96c1a99670cf85ca4485a89
2019-12-18 11:54:35 -08:00
Adam Retter edbf0e2d90 Env should also load the native library (#6167)
Summary:
Closes https://github.com/facebook/rocksdb/issues/6118
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6167

Differential Revision: D19053577

Pulled By: pdillinger

fbshipit-source-id: 86aca9a5bec0947a641649b515da17b3cb12bdde
2019-12-13 16:27:55 -08:00
myasuka 4b74035e40 Correct java docs of RocksDB options (#6123)
Summary:
Correct javadocs of several RocksDB option classes to not mislead RocksJava users.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6123

Differential Revision: D18989044

Pulled By: pdillinger

fbshipit-source-id: a5ac6a415e5311084b10d973d354e6925788f01e
2019-12-12 18:10:03 -08:00
Adam Retter b433bbefe9 Add missing mutable DBOptions to RocksJava (#6152)
Summary:
As requested in https://github.com/facebook/rocksdb/issues/6127
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6152

Differential Revision: D18955608

Pulled By: pdillinger

fbshipit-source-id: 3e1367d944e44d5f1675a422f7dd2451c86feb6f
2019-12-12 12:01:19 -08:00
Adam Simpkins 2bb5fc1280 Add an option to the CMake build to disable building shared libraries (#6122)
Summary:
Add an option to explicitly disable building shared versions of the
RocksDB libraries.  The shared libraries cannot be built in cases where
some dependencies are only available as static libraries.  This allows
still building RocksDB in these situations.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6122

Differential Revision: D18920740

fbshipit-source-id: d24f66d93c68a1e65635e6e0b663bae62c903bca
2019-12-10 15:20:50 -08:00
Yingchun Lai 9befbe9b40 fix typo (#6099)
Summary:
fix a typo in struct ReadOptions
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6099

Differential Revision: D18729618

fbshipit-source-id: 850a9df71f7c0abebea17feab77b8d5874e8ba0a
2019-11-27 10:26:42 -08:00
Patrick Double 8ae149eba1 Add shared library for musl-libc (#3143)
Summary:
Add the jni library for musl-libc, specifically for incorporating into Alpine based docker images. The classifier is `musl64`.

I have signed the CLA electronically.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/3143

Differential Revision: D18719372

fbshipit-source-id: 6189d149310b6436d6def7d808566b0234b23313
2019-11-26 18:24:09 -08:00
Peter Dillinger 57f3032285 Allow fractional bits/key in BloomFilterPolicy (#6092)
Summary:
There's no technological impediment to allowing the Bloom
filter bits/key to be non-integer (fractional/decimal) values, and it
provides finer control over the memory vs. accuracy trade-off. This is
especially handy in using the format_version=5 Bloom filter in place
of the old one, because bits_per_key=9.55 provides the same accuracy as
the old bits_per_key=10.

This change not only requires refining the logic for choosing the best
num_probes for a given bits/key setting, it revealed a flaw in that logic.
As bits/key gets higher, the best num_probes for a cache-local Bloom
filter is closer to bpk / 2 than to bpk * 0.69, the best choice for a
standard Bloom filter. For example, at 16 bits per key, the best
num_probes is 9 (FP rate = 0.0843%) not 11 (FP rate = 0.0884%).
This change fixes and refines that logic (for the format_version=5
Bloom filter only, just in case) based on empirical tests to find
accuracy inflection points between each num_probes.

Although bits_per_key is now specified as a double, the new Bloom
filter converts/rounds this to "millibits / key" for predictable/precise
internal computations. Just in case of unforeseen compatibility
issues, we round to the nearest whole number bits / key for the
legacy Bloom filter, so as not to unlock new behaviors for it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6092

Test Plan: unit tests included

Differential Revision: D18711313

Pulled By: pdillinger

fbshipit-source-id: 1aa73295f152a995328cb846ef9157ae8a05522a
2019-11-26 15:59:34 -08:00
Adam Retter 7f14519577 Small improvements to Docker build for RocksJava (#6079)
Summary:
* We can reuse downloaded 3rd-party libraries
* We can isolate the build to a Docker volume. This is useful for investigating failed builds, as we can examine the volume by assigning it a name during the build.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6079

Differential Revision: D18710263

fbshipit-source-id: 93f456ba44b49e48941c43b0c4d53995ecc1f404
2019-11-26 13:28:31 -08:00
Fatih Şentürk f382f44e39 fix typo (#6025)
Summary:
fix a typo at java readme page
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6025

Differential Revision: D18481232

fbshipit-source-id: 1c70c2435bcd4b02f25e28cd7e35c42273e07be0
2019-11-13 11:02:28 -08:00
Adam Retter 1e9c8d42a0 Fix the rocksjava release Vagrant build on CentOS (#5901)
Summary:
Closes https://github.com/facebook/rocksdb/issues/5873
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5901

Differential Revision: D17869585

fbshipit-source-id: 559472486f1d3ac80c0c7df6c421c4b612b9b7f9
2019-10-10 17:21:18 -07:00
Chen, You 51185592fd Add unordered write option rocksjava (#5839)
Summary:
Add unordered_write option api and related ut to rocksjava
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5839

Differential Revision: D17604446

Pulled By: maysamyabandeh

fbshipit-source-id: c6b07e85ca9d5e3a92973ddb6ab2bc079e53c9c1
2019-09-27 10:19:40 -07:00
Yikun Jiang 2367656b6c Remove invalid comparison of va_list and nullptr (#5836)
Summary:
The comparison of va_list and nullptr is always False under any arch, and will raise invalid operands of types error in aarch64 env (`error: invalid operands of types ‘va_list {aka __va_list}’ and ‘std::nullptr_t’ to binary ‘operator!=’`).

This patch removes this invalid assert.

Closes: https://github.com/facebook/rocksdb/issues/4277
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5836

Differential Revision: D17532470

fbshipit-source-id: ca98078ecbc6a9416c69de3bd6ffcfa33a0f0185
2019-09-23 12:15:01 -07:00
sdong e8263dbdaa Apply formatter to recent 200+ commits. (#5830)
Summary:
Further apply formatter to more recent commits.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5830

Test Plan: Run all existing tests.

Differential Revision: D17488031

fbshipit-source-id: 137458fd94d56dd271b8b40c522b03036943a2ab
2019-09-20 12:04:26 -07:00
sdong c06b54d0c6 Apply formatter on recent 45 commits. (#5827)
Summary:
Some recent commits might not have passed through the formatter. I formatted recent 45 commits. The script hangs for more commits so I stopped there.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5827

Test Plan: Run all existing tests.

Differential Revision: D17483727

fbshipit-source-id: af23113ee63015d8a43d89a3bc2c1056189afe8f
2019-09-19 12:34:17 -07:00
Tomas Kolda 3a408eeae9 Adding support for deleteFilesInRanges in JNI (#4031)
Summary:
It is very useful method call to achieve https://github.com/facebook/rocksdb/wiki/Delete-A-Range-Of-Keys
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4031

Differential Revision: D13515418

Pulled By: vjnadimpalli

fbshipit-source-id: 930b48e0992ef07fd1edd0b0cb5f780fabb1b4b5
2019-09-18 09:45:11 -07:00
tongyingrui c85c87a718 test size was wrong in 'fillbatch' benchmark (#5198)
Summary:
for fillbatch benchmar, the numEntries should be [num_] but not [num_ / 1000] because numEntries is just the total entries we want to test
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5198

Differential Revision: D17274664

Pulled By: anand1976

fbshipit-source-id: f96e952babdbac63fb99d14e1254d478a10437be
2019-09-11 12:04:44 -07:00
Richard He 699e1b5ede Added support for SstFileReader JNI interface (#5556)
Summary:
Feature request as per https://github.com/facebook/rocksdb/issues/5538 issue.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5556

Differential Revision: D17219008

fbshipit-source-id: e31f18dec318416eac9dea8213bab31da96e1f3a
2019-09-09 18:12:53 -07:00
Wilfried Goesgens fbab9913e2 upgrade gtest 1.7.0 => 1.8.1 for json result writing
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5332

Differential Revision: D17242232

fbshipit-source-id: c0d4646556a1335e51ac7382b986ca7f6ced7b64
2019-09-09 11:24:11 -07:00
Zhongyi Xie 2f41ecfe75 Refactor trimming logic for immutable memtables (#5022)
Summary:
MyRocks currently sets `max_write_buffer_number_to_maintain` in order to maintain enough history for transaction conflict checking. The effectiveness of this approach depends on the size of memtables. When memtables are small, it may not keep enough history; when memtables are large, this may consume too much memory.
We are proposing a new way to configure memtable list history: by limiting the memory usage of immutable memtables. The new option is `max_write_buffer_size_to_maintain` and it will take precedence over the old `max_write_buffer_number_to_maintain` if they are both set to non-zero values. The new option accounts for the total memory usage of flushed immutable memtables and mutable memtable. When the total usage exceeds the limit, RocksDB may start dropping immutable memtables (which is also called trimming history), starting from the oldest one.
The semantics of the old option actually works both as an upper bound and lower bound. History trimming will start if number of immutable memtables exceeds the limit, but it will never go below (limit-1) due to history trimming.
In order the mimic the behavior with the new option, history trimming will stop if dropping the next immutable memtable causes the total memory usage go below the size limit. For example, assuming the size limit is set to 64MB, and there are 3 immutable memtables with sizes of 20, 30, 30. Although the total memory usage is 80MB > 64MB, dropping the oldest memtable will reduce the memory usage to 60MB < 64MB, so in this case no memtable will be dropped.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5022

Differential Revision: D14394062

Pulled By: miasantreble

fbshipit-source-id: 60457a509c6af89d0993f988c9b5c2aa9e45f5c5
2019-08-23 13:55:34 -07:00
Adam Retter f2bf0b2d1e Fixes for building RocksJava releases on arm64v8
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/5674

Differential Revision: D16870338

fbshipit-source-id: c8dac644b1479fa734b491f3a8d50151772290f7
2019-08-16 16:27:50 -07:00
Yun Tang 4f98b43ba3 Correct the default write buffer size of java doc (#5670)
Summary:
The actual value of default write buffer size within `rocksdb/include/rocksdb/options.h` is 64 MB, we should correct this value in java doc.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5670

Differential Revision: D16668815

Pulled By: maysamyabandeh

fbshipit-source-id: cc3a981c9f1c2cd4a8392b0ed5f1fd0a2d729afb
2019-08-06 09:13:48 -07:00
Maysam Yabandeh 208556ee13 WritePrepared: fix Get without snapshot (#5664)
Summary:
if read_options.snapshot is not set, ::Get will take the last sequence number after taking a super-version and uses that as the sequence number. Theoretically max_eviceted_seq_ could advance this sequence number. This could lead ::IsInSnapshot that will be invoked by the ReadCallback to notice the absence of the snapshot. In this case, the ReadCallback should have passed a non-value to snap_released so that it could be set by the ::IsInSnapshot. The patch does that, and adds a unit test to verify it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5664

Differential Revision: D16614033

Pulled By: maysamyabandeh

fbshipit-source-id: 06fb3fd4aacd75806ed1a1acec7961f5d02486f2
2019-08-05 13:41:21 -07:00
Mike Kolupaev b4d7209428 Add an option to put first key of each sst block in the index (#5289)
Summary:
The first key is used to defer reading the data block until this file gets to the top of merging iterator's heap. For short range scans, most files never make it to the top of the heap, so this change can reduce read amplification by a lot sometimes.

Consider the following workload. There are a few data streams (we'll be calling them "logs"), each stream consisting of a sequence of blobs (we'll be calling them "records"). Each record is identified by log ID and a sequence number within the log. RocksDB key is concatenation of log ID and sequence number (big endian). Reads are mostly relatively short range scans, each within a single log. Writes are mostly sequential for each log, but writes to different logs are randomly interleaved. Compactions are disabled; instead, when we accumulate a few tens of sst files, we create a new column family and start writing to it.

So, a typical sst file consists of a few ranges of blocks, each range corresponding to one log ID (we use FlushBlockPolicy to cut blocks at log boundaries). A typical read would go like this. First, iterator Seek() reads one block from each sst file. Then a series of Next()s move through one sst file (since writes to each log are mostly sequential) until the subiterator reaches the end of this log in this sst file; then Next() switches to the next sst file and reads sequentially from that, and so on. Often a range scan will only return records from a small number of blocks in small number of sst files; in this case, the cost of initial Seek() reading one block from each file may be bigger than the cost of reading the actually useful blocks.

Neither iterate_upper_bound nor bloom filters can prevent reading one block from each file in Seek(). But this PR can: if the index contains first key from each block, we don't have to read the block until this block actually makes it to the top of merging iterator's heap, so for short range scans we won't read any blocks from most of the sst files.

This PR does the deferred block loading inside value() call. This is not ideal: there's no good way to report an IO error from inside value(). As discussed with siying offline, it would probably be better to change InternalIterator's interface to explicitly fetch deferred value and get status. I'll do it in a separate PR.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5289

Differential Revision: D15256423

Pulled By: al13n321

fbshipit-source-id: 750e4c39ce88e8d41662f701cf6275d9388ba46a
2019-06-24 20:54:04 -07:00
Jermy Li c92c58f84d JNI: Do not create 8M block cache for negative blockCacheSize values (#5465)
Summary:
As [BlockBasedTableConfig setBlockCacheSize()](1966a7c055/java/src/main/java/org/rocksdb/BlockBasedTableConfig.java (L728)) said, If cacheSize is non-positive, then cache will not be used. but when we configure a negative number or 0, there is an unexpected result: the block cache becomes 8M.

- Allow 0 as a valid size. When block cache size is 0, an 8MB block cache is created, as it is the default C++ API behavior. Also updated the comment.
- Set no_block_cache true if negative value is passed to block cache size, and no block cache will be created.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5465

Differential Revision: D15968788

Pulled By: sagar0

fbshipit-source-id: ee02d6e95841c9e2c316a64bfdf192d46ff5638a
2019-06-24 11:37:04 -07:00
Jurriaan Mous 5830c619d5 Java: Make the generics of the Options interfaces more strict (#5461)
Summary:
Make the generics of the Options interfaces more strict so they are usable in a Kotlin Multiplatform expect/actual typealias implementation without causing a Violation of Finite Bound Restriction.

This fix would enable the creation of a generic Kotlin multiplatform library by just typealiasing the JVM implementation to the current Java implementation.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5461

Differential Revision: D15903288

Pulled By: sagar0

fbshipit-source-id: 75e83fdf5d2fcede40744a17e767563d6a4b0696
2019-06-19 14:43:52 -07:00
Adam Retter d1ae67bdb9 Switch Travis to Xenial build (#4789)
Summary:
I think this should now also run on Travis's new virtualised infrastructure which affords more memory and CPU.

We also need to think about migrating from travis-ci.org to travis-ci.com.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4789

Differential Revision: D15856272

fbshipit-source-id: 10b41d21924e8a362bc9646a63ccd1a5dfc437c6
2019-06-17 10:20:02 -07:00
Patrick Zhang 5c76ba9dc4 Support rocksdbjava aarch64 build and test (#5258)
Summary:
Verified with an Ampere Computing eMAG aarch64 system.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5258

Differential Revision: D15807309

Pulled By: maysamyabandeh

fbshipit-source-id: ab85d2fd3fe40e6094430ab0eba557b1e979510d
2019-06-13 11:48:10 -07:00
Siying Dong 000b9ec217 Move some logging related files to logging/ (#5387)
Summary:
Many logging related source files are under util/. It will be more structured if they are together.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5387

Differential Revision: D15579036

Pulled By: siying

fbshipit-source-id: 3850134ed50b8c0bb40a0c8ae1f184fa4081303f
2019-05-31 17:23:59 -07:00
Siying Dong 8843129ece Move some memory related files from util/ to memory/ (#5382)
Summary:
Move arena, allocator, and memory tools under util to a separate memory/ directory.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5382

Differential Revision: D15564655

Pulled By: siying

fbshipit-source-id: 9cd6b5d0d3d52b39606e19221fa154596e5852a5
2019-05-30 17:44:09 -07:00
Siying Dong e9e0101ca4 Move test related files under util/ to test_util/ (#5377)
Summary:
There are too many types of files under util/. Some test related files don't belong to there or just are just loosely related. Mo
ve them to a new directory test_util/, so that util/ is cleaner.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5377

Differential Revision: D15551366

Pulled By: siying

fbshipit-source-id: 0f5c8653832354ef8caa31749c0143815d719e2c
2019-05-30 11:25:51 -07:00
Vijay Nadimpalli 931c9df886 Use separate status code for column family drop and db shutdown in progress (#5275)
Summary:
Currently RocksDB uses Status::ShutdownInProgress to inform about column family drop. I would like to have a separate Status code for this event.
https://github.com/facebook/rocksdb/blob/master/include/rocksdb/status.h#L55
Comment on this:
abc4202e47/db/version_set.cc (L2742):L2743
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5275

Differential Revision: D15204583

Pulled By: vjnadimpalli

fbshipit-source-id: 95e99e34b27bc165b554ecb8a48a7f8e60f21e2a
2019-05-20 10:47:32 -07:00
Fosco Marotto 6c2bf9e916 Add copyright headers per FB open-source checkup tool. (#5199)
Summary:
internal task: T35568575
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5199

Differential Revision: D14962794

Pulled By: gfosco

fbshipit-source-id: 93838ede6d0235eaecff90d200faed9a8515bbbe
2019-04-18 10:55:01 -07:00
Zhongyi Xie baa5302447 Avoid double-compacting data in bottom level in manual compactions (#5138)
Summary:
Depending on the config, manual compaction (leveled compaction style) does following compactions:
L0->L1
L1->L2
...
Ln-1 -> Ln
Ln -> Ln
The final Ln -> Ln compaction is partly unnecessary as it recompacts all the files that were just generated by the Ln-1 -> Ln. We should avoid recompacting such files. This rule should be applied to Lmax only.
Resolves issue https://github.com/facebook/rocksdb/issues/4995
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5138

Differential Revision: D14940106

Pulled By: miasantreble

fbshipit-source-id: 8d3cf5507a17e76f3333cfd4bac5256d005636e5
2019-04-16 23:32:20 -07:00
ableegoldman 1966a7c055 Expose JavaAPI for getting the filter policy of a BlockBasedTableConfig (#5186)
Summary:
I would like to be able to read out the current Filter that has been set (or not) for a BlockBasedTableConfig. Added one public method to BlockBasedTableConfig:

public Filter filterPolicy() {
    return filterPolicy;
}
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5186

Differential Revision: D14921415

Pulled By: siying

fbshipit-source-id: 2a63c8685480197862b49fc48916c757cd6daf95
2019-04-12 14:01:36 -07:00
Andrew Audibert e50326f327 Document the interaction between disableWAL and BackupEngine (#5071)
Summary:
BackupEngine relies on write-ahead logs to back up the memtable. Disabling write-ahead logs
can result in backups failing to preserve unflushed keys. This PR updates the documentation to specify this behavior, and suggest always flushing the memtable when write-ahead logs are disabled.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5071

Differential Revision: D14524124

Pulled By: miasantreble

fbshipit-source-id: 635f855f8a42ad60273b5efd226139b511e3e5d5
2019-03-19 14:58:14 -07:00
SeterKwok a283800616 Move some RocksObject into try-with-resources in Test (#5037)
Summary:
Fix #5008
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5037

Differential Revision: D14302474

Pulled By: riversand963

fbshipit-source-id: dcd9dda5d4d6d459315692f355499a39e546d518
2019-03-04 14:56:15 -08:00
Sagar Vemuri 9a31b8dd2c Disable getApproximateSizes test (#5035)
Summary:
Disabling `org.rocksdb.RocksDBTest.getApproximateSizes` test as it is frequently crashing on travis (#5020). It will be re-enabled once the root-cause is found and fixed.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5035

Differential Revision: D14294736

Pulled By: sagar0

fbshipit-source-id: e28bff0d143a58ad6c82991fec3d4cf8c0209995
2019-03-01 23:05:32 -08:00
Siying Dong 0920bf4e68 Revert "Remove PlainTable's feature store_index_in_file (#4914)" (#5034)
Summary:
This reverts commit ee1818081f.

We are not ready to deprecate this feature. revert it for now.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5034

Differential Revision: D14287246

Pulled By: siying

fbshipit-source-id: e4beafdeaee1c94364fdaa6ba198218d158339f7
2019-03-01 15:45:45 -08:00
Siying Dong aef763b6d6 Make statistics's stats_level change thread-safe (#5030)
Summary:
Right now, users can change statistics.stats_level while DB is running, but TSAN may report
data race. We make stats_level_ to be atomic, and access them using accessors.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5030

Differential Revision: D14267519

Pulled By: siying

fbshipit-source-id: 37d7ebeff7a43a406230143422a16af899163f73
2019-03-01 10:42:09 -08:00
Sagar Vemuri 06ea73d60b Fix DefaultEnvTest.incBackgroundThreadsIfNeeded test (#5021)
Summary:
`DefaultEnvTest.incBackgroundThreadsIfNeeded` jtest should assert that the number of threads is greater than or equal to the minimum number of threads.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5021

Differential Revision: D14268311

Pulled By: sagar0

fbshipit-source-id: 01fb32b5b3ce636451d162fa1a2bbc5bd1974682
2019-02-28 14:48:57 -08:00