11652 Commits

Author SHA1 Message Date
Yanqin Jin 18cb731f27 Fix a bug in range scan with merge and deletion with timestamp (#10915)
Summary:
When performing Merge during range scan, iterator should understand value types of kDeletionWithTimestamp.

Also add an additional check in debug mode to MergeHelper, and account for the presence of compaction filter.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10915

Test Plan: make check

Reviewed By: ltamasi

Differential Revision: D40960039

Pulled By: riversand963

fbshipit-source-id: dd79d86d7c79d05755bb939a3d94e0c53ddd7f59
2022-11-03 13:02:06 -07:00
Levi Tamasi 941d834739 Support Merge for wide-column entities during point lookups (#10916)
Summary:
The patch adds `Merge` support for wide-column entities to the point lookup
APIs, i.e. `Get`, `MultiGet`, `GetEntity`, and `GetMergeOperands`. (I plan to
update the iterator and compaction logic in separate PRs.) In terms of semantics,
the `Merge` operation is applied to the default (anonymous) column; any other
columns in the entity are unaffected.
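
The semantics described above can be illustrated with a small standalone sketch. It does not use the RocksDB API: `Entity` and `MergeIntoDefaultColumn` are hypothetical names, the merge is assumed to be a string append with '.' as the delimiter, and the handling of a missing default column is an assumption of the sketch.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a wide-column entity: column name -> value.
// The empty name plays the role of the default (anonymous) column.
using Entity = std::map<std::string, std::string>;

// Applies an append-style merge ('.' as delimiter) to the default column only.
// Assumption of this sketch: if there is no default column yet, the operand
// becomes its value. All other columns are returned unchanged.
Entity MergeIntoDefaultColumn(Entity entity, const std::string& operand) {
  auto it = entity.find("");
  if (it == entity.end()) {
    entity.emplace("", operand);
  } else {
    it->second += "." + operand;
  }
  return entity;
}
```

For an entity holding `{"": "v0", "attr": "x"}`, merging the operand `"1"` changes only the default column.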

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10916

Test Plan: `make check`

Reviewed By: riversand963

Differential Revision: D40962311

Pulled By: ltamasi

fbshipit-source-id: 244bc9d172be1af2f204796b2f89104e4d2fa373
2022-11-03 08:35:42 -07:00
Peter Dillinger cc8c8f6958 Refactor (Hyper)ClockCache code (#10887)
Summary:
For clean-up and in preparation for some other anticipated changes, including
* A new dynamically-scaling variant of HyperClockCache
* SecondaryCache support for HyperClockCache

This change does some refactoring for current and future code sharing and reusability. (Including follow-up on https://github.com/facebook/rocksdb/issues/10843)

## clock_cache.h
* TBD whether new variant will be a HyperClockCache or use some other name, so namespace is just clock_cache for the family of structures.
* A number of helper functions introduced and used.
* Pre-emptively split ClockHandle (shared among lock-free clock cache variants) and HandleImpl (specific to a kind of Table), and introduce template to plug new Table implementation into ClockCacheShard.

## clock_cache.cc
* Mostly using helper functions. Some things like `Rollback()` and `FreeDataMarkEmpty()` were not combined because `Rollback()` is Table-specific while `FreeDataMarkEmpty()` can be used with different table implementations.
* Performance testing indicated that despite more opportunities for parallelism, making a local copy of handle data for processing after marking an entry empty was slower than doing that processing before marking the entry empty (but after marking it "under construction"), thus avoiding a few words of copying data. At least for now, this answers the "TODO? Delay freeing?" questions (no).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10887

Test Plan:
fixed a unit testing gap; other minor test updates for refactoring

No functionality change

## Performance
Same setup as https://github.com/facebook/rocksdb/issues/10801:

Before: `readrandom [AVG 81 runs] : 627992 (± 5124) ops/sec`
After: `readrandom [AVG 81 runs] : 637512 (± 4866) ops/sec`

I've been getting some inconsistent results on restarts like the system is not being fair to the two processes, so I'm not sure there's such a real difference.

Reviewed By: anand1976

Differential Revision: D40959240

Pulled By: pdillinger

fbshipit-source-id: 0a8f3646b3bdb5bc7aaad60b26790b0779189949
2022-11-02 22:41:39 -07:00
Tal Zussman 0d5dc5fdb9 Add rocksdb_backup_restore_example to examples/.gitignore (#10825)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10825

Reviewed By: akankshamahajan15

Differential Revision: D40419234

Pulled By: ajkr

fbshipit-source-id: 2d700154eb5b2943d10a0f944f2b414ece353e4a
2022-11-02 15:02:09 -07:00
Yanqin Jin 0547cecb81 Reduce access to atomic variables in a test (#10909)
Summary:
With the TSAN build on CircleCI (see mini-tsan in .circleci/config),
`SeqAdvanceConcurrentTest.SeqAdvanceConcurrent` sometimes gets stuck when an experimental feature called
"unordered write" is enabled. The stack trace is as follows:
```
Thread 7 (Thread 0x7f2284a1c700 (LWP 481523) "write_prepared_"):
#0  0x00000000004fa3f5 in __tsan_atomic64_load () at ./db/merge_context.h:15
#1  0x00000000005e5942 in std::__atomic_base<unsigned long>::load (this=0x7b74000012f8, __m=std::memory_order_seq_cst) at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/atomic_base.h:481
#2  std::__atomic_base<unsigned long>::operator unsigned long (this=0x7b74000012f8) at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/atomic_base.h:341
#3  0x00000000005bf001 in rocksdb::SeqAdvanceConcurrentTest_SeqAdvanceConcurrent_Test::TestBody()::$_9::operator()(void*) const (this=0x7b14000085e8) at utilities/transactions/write_prepared_transaction_test.cc:1702

Thread 6 (Thread 0x7f228421b700 (LWP 481521) "write_prepared_"):
#0  0x000000000052178c in __tsan::MetaMap::GetAndLock(__tsan::ThreadState*, unsigned long, unsigned long, bool, bool) () at ./db/merge_context.h:15
#1  0x00000000004fa48e in __tsan_atomic64_load () at ./db/merge_context.h:15
#2  0x00000000005e5942 in std::__atomic_base<unsigned long>::load (this=0x7b74000012f8, __m=std::memory_order_seq_cst) at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/atomic_base.h:481
#3  std::__atomic_base<unsigned long>::operator unsigned long (this=0x7b74000012f8) at /usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/atomic_base.h:341
#4  0x00000000005bf001 in rocksdb::SeqAdvanceConcurrentTest_SeqAdvanceConcurrent_Test::TestBody()::$_9::operator()(void*) const (this=0x7b14000085e8) at utilities/transactions/write_prepared_transaction_test.cc:1702
```

This is problematic and suspicious. Two threads will get stuck in the same place trying to load from an atomic variable.
https://github.com/facebook/rocksdb/blob/7.8.fb/utilities/transactions/write_prepared_transaction_test.cc#L1694:L1707. Not sure why two threads can reach the same point.

The stack trace shows that there may be a deadlock, since the two threads are on the same write thread (one is doing Prepare, while the other is trying to commit).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10909

Test Plan:
On CircleCI mini-tsan, apply a patch first so that we have a higher chance of hitting the same problematic situation,
```
diff --git a/utilities/transactions/write_prepared_transaction_test.cc b/utilities/transactions/write_prepared_transaction_test.cc
index 4bc1f3744..bd5dc4924 100644
--- a/utilities/transactions/write_prepared_transaction_test.cc
+++ b/utilities/transactions/write_prepared_transaction_test.cc
@@ -1714,13 +1714,13 @@ TEST_P(SeqAdvanceConcurrentTest, SeqAdvanceConcurrent) {
       size_t d = (n % base[bi + 1]) / base[bi];
       switch (d) {
         case 0:
-          threads.emplace_back(txn_t0, bi);
+          threads.emplace_back(txn_t3, bi);
           break;
         case 1:
-          threads.emplace_back(txn_t1, bi);
+          threads.emplace_back(txn_t3, bi);
           break;
         case 2:
-          threads.emplace_back(txn_t2, bi);
+          threads.emplace_back(txn_t3, bi);
           break;
         case 3:
           threads.emplace_back(txn_t3, bi);
```
then build and run tests
```
COMPILE_WITH_TSAN=1 CC=clang-13 CXX=clang++-13 ROCKSDB_DISABLE_ALIGNED_NEW=1 USE_CLANG=1 make V=1 -j32 check
gtest-parallel -r 100 ./write_prepared_transaction_test --gtest_filter=TwoWriteQueues/SeqAdvanceConcurrentTest.SeqAdvanceConcurrent/19
```
The filter above selects `SeqAdvanceConcurrent/19`. Tests 10 to 19 correspond to unordered write, in which Prepare() and Commit() can both enter the same write thread.
Before this PR, there is a high chance of hitting the deadlock. With this PR, no deadlock has been encountered so far.

Reviewed By: ltamasi

Differential Revision: D40869387

Pulled By: riversand963

fbshipit-source-id: 81e82a70c263e4f3417597a201b081ee54f1deab
2022-11-02 14:54:58 -07:00
Brord van Wierst d80baa1396 Added placeholders for MADV defines (#10881)
Summary:
Cross-compiling RocksDB with Rust bindings for Android has led to an error since 7.4.0 (inclusion of madvise).
This is due to missing placeholders for non-Linux platforms.

This PR adds the missing placeholders.

See https://github.com/rust-rocksdb/rust-rocksdb/issues/697 for the specific error thrown.

I have just completed the CLA :)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10881

Reviewed By: akankshamahajan15

Differential Revision: D40726103

Pulled By: ajkr

fbshipit-source-id: 6b391636a74ef7e20d0daf47d332ddf0c14d5c34
2022-11-02 14:42:42 -07:00
Adam Retter 781a387488 Improve musl libc detection and provide an option for the user to override (#10889)
Summary:
The user may override the detection of whether to use GNU libc (the default) or musl libc by setting the environment variable: `ROCKSDB_MUSL_LIBC=true`.

Builds upon and supersedes: https://github.com/facebook/rocksdb/pull/9977

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10889

Reviewed By: akankshamahajan15

Differential Revision: D40788431

Pulled By: ajkr

fbshipit-source-id: ef594d973fc14cbadf28bfb38434231a18a2107c
2022-11-02 14:42:23 -07:00
Brad Smith 4a6906e28c Add OpenBSD/arm64 support for detection of CRC32 and PMULL (#10902)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10902

Reviewed By: akankshamahajan15

Differential Revision: D40839659

Pulled By: ajkr

fbshipit-source-id: 06be5919622f8cce1fce1097c5e654900bf7f8fb
2022-11-02 14:35:27 -07:00
Andrew Kryczka 5cf6ab6f31 Ran clang-format on db/ directory (#10910)
Summary:
Ran `find ./db/ -type f | xargs clang-format -i`. Excluded minor changes it tried to make on db/db_impl/. Everything else it changed was directly under db/ directory. Included minor manual touchups mentioned in PR commit history.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10910

Reviewed By: riversand963

Differential Revision: D40880683

Pulled By: ajkr

fbshipit-source-id: cfe26cda05b3fb9a72e3cb82c286e21d8c5c4174
2022-11-02 14:34:24 -07:00
akankshamahajan ff9ad2c39b Fix async_io failures in case there is error in reading data (#10890)
Summary:
Fix a memory corruption error in scans when async_io is enabled. The corruption happened when the requested data overlapped two buffers: if an IOError occurred while reading the data, the current buffer was left empty, and the other buffer, which already had an async read in progress, was submitted for reading again, causing the error.
Fix: added a check to abort the IO in the second buffer if curr_ became empty.

This PR also fixes db_stress failures which happened when buffers are not aligned.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10890

Test Plan:
- Ran make crash_test -j32 with async_io enabled.
-  Ran benchmarks to make sure there is no regression.

Reviewed By: anand1976

Differential Revision: D40881731

Pulled By: akankshamahajan15

fbshipit-source-id: 39fcf2134c7b1bbb08415ede3e1ef261ac2dbc58
2022-11-01 16:06:51 -07:00
Yanqin Jin 7d26e4c5a3 Basic Support for Merge with user-defined timestamp (#10819)
Summary:
This PR implements the originally disabled `Merge()` APIs when user-defined timestamp is enabled.

Simplest usage:
```cpp
// assume string append merge op is used with '.' as delimiter.
// ts1 < ts2
db->Put(WriteOptions(), "key", ts1, "v0");
db->Merge(WriteOptions(), "key", ts2, "1");
ReadOptions ro;
ro.timestamp = &ts2;
db->Get(ro, "key", &value);
ASSERT_EQ("v0.1", value);
```

Some code comments are added for clarity.

Note: support for timestamp in `DB::GetMergeOperands()` will be done in a follow-up PR.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10819

Test Plan: make check

Reviewed By: ltamasi

Differential Revision: D40603195

Pulled By: riversand963

fbshipit-source-id: f96d6f183258f3392d80377025529f7660503013
2022-10-31 22:28:58 -07:00
Denis Hananein 9f3475eccf Fix compilation errors, clang++-15 (#10907)
Summary:
I tried to compile the main branch, but two minor issues cause compilation errors.
I'm not sure about the second one (`num_empty_non_l0_level`); there should probably be an additional assert.

```
-c ../cache/clock_cache.cc
[build] ../cache/clock_cache.cc:855:15: error: variable 'i' set but not used [-Werror,-Wunused-but-set-variable]
[build]   for (size_t i = 0; &array_[current] != h; i++) {
[build]               ^
```

```
[build] ../db/version_set.cc:3665:7: error: variable 'num_empty_non_l0_level' set but not used [-Werror,-Wunused-but-set-variable]
[build]   int num_empty_non_l0_level = 0;
[build]       ^
[build] 1 error generated.
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10907

Reviewed By: jay-zhuang

Differential Revision: D40866667

Pulled By: ajkr

fbshipit-source-id: 963b7bd56859d0b3b2779cd36fad229425cb7b17
2022-10-31 18:24:44 -07:00
Hui Xiao 7f5e438aee Move wrong history entry out of 7.8 release (#10898)
Summary:
**Context/Summary:**

https://github.com/facebook/rocksdb/pull/10777 mistakenly added a history entry under the 7.8 release, but the PR is not included in 7.8. The mistake happened because the rebase and merge did not flag a conflict when "## Unreleased" was changed to "## 7.8.0 (10/22/2022)".

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10898

Test Plan: Make check

Reviewed By: akankshamahajan15

Differential Revision: D40861001

Pulled By: hx235

fbshipit-source-id: b2310c95490f6ebb90834a210c965a74c9560b51
2022-10-31 15:02:29 -07:00
Levi Tamasi ea1982d010 Add missing copyright headers to a couple of Java test files (#10900)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10900

Reviewed By: akankshamahajan15

Differential Revision: D40825886

Pulled By: ltamasi

fbshipit-source-id: e60f74aa8a622c3c71e1fee420fd586728fb2b7b
2022-10-31 10:05:03 -07:00
sdong d989300ad1 Avoid repeat periodic stats printing when there is no change (#10891)
Summary:
When a column family doesn't get any traffic, its stats are still dumped whenever options.stats_dump_period_sec triggers. This sometimes spams the information logs. With this change, we skip the printing if there is no change, for up to 8 periods.
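
A rough standalone sketch of this kind of throttling follows. `StatsDumpThrottle` and its members are hypothetical names, not RocksDB code, and the exact counting is an assumption: the sketch skips at most 7 consecutive unchanged periods so that a dump still happens on the 8th.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper deciding whether a periodic stats dump should be printed.
class StatsDumpThrottle {
 public:
  // `activity_counter` is any value that changes whenever the column family
  // sees traffic. Returns true if the stats should be printed this period.
  bool ShouldPrint(std::uint64_t activity_counter) {
    if (has_printed_ && activity_counter == last_printed_counter_ &&
        skipped_periods_ < kMaxSkippedPeriods) {
      ++skipped_periods_;  // nothing changed: stay quiet for this period
      return false;
    }
    has_printed_ = true;
    last_printed_counter_ = activity_counter;
    skipped_periods_ = 0;
    return true;
  }

 private:
  // Assumption: skip at most 7 periods in a row, then print regardless.
  static constexpr int kMaxSkippedPeriods = 7;
  bool has_printed_ = false;
  std::uint64_t last_printed_counter_ = 0;
  int skipped_periods_ = 0;
};
```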

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10891

Test Plan: Manually test the behavior with hacked db_bench setups.

Reviewed By: jay-zhuang

Differential Revision: D40777183

fbshipit-source-id: ef0b9a793e4f6282df099b464f01d1fb4c5a2cab
2022-10-31 09:51:38 -07:00
Yanqin Jin 9079895aae Fix deletion counting in memtable stats (#10886)
Summary:
Currently, a memtable's stats `num_deletes_` is incremented only if the entry is a regular delete (kTypeDeletion). We need to fix it by accounting for kTypeSingleDeletion and kTypeDeletionWithTimestamp.
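
As an illustration only, the intended counting could look like the sketch below. `EntryType` is a hypothetical enum that borrows the type names from this message; it is not RocksDB's actual `ValueType` definition, and `CountDeletes` is a made-up helper.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical subset of entry types, named after the ones in this message.
enum class EntryType {
  kTypeValue,
  kTypeDeletion,
  kTypeSingleDeletion,
  kTypeDeletionWithTimestamp,
  kTypeMerge,
};

// True for every kind of delete, not only the regular kTypeDeletion.
bool CountsAsDelete(EntryType t) {
  switch (t) {
    case EntryType::kTypeDeletion:
    case EntryType::kTypeSingleDeletion:
    case EntryType::kTypeDeletionWithTimestamp:
      return true;
    default:
      return false;
  }
}

// Counts the delete entries in a sequence of added entries.
std::uint64_t CountDeletes(const std::vector<EntryType>& entries) {
  std::uint64_t num_deletes = 0;
  for (EntryType t : entries) {
    if (CountsAsDelete(t)) ++num_deletes;
  }
  return num_deletes;
}
```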

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10886

Test Plan: make check

Reviewed By: ltamasi

Differential Revision: D40740754

Pulled By: riversand963

fbshipit-source-id: 7bde62cd6df136585bc5bfb1c426c7a8276c08e1
2022-10-28 17:03:44 -07:00
Jay Zhuang 36f5e19e33 Fix a Windows build error (#10897)
Summary:
The for loop is marked as unreachable code because it will never call the increment. Switch it to `if`.
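
A simplified illustration of the pattern, not the actual code in merging_iterator.cc: `FirstOrDefault` is a hypothetical function. The commented-out loop always returns on its first iteration, so its increment can never execute; the `if` form expresses the same logic without the dead increment.

```cpp
#include <cassert>
#include <vector>

// Before (shape of the problem): the body always returns, so `++i` is
// unreachable code.
//
//   for (size_t i = 0; i < v.size(); ++i) {
//     return v[i];
//   }
//   return fallback;

// After: same behavior, written as a plain `if`.
int FirstOrDefault(const std::vector<int>& v, int fallback) {
  if (!v.empty()) {
    return v.front();
  }
  return fallback;
}
```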

```
\table\merging_iterator.cc(823): error C2220: the following warning is treated as an error
\table\merging_iterator.cc(823): warning C4702: unreachable code
\table\merging_iterator.cc(1030): error C2220: the following warning is treated as an error
\table\merging_iterator.cc(1030): warning C4702: unreachable code
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10897

Reviewed By: cbi42

Differential Revision: D40811790

Pulled By: jay-zhuang

fbshipit-source-id: fe8fd3e7cf3d6f710360c402b79763854d5120df
2022-10-28 14:24:48 -07:00
Yanqin Jin 900f79126d Pass `const LockInfo&` to AcquireLocked() and AcquireWithTimeout (#10874)
Summary:
The motivation and benefit of the current behavior of passing `LockInfo&&` as an argument to AcquireLocked() and AcquireWithTimeout() are not clear to me. Furthermore, in AcquireWithTimeout(), we access members of `LockInfo&&` after it is passed to AcquireLocked() as an rvalue ref. In addition, we may call `AcquireLocked()` with `std::move(lock_info)` multiple times.

This leads to a linter warning of use-after-move. If a future implementation of AcquireLocked() does something like move-constructing a new `LockInfo` from the passed-in `LockInfo&&`, then the caller can no longer use it, because `LockInfo` has a member of type `autovector`.
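
The sketch below shows why a `const&` parameter is the safer choice when the caller keeps using the object and may retry. It is a standalone illustration: this `LockInfo`, `TryAcquire`, and `AcquireWithRetries` are hypothetical simplifications and not the lock manager's real types or signatures.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified lock description.
struct LockInfo {
  bool exclusive;
  std::vector<std::string> txn_ids;
};

// Takes a const reference, so the callee cannot move from the argument.
// In this toy model a lock can be taken only if nothing is held yet.
bool TryAcquire(const LockInfo& info, std::vector<LockInfo>* held) {
  if (!held->empty()) return false;
  held->push_back(info);  // copies; the caller's object stays intact
  return true;
}

// Calls TryAcquire repeatedly with the same object. Returns the attempt
// number that succeeded, or -1 if every attempt failed.
int AcquireWithRetries(const LockInfo& info, std::vector<LockInfo>* held,
                       int max_attempts) {
  for (int attempt = 1; attempt <= max_attempts; ++attempt) {
    if (TryAcquire(info, held)) return attempt;
  }
  return -1;
}
```

Because nothing is moved, `info` can still be read after any number of attempts.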

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10874

Test Plan: make check

Reviewed By: ltamasi

Differential Revision: D40704210

Pulled By: riversand963

fbshipit-source-id: 20091df65b4fc63b072bcec9809efc49955d6d35
2022-10-28 14:05:12 -07:00
Hui Xiao 08a63ad10b Run clang format against files under example/, memory/ and memtable/ folders (#10893)
Summary:
**Context/Summary:**
Run the following to format
```
find ./examples -iname '*.h' -o -iname '*.cc' | xargs clang-format -i
find ./memory -iname '*.h' -o -iname '*.cc' | xargs clang-format -i
find ./memtable -iname '*.h' -o -iname '*.cc' | xargs clang-format -i
```

**Test**
- Manual inspection to ensure changes are cosmetic only
- CI

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10893

Reviewed By: jay-zhuang

Differential Revision: D40779187

Pulled By: hx235

fbshipit-source-id: 529cbb0f0fbd698d95817e8c42fe3ce32254d9b0
2022-10-28 13:16:50 -07:00
Levi Tamasi 7867a1112b Handle Merges correctly in GetEntity (#10894)
Summary:
The PR fixes the handling of `Merge`s in `GetEntity`. Note that `Merge` is not yet
supported for wide-column entities written using `PutEntity`; this change is
about returning correct (i.e. consistent with `Get`) results in cases like when the
base value is a plain old key-value written using `Put` or when there is no real base
value because we hit either a tombstone or the beginning of history.

Implementation-wise, the patch introduces a new wrapper around the existing
`MergeHelper::TimedFullMerge` that can store the merge result in either a string
(for the purposes of `Get`) or a `PinnableWideColumns` instance (for `GetEntity`).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10894

Test Plan: `make check`

Reviewed By: riversand963

Differential Revision: D40782708

Pulled By: ltamasi

fbshipit-source-id: 3d700d56b2ef81f02ba1e2d93f6481bf13abcc90
2022-10-28 10:48:51 -07:00
Jay Zhuang 1e6f1ef894 Upgrade CircleCI Windows Build (#10090)
Summary:
* Upgrade CircleCI orb from 2.4 to 5.0
* Set up vs2022 build
* Use the image's built-in vs2019 and vs2022
* Remove vs2017
* Remove CMAKE_CXX_STANDARD=20

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10090

Reviewed By: ajkr

Differential Revision: D40787942

Pulled By: jay-zhuang

fbshipit-source-id: cc74c02a9f28dd784a0ba5502c4bfc9ff1a26d3e
2022-10-28 09:14:47 -07:00
anand76 bf497e91ad Allow a custom DB cleanup command to be passed to db_crashtest.py (#10883)
Summary:
This option allows a custom cleanup command line for a non-Posix file system to be used by db_crashtest.py to clean up between runs.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10883

Test Plan: Run the whitebox crash test

Reviewed By: pdillinger

Differential Revision: D40726424

Pulled By: anand1976

fbshipit-source-id: b827f6b583ff78f9ca75ced2d96f7e58f5200432
2022-10-27 19:47:01 -07:00
Levi Tamasi 22ff8c5af7 Use malloc/free for LRUHandle instead of new[]/delete[] (#10884)
Summary:
It's unsafe to call `malloc_usable_size` with an address not returned by a function from the `malloc` family (see https://github.com/facebook/rocksdb/issues/10798). The patch switches from using `new[]` / `delete[]` for `LRUHandle` to `malloc` / `free`.
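
For illustration, a variable-length handle allocated with `malloc` and released with `free` might look like the sketch below. `Handle`, `KeyData`, `NewHandle`, and `FreeHandle` are hypothetical and much simpler than the real `LRUHandle`; the point is only that memory obtained from the `malloc` family is the kind of pointer that `malloc_usable_size` expects.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical header; the key bytes live right after it in the same
// allocation.
struct Handle {
  std::size_t key_length;
};

// Start of the key bytes that follow the header.
char* KeyData(Handle* h) {
  return reinterpret_cast<char*>(h) + sizeof(Handle);
}

// Allocates header plus key in one malloc call. Returns nullptr on failure.
Handle* NewHandle(const char* key, std::size_t key_length) {
  void* mem = std::malloc(sizeof(Handle) + key_length);
  if (mem == nullptr) return nullptr;
  Handle* h = static_cast<Handle*>(mem);
  h->key_length = key_length;
  std::memcpy(KeyData(h), key, key_length);
  return h;
}

// Memory from malloc must be released with free, never with delete[].
void FreeHandle(Handle* h) { std::free(h); }
```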

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10884

Test Plan: `make check`

Reviewed By: pdillinger

Differential Revision: D40738089

Pulled By: ltamasi

fbshipit-source-id: ac5583f88125fee49c314639be6b6df85937fbee
2022-10-27 15:39:29 -07:00
Changyu Bi 56715350d9 Reduce heap operations for range tombstone keys in iterator (#10877)
Summary:
Right now in MergingIterator, for each range tombstone start and end key, we pop one end from heap and push the other end into the heap. This involves extra downheap and upheap cost. In the likely cases when a range tombstone iterator emits relatively adjacent keys, these keys should have similar order within all keys in the heap. This can happen when there is a burst of consecutive range tombstones, and most of the keys covered by them are dropped already. This PR uses `replace_top()` when inserting new range tombstone keys, which is more efficient in these common cases.
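
To show the idea behind `replace_top()`, here is a minimal sketch on a plain min-heap of integers. It is not RocksDB's heap implementation; `SiftDown` and `ReplaceTop` are hypothetical helpers. Overwriting the top and sifting down once replaces a pop followed by a push, and when the new key is close to the old top the sift-down stops almost immediately.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Restores min-heap order below index i (smallest element lives at index 0).
void SiftDown(std::vector<int>& heap, std::size_t i) {
  const std::size_t n = heap.size();
  while (true) {
    std::size_t smallest = i;
    const std::size_t left = 2 * i + 1;
    const std::size_t right = 2 * i + 2;
    if (left < n && heap[left] < heap[smallest]) smallest = left;
    if (right < n && heap[right] < heap[smallest]) smallest = right;
    if (smallest == i) return;  // heap property holds again
    std::swap(heap[i], heap[smallest]);
    i = smallest;
  }
}

// Overwrites the top element and fixes the heap with one sift-down,
// instead of a pop (sift-down) followed by a push (sift-up).
void ReplaceTop(std::vector<int>& heap, int value) {
  assert(!heap.empty());
  heap[0] = value;
  SiftDown(heap, 0);
}
```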

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10877

Test Plan:
- existing UT
- ran all flavors of stress test through sandcastle
- benchmark:
```
# Set up: --writes_per_range_tombstone=1 means one point write and one delete range

TEST_TMPDIR=/tmp/rocksdb-rangedel-test-all-tombstone ./db_bench --benchmarks=fillseq,levelstats --writes_per_range_tombstone=1 --max_num_range_tombstones=1000000 --range_tombstone_width=2 --num=100000000 --writes=800000 --max_bytes_for_level_base=4194304 --disable_auto_compactions --write_buffer_size=33554432 --key_size=64

Level Files Size(MB)
--------------------
  0        8      152
  1        0        0
  2        0        0
  3        0        0
  4        0        0
  5        0        0
  6        0        0

# Benchmark
TEST_TMPDIR=/tmp/rocksdb-rangedel-test-all-tombstone/ ./db_bench --benchmarks=readseq[-W1][-X5],levelstats --use_existing_db=true --cache_size=3221225472 --num=100000000 --reads=1000000 --disable_auto_compactions=true --avoid_flush_during_recovery=true

# Pre PR
readseq [AVG    5 runs] : 1432116 (± 59664) ops/sec;  224.0 (± 9.3) MB/sec
readseq [MEDIAN 5 runs] : 1454886 ops/sec;  227.5 MB/sec

# Post PR
readseq [AVG    5 runs] : 1944425 (± 29521) ops/sec;  304.1 (± 4.6) MB/sec
readseq [MEDIAN 5 runs] : 1959430 ops/sec;  306.5 MB/sec
```

Reviewed By: ajkr

Differential Revision: D40710936

Pulled By: cbi42

fbshipit-source-id: cb782fb9cdcd26c0c3eb9443215a4ef4d2f79022
2022-10-27 14:28:50 -07:00
sdong 3e686c7cbe sst_dump --command=raw to add index offset information (#10873)
Summary:
Add some extra information in outputs of "sst_dump --command=raw" to help debug some issues. Right now, encoded block handle is printed out. It is more useful to directly print out offset and size.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10873

Test Plan: Manually run it against a file and check the output.

Reviewed By: anand1976

Differential Revision: D40742289

fbshipit-source-id: 04d7de26e7f27e1595a7cc3ac1c1082e4e835b93
2022-10-27 11:56:09 -07:00
anand76 5fef34fd3a Fix a potential std::vector use after move bug (#10845)
Summary:
The call to `folly::coro::collectAllRange()` should move the input `mget_tasks`. But just in case, assert and clear the std::vector before reusing.
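
The defensive pattern can be sketched without folly as follows. `RunAll` and `RunTwoBatches` are hypothetical; `RunAll` stands in for any consumer that takes ownership of the vector by moving from it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical consumer that takes ownership of a batch of tasks.
std::size_t RunAll(std::vector<std::string>&& tasks) {
  std::vector<std::string> owned = std::move(tasks);  // move-construct
  return owned.size();
}

// Reuses the same vector for a second batch after it has been moved from.
std::size_t RunTwoBatches() {
  std::vector<std::string> tasks{"a", "b"};
  std::size_t done = RunAll(std::move(tasks));
  // The consumer is expected to have emptied the vector; check that in debug
  // builds, then clear() so the state is known before reuse either way.
  assert(tasks.empty());
  tasks.clear();
  tasks.push_back("c");
  done += RunAll(std::move(tasks));
  return done;
}
```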

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10845

Reviewed By: akankshamahajan15

Differential Revision: D40611719

Pulled By: anand1976

fbshipit-source-id: 0f32b387cf5a2894b13389016c020b01ab479b5e
2022-10-26 22:34:36 -07:00
Peter Dillinger 5d3953114f Fix include of windows.h in mmap.h (#10885)
Summary:
If windows.h is not included in a particular way, it can conflict with other code including it. I don't know all the details, but having just one standard place where we include windows.h in header files seems best and seems to fix the internal issue we hit.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10885

Test Plan: CI and internal validation

Reviewed By: anand1976

Differential Revision: D40738945

Pulled By: pdillinger

fbshipit-source-id: 88f635e895b1c7b810baad159e6dbb8351344cac
2022-10-26 18:07:57 -07:00
Alan Paxton 17553bdd5e RocksJava API - fix Transaction.multiGet() size limit, remove bogus EnsureLocalCapacity() calls (#10674)
Summary:
Resolves https://github.com/facebook/rocksdb/issues/9006

Fixes 2 related issues with JNI local references in the RocksJava API.

1. Some instances of RocksJava API JNI code appear to have misunderstood the reason for `JNIEnv->EnsureLocalCapacity()` and are carrying out bogus checks which happen to fail with some larger parameter values (many column families in a single call, very long key names or values). Remove these checks and add some regression tests for the previous failures.

2. The helper for Transaction multiGet operations (`multiGet()`, `multiGetForUpdate()`,...) is limited in the number of keys it can `get()` for because it requires a corresponding number of live local references. Refactor the helper slightly, copying out the key contents within a loop so that the references don't have to exist at the same time.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10674

Reviewed By: ajkr

Differential Revision: D40515361

Pulled By: jay-zhuang

fbshipit-source-id: f1be0126181a698b3ad27c0945a39c54d950aa25
2022-10-26 17:25:33 -07:00
Qiaolin Yu bf78380851 Rename block_cache_trace_analyzer_tool in CMakeLists (#10814)
Summary:
Currently, the name of `block_cache_trace_analyzer_tool` in `CMakeLists.txt` is somewhat confusing.

## Makefile
The same thing in Makefile is called `block_cache_trace_analyzer`.
```makefile
block_cache_trace_analyzer: $(OBJ_DIR)/tools/block_cache_analyzer/block_cache_trace_analyzer_tool.o $(ANALYZE_OBJECTS) $(TOOLS_LIBRARY) $(LIBRARY)
	$(AM_LINK)
```

## RocksDB Wiki
Also, in the [Block-cache-analysis-and-simulation-tools](https://github.com/facebook/rocksdb/wiki/Block-cache-analysis-and-simulation-tools#quick-start) of RocksDB Wiki, it is called `block_cache_trace_analyzer` too.
<img width="955" alt="Screen Shot 2022-10-13 at 20 07 09" src="https://user-images.githubusercontent.com/90088090/195591912-00b539b4-7f8c-4117-bf72-ac4eb51100d1.png">

Therefore, I think maybe it's better to rename `block_cache_trace_analyzer_tool` to `block_cache_trace_analyzer` in `CMakeLists.txt`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10814

Reviewed By: ajkr

Differential Revision: D40348522

Pulled By: jay-zhuang

fbshipit-source-id: f3d69d5880b27cdb8c8fe71df56fa3dbe1dc32fb
2022-10-26 17:02:37 -07:00
Jay Zhuang b36ec37a4b clang-format for db/compaction (#10882)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10882

Reviewed By: riversand963

Differential Revision: D40724867

Pulled By: jay-zhuang

fbshipit-source-id: 7f387724f8cd07d8d2b90566a515a4e9078d21f1
2022-10-26 12:35:12 -07:00
Peter Dillinger a1a1dc6659 Manual interventions for clang-format util/ (#10870)
Summary:
Complements https://github.com/facebook/rocksdb/issues/10867 with some manual edits to avoid weird formatting or massive reformatting of third-party code.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10870

Test Plan: `make check` etc

Reviewed By: riversand963

Differential Revision: D40686526

Pulled By: pdillinger

fbshipit-source-id: 6af988fe4b0a8ae4a5992ec2c3c37fe67584226e
2022-10-26 12:08:20 -07:00
Peter Dillinger 7fff38b1fe clang-format cache/ and util/ directories (#10867)
Summary:
This is purely the result of running `clang-format -i` on files, except some files have been excluded for manual intervention in a separate PR

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10867

Test Plan: `make check`, `make check-headers`, `make format`

Reviewed By: jay-zhuang

Differential Revision: D40682086

Pulled By: pdillinger

fbshipit-source-id: 8673d978553ab99b516da7fb63ba0b82523337f8
2022-10-26 12:08:20 -07:00
Brendan MacDonell 5f915b447d Fix ChecksumType::kXXH3 in the Java API (#10862)
Summary:
While PR#9749 nominally added support for XXH3 in the Java API, it did not update the `toCppChecksumType` method. As a result, setting the checksum type to XXH3 actually set it to CRC32c instead.

This commit adds the missing entry to portal.h, and also updates the tests so that they verify the options passed to RocksDB, instead of simply checking that the getter returns the value set by the setter.
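
The failure mode can be pictured with the sketch below. It is not the code in portal.h: the enum is a local stand-in, the numeric Java-side values are invented for illustration, `ToCppChecksumTypeSketch` is a hypothetical name, and the fallback to CRC32c in the `default` branch is an assumption consistent with the behavior described above.

```cpp
#include <cassert>
#include <cstdint>

// Local stand-in for the C++ checksum enum.
enum class ChecksumType { kNoChecksum, kCRC32c, kxxHash, kxxHash64, kXXH3 };

// Maps a Java-side byte value (values invented for this sketch) to the C++
// enum. Any value without its own case silently falls through to the default.
ChecksumType ToCppChecksumTypeSketch(std::int8_t java_value) {
  switch (java_value) {
    case 0:
      return ChecksumType::kNoChecksum;
    case 1:
      return ChecksumType::kCRC32c;
    case 2:
      return ChecksumType::kxxHash;
    case 3:
      return ChecksumType::kxxHash64;
    case 4:
      // Without this case, a request for XXH3 would end up as CRC32c.
      return ChecksumType::kXXH3;
    default:
      return ChecksumType::kCRC32c;  // assumed fallback
  }
}
```

A test that only compares a getter with a setter on the Java side would not notice a missing case here, which is why checking the value that reaches the native options is the stronger test.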

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10862

Reviewed By: pdillinger

Differential Revision: D40665031

Pulled By: ajkr

fbshipit-source-id: 2834419b3361a4bac47db3b858951fb451b5bdc8
2022-10-25 19:25:44 -07:00
Levi Tamasi d484275230 Adjust value generation in batched ops stress tests (#10872)
Summary:
The patch adjusts the generation of values in batched ops stress tests so that the digits 0..9 are appended (instead of prepended) to the values written. This has the advantage of aligning the encoding of the "value base" into the value string across non-batched, batched, and CF consistency stress tests.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10872

Test Plan: Tested using some black box stress test runs.

Reviewed By: riversand963

Differential Revision: D40692847

Pulled By: ltamasi

fbshipit-source-id: 26bf8adff2944cbe416665f09c3bab89d80416b3
2022-10-25 17:51:20 -07:00
sdong 48fe921754 Run clang format against files under tools/ and db_stress_tool/ (#10868)
Summary:
Some lines of .h and .cc files are not properly formatted. Clean them up with clang-format.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10868

Test Plan: Watch existing CI to pass

Reviewed By: ajkr

Differential Revision: D40683485

fbshipit-source-id: 491fbb78b2cdcb948164f306829909ad816d5d0b
2022-10-25 14:29:41 -07:00
Yanqin Jin 95a1935cb1 Run clang-format on utilities/transactions (#10871)
Summary:
This PR is the result of running the following command
```
find ./utilities/transactions/ -name '*.cc' -o -name '*.h' -o -name '*.c' -o -name '*.hpp' -o -name '*.cpp' | xargs clang-format -i
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10871

Test Plan: make check

Reviewed By: cbi42

Differential Revision: D40686871

Pulled By: riversand963

fbshipit-source-id: 613738d667ec8f8e13cce4802e0e166d6be52211
2022-10-25 14:15:22 -07:00
Yanqin Jin 84563a2701 Run clang-format on some files in db/db_impl directory (#10869)
Summary:
Run clang-format on some files in db/db_impl/ directory

```
clang-format -i <file>
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10869

Test Plan: make check

Reviewed By: ltamasi

Differential Revision: D40685390

Pulled By: riversand963

fbshipit-source-id: 64449ccb21b0d61c5142eb2bcbff828acb45c154
2022-10-25 13:49:09 -07:00
anand76 727bad78b8 Format files under table/ by clang-format (#10852)
Summary:
Run clang-format on files under the `table` directory.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10852

Reviewed By: ajkr

Differential Revision: D40650732

Pulled By: anand1976

fbshipit-source-id: 2023a958e37fd6274040c5181130284600c9e0ef
2022-10-25 11:50:38 -07:00
Changyu Bi 7a95938899 Improve FragmentTombstones() speed by lazily initializing `seq_set_` (#10848)
Summary:
FragmentedRangeTombstoneList has a member variable `seq_set_` that contains the sequence numbers of all range tombstones in a set. The set is constructed in `FragmentTombstones()` and is used only in `FragmentedRangeTombstoneList::ContainsRange()` which only happens during compaction. This PR moves the initialization of `seq_set_` to `FragmentedRangeTombstoneList::ContainsRange()`. This should speed up `FragmentTombstones()` when the range tombstone list is used for read/scan requests. Microbench shows the speed improvement to be ~45%.
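
A minimal sketch of the lazy-initialization idea, under simplifying assumptions: `TombstoneList`, `ContainsSeq`, and `seq_set_built` are hypothetical names, the lookup is reduced to a single sequence number, and the sketch is not thread-safe.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Hypothetical list that builds its sequence-number set only when needed.
class TombstoneList {
 public:
  // Construction only stores the raw sequence numbers; no set is built here.
  explicit TombstoneList(std::vector<std::uint64_t> seqs)
      : seqs_(std::move(seqs)) {}

  // The set is built on the first lookup, so callers that never query it
  // (for example plain reads and scans) never pay for it.
  bool ContainsSeq(std::uint64_t seq) {
    if (!seq_set_built_) {
      seq_set_.insert(seqs_.begin(), seqs_.end());
      seq_set_built_ = true;
    }
    return seq_set_.count(seq) > 0;
  }

  bool seq_set_built() const { return seq_set_built_; }

 private:
  std::vector<std::uint64_t> seqs_;
  std::set<std::uint64_t> seq_set_;
  bool seq_set_built_ = false;
};
```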

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10848

Test Plan:
- Existing tests and stress test: `python3 tools/db_crashtest.py whitebox --simple  --verify_iterator_with_expected_state_one_in=5`.
- Microbench: update `range_del_aggregator_bench` to benchmark speed of `FragmentTombstones()`:
```
./range_del_aggregator_bench --num_range_tombstones=1000 --tombstone_start_upper_bound=50000000 --num_runs=10000 --tombstone_width_mean=200 --should_deletes_per_run=100 --use_compaction_range_del_aggregator=true

Before this PR:
=========================
Fragment Tombstones:     270.286 us
AddTombstones:           1.28933 us
ShouldDelete (first):    0.525528 us
ShouldDelete (rest):     0.0797519 us

After this PR: time to fragment tombstones is pushed to AddTombstones() which only happens during compaction.
=========================
Fragment Tombstones:     149.879 us
AddTombstones:           102.131 us
ShouldDelete (first):    0.565871 us
ShouldDelete (rest):     0.0729444 us
```
- db_bench: this should improve speed for fragmenting range tombstones for mutable memtable:
```
./db_bench --benchmarks=readwhilewriting --writes_per_range_tombstone=100 --max_write_buffer_number=100 --min_write_buffer_number_to_merge=100 --writes=500000 --reads=250000 --disable_auto_compactions --max_num_range_tombstones=100000 --finish_after_writes --write_buffer_size=1073741824 --threads=25

Before this PR:
readwhilewriting :      18.301 micros/op 1310445 ops/sec 4.769 seconds 6250000 operations;   28.1 MB/s (41001 of 250000 found)
After this PR:
readwhilewriting :      16.943 micros/op 1439376 ops/sec 4.342 seconds 6250000 operations;   23.8 MB/s (28977 of 250000 found)
```

Reviewed By: ajkr

Differential Revision: D40646227

Pulled By: cbi42

fbshipit-source-id: ea471667edb258f67d01cfd828588e80a89e4083
2022-10-25 11:33:04 -07:00
Hui Xiao fc74abb436 Fix FIFO causing overlapping seqnos in L0 files due to overlapped seqnos between ingested files and memtable's (#10777)
Summary:
**Context:**
Same as https://github.com/facebook/rocksdb/pull/5958#issue-511150930, but applies the fix to the FIFO compaction case.
Repro:
```
COERCE_CONTEXT_SWITCH=1 make -j56 db_stress

./db_stress --acquire_snapshot_one_in=0 --adaptive_readahead=0 --allow_data_in_errors=True --async_io=1 --avoid_flush_during_recovery=0 --avoid_unnecessary_blocking_io=0 --backup_max_size=104857600 --backup_one_in=0 --batch_protection_bytes_per_key=0 --block_size=16384 --bloom_bits=18 --bottommost_compression_type=disable --bytes_per_sync=262144 --cache_index_and_filter_blocks=0 --cache_size=8388608 --cache_type=lru_cache --charge_compression_dictionary_building_buffer=0 --charge_file_metadata=1 --charge_filter_construction=1 --charge_table_reader=1 --checkpoint_one_in=0 --checksum_type=kCRC32c --clear_column_family_one_in=0 --column_families=1 --compact_files_one_in=0 --compact_range_one_in=1000 --compaction_pri=3 --open_files=-1 --compaction_style=2 --fifo_allow_compaction=1 --compaction_ttl=0 --compression_max_dict_buffer_bytes=8388607 --compression_max_dict_bytes=16384 --compression_parallel_threads=1 --compression_type=zlib --compression_use_zstd_dict_trainer=1 --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --data_block_index_type=0 --db=/dev/shm/rocksdb_test0/rocksdb_crashtest_whitebox --db_write_buffer_size=8388608 --delpercent=4 --delrangepercent=1 --destroy_db_initially=1 --detect_filter_construct_corruption=0 --disable_wal=0 --enable_compaction_filter=0 --enable_pipelined_write=1 --fail_if_options_file_error=1 --file_checksum_impl=none --flush_one_in=1000 --format_version=5 --get_current_wal_file_one_in=0 --get_live_files_one_in=0 --get_property_one_in=0 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=15 --index_type=3 --ingest_external_file_one_in=100 --initial_auto_readahead_size=0 --iterpercent=10 --key_len_percent_dist=1,30,69 --level_compaction_dynamic_level_bytes=True --log2_keys_per_lock=10 --long_running_snapshots=0 --mark_for_compaction_one_file_in=10 --max_auto_readahead_size=16384 --max_background_compactions=20 --max_bytes_for_level_base=10485760 --max_key=100000 --max_key_len=3 
--max_manifest_file_size=1073741824 --max_write_batch_group_size_bytes=1048576 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=4194304 --memtable_prefix_bloom_size_ratio=0.5 --memtable_protection_bytes_per_key=1 --memtable_whole_key_filtering=1 --memtablerep=skip_list --mmap_read=1 --mock_direct_io=False --nooverwritepercent=1 --num_file_reads_for_auto_readahead=0 --num_levels=1 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=32 --open_write_fault_one_in=0 --ops_per_thread=200000 --optimize_filters_for_memory=0 --paranoid_file_checks=1 --partition_filters=0 --partition_pinning=1 --pause_background_one_in=0 --periodic_compaction_seconds=0 --prefix_size=8 --prefixpercent=5 --prepopulate_block_cache=0 --progress_reports=0 --read_fault_one_in=0 --readahead_size=16384 --readpercent=45 --recycle_log_file_num=1 --reopen=20 --ribbon_starting_level=999 --snapshot_hold_ops=1000 --sst_file_manager_bytes_per_sec=0 --sst_file_manager_bytes_per_truncate=0 --subcompactions=2 --sync=0 --sync_fault_injection=0 --target_file_size_base=524288 --target_file_size_multiplier=2 --test_batches_snapshots=0 --top_level_index_pinning=3 --unpartitioned_pinning=0 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_full_merge_v1=1 --use_merge=0 --use_multiget=1 --user_timestamp_size=0 --value_size_mult=32 --verify_checksum=1 --verify_checksum_one_in=0 --verify_db_one_in=1000 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=0 --wal_compression=zstd --write_buffer_size=524288 --write_dbid_to_manifest=0 --writepercent=35

put or merge error: Corruption: force_consistency_checks(DEBUG): VersionBuilder: L0 file #479 with seqno 23711 29070 vs. file #482 with seqno 27138 29049
```

**Summary:**
FIFO only does intra-L0 compaction in the following four cases. In all other cases, FIFO drops data instead of compacting it, so those cases are irrelevant to the overlapping seqno issue we are solving.
-  [FIFOCompactionPicker::PickSizeCompaction](https://github.com/facebook/rocksdb/blob/7.6.fb/db/compaction/compaction_picker_fifo.cc#L155) when `total size < compaction_options_fifo.max_table_files_size` and `compaction_options_fifo.allow_compaction == true`
   - For this path, we simply reuse the fix in `FindIntraL0Compaction` https://github.com/facebook/rocksdb/pull/5958/files#diff-c261f77d6dd2134333c4a955c311cf4a196a08d3c2bb6ce24fd6801407877c89R56
   - This path was not stress-tested at all. Therefore we covered `fifo.allow_compaction` in the stress test to surface the overlapping seqno issue we are fixing here.
- [FIFOCompactionPicker::PickCompactionToWarm](https://github.com/facebook/rocksdb/blob/7.6.fb/db/compaction/compaction_picker_fifo.cc#L313) when `compaction_options_fifo.age_for_warm > 0`
  - For this path, we simply replicate the idea in https://github.com/facebook/rocksdb/pull/5958#issue-511150930 and skip files whose largest seqno is greater than `earliest_mem_seqno`
  - This path was not stress-tested at all. However, covering the `age_for_warm` option is worth a separate PR to deal with db stress compatibility. Therefore we manually tested this path for this PR
- [FIFOCompactionPicker::CompactRange](https://github.com/facebook/rocksdb/blob/7.6.fb/db/compaction/compaction_picker_fifo.cc#L365) that ends up picking one of the above two compactions
- [CompactionPicker::CompactFiles](https://github.com/facebook/rocksdb/blob/7.6.fb/db/compaction/compaction_picker.cc#L378)
    - Since `SanitizeCompactionInputFiles()` will be called [before](https://github.com/facebook/rocksdb/blob/7.6.fb/db/compaction/compaction_picker.h#L111-L113) `CompactionPicker::CompactFiles`, we simply replicate the idea in https://github.com/facebook/rocksdb/pull/5958#issue-511150930 in `SanitizeCompactionInputFiles()`. To simplify the implementation, we return `Status::Aborted()` on encountering a seqno-overlapped file when compacting to L0, instead of skipping the file and proceeding with the compaction.
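
The skip rule used in the cases above (leave out files whose largest seqno is greater than `earliest_mem_seqno`) can be sketched as follows. This is a simplified illustration under stated assumptions, not the code from the PR: the `L0File` struct and the `FilterCompactionInputs` helper are hypothetical, and the real picker works on RocksDB's own file metadata and has additional conditions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified file metadata; the real RocksDB structure has
// many more fields.
struct L0File {
  uint64_t file_number;
  uint64_t smallest_seqno;
  uint64_t largest_seqno;
};

// Returns the candidate files that are safe to compact within L0 under the
// rule described above: any file whose largest seqno is greater than
// earliest_mem_seqno is skipped, because its seqno range may overlap data
// still sitting in the memtable.
std::vector<L0File> FilterCompactionInputs(const std::vector<L0File>& files,
                                           uint64_t earliest_mem_seqno) {
  std::vector<L0File> picked;
  for (const L0File& f : files) {
    if (f.largest_seqno > earliest_mem_seqno) {
      continue;  // skip: overlaps (or may overlap) memtable seqnos
    }
    picked.push_back(f);
  }
  return picked;
}
```

For example, with `earliest_mem_seqno = 35`, a file covering seqnos 15 to 40 would be skipped while files covering 10 to 20 and 21 to 30 would be kept.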

Some additional clean-up included in this PR:
- Renamed `earliest_memtable_seqno` to `earliest_mem_seqno` for consistent naming
- Added comment about `earliest_memtable_seqno` in related APIs
- Made parameter `earliest_memtable_seqno` constant and required

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10777

Test Plan:
- make check
- New unit test `TEST_P(DBCompactionTestFIFOCheckConsistencyWithParam, FlushAfterIntraL0CompactionWithIngestedFile)` corresponding to the above 4 cases, which will fail accordingly without the fix
- Regular CI stress run on this PR + stress test with aggressive value https://github.com/facebook/rocksdb/pull/10761 and on FIFO compaction only

Reviewed By: ajkr

Differential Revision: D40090485

Pulled By: hx235

fbshipit-source-id: 52624186952ee7109117788741aeeac86b624a4f
2022-10-25 10:39:58 -07:00
sdong 2a551976f4 Run format check for *.h and *.cc files under java/ (#10851)
Summary:
Run format check for .h and .cc files to clean up the formatting

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10851

Test Plan: Watch CI tests to pass

Reviewed By: ajkr

Differential Revision: D40649723

fbshipit-source-id: 62d32cead0b3b8e6540e86d25451bd72642109eb
2022-10-25 09:26:51 -07:00
changyubi de34e7196f clang format files under monitoring/ (#10857)
Summary:
Ran `find . -iname '*.h' -o -iname '*.cc' | xargs clang-format -i` under monitoring/.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10857

Test Plan: existing CI.

Reviewed By: siying

Differential Revision: D40652600

Pulled By: cbi42

fbshipit-source-id: 2af2467c33995b093e07b7512b8c32ed4144968e
2022-10-24 20:45:54 -07:00
changyubi aca00006bf clang format files under test_util/ (#10855)
Summary:
Ran `find . -iname '*.h' -o -iname '*.cc' | xargs clang-format -i` under test_util/.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10855

Test Plan: existing CI.

Reviewed By: siying

Differential Revision: D40652583

Pulled By: cbi42

fbshipit-source-id: ed0fbcfe17b6f9ec217a64b80d6d43dfbf1cc34e
2022-10-24 20:32:25 -07:00
akankshamahajan 671753c43d Run Clang format on file folder (#10860)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10860

Test Plan: CircleCI jobs

Reviewed By: anand1976

Differential Revision: D40656236

Pulled By: akankshamahajan15

fbshipit-source-id: 557600db5c2e0ab9b400655336c467307f7136de
2022-10-24 18:34:52 -07:00
akankshamahajan 935aae3bcf Run clang format on logging folder (#10861)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10861

Test Plan: CircleCI jobs

Reviewed By: siying

Differential Revision: D40654198

Pulled By: akankshamahajan15

fbshipit-source-id: 787be2575578b3aa3bd985509f96fdb9e02f7ad7
2022-10-24 18:13:43 -07:00
akankshamahajan ee3dbdc083 Run clang-format on env/ folder (#10859)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10859

Test Plan: CircleCI jobs

Reviewed By: anand1976

Differential Revision: D40653839

Pulled By: akankshamahajan15

fbshipit-source-id: ce75205ee34ee3896a77a807d5c556886de78b01
2022-10-24 17:54:14 -07:00
akankshamahajan 0ed1a800ed Fix override error in system_clock.h (#10858)
Summary:
Fix error
```
 rocksdb/system_clock.h:30:11: error: '~SystemClock' overrides a destructor but is not marked 'override' [-Werror,-Wsuggest-destructor-override]
virtual ~SystemClock() {}
```
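
The kind of change that addresses this diagnostic can be sketched with a small stand-alone example. The class names `ClockBase` and `DemoClock` are hypothetical and only illustrate the pattern; the sketch assumes the usual fix of marking the derived destructor `override` rather than repeating `virtual`.

```cpp
#include <cassert>
#include <memory>

// Hypothetical base class with a virtual destructor.
class ClockBase {
 public:
  virtual ~ClockBase() = default;
};

// Hypothetical derived class. Writing `virtual ~DemoClock() {}` here would
// trigger -Wsuggest-destructor-override in Clang; marking the destructor
// `override` states the intent explicitly and silences the warning.
class DemoClock : public ClockBase {
 public:
  explicit DemoClock(bool* destroyed) : destroyed_(destroyed) {}
  ~DemoClock() override { *destroyed_ = true; }

 private:
  bool* destroyed_;  // set to true when the destructor runs
};
```

Deleting a `DemoClock` through a `ClockBase` pointer still runs the derived destructor; `override` changes nothing at runtime and only makes the compiler verify that a virtual base destructor exists.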

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10858

Test Plan: Ran internally

Reviewed By: siying

Differential Revision: D40652374

Pulled By: akankshamahajan15

fbshipit-source-id: 5dda8ca03ea57d709442c87e23e5fe097d7db672
2022-10-24 17:13:26 -07:00
sdong 7cf27eae0a clang format files under port/ (#10849)
Summary:
Run "clang-format" against files under port to make it happy.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10849

Test Plan: Watch existing CI to pass.

Reviewed By: anand1976

Differential Revision: D40645839

fbshipit-source-id: 582b4215503223795cf6234af90cc4e8e4eba773
2022-10-24 16:56:01 -07:00
Levi Tamasi 4d9cb433fa Run clang-format on utilities/ (except utilities/transactions/) (#10853)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10853

Test Plan: `make check`

Reviewed By: siying

Differential Revision: D40651315

Pulled By: ltamasi

fbshipit-source-id: 8b270ff4777a06464be86e376c2a680427866a46
2022-10-24 16:38:09 -07:00
akankshamahajan 966cd42c7d Update header file to include right copyright (#10854)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10854

Reviewed By: siying

Differential Revision: D40651483

Pulled By: akankshamahajan15

fbshipit-source-id: 95ce53297e9699a34cc80439bc7553f6cc3ac957
2022-10-24 16:13:16 -07:00