1572 commits

Author SHA1 Message Date
Hui Xiao 3bbacda9b1 Disallow inplace_update_support with allow_concurrent_memtable_write (#12550)
Summary:
**Context/Summary:**
In-place memtable updates (inplace_update_support) are not compatible with concurrent memtable writes (allow_concurrent_memtable_write), so we disallow this combination in crash test.
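
A minimal sketch of how such a combination could be sanitized in a Python test driver. The function name `sanitize_params` and the choice to turn off `allow_concurrent_memtable_write` (rather than `inplace_update_support`) are assumptions made for illustration; the actual change in `db_crashtest.py` may resolve the conflict differently.

```python
def sanitize_params(params):
    """Return a copy of params with the incompatible combination resolved.

    Hypothetical helper: which of the two options gets switched off is an
    assumption of this sketch, not taken from the commit itself.
    """
    sanitized = dict(params)
    if sanitized.get("inplace_update_support") == 1:
        # In-place memtable updates cannot be combined with concurrent
        # memtable writes, so force the latter off.
        sanitized["allow_concurrent_memtable_write"] = 0
    return sanitized
```

Returning a copy keeps the caller's dictionary untouched, which makes the rule easy to check in isolation.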

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12550

Test Plan: CI

Reviewed By: jaykorean

Differential Revision: D56204269

Pulled By: hx235

fbshipit-source-id: 06608f2591db5e37470a1da6afcdfd2701781c2d
2024-04-16 19:41:38 -07:00
Hui Xiao 24a35b6e57 Add more public APIs to crash/stress test (#12541)
Summary:
**Context/Summary:**
This PR adds some public DB APIs that are not yet tested in crash/stress test but can be added in a straightforward way.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12541

Test Plan:
- Locally run crash test heavily stressing on these new APIs
- CI

Reviewed By: jowlyzhang

Differential Revision: D56164892

Pulled By: hx235

fbshipit-source-id: 8bb568c3e65aec39d642987033f1d76c52f69bd8
2024-04-16 15:43:26 -07:00
Jay Huh dfdc3b158e Add offpeak feature to crash test (#12549)
Summary:
As title. Add offpeak feature in stress test.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12549

Test Plan:
Ran stress test locally with the flag set
```
Running db_stress with pid=701060: ./db_stress ... --daily_offpeak_time_utc=04:00-08:00 ... --periodic_compaction_seconds=10 ...
...
KILLED 701060

stdout:
 Choosing random keys with no overwrite
Creating 6250000 locks
2024/04/16-11:38:19  Initializing db_stress
RocksDB version           : 9.2
Format version            : 5
TransactionDB             : false
Stacked BlobDB            : false
Read only mode            : false
Atomic flush              : false
Manual WAL flush          : true
Column families           : 1
Clear CFs one in          : 0
Number of threads         : 32
Ops per thread            : 100000000
Time to live(sec)         : unused
Read percentage           : 60%
Prefix percentage         : 0%
Write percentage          : 35%
Delete percentage         : 4%
Delete range percentage   : 1%
No overwrite percentage   : 1%
Iterate percentage        : 0%
Custom ops percentage     : 0%
DB-write-buffer-size      : 0
Write-buffer-size         : 4194304
Iterations                : 10
Max key                   : 25000000
Ratio #ops/#keys          : 128.000000
Num times DB reopens      : 0
Batches/snapshots         : 0
Do update in place        : 0
Num keys per lock         : 4
Compression               : LZ4
Bottommost Compression    : DisableOption
Checksum type             : kxxHash
File checksum impl        : none
Bloom bits / key          : 18.000000
Max subcompactions        : 4
Use MultiGet              : false
Use GetEntity             : false
Use MultiGetEntity        : false
Verification only         : false
Memtablerep               : skip_list
Test kill odd             : 0
Periodic Compaction Secs  : 10
Daily Offpeak UTC         : 04:00-08:00  <<<<<<<<<<<<<<< Newly added
Compaction TTL            : 0
Compaction Pri            : kMinOverlappingRatio
Background Purge          : 0
Write DB ID to manifest   : 0
Max Write Batch Group Size: 16
Use dynamic level         : 1
Read fault one in         : 0
Write fault one in        : 1000
Open metadata write fault one in:
                            8
Sync fault injection      : 0
Best efforts recovery     : 0
Fail if OPTIONS file error: 0
User timestamp size bytes : 0
Persist user defined timestamps : 1
WAL compression           : zstd
Try verify sst unique id  : 1
------------------------------------------------
```
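
For readers unfamiliar with the `HH:MM-HH:MM` form passed to `--daily_offpeak_time_utc` above, here is a hedged sketch of checking whether a UTC time falls inside such a window. The helper names are hypothetical, and two behaviors are assumptions of this sketch rather than documented RocksDB semantics: the start is treated as inclusive and the end as exclusive, and a window whose start is later than its end is treated as wrapping past midnight.

```python
def parse_hhmm(text):
    """Convert 'HH:MM' into minutes since midnight."""
    hours_text, minutes_text = text.split(":")
    hours, minutes = int(hours_text), int(minutes_text)
    if not (0 <= hours < 24 and 0 <= minutes < 60):
        raise ValueError("invalid time of day: " + text)
    return hours * 60 + minutes


def in_offpeak(window, hour, minute):
    """Return True if hour:minute (UTC) lies inside 'HH:MM-HH:MM'."""
    start_text, end_text = window.split("-")
    start, end = parse_hhmm(start_text), parse_hhmm(end_text)
    now = hour * 60 + minute
    if start <= end:
        # Window within a single day (assumed: start inclusive, end exclusive).
        return start <= now < end
    # Window assumed to wrap past midnight, e.g. 23:00-02:00.
    return now >= start or now < end
```

With the window from the test run, `in_offpeak("04:00-08:00", 5, 30)` is true and `in_offpeak("04:00-08:00", 8, 0)` is false under these assumptions.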

Reviewed By: hx235

Differential Revision: D56203102

Pulled By: jaykorean

fbshipit-source-id: 11a9be7362b3b26940d74d41c8bf4ebac3f03a2d
2024-04-16 12:44:44 -07:00
Hui Xiao d41e568b1c Add inplace_update_support to crash/stress test (#12535)
Summary:
**Context/Summary:**
`inplace_update_support=true` is not tested in crash/stress test. Since, like compaction_filter, it is not compatible with snapshots, we need to sanitize its value in the presence of snapshot-related options. A minor refactoring is added to centralize such sanitization in db_crashtest.py - see `check_multiget_consistency` and `check_multiget_entity_consistency`

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12535

Test Plan: CI

Reviewed By: ajkr

Differential Revision: D56102978

Pulled By: hx235

fbshipit-source-id: 2e2ab6685a65123b14a321b99f45f60bc6509c6b
2024-04-15 16:11:58 -07:00
Hui Xiao ef26d68e8d Re-enable kAdmPolicyThreeQueue in crash test (#12524)
Summary:
Context/Summary:

We need an `nvm_sec_cache` when `kAdmPolicyThreeQueue` is used; otherwise a nullptr cache is accessed, causing the segfault described in https://github.com/facebook/rocksdb/pull/12521

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12524

Test Plan: - Re-enabled `kAdmPolicyThreeQueue` and rehearsed the stress test that failed before this fix and passes after

Reviewed By: jowlyzhang

Differential Revision: D55997093

Pulled By: hx235

fbshipit-source-id: e1c6f1015091b4cff0ce6a3fff981d5dece52a62
2024-04-11 14:53:11 -07:00
Hui Xiao 447b7aa7ed Temporarily disable kAdmPolicyThreeQueue in crash test (#12521)
Summary:
**Context/Summary**

This policy leads to a segfault in `CompressedCacheSetCapacityThread` with some builds/compilations. Until we figure out why, disable it for now.

**Test**
Rehearse stress test that failed before the fix but passes after

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12521

Reviewed By: jowlyzhang

Differential Revision: D55942399

Pulled By: hx235

fbshipit-source-id: 85f28e50d596dcfd4a316481570b78fdce58ed0b
2024-04-09 16:15:54 -07:00
Hui Xiao ad423abbd1 Add more missing options in crash test (#12508)
Summary:
**Context/Summary:**
This is to improve our crash test coverage.

Bonus change:
- Added the missing Options string mapping for `CacheTier::kVolatileCompressedTier`
- Deprecated crash test options `enable_tiered_storage` mainly for setting `last_level_temperature` which is now covered in crash test by itself
- Intensified `verify_checksum_one_in`/`verify_file_checksums_one_in`, as I found that these, together with the new coverage, surface more issues

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12508

Test Plan: CI to look out for trivial failures

Reviewed By: jowlyzhang

Differential Revision: D55768594

Pulled By: hx235

fbshipit-source-id: 9b829da0309a7db3fcdb17992a524dd64498325c
2024-04-08 09:48:03 -07:00
Changyu Bi 7c28dc8beb Enable parallel compression in crash test (#12506)
Summary:
Since some internal user might be interested in using this feature.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12506

Test Plan:
The option was disabled in stress test because it caused failures.
I've run a round of crash tests internally and there was no failure due to parallel compression. I will monitor whether more runs cause failures, so that we at least learn how it is broken and can decide whether to fix it or revert the change.

Reviewed By: jowlyzhang

Differential Revision: D55747552

Pulled By: cbi42

fbshipit-source-id: ae5cda78c338b8b58f651c557d9b70790362444d
2024-04-05 10:29:08 -07:00
Hui Xiao 8e6e8957fb Disable wal_bytes_per_sync at one more place (#12492)
Summary:
Summary/Context: supplement to https://github.com/facebook/rocksdb/pull/12489

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12492

Test Plan: CI

Reviewed By: jaykorean

Differential Revision: D55612747

Pulled By: hx235

fbshipit-source-id: 5c8fbda3e6c8482f2a3363a98a545f1c11e4ea27
2024-04-02 09:44:37 -07:00
Hui Xiao 21d11de761 Temporarily disable wal_bytes_per_sync in crash test (#12489)
Summary:
**Context/Summary:**

`wal_bytes_per_sync > 0` can sync a newer WAL but not an older WAL by its nature. This creates a hole in synced WAL data. Our crash test recently revealed that the DB can recover past that hole, which resulted in a crash-recovery-verification error. Until we fix that recovery behavior, we will temporarily disable `wal_bytes_per_sync` in crash test
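
To make the notion of a "hole" concrete, the following sketch models each WAL as a `(total_bytes, synced_bytes)` pair, ordered oldest first, and reports whether some older WAL has unsynced data while a newer WAL already has synced data. The data model and the function name `has_sync_hole` are assumptions for illustration only; they are not RocksDB structures.

```python
def has_sync_hole(wals):
    """wals: list of (total_bytes, synced_bytes), oldest WAL first.

    Returns True if a newer WAL holds synced data while some older WAL
    still has unsynced bytes (a gap in the synced WAL data).
    """
    seen_unsynced_older = False
    for total_bytes, synced_bytes in wals:
        if seen_unsynced_older and synced_bytes > 0:
            return True
        if synced_bytes < total_bytes:
            seen_unsynced_older = True
    return False
```

Recovering past such a gap would replay writes from the newer WAL even though earlier writes in the older WAL may have been lost, which is the kind of inconsistency the crash-recovery verification flags.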

Bonus: updated the API to make the nature of this option more explicitly documented

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12489

Test Plan: More stabilized crash test

Reviewed By: ajkr

Differential Revision: D55531589

Pulled By: hx235

fbshipit-source-id: 6dea6486420dc0f50550d488c15652f93972a0ea
2024-03-29 13:01:15 -07:00
Andrew Kryczka 3d4e78937a Initialize FaultInjectionTestFS::checksum_handoff_func_type_ to kCRC32c (#12485)
Summary:
Previously it was uninitialized. Setting `checksum_handoff_file_types` will cause `kCRC32c` checksums to be passed down in the `DataVerificationInfo`, so it makes sense for `kCRC32c` to be the default.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12485

Test Plan:
ran `db_stress` in a way that failed before. Building with ASAN was needed to ensure the uninitialized bytes are nonzero according to `malloc_fill_byte` (default 0xbe)

```
$ COMPILE_WITH_ASAN=1 make -j28 db_stress
...
$ ./db_stress -sync_fault_injection=1 -enable_checksum_handoff=true
```

Reviewed By: jaykorean

Differential Revision: D55450587

Pulled By: ajkr

fbshipit-source-id: 53dc829b86e49b3fa80570032e83af0bb12adaad
2024-03-27 18:37:58 -07:00
akankshamahajan 1856734821 Branch cut 9.1.fb (#12476)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12476

Reviewed By: jowlyzhang

Differential Revision: D55319508

Pulled By: akankshamahajan15

fbshipit-source-id: 2b6db671e027511282775c0fea155335d8e73cc2
2024-03-25 15:07:43 -07:00
Peter Dillinger b515a5db3f Replace ScopedArenaIterator with ScopedArenaPtr<InternalIterator> (#12470)
Summary:
ScopedArenaIterator is not an iterator. It is a pointer wrapper. And we don't need a custom implemented pointer wrapper when std::unique_ptr can be instantiated with what we want.

So this adds ScopedArenaPtr<T> to replace those uses.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12470

Test Plan: CI (including ASAN/UBSAN)

Reviewed By: jowlyzhang

Differential Revision: D55254362

Pulled By: pdillinger

fbshipit-source-id: cc96a0b9840df99aa807f417725e120802c0ae18
2024-03-22 13:40:42 -07:00
anand76 63a105a481 Enable recycle_log_file_num option for point in time recovery (#12403)
Summary:
This option was previously disabled due to a bug in the recovery logic. The recovery code in `DBImpl::RecoverLogFiles` couldn't tell if an EoF reported by the log reader was really an EoF or a possible corruption that made a record look like an old log record. To fix this, the log reader now explicitly reports when it encounters what looks like an old record. The recovery code treats it as a possible corruption, and uses the next sequence number in the WAL to determine if it should continue replaying the WAL.

This PR also fixes a couple of bugs that log file recycling exposed in the backup and checkpoint path.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12403

Test Plan:
1. Add new unit tests to verify behavior upon corruption
2. Re-enable disabled tests for verifying recycling behavior

Reviewed By: ajkr

Differential Revision: D54544824

Pulled By: anand1976

fbshipit-source-id: 12f5ce39bd6bc0d63b0bc6432dc4db510e0e802a
2024-03-21 12:29:35 -07:00
Richard Barnes fc40165614 Remove extra semi colon from internal_repo_rocksdb/repo/tools/ldb_cmd.cc
Summary:
`-Wextra-semi` or `-Wextra-semi-stmt`

If the code compiles, this is safe to land.

Reviewed By: palmje

Differential Revision: D54362227

fbshipit-source-id: ac634ba34f9351ba559c4ed96448f51d6ef33175
2024-03-18 18:51:50 -07:00
Changyu Bi 3d5be596a5 Fix a bug in iterator with UDT + ReadOptions::pin_data (#12451)
Summary:
with https://github.com/facebook/rocksdb/issues/12414 enabling `ReadOptions::pin_data`, this bug surfaced as corrupted per key-value checksum during crash test. `saved_key_.GetUserKey()` could be pinned user key, so DBIter should not overwrite it.

In one case, it only surfaces when iterator skips many keys of the same user key. To stress that code path, this PR also added `max_sequential_skip_in_iterations` to crash test.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12451

Test Plan:
- Set ReadOptions::pin_data to true, the bug can be reproed quickly with `./db_stress --persist_user_defined_timestamps=1 --user_timestamp_size=8 --writepercent=35 --delpercent=4 --delrangepercent=1 --iterpercent=20 --nooverwritepercent=1 --prefix_size=8 --prefixpercent=10 --readpercent=30 --memtable_protection_bytes_per_key=8 --block_protection_bytes_per_key=2 --clear_column_family_one_in=0`.
    - Set max_sequential_skip_in_iterations to 1 for the other occurrence of the bug.

Reviewed By: jowlyzhang

Differential Revision: D55003766

Pulled By: cbi42

fbshipit-source-id: 23e1049129456684dafb028b6132b70e0afc07fb
2024-03-18 09:05:11 -07:00
Changyu Bi ba022dd44c Disable enable_checksum_handoff in crash test (#12431)
Summary:
Disabled since it has been causing a few crash test failures; I suspect it'll be easy to repro locally. Also fixed how its corruption message is printed so that the script does not crash when the output cannot be UTF-8 decoded.
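
One way a Python script can print process output that is not valid UTF-8 without raising is to decode with an error handler. This is a general Python technique shown as a sketch; the helper name `printable` is hypothetical and the actual fix in the crash test script may use a different approach.

```python
def printable(raw):
    """Decode bytes for display, escaping any byte that is not valid UTF-8."""
    # errors="backslashreplace" substitutes undecodable bytes with \xNN
    # escapes instead of raising UnicodeDecodeError.
    return raw.decode("utf-8", errors="backslashreplace")
```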

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12431

Reviewed By: hx235

Differential Revision: D54881023

Pulled By: cbi42

fbshipit-source-id: 47208a637cd69b30d2545154849405e37db62ed3
2024-03-13 18:03:55 -07:00
Hui Xiao 30243c6573 Add missing db crash options (#12414)
Summary:
**Context/Summary:**
We are doing a sweep in all public options, including but not limited to the `Options`, `Read/WriteOptions`, `IngestExternalFileOptions`, cache options.., to find and add the uncovered ones into db crash. The options included in this PR require minimum changes to db crash other than adding the options themselves.

A bonus change: to surface new issues by improved coverage in stderror, we decided to fail/terminate crash test for manual compactions (CompactFiles, CompactRange()) on meaningful errors. See https://github.com/facebook/rocksdb/pull/12414/files#diff-5c4ced6afb6a90e27fec18ab03b2cd89e8f99db87791b4ecc6fa2694284d50c0R2528-R2532, https://github.com/facebook/rocksdb/pull/12414/files#diff-5c4ced6afb6a90e27fec18ab03b2cd89e8f99db87791b4ecc6fa2694284d50c0R2330-R2336 for more.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12414

Test Plan:
- Run `python3 ./tools/db_crashtest.py --simple blackbox` for 10 minutes to ensure no trivial failure
- Run `python3 tools/db_crashtest.py --simple blackbox --compact_files_one_in=1 --compact_range_one_in=1 --read_fault_one_in=1 --write_fault_one_in=1 --interval=50` for a while to ensure the bonus change does not result in trivial crash/termination of stress test

Reviewed By: ajkr, jowlyzhang, cbi42

Differential Revision: D54691774

Pulled By: hx235

fbshipit-source-id: 50443dfb6aaabd8e24c79a2e42b68c6de877be88
2024-03-12 17:24:12 -07:00
Changyu Bi 36c1b0aded Allow SstFileReader to verify number of entries in SST files (#12418)
Summary:
Add `SstFileReader::VerifyNumEntries()` for this purpose. I added the same functionality to `sst_dump` in https://github.com/facebook/rocksdb/issues/12322. Since sst_file_reader.h is exposed to users while sst_dump.h is not, it seems more appropriate to add SST files related APIs here.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12418

Test Plan: `./sst_file_reader_test --gtest_filter="*VerifyNumEntries*"`

Reviewed By: jowlyzhang

Differential Revision: D54764271

Pulled By: cbi42

fbshipit-source-id: 22ebfe04bbb0b152762cee13d4210b147b36d3e9
2024-03-12 11:05:20 -07:00
Andrew Kryczka 27a2473668 Best-effort recovery support for atomic flush (#12406)
Summary:
This PR updates `VersionEditHandlerPointInTime` to recover all or none of the updates in an AtomicGroup. This makes best-effort recovery properly handle atomic flushes during recovery, so the features are now allowed to both be enabled at once.
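
The all-or-none behavior can be sketched as buffering the edits of a group and applying them only once the group is complete. The representation below, a list of `(payload, remaining)` pairs where `remaining` is `None` for a standalone edit and otherwise the number of edits still to come in the group, is an assumption of this sketch and not the actual `VersionEdit` layout.

```python
def replay(edits):
    """Apply standalone edits immediately and grouped edits all-or-none.

    edits: list of (payload, remaining); remaining is None for an edit
    outside any group, else the count of group edits that follow it.
    """
    applied = []
    pending = []  # edits of the group currently being read
    for payload, remaining in edits:
        if remaining is None:
            applied.append(payload)
            continue
        pending.append(payload)
        if remaining == 0:
            # Group is complete: apply every buffered edit together.
            applied.extend(pending)
            pending = []
    # Anything left in `pending` belongs to an incomplete group and is dropped.
    return applied
```

For example, `replay([("a", None), ("b", 1), ("c", 0), ("d", 2), ("e", 1)])` yields `["a", "b", "c"]`: the second group never completes, so none of it is applied.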

The new logic requires that AtomicGroups do not contain column family additions or removals. AtomicGroups are currently written for atomic flush, which does not include such edits.

Column family additions or removals are recovered independently of AtomicGroups. The new logic needs to be aware of removal, though, so that a dropped CF does not prevent completion of an AtomicGroup recovery.

The new logic treats each AtomicGroup as if it contains updates for all existing column families, even though it is possible to create AtomicGroups that only affect a subset of column families. This simplifies the logic at the expense of recovering less data in certain edge case scenarios.

The usage of `MaybeCreateVersion()` is pretty tricky. The goal is to create a barrier at the start of an AtomicGroup such that all valid states up to that point will be applied to `versions_`. Here is a summary.

- `MaybeCreateVersion(..., false)` creates a `Version` on a negative edge trigger (transition from valid to invalid). It was previously called when applying each update. Now, it is only called when applying non-AtomicGroup updates.
- `MaybeCreateVersion(..., true)` creates a `Version` on a positive level trigger (valid state). It was previously called only at the end of iteration. Now, it is additionally called before processing an AtomicGroup.
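
The difference between the two trigger styles in the list above can be illustrated generically. This sketch is not RocksDB code: the function names are hypothetical, and the "state" is reduced to a single validity flag per step.

```python
def negative_edge_fires(was_valid, is_valid):
    """Edge trigger: fires only on a valid -> invalid transition."""
    return was_valid and not is_valid


def positive_level_fires(is_valid):
    """Level trigger: fires whenever the current state is valid."""
    return is_valid


def negative_edge_steps(states):
    """Return the step indices at which the negative edge trigger fires."""
    fired = []
    was_valid = False
    for step, is_valid in enumerate(states):
        if negative_edge_fires(was_valid, is_valid):
            fired.append(step)
        was_valid = is_valid
    return fired
```

For the sequence `[True, True, False, False, True, False]`, the edge trigger fires at steps 2 and 5, while the level trigger would fire at any step where it is checked and the flag is true (steps 0, 1, and 4).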

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12406

Reviewed By: jaykorean, cbi42

Differential Revision: D54494904

Pulled By: ajkr

fbshipit-source-id: 0114a9fe1d04b471d086dcab5978ea8a3a56ad52
2024-03-06 14:40:40 -08:00
Peter Dillinger d780e7a561 Remove bottommost_temperature (#12389)
Summary:
deprecated option already replaced by `last_level_temperature`. (Keeping recognition of the option in old options files.)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12389

Test Plan: tests updated

Reviewed By: jowlyzhang, cbi42

Differential Revision: D54267946

Pulled By: pdillinger

fbshipit-source-id: 65c49b15e7394829c1f3b44edd4179d2daff6017
2024-02-27 14:48:00 -08:00
Andrew Kryczka a43481b3d0 Decouple RateLimiter burst size and refill period (#12379)
Summary:
When the rate limiter does not have any waiting requests, the first request to arrive may consume all of the available bandwidth, despite potentially having lower priority than requests that arrive later in the same refill interval. Then, those higher priority requests must wait for a refill. So even in scenarios in which we have an overall bandwidth surplus, the highest priority requests can be sporadically delayed up to a whole refill period.

Alone, this isn't necessarily problematic as the refill period is configurable via `refill_period_us` and can be tuned down as needed until the max sporadic delay is tolerable. However, tuning down `refill_period_us` had a side effect of reducing burst size. Some users require a certain burst size to issue optimal I/O sizes to the underlying storage system.

To satisfy those users, this PR decouples the refill period from the burst size. That way, the max sporadic delay can be limited without impacting I/O sizes issued to the underlying storage system.
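
A rough arithmetic sketch of the coupling described above, using the numbers from the benchmark in this commit. It assumes that, before this change, the burst size was simply the bytes granted per refill, and that a `single_burst_bytes` of 0 means "fall back to the per-refill amount"; both are assumptions of this sketch, and the function names are hypothetical.

```python
def bytes_per_refill(rate_bytes_per_sec, refill_period_us):
    """Bytes made available at each refill (integer division)."""
    return rate_bytes_per_sec * refill_period_us // 1_000_000


def effective_burst(rate_bytes_per_sec, refill_period_us, single_burst_bytes=0):
    """Burst size: explicit if configured, else tied to the refill period."""
    if single_burst_bytes > 0:
        return single_burst_bytes
    return bytes_per_refill(rate_bytes_per_sec, refill_period_us)
```

With a 64 MiB/s limit and a 10 ms refill period, the coupled burst is only 671088 bytes, far below the 16 MiB compaction readahead; passing `single_burst_bytes=16777216` keeps the burst at 16 MiB regardless of the refill period.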

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12379

Test Plan:
The goal is to show we can now limit the max sporadic delay without impacting compaction's I/O size.

The benchmark runs compaction with a large I/O size, while user reads simultaneously run at a low rate that does not consume all of the available bandwidth. The max sporadic delay is measured using the P100 of rocksdb.file.read.get.micros. I just used strace to verify the compaction reads follow `rate_limiter_single_burst_bytes`

Setup: `./db_bench -benchmarks=fillrandom,flush -write_buffer_size=67108864 -disable_auto_compactions=true -value_size=256 -num=1048576`

Benchmark: `./db_bench -benchmarks=readrandom -use_existing_db=true -num=1048576 -duration=10 -benchmark_read_rate_limit=4096 -rate_limiter_bytes_per_sec=67108864 -rate_limiter_refill_period_us=$refill_micros -rate_limiter_single_burst_bytes=16777216 -rate_limit_bg_reads=true -rate_limit_user_ops=true -statistics=true -cache_size=0 -stats_level=5 -compaction_readahead_size=16777216 -use_direct_reads=true`

Results:

refill_micros | rocksdb.file.read.get.micros (P100)
-- | --
10000 | 10802
100000 | 100240
1000000 | 922061

For verifying compaction read sizes: `strace -fye pread64 ./db_bench -benchmarks=compact -use_existing_db=true -rate_limiter_bytes_per_sec=67108864 -rate_limiter_refill_period_us=$refill_micros -rate_limiter_single_burst_bytes=16777216 -rate_limit_bg_reads=true -compaction_readahead_size=16777216 -use_direct_reads=true`

Reviewed By: hx235

Differential Revision: D54165675

Pulled By: ajkr

fbshipit-source-id: c5968486316cbfb7ff8e5b7d75d3589883dd1105
2024-02-26 16:55:13 -08:00
jrchyang 70cb330a4a optimize file size statistics in benchmark script (#12363)
Summary:
Execute `ls` once when counting the file sizes of `DB_DIR`, and remove the unused file number counter variable `c`. The test information is as follows:

```Shell
# benchmark command

NUM_KEYS=30000000 CACHE_SIZE=6442450944 DB_DIR=/mnt/rocksdb_test WAL_DIR=/mnt/rocksdb_test ../tools/benchmark.sh fillseq_disable_wal

# before modification

cat /tmp/benchmark_fillseq.wal_disabled.v400.log.stats.sizes
0.0	0.0	0.0	0.0	195250
1.1	1.1	0.0	0.0	195300
2.5	2.5	0.0	0.0	195310
3.8	3.7	0.0	0.0	195320
5.1	5.1	0.0	0.0	195330
max sizes (GB): 5.1 all, 5.1 sst, 0.0 log, 0.0 blob

# after modification

cat /tmp/benchmark_fillseq.wal_disabled.v400.log.stats.sizes
0.0	0.0	0.0	0.0	194839
1.2	1.2	0.0	0.0	194849
2.6	2.6	0.0	0.0	194859
4.0	4.0	0.0	0.0	194909
5.4	5.4	0.0	0.0	194919
max sizes (GB): 5.4 all, 5.4 sst, 0.0 log, 0.0 blob
```
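
The idea of listing the directory once and deriving every total from that single listing can be sketched in Python as follows. The benchmark script itself is a shell script; this is only an illustration, and the function name `sizes_by_kind` and the mapping from file extension to category are assumptions.

```python
import os


def sizes_by_kind(db_dir):
    """Scan db_dir once and total file sizes for all, sst, log, and blob."""
    totals = {"all": 0, "sst": 0, "log": 0, "blob": 0}
    with os.scandir(db_dir) as entries:  # single directory listing
        for entry in entries:
            if not entry.is_file():
                continue
            size = entry.stat().st_size
            totals["all"] += size
            extension = os.path.splitext(entry.name)[1].lstrip(".")
            if extension in ("sst", "log", "blob"):
                totals[extension] += size
    return totals
```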

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12363

Reviewed By: hx235

Differential Revision: D54005427

Pulled By: ajkr

fbshipit-source-id: fae149705eb3fcda48d7381c42836a150f35ddc4
2024-02-21 15:45:18 -08:00
Andrew Kryczka 8e29f243c9 No filesystem reads during Merge() writes (#12365)
Summary:
This occasional filesystem read in the write path has caused user pain. It doesn't seem very useful considering it only limits one component's merge chain length, and only helps merge uncached (i.e., infrequently read) values. This PR proposes allowing `max_successive_merges` to be exceeded when the value cannot be read from in-memory components. I included a rollback flag (`strict_max_successive_merges`) just in case.
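
The proposed behavior can be summarized as a small decision function. This is a sketch of the logic as described in the summary, not the actual write-path code; the function name, its parameters, and the returned labels are all hypothetical.

```python
def plan_merge_write(successive_merges, max_successive_merges,
                     value_in_memory, strict):
    """Decide how a Merge() write is handled under the described policy."""
    if max_successive_merges == 0 or successive_merges < max_successive_merges:
        # Limit disabled or not yet reached: just add another merge operand.
        return "append_merge_operand"
    if value_in_memory:
        # Limit reached and the value is readable from in-memory components.
        return "collapse_from_memory"
    if strict:
        # Rollback flag set: enforce the limit even if it needs a filesystem read.
        return "collapse_with_filesystem_read"
    # Default after this change: exceed the limit rather than read the filesystem.
    return "append_merge_operand"
```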

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12365

Test Plan:
"rocksdb.block.cache.data.add" is number of data blocks read from filesystem. Since the benchmark is write-only, compaction is disabled, and flush doesn't read data blocks, any nonzero value means the user write issued the read.

```
$ for s in false true; do echo -n "strict_max_successive_merges=$s: " && ./db_bench -value_size=64 -write_buffer_size=131072 -writes=128 -num=1 -benchmarks=mergerandom,flush,mergerandom -merge_operator=stringappend -disable_auto_compactions=true -compression_type=none -strict_max_successive_merges=$s -max_successive_merges=100 -statistics=true |& grep 'block.cache.data.add COUNT' ; done
strict_max_successive_merges=false: rocksdb.block.cache.data.add COUNT : 0
strict_max_successive_merges=true: rocksdb.block.cache.data.add COUNT : 1
```

Reviewed By: hx235

Differential Revision: D53982520

Pulled By: ajkr

fbshipit-source-id: e40f761a60bd601f232417ac0058e4a33ee9c0f4
2024-02-21 13:15:27 -08:00
Yu Zhang 31dfc81e18 Start 9.1.0 release (#12360)
Summary:
with release notes for 9.0.fb, format_compatible test update, and version.h update.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12360

Test Plan: CI

Reviewed By: cbi42

Differential Revision: D53879416

Pulled By: jowlyzhang

fbshipit-source-id: 29598893d9ce2d0bb181345ddb78f9b1529aee75
2024-02-16 18:26:48 -08:00
Peter Dillinger bfd00bba9c Use format_version=6 by default (#12352)
Summary:
It's in production for a large storage service, and it was initially released 6 months ago (8.6.0). IMHO that's enough room for "easy downgrade" to most any user's previously integrated version, even if they only update a few times a year.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12352

Test Plan:
tests updated, including format compatibility test

table_test: ApproximateOffsetOfCompressed is affected because adding index block to metaindex adds about 13 bytes
to SST files in format_version 6. This test has historically been problematic and one reason is that, apparently, not only
could it pass/fail depending on snappy compression version, but also how long your host name is, because of db_host_id.
I've cleared that out for the test, which takes care of format_version=6 and hopefully improves long-term reliability.

Suggested follow-up: FinishImpl in table_test.cc takes a table_options that is ignored in some cases and might not match
the ioptions.table_factory configuration unless the caller is very careful. This should be cleaned up somehow.

Reviewed By: anand1976

Differential Revision: D53786884

Pulled By: pdillinger

fbshipit-source-id: 1964cbd40d3ab0a821fdc01c458031df716fcf51
2024-02-15 11:23:48 -08:00
Yu Zhang 4bea83aa44 Remove the force mode for EnableFileDeletions API (#12337)
Summary:
There is no strong reason for users to need this mode, while its behavior is destructive.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12337

Reviewed By: hx235

Differential Revision: D53630393

Pulled By: jowlyzhang

fbshipit-source-id: ce94b537258102cd98f89aa4090025663664dd78
2024-02-13 18:36:25 -08:00
Jay Huh 8c7c0a38f1 Minor refactor with printing stdout in blackbox tests (#12350)
Summary:
As title. Adding a missing stdout printing in `blackbox_crash_main()`

# Test

**Blackbox**
```
$> python3 tools/db_crashtest.py blackbox --simple --max_key=25000000 --write_buffer_size=4194304
```
```
...
stdout:
 Choosing random keys with no overwrite
DB path: [/tmp/jewoongh/rocksdb_crashtest_blackbox34jwn9of]
(Re-)verified 0 unique IDs
2024/02/13-12:27:33  Initializing worker threads
Crash-recovery verification passed :)
2024/02/13-12:27:36  Starting database operations
...
jewoongh stdout test
jewoongh stdout test
...
jewoongh stdout test
stderr:
 jewoongh injected error
```

**Whitebox**
```
$> python3 tools/db_crashtest.py whitebox --simple --max_key=25000000 --write_buffer_size=4194304
```
```
...
stdout:
 Choosing random keys with no overwrite
Creating 24415 locks
...
2024/02/13-12:31:51  Initializing worker threads
Crash-recovery verification passed :)
2024/02/13-12:31:54  Starting database operations
jewoongh stdout test
jewoongh stdout test
jewoongh stdout test
...
stderr:
 jewoongh injected error
...
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12350

Reviewed By: akankshamahajan15, cbi42

Differential Revision: D53728910

Pulled By: jaykorean

fbshipit-source-id: ec90ed3b5e6a1102d1fb55d357d0371e5072a173
2024-02-13 14:15:52 -08:00
Changyu Bi b46f5707c4 Fix unexpected keyword argument 'print_as_stderr' in crash test (#12339)
Summary:
Fix crash test failure like https://github.com/facebook/rocksdb/actions/runs/7821514511/job/21338625372#step:5:530

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12339

Test Plan: CI

Reviewed By: jaykorean

Differential Revision: D53545053

Pulled By: cbi42

fbshipit-source-id: b466a8dc9c0ded0377e8677937199c6f959f96ef
2024-02-07 15:44:17 -08:00
Akanksha Mahajan 9a2d7485f0 Print stderr in crash test script and exit on stderr (#12335)
Summary:
Some errors, like data races and heap-use-after-free, are only flagged because the crash test treats anything reported on stderr as an error. So this reverts to the original form until we come up with a more reliable way to error out.
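
A minimal sketch of the "fail on any stderr output" policy, using only the Python standard library. The function name `run_and_check` and the `DEMO_CMD` command are hypothetical and exist only for illustration; the real crash test script is considerably more involved.

```python
import subprocess
import sys

# Hypothetical command that exits with status 0 but writes to stderr.
DEMO_CMD = [sys.executable, "-c",
            "import sys; sys.stderr.write('injected error')"]


def run_and_check(cmd):
    """Run cmd and report failure on a nonzero exit or any stderr output."""
    proc = subprocess.run(cmd, capture_output=True)
    stdout = proc.stdout.decode("utf-8", errors="backslashreplace")
    stderr = proc.stderr.decode("utf-8", errors="backslashreplace")
    failed = proc.returncode != 0 or stderr.strip() != ""
    return failed, stdout, stderr
```

Running `run_and_check(DEMO_CMD)` reports a failure even though the child process exits with status 0, which is how sanitizer reports written to stderr would be surfaced under this policy.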

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12335

Reviewed By: cbi42

Differential Revision: D53534781

Pulled By: akankshamahajan15

fbshipit-source-id: b19aa560d1560ac2281f7bc04e13961ed751f178
2024-02-07 12:34:40 -08:00
Peter Dillinger 54cb9c77d9 Prefer static_cast in place of most reinterpret_cast (#12308)
Summary:
The following are risks associated with pointer-to-pointer reinterpret_cast:
* Can produce the "wrong result" (crash or memory corruption). IIRC, in theory this can happen for any up-cast or down-cast for a non-standard-layout type, though in practice would only happen for multiple inheritance cases (where the base class pointer might be "inside" the derived object). We don't use multiple inheritance a lot, but we do.
* Can mask useful compiler errors upon code change, including converting between unrelated pointer types that you are expecting to be related, and converting between pointer and scalar types unintentionally.

I can only think of some obscure cases where static_cast could be troublesome when it compiles as a replacement:
* Going through `void*` could plausibly cause unnecessary or broken pointer arithmetic. Suppose we have
`struct Derived: public Base1, public Base2`. If we have `Derived*` -> `void*` -> `Base2*` -> `Derived*` through reinterpret casts, this could plausibly work (though technically UB) assuming the `Base2*` is not dereferenced. Changing to static cast could introduce breaking pointer arithmetic.
* Unnecessary (but safe) pointer arithmetic could arise in a case like `Derived*` -> `Base2*` -> `Derived*` where before the Base2 pointer might not have been dereferenced. This could potentially affect performance.

With some light scripting, I tried replacing pointer-to-pointer reinterpret_casts with static_cast and kept the cases that still compile. Most occurrences of reinterpret_cast have successfully been changed (except for java/ and third-party/). 294 changed, 257 remain.

A couple of related interventions included here:
* Previously Cache::Handle was not actually derived from in the implementations and just used as a `void*` stand-in with reinterpret_cast. Now there is a relationship to allow static_cast. In theory, this could introduce pointer arithmetic (as described above) but is unlikely without multiple inheritance AND non-empty Cache::Handle.
* Remove some unnecessary casts to void* as this is allowed to be implicit (for better or worse).

Most of the remaining reinterpret_casts are for converting to/from raw bytes of objects. We could consider better idioms for these patterns in follow-up work.

I wish there were a way to implement a template variant of static_cast that would only compile if no pointer arithmetic is generated, but best I can tell, this is not possible. AFAIK the best you could do is a dynamic check that the void* conversion after the static cast is unchanged.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12308

Test Plan: existing tests, CI

Reviewed By: ltamasi

Differential Revision: D53204947

Pulled By: pdillinger

fbshipit-source-id: 9de23e618263b0d5b9820f4e15966876888a16e2
2024-02-07 10:44:11 -08:00
Jay Huh 0088f77788 Multiget LDB Followup (#12332)
Summary:
# Summary

Following up on jowlyzhang's comment in https://github.com/facebook/rocksdb/issues/12283.
- Remove `ARG_TTL` from help which is not relevant to `multi_get` command
- Treat NotFound status as non-error case for both `Get` and `MultiGet` and updated the unit test, `ldb_test.py`
- Print key along with value in `multi_get` command

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12332

Test Plan:
**Unit Test**
```
$>python3 tools/ldb_test.py
...
Ran 25 tests in 17.447s

OK
```

**Manual Run**

```
$> ./ldb --db=/data/users/jewoongh/rocksdb_test/T173992396/rocksdb_crashtest_blackbox --hex multi_get 0x0000000000000009000000000000012B00000000000000D8 0x0000000000000009000000000000002678787878BEEF
0x0000000000000009000000000000012B00000000000000D8 ==> 0x47000000434241404F4E4D4C4B4A494857565554535251505F5E5D5C5B5A595867666564636261606F6E6D6C6B6A696877767574737271707F7E7D7C7B7A797807060504030201000F0E0D0C0B0A090817161514131211101F1E1D1C1B1A1918
Key not found: 0x0000000000000009000000000000002678787878BEEF
```

```
$> ./ldb --db=/data/users/jewoongh/rocksdb_test/T173992396/rocksdb_crashtest_blackbox --hex get 0x00000000000000090000000000
Key not found
```

Reviewed By: jowlyzhang

Differential Revision: D53450164

Pulled By: jaykorean

fbshipit-source-id: 9ccec78ad3695e65b1ed0c147c7cbac502a1bd48
2024-02-05 20:11:35 -08:00
Hui Xiao 1a885fe730 Remove deprecated Options::access_hint_on_compaction_start (#11654)
Summary:
**Context:**
`Options::access_hint_on_compaction_start` is marked deprecated and is now ready to be removed.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11654

Test Plan:
Multiple db_stress runs with pre-PR and post-PR binary randomly to ensure forward/backward compatibility on options 36a5686ec0
`python3 tools/db_crashtest.py --simple blackbox --interval=30`

Reviewed By: cbi42

Differential Revision: D47892459

Pulled By: hx235

fbshipit-source-id: a62f46a0377fe143be7638e218978d5431c15c56
2024-02-05 13:35:19 -08:00
Peter Dillinger 1d6dbfb8b7 Rename IntTblPropCollector -> InternalTblPropColl (#12320)
Summary:
I've always found this name difficult to read, because it sounds like it's for collecting int(eger)
table properties.

I'm fixing this now to set up for a change that I have stubbed out in the public API (table_properties.h):
a new adapter function `TablePropertiesCollector::AsInternal()` that allows RocksDB-provided
TablePropertiesCollectors (such as CompactOnDeletionCollector) to implement the easier-to-upgrade
internal interface while still (superficially) implementing the public interface. In addition to added flexibility,
this should be a performance improvement as the adapter class UserKeyTablePropertiesCollector can be
avoided for such cases where a RocksDB-provided collector is used (AsInternal() returns non-nullptr).

table_properties.h is the only file with changes that aren't simple find-replace renaming.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12320

Test Plan: existing tests, CI

Reviewed By: ajkr

Differential Revision: D53336945

Pulled By: pdillinger

fbshipit-source-id: 02535bcb30bbfb00e29e8478af62e5dad50a63b8
2024-02-02 14:14:43 -08:00
Changyu Bi c6b1f6d182 Augment sst_dump tool to verify num_entries in table property (#12322)
Summary:
`sst_dump --command=check` can now compare the number of keys in a file with `num_entries` in the table properties, and reports corruption if there is a mismatch.
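
The check amounts to counting scanned records and comparing against the stored property. A rough sketch, assuming a hypothetical `check_num_entries` helper over an iterable of key/value pairs (this is not the `SstFileDumper` API); the message wording mirrors the sst_dump output in the test plan.

```python
from typing import Iterable, Optional, Tuple


def check_num_entries(records: Iterable[Tuple[bytes, bytes]],
                      num_entries_property: int) -> Optional[str]:
    """Return None if the counts agree, otherwise a corruption message."""
    scanned = sum(1 for _ in records)  # sequential scan of the table
    if scanned != num_entries_property:
        return (f"Corruption: Table property has num_entries = "
                f"{num_entries_property} but scanning the table returns "
                f"{scanned} records.")
    return None
```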

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12322

Test Plan:
- new unit test for API `SstFileDumper::ReadSequential`
- ran sst_dump on a good and a bad file:
```
sst_dump --file=./32316112.sst
options.env is 0x7f68bfcb5000
Process ./32316112.sst
Sst file format: block-based
from [] to []

sst_dump --file=./32316115.sst
options.env is 0x7f6d0d2b5000
Process ./32316115.sst
Sst file format: block-based
from [] to []
./32316115.sst: Corruption: Table property has num_entries = 6050408 but scanning the table returns 6050406 records.
```

Reviewed By: jowlyzhang

Differential Revision: D53320481

Pulled By: cbi42

fbshipit-source-id: d84c996346a9575a5a2ea5f5fb09a9d3ee672cd6
2024-02-01 14:35:03 -08:00
Andrew Kryczka f9d45358ca Removed check_flush_compaction_key_order (#12311)
Summary:
The `check_flush_compaction_key_order` option was introduced for the online validation of key order. It gave users the ability to disable the validation without a downgrade in case the validation caused inefficiencies or false positives. Over time this validation has proven to be cheap and correct, so the option to disable it can now be removed.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12311

Reviewed By: cbi42

Differential Revision: D53233379

Pulled By: ajkr

fbshipit-source-id: 1384361104021d6e3e580dce2ec123f9f99ce637
2024-01-31 16:30:26 -08:00
Peter Dillinger 76c834e441 Remove 'virtual' when implied by 'override' (#12319)
Summary:
... to follow modern C++ style / idioms.

Used this hack:
```
for FILE in `cat my_list_of_files`; do perl -pi -e 'BEGIN{undef $/;} s/ virtual( [^;{]* override)/$1/smg' $FILE; done
```
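
For reference, a rough Python equivalent of the one-liner above, as a sketch only (`drop_redundant_virtual` is a name made up here). The pattern is unchanged; since it contains no `.`, `^` or `$`, perl's `/s` and `/m` flags need no Python counterpart.

```python
import re

# Same pattern as the perl substitution: drop " virtual" when the same
# declaration (no ';' or '{' in between) ends with " override".
VIRTUAL_OVERRIDE = re.compile(r" virtual( [^;{]* override)")


def drop_redundant_virtual(source: str) -> str:
    """Apply the substitution to the whole file contents at once."""
    return VIRTUAL_OVERRIDE.sub(r"\1", source)


example = "  virtual void Foo() override;\n  virtual void Bar();\n"
cleaned = drop_redundant_virtual(example)
```

Only `Foo` loses its `virtual` here; `Bar` has no `override`, so it is left alone.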

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12319

Test Plan: existing tests, CI

Reviewed By: jaykorean

Differential Revision: D53275303

Pulled By: pdillinger

fbshipit-source-id: bc0881af270aa8ef4d0ae4f44c5a6614b6407377
2024-01-31 13:14:42 -08:00
Yu Zhang 071a146fa0 Add support for range deletion when user timestamps are not persisted (#12254)
Summary:
For the user-defined timestamps in memtable only feature, some special handling for range deletion blocks is needed, since both the key (start_key) and the value (end_key) of a range tombstone can contain user-defined timestamps. Handling for the key is taken care of in the same way as for the other data blocks in the block-based table. This PR adds the special handling needed for the value (end_key) part. This includes:

1) On the write path, when L0 SST files are first created from flush, user-defined timestamps are removed from the end key of a range tombstone. In some places the timestamp is only logically removed (replaced with a min timestamp), because logic using the running comparator still expects a user key that contains a timestamp. In the block-based builder, it is eventually physically removed before being persisted in a block.

2) On the read path, when a range deletion block is being read, we artificially pad a min timestamp to the end key of a range tombstone in `BlockBasedTableReader`.

3) For the file boundary `FileMetaData.largest`, we artificially pad a max timestamp to it if it contains a range deletion sentinel. Whenever a range deletion end_key is used to update file boundaries, the max timestamp is used instead of the range tombstone's actual timestamp, to mark it as an exclusive end. d69628e6ce/db/dbformat.h (L923-L935)
This max timestamp is removed when the in-memory `FileMetaData.largest` is persisted into the Manifest; we pad it back when it is read from the Manifest while handling the related `VersionEdit` in `VersionEditHandler`.
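
The three cases can be illustrated with plain byte strings. This is only a sketch under stated assumptions: timestamps are a fixed-width suffix of the user key, the width is 8 bytes, the min timestamp is all zero bytes and the max timestamp is all 0xff bytes. All function names are hypothetical and do not correspond to RocksDB functions.

```python
TS_SIZE = 8  # assumed timestamp width in bytes
MIN_TS = b"\x00" * TS_SIZE  # assumed encoding of the min timestamp
MAX_TS = b"\xff" * TS_SIZE  # assumed encoding of the max timestamp


def replace_with_min_timestamp(key_with_ts: bytes) -> bytes:
    """Write path: logical removal, for code that still expects a timestamp."""
    return key_with_ts[:-TS_SIZE] + MIN_TS


def strip_timestamp(key_with_ts: bytes) -> bytes:
    """Write path: physical removal before the end key is persisted."""
    return key_with_ts[:-TS_SIZE]


def pad_min_timestamp(key_without_ts: bytes) -> bytes:
    """Read path: restore a timestamp slot on a range tombstone end key."""
    return key_without_ts + MIN_TS


def pad_max_timestamp(key_without_ts: bytes) -> bytes:
    """File boundary: mark a range deletion sentinel as an exclusive end."""
    return key_without_ts + MAX_TS


end_key = b"k2" + (5).to_bytes(TS_SIZE, "big")  # end key with timestamp 5
persisted = strip_timestamp(end_key)            # what lands in the block
read_back = pad_min_timestamp(persisted)        # what the reader hands out
```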

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12254

Test Plan: Added unit test and enabled this feature combination's stress test.

Reviewed By: cbi42

Differential Revision: D52965527

Pulled By: jowlyzhang

fbshipit-source-id: e8315f8a2c5268e2ae0f7aec8012c266b86df985
2024-01-29 11:37:34 -08:00
Jay Huh 8829ba9fe1 print stderr separately per option (#12301)
Summary:
While working on Meta's internal test triaging process, I found that `db_crashtest.py` was printing `stdout` and `stderr` together. This PR adds an option to print `stderr` separately, so that it is easy to extract only `stderr` from a test run.

`print_stderr_separately` is introduced as an optional parameter with default value `False` to keep the existing behavior as is (except a few minor changes).

Minor changes to the existing behavior
- We no longer print `stderr has error message:` or add a `***` prefix to each line. When stderr is printed to stdout, we simply print `stderr:` first and then print the stderr content as is.
- We no longer print `times error occurred in output is ...`, which doesn't appear to have any value
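
A simplified sketch of the two printing modes described above. This is not the actual `db_crashtest.py` code: `print_output` and its parameters (other than the `print_stderr_separately` name) are made up for illustration.

```python
import io
import sys
from typing import TextIO


def print_output(stdout_text: str, stderr_text: str,
                 print_stderr_separately: bool = False,
                 out: TextIO = sys.stdout, err: TextIO = sys.stderr) -> None:
    """Hypothetical helper showing where the child's stderr ends up."""
    out.write(stdout_text)
    if not stderr_text:
        return
    if print_stderr_separately:
        # New option: stderr goes to its own stream, unmodified.
        err.write(stderr_text)
    else:
        # Default: stderr follows stdout after a plain "stderr:" header.
        out.write("stderr:\n" + stderr_text)


# Capture both streams in memory to show the difference.
out, err = io.StringIO(), io.StringIO()
print_output("Stress Test : ...\n", "Verification failed :(\n", True, out, err)
```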

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12301

Test Plan:
**Default Behavior (blackbox)**

Run printed everything as is
```
$> python3 tools/db_crashtest.py blackbox --simple --max_key=25000000 --write_buffer_size=4194304 2> /tmp/error.log
Running blackbox-crash-test with
interval_between_crash=120
total-duration=6000
...
Integrated BlobDB: blob files enabled 0, min blob size 0, blob file size 268435456, blob compression type NoCompression, blob GC enabled 0, cutoff 0.250000, force threshold 1.000000, blob compaction readahead size 0, blob file starting level 0
Integrated BlobDB: blob cache disabled
DB path: [/tmp/jewoongh/rocksdb_crashtest_blackboxwh7yxpec]
(Re-)verified 0 unique IDs
2024/01/29-09:16:30  Initializing worker threads
Crash-recovery verification passed :)
2024/01/29-09:16:35  Starting database operations
2024/01/29-09:16:35  Starting verification
Stress Test : 543.600 micros/op 8802 ops/sec
            : Wrote 0.00 MB (0.27 MB/sec) (50% of 10 ops)
            : Wrote 5 times
            : Deleted 1 times
            : Single deleted 0 times
            : 4 read and 0 found the key
            : Prefix scanned 0 times
            : Iterator size sum is 0
            : Iterated 0 times
            : Deleted 0 key-ranges
            : Range deletions covered 0 keys
            : Got errors 0 times
            : 0 CompactFiles() succeed
            : 0 CompactFiles() did not succeed

stderr:
WARNING: prefix_size is non-zero but memtablerep != prefix_hash
Error : jewoongh injected test error This is not a real failure.
Verification failed :(
```

Nothing in stderr
```
$> cat /tmp/error.log
```

**Default Behavior (whitebox)**
Run printed everything as is
```
$> python3 tools/db_crashtest.py whitebox --simple --max_key=25000000 --write_buffer_size=4194304 2> /tmp/error.log
Running whitebox-crash-test with
total-duration=10000
...
(Re-)verified 571 unique IDs
2024/01/29-09:33:53  Initializing worker threads
Crash-recovery verification passed :)
2024/01/29-09:35:16  Starting database operations
2024/01/29-09:35:16  Starting verification
Stress Test : 97248.125 micros/op 10 ops/sec
            : Wrote 0.00 MB (0.00 MB/sec) (12% of 8 ops)
            : Wrote 1 times
            : Deleted 0 times
            : Single deleted 0 times
            : 4 read and 1 found the key
            : Prefix scanned 1 times
            : Iterator size sum is 120868
            : Iterated 4 times
            : Deleted 0 key-ranges
            : Range deletions covered 0 keys
            : Got errors 0 times
            : 0 CompactFiles() succeed
            : 0 CompactFiles() did not succeed

stderr:
WARNING: prefix_size is non-zero but memtablerep != prefix_hash
Error : jewoongh injected test error This is not a real failure.
New cache capacity = 4865393
Verification failed :(

TEST FAILED. See kill option and exit code above!!!
```
Nothing in stderr
```
$> cat /tmp/error.log
```

**New option added (blackbox)**
```
$> python3 tools/db_crashtest.py blackbox --simple --max_key=25000000 --write_buffer_size=4194304 --print_stderr_separately 2> /tmp/error.log
Running blackbox-crash-test with
interval_between_crash=120
total-duration=6000
...
Integrated BlobDB: blob files enabled 0, min blob size 0, blob file size 268435456, blob compression type NoCompression, blob GC enabled 0, cutoff 0.250000, force threshold 1.000000, blob compaction readahead size 0, blob file starting level 0
Integrated BlobDB: blob cache disabled
DB path: [/tmp/jewoongh/rocksdb_crashtest_blackbox7ybna32z]
(Re-)verified 0 unique IDs
Compaction filter factory: DbStressCompactionFilterFactory
2024/01/29-09:05:39  Initializing worker threads
Crash-recovery verification passed :)
2024/01/29-09:05:46  Starting database operations
2024/01/29-09:05:46  Starting verification
Stress Test : 235.917 micros/op 16000 ops/sec
            : Wrote 0.00 MB (0.16 MB/sec) (16% of 12 ops)
            : Wrote 2 times
            : Deleted 1 times
            : Single deleted 0 times
            : 9 read and 0 found the key
            : Prefix scanned 0 times
            : Iterator size sum is 0
            : Iterated 0 times
            : Deleted 0 key-ranges
            : Range deletions covered 0 keys
            : Got errors 0 times
            : 0 CompactFiles() succeed
            : 0 CompactFiles() did not succeed
```
stderr printed separately
```
$> cat /tmp/error.log
WARNING: prefix_size is non-zero but memtablerep != prefix_hash
Error : jewoongh injected test error This is not a real failure.
New cache capacity = 19461571
Verification failed :(
```

**New option added (whitebox)**
```
$> python3 tools/db_crashtest.py whitebox --simple --max_key=25000000 --write_buffer_size=4194304 --print_stderr_separately 2> /tmp/error.log

Running whitebox-crash-test with
total-duration=10000
...
Integrated BlobDB: blob files enabled 0, min blob size 0, blob file size 268435456, blob compression type NoCompression, blob GC enabled 0, cutoff 0.250000, force threshold 1.000000, blob compaction readahead size 0, blob file starting level 0
Integrated BlobDB: blob cache disabled
DB path: [/tmp/jewoongh/rocksdb_crashtest_whiteboxtwj0ihn6]
(Re-)verified 157 unique IDs
2024/01/29-09:39:59  Initializing worker threads
Crash-recovery verification passed :)
2024/01/29-09:40:16  Starting database operations
2024/01/29-09:40:16  Starting verification
Stress Test : 742.474 micros/op 11801 ops/sec
            : Wrote 0.00 MB (0.27 MB/sec) (36% of 19 ops)
            : Wrote 7 times
            : Deleted 1 times
            : Single deleted 0 times
            : 8 read and 0 found the key
            : Prefix scanned 0 times
            : Iterator size sum is 0
            : Iterated 4 times
            : Deleted 0 key-ranges
            : Range deletions covered 0 keys
            : Got errors 0 times
            : 0 CompactFiles() succeed
            : 0 CompactFiles() did not succeed

TEST FAILED. See kill option and exit code above!!!
```
stderr printed separately
```
$> cat /tmp/error.log
WARNING: prefix_size is non-zero but memtablerep != prefix_hash
Error : jewoongh injected test error This is not a real failure.
Error : jewoongh injected test error This is not a real failure.
Error : jewoongh injected test error This is not a real failure.
New cache capacity = 4865393
Verification failed :(
```

Reviewed By: akankshamahajan15

Differential Revision: D53187491

Pulled By: jaykorean

fbshipit-source-id: 76f9100d08b96d014e41b7b88b206d69f0ae932b
2024-01-29 11:09:47 -08:00
akankshamahajan 36704e9227 Improve crash test script to not rely on std::errors for failures. (#12265)
Summary:
Right now crash_test also relies on std::errors, along with verification, to detect errors/failures. However, that is not a reliable signal: many internal services log benign errors/warnings, in which case our test script fails.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12265

Test Plan: Keep std::errors but print them out instead of failing. We will monitor crash tests internally to see whether any scenario relies solely on std::errors, in which case the stress tests can be improved.

Reviewed By: ajkr, cbi42

Differential Revision: D52967000

Pulled By: akankshamahajan15

fbshipit-source-id: 5328c8b69480c7946fe6a9c72f9ffeede70ac2ad
2024-01-26 11:39:47 -08:00
chuhao zeng d82d179a5e Enhance ldb_cmd_tool to enable user pass in customized cfds (#12261)
Summary:
The current implementation of the ldb_cmd tool involves commenting out the user-passed column_family_descriptors, resulting in the tool consistently constructing its column_family_descriptors from the pre-existing OPTIONS file.

The proposed fix prioritizes user-passed column family descriptors, ensuring they take precedence over those specified in the OPTIONS file. This modification enhances the tool's adaptability and responsiveness to user configurations.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12261

Reviewed By: cbi42

Differential Revision: D52965877

Pulled By: ajkr

fbshipit-source-id: 334a83a8e1004c271b19e7ca09381a0e7cf87b03
2024-01-24 16:16:18 -08:00
Jay Huh 59f4cbef8c MultiGet support in ldb (#12283)
Summary:
While investigating test failures due to the inconsistency between `Get()` and `MultiGet()`, I realized that LDB currently doesn't support `MultiGet()`. This PR introduces the `MultiGet()` support in LDB. Tested the command manually. Unit test will follow in a separate PR.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12283

Test Plan:
When key not found
```
$> ./ldb --db=/data/users/jewoongh/rocksdb_test/T173992396/rocksdb_crashtest_blackbox --hex multi_get 0x0000000000000009000000000000012B00000000000002AB
Status for key 0x0000000000000009000000000000012B00000000000002AB: NotFound:
```
Compare the same key with get
```
$> ./ldb --db=/data/users/jewoongh/rocksdb_test/T173992396/rocksdb_crashtest_blackbox --hex get 0x0000000000000009000000000000012B00000000000002AB
Failed: Get failed: NotFound:
```

Multiple keys not found
```
$> ./ldb --db=/data/users/jewoongh/rocksdb_test/T173992396/rocksdb_crashtest_blackbox --hex multi_get 0x0000000000000009000000000000012B00000000000002AB 0x0000000000000009000000000000012B00000000000002AC
Status for key 0x0000000000000009000000000000012B00000000000002AB: NotFound:
Status for key 0x0000000000000009000000000000012B00000000000002AC: NotFound:
```

One of the keys found
```
$> ./ldb --db=/data/users/jewoongh/rocksdb_test/T173992396/rocksdb_crashtest_blackbox --hex multi_get 0x0000000000000009000000000000012B00000000000002AB 0x00000000000000090000000000000026787878787878
Status for key 0x0000000000000009000000000000012B00000000000002AB: NotFound:
0x22000000262724252A2B28292E2F2C2D32333031363734353A3B38393E3F3C3D02030001060704050A0B08090E0F0C0D12131011161714151A1B18191E1F1C1D
```

All of the keys found
```
$> ./ldb --db=/data/users/jewoongh/rocksdb_test/T173992396/rocksdb_crashtest_blackbox --hex multi_get 0x0000000000000009000000000000012B00000000000000D8 0x00000000000000090000000000000026787878787878
0x47000000434241404F4E4D4C4B4A494857565554535251505F5E5D5C5B5A595867666564636261606F6E6D6C6B6A696877767574737271707F7E7D7C7B7A797807060504030201000F0E0D0C0B0A090817161514131211101F1E1D1C1B1A1918
0x22000000262724252A2B28292E2F2C2D32333031363734353A3B38393E3F3C3D02030001060704050A0B08090E0F0C0D12131011161714151A1B18191E1F1C1D
```

Reviewed By: hx235

Differential Revision: D53048519

Pulled By: jaykorean

fbshipit-source-id: a6217905464c5f460a222e2b883bdff47b9dd9c7
2024-01-24 11:35:12 -08:00
Peter Dillinger 800cfae987 Start 9.0.0 release (#12256)
Summary:
with release notes for 8.11.fb, format_compatible test update, and version.h update.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12256

Test Plan: CI

Reviewed By: cbi42

Differential Revision: D52926051

Pulled By: pdillinger

fbshipit-source-id: adcf7119b065758599e904c16cbdf1d28811e0b4
2024-01-20 08:38:20 -08:00
Jay Huh d982260b63 Clean up after long-running whitebox crashtest (#12248)
Summary:
Currently, we treat the long-running whitebox_crash_test as passing. However, we were not cleaning up after ourselves when we killed the running test for running too long, which often caused out-of-space errors in subsequent tests (e.g., blackbox_crash_test after whitebox_crash_test).

Unless we want to start treating these timeouts as failures and need the DB output for investigation now, we should properly clean up the tmp dir.
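
The idea, as a rough sketch rather than the actual `db_crashtest.py` change: `run_with_cleanup` and its parameters are hypothetical, and a sleeping child process stands in for a long-running whitebox test.

```python
import os
import shutil
import subprocess
import sys
import tempfile


def run_with_cleanup(cmd, db_dir, timeout_sec):
    """Run cmd; on timeout, kill it and delete db_dir. True if it timed out."""
    proc = subprocess.Popen(cmd)
    try:
        proc.wait(timeout=timeout_sec)
        return False
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()  # reap the killed process
        # Remove the DB directory so later tests do not run out of space.
        shutil.rmtree(db_dir, ignore_errors=True)
        return True


db_dir = tempfile.mkdtemp()
timed_out = run_with_cleanup(
    [sys.executable, "-c", "import time; time.sleep(30)"], db_dir, 0.5)
removed = not os.path.exists(db_dir)
```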

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12248

Test Plan:
```
$> make crash_test -j
```

Reviewed By: ajkr

Differential Revision: D52885342

Pulled By: jaykorean

fbshipit-source-id: 7c1f2ca7cf03d0705bb14155ee44d5d7a411c132
2024-01-19 16:25:39 -08:00
anand76 b49f9cdd3c Add CompressionOptions to the compressed secondary cache (#12234)
Summary:
Add `CompressionOptions` to `CompressedSecondaryCacheOptions` to allow users to set options such as compression level. This allows performance to be fine-tuned.

Tests -
Run db_bench and verify compression options in the LOG file

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12234

Reviewed By: ajkr

Differential Revision: D52758133

Pulled By: anand1976

fbshipit-source-id: af849fbffce6f84704387c195d8edba40d9548f6
2024-01-16 12:21:27 -08:00
Changyu Bi 9d58e3f63a Disable LockWAL() for multiops_wp_txn stress test (#12221)
Summary:
We test LockWAL() and UnlockWAL() by checking that latest sequence number is not changed: 1a1f9f1660/db_stress_tool/db_stress_test_base.cc (L920-L937). With writeprepared transaction, sequence number can be advanced in SwitchMemtable::WriteRecoverableState() when writing recoverable state: 1a1f9f1660/db/db_impl/db_impl_write.cc (L1560)

This PR disables LockWAL() tests for writeprepared transaction for now. We probably need to change how we test LockWAL() for writeprepared before re-enabling this test.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12221

Reviewed By: ajkr

Differential Revision: D52677076

Pulled By: cbi42

fbshipit-source-id: 27ee694878edf63e8f4ad52f769d4db401f511bc
2024-01-11 15:54:11 -08:00
Qiaolin Yu fa0190f885 Block cache analyzer: Calculate miss ratio for each caller (#10823)
Summary:
Currently, when `block_cache_trace_analyzer` analyzes the cache miss ratio, it only analyzes the total miss ratio.

However, it also seems important to analyze the cache miss ratio of each caller. To achieve this, we can calculate and print the miss ratio of each caller in the analyzer.

## Before modification
```
Running for 1 seconds: Processed 85732 records/second. Trace duration 58 seconds. Observed miss ratio 7.97
```

## After modification
```
Running for 1 seconds: Processed 85732 records/second. Trace duration 58 seconds. Observed miss ratio 7.97
Caller Get: Observed miss ratio 6.31
Caller Iterator: Observed miss ratio 11.86
***************************************************************
```
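
The per-caller calculation can be sketched as follows. The record layout (a `(caller, is_cache_hit)` pair) and the function name are simplifications made up for this example, not the analyzer's trace format, and the result is assumed to be a percentage.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def miss_ratio_per_caller(
        records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Return the miss ratio (in percent, by assumption) for each caller."""
    accesses = defaultdict(int)
    misses = defaultdict(int)
    for caller, is_cache_hit in records:
        accesses[caller] += 1
        if not is_cache_hit:
            misses[caller] += 1
    return {c: 100.0 * misses[c] / accesses[c] for c in accesses}


trace = [("Get", True), ("Get", False), ("Iterator", False),
         ("Get", True), ("Get", True)]
ratios = miss_ratio_per_caller(trace)
```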

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10823

Reviewed By: ajkr

Differential Revision: D52632764

Pulled By: hx235

fbshipit-source-id: 40994d6039b73dc38fe78ea1b4adce187bb98909
2024-01-10 14:02:14 -08:00
Yu Zhang c5fbfd7ad8 Disable blobDB and UDT in memtable only combination in stress test (#12218)
Summary:
This feature combination is not fully working yet. Disable it so the stress tests have less noise.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12218

Reviewed By: cbi42

Differential Revision: D52643957

Pulled By: jowlyzhang

fbshipit-source-id: 8815a18a3b5814cad4f7ec41f3fb94869302081e
2024-01-09 17:37:01 -08:00
Peter Dillinger ed46981bea Fix and defend against FilePrefetchBuffer combined with mmap reads (#12206)
Summary:
FilePrefetchBuffer makes an unchecked assumption about the behavior of RandomAccessFileReader::Read: that it will write to the provided buffer rather than returning the data in an alternate buffer. FilePrefetchBuffer has been quietly incompatible with mmap reads (e.g. allow_mmap_reads / use_mmap_reads) because in that case an alternate buffer is returned (mmapped memory). This incompatibility currently leads to quiet data corruption, as seen in amplified crash test failure in https://github.com/facebook/rocksdb/issues/12200.

In this change,
* Check whether RandomAccessFileReader::Read has the expected behavior, and fail if not. (Assertion failure in debug build, return Corruption in release build.) This will detect future regressions synchronously and precisely, rather than relying on debugging downstream data corruption.
  * Why not recover? My understanding is that FilePrefetchBuffer is not intended for use when RandomAccessFileReader::Read uses an alternate buffer, so quietly recovering could lead to undesirable (inefficient) behavior.
* Mention incompatibility with mmap-based readers in the internal API comments for FilePrefetchBuffer
* Fix two cases where FilePrefetchBuffer could be used with mmap, both stemming from SstFileDumper, though one fix is in BlockBasedTableReader. There is currently no way to ask a RandomAccessFileReader whether it's using mmap, so we currently have to rely on other options as clues.
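
A toy model of the new check, with Python buffers standing in for the C++ ones. Every name below is hypothetical; the real check lives in FilePrefetchBuffer and compares against the buffer it passed to RandomAccessFileReader::Read. Here a reader "used the provided buffer" if the returned view is backed by the caller's scratch buffer.

```python
class CorruptionError(Exception):
    """Stand-in for returning a Corruption status in a release build."""


def read_into(data: bytes, offset: int, n: int,
              scratch: bytearray) -> memoryview:
    """Buffered-style reader: copies into the caller's scratch buffer."""
    chunk = data[offset:offset + n]
    scratch[:len(chunk)] = chunk
    return memoryview(scratch)[:len(chunk)]


def read_mmap_style(data: bytes, offset: int, n: int,
                    scratch: bytearray) -> memoryview:
    """mmap-style reader: ignores scratch, returns a view of its own memory."""
    return memoryview(data)[offset:offset + n]


def checked_read(reader, data, offset, n, scratch):
    """Fail loudly if the reader did not write into the provided buffer."""
    result = reader(data, offset, n, scratch)
    if result.obj is not scratch:
        raise CorruptionError("Read did not use the provided buffer")
    return result
```

Calling `checked_read` with `read_into` succeeds, while `read_mmap_style` raises immediately instead of letting stale scratch contents be used later.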

Keeping separate from https://github.com/facebook/rocksdb/issues/12200 in part because this change is more appropriate for backport than that one.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12206

Test Plan:
* Manually verified that the new check aids in debugging.
* Unit test added, that fails if either fix is missed.
* Ran blackbox_crash_test for hours, with and without https://github.com/facebook/rocksdb/issues/12200

Reviewed By: akankshamahajan15

Differential Revision: D52551701

Pulled By: pdillinger

fbshipit-source-id: dea87c5782b7c484a6c6e424585c8832dfc580dc
2024-01-04 18:39:05 -08:00
Peter Dillinger ea6ed0d56e Re-enable ingest_external_file with mmap_read in crash test (#12201)
Summary:
I suspect the issue called out in https://github.com/facebook/rocksdb/issues/9357 was fixed in https://github.com/facebook/rocksdb/issues/11328

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12201

Test Plan: `make blackbox_crash_test` for hours

Reviewed By: ajkr

Differential Revision: D52543075

Pulled By: pdillinger

fbshipit-source-id: b705a6bdb2799a5f51ad2746df2083aa82f360a2
2024-01-04 13:46:07 -08:00