Commit Graph

12472 Commits

Author SHA1 Message Date
Yu Zhang 928aca835f Skip searching through lsm tree for a target level when files overlap (#12284)
Summary:
While ingesting multiple external files with key range overlap, current flow go through the lsm tree to do a search for a target level and later discard that result by defaulting back to L0. This PR improves this by just skip the search altogether.

The other change is to remove default to L0 for the combination of universal compaction + force global sequence number, which was initially added to meet a pre https://github.com/facebook/rocksdb/issues/7421  invariant.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12284

Test Plan:
Added unit test:
./external_sst_file_test --gtest_filter="*IngestFileWithGlobalSeqnoAssignedUniversal*"

Reviewed By: ajkr

Differential Revision: D53072238

Pulled By: jowlyzhang

fbshipit-source-id: 30943e2e284a7f23b495c0ea4c80cb166a34a8ac
2024-01-24 23:30:08 -08:00
chuhao zeng d82d179a5e Enhance ldb_cmd_tool to enable user pass in customized cfds (#12261)
Summary:
The current implementation of the ldb_cmd tool involves commenting out the user-passed column_family_descriptors, resulting in the tool consistently constructing its column_family_descriptors from the pre-existing OPTIONS file.

The proposed fix prioritizes user-passed column family descriptors, ensuring they take precedence over those specified in the OPTIONS file. This modification enhances the tool's adaptability and responsiveness to user configurations.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12261

Reviewed By: cbi42

Differential Revision: D52965877

Pulled By: ajkr

fbshipit-source-id: 334a83a8e1004c271b19e7ca09381a0e7cf87b03
2024-01-24 16:16:18 -08:00
Jay Huh 59f4cbef8c MultiGet support in ldb (#12283)
Summary:
While investigating test failures due to the inconsistency between `Get()` and `MultiGet()`, I realized that LDB currently doesn't support `MultiGet()`. This PR introduces the `MultiGet()` support in LDB. Tested the command manually. Unit test will follow in a separate PR.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12283

Test Plan:
When key not found
```
$> ./ldb --db=/data/users/jewoongh/rocksdb_test/T173992396/rocksdb_crashtest_blackbox --hex multi_get 0x0000000000000009000000000000012B00000000000002AB
Status for key 0x0000000000000009000000000000012B00000000000002AB: NotFound:
```
Compare the same key with get
```
$> ./ldb --db=/data/users/jewoongh/rocksdb_test/T173992396/rocksdb_crashtest_blackbox --hex get 0x0000000000000009000000000000012B00000000000002AB
Failed: Get failed: NotFound:
```

Multiple keys not found
```
$> ./ldb --db=/data/users/jewoongh/rocksdb_test/T173992396/rocksdb_crashtest_blackbox --hex multi_get 0x0000000000000009000000000000012B00000000000002AB 0x0000000000000009000000000000012B00000000000002AC                                                                                                                                                                                                        Status for key 0x0000000000000009000000000000012B00000000000002AB: NotFound:
Status for key 0x0000000000000009000000000000012B00000000000002AC: NotFound:
```

One of the keys found
```
$> ./ldb --db=/data/users/jewoongh/rocksdb_test/T173992396/rocksdb_crashtest_blackbox --hex multi_get 0x0000000000000009000000000000012B00000000000002AB 0x00000000000000090000000000000026787878787878
Status for key 0x0000000000000009000000000000012B00000000000002AB: NotFound:
0x22000000262724252A2B28292E2F2C2D32333031363734353A3B38393E3F3C3D02030001060704050A0B08090E0F0C0D12131011161714151A1B18191E1F1C1D
```

All of the keys found
```
$> ./ldb --db=/data/users/jewoongh/rocksdb_test/T173992396/rocksdb_crashtest_blackbox --hex multi_get 0x0000000000000009000000000000012B00000000000000D8 0x00000000000000090000000000000026787878787878                                                                                                                                                                                                            15:57:03
0x47000000434241404F4E4D4C4B4A494857565554535251505F5E5D5C5B5A595867666564636261606F6E6D6C6B6A696877767574737271707F7E7D7C7B7A797807060504030201000F0E0D0C0B0A090817161514131211101F1E1D1C1B1A1918
0x22000000262724252A2B28292E2F2C2D32333031363734353A3B38393E3F3C3D02030001060704050A0B08090E0F0C0D12131011161714151A1B18191E1F1C1D
```

Reviewed By: hx235

Differential Revision: D53048519

Pulled By: jaykorean

fbshipit-source-id: a6217905464c5f460a222e2b883bdff47b9dd9c7
2024-01-24 11:35:12 -08:00
Hui Xiao 1b2b16b38e Fix bug of newer ingested data assigned with an older seqno (#12257)
Summary:
**Context:**
We found an edge case where newer ingested data is assigned with an older seqno. This causes older data of that key to be returned for read.

Consider the following lsm shape:
![image](https://github.com/facebook/rocksdb/assets/83968999/973fd160-5065-49cd-8b7b-b6ab4badae23)
Then ingest a file to L5 containing new data of key_overlap. Because of [this](https://l.facebook.com/l.php?u=https%3A%2F%2Fgithub.com%2Ffacebook%2Frocksdb%2Fblob%2F5a26f392ca640818da0b8590be6119699e852b07%2Fdb%2Fexternal_sst_file_ingestion_job.cc%3Ffbclid%3DIwAR10clXxpUSrt6sYg12sUMeHfShS7XigFrsJHvZoUDroQpbj_Sb3dG_JZFc%23L951-L956&h=AT0m56P7O0ZML7jk1sdjgnZZyGPMXg9HkKvBEb8mE9ZM3fpJjPrArAMsaHWZQPt9Ki-Pn7lv7x-RT9NEd_202Y6D2juIVHOIt3EjCZptDKBLRBMG49F8iBUSM9ypiKe8XCfM-FNW2Hl4KbVq2e3nZRbMvUM), the file is assigned with seqno 2, older than the old data's seqno 4. After just another compaction, we will drop the new_v for key_overlap because of the seqno and cause older data to be returned.
![image](https://github.com/facebook/rocksdb/assets/83968999/a3ef95e4-e7ae-4c30-8d03-955cd4b5ed42)

**Summary:**
This PR removes the incorrect seqno assignment

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12257

Test Plan:
- New unit test failed before the fix but passes after
- python3 tools/db_crashtest.py --compaction_style=1 --ingest_external_file_one_in=10 --preclude_last_level_data_seconds=36000 --compact_files_one_in=10 --enable_blob_files=0 blackbox`
- Rehearsal stress test

Reviewed By: cbi42

Differential Revision: D52926092

Pulled By: hx235

fbshipit-source-id: 9e4dade0f6cc44e548db8fca27ccbc81a621cd6f
2024-01-24 11:21:05 -08:00
Yu Zhang 9243f1b668 Ensures PendingExpectedValue either Commit or Rollback (#12244)
Summary:
This PR adds automatic checks in the `PendingExpectedValue` class to make sure it's either committed or rolled back before being destructed.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12244

Reviewed By: hx235

Differential Revision: D52853794

Pulled By: jowlyzhang

fbshipit-source-id: 1dcd7695f2c52b79695be0abe11e861047637dc4
2024-01-24 11:04:40 -08:00
Peter Dillinger b31f3245f1 Fix flaky test shutdown race in seqno_time_test (#12282)
Summary:
Seen in build-macos-cmake:

```
Received signal 11 (Segmentation fault: 11)
	https://github.com/facebook/rocksdb/issues/1   rocksdb::MockSystemClock::InstallTimedWaitFixCallback()::$_0::operator()(void*) const (in seqno_time_test) (mock_time_env.cc:29)
	https://github.com/facebook/rocksdb/issues/2   decltype(std::declval<rocksdb::MockSystemClock::InstallTimedWaitFixCallback()::$_0&>()(std::declval<void*>())) std::__1::__invoke[abi:v15006]<rocksdb::MockSystemClock::InstallTimedWaitFixCallback()::$_0&, void*>(rocksdb::MockSystemClock::InstallTimedWait	ixCallback()::$_0&, void*&&) (in seqno_time_test) (invoke.h:394)
...
```

This is presumably because the std::function from the lambda only saves a copy of the SeqnoTimeTest* this pointer, which doesn't prevent it from being reclaimed on parallel shutdown. If we instead save a copy of the `std::shared_ptr<MockSystemClock>` in the std::function, this should prevent the crash. (Note that in `SyncPoint::Data::Process()` copies the std::function before releasing the mutex for calling the callback.)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12282

Test Plan: watch CI

Reviewed By: cbi42

Differential Revision: D53027136

Pulled By: pdillinger

fbshipit-source-id: 26cd9c0352541d806d42bb061dd349d3b47171a5
2024-01-24 10:14:22 -08:00
Richard Barnes fc25ac0f3b Remove extra semi colon from internal_repo_rocksdb/repo/util/ribbon_impl.h (#12269)
Summary:
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12269

`-Wextra-semi` or `-Wextra-semi-stmt`

If the code compiles, this is safe to land.

Reviewed By: jaykorean

Differential Revision: D52969093

fbshipit-source-id: 0520085819fa785679c859b63b877931d3f71f2c
2024-01-24 08:20:50 -08:00
Richard Barnes 14633148a7 Remove extra semi colon from internal_repo_rocksdb/repo/util/murmurhash.cc (#12270)
Summary:
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12270

`-Wextra-semi` or `-Wextra-semi-stmt`

If the code compiles, this is safe to land.

Reviewed By: jaykorean

Differential Revision: D52965944

fbshipit-source-id: 625d47662e984db9ce06e72ff39025b8a24aa246
2024-01-24 07:39:59 -08:00
Richard Barnes 28ba896f19 Remove extra semi colon from internal_repo_rocksdb/repo/include/rocksdb/slice_transform.h (#12275)
Summary:
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12275

`-Wextra-semi` or `-Wextra-semi-stmt`

If the code compiles, this is safe to land.

Reviewed By: jaykorean

Differential Revision: D52969065

fbshipit-source-id: cf2fcdc006d3b45fb54fb700a8ebefb14b42de0d
2024-01-24 07:38:17 -08:00
Richard Barnes f099032131 Remove extra semi colon from internal_repo_rocksdb/repo/utilities/env_mirror.cc (#12271)
Summary:
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12271

`-Wextra-semi` or `-Wextra-semi-stmt`

If the code compiles, this is safe to land.

Reviewed By: jaykorean

Differential Revision: D52969070

fbshipit-source-id: 22e0958ad6ced5c021ef7dafbe16a17c282935d8
2024-01-24 07:37:31 -08:00
Richard Barnes 502a1754c4 Remove extra semi colon from internal_repo_rocksdb/repo/utilities/transactions/lock/range/range_tree/lib/locktree/manager.cc (#12276)
Summary:
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12276

`-Wextra-semi` or `-Wextra-semi-stmt`

If the code compiles, this is safe to land.

Reviewed By: jaykorean

Differential Revision: D52969073

fbshipit-source-id: 1b2495548d939c32e7a89a6424767497fab9550e
2024-01-24 07:25:27 -08:00
Richard Barnes 0797616de0 Remove extra semi colon from internal_repo_rocksdb/repo/utilities/transactions/write_unprepared_txn.h (#12273)
Summary:
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12273

`-Wextra-semi` or `-Wextra-semi-stmt`

If the code compiles, this is safe to land.

Reviewed By: jaykorean

Differential Revision: D52969166

fbshipit-source-id: 129715bfe69735b83b077c7d6cbf1786c1dfc410
2024-01-24 07:24:06 -08:00
Richard Barnes 1f3e3ead3f Remove extra semi colon from internal_repo_rocksdb/repo/env/env_encryption.cc (#12274)
Summary:
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12274

`-Wextra-semi` or `-Wextra-semi-stmt`

If the code compiles, this is safe to land.

Reviewed By: jaykorean

Differential Revision: D52969133

fbshipit-source-id: f5a8452af25a5a51d5c7e4045baef12575022da9
2024-01-24 07:22:49 -08:00
Richard Barnes 532c940b79 Remove extra semi colon from internal_repo_rocksdb/repo/utilities/transactions/transaction_base.h (#12272)
Summary:
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12272

`-Wextra-semi` or `-Wextra-semi-stmt`

If the code compiles, this is safe to land.

Reviewed By: jaykorean

Differential Revision: D52969170

fbshipit-source-id: 581304039be789cbce6760740e9557a925e02722
2024-01-24 07:22:45 -08:00
Richard Barnes 8c01cb79da Remove extra semi colon from internal_repo_rocksdb/repo/include/rocksdb/table.h (#12277)
Summary:
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12277

`-Wextra-semi` or `-Wextra-semi-stmt`

If the code compiles, this is safe to land.

Reviewed By: jaykorean

Differential Revision: D52969088

fbshipit-source-id: cd83cb3cd98b1389ddfe3e5e316f088eb5975b9f
2024-01-24 07:22:31 -08:00
Richard Barnes 3079a7e7c2 Remove extra semi colon from internal_repo_rocksdb/repo/db/internal_stats.h (#12278)
Summary:
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12278

`-Wextra-semi` or `-Wextra-semi-stmt`

If the code compiles, this is safe to land.

Reviewed By: jaykorean

Differential Revision: D52969116

fbshipit-source-id: 8cb28dafdbede54e8cb59c2b8d461b1eddb3de68
2024-01-24 07:22:10 -08:00
Richard Barnes 5eebfaaa09 Remove extra semi colon from internal_repo_rocksdb/repo/utilities/fault_injection_fs.h (#12279)
Summary:
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12279

`-Wextra-semi` or `-Wextra-semi-stmt`

If the code compiles, this is safe to land.

Reviewed By: jaykorean

Differential Revision: D52969150

fbshipit-source-id: a66326e2f8285625c4260f4d23df678a25bcfe29
2024-01-24 07:16:00 -08:00
Changyu Bi 3ef9092487 Print additional information when flaky test DBTestWithParam.ThreadStatusSingleCompaction fails (#12268)
Summary:
The test is [flaky](https://github.com/facebook/rocksdb/actions/runs/7616272304/job/20742657041?pr=12257&fbclid=IwAR1vNI1rSRVKnOsXs0WCPklqTkBXxlwS1GMJgWWe7D8dtAvh6e6wxk067FY) but I could not reproduce the test failure. Add some debug print to make the next failure more helpful

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12268

Test Plan:
```
check print works when test fails:
[ RUN      ] DBTestWithParam/DBTestWithParam.ThreadStatusSingleCompaction/0
thread id: 6134067200, thread status:
thread id: 6133493760, thread status: Compaction
db/db_test.cc:4680: Failure
Expected equality of these values:
  op_count
    Which is: 1
  expected_count
    Which is: 0
```

Reviewed By: hx235

Differential Revision: D52987503

Pulled By: cbi42

fbshipit-source-id: 33b369796f9b97155578b45167e722ddcde93594
2024-01-23 10:07:06 -08:00
Richard Barnes dee46863ba Remove extra semi colon from internal_repo_rocksdb/repo/port/lang.h
Summary:
`-Wextra-semi` or `-Wextra-semi-stmt`

If the code compiles, this is safe to land.

Reviewed By: dmm-fb

Differential Revision: D52968990

fbshipit-source-id: 58d344b719734c736cd80d47eeb6965557ce344b
2024-01-23 09:41:29 -08:00
Andrew Kryczka 7fe93162c5 Log pending compaction bytes in a couple places (#12267)
Summary:
This PR adds estimated pending compaction bytes in two places:

- The "Level summary", which is printed to the info LOG after every flush or compaction
- The "rocksdb.cfstats" property, which is printed to the info LOG periodically according to `stats_dump_period_sec`

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12267

Test Plan:
Ran `./db_bench -benchmarks=filluniquerandom -stats_dump_period_sec=1 -statistics=true -write_buffer_size=524288` and looked at the LOG.

```
** Compaction Stats [default] **
...
Estimated pending compaction bytes: 12117691
...
2024/01/22-13:15:12.283563 1572872 (Original Log Time 2024/01/22-13:15:12.283540) [/db_impl/db_impl_compaction_flush.cc:371] [default] Level summary: files[10 1 0 0 0 0 0] max score 0.50, estimated pending compaction bytes 12359137
```

Reviewed By: cbi42

Differential Revision: D52973337

Pulled By: ajkr

fbshipit-source-id: c4e546bd9bdac387eebeeba303d04125212037b8
2024-01-23 09:14:59 -08:00
Richard Barnes 84711e2f6a Remove extra semi colon from internal_repo_rocksdb/repo/monitoring/histogram.h
Summary:
`-Wextra-semi` or `-Wextra-semi-stmt`

If the code compiles, this is safe to land.

Reviewed By: dmm-fb

Differential Revision: D52968964

fbshipit-source-id: 2cb8c683f958742e2f151db8ef6824ab622528e6
2024-01-23 08:42:15 -08:00
Richard Barnes c057c2e81d Remove extra semi colon from internal_repo_rocksdb/repo/include/rocksdb/file_system.h
Summary:
`-Wextra-semi` or `-Wextra-semi-stmt`

If the code compiles, this is safe to land.

Reviewed By: dmm-fb

Differential Revision: D52969123

fbshipit-source-id: d9e22dff70644dad0173ee8f6f9b64021f4b2551
2024-01-23 08:40:00 -08:00
Richard Barnes 186344196b Remove extra semi colon from internal_repo_rocksdb/repo/monitoring/histogram.cc
Summary:
`-Wextra-semi` or `-Wextra-semi-stmt`

If the code compiles, this is safe to land.

Reviewed By: dmm-fb

Differential Revision: D52969001

fbshipit-source-id: d628fa6c5e5d01657fcb7aff7b05dea704ed2025
2024-01-23 08:37:47 -08:00
Richard Barnes b60cb55889 Remove extra semi colon from internal_repo_rocksdb/repo/util/xxhash.h
Summary:
`-Wextra-semi` or `-Wextra-semi-stmt`

If the code compiles, this is safe to land.

Reviewed By: dmm-fb

Differential Revision: D52967247

fbshipit-source-id: 4a67cb9719e092ad9bbe9c7e1d060e3f9042ecf7
2024-01-23 08:36:43 -08:00
Richard Barnes 24e7e7be04 Remove extra semi colon from internal_repo_rocksdb/repo/include/rocksdb/env_encryption.h
Summary:
`-Wextra-semi` or `-Wextra-semi-stmt`

If the code compiles, this is safe to land.

Reviewed By: dmm-fb

Differential Revision: D52969125

fbshipit-source-id: f8b6090393459b8d2973e54fac488290a54bf752
2024-01-23 08:35:47 -08:00
Richard Barnes 51ecdd3e8f Remove extra semi colon from internal_repo_rocksdb/repo/env/env_encryption_ctr.h
Summary:
`-Wextra-semi` or `-Wextra-semi-stmt`

If the code compiles, this is safe to land.

Reviewed By: jaykorean

Differential Revision: D52969018

fbshipit-source-id: 0b79c1599fef4eb902c9ef3fac827f1ed4ea94ed
2024-01-23 06:07:30 -08:00
Yu Zhang ef342246dc Consolidate stats recording in error handler (#11992)
Summary:
This is a non functional refactor, mostly for deduplicating the stats recording logic in error handler. Plus some documentation update and simple code dedupe.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11992

Test Plan: existing tests

Reviewed By: hx235

Differential Revision: D52967713

Pulled By: jowlyzhang

fbshipit-source-id: d584eae1a06410438f5a4c59c2cb67666ea7de1a
2024-01-22 14:57:30 -08:00
Peter Dillinger bc95cdd242 Add 8.11 release note for FileOperationType enum addition (#12263)
Summary:
Adding a this new possibility caused an assertion failure in our own RocksDB extensions (switch now incomplete), so we should warn others about it as well.

Will pick this into 8.11.fb branch

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12263

Test Plan: no code change

Reviewed By: jaykorean

Differential Revision: D52966124

Pulled By: pdillinger

fbshipit-source-id: 4998293a9480909e4888871850a012b7354c3e81
2024-01-22 12:43:44 -08:00
Changyu Bi a29db3048f Fix TestGetEntity failure with UDT (#12264)
Summary:
Use the read option with right timestamp and skip verification when using old timestamps.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12264

Test Plan:
I can repro with small keyspace:
```
./db_stress --acquire_snapshot_one_in=10000 --adaptive_readahead=0 --allow_data_in_errors=True --async_io=0 --auto_readahead_size=1 --avoid_flush_during_recovery=0 --avoid_unnecessary_blocking_io=0 --backup_max_size=104857600 --backup_one_in=1000 --batch_protection_bytes_per_key=0 --block_protection_bytes_per_key=4 --block_size=16384 --bloom_before_level=7 --bloom_bits=15 --bottommost_compression_type=xpress --bottommost_file_compaction_delay=0 --bytes_per_sync=262144 --cache_index_and_filter_blocks=1 --cache_size=33554432 --cache_type=fixed_hyper_clock_cache --charge_compression_dictionary_building_buffer=1 --charge_file_metadata=0 --charge_filter_construction=1 --charge_table_reader=1 --checkpoint_one_in=10000 --checksum_type=kXXH3 --clear_column_family_one_in=0 --compact_files_one_in=1000000 --compact_range_one_in=1000000 --compaction_pri=4 --compaction_readahead_size=0 --compaction_style=1 --compaction_ttl=0 --compressed_secondary_cache_size=16777216 --compression_checksum=1 --compression_max_dict_buffer_bytes=0 --compression_max_dict_bytes=0 --compression_parallel_threads=1 --compression_type=snappy --compression_use_zstd_dict_trainer=0 --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --data_block_index_type=1 --db_write_buffer_size=0 --delpercent=4 --delrangepercent=1 --destroy_db_initially=1 --detect_filter_construct_corruption=1 --disable_wal=0 --enable_compaction_filter=0 --enable_pipelined_write=1 --enable_thread_tracking=0 --fail_if_options_file_error=1 --fifo_allow_compaction=1 --file_checksum_impl=big --flush_one_in=1000 --format_version=2 --get_current_wal_file_one_in=0 --get_live_files_one_in=10000 --get_property_one_in=100000 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=13 --index_type=0 --ingest_external_file_one_in=0 --initial_auto_readahead_size=16384 --iterpercent=10 --key_len_percent_dist=1,30,69 --level_compaction_dynamic_level_bytes=0 --lock_wal_one_in=10000 --log2_keys_per_lock=10 --long_running_snapshots=0 --manual_wal_flush_one_in=1000 --mark_for_compaction_one_file_in=0 --max_auto_readahead_size=0 --max_background_compactions=20 --max_bytes_for_level_base=10485760 --max_key=1000 --max_key_len=3 --max_manifest_file_size=16384 --max_write_batch_group_size_bytes=1048576 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=2097152 --memtable_max_range_deletions=0 --memtable_prefix_bloom_size_ratio=0.01 --memtable_protection_bytes_per_key=0 --memtable_whole_key_filtering=0 --memtablerep=skip_list --min_write_buffer_number_to_merge=2 --mmap_read=1 --mock_direct_io=False --nooverwritepercent=1 --num_file_reads_for_auto_readahead=1 --open_files=100 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=32 --open_write_fault_one_in=16 --ops_per_thread=200000 --optimize_filters_for_memory=0 --paranoid_file_checks=0 --partition_filters=0 --partition_pinning=2 --pause_background_one_in=1000000 --periodic_compaction_seconds=0 --persist_user_defined_timestamps=1 --prefix_size=8 --prefixpercent=5 --prepopulate_block_cache=1 --preserve_internal_time_seconds=3600 --progress_reports=0 --read_fault_one_in=0 --readahead_size=16384 --readpercent=45 --recycle_log_file_num=1 --reopen=20 --secondary_cache_fault_one_in=32 --secondary_cache_uri= --snapshot_hold_ops=100000 --sst_file_manager_bytes_per_sec=0 --sst_file_manager_bytes_per_truncate=0 --stats_dump_period_sec=0 --subcompactions=3 --sync=0 --sync_fault_injection=0 --target_file_size_base=524288 --target_file_size_multiplier=2 --test_batches_snapshots=0 --test_cf_consistency=0 --top_level_index_pinning=1 --unpartitioned_pinning=1 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_full_merge_v1=0 --use_get_entity=1 --use_merge=0 --use_multi_get_entity=0 --use_multiget=1 --use_put_entity_one_in=0 --use_txn=0 --use_write_buffer_manager=0 --user_timestamp_size=8 --value_size_mult=32 --verification_only=0 --verify_checksum=1 --verify_checksum_one_in=1000000 --verify_db_one_in=100000 --verify_file_checksums_one_in=100000 --verify_iterator_with_expected_state_one_in=5 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=524288 --wal_compression=zstd --write_buffer_size=4194304 --write_dbid_to_manifest=0 --write_fault_one_in=0 --writepercent=35 --db=/dev/shm/rocksdb_test/rocksdb_crashtest_whitebox --expected_values_dir=/dev/shm/rocksdb_test/rocksdb_crashtest_expected

Errors when run with main:
error : inconsistent values for key 0x00000000000000E5000000000000012B000000000000014D: expected state has the key, GetEntity returns NotFound.

error : inconsistent values for key 0x0000000000000009000000000000012B0000000000000254: GetEntity returns :0x010000000504070609080B0A0D0C0F0E111013121514171619181B1A1D1C1F1E212023222524272629282B2A2D2C2F2E313033323534373639383B3A3D3C3F3E, expected state does not have the key.
```

Reviewed By: jaykorean

Differential Revision: D52966251

Pulled By: cbi42

fbshipit-source-id: 09436a1b747f1ac545140fc83a2fa4555fef51c1
2024-01-22 12:15:17 -08:00
zaidoon e572ae9f57 expose mode option to Rate Limiter via C API (#12259)
Summary:
addresses https://github.com/facebook/rocksdb/issues/12220 to allow rate limiting compaction but not flushes

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12259

Reviewed By: jaykorean

Differential Revision: D52965342

Pulled By: ajkr

fbshipit-source-id: 38566d9ac75c932c63e10cc53796fab0e46e3b2e
2024-01-22 11:45:53 -08:00
Changyu Bi 4b684e96b7 Allow more intra-L0 compaction when L0 is small (#12214)
Summary:
introduce a new option `intra_l0_compaction_size` to allow more intra-L0 compaction when total L0 size is under a threshold. This option applies only to leveled compaction. It is enabled by default and set to `max_bytes_for_level_base / max_bytes_for_level_multiplier` only for atomic_flush users. When atomic_flush=true, it is more likely that some CF's total L0 size is small when it's eligible for compaction. This option aims to reduce write amplification in this case.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12214

Test Plan:
- new unit test
- benchmark:
```
TEST_TMPDIR=/dev/shm ./db_bench --benchmarks=fillrandom --write_buffer_size=51200 --max_bytes_for_level_base=5242880 --level0_file_num_compaction_trigger=4 --statistics=1

main:
fillrandom   :     234.499 micros/op 4264 ops/sec 234.499 seconds 1000000 operations;    0.5 MB/s
rocksdb.compact.read.bytes COUNT : 1490756235
rocksdb.compact.write.bytes COUNT : 1469056734
rocksdb.flush.write.bytes COUNT : 71099011

branch:
fillrandom   :     128.494 micros/op 7782 ops/sec 128.494 seconds 1000000 operations;    0.9 MB/s
rocksdb.compact.read.bytes COUNT : 807474156
rocksdb.compact.write.bytes COUNT : 781977610
rocksdb.flush.write.bytes COUNT : 71098785
```

Reviewed By: ajkr

Differential Revision: D52637771

Pulled By: cbi42

fbshipit-source-id: 4f2c7925d0c3a718635c948ea0d4981ed9fabec3
2024-01-22 10:23:57 -08:00
Peter Dillinger 800cfae987 Start 9.0.0 release (#12256)
Summary:
with release notes for 8.11.fb, format_compatible test update, and version.h update.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12256

Test Plan: CI

Reviewed By: cbi42

Differential Revision: D52926051

Pulled By: pdillinger

fbshipit-source-id: adcf7119b065758599e904c16cbdf1d28811e0b4
2024-01-20 08:38:20 -08:00
Peter Dillinger cb08a682d4 Fix/cleanup SeqnoToTimeMapping (#12253)
Summary:
The SeqnoToTimeMapping class (RocksDB internal) used by the preserve_internal_time_seconds / preclude_last_level_data_seconds options was essentially in a prototype state with some significant flaws that would risk biting us some day. This is a big, complicated change because both the implementation and the behavioral requirements of the class needed to be upgraded together. In short, this makes SeqnoToTimeMapping more internally responsible for maintaining good invariants, so that callers don't easily encounter dangerous scenarios.

* Some API functions were confusingly named and structured, so I fully refactored the APIs to use clear naming (e.g. `DecodeFrom` and `CopyFromSeqnoRange`), object states, function preconditions, etc.
  * Previously the object could informally be sorted / compacted or not, and there was limited checking or enforcement on these states. Now there's a well-defined "enforced" state that is consistently checked in debug mode for applicable operations. (I attempted to create a separate "builder" class for unenforced states, but IIRC found that more cumbersome for existing uses than it was worth.)
* Previously operations would coalesce data in a way that was better for `GetProximalTimeBeforeSeqno` than for `GetProximalSeqnoBeforeTime` which is odd because the latter is the only one used by DB code currently (what is the seqno cut-off for data definitely older than this given time?). This is now reversed to consistently favor `GetProximalSeqnoBeforeTime`, with that logic concentrated in one place: `SeqnoToTimeMapping::SeqnoTimePair::Merge()`. Unfortunately, a lot of unit test logic was specifically testing the old, suboptimal behavior.
* Previously, the natural behavior of SeqnoToTimeMapping was to THROW AWAY data needed to get reasonable answers to the important `GetProximalSeqnoBeforeTime` queries. This is because SeqnoToTimeMapping only had a FIFO policy for staying within the entry capacity (except in aggregate+sort+serialize mode). If the DB wasn't extremely careful to avoid gathering too many time mappings, it could lose track of where the seqno cutoff was for cold data (`GetProximalSeqnoBeforeTime()` returning 0) and preventing all further data migration to the cold tier--until time passes etc. for mappings to catch up with FIFO purging of them. (The problem is not so acute because SST files contain relevant snapshots of the mappings, but the problem would apply to long-lived memtables.)
  * Now the SeqnoToTimeMapping class has fully-integrated smarts for keeping a sufficiently complete history, within capacity limits, to give good answers to `GetProximalSeqnoBeforeTime` queries.
  * Fixes old `// FIXME: be smarter about how we erase to avoid data falling off the front prematurely.`
* Fix an apparent bug in how entries are selected for storing into SST files. Previously, it only selected entries within the seqno range of the file, but that would easily leave a gap at the beginning of the timeline for data in the file for the purposes of answering GetProximalXXX queries with reasonable accuracy. This could probably lead to the same problem discussed above in naively throwing away entries in FIFO order in the old SeqnoToTimeMapping. The updated testing of GetProximalSeqnoBeforeTime in BasicSeqnoToTimeMapping relies on the fixed behavior.
* Fix a potential compaction CPU efficiency/scaling issue in which each compaction output file would iterate over and sort all seqno-to-time mappings from all compaction input files. Now we distill the input file entries to a constant size before processing each compaction output file.

Intended follow-up (me or others):
* Expand some direct testing of SeqnoToTimeMapping APIs. Here I've focused on updating existing tests to make sense.
* There are likely more gaps in availability of needed SeqnoToTimeMapping data when the DB shuts down and is restarted, at least with WAL.
* The data tracked in the DB could be kept more accurate and limited if it used the oldest seqno of unflushed data. This might require some more API refactoring.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12253

Test Plan: unit tests updated

Reviewed By: jowlyzhang

Differential Revision: D52913733

Pulled By: pdillinger

fbshipit-source-id: 020737fcbbe6212f6701191a6ab86565054c9593
2024-01-19 21:50:38 -08:00
Jay Huh d982260b63 Clean up after long-running whitebox crashtest (#12248)
Summary:
Currently, we treat the long-running whitebox_crash_test as passing. However, we were not cleaning up after ourselves when we killed the running test for running too long, which often caused out-of-space errors in subsequent tests (e.g., blackbox_crash_test after whitebox_crash_test).

Unless we want to start treating these timeouts as failures and need the DB output for investigation now, we should properly clean up the tmp dir.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12248

Test Plan:
```
$> make crash_test -j
```

Reviewed By: ajkr

Differential Revision: D52885342

Pulled By: jaykorean

fbshipit-source-id: 7c1f2ca7cf03d0705bb14155ee44d5d7a411c132
2024-01-19 16:25:39 -08:00
Andrew Kryczka d69628e6ce Mark unsafe/outdated options as deprecated (#12249)
Summary:
These options were added for users to roll back a behavior change without downgrading. To our knowledge they were not needed so can now be removed.

- `level_compaction_dynamic_file_size`
- `ignore_max_compaction_bytes_for_input`

These options were added for users to disable an online validation in case it is expensive or has false positives. Those validations have shown to be cheap, correct, and are enabled by default, so these options can be removed.

- `check_flush_compaction_key_order`
- `flush_verify_memtable_count`
- `compaction_verify_record_count`
- `fail_if_options_file_error`

This option was added for users to violate API contracts or run old databases that used to violate API contracts. It appears to be set by MyRocks so it is unclear whether we can remove it. In any case we should discourage it until it can be removed.

- `enforce_single_del_contracts`

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12249

Reviewed By: cbi42

Differential Revision: D52886651

Pulled By: ajkr

fbshipit-source-id: e0d5a35144ce048505899efb1ca68c3948050aa4
2024-01-19 10:44:49 -08:00
Changyu Bi ec5b1be18d Deflake `PerfContextTest.CPUTimer` (#12252)
Summary:
We saw failures like
```
db/perf_context_test.cc:952: Failure
Expected: (next_count) > (count), actual: 26699 vs 26699
```
I can repro by running the test repeatedly and the test fails with different seek keys. So
the cause is likely not with Seek() implementation. I found that
`clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);` can return the same time when
called repeatedly. However, I don't know if Seek() is fast enough that this happened during
continuous test.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12252

Test Plan: `gtest_parallel.py --repeat=10000 --workers=1 ./perf_context_test --gtest_filter="PerfContextTest.CPUTimer"`

Reviewed By: ajkr

Differential Revision: D52912751

Pulled By: cbi42

fbshipit-source-id: 8985ae93baa99cdf4b9136ea38addd2e41f4b202
2024-01-19 10:13:52 -08:00
Adam Retter 5a26f392ca Use the correct Docker Image for RocksJava on Linux (#12169)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12169

Reviewed By: pdillinger

Differential Revision: D52715225

Pulled By: ajkr

fbshipit-source-id: 28476d363034fa1bb9c8c919d577c03b6391451b
2024-01-19 10:12:31 -08:00
akankshamahajan b5bb553d5e Fix PREFETCH_BYTES_USEFUL stat calculation (#12251)
Summary:
After refactoring of FilePrefetchBuffer, PREFETCH_BYTES_USEFUL was miscalculated. Instead of calculating how many requested bytes are already in the buffer, it took into account alignment as well because aligned_useful_len takes into consideration alignment too.

Also refactored the naming of chunk_offset_in_buffer to make it similar to aligned_useful_len

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12251

Test Plan:
1. Validated internally through release validation benchmarks.
2. Updated unit test that fails without the fix.

Reviewed By: ajkr

Differential Revision: D52891112

Pulled By: akankshamahajan15

fbshipit-source-id: 2526a0b0572d473beaf8b841f2f9c2f6275d9779
2024-01-18 19:09:49 -08:00
Neil Ramaswamy 4835c11cce Add native logger support to RocksJava (#12213)
Summary:
## Overview

In this PR, we introduce support for setting the RocksDB native logger through Java. As mentioned in the discussion on the [Google Group discussion](https://groups.google.com/g/rocksdb/c/xYmbEs4sqRM/m/e73E4whJAQAJ), this work is primarily motivated by the  JDK 17 [performance regression in JNI thread attach/detach calls](https://bugs.openjdk.org/browse/JDK-8314859): the only existing RocksJava logging configuration call, `setLogger`, invokes the provided logger over the JNI.

## Changes

Specifically, these changes add support for the `devnull` and `stderr` native loggers. For the `stderr` logger, we add the ability to prefix every log with a `logPrefix`, so that it becomes possible know which database a particular log is coming from (if multiple databases are in use). The  API looks like the following:

```java
Options opts = new Options();

NativeLogger stderrNativeLogger = NativeLogger.newStderrLogger(
  InfoLogLevel.DEBUG_LEVEL, "[my prefix here]");
options.setLogger(stderrNativeLogger);

try (final RocksDB db = RocksDB.open(options, ...))  {...}

// Cleanup
stderrNativeLogger.close()
opts.close();
```

Note that the API to set the logger is the same, via `Options::setLogger` (or `DBOptions::setLogger`). However, it will set the RocksDB logger to be native when  the provided logger is an instance of `NativeLogger`.

## Testing

Two tests have been added in `NativeLoggerTest.java`. The first test creates both the `devnull` and `stderr` loggers, and sets them on the associated `Options`. However, to avoid polluting the testing output with logs from `stderr`, only the `devnull` logger is actually used in the test. The second test does the same logic, but for `DBOptions`.

It is possible to manually verify the `stderr` logger by modifying the tests slightly, and observing that the console indeed gets cluttered with logs from `stderr`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12213

Reviewed By: cbi42

Differential Revision: D52772306

Pulled By: ajkr

fbshipit-source-id: 4026895f78f9cc250daf6bfa57427957e2d8b053
2024-01-17 17:51:36 -08:00
Richard Barnes 59ba1d200d Remove unused variables in internal_repo_rocksdb/repo/env/env_posix.cc (#12243)
Summary:
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12243

LLVM-15 has a warning `-Wunused-but-set-variable` which we treat as an error because it's so often diagnostic of a code issue. Unused variables can compromise readability or, worse, performance.

This diff either (a) removes an unused variable and, possibly, it's associated code, or (b) qualifies the variable with `[[maybe_unused]]`, mostly in cases where the variable _is_ used, but, eg, in an `assert` statement that isn't present in production code.

 - If you approve of this diff, please use the "Accept & Ship" button :-)

Reviewed By: jowlyzhang

Differential Revision: D52847993

fbshipit-source-id: 221da13c6ca9967e3b934f98f318a832a144df39
2024-01-17 14:08:07 -08:00
anand76 65e162bf09 Add some asserts in FilePickerMultiGet for debugging (#12241)
Summary:
Add asserts to help debug a crash test failure. The test fails as wollows -
```rocksdb::FilePickerMultiGet::PrepareNextLevel(): Assertion `fp_ctx.search_right_bound == -1 || fp_ctx.search_right_bound == FileIndexer::kLevelMaxIndex' failed```

Also add a unit test to verify an edge case.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12241

Reviewed By: cbi42

Differential Revision: D52819029

Pulled By: anand1976

fbshipit-source-id: 33316985c8ace1aed9ecc2400da8b777aec488ff
2024-01-16 17:08:58 -08:00
Yu Zhang c4228abdc0 Fix backup/checkpoint stress test failure (#12227)
Summary:
This PR fixes this type of stress test failure that could happen in either checkpoint or backup. Example failure messages are like this:

`Failure in a backup/restore operation with: Corruption: 0x00000000000001D5000000000000012B00000000000000FD exists in original db but not in restore`

`A checkpoint operation failed with: Corruption: 0x0000000000000365000000000000012B0000000000000067 exists in original db but not in checkpoint /...`

The internal task has an example test command to quickly reproduce this type of error.

The common symptom of these test failures are these expected keys do not exist in the original db either. The root cause is `TestCheckpoint` and `TestBackupRestore` both use the expected state as a proxy for the state of the original db when it comes to check a key's existence. 0758271d51/db_stress_tool/db_stress_test_base.cc (L1838)

This `ExpectedState::Exists` API returns true if a key has a pending write, such as a pending put. In usual case, this pending put should either soon materialize to an actual write when `PendingExpectedValue::Commit` is called to reflect a successful write to the DB, or test should be safely terminated if write to DB fails. All of which happens while a key is locked. So checkpoint and backup usually won't see the discrepancy between db and expected state caused by pending writes. However, the external file ingestion test currently has a path that will proceed the test after a failed ingestion caused by injected errors, leaving the pending put in the expected state. 0758271d51/db_stress_tool/no_batched_ops_stress.cc (L1577-L1589)

I think a proper and future proof fix for this is to explicitly rollback a pending state when a db write operation failed so that expected state do not diverge from db in the first place. I added a `PendingExpectedValue::Rollback` API so that we don't implicitly depend on thread termination to prevent test failures. Another place that could cause same divergence as external file ingestion is `PreloadDbAndReopenAsReadOnly`.
0758271d51/db_stress_tool/db_stress_test_base.cc (L616-L619)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12227

Reviewed By: hx235

Differential Revision: D52705470

Pulled By: jowlyzhang

fbshipit-source-id: b21586b037caeeba29a2cff8c2fdc6f1d0bda9cf
2024-01-16 13:28:51 -08:00
Levi Tamasi 7e4406a171 Add a changelog entry for PR 12235 (#12238)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12238

Reviewed By: jowlyzhang

Differential Revision: D52809593

fbshipit-source-id: 692852cdd3074275ef92bde83ff15a800d8ae3d5
2024-01-16 13:02:18 -08:00
anand76 b49f9cdd3c Add CompressionOptions to the compressed secondary cache (#12234)
Summary:
Add ```CompressionOptions``` to ```CompressedSecondaryCacheOptions``` to allow users to set options such as compression level. It allows performance to be fine tuned.

Tests -
Run db_bench and verify compression options in the LOG file

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12234

Reviewed By: ajkr

Differential Revision: D52758133

Pulled By: anand1976

fbshipit-source-id: af849fbffce6f84704387c195d8edba40d9548f6
2024-01-16 12:21:27 -08:00
akankshamahajan cad76a2e1e Fix bug in auto_readahead_size that returned wrong key (#12229)
Summary:
IndexType::kBinarySearchWithFirstKey +
BlockCacheLookupForReadAheadSize enabled => FindNextUserEntryInternal assertion fails or iterator lands at a wrong key because BlockCacheLookupForReadAheadSize moves the index_iter_ and in internal_wrapper.h, result_.key didn't update and pointed to wrong key. Also ikey_ was also pointing to iter_.key() instead of copying the key.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12229

Test Plan:
```
 rm -rf /dev/shm/rocksdb_test/rocksdb_crashtest_blackbox_alt3 /dev/shm/rocksdb_test/rocksdb_crashtest_expected_alt3
mkdir /dev/shm/rocksdb_test/rocksdb_crashtest_blackbox_alt3 /dev/shm/rocksdb_test/rocksdb_crashtest_expected_alt3
./db_stress -threads=1 --acquire_snapshot_one_in=0 --adaptive_readahead=0 --allow_concurrent_memtable_write=0 --allow_data_in_errors=True --allow_setting_blob_options_dynamically=0 --async_io=0 --auto_readahead_size=1 --avoid_flush_during_recovery=0 --avoid_unnecessary_blocking_io=1 --backup_max_size=0 --backup_one_in=0 --batch_protection_bytes_per_key=0 --blob_cache_size=0 --blob_compaction_readahead_size=0 --blob_compression_type=lz4 --blob_file_size=0 --blob_file_starting_level=0 --blob_garbage_collection_age_cutoff=0 --blob_garbage_collection_force_threshold=0 --block_protection_bytes_per_key=0 --block_size=2048 --bloom_before_level=2147483646 --bloom_bits=15 --bottommost_compression_type=snappy --bottommost_file_compaction_delay=0 --bytes_per_sync=0 --cache_index_and_filter_blocks=0 --cache_size=8388608 --cache_type=lru_cache --charge_compression_dictionary_building_buffer=0 --charge_file_metadata=0 --charge_filter_construction=0 --charge_table_reader=0 --checkpoint_one_in=0 --checksum_type=kCRC32c --clear_column_family_one_in=0 --column_families=1 --compact_files_one_in=0 --compact_range_one_in=0 --compaction_pri=1 --compaction_readahead_size=0 --compaction_ttl=0 --compressed_secondary_cache_size=0 --compression_checksum=0 --compression_max_dict_buffer_bytes=511 --compression_max_dict_bytes=16384 --compression_parallel_threads=1 --compression_type=none --compression_use_zstd_dict_trainer=1 --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --data_block_index_type=1 --db=/dev/shm/rocksdb_test/rocksdb_crashtest_blackbox_alt3 --db_write_buffer_size=0 --delpercent=0 --delrangepercent=0 --destroy_db_initially=0 --detect_filter_construct_corruption=0 --disable_wal=0 --enable_blob_files=0 --enable_blob_garbage_collection=0 --enable_compaction_filter=0 --enable_pipelined_write=0 --enable_thread_tracking=1 --expected_values_dir=/dev/shm/rocksdb_test/rocksdb_crashtest_expected_alt3 --fail_if_options_file_error=1 --fifo_allow_compaction=0 --file_checksum_impl=crc32c --flush_one_in=1000000 --format_version=3 --get_current_wal_file_one_in=0 --get_live_files_one_in=0 --get_property_one_in=0 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=13 --index_type=3 --ingest_external_file_one_in=10 --initial_auto_readahead_size=0 --iterpercent=55 --key_len_percent_dist=1,30,69 --level_compaction_dynamic_level_bytes=0 --lock_wal_one_in=0 --long_running_snapshots=0 --manual_wal_flush_one_in=0 --mark_for_compaction_one_file_in=0 --max_auto_readahead_size=0 --max_background_compactions=1 --max_bytes_for_level_base=67108864 --max_key=100000 --max_key_len=3 --max_manifest_file_size=1073741824 --max_write_batch_group_size_bytes=16 --max_write_buffer_number=10 --max_write_buffer_size_to_maintain=4194304 --memtable_max_range_deletions=1000 --memtable_prefix_bloom_size_ratio=0.5 --memtable_protection_bytes_per_key=0 --memtable_whole_key_filtering=0 --memtablerep=skip_list --min_blob_size=8 --min_write_buffer_number_to_merge=2 --mmap_read=0 --mock_direct_io=False --nooverwritepercent=1 --num_file_reads_for_auto_readahead=2 --open_files=-1 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=0 --open_write_fault_one_in=0 --ops_per_thread=10000000 --optimize_filters_for_memory=0 --paranoid_file_checks=0 --partition_filters=0 --partition_pinning=0 --pause_background_one_in=0 --periodic_compaction_seconds=0 --prefix_size=1 --prefixpercent=0 --prepopulate_block_cache=0 --preserve_internal_time_seconds=0 --progress_reports=0 --read_fault_one_in=0 --readahead_size=1 --readpercent=45 --recycle_log_file_num=1 --reopen=0 --secondary_cache_fault_one_in=0 --secondary_cache_uri= --set_options_one_in=0 --snapshot_hold_ops=0 --sst_file_manager_bytes_per_sec=0 --sst_file_manager_bytes_per_truncate=0 --stats_dump_period_sec=600 --subcompactions=1 --sync=0 --sync_fault_injection=0 --target_file_size_base=16777216 --target_file_size_multiplier=1 --test_batches_snapshots=0 --top_level_index_pinning=0 --unpartitioned_pinning=0 --use_blob_cache=0 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_full_merge_v1=0 --use_get_entity=0 --use_merge=0 --use_multi_get_entity=0 --use_multiget=0 --use_put_entity_one_in=0 --use_shared_block_and_blob_cache=0 --use_write_buffer_manager=0 --user_timestamp_size=0 --value_size_mult=32 --verification_only=0 --verify_checksum=0 --verify_checksum_one_in=0 --verify_db_one_in=0 --verify_file_checksums_one_in=0 --verify_iterator_with_expected_state_one_in=1 --verify_sst_unique_id_in_manifest=0 --wal_bytes_per_sync=0 --wal_compression=none --write_buffer_size=33554432 --write_dbid_to_manifest=0 --write_fault_one_in=0 --writepercent=0 > repro.out
Verification failed. Expected state has key 0000000000000077000000000000004178, iterator is at key 0000000000000077000000000000008A78
Column family: default, op_logs: S 0000000000000077000000000000003D7878787878 NNNN
No writes or ops?
Verification failed :(
```

Reviewed By: ajkr

Differential Revision: D52710655

Pulled By: akankshamahajan15

fbshipit-source-id: 9d2e684e190fb0832bdce3337bce1c6548cd054d
2024-01-16 11:30:36 -08:00
Jonah Gao e28251ca72 Fix blob files not reclaimed after deleting all SSTs (#12235)
Summary:
Fix issue https://github.com/facebook/rocksdb/issues/12208.

After all the SSTs have been deleted, all the blob files will become unreferenced.
These files should be considered obsolete and thus, should not be saved to the vstorage.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12235

Reviewed By: jowlyzhang

Differential Revision: D52806441

Pulled By: ltamasi

fbshipit-source-id: 62f94d4f2544ed2822c764d8ace5bf7f57efe42d
2024-01-16 11:15:23 -08:00
Andrew Kryczka 2dda7a0dd2 Detect compaction pressure at lower debt ratios (#12236)
Summary:
This PR significantly reduces the compaction pressure threshold introduced in https://github.com/facebook/rocksdb/issues/12130 by a factor of 64x. The original number was too high to trigger in scenarios where compaction parallelism was needed.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12236

Reviewed By: cbi42

Differential Revision: D52765685

Pulled By: ajkr

fbshipit-source-id: 8298e966933b485de24f63165a00e672cb9db6c4
2024-01-15 22:41:18 -08:00
Chdy 21d5a8f54f Fix a bug in sst_dump when parsing PlainTable (#12223)
Summary:
### Summary: The sst_dump tool occur IO Error when reading data in PlainTable, as shown in the follow
```bash
❯ ./sst_dump --file=/tmp/write_example  --command=scan --show_properties --verify_checksum
options.env is 0x60000282dc00
Process /tmp/write_example/001630.sst
Sst file format: plain table
/tmp/filepicker_example/001630.sst: IO error: While pread offset 0 len 758: /tmp/filepicker_example/001630.sst: Bad address
Process /tmp/filepicker_example/001624.sst
```

#### Reason
The root cause is that `fopts.use_mmap_reads` is false, `NewRandomAccessFile` will produce an `PosixRandomAccessFile` file. but `soptions_.use_mmap_reads` is true, This will result in unexpected calls in the `MmapDataIfNeeded` function.
```c++
Status SstFileDumper::GetTableReader(const std::string& file_path) {
	...

  if (s.ok()) {
    if (magic_number == kPlainTableMagicNumber ||
        magic_number == kLegacyPlainTableMagicNumber ||
			  magic_number == kCuckooTableMagicNumber) {
      soptions_.use_mmap_reads = true;
     ...

     // WARN: fopts.use_mmap_reads is false
      fs->NewRandomAccessFile(file_path, fopts, &file, nullptr);
      file_.reset(new RandomAccessFileReader(std::move(file), file_path));
    }
    ...

  }

  if (s.ok()) {
    // soptions_.use_mmap_reads is true
    s = NewTableReader(ioptions_, soptions_, internal_comparator_, file_size,
                       &table_reader_);
  }
  return s;
}
```

The following read logic was executed on a `PosixRandomAccessFile` file, Eventually, `PosixRandomAccessFile::Read` will be called with a `nullptr` `scratch`
```c++
Status PlainTableReader::MmapDataIfNeeded() {
  if (file_info_.is_mmap_mode) {
    // Get mmapped memory.
    // Executing the following logic on the PosixRandomAccessFile file is incorrect
    return file_info_.file->Read(
        IOOptions(), 0, static_cast<size_t>(file_size_), &file_info_.file_data,
        nullptr, nullptr, Env::IO_TOTAL /* rate_limiter_priority */);
  }
  return Status::OK();
}
```

#### Fix:
When parsing PlainTable, set the variable `fopts.use_mmap_reads` equal `soptions_.use_mmap_reads`,  When the `soptions_.use_mmap_reads` is true, `NewRandomAccessFile` will produce an `PosixMmapReadableFile` file. This will work correctly in the `MmapDataIfNeeded` function
```
❯ ./sst_dump --file=/tmp/write_example  --command=scan --show_properties --verify_checksum
options.env is 0x6000009323e0
Process /tmp/write_example/001630.sst
Sst file format: plain table
from [] to []
'keys496' seq:0, type:1 => values1496
'keys497' seq:0, type:1 => values1497
'keys498' seq:0, type:1 => values1498
Table Properties:
------------------------------
  # data blocks: 1
  # entries: 3
  # deletions: 0
  # merge operands: 0
  # range deletions: 0
  raw key size: 45
  raw average key size: 15.000000
  raw value size: 42
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12223

Reviewed By: cbi42

Differential Revision: D52706238

Pulled By: ajkr

fbshipit-source-id: 2f9f518ec81d1cbde00bd65ab6bd304796836c0a
2024-01-12 14:56:10 -08:00
Yu Zhang 8d0c09d7e6 Abort verification when expected state has pending writes / db return non OK(NotFound) status (#12232)
Summary:
In the current flow, the verification will pass and continue the test when db return non Ok(NotFound) status while expected state has pending writes.

fdfd044bb2/db_stress_tool/no_batched_ops_stress.cc (L2054-L2065)

We can just abort when such a db status is ever encountered. This can prevent follow up tests like `TestCheckpoint` and `TestBackupRestore` to consider such a key as existing in the db via the `ExpectedState::Exists` API. This could be a reason for some recent test failures in this path.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12232

Reviewed By: cbi42

Differential Revision: D52737393

Pulled By: jowlyzhang

fbshipit-source-id: f2658c5332ccd42f6190783960e2dc6fcd81ccc5
2024-01-12 12:23:09 -08:00
Jay Huh fdfd044bb2 Logging for test failure due to get/multiget inconsistency (#12228)
Summary:
Additional logging for debugging purpose

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12228

Test Plan: CI

Reviewed By: cbi42

Differential Revision: D52713401

Pulled By: jaykorean

fbshipit-source-id: 535972d60debb70c220887f0f4c06a32f7668f72
2024-01-11 18:15:17 -08:00