12857 Commits

Author SHA1 Message Date
Yu Zhang d12aaf23ca Fix file deletions in DestroyDB not rate limited (#12891)
Summary:
Make `DestroyDB` slowly delete files if it's configured and enabled via `SstFileManager`.

It's currently not available mainly because of DeleteScheduler's logic related to the tracked total_size_ and total_trash_size_. This accounting and logic should not be applied to `DestroyDB`. This PR adds a `DeleteUnaccountedDBFile` util for this purpose, which deletes files without accounting for them. This util also supports assigning a file to a specified trash bucket so that the user can later wait for that trash bucket to be empty. For `DestroyDB`, files with more than one hard link will be deleted immediately.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12891

Test Plan: Added unit tests, existing tests.

Reviewed By: anand1976

Differential Revision: D60300220

Pulled By: jowlyzhang

fbshipit-source-id: 8b18109a177a3a9532f6dc2e40e08310c08ca3c7
2024-08-02 19:31:55 -07:00
Peter Dillinger 9d5c8c89a1 Fix filter partition size logic (#12904)
Summary:
Was checking == a desired number of entries added to a filter, when the combination of whole key and prefix filtering could add more than one entry per table internal key. This could lead to unnecessarily large filter partitions, which could affect performance and block cache fairness.

Also (only somewhat related because of other work in progress):
* Some variable renaming and a new assertion in BlockBasedTableBuilder, to add some clarity.
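
The equality-check problem described in the summary can be illustrated with a minimal sketch. This is not the RocksDB code: `PartitionCounter`, `AddKey`, and the two `ShouldCut*` methods are hypothetical names, and the only assumption is that one table key may add either one filter entry (whole key) or two (whole key plus prefix).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the per-partition entry accounting.
struct PartitionCounter {
  std::size_t entries_added = 0;
  std::size_t entries_per_partition;

  explicit PartitionCounter(std::size_t target)
      : entries_per_partition(target) {
    assert(target > 0);
  }

  // One table key adds one entry (whole key only) or two (whole key + prefix).
  void AddKey(bool also_adds_prefix) {
    entries_added += also_adds_prefix ? 2 : 1;
  }

  // Fragile: never fires if the count jumps over the target.
  bool ShouldCutExact() const {
    return entries_added == entries_per_partition;
  }

  // Robust: fires as soon as the target is reached or passed.
  bool ShouldCutAtLeast() const {
    return entries_added >= entries_per_partition;
  }
};
```

With a target of 3 and two keys that each add two entries, the count goes 2, then 4, so the exact check never requests a cut while the `>=` check does.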

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12904

Test Plan:
If you add assertion logic to the base revision checking that the partition cut is requested whenever `keys_added_to_partition_ >= keys_per_partition_`, it fails on a number of db_bloom_filter_test tests. However, such an assertion in the revised code would be essentially redundant with the new logic.

If I added a regression test for this, it would be tricky and fragile, so I don't think it's important enough to chase and maintain.  (Open to suggestions / input.)

Reviewed By: jowlyzhang

Differential Revision: D60557827

Pulled By: pdillinger

fbshipit-source-id: 77a56097d540da6e7851941a26d26ced2d944373
2024-08-02 14:49:02 -07:00
Levi Tamasi 2e8a1a14ef Fix a data race affecting the background error status (#12910)
Summary:
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12910

There is currently a call to `GetBGError()` in `DBImpl::WriteImplWALOnly()` where the DB mutex is (incorrectly) not held, leading to a data race. Technically, we could acquire the mutex here but instead, the patch removes the affected check altogether, since the same check is already performed (in a thread-safe manner) in the subsequent call to `PreprocessWrite()`.

Reviewed By: cbi42

Differential Revision: D60682008

fbshipit-source-id: 54b67975dcf57d67c068cac71e8ada09a1793ec5
2024-08-02 14:11:08 -07:00
Peter Dillinger 9245550e8b Clean up/refactor (Partitioned)FilterBlockBuilder (#12903)
Summary:
This is ahead of some related changes/enhancements. Refactorings here:
* Restructure some state of PartitionedFilterBlockBuilder to reduce redundancy in state tracking, improve clarity.
* Changed some function signatures to better match standard practice (return Status)
* Improve comments, arrange related fields
* Discourage/prevent production use of Finish without status (now TEST_Finish)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12903

Test Plan: existing tests

Reviewed By: jowlyzhang

Differential Revision: D60548613

Pulled By: pdillinger

fbshipit-source-id: d7dbc79951fcc3b837877227d58f713698ad2596
2024-08-02 13:35:45 -07:00
Hui Xiao 5e203c76a2 SyncWAL() before Close() when FLAGS_avoid_flush_during_shutdown=true in crash test (#12900)
Summary:
**Context/Summary:**
When we use the WAL and don't flush data during shutdown (`FLAGS_avoid_flush_during_shutdown=true`), we rely on the WAL to recover data in the next Open(), so the crash test needs to sync the WAL before Close(). Currently the condition is flipped.
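
As a rough sketch of the intended condition (a hypothetical helper, not the actual db_stress code; the parameter names merely echo the flags mentioned above):

```cpp
#include <cassert>

// Hypothetical helper: decides whether the crash test should call SyncWAL()
// before Close().
inline bool ShouldSyncWalBeforeClose(bool disable_wal,
                                     bool avoid_flush_during_shutdown) {
  // If memtables are not flushed at shutdown, unflushed data can only be
  // recovered from the WAL, so the WAL must be synced first.
  return !disable_wal && avoid_flush_during_shutdown;
}
```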

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12900

Test Plan:
The command below fails with data loss (`Verification failed. Expected state has key 000000000000015D000000000000012B0000000000000147, iterator is at key 000000000000015D000000000000012B0000000000000152`) before the fix but not after it:
```
./db_stress --WAL_size_limit_MB=0 --WAL_ttl_seconds=0 --acquire_snapshot_one_in=10000 --adaptive_readahead=1 --adm_policy=3 --advise_random_on_open=1 --allow_concurrent_memtable_write=0 --allow_data_in_errors=True --allow_fallocate=1 --async_io=1 --auto_readahead_size=1 --avoid_flush_during_recovery=0 --avoid_flush_during_shutdown=1 --avoid_unnecessary_blocking_io=0 --backup_max_size=104857600 --backup_one_in=1000 --batch_protection_bytes_per_key=8 --bgerror_resume_retry_interval=100 --block_align=0 --block_protection_bytes_per_key=4 --block_size=16384 --bloom_before_level=0 --bloom_bits=10 --bottommost_compression_type=disable --bottommost_file_compaction_delay=3600 --bytes_per_sync=262144 --cache_index_and_filter_blocks=1 --cache_index_and_filter_blocks_with_high_priority=0 --cache_size=33554432 --cache_type=tiered_auto_hyper_clock_cache --charge_compression_dictionary_building_buffer=1 --charge_file_metadata=0 --charge_filter_construction=0 --charge_table_reader=0 --check_multiget_consistency=0 --check_multiget_entity_consistency=0 --checkpoint_one_in=1000000 --checksum_type=kxxHash64 --clear_column_family_one_in=0 --column_families=1 --compact_files_one_in=1000000 --compact_range_one_in=1000000 --compaction_pri=3 --compaction_readahead_size=0 --compaction_style=1 --compaction_ttl=0 --compress_format_version=1 --compressed_secondary_cache_ratio=0.3333333333333333 --compressed_secondary_cache_size=0 --compression_checksum=1 --compression_max_dict_buffer_bytes=0 --compression_max_dict_bytes=0 --compression_parallel_threads=8 --compression_type=zlib --compression_use_zstd_dict_trainer=0 --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --daily_offpeak_time_utc= --data_block_index_type=1 --db=/dev/shm/rocksdb_test/rocksdb_crashtest_whitebox_2 --db_write_buffer_size=0 --default_temperature=kUnknown --default_write_temperature=kWarm --delete_obsolete_files_period_micros=30000000 --delpercent=4 --delrangepercent=1 --destroy_db_initially=1 
--detect_filter_construct_corruption=0 --disable_file_deletions_one_in=1000000 --disable_manual_compaction_one_in=10000 --disable_wal=0 --dump_malloc_stats=0 --enable_checksum_handoff=0 --enable_compaction_filter=0 --enable_custom_split_merge=0 --enable_do_not_compress_roles=0 --enable_index_compression=1 --enable_memtable_insert_with_hint_prefix_extractor=0 --enable_pipelined_write=0 --enable_sst_partitioner_factory=0 --enable_thread_tracking=1 --enable_write_thread_adaptive_yield=1 --error_recovery_with_no_fault_injection=0 --exclude_wal_from_write_fault_injection=1 --expected_values_dir=/dev/shm/rocksdb_test/rocksdb_crashtest_expected_2 --fail_if_options_file_error=1 --fifo_allow_compaction=1 --file_checksum_impl=none --fill_cache=0 --flush_one_in=1000000 --format_version=5 --get_all_column_family_metadata_one_in=1000000 --get_current_wal_file_one_in=0 --get_live_files_apis_one_in=10000 --get_properties_of_all_tables_one_in=100000 --get_property_one_in=1000000 --get_sorted_wal_files_one_in=0 --hard_pending_compaction_bytes_limit=274877906944 --high_pri_pool_ratio=0.5 --index_block_restart_interval=13 --index_shortening=1 --index_type=2 --ingest_external_file_one_in=0 --initial_auto_readahead_size=524288 --inplace_update_support=0 --iterpercent=10 --key_len_percent_dist=1,30,69 --key_may_exist_one_in=100000 --last_level_temperature=kHot --level_compaction_dynamic_level_bytes=0 --lock_wal_one_in=10000 --log2_keys_per_lock=10 --log_file_time_to_roll=60 --log_readahead_size=16777216 --long_running_snapshots=0 --low_pri_pool_ratio=0.5 --lowest_used_cache_tier=0 --manifest_preallocation_size=0 --manual_wal_flush_one_in=0 --mark_for_compaction_one_file_in=0 --max_auto_readahead_size=0 --max_background_compactions=1 --max_bytes_for_level_base=67108864 --max_key=100000 --max_key_len=3 --max_log_file_size=1048576 --max_manifest_file_size=1073741824 --max_sequential_skip_in_iterations=16 --max_total_wal_size=0 --max_write_batch_group_size_bytes=16 
--max_write_buffer_number=3 --max_write_buffer_size_to_maintain=8388608 --memtable_insert_hint_per_batch=0 --memtable_max_range_deletions=0 --memtable_prefix_bloom_size_ratio=0.01 --memtable_protection_bytes_per_key=4 --memtable_whole_key_filtering=1 --memtablerep=skip_list --metadata_charge_policy=1 --metadata_read_fault_one_in=0 --metadata_write_fault_one_in=0 --min_write_buffer_number_to_merge=1 --mmap_read=0 --mock_direct_io=True --nooverwritepercent=1 --num_file_reads_for_auto_readahead=2 --open_files=100 --open_metadata_read_fault_one_in=0 --open_metadata_write_fault_one_in=8 --open_read_fault_one_in=0 --open_write_fault_one_in=0 --ops_per_thread=200000 --optimize_filters_for_hits=1 --optimize_filters_for_memory=1 --optimize_multiget_for_io=0 --paranoid_file_checks=0 --partition_filters=0 --partition_pinning=1 --pause_background_one_in=1000000 --periodic_compaction_seconds=0 --prefix_size=1 --prefixpercent=5 --prepopulate_block_cache=0 --preserve_internal_time_seconds=0 --progress_reports=0 --promote_l0_one_in=0 --read_amp_bytes_per_bit=0 --read_fault_one_in=0 --readahead_size=16384 --readpercent=45 --recycle_log_file_num=1 --reopen=20 --report_bg_io_stats=1 --reset_stats_one_in=1000000 --sample_for_compression=0 --secondary_cache_fault_one_in=32 --secondary_cache_uri= --skip_stats_update_on_db_open=1 --snapshot_hold_ops=100000 --soft_pending_compaction_bytes_limit=68719476736 --sqfc_name=foo --sqfc_version=2 --sst_file_manager_bytes_per_sec=104857600 --sst_file_manager_bytes_per_truncate=0 --stats_dump_period_sec=0 --stats_history_buffer_size=0 --strict_bytes_per_sync=1 --subcompactions=3 --sync=0 --sync_fault_injection=1 --table_cache_numshardbits=6 --target_file_size_base=16777216 --target_file_size_multiplier=1 --test_batches_snapshots=0 --top_level_index_pinning=0 --uncache_aggressiveness=4404 --universal_max_read_amp=-1 --unpartitioned_pinning=2 --use_adaptive_mutex=0 --use_adaptive_mutex_lru=1 --use_attribute_group=0 --use_delta_encoding=1 
--use_direct_io_for_flush_and_compaction=1 --use_direct_reads=1 --use_full_merge_v1=1 --use_get_entity=0 --use_merge=0 --use_multi_cf_iterator=1 --use_multi_get_entity=0 --use_multiget=1 --use_put_entity_one_in=0 --use_sqfc_for_range_queries=1 --use_timed_put_one_in=0 --use_write_buffer_manager=0 --user_timestamp_size=0 --value_size_mult=32 --verification_only=0 --verify_checksum=1 --verify_checksum_one_in=1000 --verify_compression=1 --verify_db_one_in=10000 --verify_file_checksums_one_in=0 --verify_iterator_with_expected_state_one_in=5 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=0 --wal_compression=none --write_buffer_size=33554432 --write_dbid_to_manifest=1 --write_fault_one_in=0 --writepercent=35

```

Reviewed By: anand1976, ltamasi

Differential Revision: D60489038

Pulled By: hx235

fbshipit-source-id: fb35889ae1509eb1bac27b015bb24a07d3b95268
2024-08-02 10:45:34 -07:00
Changyu Bi 8be824e316 Use compensated file size for intra-L0 compaction (#12878)
Summary:
In leveled compaction, we pick intra-L0 compaction instead of L0->Lbase whenever L0 size is small. When L0 files contain many deletions, it makes more sense to compact them down instead of accumulating tombstones in L0. This PR uses compensated_file_size when computing L0 size for determining intra-L0 compaction. It also scales down the limit on total L0 size further to be more cautious about accumulating data in L0.
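
A simplified sketch of why the compensated size changes the decision. Everything here is an assumption for illustration: `L0File`, `CompensatedSize`, `PreferIntraL0`, the flat `bytes_per_deletion` weight, and the `limit` parameter are hypothetical and do not reproduce RocksDB's actual formula or thresholds.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical summary of one L0 file.
struct L0File {
  std::uint64_t file_size;
  std::uint64_t num_deletions;
};

// Assumed model: each tombstone counts as extra bytes, so deletion-heavy
// files look larger than their on-disk size.
inline std::uint64_t CompensatedSize(const L0File& f,
                                     std::uint64_t bytes_per_deletion) {
  return f.file_size + f.num_deletions * bytes_per_deletion;
}

// Intra-L0 compaction is preferred only while the (compensated) L0 total
// stays below the limit; otherwise data should be pushed down to Lbase.
inline bool PreferIntraL0(const std::vector<L0File>& files,
                          std::uint64_t limit,
                          std::uint64_t bytes_per_deletion) {
  assert(limit > 0);
  std::uint64_t total = 0;
  for (const L0File& f : files) {
    total += CompensatedSize(f, bytes_per_deletion);
  }
  return total < limit;
}
```

Passing a weight of 0 reproduces a raw-size decision, which makes it easy to compare the two behaviors for a tombstone-heavy file.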

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12878

Test Plan: updated unit test.

Reviewed By: hx235

Differential Revision: D59932421

Pulled By: cbi42

fbshipit-source-id: 9de973ac51eb7df81b38b8c68110072b1aa06321
2024-08-01 17:49:34 -07:00
Yu Zhang 005256bcc8 Fix same user collected property being re-added in stress tests (#12907)
Summary:
As titled. The `emplace_back` below will add the same collector factory again during Reopen.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12907

Reviewed By: pdillinger

Differential Revision: D60614170

Pulled By: jowlyzhang

fbshipit-source-id: a79498d209e4910a5e94a5cb742935015277918c
2024-08-01 16:21:39 -07:00
Levi Tamasi 8767267315 Attempt to fix the nightly build-linux-clang-13-asan-ubsan-with-folly build
Summary: https://github.com/facebook/rocksdb/pull/12801 updated the version of `folly` used in RocksDB builds to a revision that requires `g++` version 10 when built with a GNU toolchain. This shouldn't really matter for this nightly GitHub Actions job, since we're supposed to be building with `clang++-13`; however, due to the way the compilers had been set, seems like we were historically only building RocksDB with `clang` (and `folly` with `gcc-9`, which led to a broken build after the update). Attempt to fix this by setting `CC` / `CXX` to `clang` / `clang++` in the job's environment.

Reviewed By: pdillinger

Differential Revision: D60534452

fbshipit-source-id: c7b5a02409fb1ea50e4524731237f7bc8d3f7ca6
2024-08-01 13:29:56 -07:00
Yu Zhang 319374ae67 Add some checks at property block creation side (#12898)
Summary:
Crash test encountered this failure:
```file ingestion error: Corruption: properties unsorted under specified IngestExternalFileOptions: move_files: 0, verify_checksums_before_ingest: 1, verify_checksums_readahead_size: 1048576 (Empty string or missing field indicates default option or value is used```

Further inspection showed out-of-order table properties in an external file created by `SstFileWriter` for ingestion, and the file was likely created this way because it passed the initial checksum check. This change adds assertions to check the invariant on both the property creation side and the property collection side.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12898

Test Plan: Existing tests

Reviewed By: hx235

Differential Revision: D60459817

Pulled By: jowlyzhang

fbshipit-source-id: 91474943d2f9d7795f00b6031c08a13ab91e2470
2024-07-31 13:28:17 -07:00
Peter Dillinger 2595476541 Fix rare WAL handling crash (#12899)
Summary:
A crash test failure in log sync in DBImpl::WriteToWAL is due to a missed case in https://github.com/facebook/rocksdb/issues/12734. Just need to apply similar logic from DBImpl::SyncWalImpl to check for an already closed WAL (nullptr writer). This is extremely rare because it only comes from failed Sync on a closed WAL.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12899

Test Plan: watch crash test

Reviewed By: cbi42

Differential Revision: D60481652

Pulled By: pdillinger

fbshipit-source-id: 4a176bb6a53dcf077f88344710a110c2f946c386
2024-07-30 17:38:30 -07:00
anand76 55877d8893 Make transaction name conflict check more robust (#12895)
Summary:
The `PessimisticTransaction::SetName()` code checks for an existing txn of the given name before registering the new txn. However, this is not atomic, which could result in a race condition if two txns try to register with the same name. Both might succeed and lead to unpredictable behavior. This PR makes the test and set atomic.
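
One common way to make such a check-then-register step atomic is to do the lookup and the insert under a single lock. The sketch below is an illustration under that assumption, not the code from this PR; `NameRegistry`, `TryRegister`, and `Unregister` are hypothetical names.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical registry of transaction names.
class NameRegistry {
 public:
  // Returns true if `name` was free and is now owned by `txn_id`.
  // The existence check and the insert happen under one lock, so two callers
  // racing on the same name cannot both succeed.
  bool TryRegister(const std::string& name, int txn_id) {
    assert(!name.empty());
    std::lock_guard<std::mutex> guard(mu_);
    return names_.emplace(name, txn_id).second;
  }

  void Unregister(const std::string& name) {
    std::lock_guard<std::mutex> guard(mu_);
    names_.erase(name);
  }

 private:
  std::mutex mu_;
  std::unordered_map<std::string, int> names_;
};
```

`emplace` leaves the map unchanged and reports `false` when the key already exists, which is what turns the separate test and set into one step.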

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12895

Reviewed By: pdillinger

Differential Revision: D60460482

Pulled By: anand1976

fbshipit-source-id: e8afeb2356e1b8f4e8df785cb73532739f82579d
2024-07-30 12:31:02 -07:00
Peter Dillinger 9058fd037c Small CPU optimization to experimental range filters (#12893)
Summary:
By reusing an object that owns a vector. The vector allocation/sizing was substantial in a CPU profile.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12893

Test Plan: existing tests

Reviewed By: jowlyzhang

Differential Revision: D60405139

Pulled By: pdillinger

fbshipit-source-id: 8bfbc07cd9b4829f2ac9015e90f2b4eba61fd984
2024-07-29 14:23:35 -07:00
Yu Zhang 24d86f7b41 Add an option to toggle timestamp based validation for the whole DB (#12857)
Summary:
As titled. This PR adds a `TransactionDBOptions` field `enable_udt_validation` to allow users to toggle the timestamp based validation behavior across the whole DB. It defaults to true, which keeps the existing behavior. As a recap of that behavior: `GetForUpdate` does timestamp based conflict checking to make sure no other transaction has committed a version of the key tagged with a timestamp equal to or newer than the calling transaction's `read_timestamp_`, which the user sets via `SetReadTimestampForValidation`. When this field is set to false, timestamp based validation is disabled for the whole DB. MyRocks finds it hard to choose a read timestamp for this validation API, so we added this flexibility.
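
The rule recapped above can be modeled in a few lines. This is a standalone sketch, not RocksDB code: `PassesTimestampValidation` and its parameters are hypothetical, and it assumes the key has a committed version whose newest timestamp is known.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the validation step. Returns true when the caller
// may proceed, false when a timestamp conflict would be reported.
inline bool PassesTimestampValidation(bool enable_udt_validation,
                                      std::uint64_t newest_committed_ts,
                                      std::uint64_t read_ts) {
  if (!enable_udt_validation) {
    return true;  // validation switched off for the whole DB
  }
  // Conflict if a committed version is tagged with a timestamp equal to or
  // newer than the transaction's read timestamp.
  return newest_committed_ts < read_ts;
}
```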

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12857

Test Plan: Added unit test

Reviewed By: ltamasi

Differential Revision: D60194134

Pulled By: jowlyzhang

fbshipit-source-id: b8507f8ddc37fc7a2948cf492ce5c599ae646fef
2024-07-29 13:54:37 -07:00
Hui Xiao 408e8d4c85 Handle injected write error after successful WAL write in crash test + misc (#12838)
Summary:
**Context/Summary:**
We discovered the following false positive in our crash test lately:
(1) PUT() writes the k/v to the WAL but fails in `ApplyWALToManifest()`. The k/v is in the WAL.
(2) Current stress test logic rolls back the expected state of such a k/v since PUT() fails.
(3) If the DB crashes before recovery finishes and then reopens, the WAL is replayed and the k/v is in the DB while the expected state has been rolled back.

We decided to leave such expected state pending until the loop-write of the same key succeeds.

Bonus: Having realized that a write to the manifest can also fail the user write, which faces a similar problem to https://github.com/facebook/rocksdb/pull/12797, I decided to disable fault injection on user writes per thread (instead of globally) when tracing is needed for prefix recovery. Also some refactoring.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12838

Test Plan:
Rehearsal CI
Run the command below (varying sync_fault_injection=1,0 to verify ExpectedState behavior) for a while to ensure crash recovery validation works fine

```
python3 tools/db_crashtest.py --simple blackbox --interval=30 --WAL_size_limit_MB=0 --WAL_ttl_seconds=0 --acquire_snapshot_one_in=10000 --adaptive_readahead=1 --adm_policy=1 --advise_random_on_open=0 --allow_concurrent_memtable_write=0 --allow_data_in_errors=True --allow_fallocate=0 --async_io=0 --auto_readahead_size=0 --avoid_flush_during_recovery=0 --avoid_flush_during_shutdown=0 --avoid_unnecessary_blocking_io=0 --backup_max_size=104857600 --backup_one_in=0 --batch_protection_bytes_per_key=0 --bgerror_resume_retry_interval=1000000 --block_align=1 --block_protection_bytes_per_key=4 --block_size=16384 --bloom_before_level=4 --bloom_bits=56.810257702625165 --bottommost_compression_type=none --bottommost_file_compaction_delay=0 --bytes_per_sync=262144 --cache_index_and_filter_blocks=1 --cache_index_and_filter_blocks_with_high_priority=1 --cache_size=8388608 --cache_type=auto_hyper_clock_cache --charge_compression_dictionary_building_buffer=1 --charge_file_metadata=1 --charge_filter_construction=1 --charge_table_reader=0 --check_multiget_consistency=0 --check_multiget_entity_consistency=1 --checkpoint_one_in=10000 --checksum_type=kxxHash --clear_column_family_one_in=0 --column_families=1 --compact_files_one_in=1000 --compact_range_one_in=1000 --compaction_pri=4 --compaction_readahead_size=1048576 --compaction_ttl=10 --compress_format_version=1 --compressed_secondary_cache_ratio=0.0 --compressed_secondary_cache_size=0 --compression_checksum=0 --compression_max_dict_buffer_bytes=0 --compression_max_dict_bytes=0 --compression_parallel_threads=1 --compression_type=none --compression_use_zstd_dict_trainer=0 --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --daily_offpeak_time_utc=04:00-08:00 --data_block_index_type=1 --db_write_buffer_size=0 --default_temperature=kWarm --default_write_temperature=kCold --delete_obsolete_files_period_micros=30000000 --delpercent=20 --delrangepercent=20 --destroy_db_initially=0 --detect_filter_construct_corruption=0 
--disable_file_deletions_one_in=10000 --disable_manual_compaction_one_in=1000000 --disable_wal=0 --dump_malloc_stats=0 --enable_checksum_handoff=1 --enable_compaction_filter=0 --enable_custom_split_merge=0 --enable_do_not_compress_roles=0 --enable_index_compression=1 --enable_memtable_insert_with_hint_prefix_extractor=0 --enable_pipelined_write=0 --enable_sst_partitioner_factory=0 --enable_thread_tracking=0 --enable_write_thread_adaptive_yield=0 --error_recovery_with_no_fault_injection=1 --exclude_wal_from_write_fault_injection=0 --fail_if_options_file_error=1 --fifo_allow_compaction=0 --file_checksum_impl=crc32c --fill_cache=1 --flush_one_in=1000000 --format_version=3 --get_all_column_family_metadata_one_in=1000000 --get_current_wal_file_one_in=0 --get_live_files_apis_one_in=1000000 --get_properties_of_all_tables_one_in=1000000 --get_property_one_in=100000 --get_sorted_wal_files_one_in=0 --hard_pending_compaction_bytes_limit=274877906944 --high_pri_pool_ratio=0.5 --index_block_restart_interval=4 --index_shortening=2 --index_type=0 --ingest_external_file_one_in=0 --initial_auto_readahead_size=16384 --inplace_update_support=0 --iterpercent=10 --key_len_percent_dist=1,30,69 --key_may_exist_one_in=100 --last_level_temperature=kWarm --level_compaction_dynamic_level_bytes=1 --lock_wal_one_in=10000 --log_file_time_to_roll=60 --log_readahead_size=16777216 --long_running_snapshots=1 --low_pri_pool_ratio=0 --lowest_used_cache_tier=0 --manifest_preallocation_size=0 --manual_wal_flush_one_in=0 --mark_for_compaction_one_file_in=10 --max_auto_readahead_size=16384 --max_background_compactions=1 --max_bytes_for_level_base=67108864 --max_key=100000 --max_key_len=3 --max_log_file_size=1048576 --max_manifest_file_size=32768 --max_sequential_skip_in_iterations=1 --max_total_wal_size=0 --max_write_batch_group_size_bytes=16 --max_write_buffer_number=10 --max_write_buffer_size_to_maintain=8388608 --memtable_insert_hint_per_batch=1 --memtable_max_range_deletions=0 
--memtable_prefix_bloom_size_ratio=0.01 --memtable_protection_bytes_per_key=1 --memtable_whole_key_filtering=1 --memtablerep=skip_list --metadata_charge_policy=1 --metadata_read_fault_one_in=0 --metadata_write_fault_one_in=8 --min_write_buffer_number_to_merge=1 --mmap_read=1 --mock_direct_io=False --nooverwritepercent=1 --num_file_reads_for_auto_readahead=1 --open_files=-1 --open_metadata_read_fault_one_in=0 --open_metadata_write_fault_one_in=8 --open_read_fault_one_in=0 --open_write_fault_one_in=8 --ops_per_thread=100000000 --optimize_filters_for_hits=1 --optimize_filters_for_memory=1 --optimize_multiget_for_io=1 --paranoid_file_checks=0 --partition_filters=0 --partition_pinning=3 --pause_background_one_in=1000000 --periodic_compaction_seconds=2 --prefix_size=7 --prefixpercent=0 --prepopulate_block_cache=0 --preserve_internal_time_seconds=0 --progress_reports=0 --promote_l0_one_in=0 --read_amp_bytes_per_bit=0 --read_fault_one_in=1000 --readahead_size=524288 --readpercent=10 --recycle_log_file_num=1 --reopen=0 --report_bg_io_stats=0 --reset_stats_one_in=1000000 --sample_for_compression=0 --secondary_cache_fault_one_in=0 --set_options_one_in=0 --skip_stats_update_on_db_open=1 --snapshot_hold_ops=100000 --soft_pending_compaction_bytes_limit=68719476736 --sqfc_name=foo --sqfc_version=0 --sst_file_manager_bytes_per_sec=104857600 --sst_file_manager_bytes_per_truncate=0 --stats_dump_period_sec=10 --stats_history_buffer_size=0 --strict_bytes_per_sync=1 --subcompactions=4 --sync=1 --sync_fault_injection=0 --table_cache_numshardbits=6 --target_file_size_base=16777216 --target_file_size_multiplier=1 --test_batches_snapshots=0 --top_level_index_pinning=2 --uncache_aggressiveness=239 --universal_max_read_amp=-1 --unpartitioned_pinning=1 --use_adaptive_mutex=1 --use_adaptive_mutex_lru=1 --use_attribute_group=0 --use_delta_encoding=0 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_full_merge_v1=0 --use_get_entity=0 --use_merge=0 --use_multi_cf_iterator=0 
--use_multi_get_entity=0 --use_multiget=0 --use_put_entity_one_in=0 --use_sqfc_for_range_queries=1 --use_timed_put_one_in=0 --use_write_buffer_manager=0 --user_timestamp_size=0 --value_size_mult=32 --verification_only=0 --verify_checksum=1 --verify_checksum_one_in=1000000 --verify_compression=0 --verify_db_one_in=100000 --verify_file_checksums_one_in=1000000 --verify_iterator_with_expected_state_one_in=5 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=0 --wal_compression=none --write_buffer_size=33554432 --write_dbid_to_manifest=0 --write_fault_one_in=8 --writepercent=40
```

Reviewed By: cbi42

Differential Revision: D59377075

Pulled By: hx235

fbshipit-source-id: 91f602fd67e2d339d378cd28b982095fd073dcb6
2024-07-29 13:51:49 -07:00
Yu Zhang d94c2adc28 Add entry for bug fix in #12882 (#12892)
Summary:
As titled.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12892

Reviewed By: hx235

Differential Revision: D60400651

Pulled By: jowlyzhang

fbshipit-source-id: 2dd60c2287143f464ecab0de859715af6ab3a825
2024-07-29 12:20:50 -07:00
Yu Zhang 9883b5f497 Fix manifest_number_ point to invalid file (#12882)
Summary:
This PR fixes an issue where `VersionSet`'s `manifest_number_` could temporarily point to an invalid file number. This happens when a new manifest roll is attempted but fails fast after loading table handlers and before the new manifest file creation/writing is actually attempted.

In theory, a later manifest roll will overwrite this intermediate invalid in-memory state. There is no harm when the DB crashes in this invalid state either. But operations that take a file snapshot of the DB, like backup, will incorrectly try to copy a nonexistent manifest file.
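
One general way to avoid this class of bug is to reserve the new file number in a local variable and publish it to the member only once the file exists. The sketch below illustrates that idea under stated assumptions; `ManifestState`, `RollManifest`, and `prepare_ok` are hypothetical and this is not necessarily how the PR implements the fix.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, heavily simplified stand-in for the relevant in-memory state.
struct ManifestState {
  std::uint64_t manifest_number = 1;   // must always name an existing file
  std::uint64_t next_file_number = 2;

  // `prepare_ok` models an early step (such as loading table handlers) that
  // can fail before any new manifest file is created.
  bool RollManifest(bool prepare_ok) {
    assert(next_file_number > manifest_number);
    const std::uint64_t pending = next_file_number++;  // reserve locally
    if (!prepare_ok) {
      return false;  // fast failure: manifest_number is left untouched
    }
    // ... create and write the new manifest file here ...
    manifest_number = pending;  // publish only after the file exists
    return true;
  }
};
```

A failed roll still consumes a file number, but a reader of `manifest_number` (for example a backup) never sees a number for a file that was not written.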

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12882

Reviewed By: cbi42

Differential Revision: D60204956

Pulled By: jowlyzhang

fbshipit-source-id: effbdb124b582f879d114988af06ac63867fc549
2024-07-24 17:50:08 -07:00
Yu Zhang 05c9c9aeed Fix race between test and recovery flush switch memtable (#12884)
Summary:
As titled, to fix this type of data race:
https://github.com/facebook/rocksdb/actions/runs/10066814221/job/27829003372?pr=12882

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12884

Test Plan:
COMPILE_WITH_TSAN=1 make -j10 db_wal_test
./db_wal_test --gtest_filter=DBWALTest.RecoveryFlushSwitchWALOnEmptyMemtable --gtest_repeat=100

Reviewed By: anand1976

Differential Revision: D60197834

Pulled By: jowlyzhang

fbshipit-source-id: 89524cdb4d17a1b647295bcccf5eb2d7d425bc6a
2024-07-24 17:06:16 -07:00
Jay Huh 086849aa4f Properly disable MultiCFIterator in WritePrepared/UnPreparedTxnDBs (#12883)
Summary:
MultiCfIterators (`CoalescingIterator` and `AttributeGroupIterator`) are not yet compatible with write-prepared/write-unprepared transactions (write-committed is fine). This fix includes the following.

- Properly return `ErrorIterator` if the user attempts to use the `CoalescingIterator` or `AttributeGroupIterator` in WritePreparedTxnDB (and WriteUnpreparedTxnDB)
- Set `use_multi_cf_iterator = 0` if `use_txn=1` and `txn_write_policy != 0 (WRITE_COMMITTED)` in stress test.
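
The flag rule in the second bullet can be written out as a small sketch. The `StressFlags` struct and `SanitizeMultiCfIterator` function are hypothetical; only the three flag names and the meaning of policy 0 come from the description above.

```cpp
#include <cassert>

// Hypothetical mirror of the stress-test flags named above.
struct StressFlags {
  int use_txn;
  int txn_write_policy;  // 0 == WRITE_COMMITTED
  int use_multi_cf_iterator;
};

// Turns the multi-CF iterator off for write-prepared/write-unprepared runs.
inline void SanitizeMultiCfIterator(StressFlags& flags) {
  assert(flags.use_txn == 0 || flags.use_txn == 1);
  if (flags.use_txn == 1 && flags.txn_write_policy != 0) {
    flags.use_multi_cf_iterator = 0;
  }
}
```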

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12883

Test Plan:
Works
```
./db_stress ... --use_txn=1 --txn_write_policy=0 --use_multi_cf_iterator=1
```

Fails
```
./db_stress ... --use_txn=1 --txn_write_policy=1 --use_multi_cf_iterator=1
```

Reviewed By: cbi42

Differential Revision: D60190784

Pulled By: jaykorean

fbshipit-source-id: 3bc1093e81a4ef5753ba9b32c5aea997c21bfd33
2024-07-24 16:50:12 -07:00
Peter Dillinger f456a7213f Refactor IndexBuilder::AddIndexEntry (#12867)
Summary:
Something I am working on is going to expand usage of `BlockBasedTableBuilder::Rep::last_key`, but the existing code contract for `IndexBuilder::AddIndexEntry` makes that difficult because it modifies its `last_key` parameter to be the separator value recorded in the index, often something between the two boundary keys.

This change primarily changes the contract of that function and related functions to separate function inputs and outputs, without sacrificing efficiency. For efficiency, a reusable scratch string buffer is provided by the caller, which the callee can use (or not) in returning a result Slice. That should yield a performance improvement as we are reusing a buffer for keys rather than copying into a new one each time in the FindShort* functions, without any additional string copies or conditional branches.

Additional improvements in PartitionedIndexBuilder specifically:
* Reduce string copies by eliminating `sub_index_last_key_` and instead tracking the key for the next partition in a placeholder Entry.
* Simplify code and improve code quality by changing `sub_index_builder_` to unique_ptr.
* Eliminate unnecessary NewFlushBlockPolicy call/object.
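
The "inputs stay read-only, result goes through a caller-provided scratch buffer" contract can be sketched as follows. This is an illustration only: `FindSeparator` is a hypothetical function using plain `std::string` and bytewise ordering, not the actual `IndexBuilder` or comparator API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>

// Returns a key s with last_key <= s < next_key. The result refers either to
// `last_key` itself (no copy) or to `*scratch` (caller-owned, reusable).
inline const std::string& FindSeparator(const std::string& last_key,
                                        const std::string& next_key,
                                        std::string* scratch) {
  assert(scratch != nullptr);
  assert(last_key < next_key);
  const std::size_t n = std::min(last_key.size(), next_key.size());
  std::size_t diff = 0;
  while (diff < n && last_key[diff] == next_key[diff]) {
    ++diff;
  }
  if (diff < n) {
    const int a = static_cast<unsigned char>(last_key[diff]);
    const int b = static_cast<unsigned char>(next_key[diff]);
    if (a + 1 < b) {
      // Shorten: keep the common prefix and bump the first differing byte.
      scratch->assign(last_key, 0, diff + 1);
      (*scratch)[diff] = static_cast<char>(a + 1);
      return *scratch;
    }
  }
  return last_key;  // nothing shorter found; inputs are left unmodified
}
```

Because the same `scratch` string can be passed on every call, its allocation is reused across index entries. The returned reference is only valid while the inputs and `scratch` remain alive and unmodified.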

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12867

Test Plan: existing tests, crash test. Will validate performance along with the change this is setting up.

Reviewed By: anand1976

Differential Revision: D59793119

Pulled By: pdillinger

fbshipit-source-id: 556da75cf13b967511f84702b2713d152f536a07
2024-07-22 14:27:31 -07:00
Hui Xiao 15d9988ab2 Update history and version for 9.5.fb release (#12880)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12880

Reviewed By: jaykorean, jowlyzhang

Differential Revision: D60057955

Pulled By: hx235

fbshipit-source-id: 1c599a5334aff1f424bb473275efe4349b17d41d
2024-07-22 13:15:09 -07:00
Hui Xiao 349b1ec08f Fix duplicate WAL entries caused by write after error recovery (#12873)
Summary:
**Context/Summary:**
We recently discovered a case where a write of a key, issued right after error recovery from a previous failed write of the same key finishes, produces two identical WAL entries, violating our assertion. This is because we don't advance the seqno on a failed write and we reuse the same WAL containing the failed write for the new write if the memtable at the time is empty.

This PR reuses the flush path for an empty memtable to switch the WAL and update the min WAL to keep in error recovery flush, and also updates the INFO log message for clarity.

```
2024/07/17-15:01:32.271789 327757 (Original Log Time 2024/07/17-15:01:25.942234) [/flush_job.cc:1017] [default] [JOB 2] Level-0 flush table #9: 0 bytes OK It's an empty SST file from a successful flush so won't be kept in the DB
2024/07/17-15:01:32.271798 327757 (Original Log Time 2024/07/17-15:01:32.269954) [/memtable_list.cc:560] [default] Level-0 commit flush result of table #9 started
2024/07/17-15:01:32.271802 327757 (Original Log Time 2024/07/17-15:01:32.271217) [/memtable_list.cc:760] [default] Level-0 commit flush result of table #9: memtable #1 done
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12873

Test Plan:
New UT that failed before this PR with the following assertion failure (i.e., duplicate WAL entries) and passes after
```
db_wal_test: db/write_batch.cc:2254: rocksdb::Status rocksdb::{anonymous}::MemTableInserter::PutCFImpl(uint32_t, const rocksdb::Slice&, const rocksdb::Slice&, rocksdb::ValueType, RebuildTxnOp, const ProtectionInfoKVOS64*) [with RebuildTxnOp = rocksdb::{anonymous}::MemTableInserter::PutCF(uint32_t, const rocksdb::Slice&, const rocksdb::Slice&)::<lambda(rocksdb::WriteBatch*, uint32_t, const rocksdb::Slice&, const rocksdb::Slice&)>; uint32_t = unsigned int; rocksdb::ProtectionInfoKVOS64 = rocksdb::ProtectionInfoKVOS<long unsigned int>]: Assertion `seq_per_batch_' failed.
```

Reviewed By: anand1976

Differential Revision: D59884468

Pulled By: hx235

fbshipit-source-id: 5d854b719092552c69727a979f269fb7f6c39756
2024-07-22 12:40:25 -07:00
Changyu Bi c064ac3bc5 Avoid opening table files and reading table properties under mutex (#12879)
Summary:
InitInputTableProperties() can open table files and perform I/O, and it is called under mutex_. This PR removes it from FinalizeInputInfo(). It is now called in CompactionJob::Run() and BuildCompactionJobInfo() (called in NotifyOnCompactionBegin()) without holding mutex_.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12879

Test Plan: existing unit tests. Added assert in GetInputTableProperties() to ensure that input_table_properties_ is initialized whenever it's called.

Reviewed By: hx235

Differential Revision: D59933195

Pulled By: cbi42

fbshipit-source-id: c8089e13af8567fa3ab4b94d9ec384ae98ab2ec8
2024-07-19 19:12:45 -07:00
Changyu Bi 4384dd5eee Support ingesting SST files generated by a live DB (#12750)
Summary:
... to enable use cases like using RocksDB to merge sort data for ingestion. A new file ingestion option `IngestExternalFileOptions::allow_db_generated_files` is introduced to allow users to ingest SST files generated by live DBs instead of SstFileWriter. For now this only works if the SST files being ingested have zero as their largest sequence number AND do not overlap with any data in the DB (so we can assign seqno 0 which matches the seqno of all ingested keys).

The option is marked as experimental for now.
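
The two restrictions above can be sketched as a standalone check. This is only an illustration, not RocksDB code: `FileMeta`, `KeyRange`, `Overlaps` and `CanIngestDbGeneratedFile` are hypothetical names, and keys are assumed to be ordered bytewise.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for an SST file's metadata
// (not a RocksDB type).
struct FileMeta {
  std::string smallest_key;
  std::string largest_key;
  uint64_t largest_seqno;
};

// Hypothetical closed key range [smallest, largest] already present in the DB.
struct KeyRange {
  std::string smallest;
  std::string largest;
};

// True if the file's key range shares at least one key with `r`
// (bytewise ordering is an assumption made for this sketch).
inline bool Overlaps(const FileMeta& f, const KeyRange& r) {
  return !(f.largest_key < r.smallest || r.largest < f.smallest_key);
}

// Mirrors the two stated conditions: the file's largest sequence number
// must be zero, and the file must not overlap any existing data.
inline bool CanIngestDbGeneratedFile(const FileMeta& f,
                                     const std::vector<KeyRange>& db_ranges) {
  if (f.largest_seqno != 0) {
    return false;
  }
  for (const KeyRange& r : db_ranges) {
    if (Overlaps(f, r)) {
      return false;
    }
  }
  return true;
}
```

With existing ranges `[a, c]` and `[m, p]`, a seqno-0 file covering `[d, f]` passes the check, while a file covering `[b, e]` or one with a non-zero largest seqno does not.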

Main changes needed to enable this:
- ignore CF id mismatch during ingestion
- ignore the missing external file version table property

Rest of the change is mostly in new unit tests.

A previous attempt is in https://github.com/facebook/rocksdb/issues/5602.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12750

Test Plan: - new unit tests

Reviewed By: ajkr, jowlyzhang

Differential Revision: D58396673

Pulled By: cbi42

fbshipit-source-id: aae513afad7b1ff5d4faa48104df5f384926bf03
2024-07-19 16:14:54 -07:00
anand76 0fca5e31b4 Fix race between manifest error recovery and file ingestion (#12871)
Summary:
This PR fixes an assertion failure in `DBImpl::ResumeImpl` - `assert(!versions_->descriptor_log_)`. In `VersionSet`, `descriptor_log_` has a pointer to the current MANIFEST writer. When there's an error updating the manifest, `descriptor_log_` is reset, and the error recovery thread checks `io_status()` in `VersionSet` and attempts to write a new MANIFEST. If another DB manipulation happens at the same time (like external file ingestion, column family manipulation etc), it calls `LogAndApply`, which also attempts to write a new MANIFEST. The assertion in `ResumeImpl` might fail in this case since the other MANIFEST writer may have updated `descriptor_log_`. To prevent the assertion, this fix updates both `io_status_` and `descriptor_log_` while holding the DB mutex.

The other option would have been to simply remove the assert. But I think it's important to have it to ensure the invariant that `io_status_` is cleared if the MANIFEST is written successfully, and this fix makes things easier to reason about.
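
A minimal sketch of the idea, assuming a simplified model rather than the real `VersionSet`: the error status and the writer pointer are only ever changed together inside one critical section, so an observer holding the mutex never sees an error recorded alongside a live writer. `VersionSetModel`, `ManifestWriter` and the method names are hypothetical, and an empty string stands in for an OK status.

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for the current MANIFEST writer.
struct ManifestWriter {
  std::string file_name;
};

class VersionSetModel {
 public:
  // Failed MANIFEST write: record the error and drop the writer together.
  void OnManifestWriteError(const std::string& error) {
    std::lock_guard<std::mutex> lock(db_mutex_);
    io_status_ = error;
    descriptor_log_.reset();
  }

  // Successful MANIFEST write: install the writer and clear the error together.
  void OnManifestWriteSuccess(std::unique_ptr<ManifestWriter> writer) {
    std::lock_guard<std::mutex> lock(db_mutex_);
    descriptor_log_ = std::move(writer);
    io_status_.clear();
  }

  bool HasError() {
    std::lock_guard<std::mutex> lock(db_mutex_);
    return !io_status_.empty();
  }

  bool HasWriter() {
    std::lock_guard<std::mutex> lock(db_mutex_);
    return descriptor_log_ != nullptr;
  }

  // What the recovery path relies on in this model: an error implies no writer.
  bool ErrorImpliesNoWriter() {
    std::lock_guard<std::mutex> lock(db_mutex_);
    return io_status_.empty() || descriptor_log_ == nullptr;
  }

 private:
  std::mutex db_mutex_;
  std::string io_status_;  // empty means OK in this sketch
  std::unique_ptr<ManifestWriter> descriptor_log_;
};
```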

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12871

Test Plan: Existing tests and crash test

Reviewed By: hx235

Differential Revision: D59926947

Pulled By: anand1976

fbshipit-source-id: af9ad18da3e29fc62c7ec2e30e0738aa33d4e5f1
2024-07-19 10:37:51 -07:00
Peter Dillinger de6d0e5ec3 Reduce cases of impacted performance from bug fix (#12874)
Summary:
https://github.com/facebook/rocksdb/issues/12872 was a bit too gross of a fix, because we still don't need to track previous prefix in FullFilterBlockBuilder for many non-partitioned use cases. This basically narrows the fix (and potential CPU regression) to partitioned+prefix filter cases, which are the cases that needed to be fixed.

A better efficiency fix would still be nice but not as high of a priority.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12874

Test Plan: existing tests (just added in https://github.com/facebook/rocksdb/issues/12872)

Reviewed By: jowlyzhang

Differential Revision: D59885591

Pulled By: pdillinger

fbshipit-source-id: 8f273fc3e14c4b60c8a55501dc4bbcc325cd17a1
2024-07-17 16:42:27 -07:00
Peter Dillinger 93b163d1a2 Fix major bug with prefixes, SeekForPrev, and partitioned filters (#12872)
Summary:
Basically, the fix in https://github.com/facebook/rocksdb/issues/8137 was incomplete (and I missed it in the review), because if `whole_key_filtering` is false, then `last_prefix_str_` will never be set to non-empty and the fix doesn't work. Also related to https://github.com/facebook/rocksdb/issues/5835.

This is intended as a safe, simple fix that will regress CPU efficiency slightly (for `whole_key_filtering=false` cases, because of extra prefix string copies during flush & compaction). An efficient fix is not possible without some substantial refactoring.

Also in this PR: new test DBBloomFilterTest.FilterNumEntriesCoalesce tests an adjacent code path that was previously untested for its effect of ensuring the number of unique prefixes and keys is tracked properly when both prefixes and whole keys are going into a filter. (Test fails when either of the two code segments checking for duplicates is disabled.) In addition, the same test would fail before the main bug fix here because the code would inappropriately add the empty string to the filter (because of unmodified `last_prefix_str_`).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12872

Test Plan: In addition to DBBloomFilterTest.FilterNumEntriesCoalesce, extended DBBloomFilterTest.SeekForPrevWithPartitionedFilters to cover the broken case. (Mostly whitespace change.)

Reviewed By: jowlyzhang

Differential Revision: D59873793

Pulled By: pdillinger

fbshipit-source-id: 2a7b7f09ca73dc188fb4dab833826ad6da7ebb11
2024-07-17 14:08:35 -07:00
Hui Xiao 21db55f816 Move WAL sync before memtable insertion (#12869)
Summary:
**Context/Summary:**
WAL sync currently happens after memtable write. This is inconvenient in the stress test, as we can't simply roll back the ExpectedState when a write fails due to an injected WAL sync error, so something complicated like https://github.com/facebook/rocksdb/pull/12838 might be needed. After moving WAL sync before memtable insertion, there should be no injected IO error after memtable insertion, so we can keep the current simple way of handling a failed write in the stress test with ExpectedState rollback.
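
The ordering argument can be illustrated with a toy model. This is a sketch under assumptions, not the actual write path: `DbModel`, `PutAndTrack` and the `inject_wal_sync_error` flag are hypothetical. Because the WAL sync happens before the memtable insert, a failed sync leaves the memtable untouched, and the harness only has to undo its own expected-state update.

```cpp
#include <map>
#include <string>

using KV = std::map<std::string, std::string>;

// Toy model of a write path where WAL append and sync precede the
// memtable insert.
struct DbModel {
  KV memtable;
  int wal_records = 0;
  bool inject_wal_sync_error = false;

  bool Put(const std::string& key, const std::string& value) {
    ++wal_records;                // 1. append the record to the WAL
    if (inject_wal_sync_error) {  // 2. sync the WAL; this may fail
      return false;               //    the memtable has not been touched
    }
    memtable[key] = value;        // 3. insert into the memtable
    return true;
  }
};

// Toy harness: update the expected state optimistically and roll it back
// if the write fails.
inline bool PutAndTrack(DbModel& db, KV& expected, const std::string& key,
                        const std::string& value) {
  KV saved = expected;
  expected[key] = value;
  if (!db.Put(key, value)) {
    expected = saved;  // simple rollback is enough in this ordering
    return false;
  }
  return true;
}
```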

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12869

Test Plan:
1. Below command failed with `iterator has key 0000000000000207000000000000012B0000000000000013, but expected state does not.` before this PR and passes after
```
./db_stress  --WAL_size_limit_MB=0 --WAL_ttl_seconds=0 --acquire_snapshot_one_in=10000 --adaptive_readahead=1 --adm_policy=1 --advise_random_on_open=0 --allow_concurrent_memtable_write=0 --allow_data_in_errors=True --allow_fallocate=0 --async_io=0 --auto_readahead_size=0 --avoid_flush_during_recovery=0 --avoid_flush_during_shutdown=0 --avoid_unnecessary_blocking_io=0 --backup_max_size=104857600 --backup_one_in=0 --batch_protection_bytes_per_key=0 --bgerror_resume_retry_interval=1000000 --block_align=1 --block_protection_bytes_per_key=4 --block_size=16384 --bloom_before_level=4 --bloom_bits=56.810257702625165 --bottommost_compression_type=none --bottommost_file_compaction_delay=0 --bytes_per_sync=262144 --cache_index_and_filter_blocks=1 --cache_index_and_filter_blocks_with_high_priority=1 --cache_size=8388608 --cache_type=auto_hyper_clock_cache --charge_compression_dictionary_building_buffer=1 --charge_file_metadata=1 --charge_filter_construction=1 --charge_table_reader=0 --check_multiget_consistency=0 --check_multiget_entity_consistency=1 --checkpoint_one_in=10000 --checksum_type=kxxHash --clear_column_family_one_in=0 --column_families=1 --compact_files_one_in=1000 --compact_range_one_in=1000 --compaction_pri=4 --compaction_readahead_size=1048576 --compaction_ttl=10 --compress_format_version=1 --compressed_secondary_cache_ratio=0.0 --compressed_secondary_cache_size=0 --compression_checksum=0 --compression_max_dict_buffer_bytes=0 --compression_max_dict_bytes=0 --compression_parallel_threads=1 --compression_type=none --compression_use_zstd_dict_trainer=0 --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --daily_offpeak_time_utc=04:00-08:00 --data_block_index_type=1 --db=/dev/shm/rocksdb_test/rocksdb_crashtest_blackbox --db_write_buffer_size=0 --default_temperature=kWarm --default_write_temperature=kCold --delete_obsolete_files_period_micros=30000000 --delpercent=0 --delrangepercent=0 --destroy_db_initially=0 
--detect_filter_construct_corruption=0 --disable_file_deletions_one_in=10000 --disable_manual_compaction_one_in=1000000 --disable_wal=0 --dump_malloc_stats=0 --enable_checksum_handoff=1 --enable_compaction_filter=0 --enable_custom_split_merge=0 --enable_do_not_compress_roles=0 --enable_index_compression=1 --enable_memtable_insert_with_hint_prefix_extractor=0 --enable_pipelined_write=0 --enable_sst_partitioner_factory=0 --enable_thread_tracking=0 --enable_write_thread_adaptive_yield=0 --error_recovery_with_no_fault_injection=1 --exclude_wal_from_write_fault_injection=1 --expected_values_dir=/dev/shm/rocksdb_test/rocksdb_crashtest_expected --fail_if_options_file_error=1 --fifo_allow_compaction=0 --file_checksum_impl=crc32c --fill_cache=1 --flush_one_in=1000000 --format_version=3 --get_all_column_family_metadata_one_in=1000000 --get_current_wal_file_one_in=0 --get_live_files_apis_one_in=1000000 --get_properties_of_all_tables_one_in=1000000 --get_property_one_in=100000 --get_sorted_wal_files_one_in=0 --hard_pending_compaction_bytes_limit=274877906944 --high_pri_pool_ratio=0.5 --index_block_restart_interval=4 --index_shortening=2 --index_type=0 --ingest_external_file_one_in=0 --initial_auto_readahead_size=16384 --inplace_update_support=0 --iterpercent=50 --key_len_percent_dist=1,30,69 --key_may_exist_one_in=100 --last_level_temperature=kWarm --level_compaction_dynamic_level_bytes=1 --lock_wal_one_in=10000 --log_file_time_to_roll=60 --log_readahead_size=16777216 --long_running_snapshots=1 --low_pri_pool_ratio=0 --lowest_used_cache_tier=0 --manifest_preallocation_size=0 --manual_wal_flush_one_in=0 --mark_for_compaction_one_file_in=10 --max_auto_readahead_size=16384 --max_background_compactions=1 --max_bytes_for_level_base=67108864 --max_key=100000 --max_key_len=3 --max_log_file_size=1048576 --max_manifest_file_size=32768 --max_sequential_skip_in_iterations=1 --max_total_wal_size=0 --max_write_batch_group_size_bytes=16 --max_write_buffer_number=10 
--max_write_buffer_size_to_maintain=8388608 --memtable_insert_hint_per_batch=1 --memtable_max_range_deletions=0 --memtable_prefix_bloom_size_ratio=0.01 --memtable_protection_bytes_per_key=1 --memtable_whole_key_filtering=1 --memtablerep=skip_list --metadata_charge_policy=1 --metadata_read_fault_one_in=32 --metadata_write_fault_one_in=0 --min_write_buffer_number_to_merge=1 --mmap_read=1 --mock_direct_io=False --nooverwritepercent=1 --num_file_reads_for_auto_readahead=1 --open_files=-1 --open_metadata_read_fault_one_in=0 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=0 --open_write_fault_one_in=0 --ops_per_thread=100000000 --optimize_filters_for_hits=1 --optimize_filters_for_memory=1 --optimize_multiget_for_io=1 --paranoid_file_checks=0 --partition_filters=0 --partition_pinning=3 --pause_background_one_in=1000000 --periodic_compaction_seconds=2 --prefix_size=7 --prefixpercent=0 --prepopulate_block_cache=0 --preserve_internal_time_seconds=0 --progress_reports=0 --promote_l0_one_in=0 --read_amp_bytes_per_bit=0 --read_fault_one_in=1000 --readahead_size=524288 --readpercent=0 --recycle_log_file_num=1 --reopen=0 --report_bg_io_stats=0 --reset_stats_one_in=1000000 --sample_for_compression=0 --secondary_cache_fault_one_in=0 --set_options_one_in=0 --skip_stats_update_on_db_open=1 --snapshot_hold_ops=100000 --soft_pending_compaction_bytes_limit=68719476736 --sqfc_name=foo --sqfc_version=0 --sst_file_manager_bytes_per_sec=104857600 --sst_file_manager_bytes_per_truncate=0 --stats_dump_period_sec=10 --stats_history_buffer_size=0 --strict_bytes_per_sync=1 --subcompactions=4 --sync=1 --sync_fault_injection=0 --table_cache_numshardbits=6 --target_file_size_base=16777216 --target_file_size_multiplier=1 --test_batches_snapshots=0 --top_level_index_pinning=2 --uncache_aggressiveness=239 --universal_max_read_amp=-1 --unpartitioned_pinning=1 --use_adaptive_mutex=1 --use_adaptive_mutex_lru=1 --use_attribute_group=0 --use_delta_encoding=0 
--use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_full_merge_v1=0 --use_get_entity=0 --use_merge=0 --use_multi_cf_iterator=0 --use_multi_get_entity=0 --use_multiget=0 --use_put_entity_one_in=0 --use_sqfc_for_range_queries=1 --use_timed_put_one_in=0 --use_write_buffer_manager=0 --user_timestamp_size=0 --value_size_mult=32 --verification_only=0 --verify_checksum=1 --verify_checksum_one_in=1000000 --verify_compression=0 --verify_db_one_in=100000 --verify_file_checksums_one_in=1000000 --verify_iterator_with_expected_state_one_in=5 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=0 --wal_compression=none --write_buffer_size=33554432 --write_dbid_to_manifest=0 --write_fault_one_in=128 --writepercent=50
```

Reviewed By: jowlyzhang

Differential Revision: D59825730

Pulled By: hx235

fbshipit-source-id: 7d77aaf177ded2f99bf1ce19f5a4bd0783b9ca92
2024-07-17 13:39:14 -07:00
Hui Xiao 6870cc1187 Temporarily disable log recycle with testing GetLiveFilesStorageInfo() (#12868)
Summary:
**Context/Summary:**
We recently discovered a case where `GetLiveFilesStorageInfo()` failed when `Options::recycle_log_file_num` > 0. Until the incompatibility is fixed, we disable this combination in the stress test.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12868

Test Plan: monitor CI

Reviewed By: jowlyzhang

Differential Revision: D59820802

Pulled By: hx235

fbshipit-source-id: 7b09063af6d72ae0ba187b4cf8887abd8a78e5e8
2024-07-16 12:37:50 -07:00
Hui Xiao 9e4ee7f0c6 Fix non-okay status being ignored in write path under two_write_queues_ (#12866)
Summary:
Context/Summary: see above, though the impact is small.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12866

Test Plan: existing UT

Reviewed By: anand1976

Differential Revision: D59782913

Pulled By: hx235

fbshipit-source-id: ec02843645cce49466bde602035d2e61c31965b8
2024-07-16 10:55:08 -07:00
anand76 5aa675457e Fix unhandled MANIFEST write errors (#12865)
Summary:
The failure of `WriteCurrentStateToManifest()` in `VersionSet::ProcessManifestWrites()` was not handled properly. If it failed, `manifest_io_status` was not updated, leading to `manifest_file_number_` being updated to the newly created manifest even though it's bad. This would lead to the bad manifest immediately getting deleted, and also the good manifest (referenced by `CURRENT`) getting deleted by obsolete file deletion because of `manifest_file_number_` not referencing its number.
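
The failure mode can be illustrated with a small model. The names `ManifestState`, `TrySwitchManifest` and `ObsoleteManifests` are hypothetical, and the deletion rule is a simplification. The sketch only switches the tracked manifest number when writing the current state succeeded, so the good manifest never becomes a deletion candidate.

```cpp
#include <cstdint>
#include <set>

// Simplified model: the number of the manifest the DB considers current.
struct ManifestState {
  uint64_t manifest_file_number;
};

// Switch to the new manifest only if writing the current state into it
// succeeded; otherwise keep pointing at the old, good manifest.
inline bool TrySwitchManifest(ManifestState& state, uint64_t new_number,
                              bool write_current_state_ok) {
  if (!write_current_state_ok) {
    return false;
  }
  state.manifest_file_number = new_number;
  return true;
}

// Simplified obsolete-file rule: every manifest on disk other than the
// current one is a deletion candidate.
inline std::set<uint64_t> ObsoleteManifests(const ManifestState& state,
                                            const std::set<uint64_t>& on_disk) {
  std::set<uint64_t> obsolete;
  for (uint64_t number : on_disk) {
    if (number != state.manifest_file_number) {
      obsolete.insert(number);
    }
  }
  return obsolete;
}
```

In this model, a failed write of manifest 6 while manifest 5 is current leaves only 6 eligible for deletion.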

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12865

Reviewed By: hx235

Differential Revision: D59782940

Pulled By: anand1976

fbshipit-source-id: f752fb9a1c23fd3d734616e273613cbac204301b
2024-07-15 19:13:29 -07:00
Hui Xiao 4ff35afb42 Fix a bug where `OnErrorRecoveryBegin()` is not called before auto-recovery (#12860)
Summary:
**Context/Summary:**
`*auto_recovery` needs to be set to true in order for `OnErrorRecoveryBegin()` to be called before auto-recovery
3db030d7ee/db/event_helpers.cc (L64-L66)
Currently it's set to false for auto-recovery. This PR fixes it.
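
A minimal sketch of the gating described above, assuming a simplified listener interface: `RecordingListener` and `NotifyOnBackgroundError` here are hypothetical stand-ins, and the real callback takes more arguments. The callback is only reached when the flag is already true on entry, and a listener may turn it off.

```cpp
#include <vector>

// Hypothetical listener that counts callbacks and may veto auto-recovery.
struct RecordingListener {
  int recovery_begin_calls = 0;
  bool allow_auto_recovery = true;

  void OnErrorRecoveryBegin(bool* auto_recovery) {
    ++recovery_begin_calls;
    if (!allow_auto_recovery) {
      *auto_recovery = false;
    }
  }
};

// Simplified notification loop: the callback is skipped entirely when
// *auto_recovery is false on entry.
inline void NotifyOnBackgroundError(
    const std::vector<RecordingListener*>& listeners, bool* auto_recovery) {
  for (RecordingListener* listener : listeners) {
    if (*auto_recovery) {
      listener->OnErrorRecoveryBegin(auto_recovery);
    }
  }
}
```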

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12860

Test Plan:
- Manual observation that it is called
- Existing UT

Reviewed By: jowlyzhang

Differential Revision: D59693315

Pulled By: hx235

fbshipit-source-id: 3f428c5b1e9818bb7697fdcd7f245d11378eb14a
2024-07-15 17:00:14 -07:00
WangQian 755010f8d3 Fix the bug with using the user comparator to compare prefix. (#12862)
Summary:
Fixes https://github.com/facebook/rocksdb/issues/12855

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12862

Reviewed By: cbi42

Differential Revision: D59771651

Pulled By: jowlyzhang

fbshipit-source-id: ffe0025143f51f9ce1b46900c3fef6a20eb34f4a
2024-07-15 15:13:29 -07:00
Peter Dillinger 0e3e43f4d1 FaultInjectionTestFS follow-up and clean-up (#12861)
Summary:
In follow-up to https://github.com/facebook/rocksdb/issues/12852:
* Use std::copy in place of copy_n for potentially overlapping buffer
* Get rid of troublesome -1 idiom from `pos_at_last_append_` and `pos_at_last_sync_`
* Small improvements to test FaultInjectionFSTest.ReadUnsyncedData
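
As an illustration of the first bullet (not the actual FaultInjectionTestFS code), `std::copy` is well defined for overlapping ranges as long as the destination start lies outside the source range, which is the case when sliding data toward the front of a buffer. `DropPrefix` is a hypothetical helper.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Drop the first `n` bytes of `buf`, sliding the remainder to the front.
inline void DropPrefix(std::string& buf, std::size_t n) {
  if (n >= buf.size()) {
    buf.clear();
    return;
  }
  // The destination start (begin) is not inside the source range
  // [begin + n, end), so std::copy is well defined even though the
  // source and destination ranges overlap.
  std::copy(buf.begin() + static_cast<std::ptrdiff_t>(n), buf.end(),
            buf.begin());
  buf.resize(buf.size() - n);
}
```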

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12861

Test Plan: CI, crash test, etc.

Reviewed By: cbi42

Differential Revision: D59757484

Pulled By: pdillinger

fbshipit-source-id: c6fbdc2e97c959983184925a855cc8b0285fa23f
2024-07-15 10:28:34 -07:00
Changyu Bi b800b5eb6a Deflake ThreadStatus related unit tests (#12858)
Summary:
Unit tests `DBTest.ThreadStatusFlush` and `DBTestWithParam.ThreadStatusSingleCompaction` have been flaky and fail with error message
```
[ RUN      ] DBTest.ThreadStatusFlush
op_count: 0, expected_count 1
thread id: 718113, thread status: , cf_name
thread id: 718114, thread status: , cf_name pikachu
/__w/rocksdb/rocksdb/db/db_test.cc:4817: Failure
Value of: VerifyOperationCount(env_, ThreadStatus::OP_FLUSH, 1)
  Actual: false
Expected: true
[  FAILED  ] DBTest.ThreadStatusFlush (106 ms)

[ RUN      ] DBTestWithParam/DBTestWithParam.ThreadStatusSingleCompaction/0
db/db_test.cc:4673: Failure
Expected equality of these values:
  op_count
    Which is: 0
  expected_count
    Which is: 1
[  FAILED  ] DBTestWithParam/DBTestWithParam.ThreadStatusSingleCompaction/0, where GetParam() = (1, false)
```

One cause for this is that before flush/compaction finishes, we will go through `~WritableFileWriter()`, either for WAL or SST file, and temporarily set thread_operation to UNKNOWN. This UNKNOWN thread operation seems to be there for some stress test verification. This PR fixes these tests by setting the IOActivity in ~WritableFileWriter() for debug build.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12858

Test Plan: monitor future test failure.

Reviewed By: hx235

Differential Revision: D59691564

Pulled By: cbi42

fbshipit-source-id: 3f96998bba9d42aba50d1830c2b51bef2dd6705f
2024-07-15 09:56:09 -07:00
Peter Dillinger 72438a6788 Support read & write with unsynced data in FaultInjectionTestFS (#12852)
Summary:
Follow-up to https://github.com/facebook/rocksdb/issues/12729 and others to fix FaultInjectionTestFS handling the case where a live WAL is being appended to and synced while also being copied for checkpoint or backup, up to a known flushed (but not necessarily synced) prefix of the file. It was tricky to structure the code in a way that could handle a tricky race with Sync in another thread (see code comments, thanks Changyu) while maintaining good performance and test-ability.

For more context, see the call to FlushWAL() in DBImpl::GetLiveFilesStorageInfo().

Also, the unit test for https://github.com/facebook/rocksdb/issues/12729 was neutered by https://github.com/facebook/rocksdb/issues/12797, and this re-enables the functionality it is testing.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12852

Test Plan:
unit test expanded/updated. Local runs of blackbox_crash_test.

The implementation is structured so that a multi-threaded unit test is not needed to cover at least the code lines, as the race handling is folded into "catch up after returning unsynced and then a sync."

Reviewed By: cbi42

Differential Revision: D59594045

Pulled By: pdillinger

fbshipit-source-id: 94667bb72255e2952586c53bae2c2dd384e85a50
2024-07-12 16:01:57 -07:00
Yu Zhang 3db030d7ee Fix bug for recovering a prepared but not committed txn (#12856)
Summary:
This PR fixes a bug for recovering a prepared Transaction that can contain user-defined timestamps.

The `Transaction::Put` type of APIs expect the provided key to be a user key without a timestamp. When the original transaction adds a key for a column family that enables user-defined timestamps, say of size 8, `WriteBatch::Put` internally leaves an 8-byte placeholder for the final commit timestamp. For example:
cec28aa90f/db/write_batch.cc (L937)

When rebuilding this transaction from a `WriteBatch` read from the WAL, we should take this into account and remove the trailing 8 bytes of a key before adding it via the public Transaction write APIs.
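
The key adjustment can be sketched as follows. `StripTimestampPlaceholder` is a hypothetical helper written for illustration, not the function used in the fix.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Remove the trailing `ts_size` placeholder bytes from a key that was
// stored with room for a user-defined timestamp.
inline std::string StripTimestampPlaceholder(const std::string& key_with_ts,
                                             std::size_t ts_size) {
  assert(key_with_ts.size() >= ts_size);
  return key_with_ts.substr(0, key_with_ts.size() - ts_size);
}
```

For a column family with 8-byte timestamps, a stored key of `user_key` followed by 8 placeholder bytes maps back to `user_key`; with a timestamp size of 0 the key is returned unchanged.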

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12856

Test Plan: Added unit test that would fail without this fix

Reviewed By: cbi42

Differential Revision: D59656399

Pulled By: jowlyzhang

fbshipit-source-id: c716aefa4d548770b691efe96ac8e6d7dab458b9
2024-07-11 16:25:35 -07:00
Changyu Bi cec28aa90f Fix SetOptions() failure in stress test (#12854)
Summary:
fix SetOptions() so that max_read_amp is at least level0_file_num_compaction_trigger.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12854

Test Plan: monitor stress test new failure

Reviewed By: hx235

Differential Revision: D59618547

Pulled By: cbi42

fbshipit-source-id: b83371f293b87097ee9cdd32d662e9965cde57e6
2024-07-10 21:36:44 -07:00
anand76 37b81bd28f Avoid SyncWAL if flushing during shutdown (#12853)
Summary:
https://github.com/facebook/rocksdb/issues/12746 added calls to FlushWAL/SyncWAL in db_stress during reopen, in order to ensure persistence of unpersisted data and avoid false alarms due to lack of prefix recovery support in db_stress reopen. However, there's no need to flush/sync the WAL if avoid_flush_during_shutdown is false, as the WAL will not be needed during recovery. This allows file systems that don't support SyncWAL (not thread safe) to avoid the need by requesting flush during shutdown.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12853

Reviewed By: hx235

Differential Revision: D59604138

Pulled By: anand1976

fbshipit-source-id: 4c4470b3c956d6bf64f5b8a1a5727a8b888f1a5f
2024-07-10 15:59:35 -07:00
Jay Huh 6997dd909c Disable attribute group txn tests (#12851)
Summary:
Transactions are not yet supported in AttributeGroup APIs. Disabling `use_attribute_group` for txn tests

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12851

Test Plan:
Verified output that `--use_attribute_group=0`

```
python3 tools/db_crashtest.py whitebox --txn
```
```
python3 tools/db_crashtest.py whitebox --optimistic_txn
```

Reviewed By: hx235

Differential Revision: D59565635

Pulled By: jaykorean

fbshipit-source-id: 7d618f475b6d2e5a53c3c59cdf1e694f3893ae58
2024-07-10 10:53:30 -07:00
Changyu Bi d6f265f9d6 Fix race in multiops txn stress test (#12847)
Summary:
`MultiOpsTxnsStressListener::OnCompactionCompleted()` accesses `db_` and can be called while `db_` is being destroyed in ~StressTest(). This causes TSAN to complain about a data race. This PR fixes the issue by calling db_->Close() first to stop all background work. It also moves the cleanup out of the StressTest destructor to avoid a race between the listener and ~StressTest().

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12847

Test Plan: monitor crash test failure.

Reviewed By: hx235

Differential Revision: D59492691

Pulled By: cbi42

fbshipit-source-id: afcbab084cc9ac0904d6b04809b0888498ca8e66
2024-07-09 16:51:38 -07:00
Hui Xiao ebe2116240 Remove false-positive assertion in `FaultInjectionTestFS::RenameFile` (#12828)
Summary:
**Context/Summary:**
The assertion `tlist.find(tdn.second) == tlist.end()` 9eebaf11cb/utilities/fault_injection_fs.cc (L1003) can give us a false positive.

Some context
(1) When fault injection is enabled and db open fails because of that, crash test will retry open without injected error in order to proceed with a clean open:
9eebaf11cb/db_stress_tool/db_stress_test_base.cc (L3559)
9eebaf11cb/db_stress_tool/db_stress_test_base.cc (L3586-L3639)
(2)
a. `FaultInjectionTestFS::dir_to_new_files_since_last_sync_` records files that are created but not yet synced.
b. When we create CURRENT, we will first create a temp file and rename it as "CURRENT". As part of the renaming, we will [assert](9eebaf11cb/utilities/fault_injection_fs.cc (L1003)) `FaultInjectionTestFS::dir_to_new_files_since_last_sync_` doesn't already have a file named `CURRENT`.

Suppose the following sequence of events happened:

(1) 1st open, with metadata write error
1. As part of creating CURRENT file, added "CURRENT" to `FaultInjectionTestFS::dir_to_new_files_since_last_sync_`
9eebaf11cb/utilities/fault_injection_fs.cc (L735)
2.  `SyncDir()` here 9eebaf11cb/file/filename.cc (L412) failed with injected metadata write error. Therefore, "CURRENT" file didn't get removed from `FaultInjectionTestFS::dir_to_new_files_since_last_sync_` as it would if `SyncDir()` succeeded 9eebaf11cb/utilities/fault_injection_fs.h (L344)

(2) 2nd open
1. Attempted to create a CURRENT file and failed during renaming since `FaultInjectionTestFS::dir_to_new_files_since_last_sync_` already had a file called CURRENT. So it will fail with
```
assertion failed - tlist.find(tdn.second) == tlist.end()
```

This PR fixes this by removing the assertion. The assertion used to catch missing directory syncs (e.g., https://github.com/facebook/rocksdb/pull/10573), so we will keep thinking about a better way to catch those.
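
The sequence of events above can be reproduced with a toy tracker. This is a sketch under assumptions: `UnsyncedFileTracker` and its methods are hypothetical, and the real bookkeeping in FaultInjectionTestFS is more involved. A failed directory sync leaves "CURRENT" tracked, so the second attempt finds it already present, which is the condition the removed assertion rejected.

```cpp
#include <map>
#include <set>
#include <string>

// Toy model of per-directory tracking of files created since the last sync.
class UnsyncedFileTracker {
 public:
  // Record a newly created or renamed-to file. Returns false if the name
  // was already tracked.
  bool RecordNewFile(const std::string& dir, const std::string& name) {
    return new_files_[dir].insert(name).second;
  }

  // A successful directory sync clears the tracked names; a failed one
  // (injected error) leaves them in place.
  bool SyncDir(const std::string& dir, bool inject_error) {
    if (inject_error) {
      return false;
    }
    new_files_.erase(dir);
    return true;
  }

  bool IsTracked(const std::string& dir, const std::string& name) const {
    auto it = new_files_.find(dir);
    return it != new_files_.end() && it->second.count(name) > 0;
  }

 private:
  std::map<std::string, std::set<std::string>> new_files_;
};
```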

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12828

Test Plan:
The command below consistently failed before the fix but passed after the PR when run for 10 minutes
```
python3 tools/db_crashtest.py --simple blackbox --interval=10 --WAL_size_limit_MB=1 --WAL_ttl_seconds=60 --acquire_snapshot_one_in=100 --adaptive_readahead=1 --adm_policy=2 --advise_random_on_open=1 --allow_concurrent_memtable_write=1 --allow_data_in_errors=True --allow_fallocate=1 --async_io=0 --auto_readahead_size=1 --avoid_flush_during_recovery=0 --avoid_flush_during_shutdown=0 --avoid_unnecessary_blocking_io=0 --backup_max_size=104857600 --backup_one_in=100000 --batch_protection_bytes_per_key=0 --bgerror_resume_retry_interval=100 --block_align=0 --block_protection_bytes_per_key=8 --block_size=16384 --bloom_before_level=1 --bloom_bits=10 --bottommost_compression_type=lz4hc --bottommost_file_compaction_delay=86400 --bytes_per_sync=0 --cache_index_and_filter_blocks=1 --cache_index_and_filter_blocks_with_high_priority=0 --cache_size=8388608 --cache_type=tiered_auto_hyper_clock_cache --charge_compression_dictionary_building_buffer=1 --charge_file_metadata=0 --charge_filter_construction=0 --charge_table_reader=0 --check_multiget_consistency=0 --check_multiget_entity_consistency=0 --checkpoint_one_in=10000 --checksum_type=kCRC32c --clear_column_family_one_in=0 --column_families=1 --compact_files_one_in=1000 --compact_range_one_in=1000000 --compaction_pri=3 --compaction_readahead_size=0 --compaction_ttl=1 --compress_format_version=1 --compressed_secondary_cache_ratio=0.5 --compressed_secondary_cache_size=0 --compression_checksum=0 --compression_max_dict_buffer_bytes=15 --compression_max_dict_bytes=16384 --compression_parallel_threads=1 --compression_type=zstd --compression_use_zstd_dict_trainer=1 --compression_zstd_max_train_bytes=65536 --continuous_verification_interval=0 --daily_offpeak_time_utc= --data_block_index_type=1 --db_write_buffer_size=0 --default_temperature=kHot --default_write_temperature=kUnknown --delete_obsolete_files_period_micros=30000000 --delpercent=4 --delrangepercent=1 --destroy_db_initially=0 --detect_filter_construct_corruption=1 
--disable_file_deletions_one_in=10000 --disable_manual_compaction_one_in=10000 --disable_wal=0 --dump_malloc_stats=0 --enable_checksum_handoff=1 --enable_compaction_filter=0 --enable_custom_split_merge=0 --enable_do_not_compress_roles=0 --enable_index_compression=1 --enable_memtable_insert_with_hint_prefix_extractor=0 --enable_pipelined_write=0 --enable_sst_partitioner_factory=1 --enable_thread_tracking=1 --enable_write_thread_adaptive_yield=0 --error_recovery_with_no_fault_injection=1 --exclude_wal_from_write_fault_injection=1 --fail_if_options_file_error=1 --fifo_allow_compaction=0 --file_checksum_impl=crc32c --fill_cache=1 --flush_one_in=1000000 --format_version=3 --get_all_column_family_metadata_one_in=1000000 --get_current_wal_file_one_in=0 --get_live_files_apis_one_in=1000000 --get_properties_of_all_tables_one_in=100000 --get_property_one_in=100000 --get_sorted_wal_files_one_in=0 --hard_pending_compaction_bytes_limit=2097152 --high_pri_pool_ratio=0 --index_block_restart_interval=2 --index_shortening=0 --index_type=2 --ingest_external_file_one_in=0 --initial_auto_readahead_size=16384 --inplace_update_support=0 --iterpercent=10 --key_len_percent_dist=1,30,69 --key_may_exist_one_in=100000 --last_level_temperature=kWarm --level_compaction_dynamic_level_bytes=0 --lock_wal_one_in=10000 --log_file_time_to_roll=60 --log_readahead_size=16777216 --long_running_snapshots=1 --low_pri_pool_ratio=0.5 --lowest_used_cache_tier=1 --manifest_preallocation_size=0 --manual_wal_flush_one_in=0 --mark_for_compaction_one_file_in=10 --max_auto_readahead_size=16384 --max_background_compactions=1 --max_bytes_for_level_base=67108864 --max_key=1000000 --max_key_len=3 --max_log_file_size=0 --max_manifest_file_size=1073741824 --max_sequential_skip_in_iterations=1 --max_total_wal_size=0 --max_write_batch_group_size_bytes=16 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=2097152 --memtable_insert_hint_per_batch=0 --memtable_max_range_deletions=0 
--memtable_prefix_bloom_size_ratio=0.1 --memtable_protection_bytes_per_key=8 --memtable_whole_key_filtering=0 --memtablerep=skip_list --metadata_charge_policy=1 --metadata_read_fault_one_in=32 --metadata_write_fault_one_in=0 --min_write_buffer_number_to_merge=2 --mmap_read=1 --mock_direct_io=False --nooverwritepercent=1 --num_file_reads_for_auto_readahead=1 --open_files=-1 --open_metadata_read_fault_one_in=0 --open_metadata_write_fault_one_in=8 --open_read_fault_one_in=0 --open_write_fault_one_in=0 --ops_per_thread=100000000 --optimize_filters_for_hits=0 --optimize_filters_for_memory=0 --optimize_multiget_for_io=1 --paranoid_file_checks=1 --partition_filters=1 --partition_pinning=3 --pause_background_one_in=1000000 --periodic_compaction_seconds=1000 --prefix_size=5 --prefixpercent=5 --prepopulate_block_cache=1 --preserve_internal_time_seconds=0 --progress_reports=0 --promote_l0_one_in=0 --read_amp_bytes_per_bit=32 --read_fault_one_in=0 --readahead_size=524288 --readpercent=45 --recycle_log_file_num=0 --reopen=0 --report_bg_io_stats=0 --reset_stats_one_in=1000000 --sample_for_compression=0 --secondary_cache_fault_one_in=32 --secondary_cache_uri= --set_options_one_in=0 --skip_stats_update_on_db_open=0 --snapshot_hold_ops=100000 --soft_pending_compaction_bytes_limit=68719476736 --sqfc_name=foo --sqfc_version=1 --sst_file_manager_bytes_per_sec=0 --sst_file_manager_bytes_per_truncate=0 --stats_dump_period_sec=10 --stats_history_buffer_size=1048576 --strict_bytes_per_sync=1 --subcompactions=2 --sync=0 --sync_fault_injection=1 --table_cache_numshardbits=6 --target_file_size_base=16777216 --target_file_size_multiplier=1 --test_batches_snapshots=0 --top_level_index_pinning=2 --uncache_aggressiveness=1582 --universal_max_read_amp=4 --unpartitioned_pinning=0 --use_adaptive_mutex=0 --use_adaptive_mutex_lru=1 --use_attribute_group=1 --use_delta_encoding=0 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_full_merge_v1=0 --use_get_entity=0 --use_merge=0 
--use_multi_cf_iterator=1 --use_multi_get_entity=1 --use_multiget=0 --use_put_entity_one_in=1 --use_sqfc_for_range_queries=1 --use_timed_put_one_in=0 --use_write_buffer_manager=0 --user_timestamp_size=0 --value_size_mult=32 --verification_only=0 --verify_checksum=1 --verify_checksum_one_in=1000 --verify_compression=1 --verify_db_one_in=10000 --verify_file_checksums_one_in=1000 --verify_iterator_with_expected_state_one_in=5 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=0 --wal_compression=none --write_buffer_size=33554432 --write_dbid_to_manifest=1 --write_fault_one_in=8 --writepercent=35
```

Reviewed By: cbi42

Differential Revision: D59241548

Pulled By: hx235

fbshipit-source-id: 5bb49e6a94943273f47578a2caf3d08ca5b67e5f
2024-07-09 15:35:54 -07:00
Konstantin Ilin 5ecb92760a Create C API function to iterate over WriteBatch for custom Column Families (#12718)
Summary:
Create C API function for iterating over WriteBatch for custom Column Families
Add a function to the C API that exposes the column-family-specific methods for iterating over a WriteBatch: put_cf, delete_cf and merge_cf. This is required when one needs to read changes for any non-default column family. Without it, it is impossible to iterate over the changes in the WAL that are relevant to custom column families.
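
For illustration only, the callback pattern described above can be sketched without linking RocksDB. Everything below (`Record`, `CfHandler`, `IterateCf`, `CfCounter`, `CountOpsForCf`) is a hypothetical stand-in, not the C API added by this PR; the real function name and callback signatures in `c.h` may differ. The sketch only shows why passing the column family id to each callback lets a caller pick out changes for a non-default column family.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for one WriteBatch record; not a RocksDB type.
enum class OpType { kPut, kDelete, kMerge };

struct Record {
  OpType type;
  uint32_t cf_id;     // 0 plays the role of the default column family
  std::string key;
  std::string value;  // unused for deletes
};

// C-style callback set mirroring the put_cf / delete_cf / merge_cf trio named
// in the commit message. The signatures are assumptions for this sketch.
struct CfHandler {
  void* state;
  void (*put_cf)(void*, uint32_t, const char*, size_t, const char*, size_t);
  void (*delete_cf)(void*, uint32_t, const char*, size_t);
  void (*merge_cf)(void*, uint32_t, const char*, size_t, const char*, size_t);
};

// Replays every record through the matching callback and passes the column
// family id along, so the caller can filter on it.
void IterateCf(const std::vector<Record>& batch, const CfHandler& h) {
  assert(h.put_cf != nullptr && h.delete_cf != nullptr &&
         h.merge_cf != nullptr);
  for (const Record& r : batch) {
    switch (r.type) {
      case OpType::kPut:
        h.put_cf(h.state, r.cf_id, r.key.data(), r.key.size(),
                 r.value.data(), r.value.size());
        break;
      case OpType::kDelete:
        h.delete_cf(h.state, r.cf_id, r.key.data(), r.key.size());
        break;
      case OpType::kMerge:
        h.merge_cf(h.state, r.cf_id, r.key.data(), r.key.size(),
                   r.value.data(), r.value.size());
        break;
    }
  }
}

// Example caller state: counts the operations that touch one column family.
struct CfCounter {
  uint32_t wanted_cf = 0;
  int puts = 0;
  int deletes = 0;
  int merges = 0;
};

void CountPut(void* s, uint32_t cf, const char*, size_t, const char*, size_t) {
  CfCounter* c = static_cast<CfCounter*>(s);
  if (cf == c->wanted_cf) ++c->puts;
}

void CountDelete(void* s, uint32_t cf, const char*, size_t) {
  CfCounter* c = static_cast<CfCounter*>(s);
  if (cf == c->wanted_cf) ++c->deletes;
}

void CountMerge(void* s, uint32_t cf, const char*, size_t, const char*,
                size_t) {
  CfCounter* c = static_cast<CfCounter*>(s);
  if (cf == c->wanted_cf) ++c->merges;
}

// Convenience wrapper: iterate the batch and return counts for one CF.
CfCounter CountOpsForCf(const std::vector<Record>& batch, uint32_t cf_id) {
  CfCounter counter;
  counter.wanted_cf = cf_id;
  CfHandler handler = {&counter, CountPut, CountDelete, CountMerge};
  IterateCf(batch, handler);
  return counter;
}
```

A handler without the column family id would see the same keys but could not tell which column family each change belongs to, which is the gap the summary describes.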

Fixes https://github.com/facebook/rocksdb/issues/12790

Testing:
Added WriteBatch iteration test to "columnfamilies" section of C API unit tests

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12718

Reviewed By: cbi42

Differential Revision: D59483601

Pulled By: ajkr

fbshipit-source-id: b68b900636304528a38620a8c3ad82fdce4b60cb
2024-07-09 12:05:08 -07:00
w41ter b837d41ab1 Expose SizeApproximationFlags to C API (#12836)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12836

Reviewed By: cbi42

Differential Revision: D59502673

Pulled By: ajkr

fbshipit-source-id: fc9f77d6740d8efa45d9357662f0f827dbd0511f
2024-07-09 12:00:50 -07:00
Yu Zhang 2e1b3f921f Remove unreachable code (#12846)
Summary:
Removing some unreachable code.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12846

Reviewed By: cbi42

Differential Revision: D59498423

Pulled By: jowlyzhang

fbshipit-source-id: 6b2c51732d94b1f69a8ba7474b16a171d4e6d640
2024-07-09 09:24:43 -07:00
Jeffery 62b62cf135 Fix CondVar::TimedWait for Windows (#12815)
Summary:
Based on https://github.com/microsoft/STL/issues/369
They fixed the issue in `std::condition_variable_any` but not in `std::condition_variable`, which is what the rocksdb repo currently uses. So we need to apply the workaround regardless of `_MSVC_STL_UPDATE`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12815

Reviewed By: cbi42

Differential Revision: D59493690

Pulled By: ajkr

fbshipit-source-id: ad0fc9ef9f2357347d21e271c2f1d0a3a97d89be
2024-07-08 21:38:21 -07:00
Zixuan Tan a97a1f3247 Fix incorrect refillPeriodMicros unit in the document (#12832)
Summary:
The default value for `refillPeriodMicros` is `100 * 1000`, which means 100ms (or 100,000us).

The documentation comments say 100,000ms (equivalent to 100 seconds), which is incorrect and misleading. This PR fixes this typo.
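
The unit arithmetic can be checked with `std::chrono`. This is a small sketch, not code from the PR; the constant and helper names (`kDefaultRefillPeriodMicros`, `MicrosToMillis`, `MillisToSeconds`) are made up for illustration.

```cpp
#include <chrono>
#include <cstdint>

// Default quoted in the summary: 100 * 1000, interpreted as microseconds.
constexpr int64_t kDefaultRefillPeriodMicros = 100 * 1000;

// Converts a microsecond count to whole milliseconds.
constexpr int64_t MicrosToMillis(int64_t micros) {
  return std::chrono::duration_cast<std::chrono::milliseconds>(
             std::chrono::microseconds(micros))
      .count();
}

// Converts a millisecond count to whole seconds.
constexpr int64_t MillisToSeconds(int64_t millis) {
  return std::chrono::duration_cast<std::chrono::seconds>(
             std::chrono::milliseconds(millis))
      .count();
}

// The correct reading: 100,000 us is 100 ms.
static_assert(MicrosToMillis(kDefaultRefillPeriodMicros) == 100,
              "100,000 us should equal 100 ms");

// The old comment's reading: 100,000 ms would be 100 s.
static_assert(MillisToSeconds(100000) == 100,
              "100,000 ms would equal 100 s");
```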

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12832

Reviewed By: cbi42

Differential Revision: D59492336

Pulled By: ajkr

fbshipit-source-id: c2f55a8b996fe078a1510fcbebaea92ec0075929
2024-07-08 18:08:53 -07:00
WangQian f471e56190 fix the non initialized bug in StderrLogger. (#12839)
Summary:
This PR is intended to fix a potential uninitialized variable bug.

Fixes https://github.com/facebook/rocksdb/issues/12837

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12839

Reviewed By: ajkr

Differential Revision: D59398888

Pulled By: cbi42

fbshipit-source-id: 337391d7c1e73c0ff61797f88fbb4a8379500211
2024-07-08 15:59:02 -07:00
Chdy 110ce5f4a3 fix: Round-Robin pri under leveled compaction allows subcompactions b… (#12843)
Summary:
### Summary: Round-Robin pri under leveled compaction allows subcompactions by default, which is not compatible with PlainTable

```c++
bool Compaction::ShouldFormSubcompactions() const {
  if (cfd_ == nullptr) {
    return false;
  }

  // Round-Robin pri under leveled compaction allows subcompactions by default
  // and the number of subcompactions can be larger than max_subcompactions_
  if (cfd_->ioptions()->compaction_pri == kRoundRobin &&
      cfd_->ioptions()->compaction_style == kCompactionStyleLevel) {
    return output_level_ > 0;
  }

  if (max_subcompactions_ <= 1) {
    return false;
  }
```

PlainTable does not support subcompactions, including when AdaptiveTable is applied to PlainTable. Allowing subcompactions by default results in the following error in some scenarios.

```c++
void PlainTableIterator::Seek(const Slice& target) {
  if (use_prefix_seek_ != !table_->IsTotalOrderMode()) {
    // This check is done here instead of NewIterator() to permit creating an
    // iterator with total_order_seek = true even if we won't be able to Seek()
    // it. This is needed for compaction: it creates iterator with
    // total_order_seek = true but usually never does Seek() on it,
    // only SeekToFirst().
    status_ = Status::InvalidArgument(
        "total_order_seek not implemented for PlainTable.");
    offset_ = next_offset_ = table_->file_info_.data_end_offset;
    return;
  }
```
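
The commit message does not show the fix itself. One plausible shape, sketched here with simplified stand-in types, is to rule out PlainTable before the round-robin branch is reached. `CompactionInfo`, `uses_plain_table` and `ShouldFormSubcompactionsSketch` are hypothetical names, and the final `return true` only stands in for the part of the real function that is not quoted above.

```cpp
// Hypothetical, simplified stand-ins for the options consulted by
// Compaction::ShouldFormSubcompactions(); these are not the RocksDB types.
enum class CompactionPri { kByCompensatedSize, kRoundRobin };
enum class CompactionStyle { kLevel, kUniversal };

struct CompactionInfo {
  CompactionPri compaction_pri;
  CompactionStyle compaction_style;
  bool uses_plain_table;  // assumption: derived from the table factory
  int output_level;
  unsigned max_subcompactions;
};

bool ShouldFormSubcompactionsSketch(const CompactionInfo& c) {
  // Assumed guard: the error quoted above comes from Seek() on a PlainTable
  // iterator, so never form subcompactions for such tables.
  if (c.uses_plain_table) {
    return false;
  }

  // Same round-robin rule as in the first excerpt.
  if (c.compaction_pri == CompactionPri::kRoundRobin &&
      c.compaction_style == CompactionStyle::kLevel) {
    return c.output_level > 0;
  }

  if (c.max_subcompactions <= 1) {
    return false;
  }

  // The rest of the real function is not shown in the commit message; this
  // sketch simply allows subcompactions from here on.
  return true;
}
```

Putting the guard first matters because the round-robin branch returns before `max_subcompactions` is ever consulted, so lowering `max_subcompactions` alone would not avoid the error.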

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12843

Reviewed By: ajkr

Differential Revision: D59433477

Pulled By: cbi42

fbshipit-source-id: fb780ba7f7e8efdfedb7480abf14dd38e0b63677
2024-07-08 12:25:11 -07:00
Radek Hubner b6c3495a71 Update snappy dependency for Java releases. (#12207)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12207

Reviewed By: hx235

Differential Revision: D59299915

Pulled By: cbi42

fbshipit-source-id: 3f5fa88b0c5e8366a08734f99db1d3de942cd60b
2024-07-05 09:30:28 -07:00
Hui Xiao 1f589a3f73 Clarify GetProperty API doc (#12829)
Summary:
**Context/Summary:** as titled since 9eebaf11cb/db/internal_stats.cc (L1162).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12829

Test Plan: no code change

Reviewed By: pdillinger

Differential Revision: D59243565

Pulled By: hx235

fbshipit-source-id: 074137b29bb12d9d965d154626a3289f85a39c52
2024-07-02 13:15:00 -07:00