mirror of https://github.com/facebook/rocksdb.git
444 Commits
Author | SHA1 | Message | Date |
---|---|---|---|
Hui Xiao | 0d93c8a6ca |
Decouple sync fault and write injection in FaultInjectionTestFS & fix tracing issue under WAL write error injection (#12797)
Summary:
**Context/Summary:**
After injecting write errors into the WAL, we started to see crash recovery verification failures in prefix recovery. That's because the current tracing implementation traces every write before it is written to the WAL, even when the WAL write can fail under write error injection. One consequence is that the traced writes in trace files no longer correspond to the actual write sequence, e.g., there are more traced writes than sequence numbers assigned to successful writes. Therefore
|
|
Hui Xiao | 58bc4db456 |
Print more debugging info & further disable backup/restore error injection (#12812)
Summary: **Context/Summary:** Print more info for debugging a TestCheckpoint error; further disable backup/restore error injection as it has not been stabilized with our new thread-local error injection. Will need to enable it separately later. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12812 Test Plan: CI Reviewed By: jaykorean Differential Revision: D59072678 Pulled By: hx235 fbshipit-source-id: 9481ccf62db952288e7f47ee4b68a34ad0651d5c |
|
Hui Xiao | 6f79496475 |
Tag FaultInjectionIOType::kRead for FaultInjectionTestFS new read file & fix unrelease snapshot (#12810)
Summary:
**Context/Summary:** It makes more sense to tag error injection during new read file creation as "kRead" so we don't get a confusing message like the one below
```
stderr:
error : Get() returns IO error: injected metadata write error for key: 000000000000004F000000000000012B00000000000000EF.
Verification failed :(
```
Also an early return here
|
|
Hui Xiao | e0ddbee76f |
Remove unnecessary injected error logging in crash test (#12807)
Summary: Context/Summary: as titled, since the injected error log isn't that useful for debugging and takes up a lot of console printing space. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12807 Test Plan: CI Reviewed By: pdillinger, jowlyzhang Differential Revision: D58969796 Pulled By: hx235 fbshipit-source-id: 1663fb0779d7a049fc3b101ddefd263be7bdd4b5 |
|
Hui Xiao | 56f7ef50d7 |
Fix nullptr access and race to fault_fs_guard (#12799)
Summary: **Context/Summary:** There are a couple places where we forgot to check fault_fs_guard before accessing it. So we can see something like this occasionally ``` =138831==Hint: address points to the zero page. SCARINESS: 10 (null-deref) AddressSanitizer:DEADLYSIGNAL #0 0x18b9e0b in rocksdb::ThreadLocalPtr::Get() const fbcode/internal_repo_rocksdb/repo/util/thread_local.cc:503 https://github.com/facebook/rocksdb/issues/1 0x83d8b7 in rocksdb::StressTest::TestCompactRange(rocksdb::ThreadState*, long, rocksdb::Slice const&, rocksdb::ColumnFamilyHandle*) fbcode/internal_repo_rocksdb/repo/utilities/fault_injection_fs.h ``` Also accessing of `io_activties_exempted_from_fault_injection.find` not fully synced so we see the following ``` WARNING: ThreadSanitizer: data race (pid=90939) Write of size 8 at 0x7b4c000004d0 by thread T762 (mutexes: write M0): #0 std::_Rb_tree<rocksdb::Env::IOActivity, rocksdb::Env::IOActivity, std::_Identity<rocksdb::Env::IOActivity>, std::less<rocksdb::Env::IOActivity>, std::allocator<rocksdb::Env::IOActivity>>::operator=(std::_Rb_tree<rocksdb::Env::IOActivity, rocksdb::Env::IOActivity, std::_Identity<rocksdb::Env::IOActivity>, std::less<rocksdb::Env::IOActivity>, std::allocator<rocksdb::Env::IOActivity>> const&) fbcode/third-party-buck/platform010/build/libgcc/include/c++/trunk/bits/stl_tree.h:208 (db_stress+0x411c32) (BuildId: b803e5aca22c6b080defed8e85b7bfec) https://github.com/facebook/rocksdb/issues/1 rocksdb::DbStressListener::OnErrorRecoveryCompleted(rocksdb::Status) fbcode/third-party-buck/platform010/build/libgcc/include/c++/trunk/bits/stl_set.h:298 (db_stress+0x4112e5) (BuildId: b803e5aca22c6b080defed8e85b7bfec) https://github.com/facebook/rocksdb/issues/2 rocksdb::EventHelpers::NotifyOnErrorRecoveryEnd(std::vector<std::shared_ptr<rocksdb::EventListener>, std::allocator<std::shared_ptr<rocksdb::EventListener>>> const&, rocksdb::Status const&, rocksdb::Status const&, rocksdb::InstrumentedMutex*) fbcode/internal_repo_rocksdb/repo/db/event_helpers.cc:239 (db_stress+0xa09d60) (BuildId: b803e5aca22c6b080defed8e85b7bfec) Previous read of size 8 at 0x7b4c000004d0 by thread T131 (mutexes: write M1): #0 rocksdb::FaultInjectionTestFS::MaybeInjectThreadLocalError(rocksdb::FaultInjectionIOType, rocksdb::IOOptions const&, rocksdb::FaultInjectionTestFS::ErrorOperation, rocksdb::Slice*, bool, char*, bool, bool*) fbcode/third-party-buck/platform010/build/libgcc/include/c++/trunk/bits/stl_tree.h:798 (db_stress+0xf7d0f3) (BuildId: b803e5aca22c6b080defed8e85b7bfec) ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12799 Test Plan: CI Reviewed By: jowlyzhang Differential Revision: D58917449 Pulled By: hx235 fbshipit-source-id: f24fc1acc2a7d91f9f285447a97ba41397f48dbd |
|
Hui Xiao | e90e9153d5 |
Calculate `injected_error_count` even when `SharedState::ignore_read_error` (#12800)
Summary:
**Context/Summary:**
`injected_error_count` is needed to verify read error injection. For example, when injected_error_count == 0, the read call should not return an error.
|
|
Hui Xiao | 981fd432fa |
Fix not getting expected injected read error (#12793)
Summary: **Context/Summary:** https://github.com/facebook/rocksdb/pull/12713 accidentally removed the mechanism of ignoring injected read errors on non-critical read paths, such as reads from the filter. An IO failure in a read from the filter should not fail the overall read, as we can always read from the actual file. Therefore error injection in the filter read path does not need to lead to a failure in Get(), and the crash test should allow that. Otherwise, we will get the crash test error "Didn't get expected error from..." Pull Request resolved: https://github.com/facebook/rocksdb/pull/12793 Test Plan: CI Reviewed By: cbi42 Differential Revision: D58895393 Pulled By: hx235 fbshipit-source-id: 5b605d8446e0b8d4149cdbe6f4be3c7534d4acfa |
|
Hui Xiao | 1adb935720 |
Inject more errors to more files in stress test (#12713)
Summary: **Context:** We currently have partial error injection: - DB operation: all read, SST write - DB open: all read, SST write, all metadata write. This PR completes the error injection (with some limitations below): - DB operation & open: all read, all write, all metadata write, all metadata read **Summary:** - Inject retryable metadata read and metadata write errors concerning directories (e.g., dir sync) or file metadata (e.g., name, size, file creation/deletion...) - Inject retryable errors to all major file types: random access file, sequential file, writable file - Allow db stress test operations to handle the above injected errors gracefully without crashing - Change all error injection to a thread-local implementation for easier disabling and enabling in the same thread. For example, we can make the error handling thread have no error injection. It's also cleaner in code. - Limitation: compared to before, we now don't have write fault injection for backup/restore CopyOrCreateFiles work threads since they use anonymous background threads, nor read injection for the db open bg thread - Add a new flag to test error recovery without error injection so we can test the path where error recovery actually succeeds - Some refactoring & fixes to the db stress test framework (see PR review comments) - Fix some minor bugs surfaced (see PR review comments) - Limitation: had to disable backup/restore with metadata read/write injection since it surfaces too many testing issues. Will add it back later to focus on surfacing actual code/internal bugs first. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12713 Test Plan: - Existing UT - CI with no trivial error failure Reviewed By: pdillinger Differential Revision: D58326608 Pulled By: hx235 fbshipit-source-id: 011b5195aaeb6011641ae0a9194f7f2a0e325ad7 |
|
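The "thread-local implementation" mentioned above amounts to a per-thread switch around the fault-injection hooks, so that, for example, the error-handling thread can opt out. A minimal illustrative sketch of that pattern (names are hypothetical, not the actual FaultInjectionTestFS code):

```
#include <cstdint>

// Hypothetical sketch of per-thread error-injection control, in the spirit of
// the thread-local approach described above (not the actual RocksDB code).
struct ThreadLocalFaultState {
  bool injection_enabled = true;  // e.g. set to false on the error-recovery thread
  int one_in = 0;                 // inject roughly one error per `one_in` calls
};

thread_local ThreadLocalFaultState tl_fault_state;

bool MaybeInjectError(uint64_t op_counter) {
  if (!tl_fault_state.injection_enabled || tl_fault_state.one_in <= 0) {
    return false;  // this thread has fault injection disabled
  }
  return op_counter % static_cast<uint64_t>(tl_fault_state.one_in) == 0;
}
```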
Peter Dillinger | 71f9e6b5b3 |
Add experimental range filters to stress/crash test (#12769)
Summary: Implemented two key segment extractors that satisfy the "segment prefix property," one with variable segment widths and one with fixed. Used these to create a couple of named configs and versions that are randomly selected by the crash test. On the read side, the required table_filter is set up everywhere I found the stress test uses iterator_upper_bound. Writing filters on new SST files and applying filters on SST files to range queries are configured independently, to potentially help with isolating different sides of the functionality. Not yet implemented / possible follow-up: * Consider manipulating/skewing the query bounds to better exercise filters * Not yet using categories in the extractors * Not yet dynamically changing the filtering version Pull Request resolved: https://github.com/facebook/rocksdb/pull/12769 Test Plan: Some stress test trial runs, including with ASAN. Inserted some temporary probes to ensure code was being exercised (more or less) as intended. Reviewed By: hx235 Differential Revision: D58547462 Pulled By: pdillinger fbshipit-source-id: f7b1596dd668426268c5293ac17615f749703f52 |
|
Jay Huh | f26e2fedb3 |
Disable AttributeGroup in multiops txn test (#12781)
Summary: AttributeGroup is not yet supported in MultiOpsTxn Test. Disabling it for now. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12781 Test Plan: Disabling in the test Reviewed By: hx235 Differential Revision: D58757042 Pulled By: jaykorean fbshipit-source-id: 8c3c85376e6ec0d1c7027b83abeb91eddc64236f |
|
Hui Xiao | d0259c2c98 |
Enable reading un-synced data in db stress test (#12752)
Summary: **Context/Summary:** There are a few blockers to enabling reading un-synced data in db stress test (1) GetFileSize() will always return 0 for file written under direct IO because we don't track the last flushed position for `TestFSWritableFile` under direct IO. So it will surface as ``` Verification failed: VerifyChecksum failed: Corruption: file is too short (0 bytes) to be an sstable: /tmp/rocksdb_crashtest_blackbox4deg_c5e/000009.sst db_stress: db_stress_tool/db_stress_test_base.cc:518: void rocksdb::StressTest::ProcessStatus(rocksdb::SharedState*, std::string, const rocksdb::Status&, bool) const: Assertion `false' failed. Received signal 6 (Aborted) Invoking GDB for stack trace... ``` (2) A couple minor FIXME in left in https://github.com/facebook/rocksdb/pull/12729. This PR fixed (1) and (2) and enabled reading un-synced data in stress test. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12752 Test Plan: - The following command failed before this PR and passed after. ``` ./db_stress --WAL_size_limit_MB=1 --WAL_ttl_seconds=0 --acquire_snapshot_one_in=100 --adaptive_readahead=0 --adm_policy=0 --advise_random_on_open=1 --allow_concurrent_memtable_write=0 --allow_data_in_errors=True --allow_fallocate=1 --async_io=1 --atomic_flush=1 --auto_readahead_size=1 --avoid_flush_during_recovery=0 --avoid_flush_during_shutdown=1 --avoid_unnecessary_blocking_io=0 --backup_max_size=104857600 --backup_one_in=100000 --batch_protection_bytes_per_key=0 --bgerror_resume_retry_interval=10000 --block_align=0 --block_protection_bytes_per_key=4 --block_size=16384 --bloom_before_level=2147483647 --bloom_bits=37.92024930098943 --bottommost_compression_type=disable --bottommost_file_compaction_delay=0 --bytes_per_sync=0 --cache_index_and_filter_blocks=1 --cache_index_and_filter_blocks_with_high_priority=0 --cache_size=8388608 --cache_type=auto_hyper_clock_cache --charge_compression_dictionary_building_buffer=1 --charge_file_metadata=0 --charge_filter_construction=0 --charge_table_reader=1 --check_multiget_consistency=0 --check_multiget_entity_consistency=0 --checkpoint_one_in=1000000 --checksum_type=kXXH3 --clear_column_family_one_in=0 --column_families=1 --compact_files_one_in=1000 --compact_range_one_in=1000 --compaction_pri=3 --compaction_readahead_size=0 --compaction_ttl=10 --compress_format_version=2 --compressed_secondary_cache_size=8388608 --compression_checksum=1 --compression_max_dict_buffer_bytes=0 --compression_max_dict_bytes=0 --compression_parallel_threads=1 --compression_type=zlib --compression_use_zstd_dict_trainer=0 --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --daily_offpeak_time_utc= --data_block_index_type=0 --db=/tmp/rocksdb_crashtest_blackbox4deg_c5e --db_write_buffer_size=0 --default_temperature=kWarm --default_write_temperature=kHot --delete_obsolete_files_period_micros=30000000 --delpercent=4 --delrangepercent=1 --destroy_db_initially=0 --detect_filter_construct_corruption=0 --disable_file_deletions_one_in=1000000 --disable_manual_compaction_one_in=10000 --disable_wal=1 --dump_malloc_stats=0 --enable_checksum_handoff=0 --enable_compaction_filter=0 --enable_custom_split_merge=1 --enable_do_not_compress_roles=0 --enable_index_compression=1 --enable_memtable_insert_with_hint_prefix_extractor=0 --enable_pipelined_write=0 --enable_sst_partitioner_factory=1 --enable_thread_tracking=0 --enable_write_thread_adaptive_yield=0 --expected_values_dir=/tmp/rocksdb_crashtest_expected_8whyhdxm --fail_if_options_file_error=0 --fifo_allow_compaction=1 
--file_checksum_impl=xxh64 --fill_cache=1 --flush_one_in=1000 --format_version=4 --get_all_column_family_metadata_one_in=10000 --get_current_wal_file_one_in=0 --get_live_files_apis_one_in=1000000 --get_properties_of_all_tables_one_in=100000 --get_property_one_in=100000 --get_sorted_wal_files_one_in=0 --hard_pending_compaction_bytes_limit=274877906944 --high_pri_pool_ratio=0.5 --index_block_restart_interval=9 --index_shortening=0 --index_type=0 --ingest_external_file_one_in=0 --initial_auto_readahead_size=0 --inplace_update_support=0 --iterpercent=10 --key_len_percent_dist=1,30,69 --key_may_exist_one_in=100000 --last_level_temperature=kUnknown --level_compaction_dynamic_level_bytes=1 --lock_wal_one_in=100 --log_file_time_to_roll=0 --log_readahead_size=0 --long_running_snapshots=0 --low_pri_pool_ratio=0 --lowest_used_cache_tier=0 --manifest_preallocation_size=5120 --manual_wal_flush_one_in=0 --mark_for_compaction_one_file_in=0 --max_auto_readahead_size=0 --max_background_compactions=1 --max_bytes_for_level_base=67108864 --max_key=100000 --max_key_len=3 --max_log_file_size=0 --max_manifest_file_size=1073741824 --max_sequential_skip_in_iterations=2 --max_total_wal_size=0 --max_write_batch_group_size_bytes=1048576 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=2097152 --memtable_insert_hint_per_batch=1 --memtable_max_range_deletions=100 --memtable_prefix_bloom_size_ratio=0.001 --memtable_protection_bytes_per_key=0 --memtable_whole_key_filtering=0 --memtablerep=skip_list --metadata_charge_policy=1 --min_write_buffer_number_to_merge=2 --mmap_read=0 --mock_direct_io=False --nooverwritepercent=1 --num_file_reads_for_auto_readahead=1 --open_files=-1 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=0 --open_write_fault_one_in=16 --ops_per_thread=100000000 --optimize_filters_for_hits=1 --optimize_filters_for_memory=0 --optimize_multiget_for_io=0 --paranoid_file_checks=1 --partition_filters=0 --partition_pinning=1 --pause_background_one_in=1000000 --periodic_compaction_seconds=1000 --prefix_size=5 --prefixpercent=5 --prepopulate_block_cache=1 --preserve_internal_time_seconds=0 --progress_reports=0 --promote_l0_one_in=0 --read_amp_bytes_per_bit=0 --read_fault_one_in=1000 --readahead_size=16384 --readpercent=45 --recycle_log_file_num=0 --reopen=0 --report_bg_io_stats=0 --reset_stats_one_in=1000000 --sample_for_compression=5 --secondary_cache_fault_one_in=32 --secondary_cache_uri= --set_options_one_in=0 --skip_stats_update_on_db_open=0 --snapshot_hold_ops=100000 --soft_pending_compaction_bytes_limit=68719476736 --sst_file_manager_bytes_per_sec=104857600 --sst_file_manager_bytes_per_truncate=1048576 --stats_dump_period_sec=0 --stats_history_buffer_size=0 --strict_bytes_per_sync=0 --subcompactions=2 --sync=0 --sync_fault_injection=1 --table_cache_numshardbits=0 --target_file_size_base=16777216 --target_file_size_multiplier=1 --test_batches_snapshots=0 --top_level_index_pinning=0 --uncache_aggressiveness=1 --universal_max_read_amp=-1 --unpartitioned_pinning=0 --use_adaptive_mutex=0 --use_adaptive_mutex_lru=0 --use_attribute_group=1 --use_delta_encoding=1 --use_direct_io_for_flush_and_compaction=1 --use_direct_reads=0 --use_full_merge_v1=0 --use_get_entity=0 --use_merge=0 --use_multi_cf_iterator=0 --use_multi_get_entity=1 --use_multiget=0 --use_put_entity_one_in=5 --use_timed_put_one_in=0 --use_write_buffer_manager=0 --user_timestamp_size=0 --value_size_mult=32 --verification_only=0 --verify_checksum=1 --verify_checksum_one_in=10 --verify_compression=1 --verify_db_one_in=10000 
--verify_file_checksums_one_in=10 --verify_iterator_with_expected_state_one_in=5 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=0 --wal_compression=zstd --write_buffer_size=33554432 --write_dbid_to_manifest=0 --write_fault_one_in=0 --writepercent=35 Verification failed: VerifyChecksum failed: Corruption: file is too short (0 bytes) to be an sstable: /tmp/rocksdb_crashtest_blackbox4deg_c5e/000009.sst db_stress: db_stress_tool/db_stress_test_base.cc:518: void rocksdb::StressTest::ProcessStatus(rocksdb::SharedState*, std::string, const rocksdb::Status&, bool) const: Assertion `false' failed. Received signal 6 (Aborted) Invoking GDB for stack trace... ``` - Run python3 tools/db_crashtest.py --simple blackbox --lock_wal_one_in=10 --backup_one_in=10 --sync_fault_injection=0 --use_direct_io_for_flush_and_compaction=0 for 1 hour - Monitor stress test CI Reviewed By: pdillinger Differential Revision: D58395807 Pulled By: hx235 fbshipit-source-id: 7d4b321acc0a0af3501b62dc417a7f6e2d318265 |
|
Jay Huh | b8c9a2576a |
Add AttributeGroupIterator to Stress Test (#12776)
Summary: As title. Changes include the following - `Refresh()` moved from `Iterator` interface to `IteratorBase` so that `AttributeGroupIterator` can also have Refresh() API (implemention will be added in the future PR) - `TestIterate()`'s main logic refactored into `TestIterateImpl()` so that it can be shared with `TestIterateAttributeGroups()` - `VerifyIterator()` also changed so that verification code can be shared between `Iterator` and `AttributeGroupIterator` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12776 Test Plan: Single CF Iterator ``` python3 tools/db_crashtest.py blackbox --simple --max_key=25000000 --write_buffer_size=4194304 --use_attribute_group=0 --use_put_entity_one_in=1 --use_multi_get=1 --use_multi_cf_iterator=0 --verify_iterator_with_expected_state_one_in=2 ``` CoalescingIterator ``` python3 tools/db_crashtest.py blackbox --simple --max_key=25000000 --write_buffer_size=4194304 --use_attribute_group=0 --use_put_entity_one_in=1 --use_multi_get=1 --use_multi_cf_iterator=1 --verify_iterator_with_expected_state_one_in=2 ``` AttributeGroupIterator ``` python3 tools/db_crashtest.py blackbox --simple --max_key=25000000 --write_buffer_size=4194304 --use_attribute_group=1 --use_put_entity_one_in=1 --use_multi_get=1 --use_multi_cf_iterator=1 --verify_iterator_with_expected_state_one_in=2 ``` Reviewed By: cbi42 Differential Revision: D58626165 Pulled By: jaykorean fbshipit-source-id: 3e0a6ff72e51ecef9e06b65acfa53605a24d742e |
|
Hui Xiao | a2f772910e |
Fix manual WAL flush causing false-positive inconsistent values in TestBackupRestore() (#12758)
Summary: **Context/Summary:** When manual WAL flush is used, the following can happen: t1: Issued Put(k1) to original DB. It entered WAL buffer since manual_wal_flush_one_in > 0. It never made it to WAL file without FlushWAL() t2: The same WAL got back-up and restored to restore DB. So the restore DB's WAL does not contain this Put() t3: The same WAL in the original DB got FlushWAL() so it got the Put() entry Querying k1 in original and restored DB will give different result and fail our consistency check in stress test. ``` Failure in a backup/restore operation with: Corruption: 0x000000000000000178 exists in original db but not in restore ``` This PR fixed it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12758 Test Plan: ``` ./db_stress --WAL_size_limit_MB=0 --WAL_ttl_seconds=0 --acquire_snapshot_one_in=10000 --adaptive_readahead=1 --adm_policy=1 --advise_random_on_open=1 --allow_concurrent_memtable_write=0 --allow_data_in_errors=True --allow_fallocate=1 --async_io=1 --auto_readahead_size=0 --avoid_flush_during_recovery=1 --avoid_flush_during_shutdown=1 --avoid_unnecessary_blocking_io=0 --backup_max_size=104857600 --backup_one_in=100 --batch_protection_bytes_per_key=8 --bgerror_resume_retry_interval=1000000 --block_align=1 --block_protection_bytes_per_key=0 --block_size=16384 --bloom_before_level=2147483646 --bloom_bits=13 --bottommost_compression_type=none --bottommost_file_compaction_delay=600 --bytes_per_sync=262144 --cache_index_and_filter_blocks=0 --cache_index_and_filter_blocks_with_high_priority=0 --cache_size=33554432 --cache_type=auto_hyper_clock_cache --charge_compression_dictionary_building_buffer=0 --charge_file_metadata=0 --charge_filter_construction=1 --charge_table_reader=0 --check_multiget_consistency=0 --check_multiget_entity_consistency=1 --checkpoint_one_in=1000000 --checksum_type=kxxHash --clear_column_family_one_in=0 --column_families=1 --compact_files_one_in=1000 --compact_range_one_in=1000000 --compaction_pri=3 --compaction_readahead_size=0 --compaction_ttl=0 --compress_format_version=1 --compressed_secondary_cache_size=8388608 --compression_checksum=0 --compression_max_dict_buffer_bytes=0 --compression_max_dict_bytes=0 --compression_parallel_threads=4 --compression_type=none --compression_use_zstd_dict_trainer=0 --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --daily_offpeak_time_utc= --data_block_index_type=0 --db=/dev/shm/rocksdb_test/rocksdb_crashtest_blackbox_1 --db_write_buffer_size=0 --default_temperature=kCold --default_write_temperature=kHot --delete_obsolete_files_period_micros=21600000000 --delpercent=40 --delrangepercent=0 --destroy_db_initially=0 --detect_filter_construct_corruption=1 --disable_file_deletions_one_in=1000000 --disable_manual_compaction_one_in=10000 --disable_wal=0 --dump_malloc_stats=0 --enable_checksum_handoff=0 --enable_compaction_filter=0 --enable_custom_split_merge=1 --enable_do_not_compress_roles=1 --enable_index_compression=0 --enable_memtable_insert_with_hint_prefix_extractor=0 --enable_pipelined_write=0 --enable_sst_partitioner_factory=1 --enable_thread_tracking=0 --enable_write_thread_adaptive_yield=1 --expected_values_dir=/dev/shm/rocksdb_test/rocksdb_crashtest_expected_1 --fail_if_options_file_error=1 --fifo_allow_compaction=1 --file_checksum_impl=none --fill_cache=0 --flush_one_in=1000000 --format_version=2 --get_all_column_family_metadata_one_in=1000000 --get_current_wal_file_one_in=0 --get_live_files_apis_one_in=1000000 --get_properties_of_all_tables_one_in=1000000 
--get_property_one_in=100000 --get_sorted_wal_files_one_in=0 --hard_pending_compaction_bytes_limit=274877906944 --high_pri_pool_ratio=0.5 --index_block_restart_interval=5 --index_shortening=2 --index_type=0 --ingest_external_file_one_in=0 --initial_auto_readahead_size=16384 --inplace_update_support=0 --iterpercent=10 --key_len_percent_dist=1,30,69 --key_may_exist_one_in=100 --last_level_temperature=kUnknown --level_compaction_dynamic_level_bytes=1 --lock_wal_one_in=0 --log_file_time_to_roll=60 --log_readahead_size=16777216 --long_running_snapshots=0 --low_pri_pool_ratio=0.5 --lowest_used_cache_tier=1 --manifest_preallocation_size=5120 --manual_wal_flush_one_in=100 --mark_for_compaction_one_file_in=10 --max_auto_readahead_size=524288 --max_background_compactions=1 --max_bytes_for_level_base=67108864 --max_key=10 --max_key_len=3 --max_log_file_size=0 --max_manifest_file_size=1073741824 --max_sequential_skip_in_iterations=16 --max_total_wal_size=0 --max_write_batch_group_size_bytes=64 --max_write_buffer_number=10 --max_write_buffer_size_to_maintain=2097152 --memtable_insert_hint_per_batch=1 --memtable_max_range_deletions=0 --memtable_prefix_bloom_size_ratio=0.1 --memtable_protection_bytes_per_key=1 --memtable_whole_key_filtering=0 --memtablerep=skip_list --metadata_charge_policy=1 --min_write_buffer_number_to_merge=1 --mmap_read=1 --mock_direct_io=False --nooverwritepercent=1 --num_file_reads_for_auto_readahead=0 --open_files=-1 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=0 --open_write_fault_one_in=16 --ops_per_thread=100000000 --optimize_filters_for_hits=0 --optimize_filters_for_memory=0 --optimize_multiget_for_io=1 --paranoid_file_checks=1 --partition_filters=0 --partition_pinning=0 --pause_background_one_in=1000000 --periodic_compaction_seconds=2 --prefix_size=7 --prefixpercent=5 --prepopulate_block_cache=0 --preserve_internal_time_seconds=0 --progress_reports=0 --promote_l0_one_in=0 --read_amp_bytes_per_bit=0 --read_fault_one_in=1000 --readahead_size=16384 --readpercent=0 --recycle_log_file_num=0 --reopen=0 --report_bg_io_stats=0 --reset_stats_one_in=10000 --sample_for_compression=0 --secondary_cache_fault_one_in=0 --secondary_cache_uri= --set_options_one_in=0 --skip_stats_update_on_db_open=1 --snapshot_hold_ops=100000 --soft_pending_compaction_bytes_limit=68719476736 --sst_file_manager_bytes_per_sec=0 --sst_file_manager_bytes_per_truncate=0 --stats_dump_period_sec=600 --stats_history_buffer_size=0 --strict_bytes_per_sync=1 --subcompactions=2 --sync=0 --sync_fault_injection=1 --table_cache_numshardbits=-1 --target_file_size_base=16777216 --target_file_size_multiplier=1 --test_batches_snapshots=0 --top_level_index_pinning=2 --uncache_aggressiveness=709 --universal_max_read_amp=0 --unpartitioned_pinning=0 --use_adaptive_mutex=0 --use_adaptive_mutex_lru=1 --use_attribute_group=1 --use_delta_encoding=1 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_full_merge_v1=0 --use_get_entity=0 --use_merge=1 --use_multi_cf_iterator=0 --use_multi_get_entity=0 --use_multiget=0 --use_put_entity_one_in=0 --use_timed_put_one_in=0 --use_write_buffer_manager=0 --user_timestamp_size=0 --value_size_mult=32 --verification_only=0 --verify_checksum=1 --verify_checksum_one_in=1000 --verify_compression=0 --verify_db_one_in=100000 --verify_file_checksums_one_in=0 --verify_iterator_with_expected_state_one_in=5 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=0 --wal_compression=none --write_buffer_size=335544 --write_dbid_to_manifest=1 --write_fault_one_in=128 
--writepercent=45 ``` Repro-ed quickly before the fix and stably run after the fix. Reviewed By: jowlyzhang Differential Revision: D58426535 Pulled By: hx235 fbshipit-source-id: 611e56086e76f8c06d292624e60fd96e511ce723 |
|
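For context on the failure mode: with manual WAL flush, writes can sit in the WAL buffer and stay invisible to a backup until `FlushWAL()` is called. A hedged sketch of an ordering that avoids the discrepancy, using the public `FlushWAL()` and BackupEngine APIs (the function name and backup directory are placeholders, and this is not the stress-test fix itself):

```
#include "rocksdb/db.h"
#include "rocksdb/utilities/backup_engine.h"

using namespace ROCKSDB_NAMESPACE;

// Sketch: make buffered WAL entries durable/visible before backing up, so the
// backup's WAL matches what the original DB will eventually contain.
Status BackupWithFlushedWal(DB* db, const std::string& backup_dir) {
  Status s = db->FlushWAL(/*sync=*/true);  // drain the manual WAL buffer first
  if (!s.ok()) return s;

  BackupEngine* backup_engine = nullptr;
  s = BackupEngine::Open(db->GetEnv(), BackupEngineOptions(backup_dir),
                         &backup_engine);
  if (!s.ok()) return s;
  s = backup_engine->CreateNewBackup(db, /*flush_before_backup=*/true);
  delete backup_engine;
  return s;
}
```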
Hui Xiao | d3c4b7fe0b |
Enable reopen with un-synced data loss in crash test (#12746)
Summary: **Context/Summary:** https://github.com/facebook/rocksdb/pull/12567 disabled reopen with un-synced data loss in the crash test since we discovered un-synced WAL loss and we currently don't support prefix recovery in reopen. This PR explicitly syncs WAL data before close to prevent such data loss and adds back the testing coverage. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12746 Test Plan: CI Reviewed By: ajkr Differential Revision: D58326890 Pulled By: hx235 fbshipit-source-id: 0865f715e97c5948d7cb3aea62fe2a626cb6522a |
|
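The "explicitly syncs WAL data before close" workaround boils down to calling `SyncWAL()` before shutting the DB down, so a subsequent reopen cannot observe un-synced WAL loss. A minimal hedged sketch (the helper name is hypothetical):

```
#include "rocksdb/db.h"

using namespace ROCKSDB_NAMESPACE;

// Sketch: sync the WAL so no written-but-unsynced entries can be lost across
// a reopen, then close the DB.
Status CloseWithSyncedWal(DB* db) {
  Status s = db->SyncWAL();  // persist all unsynced WAL data
  if (!s.ok()) return s;
  s = db->Close();
  delete db;
  return s;
}
```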
Peter Dillinger | b34cef57b7 |
Support pro-actively erasing obsolete block cache entries (#12694)
Summary: Currently, when files become obsolete, the block cache entries associated with them just age out naturally. With pure LRU, this is not too bad, as once you "use" enough cache entries to (re-)fill the cache, you are guaranteed to have purged the obsolete entries. However, HyperClockCache is a counting clock cache with a somewhat longer memory, so it could be more negatively impacted by previously-hot cache entries becoming obsolete, and taking longer to age out than newer single-hit entries. Part of the reason we still have this natural aging-out is that there's almost no connection between block cache entries and the file they are associated with. Everything is hashed into the same pool(s) of entries with nothing like a secondary index based on file. Keeping track of such an index could be expensive. This change adds a new, mutable CF option `uncache_aggressiveness` for erasing obsolete block cache entries. The process can be speculative, lossy, or unproductive because not all potential block cache entries associated with files will be resident in memory, and attempting to remove them all could be wasted CPU time. Rather than a simple on/off switch, `uncache_aggressiveness` basically tells RocksDB how much CPU you're willing to burn trying to purge obsolete block cache entries. When such efforts are not sufficiently productive for a file, we stop and move on. The option is in ColumnFamilyOptions so that it is dynamically changeable for already-open files, and customizable by CF. Note that this block cache removal happens as part of the process of purging obsolete files, which is often in a background thread (depending on `background_purge_on_iterator_cleanup` and `avoid_unnecessary_blocking_io` options) rather than along CPU critical paths. Notable auxiliary code details: * Possibly fixing some issues with trivial moves with `only_delete_metadata`: unnecessary TableCache::Evict in that case and missing from the ObsoleteFileInfo move operator. (Not able to reproduce a current failure.) * Remove suspicious TableCache::Erase() from VersionSet::AddObsoleteBlobFile() (TODO follow-up item) Marked EXPERIMENTAL until more thorough validation is complete. Direct stats of this functionality are omitted because they could be misleading. Block cache hit rate is a better indicator of benefit, and CPU profiling a better indicator of cost. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12694 Test Plan: * Unit tests added, including refactoring an existing test to make better use of parameterized tests. * Added to crash test. 
* Performance, sample command: ``` for I in `seq 1 10`; do for UA in 300; do for CT in lru_cache fixed_hyper_clock_cache auto_hyper_clock_cache; do rm -rf /dev/shm/test3; TEST_TMPDIR=/dev/shm/test3 /usr/bin/time ./db_bench -benchmarks=readwhilewriting -num=13000000 -read_random_exp_range=6 -write_buffer_size=10000000 -bloom_bits=10 -cache_type=$CT -cache_size=390000000 -cache_index_and_filter_blocks=1 -disable_wal=1 -duration=60 -statistics -uncache_aggressiveness=$UA 2>&1 | grep -E 'micros/op|rocksdb.block.cache.data.(hit|miss)|rocksdb.number.keys.(read|written)|maxresident' | awk '/rocksdb.block.cache.data.miss/ { miss = $4 } /rocksdb.block.cache.data.hit/ { hit = $4 } { print } END { print "hit rate = " ((hit * 1.0) / (miss + hit)) }' | tee -a results-$CT-$UA; done; done; done ``` Averaging 10 runs each case, block cache data block hit rates ``` lru_cache UA=0 -> hit rate = 0.327, ops/s = 87668, user CPU sec = 139.0 UA=300 -> hit rate = 0.336, ops/s = 87960, user CPU sec = 139.0 fixed_hyper_clock_cache UA=0 -> hit rate = 0.336, ops/s = 100069, user CPU sec = 139.9 UA=300 -> hit rate = 0.343, ops/s = 100104, user CPU sec = 140.2 auto_hyper_clock_cache UA=0 -> hit rate = 0.336, ops/s = 97580, user CPU sec = 140.5 UA=300 -> hit rate = 0.345, ops/s = 97972, user CPU sec = 139.8 ``` Conclusion: up to roughly 1 percentage point of improved block cache hit rate, likely leading to overall improved efficiency (because the foreground CPU cost of cache misses likely outweighs the background CPU cost of erasure, let alone I/O savings). Reviewed By: ajkr Differential Revision: D57932442 Pulled By: pdillinger fbshipit-source-id: 84a243ca5f965f731f346a4853009780a904af6c |
|
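Since `uncache_aggressiveness` lives in ColumnFamilyOptions and is mutable, it can be set at open time or adjusted on a live DB. A minimal hedged sketch (the DB path and values are placeholders):

```
#include <cassert>

#include "rocksdb/db.h"
#include "rocksdb/options.h"

using namespace ROCKSDB_NAMESPACE;

int main() {
  // Sketch: opt into proactive erasure of obsolete block cache entries.
  // 0 disables the behavior; larger values let RocksDB burn more CPU purging.
  Options options;
  options.create_if_missing = true;
  options.uncache_aggressiveness = 300;

  DB* db = nullptr;
  Status s = DB::Open(options, "/tmp/uncache_demo", &db);
  assert(s.ok());

  // The option is mutable, so it can also be adjusted on an open DB:
  s = db->SetOptions({{"uncache_aggressiveness", "0"}});
  assert(s.ok());

  delete db;
  return 0;
}
```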
Hui Xiao | 390fc55ba1 |
Revert PR 12684 and 12556 (#12738)
Summary: **Context/Summary:** a better API design was settled on recently, so we decided to revert these two changes. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12738 Test Plan: - CI Reviewed By: ajkr Differential Revision: D58162165 Pulled By: hx235 fbshipit-source-id: 9bbe4d2fe9fbe39213f4cf137a2d419e6ffb8e16 |
|
Peter Dillinger | 98393f0139 |
Fix Checkpoint hard link of inactive but unsynced WAL (#12731)
Summary: Background: there is one active WAL file but there can be several more WAL files in various states. Those other WALs are always in a "flushed" state but could be on the `logs_` list not yet fully synced. We currently allow any WAL that is not the active WAL to be hard-linked when creating a Checkpoint, as although it might still be open for write, we are not appending any more data to it. The problem is that a created Checkpoint is supposed to be fully synced on return of that function, and a hard-linked WAL in the state described above might not be fully synced. (Through some prudence in https://github.com/facebook/rocksdb/issues/10083, it would synced if using track_and_verify_wals_in_manifest=true.) The fix is a step toward a long term goal of removing the need to query the filesystem to determine WAL files and their state. (I consider it dubious any time we independently read from or query metadata from a file we have open for writing, as this makes us more susceptible to FileSystem deficiencies or races.) More specifically: * Detect which WALs might not be fully synced, according to our DBImpl metadata, and prevent hard linking those (with `trim_to_size=true` from `GetLiveFilesStorageInfo()`. And while we're at it, use our known flushed sizes for those WALs. * To avoid a race between that and GetSortedWalFiles(), track a maximum needed WAL number for the Checkpoint/GetLiveFilesStorageInfo. * Because of the level of consistency provided by those two, we no longer need to consider syncing as part of the FlushWAL in GetLiveFilesStorageInfo. (We determine the max WAL number consistent with the manifest file size, while holding DB mutex. Should make track_and_verify_wals_in_manifest happy.) This makes the premise of test PutRaceWithCheckpointTrackedWalSync obsolete (sync point callback no longer hit) so the test is removed, with crash test as backstop for related issues. See https://github.com/facebook/rocksdb/issues/10185 Stacked on https://github.com/facebook/rocksdb/issues/12729 Pull Request resolved: https://github.com/facebook/rocksdb/pull/12731 Test Plan: Expanded an existing test, which now fails before fix. Also long runs of blackbox_crash_test with amplified checkpoint frequency. Reviewed By: cbi42 Differential Revision: D58199629 Pulled By: pdillinger fbshipit-source-id: 376e55f4a2b082cd2adb6408a41209de14422382 |
|
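For reference, the durability contract that this fix tightens belongs to the public Checkpoint API: on successful return, the checkpoint (including any hard-linked WALs) is supposed to be fully synced. A minimal usage sketch, with the function name and directory as placeholders:

```
#include "rocksdb/db.h"
#include "rocksdb/utilities/checkpoint.h"

using namespace ROCKSDB_NAMESPACE;

// Sketch: create a checkpoint; on successful return, its contents (including
// any WAL files it hard-links or copies) are expected to be fully synced.
Status CreateSyncedCheckpoint(DB* db, const std::string& checkpoint_dir) {
  Checkpoint* checkpoint = nullptr;
  Status s = Checkpoint::Create(db, &checkpoint);
  if (!s.ok()) return s;
  s = checkpoint->CreateCheckpoint(checkpoint_dir);
  delete checkpoint;
  return s;
}
```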
Peter Dillinger | 9f4c597d83 |
FaultInjectionTestFS read unsynced data by default (#12729)
Summary: In places (e.g. GetSortedWals()) RocksDB relies on querying the file size or even reading the contents of files currently open for writing, and as in POSIX semantics, expects to see the flushed size and contents regardless of what has been synced. FaultInjectionTestFS historically did not emulate this behavior, only showing synced data from such read operations. (Different from FaultInjectionTestEnv--sigh.) This change makes the "proper" behavior the default behavior, at least for GetFileSize and FSSequentialFile. However, this new functionality is disabled in db_stress because of undiagnosed, unresolved issues. Also removes unused and confusing field `pos_at_last_flush_` This change is needed to support testing a relevant bug fix (in a follow-up diff). Other suggested follow-up: * Fix db_stress not to rely on the old behavior, and fix a related FIXME in db_stress_test_base.cc in LockWAL testing. * Fill in some corner cases in the FileSystem API for reading unsynced data (see new TODO items). * Consider deprecating and removing Flush() API functions from FileSystem APIs. It is not clear to me that there is a supported scenario in which they do anything but confuse API users and developers. If there is a use for them, it doesn't appear to be tested. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12729 Test Plan: applies to all unit tests successfully, just updating the unit test from https://github.com/facebook/rocksdb/issues/12556 due to relying on the errant behavior. Also added a specific unit test Reviewed By: hx235 Differential Revision: D58091835 Pulled By: pdillinger fbshipit-source-id: f47a63b2b000f5875b6293a98577bff663d7fd33 |
|
Po-Chuan Hsieh | b03d415660 |
Fix build on i386 (#12719)
Summary: Cited from https://pkg-status.freebsd.org/beefy21/data/140i386-default/02faf78f4c9b/logs/rocksdb-9.2.1.log The error message is as follows: ``` mkdir -p db_stress_tool && clang++ -O2 -pipe -DOS_FREEBSD -fstack-protector-strong -isystem /usr/local/include -fno-strict-aliasing -Wno-inconsistent-missing-override -Wno-unused-parameter -Wno-unused-variable -Wno-unused-private-field -isystem /usr/local/include -std=c++17 -fPIC -DROCKSDB_DLL -DROCKSDB_USE_RTTI -g -W -Wextra -Wall -Wsign-compare -Wshadow -Wunused-parameter -Werror -I. -I./include -std=c++17 -O2 -pipe -DOS_FREEBSD -fstack-protector-strong -isystem /usr/local/include -fno-strict-aliasing -Wno-inconsistent-missing-override -Wno-unused-parameter -Wno-unused-variable -Wno-unused-private-field -isystem /usr/local/include -std=c++17 -faligned-new -DHAVE_ALIGNED_NEW -DROCKSDB_PLATFORM_POSIX -DROCKSDB_LIB_IO_POSIX -O2 -pipe -DOS_FREEBSD -fstack-protector-strong -isystem /usr/local/include -fno-strict-aliasing -D_REENTRANT -DOS_FREEBSD -DSNAPPY -DGFLAGS=1 -DZLIB -DBZIP2 -DLZ4 -DZSTD -DROCKSDB_MALLOC_USABLE_SIZE -DROCKSDB_PTHREAD_ADAPTIVE_MUTEX -DROCKSDB_BACKTRACE -DROCKSDB_SCHED_GETCPU_PRESENT -isystem third-party/gtest-1.8.1/fused-src -O2 -fno-omit-frame-pointer -momit-leaf-frame-pointer -DNDEBUG -Woverloaded-virtual -Wnon-virtual-dtor -Wno-missing-field-initializers -Wno-invalid-offsetof -c db_stress_tool/db_stress_common.cc -o db_stress_tool/db_stress_common.o db_stress_tool/db_stress_common.cc:204:17: error: format specifies type 'unsigned long' but the argument has type 'size_t' (aka 'unsigned int') [-Werror,-Wformat] block_cache->GetCapacity()); ^~~~~~~~~~~~~~~~~~~~~~~~~~ 1 error generated. ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12719 Reviewed By: hx235 Differential Revision: D58093539 Pulled By: jaykorean fbshipit-source-id: 400cae3a4b0d23b168937a5388065ef1c4b8b56e |
|
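The underlying portability issue is that `%lu` expects `unsigned long`, while on i386 `size_t` is a 32-bit `unsigned int`. A hedged sketch of the usual fixes (the actual patch may instead use a cast or RocksDB's own formatting helpers):

```
#include <cinttypes>
#include <cstdio>

// Sketch: printing a size_t portably across 32- and 64-bit targets.
void PrintCapacity(size_t capacity) {
  // %zu is the standard conversion specifier for size_t ...
  std::fprintf(stdout, "capacity: %zu\n", capacity);
  // ... or widen explicitly and use a fixed-width format macro.
  std::fprintf(stdout, "capacity: %" PRIu64 "\n",
               static_cast<uint64_t>(capacity));
}
```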
Levi Tamasi | 023a808417 |
Disable iterator refresh for CoalescingIterator in TestIterateAgainstExpected (#12723)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12723 `CoalescingIterator` doesn't support `Refresh` currently; the patch adds a check that was missing from https://github.com/facebook/rocksdb/pull/12721 to disable this operation when multi-CF iterators are in use in the stress test. Reviewed By: jaykorean Differential Revision: D58053334 fbshipit-source-id: 3146f0e7e87230b49b244cecdfcee345c0ce78fa |
|
Jay Huh | f3b7e959b3 |
Add CoalescingIterator to TestIterateAgainstExpected (#12721)
Summary: Continuing from https://github.com/facebook/rocksdb/pull/12706. Adding the CoalescingIterator to `TestIterateAgainstExpected` as well when `use_multi_cf_iterator` is set True Pull Request resolved: https://github.com/facebook/rocksdb/pull/12721 Test Plan: ``` python3 tools/db_crashtest.py blackbox --simple --max_key=25000000 --write_buffer_size=4194304 --use_attribute_group=0 --use_put_entity_one_in=1 --use_multi_get=1 --use_multi_cf_iterator=1 --verify_iterator_with_expected_state_one_in=2 ``` Reviewed By: ltamasi Differential Revision: D58033811 Pulled By: jaykorean fbshipit-source-id: 7caf39883e277e695b653e295ad72b1004169ca0 |
|
Levi Tamasi | 6f17056e40 |
Add transactional/read-your-own-write MultiGetEntity stress test (#12717)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12717 The PR adds `Transaction::MultiGetEntity` to the stress tests. Similarly to what we do for `Transaction::MultiGet`, in this mode we open a transaction and randomly add writes for some of the queried keys to it while keeping track of the values written on a per-key basis. The results of `Transaction::MultiGetEntity` can then be validated against these expected values (in order to test the read-your-own-writes functionality) as well as the results returned by `Transaction::GetEntity` for the same keys. Reviewed By: jaykorean Differential Revision: D57990210 fbshipit-source-id: 9bf3bb292051c2c57757f86b517919197b03c524 |
|
Jay Huh | a901ef48f0 |
Introduce use_multi_cf_iterator in stress test (#12706)
Summary: Introduce `use_multi_cf_iterator`, and when it's set, use `CoalescingIterator` in `TestIterate()`. Because all the column families contain the same data in today's Stress Test, we can compare `CoalescingIterator` against any `DBIter` from any of the column families. Currently, coalescing logic verification is done by unit tests, but we can extend the stress test to support different data in different column families in the future. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12706 Test Plan: ``` python3 tools/db_crashtest.py blackbox --simple --max_key=25000000 --write_buffer_size=4194304 --use_attribute_group=0 --use_put_entity_one_in=1 --use_multi_get=1 --use_multi_cf_iterator=1 ``` **More PRs to come** - Use `AttributeGroupIterator` when both `use_multi_cf_iterator` and `use_attribute_group` are true - Support `Refresh()` in `CoalescingIterator` - Extend Stress Test to support different data in different CFs (Long-term) Reviewed By: ltamasi Differential Revision: D58020247 Pulled By: jaykorean fbshipit-source-id: 8e2483b85cf2bb0f5a9bb44851601bbf063484ec |
|
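Outside the stress test, the multi-CF iterator is created directly from the DB. A hedged sketch assuming the `NewCoalescingIterator` entry point; the exact name and signature should be checked against the headers of your RocksDB version:

```
#include <memory>
#include <vector>

#include "rocksdb/db.h"

using namespace ROCKSDB_NAMESPACE;

// Hedged sketch (API name/signature assumed): iterate over several column
// families as one coalesced key space.
void ScanAllColumnFamilies(DB* db,
                           const std::vector<ColumnFamilyHandle*>& cfs) {
  std::unique_ptr<Iterator> iter =
      db->NewCoalescingIterator(ReadOptions(), cfs);
  for (iter->SeekToFirst(); iter->Valid(); iter->Next()) {
    // iter->key() / iter->value() reflect the coalesced view across the CFs.
  }
}
```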
Levi Tamasi | 01179678b2 |
Refactor the non-attribute-group/attribute-group code paths in NonBatchedOpsStressTest::TestMultiGetEntity (#12715)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12715 The patch refactors/deduplicates the non-attribute-group and attribute-group code paths in `NonBatchedOpsStressTest::TestMultiGetEntity` by introducing two new generic lambdas `verify_expected_errors` and `check_results` (the latter of which subsumes the existing `handle_results`) that can handle both types of APIs. This change also serves as groundwork for the upcoming transactional `MultiGetEntity` stress tests. Reviewed By: jaykorean Differential Revision: D57977700 fbshipit-source-id: 83a18a9e57f46ea92ba07b2f0dca3e9bc353f257 |
|
Levi Tamasi | b6ea246333 |
Fix NonBatchedOpsStressTest::TestGetEntity by adding fuzzy match (#12711)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12711 The patch adds the missing other half of https://github.com/facebook/rocksdb/pull/12709: when there is no locking in a read test, we have to be more permissive when it comes to values returned by queries. In particular, any expected state value in a small window around the read call should be allowed, and discrepancies in the presence/absence of a key should only be treated as a failure if the key is guaranteed to have not existed/existed during the above window. Reviewed By: hx235 Differential Revision: D57938678 fbshipit-source-id: cd5c8bc2e014ec12ea4daf441965f3ec2115663e |
|
Levi Tamasi | d9316260e4 |
Remove unnecessary MutexLock from NonBatchedOpsStressTest::TestGetEntity (#12709)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12709 This is most likely copypasta from `TestGet` from before https://github.com/facebook/rocksdb/pull/11058 . There is no need to lock the mutex for the key for reads; in fact, doing so is detrimental to test coverage since it locks out concurrent writers. Reviewed By: jowlyzhang Differential Revision: D57915207 fbshipit-source-id: eb0dbf6b84e5408b87d96dd47597511996e206a7 |
|
Levi Tamasi | 5cec4bbcab |
Support PutEntity as write method in the transactional MultiGet stress test (#12699)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12699 The patch adds `PutEntity` to the potential write operations used in the read-your-own-writes tests for `Transaction::MultiGet`. Note that since the stress test generates wide-column structures which have the value returned by `GenerateValue` in the default column, this does not affect the results returned by the `MultiGet` API (unless we have a bug). The wide-column entity is generated according to the usual rules based on the value base and the `use_put_entity_one_in` flag. The entire entity structure will be validated by the upcoming stress test for `Transaction::MultiGetEntity`, where we also plan to leverage this logic. Reviewed By: jowlyzhang Differential Revision: D57799075 fbshipit-source-id: 5f86c2b2b3ceee8e1b8bf7453c02f1f1b1b00751 |
|
Peter Dillinger | d2ef70872f |
Rename, deprecate `LogFile` and `VectorLogPtr` (#12695)
Summary: These names are confusing with `Logger` etc. so moving to `WalFile` etc. Other small, related name refactorings. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12695 Test Plan: Left most unit tests using old names as an API compatibility test. Non-test code compiles with deprecated names removed. No functional changes. Reviewed By: ajkr Differential Revision: D57747458 Pulled By: pdillinger fbshipit-source-id: 7b77596b9c20d865d43b9dc66c30c8bd2b3b424f |
|
Levi Tamasi | bd801bd98c |
Factor out the RYW transaction building logic into a helper (#12697)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12697 As groundwork for stress testing `Transaction::MultiGetEntity`, the patch factors out the logic for adding transactional writes for some of the keys in a `MultiGet` batch into a separate helper method called `MaybeAddKeyToTxnForRYW`. Reviewed By: jowlyzhang Differential Revision: D57791830 fbshipit-source-id: ef347ba6e6e82dfe5cedb4cf67dd6d1503901d89 |
|
Changyu Bi | fecb10c2fa |
Improve universal compaction sorted-run trigger (#12477)
Summary: Universal compaction currently uses `level0_file_num_compaction_trigger` for two purposes: 1. the trigger for checking if there is any compaction to do, and 2. the limit on the number of sorted runs. RocksDB will do compaction to keep the number of sorted runs no more than the value of this option. This can make the option inflexible. A value that is too small causes higher write amp: more compactions to reduce the number of sorted runs. A value that is too big delays potential compaction work and causes worse read performance. This PR introduces an option `CompactionOptionsUniversal::max_read_amp` for only the second purpose: to specify the hard limit on the number of sorted runs. For backward compatibility, `max_read_amp = -1` by default, which means falling back to the current behavior. When `max_read_amp > 0`, `level0_file_num_compaction_trigger` will only serve as a trigger to find potential compaction. When `max_read_amp = 0`, RocksDB will auto-tune the limit on the number of sorted runs. The estimation is based on DB size, write_buffer_size and size_ratio, so it is adaptive to the size change of the DB. See more in `UniversalCompactionBuilder::PickCompaction()`. Alternatively, users can now configure `max_read_amp` to a very big value and keep `level0_file_num_compaction_trigger` small. This will allow `size_ratio` and `max_size_amplification_percent` to control the number of sorted runs. This essentially disables compactions with reason kUniversalSortedRunNum. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12477 Test Plan: * new unit test * existing unit test for default behavior * updated crash test with the new option * benchmark: * Create a DB that is roughly 24GB in the last level. When `max_read_amp = 0`, we estimate that the DB needs 9 levels to avoid excessive compactions to reduce the number of sorted runs. * We then run fillrandom to ingest another 24GB of data to compare write amp. 
* case 1: small level0 trigger: `level0_file_num_compaction_trigger=5, max_read_amp=-1` * write-amp: 4.8 * case 2: auto-tune: `level0_file_num_compaction_trigger=5, max_read_amp=0` * write-amp: 3.6 * case 3: auto-tune with minimal trigger: `level0_file_num_compaction_trigger=1, max_read_amp=0` * write-amp: 3.8 * case 4: hard-code a good value for trigger: `level0_file_num_compaction_trigger=9` * write-amp: 2.8 ``` Case 1: ** Compaction Stats [default] ** Level Files Size Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop Rblob(GB) Wblob(GB) ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ L0 0/0 0.00 KB 1.0 0.0 0.0 0.0 22.6 22.6 0.0 1.0 0.0 163.2 141.94 111.10 108 1.314 0 0 0.0 0.0 L45 8/0 1.81 GB 0.0 39.6 11.1 28.5 39.3 10.8 0.0 3.5 209.0 207.3 194.25 191.29 43 4.517 348M 2498K 0.0 0.0 L46 13/0 3.12 GB 0.0 15.3 9.5 5.8 15.0 9.3 0.0 1.6 203.1 199.3 77.13 75.88 16 4.821 134M 2362K 0.0 0.0 L47 19/0 4.68 GB 0.0 15.4 10.5 4.9 14.7 9.8 0.0 1.4 204.0 194.9 77.38 76.15 8 9.673 135M 5920K 0.0 0.0 L48 38/0 9.42 GB 0.0 19.6 11.7 7.9 17.3 9.4 0.0 1.5 206.5 182.3 97.15 95.02 4 24.287 172M 20M 0.0 0.0 L49 91/0 22.70 GB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.00 0 0.000 0 0 0.0 0.0 Sum 169/0 41.74 GB 0.0 89.9 42.9 47.0 109.0 61.9 0.0 4.8 156.7 189.8 587.85 549.45 179 3.284 791M 31M 0.0 0.0 Case 2: ** Compaction Stats [default] ** Level Files Size Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop Rblob(GB) Wblob(GB) ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ L0 1/0 214.47 MB 1.2 0.0 0.0 0.0 22.6 22.6 0.0 1.0 0.0 164.5 140.81 109.98 108 1.304 0 0 0.0 0.0 L44 0/0 0.00 KB 0.0 1.3 1.3 0.0 1.2 1.2 0.0 1.0 206.1 204.9 6.24 5.98 3 2.081 11M 51K 0.0 0.0 L45 4/0 844.36 MB 0.0 7.1 5.4 1.7 7.0 5.4 0.0 1.3 194.6 192.9 37.41 36.00 13 2.878 62M 489K 0.0 0.0 L46 11/0 2.57 GB 0.0 14.6 9.8 4.8 14.3 9.5 0.0 1.5 193.7 189.8 77.09 73.54 17 4.535 128M 2411K 0.0 0.0 L47 24/0 5.81 GB 0.0 19.8 12.0 7.8 18.8 11.0 0.0 1.6 191.4 181.1 106.19 101.21 9 11.799 174M 9166K 0.0 0.0 L48 38/0 9.42 GB 0.0 19.6 11.8 7.9 17.3 9.4 0.0 1.5 197.3 173.6 101.97 97.23 4 25.491 172M 20M 0.0 0.0 L49 91/0 22.70 GB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.00 0 0.000 0 0 0.0 0.0 Sum 169/0 41.54 GB 0.0 62.4 40.3 22.1 81.3 59.2 0.0 3.6 136.1 177.2 469.71 423.94 154 3.050 549M 32M 0.0 0.0 Case 3: ** Compaction Stats [default] ** Level Files Size Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop Rblob(GB) Wblob(GB) ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ L0 0/0 0.00 KB 5.0 0.0 0.0 0.0 22.6 22.6 0.0 1.0 0.0 163.8 141.43 111.13 108 1.310 0 0 0.0 0.0 L44 0/0 0.00 KB 0.0 0.8 0.8 0.0 0.8 0.8 0.0 1.0 201.4 200.2 4.26 4.19 2 2.130 7360K 33K 0.0 0.0 L45 4/0 844.38 MB 0.0 6.3 5.0 1.2 6.2 5.0 0.0 1.2 202.0 200.3 31.81 31.50 12 2.651 55M 403K 0.0 0.0 L46 7/0 1.62 GB 0.0 13.3 8.8 4.6 13.1 8.6 0.0 1.5 198.9 195.7 68.72 67.89 17 4.042 117M 
1696K 0.0 0.0 L47 24/0 5.81 GB 0.0 21.7 12.9 8.8 20.6 11.8 0.0 1.6 198.5 188.6 112.04 109.97 12 9.336 191M 9352K 0.0 0.0 L48 41/0 10.14 GB 0.0 24.8 13.0 11.8 21.9 10.1 0.0 1.7 198.6 175.6 127.88 125.36 6 21.313 218M 25M 0.0 0.0 L49 91/0 22.70 GB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.00 0 0.000 0 0 0.0 0.0 Sum 167/0 41.10 GB 0.0 67.0 40.5 26.4 85.4 58.9 0.0 3.8 141.1 179.8 486.13 450.04 157 3.096 589M 36M 0.0 0.0 Case 4: ** Compaction Stats [default] ** Level Files Size Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop Rblob(GB) Wblob(GB) ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ L0 0/0 0.00 KB 0.7 0.0 0.0 0.0 22.6 22.6 0.0 1.0 0.0 158.6 146.02 114.68 108 1.352 0 0 0.0 0.0 L42 0/0 0.00 KB 0.0 1.7 1.7 0.0 1.7 1.7 0.0 1.0 185.4 184.3 9.25 8.96 4 2.314 14M 67K 0.0 0.0 L43 0/0 0.00 KB 0.0 2.5 2.5 0.0 2.5 2.5 0.0 1.0 197.8 195.6 13.01 12.65 4 3.253 22M 202K 0.0 0.0 L44 4/0 844.40 MB 0.0 4.2 4.2 0.0 4.1 4.1 0.0 1.0 188.1 185.1 22.81 21.89 5 4.562 36M 503K 0.0 0.0 L45 13/0 3.12 GB 0.0 7.5 6.5 1.0 7.2 6.2 0.0 1.1 188.7 181.8 40.69 39.32 5 8.138 65M 2282K 0.0 0.0 L46 17/0 4.18 GB 0.0 8.3 7.1 1.2 7.9 6.6 0.0 1.1 192.2 181.8 44.23 43.06 4 11.058 73M 3846K 0.0 0.0 L47 22/0 5.34 GB 0.0 8.9 7.5 1.4 8.2 6.8 0.0 1.1 189.1 174.1 48.12 45.37 3 16.041 78M 6098K 0.0 0.0 L48 27/0 6.58 GB 0.0 9.2 7.6 1.6 8.2 6.6 0.0 1.1 195.2 172.9 48.52 47.11 2 24.262 81M 9217K 0.0 0.0 L49 91/0 22.70 GB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.00 0 0.000 0 0 0.0 0.0 Sum 174/0 42.74 GB 0.0 42.3 37.0 5.3 62.4 57.1 0.0 2.8 116.3 171.3 372.66 333.04 135 2.760 372M 22M 0.0 0.0 setup: ./db_bench --benchmarks=fillseq,compactall,waitforcompaction --num=200000000 --compression_type=none --disable_wal=1 --compaction_style=1 --num_levels=50 --target_file_size_base=268435456 --max_compaction_bytes=6710886400 --level0_file_num_compaction_trigger=10 --write_buffer_size=268435456 --seed 1708494134896523 benchmark: ./db_bench --benchmarks=overwrite,waitforcompaction,stats --num=200000000 --compression_type=none --disable_wal=1 --compaction_style=1 --write_buffer_size=268435456 --level0_file_num_compaction_trigger=5 --target_file_size_base=268435456 --use_existing_db=1 --num_levels=50 --writes=200000000 --universal_max_read_amp=-1 --seed=1716488324800233 ``` Reviewed By: ajkr Differential Revision: D55370922 Pulled By: cbi42 fbshipit-source-id: 9be69979126b840d08e93e7059260e76a878bb2a |
|
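Concretely, the new knob sits in `CompactionOptionsUniversal`. A hedged configuration sketch of the auto-tuned mode described above (values are placeholders):

```
#include "rocksdb/options.h"

using namespace ROCKSDB_NAMESPACE;

// Sketch: decouple the compaction trigger from the sorted-run limit.
Options MakeUniversalOptions() {
  Options options;
  options.compaction_style = kCompactionStyleUniversal;
  // Keep the trigger small so compaction work is found promptly ...
  options.level0_file_num_compaction_trigger = 5;
  // ... and let RocksDB auto-tune the hard limit on sorted runs
  // (0 = auto-tune, -1 = legacy behavior, >0 = explicit limit).
  options.compaction_options_universal.max_read_amp = 0;
  return options;
}
```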
Levi Tamasi | f044b6a6ad |
Fix a couple of issues in the stress test for Transaction::MultiGet (#12696)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12696 Two fixes: 1) `Random::Uniform(n)` returns an integer from the interval [0, n - 1], so `Uniform(2)` returns 0 or 1, which means that we have apparently never covered transactions with deletions in the test. (To prevent similar issues, the patch cleans this write logic up a bit using an `enum class` for the type of write.) 2) The keys passed in to `TestMultiGet` can have duplicates. What this boils down to is that we have to keep track of the latest expected values for read-your-own-writes on a per-key basis. Reviewed By: jowlyzhang Differential Revision: D57750212 fbshipit-source-id: e8ab603252c32331f8db0dfb2affcca1e188c790 |
|
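To make the off-by-one in point 1 concrete: `Random::Uniform(n)` draws from [0, n - 1], so choosing among three write types requires `Uniform(3)`. A hedged illustration with hypothetical enumerator names (not the ones in the patch); note `util/random.h` is an internal header, used here only for illustration:

```
#include "util/random.h"  // rocksdb::Random (internal header; illustrative only)

// Hypothetical write-type selection: Uniform(3) covers all three enumerators,
// whereas Uniform(2) would never pick kDelete -- the bug described above.
enum class WriteType { kPut, kPutEntity, kDelete };

WriteType PickWriteType(ROCKSDB_NAMESPACE::Random* rnd) {
  return static_cast<WriteType>(rnd->Uniform(3));
}
```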
Levi Tamasi | db0960800a |
Add Transaction::PutEntity to the stress tests (#12688)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12688 As a first step of covering the wide-column transaction APIs, the patch adds `PutEntity` to the optimistic and pessimistic transaction stress tests (for the latter, only when the WriteCommitted policy is utilized). Other APIs and the multi-operation transaction test will be covered by subsequent PRs. Reviewed By: jaykorean Differential Revision: D57675781 fbshipit-source-id: bfe062ec5f6ab48641cd99a70f239ce4aa39299c |
|
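For reference, writing a wide-column entity through a transaction looks roughly like the hedged sketch below (a WriteCommitted pessimistic transaction is assumed, as in the test; the key and column values are placeholders):

```
#include "rocksdb/utilities/transaction_db.h"
#include "rocksdb/wide_columns.h"

using namespace ROCKSDB_NAMESPACE;

// Sketch: PutEntity through a pessimistic transaction, then commit.
Status WriteEntityInTxn(TransactionDB* txn_db) {
  Transaction* txn = txn_db->BeginTransaction(WriteOptions());
  WideColumns columns{{kDefaultWideColumnName, "base_value"},
                      {"attr", "value"}};
  Status s = txn->PutEntity(txn_db->DefaultColumnFamily(), "key", columns);
  if (s.ok()) {
    s = txn->Commit();
  }
  delete txn;
  return s;
}
```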
Peter Dillinger | d89ab23bec |
Disallow memtable flush and sst ingest while WAL is locked (#12652)
Summary: We recently noticed that some memtable flushes and file ingestions could proceed during LockWAL, in violation of its stated contract. (Note: we aren't 100% sure it's actually needed by MySQL, but we want it to be in a clean state nonetheless.) Despite earlier skepticism that this could be done safely (https://github.com/facebook/rocksdb/issues/12666), I found a place to wait for LockWAL to be cleared before allowing these operations to proceed: WaitForPendingWrites() Pull Request resolved: https://github.com/facebook/rocksdb/pull/12652 Test Plan: Added to unit tests. Extended how db_stress validates LockWAL and re-enabled combination of ingestion and LockWAL in crash test, in follow-up to https://github.com/facebook/rocksdb/issues/12642 Ran blackbox_crash_test for a long while with relevant features amplified. Suggested follow-up: fix FaultInjectionTestFS to report file sizes consistent with what the user has requested to be flushed. Reviewed By: jowlyzhang Differential Revision: D57622142 Pulled By: pdillinger fbshipit-source-id: aef265fce69465618974b4ec47f4636257c676ce |
|
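The contract being enforced: between `LockWAL()` and `UnlockWAL()`, no new data may reach the WAL, and with this fix memtable flushes and file ingestion also wait. A minimal hedged usage sketch (the function name is a placeholder):

```
#include "rocksdb/db.h"

using namespace ROCKSDB_NAMESPACE;

// Sketch: freeze WAL-affecting activity while taking an external copy of the
// WAL; with this fix, flushes and ingestions also wait until UnlockWAL().
Status CopyWalWhileLocked(DB* db) {
  Status s = db->LockWAL();
  if (!s.ok()) return s;
  // ... read/copy WAL files here; no new writes land in the WAL ...
  return db->UnlockWAL();
}
```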
Hui Xiao | d7b938882e |
Sync WAL during db Close() (#12556)
Summary: **Context/Summary:** Below crash test found out we don't sync WAL upon DB close, which can lead to unsynced data loss. This PR syncs it. ``` ./db_stress --threads=1 --disable_auto_compactions=1 --WAL_size_limit_MB=0 --WAL_ttl_seconds=0 --acquire_snapshot_one_in=0 --adaptive_readahead=0 --adm_policy=1 --advise_random_on_open=1 --allow_concurrent_memtable_write=1 --allow_data_in_errors=True --allow_fallocate=0 --async_io=0 --auto_readahead_size=0 --avoid_flush_during_recovery=1 --avoid_flush_during_shutdown=0 --avoid_unnecessary_blocking_io=1 --backup_max_size=104857600 --backup_one_in=0 --batch_protection_bytes_per_key=0 --bgerror_resume_retry_interval=1000000 --block_align=0 --block_protection_bytes_per_key=2 --block_size=16384 --bloom_before_level=1 --bloom_bits=29.895303579352174 --bottommost_compression_type=disable --bottommost_file_compaction_delay=0 --bytes_per_sync=0 --cache_index_and_filter_blocks=0 --cache_index_and_filter_blocks_with_high_priority=1 --cache_size=33554432 --cache_type=lru_cache --charge_compression_dictionary_building_buffer=1 --charge_file_metadata=0 --charge_filter_construction=1 --charge_table_reader=1 --checkpoint_one_in=0 --checksum_type=kxxHash64 --clear_column_family_one_in=0 --column_families=1 --compact_files_one_in=0 --compact_range_one_in=0 --compaction_pri=0 --compaction_readahead_size=0 --compaction_style=0 --compaction_ttl=0 --compress_format_version=2 --compressed_secondary_cache_ratio=0 --compressed_secondary_cache_size=0 --compression_checksum=1 --compression_max_dict_buffer_bytes=0 --compression_max_dict_bytes=0 --compression_parallel_threads=4 --compression_type=zstd --compression_use_zstd_dict_trainer=1 --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --data_block_index_type=0 --db=/dev/shm/rocksdb_test/rocksdb_crashtest_whitebox --db_write_buffer_size=0 --default_temperature=kUnknown --default_write_temperature=kUnknown --delete_obsolete_files_period_micros=0 --delpercent=0 --delrangepercent=0 --destroy_db_initially=1 --detect_filter_construct_corruption=1 --disable_wal=0 --dump_malloc_stats=0 --enable_checksum_handoff=0 --enable_compaction_filter=0 --enable_custom_split_merge=0 --enable_do_not_compress_roles=1 --enable_index_compression=1 --enable_memtable_insert_with_hint_prefix_extractor=0 --enable_pipelined_write=0 --enable_sst_partitioner_factory=0 --enable_thread_tracking=1 --enable_write_thread_adaptive_yield=0 --expected_values_dir=/dev/shm/rocksdb_test/rocksdb_crashtest_expected --fail_if_options_file_error=0 --fifo_allow_compaction=1 --file_checksum_impl=none --fill_cache=0 --flush_one_in=1000 --format_version=5 --get_current_wal_file_one_in=0 --get_live_files_one_in=0 --get_property_one_in=0 --get_sorted_wal_files_one_in=0 --hard_pending_compaction_bytes_limit=274877906944 --high_pri_pool_ratio=0 --index_block_restart_interval=6 --index_shortening=0 --index_type=0 --ingest_external_file_one_in=0 --initial_auto_readahead_size=16384 --iterpercent=0 --key_len_percent_dist=1,30,69 --last_level_temperature=kUnknown --level_compaction_dynamic_level_bytes=1 --lock_wal_one_in=0 --log2_keys_per_lock=10 --log_file_time_to_roll=0 --log_readahead_size=16777216 --long_running_snapshots=0 --low_pri_pool_ratio=0 --lowest_used_cache_tier=0 --manifest_preallocation_size=5120 --manual_wal_flush_one_in=0 --mark_for_compaction_one_file_in=0 --max_auto_readahead_size=0 --max_background_compactions=1 --max_bytes_for_level_base=67108864 --max_key=2500000 --max_key_len=3 --max_log_file_size=0 
--max_manifest_file_size=1073741824 --max_sequential_skip_in_iterations=8 --max_total_wal_size=0 --max_write_batch_group_size_bytes=64 --max_write_buffer_number=10 --max_write_buffer_size_to_maintain=0 --memtable_insert_hint_per_batch=0 --memtable_max_range_deletions=0 --memtable_prefix_bloom_size_ratio=0.5 --memtable_protection_bytes_per_key=1 --memtable_whole_key_filtering=1 --memtablerep=skip_list --metadata_charge_policy=0 --min_write_buffer_number_to_merge=1 --mmap_read=0 --mock_direct_io=True --nooverwritepercent=1 --num_file_reads_for_auto_readahead=0 --num_levels=1 --open_files=-1 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=0 --open_write_fault_one_in=0 --ops_per_thread=3 --optimize_filters_for_hits=1 --optimize_filters_for_memory=1 --optimize_multiget_for_io=0 --paranoid_file_checks=0 --partition_filters=0 --partition_pinning=1 --pause_background_one_in=0 --periodic_compaction_seconds=0 --prefix_size=1 --prefixpercent=0 --prepopulate_block_cache=0 --preserve_internal_time_seconds=3600 --progress_reports=0 --read_amp_bytes_per_bit=0 --read_fault_one_in=0 --readahead_size=16384 --readpercent=0 --recycle_log_file_num=0 --reopen=2 --report_bg_io_stats=1 --sample_for_compression=5 --secondary_cache_fault_one_in=0 --secondary_cache_uri= --skip_stats_update_on_db_open=1 --snapshot_hold_ops=0 --soft_pending_compaction_bytes_limit=68719476736 --sst_file_manager_bytes_per_sec=0 --sst_file_manager_bytes_per_truncate=0 --stats_dump_period_sec=10 --stats_history_buffer_size=1048576 --strict_bytes_per_sync=0 --subcompactions=3 --sync=0 --sync_fault_injection=1 --table_cache_numshardbits=6 --target_file_size_base=16777216 --target_file_size_multiplier=1 --test_batches_snapshots=0 --top_level_index_pinning=0 --unpartitioned_pinning=3 --use_adaptive_mutex=1 --use_adaptive_mutex_lru=0 --use_delta_encoding=1 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_full_merge_v1=0 --use_get_entity=0 --use_merge=0 --use_multi_get_entity=0 --use_multiget=1 --use_put_entity_one_in=0 --use_write_buffer_manager=0 --user_timestamp_size=0 --value_size_mult=32 --verification_only=0 --verify_checksum=1 --verify_checksum_one_in=1000 --verify_compression=0 --verify_db_one_in=100000 --verify_file_checksums_one_in=0 --verify_iterator_with_expected_state_one_in=5 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=0 --wal_compression=zstd --write_buffer_size=33554432 --write_dbid_to_manifest=0 --write_fault_one_in=0 --writepercent=100 Verification failed for column family 0 key 000000000000B9D1000000000000012B000000000000017D (4756691): value_from_db: , value_from_expected: 010000000504070609080B0A0D0C0F0E111013121514171619181B1A1D1C1F1E212023222524272629282B2A2D2C2F2E313033323534373639383B3A3D3C3F3E, msg: Iterator verification: Value not found: NotFound: Verification failed :( ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12556 Test Plan: - New UT - Same stress test command failed before this fix but pass after - CI Reviewed By: ajkr Differential Revision: D56267964 Pulled By: hx235 fbshipit-source-id: af1b7e8769c129f64ba1c7f1ff17102f1239b929 |
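For context, a minimal sketch (local path and option values are illustrative, not taken from the PR) of what an application previously needed to do by hand to avoid losing unsynced WAL data around Close(); the fix described above makes Close() perform an equivalent sync itself.
```
#include <cassert>
#include "rocksdb/db.h"

int main() {
  rocksdb::DB* db = nullptr;
  rocksdb::Options options;
  options.create_if_missing = true;
  options.manual_wal_flush = true;  // writes may sit in an unsynced WAL buffer

  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/wal_sync_demo", &db);
  assert(s.ok());

  s = db->Put(rocksdb::WriteOptions(), "key", "value");
  assert(s.ok());

  // Before the fix above, WAL data left unsynced at Close() could be lost
  // if the machine crashed shortly afterwards; flushing and syncing the WAL
  // explicitly avoided that.
  s = db->FlushWAL(/*sync=*/true);
  assert(s.ok());

  s = db->Close();
  assert(s.ok());
  delete db;
  return 0;
}
```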
|
Changyu Bi | 2eb404de13 |
Print non-ok status for multi_ops_txns_stress test (#12660)
Summary: Currently `assert(s.ok())` does not offer much information to debug. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12660 Test Plan: revert https://github.com/facebook/rocksdb/issues/12639 and run `python3 ./tools/db_crashtest.py blackbox --write_policy write_prepared --recycle_log_file_num=1 --threads=1 --test_multiops_txn` before this PR: ``` Exit Before Killing stdout: stderr: db_stress: db_stress_tool/multi_ops_txns_stress.cc:1529: void rocksdb::MultiOpsTxnsStressTest::PreloadDb(rocksdb::SharedState*, int, uint32_t, uint32_t, uint32_t, uint32_t): Assertion `s.ok()' failed. ``` after this PR: ``` Exit Before Killing stdout: stderr: Verification failed: PreloadDB failed: Invalid argument: WriteOptions::disableWAL option is not supported if DBOptions::recycle_log_file_num > 0 db_stress: db_stress_tool/db_stress_test_base.cc:517: void rocksdb::StressTest::ProcessStatus(rocksdb::SharedState*, std::string, rocksdb::Status, bool) const: Assertion `false' failed. Received signal 6 (Aborted) ``` Reviewed By: ajkr Differential Revision: D57410819 Pulled By: cbi42 fbshipit-source-id: a03c2202c3fd666eb2f58bae24e0c9e3e6ed4265 |
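The pattern of the change above, sketched generically (this is not the actual db_stress code, and the helper name is hypothetical): surface the non-OK status message before failing, so the crash-test log says why the operation failed instead of only showing a bare assertion.
```
#include <cstdio>
#include <cstdlib>
#include "rocksdb/status.h"

// Hypothetical helper: print the non-OK status before terminating,
// rather than relying on assert(s.ok()) alone.
void CheckStatusOrDie(const rocksdb::Status& s, const char* context) {
  if (!s.ok()) {
    std::fprintf(stderr, "Verification failed: %s failed: %s\n", context,
                 s.ToString().c_str());
    std::abort();
  }
}
```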
|
Jay Huh | b4c6956a59 |
Add MultiGetEntity AttributeGroup API to stress test (#12640)
Summary: Continuing from https://github.com/facebook/rocksdb/pull/12605, adding AttributeGroup `MultiGetEntity` API to stress tests. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12640 Test Plan: **AttributeGroup Tests** NonBatchOps ``` python3 tools/db_crashtest.py blackbox --simple --max_key=25000000 --write_buffer_size=4194304 --use_attribute_group=1 --use_put_entity_one_in=1 --use_multi_get=1 ``` BatchOps ``` python3 tools/db_crashtest.py blackbox --test_batches_snapshots=1 --max_key=25000000 --write_buffer_size=4194304 --use_attribute_group=1 --use_put_entity_one_in=1 --use_multi_get=1 ``` CfConsistency Test ``` python3 tools/db_crashtest.py blackbox --cf_consistency --max_key=25000000 --write_buffer_size=4194304 --use_attribute_group=1 --use_put_entity_one_in=1 --use_multi_get=1 ``` **Non-AttributeGroup Tests** NonBatchOps ``` python3 tools/db_crashtest.py blackbox --simple --max_key=25000000 --write_buffer_size=4194304 --use_attribute_group=0 --use_put_entity_one_in=1 --use_multi_get=1 ``` BatchOps ``` python3 tools/db_crashtest.py blackbox --test_batches_snapshots=1 --max_key=25000000 --write_buffer_size=4194304 --use_attribute_group=0 --use_put_entity_one_in=1 --use_multi_get=1 ``` CfConsistency Test ``` python3 tools/db_crashtest.py blackbox --cf_consistency --max_key=25000000 --write_buffer_size=4194304 --use_attribute_group=0 --use_put_entity_one_in=1 --use_multi_get=1 ``` Reviewed By: ltamasi Differential Revision: D57233931 Pulled By: jaykorean fbshipit-source-id: 8cea771ac2e5749050bf5319360c6c5aa476d7d5 |
|
Yu Zhang | 7d9642d876 |
Add logging for read timestamp during VerifyDB (#12638)
Summary: As titled. To help debug some verification failures. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12638 Test Plan: manually tested Reviewed By: ajkr Differential Revision: D57219549 Pulled By: jowlyzhang fbshipit-source-id: 59c05ac85fb1c24449e7394ea04172c855d86420 |
|
Jay Huh | 1f2715d1d2 |
AttributeGroup APIs in stress test - PutEntity and GetEntity (#12605)
Summary: Adding AttributeGroup APIs in stress test. This contains the following changes only. More PRs to follow. - Introduce `use_attribute_group` flag - AttributeGroup `PutEntity()` and `GetEntity()` are now used per `use_attribute_group` flag in BatchOps, NonBatchOps and CfConsistency tests In the next PRs I plan to add - AttributeGroup `MultiGetEntity()` in Stress Test - AttributeGroupIterator in Stress Test (along with CoalescingIterator) Pull Request resolved: https://github.com/facebook/rocksdb/pull/12605 Test Plan: NonBatchOps ``` python3 tools/db_crashtest.py blackbox --simple --max_key=25000000 --write_buffer_size=4194304 --use_attribute_group=1 --use_put_entity_one_in=1 ``` BatchOps ``` python3 tools/db_crashtest.py blackbox --test_batches_snapshots=1 --max_key=25000000 --write_buffer_size=4194304 --use_attribute_group=1 --use_put_entity_one_in=1 ``` CfConsistency Test ``` python3 tools/db_crashtest.py blackbox --cf_consistency --max_key=25000000 --write_buffer_size=4194304 --use_attribute_group=1 --use_put_entity_one_in=1 ``` Reviewed By: ltamasi Differential Revision: D56916768 Pulled By: jaykorean fbshipit-source-id: 8555d9e0d05927740a10e4e8301e44beec59a6f5 |
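For orientation, a rough sketch of the attribute-group flavor of the wide-column API exercised above. The exact types and overloads used here (`AttributeGroups`, `PinnableAttributeGroups`, and the `PutEntity`/`GetEntity` variants that take them) are my reading of the public headers around this release, so treat them as assumptions to verify against `include/rocksdb/db.h` rather than a definitive usage.
```
#include "rocksdb/db.h"

// Assumes `db` is an open DB and `cf1`, `cf2` are valid ColumnFamilyHandle*.
rocksdb::Status WriteAndReadAttributeGroups(rocksdb::DB* db,
                                            rocksdb::ColumnFamilyHandle* cf1,
                                            rocksdb::ColumnFamilyHandle* cf2) {
  // One key whose wide columns are grouped per column family.
  rocksdb::WideColumns cols1{{"attr_a", "1"}, {"attr_b", "2"}};
  rocksdb::WideColumns cols2{{"attr_c", "3"}};
  rocksdb::AttributeGroups groups;
  groups.emplace_back(cf1, cols1);
  groups.emplace_back(cf2, cols2);

  rocksdb::Status s = db->PutEntity(rocksdb::WriteOptions(), "key1", groups);
  if (!s.ok()) {
    return s;
  }

  // Read back one group per column family of interest.
  rocksdb::PinnableAttributeGroups result;
  result.emplace_back(cf1);
  result.emplace_back(cf2);
  return db->GetEntity(rocksdb::ReadOptions(), "key1", &result);
}
```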
|
Hui Xiao | 9bddac0dcf |
Add more public APIs to crash test (#12617)
Summary: **Context/Summary:** As titled. Bonus: found that PromoteL0, when called concurrently with another PromoteL0, will return a non-OK status, so the API comment is clarified accordingly. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12617 Test Plan: CI Reviewed By: ajkr Differential Revision: D56954428 Pulled By: hx235 fbshipit-source-id: 0e056153c515003fd241ffec59b0d8a27529db4c |
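For reference, a minimal sketch of calling the experimental PromoteL0() API mentioned above and surfacing the non-OK status it can return when another PromoteL0() (or a compaction on the same range) runs concurrently. The header path and signature reflect `rocksdb/experimental.h` as I understand it; the target level is illustrative.
```
#include <iostream>
#include "rocksdb/db.h"
#include "rocksdb/experimental.h"

// Assumes `db` is an open DB.
void TryPromoteL0(rocksdb::DB* db) {
  rocksdb::Status s = rocksdb::experimental::PromoteL0(
      db, db->DefaultColumnFamily(), /*target_level=*/1);
  if (!s.ok()) {
    // Per the entry above, a concurrent PromoteL0 (or overlapping L0 files)
    // makes this return a non-OK status rather than silently succeeding.
    std::cerr << "PromoteL0 failed: " << s.ToString() << std::endl;
  }
}
```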
|
Andrew Kryczka | 933ac0e05c |
Fix locking for `ColumnFamilyOptions::inplace_update_support` (#12624)
Summary: In `SaveValue()`, the read lock needs to be obtained before `VerifyEntryChecksum()` because the KV checksum verification reads the entire value metadata+data, which is all mutable when `ColumnFamilyOptions::inplace_update_support == true`. In `MemTable::Update()`, the write lock needs to be obtained before mutating the value metadata (changing the value size) because it can be read concurrently. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12624 Test Plan: ``` $ make COMPILE_WITH_TSAN=1 -j56 db_stress ... $ python3 tools/db_crashtest.py blackbox --simple --max_key=10 --inplace_update_support=1 --interval=10 --allow_concurrent_memtable_write=0 ``` Reviewed By: cbi42 Differential Revision: D57034571 Pulled By: ajkr fbshipit-source-id: 3dddf881ad87923143acdf6bfec12ce47bb13a48 |
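The locking order described above, reduced to a generic sketch. This is not the actual memtable code (which uses RocksDB's internal per-key locks inside SaveValue() and MemTable::Update()); it only illustrates the rule the fix enforces: readers hold the shared lock before touching value bytes that an in-place writer may resize, and writers hold the exclusive lock before resizing them.
```
#include <mutex>
#include <shared_mutex>
#include <string>

// Toy stand-in for an in-place-updatable entry guarded by a per-key lock.
struct InplaceEntry {
  std::shared_mutex lock;
  std::string value;  // size and bytes are both mutable under in-place update
};

// Reader: take the shared lock *before* reading size + bytes for checksum
// verification, mirroring the SaveValue() fix described above.
bool VerifyAndRead(InplaceEntry& entry, std::string* out) {
  std::shared_lock<std::shared_mutex> read_guard(entry.lock);
  // ... verify the KV checksum over entry.value here, then copy it out ...
  *out = entry.value;
  return true;
}

// Writer: take the exclusive lock *before* changing the value size,
// mirroring the MemTable::Update() fix described above.
void UpdateInPlace(InplaceEntry& entry, const std::string& new_value) {
  std::unique_lock<std::shared_mutex> write_guard(entry.lock);
  entry.value = new_value;  // may change the stored size
}
```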
|
Changyu Bi | 5bf2c00a35 |
Clarify manual compaction and file ingestion behavior with FIFO compaction (#12618)
Summary:
For manual compaction, FIFO compaction will always skip key-range overlap checking with SST files. If CompactRange() is called with CompactRangeOptions::change_level=true, a CF with FIFO compaction will now return Status::NotSupported (see the sketch at the end of this entry).
For file ingestion, we will always ingest into L0. Previously, it was possible to ingest files into non-L0 levels with FIFO compaction.
These changes also help to fix [this](
|
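A minimal sketch (illustrative, not taken from the PR) of the new behavior described in the entry above: with FIFO compaction, CompactRange() with change_level=true now fails with NotSupported instead of silently proceeding.
```
#include <cassert>
#include "rocksdb/db.h"

// Assumes `db` is an open DB whose default column family uses
// kCompactionStyleFIFO.
void ManualCompactOnFifo(rocksdb::DB* db) {
  rocksdb::CompactRangeOptions cro;
  cro.change_level = true;  // no longer supported with FIFO compaction
  rocksdb::Status s =
      db->CompactRange(cro, /*begin=*/nullptr, /*end=*/nullptr);
  assert(s.IsNotSupported());

  // Without change_level, manual compaction still runs; per the entry above,
  // key-range overlap checks with existing SST files are skipped for FIFO.
  rocksdb::CompactRangeOptions plain;
  s = db->CompactRange(plain, nullptr, nullptr);
  assert(s.ok());
}
```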
|
Hui Xiao | b312dbe920 |
Remove duplicate inplace_update_support crash/stress test flag (#12610)
Summary: **Context/Summary:** As titled. There were two flags serving the same purpose so removed one of them. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12610 Test Plan: CI Reviewed By: jaykorean, ajkr Differential Revision: D56916119 Pulled By: hx235 fbshipit-source-id: 011140a7945782cc613ca86d4b542db0cf7fb444 |
|
Yu Zhang | 8b3d9e6bfe |
Add TimedPut to stress test (#12559)
Summary: This also updates WriteBatch's protection info to include the write time, since there are several places in the memtable that by default protect the whole value slice. This PR is stacked on https://github.com/facebook/rocksdb/issues/12543 Pull Request resolved: https://github.com/facebook/rocksdb/pull/12559 Reviewed By: pdillinger Differential Revision: D56308285 Pulled By: jowlyzhang fbshipit-source-id: 5524339fe0dd6c918dc940ca2f0657b5f2111c56 |
|
Hui Xiao | 24a35b6e57 |
Add more public APIs to crash/stress test (#12541)
Summary: **Context/Summary:** This PR covers some public DB APIs that are not yet exercised in crash/stress tests but can be added in a straightforward way. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12541 Test Plan: - Locally ran the crash test while heavily stressing these new APIs - CI Reviewed By: jowlyzhang Differential Revision: D56164892 Pulled By: hx235 fbshipit-source-id: 8bb568c3e65aec39d642987033f1d76c52f69bd8 |
|
Jay Huh | dfdc3b158e |
Add offpeak feature to crash test (#12549)
Summary: As title. Add offpeak feature in stress test. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12549 Test Plan: Ran stress test locally with the flag set ``` Running db_stress with pid=701060: ./db_stress ... --daily_offpeak_time_utc=04:00-08:00 ... --periodic_compaction_seconds=10 ... ... KILLED 701060 stdout: Choosing random keys with no overwrite Creating 6250000 locks 2024/04/16-11:38:19 Initializing db_stress RocksDB version : 9.2 Format version : 5 TransactionDB : false Stacked BlobDB : false Read only mode : false Atomic flush : false Manual WAL flush : true Column families : 1 Clear CFs one in : 0 Number of threads : 32 Ops per thread : 100000000 Time to live(sec) : unused Read percentage : 60% Prefix percentage : 0% Write percentage : 35% Delete percentage : 4% Delete range percentage : 1% No overwrite percentage : 1% Iterate percentage : 0% Custom ops percentage : 0% DB-write-buffer-size : 0 Write-buffer-size : 4194304 Iterations : 10 Max key : 25000000 Ratio #ops/#keys : 128.000000 Num times DB reopens : 0 Batches/snapshots : 0 Do update in place : 0 Num keys per lock : 4 Compression : LZ4 Bottommost Compression : DisableOption Checksum type : kxxHash File checksum impl : none Bloom bits / key : 18.000000 Max subcompactions : 4 Use MultiGet : false Use GetEntity : false Use MultiGetEntity : false Verification only : false Memtablerep : skip_list Test kill odd : 0 Periodic Compaction Secs : 10 Daily Offpeak UTC : 04:00-08:00 <<<<<<<<<<<<<<< Newly added Compaction TTL : 0 Compaction Pri : kMinOverlappingRatio Background Purge : 0 Write DB ID to manifest : 0 Max Write Batch Group Size: 16 Use dynamic level : 1 Read fault one in : 0 Write fault one in : 1000 Open metadata write fault one in: 8 Sync fault injection : 0 Best efforts recovery : 0 Fail if OPTIONS file error: 0 User timestamp size bytes : 0 Persist user defined timestamps : 1 WAL compression : zstd Try verify sst unique id : 1 ------------------------------------------------ ``` Reviewed By: hx235 Differential Revision: D56203102 Pulled By: jaykorean fbshipit-source-id: 11a9be7362b3b26940d74d41c8bf4ebac3f03a2d |
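The offpeak window exercised above corresponds to a DB option; below is a minimal sketch of setting it when opening a DB. The path and the window are illustrative; the value follows the UTC "HH:mm-HH:mm" form shown in the stress test output.
```
#include <cassert>
#include "rocksdb/db.h"

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;
  // Declare 04:00-08:00 UTC as off-peak so background maintenance work,
  // such as periodic compactions, can be steered toward that window.
  options.daily_offpeak_time_utc = "04:00-08:00";

  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/offpeak_demo", &db);
  assert(s.ok());
  delete db;
  return 0;
}
```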
|
Hui Xiao | d41e568b1c |
Add inplace_update_support to crash/stress test (#12535)
Summary: **Context/Summary:** `inplace_update_support=true` is not tested in the crash/stress test. Since it's incompatible with snapshots (like compaction_filter), we need to sanitize its value in the presence of snapshot-related options. A minor refactoring is added to centralize such sanitization in db_crashtest.py - see `check_multiget_consistency` and `check_multiget_entity_consistency` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12535 Test Plan: CI Reviewed By: ajkr Differential Revision: D56102978 Pulled By: hx235 fbshipit-source-id: 2e2ab6685a65123b14a321b99f45f60bc6509c6b |
|
Hui Xiao | ef26d68e8d |
Re-enable kAdmPolicyThreeQueue in crash test (#12524)
Summary: Context/Summary: We need an `nvm_sec_cache` when `kAdmPolicyThreeQueue` is used; otherwise a nullptr cache is accessed, causing the segfault seen in https://github.com/facebook/rocksdb/pull/12521 Pull Request resolved: https://github.com/facebook/rocksdb/pull/12524 Test Plan: - Re-enabled `kAdmPolicyThreeQueue` and re-ran the rehearsal stress test that failed before this fix; it passes after Reviewed By: jowlyzhang Differential Revision: D55997093 Pulled By: hx235 fbshipit-source-id: e1c6f1015091b4cff0ce6a3fff981d5dece52a62 |
|
Hui Xiao | 72c1376fcf |
Fix "assertion failed - iter != ROCKSDB_NAMESPACE::OptionsHelper::temperature_to_string.end()" (#12519)
Summary: Context/Summary: For an unknown reason, calling a db_stress common function from the db_stress flag file for temperature-related flags causes weird behavior in some compilations/builds: ``` assertion failed - iter != ROCKSDB_NAMESPACE::OptionsHelper::temperature_to_string.end() ``` For now, we decided not to call that function and instead hard-code the flags' default stress test values. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12519 Test Plan: - Ran a rehearsal stress test with this fix; the weird behavior is gone. Reviewed By: jowlyzhang Differential Revision: D55884693 Pulled By: hx235 fbshipit-source-id: ba5135f5b37a9fa686b3ccae8d3f77e62d6562c9 |
|
Hui Xiao | ad423abbd1 |
Add more missing options in crash test (#12508)
Summary: **Context/Summary:** This is to improve our crash test coverage. Bonus changes: - Added the missing Options string mapping for `CacheTier::kVolatileCompressedTier` - Deprecated the crash test option `enable_tiered_storage`, which mainly existed to set `last_level_temperature`, now covered in the crash test by itself - Intensified `verify_checksum_one_in`/`verify_file_checksums_one_in`, as these combined with the new coverage surface more issues Pull Request resolved: https://github.com/facebook/rocksdb/pull/12508 Test Plan: CI to look out for trivial failures Reviewed By: jowlyzhang Differential Revision: D55768594 Pulled By: hx235 fbshipit-source-id: 9b829da0309a7db3fcdb17992a524dd64498325c |
|
Hui Xiao | d985902ef4 |
Disallow refitting more than 1 file from non-L0 to L0 (#12481)
Summary: **Context/Summary:** We recently discovered that `CompactRange(change_level=true, target_level=0)` can possibly refit more than 1 file to L0. This refitting can cause a read performance regression, as we need to go through every file in L0, corruption in some edge cases, and false-positive corruption caught by the force consistency check. We decided to explicitly disallow such behavior. A related change to OptionChangeMigration(): - When migrating to FIFO with `compaction_options_fifo.max_table_files_size > 0`, RocksDB will [CompactRange() all the to-be-migrated data into a couple of L0 files](https://github.com/facebook/rocksdb/blob/main/utilities/option_change_migration/option_change_migration.cc#L164-L169) to avoid dropping all the data when migration finishes, in case the migrated data is larger than max_table_files_size. This is achieved by first compacting all the data into a couple of non-L0 files and refitting those files from non-L0 to L0 if needed. That way, only some data instead of all data will be dropped immediately after migration to FIFO with a max_table_files_size. - Since this type of refitting behavior is disallowed from now on, we won't do this trick anymore and explicitly state such risk in the API comment. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12481 Test Plan: - New UT - Modified UT Reviewed By: cbi42 Differential Revision: D55351178 Pulled By: hx235 fbshipit-source-id: 9d8854f2f81d7e8aff859c3a4e53b7d688048e80 |
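A minimal sketch of the call pattern that is now restricted, per the entry above: refitting to L0 via CompactRange() only makes sense when at most one file would be moved, and otherwise the call is rejected with a non-OK status. Values and error handling here are illustrative, not from the PR.
```
#include <iostream>
#include "rocksdb/db.h"

// Assumes `db` is an open DB whose data already sits in non-L0 levels.
void RefitToL0(rocksdb::DB* db) {
  rocksdb::CompactRangeOptions cro;
  cro.change_level = true;
  cro.target_level = 0;  // ask for the result files to be refit down to L0

  rocksdb::Status s =
      db->CompactRange(cro, /*begin=*/nullptr, /*end=*/nullptr);
  if (!s.ok()) {
    // After this change, an attempt that would refit more than one file into
    // L0 is rejected instead of producing an L0 with overlapping files.
    std::cerr << "Refit to L0 rejected: " << s.ToString() << std::endl;
  }
}
```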