rocksdb/tools
Hui Xiao 0d93c8a6ca Decouple sync fault and write injection in FaultInjectionTestFS & fix tracing issue under WAL write error injection (#12797)
Summary:
**Context/Summary:**

After injecting write errors into the WAL, we started to see crash-recovery verification failures in prefix recovery. That's because the current tracing implementation traces every write before it is written to the WAL, even when that WAL write can fail under write error injection. As a consequence, the traced writes in the trace files no longer correspond to the write sequence numbers, e.g., there are more traced writes than sequence numbers actually assigned to successful writes. Therefore b4a84efb4e/db_stress_tool/expected_state.cc (L674) won't restore the ExpectedState to the correct sequence number we want.
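A toy model of the mismatch may help. The sketch below is hypothetical Python, not RocksDB code, and it assumes as a simplification that restoring the expected state replays one traced write per assigned sequence number. Because every write is traced before the WAL write is attempted, a failed WAL write still leaves a trace record but consumes no sequence number, so the replay ends on the wrong write.

```python
# Hypothetical model, not RocksDB code: writes are traced *before* the WAL
# write, but sequence numbers are assigned only to writes that succeed.

def run_writes(writes, wal_fails):
    """Issue (key, value) writes in order; indexes in wal_fails hit an
    injected WAL write error. Returns (trace, actual_state, last_seqno)."""
    trace, state, seqno = [], {}, 0
    for i, (key, value) in enumerate(writes):
        trace.append((key, value))  # traced unconditionally, pre-WAL
        if i in wal_fails:
            continue                # failed write: no seqno, no state change
        seqno += 1
        state[key] = value
    return trace, state, seqno


def restore_from_trace(trace, last_seqno):
    """Simplified restore: replay one traced write per sequence number."""
    state = {}
    for key, value in trace[:last_seqno]:
        state[key] = value
    return state


trace, actual, last_seqno = run_writes([("a", 1), ("b", 2), ("a", 3)], wal_fails={1})
restored = restore_from_trace(trace, last_seqno)
# 3 traced writes but only 2 assigned seqnos, so the replay lands on
# {"a": 1, "b": 2} instead of the actual {"a": 3}.
```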

Ideally, we should have a prepare-commit mechanism for tracing, just like our ExpectedState, so we can ignore a traced write if the write later fails. But for now, to keep things simple, we just don't inject WAL write errors (nor metadata write errors, since those can fail a write when syncing the WAL directory fails).
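The prepare-commit idea mentioned above (not implemented in this PR) could look roughly like the following hypothetical, single-threaded Python sketch; the class and method names are illustrative and are not RocksDB's tracer API. A record is staged before the WAL write and appended to the trace only if that write succeeds, so the trace length matches the number of assigned sequence numbers again.

```python
# Hypothetical, single-threaded sketch; not RocksDB's tracer API.
class PrepareCommitTracer:
    def __init__(self):
        self.trace = []       # committed records only
        self._pending = None  # at most one staged record in this model

    def prepare(self, record):
        self._pending = record

    def commit(self):
        assert self._pending is not None, "commit() without prepare()"
        self.trace.append(self._pending)
        self._pending = None

    def rollback(self):
        self._pending = None  # drop the staged record of a failed write


def run_writes(writes, wal_fails):
    """Issue writes in order; indexes in wal_fails hit an injected WAL error.
    Returns (trace, last_seqno)."""
    tracer = PrepareCommitTracer()
    seqno = 0
    for i, record in enumerate(writes):
        tracer.prepare(record)  # stage before the WAL write
        if i in wal_fails:
            tracer.rollback()   # WAL write failed: not traced
            continue
        seqno += 1              # WAL write succeeded: assign seqno
        tracer.commit()
    return tracer.trace, seqno


trace, last_seqno = run_writes([("a", 1), ("b", 2), ("a", 3)], wal_fails={1})
```

A real implementation would also have to handle concurrent writers staging records at the same time; this model deliberately keeps a single pending slot.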

To do so, we need to be able to exclude the WAL from write error injection while still allowing sync fault injection on it, so that it keeps its original sync fault testing coverage. This prompts us to decouple sync fault injection from write error injection in FaultInjectionTestFS, which is the main change in this PR.

So now `FaultInjectionTestFS` works as follows:
- If `direct_writable` is true, `FaultInjectionTestFS` is bypassed for writable files.
- Otherwise, `FaultInjectionTestFS` can buffer data for sync fault injection (if `inject_unsynced_data_loss_ == true`; global settings) and/or inject write errors (via `MaybeInjectThreadLocalError()`; thread-local settings). The WAL file can optionally be excluded from write error injection.
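The decision flow in the list above can be modeled as below. This is a hypothetical Python sketch, not the `FaultInjectionTestFS` implementation: the class, field, and helper names are illustrative stand-ins. Global settings live on a shared object, while the write-error rate lives in `threading.local()` storage to mirror the global vs. thread-local split.

```python
import random
import threading
from dataclasses import dataclass


@dataclass
class GlobalFaultSettings:
    # Hypothetical stand-ins for the global settings named in the text.
    direct_writable: bool = False            # bypass fault injection entirely
    inject_unsynced_data_loss: bool = False  # buffer unsynced data (sync faults)
    exclude_wal_from_write_error: bool = False


_thread_local = threading.local()  # per-thread write-error settings


def set_thread_write_error_one_in(n):
    """Hypothetical helper: inject a write error in roughly 1/n writes (0 = off)."""
    _thread_local.write_error_one_in = n


class ModelWritableFile:
    def __init__(self, settings, is_wal, rng):
        self.settings = settings
        self.is_wal = is_wal
        self.rng = rng
        self.persisted = bytearray()  # survives a simulated crash
        self.unsynced = bytearray()   # dropped by a simulated crash

    def _maybe_inject_write_error(self):
        if self.is_wal and self.settings.exclude_wal_from_write_error:
            return False
        n = getattr(_thread_local, "write_error_one_in", 0)
        return n > 0 and self.rng.randrange(n) == 0

    def append(self, data):
        if self.settings.direct_writable:
            self.persisted += data  # bypassed: no sync or write faults
            return "ok"
        if self._maybe_inject_write_error():
            return "io_error"
        if self.settings.inject_unsynced_data_loss:
            self.unsynced += data   # held until sync()
        else:
            self.persisted += data
        return "ok"

    def sync(self):
        self.persisted += self.unsynced
        self.unsynced.clear()

    def crash(self):
        self.unsynced.clear()       # unsynced data is lost
```

For example, with a thread-local rate of 1 and `exclude_wal_from_write_error=True`, a model WAL file never returns `"io_error"` but still loses unsynced appends on `crash()`, while a non-WAL file fails every append.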

Bonus: better naming of relevant variables

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12797

Test Plan:
- The following command failed before this fix and passes after it:
```
python3 tools/db_crashtest.py --simple blackbox \
    --interval=5 \
    --preserve_unverified_changes=1 \
    --threads=32 \
    --disable_auto_compactions=1 \
    --WAL_size_limit_MB=0 --WAL_ttl_seconds=0 --acquire_snapshot_one_in=0 --adaptive_readahead=0 --adm_policy=0 --advise_random_on_open=1 --allow_concurrent_memtable_write=0 --allow_data_in_errors=True --allow_fallocate=1 --async_io=0 --auto_readahead_size=0 --avoid_flush_during_recovery=1 --avoid_flush_during_shutdown=0 --avoid_unnecessary_blocking_io=0 --backup_max_size=104857600 --backup_one_in=0 --batch_protection_bytes_per_key=0 --bgerror_resume_retry_interval=1000000 --block_align=0 --block_protection_bytes_per_key=4 --block_size=16384 --bloom_before_level=2147483646 --bloom_bits=3.2003682301518492 --bottommost_compression_type=zlib --bottommost_file_compaction_delay=600 --bytes_per_sync=0 --cache_index_and_filter_blocks=1 --cache_index_and_filter_blocks_with_high_priority=1 --cache_size=33554432 --cache_type=fixed_hyper_clock_cache --charge_compression_dictionary_building_buffer=0 --charge_file_metadata=0 --charge_filter_construction=0 --charge_table_reader=1 --check_multiget_consistency=0 --check_multiget_entity_consistency=0 --checkpoint_one_in=0 --checksum_type=kxxHash64 --clear_column_family_one_in=0 --column_families=1 --compact_files_one_in=0 --compact_range_one_in=0 --compaction_pri=2 --compaction_readahead_size=0 --compaction_ttl=0 --compress_format_version=1 --compressed_secondary_cache_size=16777216 --compression_checksum=1 --compression_max_dict_buffer_bytes=549755813887 --compression_max_dict_bytes=16384 --compression_parallel_threads=1 --compression_type=none --compression_use_zstd_dict_trainer=1 --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --daily_offpeak_time_utc=00:00-23:59 --data_block_index_type=0 \
    --db_write_buffer_size=0 --delete_obsolete_files_period_micros=0 --delpercent=0 --delrangepercent=0 --destroy_db_initially=0 --detect_filter_construct_corruption=0 --disable_file_deletions_one_in=0 --disable_manual_compaction_one_in=0 --disable_wal=0 --dump_malloc_stats=0 --enable_checksum_handoff=0 --enable_compaction_filter=0 --enable_custom_split_merge=0 --enable_do_not_compress_roles=1 --enable_index_compression=0 --enable_memtable_insert_with_hint_prefix_extractor=0 --enable_pipelined_write=0 --enable_sst_partitioner_factory=0 --enable_thread_tracking=0 --enable_write_thread_adaptive_yield=0 --error_recovery_with_no_fault_injection=0 --fail_if_options_file_error=0 --fifo_allow_compaction=1 --file_checksum_impl=xxh64 --fill_cache=0 --flush_one_in=100 --format_version=4 --get_all_column_family_metadata_one_in=0 --get_current_wal_file_one_in=0 --get_live_files_apis_one_in=0 --get_properties_of_all_tables_one_in=0 --get_property_one_in=0 --get_sorted_wal_files_one_in=0 --hard_pending_compaction_bytes_limit=274877906944 --high_pri_pool_ratio=0.5 --index_block_restart_interval=9 --index_shortening=1 --index_type=0 --ingest_external_file_one_in=0 --initial_auto_readahead_size=0 --inplace_update_support=0 --iterpercent=0 --key_len_percent_dist=1,30,69 --key_may_exist_one_in=0 --last_level_temperature=kUnknown --level_compaction_dynamic_level_bytes=1 --lock_wal_one_in=0 --log2_keys_per_lock=10 --log_file_time_to_roll=0 --log_readahead_size=16777216 --long_running_snapshots=0 --low_pri_pool_ratio=0 --lowest_used_cache_tier=2 --manifest_preallocation_size=0 --manual_wal_flush_one_in=0 --mark_for_compaction_one_file_in=0 --max_auto_readahead_size=524288 --max_background_compactions=1 --max_bytes_for_level_base=67108864 --max_key=1000 --max_key_len=3 --memtable_insert_hint_per_batch=0 --memtable_max_range_deletions=0 --memtable_prefix_bloom_size_ratio=0.5 --memtable_protection_bytes_per_key=8 --memtable_whole_key_filtering=0 --memtablerep=skip_list \
    --metadata_charge_policy=0 --metadata_read_fault_one_in=0 --metadata_write_fault_one_in=0 --min_write_buffer_number_to_merge=1 --mmap_read=0 --mock_direct_io=False --nooverwritepercent=1 --num_file_reads_for_auto_readahead=0 --open_files=-1 --open_metadata_read_fault_one_in=0 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=0 --open_write_fault_one_in=0 --ops_per_thread=20000000 \
    --optimize_filters_for_hits=1 --optimize_filters_for_memory=1 --optimize_multiget_for_io=0 --paranoid_file_checks=1 --partition_filters=0 --partition_pinning=3 --pause_background_one_in=0 --periodic_compaction_seconds=0 --prefix_size=1 --prefixpercent=0 --prepopulate_block_cache=0 --preserve_internal_time_seconds=0 --progress_reports=0 --promote_l0_one_in=0 --read_amp_bytes_per_bit=0 --read_fault_one_in=0 --readahead_size=0 --readpercent=0 --recycle_log_file_num=0 --reopen=0 --report_bg_io_stats=0 --reset_stats_one_in=1000000 --sample_for_compression=5 --secondary_cache_fault_one_in=0 --secondary_cache_uri= --skip_stats_update_on_db_open=0 --snapshot_hold_ops=100000 --soft_pending_compaction_bytes_limit=68719476736 --sqfc_name=bar --sqfc_version=1 --sst_file_manager_bytes_per_sec=0 --sst_file_manager_bytes_per_truncate=0 --stats_dump_period_sec=10 --stats_history_buffer_size=0 --strict_bytes_per_sync=0 --subcompactions=1 --sync=0 --sync_fault_injection=1 --table_cache_numshardbits=0 --target_file_size_base=16777216 --target_file_size_multiplier=1 --test_batches_snapshots=0 --top_level_index_pinning=3 --uncache_aggressiveness=9890 --universal_max_read_amp=-1 --unpartitioned_pinning=3 --use_adaptive_mutex=0 --use_adaptive_mutex_lru=1 --use_attribute_group=0 --use_delta_encoding=0 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_full_merge_v1=0 --use_get_entity=0 --use_merge=0 --use_multi_cf_iterator=0 --use_multi_get_entity=0 --use_multiget=0 --use_put_entity_one_in=0 --use_sqfc_for_range_queries=0 --use_timed_put_one_in=0 --use_write_buffer_manager=0 --user_timestamp_size=0 --value_size_mult=32 --verification_only=0 --verify_checksum=0 --verify_checksum_one_in=0 --verify_compression=1 --verify_db_one_in=0 --verify_file_checksums_one_in=0 --verify_iterator_with_expected_state_one_in=5 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=0 --wal_compression=zstd --write_buffer_size=335544320 --write_dbid_to_manifest=1 \
    --write_fault_one_in=100 --writepercent=100

```
- CI

Reviewed By: cbi42

Differential Revision: D58917145

Pulled By: hx235

fbshipit-source-id: b6397036bea035a92341c2b05fb01872db2153d7
2024-06-26 14:56:35 -07:00
advisor Fix lint issues after enable BLACK (#10717) 2022-09-21 13:37:51 -07:00
block_cache_analyzer Block cache analyzer: Calculate miss ratio for each caller (#10823) 2024-01-10 14:02:14 -08:00
dump internal_repo_rocksdb (435146444452818992) (#12115) 2023-12-01 11:15:17 -08:00
CMakeLists.txt Mark dependencies as PRIVATE and fix missing dependencies in tools. (#6790) 2020-05-12 21:07:55 -07:00
Dockerfile
analyze_txn_stress_test.sh
auto_sanity_test.sh
backup_db.sh Revamp check_format_compatible.sh (#8012) 2021-03-02 11:42:27 -08:00
benchmark.sh optimize file size statistics in benchmark script (#12363) 2024-02-21 15:45:18 -08:00
benchmark_ci.py Remove NUMA setting for benchmark-linux (#11180) 2023-02-02 15:15:09 -08:00
benchmark_compare.sh Fix file modes (#10815) 2022-10-13 09:00:37 -07:00
benchmark_leveldb.sh
blob_dump.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
check_all_python.py Enable BLACK for internal_repo_rocksdb (#10710) 2022-09-20 17:47:52 -07:00
check_format_compatible.sh Update main branch for 9.4 release (#12802) 2024-06-24 11:53:05 -07:00
db_bench.cc Add (& fix) some simple source code checks (#8821) 2021-09-07 21:19:27 -07:00
db_bench_tool.cc Support pro-actively erasing obsolete block cache entries (#12694) 2024-06-07 08:57:11 -07:00
db_bench_tool_test.cc Group SST write in flush, compaction and db open with new stats (#11910) 2023-12-29 15:29:23 -08:00
db_crashtest.py Decouple sync fault and write injection in FaultInjectionTestFS & fix tracing issue under WAL write error injection (#12797) 2024-06-26 14:56:35 -07:00
db_repl_stress.cc Prefer static_cast in place of most reinterpret_cast (#12308) 2024-02-07 10:44:11 -08:00
db_sanity_test.cc Remove 'virtual' when implied by 'override' (#12319) 2024-01-31 13:14:42 -08:00
dbench_monitor
generate_random_db.sh
ingest_external_sst.sh
io_tracer_parser.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
io_tracer_parser_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
io_tracer_parser_tool.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
io_tracer_parser_tool.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
ldb.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
ldb_cmd.cc Add user timestamp support into interactive query command (#12716) 2024-05-30 17:23:38 -07:00
ldb_cmd_impl.h Add LDB command and option for follower instances (#12682) 2024-05-28 23:21:32 -07:00
ldb_cmd_test.cc Remove `bottommost_temperature` (#12389) 2024-02-27 14:48:00 -08:00
ldb_test.py Multiget LDB Followup (#12332) 2024-02-05 20:11:35 -08:00
ldb_tool.cc Add LDB command and option for follower instances (#12682) 2024-05-28 23:21:32 -07:00
pflag
reduce_levels_test.cc Make option `level_compaction_dynamic_level_bytes` true by default (#11525) 2023-06-15 21:12:39 -07:00
regression_test.sh Fix regression script for async_io benchmarks (#11462) 2023-05-22 15:32:12 -07:00
restore_db.sh Revamp check_format_compatible.sh (#8012) 2021-03-02 11:42:27 -08:00
rocksdb_dump_test.sh
run_blob_bench.sh add exe and script path check (#11621) 2023-07-19 12:05:24 -07:00
run_flash_bench.sh
run_leveldb.sh
sample-dump.dmp
simulated_hybrid_file_system.cc Group SST write in flush, compaction and db open with new stats (#11910) 2023-12-29 15:29:23 -08:00
simulated_hybrid_file_system.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
sst_dump.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
sst_dump_test.cc Allow SstFileReader to verify number of entries in SST files (#12418) 2024-03-12 11:05:20 -07:00
sst_dump_tool.cc Augment sst_dump tool to verify num_entries in table property (#12322) 2024-02-01 14:35:03 -08:00
trace_analyzer.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
trace_analyzer_test.cc internal_repo_rocksdb (435146444452818992) (#12115) 2023-12-01 11:15:17 -08:00
trace_analyzer_tool.cc Trace analyzer: replace number with enumeration type (#10827) 2023-12-27 10:38:53 -08:00
trace_analyzer_tool.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
verify_random_db.sh Fix some bugs in verify_random_db.sh (#10112) 2022-06-03 16:35:13 -07:00
write_external_sst.sh Revamp check_format_compatible.sh (#8012) 2021-03-02 11:42:27 -08:00
write_stress.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
write_stress_runner.py Enable BLACK for internal_repo_rocksdb (#10710) 2022-09-20 17:47:52 -07:00