rocksdb/utilities
Jay Huh b99baa5ae7 Do not add unprep_seqs when WriteImpl() fails in unprepared txn (#12927)
Summary:
With the following scenario, we see assertion failure in write_unprepared_txn

1. The write is the first one in the transaction (`log_number_` is not set yet - https://github.com/facebook/rocksdb/blob/main/utilities/transactions/write_unprepared_txn.cc#L376-L379)
2. `WriteToWAL()` fails inside `db_impl_->WriteImpl()` due to fault injection. `last_log_number_`is 0. https://github.com/facebook/rocksdb/blob/main/utilities/transactions/write_unprepared_txn.cc#L386
3. `prepare_batch_cnt_` is still added to `unprep_seqs_` https://github.com/facebook/rocksdb/blob/main/utilities/transactions/write_unprepared_txn.cc#L395-L398
4. When the transaction gets destructed after failed, it expects `log_number_ > 0` if `unprep_seqs_` is not empty - https://github.com/facebook/rocksdb/blob/main/utilities/transactions/write_unprepared_txn.cc#L54-L55.

Step 3 is wrong. `unprep_seqs_` is for the ordered list of unprep sequence numbers that we have already written to the DB. If `db_impl_->WriteImpl()` failed, `unprep_seqs_` shouldn't be set. `prepare_seq` value would be wrong anyway.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12927

Test Plan:
Following command steadily repros the issue. This no longer fails after the fix
```
./db_stress \
  --WAL_size_limit_MB=0 \
  --WAL_ttl_seconds=60 \
  --acquire_snapshot_one_in=10000 \
  --adaptive_readahead=1 \
  --adm_policy=2 \
  --advise_random_on_open=0 \
  --allow_data_in_errors=True \
  --allow_fallocate=1 \
  --async_io=1 \
  --auto_readahead_size=0 \
  --avoid_flush_during_recovery=0 \
  --avoid_flush_during_shutdown=1 \
  --avoid_unnecessary_blocking_io=0 \
  --backup_max_size=104857600 \
  --backup_one_in=100000 \
  --batch_protection_bytes_per_key=8 \
  --bgerror_resume_retry_interval=100 \
  --block_align=1 \
  --block_protection_bytes_per_key=8 \
  --block_size=16384 \
  --bloom_before_level=7 \
  --bloom_bits=4.1280979878467345 \
  --bottommost_compression_type=none \
  --bottommost_file_compaction_delay=600 \
  --bytes_per_sync=262144 \
  --cache_index_and_filter_blocks=1 \
  --cache_index_and_filter_blocks_with_high_priority=0 \
  --cache_size=33554432 \
  --cache_type=lru_cache \
  --charge_compression_dictionary_building_buffer=1 \
  --charge_file_metadata=1 \
  --charge_filter_construction=1 \
  --charge_table_reader=1 \
  --check_multiget_consistency=0 \
  --check_multiget_entity_consistency=1 \
  --checkpoint_one_in=0 \
  --checksum_type=kXXH3 \
  --clear_column_family_one_in=0 \
  --compact_files_one_in=1000000 \
  --compact_range_one_in=1000000 \
  --compaction_pri=4 \
  --compaction_readahead_size=1048576 \
  --compaction_ttl=0 \
  --compress_format_version=1 \
  --compressed_secondary_cache_ratio=0.0 \
  --compressed_secondary_cache_size=0 \
  --compression_checksum=1 \
  --compression_max_dict_buffer_bytes=4294967295 \
  --compression_max_dict_bytes=16384 \
  --compression_parallel_threads=8 \
  --compression_type=none \
  --compression_use_zstd_dict_trainer=1 \
  --compression_zstd_max_train_bytes=0 \
  --continuous_verification_interval=0 \
  --create_timestamped_snapshot_one_in=0 \
  --daily_offpeak_time_utc=04:00-08:00 \
  --data_block_index_type=0 \
  --db=$db \
  --db_write_buffer_size=8388608 \
  --default_temperature=kHot \
  --default_write_temperature=kUnknown \
  --delete_obsolete_files_period_micros=21600000000 \
  --delpercent=5 \
  --delrangepercent=0 \
  --destroy_db_initially=0 \
  --detect_filter_construct_corruption=0 \
  --disable_file_deletions_one_in=10000 \
  --disable_manual_compaction_one_in=10000 \
  --disable_wal=0 \
  --dump_malloc_stats=1 \
  --enable_checksum_handoff=1 \
  --enable_compaction_filter=0 \
  --enable_custom_split_merge=1 \
  --enable_do_not_compress_roles=0 \
  --enable_index_compression=1 \
  --enable_memtable_insert_with_hint_prefix_extractor=0 \
  --enable_pipelined_write=0 \
  --enable_sst_partitioner_factory=0 \
  --enable_thread_tracking=1 \
  --enable_write_thread_adaptive_yield=1 \
  --error_recovery_with_no_fault_injection=1 \
  --exclude_wal_from_write_fault_injection=0 \
  --expected_values_dir=$exp \
  --fail_if_options_file_error=0 \
  --fifo_allow_compaction=1 \
  --file_checksum_impl=crc32c \
  --fill_cache=1 \
  --flush_one_in=1000 \
  --format_version=2 \
  --get_all_column_family_metadata_one_in=1000000 \
  --get_current_wal_file_one_in=0 \
  --get_live_files_apis_one_in=1000000 \
  --get_properties_of_all_tables_one_in=100000 \
  --get_property_one_in=1000000 \
  --get_sorted_wal_files_one_in=0 \
  --hard_pending_compaction_bytes_limit=274877906944 \
  --high_pri_pool_ratio=0 \
  --index_block_restart_interval=15 \
  --index_shortening=0 \
  --index_type=2 \
  --ingest_external_file_one_in=0 \
  --initial_auto_readahead_size=524288 \
  --inplace_update_support=0 \
  --iterpercent=10 \
  --key_len_percent_dist=1,30,69 \
  --key_may_exist_one_in=100000 \
  --kill_random_test=888887 \
  --last_level_temperature=kCold \
  --level_compaction_dynamic_level_bytes=1 \
  --lock_wal_one_in=1000000 \
  --log2_keys_per_lock=10 \
  --log_file_time_to_roll=0 \
  --log_readahead_size=0 \
  --long_running_snapshots=1 \
  --low_pri_pool_ratio=0 \
  --lowest_used_cache_tier=1 \
  --manifest_preallocation_size=5120 \
  --manual_wal_flush_one_in=0 \
  --mark_for_compaction_one_file_in=10 \
  --max_auto_readahead_size=0 \
  --max_background_compactions=20 \
  --max_bytes_for_level_base=10485760 \
  --max_key=100000 \
  --max_key_len=3 \
  --max_log_file_size=1048576 \
  --max_manifest_file_size=1073741824 \
  --max_sequential_skip_in_iterations=2 \
  --max_total_wal_size=0 \
  --max_write_batch_group_size_bytes=16 \
  --max_write_buffer_number=10 \
  --max_write_buffer_size_to_maintain=0 \
  --memtable_insert_hint_per_batch=0 \
  --memtable_max_range_deletions=1000 \
  --memtable_prefix_bloom_size_ratio=0.1 \
  --memtable_protection_bytes_per_key=8 \
  --memtable_whole_key_filtering=1 \
  --memtablerep=skip_list \
  --metadata_charge_policy=0 \
  --metadata_read_fault_one_in=32 \
  --metadata_write_fault_one_in=0 \
  --min_write_buffer_number_to_merge=2 \
  --mmap_read=1 \
  --mock_direct_io=False \
  --nooverwritepercent=1 \
  --num_file_reads_for_auto_readahead=2 \
  --open_files=-1 \
  --open_metadata_read_fault_one_in=0 \
  --open_metadata_write_fault_one_in=0 \
  --open_read_fault_one_in=0 \
  --open_write_fault_one_in=0 \
  --ops_per_thread=20000000 \
  --optimize_filters_for_hits=1 \
  --optimize_filters_for_memory=0 \
  --optimize_multiget_for_io=0 \
  --paranoid_file_checks=1 \
  --partition_filters=0 \
  --partition_pinning=0 \
  --pause_background_one_in=1000000 \
  --periodic_compaction_seconds=1 \
  --prefix_size=5 \
  --prefixpercent=5 \
  --prepopulate_block_cache=0 \
  --preserve_internal_time_seconds=60 \
  --progress_reports=0 \
  --promote_l0_one_in=0 \
  --read_amp_bytes_per_bit=32 \
  --read_fault_one_in=1000 \
  --readahead_size=0 \
  --readpercent=45 \
  --recycle_log_file_num=1 \
  --reopen=20 \
  --report_bg_io_stats=1 \
  --reset_stats_one_in=10000 \
  --sample_for_compression=0 \
  --secondary_cache_fault_one_in=0 \
  --sync=0 \
  --sync_fault_injection=0 \
  --table_cache_numshardbits=6 \
  --target_file_size_base=524288 \
  --target_file_size_multiplier=2 \
  --test_batches_snapshots=0 \
  --top_level_index_pinning=0 \
  --txn_write_policy=2 \
  --uncache_aggressiveness=126 \
  --universal_max_read_amp=4 \
  --unordered_write=0 \
  --unpartitioned_pinning=1 \
  --use_adaptive_mutex=0 \
  --use_adaptive_mutex_lru=1 \
  --use_attribute_group=1 \
  --use_delta_encoding=1 \
  --use_direct_io_for_flush_and_compaction=0 \
  --use_direct_reads=0 \
  --use_full_merge_v1=0 \
  --use_get_entity=0 \
  --use_merge=1 \
  --use_multi_cf_iterator=0 \
  --use_multi_get_entity=0 \
  --use_multiget=1 \
  --use_optimistic_txn=0 \
  --use_put_entity_one_in=0 \
  --use_timed_put_one_in=0 \
  --use_txn=1 \
  --use_write_buffer_manager=0 \
  --user_timestamp_size=0 \
  --value_size_mult=32 \
  --verification_only=0 \
  --verify_checksum=1 \
  --verify_checksum_one_in=1000000 \
  --verify_compression=0 \
  --verify_db_one_in=10000 \
  --verify_file_checksums_one_in=1000000 \
  --verify_iterator_with_expected_state_one_in=5 \
  --verify_sst_unique_id_in_manifest=1 \
  --wal_bytes_per_sync=0 \
  --wal_compression=none \
  --write_buffer_size=4194304 \
  --write_dbid_to_manifest=1 \
  --write_fault_one_in=128 \
  --writepercent=35
```

Reviewed By: cbi42

Differential Revision: D61048774

Pulled By: jaykorean

fbshipit-source-id: 22200d55fd0b22b68732b12516e681a6c6e2c601
2024-08-15 09:16:29 -07:00
..
agg_merge Run internal cpp modernizer on RocksDB repo (#12398) 2024-03-04 10:08:32 -08:00
backup Avoid unnecessary work in internal calls to GetSortedWalFiles (#12831) 2024-07-01 23:29:02 -07:00
blob_db Fix file deletions in DestroyDB not rate limited (#12891) 2024-08-02 19:31:55 -07:00
cassandra Run internal cpp modernizer on RocksDB repo (#12398) 2024-03-04 10:08:32 -08:00
checkpoint Fix failure to clean the temporary directory due to NotFound in crash test checkpoint creation (#12919) 2024-08-08 15:37:19 -07:00
compaction_filters Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
convenience Run clang-format on utilities/ (except utilities/transactions/) (#10853) 2022-10-24 16:38:09 -07:00
leveldb_options Put Cache and CacheWrapper in new public header (#11192) 2023-02-09 12:12:02 -08:00
memory Run internal cpp modernizer on RocksDB repo (#12398) 2024-03-04 10:08:32 -08:00
merge_operators Run internal cpp modernizer on RocksDB repo (#12398) 2024-03-04 10:08:32 -08:00
option_change_migration Disallow refitting more than 1 file from non-L0 to L0 (#12481) 2024-03-29 10:52:36 -07:00
options Run internal cpp modernizer on RocksDB repo (#12398) 2024-03-04 10:08:32 -08:00
persistent_cache Run internal cpp modernizer on RocksDB repo (#12398) 2024-03-04 10:08:32 -08:00
simulator_cache Run internal cpp modernizer on RocksDB repo (#12398) 2024-03-04 10:08:32 -08:00
table_properties_collectors Remove unreachable code (#12846) 2024-07-09 09:24:43 -07:00
trace Prefer static_cast in place of most reinterpret_cast (#12308) 2024-02-07 10:44:11 -08:00
transactions Do not add unprep_seqs when WriteImpl() fails in unprepared txn (#12927) 2024-08-15 09:16:29 -07:00
ttl Run internal cpp modernizer on RocksDB repo (#12398) 2024-03-04 10:08:32 -08:00
write_batch_with_index Fix compile errors in C++23 (#12106) 2024-05-28 15:33:57 -07:00
cache_dump_load.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
cache_dump_load_impl.cc Add dump all keys for cache dumper impl (#12500) 2024-04-12 10:47:13 -07:00
cache_dump_load_impl.h Add dump all keys for cache dumper impl (#12500) 2024-04-12 10:47:13 -07:00
compaction_filters.cc Remove FactoryFunc from LoadXXXObject (#11203) 2023-02-17 12:54:07 -08:00
counted_fs.cc Fix serious FSDirectory use-after-Close bug (missing fsync) (#10460) 2022-08-02 10:54:32 -07:00
counted_fs.h Explicitly closing all directory file descriptors (#10049) 2022-06-01 18:03:34 -07:00
debug.cc Add timestamp support in dump_wal/dump/idump (#12690) 2024-05-23 20:26:57 -07:00
env_mirror.cc Run internal cpp modernizer on RocksDB repo (#12398) 2024-03-04 10:08:32 -08:00
env_mirror_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
env_timed.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
env_timed.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
env_timed_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
fault_injection_env.cc Run internal cpp modernizer on RocksDB repo (#12398) 2024-03-04 10:08:32 -08:00
fault_injection_env.h Remove extra semi colon from internal_repo_rocksdb/repo/util/filelock_test.cc 2024-03-19 16:17:57 -07:00
fault_injection_fs.cc Handle injected write error after successful WAL write in crash test + misc (#12838) 2024-07-29 13:51:49 -07:00
fault_injection_fs.h Handle injected write error after successful WAL write in crash test + misc (#12838) 2024-07-29 13:51:49 -07:00
fault_injection_secondary_cache.cc Add some compressed and tiered secondary cache stats (#12150) 2023-12-15 11:34:08 -08:00
fault_injection_secondary_cache.h Remove 'virtual' when implied by 'override' (#12319) 2024-01-31 13:14:42 -08:00
memory_allocators.h Major Cache refactoring, CPU efficiency improvement (#10975) 2023-01-11 14:20:40 -08:00
merge_operators.cc Remove FactoryFunc from LoadXXXObject (#11203) 2023-02-17 12:54:07 -08:00
merge_operators.h Run clang-format on utilities/ (except utilities/transactions/) (#10853) 2022-10-24 16:38:09 -07:00
object_registry.cc Run internal cpp modernizer on RocksDB repo (#12398) 2024-03-04 10:08:32 -08:00
object_registry_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
types_util.cc Add support in SstFileReader to get a raw table iterator (#12385) 2024-04-02 21:23:06 -07:00
types_util_test.cc Add support in SstFileReader to get a raw table iterator (#12385) 2024-04-02 21:23:06 -07:00
util_merge_operators_test.cc Run internal cpp modernizer on RocksDB repo (#12398) 2024-03-04 10:08:32 -08:00
wal_filter.cc Remove FactoryFunc from LoadXXXObject (#11203) 2023-02-17 12:54:07 -08:00