rocksdb/db_stress_tool
Changyu Bi 9ded9f789f Fix db_stress FaultInjectionTestFS set up before DB open (#11958)
Summary:
We saw frequent stress test failures with error messages like:
```
Verification failed for column family 0 key ...: value_from_db: , value_from_expected: ..., msg: GetEntity verification: Value not found: NotFound:
```
One cause for this is that data in WAL is lost after a crash. We initialize FaultInjectionTestFS to be not direct writable when write_fault_injection is enabled (see code change). This can cause the first WAL created during DB open to be lost if a db_stress is killed before the first WAL is synced. This PR initializes FaultInjectionTestFS to be direct writable. Note that FaultInjectionTestFS will be configured propertly for write fault injection after DB open in `RunStressTestImpl()`. So this change should not affect write fault injection coverage.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11958

Test Plan:
a repro for the above bug:
```
Simulate crash before first WAL is sealed:

 --- a/db_stress_tool/db_stress_driver.cc
+++ b/db_stress_tool/db_stress_driver.cc
@@ -256,6 +256,7 @@ bool RunStressTestImpl(SharedState* shared) {
     fprintf(stderr, "Verification failed :(\n");
     return false;
   }
+  exit(1);
   return true;
 }

./db_stress --clear_column_family_one_in=0 --column_families=1 --preserve_internal_time_seconds=60 --destroy_db_initially=0 --db=/dev/shm/rocksdb_crashtest_blackbox --db_write_buffer_size=2097152 --destroy_db_initially=0 --expected_values_dir=/dev/shm/rocksdb_crashtest_expected --reopen=0 --test_batches_snapshots=0 --threads=1 --ops_per_thread=100 --write_fault_one_in=1000 --sync_fault_injection=0

./db_stress_main  --clear_column_family_one_in=0 --column_families=1 --preserve_internal_time_seconds=60 --destroy_db_initially=0 --db=/dev/shm/rocksdb_crashtest_blackbox --db_write_buffer_size=2097152 --destroy_db_initially=0 --expected_values_dir=/dev/shm/rocksdb_crashtest_expected --reopen=0 --test_batches_snapshots=0 --sync_fault_injection=1
```

Reviewed By: akankshamahajan15

Differential Revision: D50300347

Pulled By: cbi42

fbshipit-source-id: 3a4881d72197f5ece82364382a0100912e16c2d6
2023-10-14 13:33:55 -07:00
..
CMakeLists.txt Add the wide-column aware merge API to the stress tests (#11906) 2023-09-29 08:54:50 -07:00
batched_ops_stress.cc Add the wide-column aware merge API to the stress tests (#11906) 2023-09-29 08:54:50 -07:00
cf_consistency_stress.cc Add the wide-column aware merge API to the stress tests (#11906) 2023-09-29 08:54:50 -07:00
db_stress.cc Disable tiered storage + BlobDB stress test (#10699) 2022-09-19 15:39:31 -07:00
db_stress_common.cc Add TieredCache and compressed cache capacity change to db_stress (#11935) 2023-10-10 13:12:18 -07:00
db_stress_common.h Add TieredCache and compressed cache capacity change to db_stress (#11935) 2023-10-10 13:12:18 -07:00
db_stress_compaction_filter.h Enable compaction filter for db_stress with user-defined timestamp (#10259) 2022-06-27 11:53:09 -07:00
db_stress_driver.cc Add TieredCache and compressed cache capacity change to db_stress (#11935) 2023-10-10 13:12:18 -07:00
db_stress_driver.h fix shared state used after free (#11059) 2023-01-04 19:35:34 -08:00
db_stress_env_wrapper.h Group rocksdb.sst.read.micros stat by different user read IOActivity + misc (#11444) 2023-08-08 17:26:50 -07:00
db_stress_gflags.cc Add TieredCache and compressed cache capacity change to db_stress (#11935) 2023-10-10 13:12:18 -07:00
db_stress_listener.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
db_stress_listener.h Make stopped writes block on recovery (#11879) 2023-10-10 06:29:01 -07:00
db_stress_shared_state.cc Remove ROCKSDB_SUPPORT_THREAD_LOCAL define because it's a part of C++11 (#10015) 2022-05-18 15:25:19 -07:00
db_stress_shared_state.h Added compaction read errors to `db_stress` (#11789) 2023-09-05 10:41:29 -07:00
db_stress_stat.cc Fix Statistics in db_stress (#9260) 2021-12-07 16:24:22 -08:00
db_stress_stat.h Fix Statistics in db_stress (#9260) 2021-12-07 16:24:22 -08:00
db_stress_table_properties_collector.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
db_stress_test_base.cc Enable write fault injection in db_stress (#11924) 2023-10-11 11:28:00 -07:00
db_stress_test_base.h Make stopped writes block on recovery (#11879) 2023-10-10 06:29:01 -07:00
db_stress_tool.cc Fix db_stress FaultInjectionTestFS set up before DB open (#11958) 2023-10-14 13:33:55 -07:00
db_stress_wide_merge_operator.cc Add the wide-column aware merge API to the stress tests (#11906) 2023-09-29 08:54:50 -07:00
db_stress_wide_merge_operator.h Add the wide-column aware merge API to the stress tests (#11906) 2023-09-29 08:54:50 -07:00
expected_state.cc Add helper methods WideColumnsHelper::{Has,Get}DefaultColumn (#11813) 2023-09-11 16:32:32 -07:00
expected_state.h Improve comment of ExpectedValue in db stress (#11456) 2023-05-18 09:44:15 -07:00
expected_value.cc Improve comment of ExpectedValue in db stress (#11456) 2023-05-18 09:44:15 -07:00
expected_value.h Refactor WriteUnpreparedStressTest to be a unit test (#11424) 2023-05-22 12:31:52 -07:00
multi_ops_txns_stress.cc Make stopped writes block on recovery (#11879) 2023-10-10 06:29:01 -07:00
multi_ops_txns_stress.h Group rocksdb.sst.read.micros stat by IOActivity flush and compaction (#11288) 2023-04-21 09:07:18 -07:00
no_batched_ops_stress.cc Do not fail stress test when file ingestion return injected error (#11956) 2023-10-14 10:08:03 -07:00