rocksdb/db_stress_tool
Changyu Bi defd97bc9d Add an option to verify memtable key order during reads (#12889)
Summary:
add a new CF option `paranoid_memory_checks` that allows additional data integrity validations during read/scan. Currently, skiplist-based memtable will validate the order of keys visited. Further data validation can be added in different layers. The option will be opt-in due to performance overhead.

The motivation for this feature is for services where data correctness is critical and want to detect in-memory corruption earlier. For a corrupted memtable key, this feature can help to detect it during during reads instead of during flush with existing protections (OutputValidator that verifies key order or per kv checksum). See internally linked task for more context.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12889

Test Plan:
* new unit test added for paranoid_memory_checks=true.
* existing unit test for paranoid_memory_checks=false.
* enable in stress test.

Performance Benchmark: we check for performance regression in read path where data is in memtable only. For each benchmark, the script was run at the same time for main and this PR:
* Memtable-only randomread ops/sec:
```
(for I in $(seq 1 50);do ./db_bench --benchmarks=fillseq,readrandom --write_buffer_size=268435456 --writes=250000 --num=250000 --reads=500000  --seed=1723056275 2>&1 | grep "readrandom"; done;) | awk '{ t += $5; c++; print } END { print 1.0 * t / c }';

Main: 608146
PR with paranoid_memory_checks=false: 607727 (- %0.07)
PR with paranoid_memory_checks=true: 521889 (-%14.2)
```

* Memtable-only sequential scan ops/sec:
```
(for I in $(seq 1 50); do ./db_bench--benchmarks=fillseq,readseq[-X10] --write_buffer_size=268435456 --num=1000000  --seed=1723056275 2>1 | grep "\[AVG 10 runs\]"; done;) | awk '{ t += $6; c++; print; } END { printf "%.0f\n", 1.0 * t / c }';

Main: 9180077
PR with paranoid_memory_checks=false: 9536241 (+%3.8)
PR with paranoid_memory_checks=true: 7653934 (-%16.6)
```

* Memtable-only reverse scan ops/sec:
```
(for I in $(seq 1 20); do ./db_bench --benchmarks=fillseq,readreverse[-X10] --write_buffer_size=268435456 --num=1000000  --seed=1723056275 2>1 | grep "\[AVG 10 runs\]"; done;) | awk '{ t += $6; c++; print; } END { printf "%.0f\n", 1.0 * t / c }';

 Main: 1285719
 PR with integrity_checks=false: 1431626 (+%11.3)
 PR with integrity_checks=true: 811031 (-%36.9)
```

The `readrandom` benchmark shows no regression. The scanning benchmarks show improvement that I can't explain.

Reviewed By: pdillinger

Differential Revision: D60414267

Pulled By: cbi42

fbshipit-source-id: a70b0cbeea131f1a249a5f78f9dc3a62dacfaa91
2024-08-19 13:53:25 -07:00
..
batched_ops_stress.cc Add experimental range filters to stress/crash test (#12769) 2024-06-18 16:16:09 -07:00
cf_consistency_stress.cc Remove unnecessary injected error logging in crash test (#12807) 2024-06-24 20:51:39 -07:00
CMakeLists.txt Add experimental range filters to stress/crash test (#12769) 2024-06-18 16:16:09 -07:00
db_stress.cc Disable tiered storage + BlobDB stress test (#10699) 2022-09-19 15:39:31 -07:00
db_stress_common.cc Fix nullptr access and race to fault_fs_guard (#12799) 2024-06-24 16:10:36 -07:00
db_stress_common.h Add an option to verify memtable key order during reads (#12889) 2024-08-19 13:53:25 -07:00
db_stress_compaction_filter.h Fix improper ExpectedValute::Exists() usages and disable compaction during VerifyDB() in crash test (#12933) 2024-08-15 12:32:59 -07:00
db_stress_driver.cc Fix improper ExpectedValute::Exists() usages and disable compaction during VerifyDB() in crash test (#12933) 2024-08-15 12:32:59 -07:00
db_stress_driver.h Remove unnecessary, confusing 'extern' (#12300) 2024-01-29 10:38:08 -08:00
db_stress_env_wrapper.h Deflake ThreadStatus related unit tests (#12858) 2024-07-15 09:56:09 -07:00
db_stress_filters.cc Add experimental range filters to stress/crash test (#12769) 2024-06-18 16:16:09 -07:00
db_stress_filters.h Add experimental range filters to stress/crash test (#12769) 2024-06-18 16:16:09 -07:00
db_stress_gflags.cc Add an option to verify memtable key order during reads (#12889) 2024-08-19 13:53:25 -07:00
db_stress_listener.cc Fix/improve temperature handling for file ingestion (#12402) 2024-03-05 16:56:08 -08:00
db_stress_listener.h Handle injected write error after successful WAL write in crash test + misc (#12838) 2024-07-29 13:51:49 -07:00
db_stress_shared_state.cc Remove ROCKSDB_SUPPORT_THREAD_LOCAL define because it's a part of C++11 (#10015) 2022-05-18 15:25:19 -07:00
db_stress_shared_state.h Fix improper ExpectedValute::Exists() usages and disable compaction during VerifyDB() in crash test (#12933) 2024-08-15 12:32:59 -07:00
db_stress_stat.cc Fix Statistics in db_stress (#9260) 2021-12-07 16:24:22 -08:00
db_stress_stat.h Fix Statistics in db_stress (#9260) 2021-12-07 16:24:22 -08:00
db_stress_table_properties_collector.h Remove 'virtual' when implied by 'override' (#12319) 2024-01-31 13:14:42 -08:00
db_stress_test_base.cc Add an option to verify memtable key order during reads (#12889) 2024-08-19 13:53:25 -07:00
db_stress_test_base.h Fix improper ExpectedValute::Exists() usages and disable compaction during VerifyDB() in crash test (#12933) 2024-08-15 12:32:59 -07:00
db_stress_tool.cc Fix race in multiops txn stress test (#12847) 2024-07-09 16:51:38 -07:00
db_stress_wide_merge_operator.cc Add the wide-column aware merge API to the stress tests (#11906) 2023-09-29 08:54:50 -07:00
db_stress_wide_merge_operator.h Add the wide-column aware merge API to the stress tests (#11906) 2023-09-29 08:54:50 -07:00
expected_state.cc Fix false-positive TestBackupRestore corruption (#12917) 2024-08-09 14:51:36 -07:00
expected_state.h Fix false-positive TestBackupRestore corruption (#12917) 2024-08-09 14:51:36 -07:00
expected_value.cc Fix improper ExpectedValute::Exists() usages and disable compaction during VerifyDB() in crash test (#12933) 2024-08-15 12:32:59 -07:00
expected_value.h Fix improper ExpectedValute::Exists() usages and disable compaction during VerifyDB() in crash test (#12933) 2024-08-15 12:32:59 -07:00
multi_ops_txns_stress.cc Add experimental range filters to stress/crash test (#12769) 2024-06-18 16:16:09 -07:00
multi_ops_txns_stress.h Disable AttributeGroup in multiops txn test (#12781) 2024-06-18 16:05:18 -07:00
no_batched_ops_stress.cc Fix improper ExpectedValute::Exists() usages and disable compaction during VerifyDB() in crash test (#12933) 2024-08-15 12:32:59 -07:00