rocksdb/memtable
Changyu Bi defd97bc9d Add an option to verify memtable key order during reads (#12889)
Summary:
Add a new CF option `paranoid_memory_checks` that enables additional data integrity validations during reads and scans. Currently, the skiplist-based memtable validates the order of the keys it visits; further validations can be added in other layers. The option is opt-in due to its performance overhead.
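As a rough illustration of the skiplist check, the sketch below compares each visited key against its predecessor during an in-order walk. The class and names here are hypothetical; RocksDB's actual check lives inside `InlineSkipList` and compares internal keys with the configured comparator rather than plain byte order:
```
// Hypothetical sketch: detect an out-of-order key while iterating a
// memtable-like ordered container. Not RocksDB's actual InlineSkipList
// code; the real check uses the user-supplied internal key comparator.
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

class OrderCheckedIterator {
 public:
  explicit OrderCheckedIterator(std::vector<std::string> keys)
      : keys_(std::move(keys)) {}

  // Returns false at end of data or when corruption is detected.
  bool Next(std::string* out) {
    if (corrupted_ || pos_ >= keys_.size()) return false;
    if (pos_ > 0 && keys_[pos_ - 1] > keys_[pos_]) {
      corrupted_ = true;  // predecessor sorts after current key: corruption
      return false;
    }
    *out = keys_[pos_++];
    return true;
  }

  bool corrupted() const { return corrupted_; }

 private:
  std::vector<std::string> keys_;
  size_t pos_ = 0;
  bool corrupted_ = false;
};

int main() {
  // "b" sorting after the third key simulates a corrupted memtable entry.
  OrderCheckedIterator it({"a", "b", "a_out_of_order", "c"});
  std::string k;
  while (it.Next(&k)) {
    std::printf("visited %s\n", k.c_str());
  }
  if (it.corrupted()) {
    std::printf("out-of-order key detected during read\n");
  }
  return 0;
}
```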

The motivation for this feature is services where data correctness is critical and in-memory corruption should be detected earlier. For a corrupted memtable key, this feature can help detect it during reads instead of during flush with the existing protections (OutputValidator, which verifies key order, or per-KV checksums). See internally linked task for more context.
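A minimal usage sketch is below. It assumes the option is exposed as a `ColumnFamilyOptions` field (inherited by `Options`) with the name from this PR, and that a corruption detected during a read surfaces as a non-OK `Status`:
```
// Minimal sketch, assuming `paranoid_memory_checks` is a ColumnFamilyOptions
// field and that a detected memtable corruption surfaces as a non-OK Status.
#include <cassert>
#include <cstdio>
#include <string>

#include "rocksdb/db.h"
#include "rocksdb/options.h"

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;
  options.paranoid_memory_checks = true;  // opt in to read-path validation

  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/pmc_demo", &db);
  assert(s.ok());

  s = db->Put(rocksdb::WriteOptions(), "key", "value");
  assert(s.ok());

  std::string value;
  s = db->Get(rocksdb::ReadOptions(), "key", &value);
  if (s.IsCorruption()) {
    // With the option on, an out-of-order memtable key can be reported
    // here instead of only at flush time.
    std::fprintf(stderr, "corruption detected during read: %s\n",
                 s.ToString().c_str());
  }

  delete db;
  return 0;
}
```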

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12889

Test Plan:
* New unit test added for `paranoid_memory_checks=true`.
* Existing unit tests cover `paranoid_memory_checks=false`.
* Enabled in stress test.

Performance Benchmark: we check for performance regression in the read path when data is in the memtable only. For each benchmark, the script was run at the same time for main and for this PR:
* Memtable-only randomread ops/sec:
```
(for I in $(seq 1 50);do ./db_bench --benchmarks=fillseq,readrandom --write_buffer_size=268435456 --writes=250000 --num=250000 --reads=500000  --seed=1723056275 2>&1 | grep "readrandom"; done;) | awk '{ t += $5; c++; print } END { print 1.0 * t / c }';

Main: 608146
PR with paranoid_memory_checks=false: 607727 (-0.07%)
PR with paranoid_memory_checks=true: 521889 (-14.2%)
```
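For reference, each percentage is the relative change against main; e.g., the 14.2% regression above is (521889 - 608146) / 608146 ≈ -14.2%.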

* Memtable-only sequential scan ops/sec:
```
(for I in $(seq 1 50); do ./db_bench --benchmarks=fillseq,readseq[-X10] --write_buffer_size=268435456 --num=1000000  --seed=1723056275 2>&1 | grep "\[AVG 10 runs\]"; done;) | awk '{ t += $6; c++; print; } END { printf "%.0f\n", 1.0 * t / c }';

Main: 9180077
PR with paranoid_memory_checks=false: 9536241 (+3.8%)
PR with paranoid_memory_checks=true: 7653934 (-16.6%)
```

* Memtable-only reverse scan ops/sec:
```
(for I in $(seq 1 20); do ./db_bench --benchmarks=fillseq,readreverse[-X10] --write_buffer_size=268435456 --num=1000000  --seed=1723056275 2>&1 | grep "\[AVG 10 runs\]"; done;) | awk '{ t += $6; c++; print; } END { printf "%.0f\n", 1.0 * t / c }';

Main: 1285719
PR with paranoid_memory_checks=false: 1431626 (+11.3%)
PR with paranoid_memory_checks=true: 811031 (-36.9%)
```

With `paranoid_memory_checks=false`, the `readrandom` benchmark shows no regression, and the scan benchmarks show an improvement that I can't explain.

Reviewed By: pdillinger

Differential Revision: D60414267

Pulled By: cbi42

fbshipit-source-id: a70b0cbeea131f1a249a5f78f9dc3a62dacfaa91
2024-08-19 13:53:25 -07:00
alloc_tracker.cc internal_repo_rocksdb (-8794174668376270091) (#12114) 2023-12-01 11:10:30 -08:00
hash_linklist_rep.cc Remove 'virtual' when implied by 'override' (#12319) 2024-01-31 13:14:42 -08:00
hash_skiplist_rep.cc Remove 'virtual' when implied by 'override' (#12319) 2024-01-31 13:14:42 -08:00
inlineskiplist.h Add an option to verify memtable key order during reads (#12889) 2024-08-19 13:53:25 -07:00
inlineskiplist_test.cc Prefer static_cast in place of most reinterpret_cast (#12308) 2024-02-07 10:44:11 -08:00
memtablerep_bench.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
skiplist.h Run clang format against files under example/, memory/ and memtable/ folders (#10893) 2022-10-28 13:16:50 -07:00
skiplist_test.cc Prefer static_cast in place of most reinterpret_cast (#12308) 2024-02-07 10:44:11 -08:00
skiplistrep.cc Add an option to verify memtable key order during reads (#12889) 2024-08-19 13:53:25 -07:00
stl_wrappers.h Run clang format against files under example/, memory/ and memtable/ folders (#10893) 2022-10-28 13:16:50 -07:00
vectorrep.cc internal_repo_rocksdb (-8794174668376270091) (#12114) 2023-12-01 11:10:30 -08:00
write_buffer_manager.cc Add SetAllowStall() (#11335) 2023-03-30 09:43:33 -07:00
write_buffer_manager_test.cc Put Cache and CacheWrapper in new public header (#11192) 2023-02-09 12:12:02 -08:00