rocksdb/db/compaction
Changyu Bi 62fc15f009 Block per key-value checksum (#11287)
Summary:
Add option `block_protection_bytes_per_key` and an implementation of block per key-value checksums. The main changes are:
1. checksum construction and verification in block.cc/h (see the sketch after this list)
2. plumbing the option `block_protection_bytes_per_key` through (mainly for methods declared in table_cache.h)
3. unit test and crash test updates
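As an illustration of item 1, here is a minimal, self-contained sketch of the per key-value protection idea: hash each entry's key and value, truncate the hash to `protection_bytes_per_key` bytes, store it in a per-block array, and recompute on read. The class name, FNV-1a hash, and layout below are illustrative assumptions, not the actual block.cc implementation:

```cpp
#include <cstdint>
#include <initializer_list>
#include <string>
#include <vector>

// Illustrative 64-bit FNV-1a over key+value; the real implementation may
// use a different hash. This is only a sketch of the scheme.
uint64_t HashKV(const std::string& key, const std::string& value) {
  uint64_t h = 1469598103934665603ull;
  for (const std::string* s : {&key, &value}) {
    for (unsigned char c : *s) {
      h ^= c;
      h *= 1099511628211ull;
    }
  }
  return h;
}

// Hypothetical per-block checksum array: bytes_per_key bytes of truncated
// hash per entry, built at block-construction time.
class PerKVProtection {
 public:
  explicit PerKVProtection(size_t bytes_per_key)
      : bytes_per_key_(bytes_per_key) {}

  // Block construction: append the truncated checksum for one entry.
  void Add(const std::string& key, const std::string& value) {
    uint64_t h = HashKV(key, value);
    for (size_t i = 0; i < bytes_per_key_; ++i) {
      checksums_.push_back(static_cast<uint8_t>(h >> (8 * i)));
    }
  }

  // Block read: recompute and compare; false means corruption detected.
  bool Verify(size_t entry_index, const std::string& key,
              const std::string& value) const {
    uint64_t h = HashKV(key, value);
    for (size_t i = 0; i < bytes_per_key_; ++i) {
      if (checksums_[entry_index * bytes_per_key_ + i] !=
          static_cast<uint8_t>(h >> (8 * i))) {
        return false;
      }
    }
    return true;
  }

 private:
  size_t bytes_per_key_;
  std::vector<uint8_t> checksums_;  // bytes_per_key_ bytes per entry
};
```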

Tests:
* Added unit tests
* Crash test: `python3 tools/db_crashtest.py blackbox --simple --block_protection_bytes_per_key=1 --write_buffer_size=1048576`

Follow-up (maybe as a separate PR): make sure corruption statuses returned from BlockIters are correctly handled.
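For context on that follow-up, here is a hedged sketch of what correct handling looks like at the public API surface; the follow-up item itself concerns internal BlockIter-to-caller plumbing, and the function below is purely illustrative:

```cpp
#include <memory>

#include "rocksdb/db.h"

// Illustrative only: with block per key-value checksums enabled, a checksum
// mismatch during iteration should surface as a Corruption status.
void ScanAndCheck(rocksdb::DB* db) {
  std::unique_ptr<rocksdb::Iterator> it(
      db->NewIterator(rocksdb::ReadOptions()));
  for (it->SeekToFirst(); it->Valid(); it->Next()) {
    // ... consume it->key() / it->value() ...
  }
  rocksdb::Status s = it->status();
  if (s.IsCorruption()) {
    // The scan hit corrupted data; fail the operation rather than
    // silently treating the iteration as complete.
  }
}
```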

Performance:
Turning on block per key-value protection has a non-trivial negative impact on read performance and costs additional memory.
For memory, each block carries an additional 24 bytes of checksum-related state besides the checksums themselves. For CPU, I set up a DB of size ~1.2GB with 5M keys (32-byte keys and 200-byte values) that compacts to ~5 SST files (target file size 256 MB) in L6 without compression, and tested readrandom performance with various block cache sizes (to mimic various cache hit rates):

```
SETUP
make OPTIMIZE_LEVEL="-O3" USE_LTO=1 DEBUG_LEVEL=0 -j32 db_bench
./db_bench -benchmarks=fillseq,compact0,waitforcompaction,compact,waitforcompaction -write_buffer_size=33554432 -level_compaction_dynamic_level_bytes=true -max_background_jobs=8 -target_file_size_base=268435456 --num=5000000 --key_size=32 --value_size=200 --compression_type=none

BENCHMARK
./db_bench --use_existing_db -benchmarks=readtocache,readrandom[-X10] --num=5000000 --key_size=32 --disable_auto_compactions --reads=1000000 --block_protection_bytes_per_key=[0|1] --cache_size=$CACHESIZE

The readrandom ops/sec looks like the following:
Block cache size:    2GB      1.2GB * 0.9   1.2GB * 0.8   1.2GB * 0.5   8MB
Main                 240805   223604        198176        161653        139040
PR prot_bytes=0      238691   226693        200127        161082        141153
PR prot_bytes=1      214983   193199        178532        137013        108211
prot_bytes=1 vs 0    -10%     -15%          -10.8%        -15%          -23%
```

The benchmark has a lot of variance, but per the table above, enabling prot_bytes=1 regressed readrandom by roughly 10% to 23% depending on the cache hit rate.
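For reference, enabling the feature from application code is a one-option change. A minimal sketch, assuming the option added by this PR is reachable through `Options` (the DB path is a placeholder):

```cpp
#include <cassert>

#include "rocksdb/db.h"
#include "rocksdb/options.h"

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;
  // 0 disables protection; this PR's crash test exercises 1 byte per key.
  options.block_protection_bytes_per_key = 1;

  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/testdb", &db);
  assert(s.ok());
  delete db;
  return 0;
}
```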

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11287

Reviewed By: ajkr

Differential Revision: D43970708

Pulled By: cbi42

fbshipit-source-id: ef98d898b71779846fa74212b9ec9e08b7183940
2023-04-25 12:08:23 -07:00
clipping_iterator.h Refactor AddRangeDels() + consider range tombstone during compaction file cutting (#11113) 2023-02-22 12:28:18 -08:00
clipping_iterator_test.cc Print stack traces on frozen tests in CI (#10828) 2022-10-18 00:35:35 -07:00
compaction.cc Revert #10802 Consider range tombstone in compaction output file cutting (#11089) 2023-01-13 12:28:21 -08:00
compaction.h Drain unnecessary levels when level_compaction_dynamic_level_bytes=true (#11340) 2023-04-06 11:20:43 -07:00
compaction_iteration_stats.h Support readahead during compaction for blob files (#9187) 2021-11-19 17:53:47 -08:00
compaction_iterator.cc Group rocksdb.sst.read.micros stat by IOActivity flush and compaction (#11288) 2023-04-21 09:07:18 -07:00
compaction_iterator.h Refactor AddRangeDels() + consider range tombstone during compaction file cutting (#11113) 2023-02-22 12:28:18 -08:00
compaction_iterator_test.cc Basic Support for Merge with user-defined timestamp (#10819) 2022-10-31 22:28:58 -07:00
compaction_job.cc Block per key-value checksum (#11287) 2023-04-25 12:08:23 -07:00
compaction_job.h Refactor AddRangeDels() + consider range tombstone during compaction file cutting (#11113) 2023-02-22 12:28:18 -08:00
compaction_job_stats_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
compaction_job_test.cc Block per key-value checksum (#11287) 2023-04-25 12:08:23 -07:00
compaction_outputs.cc Group rocksdb.sst.read.micros stat by IOActivity flush and compaction (#11288) 2023-04-21 09:07:18 -07:00
compaction_outputs.h Refactor AddRangeDels() + consider range tombstone during compaction file cutting (#11113) 2023-02-22 12:28:18 -08:00
compaction_picker.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
compaction_picker.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
compaction_picker_fifo.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
compaction_picker_fifo.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
compaction_picker_level.cc Try to pick more files in LevelCompactionBuilder::TryExtendNonL0TrivialMove() (#11347) 2023-04-14 11:50:20 -07:00
compaction_picker_level.h Sort L0 files by newly introduced epoch_num (#10922) 2022-12-13 13:29:37 -08:00
compaction_picker_test.cc Try to pick more files in LevelCompactionBuilder::TryExtendNonL0TrivialMove() (#11347) 2023-04-14 11:50:20 -07:00
compaction_picker_universal.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
compaction_picker_universal.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
compaction_service_job.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
compaction_service_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
compaction_state.cc Tiered Compaction: per key placement support (#9964) 2022-07-13 20:54:49 -07:00
compaction_state.h Tiered Compaction: per key placement support (#9964) 2022-07-13 20:54:49 -07:00
file_pri.h Try to start TTL earlier with kMinOverlappingRatio is used (#8749) 2021-11-01 14:36:31 -07:00
sst_partitioner.cc Remove FactoryFunc from LoadXXXObject (#11203) 2023-02-17 12:54:07 -08:00
subcompaction_state.cc Refactor Compaction file cut ShouldStopBefore() (#10629) 2022-09-14 22:09:12 -07:00
subcompaction_state.h Refactor AddRangeDels() + consider range tombstone during compaction file cutting (#11113) 2023-02-22 12:28:18 -08:00
tiered_compaction_test.cc Drain unnecessary levels when level_compaction_dynamic_level_bytes=true (#11340) 2023-04-06 11:20:43 -07:00