rocksdb/table/block_based
Changyu Bi c2aad555c3 Add CompressionOptions::checksum for enabling ZSTD checksum (#11666)
Summary:
Optionally enable zstd checksum flag (d857369028/lib/zstd.h (L428)) to detect corruption during decompression. Main changes are in compression.h:
* User can set CompressionOptions::checksum to true to enable this feature.
* We enable this feature in ZSTD by setting the checksum flag in ZSTD compression context: `ZSTD_CCtx`.
* Uses `ZSTD_compress2()` to do compression since it supports frame parameter like the checksum flag. Compression level is also set in compression context as a flag.
* Error handling during decompression to propagate error message from ZSTD.
* Updated microbench to test read performance impact.

About compatibility, the current compression decoders should continue to work with the data created by the new compression API `ZSTD_compress2()`: https://github.com/facebook/zstd/issues/3711.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11666

Test Plan:
* Existing unit tests for zstd compression
* Add unit test `DBTest2.ZSTDChecksum` to test the corruption case
* Manually tested that compression levels, parallel compression, dictionary compression, index compression all work with the new ZSTD_compress2() API.
* Manually tested with `sst_dump --command=recompress` that different compression levels and dictionary compression settings all work.
* Manually tested compiling with older versions of ZSTD: v1.3.8, v1.1.0, v0.6.2.
* Perf impact: from public benchmark data: http://fastcompression.blogspot.com/2019/03/presenting-xxh3.html for checksum and https://github.com/facebook/zstd#benchmarks, if decompression is 1700MB/s and checksum computation is 70000MB/s, checksum computation is an additional ~2.4% time for decompression. Compression is slower and checksumming should be less noticeable.
* Microbench:
```
TEST_TMPDIR=/dev/shm ./branch_db_basic_bench --benchmark_filter=DBGet/comp_style:0/max_data:1048576/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:0/compression_type:7/compression_checksum:1/no_blockcache:1/iterations:10000/threads:1 --benchmark_repetitions=100

Min out of 100 runs:
Main:
10390 10436 10456 10484 10499 10535 10544 10545 10565 10568

After this PR, checksum=false
10285 10397 10503 10508 10515 10557 10562 10635 10640 10660

After this PR, checksum=true
10827 10876 10925 10949 10971 11052 11061 11063 11100 11109
```
* db_bench:
```
Write perf
TEST_TMPDIR=/dev/shm/ ./db_bench_ichecksum --benchmarks=fillseq[-X10] --compression_type=zstd --num=10000000 --compression_checksum=..

[FillSeq checksum=0]
fillseq [AVG    10 runs] : 281635 (± 31711) ops/sec;   31.2 (± 3.5) MB/sec
fillseq [MEDIAN 10 runs] : 294027 ops/sec;   32.5 MB/sec

[FillSeq checksum=1]
fillseq [AVG    10 runs] : 286961 (± 34700) ops/sec;   31.7 (± 3.8) MB/sec
fillseq [MEDIAN 10 runs] : 283278 ops/sec;   31.3 MB/sec

Read perf
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=readrandom[-X20] --num=100000000 --reads=1000000 --use_existing_db=true --readonly=1

[Readrandom checksum=1]
readrandom [AVG    20 runs] : 360928 (± 3579) ops/sec;    4.0 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 362468 ops/sec;    4.0 MB/sec

[Readrandom checksum=0]
readrandom [AVG    20 runs] : 380365 (± 2384) ops/sec;    4.2 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 379800 ops/sec;    4.2 MB/sec

Compression
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=compress[-X20] --compression_type=zstd --num=100000000 --compression_checksum=1

checksum=1
compress [AVG    20 runs] : 54074 (± 634) ops/sec;  211.2 (± 2.5) MB/sec
compress [MEDIAN 20 runs] : 54396 ops/sec;  212.5 MB/sec

checksum=0
compress [AVG    20 runs] : 54598 (± 393) ops/sec;  213.3 (± 1.5) MB/sec
compress [MEDIAN 20 runs] : 54592 ops/sec;  213.3 MB/sec

Decompression:
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=uncompress[-X20] --compression_type=zstd --compression_checksum=1

checksum = 0
uncompress [AVG    20 runs] : 167499 (± 962) ops/sec;  654.3 (± 3.8) MB/sec
uncompress [MEDIAN 20 runs] : 167210 ops/sec;  653.2 MB/sec
checksum = 1
uncompress [AVG    20 runs] : 167980 (± 924) ops/sec;  656.2 (± 3.6) MB/sec
uncompress [MEDIAN 20 runs] : 168465 ops/sec;  658.1 MB/sec
```

Reviewed By: ajkr

Differential Revision: D48019378

Pulled By: cbi42

fbshipit-source-id: 674120c6e1853c2ced1436ac8138559d0204feba
2023-08-18 15:01:59 -07:00
..
binary_search_index_reader.cc Add support to strip / pad timestamp when creating / reading a block based table (#11495) 2023-06-01 11:10:03 -07:00
binary_search_index_reader.h Extend Get/MultiGet deadline support to table open (#6982) 2020-06-29 14:53:17 -07:00
block.cc PutEntity Support in SST File Writer (#11688) 2023-08-10 18:16:10 -07:00
block.h Add support to strip / pad timestamp when writing / reading a block (#11472) 2023-05-25 15:41:32 -07:00
block_based_table_builder.cc Add CompressionOptions::checksum for enabling ZSTD checksum (#11666) 2023-08-18 15:01:59 -07:00
block_based_table_builder.h Record and use the tail size to prefetch table tail (#11406) 2023-05-08 13:14:28 -07:00
block_based_table_factory.cc fix typo (#11595) 2023-07-19 13:04:48 -07:00
block_based_table_factory.h Record and use the tail size to prefetch table tail (#11406) 2023-05-08 13:14:28 -07:00
block_based_table_iterator.cc Group rocksdb.sst.read.micros stat by different user read IOActivity + misc (#11444) 2023-08-08 17:26:50 -07:00
block_based_table_iterator.h Much better stats for seeks and prefix filtering (#11460) 2023-05-19 15:25:49 -07:00
block_based_table_reader.cc Explicitly instantiate MaybeReadBlockAndLoadToCache as well (#11714) 2023-08-18 10:19:33 -07:00
block_based_table_reader.h format_version=6 and context-aware block checksums (#9058) 2023-07-30 16:40:01 -07:00
block_based_table_reader_impl.h Group rocksdb.sst.read.micros stat by IOActivity flush and compaction (#11288) 2023-04-21 09:07:18 -07:00
block_based_table_reader_sync_and_async.h Group rocksdb.sst.read.micros stat by different user read IOActivity + misc (#11444) 2023-08-08 17:26:50 -07:00
block_based_table_reader_test.cc Add support to strip / pad timestamp when creating / reading a block based table (#11495) 2023-06-01 11:10:03 -07:00
block_builder.cc Logically strip timestamp during flush (#11557) 2023-06-29 15:50:50 -07:00
block_builder.h Logically strip timestamp during flush (#11557) 2023-06-29 15:50:50 -07:00
block_cache.cc Block per key-value checksum (#11287) 2023-04-25 12:08:23 -07:00
block_cache.h Block per key-value checksum (#11287) 2023-04-25 12:08:23 -07:00
block_prefetcher.cc Group rocksdb.sst.read.micros stat by different user read IOActivity + misc (#11444) 2023-08-08 17:26:50 -07:00
block_prefetcher.h Group rocksdb.sst.read.micros stat by different user read IOActivity + misc (#11444) 2023-08-08 17:26:50 -07:00
block_prefix_index.cc Fix bug with kHashSearch and changing prefix_extractor with SetOptions (#10128) 2022-06-10 08:51:45 -07:00
block_prefix_index.h Fix bug with kHashSearch and changing prefix_extractor with SetOptions (#10128) 2022-06-10 08:51:45 -07:00
block_test.cc Fix use after move in data block hash index (#11505) 2023-06-08 11:04:43 -07:00
block_type.h Remove deprecated block-based filter (#10184) 2022-06-16 15:51:33 -07:00
cachable_entry.h HyperClockCache support for SecondaryCache, with refactoring (#11301) 2023-03-17 20:23:49 -07:00
data_block_footer.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
data_block_footer.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
data_block_hash_index.cc Format files under table/ by clang-format (#10852) 2022-10-25 11:50:38 -07:00
data_block_hash_index.h Fix build with gcc 13 by including <cstdint> (#11118) 2023-01-25 14:30:32 -08:00
data_block_hash_index_test.cc Block per key-value checksum (#11287) 2023-04-25 12:08:23 -07:00
filter_block.h Record and use the tail size to prefetch table tail (#11406) 2023-05-08 13:14:28 -07:00
filter_block_reader_common.cc Use user-provided ReadOptions for metadata block reads more often (#11208) 2023-04-04 16:53:14 -07:00
filter_block_reader_common.h Use user-provided ReadOptions for metadata block reads more often (#11208) 2023-04-04 16:53:14 -07:00
filter_policy.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
filter_policy_internal.h Remove deprecated block-based filter (#10184) 2022-06-16 15:51:33 -07:00
flush_block_policy.cc Change internal headers with duplicate names (#11408) 2023-05-17 11:27:09 -07:00
flush_block_policy_impl.h Change internal headers with duplicate names (#11408) 2023-05-17 11:27:09 -07:00
full_filter_block.cc Use user-provided ReadOptions for metadata block reads more often (#11208) 2023-04-04 16:53:14 -07:00
full_filter_block.h Use user-provided ReadOptions for metadata block reads more often (#11208) 2023-04-04 16:53:14 -07:00
full_filter_block_test.cc Use user-provided ReadOptions for metadata block reads more often (#11208) 2023-04-04 16:53:14 -07:00
hash_index_reader.cc Add support to strip / pad timestamp when creating / reading a block based table (#11495) 2023-06-01 11:10:03 -07:00
hash_index_reader.h Extend Get/MultiGet deadline support to table open (#6982) 2020-06-29 14:53:17 -07:00
index_builder.cc Add support to strip / pad timestamp when creating / reading a block based table (#11495) 2023-06-01 11:10:03 -07:00
index_builder.h Remove some useless qualifier (#11596) 2023-07-20 13:43:26 -07:00
index_reader_common.cc format_version=6 and context-aware block checksums (#9058) 2023-07-30 16:40:01 -07:00
index_reader_common.h Add support to strip / pad timestamp when creating / reading a block based table (#11495) 2023-06-01 11:10:03 -07:00
mock_block_based_table.h Remove deprecated block-based filter (#10184) 2022-06-16 15:51:33 -07:00
parsed_full_filter_block.cc Hide FilterBits{Builder,Reader} from public API (#9592) 2022-02-17 16:34:46 -08:00
parsed_full_filter_block.h Major Cache refactoring, CPU efficiency improvement (#10975) 2023-01-11 14:20:40 -08:00
partitioned_filter_block.cc Group rocksdb.sst.read.micros stat by different user read IOActivity + misc (#11444) 2023-08-08 17:26:50 -07:00
partitioned_filter_block.h Add support to strip / pad timestamp when creating / reading a block based table (#11495) 2023-06-01 11:10:03 -07:00
partitioned_filter_block_test.cc Add support to strip / pad timestamp when creating / reading a block based table (#11495) 2023-06-01 11:10:03 -07:00
partitioned_index_iterator.cc Group rocksdb.sst.read.micros stat by different user read IOActivity + misc (#11444) 2023-08-08 17:26:50 -07:00
partitioned_index_iterator.h Format files under table/ by clang-format (#10852) 2022-10-25 11:50:38 -07:00
partitioned_index_reader.cc Group rocksdb.sst.read.micros stat by different user read IOActivity + misc (#11444) 2023-08-08 17:26:50 -07:00
partitioned_index_reader.h Record and use the tail size to prefetch table tail (#11406) 2023-05-08 13:14:28 -07:00
reader_common.cc format_version=6 and context-aware block checksums (#9058) 2023-07-30 16:40:01 -07:00
reader_common.h format_version=6 and context-aware block checksums (#9058) 2023-07-30 16:40:01 -07:00
uncompression_dict_reader.cc Group rocksdb.sst.read.micros stat by IOActivity flush and compaction (#11288) 2023-04-21 09:07:18 -07:00
uncompression_dict_reader.h Group rocksdb.sst.read.micros stat by IOActivity flush and compaction (#11288) 2023-04-21 09:07:18 -07:00