rocksdb/util
Changyu Bi c2aad555c3 Add `CompressionOptions::checksum` for enabling ZSTD checksum (#11666)
Summary:
Optionally enable zstd checksum flag (d857369028/lib/zstd.h (L428)) to detect corruption during decompression. Main changes are in compression.h:
* User can set CompressionOptions::checksum to true to enable this feature.
* We enable this feature in ZSTD by setting the checksum flag in ZSTD compression context: `ZSTD_CCtx`.
* Uses `ZSTD_compress2()` to do compression since it supports frame parameter like the checksum flag. Compression level is also set in compression context as a flag.
* Error handling during decompression to propagate error message from ZSTD.
* Updated microbench to test read performance impact.

About compatibility, the current compression decoders should continue to work with the data created by the new compression API `ZSTD_compress2()`: https://github.com/facebook/zstd/issues/3711.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11666

Test Plan:
* Existing unit tests for zstd compression
* Add unit test `DBTest2.ZSTDChecksum` to test the corruption case
* Manually tested that compression levels, parallel compression, dictionary compression, index compression all work with the new ZSTD_compress2() API.
* Manually tested with `sst_dump --command=recompress` that different compression levels and dictionary compression settings all work.
* Manually tested compiling with older versions of ZSTD: v1.3.8, v1.1.0, v0.6.2.
* Perf impact: from public benchmark data: http://fastcompression.blogspot.com/2019/03/presenting-xxh3.html for checksum and https://github.com/facebook/zstd#benchmarks, if decompression is 1700MB/s and checksum computation is 70000MB/s, checksum computation is an additional ~2.4% time for decompression. Compression is slower and checksumming should be less noticeable.
* Microbench:
```
TEST_TMPDIR=/dev/shm ./branch_db_basic_bench --benchmark_filter=DBGet/comp_style:0/max_data:1048576/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:0/compression_type:7/compression_checksum:1/no_blockcache:1/iterations:10000/threads:1 --benchmark_repetitions=100

Min out of 100 runs:
Main:
10390 10436 10456 10484 10499 10535 10544 10545 10565 10568

After this PR, checksum=false
10285 10397 10503 10508 10515 10557 10562 10635 10640 10660

After this PR, checksum=true
10827 10876 10925 10949 10971 11052 11061 11063 11100 11109
```
* db_bench:
```
Write perf
TEST_TMPDIR=/dev/shm/ ./db_bench_ichecksum --benchmarks=fillseq[-X10] --compression_type=zstd --num=10000000 --compression_checksum=..

[FillSeq checksum=0]
fillseq [AVG    10 runs] : 281635 (± 31711) ops/sec;   31.2 (± 3.5) MB/sec
fillseq [MEDIAN 10 runs] : 294027 ops/sec;   32.5 MB/sec

[FillSeq checksum=1]
fillseq [AVG    10 runs] : 286961 (± 34700) ops/sec;   31.7 (± 3.8) MB/sec
fillseq [MEDIAN 10 runs] : 283278 ops/sec;   31.3 MB/sec

Read perf
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=readrandom[-X20] --num=100000000 --reads=1000000 --use_existing_db=true --readonly=1

[Readrandom checksum=1]
readrandom [AVG    20 runs] : 360928 (± 3579) ops/sec;    4.0 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 362468 ops/sec;    4.0 MB/sec

[Readrandom checksum=0]
readrandom [AVG    20 runs] : 380365 (± 2384) ops/sec;    4.2 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 379800 ops/sec;    4.2 MB/sec

Compression
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=compress[-X20] --compression_type=zstd --num=100000000 --compression_checksum=1

checksum=1
compress [AVG    20 runs] : 54074 (± 634) ops/sec;  211.2 (± 2.5) MB/sec
compress [MEDIAN 20 runs] : 54396 ops/sec;  212.5 MB/sec

checksum=0
compress [AVG    20 runs] : 54598 (± 393) ops/sec;  213.3 (± 1.5) MB/sec
compress [MEDIAN 20 runs] : 54592 ops/sec;  213.3 MB/sec

Decompression:
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=uncompress[-X20] --compression_type=zstd --compression_checksum=1

checksum = 0
uncompress [AVG    20 runs] : 167499 (± 962) ops/sec;  654.3 (± 3.8) MB/sec
uncompress [MEDIAN 20 runs] : 167210 ops/sec;  653.2 MB/sec
checksum = 1
uncompress [AVG    20 runs] : 167980 (± 924) ops/sec;  656.2 (± 3.6) MB/sec
uncompress [MEDIAN 20 runs] : 168465 ops/sec;  658.1 MB/sec
```

Reviewed By: ajkr

Differential Revision: D48019378

Pulled By: cbi42

fbshipit-source-id: 674120c6e1853c2ced1436ac8138559d0204feba
2023-08-18 15:01:59 -07:00
..
aligned_buffer.h remove dependency on options.h for port_posix.h andport_win.h (#11214) 2023-02-13 02:21:38 -08:00
async_file_reader.cc Fix use_after_free bug when underlying FS enables kFSBuffer (#11645) 2023-07-27 12:02:03 -07:00
async_file_reader.h Add some missing headers (#10519) 2022-08-11 12:45:50 -07:00
autovector.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
autovector_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
bloom_impl.h Clean up some FastRange calls (#11707) 2023-08-17 11:52:38 -07:00
bloom_test.cc Fix an uninitialized variable warning for g++ 12.2.0 (#10995) 2022-11-30 19:27:28 -08:00
build_version.cc.in Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
cast_util.h Improve memory efficiency of many OptimisticTransactionDBs (#11439) 2023-05-24 11:57:15 -07:00
channel.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
cleanable.cc Fix compile error in Clang 13 (#10033) 2022-05-28 00:15:28 -07:00
coding.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
coding.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
coding_lean.h New stable, fixed-length cache keys (#9126) 2021-12-16 17:15:13 -08:00
coding_test.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
compaction_job_stats_impl.cc Compare the number of input keys and processed keys for compactions (#11571) 2023-07-28 09:47:31 -07:00
comparator.cc Expose the root comparator for built-in With64Ts comparators (#11704) 2023-08-15 13:44:13 -07:00
compression.cc Add `CompressionOptions::checksum` for enabling ZSTD checksum (#11666) 2023-08-18 15:01:59 -07:00
compression.h Add `CompressionOptions::checksum` for enabling ZSTD checksum (#11666) 2023-08-18 15:01:59 -07:00
compression_context_cache.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
compression_context_cache.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
concurrent_task_limiter_impl.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
concurrent_task_limiter_impl.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
core_local.h Add some more bit operations to internal APIs (#11660) 2023-08-02 11:30:10 -07:00
coro_utils.h Add some missing headers (#10519) 2022-08-11 12:45:50 -07:00
crc32c.cc Simplify detection of x86 CPU features (#11419) 2023-05-09 22:25:45 -07:00
crc32c.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
crc32c_arm64.cc Minimal RocksJava compliance with Java 8 language level (EB 1046) (#10951) 2023-05-17 19:44:24 -07:00
crc32c_arm64.h Manual interventions for clang-format util/ (#10870) 2022-10-26 12:08:20 -07:00
crc32c_ppc.c Fix Compilation on ppc64le using Clang 11 (#7713) 2020-12-01 11:21:44 -08:00
crc32c_ppc.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
crc32c_ppc_asm.S Fix Compilation on ppc64le using Clang 11 (#7713) 2020-12-01 11:21:44 -08:00
crc32c_ppc_constants.h Remove PATENTS text from a few straggler files (#5326) 2019-05-21 16:22:35 -07:00
crc32c_test.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
data_structure.cc Improve SmallEnumSet (#11178) 2023-02-08 20:14:57 -08:00
defer.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
defer_test.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
distributed_mutex.h remove dependency on options.h for port_posix.h andport_win.h (#11214) 2023-02-13 02:21:38 -08:00
duplicate_detector.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
dynamic_bloom.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
dynamic_bloom.h Clean up some FastRange calls (#11707) 2023-08-17 11:52:38 -07:00
dynamic_bloom_test.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
fastrange.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
file_checksum_helper.cc Group rocksdb.sst.read.micros stat by IOActivity flush and compaction (#11288) 2023-04-21 09:07:18 -07:00
file_checksum_helper.h remove dependency on options.h for port_posix.h andport_win.h (#11214) 2023-02-13 02:21:38 -08:00
file_reader_writer_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
filelock_test.cc Add OpenBSD Support (#11255) 2023-06-27 11:58:29 -07:00
filter_bench.cc Use user-provided ReadOptions for metadata block reads more often (#11208) 2023-04-04 16:53:14 -07:00
gflags_compat.h Fix gflags_compat.h (#11346) 2023-04-03 10:41:00 -07:00
hash.cc Experimental support for SST unique IDs (#8990) 2021-10-18 23:32:01 -07:00
hash.h Improve memory efficiency of many OptimisticTransactionDBs (#11439) 2023-05-24 11:57:15 -07:00
hash128.h Upgrade xxhash, add Hash128 (#8634) 2021-08-20 18:41:51 -07:00
hash_containers.h Meta-internal folly integration with F14FastMap (#9546) 2022-04-13 07:34:01 -07:00
hash_map.h Change HashMap::Insert()'s value to a const reference (#6567) 2020-03-20 14:59:54 -07:00
hash_test.cc Add some more bit operations to internal APIs (#11660) 2023-08-02 11:30:10 -07:00
heap.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
heap_test.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
kv_map.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
log_write_bench.cc Add a SystemClock class to capture the time functions of an Env (#7858) 2021-01-25 22:09:11 -08:00
math.h Add some more bit operations to internal APIs (#11660) 2023-08-02 11:30:10 -07:00
math128.h Add some more bit operations to internal APIs (#11660) 2023-08-02 11:30:10 -07:00
murmurhash.cc Manual interventions for clang-format util/ (#10870) 2022-10-26 12:08:20 -07:00
murmurhash.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
mutexlock.h Improve memory efficiency of many OptimisticTransactionDBs (#11439) 2023-05-24 11:57:15 -07:00
ppc-opcode.h Remove PATENTS text from a few straggler files (#5326) 2019-05-21 16:22:35 -07:00
random.cc remove dependency on options.h for port_posix.h andport_win.h (#11214) 2023-02-13 02:21:38 -08:00
random.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
random_test.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
rate_limiter.cc Change internal headers with duplicate names (#11408) 2023-05-17 11:27:09 -07:00
rate_limiter_impl.h Change internal headers with duplicate names (#11408) 2023-05-17 11:27:09 -07:00
rate_limiter_test.cc Change internal headers with duplicate names (#11408) 2023-05-17 11:27:09 -07:00
repeatable_thread.h Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
repeatable_thread_test.cc Print stack traces on frozen tests in CI (#10828) 2022-10-18 00:35:35 -07:00
ribbon_alg.h Use only ASCII in source files (#10164) 2022-06-15 14:44:43 -07:00
ribbon_config.cc Refine Ribbon configuration, improve testing, add Homogeneous (#7879) 2021-02-26 08:50:42 -08:00
ribbon_config.h Refine Ribbon configuration, improve testing, add Homogeneous (#7879) 2021-02-26 08:50:42 -08:00
ribbon_impl.h Account Bloom/Ribbon filter construction memory in global memory limit (#9073) 2021-11-18 09:42:20 -08:00
ribbon_test.cc util/ribbon_test.cc: avoid ambiguous reversed operator error in c++20 (#11371) 2023-04-12 13:24:34 -07:00
set_comparator.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
single_thread_executor.h add a missing include (#11624) 2023-07-18 15:38:52 -07:00
slice.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
slice_test.cc Improve SmallEnumSet (#11178) 2023-02-08 20:14:57 -08:00
slice_transform_test.cc Much better stats for seeks and prefix filtering (#11460) 2023-05-19 15:25:49 -07:00
status.cc Merge operator failed subcode (#11231) 2023-02-17 10:58:46 -08:00
stderr_logger.cc Fix an import issue in fbcode. (#10604) 2022-08-29 21:09:36 -07:00
stderr_logger.h Fix an import issue in fbcode. (#10604) 2022-08-29 21:09:36 -07:00
stop_watch.h Fix StopWatch bug; Remove setting `record_read_stats` (#11474) 2023-05-25 10:16:58 -07:00
string_util.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
string_util.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
thread_guard.h Introduce a ThreadGuard class and use it in ExternalSSTFileTest.PickedLevelBug (#8112) 2021-03-25 22:08:58 -07:00
thread_list_test.cc Group rocksdb.sst.read.micros stat by IOActivity flush and compaction (#11288) 2023-04-21 09:07:18 -07:00
thread_local.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
thread_local.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
thread_local_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
thread_operation.h Group rocksdb.sst.read.micros stat by different user read IOActivity + misc (#11444) 2023-08-08 17:26:50 -07:00
threadpool_imp.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
threadpool_imp.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
timer.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
timer_queue.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
timer_queue_test.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
timer_test.cc Print stack traces on frozen tests in CI (#10828) 2022-10-18 00:35:35 -07:00
udt_util.cc Support switching on / off UDT together with in-Memtable-only feature (#11623) 2023-07-26 20:16:32 -07:00
udt_util.h Support switching on / off UDT together with in-Memtable-only feature (#11623) 2023-07-26 20:16:32 -07:00
udt_util_test.cc Add a UDT comparator for ReverseBytewiseComparator to object library (#11647) 2023-07-27 15:31:22 -07:00
user_comparator_wrapper.h Make UserComparatorWrapper not Customizable (#10837) 2022-10-21 12:27:50 -07:00
vector_iterator.h Make InternalKeyComparator not configurable (#10342) 2022-07-14 10:09:31 -07:00
work_queue.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
work_queue_test.cc Print stack traces on frozen tests in CI (#10828) 2022-10-18 00:35:35 -07:00
write_batch_util.cc Add utils to use for handling user defined timestamp size record in WAL (#11451) 2023-05-22 14:28:58 -07:00
write_batch_util.h Add utils to use for handling user defined timestamp size record in WAL (#11451) 2023-05-22 14:28:58 -07:00
xxhash.cc Manual interventions for clang-format util/ (#10870) 2022-10-26 12:08:20 -07:00
xxhash.h Upgrade xxhash.h to latest dev (#11098) 2023-01-19 12:07:50 -08:00
xxph3.h Manual interventions for clang-format util/ (#10870) 2022-10-26 12:08:20 -07:00