2016-02-09 23:12:00 +00:00
|
|
|
// Copyright (c) 2011-present, Facebook, Inc. All rights reserved.
|
2017-07-15 23:03:42 +00:00
|
|
|
// This source code is licensed under both the GPLv2 (found in the
|
|
|
|
// COPYING file in the root directory) and Apache 2.0 License
|
|
|
|
// (found in the LICENSE.Apache file in the root directory).
|
2015-01-09 20:57:11 +00:00
|
|
|
//
|
|
|
|
// Copyright (c) 2011 The LevelDB Authors. All rights reserved.
|
|
|
|
// Use of this source code is governed by a BSD-style license that can be
|
|
|
|
// found in the LICENSE file. See the AUTHORS file for names of contributors.
|
|
|
|
//
|
|
|
|
#pragma once
|
|
|
|
|
2015-01-15 00:24:24 +00:00
|
|
|
#include <algorithm>
|
|
|
|
#include <limits>
|
2019-01-24 02:11:08 +00:00
|
|
|
#ifdef ROCKSDB_MALLOC_USABLE_SIZE
|
|
|
|
#ifdef OS_FREEBSD
|
|
|
|
#include <malloc_np.h>
|
|
|
|
#else // OS_FREEBSD
|
|
|
|
#include <malloc.h>
|
|
|
|
#endif // OS_FREEBSD
|
|
|
|
#endif // ROCKSDB_MALLOC_USABLE_SIZE
|
2015-08-27 22:40:42 +00:00
|
|
|
#include <string>
|
2015-01-15 00:24:24 +00:00
|
|
|
|
2023-05-17 18:27:09 +00:00
|
|
|
#include "memory/memory_allocator_impl.h"
|
2015-01-09 20:57:11 +00:00
|
|
|
#include "rocksdb/options.h"
|
2018-10-03 00:21:54 +00:00
|
|
|
#include "rocksdb/table.h"
|
Major Cache refactoring, CPU efficiency improvement (#10975)
Summary:
This is several refactorings bundled into one to avoid having to incrementally re-modify uses of Cache several times. Overall, there are breaking changes to Cache class, and it becomes more of low-level interface for implementing caches, especially block cache. New internal APIs make using Cache cleaner than before, and more insulated from block cache evolution. Hopefully, this is the last really big block cache refactoring, because of rather effectively decoupling the implementations from the uses. This change also removes the EXPERIMENTAL designation on the SecondaryCache support in Cache. It seems reasonably mature at this point but still subject to change/evolution (as I warn in the API docs for Cache).
The high-level motivation for this refactoring is to minimize code duplication / compounding complexity in adding SecondaryCache support to HyperClockCache (in a later PR). Other benefits listed below.
* static_cast lines of code +29 -35 (net removed 6)
* reinterpret_cast lines of code +6 -32 (net removed 26)
## cache.h and secondary_cache.h
* Always use CacheItemHelper with entries instead of just a Deleter. There are several motivations / justifications:
* Simpler for implementations to deal with just one Insert and one Lookup.
* Simpler and more efficient implementation because we don't have to track which entries are using helpers and which are using deleters
* Gets rid of hack to classify cache entries by their deleter. Instead, the CacheItemHelper includes a CacheEntryRole. This simplifies a lot of code (cache_entry_roles.h almost eliminated). Fixes https://github.com/facebook/rocksdb/issues/9428.
* Makes it trivial to adjust SecondaryCache behavior based on kind of block (e.g. don't re-compress filter blocks).
* It is arguably less convenient for many direct users of Cache, but direct users of Cache are now rare with introduction of typed_cache.h (below).
* I considered and rejected an alternative approach in which we reduce customizability by assuming each secondary cache compatible value starts with a Slice referencing the uncompressed block contents (already true or mostly true), but we apparently intend to stack secondary caches. Saving an entry from a compressed secondary to a lower tier requires custom handling offered by SaveToCallback, etc.
* Make CreateCallback part of the helper and introduce CreateContext to work with it (alternative to https://github.com/facebook/rocksdb/issues/10562). This cleans up the interface while still allowing context to be provided for loading/parsing values into primary cache. This model works for async lookup in BlockBasedTable reader (reader owns a CreateContext) under the assumption that it always waits on secondary cache operations to finish. (Otherwise, the CreateContext could be destroyed while async operation depending on it continues.) This likely contributes most to the observed performance improvement because it saves an std::function backed by a heap allocation.
* Use char* for serialized data, e.g. in SaveToCallback, where void* was confusingly used. (We use `char*` for serialized byte data all over RocksDB, with many advantages over `void*`. `memcpy` etc. are legacy APIs that should not be mimicked.)
* Add a type alias Cache::ObjectPtr = void*, so that we can better indicate the intent of the void* when it is to be the object associated with a Cache entry. Related: started (but did not complete) a refactoring to move away from "value" of a cache entry toward "object" or "obj". (It is confusing to call Cache a key-value store (like DB) when it is really storing arbitrary in-memory objects, not byte strings.)
* Remove unnecessary key param from DeleterFn. This is good for efficiency in HyperClockCache, which does not directly store the cache key in memory. (Alternative to https://github.com/facebook/rocksdb/issues/10774)
* Add allocator to Cache DeleterFn. This is a kind of future-proofing change in case we get more serious about using the Cache allocator for memory tracked by the Cache. Right now, only the uncompressed block contents are allocated using the allocator, and a pointer to that allocator is saved as part of the cached object so that the deleter can use it. (See CacheAllocationPtr.) If in the future we are able to "flatten out" our Cache objects some more, it would be good not to have to track the allocator as part of each object.
* Removes legacy `ApplyToAllCacheEntries` and changes `ApplyToAllEntries` signature for Deleter->CacheItemHelper change.
## typed_cache.h
Adds various "typed" interfaces to the Cache as internal APIs, so that most uses of Cache can use simple type safe code without casting and without explicit deleters, etc. Almost all of the non-test, non-glue code uses of Cache have been migrated. (Follow-up work: CompressedSecondaryCache deserves deeper attention to migrate.) This change expands RocksDB's internal usage of metaprogramming and SFINAE (https://en.cppreference.com/w/cpp/language/sfinae).
The existing usages of Cache are divided up at a high level into these new interfaces. See updated existing uses of Cache for examples of how these are used.
* PlaceholderCacheInterface - Used for making cache reservations, with entries that have a charge but no value.
* BasicTypedCacheInterface<TValue> - Used for primary cache storage of objects of type TValue, which can be cleaned up with std::default_delete<TValue>. The role is provided by TValue::kCacheEntryRole or given in an optional template parameter.
* FullTypedCacheInterface<TValue, TCreateContext> - Used for secondary cache compatible storage of objects of type TValue. In addition to BasicTypedCacheInterface constraints, we require TValue::ContentSlice() to return persistable data. This simplifies usage for the normal case of simple secondary cache compatibility (can give you a Slice to the data already in memory). In addition to TCreateContext performing the role of Cache::CreateContext, it is also expected to provide a factory function for creating TValue.
* For each of these, there's a "Shared" version (e.g. FullTypedSharedCacheInterface) that holds a shared_ptr to the Cache, rather than assuming external ownership by holding only a raw `Cache*`.
These interfaces introduce specific handle types for each interface instantiation, so that it's easy to see what kind of object is controlled by a handle. (Ultimately, this might not be worth the extra complexity, but it seems OK so far.)
Note: I attempted to make the cache 'charge' automatically inferred from the cache object type, such as by expecting an ApproximateMemoryUsage() function, but this is not so clean because there are cases where we need to compute the charge ahead of time and don't want to re-compute it.
## block_cache.h
This header is essentially the replacement for the old block_like_traits.h. It includes various things to support block cache access with typed_cache.h for block-based table.
## block_based_table_reader.cc
Before this change, accessing the block cache here was an awkward mix of static polymorphism (template TBlocklike) and switch-case on a dynamic BlockType value. This change mostly unifies on static polymorphism, relying on minor hacks in block_cache.h to distinguish variants of Block. We still check BlockType in some places (especially for stats, which could be improved in follow-up work) but at least the BlockType is a static constant from the template parameter. (No more awkward partial redundancy between static and dynamic info.) This likely contributes to the overall performance improvement, but hasn't been tested in isolation.
The other key source of simplification here is a more unified system of creating block cache objects: for directly populating from primary cache and for promotion from secondary cache. Both use BlockCreateContext, for context and for factory functions.
## block_based_table_builder.cc, cache_dump_load_impl.cc
Before this change, warming caches was super ugly code. Both of these source files had switch statements to basically transition from the dynamic BlockType world to the static TBlocklike world. None of that mess is needed anymore as there's a new, untyped WarmInCache function that handles all the details just as promotion from SecondaryCache would. (Fixes `TODO akanksha: Dedup below code` in block_based_table_builder.cc.)
## Everything else
Mostly just updating Cache users to use new typed APIs when reasonably possible, or changed Cache APIs when not.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/10975
Test Plan:
tests updated
Performance test setup similar to https://github.com/facebook/rocksdb/issues/10626 (by cache size, LRUCache when not "hyper" for HyperClockCache):
34MB 1thread base.hyper -> kops/s: 0.745 io_bytes/op: 2.52504e+06 miss_ratio: 0.140906 max_rss_mb: 76.4844
34MB 1thread new.hyper -> kops/s: 0.751 io_bytes/op: 2.5123e+06 miss_ratio: 0.140161 max_rss_mb: 79.3594
34MB 1thread base -> kops/s: 0.254 io_bytes/op: 1.36073e+07 miss_ratio: 0.918818 max_rss_mb: 45.9297
34MB 1thread new -> kops/s: 0.252 io_bytes/op: 1.36157e+07 miss_ratio: 0.918999 max_rss_mb: 44.1523
34MB 32thread base.hyper -> kops/s: 7.272 io_bytes/op: 2.88323e+06 miss_ratio: 0.162532 max_rss_mb: 516.602
34MB 32thread new.hyper -> kops/s: 7.214 io_bytes/op: 2.99046e+06 miss_ratio: 0.168818 max_rss_mb: 518.293
34MB 32thread base -> kops/s: 3.528 io_bytes/op: 1.35722e+07 miss_ratio: 0.914691 max_rss_mb: 264.926
34MB 32thread new -> kops/s: 3.604 io_bytes/op: 1.35744e+07 miss_ratio: 0.915054 max_rss_mb: 264.488
233MB 1thread base.hyper -> kops/s: 53.909 io_bytes/op: 2552.35 miss_ratio: 0.0440566 max_rss_mb: 241.984
233MB 1thread new.hyper -> kops/s: 62.792 io_bytes/op: 2549.79 miss_ratio: 0.044043 max_rss_mb: 241.922
233MB 1thread base -> kops/s: 1.197 io_bytes/op: 2.75173e+06 miss_ratio: 0.103093 max_rss_mb: 241.559
233MB 1thread new -> kops/s: 1.199 io_bytes/op: 2.73723e+06 miss_ratio: 0.10305 max_rss_mb: 240.93
233MB 32thread base.hyper -> kops/s: 1298.69 io_bytes/op: 2539.12 miss_ratio: 0.0440307 max_rss_mb: 371.418
233MB 32thread new.hyper -> kops/s: 1421.35 io_bytes/op: 2538.75 miss_ratio: 0.0440307 max_rss_mb: 347.273
233MB 32thread base -> kops/s: 9.693 io_bytes/op: 2.77304e+06 miss_ratio: 0.103745 max_rss_mb: 569.691
233MB 32thread new -> kops/s: 9.75 io_bytes/op: 2.77559e+06 miss_ratio: 0.103798 max_rss_mb: 552.82
1597MB 1thread base.hyper -> kops/s: 58.607 io_bytes/op: 1449.14 miss_ratio: 0.0249324 max_rss_mb: 1583.55
1597MB 1thread new.hyper -> kops/s: 69.6 io_bytes/op: 1434.89 miss_ratio: 0.0247167 max_rss_mb: 1584.02
1597MB 1thread base -> kops/s: 60.478 io_bytes/op: 1421.28 miss_ratio: 0.024452 max_rss_mb: 1589.45
1597MB 1thread new -> kops/s: 63.973 io_bytes/op: 1416.07 miss_ratio: 0.0243766 max_rss_mb: 1589.24
1597MB 32thread base.hyper -> kops/s: 1436.2 io_bytes/op: 1357.93 miss_ratio: 0.0235353 max_rss_mb: 1692.92
1597MB 32thread new.hyper -> kops/s: 1605.03 io_bytes/op: 1358.04 miss_ratio: 0.023538 max_rss_mb: 1702.78
1597MB 32thread base -> kops/s: 280.059 io_bytes/op: 1350.34 miss_ratio: 0.023289 max_rss_mb: 1675.36
1597MB 32thread new -> kops/s: 283.125 io_bytes/op: 1351.05 miss_ratio: 0.0232797 max_rss_mb: 1703.83
Almost uniformly improving over base revision, especially for hot paths with HyperClockCache, up to 12% higher throughput seen (1597MB, 32thread, hyper). The improvement for that is likely coming from much simplified code for providing context for secondary cache promotion (CreateCallback/CreateContext), and possibly from less branching in block_based_table_reader. And likely a small improvement from not reconstituting key for DeleterFn.
Reviewed By: anand1976
Differential Revision: D42417818
Pulled By: pdillinger
fbshipit-source-id: f86bfdd584dce27c028b151ba56818ad14f7a432
2023-01-11 22:20:40 +00:00
|
|
|
#include "table/block_based/block_type.h"
|
2020-08-13 01:24:27 +00:00
|
|
|
#include "test_util/sync_point.h"
|
2015-01-15 00:24:24 +00:00
|
|
|
#include "util/coding.h"
|
2018-06-04 19:04:52 +00:00
|
|
|
#include "util/compression_context_cache.h"
|
2019-04-02 21:48:52 +00:00
|
|
|
#include "util/string_util.h"
|
2015-01-09 20:57:11 +00:00
|
|
|
|
|
|
|
#ifdef SNAPPY
|
|
|
|
#include <snappy.h>
|
|
|
|
#endif
|
|
|
|
|
|
|
|
#ifdef ZLIB
|
|
|
|
#include <zlib.h>
|
|
|
|
#endif
|
|
|
|
|
|
|
|
#ifdef BZIP2
|
|
|
|
#include <bzlib.h>
|
|
|
|
#endif
|
|
|
|
|
|
|
|
#if defined(LZ4)
|
|
|
|
#include <lz4.h>
|
|
|
|
#include <lz4hc.h>
|
|
|
|
#endif
|
|
|
|
|
2015-08-27 22:40:42 +00:00
|
|
|
#if defined(ZSTD)
|
|
|
|
#include <zstd.h>
|
2022-10-26 19:08:20 +00:00
|
|
|
// v1.1.3+
|
|
|
|
#if ZSTD_VERSION_NUMBER >= 10103
|
2017-11-03 05:46:13 +00:00
|
|
|
#include <zdict.h>
|
2018-08-23 01:22:10 +00:00
|
|
|
#endif // ZSTD_VERSION_NUMBER >= 10103
|
2022-10-26 19:08:20 +00:00
|
|
|
// v1.4.0+
|
Add `CompressionOptions::checksum` for enabling ZSTD checksum (#11666)
Summary:
Optionally enable zstd checksum flag (https://github.com/facebook/zstd/blob/d857369028d997c92ff1f1861a4d7f679a125464/lib/zstd.h#L428) to detect corruption during decompression. Main changes are in compression.h:
* User can set CompressionOptions::checksum to true to enable this feature.
* We enable this feature in ZSTD by setting the checksum flag in ZSTD compression context: `ZSTD_CCtx`.
* Uses `ZSTD_compress2()` to do compression since it supports frame parameter like the checksum flag. Compression level is also set in compression context as a flag.
* Error handling during decompression to propagate error message from ZSTD.
* Updated microbench to test read performance impact.
About compatibility, the current compression decoders should continue to work with the data created by the new compression API `ZSTD_compress2()`: https://github.com/facebook/zstd/issues/3711.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11666
Test Plan:
* Existing unit tests for zstd compression
* Add unit test `DBTest2.ZSTDChecksum` to test the corruption case
* Manually tested that compression levels, parallel compression, dictionary compression, index compression all work with the new ZSTD_compress2() API.
* Manually tested with `sst_dump --command=recompress` that different compression levels and dictionary compression settings all work.
* Manually tested compiling with older versions of ZSTD: v1.3.8, v1.1.0, v0.6.2.
* Perf impact: from public benchmark data: http://fastcompression.blogspot.com/2019/03/presenting-xxh3.html for checksum and https://github.com/facebook/zstd#benchmarks, if decompression is 1700MB/s and checksum computation is 70000MB/s, checksum computation is an additional ~2.4% time for decompression. Compression is slower and checksumming should be less noticeable.
* Microbench:
```
TEST_TMPDIR=/dev/shm ./branch_db_basic_bench --benchmark_filter=DBGet/comp_style:0/max_data:1048576/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:0/compression_type:7/compression_checksum:1/no_blockcache:1/iterations:10000/threads:1 --benchmark_repetitions=100
Min out of 100 runs:
Main:
10390 10436 10456 10484 10499 10535 10544 10545 10565 10568
After this PR, checksum=false
10285 10397 10503 10508 10515 10557 10562 10635 10640 10660
After this PR, checksum=true
10827 10876 10925 10949 10971 11052 11061 11063 11100 11109
```
* db_bench:
```
Write perf
TEST_TMPDIR=/dev/shm/ ./db_bench_ichecksum --benchmarks=fillseq[-X10] --compression_type=zstd --num=10000000 --compression_checksum=..
[FillSeq checksum=0]
fillseq [AVG 10 runs] : 281635 (± 31711) ops/sec; 31.2 (± 3.5) MB/sec
fillseq [MEDIAN 10 runs] : 294027 ops/sec; 32.5 MB/sec
[FillSeq checksum=1]
fillseq [AVG 10 runs] : 286961 (± 34700) ops/sec; 31.7 (± 3.8) MB/sec
fillseq [MEDIAN 10 runs] : 283278 ops/sec; 31.3 MB/sec
Read perf
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=readrandom[-X20] --num=100000000 --reads=1000000 --use_existing_db=true --readonly=1
[Readrandom checksum=1]
readrandom [AVG 20 runs] : 360928 (± 3579) ops/sec; 4.0 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 362468 ops/sec; 4.0 MB/sec
[Readrandom checksum=0]
readrandom [AVG 20 runs] : 380365 (± 2384) ops/sec; 4.2 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 379800 ops/sec; 4.2 MB/sec
Compression
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=compress[-X20] --compression_type=zstd --num=100000000 --compression_checksum=1
checksum=1
compress [AVG 20 runs] : 54074 (± 634) ops/sec; 211.2 (± 2.5) MB/sec
compress [MEDIAN 20 runs] : 54396 ops/sec; 212.5 MB/sec
checksum=0
compress [AVG 20 runs] : 54598 (± 393) ops/sec; 213.3 (± 1.5) MB/sec
compress [MEDIAN 20 runs] : 54592 ops/sec; 213.3 MB/sec
Decompression:
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=uncompress[-X20] --compression_type=zstd --compression_checksum=1
checksum = 0
uncompress [AVG 20 runs] : 167499 (± 962) ops/sec; 654.3 (± 3.8) MB/sec
uncompress [MEDIAN 20 runs] : 167210 ops/sec; 653.2 MB/sec
checksum = 1
uncompress [AVG 20 runs] : 167980 (± 924) ops/sec; 656.2 (± 3.6) MB/sec
uncompress [MEDIAN 20 runs] : 168465 ops/sec; 658.1 MB/sec
```
Reviewed By: ajkr
Differential Revision: D48019378
Pulled By: cbi42
fbshipit-source-id: 674120c6e1853c2ced1436ac8138559d0204feba
2023-08-18 22:01:59 +00:00
|
|
|
// ZSTD_Compress2(), ZSTD_compressStream2() and frame parameters all belong to
|
|
|
|
// advanced APIs and require v1.4.0+.
|
|
|
|
// https://github.com/facebook/zstd/blob/eb9f881eb810f2242f1ef36b3f3e7014eecb8fa6/lib/zstd.h#L297C40-L297C45
|
2022-10-26 19:08:20 +00:00
|
|
|
#if ZSTD_VERSION_NUMBER >= 10400
|
Add `CompressionOptions::checksum` for enabling ZSTD checksum (#11666)
Summary:
Optionally enable zstd checksum flag (https://github.com/facebook/zstd/blob/d857369028d997c92ff1f1861a4d7f679a125464/lib/zstd.h#L428) to detect corruption during decompression. Main changes are in compression.h:
* User can set CompressionOptions::checksum to true to enable this feature.
* We enable this feature in ZSTD by setting the checksum flag in ZSTD compression context: `ZSTD_CCtx`.
* Uses `ZSTD_compress2()` to do compression since it supports frame parameter like the checksum flag. Compression level is also set in compression context as a flag.
* Error handling during decompression to propagate error message from ZSTD.
* Updated microbench to test read performance impact.
About compatibility, the current compression decoders should continue to work with the data created by the new compression API `ZSTD_compress2()`: https://github.com/facebook/zstd/issues/3711.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11666
Test Plan:
* Existing unit tests for zstd compression
* Add unit test `DBTest2.ZSTDChecksum` to test the corruption case
* Manually tested that compression levels, parallel compression, dictionary compression, index compression all work with the new ZSTD_compress2() API.
* Manually tested with `sst_dump --command=recompress` that different compression levels and dictionary compression settings all work.
* Manually tested compiling with older versions of ZSTD: v1.3.8, v1.1.0, v0.6.2.
* Perf impact: from public benchmark data: http://fastcompression.blogspot.com/2019/03/presenting-xxh3.html for checksum and https://github.com/facebook/zstd#benchmarks, if decompression is 1700MB/s and checksum computation is 70000MB/s, checksum computation is an additional ~2.4% time for decompression. Compression is slower and checksumming should be less noticeable.
* Microbench:
```
TEST_TMPDIR=/dev/shm ./branch_db_basic_bench --benchmark_filter=DBGet/comp_style:0/max_data:1048576/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:0/compression_type:7/compression_checksum:1/no_blockcache:1/iterations:10000/threads:1 --benchmark_repetitions=100
Min out of 100 runs:
Main:
10390 10436 10456 10484 10499 10535 10544 10545 10565 10568
After this PR, checksum=false
10285 10397 10503 10508 10515 10557 10562 10635 10640 10660
After this PR, checksum=true
10827 10876 10925 10949 10971 11052 11061 11063 11100 11109
```
* db_bench:
```
Write perf
TEST_TMPDIR=/dev/shm/ ./db_bench_ichecksum --benchmarks=fillseq[-X10] --compression_type=zstd --num=10000000 --compression_checksum=..
[FillSeq checksum=0]
fillseq [AVG 10 runs] : 281635 (± 31711) ops/sec; 31.2 (± 3.5) MB/sec
fillseq [MEDIAN 10 runs] : 294027 ops/sec; 32.5 MB/sec
[FillSeq checksum=1]
fillseq [AVG 10 runs] : 286961 (± 34700) ops/sec; 31.7 (± 3.8) MB/sec
fillseq [MEDIAN 10 runs] : 283278 ops/sec; 31.3 MB/sec
Read perf
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=readrandom[-X20] --num=100000000 --reads=1000000 --use_existing_db=true --readonly=1
[Readrandom checksum=1]
readrandom [AVG 20 runs] : 360928 (± 3579) ops/sec; 4.0 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 362468 ops/sec; 4.0 MB/sec
[Readrandom checksum=0]
readrandom [AVG 20 runs] : 380365 (± 2384) ops/sec; 4.2 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 379800 ops/sec; 4.2 MB/sec
Compression
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=compress[-X20] --compression_type=zstd --num=100000000 --compression_checksum=1
checksum=1
compress [AVG 20 runs] : 54074 (± 634) ops/sec; 211.2 (± 2.5) MB/sec
compress [MEDIAN 20 runs] : 54396 ops/sec; 212.5 MB/sec
checksum=0
compress [AVG 20 runs] : 54598 (± 393) ops/sec; 213.3 (± 1.5) MB/sec
compress [MEDIAN 20 runs] : 54592 ops/sec; 213.3 MB/sec
Decompression:
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=uncompress[-X20] --compression_type=zstd --compression_checksum=1
checksum = 0
uncompress [AVG 20 runs] : 167499 (± 962) ops/sec; 654.3 (± 3.8) MB/sec
uncompress [MEDIAN 20 runs] : 167210 ops/sec; 653.2 MB/sec
checksum = 1
uncompress [AVG 20 runs] : 167980 (± 924) ops/sec; 656.2 (± 3.6) MB/sec
uncompress [MEDIAN 20 runs] : 168465 ops/sec; 658.1 MB/sec
```
Reviewed By: ajkr
Differential Revision: D48019378
Pulled By: cbi42
fbshipit-source-id: 674120c6e1853c2ced1436ac8138559d0204feba
2023-08-18 22:01:59 +00:00
|
|
|
#define ZSTD_ADVANCED
|
2022-04-14 18:05:39 +00:00
|
|
|
#endif // ZSTD_VERSION_NUMBER >= 10400
|
2020-02-20 20:07:53 +00:00
|
|
|
namespace ROCKSDB_NAMESPACE {
|
2018-06-04 19:04:52 +00:00
|
|
|
// Need this for the context allocation override
|
|
|
|
// On windows we need to do this explicitly
|
|
|
|
#if (ZSTD_VERSION_NUMBER >= 500)
|
|
|
|
#if defined(ROCKSDB_JEMALLOC) && defined(OS_WIN) && \
|
|
|
|
defined(ZSTD_STATIC_LINKING_ONLY)
|
|
|
|
#define ROCKSDB_ZSTD_CUSTOM_MEM
|
|
|
|
namespace port {
|
|
|
|
ZSTD_customMem GetJeZstdAllocationOverrides();
|
|
|
|
} // namespace port
|
|
|
|
#endif // defined(ROCKSDB_JEMALLOC) && defined(OS_WIN) &&
|
|
|
|
// defined(ZSTD_STATIC_LINKING_ONLY)
|
|
|
|
|
2019-01-24 02:11:08 +00:00
|
|
|
// We require `ZSTD_sizeof_DDict` and `ZSTD_createDDict_byReference` to use
|
|
|
|
// `ZSTD_DDict`. The former was introduced in v1.0.0 and the latter was
|
|
|
|
// introduced in v1.1.3. But an important bug fix for `ZSTD_sizeof_DDict` came
|
|
|
|
// in v1.1.4, so that is the version we require. As of today's latest version
|
|
|
|
// (v1.3.8), they are both still in the experimental API, which means they are
|
|
|
|
// only exported when the compiler flag `ZSTD_STATIC_LINKING_ONLY` is set.
|
|
|
|
#if defined(ZSTD_STATIC_LINKING_ONLY) && ZSTD_VERSION_NUMBER >= 10104
|
|
|
|
#define ROCKSDB_ZSTD_DDICT
|
|
|
|
#endif // defined(ZSTD_STATIC_LINKING_ONLY) && ZSTD_VERSION_NUMBER >= 10104
|
|
|
|
|
2018-06-04 19:04:52 +00:00
|
|
|
// Cached data represents a portion that can be re-used
|
|
|
|
// If, in the future we have more than one native context to
|
|
|
|
// cache we can arrange this as a tuple
|
|
|
|
class ZSTDUncompressCachedData {
|
2018-06-05 19:51:05 +00:00
|
|
|
public:
|
2018-06-04 19:04:52 +00:00
|
|
|
using ZSTDNativeContext = ZSTD_DCtx*;
|
2018-06-05 19:51:05 +00:00
|
|
|
ZSTDUncompressCachedData() {}
|
2018-06-04 19:04:52 +00:00
|
|
|
// Init from cache
|
|
|
|
ZSTDUncompressCachedData(const ZSTDUncompressCachedData& o) = delete;
|
|
|
|
ZSTDUncompressCachedData& operator=(const ZSTDUncompressCachedData&) = delete;
|
2022-05-26 23:55:08 +00:00
|
|
|
ZSTDUncompressCachedData(ZSTDUncompressCachedData&& o) noexcept
|
2018-06-05 19:51:05 +00:00
|
|
|
: ZSTDUncompressCachedData() {
|
2018-06-04 19:04:52 +00:00
|
|
|
*this = std::move(o);
|
|
|
|
}
|
2022-05-26 23:55:08 +00:00
|
|
|
ZSTDUncompressCachedData& operator=(ZSTDUncompressCachedData&& o) noexcept {
|
2018-06-04 19:04:52 +00:00
|
|
|
assert(zstd_ctx_ == nullptr);
|
2018-06-05 19:51:05 +00:00
|
|
|
std::swap(zstd_ctx_, o.zstd_ctx_);
|
|
|
|
std::swap(cache_idx_, o.cache_idx_);
|
2018-06-04 19:04:52 +00:00
|
|
|
return *this;
|
|
|
|
}
|
2018-06-05 19:51:05 +00:00
|
|
|
ZSTDNativeContext Get() const { return zstd_ctx_; }
|
|
|
|
int64_t GetCacheIndex() const { return cache_idx_; }
|
2018-06-04 19:04:52 +00:00
|
|
|
void CreateIfNeeded() {
|
|
|
|
if (zstd_ctx_ == nullptr) {
|
|
|
|
#ifdef ROCKSDB_ZSTD_CUSTOM_MEM
|
2018-06-05 19:51:05 +00:00
|
|
|
zstd_ctx_ =
|
|
|
|
ZSTD_createDCtx_advanced(port::GetJeZstdAllocationOverrides());
|
2018-06-04 19:04:52 +00:00
|
|
|
#else // ROCKSDB_ZSTD_CUSTOM_MEM
|
2018-06-05 19:51:05 +00:00
|
|
|
zstd_ctx_ = ZSTD_createDCtx();
|
2018-06-04 19:04:52 +00:00
|
|
|
#endif // ROCKSDB_ZSTD_CUSTOM_MEM
|
|
|
|
cache_idx_ = -1;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
void InitFromCache(const ZSTDUncompressCachedData& o, int64_t idx) {
|
|
|
|
zstd_ctx_ = o.zstd_ctx_;
|
|
|
|
cache_idx_ = idx;
|
|
|
|
}
|
|
|
|
~ZSTDUncompressCachedData() {
|
|
|
|
if (zstd_ctx_ != nullptr && cache_idx_ == -1) {
|
|
|
|
ZSTD_freeDCtx(zstd_ctx_);
|
|
|
|
}
|
|
|
|
}
|
2018-06-05 19:51:05 +00:00
|
|
|
|
|
|
|
private:
|
|
|
|
ZSTDNativeContext zstd_ctx_ = nullptr;
|
|
|
|
int64_t cache_idx_ = -1; // -1 means this instance owns the context
|
2018-06-04 19:04:52 +00:00
|
|
|
};
|
2018-06-05 19:51:05 +00:00
|
|
|
#endif // (ZSTD_VERSION_NUMBER >= 500)
|
2020-02-20 20:07:53 +00:00
|
|
|
} // namespace ROCKSDB_NAMESPACE
|
2017-11-03 05:46:13 +00:00
|
|
|
#endif // ZSTD
|
2015-08-27 22:40:42 +00:00
|
|
|
|
2018-06-04 19:04:52 +00:00
|
|
|
#if !(defined ZSTD) || !(ZSTD_VERSION_NUMBER >= 500)
|
2020-02-20 20:07:53 +00:00
|
|
|
namespace ROCKSDB_NAMESPACE {
|
2018-06-04 19:04:52 +00:00
|
|
|
class ZSTDUncompressCachedData {
|
2018-06-05 19:51:05 +00:00
|
|
|
void* padding; // unused
|
|
|
|
public:
|
2018-06-04 19:04:52 +00:00
|
|
|
using ZSTDNativeContext = void*;
|
|
|
|
ZSTDUncompressCachedData() {}
|
|
|
|
ZSTDUncompressCachedData(const ZSTDUncompressCachedData&) {}
|
|
|
|
ZSTDUncompressCachedData& operator=(const ZSTDUncompressCachedData&) = delete;
|
2022-05-26 23:55:08 +00:00
|
|
|
ZSTDUncompressCachedData(ZSTDUncompressCachedData&&) noexcept = default;
|
|
|
|
ZSTDUncompressCachedData& operator=(ZSTDUncompressCachedData&&) noexcept =
|
|
|
|
default;
|
2018-06-05 19:51:05 +00:00
|
|
|
ZSTDNativeContext Get() const { return nullptr; }
|
|
|
|
int64_t GetCacheIndex() const { return -1; }
|
2018-06-04 19:04:52 +00:00
|
|
|
void CreateIfNeeded() {}
|
|
|
|
void InitFromCache(const ZSTDUncompressCachedData&, int64_t) {}
|
2022-10-26 19:08:20 +00:00
|
|
|
|
2018-07-13 21:07:53 +00:00
|
|
|
private:
|
|
|
|
void ignore_padding__() { padding = nullptr; }
|
2018-06-04 19:04:52 +00:00
|
|
|
};
|
2020-02-20 20:07:53 +00:00
|
|
|
} // namespace ROCKSDB_NAMESPACE
|
2018-06-04 19:04:52 +00:00
|
|
|
#endif
|
|
|
|
|
2016-04-20 05:54:24 +00:00
|
|
|
#if defined(XPRESS)
|
|
|
|
#include "port/xpress.h"
|
|
|
|
#endif
|
|
|
|
|
2020-02-20 20:07:53 +00:00
|
|
|
namespace ROCKSDB_NAMESPACE {
|
2015-01-09 20:57:11 +00:00
|
|
|
|
2019-01-19 03:10:17 +00:00
|
|
|
// Holds dictionary and related data, like ZSTD's digested compression
|
|
|
|
// dictionary.
|
|
|
|
struct CompressionDict {
|
|
|
|
#if ZSTD_VERSION_NUMBER >= 700
|
|
|
|
ZSTD_CDict* zstd_cdict_ = nullptr;
|
|
|
|
#endif // ZSTD_VERSION_NUMBER >= 700
|
Reduce scope of compression dictionary to single SST (#4952)
Summary:
Our previous approach was to train one compression dictionary per compaction, using the first output SST to train a dictionary, and then applying it on subsequent SSTs in the same compaction. While this was great for minimizing CPU/memory/I/O overhead, it did not achieve good compression ratios in practice. In our most promising potential use case, moderate reductions in a dictionary's scope make a major difference on compression ratio.
So, this PR changes compression dictionary to be scoped per-SST. It accepts the tradeoff during table building to use more memory and CPU. Important changes include:
- The `BlockBasedTableBuilder` has a new state when dictionary compression is in-use: `kBuffered`. In that state it accumulates uncompressed data in-memory whenever `Add` is called.
- After accumulating target file size bytes or calling `BlockBasedTableBuilder::Finish`, a `BlockBasedTableBuilder` moves to the `kUnbuffered` state. The transition (`EnterUnbuffered()`) involves sampling the buffered data, training a dictionary, and compressing/writing out all buffered data. In the `kUnbuffered` state, a `BlockBasedTableBuilder` behaves the same as before -- blocks are compressed/written out as soon as they fill up.
- Samples are now whole uncompressed data blocks, except the final sample may be a partial data block so we don't breach the user's configured `max_dict_bytes` or `zstd_max_train_bytes`. The dictionary trainer is supposed to work better when we pass it real units of compression. Previously we were passing 64-byte KV samples which was not realistic.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4952
Differential Revision: D13967980
Pulled By: ajkr
fbshipit-source-id: 82bea6f7537e1529c7a1a4cdee84585f5949300f
2019-02-12 03:42:25 +00:00
|
|
|
std::string dict_;
|
2019-01-19 03:10:17 +00:00
|
|
|
|
|
|
|
public:
|
|
|
|
#if ZSTD_VERSION_NUMBER >= 700
|
Reduce scope of compression dictionary to single SST (#4952)
Summary:
Our previous approach was to train one compression dictionary per compaction, using the first output SST to train a dictionary, and then applying it on subsequent SSTs in the same compaction. While this was great for minimizing CPU/memory/I/O overhead, it did not achieve good compression ratios in practice. In our most promising potential use case, moderate reductions in a dictionary's scope make a major difference on compression ratio.
So, this PR changes compression dictionary to be scoped per-SST. It accepts the tradeoff during table building to use more memory and CPU. Important changes include:
- The `BlockBasedTableBuilder` has a new state when dictionary compression is in-use: `kBuffered`. In that state it accumulates uncompressed data in-memory whenever `Add` is called.
- After accumulating target file size bytes or calling `BlockBasedTableBuilder::Finish`, a `BlockBasedTableBuilder` moves to the `kUnbuffered` state. The transition (`EnterUnbuffered()`) involves sampling the buffered data, training a dictionary, and compressing/writing out all buffered data. In the `kUnbuffered` state, a `BlockBasedTableBuilder` behaves the same as before -- blocks are compressed/written out as soon as they fill up.
- Samples are now whole uncompressed data blocks, except the final sample may be a partial data block so we don't breach the user's configured `max_dict_bytes` or `zstd_max_train_bytes`. The dictionary trainer is supposed to work better when we pass it real units of compression. Previously we were passing 64-byte KV samples which was not realistic.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4952
Differential Revision: D13967980
Pulled By: ajkr
fbshipit-source-id: 82bea6f7537e1529c7a1a4cdee84585f5949300f
2019-02-12 03:42:25 +00:00
|
|
|
CompressionDict(std::string dict, CompressionType type, int level) {
|
2019-01-19 03:10:17 +00:00
|
|
|
#else // ZSTD_VERSION_NUMBER >= 700
|
Reduce scope of compression dictionary to single SST (#4952)
Summary:
Our previous approach was to train one compression dictionary per compaction, using the first output SST to train a dictionary, and then applying it on subsequent SSTs in the same compaction. While this was great for minimizing CPU/memory/I/O overhead, it did not achieve good compression ratios in practice. In our most promising potential use case, moderate reductions in a dictionary's scope make a major difference on compression ratio.
So, this PR changes compression dictionary to be scoped per-SST. It accepts the tradeoff during table building to use more memory and CPU. Important changes include:
- The `BlockBasedTableBuilder` has a new state when dictionary compression is in-use: `kBuffered`. In that state it accumulates uncompressed data in-memory whenever `Add` is called.
- After accumulating target file size bytes or calling `BlockBasedTableBuilder::Finish`, a `BlockBasedTableBuilder` moves to the `kUnbuffered` state. The transition (`EnterUnbuffered()`) involves sampling the buffered data, training a dictionary, and compressing/writing out all buffered data. In the `kUnbuffered` state, a `BlockBasedTableBuilder` behaves the same as before -- blocks are compressed/written out as soon as they fill up.
- Samples are now whole uncompressed data blocks, except the final sample may be a partial data block so we don't breach the user's configured `max_dict_bytes` or `zstd_max_train_bytes`. The dictionary trainer is supposed to work better when we pass it real units of compression. Previously we were passing 64-byte KV samples which was not realistic.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4952
Differential Revision: D13967980
Pulled By: ajkr
fbshipit-source-id: 82bea6f7537e1529c7a1a4cdee84585f5949300f
2019-02-12 03:42:25 +00:00
|
|
|
CompressionDict(std::string dict, CompressionType /*type*/, int /*level*/) {
|
2019-01-19 03:10:17 +00:00
|
|
|
#endif // ZSTD_VERSION_NUMBER >= 700
|
|
|
|
dict_ = std::move(dict);
|
|
|
|
#if ZSTD_VERSION_NUMBER >= 700
|
|
|
|
zstd_cdict_ = nullptr;
|
|
|
|
if (!dict_.empty() && (type == kZSTD || type == kZSTDNotFinalCompression)) {
|
|
|
|
if (level == CompressionOptions::kDefaultCompressionLevel) {
|
|
|
|
// 3 is the value of ZSTD_CLEVEL_DEFAULT (not exposed publicly), see
|
|
|
|
// https://github.com/facebook/zstd/issues/1148
|
Add `CompressionOptions::checksum` for enabling ZSTD checksum (#11666)
Summary:
Optionally enable zstd checksum flag (https://github.com/facebook/zstd/blob/d857369028d997c92ff1f1861a4d7f679a125464/lib/zstd.h#L428) to detect corruption during decompression. Main changes are in compression.h:
* User can set CompressionOptions::checksum to true to enable this feature.
* We enable this feature in ZSTD by setting the checksum flag in ZSTD compression context: `ZSTD_CCtx`.
* Uses `ZSTD_compress2()` to do compression since it supports frame parameter like the checksum flag. Compression level is also set in compression context as a flag.
* Error handling during decompression to propagate error message from ZSTD.
* Updated microbench to test read performance impact.
About compatibility, the current compression decoders should continue to work with the data created by the new compression API `ZSTD_compress2()`: https://github.com/facebook/zstd/issues/3711.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11666
Test Plan:
* Existing unit tests for zstd compression
* Add unit test `DBTest2.ZSTDChecksum` to test the corruption case
* Manually tested that compression levels, parallel compression, dictionary compression, index compression all work with the new ZSTD_compress2() API.
* Manually tested with `sst_dump --command=recompress` that different compression levels and dictionary compression settings all work.
* Manually tested compiling with older versions of ZSTD: v1.3.8, v1.1.0, v0.6.2.
* Perf impact: from public benchmark data: http://fastcompression.blogspot.com/2019/03/presenting-xxh3.html for checksum and https://github.com/facebook/zstd#benchmarks, if decompression is 1700MB/s and checksum computation is 70000MB/s, checksum computation is an additional ~2.4% time for decompression. Compression is slower and checksumming should be less noticeable.
* Microbench:
```
TEST_TMPDIR=/dev/shm ./branch_db_basic_bench --benchmark_filter=DBGet/comp_style:0/max_data:1048576/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:0/compression_type:7/compression_checksum:1/no_blockcache:1/iterations:10000/threads:1 --benchmark_repetitions=100
Min out of 100 runs:
Main:
10390 10436 10456 10484 10499 10535 10544 10545 10565 10568
After this PR, checksum=false
10285 10397 10503 10508 10515 10557 10562 10635 10640 10660
After this PR, checksum=true
10827 10876 10925 10949 10971 11052 11061 11063 11100 11109
```
* db_bench:
```
Write perf
TEST_TMPDIR=/dev/shm/ ./db_bench_ichecksum --benchmarks=fillseq[-X10] --compression_type=zstd --num=10000000 --compression_checksum=..
[FillSeq checksum=0]
fillseq [AVG 10 runs] : 281635 (± 31711) ops/sec; 31.2 (± 3.5) MB/sec
fillseq [MEDIAN 10 runs] : 294027 ops/sec; 32.5 MB/sec
[FillSeq checksum=1]
fillseq [AVG 10 runs] : 286961 (± 34700) ops/sec; 31.7 (± 3.8) MB/sec
fillseq [MEDIAN 10 runs] : 283278 ops/sec; 31.3 MB/sec
Read perf
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=readrandom[-X20] --num=100000000 --reads=1000000 --use_existing_db=true --readonly=1
[Readrandom checksum=1]
readrandom [AVG 20 runs] : 360928 (± 3579) ops/sec; 4.0 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 362468 ops/sec; 4.0 MB/sec
[Readrandom checksum=0]
readrandom [AVG 20 runs] : 380365 (± 2384) ops/sec; 4.2 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 379800 ops/sec; 4.2 MB/sec
Compression
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=compress[-X20] --compression_type=zstd --num=100000000 --compression_checksum=1
checksum=1
compress [AVG 20 runs] : 54074 (± 634) ops/sec; 211.2 (± 2.5) MB/sec
compress [MEDIAN 20 runs] : 54396 ops/sec; 212.5 MB/sec
checksum=0
compress [AVG 20 runs] : 54598 (± 393) ops/sec; 213.3 (± 1.5) MB/sec
compress [MEDIAN 20 runs] : 54592 ops/sec; 213.3 MB/sec
Decompression:
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=uncompress[-X20] --compression_type=zstd --compression_checksum=1
checksum = 0
uncompress [AVG 20 runs] : 167499 (± 962) ops/sec; 654.3 (± 3.8) MB/sec
uncompress [MEDIAN 20 runs] : 167210 ops/sec; 653.2 MB/sec
checksum = 1
uncompress [AVG 20 runs] : 167980 (± 924) ops/sec; 656.2 (± 3.6) MB/sec
uncompress [MEDIAN 20 runs] : 168465 ops/sec; 658.1 MB/sec
```
Reviewed By: ajkr
Differential Revision: D48019378
Pulled By: cbi42
fbshipit-source-id: 674120c6e1853c2ced1436ac8138559d0204feba
2023-08-18 22:01:59 +00:00
|
|
|
// TODO(cbi): ZSTD_CLEVEL_DEFAULT is exposed after
|
|
|
|
// https://github.com/facebook/zstd/pull/1174. Use ZSTD_CLEVEL_DEFAULT
|
|
|
|
// instead of hardcoding 3.
|
2019-01-19 03:10:17 +00:00
|
|
|
level = 3;
|
|
|
|
}
|
|
|
|
// Should be safe (but slower) if below call fails as we'll use the
|
|
|
|
// raw dictionary to compress.
|
|
|
|
zstd_cdict_ = ZSTD_createCDict(dict_.data(), dict_.size(), level);
|
|
|
|
assert(zstd_cdict_ != nullptr);
|
|
|
|
}
|
|
|
|
#endif // ZSTD_VERSION_NUMBER >= 700
|
|
|
|
}
|
|
|
|
|
|
|
|
~CompressionDict() {
|
|
|
|
#if ZSTD_VERSION_NUMBER >= 700
|
|
|
|
size_t res = 0;
|
|
|
|
if (zstd_cdict_ != nullptr) {
|
|
|
|
res = ZSTD_freeCDict(zstd_cdict_);
|
|
|
|
}
|
|
|
|
assert(res == 0); // Last I checked they can't fail
|
|
|
|
(void)res; // prevent unused var warning
|
|
|
|
#endif // ZSTD_VERSION_NUMBER >= 700
|
|
|
|
}
|
|
|
|
|
|
|
|
#if ZSTD_VERSION_NUMBER >= 700
|
2019-01-30 00:23:21 +00:00
|
|
|
const ZSTD_CDict* GetDigestedZstdCDict() const { return zstd_cdict_; }
|
2019-01-19 03:10:17 +00:00
|
|
|
#endif // ZSTD_VERSION_NUMBER >= 700
|
|
|
|
|
|
|
|
Slice GetRawDict() const { return dict_; }
|
|
|
|
|
|
|
|
static const CompressionDict& GetEmptyDict() {
|
|
|
|
static CompressionDict empty_dict{};
|
|
|
|
return empty_dict;
|
|
|
|
}
|
|
|
|
|
|
|
|
CompressionDict() = default;
|
|
|
|
// Disable copy/move
|
|
|
|
CompressionDict(const CompressionDict&) = delete;
|
|
|
|
CompressionDict& operator=(const CompressionDict&) = delete;
|
|
|
|
CompressionDict(CompressionDict&&) = delete;
|
|
|
|
CompressionDict& operator=(CompressionDict&&) = delete;
|
|
|
|
};
|
|
|
|
|
|
|
|
// Holds dictionary and related data, like ZSTD's digested uncompression
|
|
|
|
// dictionary.
|
2019-08-23 15:25:52 +00:00
|
|
|
struct UncompressionDict {
|
|
|
|
// Block containing the data for the compression dictionary in case the
|
|
|
|
// constructor that takes a string parameter is used.
|
2019-07-23 22:57:43 +00:00
|
|
|
std::string dict_;
|
|
|
|
|
2019-08-23 15:25:52 +00:00
|
|
|
// Block containing the data for the compression dictionary in case the
|
|
|
|
// constructor that takes a Slice parameter is used and the passed in
|
|
|
|
// CacheAllocationPtr is not nullptr.
|
|
|
|
CacheAllocationPtr allocation_;
|
|
|
|
|
|
|
|
// Slice pointing to the compression dictionary data. Can point to
|
|
|
|
// dict_, allocation_, or some other memory location, depending on how
|
|
|
|
// the object was constructed.
|
2019-07-23 22:57:43 +00:00
|
|
|
Slice slice_;
|
|
|
|
|
2019-01-24 02:11:08 +00:00
|
|
|
#ifdef ROCKSDB_ZSTD_DDICT
|
2019-07-23 22:57:43 +00:00
|
|
|
// Processed version of the contents of slice_ for ZSTD compression.
|
|
|
|
ZSTD_DDict* zstd_ddict_ = nullptr;
|
2019-01-24 02:11:08 +00:00
|
|
|
#endif // ROCKSDB_ZSTD_DDICT
|
|
|
|
|
|
|
|
#ifdef ROCKSDB_ZSTD_DDICT
|
2019-08-23 15:25:52 +00:00
|
|
|
UncompressionDict(std::string dict, bool using_zstd)
|
2019-01-24 02:11:08 +00:00
|
|
|
#else // ROCKSDB_ZSTD_DDICT
|
2019-08-23 15:25:52 +00:00
|
|
|
UncompressionDict(std::string dict, bool /* using_zstd */)
|
2019-01-24 02:11:08 +00:00
|
|
|
#endif // ROCKSDB_ZSTD_DDICT
|
2019-08-23 15:25:52 +00:00
|
|
|
: dict_(std::move(dict)), slice_(dict_) {
|
2019-01-24 02:11:08 +00:00
|
|
|
#ifdef ROCKSDB_ZSTD_DDICT
|
2019-07-23 22:57:43 +00:00
|
|
|
if (!slice_.empty() && using_zstd) {
|
|
|
|
zstd_ddict_ = ZSTD_createDDict_byReference(slice_.data(), slice_.size());
|
2019-01-19 03:10:17 +00:00
|
|
|
assert(zstd_ddict_ != nullptr);
|
|
|
|
}
|
2019-01-24 02:11:08 +00:00
|
|
|
#endif // ROCKSDB_ZSTD_DDICT
|
2019-01-19 03:10:17 +00:00
|
|
|
}
|
|
|
|
|
2019-08-23 15:25:52 +00:00
|
|
|
#ifdef ROCKSDB_ZSTD_DDICT
|
|
|
|
UncompressionDict(Slice slice, CacheAllocationPtr&& allocation,
|
|
|
|
bool using_zstd)
|
|
|
|
#else // ROCKSDB_ZSTD_DDICT
|
|
|
|
UncompressionDict(Slice slice, CacheAllocationPtr&& allocation,
|
|
|
|
bool /* using_zstd */)
|
|
|
|
#endif // ROCKSDB_ZSTD_DDICT
|
|
|
|
: allocation_(std::move(allocation)), slice_(std::move(slice)) {
|
|
|
|
#ifdef ROCKSDB_ZSTD_DDICT
|
|
|
|
if (!slice_.empty() && using_zstd) {
|
|
|
|
zstd_ddict_ = ZSTD_createDDict_byReference(slice_.data(), slice_.size());
|
|
|
|
assert(zstd_ddict_ != nullptr);
|
|
|
|
}
|
|
|
|
#endif // ROCKSDB_ZSTD_DDICT
|
2019-07-23 22:57:43 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
UncompressionDict(UncompressionDict&& rhs)
|
|
|
|
: dict_(std::move(rhs.dict_)),
|
2019-08-23 15:25:52 +00:00
|
|
|
allocation_(std::move(rhs.allocation_)),
|
2019-07-23 22:57:43 +00:00
|
|
|
slice_(std::move(rhs.slice_))
|
|
|
|
#ifdef ROCKSDB_ZSTD_DDICT
|
|
|
|
,
|
|
|
|
zstd_ddict_(rhs.zstd_ddict_)
|
|
|
|
#endif
|
|
|
|
{
|
|
|
|
#ifdef ROCKSDB_ZSTD_DDICT
|
|
|
|
rhs.zstd_ddict_ = nullptr;
|
|
|
|
#endif
|
|
|
|
}
|
|
|
|
|
2019-01-19 03:10:17 +00:00
|
|
|
~UncompressionDict() {
|
2019-01-24 02:11:08 +00:00
|
|
|
#ifdef ROCKSDB_ZSTD_DDICT
|
2019-01-19 03:10:17 +00:00
|
|
|
size_t res = 0;
|
|
|
|
if (zstd_ddict_ != nullptr) {
|
|
|
|
res = ZSTD_freeDDict(zstd_ddict_);
|
|
|
|
}
|
|
|
|
assert(res == 0); // Last I checked they can't fail
|
|
|
|
(void)res; // prevent unused var warning
|
2019-01-24 02:11:08 +00:00
|
|
|
#endif // ROCKSDB_ZSTD_DDICT
|
2019-01-19 03:10:17 +00:00
|
|
|
}
|
|
|
|
|
2019-07-23 22:57:43 +00:00
|
|
|
UncompressionDict& operator=(UncompressionDict&& rhs) {
|
|
|
|
if (this == &rhs) {
|
|
|
|
return *this;
|
|
|
|
}
|
|
|
|
|
|
|
|
dict_ = std::move(rhs.dict_);
|
2019-08-23 15:25:52 +00:00
|
|
|
allocation_ = std::move(rhs.allocation_);
|
2019-07-23 22:57:43 +00:00
|
|
|
slice_ = std::move(rhs.slice_);
|
|
|
|
|
|
|
|
#ifdef ROCKSDB_ZSTD_DDICT
|
|
|
|
zstd_ddict_ = rhs.zstd_ddict_;
|
|
|
|
rhs.zstd_ddict_ = nullptr;
|
|
|
|
#endif
|
|
|
|
|
|
|
|
return *this;
|
|
|
|
}
|
|
|
|
|
2019-08-23 15:25:52 +00:00
|
|
|
// The object is self-contained if the string constructor is used, or the
|
|
|
|
// Slice constructor is invoked with a non-null allocation. Otherwise, it
|
|
|
|
// is the caller's responsibility to ensure that the underlying storage
|
|
|
|
// outlives this object.
|
|
|
|
bool own_bytes() const { return !dict_.empty() || allocation_; }
|
|
|
|
|
2019-07-23 22:57:43 +00:00
|
|
|
const Slice& GetRawDict() const { return slice_; }
|
|
|
|
|
Major Cache refactoring, CPU efficiency improvement (#10975)
Summary:
This is several refactorings bundled into one to avoid having to incrementally re-modify uses of Cache several times. Overall, there are breaking changes to Cache class, and it becomes more of low-level interface for implementing caches, especially block cache. New internal APIs make using Cache cleaner than before, and more insulated from block cache evolution. Hopefully, this is the last really big block cache refactoring, because of rather effectively decoupling the implementations from the uses. This change also removes the EXPERIMENTAL designation on the SecondaryCache support in Cache. It seems reasonably mature at this point but still subject to change/evolution (as I warn in the API docs for Cache).
The high-level motivation for this refactoring is to minimize code duplication / compounding complexity in adding SecondaryCache support to HyperClockCache (in a later PR). Other benefits listed below.
* static_cast lines of code +29 -35 (net removed 6)
* reinterpret_cast lines of code +6 -32 (net removed 26)
## cache.h and secondary_cache.h
* Always use CacheItemHelper with entries instead of just a Deleter. There are several motivations / justifications:
* Simpler for implementations to deal with just one Insert and one Lookup.
* Simpler and more efficient implementation because we don't have to track which entries are using helpers and which are using deleters
* Gets rid of hack to classify cache entries by their deleter. Instead, the CacheItemHelper includes a CacheEntryRole. This simplifies a lot of code (cache_entry_roles.h almost eliminated). Fixes https://github.com/facebook/rocksdb/issues/9428.
* Makes it trivial to adjust SecondaryCache behavior based on kind of block (e.g. don't re-compress filter blocks).
* It is arguably less convenient for many direct users of Cache, but direct users of Cache are now rare with introduction of typed_cache.h (below).
* I considered and rejected an alternative approach in which we reduce customizability by assuming each secondary cache compatible value starts with a Slice referencing the uncompressed block contents (already true or mostly true), but we apparently intend to stack secondary caches. Saving an entry from a compressed secondary to a lower tier requires custom handling offered by SaveToCallback, etc.
* Make CreateCallback part of the helper and introduce CreateContext to work with it (alternative to https://github.com/facebook/rocksdb/issues/10562). This cleans up the interface while still allowing context to be provided for loading/parsing values into primary cache. This model works for async lookup in BlockBasedTable reader (reader owns a CreateContext) under the assumption that it always waits on secondary cache operations to finish. (Otherwise, the CreateContext could be destroyed while async operation depending on it continues.) This likely contributes most to the observed performance improvement because it saves an std::function backed by a heap allocation.
* Use char* for serialized data, e.g. in SaveToCallback, where void* was confusingly used. (We use `char*` for serialized byte data all over RocksDB, with many advantages over `void*`. `memcpy` etc. are legacy APIs that should not be mimicked.)
* Add a type alias Cache::ObjectPtr = void*, so that we can better indicate the intent of the void* when it is to be the object associated with a Cache entry. Related: started (but did not complete) a refactoring to move away from "value" of a cache entry toward "object" or "obj". (It is confusing to call Cache a key-value store (like DB) when it is really storing arbitrary in-memory objects, not byte strings.)
* Remove unnecessary key param from DeleterFn. This is good for efficiency in HyperClockCache, which does not directly store the cache key in memory. (Alternative to https://github.com/facebook/rocksdb/issues/10774)
* Add allocator to Cache DeleterFn. This is a kind of future-proofing change in case we get more serious about using the Cache allocator for memory tracked by the Cache. Right now, only the uncompressed block contents are allocated using the allocator, and a pointer to that allocator is saved as part of the cached object so that the deleter can use it. (See CacheAllocationPtr.) If in the future we are able to "flatten out" our Cache objects some more, it would be good not to have to track the allocator as part of each object.
* Removes legacy `ApplyToAllCacheEntries` and changes `ApplyToAllEntries` signature for Deleter->CacheItemHelper change.
## typed_cache.h
Adds various "typed" interfaces to the Cache as internal APIs, so that most uses of Cache can use simple type safe code without casting and without explicit deleters, etc. Almost all of the non-test, non-glue code uses of Cache have been migrated. (Follow-up work: CompressedSecondaryCache deserves deeper attention to migrate.) This change expands RocksDB's internal usage of metaprogramming and SFINAE (https://en.cppreference.com/w/cpp/language/sfinae).
The existing usages of Cache are divided up at a high level into these new interfaces. See updated existing uses of Cache for examples of how these are used.
* PlaceholderCacheInterface - Used for making cache reservations, with entries that have a charge but no value.
* BasicTypedCacheInterface<TValue> - Used for primary cache storage of objects of type TValue, which can be cleaned up with std::default_delete<TValue>. The role is provided by TValue::kCacheEntryRole or given in an optional template parameter.
* FullTypedCacheInterface<TValue, TCreateContext> - Used for secondary cache compatible storage of objects of type TValue. In addition to BasicTypedCacheInterface constraints, we require TValue::ContentSlice() to return persistable data. This simplifies usage for the normal case of simple secondary cache compatibility (can give you a Slice to the data already in memory). In addition to TCreateContext performing the role of Cache::CreateContext, it is also expected to provide a factory function for creating TValue.
* For each of these, there's a "Shared" version (e.g. FullTypedSharedCacheInterface) that holds a shared_ptr to the Cache, rather than assuming external ownership by holding only a raw `Cache*`.
These interfaces introduce specific handle types for each interface instantiation, so that it's easy to see what kind of object is controlled by a handle. (Ultimately, this might not be worth the extra complexity, but it seems OK so far.)
Note: I attempted to make the cache 'charge' automatically inferred from the cache object type, such as by expecting an ApproximateMemoryUsage() function, but this is not so clean because there are cases where we need to compute the charge ahead of time and don't want to re-compute it.
## block_cache.h
This header is essentially the replacement for the old block_like_traits.h. It includes various things to support block cache access with typed_cache.h for block-based table.
## block_based_table_reader.cc
Before this change, accessing the block cache here was an awkward mix of static polymorphism (template TBlocklike) and switch-case on a dynamic BlockType value. This change mostly unifies on static polymorphism, relying on minor hacks in block_cache.h to distinguish variants of Block. We still check BlockType in some places (especially for stats, which could be improved in follow-up work) but at least the BlockType is a static constant from the template parameter. (No more awkward partial redundancy between static and dynamic info.) This likely contributes to the overall performance improvement, but hasn't been tested in isolation.
The other key source of simplification here is a more unified system of creating block cache objects: for directly populating from primary cache and for promotion from secondary cache. Both use BlockCreateContext, for context and for factory functions.
## block_based_table_builder.cc, cache_dump_load_impl.cc
Before this change, warming caches was super ugly code. Both of these source files had switch statements to basically transition from the dynamic BlockType world to the static TBlocklike world. None of that mess is needed anymore as there's a new, untyped WarmInCache function that handles all the details just as promotion from SecondaryCache would. (Fixes `TODO akanksha: Dedup below code` in block_based_table_builder.cc.)
## Everything else
Mostly just updating Cache users to use new typed APIs when reasonably possible, or changed Cache APIs when not.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/10975
Test Plan:
tests updated
Performance test setup similar to https://github.com/facebook/rocksdb/issues/10626 (by cache size, LRUCache when not "hyper" for HyperClockCache):
34MB 1thread base.hyper -> kops/s: 0.745 io_bytes/op: 2.52504e+06 miss_ratio: 0.140906 max_rss_mb: 76.4844
34MB 1thread new.hyper -> kops/s: 0.751 io_bytes/op: 2.5123e+06 miss_ratio: 0.140161 max_rss_mb: 79.3594
34MB 1thread base -> kops/s: 0.254 io_bytes/op: 1.36073e+07 miss_ratio: 0.918818 max_rss_mb: 45.9297
34MB 1thread new -> kops/s: 0.252 io_bytes/op: 1.36157e+07 miss_ratio: 0.918999 max_rss_mb: 44.1523
34MB 32thread base.hyper -> kops/s: 7.272 io_bytes/op: 2.88323e+06 miss_ratio: 0.162532 max_rss_mb: 516.602
34MB 32thread new.hyper -> kops/s: 7.214 io_bytes/op: 2.99046e+06 miss_ratio: 0.168818 max_rss_mb: 518.293
34MB 32thread base -> kops/s: 3.528 io_bytes/op: 1.35722e+07 miss_ratio: 0.914691 max_rss_mb: 264.926
34MB 32thread new -> kops/s: 3.604 io_bytes/op: 1.35744e+07 miss_ratio: 0.915054 max_rss_mb: 264.488
233MB 1thread base.hyper -> kops/s: 53.909 io_bytes/op: 2552.35 miss_ratio: 0.0440566 max_rss_mb: 241.984
233MB 1thread new.hyper -> kops/s: 62.792 io_bytes/op: 2549.79 miss_ratio: 0.044043 max_rss_mb: 241.922
233MB 1thread base -> kops/s: 1.197 io_bytes/op: 2.75173e+06 miss_ratio: 0.103093 max_rss_mb: 241.559
233MB 1thread new -> kops/s: 1.199 io_bytes/op: 2.73723e+06 miss_ratio: 0.10305 max_rss_mb: 240.93
233MB 32thread base.hyper -> kops/s: 1298.69 io_bytes/op: 2539.12 miss_ratio: 0.0440307 max_rss_mb: 371.418
233MB 32thread new.hyper -> kops/s: 1421.35 io_bytes/op: 2538.75 miss_ratio: 0.0440307 max_rss_mb: 347.273
233MB 32thread base -> kops/s: 9.693 io_bytes/op: 2.77304e+06 miss_ratio: 0.103745 max_rss_mb: 569.691
233MB 32thread new -> kops/s: 9.75 io_bytes/op: 2.77559e+06 miss_ratio: 0.103798 max_rss_mb: 552.82
1597MB 1thread base.hyper -> kops/s: 58.607 io_bytes/op: 1449.14 miss_ratio: 0.0249324 max_rss_mb: 1583.55
1597MB 1thread new.hyper -> kops/s: 69.6 io_bytes/op: 1434.89 miss_ratio: 0.0247167 max_rss_mb: 1584.02
1597MB 1thread base -> kops/s: 60.478 io_bytes/op: 1421.28 miss_ratio: 0.024452 max_rss_mb: 1589.45
1597MB 1thread new -> kops/s: 63.973 io_bytes/op: 1416.07 miss_ratio: 0.0243766 max_rss_mb: 1589.24
1597MB 32thread base.hyper -> kops/s: 1436.2 io_bytes/op: 1357.93 miss_ratio: 0.0235353 max_rss_mb: 1692.92
1597MB 32thread new.hyper -> kops/s: 1605.03 io_bytes/op: 1358.04 miss_ratio: 0.023538 max_rss_mb: 1702.78
1597MB 32thread base -> kops/s: 280.059 io_bytes/op: 1350.34 miss_ratio: 0.023289 max_rss_mb: 1675.36
1597MB 32thread new -> kops/s: 283.125 io_bytes/op: 1351.05 miss_ratio: 0.0232797 max_rss_mb: 1703.83
Almost uniformly improving over base revision, especially for hot paths with HyperClockCache, up to 12% higher throughput seen (1597MB, 32thread, hyper). The improvement for that is likely coming from much simplified code for providing context for secondary cache promotion (CreateCallback/CreateContext), and possibly from less branching in block_based_table_reader. And likely a small improvement from not reconstituting key for DeleterFn.
Reviewed By: anand1976
Differential Revision: D42417818
Pulled By: pdillinger
fbshipit-source-id: f86bfdd584dce27c028b151ba56818ad14f7a432
2023-01-11 22:20:40 +00:00
|
|
|
// For TypedCacheInterface
|
|
|
|
const Slice& ContentSlice() const { return slice_; }
|
|
|
|
static constexpr CacheEntryRole kCacheEntryRole = CacheEntryRole::kOtherBlock;
|
|
|
|
static constexpr BlockType kBlockType = BlockType::kCompressionDictionary;
|
|
|
|
|
2019-01-24 02:11:08 +00:00
|
|
|
#ifdef ROCKSDB_ZSTD_DDICT
|
2019-01-30 00:23:21 +00:00
|
|
|
const ZSTD_DDict* GetDigestedZstdDDict() const { return zstd_ddict_; }
|
2019-01-24 02:11:08 +00:00
|
|
|
#endif // ROCKSDB_ZSTD_DDICT
|
2019-01-19 03:10:17 +00:00
|
|
|
|
|
|
|
static const UncompressionDict& GetEmptyDict() {
|
|
|
|
static UncompressionDict empty_dict{};
|
|
|
|
return empty_dict;
|
|
|
|
}
|
|
|
|
|
2019-07-23 22:57:43 +00:00
|
|
|
size_t ApproximateMemoryUsage() const {
|
2019-08-23 15:25:52 +00:00
|
|
|
size_t usage = sizeof(struct UncompressionDict);
|
|
|
|
usage += dict_.size();
|
|
|
|
if (allocation_) {
|
|
|
|
auto allocator = allocation_.get_deleter().allocator;
|
|
|
|
if (allocator) {
|
|
|
|
usage += allocator->UsableSize(allocation_.get(), slice_.size());
|
|
|
|
} else {
|
|
|
|
usage += slice_.size();
|
|
|
|
}
|
|
|
|
}
|
2019-01-24 02:11:08 +00:00
|
|
|
#ifdef ROCKSDB_ZSTD_DDICT
|
|
|
|
usage += ZSTD_sizeof_DDict(zstd_ddict_);
|
|
|
|
#endif // ROCKSDB_ZSTD_DDICT
|
|
|
|
return usage;
|
|
|
|
}
|
|
|
|
|
2019-01-19 03:10:17 +00:00
|
|
|
UncompressionDict() = default;
|
2019-07-23 22:57:43 +00:00
|
|
|
// Disable copy
|
2019-01-19 03:10:17 +00:00
|
|
|
UncompressionDict(const CompressionDict&) = delete;
|
|
|
|
UncompressionDict& operator=(const CompressionDict&) = delete;
|
|
|
|
};
|
|
|
|
|
2018-06-04 19:04:52 +00:00
|
|
|
class CompressionContext {
|
2018-06-05 19:51:05 +00:00
|
|
|
private:
|
2018-09-06 16:44:03 +00:00
|
|
|
#if defined(ZSTD) && (ZSTD_VERSION_NUMBER >= 500)
|
2018-06-05 19:51:05 +00:00
|
|
|
ZSTD_CCtx* zstd_ctx_ = nullptr;
|
Add `CompressionOptions::checksum` for enabling ZSTD checksum (#11666)
Summary:
Optionally enable zstd checksum flag (https://github.com/facebook/zstd/blob/d857369028d997c92ff1f1861a4d7f679a125464/lib/zstd.h#L428) to detect corruption during decompression. Main changes are in compression.h:
* User can set CompressionOptions::checksum to true to enable this feature.
* We enable this feature in ZSTD by setting the checksum flag in ZSTD compression context: `ZSTD_CCtx`.
* Uses `ZSTD_compress2()` to do compression since it supports frame parameter like the checksum flag. Compression level is also set in compression context as a flag.
* Error handling during decompression to propagate error message from ZSTD.
* Updated microbench to test read performance impact.
About compatibility, the current compression decoders should continue to work with the data created by the new compression API `ZSTD_compress2()`: https://github.com/facebook/zstd/issues/3711.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11666
Test Plan:
* Existing unit tests for zstd compression
* Add unit test `DBTest2.ZSTDChecksum` to test the corruption case
* Manually tested that compression levels, parallel compression, dictionary compression, index compression all work with the new ZSTD_compress2() API.
* Manually tested with `sst_dump --command=recompress` that different compression levels and dictionary compression settings all work.
* Manually tested compiling with older versions of ZSTD: v1.3.8, v1.1.0, v0.6.2.
* Perf impact: from public benchmark data: http://fastcompression.blogspot.com/2019/03/presenting-xxh3.html for checksum and https://github.com/facebook/zstd#benchmarks, if decompression is 1700MB/s and checksum computation is 70000MB/s, checksum computation is an additional ~2.4% time for decompression. Compression is slower and checksumming should be less noticeable.
* Microbench:
```
TEST_TMPDIR=/dev/shm ./branch_db_basic_bench --benchmark_filter=DBGet/comp_style:0/max_data:1048576/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:0/compression_type:7/compression_checksum:1/no_blockcache:1/iterations:10000/threads:1 --benchmark_repetitions=100
Min out of 100 runs:
Main:
10390 10436 10456 10484 10499 10535 10544 10545 10565 10568
After this PR, checksum=false
10285 10397 10503 10508 10515 10557 10562 10635 10640 10660
After this PR, checksum=true
10827 10876 10925 10949 10971 11052 11061 11063 11100 11109
```
* db_bench:
```
Write perf
TEST_TMPDIR=/dev/shm/ ./db_bench_ichecksum --benchmarks=fillseq[-X10] --compression_type=zstd --num=10000000 --compression_checksum=..
[FillSeq checksum=0]
fillseq [AVG 10 runs] : 281635 (± 31711) ops/sec; 31.2 (± 3.5) MB/sec
fillseq [MEDIAN 10 runs] : 294027 ops/sec; 32.5 MB/sec
[FillSeq checksum=1]
fillseq [AVG 10 runs] : 286961 (± 34700) ops/sec; 31.7 (± 3.8) MB/sec
fillseq [MEDIAN 10 runs] : 283278 ops/sec; 31.3 MB/sec
Read perf
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=readrandom[-X20] --num=100000000 --reads=1000000 --use_existing_db=true --readonly=1
[Readrandom checksum=1]
readrandom [AVG 20 runs] : 360928 (± 3579) ops/sec; 4.0 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 362468 ops/sec; 4.0 MB/sec
[Readrandom checksum=0]
readrandom [AVG 20 runs] : 380365 (± 2384) ops/sec; 4.2 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 379800 ops/sec; 4.2 MB/sec
Compression
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=compress[-X20] --compression_type=zstd --num=100000000 --compression_checksum=1
checksum=1
compress [AVG 20 runs] : 54074 (± 634) ops/sec; 211.2 (± 2.5) MB/sec
compress [MEDIAN 20 runs] : 54396 ops/sec; 212.5 MB/sec
checksum=0
compress [AVG 20 runs] : 54598 (± 393) ops/sec; 213.3 (± 1.5) MB/sec
compress [MEDIAN 20 runs] : 54592 ops/sec; 213.3 MB/sec
Decompression:
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=uncompress[-X20] --compression_type=zstd --compression_checksum=1
checksum = 0
uncompress [AVG 20 runs] : 167499 (± 962) ops/sec; 654.3 (± 3.8) MB/sec
uncompress [MEDIAN 20 runs] : 167210 ops/sec; 653.2 MB/sec
checksum = 1
uncompress [AVG 20 runs] : 167980 (± 924) ops/sec; 656.2 (± 3.6) MB/sec
uncompress [MEDIAN 20 runs] : 168465 ops/sec; 658.1 MB/sec
```
Reviewed By: ajkr
Differential Revision: D48019378
Pulled By: cbi42
fbshipit-source-id: 674120c6e1853c2ced1436ac8138559d0204feba
2023-08-18 22:01:59 +00:00
|
|
|
|
|
|
|
ZSTD_CCtx* CreateZSTDContext() {
|
2018-06-04 19:04:52 +00:00
|
|
|
#ifdef ROCKSDB_ZSTD_CUSTOM_MEM
|
Add `CompressionOptions::checksum` for enabling ZSTD checksum (#11666)
Summary:
Optionally enable zstd checksum flag (https://github.com/facebook/zstd/blob/d857369028d997c92ff1f1861a4d7f679a125464/lib/zstd.h#L428) to detect corruption during decompression. Main changes are in compression.h:
* User can set CompressionOptions::checksum to true to enable this feature.
* We enable this feature in ZSTD by setting the checksum flag in ZSTD compression context: `ZSTD_CCtx`.
* Uses `ZSTD_compress2()` to do compression since it supports frame parameter like the checksum flag. Compression level is also set in compression context as a flag.
* Error handling during decompression to propagate error message from ZSTD.
* Updated microbench to test read performance impact.
About compatibility, the current compression decoders should continue to work with the data created by the new compression API `ZSTD_compress2()`: https://github.com/facebook/zstd/issues/3711.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11666
Test Plan:
* Existing unit tests for zstd compression
* Add unit test `DBTest2.ZSTDChecksum` to test the corruption case
* Manually tested that compression levels, parallel compression, dictionary compression, index compression all work with the new ZSTD_compress2() API.
* Manually tested with `sst_dump --command=recompress` that different compression levels and dictionary compression settings all work.
* Manually tested compiling with older versions of ZSTD: v1.3.8, v1.1.0, v0.6.2.
* Perf impact: from public benchmark data: http://fastcompression.blogspot.com/2019/03/presenting-xxh3.html for checksum and https://github.com/facebook/zstd#benchmarks, if decompression is 1700MB/s and checksum computation is 70000MB/s, checksum computation is an additional ~2.4% time for decompression. Compression is slower and checksumming should be less noticeable.
* Microbench:
```
TEST_TMPDIR=/dev/shm ./branch_db_basic_bench --benchmark_filter=DBGet/comp_style:0/max_data:1048576/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:0/compression_type:7/compression_checksum:1/no_blockcache:1/iterations:10000/threads:1 --benchmark_repetitions=100
Min out of 100 runs:
Main:
10390 10436 10456 10484 10499 10535 10544 10545 10565 10568
After this PR, checksum=false
10285 10397 10503 10508 10515 10557 10562 10635 10640 10660
After this PR, checksum=true
10827 10876 10925 10949 10971 11052 11061 11063 11100 11109
```
* db_bench:
```
Write perf
TEST_TMPDIR=/dev/shm/ ./db_bench_ichecksum --benchmarks=fillseq[-X10] --compression_type=zstd --num=10000000 --compression_checksum=..
[FillSeq checksum=0]
fillseq [AVG 10 runs] : 281635 (± 31711) ops/sec; 31.2 (± 3.5) MB/sec
fillseq [MEDIAN 10 runs] : 294027 ops/sec; 32.5 MB/sec
[FillSeq checksum=1]
fillseq [AVG 10 runs] : 286961 (± 34700) ops/sec; 31.7 (± 3.8) MB/sec
fillseq [MEDIAN 10 runs] : 283278 ops/sec; 31.3 MB/sec
Read perf
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=readrandom[-X20] --num=100000000 --reads=1000000 --use_existing_db=true --readonly=1
[Readrandom checksum=1]
readrandom [AVG 20 runs] : 360928 (± 3579) ops/sec; 4.0 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 362468 ops/sec; 4.0 MB/sec
[Readrandom checksum=0]
readrandom [AVG 20 runs] : 380365 (± 2384) ops/sec; 4.2 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 379800 ops/sec; 4.2 MB/sec
Compression
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=compress[-X20] --compression_type=zstd --num=100000000 --compression_checksum=1
checksum=1
compress [AVG 20 runs] : 54074 (± 634) ops/sec; 211.2 (± 2.5) MB/sec
compress [MEDIAN 20 runs] : 54396 ops/sec; 212.5 MB/sec
checksum=0
compress [AVG 20 runs] : 54598 (± 393) ops/sec; 213.3 (± 1.5) MB/sec
compress [MEDIAN 20 runs] : 54592 ops/sec; 213.3 MB/sec
Decompression:
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=uncompress[-X20] --compression_type=zstd --compression_checksum=1
checksum = 0
uncompress [AVG 20 runs] : 167499 (± 962) ops/sec; 654.3 (± 3.8) MB/sec
uncompress [MEDIAN 20 runs] : 167210 ops/sec; 653.2 MB/sec
checksum = 1
uncompress [AVG 20 runs] : 167980 (± 924) ops/sec; 656.2 (± 3.6) MB/sec
uncompress [MEDIAN 20 runs] : 168465 ops/sec; 658.1 MB/sec
```
Reviewed By: ajkr
Differential Revision: D48019378
Pulled By: cbi42
fbshipit-source-id: 674120c6e1853c2ced1436ac8138559d0204feba
2023-08-18 22:01:59 +00:00
|
|
|
return ZSTD_createCCtx_advanced(port::GetJeZstdAllocationOverrides());
|
2018-06-04 19:04:52 +00:00
|
|
|
#else // ROCKSDB_ZSTD_CUSTOM_MEM
|
Add `CompressionOptions::checksum` for enabling ZSTD checksum (#11666)
Summary:
Optionally enable zstd checksum flag (https://github.com/facebook/zstd/blob/d857369028d997c92ff1f1861a4d7f679a125464/lib/zstd.h#L428) to detect corruption during decompression. Main changes are in compression.h:
* User can set CompressionOptions::checksum to true to enable this feature.
* We enable this feature in ZSTD by setting the checksum flag in ZSTD compression context: `ZSTD_CCtx`.
* Uses `ZSTD_compress2()` to do compression since it supports frame parameter like the checksum flag. Compression level is also set in compression context as a flag.
* Error handling during decompression to propagate error message from ZSTD.
* Updated microbench to test read performance impact.
About compatibility, the current compression decoders should continue to work with the data created by the new compression API `ZSTD_compress2()`: https://github.com/facebook/zstd/issues/3711.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11666
Test Plan:
* Existing unit tests for zstd compression
* Add unit test `DBTest2.ZSTDChecksum` to test the corruption case
* Manually tested that compression levels, parallel compression, dictionary compression, index compression all work with the new ZSTD_compress2() API.
* Manually tested with `sst_dump --command=recompress` that different compression levels and dictionary compression settings all work.
* Manually tested compiling with older versions of ZSTD: v1.3.8, v1.1.0, v0.6.2.
* Perf impact: from public benchmark data: http://fastcompression.blogspot.com/2019/03/presenting-xxh3.html for checksum and https://github.com/facebook/zstd#benchmarks, if decompression is 1700MB/s and checksum computation is 70000MB/s, checksum computation is an additional ~2.4% time for decompression. Compression is slower and checksumming should be less noticeable.
* Microbench:
```
TEST_TMPDIR=/dev/shm ./branch_db_basic_bench --benchmark_filter=DBGet/comp_style:0/max_data:1048576/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:0/compression_type:7/compression_checksum:1/no_blockcache:1/iterations:10000/threads:1 --benchmark_repetitions=100
Min out of 100 runs:
Main:
10390 10436 10456 10484 10499 10535 10544 10545 10565 10568
After this PR, checksum=false
10285 10397 10503 10508 10515 10557 10562 10635 10640 10660
After this PR, checksum=true
10827 10876 10925 10949 10971 11052 11061 11063 11100 11109
```
* db_bench:
```
Write perf
TEST_TMPDIR=/dev/shm/ ./db_bench_ichecksum --benchmarks=fillseq[-X10] --compression_type=zstd --num=10000000 --compression_checksum=..
[FillSeq checksum=0]
fillseq [AVG 10 runs] : 281635 (± 31711) ops/sec; 31.2 (± 3.5) MB/sec
fillseq [MEDIAN 10 runs] : 294027 ops/sec; 32.5 MB/sec
[FillSeq checksum=1]
fillseq [AVG 10 runs] : 286961 (± 34700) ops/sec; 31.7 (± 3.8) MB/sec
fillseq [MEDIAN 10 runs] : 283278 ops/sec; 31.3 MB/sec
Read perf
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=readrandom[-X20] --num=100000000 --reads=1000000 --use_existing_db=true --readonly=1
[Readrandom checksum=1]
readrandom [AVG 20 runs] : 360928 (± 3579) ops/sec; 4.0 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 362468 ops/sec; 4.0 MB/sec
[Readrandom checksum=0]
readrandom [AVG 20 runs] : 380365 (± 2384) ops/sec; 4.2 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 379800 ops/sec; 4.2 MB/sec
Compression
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=compress[-X20] --compression_type=zstd --num=100000000 --compression_checksum=1
checksum=1
compress [AVG 20 runs] : 54074 (± 634) ops/sec; 211.2 (± 2.5) MB/sec
compress [MEDIAN 20 runs] : 54396 ops/sec; 212.5 MB/sec
checksum=0
compress [AVG 20 runs] : 54598 (± 393) ops/sec; 213.3 (± 1.5) MB/sec
compress [MEDIAN 20 runs] : 54592 ops/sec; 213.3 MB/sec
Decompression:
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=uncompress[-X20] --compression_type=zstd --compression_checksum=1
checksum = 0
uncompress [AVG 20 runs] : 167499 (± 962) ops/sec; 654.3 (± 3.8) MB/sec
uncompress [MEDIAN 20 runs] : 167210 ops/sec; 653.2 MB/sec
checksum = 1
uncompress [AVG 20 runs] : 167980 (± 924) ops/sec; 656.2 (± 3.6) MB/sec
uncompress [MEDIAN 20 runs] : 168465 ops/sec; 658.1 MB/sec
```
Reviewed By: ajkr
Differential Revision: D48019378
Pulled By: cbi42
fbshipit-source-id: 674120c6e1853c2ced1436ac8138559d0204feba
2023-08-18 22:01:59 +00:00
|
|
|
return ZSTD_createCCtx();
|
2018-06-04 19:04:52 +00:00
|
|
|
#endif // ROCKSDB_ZSTD_CUSTOM_MEM
|
Add `CompressionOptions::checksum` for enabling ZSTD checksum (#11666)
Summary:
Optionally enable zstd checksum flag (https://github.com/facebook/zstd/blob/d857369028d997c92ff1f1861a4d7f679a125464/lib/zstd.h#L428) to detect corruption during decompression. Main changes are in compression.h:
* User can set CompressionOptions::checksum to true to enable this feature.
* We enable this feature in ZSTD by setting the checksum flag in ZSTD compression context: `ZSTD_CCtx`.
* Uses `ZSTD_compress2()` to do compression since it supports frame parameter like the checksum flag. Compression level is also set in compression context as a flag.
* Error handling during decompression to propagate error message from ZSTD.
* Updated microbench to test read performance impact.
About compatibility, the current compression decoders should continue to work with the data created by the new compression API `ZSTD_compress2()`: https://github.com/facebook/zstd/issues/3711.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11666
Test Plan:
* Existing unit tests for zstd compression
* Add unit test `DBTest2.ZSTDChecksum` to test the corruption case
* Manually tested that compression levels, parallel compression, dictionary compression, index compression all work with the new ZSTD_compress2() API.
* Manually tested with `sst_dump --command=recompress` that different compression levels and dictionary compression settings all work.
* Manually tested compiling with older versions of ZSTD: v1.3.8, v1.1.0, v0.6.2.
* Perf impact: from public benchmark data: http://fastcompression.blogspot.com/2019/03/presenting-xxh3.html for checksum and https://github.com/facebook/zstd#benchmarks, if decompression is 1700MB/s and checksum computation is 70000MB/s, checksum computation is an additional ~2.4% time for decompression. Compression is slower and checksumming should be less noticeable.
* Microbench:
```
TEST_TMPDIR=/dev/shm ./branch_db_basic_bench --benchmark_filter=DBGet/comp_style:0/max_data:1048576/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:0/compression_type:7/compression_checksum:1/no_blockcache:1/iterations:10000/threads:1 --benchmark_repetitions=100
Min out of 100 runs:
Main:
10390 10436 10456 10484 10499 10535 10544 10545 10565 10568
After this PR, checksum=false
10285 10397 10503 10508 10515 10557 10562 10635 10640 10660
After this PR, checksum=true
10827 10876 10925 10949 10971 11052 11061 11063 11100 11109
```
* db_bench:
```
Write perf
TEST_TMPDIR=/dev/shm/ ./db_bench_ichecksum --benchmarks=fillseq[-X10] --compression_type=zstd --num=10000000 --compression_checksum=..
[FillSeq checksum=0]
fillseq [AVG 10 runs] : 281635 (± 31711) ops/sec; 31.2 (± 3.5) MB/sec
fillseq [MEDIAN 10 runs] : 294027 ops/sec; 32.5 MB/sec
[FillSeq checksum=1]
fillseq [AVG 10 runs] : 286961 (± 34700) ops/sec; 31.7 (± 3.8) MB/sec
fillseq [MEDIAN 10 runs] : 283278 ops/sec; 31.3 MB/sec
Read perf
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=readrandom[-X20] --num=100000000 --reads=1000000 --use_existing_db=true --readonly=1
[Readrandom checksum=1]
readrandom [AVG 20 runs] : 360928 (± 3579) ops/sec; 4.0 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 362468 ops/sec; 4.0 MB/sec
[Readrandom checksum=0]
readrandom [AVG 20 runs] : 380365 (± 2384) ops/sec; 4.2 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 379800 ops/sec; 4.2 MB/sec
Compression
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=compress[-X20] --compression_type=zstd --num=100000000 --compression_checksum=1
checksum=1
compress [AVG 20 runs] : 54074 (± 634) ops/sec; 211.2 (± 2.5) MB/sec
compress [MEDIAN 20 runs] : 54396 ops/sec; 212.5 MB/sec
checksum=0
compress [AVG 20 runs] : 54598 (± 393) ops/sec; 213.3 (± 1.5) MB/sec
compress [MEDIAN 20 runs] : 54592 ops/sec; 213.3 MB/sec
Decompression:
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=uncompress[-X20] --compression_type=zstd --compression_checksum=1
checksum = 0
uncompress [AVG 20 runs] : 167499 (± 962) ops/sec; 654.3 (± 3.8) MB/sec
uncompress [MEDIAN 20 runs] : 167210 ops/sec; 653.2 MB/sec
checksum = 1
uncompress [AVG 20 runs] : 167980 (± 924) ops/sec; 656.2 (± 3.6) MB/sec
uncompress [MEDIAN 20 runs] : 168465 ops/sec; 658.1 MB/sec
```
Reviewed By: ajkr
Differential Revision: D48019378
Pulled By: cbi42
fbshipit-source-id: 674120c6e1853c2ced1436ac8138559d0204feba
2023-08-18 22:01:59 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
void CreateNativeContext(CompressionType type, int level, bool checksum) {
|
|
|
|
if (type == kZSTD || type == kZSTDNotFinalCompression) {
|
|
|
|
zstd_ctx_ = CreateZSTDContext();
|
|
|
|
#ifdef ZSTD_ADVANCED
|
|
|
|
if (level == CompressionOptions::kDefaultCompressionLevel) {
|
|
|
|
// 3 is the value of ZSTD_CLEVEL_DEFAULT (not exposed publicly), see
|
|
|
|
// https://github.com/facebook/zstd/issues/1148
|
|
|
|
level = 3;
|
|
|
|
}
|
|
|
|
size_t err =
|
|
|
|
ZSTD_CCtx_setParameter(zstd_ctx_, ZSTD_c_compressionLevel, level);
|
|
|
|
if (ZSTD_isError(err)) {
|
|
|
|
assert(false);
|
|
|
|
ZSTD_freeCCtx(zstd_ctx_);
|
|
|
|
zstd_ctx_ = CreateZSTDContext();
|
|
|
|
}
|
|
|
|
if (checksum) {
|
|
|
|
err = ZSTD_CCtx_setParameter(zstd_ctx_, ZSTD_c_checksumFlag, 1);
|
|
|
|
if (ZSTD_isError(err)) {
|
|
|
|
assert(false);
|
|
|
|
ZSTD_freeCCtx(zstd_ctx_);
|
|
|
|
zstd_ctx_ = CreateZSTDContext();
|
|
|
|
}
|
|
|
|
}
|
|
|
|
#else
|
|
|
|
(void)level;
|
|
|
|
(void)checksum;
|
|
|
|
#endif
|
2018-06-04 19:04:52 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
void DestroyNativeContext() {
|
|
|
|
if (zstd_ctx_ != nullptr) {
|
|
|
|
ZSTD_freeCCtx(zstd_ctx_);
|
|
|
|
}
|
|
|
|
}
|
2018-06-05 19:51:05 +00:00
|
|
|
|
|
|
|
public:
|
2018-06-04 19:04:52 +00:00
|
|
|
// callable inside ZSTD_Compress
|
2018-06-05 19:51:05 +00:00
|
|
|
ZSTD_CCtx* ZSTDPreallocCtx() const {
|
2019-01-24 02:11:08 +00:00
|
|
|
assert(zstd_ctx_ != nullptr);
|
2018-06-04 19:04:52 +00:00
|
|
|
return zstd_ctx_;
|
|
|
|
}
|
2019-01-19 03:10:17 +00:00
|
|
|
|
2018-06-05 19:51:05 +00:00
|
|
|
#else // ZSTD && (ZSTD_VERSION_NUMBER >= 500)
|
|
|
|
private:
|
Add `CompressionOptions::checksum` for enabling ZSTD checksum (#11666)
Summary:
Optionally enable zstd checksum flag (https://github.com/facebook/zstd/blob/d857369028d997c92ff1f1861a4d7f679a125464/lib/zstd.h#L428) to detect corruption during decompression. Main changes are in compression.h:
* User can set CompressionOptions::checksum to true to enable this feature.
* We enable this feature in ZSTD by setting the checksum flag in ZSTD compression context: `ZSTD_CCtx`.
* Uses `ZSTD_compress2()` to do compression since it supports frame parameter like the checksum flag. Compression level is also set in compression context as a flag.
* Error handling during decompression to propagate error message from ZSTD.
* Updated microbench to test read performance impact.
About compatibility, the current compression decoders should continue to work with the data created by the new compression API `ZSTD_compress2()`: https://github.com/facebook/zstd/issues/3711.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11666
Test Plan:
* Existing unit tests for zstd compression
* Add unit test `DBTest2.ZSTDChecksum` to test the corruption case
* Manually tested that compression levels, parallel compression, dictionary compression, index compression all work with the new ZSTD_compress2() API.
* Manually tested with `sst_dump --command=recompress` that different compression levels and dictionary compression settings all work.
* Manually tested compiling with older versions of ZSTD: v1.3.8, v1.1.0, v0.6.2.
* Perf impact: from public benchmark data: http://fastcompression.blogspot.com/2019/03/presenting-xxh3.html for checksum and https://github.com/facebook/zstd#benchmarks, if decompression is 1700MB/s and checksum computation is 70000MB/s, checksum computation is an additional ~2.4% time for decompression. Compression is slower and checksumming should be less noticeable.
* Microbench:
```
TEST_TMPDIR=/dev/shm ./branch_db_basic_bench --benchmark_filter=DBGet/comp_style:0/max_data:1048576/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:0/compression_type:7/compression_checksum:1/no_blockcache:1/iterations:10000/threads:1 --benchmark_repetitions=100
Min out of 100 runs:
Main:
10390 10436 10456 10484 10499 10535 10544 10545 10565 10568
After this PR, checksum=false
10285 10397 10503 10508 10515 10557 10562 10635 10640 10660
After this PR, checksum=true
10827 10876 10925 10949 10971 11052 11061 11063 11100 11109
```
* db_bench:
```
Write perf
TEST_TMPDIR=/dev/shm/ ./db_bench_ichecksum --benchmarks=fillseq[-X10] --compression_type=zstd --num=10000000 --compression_checksum=..
[FillSeq checksum=0]
fillseq [AVG 10 runs] : 281635 (± 31711) ops/sec; 31.2 (± 3.5) MB/sec
fillseq [MEDIAN 10 runs] : 294027 ops/sec; 32.5 MB/sec
[FillSeq checksum=1]
fillseq [AVG 10 runs] : 286961 (± 34700) ops/sec; 31.7 (± 3.8) MB/sec
fillseq [MEDIAN 10 runs] : 283278 ops/sec; 31.3 MB/sec
Read perf
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=readrandom[-X20] --num=100000000 --reads=1000000 --use_existing_db=true --readonly=1
[Readrandom checksum=1]
readrandom [AVG 20 runs] : 360928 (± 3579) ops/sec; 4.0 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 362468 ops/sec; 4.0 MB/sec
[Readrandom checksum=0]
readrandom [AVG 20 runs] : 380365 (± 2384) ops/sec; 4.2 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 379800 ops/sec; 4.2 MB/sec
Compression
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=compress[-X20] --compression_type=zstd --num=100000000 --compression_checksum=1
checksum=1
compress [AVG 20 runs] : 54074 (± 634) ops/sec; 211.2 (± 2.5) MB/sec
compress [MEDIAN 20 runs] : 54396 ops/sec; 212.5 MB/sec
checksum=0
compress [AVG 20 runs] : 54598 (± 393) ops/sec; 213.3 (± 1.5) MB/sec
compress [MEDIAN 20 runs] : 54592 ops/sec; 213.3 MB/sec
Decompression:
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=uncompress[-X20] --compression_type=zstd --compression_checksum=1
checksum = 0
uncompress [AVG 20 runs] : 167499 (± 962) ops/sec; 654.3 (± 3.8) MB/sec
uncompress [MEDIAN 20 runs] : 167210 ops/sec; 653.2 MB/sec
checksum = 1
uncompress [AVG 20 runs] : 167980 (± 924) ops/sec; 656.2 (± 3.6) MB/sec
uncompress [MEDIAN 20 runs] : 168465 ops/sec; 658.1 MB/sec
```
Reviewed By: ajkr
Differential Revision: D48019378
Pulled By: cbi42
fbshipit-source-id: 674120c6e1853c2ced1436ac8138559d0204feba
2023-08-18 22:01:59 +00:00
|
|
|
void CreateNativeContext(CompressionType /* type */, int /* level */,
|
|
|
|
bool /* checksum */) {}
|
2018-06-04 19:04:52 +00:00
|
|
|
void DestroyNativeContext() {}
|
2018-06-05 19:51:05 +00:00
|
|
|
#endif // ZSTD && (ZSTD_VERSION_NUMBER >= 500)
|
|
|
|
public:
|
Add `CompressionOptions::checksum` for enabling ZSTD checksum (#11666)
Summary:
Optionally enable zstd checksum flag (https://github.com/facebook/zstd/blob/d857369028d997c92ff1f1861a4d7f679a125464/lib/zstd.h#L428) to detect corruption during decompression. Main changes are in compression.h:
* User can set CompressionOptions::checksum to true to enable this feature.
* We enable this feature in ZSTD by setting the checksum flag in ZSTD compression context: `ZSTD_CCtx`.
* Uses `ZSTD_compress2()` to do compression since it supports frame parameter like the checksum flag. Compression level is also set in compression context as a flag.
* Error handling during decompression to propagate error message from ZSTD.
* Updated microbench to test read performance impact.
About compatibility, the current compression decoders should continue to work with the data created by the new compression API `ZSTD_compress2()`: https://github.com/facebook/zstd/issues/3711.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11666
Test Plan:
* Existing unit tests for zstd compression
* Add unit test `DBTest2.ZSTDChecksum` to test the corruption case
* Manually tested that compression levels, parallel compression, dictionary compression, index compression all work with the new ZSTD_compress2() API.
* Manually tested with `sst_dump --command=recompress` that different compression levels and dictionary compression settings all work.
* Manually tested compiling with older versions of ZSTD: v1.3.8, v1.1.0, v0.6.2.
* Perf impact: from public benchmark data: http://fastcompression.blogspot.com/2019/03/presenting-xxh3.html for checksum and https://github.com/facebook/zstd#benchmarks, if decompression is 1700MB/s and checksum computation is 70000MB/s, checksum computation is an additional ~2.4% time for decompression. Compression is slower and checksumming should be less noticeable.
* Microbench:
```
TEST_TMPDIR=/dev/shm ./branch_db_basic_bench --benchmark_filter=DBGet/comp_style:0/max_data:1048576/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:0/compression_type:7/compression_checksum:1/no_blockcache:1/iterations:10000/threads:1 --benchmark_repetitions=100
Min out of 100 runs:
Main:
10390 10436 10456 10484 10499 10535 10544 10545 10565 10568
After this PR, checksum=false
10285 10397 10503 10508 10515 10557 10562 10635 10640 10660
After this PR, checksum=true
10827 10876 10925 10949 10971 11052 11061 11063 11100 11109
```
* db_bench:
```
Write perf
TEST_TMPDIR=/dev/shm/ ./db_bench_ichecksum --benchmarks=fillseq[-X10] --compression_type=zstd --num=10000000 --compression_checksum=..
[FillSeq checksum=0]
fillseq [AVG 10 runs] : 281635 (± 31711) ops/sec; 31.2 (± 3.5) MB/sec
fillseq [MEDIAN 10 runs] : 294027 ops/sec; 32.5 MB/sec
[FillSeq checksum=1]
fillseq [AVG 10 runs] : 286961 (± 34700) ops/sec; 31.7 (± 3.8) MB/sec
fillseq [MEDIAN 10 runs] : 283278 ops/sec; 31.3 MB/sec
Read perf
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=readrandom[-X20] --num=100000000 --reads=1000000 --use_existing_db=true --readonly=1
[Readrandom checksum=1]
readrandom [AVG 20 runs] : 360928 (± 3579) ops/sec; 4.0 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 362468 ops/sec; 4.0 MB/sec
[Readrandom checksum=0]
readrandom [AVG 20 runs] : 380365 (± 2384) ops/sec; 4.2 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 379800 ops/sec; 4.2 MB/sec
Compression
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=compress[-X20] --compression_type=zstd --num=100000000 --compression_checksum=1
checksum=1
compress [AVG 20 runs] : 54074 (± 634) ops/sec; 211.2 (± 2.5) MB/sec
compress [MEDIAN 20 runs] : 54396 ops/sec; 212.5 MB/sec
checksum=0
compress [AVG 20 runs] : 54598 (± 393) ops/sec; 213.3 (± 1.5) MB/sec
compress [MEDIAN 20 runs] : 54592 ops/sec; 213.3 MB/sec
Decompression:
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=uncompress[-X20] --compression_type=zstd --compression_checksum=1
checksum = 0
uncompress [AVG 20 runs] : 167499 (± 962) ops/sec; 654.3 (± 3.8) MB/sec
uncompress [MEDIAN 20 runs] : 167210 ops/sec; 653.2 MB/sec
checksum = 1
uncompress [AVG 20 runs] : 167980 (± 924) ops/sec; 656.2 (± 3.6) MB/sec
uncompress [MEDIAN 20 runs] : 168465 ops/sec; 658.1 MB/sec
```
Reviewed By: ajkr
Differential Revision: D48019378
Pulled By: cbi42
fbshipit-source-id: 674120c6e1853c2ced1436ac8138559d0204feba
2023-08-18 22:01:59 +00:00
|
|
|
explicit CompressionContext(CompressionType type,
|
|
|
|
const CompressionOptions& options) {
|
|
|
|
CreateNativeContext(type, options.level, options.checksum);
|
2018-09-06 16:44:03 +00:00
|
|
|
}
|
2018-06-05 19:51:05 +00:00
|
|
|
~CompressionContext() { DestroyNativeContext(); }
|
2018-06-04 19:04:52 +00:00
|
|
|
CompressionContext(const CompressionContext&) = delete;
|
|
|
|
CompressionContext& operator=(const CompressionContext&) = delete;
|
2019-01-19 03:10:17 +00:00
|
|
|
};
|
|
|
|
|
|
|
|
class CompressionInfo {
|
|
|
|
const CompressionOptions& opts_;
|
|
|
|
const CompressionContext& context_;
|
|
|
|
const CompressionDict& dict_;
|
|
|
|
const CompressionType type_;
|
2019-03-18 19:07:35 +00:00
|
|
|
const uint64_t sample_for_compression_;
|
2019-01-19 03:10:17 +00:00
|
|
|
|
|
|
|
public:
|
|
|
|
CompressionInfo(const CompressionOptions& _opts,
|
|
|
|
const CompressionContext& _context,
|
2019-03-18 19:07:35 +00:00
|
|
|
const CompressionDict& _dict, CompressionType _type,
|
|
|
|
uint64_t _sample_for_compression)
|
|
|
|
: opts_(_opts),
|
|
|
|
context_(_context),
|
|
|
|
dict_(_dict),
|
|
|
|
type_(_type),
|
|
|
|
sample_for_compression_(_sample_for_compression) {}
|
2018-06-04 19:04:52 +00:00
|
|
|
|
2018-06-05 19:51:05 +00:00
|
|
|
const CompressionOptions& options() const { return opts_; }
|
2019-01-19 03:10:17 +00:00
|
|
|
const CompressionContext& context() const { return context_; }
|
|
|
|
const CompressionDict& dict() const { return dict_; }
|
2018-06-05 19:51:05 +00:00
|
|
|
CompressionType type() const { return type_; }
|
2019-03-18 19:07:35 +00:00
|
|
|
uint64_t SampleForCompression() const { return sample_for_compression_; }
|
2018-06-04 19:04:52 +00:00
|
|
|
};
|
|
|
|
|
|
|
|
class UncompressionContext {
|
2018-06-05 19:51:05 +00:00
|
|
|
private:
|
2018-06-04 19:04:52 +00:00
|
|
|
CompressionContextCache* ctx_cache_ = nullptr;
|
|
|
|
ZSTDUncompressCachedData uncomp_cached_data_;
|
2018-06-05 19:51:05 +00:00
|
|
|
|
|
|
|
public:
|
2019-01-24 02:11:08 +00:00
|
|
|
explicit UncompressionContext(CompressionType type) {
|
|
|
|
if (type == kZSTD || type == kZSTDNotFinalCompression) {
|
2018-06-04 19:04:52 +00:00
|
|
|
ctx_cache_ = CompressionContextCache::Instance();
|
|
|
|
uncomp_cached_data_ = ctx_cache_->GetCachedZSTDUncompressData();
|
|
|
|
}
|
|
|
|
}
|
|
|
|
~UncompressionContext() {
|
2019-01-24 02:11:08 +00:00
|
|
|
if (uncomp_cached_data_.GetCacheIndex() != -1) {
|
2018-06-04 19:04:52 +00:00
|
|
|
assert(ctx_cache_ != nullptr);
|
|
|
|
ctx_cache_->ReturnCachedZSTDUncompressData(
|
2018-06-05 19:51:05 +00:00
|
|
|
uncomp_cached_data_.GetCacheIndex());
|
2018-06-04 19:04:52 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
UncompressionContext(const UncompressionContext&) = delete;
|
|
|
|
UncompressionContext& operator=(const UncompressionContext&) = delete;
|
|
|
|
|
|
|
|
ZSTDUncompressCachedData::ZSTDNativeContext GetZSTDContext() const {
|
|
|
|
return uncomp_cached_data_.Get();
|
|
|
|
}
|
2019-01-19 03:10:17 +00:00
|
|
|
};
|
|
|
|
|
|
|
|
class UncompressionInfo {
|
|
|
|
const UncompressionContext& context_;
|
|
|
|
const UncompressionDict& dict_;
|
|
|
|
const CompressionType type_;
|
|
|
|
|
|
|
|
public:
|
|
|
|
UncompressionInfo(const UncompressionContext& _context,
|
|
|
|
const UncompressionDict& _dict, CompressionType _type)
|
|
|
|
: context_(_context), dict_(_dict), type_(_type) {}
|
|
|
|
|
|
|
|
const UncompressionContext& context() const { return context_; }
|
|
|
|
const UncompressionDict& dict() const { return dict_; }
|
2018-06-05 19:51:05 +00:00
|
|
|
CompressionType type() const { return type_; }
|
2018-06-04 19:04:52 +00:00
|
|
|
};
|
|
|
|
|
2015-04-06 19:50:44 +00:00
|
|
|
inline bool Snappy_Supported() {
|
|
|
|
#ifdef SNAPPY
|
|
|
|
return true;
|
2017-10-19 17:48:47 +00:00
|
|
|
#else
|
2015-04-06 19:50:44 +00:00
|
|
|
return false;
|
2017-10-19 17:48:47 +00:00
|
|
|
#endif
|
2015-04-06 19:50:44 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
inline bool Zlib_Supported() {
|
|
|
|
#ifdef ZLIB
|
|
|
|
return true;
|
2018-01-31 20:04:52 +00:00
|
|
|
#else
|
2015-04-06 19:50:44 +00:00
|
|
|
return false;
|
2018-01-31 20:04:52 +00:00
|
|
|
#endif
|
2015-04-06 19:50:44 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
inline bool BZip2_Supported() {
|
|
|
|
#ifdef BZIP2
|
|
|
|
return true;
|
2017-10-19 17:48:47 +00:00
|
|
|
#else
|
2015-04-06 19:50:44 +00:00
|
|
|
return false;
|
2017-10-19 17:48:47 +00:00
|
|
|
#endif
|
2015-04-06 19:50:44 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
inline bool LZ4_Supported() {
|
|
|
|
#ifdef LZ4
|
|
|
|
return true;
|
2017-10-19 17:48:47 +00:00
|
|
|
#else
|
2015-04-06 19:50:44 +00:00
|
|
|
return false;
|
2017-10-19 17:48:47 +00:00
|
|
|
#endif
|
2015-04-06 19:50:44 +00:00
|
|
|
}
|
|
|
|
|
2016-04-20 05:54:24 +00:00
|
|
|
inline bool XPRESS_Supported() {
|
|
|
|
#ifdef XPRESS
|
|
|
|
return true;
|
2017-10-19 17:48:47 +00:00
|
|
|
#else
|
2016-04-20 05:54:24 +00:00
|
|
|
return false;
|
2017-10-19 17:48:47 +00:00
|
|
|
#endif
|
2016-04-20 05:54:24 +00:00
|
|
|
}
|
|
|
|
|
2015-08-27 22:40:42 +00:00
|
|
|
inline bool ZSTD_Supported() {
|
2016-09-01 22:28:40 +00:00
|
|
|
#ifdef ZSTD
|
|
|
|
// ZSTD format is finalized since version 0.8.0.
|
|
|
|
return (ZSTD_versionNumber() >= 800);
|
2017-10-19 17:48:47 +00:00
|
|
|
#else
|
2016-09-01 22:28:40 +00:00
|
|
|
return false;
|
2017-10-19 17:48:47 +00:00
|
|
|
#endif
|
2016-09-01 22:28:40 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
inline bool ZSTDNotFinal_Supported() {
|
2015-08-27 22:40:42 +00:00
|
|
|
#ifdef ZSTD
|
|
|
|
return true;
|
2017-10-19 17:48:47 +00:00
|
|
|
#else
|
2015-08-27 22:40:42 +00:00
|
|
|
return false;
|
2017-10-19 17:48:47 +00:00
|
|
|
#endif
|
2015-08-27 22:40:42 +00:00
|
|
|
}
|
|
|
|
|
2022-02-18 00:18:01 +00:00
|
|
|
inline bool ZSTD_Streaming_Supported() {
|
Add `CompressionOptions::checksum` for enabling ZSTD checksum (#11666)
Summary:
Optionally enable zstd checksum flag (https://github.com/facebook/zstd/blob/d857369028d997c92ff1f1861a4d7f679a125464/lib/zstd.h#L428) to detect corruption during decompression. Main changes are in compression.h:
* User can set CompressionOptions::checksum to true to enable this feature.
* We enable this feature in ZSTD by setting the checksum flag in ZSTD compression context: `ZSTD_CCtx`.
* Uses `ZSTD_compress2()` to do compression since it supports frame parameter like the checksum flag. Compression level is also set in compression context as a flag.
* Error handling during decompression to propagate error message from ZSTD.
* Updated microbench to test read performance impact.
About compatibility, the current compression decoders should continue to work with the data created by the new compression API `ZSTD_compress2()`: https://github.com/facebook/zstd/issues/3711.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11666
Test Plan:
* Existing unit tests for zstd compression
* Add unit test `DBTest2.ZSTDChecksum` to test the corruption case
* Manually tested that compression levels, parallel compression, dictionary compression, index compression all work with the new ZSTD_compress2() API.
* Manually tested with `sst_dump --command=recompress` that different compression levels and dictionary compression settings all work.
* Manually tested compiling with older versions of ZSTD: v1.3.8, v1.1.0, v0.6.2.
* Perf impact: from public benchmark data: http://fastcompression.blogspot.com/2019/03/presenting-xxh3.html for checksum and https://github.com/facebook/zstd#benchmarks, if decompression is 1700MB/s and checksum computation is 70000MB/s, checksum computation is an additional ~2.4% time for decompression. Compression is slower and checksumming should be less noticeable.
* Microbench:
```
TEST_TMPDIR=/dev/shm ./branch_db_basic_bench --benchmark_filter=DBGet/comp_style:0/max_data:1048576/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:0/compression_type:7/compression_checksum:1/no_blockcache:1/iterations:10000/threads:1 --benchmark_repetitions=100
Min out of 100 runs:
Main:
10390 10436 10456 10484 10499 10535 10544 10545 10565 10568
After this PR, checksum=false
10285 10397 10503 10508 10515 10557 10562 10635 10640 10660
After this PR, checksum=true
10827 10876 10925 10949 10971 11052 11061 11063 11100 11109
```
* db_bench:
```
Write perf
TEST_TMPDIR=/dev/shm/ ./db_bench_ichecksum --benchmarks=fillseq[-X10] --compression_type=zstd --num=10000000 --compression_checksum=..
[FillSeq checksum=0]
fillseq [AVG 10 runs] : 281635 (± 31711) ops/sec; 31.2 (± 3.5) MB/sec
fillseq [MEDIAN 10 runs] : 294027 ops/sec; 32.5 MB/sec
[FillSeq checksum=1]
fillseq [AVG 10 runs] : 286961 (± 34700) ops/sec; 31.7 (± 3.8) MB/sec
fillseq [MEDIAN 10 runs] : 283278 ops/sec; 31.3 MB/sec
Read perf
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=readrandom[-X20] --num=100000000 --reads=1000000 --use_existing_db=true --readonly=1
[Readrandom checksum=1]
readrandom [AVG 20 runs] : 360928 (± 3579) ops/sec; 4.0 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 362468 ops/sec; 4.0 MB/sec
[Readrandom checksum=0]
readrandom [AVG 20 runs] : 380365 (± 2384) ops/sec; 4.2 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 379800 ops/sec; 4.2 MB/sec
Compression
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=compress[-X20] --compression_type=zstd --num=100000000 --compression_checksum=1
checksum=1
compress [AVG 20 runs] : 54074 (± 634) ops/sec; 211.2 (± 2.5) MB/sec
compress [MEDIAN 20 runs] : 54396 ops/sec; 212.5 MB/sec
checksum=0
compress [AVG 20 runs] : 54598 (± 393) ops/sec; 213.3 (± 1.5) MB/sec
compress [MEDIAN 20 runs] : 54592 ops/sec; 213.3 MB/sec
Decompression:
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=uncompress[-X20] --compression_type=zstd --compression_checksum=1
checksum = 0
uncompress [AVG 20 runs] : 167499 (± 962) ops/sec; 654.3 (± 3.8) MB/sec
uncompress [MEDIAN 20 runs] : 167210 ops/sec; 653.2 MB/sec
checksum = 1
uncompress [AVG 20 runs] : 167980 (± 924) ops/sec; 656.2 (± 3.6) MB/sec
uncompress [MEDIAN 20 runs] : 168465 ops/sec; 658.1 MB/sec
```
Reviewed By: ajkr
Differential Revision: D48019378
Pulled By: cbi42
fbshipit-source-id: 674120c6e1853c2ced1436ac8138559d0204feba
2023-08-18 22:01:59 +00:00
|
|
|
#if defined(ZSTD_ADVANCED)
|
2022-04-14 18:05:39 +00:00
|
|
|
return true;
|
2022-02-18 00:18:01 +00:00
|
|
|
#else
|
|
|
|
return false;
|
|
|
|
#endif
|
|
|
|
}
|
|
|
|
|
|
|
|
inline bool StreamingCompressionTypeSupported(
|
|
|
|
CompressionType compression_type) {
|
|
|
|
switch (compression_type) {
|
|
|
|
case kNoCompression:
|
|
|
|
return true;
|
|
|
|
case kZSTD:
|
|
|
|
return ZSTD_Streaming_Supported();
|
|
|
|
default:
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2015-06-18 21:55:05 +00:00
|
|
|
inline bool CompressionTypeSupported(CompressionType compression_type) {
|
|
|
|
switch (compression_type) {
|
|
|
|
case kNoCompression:
|
|
|
|
return true;
|
|
|
|
case kSnappyCompression:
|
|
|
|
return Snappy_Supported();
|
|
|
|
case kZlibCompression:
|
|
|
|
return Zlib_Supported();
|
|
|
|
case kBZip2Compression:
|
|
|
|
return BZip2_Supported();
|
|
|
|
case kLZ4Compression:
|
|
|
|
return LZ4_Supported();
|
|
|
|
case kLZ4HCCompression:
|
|
|
|
return LZ4_Supported();
|
2016-04-20 05:54:24 +00:00
|
|
|
case kXpressCompression:
|
|
|
|
return XPRESS_Supported();
|
2015-08-27 22:40:42 +00:00
|
|
|
case kZSTDNotFinalCompression:
|
2016-09-01 22:28:40 +00:00
|
|
|
return ZSTDNotFinal_Supported();
|
|
|
|
case kZSTD:
|
2015-09-09 23:36:19 +00:00
|
|
|
return ZSTD_Supported();
|
2015-06-18 21:55:05 +00:00
|
|
|
default:
|
|
|
|
assert(false);
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2020-11-03 03:20:15 +00:00
|
|
|
inline bool DictCompressionTypeSupported(CompressionType compression_type) {
|
|
|
|
switch (compression_type) {
|
|
|
|
case kNoCompression:
|
|
|
|
return false;
|
|
|
|
case kSnappyCompression:
|
|
|
|
return false;
|
|
|
|
case kZlibCompression:
|
|
|
|
return Zlib_Supported();
|
|
|
|
case kBZip2Compression:
|
|
|
|
return false;
|
|
|
|
case kLZ4Compression:
|
|
|
|
case kLZ4HCCompression:
|
|
|
|
#if LZ4_VERSION_NUMBER >= 10400 // r124+
|
|
|
|
return LZ4_Supported();
|
|
|
|
#else
|
|
|
|
return false;
|
|
|
|
#endif
|
|
|
|
case kXpressCompression:
|
|
|
|
return false;
|
|
|
|
case kZSTDNotFinalCompression:
|
|
|
|
#if ZSTD_VERSION_NUMBER >= 500 // v0.5.0+
|
|
|
|
return ZSTDNotFinal_Supported();
|
|
|
|
#else
|
|
|
|
return false;
|
|
|
|
#endif
|
|
|
|
case kZSTD:
|
|
|
|
#if ZSTD_VERSION_NUMBER >= 500 // v0.5.0+
|
|
|
|
return ZSTD_Supported();
|
|
|
|
#else
|
|
|
|
return false;
|
|
|
|
#endif
|
|
|
|
default:
|
|
|
|
assert(false);
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2015-06-18 21:55:05 +00:00
|
|
|
inline std::string CompressionTypeToString(CompressionType compression_type) {
|
|
|
|
switch (compression_type) {
|
|
|
|
case kNoCompression:
|
|
|
|
return "NoCompression";
|
|
|
|
case kSnappyCompression:
|
|
|
|
return "Snappy";
|
|
|
|
case kZlibCompression:
|
|
|
|
return "Zlib";
|
|
|
|
case kBZip2Compression:
|
|
|
|
return "BZip2";
|
|
|
|
case kLZ4Compression:
|
|
|
|
return "LZ4";
|
|
|
|
case kLZ4HCCompression:
|
|
|
|
return "LZ4HC";
|
2016-04-20 05:54:24 +00:00
|
|
|
case kXpressCompression:
|
|
|
|
return "Xpress";
|
2016-09-01 22:28:40 +00:00
|
|
|
case kZSTD:
|
2015-08-27 22:40:42 +00:00
|
|
|
return "ZSTD";
|
2017-11-16 03:30:35 +00:00
|
|
|
case kZSTDNotFinalCompression:
|
|
|
|
return "ZSTDNotFinal";
|
2019-12-21 00:13:19 +00:00
|
|
|
case kDisableCompressionOption:
|
|
|
|
return "DisableOption";
|
2015-06-18 21:55:05 +00:00
|
|
|
default:
|
|
|
|
assert(false);
|
|
|
|
return "";
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2019-04-02 21:48:52 +00:00
|
|
|
inline std::string CompressionOptionsToString(
|
|
|
|
CompressionOptions& compression_options) {
|
|
|
|
std::string result;
|
|
|
|
result.reserve(512);
|
|
|
|
result.append("window_bits=")
|
2022-05-06 20:03:58 +00:00
|
|
|
.append(std::to_string(compression_options.window_bits))
|
2019-04-02 21:48:52 +00:00
|
|
|
.append("; ");
|
|
|
|
result.append("level=")
|
2022-05-06 20:03:58 +00:00
|
|
|
.append(std::to_string(compression_options.level))
|
2019-04-02 21:48:52 +00:00
|
|
|
.append("; ");
|
|
|
|
result.append("strategy=")
|
2022-05-06 20:03:58 +00:00
|
|
|
.append(std::to_string(compression_options.strategy))
|
2019-04-02 21:48:52 +00:00
|
|
|
.append("; ");
|
|
|
|
result.append("max_dict_bytes=")
|
2022-05-06 20:03:58 +00:00
|
|
|
.append(std::to_string(compression_options.max_dict_bytes))
|
2019-04-02 21:48:52 +00:00
|
|
|
.append("; ");
|
|
|
|
result.append("zstd_max_train_bytes=")
|
2022-05-06 20:03:58 +00:00
|
|
|
.append(std::to_string(compression_options.zstd_max_train_bytes))
|
2019-04-02 21:48:52 +00:00
|
|
|
.append("; ");
|
|
|
|
result.append("enabled=")
|
2022-05-06 20:03:58 +00:00
|
|
|
.append(std::to_string(compression_options.enabled))
|
2019-04-02 21:48:52 +00:00
|
|
|
.append("; ");
|
Limit buffering for collecting samples for compression dictionary (#7970)
Summary:
For dictionary compression, we need to collect some representative samples of the data to be compressed, which we use to either generate or train (when `CompressionOptions::zstd_max_train_bytes > 0`) a dictionary. Previously, the strategy was to buffer all the data blocks during flush, and up to the target file size during compaction. That strategy allowed us to randomly pick samples from as wide a range as possible that'd be guaranteed to land in a single output file.
However, some users try to make huge files in memory-constrained environments, where this strategy can cause OOM. This PR introduces an option, `CompressionOptions::max_dict_buffer_bytes`, that limits how much data blocks are buffered before we switch to unbuffered mode (which means creating the per-SST dictionary, writing out the buffered data, and compressing/writing new blocks as soon as they are built). It is not strict as we currently buffer more than just data blocks -- also keys are buffered. But it does make a step towards giving users predictable memory usage.
Related changes include:
- Changed sampling for dictionary compression to select unique data blocks when there is limited availability of data blocks
- Made use of `BlockBuilder::SwapAndReset()` to save an allocation+memcpy when buffering data blocks for building a dictionary
- Changed `ParseBoolean()` to accept an input containing characters after the boolean. This is necessary since, with this PR, a value for `CompressionOptions::enabled` is no longer necessarily the final component in the `CompressionOptions` string.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7970
Test Plan:
- updated `CompressionOptions` unit tests to verify limit is respected (to the extent expected in the current implementation) in various scenarios of flush/compaction to bottommost/non-bottommost level
- looked at jemalloc heap profiles right before and after switching to unbuffered mode during flush/compaction. Verified memory usage in buffering is proportional to the limit set.
Reviewed By: pdillinger
Differential Revision: D26467994
Pulled By: ajkr
fbshipit-source-id: 3da4ef9fba59974e4ef40e40c01611002c861465
2021-02-19 22:06:59 +00:00
|
|
|
result.append("max_dict_buffer_bytes=")
|
2022-05-06 20:03:58 +00:00
|
|
|
.append(std::to_string(compression_options.max_dict_buffer_bytes))
|
Limit buffering for collecting samples for compression dictionary (#7970)
Summary:
For dictionary compression, we need to collect some representative samples of the data to be compressed, which we use to either generate or train (when `CompressionOptions::zstd_max_train_bytes > 0`) a dictionary. Previously, the strategy was to buffer all the data blocks during flush, and up to the target file size during compaction. That strategy allowed us to randomly pick samples from as wide a range as possible that'd be guaranteed to land in a single output file.
However, some users try to make huge files in memory-constrained environments, where this strategy can cause OOM. This PR introduces an option, `CompressionOptions::max_dict_buffer_bytes`, that limits how much data blocks are buffered before we switch to unbuffered mode (which means creating the per-SST dictionary, writing out the buffered data, and compressing/writing new blocks as soon as they are built). It is not strict as we currently buffer more than just data blocks -- also keys are buffered. But it does make a step towards giving users predictable memory usage.
Related changes include:
- Changed sampling for dictionary compression to select unique data blocks when there is limited availability of data blocks
- Made use of `BlockBuilder::SwapAndReset()` to save an allocation+memcpy when buffering data blocks for building a dictionary
- Changed `ParseBoolean()` to accept an input containing characters after the boolean. This is necessary since, with this PR, a value for `CompressionOptions::enabled` is no longer necessarily the final component in the `CompressionOptions` string.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7970
Test Plan:
- updated `CompressionOptions` unit tests to verify limit is respected (to the extent expected in the current implementation) in various scenarios of flush/compaction to bottommost/non-bottommost level
- looked at jemalloc heap profiles right before and after switching to unbuffered mode during flush/compaction. Verified memory usage in buffering is proportional to the limit set.
Reviewed By: pdillinger
Differential Revision: D26467994
Pulled By: ajkr
fbshipit-source-id: 3da4ef9fba59974e4ef40e40c01611002c861465
2021-02-19 22:06:59 +00:00
|
|
|
.append("; ");
|
Support using ZDICT_finalizeDictionary to generate zstd dictionary (#9857)
Summary:
An untrained dictionary is currently simply the concatenation of several samples. The ZSTD API, ZDICT_finalizeDictionary(), can improve such a dictionary's effectiveness at low cost. This PR changes how dictionary is created by calling the ZSTD ZDICT_finalizeDictionary() API instead of creating raw content dictionary (when max_dict_buffer_bytes > 0), and pass in all buffered uncompressed data blocks as samples.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9857
Test Plan:
#### db_bench test for cpu/memory of compression+decompression and space saving on synthetic data:
Set up: change the parameter [here](https://github.com/facebook/rocksdb/blob/fb9a167a55e0970b1ef6f67c1600c8d9c4c6114f/tools/db_bench_tool.cc#L1766) to 16384 to make synthetic data more compressible.
```
# linked local ZSTD with version 1.5.2
# DEBUG_LEVEL=0 ROCKSDB_NO_FBCODE=1 ROCKSDB_DISABLE_ZSTD=1 EXTRA_CXXFLAGS="-DZSTD_STATIC_LINKING_ONLY -DZSTD -I/data/users/changyubi/install/include/" EXTRA_LDFLAGS="-L/data/users/changyubi/install/lib/ -l:libzstd.a" make -j32 db_bench
dict_bytes=16384
train_bytes=1048576
echo "========== No Dictionary =========="
TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=filluniquerandom,compact -num=10000000 -compression_type=zstd -compression_max_dict_bytes=0 -block_size=4096 -max_background_jobs=24 -memtablerep=vector -allow_concurrent_memtable_write=false -disable_wal=true -max_write_buffer_number=8 >/dev/null 2>&1
TEST_TMPDIR=/dev/shm /usr/bin/time ./db_bench -use_existing_db=true -benchmarks=compact -compression_type=zstd -compression_max_dict_bytes=0 -block_size=4096 2>&1 | grep elapsed
du -hc /dev/shm/dbbench/*sst | grep total
echo "========== Raw Content Dictionary =========="
TEST_TMPDIR=/dev/shm ./db_bench_main -benchmarks=filluniquerandom,compact -num=10000000 -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -block_size=4096 -max_background_jobs=24 -memtablerep=vector -allow_concurrent_memtable_write=false -disable_wal=true -max_write_buffer_number=8 >/dev/null 2>&1
TEST_TMPDIR=/dev/shm /usr/bin/time ./db_bench_main -use_existing_db=true -benchmarks=compact -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -block_size=4096 2>&1 | grep elapsed
du -hc /dev/shm/dbbench/*sst | grep total
echo "========== FinalizeDictionary =========="
TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=filluniquerandom,compact -num=10000000 -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes -compression_use_zstd_dict_trainer=false -block_size=4096 -max_background_jobs=24 -memtablerep=vector -allow_concurrent_memtable_write=false -disable_wal=true -max_write_buffer_number=8 >/dev/null 2>&1
TEST_TMPDIR=/dev/shm /usr/bin/time ./db_bench -use_existing_db=true -benchmarks=compact -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes -compression_use_zstd_dict_trainer=false -block_size=4096 2>&1 | grep elapsed
du -hc /dev/shm/dbbench/*sst | grep total
echo "========== TrainDictionary =========="
TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=filluniquerandom,compact -num=10000000 -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes -block_size=4096 -max_background_jobs=24 -memtablerep=vector -allow_concurrent_memtable_write=false -disable_wal=true -max_write_buffer_number=8 >/dev/null 2>&1
TEST_TMPDIR=/dev/shm /usr/bin/time ./db_bench -use_existing_db=true -benchmarks=compact -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes -block_size=4096 2>&1 | grep elapsed
du -hc /dev/shm/dbbench/*sst | grep total
# Result: TrainDictionary is much better on space saving, but FinalizeDictionary seems to use less memory.
# before compression data size: 1.2GB
dict_bytes=16384
max_dict_buffer_bytes = 1048576
space cpu/memory
No Dictionary 468M 14.93user 1.00system 0:15.92elapsed 100%CPU (0avgtext+0avgdata 23904maxresident)k
Raw Dictionary 251M 15.81user 0.80system 0:16.56elapsed 100%CPU (0avgtext+0avgdata 156808maxresident)k
FinalizeDictionary 236M 11.93user 0.64system 0:12.56elapsed 100%CPU (0avgtext+0avgdata 89548maxresident)k
TrainDictionary 84M 7.29user 0.45system 0:07.75elapsed 100%CPU (0avgtext+0avgdata 97288maxresident)k
```
#### Benchmark on 10 sample SST files for spacing saving and CPU time on compression:
FinalizeDictionary is comparable to TrainDictionary in terms of space saving, and takes less time in compression.
```
dict_bytes=16384
train_bytes=1048576
for sst_file in `ls ../temp/myrock-sst/`
do
echo "********** $sst_file **********"
echo "========== No Dictionary =========="
./sst_dump --file="../temp/myrock-sst/$sst_file" --command=recompress --compression_level_from=6 --compression_level_to=6 --compression_types=kZSTD
echo "========== Raw Content Dictionary =========="
./sst_dump --file="../temp/myrock-sst/$sst_file" --command=recompress --compression_level_from=6 --compression_level_to=6 --compression_types=kZSTD --compression_max_dict_bytes=$dict_bytes
echo "========== FinalizeDictionary =========="
./sst_dump --file="../temp/myrock-sst/$sst_file" --command=recompress --compression_level_from=6 --compression_level_to=6 --compression_types=kZSTD --compression_max_dict_bytes=$dict_bytes --compression_zstd_max_train_bytes=$train_bytes --compression_use_zstd_finalize_dict
echo "========== TrainDictionary =========="
./sst_dump --file="../temp/myrock-sst/$sst_file" --command=recompress --compression_level_from=6 --compression_level_to=6 --compression_types=kZSTD --compression_max_dict_bytes=$dict_bytes --compression_zstd_max_train_bytes=$train_bytes
done
010240.sst (Size/Time) 011029.sst 013184.sst 021552.sst 185054.sst 185137.sst 191666.sst 7560381.sst 7604174.sst 7635312.sst
No Dictionary 28165569 / 2614419 32899411 / 2976832 32977848 / 3055542 31966329 / 2004590 33614351 / 1755877 33429029 / 1717042 33611933 / 1776936 33634045 / 2771417 33789721 / 2205414 33592194 / 388254
Raw Content Dictionary 28019950 / 2697961 33748665 / 3572422 33896373 / 3534701 26418431 / 2259658 28560825 / 1839168 28455030 / 1846039 28494319 / 1861349 32391599 / 3095649 33772142 / 2407843 33592230 / 474523
FinalizeDictionary 27896012 / 2650029 33763886 / 3719427 33904283 / 3552793 26008225 / 2198033 28111872 / 1869530 28014374 / 1789771 28047706 / 1848300 32296254 / 3204027 33698698 / 2381468 33592344 / 517433
TrainDictionary 28046089 / 2740037 33706480 / 3679019 33885741 / 3629351 25087123 / 2204558 27194353 / 1970207 27234229 / 1896811 27166710 / 1903119 32011041 / 3322315 32730692 / 2406146 33608631 / 570593
```
#### Decompression/Read test:
With FinalizeDictionary/TrainDictionary, some data structure used for decompression are in stored in dictionary, so they are expected to be faster in terms of decompression/reads.
```
dict_bytes=16384
train_bytes=1048576
echo "No Dictionary"
TEST_TMPDIR=/dev/shm/ ./db_bench -benchmarks=filluniquerandom,compact -compression_type=zstd -compression_max_dict_bytes=0 > /dev/null 2>&1
TEST_TMPDIR=/dev/shm/ ./db_bench -use_existing_db=true -benchmarks=readrandom -cache_size=0 -compression_type=zstd -compression_max_dict_bytes=0 2>&1 | grep MB/s
echo "Raw Dictionary"
TEST_TMPDIR=/dev/shm/ ./db_bench -benchmarks=filluniquerandom,compact -compression_type=zstd -compression_max_dict_bytes=$dict_bytes > /dev/null 2>&1
TEST_TMPDIR=/dev/shm/ ./db_bench -use_existing_db=true -benchmarks=readrandom -cache_size=0 -compression_type=zstd -compression_max_dict_bytes=$dict_bytes 2>&1 | grep MB/s
echo "FinalizeDict"
TEST_TMPDIR=/dev/shm/ ./db_bench -benchmarks=filluniquerandom,compact -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes -compression_use_zstd_dict_trainer=false > /dev/null 2>&1
TEST_TMPDIR=/dev/shm/ ./db_bench -use_existing_db=true -benchmarks=readrandom -cache_size=0 -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes -compression_use_zstd_dict_trainer=false 2>&1 | grep MB/s
echo "Train Dictionary"
TEST_TMPDIR=/dev/shm/ ./db_bench -benchmarks=filluniquerandom,compact -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes > /dev/null 2>&1
TEST_TMPDIR=/dev/shm/ ./db_bench -use_existing_db=true -benchmarks=readrandom -cache_size=0 -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes 2>&1 | grep MB/s
No Dictionary
readrandom : 12.183 micros/op 82082 ops/sec 12.183 seconds 1000000 operations; 9.1 MB/s (1000000 of 1000000 found)
Raw Dictionary
readrandom : 12.314 micros/op 81205 ops/sec 12.314 seconds 1000000 operations; 9.0 MB/s (1000000 of 1000000 found)
FinalizeDict
readrandom : 9.787 micros/op 102180 ops/sec 9.787 seconds 1000000 operations; 11.3 MB/s (1000000 of 1000000 found)
Train Dictionary
readrandom : 9.698 micros/op 103108 ops/sec 9.699 seconds 1000000 operations; 11.4 MB/s (1000000 of 1000000 found)
```
Reviewed By: ajkr
Differential Revision: D35720026
Pulled By: cbi42
fbshipit-source-id: 24d230fdff0fd28a1bb650658798f00dfcfb2a1f
2022-05-20 19:09:09 +00:00
|
|
|
result.append("use_zstd_dict_trainer=")
|
|
|
|
.append(std::to_string(compression_options.use_zstd_dict_trainer))
|
|
|
|
.append("; ");
|
2019-04-02 21:48:52 +00:00
|
|
|
return result;
|
|
|
|
}
|
|
|
|
|
2015-01-15 00:24:24 +00:00
|
|
|
// compress_format_version can have two values:
|
|
|
|
// 1 -- decompressed sizes for BZip2 and Zlib are not included in the compressed
|
|
|
|
// block. Also, decompressed sizes for LZ4 are encoded in platform-dependent
|
|
|
|
// way.
|
|
|
|
// 2 -- Zlib, BZip2 and LZ4 encode decompressed size as Varint32 just before the
|
|
|
|
// start of compressed block. Snappy format is the same as version 1.
|
|
|
|
|
2019-01-19 03:10:17 +00:00
|
|
|
inline bool Snappy_Compress(const CompressionInfo& /*info*/, const char* input,
|
|
|
|
size_t length, ::std::string* output) {
|
2015-01-09 20:57:11 +00:00
|
|
|
#ifdef SNAPPY
|
|
|
|
output->resize(snappy::MaxCompressedLength(length));
|
|
|
|
size_t outlen;
|
|
|
|
snappy::RawCompress(input, length, &(*output)[0], &outlen);
|
|
|
|
output->resize(outlen);
|
|
|
|
return true;
|
2017-10-19 17:48:47 +00:00
|
|
|
#else
|
2018-04-13 00:55:14 +00:00
|
|
|
(void)input;
|
|
|
|
(void)length;
|
|
|
|
(void)output;
|
2015-01-09 20:57:11 +00:00
|
|
|
return false;
|
2017-10-19 17:48:47 +00:00
|
|
|
#endif
|
2015-01-09 20:57:11 +00:00
|
|
|
}
|
|
|
|
|
2020-09-25 16:00:29 +00:00
|
|
|
inline CacheAllocationPtr Snappy_Uncompress(
|
|
|
|
const char* input, size_t length, size_t* uncompressed_size,
|
|
|
|
MemoryAllocator* allocator = nullptr) {
|
2015-01-09 20:57:11 +00:00
|
|
|
#ifdef SNAPPY
|
2020-09-25 16:00:29 +00:00
|
|
|
size_t uncompressed_length = 0;
|
|
|
|
if (!snappy::GetUncompressedLength(input, length, &uncompressed_length)) {
|
|
|
|
return nullptr;
|
|
|
|
}
|
2015-01-09 20:57:11 +00:00
|
|
|
|
2020-09-25 16:00:29 +00:00
|
|
|
CacheAllocationPtr output = AllocateBlock(uncompressed_length, allocator);
|
|
|
|
|
|
|
|
if (!snappy::RawUncompress(input, length, output.get())) {
|
|
|
|
return nullptr;
|
|
|
|
}
|
|
|
|
|
|
|
|
*uncompressed_size = uncompressed_length;
|
|
|
|
|
|
|
|
return output;
|
2015-01-09 20:57:11 +00:00
|
|
|
#else
|
2018-04-13 00:55:14 +00:00
|
|
|
(void)input;
|
|
|
|
(void)length;
|
2020-09-25 16:00:29 +00:00
|
|
|
(void)uncompressed_size;
|
|
|
|
(void)allocator;
|
|
|
|
return nullptr;
|
2015-01-09 20:57:11 +00:00
|
|
|
#endif
|
|
|
|
}
|
|
|
|
|
2015-01-15 00:24:24 +00:00
|
|
|
namespace compression {
|
|
|
|
// returns size
|
|
|
|
inline size_t PutDecompressedSizeInfo(std::string* output, uint32_t length) {
|
|
|
|
PutVarint32(output, length);
|
|
|
|
return output->size();
|
|
|
|
}
|
|
|
|
|
|
|
|
inline bool GetDecompressedSizeInfo(const char** input_data,
|
|
|
|
size_t* input_length,
|
|
|
|
uint32_t* output_len) {
|
|
|
|
auto new_input_data =
|
|
|
|
GetVarint32Ptr(*input_data, *input_data + *input_length, output_len);
|
|
|
|
if (new_input_data == nullptr) {
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
*input_length -= (new_input_data - *input_data);
|
|
|
|
*input_data = new_input_data;
|
|
|
|
return true;
|
|
|
|
}
|
|
|
|
} // namespace compression
|
|
|
|
|
|
|
|
// compress_format_version == 1 -- decompressed size is not included in the
|
|
|
|
// block header
|
|
|
|
// compress_format_version == 2 -- decompressed size is included in the block
|
|
|
|
// header in varint32 format
|
Shared dictionary compression using reference block
Summary:
This adds a new metablock containing a shared dictionary that is used
to compress all data blocks in the SST file. The size of the shared dictionary
is configurable in CompressionOptions and defaults to 0. It's currently only
used for zlib/lz4/lz4hc, but the block will be stored in the SST regardless of
the compression type if the user chooses a nonzero dictionary size.
During compaction, computes the dictionary by randomly sampling the first
output file in each subcompaction. It pre-computes the intervals to sample
by assuming the output file will have the maximum allowable length. In case
the file is smaller, some of the pre-computed sampling intervals can be beyond
end-of-file, in which case we skip over those samples and the dictionary will
be a bit smaller. After the dictionary is generated using the first file in a
subcompaction, it is loaded into the compression library before writing each
block in each subsequent file of that subcompaction.
On the read path, gets the dictionary from the metablock, if it exists. Then,
loads that dictionary into the compression library before reading each block.
Test Plan: new unit test
Reviewers: yhchiang, IslamAbdelRahman, cyan, sdong
Reviewed By: sdong
Subscribers: andrewkr, yoshinorim, kradhakrishnan, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D52287
2016-04-28 00:36:03 +00:00
|
|
|
// @param compression_dict Data for presetting the compression library's
|
|
|
|
// dictionary.
|
2019-01-19 03:10:17 +00:00
|
|
|
inline bool Zlib_Compress(const CompressionInfo& info,
|
Shared dictionary compression using reference block
Summary:
This adds a new metablock containing a shared dictionary that is used
to compress all data blocks in the SST file. The size of the shared dictionary
is configurable in CompressionOptions and defaults to 0. It's currently only
used for zlib/lz4/lz4hc, but the block will be stored in the SST regardless of
the compression type if the user chooses a nonzero dictionary size.
During compaction, computes the dictionary by randomly sampling the first
output file in each subcompaction. It pre-computes the intervals to sample
by assuming the output file will have the maximum allowable length. In case
the file is smaller, some of the pre-computed sampling intervals can be beyond
end-of-file, in which case we skip over those samples and the dictionary will
be a bit smaller. After the dictionary is generated using the first file in a
subcompaction, it is loaded into the compression library before writing each
block in each subsequent file of that subcompaction.
On the read path, gets the dictionary from the metablock, if it exists. Then,
loads that dictionary into the compression library before reading each block.
Test Plan: new unit test
Reviewers: yhchiang, IslamAbdelRahman, cyan, sdong
Reviewed By: sdong
Subscribers: andrewkr, yoshinorim, kradhakrishnan, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D52287
2016-04-28 00:36:03 +00:00
|
|
|
uint32_t compress_format_version, const char* input,
|
2018-06-04 19:04:52 +00:00
|
|
|
size_t length, ::std::string* output) {
|
2015-01-09 20:57:11 +00:00
|
|
|
#ifdef ZLIB
|
2015-01-15 00:24:24 +00:00
|
|
|
if (length > std::numeric_limits<uint32_t>::max()) {
|
|
|
|
// Can't compress more than 4GB
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
|
|
|
size_t output_header_len = 0;
|
|
|
|
if (compress_format_version == 2) {
|
|
|
|
output_header_len = compression::PutDecompressedSizeInfo(
|
|
|
|
output, static_cast<uint32_t>(length));
|
|
|
|
}
|
|
|
|
|
2015-01-09 20:57:11 +00:00
|
|
|
// The memLevel parameter specifies how much memory should be allocated for
|
|
|
|
// the internal compression state.
|
|
|
|
// memLevel=1 uses minimum memory but is slow and reduces compression ratio.
|
|
|
|
// memLevel=9 uses maximum memory for optimal speed.
|
|
|
|
// The default value is 8. See zconf.h for more details.
|
|
|
|
static const int memLevel = 8;
|
2018-05-24 01:33:00 +00:00
|
|
|
int level;
|
2019-01-19 03:10:17 +00:00
|
|
|
if (info.options().level == CompressionOptions::kDefaultCompressionLevel) {
|
2018-05-24 01:33:00 +00:00
|
|
|
level = Z_DEFAULT_COMPRESSION;
|
|
|
|
} else {
|
2019-01-19 03:10:17 +00:00
|
|
|
level = info.options().level;
|
2018-05-24 01:33:00 +00:00
|
|
|
}
|
2015-01-09 20:57:11 +00:00
|
|
|
z_stream _stream;
|
|
|
|
memset(&_stream, 0, sizeof(z_stream));
|
2019-01-19 03:10:17 +00:00
|
|
|
int st = deflateInit2(&_stream, level, Z_DEFLATED, info.options().window_bits,
|
|
|
|
memLevel, info.options().strategy);
|
2015-01-09 20:57:11 +00:00
|
|
|
if (st != Z_OK) {
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
2019-01-19 03:10:17 +00:00
|
|
|
Slice compression_dict = info.dict().GetRawDict();
|
|
|
|
if (compression_dict.size()) {
|
Shared dictionary compression using reference block
Summary:
This adds a new metablock containing a shared dictionary that is used
to compress all data blocks in the SST file. The size of the shared dictionary
is configurable in CompressionOptions and defaults to 0. It's currently only
used for zlib/lz4/lz4hc, but the block will be stored in the SST regardless of
the compression type if the user chooses a nonzero dictionary size.
During compaction, computes the dictionary by randomly sampling the first
output file in each subcompaction. It pre-computes the intervals to sample
by assuming the output file will have the maximum allowable length. In case
the file is smaller, some of the pre-computed sampling intervals can be beyond
end-of-file, in which case we skip over those samples and the dictionary will
be a bit smaller. After the dictionary is generated using the first file in a
subcompaction, it is loaded into the compression library before writing each
block in each subsequent file of that subcompaction.
On the read path, gets the dictionary from the metablock, if it exists. Then,
loads that dictionary into the compression library before reading each block.
Test Plan: new unit test
Reviewers: yhchiang, IslamAbdelRahman, cyan, sdong
Reviewed By: sdong
Subscribers: andrewkr, yoshinorim, kradhakrishnan, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D52287
2016-04-28 00:36:03 +00:00
|
|
|
// Initialize the compression library's dictionary
|
2019-01-19 03:10:17 +00:00
|
|
|
st = deflateSetDictionary(
|
|
|
|
&_stream, reinterpret_cast<const Bytef*>(compression_dict.data()),
|
|
|
|
static_cast<unsigned int>(compression_dict.size()));
|
Shared dictionary compression using reference block
Summary:
This adds a new metablock containing a shared dictionary that is used
to compress all data blocks in the SST file. The size of the shared dictionary
is configurable in CompressionOptions and defaults to 0. It's currently only
used for zlib/lz4/lz4hc, but the block will be stored in the SST regardless of
the compression type if the user chooses a nonzero dictionary size.
During compaction, computes the dictionary by randomly sampling the first
output file in each subcompaction. It pre-computes the intervals to sample
by assuming the output file will have the maximum allowable length. In case
the file is smaller, some of the pre-computed sampling intervals can be beyond
end-of-file, in which case we skip over those samples and the dictionary will
be a bit smaller. After the dictionary is generated using the first file in a
subcompaction, it is loaded into the compression library before writing each
block in each subsequent file of that subcompaction.
On the read path, gets the dictionary from the metablock, if it exists. Then,
loads that dictionary into the compression library before reading each block.
Test Plan: new unit test
Reviewers: yhchiang, IslamAbdelRahman, cyan, sdong
Reviewed By: sdong
Subscribers: andrewkr, yoshinorim, kradhakrishnan, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D52287
2016-04-28 00:36:03 +00:00
|
|
|
if (st != Z_OK) {
|
2016-07-08 17:52:25 +00:00
|
|
|
deflateEnd(&_stream);
|
Shared dictionary compression using reference block
Summary:
This adds a new metablock containing a shared dictionary that is used
to compress all data blocks in the SST file. The size of the shared dictionary
is configurable in CompressionOptions and defaults to 0. It's currently only
used for zlib/lz4/lz4hc, but the block will be stored in the SST regardless of
the compression type if the user chooses a nonzero dictionary size.
During compaction, computes the dictionary by randomly sampling the first
output file in each subcompaction. It pre-computes the intervals to sample
by assuming the output file will have the maximum allowable length. In case
the file is smaller, some of the pre-computed sampling intervals can be beyond
end-of-file, in which case we skip over those samples and the dictionary will
be a bit smaller. After the dictionary is generated using the first file in a
subcompaction, it is loaded into the compression library before writing each
block in each subsequent file of that subcompaction.
On the read path, gets the dictionary from the metablock, if it exists. Then,
loads that dictionary into the compression library before reading each block.
Test Plan: new unit test
Reviewers: yhchiang, IslamAbdelRahman, cyan, sdong
Reviewed By: sdong
Subscribers: andrewkr, yoshinorim, kradhakrishnan, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D52287
2016-04-28 00:36:03 +00:00
|
|
|
return false;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2022-03-03 00:35:21 +00:00
|
|
|
// Get an upper bound on the compressed size.
|
2022-03-17 04:16:12 +00:00
|
|
|
size_t upper_bound =
|
|
|
|
deflateBound(&_stream, static_cast<unsigned long>(length));
|
2022-03-03 00:35:21 +00:00
|
|
|
output->resize(output_header_len + upper_bound);
|
|
|
|
|
2015-01-09 20:57:11 +00:00
|
|
|
// Compress the input, and put compressed data in output.
|
2018-04-13 00:55:14 +00:00
|
|
|
_stream.next_in = (Bytef*)input;
|
2015-01-09 20:57:11 +00:00
|
|
|
_stream.avail_in = static_cast<unsigned int>(length);
|
|
|
|
|
|
|
|
// Initialize the output size.
|
2022-03-03 00:35:21 +00:00
|
|
|
_stream.avail_out = static_cast<unsigned int>(upper_bound);
|
2015-01-15 00:24:24 +00:00
|
|
|
_stream.next_out = reinterpret_cast<Bytef*>(&(*output)[output_header_len]);
|
2015-01-09 20:57:11 +00:00
|
|
|
|
2016-07-08 17:52:25 +00:00
|
|
|
bool compressed = false;
|
|
|
|
st = deflate(&_stream, Z_FINISH);
|
|
|
|
if (st == Z_STREAM_END) {
|
|
|
|
compressed = true;
|
|
|
|
output->resize(output->size() - _stream.avail_out);
|
2015-01-09 20:57:11 +00:00
|
|
|
}
|
2016-07-08 17:52:25 +00:00
|
|
|
// The only return value we really care about is Z_STREAM_END.
|
|
|
|
// Z_OK means insufficient output space. This means the compression is
|
|
|
|
// bigger than decompressed size. Just fail the compression in that case.
|
2015-01-09 20:57:11 +00:00
|
|
|
|
|
|
|
deflateEnd(&_stream);
|
2016-07-08 17:52:25 +00:00
|
|
|
return compressed;
|
2018-01-31 20:04:52 +00:00
|
|
|
#else
|
2019-01-19 03:10:17 +00:00
|
|
|
(void)info;
|
2018-04-13 00:55:14 +00:00
|
|
|
(void)compress_format_version;
|
|
|
|
(void)input;
|
|
|
|
(void)length;
|
|
|
|
(void)output;
|
2015-01-09 20:57:11 +00:00
|
|
|
return false;
|
2018-01-31 20:04:52 +00:00
|
|
|
#endif
|
2015-01-09 20:57:11 +00:00
|
|
|
}
|
|
|
|
|
2015-01-15 00:24:24 +00:00
|
|
|
// compress_format_version == 1 -- decompressed size is not included in the
|
|
|
|
// block header
|
|
|
|
// compress_format_version == 2 -- decompressed size is included in the block
|
|
|
|
// header in varint32 format
|
Shared dictionary compression using reference block
Summary:
This adds a new metablock containing a shared dictionary that is used
to compress all data blocks in the SST file. The size of the shared dictionary
is configurable in CompressionOptions and defaults to 0. It's currently only
used for zlib/lz4/lz4hc, but the block will be stored in the SST regardless of
the compression type if the user chooses a nonzero dictionary size.
During compaction, computes the dictionary by randomly sampling the first
output file in each subcompaction. It pre-computes the intervals to sample
by assuming the output file will have the maximum allowable length. In case
the file is smaller, some of the pre-computed sampling intervals can be beyond
end-of-file, in which case we skip over those samples and the dictionary will
be a bit smaller. After the dictionary is generated using the first file in a
subcompaction, it is loaded into the compression library before writing each
block in each subsequent file of that subcompaction.
On the read path, gets the dictionary from the metablock, if it exists. Then,
loads that dictionary into the compression library before reading each block.
Test Plan: new unit test
Reviewers: yhchiang, IslamAbdelRahman, cyan, sdong
Reviewed By: sdong
Subscribers: andrewkr, yoshinorim, kradhakrishnan, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D52287
2016-04-28 00:36:03 +00:00
|
|
|
// @param compression_dict Data for presetting the compression library's
|
|
|
|
// dictionary.
|
2018-10-03 00:21:54 +00:00
|
|
|
inline CacheAllocationPtr Zlib_Uncompress(
|
2019-01-19 03:10:17 +00:00
|
|
|
const UncompressionInfo& info, const char* input_data, size_t input_length,
|
2020-09-25 16:00:29 +00:00
|
|
|
size_t* uncompressed_size, uint32_t compress_format_version,
|
2018-10-26 21:27:09 +00:00
|
|
|
MemoryAllocator* allocator = nullptr, int windowBits = -14) {
|
2015-01-09 20:57:11 +00:00
|
|
|
#ifdef ZLIB
|
2015-01-15 00:24:24 +00:00
|
|
|
uint32_t output_len = 0;
|
|
|
|
if (compress_format_version == 2) {
|
|
|
|
if (!compression::GetDecompressedSizeInfo(&input_data, &input_length,
|
|
|
|
&output_len)) {
|
|
|
|
return nullptr;
|
|
|
|
}
|
|
|
|
} else {
|
|
|
|
// Assume the decompressed data size will 5x of compressed size, but round
|
|
|
|
// to the page size
|
|
|
|
size_t proposed_output_len = ((input_length * 5) & (~(4096 - 1))) + 4096;
|
|
|
|
output_len = static_cast<uint32_t>(
|
|
|
|
std::min(proposed_output_len,
|
|
|
|
static_cast<size_t>(std::numeric_limits<uint32_t>::max())));
|
|
|
|
}
|
|
|
|
|
2015-01-09 20:57:11 +00:00
|
|
|
z_stream _stream;
|
|
|
|
memset(&_stream, 0, sizeof(z_stream));
|
|
|
|
|
|
|
|
// For raw inflate, the windowBits should be -8..-15.
|
|
|
|
// If windowBits is bigger than zero, it will use either zlib
|
|
|
|
// header or gzip header. Adding 32 to it will do automatic detection.
|
2018-04-13 00:55:14 +00:00
|
|
|
int st =
|
|
|
|
inflateInit2(&_stream, windowBits > 0 ? windowBits + 32 : windowBits);
|
2015-01-09 20:57:11 +00:00
|
|
|
if (st != Z_OK) {
|
|
|
|
return nullptr;
|
|
|
|
}
|
|
|
|
|
2019-07-23 22:57:43 +00:00
|
|
|
const Slice& compression_dict = info.dict().GetRawDict();
|
2019-01-19 03:10:17 +00:00
|
|
|
if (compression_dict.size()) {
|
Shared dictionary compression using reference block
Summary:
This adds a new metablock containing a shared dictionary that is used
to compress all data blocks in the SST file. The size of the shared dictionary
is configurable in CompressionOptions and defaults to 0. It's currently only
used for zlib/lz4/lz4hc, but the block will be stored in the SST regardless of
the compression type if the user chooses a nonzero dictionary size.
During compaction, computes the dictionary by randomly sampling the first
output file in each subcompaction. It pre-computes the intervals to sample
by assuming the output file will have the maximum allowable length. In case
the file is smaller, some of the pre-computed sampling intervals can be beyond
end-of-file, in which case we skip over those samples and the dictionary will
be a bit smaller. After the dictionary is generated using the first file in a
subcompaction, it is loaded into the compression library before writing each
block in each subsequent file of that subcompaction.
On the read path, gets the dictionary from the metablock, if it exists. Then,
loads that dictionary into the compression library before reading each block.
Test Plan: new unit test
Reviewers: yhchiang, IslamAbdelRahman, cyan, sdong
Reviewed By: sdong
Subscribers: andrewkr, yoshinorim, kradhakrishnan, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D52287
2016-04-28 00:36:03 +00:00
|
|
|
// Initialize the compression library's dictionary
|
2019-01-19 03:10:17 +00:00
|
|
|
st = inflateSetDictionary(
|
|
|
|
&_stream, reinterpret_cast<const Bytef*>(compression_dict.data()),
|
|
|
|
static_cast<unsigned int>(compression_dict.size()));
|
Shared dictionary compression using reference block
Summary:
This adds a new metablock containing a shared dictionary that is used
to compress all data blocks in the SST file. The size of the shared dictionary
is configurable in CompressionOptions and defaults to 0. It's currently only
used for zlib/lz4/lz4hc, but the block will be stored in the SST regardless of
the compression type if the user chooses a nonzero dictionary size.
During compaction, computes the dictionary by randomly sampling the first
output file in each subcompaction. It pre-computes the intervals to sample
by assuming the output file will have the maximum allowable length. In case
the file is smaller, some of the pre-computed sampling intervals can be beyond
end-of-file, in which case we skip over those samples and the dictionary will
be a bit smaller. After the dictionary is generated using the first file in a
subcompaction, it is loaded into the compression library before writing each
block in each subsequent file of that subcompaction.
On the read path, gets the dictionary from the metablock, if it exists. Then,
loads that dictionary into the compression library before reading each block.
Test Plan: new unit test
Reviewers: yhchiang, IslamAbdelRahman, cyan, sdong
Reviewed By: sdong
Subscribers: andrewkr, yoshinorim, kradhakrishnan, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D52287
2016-04-28 00:36:03 +00:00
|
|
|
if (st != Z_OK) {
|
|
|
|
return nullptr;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-04-13 00:55:14 +00:00
|
|
|
_stream.next_in = (Bytef*)input_data;
|
2015-01-09 20:57:11 +00:00
|
|
|
_stream.avail_in = static_cast<unsigned int>(input_length);
|
|
|
|
|
2018-10-03 00:21:54 +00:00
|
|
|
auto output = AllocateBlock(output_len, allocator);
|
2015-01-09 20:57:11 +00:00
|
|
|
|
2018-10-03 00:21:54 +00:00
|
|
|
_stream.next_out = (Bytef*)output.get();
|
2015-01-09 20:57:11 +00:00
|
|
|
_stream.avail_out = static_cast<unsigned int>(output_len);
|
|
|
|
|
|
|
|
bool done = false;
|
|
|
|
while (!done) {
|
|
|
|
st = inflate(&_stream, Z_SYNC_FLUSH);
|
|
|
|
switch (st) {
|
|
|
|
case Z_STREAM_END:
|
|
|
|
done = true;
|
|
|
|
break;
|
2015-01-15 00:24:24 +00:00
|
|
|
case Z_OK: {
|
2015-01-09 20:57:11 +00:00
|
|
|
// No output space. Increase the output space by 20%.
|
2015-01-15 00:24:24 +00:00
|
|
|
// We should never run out of output space if
|
|
|
|
// compress_format_version == 2
|
|
|
|
assert(compress_format_version != 2);
|
|
|
|
size_t old_sz = output_len;
|
2018-04-13 00:55:14 +00:00
|
|
|
uint32_t output_len_delta = output_len / 5;
|
2015-01-09 20:57:11 +00:00
|
|
|
output_len += output_len_delta < 10 ? 10 : output_len_delta;
|
2018-10-03 00:21:54 +00:00
|
|
|
auto tmp = AllocateBlock(output_len, allocator);
|
|
|
|
memcpy(tmp.get(), output.get(), old_sz);
|
|
|
|
output = std::move(tmp);
|
2015-01-09 20:57:11 +00:00
|
|
|
|
|
|
|
// Set more output.
|
2018-10-03 00:21:54 +00:00
|
|
|
_stream.next_out = (Bytef*)(output.get() + old_sz);
|
2015-01-09 20:57:11 +00:00
|
|
|
_stream.avail_out = static_cast<unsigned int>(output_len - old_sz);
|
|
|
|
break;
|
2015-01-15 00:24:24 +00:00
|
|
|
}
|
2015-01-09 20:57:11 +00:00
|
|
|
case Z_BUF_ERROR:
|
|
|
|
default:
|
|
|
|
inflateEnd(&_stream);
|
|
|
|
return nullptr;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2015-01-15 00:24:24 +00:00
|
|
|
// If we encoded decompressed block size, we should have no bytes left
|
|
|
|
assert(compress_format_version != 2 || _stream.avail_out == 0);
|
2020-09-25 16:00:29 +00:00
|
|
|
assert(output_len >= _stream.avail_out);
|
|
|
|
*uncompressed_size = output_len - _stream.avail_out;
|
2015-01-09 20:57:11 +00:00
|
|
|
inflateEnd(&_stream);
|
|
|
|
return output;
|
2018-01-31 20:04:52 +00:00
|
|
|
#else
|
2019-01-19 03:10:17 +00:00
|
|
|
(void)info;
|
2018-04-13 00:55:14 +00:00
|
|
|
(void)input_data;
|
|
|
|
(void)input_length;
|
2020-09-25 16:00:29 +00:00
|
|
|
(void)uncompressed_size;
|
2018-04-13 00:55:14 +00:00
|
|
|
(void)compress_format_version;
|
2018-10-04 20:00:10 +00:00
|
|
|
(void)allocator;
|
2018-04-13 00:55:14 +00:00
|
|
|
(void)windowBits;
|
2015-01-09 20:57:11 +00:00
|
|
|
return nullptr;
|
2018-01-31 20:04:52 +00:00
|
|
|
#endif
|
2015-01-09 20:57:11 +00:00
|
|
|
}
|
|
|
|
|
2015-01-15 00:24:24 +00:00
|
|
|
// compress_format_version == 1 -- decompressed size is not included in the
|
|
|
|
// block header
|
|
|
|
// compress_format_version == 2 -- decompressed size is included in the block
|
|
|
|
// header in varint32 format
|
2019-01-19 03:10:17 +00:00
|
|
|
inline bool BZip2_Compress(const CompressionInfo& /*info*/,
|
2018-03-05 21:08:17 +00:00
|
|
|
uint32_t compress_format_version, const char* input,
|
|
|
|
size_t length, ::std::string* output) {
|
2015-01-09 20:57:11 +00:00
|
|
|
#ifdef BZIP2
|
2015-01-15 00:24:24 +00:00
|
|
|
if (length > std::numeric_limits<uint32_t>::max()) {
|
|
|
|
// Can't compress more than 4GB
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
size_t output_header_len = 0;
|
|
|
|
if (compress_format_version == 2) {
|
|
|
|
output_header_len = compression::PutDecompressedSizeInfo(
|
|
|
|
output, static_cast<uint32_t>(length));
|
|
|
|
}
|
|
|
|
// Resize output to be the plain data length.
|
|
|
|
// This may not be big enough if the compression actually expands data.
|
|
|
|
output->resize(output_header_len + length);
|
|
|
|
|
2015-01-09 20:57:11 +00:00
|
|
|
bz_stream _stream;
|
|
|
|
memset(&_stream, 0, sizeof(bz_stream));
|
|
|
|
|
|
|
|
// Block size 1 is 100K.
|
|
|
|
// 0 is for silent.
|
|
|
|
// 30 is the default workFactor
|
|
|
|
int st = BZ2_bzCompressInit(&_stream, 1, 0, 30);
|
|
|
|
if (st != BZ_OK) {
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
|
|
|
// Compress the input, and put compressed data in output.
|
2018-04-13 00:55:14 +00:00
|
|
|
_stream.next_in = (char*)input;
|
2015-01-09 20:57:11 +00:00
|
|
|
_stream.avail_in = static_cast<unsigned int>(length);
|
|
|
|
|
|
|
|
// Initialize the output size.
|
|
|
|
_stream.avail_out = static_cast<unsigned int>(length);
|
2015-01-15 00:24:24 +00:00
|
|
|
_stream.next_out = reinterpret_cast<char*>(&(*output)[output_header_len]);
|
2015-01-09 20:57:11 +00:00
|
|
|
|
2016-07-08 17:52:25 +00:00
|
|
|
bool compressed = false;
|
|
|
|
st = BZ2_bzCompress(&_stream, BZ_FINISH);
|
|
|
|
if (st == BZ_STREAM_END) {
|
|
|
|
compressed = true;
|
|
|
|
output->resize(output->size() - _stream.avail_out);
|
2015-01-09 20:57:11 +00:00
|
|
|
}
|
2016-07-08 17:52:25 +00:00
|
|
|
// The only return value we really care about is BZ_STREAM_END.
|
|
|
|
// BZ_FINISH_OK means insufficient output space. This means the compression
|
|
|
|
// is bigger than decompressed size. Just fail the compression in that case.
|
2015-01-09 20:57:11 +00:00
|
|
|
|
|
|
|
BZ2_bzCompressEnd(&_stream);
|
2016-07-08 17:52:25 +00:00
|
|
|
return compressed;
|
2018-01-31 20:04:52 +00:00
|
|
|
#else
|
2018-04-13 00:55:14 +00:00
|
|
|
(void)compress_format_version;
|
|
|
|
(void)input;
|
|
|
|
(void)length;
|
|
|
|
(void)output;
|
2015-01-09 20:57:11 +00:00
|
|
|
return false;
|
2018-01-31 20:04:52 +00:00
|
|
|
#endif
|
2015-01-09 20:57:11 +00:00
|
|
|
}
|
|
|
|
|
2015-01-15 00:24:24 +00:00
|
|
|
// compress_format_version == 1 -- decompressed size is not included in the
|
|
|
|
// block header
|
|
|
|
// compress_format_version == 2 -- decompressed size is included in the block
|
|
|
|
// header in varint32 format
|
2018-10-03 00:21:54 +00:00
|
|
|
inline CacheAllocationPtr BZip2_Uncompress(
|
2020-09-25 16:00:29 +00:00
|
|
|
const char* input_data, size_t input_length, size_t* uncompressed_size,
|
2018-10-26 21:27:09 +00:00
|
|
|
uint32_t compress_format_version, MemoryAllocator* allocator = nullptr) {
|
2015-01-09 20:57:11 +00:00
|
|
|
#ifdef BZIP2
|
2015-01-15 00:24:24 +00:00
|
|
|
uint32_t output_len = 0;
|
|
|
|
if (compress_format_version == 2) {
|
|
|
|
if (!compression::GetDecompressedSizeInfo(&input_data, &input_length,
|
|
|
|
&output_len)) {
|
|
|
|
return nullptr;
|
|
|
|
}
|
|
|
|
} else {
|
|
|
|
// Assume the decompressed data size will 5x of compressed size, but round
|
|
|
|
// to the next page size
|
|
|
|
size_t proposed_output_len = ((input_length * 5) & (~(4096 - 1))) + 4096;
|
|
|
|
output_len = static_cast<uint32_t>(
|
|
|
|
std::min(proposed_output_len,
|
|
|
|
static_cast<size_t>(std::numeric_limits<uint32_t>::max())));
|
|
|
|
}
|
|
|
|
|
2015-01-09 20:57:11 +00:00
|
|
|
bz_stream _stream;
|
|
|
|
memset(&_stream, 0, sizeof(bz_stream));
|
|
|
|
|
|
|
|
int st = BZ2_bzDecompressInit(&_stream, 0, 0);
|
|
|
|
if (st != BZ_OK) {
|
|
|
|
return nullptr;
|
|
|
|
}
|
|
|
|
|
2018-04-13 00:55:14 +00:00
|
|
|
_stream.next_in = (char*)input_data;
|
2015-01-09 20:57:11 +00:00
|
|
|
_stream.avail_in = static_cast<unsigned int>(input_length);
|
|
|
|
|
2018-10-03 00:21:54 +00:00
|
|
|
auto output = AllocateBlock(output_len, allocator);
|
2015-01-09 20:57:11 +00:00
|
|
|
|
2018-10-03 00:21:54 +00:00
|
|
|
_stream.next_out = (char*)output.get();
|
2015-01-09 20:57:11 +00:00
|
|
|
_stream.avail_out = static_cast<unsigned int>(output_len);
|
|
|
|
|
2015-01-15 00:24:24 +00:00
|
|
|
bool done = false;
|
|
|
|
while (!done) {
|
2015-01-09 20:57:11 +00:00
|
|
|
st = BZ2_bzDecompress(&_stream);
|
|
|
|
switch (st) {
|
|
|
|
case BZ_STREAM_END:
|
2015-01-15 00:24:24 +00:00
|
|
|
done = true;
|
2015-01-09 20:57:11 +00:00
|
|
|
break;
|
2015-01-15 00:24:24 +00:00
|
|
|
case BZ_OK: {
|
2015-01-09 20:57:11 +00:00
|
|
|
// No output space. Increase the output space by 20%.
|
2015-01-15 00:24:24 +00:00
|
|
|
// We should never run out of output space if
|
|
|
|
// compress_format_version == 2
|
|
|
|
assert(compress_format_version != 2);
|
|
|
|
uint32_t old_sz = output_len;
|
|
|
|
output_len = output_len * 1.2;
|
2018-10-03 00:21:54 +00:00
|
|
|
auto tmp = AllocateBlock(output_len, allocator);
|
|
|
|
memcpy(tmp.get(), output.get(), old_sz);
|
|
|
|
output = std::move(tmp);
|
2015-01-09 20:57:11 +00:00
|
|
|
|
|
|
|
// Set more output.
|
2018-10-03 00:21:54 +00:00
|
|
|
_stream.next_out = (char*)(output.get() + old_sz);
|
2015-01-09 20:57:11 +00:00
|
|
|
_stream.avail_out = static_cast<unsigned int>(output_len - old_sz);
|
|
|
|
break;
|
2015-01-15 00:24:24 +00:00
|
|
|
}
|
2015-01-09 20:57:11 +00:00
|
|
|
default:
|
|
|
|
BZ2_bzDecompressEnd(&_stream);
|
|
|
|
return nullptr;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2015-01-15 00:24:24 +00:00
|
|
|
// If we encoded decompressed block size, we should have no bytes left
|
|
|
|
assert(compress_format_version != 2 || _stream.avail_out == 0);
|
2020-09-25 16:00:29 +00:00
|
|
|
assert(output_len >= _stream.avail_out);
|
|
|
|
*uncompressed_size = output_len - _stream.avail_out;
|
2015-01-09 20:57:11 +00:00
|
|
|
BZ2_bzDecompressEnd(&_stream);
|
|
|
|
return output;
|
2018-01-31 20:04:52 +00:00
|
|
|
#else
|
2018-04-13 00:55:14 +00:00
|
|
|
(void)input_data;
|
|
|
|
(void)input_length;
|
2020-09-25 16:00:29 +00:00
|
|
|
(void)uncompressed_size;
|
2018-04-13 00:55:14 +00:00
|
|
|
(void)compress_format_version;
|
2018-10-04 20:00:10 +00:00
|
|
|
(void)allocator;
|
2015-01-09 20:57:11 +00:00
|
|
|
return nullptr;
|
2018-01-31 20:04:52 +00:00
|
|
|
#endif
|
2015-01-09 20:57:11 +00:00
|
|
|
}
|
|
|
|
|
2015-01-15 00:24:24 +00:00
|
|
|
// compress_format_version == 1 -- decompressed size is included in the
|
|
|
|
// block header using memcpy, which makes database non-portable)
|
|
|
|
// compress_format_version == 2 -- decompressed size is included in the block
|
|
|
|
// header in varint32 format
|
Shared dictionary compression using reference block
Summary:
This adds a new metablock containing a shared dictionary that is used
to compress all data blocks in the SST file. The size of the shared dictionary
is configurable in CompressionOptions and defaults to 0. It's currently only
used for zlib/lz4/lz4hc, but the block will be stored in the SST regardless of
the compression type if the user chooses a nonzero dictionary size.
During compaction, computes the dictionary by randomly sampling the first
output file in each subcompaction. It pre-computes the intervals to sample
by assuming the output file will have the maximum allowable length. In case
the file is smaller, some of the pre-computed sampling intervals can be beyond
end-of-file, in which case we skip over those samples and the dictionary will
be a bit smaller. After the dictionary is generated using the first file in a
subcompaction, it is loaded into the compression library before writing each
block in each subsequent file of that subcompaction.
On the read path, gets the dictionary from the metablock, if it exists. Then,
loads that dictionary into the compression library before reading each block.
Test Plan: new unit test
Reviewers: yhchiang, IslamAbdelRahman, cyan, sdong
Reviewed By: sdong
Subscribers: andrewkr, yoshinorim, kradhakrishnan, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D52287
2016-04-28 00:36:03 +00:00
|
|
|
// @param compression_dict Data for presetting the compression library's
|
|
|
|
// dictionary.
|
2019-01-19 03:10:17 +00:00
|
|
|
inline bool LZ4_Compress(const CompressionInfo& info,
|
2015-01-15 00:24:24 +00:00
|
|
|
uint32_t compress_format_version, const char* input,
|
2018-06-04 19:04:52 +00:00
|
|
|
size_t length, ::std::string* output) {
|
2015-01-09 20:57:11 +00:00
|
|
|
#ifdef LZ4
|
2015-01-15 00:24:24 +00:00
|
|
|
if (length > std::numeric_limits<uint32_t>::max()) {
|
|
|
|
// Can't compress more than 4GB
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
|
|
|
size_t output_header_len = 0;
|
|
|
|
if (compress_format_version == 2) {
|
|
|
|
// new encoding, using varint32 to store size information
|
|
|
|
output_header_len = compression::PutDecompressedSizeInfo(
|
|
|
|
output, static_cast<uint32_t>(length));
|
|
|
|
} else {
|
|
|
|
// legacy encoding, which is not really portable (depends on big/little
|
|
|
|
// endianness)
|
|
|
|
output_header_len = 8;
|
|
|
|
output->resize(output_header_len);
|
|
|
|
char* p = const_cast<char*>(output->c_str());
|
|
|
|
memcpy(p, &length, sizeof(length));
|
|
|
|
}
|
Shared dictionary compression using reference block
Summary:
This adds a new metablock containing a shared dictionary that is used
to compress all data blocks in the SST file. The size of the shared dictionary
is configurable in CompressionOptions and defaults to 0. It's currently only
used for zlib/lz4/lz4hc, but the block will be stored in the SST regardless of
the compression type if the user chooses a nonzero dictionary size.
During compaction, computes the dictionary by randomly sampling the first
output file in each subcompaction. It pre-computes the intervals to sample
by assuming the output file will have the maximum allowable length. In case
the file is smaller, some of the pre-computed sampling intervals can be beyond
end-of-file, in which case we skip over those samples and the dictionary will
be a bit smaller. After the dictionary is generated using the first file in a
subcompaction, it is loaded into the compression library before writing each
block in each subsequent file of that subcompaction.
On the read path, gets the dictionary from the metablock, if it exists. Then,
loads that dictionary into the compression library before reading each block.
Test Plan: new unit test
Reviewers: yhchiang, IslamAbdelRahman, cyan, sdong
Reviewed By: sdong
Subscribers: andrewkr, yoshinorim, kradhakrishnan, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D52287
2016-04-28 00:36:03 +00:00
|
|
|
int compress_bound = LZ4_compressBound(static_cast<int>(length));
|
|
|
|
output->resize(static_cast<size_t>(output_header_len + compress_bound));
|
|
|
|
|
|
|
|
int outlen;
|
|
|
|
#if LZ4_VERSION_NUMBER >= 10400 // r124+
|
|
|
|
LZ4_stream_t* stream = LZ4_createStream();
|
2019-01-19 03:10:17 +00:00
|
|
|
Slice compression_dict = info.dict().GetRawDict();
|
|
|
|
if (compression_dict.size()) {
|
|
|
|
LZ4_loadDict(stream, compression_dict.data(),
|
|
|
|
static_cast<int>(compression_dict.size()));
|
Shared dictionary compression using reference block
Summary:
This adds a new metablock containing a shared dictionary that is used
to compress all data blocks in the SST file. The size of the shared dictionary
is configurable in CompressionOptions and defaults to 0. It's currently only
used for zlib/lz4/lz4hc, but the block will be stored in the SST regardless of
the compression type if the user chooses a nonzero dictionary size.
During compaction, computes the dictionary by randomly sampling the first
output file in each subcompaction. It pre-computes the intervals to sample
by assuming the output file will have the maximum allowable length. In case
the file is smaller, some of the pre-computed sampling intervals can be beyond
end-of-file, in which case we skip over those samples and the dictionary will
be a bit smaller. After the dictionary is generated using the first file in a
subcompaction, it is loaded into the compression library before writing each
block in each subsequent file of that subcompaction.
On the read path, gets the dictionary from the metablock, if it exists. Then,
loads that dictionary into the compression library before reading each block.
Test Plan: new unit test
Reviewers: yhchiang, IslamAbdelRahman, cyan, sdong
Reviewed By: sdong
Subscribers: andrewkr, yoshinorim, kradhakrishnan, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D52287
2016-04-28 00:36:03 +00:00
|
|
|
}
|
2016-11-21 20:17:06 +00:00
|
|
|
#if LZ4_VERSION_NUMBER >= 10700 // r129+
|
2023-09-18 16:26:29 +00:00
|
|
|
int acceleration;
|
|
|
|
if (info.options().level < 0) {
|
|
|
|
acceleration = -info.options().level;
|
|
|
|
} else {
|
|
|
|
acceleration = 1;
|
|
|
|
}
|
|
|
|
outlen = LZ4_compress_fast_continue(
|
|
|
|
stream, input, &(*output)[output_header_len], static_cast<int>(length),
|
|
|
|
compress_bound, acceleration);
|
2016-11-21 20:17:06 +00:00
|
|
|
#else // up to r128
|
Shared dictionary compression using reference block
Summary:
This adds a new metablock containing a shared dictionary that is used
to compress all data blocks in the SST file. The size of the shared dictionary
is configurable in CompressionOptions and defaults to 0. It's currently only
used for zlib/lz4/lz4hc, but the block will be stored in the SST regardless of
the compression type if the user chooses a nonzero dictionary size.
During compaction, computes the dictionary by randomly sampling the first
output file in each subcompaction. It pre-computes the intervals to sample
by assuming the output file will have the maximum allowable length. In case
the file is smaller, some of the pre-computed sampling intervals can be beyond
end-of-file, in which case we skip over those samples and the dictionary will
be a bit smaller. After the dictionary is generated using the first file in a
subcompaction, it is loaded into the compression library before writing each
block in each subsequent file of that subcompaction.
On the read path, gets the dictionary from the metablock, if it exists. Then,
loads that dictionary into the compression library before reading each block.
Test Plan: new unit test
Reviewers: yhchiang, IslamAbdelRahman, cyan, sdong
Reviewed By: sdong
Subscribers: andrewkr, yoshinorim, kradhakrishnan, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D52287
2016-04-28 00:36:03 +00:00
|
|
|
outlen = LZ4_compress_limitedOutput_continue(
|
|
|
|
stream, input, &(*output)[output_header_len], static_cast<int>(length),
|
|
|
|
compress_bound);
|
2016-11-21 20:17:06 +00:00
|
|
|
#endif
|
Shared dictionary compression using reference block
Summary:
This adds a new metablock containing a shared dictionary that is used
to compress all data blocks in the SST file. The size of the shared dictionary
is configurable in CompressionOptions and defaults to 0. It's currently only
used for zlib/lz4/lz4hc, but the block will be stored in the SST regardless of
the compression type if the user chooses a nonzero dictionary size.
During compaction, computes the dictionary by randomly sampling the first
output file in each subcompaction. It pre-computes the intervals to sample
by assuming the output file will have the maximum allowable length. In case
the file is smaller, some of the pre-computed sampling intervals can be beyond
end-of-file, in which case we skip over those samples and the dictionary will
be a bit smaller. After the dictionary is generated using the first file in a
subcompaction, it is loaded into the compression library before writing each
block in each subsequent file of that subcompaction.
On the read path, gets the dictionary from the metablock, if it exists. Then,
loads that dictionary into the compression library before reading each block.
Test Plan: new unit test
Reviewers: yhchiang, IslamAbdelRahman, cyan, sdong
Reviewed By: sdong
Subscribers: andrewkr, yoshinorim, kradhakrishnan, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D52287
2016-04-28 00:36:03 +00:00
|
|
|
LZ4_freeStream(stream);
|
|
|
|
#else // up to r123
|
|
|
|
outlen = LZ4_compress_limitedOutput(input, &(*output)[output_header_len],
|
|
|
|
static_cast<int>(length), compress_bound);
|
|
|
|
#endif // LZ4_VERSION_NUMBER >= 10400
|
2015-01-15 00:24:24 +00:00
|
|
|
|
2015-01-09 20:57:11 +00:00
|
|
|
if (outlen == 0) {
|
|
|
|
return false;
|
|
|
|
}
|
2015-01-15 00:24:24 +00:00
|
|
|
output->resize(static_cast<size_t>(output_header_len + outlen));
|
2015-01-09 20:57:11 +00:00
|
|
|
return true;
|
2018-01-31 20:04:52 +00:00
|
|
|
#else // LZ4
|
2019-01-19 03:10:17 +00:00
|
|
|
(void)info;
|
2018-04-13 00:55:14 +00:00
|
|
|
(void)compress_format_version;
|
|
|
|
(void)input;
|
|
|
|
(void)length;
|
|
|
|
(void)output;
|
2015-01-09 20:57:11 +00:00
|
|
|
return false;
|
2018-01-31 20:04:52 +00:00
|
|
|
#endif
|
2015-01-09 20:57:11 +00:00
|
|
|
}
|
|
|
|
|
2015-01-15 00:24:24 +00:00
|
|
|
// compress_format_version == 1 -- decompressed size is included in the
|
|
|
|
// block header using memcpy, which makes database non-portable)
|
|
|
|
// compress_format_version == 2 -- decompressed size is included in the block
|
|
|
|
// header in varint32 format
|
Shared dictionary compression using reference block
Summary:
This adds a new metablock containing a shared dictionary that is used
to compress all data blocks in the SST file. The size of the shared dictionary
is configurable in CompressionOptions and defaults to 0. It's currently only
used for zlib/lz4/lz4hc, but the block will be stored in the SST regardless of
the compression type if the user chooses a nonzero dictionary size.
During compaction, computes the dictionary by randomly sampling the first
output file in each subcompaction. It pre-computes the intervals to sample
by assuming the output file will have the maximum allowable length. In case
the file is smaller, some of the pre-computed sampling intervals can be beyond
end-of-file, in which case we skip over those samples and the dictionary will
be a bit smaller. After the dictionary is generated using the first file in a
subcompaction, it is loaded into the compression library before writing each
block in each subsequent file of that subcompaction.
On the read path, gets the dictionary from the metablock, if it exists. Then,
loads that dictionary into the compression library before reading each block.
Test Plan: new unit test
Reviewers: yhchiang, IslamAbdelRahman, cyan, sdong
Reviewed By: sdong
Subscribers: andrewkr, yoshinorim, kradhakrishnan, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D52287
2016-04-28 00:36:03 +00:00
|
|
|
// @param compression_dict Data for presetting the compression library's
|
|
|
|
// dictionary.
|
2019-01-19 03:10:17 +00:00
|
|
|
inline CacheAllocationPtr LZ4_Uncompress(const UncompressionInfo& info,
|
2018-10-03 00:21:54 +00:00
|
|
|
const char* input_data,
|
|
|
|
size_t input_length,
|
2020-09-25 16:00:29 +00:00
|
|
|
size_t* uncompressed_size,
|
2018-10-03 00:21:54 +00:00
|
|
|
uint32_t compress_format_version,
|
2018-10-26 21:27:09 +00:00
|
|
|
MemoryAllocator* allocator = nullptr) {
|
2015-01-09 20:57:11 +00:00
|
|
|
#ifdef LZ4
|
2015-01-15 00:24:24 +00:00
|
|
|
uint32_t output_len = 0;
|
|
|
|
if (compress_format_version == 2) {
|
|
|
|
// new encoding, using varint32 to store size information
|
|
|
|
if (!compression::GetDecompressedSizeInfo(&input_data, &input_length,
|
|
|
|
&output_len)) {
|
|
|
|
return nullptr;
|
|
|
|
}
|
|
|
|
} else {
|
|
|
|
// legacy encoding, which is not really portable (depends on big/little
|
|
|
|
// endianness)
|
|
|
|
if (input_length < 8) {
|
|
|
|
return nullptr;
|
|
|
|
}
|
2021-10-22 17:12:09 +00:00
|
|
|
if (port::kLittleEndian) {
|
|
|
|
memcpy(&output_len, input_data, sizeof(output_len));
|
|
|
|
} else {
|
|
|
|
memcpy(&output_len, input_data + 4, sizeof(output_len));
|
|
|
|
}
|
2015-01-15 00:24:24 +00:00
|
|
|
input_length -= 8;
|
|
|
|
input_data += 8;
|
2015-01-09 20:57:11 +00:00
|
|
|
}
|
Shared dictionary compression using reference block
Summary:
This adds a new metablock containing a shared dictionary that is used
to compress all data blocks in the SST file. The size of the shared dictionary
is configurable in CompressionOptions and defaults to 0. It's currently only
used for zlib/lz4/lz4hc, but the block will be stored in the SST regardless of
the compression type if the user chooses a nonzero dictionary size.
During compaction, computes the dictionary by randomly sampling the first
output file in each subcompaction. It pre-computes the intervals to sample
by assuming the output file will have the maximum allowable length. In case
the file is smaller, some of the pre-computed sampling intervals can be beyond
end-of-file, in which case we skip over those samples and the dictionary will
be a bit smaller. After the dictionary is generated using the first file in a
subcompaction, it is loaded into the compression library before writing each
block in each subsequent file of that subcompaction.
On the read path, gets the dictionary from the metablock, if it exists. Then,
loads that dictionary into the compression library before reading each block.
Test Plan: new unit test
Reviewers: yhchiang, IslamAbdelRahman, cyan, sdong
Reviewed By: sdong
Subscribers: andrewkr, yoshinorim, kradhakrishnan, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D52287
2016-04-28 00:36:03 +00:00
|
|
|
|
2018-10-03 00:21:54 +00:00
|
|
|
auto output = AllocateBlock(output_len, allocator);
|
2020-09-25 16:00:29 +00:00
|
|
|
|
|
|
|
int decompress_bytes = 0;
|
|
|
|
|
Shared dictionary compression using reference block
Summary:
This adds a new metablock containing a shared dictionary that is used
to compress all data blocks in the SST file. The size of the shared dictionary
is configurable in CompressionOptions and defaults to 0. It's currently only
used for zlib/lz4/lz4hc, but the block will be stored in the SST regardless of
the compression type if the user chooses a nonzero dictionary size.
During compaction, computes the dictionary by randomly sampling the first
output file in each subcompaction. It pre-computes the intervals to sample
by assuming the output file will have the maximum allowable length. In case
the file is smaller, some of the pre-computed sampling intervals can be beyond
end-of-file, in which case we skip over those samples and the dictionary will
be a bit smaller. After the dictionary is generated using the first file in a
subcompaction, it is loaded into the compression library before writing each
block in each subsequent file of that subcompaction.
On the read path, gets the dictionary from the metablock, if it exists. Then,
loads that dictionary into the compression library before reading each block.
Test Plan: new unit test
Reviewers: yhchiang, IslamAbdelRahman, cyan, sdong
Reviewed By: sdong
Subscribers: andrewkr, yoshinorim, kradhakrishnan, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D52287
2016-04-28 00:36:03 +00:00
|
|
|
#if LZ4_VERSION_NUMBER >= 10400 // r124+
|
|
|
|
LZ4_streamDecode_t* stream = LZ4_createStreamDecode();
|
2019-07-23 22:57:43 +00:00
|
|
|
const Slice& compression_dict = info.dict().GetRawDict();
|
2019-01-19 03:10:17 +00:00
|
|
|
if (compression_dict.size()) {
|
|
|
|
LZ4_setStreamDecode(stream, compression_dict.data(),
|
|
|
|
static_cast<int>(compression_dict.size()));
|
Shared dictionary compression using reference block
Summary:
This adds a new metablock containing a shared dictionary that is used
to compress all data blocks in the SST file. The size of the shared dictionary
is configurable in CompressionOptions and defaults to 0. It's currently only
used for zlib/lz4/lz4hc, but the block will be stored in the SST regardless of
the compression type if the user chooses a nonzero dictionary size.
During compaction, computes the dictionary by randomly sampling the first
output file in each subcompaction. It pre-computes the intervals to sample
by assuming the output file will have the maximum allowable length. In case
the file is smaller, some of the pre-computed sampling intervals can be beyond
end-of-file, in which case we skip over those samples and the dictionary will
be a bit smaller. After the dictionary is generated using the first file in a
subcompaction, it is loaded into the compression library before writing each
block in each subsequent file of that subcompaction.
On the read path, gets the dictionary from the metablock, if it exists. Then,
loads that dictionary into the compression library before reading each block.
Test Plan: new unit test
Reviewers: yhchiang, IslamAbdelRahman, cyan, sdong
Reviewed By: sdong
Subscribers: andrewkr, yoshinorim, kradhakrishnan, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D52287
2016-04-28 00:36:03 +00:00
|
|
|
}
|
2020-09-25 16:00:29 +00:00
|
|
|
decompress_bytes = LZ4_decompress_safe_continue(
|
2018-10-03 00:21:54 +00:00
|
|
|
stream, input_data, output.get(), static_cast<int>(input_length),
|
Shared dictionary compression using reference block
Summary:
This adds a new metablock containing a shared dictionary that is used
to compress all data blocks in the SST file. The size of the shared dictionary
is configurable in CompressionOptions and defaults to 0. It's currently only
used for zlib/lz4/lz4hc, but the block will be stored in the SST regardless of
the compression type if the user chooses a nonzero dictionary size.
During compaction, computes the dictionary by randomly sampling the first
output file in each subcompaction. It pre-computes the intervals to sample
by assuming the output file will have the maximum allowable length. In case
the file is smaller, some of the pre-computed sampling intervals can be beyond
end-of-file, in which case we skip over those samples and the dictionary will
be a bit smaller. After the dictionary is generated using the first file in a
subcompaction, it is loaded into the compression library before writing each
block in each subsequent file of that subcompaction.
On the read path, gets the dictionary from the metablock, if it exists. Then,
loads that dictionary into the compression library before reading each block.
Test Plan: new unit test
Reviewers: yhchiang, IslamAbdelRahman, cyan, sdong
Reviewed By: sdong
Subscribers: andrewkr, yoshinorim, kradhakrishnan, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D52287
2016-04-28 00:36:03 +00:00
|
|
|
static_cast<int>(output_len));
|
|
|
|
LZ4_freeStreamDecode(stream);
|
|
|
|
#else // up to r123
|
2020-09-25 16:00:29 +00:00
|
|
|
decompress_bytes = LZ4_decompress_safe(input_data, output.get(),
|
2018-10-03 00:21:54 +00:00
|
|
|
static_cast<int>(input_length),
|
|
|
|
static_cast<int>(output_len));
|
Shared dictionary compression using reference block
Summary:
This adds a new metablock containing a shared dictionary that is used
to compress all data blocks in the SST file. The size of the shared dictionary
is configurable in CompressionOptions and defaults to 0. It's currently only
used for zlib/lz4/lz4hc, but the block will be stored in the SST regardless of
the compression type if the user chooses a nonzero dictionary size.
During compaction, computes the dictionary by randomly sampling the first
output file in each subcompaction. It pre-computes the intervals to sample
by assuming the output file will have the maximum allowable length. In case
the file is smaller, some of the pre-computed sampling intervals can be beyond
end-of-file, in which case we skip over those samples and the dictionary will
be a bit smaller. After the dictionary is generated using the first file in a
subcompaction, it is loaded into the compression library before writing each
block in each subsequent file of that subcompaction.
On the read path, gets the dictionary from the metablock, if it exists. Then,
loads that dictionary into the compression library before reading each block.
Test Plan: new unit test
Reviewers: yhchiang, IslamAbdelRahman, cyan, sdong
Reviewed By: sdong
Subscribers: andrewkr, yoshinorim, kradhakrishnan, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D52287
2016-04-28 00:36:03 +00:00
|
|
|
#endif // LZ4_VERSION_NUMBER >= 10400
|
|
|
|
|
2020-09-25 16:00:29 +00:00
|
|
|
if (decompress_bytes < 0) {
|
2015-01-09 20:57:11 +00:00
|
|
|
return nullptr;
|
|
|
|
}
|
2020-09-25 16:00:29 +00:00
|
|
|
assert(decompress_bytes == static_cast<int>(output_len));
|
|
|
|
*uncompressed_size = decompress_bytes;
|
2015-01-09 20:57:11 +00:00
|
|
|
return output;
|
2018-01-31 20:04:52 +00:00
|
|
|
#else // LZ4
|
2019-01-19 03:10:17 +00:00
|
|
|
(void)info;
|
2018-04-13 00:55:14 +00:00
|
|
|
(void)input_data;
|
|
|
|
(void)input_length;
|
2020-09-25 16:00:29 +00:00
|
|
|
(void)uncompressed_size;
|
2018-04-13 00:55:14 +00:00
|
|
|
(void)compress_format_version;
|
2018-10-03 00:21:54 +00:00
|
|
|
(void)allocator;
|
2015-01-09 20:57:11 +00:00
|
|
|
return nullptr;
|
2018-01-31 20:04:52 +00:00
|
|
|
#endif
|
2015-01-09 20:57:11 +00:00
|
|
|
}
|
|
|
|
|
2015-01-15 00:24:24 +00:00
|
|
|
// compress_format_version == 1 -- decompressed size is included in the
|
|
|
|
// block header using memcpy, which makes database non-portable)
|
|
|
|
// compress_format_version == 2 -- decompressed size is included in the block
|
|
|
|
// header in varint32 format
|
Shared dictionary compression using reference block
Summary:
This adds a new metablock containing a shared dictionary that is used
to compress all data blocks in the SST file. The size of the shared dictionary
is configurable in CompressionOptions and defaults to 0. It's currently only
used for zlib/lz4/lz4hc, but the block will be stored in the SST regardless of
the compression type if the user chooses a nonzero dictionary size.
During compaction, computes the dictionary by randomly sampling the first
output file in each subcompaction. It pre-computes the intervals to sample
by assuming the output file will have the maximum allowable length. In case
the file is smaller, some of the pre-computed sampling intervals can be beyond
end-of-file, in which case we skip over those samples and the dictionary will
be a bit smaller. After the dictionary is generated using the first file in a
subcompaction, it is loaded into the compression library before writing each
block in each subsequent file of that subcompaction.
On the read path, gets the dictionary from the metablock, if it exists. Then,
loads that dictionary into the compression library before reading each block.
Test Plan: new unit test
Reviewers: yhchiang, IslamAbdelRahman, cyan, sdong
Reviewed By: sdong
Subscribers: andrewkr, yoshinorim, kradhakrishnan, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D52287
2016-04-28 00:36:03 +00:00
|
|
|
// @param compression_dict Data for presetting the compression library's
|
|
|
|
// dictionary.
|
2019-01-19 03:10:17 +00:00
|
|
|
inline bool LZ4HC_Compress(const CompressionInfo& info,
|
2015-01-15 00:24:24 +00:00
|
|
|
uint32_t compress_format_version, const char* input,
|
2018-06-04 19:04:52 +00:00
|
|
|
size_t length, ::std::string* output) {
|
2015-01-09 20:57:11 +00:00
|
|
|
#ifdef LZ4
|
2015-01-15 00:24:24 +00:00
|
|
|
if (length > std::numeric_limits<uint32_t>::max()) {
|
|
|
|
// Can't compress more than 4GB
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
|
|
|
size_t output_header_len = 0;
|
|
|
|
if (compress_format_version == 2) {
|
|
|
|
// new encoding, using varint32 to store size information
|
|
|
|
output_header_len = compression::PutDecompressedSizeInfo(
|
|
|
|
output, static_cast<uint32_t>(length));
|
|
|
|
} else {
|
|
|
|
// legacy encoding, which is not really portable (depends on big/little
|
|
|
|
// endianness)
|
|
|
|
output_header_len = 8;
|
|
|
|
output->resize(output_header_len);
|
|
|
|
char* p = const_cast<char*>(output->c_str());
|
|
|
|
memcpy(p, &length, sizeof(length));
|
|
|
|
}
|
Shared dictionary compression using reference block
Summary:
This adds a new metablock containing a shared dictionary that is used
to compress all data blocks in the SST file. The size of the shared dictionary
is configurable in CompressionOptions and defaults to 0. It's currently only
used for zlib/lz4/lz4hc, but the block will be stored in the SST regardless of
the compression type if the user chooses a nonzero dictionary size.
During compaction, computes the dictionary by randomly sampling the first
output file in each subcompaction. It pre-computes the intervals to sample
by assuming the output file will have the maximum allowable length. In case
the file is smaller, some of the pre-computed sampling intervals can be beyond
end-of-file, in which case we skip over those samples and the dictionary will
be a bit smaller. After the dictionary is generated using the first file in a
subcompaction, it is loaded into the compression library before writing each
block in each subsequent file of that subcompaction.
On the read path, gets the dictionary from the metablock, if it exists. Then,
loads that dictionary into the compression library before reading each block.
Test Plan: new unit test
Reviewers: yhchiang, IslamAbdelRahman, cyan, sdong
Reviewed By: sdong
Subscribers: andrewkr, yoshinorim, kradhakrishnan, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D52287
2016-04-28 00:36:03 +00:00
|
|
|
int compress_bound = LZ4_compressBound(static_cast<int>(length));
|
|
|
|
output->resize(static_cast<size_t>(output_header_len + compress_bound));
|
2015-01-15 00:24:24 +00:00
|
|
|
|
2015-01-09 20:57:11 +00:00
|
|
|
int outlen;
|
2018-05-24 01:33:00 +00:00
|
|
|
int level;
|
2019-01-19 03:10:17 +00:00
|
|
|
if (info.options().level == CompressionOptions::kDefaultCompressionLevel) {
|
2018-05-24 01:33:00 +00:00
|
|
|
level = 0; // lz4hc.h says any value < 1 will be sanitized to default
|
|
|
|
} else {
|
2019-01-19 03:10:17 +00:00
|
|
|
level = info.options().level;
|
2018-05-24 01:33:00 +00:00
|
|
|
}
|
Shared dictionary compression using reference block
Summary:
This adds a new metablock containing a shared dictionary that is used
to compress all data blocks in the SST file. The size of the shared dictionary
is configurable in CompressionOptions and defaults to 0. It's currently only
used for zlib/lz4/lz4hc, but the block will be stored in the SST regardless of
the compression type if the user chooses a nonzero dictionary size.
During compaction, computes the dictionary by randomly sampling the first
output file in each subcompaction. It pre-computes the intervals to sample
by assuming the output file will have the maximum allowable length. In case
the file is smaller, some of the pre-computed sampling intervals can be beyond
end-of-file, in which case we skip over those samples and the dictionary will
be a bit smaller. After the dictionary is generated using the first file in a
subcompaction, it is loaded into the compression library before writing each
block in each subsequent file of that subcompaction.
On the read path, gets the dictionary from the metablock, if it exists. Then,
loads that dictionary into the compression library before reading each block.
Test Plan: new unit test
Reviewers: yhchiang, IslamAbdelRahman, cyan, sdong
Reviewed By: sdong
Subscribers: andrewkr, yoshinorim, kradhakrishnan, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D52287
2016-04-28 00:36:03 +00:00
|
|
|
#if LZ4_VERSION_NUMBER >= 10400 // r124+
|
|
|
|
LZ4_streamHC_t* stream = LZ4_createStreamHC();
|
2018-05-24 01:33:00 +00:00
|
|
|
LZ4_resetStreamHC(stream, level);
|
2019-01-19 03:10:17 +00:00
|
|
|
Slice compression_dict = info.dict().GetRawDict();
|
Shared dictionary compression using reference block
Summary:
This adds a new metablock containing a shared dictionary that is used
to compress all data blocks in the SST file. The size of the shared dictionary
is configurable in CompressionOptions and defaults to 0. It's currently only
used for zlib/lz4/lz4hc, but the block will be stored in the SST regardless of
the compression type if the user chooses a nonzero dictionary size.
During compaction, computes the dictionary by randomly sampling the first
output file in each subcompaction. It pre-computes the intervals to sample
by assuming the output file will have the maximum allowable length. In case
the file is smaller, some of the pre-computed sampling intervals can be beyond
end-of-file, in which case we skip over those samples and the dictionary will
be a bit smaller. After the dictionary is generated using the first file in a
subcompaction, it is loaded into the compression library before writing each
block in each subsequent file of that subcompaction.
On the read path, gets the dictionary from the metablock, if it exists. Then,
loads that dictionary into the compression library before reading each block.
Test Plan: new unit test
Reviewers: yhchiang, IslamAbdelRahman, cyan, sdong
Reviewed By: sdong
Subscribers: andrewkr, yoshinorim, kradhakrishnan, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D52287
2016-04-28 00:36:03 +00:00
|
|
|
const char* compression_dict_data =
|
2019-01-19 03:10:17 +00:00
|
|
|
compression_dict.size() > 0 ? compression_dict.data() : nullptr;
|
|
|
|
size_t compression_dict_size = compression_dict.size();
|
2021-08-09 23:04:26 +00:00
|
|
|
if (compression_dict_data != nullptr) {
|
|
|
|
LZ4_loadDictHC(stream, compression_dict_data,
|
2022-10-26 19:08:20 +00:00
|
|
|
static_cast<int>(compression_dict_size));
|
2021-08-09 23:04:26 +00:00
|
|
|
}
|
Shared dictionary compression using reference block
Summary:
This adds a new metablock containing a shared dictionary that is used
to compress all data blocks in the SST file. The size of the shared dictionary
is configurable in CompressionOptions and defaults to 0. It's currently only
used for zlib/lz4/lz4hc, but the block will be stored in the SST regardless of
the compression type if the user chooses a nonzero dictionary size.
During compaction, computes the dictionary by randomly sampling the first
output file in each subcompaction. It pre-computes the intervals to sample
by assuming the output file will have the maximum allowable length. In case
the file is smaller, some of the pre-computed sampling intervals can be beyond
end-of-file, in which case we skip over those samples and the dictionary will
be a bit smaller. After the dictionary is generated using the first file in a
subcompaction, it is loaded into the compression library before writing each
block in each subsequent file of that subcompaction.
On the read path, gets the dictionary from the metablock, if it exists. Then,
loads that dictionary into the compression library before reading each block.
Test Plan: new unit test
Reviewers: yhchiang, IslamAbdelRahman, cyan, sdong
Reviewed By: sdong
Subscribers: andrewkr, yoshinorim, kradhakrishnan, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D52287
2016-04-28 00:36:03 +00:00
|
|
|
|
|
|
|
#if LZ4_VERSION_NUMBER >= 10700 // r129+
|
|
|
|
outlen =
|
|
|
|
LZ4_compress_HC_continue(stream, input, &(*output)[output_header_len],
|
|
|
|
static_cast<int>(length), compress_bound);
|
|
|
|
#else // r124-r128
|
|
|
|
outlen = LZ4_compressHC_limitedOutput_continue(
|
|
|
|
stream, input, &(*output)[output_header_len], static_cast<int>(length),
|
|
|
|
compress_bound);
|
|
|
|
#endif // LZ4_VERSION_NUMBER >= 10700
|
|
|
|
LZ4_freeStreamHC(stream);
|
|
|
|
|
|
|
|
#elif LZ4_VERSION_MAJOR // r113-r123
|
2015-01-15 00:24:24 +00:00
|
|
|
outlen = LZ4_compressHC2_limitedOutput(input, &(*output)[output_header_len],
|
|
|
|
static_cast<int>(length),
|
2018-05-24 01:33:00 +00:00
|
|
|
compress_bound, level);
|
Shared dictionary compression using reference block
Summary:
This adds a new metablock containing a shared dictionary that is used
to compress all data blocks in the SST file. The size of the shared dictionary
is configurable in CompressionOptions and defaults to 0. It's currently only
used for zlib/lz4/lz4hc, but the block will be stored in the SST regardless of
the compression type if the user chooses a nonzero dictionary size.
During compaction, computes the dictionary by randomly sampling the first
output file in each subcompaction. It pre-computes the intervals to sample
by assuming the output file will have the maximum allowable length. In case
the file is smaller, some of the pre-computed sampling intervals can be beyond
end-of-file, in which case we skip over those samples and the dictionary will
be a bit smaller. After the dictionary is generated using the first file in a
subcompaction, it is loaded into the compression library before writing each
block in each subsequent file of that subcompaction.
On the read path, gets the dictionary from the metablock, if it exists. Then,
loads that dictionary into the compression library before reading each block.
Test Plan: new unit test
Reviewers: yhchiang, IslamAbdelRahman, cyan, sdong
Reviewed By: sdong
Subscribers: andrewkr, yoshinorim, kradhakrishnan, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D52287
2016-04-28 00:36:03 +00:00
|
|
|
#else // up to r112
|
2015-01-15 00:24:24 +00:00
|
|
|
outlen =
|
|
|
|
LZ4_compressHC_limitedOutput(input, &(*output)[output_header_len],
|
Shared dictionary compression using reference block
Summary:
This adds a new metablock containing a shared dictionary that is used
to compress all data blocks in the SST file. The size of the shared dictionary
is configurable in CompressionOptions and defaults to 0. It's currently only
used for zlib/lz4/lz4hc, but the block will be stored in the SST regardless of
the compression type if the user chooses a nonzero dictionary size.
During compaction, computes the dictionary by randomly sampling the first
output file in each subcompaction. It pre-computes the intervals to sample
by assuming the output file will have the maximum allowable length. In case
the file is smaller, some of the pre-computed sampling intervals can be beyond
end-of-file, in which case we skip over those samples and the dictionary will
be a bit smaller. After the dictionary is generated using the first file in a
subcompaction, it is loaded into the compression library before writing each
block in each subsequent file of that subcompaction.
On the read path, gets the dictionary from the metablock, if it exists. Then,
loads that dictionary into the compression library before reading each block.
Test Plan: new unit test
Reviewers: yhchiang, IslamAbdelRahman, cyan, sdong
Reviewed By: sdong
Subscribers: andrewkr, yoshinorim, kradhakrishnan, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D52287
2016-04-28 00:36:03 +00:00
|
|
|
static_cast<int>(length), compress_bound);
|
|
|
|
#endif // LZ4_VERSION_NUMBER >= 10400
|
|
|
|
|
2015-01-09 20:57:11 +00:00
|
|
|
if (outlen == 0) {
|
|
|
|
return false;
|
|
|
|
}
|
2015-01-15 00:24:24 +00:00
|
|
|
output->resize(static_cast<size_t>(output_header_len + outlen));
|
2015-01-09 20:57:11 +00:00
|
|
|
return true;
|
2018-01-31 20:04:52 +00:00
|
|
|
#else // LZ4
|
2019-01-19 03:10:17 +00:00
|
|
|
(void)info;
|
2018-04-13 00:55:14 +00:00
|
|
|
(void)compress_format_version;
|
|
|
|
(void)input;
|
|
|
|
(void)length;
|
|
|
|
(void)output;
|
2015-01-09 20:57:11 +00:00
|
|
|
return false;
|
2018-01-31 20:04:52 +00:00
|
|
|
#endif
|
2015-01-09 20:57:11 +00:00
|
|
|
}
|
2015-01-15 00:24:24 +00:00
|
|
|
|
2016-04-20 05:54:24 +00:00
|
|
|
#ifdef XPRESS
|
2018-03-05 21:08:17 +00:00
|
|
|
inline bool XPRESS_Compress(const char* input, size_t length,
|
|
|
|
std::string* output) {
|
2016-04-20 05:54:24 +00:00
|
|
|
return port::xpress::Compress(input, length, output);
|
2018-03-05 21:08:17 +00:00
|
|
|
}
|
2017-10-19 17:48:47 +00:00
|
|
|
#else
|
2018-03-05 21:08:17 +00:00
|
|
|
inline bool XPRESS_Compress(const char* /*input*/, size_t /*length*/,
|
|
|
|
std::string* /*output*/) {
|
2016-04-20 05:54:24 +00:00
|
|
|
return false;
|
|
|
|
}
|
2018-03-05 21:08:17 +00:00
|
|
|
#endif
|
2016-04-20 05:54:24 +00:00
|
|
|
|
|
|
|
#ifdef XPRESS
|
2018-04-13 00:55:14 +00:00
|
|
|
inline char* XPRESS_Uncompress(const char* input_data, size_t input_length,
|
2020-09-25 16:00:29 +00:00
|
|
|
size_t* uncompressed_size) {
|
|
|
|
return port::xpress::Decompress(input_data, input_length, uncompressed_size);
|
2018-03-05 21:08:17 +00:00
|
|
|
}
|
2017-10-19 17:48:47 +00:00
|
|
|
#else
|
2018-03-05 21:08:17 +00:00
|
|
|
inline char* XPRESS_Uncompress(const char* /*input_data*/,
|
|
|
|
size_t /*input_length*/,
|
2020-09-25 16:00:29 +00:00
|
|
|
size_t* /*uncompressed_size*/) {
|
2016-04-20 05:54:24 +00:00
|
|
|
return nullptr;
|
|
|
|
}
|
2018-03-05 21:08:17 +00:00
|
|
|
#endif
|
2016-04-20 05:54:24 +00:00
|
|
|
|
2019-01-19 03:10:17 +00:00
|
|
|
inline bool ZSTD_Compress(const CompressionInfo& info, const char* input,
|
2018-06-04 19:04:52 +00:00
|
|
|
size_t length, ::std::string* output) {
|
2015-08-27 22:40:42 +00:00
|
|
|
#ifdef ZSTD
|
|
|
|
if (length > std::numeric_limits<uint32_t>::max()) {
|
|
|
|
// Can't compress more than 4GB
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
|
|
|
|
size_t output_header_len = compression::PutDecompressedSizeInfo(
|
|
|
|
output, static_cast<uint32_t>(length));
|
|
|
|
|
|
|
|
size_t compressBound = ZSTD_compressBound(length);
|
|
|
|
output->resize(static_cast<size_t>(output_header_len + compressBound));
|
2018-06-04 19:04:52 +00:00
|
|
|
size_t outlen = 0;
|
Shared dictionary compression using reference block
Summary:
This adds a new metablock containing a shared dictionary that is used
to compress all data blocks in the SST file. The size of the shared dictionary
is configurable in CompressionOptions and defaults to 0. It's currently only
used for zlib/lz4/lz4hc, but the block will be stored in the SST regardless of
the compression type if the user chooses a nonzero dictionary size.
During compaction, computes the dictionary by randomly sampling the first
output file in each subcompaction. It pre-computes the intervals to sample
by assuming the output file will have the maximum allowable length. In case
the file is smaller, some of the pre-computed sampling intervals can be beyond
end-of-file, in which case we skip over those samples and the dictionary will
be a bit smaller. After the dictionary is generated using the first file in a
subcompaction, it is loaded into the compression library before writing each
block in each subsequent file of that subcompaction.
On the read path, gets the dictionary from the metablock, if it exists. Then,
loads that dictionary into the compression library before reading each block.
Test Plan: new unit test
Reviewers: yhchiang, IslamAbdelRahman, cyan, sdong
Reviewed By: sdong
Subscribers: andrewkr, yoshinorim, kradhakrishnan, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D52287
2016-04-28 00:36:03 +00:00
|
|
|
#if ZSTD_VERSION_NUMBER >= 500 // v0.5.0+
|
2019-01-19 03:10:17 +00:00
|
|
|
ZSTD_CCtx* context = info.context().ZSTDPreallocCtx();
|
2018-06-04 19:04:52 +00:00
|
|
|
assert(context != nullptr);
|
Add `CompressionOptions::checksum` for enabling ZSTD checksum (#11666)
Summary:
Optionally enable zstd checksum flag (https://github.com/facebook/zstd/blob/d857369028d997c92ff1f1861a4d7f679a125464/lib/zstd.h#L428) to detect corruption during decompression. Main changes are in compression.h:
* User can set CompressionOptions::checksum to true to enable this feature.
* We enable this feature in ZSTD by setting the checksum flag in ZSTD compression context: `ZSTD_CCtx`.
* Uses `ZSTD_compress2()` to do compression since it supports frame parameter like the checksum flag. Compression level is also set in compression context as a flag.
* Error handling during decompression to propagate error message from ZSTD.
* Updated microbench to test read performance impact.
About compatibility, the current compression decoders should continue to work with the data created by the new compression API `ZSTD_compress2()`: https://github.com/facebook/zstd/issues/3711.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11666
Test Plan:
* Existing unit tests for zstd compression
* Add unit test `DBTest2.ZSTDChecksum` to test the corruption case
* Manually tested that compression levels, parallel compression, dictionary compression, index compression all work with the new ZSTD_compress2() API.
* Manually tested with `sst_dump --command=recompress` that different compression levels and dictionary compression settings all work.
* Manually tested compiling with older versions of ZSTD: v1.3.8, v1.1.0, v0.6.2.
* Perf impact: from public benchmark data: http://fastcompression.blogspot.com/2019/03/presenting-xxh3.html for checksum and https://github.com/facebook/zstd#benchmarks, if decompression is 1700MB/s and checksum computation is 70000MB/s, checksum computation is an additional ~2.4% time for decompression. Compression is slower and checksumming should be less noticeable.
* Microbench:
```
TEST_TMPDIR=/dev/shm ./branch_db_basic_bench --benchmark_filter=DBGet/comp_style:0/max_data:1048576/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:0/compression_type:7/compression_checksum:1/no_blockcache:1/iterations:10000/threads:1 --benchmark_repetitions=100
Min out of 100 runs:
Main:
10390 10436 10456 10484 10499 10535 10544 10545 10565 10568
After this PR, checksum=false
10285 10397 10503 10508 10515 10557 10562 10635 10640 10660
After this PR, checksum=true
10827 10876 10925 10949 10971 11052 11061 11063 11100 11109
```
* db_bench:
```
Write perf
TEST_TMPDIR=/dev/shm/ ./db_bench_ichecksum --benchmarks=fillseq[-X10] --compression_type=zstd --num=10000000 --compression_checksum=..
[FillSeq checksum=0]
fillseq [AVG 10 runs] : 281635 (± 31711) ops/sec; 31.2 (± 3.5) MB/sec
fillseq [MEDIAN 10 runs] : 294027 ops/sec; 32.5 MB/sec
[FillSeq checksum=1]
fillseq [AVG 10 runs] : 286961 (± 34700) ops/sec; 31.7 (± 3.8) MB/sec
fillseq [MEDIAN 10 runs] : 283278 ops/sec; 31.3 MB/sec
Read perf
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=readrandom[-X20] --num=100000000 --reads=1000000 --use_existing_db=true --readonly=1
[Readrandom checksum=1]
readrandom [AVG 20 runs] : 360928 (± 3579) ops/sec; 4.0 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 362468 ops/sec; 4.0 MB/sec
[Readrandom checksum=0]
readrandom [AVG 20 runs] : 380365 (± 2384) ops/sec; 4.2 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 379800 ops/sec; 4.2 MB/sec
Compression
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=compress[-X20] --compression_type=zstd --num=100000000 --compression_checksum=1
checksum=1
compress [AVG 20 runs] : 54074 (± 634) ops/sec; 211.2 (± 2.5) MB/sec
compress [MEDIAN 20 runs] : 54396 ops/sec; 212.5 MB/sec
checksum=0
compress [AVG 20 runs] : 54598 (± 393) ops/sec; 213.3 (± 1.5) MB/sec
compress [MEDIAN 20 runs] : 54592 ops/sec; 213.3 MB/sec
Decompression:
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=uncompress[-X20] --compression_type=zstd --compression_checksum=1
checksum = 0
uncompress [AVG 20 runs] : 167499 (± 962) ops/sec; 654.3 (± 3.8) MB/sec
uncompress [MEDIAN 20 runs] : 167210 ops/sec; 653.2 MB/sec
checksum = 1
uncompress [AVG 20 runs] : 167980 (± 924) ops/sec; 656.2 (± 3.6) MB/sec
uncompress [MEDIAN 20 runs] : 168465 ops/sec; 658.1 MB/sec
```
Reviewed By: ajkr
Differential Revision: D48019378
Pulled By: cbi42
fbshipit-source-id: 674120c6e1853c2ced1436ac8138559d0204feba
2023-08-18 22:01:59 +00:00
|
|
|
#ifdef ZSTD_ADVANCED
|
|
|
|
if (info.dict().GetDigestedZstdCDict() != nullptr) {
|
|
|
|
ZSTD_CCtx_refCDict(context, info.dict().GetDigestedZstdCDict());
|
|
|
|
} else {
|
|
|
|
ZSTD_CCtx_loadDictionary(context, info.dict().GetRawDict().data(),
|
|
|
|
info.dict().GetRawDict().size());
|
|
|
|
}
|
|
|
|
|
|
|
|
// Compression level is set in `contex` during CreateNativeContext()
|
|
|
|
outlen = ZSTD_compress2(context, &(*output)[output_header_len], compressBound,
|
|
|
|
input, length);
|
|
|
|
#else // ZSTD_ADVANCED
|
2019-01-19 03:10:17 +00:00
|
|
|
#if ZSTD_VERSION_NUMBER >= 700 // v0.7.0+
|
|
|
|
if (info.dict().GetDigestedZstdCDict() != nullptr) {
|
|
|
|
outlen = ZSTD_compress_usingCDict(context, &(*output)[output_header_len],
|
|
|
|
compressBound, input, length,
|
|
|
|
info.dict().GetDigestedZstdCDict());
|
|
|
|
}
|
Add `CompressionOptions::checksum` for enabling ZSTD checksum (#11666)
Summary:
Optionally enable zstd checksum flag (https://github.com/facebook/zstd/blob/d857369028d997c92ff1f1861a4d7f679a125464/lib/zstd.h#L428) to detect corruption during decompression. Main changes are in compression.h:
* User can set CompressionOptions::checksum to true to enable this feature.
* We enable this feature in ZSTD by setting the checksum flag in ZSTD compression context: `ZSTD_CCtx`.
* Uses `ZSTD_compress2()` to do compression since it supports frame parameter like the checksum flag. Compression level is also set in compression context as a flag.
* Error handling during decompression to propagate error message from ZSTD.
* Updated microbench to test read performance impact.
About compatibility, the current compression decoders should continue to work with the data created by the new compression API `ZSTD_compress2()`: https://github.com/facebook/zstd/issues/3711.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11666
Test Plan:
* Existing unit tests for zstd compression
* Add unit test `DBTest2.ZSTDChecksum` to test the corruption case
* Manually tested that compression levels, parallel compression, dictionary compression, index compression all work with the new ZSTD_compress2() API.
* Manually tested with `sst_dump --command=recompress` that different compression levels and dictionary compression settings all work.
* Manually tested compiling with older versions of ZSTD: v1.3.8, v1.1.0, v0.6.2.
* Perf impact: from public benchmark data: http://fastcompression.blogspot.com/2019/03/presenting-xxh3.html for checksum and https://github.com/facebook/zstd#benchmarks, if decompression is 1700MB/s and checksum computation is 70000MB/s, checksum computation is an additional ~2.4% time for decompression. Compression is slower and checksumming should be less noticeable.
* Microbench:
```
TEST_TMPDIR=/dev/shm ./branch_db_basic_bench --benchmark_filter=DBGet/comp_style:0/max_data:1048576/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:0/compression_type:7/compression_checksum:1/no_blockcache:1/iterations:10000/threads:1 --benchmark_repetitions=100
Min out of 100 runs:
Main:
10390 10436 10456 10484 10499 10535 10544 10545 10565 10568
After this PR, checksum=false
10285 10397 10503 10508 10515 10557 10562 10635 10640 10660
After this PR, checksum=true
10827 10876 10925 10949 10971 11052 11061 11063 11100 11109
```
* db_bench:
```
Write perf
TEST_TMPDIR=/dev/shm/ ./db_bench_ichecksum --benchmarks=fillseq[-X10] --compression_type=zstd --num=10000000 --compression_checksum=..
[FillSeq checksum=0]
fillseq [AVG 10 runs] : 281635 (± 31711) ops/sec; 31.2 (± 3.5) MB/sec
fillseq [MEDIAN 10 runs] : 294027 ops/sec; 32.5 MB/sec
[FillSeq checksum=1]
fillseq [AVG 10 runs] : 286961 (± 34700) ops/sec; 31.7 (± 3.8) MB/sec
fillseq [MEDIAN 10 runs] : 283278 ops/sec; 31.3 MB/sec
Read perf
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=readrandom[-X20] --num=100000000 --reads=1000000 --use_existing_db=true --readonly=1
[Readrandom checksum=1]
readrandom [AVG 20 runs] : 360928 (± 3579) ops/sec; 4.0 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 362468 ops/sec; 4.0 MB/sec
[Readrandom checksum=0]
readrandom [AVG 20 runs] : 380365 (± 2384) ops/sec; 4.2 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 379800 ops/sec; 4.2 MB/sec
Compression
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=compress[-X20] --compression_type=zstd --num=100000000 --compression_checksum=1
checksum=1
compress [AVG 20 runs] : 54074 (± 634) ops/sec; 211.2 (± 2.5) MB/sec
compress [MEDIAN 20 runs] : 54396 ops/sec; 212.5 MB/sec
checksum=0
compress [AVG 20 runs] : 54598 (± 393) ops/sec; 213.3 (± 1.5) MB/sec
compress [MEDIAN 20 runs] : 54592 ops/sec; 213.3 MB/sec
Decompression:
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=uncompress[-X20] --compression_type=zstd --compression_checksum=1
checksum = 0
uncompress [AVG 20 runs] : 167499 (± 962) ops/sec; 654.3 (± 3.8) MB/sec
uncompress [MEDIAN 20 runs] : 167210 ops/sec; 653.2 MB/sec
checksum = 1
uncompress [AVG 20 runs] : 167980 (± 924) ops/sec; 656.2 (± 3.6) MB/sec
uncompress [MEDIAN 20 runs] : 168465 ops/sec; 658.1 MB/sec
```
Reviewed By: ajkr
Differential Revision: D48019378
Pulled By: cbi42
fbshipit-source-id: 674120c6e1853c2ced1436ac8138559d0204feba
2023-08-18 22:01:59 +00:00
|
|
|
#endif // ZSTD_VERSION_NUMBER >= 700
|
|
|
|
// TODO (cbi): error handling for compression.
|
2019-01-19 03:10:17 +00:00
|
|
|
if (outlen == 0) {
|
Add `CompressionOptions::checksum` for enabling ZSTD checksum (#11666)
Summary:
Optionally enable zstd checksum flag (https://github.com/facebook/zstd/blob/d857369028d997c92ff1f1861a4d7f679a125464/lib/zstd.h#L428) to detect corruption during decompression. Main changes are in compression.h:
* User can set CompressionOptions::checksum to true to enable this feature.
* We enable this feature in ZSTD by setting the checksum flag in ZSTD compression context: `ZSTD_CCtx`.
* Uses `ZSTD_compress2()` to do compression since it supports frame parameter like the checksum flag. Compression level is also set in compression context as a flag.
* Error handling during decompression to propagate error message from ZSTD.
* Updated microbench to test read performance impact.
About compatibility, the current compression decoders should continue to work with the data created by the new compression API `ZSTD_compress2()`: https://github.com/facebook/zstd/issues/3711.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11666
Test Plan:
* Existing unit tests for zstd compression
* Add unit test `DBTest2.ZSTDChecksum` to test the corruption case
* Manually tested that compression levels, parallel compression, dictionary compression, index compression all work with the new ZSTD_compress2() API.
* Manually tested with `sst_dump --command=recompress` that different compression levels and dictionary compression settings all work.
* Manually tested compiling with older versions of ZSTD: v1.3.8, v1.1.0, v0.6.2.
* Perf impact: from public benchmark data: http://fastcompression.blogspot.com/2019/03/presenting-xxh3.html for checksum and https://github.com/facebook/zstd#benchmarks, if decompression is 1700MB/s and checksum computation is 70000MB/s, checksum computation is an additional ~2.4% time for decompression. Compression is slower and checksumming should be less noticeable.
* Microbench:
```
TEST_TMPDIR=/dev/shm ./branch_db_basic_bench --benchmark_filter=DBGet/comp_style:0/max_data:1048576/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:0/compression_type:7/compression_checksum:1/no_blockcache:1/iterations:10000/threads:1 --benchmark_repetitions=100
Min out of 100 runs:
Main:
10390 10436 10456 10484 10499 10535 10544 10545 10565 10568
After this PR, checksum=false
10285 10397 10503 10508 10515 10557 10562 10635 10640 10660
After this PR, checksum=true
10827 10876 10925 10949 10971 11052 11061 11063 11100 11109
```
* db_bench:
```
Write perf
TEST_TMPDIR=/dev/shm/ ./db_bench_ichecksum --benchmarks=fillseq[-X10] --compression_type=zstd --num=10000000 --compression_checksum=..
[FillSeq checksum=0]
fillseq [AVG 10 runs] : 281635 (± 31711) ops/sec; 31.2 (± 3.5) MB/sec
fillseq [MEDIAN 10 runs] : 294027 ops/sec; 32.5 MB/sec
[FillSeq checksum=1]
fillseq [AVG 10 runs] : 286961 (± 34700) ops/sec; 31.7 (± 3.8) MB/sec
fillseq [MEDIAN 10 runs] : 283278 ops/sec; 31.3 MB/sec
Read perf
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=readrandom[-X20] --num=100000000 --reads=1000000 --use_existing_db=true --readonly=1
[Readrandom checksum=1]
readrandom [AVG 20 runs] : 360928 (± 3579) ops/sec; 4.0 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 362468 ops/sec; 4.0 MB/sec
[Readrandom checksum=0]
readrandom [AVG 20 runs] : 380365 (± 2384) ops/sec; 4.2 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 379800 ops/sec; 4.2 MB/sec
Compression
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=compress[-X20] --compression_type=zstd --num=100000000 --compression_checksum=1
checksum=1
compress [AVG 20 runs] : 54074 (± 634) ops/sec; 211.2 (± 2.5) MB/sec
compress [MEDIAN 20 runs] : 54396 ops/sec; 212.5 MB/sec
checksum=0
compress [AVG 20 runs] : 54598 (± 393) ops/sec; 213.3 (± 1.5) MB/sec
compress [MEDIAN 20 runs] : 54592 ops/sec; 213.3 MB/sec
Decompression:
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=uncompress[-X20] --compression_type=zstd --compression_checksum=1
checksum = 0
uncompress [AVG 20 runs] : 167499 (± 962) ops/sec; 654.3 (± 3.8) MB/sec
uncompress [MEDIAN 20 runs] : 167210 ops/sec; 653.2 MB/sec
checksum = 1
uncompress [AVG 20 runs] : 167980 (± 924) ops/sec; 656.2 (± 3.6) MB/sec
uncompress [MEDIAN 20 runs] : 168465 ops/sec; 658.1 MB/sec
```
Reviewed By: ajkr
Differential Revision: D48019378
Pulled By: cbi42
fbshipit-source-id: 674120c6e1853c2ced1436ac8138559d0204feba
2023-08-18 22:01:59 +00:00
|
|
|
int level;
|
|
|
|
if (info.options().level == CompressionOptions::kDefaultCompressionLevel) {
|
|
|
|
// 3 is the value of ZSTD_CLEVEL_DEFAULT (not exposed publicly), see
|
|
|
|
// https://github.com/facebook/zstd/issues/1148
|
|
|
|
level = 3;
|
|
|
|
} else {
|
|
|
|
level = info.options().level;
|
|
|
|
}
|
2019-01-19 03:10:17 +00:00
|
|
|
outlen = ZSTD_compress_usingDict(context, &(*output)[output_header_len],
|
|
|
|
compressBound, input, length,
|
|
|
|
info.dict().GetRawDict().data(),
|
|
|
|
info.dict().GetRawDict().size(), level);
|
|
|
|
}
|
Add `CompressionOptions::checksum` for enabling ZSTD checksum (#11666)
Summary:
Optionally enable zstd checksum flag (https://github.com/facebook/zstd/blob/d857369028d997c92ff1f1861a4d7f679a125464/lib/zstd.h#L428) to detect corruption during decompression. Main changes are in compression.h:
* User can set CompressionOptions::checksum to true to enable this feature.
* We enable this feature in ZSTD by setting the checksum flag in ZSTD compression context: `ZSTD_CCtx`.
* Uses `ZSTD_compress2()` to do compression since it supports frame parameter like the checksum flag. Compression level is also set in compression context as a flag.
* Error handling during decompression to propagate error message from ZSTD.
* Updated microbench to test read performance impact.
About compatibility, the current compression decoders should continue to work with the data created by the new compression API `ZSTD_compress2()`: https://github.com/facebook/zstd/issues/3711.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11666
Test Plan:
* Existing unit tests for zstd compression
* Add unit test `DBTest2.ZSTDChecksum` to test the corruption case
* Manually tested that compression levels, parallel compression, dictionary compression, index compression all work with the new ZSTD_compress2() API.
* Manually tested with `sst_dump --command=recompress` that different compression levels and dictionary compression settings all work.
* Manually tested compiling with older versions of ZSTD: v1.3.8, v1.1.0, v0.6.2.
* Perf impact: from public benchmark data: http://fastcompression.blogspot.com/2019/03/presenting-xxh3.html for checksum and https://github.com/facebook/zstd#benchmarks, if decompression is 1700MB/s and checksum computation is 70000MB/s, checksum computation is an additional ~2.4% time for decompression. Compression is slower and checksumming should be less noticeable.
* Microbench:
```
TEST_TMPDIR=/dev/shm ./branch_db_basic_bench --benchmark_filter=DBGet/comp_style:0/max_data:1048576/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:0/compression_type:7/compression_checksum:1/no_blockcache:1/iterations:10000/threads:1 --benchmark_repetitions=100
Min out of 100 runs:
Main:
10390 10436 10456 10484 10499 10535 10544 10545 10565 10568
After this PR, checksum=false
10285 10397 10503 10508 10515 10557 10562 10635 10640 10660
After this PR, checksum=true
10827 10876 10925 10949 10971 11052 11061 11063 11100 11109
```
* db_bench:
```
Write perf
TEST_TMPDIR=/dev/shm/ ./db_bench_ichecksum --benchmarks=fillseq[-X10] --compression_type=zstd --num=10000000 --compression_checksum=..
[FillSeq checksum=0]
fillseq [AVG 10 runs] : 281635 (± 31711) ops/sec; 31.2 (± 3.5) MB/sec
fillseq [MEDIAN 10 runs] : 294027 ops/sec; 32.5 MB/sec
[FillSeq checksum=1]
fillseq [AVG 10 runs] : 286961 (± 34700) ops/sec; 31.7 (± 3.8) MB/sec
fillseq [MEDIAN 10 runs] : 283278 ops/sec; 31.3 MB/sec
Read perf
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=readrandom[-X20] --num=100000000 --reads=1000000 --use_existing_db=true --readonly=1
[Readrandom checksum=1]
readrandom [AVG 20 runs] : 360928 (± 3579) ops/sec; 4.0 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 362468 ops/sec; 4.0 MB/sec
[Readrandom checksum=0]
readrandom [AVG 20 runs] : 380365 (± 2384) ops/sec; 4.2 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 379800 ops/sec; 4.2 MB/sec
Compression
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=compress[-X20] --compression_type=zstd --num=100000000 --compression_checksum=1
checksum=1
compress [AVG 20 runs] : 54074 (± 634) ops/sec; 211.2 (± 2.5) MB/sec
compress [MEDIAN 20 runs] : 54396 ops/sec; 212.5 MB/sec
checksum=0
compress [AVG 20 runs] : 54598 (± 393) ops/sec; 213.3 (± 1.5) MB/sec
compress [MEDIAN 20 runs] : 54592 ops/sec; 213.3 MB/sec
Decompression:
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=uncompress[-X20] --compression_type=zstd --compression_checksum=1
checksum = 0
uncompress [AVG 20 runs] : 167499 (± 962) ops/sec; 654.3 (± 3.8) MB/sec
uncompress [MEDIAN 20 runs] : 167210 ops/sec; 653.2 MB/sec
checksum = 1
uncompress [AVG 20 runs] : 167980 (± 924) ops/sec; 656.2 (± 3.6) MB/sec
uncompress [MEDIAN 20 runs] : 168465 ops/sec; 658.1 MB/sec
```
Reviewed By: ajkr
Differential Revision: D48019378
Pulled By: cbi42
fbshipit-source-id: 674120c6e1853c2ced1436ac8138559d0204feba
2023-08-18 22:01:59 +00:00
|
|
|
#endif // ZSTD_ADVANCED
|
2018-04-13 00:55:14 +00:00
|
|
|
#else // up to v0.4.x
|
Shared dictionary compression using reference block
Summary:
This adds a new metablock containing a shared dictionary that is used
to compress all data blocks in the SST file. The size of the shared dictionary
is configurable in CompressionOptions and defaults to 0. It's currently only
used for zlib/lz4/lz4hc, but the block will be stored in the SST regardless of
the compression type if the user chooses a nonzero dictionary size.
During compaction, computes the dictionary by randomly sampling the first
output file in each subcompaction. It pre-computes the intervals to sample
by assuming the output file will have the maximum allowable length. In case
the file is smaller, some of the pre-computed sampling intervals can be beyond
end-of-file, in which case we skip over those samples and the dictionary will
be a bit smaller. After the dictionary is generated using the first file in a
subcompaction, it is loaded into the compression library before writing each
block in each subsequent file of that subcompaction.
On the read path, gets the dictionary from the metablock, if it exists. Then,
loads that dictionary into the compression library before reading each block.
Test Plan: new unit test
Reviewers: yhchiang, IslamAbdelRahman, cyan, sdong
Reviewed By: sdong
Subscribers: andrewkr, yoshinorim, kradhakrishnan, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D52287
2016-04-28 00:36:03 +00:00
|
|
|
outlen = ZSTD_compress(&(*output)[output_header_len], compressBound, input,
|
2018-05-24 01:33:00 +00:00
|
|
|
length, level);
|
Shared dictionary compression using reference block
Summary:
This adds a new metablock containing a shared dictionary that is used
to compress all data blocks in the SST file. The size of the shared dictionary
is configurable in CompressionOptions and defaults to 0. It's currently only
used for zlib/lz4/lz4hc, but the block will be stored in the SST regardless of
the compression type if the user chooses a nonzero dictionary size.
During compaction, computes the dictionary by randomly sampling the first
output file in each subcompaction. It pre-computes the intervals to sample
by assuming the output file will have the maximum allowable length. In case
the file is smaller, some of the pre-computed sampling intervals can be beyond
end-of-file, in which case we skip over those samples and the dictionary will
be a bit smaller. After the dictionary is generated using the first file in a
subcompaction, it is loaded into the compression library before writing each
block in each subsequent file of that subcompaction.
On the read path, gets the dictionary from the metablock, if it exists. Then,
loads that dictionary into the compression library before reading each block.
Test Plan: new unit test
Reviewers: yhchiang, IslamAbdelRahman, cyan, sdong
Reviewed By: sdong
Subscribers: andrewkr, yoshinorim, kradhakrishnan, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D52287
2016-04-28 00:36:03 +00:00
|
|
|
#endif // ZSTD_VERSION_NUMBER >= 500
|
2015-08-27 22:40:42 +00:00
|
|
|
if (outlen == 0) {
|
|
|
|
return false;
|
|
|
|
}
|
|
|
|
output->resize(output_header_len + outlen);
|
|
|
|
return true;
|
2018-04-13 00:55:14 +00:00
|
|
|
#else // ZSTD
|
2019-01-19 03:10:17 +00:00
|
|
|
(void)info;
|
2018-04-13 00:55:14 +00:00
|
|
|
(void)input;
|
|
|
|
(void)length;
|
|
|
|
(void)output;
|
2015-08-27 22:40:42 +00:00
|
|
|
return false;
|
2018-01-31 20:04:52 +00:00
|
|
|
#endif
|
2015-08-27 22:40:42 +00:00
|
|
|
}
|
|
|
|
|
Shared dictionary compression using reference block
Summary:
This adds a new metablock containing a shared dictionary that is used
to compress all data blocks in the SST file. The size of the shared dictionary
is configurable in CompressionOptions and defaults to 0. It's currently only
used for zlib/lz4/lz4hc, but the block will be stored in the SST regardless of
the compression type if the user chooses a nonzero dictionary size.
During compaction, computes the dictionary by randomly sampling the first
output file in each subcompaction. It pre-computes the intervals to sample
by assuming the output file will have the maximum allowable length. In case
the file is smaller, some of the pre-computed sampling intervals can be beyond
end-of-file, in which case we skip over those samples and the dictionary will
be a bit smaller. After the dictionary is generated using the first file in a
subcompaction, it is loaded into the compression library before writing each
block in each subsequent file of that subcompaction.
On the read path, gets the dictionary from the metablock, if it exists. Then,
loads that dictionary into the compression library before reading each block.
Test Plan: new unit test
Reviewers: yhchiang, IslamAbdelRahman, cyan, sdong
Reviewed By: sdong
Subscribers: andrewkr, yoshinorim, kradhakrishnan, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D52287
2016-04-28 00:36:03 +00:00
|
|
|
// @param compression_dict Data for presetting the compression library's
|
|
|
|
// dictionary.
|
Add `CompressionOptions::checksum` for enabling ZSTD checksum (#11666)
Summary:
Optionally enable zstd checksum flag (https://github.com/facebook/zstd/blob/d857369028d997c92ff1f1861a4d7f679a125464/lib/zstd.h#L428) to detect corruption during decompression. Main changes are in compression.h:
* User can set CompressionOptions::checksum to true to enable this feature.
* We enable this feature in ZSTD by setting the checksum flag in ZSTD compression context: `ZSTD_CCtx`.
* Uses `ZSTD_compress2()` to do compression since it supports frame parameter like the checksum flag. Compression level is also set in compression context as a flag.
* Error handling during decompression to propagate error message from ZSTD.
* Updated microbench to test read performance impact.
About compatibility, the current compression decoders should continue to work with the data created by the new compression API `ZSTD_compress2()`: https://github.com/facebook/zstd/issues/3711.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11666
Test Plan:
* Existing unit tests for zstd compression
* Add unit test `DBTest2.ZSTDChecksum` to test the corruption case
* Manually tested that compression levels, parallel compression, dictionary compression, index compression all work with the new ZSTD_compress2() API.
* Manually tested with `sst_dump --command=recompress` that different compression levels and dictionary compression settings all work.
* Manually tested compiling with older versions of ZSTD: v1.3.8, v1.1.0, v0.6.2.
* Perf impact: from public benchmark data: http://fastcompression.blogspot.com/2019/03/presenting-xxh3.html for checksum and https://github.com/facebook/zstd#benchmarks, if decompression is 1700MB/s and checksum computation is 70000MB/s, checksum computation is an additional ~2.4% time for decompression. Compression is slower and checksumming should be less noticeable.
* Microbench:
```
TEST_TMPDIR=/dev/shm ./branch_db_basic_bench --benchmark_filter=DBGet/comp_style:0/max_data:1048576/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:0/compression_type:7/compression_checksum:1/no_blockcache:1/iterations:10000/threads:1 --benchmark_repetitions=100
Min out of 100 runs:
Main:
10390 10436 10456 10484 10499 10535 10544 10545 10565 10568
After this PR, checksum=false
10285 10397 10503 10508 10515 10557 10562 10635 10640 10660
After this PR, checksum=true
10827 10876 10925 10949 10971 11052 11061 11063 11100 11109
```
* db_bench:
```
Write perf
TEST_TMPDIR=/dev/shm/ ./db_bench_ichecksum --benchmarks=fillseq[-X10] --compression_type=zstd --num=10000000 --compression_checksum=..
[FillSeq checksum=0]
fillseq [AVG 10 runs] : 281635 (± 31711) ops/sec; 31.2 (± 3.5) MB/sec
fillseq [MEDIAN 10 runs] : 294027 ops/sec; 32.5 MB/sec
[FillSeq checksum=1]
fillseq [AVG 10 runs] : 286961 (± 34700) ops/sec; 31.7 (± 3.8) MB/sec
fillseq [MEDIAN 10 runs] : 283278 ops/sec; 31.3 MB/sec
Read perf
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=readrandom[-X20] --num=100000000 --reads=1000000 --use_existing_db=true --readonly=1
[Readrandom checksum=1]
readrandom [AVG 20 runs] : 360928 (± 3579) ops/sec; 4.0 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 362468 ops/sec; 4.0 MB/sec
[Readrandom checksum=0]
readrandom [AVG 20 runs] : 380365 (± 2384) ops/sec; 4.2 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 379800 ops/sec; 4.2 MB/sec
Compression
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=compress[-X20] --compression_type=zstd --num=100000000 --compression_checksum=1
checksum=1
compress [AVG 20 runs] : 54074 (± 634) ops/sec; 211.2 (± 2.5) MB/sec
compress [MEDIAN 20 runs] : 54396 ops/sec; 212.5 MB/sec
checksum=0
compress [AVG 20 runs] : 54598 (± 393) ops/sec; 213.3 (± 1.5) MB/sec
compress [MEDIAN 20 runs] : 54592 ops/sec; 213.3 MB/sec
Decompression:
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=uncompress[-X20] --compression_type=zstd --compression_checksum=1
checksum = 0
uncompress [AVG 20 runs] : 167499 (± 962) ops/sec; 654.3 (± 3.8) MB/sec
uncompress [MEDIAN 20 runs] : 167210 ops/sec; 653.2 MB/sec
checksum = 1
uncompress [AVG 20 runs] : 167980 (± 924) ops/sec; 656.2 (± 3.6) MB/sec
uncompress [MEDIAN 20 runs] : 168465 ops/sec; 658.1 MB/sec
```
Reviewed By: ajkr
Differential Revision: D48019378
Pulled By: cbi42
fbshipit-source-id: 674120c6e1853c2ced1436ac8138559d0204feba
2023-08-18 22:01:59 +00:00
|
|
|
// @param error_message If not null, will be set if decompression fails.
|
|
|
|
//
|
|
|
|
// Returns nullptr if decompression fails.
|
2018-10-26 21:27:09 +00:00
|
|
|
inline CacheAllocationPtr ZSTD_Uncompress(
|
2019-01-19 03:10:17 +00:00
|
|
|
const UncompressionInfo& info, const char* input_data, size_t input_length,
|
Add `CompressionOptions::checksum` for enabling ZSTD checksum (#11666)
Summary:
Optionally enable zstd checksum flag (https://github.com/facebook/zstd/blob/d857369028d997c92ff1f1861a4d7f679a125464/lib/zstd.h#L428) to detect corruption during decompression. Main changes are in compression.h:
* User can set CompressionOptions::checksum to true to enable this feature.
* We enable this feature in ZSTD by setting the checksum flag in ZSTD compression context: `ZSTD_CCtx`.
* Uses `ZSTD_compress2()` to do compression since it supports frame parameter like the checksum flag. Compression level is also set in compression context as a flag.
* Error handling during decompression to propagate error message from ZSTD.
* Updated microbench to test read performance impact.
About compatibility, the current compression decoders should continue to work with the data created by the new compression API `ZSTD_compress2()`: https://github.com/facebook/zstd/issues/3711.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11666
Test Plan:
* Existing unit tests for zstd compression
* Add unit test `DBTest2.ZSTDChecksum` to test the corruption case
* Manually tested that compression levels, parallel compression, dictionary compression, index compression all work with the new ZSTD_compress2() API.
* Manually tested with `sst_dump --command=recompress` that different compression levels and dictionary compression settings all work.
* Manually tested compiling with older versions of ZSTD: v1.3.8, v1.1.0, v0.6.2.
* Perf impact: from public benchmark data: http://fastcompression.blogspot.com/2019/03/presenting-xxh3.html for checksum and https://github.com/facebook/zstd#benchmarks, if decompression is 1700MB/s and checksum computation is 70000MB/s, checksum computation is an additional ~2.4% time for decompression. Compression is slower and checksumming should be less noticeable.
* Microbench:
```
TEST_TMPDIR=/dev/shm ./branch_db_basic_bench --benchmark_filter=DBGet/comp_style:0/max_data:1048576/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:0/compression_type:7/compression_checksum:1/no_blockcache:1/iterations:10000/threads:1 --benchmark_repetitions=100
Min out of 100 runs:
Main:
10390 10436 10456 10484 10499 10535 10544 10545 10565 10568
After this PR, checksum=false
10285 10397 10503 10508 10515 10557 10562 10635 10640 10660
After this PR, checksum=true
10827 10876 10925 10949 10971 11052 11061 11063 11100 11109
```
* db_bench:
```
Write perf
TEST_TMPDIR=/dev/shm/ ./db_bench_ichecksum --benchmarks=fillseq[-X10] --compression_type=zstd --num=10000000 --compression_checksum=..
[FillSeq checksum=0]
fillseq [AVG 10 runs] : 281635 (± 31711) ops/sec; 31.2 (± 3.5) MB/sec
fillseq [MEDIAN 10 runs] : 294027 ops/sec; 32.5 MB/sec
[FillSeq checksum=1]
fillseq [AVG 10 runs] : 286961 (± 34700) ops/sec; 31.7 (± 3.8) MB/sec
fillseq [MEDIAN 10 runs] : 283278 ops/sec; 31.3 MB/sec
Read perf
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=readrandom[-X20] --num=100000000 --reads=1000000 --use_existing_db=true --readonly=1
[Readrandom checksum=1]
readrandom [AVG 20 runs] : 360928 (± 3579) ops/sec; 4.0 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 362468 ops/sec; 4.0 MB/sec
[Readrandom checksum=0]
readrandom [AVG 20 runs] : 380365 (± 2384) ops/sec; 4.2 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 379800 ops/sec; 4.2 MB/sec
Compression
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=compress[-X20] --compression_type=zstd --num=100000000 --compression_checksum=1
checksum=1
compress [AVG 20 runs] : 54074 (± 634) ops/sec; 211.2 (± 2.5) MB/sec
compress [MEDIAN 20 runs] : 54396 ops/sec; 212.5 MB/sec
checksum=0
compress [AVG 20 runs] : 54598 (± 393) ops/sec; 213.3 (± 1.5) MB/sec
compress [MEDIAN 20 runs] : 54592 ops/sec; 213.3 MB/sec
Decompression:
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=uncompress[-X20] --compression_type=zstd --compression_checksum=1
checksum = 0
uncompress [AVG 20 runs] : 167499 (± 962) ops/sec; 654.3 (± 3.8) MB/sec
uncompress [MEDIAN 20 runs] : 167210 ops/sec; 653.2 MB/sec
checksum = 1
uncompress [AVG 20 runs] : 167980 (± 924) ops/sec; 656.2 (± 3.6) MB/sec
uncompress [MEDIAN 20 runs] : 168465 ops/sec; 658.1 MB/sec
```
Reviewed By: ajkr
Differential Revision: D48019378
Pulled By: cbi42
fbshipit-source-id: 674120c6e1853c2ced1436ac8138559d0204feba
2023-08-18 22:01:59 +00:00
|
|
|
size_t* uncompressed_size, MemoryAllocator* allocator = nullptr,
|
|
|
|
const char** error_message = nullptr) {
|
2015-08-27 22:40:42 +00:00
|
|
|
#ifdef ZSTD
|
Add `CompressionOptions::checksum` for enabling ZSTD checksum (#11666)
Summary:
Optionally enable zstd checksum flag (https://github.com/facebook/zstd/blob/d857369028d997c92ff1f1861a4d7f679a125464/lib/zstd.h#L428) to detect corruption during decompression. Main changes are in compression.h:
* User can set CompressionOptions::checksum to true to enable this feature.
* We enable this feature in ZSTD by setting the checksum flag in ZSTD compression context: `ZSTD_CCtx`.
* Uses `ZSTD_compress2()` to do compression since it supports frame parameter like the checksum flag. Compression level is also set in compression context as a flag.
* Error handling during decompression to propagate error message from ZSTD.
* Updated microbench to test read performance impact.
About compatibility, the current compression decoders should continue to work with the data created by the new compression API `ZSTD_compress2()`: https://github.com/facebook/zstd/issues/3711.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11666
Test Plan:
* Existing unit tests for zstd compression
* Add unit test `DBTest2.ZSTDChecksum` to test the corruption case
* Manually tested that compression levels, parallel compression, dictionary compression, index compression all work with the new ZSTD_compress2() API.
* Manually tested with `sst_dump --command=recompress` that different compression levels and dictionary compression settings all work.
* Manually tested compiling with older versions of ZSTD: v1.3.8, v1.1.0, v0.6.2.
* Perf impact: from public benchmark data: http://fastcompression.blogspot.com/2019/03/presenting-xxh3.html for checksum and https://github.com/facebook/zstd#benchmarks, if decompression is 1700MB/s and checksum computation is 70000MB/s, checksum computation is an additional ~2.4% time for decompression. Compression is slower and checksumming should be less noticeable.
* Microbench:
```
TEST_TMPDIR=/dev/shm ./branch_db_basic_bench --benchmark_filter=DBGet/comp_style:0/max_data:1048576/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:0/compression_type:7/compression_checksum:1/no_blockcache:1/iterations:10000/threads:1 --benchmark_repetitions=100
Min out of 100 runs:
Main:
10390 10436 10456 10484 10499 10535 10544 10545 10565 10568
After this PR, checksum=false
10285 10397 10503 10508 10515 10557 10562 10635 10640 10660
After this PR, checksum=true
10827 10876 10925 10949 10971 11052 11061 11063 11100 11109
```
* db_bench:
```
Write perf
TEST_TMPDIR=/dev/shm/ ./db_bench_ichecksum --benchmarks=fillseq[-X10] --compression_type=zstd --num=10000000 --compression_checksum=..
[FillSeq checksum=0]
fillseq [AVG 10 runs] : 281635 (± 31711) ops/sec; 31.2 (± 3.5) MB/sec
fillseq [MEDIAN 10 runs] : 294027 ops/sec; 32.5 MB/sec
[FillSeq checksum=1]
fillseq [AVG 10 runs] : 286961 (± 34700) ops/sec; 31.7 (± 3.8) MB/sec
fillseq [MEDIAN 10 runs] : 283278 ops/sec; 31.3 MB/sec
Read perf
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=readrandom[-X20] --num=100000000 --reads=1000000 --use_existing_db=true --readonly=1
[Readrandom checksum=1]
readrandom [AVG 20 runs] : 360928 (± 3579) ops/sec; 4.0 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 362468 ops/sec; 4.0 MB/sec
[Readrandom checksum=0]
readrandom [AVG 20 runs] : 380365 (± 2384) ops/sec; 4.2 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 379800 ops/sec; 4.2 MB/sec
Compression
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=compress[-X20] --compression_type=zstd --num=100000000 --compression_checksum=1
checksum=1
compress [AVG 20 runs] : 54074 (± 634) ops/sec; 211.2 (± 2.5) MB/sec
compress [MEDIAN 20 runs] : 54396 ops/sec; 212.5 MB/sec
checksum=0
compress [AVG 20 runs] : 54598 (± 393) ops/sec; 213.3 (± 1.5) MB/sec
compress [MEDIAN 20 runs] : 54592 ops/sec; 213.3 MB/sec
Decompression:
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=uncompress[-X20] --compression_type=zstd --compression_checksum=1
checksum = 0
uncompress [AVG 20 runs] : 167499 (± 962) ops/sec; 654.3 (± 3.8) MB/sec
uncompress [MEDIAN 20 runs] : 167210 ops/sec; 653.2 MB/sec
checksum = 1
uncompress [AVG 20 runs] : 167980 (± 924) ops/sec; 656.2 (± 3.6) MB/sec
uncompress [MEDIAN 20 runs] : 168465 ops/sec; 658.1 MB/sec
```
Reviewed By: ajkr
Differential Revision: D48019378
Pulled By: cbi42
fbshipit-source-id: 674120c6e1853c2ced1436ac8138559d0204feba
2023-08-18 22:01:59 +00:00
|
|
|
static const char* const kErrorDecodeOutputSize =
|
|
|
|
"Cannot decode output size.";
|
|
|
|
static const char* const kErrorOutputLenMismatch =
|
|
|
|
"Decompressed size does not match header.";
|
2015-08-27 22:40:42 +00:00
|
|
|
uint32_t output_len = 0;
|
|
|
|
if (!compression::GetDecompressedSizeInfo(&input_data, &input_length,
|
|
|
|
&output_len)) {
|
Add `CompressionOptions::checksum` for enabling ZSTD checksum (#11666)
Summary:
Optionally enable zstd checksum flag (https://github.com/facebook/zstd/blob/d857369028d997c92ff1f1861a4d7f679a125464/lib/zstd.h#L428) to detect corruption during decompression. Main changes are in compression.h:
* User can set CompressionOptions::checksum to true to enable this feature.
* We enable this feature in ZSTD by setting the checksum flag in ZSTD compression context: `ZSTD_CCtx`.
* Uses `ZSTD_compress2()` to do compression since it supports frame parameter like the checksum flag. Compression level is also set in compression context as a flag.
* Error handling during decompression to propagate error message from ZSTD.
* Updated microbench to test read performance impact.
About compatibility, the current compression decoders should continue to work with the data created by the new compression API `ZSTD_compress2()`: https://github.com/facebook/zstd/issues/3711.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11666
Test Plan:
* Existing unit tests for zstd compression
* Add unit test `DBTest2.ZSTDChecksum` to test the corruption case
* Manually tested that compression levels, parallel compression, dictionary compression, index compression all work with the new ZSTD_compress2() API.
* Manually tested with `sst_dump --command=recompress` that different compression levels and dictionary compression settings all work.
* Manually tested compiling with older versions of ZSTD: v1.3.8, v1.1.0, v0.6.2.
* Perf impact: from public benchmark data: http://fastcompression.blogspot.com/2019/03/presenting-xxh3.html for checksum and https://github.com/facebook/zstd#benchmarks, if decompression is 1700MB/s and checksum computation is 70000MB/s, checksum computation is an additional ~2.4% time for decompression. Compression is slower and checksumming should be less noticeable.
* Microbench:
```
TEST_TMPDIR=/dev/shm ./branch_db_basic_bench --benchmark_filter=DBGet/comp_style:0/max_data:1048576/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:0/compression_type:7/compression_checksum:1/no_blockcache:1/iterations:10000/threads:1 --benchmark_repetitions=100
Min out of 100 runs:
Main:
10390 10436 10456 10484 10499 10535 10544 10545 10565 10568
After this PR, checksum=false
10285 10397 10503 10508 10515 10557 10562 10635 10640 10660
After this PR, checksum=true
10827 10876 10925 10949 10971 11052 11061 11063 11100 11109
```
* db_bench:
```
Write perf
TEST_TMPDIR=/dev/shm/ ./db_bench_ichecksum --benchmarks=fillseq[-X10] --compression_type=zstd --num=10000000 --compression_checksum=..
[FillSeq checksum=0]
fillseq [AVG 10 runs] : 281635 (± 31711) ops/sec; 31.2 (± 3.5) MB/sec
fillseq [MEDIAN 10 runs] : 294027 ops/sec; 32.5 MB/sec
[FillSeq checksum=1]
fillseq [AVG 10 runs] : 286961 (± 34700) ops/sec; 31.7 (± 3.8) MB/sec
fillseq [MEDIAN 10 runs] : 283278 ops/sec; 31.3 MB/sec
Read perf
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=readrandom[-X20] --num=100000000 --reads=1000000 --use_existing_db=true --readonly=1
[Readrandom checksum=1]
readrandom [AVG 20 runs] : 360928 (± 3579) ops/sec; 4.0 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 362468 ops/sec; 4.0 MB/sec
[Readrandom checksum=0]
readrandom [AVG 20 runs] : 380365 (± 2384) ops/sec; 4.2 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 379800 ops/sec; 4.2 MB/sec
Compression
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=compress[-X20] --compression_type=zstd --num=100000000 --compression_checksum=1
checksum=1
compress [AVG 20 runs] : 54074 (± 634) ops/sec; 211.2 (± 2.5) MB/sec
compress [MEDIAN 20 runs] : 54396 ops/sec; 212.5 MB/sec
checksum=0
compress [AVG 20 runs] : 54598 (± 393) ops/sec; 213.3 (± 1.5) MB/sec
compress [MEDIAN 20 runs] : 54592 ops/sec; 213.3 MB/sec
Decompression:
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=uncompress[-X20] --compression_type=zstd --compression_checksum=1
checksum = 0
uncompress [AVG 20 runs] : 167499 (± 962) ops/sec; 654.3 (± 3.8) MB/sec
uncompress [MEDIAN 20 runs] : 167210 ops/sec; 653.2 MB/sec
checksum = 1
uncompress [AVG 20 runs] : 167980 (± 924) ops/sec; 656.2 (± 3.6) MB/sec
uncompress [MEDIAN 20 runs] : 168465 ops/sec; 658.1 MB/sec
```
Reviewed By: ajkr
Differential Revision: D48019378
Pulled By: cbi42
fbshipit-source-id: 674120c6e1853c2ced1436ac8138559d0204feba
2023-08-18 22:01:59 +00:00
|
|
|
if (error_message) {
|
|
|
|
*error_message = kErrorDecodeOutputSize;
|
|
|
|
}
|
2015-08-27 22:40:42 +00:00
|
|
|
return nullptr;
|
|
|
|
}
|
|
|
|
|
Add `CompressionOptions::checksum` for enabling ZSTD checksum (#11666)
Summary:
Optionally enable zstd checksum flag (https://github.com/facebook/zstd/blob/d857369028d997c92ff1f1861a4d7f679a125464/lib/zstd.h#L428) to detect corruption during decompression. Main changes are in compression.h:
* User can set CompressionOptions::checksum to true to enable this feature.
* We enable this feature in ZSTD by setting the checksum flag in ZSTD compression context: `ZSTD_CCtx`.
* Uses `ZSTD_compress2()` to do compression since it supports frame parameter like the checksum flag. Compression level is also set in compression context as a flag.
* Error handling during decompression to propagate error message from ZSTD.
* Updated microbench to test read performance impact.
About compatibility, the current compression decoders should continue to work with the data created by the new compression API `ZSTD_compress2()`: https://github.com/facebook/zstd/issues/3711.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11666
Test Plan:
* Existing unit tests for zstd compression
* Add unit test `DBTest2.ZSTDChecksum` to test the corruption case
* Manually tested that compression levels, parallel compression, dictionary compression, index compression all work with the new ZSTD_compress2() API.
* Manually tested with `sst_dump --command=recompress` that different compression levels and dictionary compression settings all work.
* Manually tested compiling with older versions of ZSTD: v1.3.8, v1.1.0, v0.6.2.
* Perf impact: from public benchmark data: http://fastcompression.blogspot.com/2019/03/presenting-xxh3.html for checksum and https://github.com/facebook/zstd#benchmarks, if decompression is 1700MB/s and checksum computation is 70000MB/s, checksum computation is an additional ~2.4% time for decompression. Compression is slower and checksumming should be less noticeable.
* Microbench:
```
TEST_TMPDIR=/dev/shm ./branch_db_basic_bench --benchmark_filter=DBGet/comp_style:0/max_data:1048576/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:0/compression_type:7/compression_checksum:1/no_blockcache:1/iterations:10000/threads:1 --benchmark_repetitions=100
Min out of 100 runs:
Main:
10390 10436 10456 10484 10499 10535 10544 10545 10565 10568
After this PR, checksum=false
10285 10397 10503 10508 10515 10557 10562 10635 10640 10660
After this PR, checksum=true
10827 10876 10925 10949 10971 11052 11061 11063 11100 11109
```
* db_bench:
```
Write perf
TEST_TMPDIR=/dev/shm/ ./db_bench_ichecksum --benchmarks=fillseq[-X10] --compression_type=zstd --num=10000000 --compression_checksum=..
[FillSeq checksum=0]
fillseq [AVG 10 runs] : 281635 (± 31711) ops/sec; 31.2 (± 3.5) MB/sec
fillseq [MEDIAN 10 runs] : 294027 ops/sec; 32.5 MB/sec
[FillSeq checksum=1]
fillseq [AVG 10 runs] : 286961 (± 34700) ops/sec; 31.7 (± 3.8) MB/sec
fillseq [MEDIAN 10 runs] : 283278 ops/sec; 31.3 MB/sec
Read perf
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=readrandom[-X20] --num=100000000 --reads=1000000 --use_existing_db=true --readonly=1
[Readrandom checksum=1]
readrandom [AVG 20 runs] : 360928 (± 3579) ops/sec; 4.0 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 362468 ops/sec; 4.0 MB/sec
[Readrandom checksum=0]
readrandom [AVG 20 runs] : 380365 (± 2384) ops/sec; 4.2 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 379800 ops/sec; 4.2 MB/sec
Compression
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=compress[-X20] --compression_type=zstd --num=100000000 --compression_checksum=1
checksum=1
compress [AVG 20 runs] : 54074 (± 634) ops/sec; 211.2 (± 2.5) MB/sec
compress [MEDIAN 20 runs] : 54396 ops/sec; 212.5 MB/sec
checksum=0
compress [AVG 20 runs] : 54598 (± 393) ops/sec; 213.3 (± 1.5) MB/sec
compress [MEDIAN 20 runs] : 54592 ops/sec; 213.3 MB/sec
Decompression:
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=uncompress[-X20] --compression_type=zstd --compression_checksum=1
checksum = 0
uncompress [AVG 20 runs] : 167499 (± 962) ops/sec; 654.3 (± 3.8) MB/sec
uncompress [MEDIAN 20 runs] : 167210 ops/sec; 653.2 MB/sec
checksum = 1
uncompress [AVG 20 runs] : 167980 (± 924) ops/sec; 656.2 (± 3.6) MB/sec
uncompress [MEDIAN 20 runs] : 168465 ops/sec; 658.1 MB/sec
```
Reviewed By: ajkr
Differential Revision: D48019378
Pulled By: cbi42
fbshipit-source-id: 674120c6e1853c2ced1436ac8138559d0204feba
2023-08-18 22:01:59 +00:00
|
|
|
CacheAllocationPtr output = AllocateBlock(output_len, allocator);
|
2019-01-19 03:10:17 +00:00
|
|
|
size_t actual_output_length = 0;
|
Shared dictionary compression using reference block
Summary:
This adds a new metablock containing a shared dictionary that is used
to compress all data blocks in the SST file. The size of the shared dictionary
is configurable in CompressionOptions and defaults to 0. It's currently only
used for zlib/lz4/lz4hc, but the block will be stored in the SST regardless of
the compression type if the user chooses a nonzero dictionary size.
During compaction, computes the dictionary by randomly sampling the first
output file in each subcompaction. It pre-computes the intervals to sample
by assuming the output file will have the maximum allowable length. In case
the file is smaller, some of the pre-computed sampling intervals can be beyond
end-of-file, in which case we skip over those samples and the dictionary will
be a bit smaller. After the dictionary is generated using the first file in a
subcompaction, it is loaded into the compression library before writing each
block in each subsequent file of that subcompaction.
On the read path, gets the dictionary from the metablock, if it exists. Then,
loads that dictionary into the compression library before reading each block.
Test Plan: new unit test
Reviewers: yhchiang, IslamAbdelRahman, cyan, sdong
Reviewed By: sdong
Subscribers: andrewkr, yoshinorim, kradhakrishnan, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D52287
2016-04-28 00:36:03 +00:00
|
|
|
#if ZSTD_VERSION_NUMBER >= 500 // v0.5.0+
|
2019-01-19 03:10:17 +00:00
|
|
|
ZSTD_DCtx* context = info.context().GetZSTDContext();
|
2018-06-04 19:04:52 +00:00
|
|
|
assert(context != nullptr);
|
2019-01-24 02:11:08 +00:00
|
|
|
#ifdef ROCKSDB_ZSTD_DDICT
|
2019-01-19 03:10:17 +00:00
|
|
|
if (info.dict().GetDigestedZstdDDict() != nullptr) {
|
|
|
|
actual_output_length = ZSTD_decompress_usingDDict(
|
|
|
|
context, output.get(), output_len, input_data, input_length,
|
|
|
|
info.dict().GetDigestedZstdDDict());
|
Add `CompressionOptions::checksum` for enabling ZSTD checksum (#11666)
Summary:
Optionally enable zstd checksum flag (https://github.com/facebook/zstd/blob/d857369028d997c92ff1f1861a4d7f679a125464/lib/zstd.h#L428) to detect corruption during decompression. Main changes are in compression.h:
* User can set CompressionOptions::checksum to true to enable this feature.
* We enable this feature in ZSTD by setting the checksum flag in ZSTD compression context: `ZSTD_CCtx`.
* Uses `ZSTD_compress2()` to do compression since it supports frame parameter like the checksum flag. Compression level is also set in compression context as a flag.
* Error handling during decompression to propagate error message from ZSTD.
* Updated microbench to test read performance impact.
About compatibility, the current compression decoders should continue to work with the data created by the new compression API `ZSTD_compress2()`: https://github.com/facebook/zstd/issues/3711.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11666
Test Plan:
* Existing unit tests for zstd compression
* Add unit test `DBTest2.ZSTDChecksum` to test the corruption case
* Manually tested that compression levels, parallel compression, dictionary compression, index compression all work with the new ZSTD_compress2() API.
* Manually tested with `sst_dump --command=recompress` that different compression levels and dictionary compression settings all work.
* Manually tested compiling with older versions of ZSTD: v1.3.8, v1.1.0, v0.6.2.
* Perf impact: from public benchmark data: http://fastcompression.blogspot.com/2019/03/presenting-xxh3.html for checksum and https://github.com/facebook/zstd#benchmarks, if decompression is 1700MB/s and checksum computation is 70000MB/s, checksum computation is an additional ~2.4% time for decompression. Compression is slower and checksumming should be less noticeable.
* Microbench:
```
TEST_TMPDIR=/dev/shm ./branch_db_basic_bench --benchmark_filter=DBGet/comp_style:0/max_data:1048576/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:0/compression_type:7/compression_checksum:1/no_blockcache:1/iterations:10000/threads:1 --benchmark_repetitions=100
Min out of 100 runs:
Main:
10390 10436 10456 10484 10499 10535 10544 10545 10565 10568
After this PR, checksum=false
10285 10397 10503 10508 10515 10557 10562 10635 10640 10660
After this PR, checksum=true
10827 10876 10925 10949 10971 11052 11061 11063 11100 11109
```
* db_bench:
```
Write perf
TEST_TMPDIR=/dev/shm/ ./db_bench_ichecksum --benchmarks=fillseq[-X10] --compression_type=zstd --num=10000000 --compression_checksum=..
[FillSeq checksum=0]
fillseq [AVG 10 runs] : 281635 (± 31711) ops/sec; 31.2 (± 3.5) MB/sec
fillseq [MEDIAN 10 runs] : 294027 ops/sec; 32.5 MB/sec
[FillSeq checksum=1]
fillseq [AVG 10 runs] : 286961 (± 34700) ops/sec; 31.7 (± 3.8) MB/sec
fillseq [MEDIAN 10 runs] : 283278 ops/sec; 31.3 MB/sec
Read perf
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=readrandom[-X20] --num=100000000 --reads=1000000 --use_existing_db=true --readonly=1
[Readrandom checksum=1]
readrandom [AVG 20 runs] : 360928 (± 3579) ops/sec; 4.0 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 362468 ops/sec; 4.0 MB/sec
[Readrandom checksum=0]
readrandom [AVG 20 runs] : 380365 (± 2384) ops/sec; 4.2 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 379800 ops/sec; 4.2 MB/sec
Compression
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=compress[-X20] --compression_type=zstd --num=100000000 --compression_checksum=1
checksum=1
compress [AVG 20 runs] : 54074 (± 634) ops/sec; 211.2 (± 2.5) MB/sec
compress [MEDIAN 20 runs] : 54396 ops/sec; 212.5 MB/sec
checksum=0
compress [AVG 20 runs] : 54598 (± 393) ops/sec; 213.3 (± 1.5) MB/sec
compress [MEDIAN 20 runs] : 54592 ops/sec; 213.3 MB/sec
Decompression:
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=uncompress[-X20] --compression_type=zstd --compression_checksum=1
checksum = 0
uncompress [AVG 20 runs] : 167499 (± 962) ops/sec; 654.3 (± 3.8) MB/sec
uncompress [MEDIAN 20 runs] : 167210 ops/sec; 653.2 MB/sec
checksum = 1
uncompress [AVG 20 runs] : 167980 (± 924) ops/sec; 656.2 (± 3.6) MB/sec
uncompress [MEDIAN 20 runs] : 168465 ops/sec; 658.1 MB/sec
```
Reviewed By: ajkr
Differential Revision: D48019378
Pulled By: cbi42
fbshipit-source-id: 674120c6e1853c2ced1436ac8138559d0204feba
2023-08-18 22:01:59 +00:00
|
|
|
} else {
|
2019-01-24 02:11:08 +00:00
|
|
|
#endif // ROCKSDB_ZSTD_DDICT
|
2019-01-19 03:10:17 +00:00
|
|
|
actual_output_length = ZSTD_decompress_usingDict(
|
|
|
|
context, output.get(), output_len, input_data, input_length,
|
|
|
|
info.dict().GetRawDict().data(), info.dict().GetRawDict().size());
|
Add `CompressionOptions::checksum` for enabling ZSTD checksum (#11666)
Summary:
Optionally enable zstd checksum flag (https://github.com/facebook/zstd/blob/d857369028d997c92ff1f1861a4d7f679a125464/lib/zstd.h#L428) to detect corruption during decompression. Main changes are in compression.h:
* User can set CompressionOptions::checksum to true to enable this feature.
* We enable this feature in ZSTD by setting the checksum flag in ZSTD compression context: `ZSTD_CCtx`.
* Uses `ZSTD_compress2()` to do compression since it supports frame parameter like the checksum flag. Compression level is also set in compression context as a flag.
* Error handling during decompression to propagate error message from ZSTD.
* Updated microbench to test read performance impact.
About compatibility, the current compression decoders should continue to work with the data created by the new compression API `ZSTD_compress2()`: https://github.com/facebook/zstd/issues/3711.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11666
Test Plan:
* Existing unit tests for zstd compression
* Add unit test `DBTest2.ZSTDChecksum` to test the corruption case
* Manually tested that compression levels, parallel compression, dictionary compression, index compression all work with the new ZSTD_compress2() API.
* Manually tested with `sst_dump --command=recompress` that different compression levels and dictionary compression settings all work.
* Manually tested compiling with older versions of ZSTD: v1.3.8, v1.1.0, v0.6.2.
* Perf impact: from public benchmark data: http://fastcompression.blogspot.com/2019/03/presenting-xxh3.html for checksum and https://github.com/facebook/zstd#benchmarks, if decompression is 1700MB/s and checksum computation is 70000MB/s, checksum computation is an additional ~2.4% time for decompression. Compression is slower and checksumming should be less noticeable.
* Microbench:
```
TEST_TMPDIR=/dev/shm ./branch_db_basic_bench --benchmark_filter=DBGet/comp_style:0/max_data:1048576/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:0/compression_type:7/compression_checksum:1/no_blockcache:1/iterations:10000/threads:1 --benchmark_repetitions=100
Min out of 100 runs:
Main:
10390 10436 10456 10484 10499 10535 10544 10545 10565 10568
After this PR, checksum=false
10285 10397 10503 10508 10515 10557 10562 10635 10640 10660
After this PR, checksum=true
10827 10876 10925 10949 10971 11052 11061 11063 11100 11109
```
* db_bench:
```
Write perf
TEST_TMPDIR=/dev/shm/ ./db_bench_ichecksum --benchmarks=fillseq[-X10] --compression_type=zstd --num=10000000 --compression_checksum=..
[FillSeq checksum=0]
fillseq [AVG 10 runs] : 281635 (± 31711) ops/sec; 31.2 (± 3.5) MB/sec
fillseq [MEDIAN 10 runs] : 294027 ops/sec; 32.5 MB/sec
[FillSeq checksum=1]
fillseq [AVG 10 runs] : 286961 (± 34700) ops/sec; 31.7 (± 3.8) MB/sec
fillseq [MEDIAN 10 runs] : 283278 ops/sec; 31.3 MB/sec
Read perf
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=readrandom[-X20] --num=100000000 --reads=1000000 --use_existing_db=true --readonly=1
[Readrandom checksum=1]
readrandom [AVG 20 runs] : 360928 (± 3579) ops/sec; 4.0 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 362468 ops/sec; 4.0 MB/sec
[Readrandom checksum=0]
readrandom [AVG 20 runs] : 380365 (± 2384) ops/sec; 4.2 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 379800 ops/sec; 4.2 MB/sec
Compression
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=compress[-X20] --compression_type=zstd --num=100000000 --compression_checksum=1
checksum=1
compress [AVG 20 runs] : 54074 (± 634) ops/sec; 211.2 (± 2.5) MB/sec
compress [MEDIAN 20 runs] : 54396 ops/sec; 212.5 MB/sec
checksum=0
compress [AVG 20 runs] : 54598 (± 393) ops/sec; 213.3 (± 1.5) MB/sec
compress [MEDIAN 20 runs] : 54592 ops/sec; 213.3 MB/sec
Decompression:
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=uncompress[-X20] --compression_type=zstd --compression_checksum=1
checksum = 0
uncompress [AVG 20 runs] : 167499 (± 962) ops/sec; 654.3 (± 3.8) MB/sec
uncompress [MEDIAN 20 runs] : 167210 ops/sec; 653.2 MB/sec
checksum = 1
uncompress [AVG 20 runs] : 167980 (± 924) ops/sec; 656.2 (± 3.6) MB/sec
uncompress [MEDIAN 20 runs] : 168465 ops/sec; 658.1 MB/sec
```
Reviewed By: ajkr
Differential Revision: D48019378
Pulled By: cbi42
fbshipit-source-id: 674120c6e1853c2ced1436ac8138559d0204feba
2023-08-18 22:01:59 +00:00
|
|
|
#ifdef ROCKSDB_ZSTD_DDICT
|
2019-01-19 03:10:17 +00:00
|
|
|
}
|
Add `CompressionOptions::checksum` for enabling ZSTD checksum (#11666)
Summary:
Optionally enable zstd checksum flag (https://github.com/facebook/zstd/blob/d857369028d997c92ff1f1861a4d7f679a125464/lib/zstd.h#L428) to detect corruption during decompression. Main changes are in compression.h:
* User can set CompressionOptions::checksum to true to enable this feature.
* We enable this feature in ZSTD by setting the checksum flag in ZSTD compression context: `ZSTD_CCtx`.
* Uses `ZSTD_compress2()` to do compression since it supports frame parameter like the checksum flag. Compression level is also set in compression context as a flag.
* Error handling during decompression to propagate error message from ZSTD.
* Updated microbench to test read performance impact.
About compatibility, the current compression decoders should continue to work with the data created by the new compression API `ZSTD_compress2()`: https://github.com/facebook/zstd/issues/3711.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11666
Test Plan:
* Existing unit tests for zstd compression
* Add unit test `DBTest2.ZSTDChecksum` to test the corruption case
* Manually tested that compression levels, parallel compression, dictionary compression, index compression all work with the new ZSTD_compress2() API.
* Manually tested with `sst_dump --command=recompress` that different compression levels and dictionary compression settings all work.
* Manually tested compiling with older versions of ZSTD: v1.3.8, v1.1.0, v0.6.2.
* Perf impact: from public benchmark data: http://fastcompression.blogspot.com/2019/03/presenting-xxh3.html for checksum and https://github.com/facebook/zstd#benchmarks, if decompression is 1700MB/s and checksum computation is 70000MB/s, checksum computation is an additional ~2.4% time for decompression. Compression is slower and checksumming should be less noticeable.
* Microbench:
```
TEST_TMPDIR=/dev/shm ./branch_db_basic_bench --benchmark_filter=DBGet/comp_style:0/max_data:1048576/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:0/compression_type:7/compression_checksum:1/no_blockcache:1/iterations:10000/threads:1 --benchmark_repetitions=100
Min out of 100 runs:
Main:
10390 10436 10456 10484 10499 10535 10544 10545 10565 10568
After this PR, checksum=false
10285 10397 10503 10508 10515 10557 10562 10635 10640 10660
After this PR, checksum=true
10827 10876 10925 10949 10971 11052 11061 11063 11100 11109
```
* db_bench:
```
Write perf
TEST_TMPDIR=/dev/shm/ ./db_bench_ichecksum --benchmarks=fillseq[-X10] --compression_type=zstd --num=10000000 --compression_checksum=..
[FillSeq checksum=0]
fillseq [AVG 10 runs] : 281635 (± 31711) ops/sec; 31.2 (± 3.5) MB/sec
fillseq [MEDIAN 10 runs] : 294027 ops/sec; 32.5 MB/sec
[FillSeq checksum=1]
fillseq [AVG 10 runs] : 286961 (± 34700) ops/sec; 31.7 (± 3.8) MB/sec
fillseq [MEDIAN 10 runs] : 283278 ops/sec; 31.3 MB/sec
Read perf
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=readrandom[-X20] --num=100000000 --reads=1000000 --use_existing_db=true --readonly=1
[Readrandom checksum=1]
readrandom [AVG 20 runs] : 360928 (± 3579) ops/sec; 4.0 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 362468 ops/sec; 4.0 MB/sec
[Readrandom checksum=0]
readrandom [AVG 20 runs] : 380365 (± 2384) ops/sec; 4.2 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 379800 ops/sec; 4.2 MB/sec
Compression
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=compress[-X20] --compression_type=zstd --num=100000000 --compression_checksum=1
checksum=1
compress [AVG 20 runs] : 54074 (± 634) ops/sec; 211.2 (± 2.5) MB/sec
compress [MEDIAN 20 runs] : 54396 ops/sec; 212.5 MB/sec
checksum=0
compress [AVG 20 runs] : 54598 (± 393) ops/sec; 213.3 (± 1.5) MB/sec
compress [MEDIAN 20 runs] : 54592 ops/sec; 213.3 MB/sec
Decompression:
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=uncompress[-X20] --compression_type=zstd --compression_checksum=1
checksum = 0
uncompress [AVG 20 runs] : 167499 (± 962) ops/sec; 654.3 (± 3.8) MB/sec
uncompress [MEDIAN 20 runs] : 167210 ops/sec; 653.2 MB/sec
checksum = 1
uncompress [AVG 20 runs] : 167980 (± 924) ops/sec; 656.2 (± 3.6) MB/sec
uncompress [MEDIAN 20 runs] : 168465 ops/sec; 658.1 MB/sec
```
Reviewed By: ajkr
Differential Revision: D48019378
Pulled By: cbi42
fbshipit-source-id: 674120c6e1853c2ced1436ac8138559d0204feba
2023-08-18 22:01:59 +00:00
|
|
|
#endif // ROCKSDB_ZSTD_DDICT
|
2018-04-13 00:55:14 +00:00
|
|
|
#else // up to v0.4.x
|
2019-01-19 03:10:17 +00:00
|
|
|
(void)info;
|
Shared dictionary compression using reference block
Summary:
This adds a new metablock containing a shared dictionary that is used
to compress all data blocks in the SST file. The size of the shared dictionary
is configurable in CompressionOptions and defaults to 0. It's currently only
used for zlib/lz4/lz4hc, but the block will be stored in the SST regardless of
the compression type if the user chooses a nonzero dictionary size.
During compaction, computes the dictionary by randomly sampling the first
output file in each subcompaction. It pre-computes the intervals to sample
by assuming the output file will have the maximum allowable length. In case
the file is smaller, some of the pre-computed sampling intervals can be beyond
end-of-file, in which case we skip over those samples and the dictionary will
be a bit smaller. After the dictionary is generated using the first file in a
subcompaction, it is loaded into the compression library before writing each
block in each subsequent file of that subcompaction.
On the read path, gets the dictionary from the metablock, if it exists. Then,
loads that dictionary into the compression library before reading each block.
Test Plan: new unit test
Reviewers: yhchiang, IslamAbdelRahman, cyan, sdong
Reviewed By: sdong
Subscribers: andrewkr, yoshinorim, kradhakrishnan, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D52287
2016-04-28 00:36:03 +00:00
|
|
|
actual_output_length =
|
2018-10-03 00:21:54 +00:00
|
|
|
ZSTD_decompress(output.get(), output_len, input_data, input_length);
|
Shared dictionary compression using reference block
Summary:
This adds a new metablock containing a shared dictionary that is used
to compress all data blocks in the SST file. The size of the shared dictionary
is configurable in CompressionOptions and defaults to 0. It's currently only
used for zlib/lz4/lz4hc, but the block will be stored in the SST regardless of
the compression type if the user chooses a nonzero dictionary size.
During compaction, computes the dictionary by randomly sampling the first
output file in each subcompaction. It pre-computes the intervals to sample
by assuming the output file will have the maximum allowable length. In case
the file is smaller, some of the pre-computed sampling intervals can be beyond
end-of-file, in which case we skip over those samples and the dictionary will
be a bit smaller. After the dictionary is generated using the first file in a
subcompaction, it is loaded into the compression library before writing each
block in each subsequent file of that subcompaction.
On the read path, gets the dictionary from the metablock, if it exists. Then,
loads that dictionary into the compression library before reading each block.
Test Plan: new unit test
Reviewers: yhchiang, IslamAbdelRahman, cyan, sdong
Reviewed By: sdong
Subscribers: andrewkr, yoshinorim, kradhakrishnan, dhruba, leveldb
Differential Revision: https://reviews.facebook.net/D52287
2016-04-28 00:36:03 +00:00
|
|
|
#endif // ZSTD_VERSION_NUMBER >= 500
|
Add `CompressionOptions::checksum` for enabling ZSTD checksum (#11666)
Summary:
Optionally enable zstd checksum flag (https://github.com/facebook/zstd/blob/d857369028d997c92ff1f1861a4d7f679a125464/lib/zstd.h#L428) to detect corruption during decompression. Main changes are in compression.h:
* User can set CompressionOptions::checksum to true to enable this feature.
* We enable this feature in ZSTD by setting the checksum flag in ZSTD compression context: `ZSTD_CCtx`.
* Uses `ZSTD_compress2()` to do compression since it supports frame parameter like the checksum flag. Compression level is also set in compression context as a flag.
* Error handling during decompression to propagate error message from ZSTD.
* Updated microbench to test read performance impact.
About compatibility, the current compression decoders should continue to work with the data created by the new compression API `ZSTD_compress2()`: https://github.com/facebook/zstd/issues/3711.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11666
Test Plan:
* Existing unit tests for zstd compression
* Add unit test `DBTest2.ZSTDChecksum` to test the corruption case
* Manually tested that compression levels, parallel compression, dictionary compression, index compression all work with the new ZSTD_compress2() API.
* Manually tested with `sst_dump --command=recompress` that different compression levels and dictionary compression settings all work.
* Manually tested compiling with older versions of ZSTD: v1.3.8, v1.1.0, v0.6.2.
* Perf impact: from public benchmark data: http://fastcompression.blogspot.com/2019/03/presenting-xxh3.html for checksum and https://github.com/facebook/zstd#benchmarks, if decompression is 1700MB/s and checksum computation is 70000MB/s, checksum computation is an additional ~2.4% time for decompression. Compression is slower and checksumming should be less noticeable.
* Microbench:
```
TEST_TMPDIR=/dev/shm ./branch_db_basic_bench --benchmark_filter=DBGet/comp_style:0/max_data:1048576/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:0/compression_type:7/compression_checksum:1/no_blockcache:1/iterations:10000/threads:1 --benchmark_repetitions=100
Min out of 100 runs:
Main:
10390 10436 10456 10484 10499 10535 10544 10545 10565 10568
After this PR, checksum=false
10285 10397 10503 10508 10515 10557 10562 10635 10640 10660
After this PR, checksum=true
10827 10876 10925 10949 10971 11052 11061 11063 11100 11109
```
* db_bench:
```
Write perf
TEST_TMPDIR=/dev/shm/ ./db_bench_ichecksum --benchmarks=fillseq[-X10] --compression_type=zstd --num=10000000 --compression_checksum=..
[FillSeq checksum=0]
fillseq [AVG 10 runs] : 281635 (± 31711) ops/sec; 31.2 (± 3.5) MB/sec
fillseq [MEDIAN 10 runs] : 294027 ops/sec; 32.5 MB/sec
[FillSeq checksum=1]
fillseq [AVG 10 runs] : 286961 (± 34700) ops/sec; 31.7 (± 3.8) MB/sec
fillseq [MEDIAN 10 runs] : 283278 ops/sec; 31.3 MB/sec
Read perf
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=readrandom[-X20] --num=100000000 --reads=1000000 --use_existing_db=true --readonly=1
[Readrandom checksum=1]
readrandom [AVG 20 runs] : 360928 (± 3579) ops/sec; 4.0 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 362468 ops/sec; 4.0 MB/sec
[Readrandom checksum=0]
readrandom [AVG 20 runs] : 380365 (± 2384) ops/sec; 4.2 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 379800 ops/sec; 4.2 MB/sec
Compression
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=compress[-X20] --compression_type=zstd --num=100000000 --compression_checksum=1
checksum=1
compress [AVG 20 runs] : 54074 (± 634) ops/sec; 211.2 (± 2.5) MB/sec
compress [MEDIAN 20 runs] : 54396 ops/sec; 212.5 MB/sec
checksum=0
compress [AVG 20 runs] : 54598 (± 393) ops/sec; 213.3 (± 1.5) MB/sec
compress [MEDIAN 20 runs] : 54592 ops/sec; 213.3 MB/sec
Decompression:
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=uncompress[-X20] --compression_type=zstd --compression_checksum=1
checksum = 0
uncompress [AVG 20 runs] : 167499 (± 962) ops/sec; 654.3 (± 3.8) MB/sec
uncompress [MEDIAN 20 runs] : 167210 ops/sec; 653.2 MB/sec
checksum = 1
uncompress [AVG 20 runs] : 167980 (± 924) ops/sec; 656.2 (± 3.6) MB/sec
uncompress [MEDIAN 20 runs] : 168465 ops/sec; 658.1 MB/sec
```
Reviewed By: ajkr
Differential Revision: D48019378
Pulled By: cbi42
fbshipit-source-id: 674120c6e1853c2ced1436ac8138559d0204feba
2023-08-18 22:01:59 +00:00
|
|
|
if (ZSTD_isError(actual_output_length)) {
|
|
|
|
if (error_message) {
|
|
|
|
*error_message = ZSTD_getErrorName(actual_output_length);
|
|
|
|
}
|
|
|
|
return nullptr;
|
|
|
|
} else if (actual_output_length != output_len) {
|
|
|
|
if (error_message) {
|
|
|
|
*error_message = kErrorOutputLenMismatch;
|
|
|
|
}
|
|
|
|
return nullptr;
|
|
|
|
}
|
|
|
|
|
2020-09-25 16:00:29 +00:00
|
|
|
*uncompressed_size = actual_output_length;
|
2015-08-27 22:40:42 +00:00
|
|
|
return output;
|
2018-04-13 00:55:14 +00:00
|
|
|
#else // ZSTD
|
2019-01-19 03:10:17 +00:00
|
|
|
(void)info;
|
2018-04-13 00:55:14 +00:00
|
|
|
(void)input_data;
|
|
|
|
(void)input_length;
|
2020-09-25 16:00:29 +00:00
|
|
|
(void)uncompressed_size;
|
2018-10-03 00:21:54 +00:00
|
|
|
(void)allocator;
|
Add `CompressionOptions::checksum` for enabling ZSTD checksum (#11666)
Summary:
Optionally enable zstd checksum flag (https://github.com/facebook/zstd/blob/d857369028d997c92ff1f1861a4d7f679a125464/lib/zstd.h#L428) to detect corruption during decompression. Main changes are in compression.h:
* User can set CompressionOptions::checksum to true to enable this feature.
* We enable this feature in ZSTD by setting the checksum flag in ZSTD compression context: `ZSTD_CCtx`.
* Uses `ZSTD_compress2()` to do compression since it supports frame parameter like the checksum flag. Compression level is also set in compression context as a flag.
* Error handling during decompression to propagate error message from ZSTD.
* Updated microbench to test read performance impact.
About compatibility, the current compression decoders should continue to work with the data created by the new compression API `ZSTD_compress2()`: https://github.com/facebook/zstd/issues/3711.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11666
Test Plan:
* Existing unit tests for zstd compression
* Add unit test `DBTest2.ZSTDChecksum` to test the corruption case
* Manually tested that compression levels, parallel compression, dictionary compression, index compression all work with the new ZSTD_compress2() API.
* Manually tested with `sst_dump --command=recompress` that different compression levels and dictionary compression settings all work.
* Manually tested compiling with older versions of ZSTD: v1.3.8, v1.1.0, v0.6.2.
* Perf impact: from public benchmark data: http://fastcompression.blogspot.com/2019/03/presenting-xxh3.html for checksum and https://github.com/facebook/zstd#benchmarks, if decompression is 1700MB/s and checksum computation is 70000MB/s, checksum computation is an additional ~2.4% time for decompression. Compression is slower and checksumming should be less noticeable.
* Microbench:
```
TEST_TMPDIR=/dev/shm ./branch_db_basic_bench --benchmark_filter=DBGet/comp_style:0/max_data:1048576/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:0/compression_type:7/compression_checksum:1/no_blockcache:1/iterations:10000/threads:1 --benchmark_repetitions=100
Min out of 100 runs:
Main:
10390 10436 10456 10484 10499 10535 10544 10545 10565 10568
After this PR, checksum=false
10285 10397 10503 10508 10515 10557 10562 10635 10640 10660
After this PR, checksum=true
10827 10876 10925 10949 10971 11052 11061 11063 11100 11109
```
* db_bench:
```
Write perf
TEST_TMPDIR=/dev/shm/ ./db_bench_ichecksum --benchmarks=fillseq[-X10] --compression_type=zstd --num=10000000 --compression_checksum=..
[FillSeq checksum=0]
fillseq [AVG 10 runs] : 281635 (± 31711) ops/sec; 31.2 (± 3.5) MB/sec
fillseq [MEDIAN 10 runs] : 294027 ops/sec; 32.5 MB/sec
[FillSeq checksum=1]
fillseq [AVG 10 runs] : 286961 (± 34700) ops/sec; 31.7 (± 3.8) MB/sec
fillseq [MEDIAN 10 runs] : 283278 ops/sec; 31.3 MB/sec
Read perf
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=readrandom[-X20] --num=100000000 --reads=1000000 --use_existing_db=true --readonly=1
[Readrandom checksum=1]
readrandom [AVG 20 runs] : 360928 (± 3579) ops/sec; 4.0 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 362468 ops/sec; 4.0 MB/sec
[Readrandom checksum=0]
readrandom [AVG 20 runs] : 380365 (± 2384) ops/sec; 4.2 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 379800 ops/sec; 4.2 MB/sec
Compression
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=compress[-X20] --compression_type=zstd --num=100000000 --compression_checksum=1
checksum=1
compress [AVG 20 runs] : 54074 (± 634) ops/sec; 211.2 (± 2.5) MB/sec
compress [MEDIAN 20 runs] : 54396 ops/sec; 212.5 MB/sec
checksum=0
compress [AVG 20 runs] : 54598 (± 393) ops/sec; 213.3 (± 1.5) MB/sec
compress [MEDIAN 20 runs] : 54592 ops/sec; 213.3 MB/sec
Decompression:
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=uncompress[-X20] --compression_type=zstd --compression_checksum=1
checksum = 0
uncompress [AVG 20 runs] : 167499 (± 962) ops/sec; 654.3 (± 3.8) MB/sec
uncompress [MEDIAN 20 runs] : 167210 ops/sec; 653.2 MB/sec
checksum = 1
uncompress [AVG 20 runs] : 167980 (± 924) ops/sec; 656.2 (± 3.6) MB/sec
uncompress [MEDIAN 20 runs] : 168465 ops/sec; 658.1 MB/sec
```
Reviewed By: ajkr
Differential Revision: D48019378
Pulled By: cbi42
fbshipit-source-id: 674120c6e1853c2ced1436ac8138559d0204feba
2023-08-18 22:01:59 +00:00
|
|
|
(void)error_message;
|
2015-08-27 22:40:42 +00:00
|
|
|
return nullptr;
|
2018-01-31 20:04:52 +00:00
|
|
|
#endif
|
2015-08-27 22:40:42 +00:00
|
|
|
}
|
|
|
|
|
2019-01-19 03:10:17 +00:00
|
|
|
inline bool ZSTD_TrainDictionarySupported() {
|
|
|
|
#ifdef ZSTD
|
|
|
|
// Dictionary trainer is available since v0.6.1 for static linking, but not
|
|
|
|
// available for dynamic linking until v1.1.3. For now we enable the feature
|
|
|
|
// in v1.1.3+ only.
|
|
|
|
return (ZSTD_versionNumber() >= 10103);
|
|
|
|
#else
|
|
|
|
return false;
|
|
|
|
#endif
|
|
|
|
}
|
|
|
|
|
2017-11-03 05:46:13 +00:00
|
|
|
inline std::string ZSTD_TrainDictionary(const std::string& samples,
|
|
|
|
const std::vector<size_t>& sample_lens,
|
|
|
|
size_t max_dict_bytes) {
|
2018-08-23 01:22:10 +00:00
|
|
|
// Dictionary trainer is available since v0.6.1 for static linking, but not
|
|
|
|
// available for dynamic linking until v1.1.3. For now we enable the feature
|
|
|
|
// in v1.1.3+ only.
|
|
|
|
#if ZSTD_VERSION_NUMBER >= 10103 // v1.1.3+
|
2019-01-19 03:10:17 +00:00
|
|
|
assert(samples.empty() == sample_lens.empty());
|
|
|
|
if (samples.empty()) {
|
|
|
|
return "";
|
|
|
|
}
|
2017-11-03 05:46:13 +00:00
|
|
|
std::string dict_data(max_dict_bytes, '\0');
|
2017-11-03 18:15:26 +00:00
|
|
|
size_t dict_len = ZDICT_trainFromBuffer(
|
|
|
|
&dict_data[0], max_dict_bytes, &samples[0], &sample_lens[0],
|
|
|
|
static_cast<unsigned>(sample_lens.size()));
|
2017-11-03 05:46:13 +00:00
|
|
|
if (ZDICT_isError(dict_len)) {
|
|
|
|
return "";
|
|
|
|
}
|
|
|
|
assert(dict_len <= max_dict_bytes);
|
|
|
|
dict_data.resize(dict_len);
|
|
|
|
return dict_data;
|
2018-08-23 01:22:10 +00:00
|
|
|
#else // up to v1.1.2
|
2017-11-03 05:46:13 +00:00
|
|
|
assert(false);
|
2018-04-13 00:55:14 +00:00
|
|
|
(void)samples;
|
|
|
|
(void)sample_lens;
|
|
|
|
(void)max_dict_bytes;
|
2017-11-03 05:46:13 +00:00
|
|
|
return "";
|
2018-08-23 01:22:10 +00:00
|
|
|
#endif // ZSTD_VERSION_NUMBER >= 10103
|
2017-11-03 05:46:13 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
inline std::string ZSTD_TrainDictionary(const std::string& samples,
|
|
|
|
size_t sample_len_shift,
|
|
|
|
size_t max_dict_bytes) {
|
|
|
|
// Dictionary trainer is available since v0.6.1, but ZSTD was marked stable
|
|
|
|
// only since v0.8.0. For now we enable the feature in stable versions only.
|
2018-08-23 01:22:10 +00:00
|
|
|
#if ZSTD_VERSION_NUMBER >= 10103 // v1.1.3+
|
2017-11-03 05:46:13 +00:00
|
|
|
// skips potential partial sample at the end of "samples"
|
|
|
|
size_t num_samples = samples.size() >> sample_len_shift;
|
2018-03-30 18:16:10 +00:00
|
|
|
std::vector<size_t> sample_lens(num_samples, size_t(1) << sample_len_shift);
|
2017-11-03 05:46:13 +00:00
|
|
|
return ZSTD_TrainDictionary(samples, sample_lens, max_dict_bytes);
|
2018-08-23 01:22:10 +00:00
|
|
|
#else // up to v1.1.2
|
2017-11-03 05:46:13 +00:00
|
|
|
assert(false);
|
2018-04-13 00:55:14 +00:00
|
|
|
(void)samples;
|
|
|
|
(void)sample_len_shift;
|
|
|
|
(void)max_dict_bytes;
|
2017-11-03 05:46:13 +00:00
|
|
|
return "";
|
2018-08-23 01:22:10 +00:00
|
|
|
#endif // ZSTD_VERSION_NUMBER >= 10103
|
2017-11-03 05:46:13 +00:00
|
|
|
}
|
|
|
|
|
Support using ZDICT_finalizeDictionary to generate zstd dictionary (#9857)
Summary:
An untrained dictionary is currently simply the concatenation of several samples. The ZSTD API, ZDICT_finalizeDictionary(), can improve such a dictionary's effectiveness at low cost. This PR changes how dictionary is created by calling the ZSTD ZDICT_finalizeDictionary() API instead of creating raw content dictionary (when max_dict_buffer_bytes > 0), and pass in all buffered uncompressed data blocks as samples.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9857
Test Plan:
#### db_bench test for cpu/memory of compression+decompression and space saving on synthetic data:
Set up: change the parameter [here](https://github.com/facebook/rocksdb/blob/fb9a167a55e0970b1ef6f67c1600c8d9c4c6114f/tools/db_bench_tool.cc#L1766) to 16384 to make synthetic data more compressible.
```
# linked local ZSTD with version 1.5.2
# DEBUG_LEVEL=0 ROCKSDB_NO_FBCODE=1 ROCKSDB_DISABLE_ZSTD=1 EXTRA_CXXFLAGS="-DZSTD_STATIC_LINKING_ONLY -DZSTD -I/data/users/changyubi/install/include/" EXTRA_LDFLAGS="-L/data/users/changyubi/install/lib/ -l:libzstd.a" make -j32 db_bench
dict_bytes=16384
train_bytes=1048576
echo "========== No Dictionary =========="
TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=filluniquerandom,compact -num=10000000 -compression_type=zstd -compression_max_dict_bytes=0 -block_size=4096 -max_background_jobs=24 -memtablerep=vector -allow_concurrent_memtable_write=false -disable_wal=true -max_write_buffer_number=8 >/dev/null 2>&1
TEST_TMPDIR=/dev/shm /usr/bin/time ./db_bench -use_existing_db=true -benchmarks=compact -compression_type=zstd -compression_max_dict_bytes=0 -block_size=4096 2>&1 | grep elapsed
du -hc /dev/shm/dbbench/*sst | grep total
echo "========== Raw Content Dictionary =========="
TEST_TMPDIR=/dev/shm ./db_bench_main -benchmarks=filluniquerandom,compact -num=10000000 -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -block_size=4096 -max_background_jobs=24 -memtablerep=vector -allow_concurrent_memtable_write=false -disable_wal=true -max_write_buffer_number=8 >/dev/null 2>&1
TEST_TMPDIR=/dev/shm /usr/bin/time ./db_bench_main -use_existing_db=true -benchmarks=compact -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -block_size=4096 2>&1 | grep elapsed
du -hc /dev/shm/dbbench/*sst | grep total
echo "========== FinalizeDictionary =========="
TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=filluniquerandom,compact -num=10000000 -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes -compression_use_zstd_dict_trainer=false -block_size=4096 -max_background_jobs=24 -memtablerep=vector -allow_concurrent_memtable_write=false -disable_wal=true -max_write_buffer_number=8 >/dev/null 2>&1
TEST_TMPDIR=/dev/shm /usr/bin/time ./db_bench -use_existing_db=true -benchmarks=compact -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes -compression_use_zstd_dict_trainer=false -block_size=4096 2>&1 | grep elapsed
du -hc /dev/shm/dbbench/*sst | grep total
echo "========== TrainDictionary =========="
TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=filluniquerandom,compact -num=10000000 -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes -block_size=4096 -max_background_jobs=24 -memtablerep=vector -allow_concurrent_memtable_write=false -disable_wal=true -max_write_buffer_number=8 >/dev/null 2>&1
TEST_TMPDIR=/dev/shm /usr/bin/time ./db_bench -use_existing_db=true -benchmarks=compact -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes -block_size=4096 2>&1 | grep elapsed
du -hc /dev/shm/dbbench/*sst | grep total
# Result: TrainDictionary is much better on space saving, but FinalizeDictionary seems to use less memory.
# before compression data size: 1.2GB
dict_bytes=16384
max_dict_buffer_bytes = 1048576
space cpu/memory
No Dictionary 468M 14.93user 1.00system 0:15.92elapsed 100%CPU (0avgtext+0avgdata 23904maxresident)k
Raw Dictionary 251M 15.81user 0.80system 0:16.56elapsed 100%CPU (0avgtext+0avgdata 156808maxresident)k
FinalizeDictionary 236M 11.93user 0.64system 0:12.56elapsed 100%CPU (0avgtext+0avgdata 89548maxresident)k
TrainDictionary 84M 7.29user 0.45system 0:07.75elapsed 100%CPU (0avgtext+0avgdata 97288maxresident)k
```
#### Benchmark on 10 sample SST files for spacing saving and CPU time on compression:
FinalizeDictionary is comparable to TrainDictionary in terms of space saving, and takes less time in compression.
```
dict_bytes=16384
train_bytes=1048576
for sst_file in `ls ../temp/myrock-sst/`
do
echo "********** $sst_file **********"
echo "========== No Dictionary =========="
./sst_dump --file="../temp/myrock-sst/$sst_file" --command=recompress --compression_level_from=6 --compression_level_to=6 --compression_types=kZSTD
echo "========== Raw Content Dictionary =========="
./sst_dump --file="../temp/myrock-sst/$sst_file" --command=recompress --compression_level_from=6 --compression_level_to=6 --compression_types=kZSTD --compression_max_dict_bytes=$dict_bytes
echo "========== FinalizeDictionary =========="
./sst_dump --file="../temp/myrock-sst/$sst_file" --command=recompress --compression_level_from=6 --compression_level_to=6 --compression_types=kZSTD --compression_max_dict_bytes=$dict_bytes --compression_zstd_max_train_bytes=$train_bytes --compression_use_zstd_finalize_dict
echo "========== TrainDictionary =========="
./sst_dump --file="../temp/myrock-sst/$sst_file" --command=recompress --compression_level_from=6 --compression_level_to=6 --compression_types=kZSTD --compression_max_dict_bytes=$dict_bytes --compression_zstd_max_train_bytes=$train_bytes
done
010240.sst (Size/Time) 011029.sst 013184.sst 021552.sst 185054.sst 185137.sst 191666.sst 7560381.sst 7604174.sst 7635312.sst
No Dictionary 28165569 / 2614419 32899411 / 2976832 32977848 / 3055542 31966329 / 2004590 33614351 / 1755877 33429029 / 1717042 33611933 / 1776936 33634045 / 2771417 33789721 / 2205414 33592194 / 388254
Raw Content Dictionary 28019950 / 2697961 33748665 / 3572422 33896373 / 3534701 26418431 / 2259658 28560825 / 1839168 28455030 / 1846039 28494319 / 1861349 32391599 / 3095649 33772142 / 2407843 33592230 / 474523
FinalizeDictionary 27896012 / 2650029 33763886 / 3719427 33904283 / 3552793 26008225 / 2198033 28111872 / 1869530 28014374 / 1789771 28047706 / 1848300 32296254 / 3204027 33698698 / 2381468 33592344 / 517433
TrainDictionary 28046089 / 2740037 33706480 / 3679019 33885741 / 3629351 25087123 / 2204558 27194353 / 1970207 27234229 / 1896811 27166710 / 1903119 32011041 / 3322315 32730692 / 2406146 33608631 / 570593
```
#### Decompression/Read test:
With FinalizeDictionary/TrainDictionary, some data structure used for decompression are in stored in dictionary, so they are expected to be faster in terms of decompression/reads.
```
dict_bytes=16384
train_bytes=1048576
echo "No Dictionary"
TEST_TMPDIR=/dev/shm/ ./db_bench -benchmarks=filluniquerandom,compact -compression_type=zstd -compression_max_dict_bytes=0 > /dev/null 2>&1
TEST_TMPDIR=/dev/shm/ ./db_bench -use_existing_db=true -benchmarks=readrandom -cache_size=0 -compression_type=zstd -compression_max_dict_bytes=0 2>&1 | grep MB/s
echo "Raw Dictionary"
TEST_TMPDIR=/dev/shm/ ./db_bench -benchmarks=filluniquerandom,compact -compression_type=zstd -compression_max_dict_bytes=$dict_bytes > /dev/null 2>&1
TEST_TMPDIR=/dev/shm/ ./db_bench -use_existing_db=true -benchmarks=readrandom -cache_size=0 -compression_type=zstd -compression_max_dict_bytes=$dict_bytes 2>&1 | grep MB/s
echo "FinalizeDict"
TEST_TMPDIR=/dev/shm/ ./db_bench -benchmarks=filluniquerandom,compact -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes -compression_use_zstd_dict_trainer=false > /dev/null 2>&1
TEST_TMPDIR=/dev/shm/ ./db_bench -use_existing_db=true -benchmarks=readrandom -cache_size=0 -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes -compression_use_zstd_dict_trainer=false 2>&1 | grep MB/s
echo "Train Dictionary"
TEST_TMPDIR=/dev/shm/ ./db_bench -benchmarks=filluniquerandom,compact -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes > /dev/null 2>&1
TEST_TMPDIR=/dev/shm/ ./db_bench -use_existing_db=true -benchmarks=readrandom -cache_size=0 -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes 2>&1 | grep MB/s
No Dictionary
readrandom : 12.183 micros/op 82082 ops/sec 12.183 seconds 1000000 operations; 9.1 MB/s (1000000 of 1000000 found)
Raw Dictionary
readrandom : 12.314 micros/op 81205 ops/sec 12.314 seconds 1000000 operations; 9.0 MB/s (1000000 of 1000000 found)
FinalizeDict
readrandom : 9.787 micros/op 102180 ops/sec 9.787 seconds 1000000 operations; 11.3 MB/s (1000000 of 1000000 found)
Train Dictionary
readrandom : 9.698 micros/op 103108 ops/sec 9.699 seconds 1000000 operations; 11.4 MB/s (1000000 of 1000000 found)
```
Reviewed By: ajkr
Differential Revision: D35720026
Pulled By: cbi42
fbshipit-source-id: 24d230fdff0fd28a1bb650658798f00dfcfb2a1f
2022-05-20 19:09:09 +00:00
|
|
|
inline bool ZSTD_FinalizeDictionarySupported() {
|
|
|
|
#ifdef ZSTD
|
|
|
|
// ZDICT_finalizeDictionary API is stable since v1.4.5
|
|
|
|
return (ZSTD_versionNumber() >= 10405);
|
|
|
|
#else
|
|
|
|
return false;
|
|
|
|
#endif
|
|
|
|
}
|
|
|
|
|
|
|
|
inline std::string ZSTD_FinalizeDictionary(
|
|
|
|
const std::string& samples, const std::vector<size_t>& sample_lens,
|
|
|
|
size_t max_dict_bytes, int level) {
|
|
|
|
// ZDICT_finalizeDictionary is stable since version v1.4.5
|
|
|
|
#if ZSTD_VERSION_NUMBER >= 10405 // v1.4.5+
|
|
|
|
assert(samples.empty() == sample_lens.empty());
|
|
|
|
if (samples.empty()) {
|
|
|
|
return "";
|
|
|
|
}
|
|
|
|
if (level == CompressionOptions::kDefaultCompressionLevel) {
|
|
|
|
// 3 is the value of ZSTD_CLEVEL_DEFAULT (not exposed publicly), see
|
|
|
|
// https://github.com/facebook/zstd/issues/1148
|
|
|
|
level = 3;
|
|
|
|
}
|
|
|
|
std::string dict_data(max_dict_bytes, '\0');
|
|
|
|
size_t dict_len = ZDICT_finalizeDictionary(
|
|
|
|
dict_data.data(), max_dict_bytes, samples.data(),
|
|
|
|
std::min(static_cast<size_t>(samples.size()), max_dict_bytes),
|
|
|
|
samples.data(), sample_lens.data(),
|
|
|
|
static_cast<unsigned>(sample_lens.size()),
|
|
|
|
{level, 0 /* notificationLevel */, 0 /* dictID */});
|
|
|
|
if (ZDICT_isError(dict_len)) {
|
|
|
|
return "";
|
|
|
|
} else {
|
|
|
|
assert(dict_len <= max_dict_bytes);
|
|
|
|
dict_data.resize(dict_len);
|
|
|
|
return dict_data;
|
|
|
|
}
|
|
|
|
#else // up to v1.4.4
|
Add `CompressionOptions::checksum` for enabling ZSTD checksum (#11666)
Summary:
Optionally enable zstd checksum flag (https://github.com/facebook/zstd/blob/d857369028d997c92ff1f1861a4d7f679a125464/lib/zstd.h#L428) to detect corruption during decompression. Main changes are in compression.h:
* User can set CompressionOptions::checksum to true to enable this feature.
* We enable this feature in ZSTD by setting the checksum flag in ZSTD compression context: `ZSTD_CCtx`.
* Uses `ZSTD_compress2()` to do compression since it supports frame parameter like the checksum flag. Compression level is also set in compression context as a flag.
* Error handling during decompression to propagate error message from ZSTD.
* Updated microbench to test read performance impact.
About compatibility, the current compression decoders should continue to work with the data created by the new compression API `ZSTD_compress2()`: https://github.com/facebook/zstd/issues/3711.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11666
Test Plan:
* Existing unit tests for zstd compression
* Add unit test `DBTest2.ZSTDChecksum` to test the corruption case
* Manually tested that compression levels, parallel compression, dictionary compression, index compression all work with the new ZSTD_compress2() API.
* Manually tested with `sst_dump --command=recompress` that different compression levels and dictionary compression settings all work.
* Manually tested compiling with older versions of ZSTD: v1.3.8, v1.1.0, v0.6.2.
* Perf impact: from public benchmark data: http://fastcompression.blogspot.com/2019/03/presenting-xxh3.html for checksum and https://github.com/facebook/zstd#benchmarks, if decompression is 1700MB/s and checksum computation is 70000MB/s, checksum computation is an additional ~2.4% time for decompression. Compression is slower and checksumming should be less noticeable.
* Microbench:
```
TEST_TMPDIR=/dev/shm ./branch_db_basic_bench --benchmark_filter=DBGet/comp_style:0/max_data:1048576/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:0/compression_type:7/compression_checksum:1/no_blockcache:1/iterations:10000/threads:1 --benchmark_repetitions=100
Min out of 100 runs:
Main:
10390 10436 10456 10484 10499 10535 10544 10545 10565 10568
After this PR, checksum=false
10285 10397 10503 10508 10515 10557 10562 10635 10640 10660
After this PR, checksum=true
10827 10876 10925 10949 10971 11052 11061 11063 11100 11109
```
* db_bench:
```
Write perf
TEST_TMPDIR=/dev/shm/ ./db_bench_ichecksum --benchmarks=fillseq[-X10] --compression_type=zstd --num=10000000 --compression_checksum=..
[FillSeq checksum=0]
fillseq [AVG 10 runs] : 281635 (± 31711) ops/sec; 31.2 (± 3.5) MB/sec
fillseq [MEDIAN 10 runs] : 294027 ops/sec; 32.5 MB/sec
[FillSeq checksum=1]
fillseq [AVG 10 runs] : 286961 (± 34700) ops/sec; 31.7 (± 3.8) MB/sec
fillseq [MEDIAN 10 runs] : 283278 ops/sec; 31.3 MB/sec
Read perf
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=readrandom[-X20] --num=100000000 --reads=1000000 --use_existing_db=true --readonly=1
[Readrandom checksum=1]
readrandom [AVG 20 runs] : 360928 (± 3579) ops/sec; 4.0 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 362468 ops/sec; 4.0 MB/sec
[Readrandom checksum=0]
readrandom [AVG 20 runs] : 380365 (± 2384) ops/sec; 4.2 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 379800 ops/sec; 4.2 MB/sec
Compression
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=compress[-X20] --compression_type=zstd --num=100000000 --compression_checksum=1
checksum=1
compress [AVG 20 runs] : 54074 (± 634) ops/sec; 211.2 (± 2.5) MB/sec
compress [MEDIAN 20 runs] : 54396 ops/sec; 212.5 MB/sec
checksum=0
compress [AVG 20 runs] : 54598 (± 393) ops/sec; 213.3 (± 1.5) MB/sec
compress [MEDIAN 20 runs] : 54592 ops/sec; 213.3 MB/sec
Decompression:
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=uncompress[-X20] --compression_type=zstd --compression_checksum=1
checksum = 0
uncompress [AVG 20 runs] : 167499 (± 962) ops/sec; 654.3 (± 3.8) MB/sec
uncompress [MEDIAN 20 runs] : 167210 ops/sec; 653.2 MB/sec
checksum = 1
uncompress [AVG 20 runs] : 167980 (± 924) ops/sec; 656.2 (± 3.6) MB/sec
uncompress [MEDIAN 20 runs] : 168465 ops/sec; 658.1 MB/sec
```
Reviewed By: ajkr
Differential Revision: D48019378
Pulled By: cbi42
fbshipit-source-id: 674120c6e1853c2ced1436ac8138559d0204feba
2023-08-18 22:01:59 +00:00
|
|
|
assert(false);
|
Support using ZDICT_finalizeDictionary to generate zstd dictionary (#9857)
Summary:
An untrained dictionary is currently simply the concatenation of several samples. The ZSTD API, ZDICT_finalizeDictionary(), can improve such a dictionary's effectiveness at low cost. This PR changes how dictionary is created by calling the ZSTD ZDICT_finalizeDictionary() API instead of creating raw content dictionary (when max_dict_buffer_bytes > 0), and pass in all buffered uncompressed data blocks as samples.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9857
Test Plan:
#### db_bench test for cpu/memory of compression+decompression and space saving on synthetic data:
Set up: change the parameter [here](https://github.com/facebook/rocksdb/blob/fb9a167a55e0970b1ef6f67c1600c8d9c4c6114f/tools/db_bench_tool.cc#L1766) to 16384 to make synthetic data more compressible.
```
# linked local ZSTD with version 1.5.2
# DEBUG_LEVEL=0 ROCKSDB_NO_FBCODE=1 ROCKSDB_DISABLE_ZSTD=1 EXTRA_CXXFLAGS="-DZSTD_STATIC_LINKING_ONLY -DZSTD -I/data/users/changyubi/install/include/" EXTRA_LDFLAGS="-L/data/users/changyubi/install/lib/ -l:libzstd.a" make -j32 db_bench
dict_bytes=16384
train_bytes=1048576
echo "========== No Dictionary =========="
TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=filluniquerandom,compact -num=10000000 -compression_type=zstd -compression_max_dict_bytes=0 -block_size=4096 -max_background_jobs=24 -memtablerep=vector -allow_concurrent_memtable_write=false -disable_wal=true -max_write_buffer_number=8 >/dev/null 2>&1
TEST_TMPDIR=/dev/shm /usr/bin/time ./db_bench -use_existing_db=true -benchmarks=compact -compression_type=zstd -compression_max_dict_bytes=0 -block_size=4096 2>&1 | grep elapsed
du -hc /dev/shm/dbbench/*sst | grep total
echo "========== Raw Content Dictionary =========="
TEST_TMPDIR=/dev/shm ./db_bench_main -benchmarks=filluniquerandom,compact -num=10000000 -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -block_size=4096 -max_background_jobs=24 -memtablerep=vector -allow_concurrent_memtable_write=false -disable_wal=true -max_write_buffer_number=8 >/dev/null 2>&1
TEST_TMPDIR=/dev/shm /usr/bin/time ./db_bench_main -use_existing_db=true -benchmarks=compact -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -block_size=4096 2>&1 | grep elapsed
du -hc /dev/shm/dbbench/*sst | grep total
echo "========== FinalizeDictionary =========="
TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=filluniquerandom,compact -num=10000000 -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes -compression_use_zstd_dict_trainer=false -block_size=4096 -max_background_jobs=24 -memtablerep=vector -allow_concurrent_memtable_write=false -disable_wal=true -max_write_buffer_number=8 >/dev/null 2>&1
TEST_TMPDIR=/dev/shm /usr/bin/time ./db_bench -use_existing_db=true -benchmarks=compact -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes -compression_use_zstd_dict_trainer=false -block_size=4096 2>&1 | grep elapsed
du -hc /dev/shm/dbbench/*sst | grep total
echo "========== TrainDictionary =========="
TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=filluniquerandom,compact -num=10000000 -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes -block_size=4096 -max_background_jobs=24 -memtablerep=vector -allow_concurrent_memtable_write=false -disable_wal=true -max_write_buffer_number=8 >/dev/null 2>&1
TEST_TMPDIR=/dev/shm /usr/bin/time ./db_bench -use_existing_db=true -benchmarks=compact -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes -block_size=4096 2>&1 | grep elapsed
du -hc /dev/shm/dbbench/*sst | grep total
# Result: TrainDictionary is much better on space saving, but FinalizeDictionary seems to use less memory.
# before compression data size: 1.2GB
dict_bytes=16384
max_dict_buffer_bytes = 1048576
space cpu/memory
No Dictionary 468M 14.93user 1.00system 0:15.92elapsed 100%CPU (0avgtext+0avgdata 23904maxresident)k
Raw Dictionary 251M 15.81user 0.80system 0:16.56elapsed 100%CPU (0avgtext+0avgdata 156808maxresident)k
FinalizeDictionary 236M 11.93user 0.64system 0:12.56elapsed 100%CPU (0avgtext+0avgdata 89548maxresident)k
TrainDictionary 84M 7.29user 0.45system 0:07.75elapsed 100%CPU (0avgtext+0avgdata 97288maxresident)k
```
#### Benchmark on 10 sample SST files for spacing saving and CPU time on compression:
FinalizeDictionary is comparable to TrainDictionary in terms of space saving, and takes less time in compression.
```
dict_bytes=16384
train_bytes=1048576
for sst_file in `ls ../temp/myrock-sst/`
do
echo "********** $sst_file **********"
echo "========== No Dictionary =========="
./sst_dump --file="../temp/myrock-sst/$sst_file" --command=recompress --compression_level_from=6 --compression_level_to=6 --compression_types=kZSTD
echo "========== Raw Content Dictionary =========="
./sst_dump --file="../temp/myrock-sst/$sst_file" --command=recompress --compression_level_from=6 --compression_level_to=6 --compression_types=kZSTD --compression_max_dict_bytes=$dict_bytes
echo "========== FinalizeDictionary =========="
./sst_dump --file="../temp/myrock-sst/$sst_file" --command=recompress --compression_level_from=6 --compression_level_to=6 --compression_types=kZSTD --compression_max_dict_bytes=$dict_bytes --compression_zstd_max_train_bytes=$train_bytes --compression_use_zstd_finalize_dict
echo "========== TrainDictionary =========="
./sst_dump --file="../temp/myrock-sst/$sst_file" --command=recompress --compression_level_from=6 --compression_level_to=6 --compression_types=kZSTD --compression_max_dict_bytes=$dict_bytes --compression_zstd_max_train_bytes=$train_bytes
done
010240.sst (Size/Time) 011029.sst 013184.sst 021552.sst 185054.sst 185137.sst 191666.sst 7560381.sst 7604174.sst 7635312.sst
No Dictionary 28165569 / 2614419 32899411 / 2976832 32977848 / 3055542 31966329 / 2004590 33614351 / 1755877 33429029 / 1717042 33611933 / 1776936 33634045 / 2771417 33789721 / 2205414 33592194 / 388254
Raw Content Dictionary 28019950 / 2697961 33748665 / 3572422 33896373 / 3534701 26418431 / 2259658 28560825 / 1839168 28455030 / 1846039 28494319 / 1861349 32391599 / 3095649 33772142 / 2407843 33592230 / 474523
FinalizeDictionary 27896012 / 2650029 33763886 / 3719427 33904283 / 3552793 26008225 / 2198033 28111872 / 1869530 28014374 / 1789771 28047706 / 1848300 32296254 / 3204027 33698698 / 2381468 33592344 / 517433
TrainDictionary 28046089 / 2740037 33706480 / 3679019 33885741 / 3629351 25087123 / 2204558 27194353 / 1970207 27234229 / 1896811 27166710 / 1903119 32011041 / 3322315 32730692 / 2406146 33608631 / 570593
```
#### Decompression/Read test:
With FinalizeDictionary/TrainDictionary, some data structure used for decompression are in stored in dictionary, so they are expected to be faster in terms of decompression/reads.
```
dict_bytes=16384
train_bytes=1048576
echo "No Dictionary"
TEST_TMPDIR=/dev/shm/ ./db_bench -benchmarks=filluniquerandom,compact -compression_type=zstd -compression_max_dict_bytes=0 > /dev/null 2>&1
TEST_TMPDIR=/dev/shm/ ./db_bench -use_existing_db=true -benchmarks=readrandom -cache_size=0 -compression_type=zstd -compression_max_dict_bytes=0 2>&1 | grep MB/s
echo "Raw Dictionary"
TEST_TMPDIR=/dev/shm/ ./db_bench -benchmarks=filluniquerandom,compact -compression_type=zstd -compression_max_dict_bytes=$dict_bytes > /dev/null 2>&1
TEST_TMPDIR=/dev/shm/ ./db_bench -use_existing_db=true -benchmarks=readrandom -cache_size=0 -compression_type=zstd -compression_max_dict_bytes=$dict_bytes 2>&1 | grep MB/s
echo "FinalizeDict"
TEST_TMPDIR=/dev/shm/ ./db_bench -benchmarks=filluniquerandom,compact -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes -compression_use_zstd_dict_trainer=false > /dev/null 2>&1
TEST_TMPDIR=/dev/shm/ ./db_bench -use_existing_db=true -benchmarks=readrandom -cache_size=0 -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes -compression_use_zstd_dict_trainer=false 2>&1 | grep MB/s
echo "Train Dictionary"
TEST_TMPDIR=/dev/shm/ ./db_bench -benchmarks=filluniquerandom,compact -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes > /dev/null 2>&1
TEST_TMPDIR=/dev/shm/ ./db_bench -use_existing_db=true -benchmarks=readrandom -cache_size=0 -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes 2>&1 | grep MB/s
No Dictionary
readrandom : 12.183 micros/op 82082 ops/sec 12.183 seconds 1000000 operations; 9.1 MB/s (1000000 of 1000000 found)
Raw Dictionary
readrandom : 12.314 micros/op 81205 ops/sec 12.314 seconds 1000000 operations; 9.0 MB/s (1000000 of 1000000 found)
FinalizeDict
readrandom : 9.787 micros/op 102180 ops/sec 9.787 seconds 1000000 operations; 11.3 MB/s (1000000 of 1000000 found)
Train Dictionary
readrandom : 9.698 micros/op 103108 ops/sec 9.699 seconds 1000000 operations; 11.4 MB/s (1000000 of 1000000 found)
```
Reviewed By: ajkr
Differential Revision: D35720026
Pulled By: cbi42
fbshipit-source-id: 24d230fdff0fd28a1bb650658798f00dfcfb2a1f
2022-05-20 19:09:09 +00:00
|
|
|
(void)samples;
|
|
|
|
(void)sample_lens;
|
|
|
|
(void)max_dict_bytes;
|
|
|
|
(void)level;
|
|
|
|
return "";
|
|
|
|
#endif // ZSTD_VERSION_NUMBER >= 10405
|
|
|
|
}
|
|
|
|
|
2020-08-13 01:24:27 +00:00
|
|
|
inline bool CompressData(const Slice& raw,
|
|
|
|
const CompressionInfo& compression_info,
|
|
|
|
uint32_t compress_format_version,
|
|
|
|
std::string* compressed_output) {
|
|
|
|
bool ret = false;
|
|
|
|
|
|
|
|
// Will return compressed block contents if (1) the compression method is
|
|
|
|
// supported in this platform and (2) the compression rate is "good enough".
|
|
|
|
switch (compression_info.type()) {
|
|
|
|
case kSnappyCompression:
|
|
|
|
ret = Snappy_Compress(compression_info, raw.data(), raw.size(),
|
|
|
|
compressed_output);
|
|
|
|
break;
|
|
|
|
case kZlibCompression:
|
|
|
|
ret = Zlib_Compress(compression_info, compress_format_version, raw.data(),
|
|
|
|
raw.size(), compressed_output);
|
|
|
|
break;
|
|
|
|
case kBZip2Compression:
|
|
|
|
ret = BZip2_Compress(compression_info, compress_format_version,
|
|
|
|
raw.data(), raw.size(), compressed_output);
|
|
|
|
break;
|
|
|
|
case kLZ4Compression:
|
|
|
|
ret = LZ4_Compress(compression_info, compress_format_version, raw.data(),
|
|
|
|
raw.size(), compressed_output);
|
|
|
|
break;
|
|
|
|
case kLZ4HCCompression:
|
|
|
|
ret = LZ4HC_Compress(compression_info, compress_format_version,
|
|
|
|
raw.data(), raw.size(), compressed_output);
|
|
|
|
break;
|
|
|
|
case kXpressCompression:
|
|
|
|
ret = XPRESS_Compress(raw.data(), raw.size(), compressed_output);
|
|
|
|
break;
|
|
|
|
case kZSTD:
|
|
|
|
case kZSTDNotFinalCompression:
|
|
|
|
ret = ZSTD_Compress(compression_info, raw.data(), raw.size(),
|
|
|
|
compressed_output);
|
|
|
|
break;
|
|
|
|
default:
|
|
|
|
// Do not recognize this compression type
|
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
|
|
|
TEST_SYNC_POINT_CALLBACK("CompressData:TamperWithReturnValue",
|
|
|
|
static_cast<void*>(&ret));
|
|
|
|
|
|
|
|
return ret;
|
|
|
|
}
|
|
|
|
|
2020-09-25 16:00:29 +00:00
|
|
|
inline CacheAllocationPtr UncompressData(
|
|
|
|
const UncompressionInfo& uncompression_info, const char* data, size_t n,
|
|
|
|
size_t* uncompressed_size, uint32_t compress_format_version,
|
Add `CompressionOptions::checksum` for enabling ZSTD checksum (#11666)
Summary:
Optionally enable zstd checksum flag (https://github.com/facebook/zstd/blob/d857369028d997c92ff1f1861a4d7f679a125464/lib/zstd.h#L428) to detect corruption during decompression. Main changes are in compression.h:
* User can set CompressionOptions::checksum to true to enable this feature.
* We enable this feature in ZSTD by setting the checksum flag in ZSTD compression context: `ZSTD_CCtx`.
* Uses `ZSTD_compress2()` to do compression since it supports frame parameter like the checksum flag. Compression level is also set in compression context as a flag.
* Error handling during decompression to propagate error message from ZSTD.
* Updated microbench to test read performance impact.
About compatibility, the current compression decoders should continue to work with the data created by the new compression API `ZSTD_compress2()`: https://github.com/facebook/zstd/issues/3711.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11666
Test Plan:
* Existing unit tests for zstd compression
* Add unit test `DBTest2.ZSTDChecksum` to test the corruption case
* Manually tested that compression levels, parallel compression, dictionary compression, index compression all work with the new ZSTD_compress2() API.
* Manually tested with `sst_dump --command=recompress` that different compression levels and dictionary compression settings all work.
* Manually tested compiling with older versions of ZSTD: v1.3.8, v1.1.0, v0.6.2.
* Perf impact: from public benchmark data: http://fastcompression.blogspot.com/2019/03/presenting-xxh3.html for checksum and https://github.com/facebook/zstd#benchmarks, if decompression is 1700MB/s and checksum computation is 70000MB/s, checksum computation is an additional ~2.4% time for decompression. Compression is slower and checksumming should be less noticeable.
* Microbench:
```
TEST_TMPDIR=/dev/shm ./branch_db_basic_bench --benchmark_filter=DBGet/comp_style:0/max_data:1048576/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:0/compression_type:7/compression_checksum:1/no_blockcache:1/iterations:10000/threads:1 --benchmark_repetitions=100
Min out of 100 runs:
Main:
10390 10436 10456 10484 10499 10535 10544 10545 10565 10568
After this PR, checksum=false
10285 10397 10503 10508 10515 10557 10562 10635 10640 10660
After this PR, checksum=true
10827 10876 10925 10949 10971 11052 11061 11063 11100 11109
```
* db_bench:
```
Write perf
TEST_TMPDIR=/dev/shm/ ./db_bench_ichecksum --benchmarks=fillseq[-X10] --compression_type=zstd --num=10000000 --compression_checksum=..
[FillSeq checksum=0]
fillseq [AVG 10 runs] : 281635 (± 31711) ops/sec; 31.2 (± 3.5) MB/sec
fillseq [MEDIAN 10 runs] : 294027 ops/sec; 32.5 MB/sec
[FillSeq checksum=1]
fillseq [AVG 10 runs] : 286961 (± 34700) ops/sec; 31.7 (± 3.8) MB/sec
fillseq [MEDIAN 10 runs] : 283278 ops/sec; 31.3 MB/sec
Read perf
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=readrandom[-X20] --num=100000000 --reads=1000000 --use_existing_db=true --readonly=1
[Readrandom checksum=1]
readrandom [AVG 20 runs] : 360928 (± 3579) ops/sec; 4.0 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 362468 ops/sec; 4.0 MB/sec
[Readrandom checksum=0]
readrandom [AVG 20 runs] : 380365 (± 2384) ops/sec; 4.2 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 379800 ops/sec; 4.2 MB/sec
Compression
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=compress[-X20] --compression_type=zstd --num=100000000 --compression_checksum=1
checksum=1
compress [AVG 20 runs] : 54074 (± 634) ops/sec; 211.2 (± 2.5) MB/sec
compress [MEDIAN 20 runs] : 54396 ops/sec; 212.5 MB/sec
checksum=0
compress [AVG 20 runs] : 54598 (± 393) ops/sec; 213.3 (± 1.5) MB/sec
compress [MEDIAN 20 runs] : 54592 ops/sec; 213.3 MB/sec
Decompression:
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=uncompress[-X20] --compression_type=zstd --compression_checksum=1
checksum = 0
uncompress [AVG 20 runs] : 167499 (± 962) ops/sec; 654.3 (± 3.8) MB/sec
uncompress [MEDIAN 20 runs] : 167210 ops/sec; 653.2 MB/sec
checksum = 1
uncompress [AVG 20 runs] : 167980 (± 924) ops/sec; 656.2 (± 3.6) MB/sec
uncompress [MEDIAN 20 runs] : 168465 ops/sec; 658.1 MB/sec
```
Reviewed By: ajkr
Differential Revision: D48019378
Pulled By: cbi42
fbshipit-source-id: 674120c6e1853c2ced1436ac8138559d0204feba
2023-08-18 22:01:59 +00:00
|
|
|
MemoryAllocator* allocator = nullptr,
|
|
|
|
const char** error_message = nullptr) {
|
2020-09-25 16:00:29 +00:00
|
|
|
switch (uncompression_info.type()) {
|
|
|
|
case kSnappyCompression:
|
|
|
|
return Snappy_Uncompress(data, n, uncompressed_size, allocator);
|
|
|
|
case kZlibCompression:
|
|
|
|
return Zlib_Uncompress(uncompression_info, data, n, uncompressed_size,
|
|
|
|
compress_format_version, allocator);
|
|
|
|
case kBZip2Compression:
|
|
|
|
return BZip2_Uncompress(data, n, uncompressed_size,
|
|
|
|
compress_format_version, allocator);
|
|
|
|
case kLZ4Compression:
|
|
|
|
case kLZ4HCCompression:
|
|
|
|
return LZ4_Uncompress(uncompression_info, data, n, uncompressed_size,
|
|
|
|
compress_format_version, allocator);
|
|
|
|
case kXpressCompression:
|
|
|
|
// XPRESS allocates memory internally, thus no support for custom
|
|
|
|
// allocator.
|
|
|
|
return CacheAllocationPtr(XPRESS_Uncompress(data, n, uncompressed_size));
|
|
|
|
case kZSTD:
|
|
|
|
case kZSTDNotFinalCompression:
|
Add `CompressionOptions::checksum` for enabling ZSTD checksum (#11666)
Summary:
Optionally enable zstd checksum flag (https://github.com/facebook/zstd/blob/d857369028d997c92ff1f1861a4d7f679a125464/lib/zstd.h#L428) to detect corruption during decompression. Main changes are in compression.h:
* User can set CompressionOptions::checksum to true to enable this feature.
* We enable this feature in ZSTD by setting the checksum flag in ZSTD compression context: `ZSTD_CCtx`.
* Uses `ZSTD_compress2()` to do compression since it supports frame parameter like the checksum flag. Compression level is also set in compression context as a flag.
* Error handling during decompression to propagate error message from ZSTD.
* Updated microbench to test read performance impact.
About compatibility, the current compression decoders should continue to work with the data created by the new compression API `ZSTD_compress2()`: https://github.com/facebook/zstd/issues/3711.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11666
Test Plan:
* Existing unit tests for zstd compression
* Add unit test `DBTest2.ZSTDChecksum` to test the corruption case
* Manually tested that compression levels, parallel compression, dictionary compression, index compression all work with the new ZSTD_compress2() API.
* Manually tested with `sst_dump --command=recompress` that different compression levels and dictionary compression settings all work.
* Manually tested compiling with older versions of ZSTD: v1.3.8, v1.1.0, v0.6.2.
* Perf impact: from public benchmark data: http://fastcompression.blogspot.com/2019/03/presenting-xxh3.html for checksum and https://github.com/facebook/zstd#benchmarks, if decompression is 1700MB/s and checksum computation is 70000MB/s, checksum computation is an additional ~2.4% time for decompression. Compression is slower and checksumming should be less noticeable.
* Microbench:
```
TEST_TMPDIR=/dev/shm ./branch_db_basic_bench --benchmark_filter=DBGet/comp_style:0/max_data:1048576/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:0/compression_type:7/compression_checksum:1/no_blockcache:1/iterations:10000/threads:1 --benchmark_repetitions=100
Min out of 100 runs:
Main:
10390 10436 10456 10484 10499 10535 10544 10545 10565 10568
After this PR, checksum=false
10285 10397 10503 10508 10515 10557 10562 10635 10640 10660
After this PR, checksum=true
10827 10876 10925 10949 10971 11052 11061 11063 11100 11109
```
* db_bench:
```
Write perf
TEST_TMPDIR=/dev/shm/ ./db_bench_ichecksum --benchmarks=fillseq[-X10] --compression_type=zstd --num=10000000 --compression_checksum=..
[FillSeq checksum=0]
fillseq [AVG 10 runs] : 281635 (± 31711) ops/sec; 31.2 (± 3.5) MB/sec
fillseq [MEDIAN 10 runs] : 294027 ops/sec; 32.5 MB/sec
[FillSeq checksum=1]
fillseq [AVG 10 runs] : 286961 (± 34700) ops/sec; 31.7 (± 3.8) MB/sec
fillseq [MEDIAN 10 runs] : 283278 ops/sec; 31.3 MB/sec
Read perf
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=readrandom[-X20] --num=100000000 --reads=1000000 --use_existing_db=true --readonly=1
[Readrandom checksum=1]
readrandom [AVG 20 runs] : 360928 (± 3579) ops/sec; 4.0 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 362468 ops/sec; 4.0 MB/sec
[Readrandom checksum=0]
readrandom [AVG 20 runs] : 380365 (± 2384) ops/sec; 4.2 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 379800 ops/sec; 4.2 MB/sec
Compression
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=compress[-X20] --compression_type=zstd --num=100000000 --compression_checksum=1
checksum=1
compress [AVG 20 runs] : 54074 (± 634) ops/sec; 211.2 (± 2.5) MB/sec
compress [MEDIAN 20 runs] : 54396 ops/sec; 212.5 MB/sec
checksum=0
compress [AVG 20 runs] : 54598 (± 393) ops/sec; 213.3 (± 1.5) MB/sec
compress [MEDIAN 20 runs] : 54592 ops/sec; 213.3 MB/sec
Decompression:
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=uncompress[-X20] --compression_type=zstd --compression_checksum=1
checksum = 0
uncompress [AVG 20 runs] : 167499 (± 962) ops/sec; 654.3 (± 3.8) MB/sec
uncompress [MEDIAN 20 runs] : 167210 ops/sec; 653.2 MB/sec
checksum = 1
uncompress [AVG 20 runs] : 167980 (± 924) ops/sec; 656.2 (± 3.6) MB/sec
uncompress [MEDIAN 20 runs] : 168465 ops/sec; 658.1 MB/sec
```
Reviewed By: ajkr
Differential Revision: D48019378
Pulled By: cbi42
fbshipit-source-id: 674120c6e1853c2ced1436ac8138559d0204feba
2023-08-18 22:01:59 +00:00
|
|
|
// TODO(cbi): error message handling for other compression algorithms.
|
2020-09-25 16:00:29 +00:00
|
|
|
return ZSTD_Uncompress(uncompression_info, data, n, uncompressed_size,
|
Add `CompressionOptions::checksum` for enabling ZSTD checksum (#11666)
Summary:
Optionally enable zstd checksum flag (https://github.com/facebook/zstd/blob/d857369028d997c92ff1f1861a4d7f679a125464/lib/zstd.h#L428) to detect corruption during decompression. Main changes are in compression.h:
* User can set CompressionOptions::checksum to true to enable this feature.
* We enable this feature in ZSTD by setting the checksum flag in ZSTD compression context: `ZSTD_CCtx`.
* Uses `ZSTD_compress2()` to do compression since it supports frame parameter like the checksum flag. Compression level is also set in compression context as a flag.
* Error handling during decompression to propagate error message from ZSTD.
* Updated microbench to test read performance impact.
About compatibility, the current compression decoders should continue to work with the data created by the new compression API `ZSTD_compress2()`: https://github.com/facebook/zstd/issues/3711.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11666
Test Plan:
* Existing unit tests for zstd compression
* Add unit test `DBTest2.ZSTDChecksum` to test the corruption case
* Manually tested that compression levels, parallel compression, dictionary compression, index compression all work with the new ZSTD_compress2() API.
* Manually tested with `sst_dump --command=recompress` that different compression levels and dictionary compression settings all work.
* Manually tested compiling with older versions of ZSTD: v1.3.8, v1.1.0, v0.6.2.
* Perf impact: from public benchmark data: http://fastcompression.blogspot.com/2019/03/presenting-xxh3.html for checksum and https://github.com/facebook/zstd#benchmarks, if decompression is 1700MB/s and checksum computation is 70000MB/s, checksum computation is an additional ~2.4% time for decompression. Compression is slower and checksumming should be less noticeable.
* Microbench:
```
TEST_TMPDIR=/dev/shm ./branch_db_basic_bench --benchmark_filter=DBGet/comp_style:0/max_data:1048576/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:0/compression_type:7/compression_checksum:1/no_blockcache:1/iterations:10000/threads:1 --benchmark_repetitions=100
Min out of 100 runs:
Main:
10390 10436 10456 10484 10499 10535 10544 10545 10565 10568
After this PR, checksum=false
10285 10397 10503 10508 10515 10557 10562 10635 10640 10660
After this PR, checksum=true
10827 10876 10925 10949 10971 11052 11061 11063 11100 11109
```
* db_bench:
```
Write perf
TEST_TMPDIR=/dev/shm/ ./db_bench_ichecksum --benchmarks=fillseq[-X10] --compression_type=zstd --num=10000000 --compression_checksum=..
[FillSeq checksum=0]
fillseq [AVG 10 runs] : 281635 (± 31711) ops/sec; 31.2 (± 3.5) MB/sec
fillseq [MEDIAN 10 runs] : 294027 ops/sec; 32.5 MB/sec
[FillSeq checksum=1]
fillseq [AVG 10 runs] : 286961 (± 34700) ops/sec; 31.7 (± 3.8) MB/sec
fillseq [MEDIAN 10 runs] : 283278 ops/sec; 31.3 MB/sec
Read perf
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=readrandom[-X20] --num=100000000 --reads=1000000 --use_existing_db=true --readonly=1
[Readrandom checksum=1]
readrandom [AVG 20 runs] : 360928 (± 3579) ops/sec; 4.0 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 362468 ops/sec; 4.0 MB/sec
[Readrandom checksum=0]
readrandom [AVG 20 runs] : 380365 (± 2384) ops/sec; 4.2 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 379800 ops/sec; 4.2 MB/sec
Compression
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=compress[-X20] --compression_type=zstd --num=100000000 --compression_checksum=1
checksum=1
compress [AVG 20 runs] : 54074 (± 634) ops/sec; 211.2 (± 2.5) MB/sec
compress [MEDIAN 20 runs] : 54396 ops/sec; 212.5 MB/sec
checksum=0
compress [AVG 20 runs] : 54598 (± 393) ops/sec; 213.3 (± 1.5) MB/sec
compress [MEDIAN 20 runs] : 54592 ops/sec; 213.3 MB/sec
Decompression:
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=uncompress[-X20] --compression_type=zstd --compression_checksum=1
checksum = 0
uncompress [AVG 20 runs] : 167499 (± 962) ops/sec; 654.3 (± 3.8) MB/sec
uncompress [MEDIAN 20 runs] : 167210 ops/sec; 653.2 MB/sec
checksum = 1
uncompress [AVG 20 runs] : 167980 (± 924) ops/sec; 656.2 (± 3.6) MB/sec
uncompress [MEDIAN 20 runs] : 168465 ops/sec; 658.1 MB/sec
```
Reviewed By: ajkr
Differential Revision: D48019378
Pulled By: cbi42
fbshipit-source-id: 674120c6e1853c2ced1436ac8138559d0204feba
2023-08-18 22:01:59 +00:00
|
|
|
allocator, error_message);
|
2020-09-25 16:00:29 +00:00
|
|
|
default:
|
|
|
|
return CacheAllocationPtr();
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2022-02-18 00:18:01 +00:00
|
|
|
// Records the compression type for subsequent WAL records.
|
|
|
|
class CompressionTypeRecord {
|
|
|
|
public:
|
|
|
|
explicit CompressionTypeRecord(CompressionType compression_type)
|
|
|
|
: compression_type_(compression_type) {}
|
|
|
|
|
|
|
|
CompressionType GetCompressionType() const { return compression_type_; }
|
|
|
|
|
|
|
|
inline void EncodeTo(std::string* dst) const {
|
|
|
|
assert(dst != nullptr);
|
|
|
|
PutFixed32(dst, compression_type_);
|
|
|
|
}
|
|
|
|
|
|
|
|
inline Status DecodeFrom(Slice* src) {
|
|
|
|
constexpr char class_name[] = "CompressionTypeRecord";
|
|
|
|
|
|
|
|
uint32_t val;
|
|
|
|
if (!GetFixed32(src, &val)) {
|
|
|
|
return Status::Corruption(class_name,
|
|
|
|
"Error decoding WAL compression type");
|
|
|
|
}
|
|
|
|
CompressionType compression_type = static_cast<CompressionType>(val);
|
|
|
|
if (!StreamingCompressionTypeSupported(compression_type)) {
|
|
|
|
return Status::Corruption(class_name,
|
|
|
|
"WAL compression type not supported");
|
|
|
|
}
|
|
|
|
compression_type_ = compression_type;
|
|
|
|
return Status::OK();
|
|
|
|
}
|
|
|
|
|
|
|
|
inline std::string DebugString() const {
|
|
|
|
return "compression_type: " + CompressionTypeToString(compression_type_);
|
|
|
|
}
|
|
|
|
|
|
|
|
private:
|
|
|
|
CompressionType compression_type_;
|
|
|
|
};
|
|
|
|
|
2022-02-24 07:45:04 +00:00
|
|
|
// Base class to implement compression for a stream of buffers.
|
|
|
|
// Instantiate an implementation of the class using Create() with the
|
|
|
|
// compression type and use Compress() repeatedly.
|
|
|
|
// The output buffer needs to be at least max_output_len.
|
|
|
|
// Call Reset() in between frame boundaries or in case of an error.
|
|
|
|
// NOTE: This class is not thread safe.
|
|
|
|
class StreamingCompress {
|
|
|
|
public:
|
|
|
|
StreamingCompress(CompressionType compression_type,
|
|
|
|
const CompressionOptions& opts,
|
|
|
|
uint32_t compress_format_version, size_t max_output_len)
|
|
|
|
: compression_type_(compression_type),
|
|
|
|
opts_(opts),
|
|
|
|
compress_format_version_(compress_format_version),
|
|
|
|
max_output_len_(max_output_len) {}
|
|
|
|
virtual ~StreamingCompress() = default;
|
|
|
|
// compress should be called repeatedly with the same input till the method
|
|
|
|
// returns 0
|
|
|
|
// Parameters:
|
|
|
|
// input - buffer to compress
|
|
|
|
// input_size - size of input buffer
|
|
|
|
// output - compressed buffer allocated by caller, should be at least
|
|
|
|
// max_output_len
|
|
|
|
// output_size - size of the output buffer
|
|
|
|
// Returns -1 for errors, the remaining size of the input buffer that needs to
|
|
|
|
// be compressed
|
|
|
|
virtual int Compress(const char* input, size_t input_size, char* output,
|
2022-03-09 23:49:53 +00:00
|
|
|
size_t* output_pos) = 0;
|
2022-02-24 07:45:04 +00:00
|
|
|
// static method to create object of a class inherited from StreamingCompress
|
|
|
|
// based on the actual compression type.
|
|
|
|
static StreamingCompress* Create(CompressionType compression_type,
|
|
|
|
const CompressionOptions& opts,
|
|
|
|
uint32_t compress_format_version,
|
|
|
|
size_t max_output_len);
|
|
|
|
virtual void Reset() = 0;
|
|
|
|
|
|
|
|
protected:
|
|
|
|
const CompressionType compression_type_;
|
|
|
|
const CompressionOptions opts_;
|
|
|
|
const uint32_t compress_format_version_;
|
|
|
|
const size_t max_output_len_;
|
|
|
|
};
|
|
|
|
|
|
|
|
// Base class to uncompress a stream of compressed buffers.
|
|
|
|
// Instantiate an implementation of the class using Create() with the
|
|
|
|
// compression type and use Uncompress() repeatedly.
|
|
|
|
// The output buffer needs to be at least max_output_len.
|
|
|
|
// Call Reset() in between frame boundaries or in case of an error.
|
|
|
|
// NOTE: This class is not thread safe.
|
|
|
|
class StreamingUncompress {
|
|
|
|
public:
|
|
|
|
StreamingUncompress(CompressionType compression_type,
|
|
|
|
uint32_t compress_format_version, size_t max_output_len)
|
|
|
|
: compression_type_(compression_type),
|
|
|
|
compress_format_version_(compress_format_version),
|
|
|
|
max_output_len_(max_output_len) {}
|
|
|
|
virtual ~StreamingUncompress() = default;
|
2023-02-08 20:05:49 +00:00
|
|
|
// Uncompress can be called repeatedly to progressively process the same
|
|
|
|
// input buffer, or can be called with a new input buffer. When the input
|
|
|
|
// buffer is not fully consumed, the return value is > 0 or output_size
|
|
|
|
// == max_output_len. When calling uncompress to continue processing the
|
|
|
|
// same input buffer, the input argument should be nullptr.
|
2022-02-24 07:45:04 +00:00
|
|
|
// Parameters:
|
|
|
|
// input - buffer to uncompress
|
|
|
|
// input_size - size of input buffer
|
|
|
|
// output - uncompressed buffer allocated by caller, should be at least
|
|
|
|
// max_output_len
|
|
|
|
// output_size - size of the output buffer
|
|
|
|
// Returns -1 for errors, remaining input to be processed otherwise.
|
|
|
|
virtual int Uncompress(const char* input, size_t input_size, char* output,
|
2022-03-09 23:49:53 +00:00
|
|
|
size_t* output_pos) = 0;
|
2022-02-24 07:45:04 +00:00
|
|
|
static StreamingUncompress* Create(CompressionType compression_type,
|
|
|
|
uint32_t compress_format_version,
|
|
|
|
size_t max_output_len);
|
|
|
|
virtual void Reset() = 0;
|
|
|
|
|
|
|
|
protected:
|
|
|
|
CompressionType compression_type_;
|
|
|
|
uint32_t compress_format_version_;
|
|
|
|
size_t max_output_len_;
|
|
|
|
};
|
|
|
|
|
|
|
|
class ZSTDStreamingCompress final : public StreamingCompress {
|
|
|
|
public:
|
|
|
|
explicit ZSTDStreamingCompress(const CompressionOptions& opts,
|
|
|
|
uint32_t compress_format_version,
|
|
|
|
size_t max_output_len)
|
|
|
|
: StreamingCompress(kZSTD, opts, compress_format_version,
|
|
|
|
max_output_len) {
|
Add `CompressionOptions::checksum` for enabling ZSTD checksum (#11666)
Summary:
Optionally enable zstd checksum flag (https://github.com/facebook/zstd/blob/d857369028d997c92ff1f1861a4d7f679a125464/lib/zstd.h#L428) to detect corruption during decompression. Main changes are in compression.h:
* User can set CompressionOptions::checksum to true to enable this feature.
* We enable this feature in ZSTD by setting the checksum flag in ZSTD compression context: `ZSTD_CCtx`.
* Uses `ZSTD_compress2()` to do compression since it supports frame parameter like the checksum flag. Compression level is also set in compression context as a flag.
* Error handling during decompression to propagate error message from ZSTD.
* Updated microbench to test read performance impact.
About compatibility, the current compression decoders should continue to work with the data created by the new compression API `ZSTD_compress2()`: https://github.com/facebook/zstd/issues/3711.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11666
Test Plan:
* Existing unit tests for zstd compression
* Add unit test `DBTest2.ZSTDChecksum` to test the corruption case
* Manually tested that compression levels, parallel compression, dictionary compression, index compression all work with the new ZSTD_compress2() API.
* Manually tested with `sst_dump --command=recompress` that different compression levels and dictionary compression settings all work.
* Manually tested compiling with older versions of ZSTD: v1.3.8, v1.1.0, v0.6.2.
* Perf impact: from public benchmark data: http://fastcompression.blogspot.com/2019/03/presenting-xxh3.html for checksum and https://github.com/facebook/zstd#benchmarks, if decompression is 1700MB/s and checksum computation is 70000MB/s, checksum computation is an additional ~2.4% time for decompression. Compression is slower and checksumming should be less noticeable.
* Microbench:
```
TEST_TMPDIR=/dev/shm ./branch_db_basic_bench --benchmark_filter=DBGet/comp_style:0/max_data:1048576/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:0/compression_type:7/compression_checksum:1/no_blockcache:1/iterations:10000/threads:1 --benchmark_repetitions=100
Min out of 100 runs:
Main:
10390 10436 10456 10484 10499 10535 10544 10545 10565 10568
After this PR, checksum=false
10285 10397 10503 10508 10515 10557 10562 10635 10640 10660
After this PR, checksum=true
10827 10876 10925 10949 10971 11052 11061 11063 11100 11109
```
* db_bench:
```
Write perf
TEST_TMPDIR=/dev/shm/ ./db_bench_ichecksum --benchmarks=fillseq[-X10] --compression_type=zstd --num=10000000 --compression_checksum=..
[FillSeq checksum=0]
fillseq [AVG 10 runs] : 281635 (± 31711) ops/sec; 31.2 (± 3.5) MB/sec
fillseq [MEDIAN 10 runs] : 294027 ops/sec; 32.5 MB/sec
[FillSeq checksum=1]
fillseq [AVG 10 runs] : 286961 (± 34700) ops/sec; 31.7 (± 3.8) MB/sec
fillseq [MEDIAN 10 runs] : 283278 ops/sec; 31.3 MB/sec
Read perf
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=readrandom[-X20] --num=100000000 --reads=1000000 --use_existing_db=true --readonly=1
[Readrandom checksum=1]
readrandom [AVG 20 runs] : 360928 (± 3579) ops/sec; 4.0 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 362468 ops/sec; 4.0 MB/sec
[Readrandom checksum=0]
readrandom [AVG 20 runs] : 380365 (± 2384) ops/sec; 4.2 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 379800 ops/sec; 4.2 MB/sec
Compression
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=compress[-X20] --compression_type=zstd --num=100000000 --compression_checksum=1
checksum=1
compress [AVG 20 runs] : 54074 (± 634) ops/sec; 211.2 (± 2.5) MB/sec
compress [MEDIAN 20 runs] : 54396 ops/sec; 212.5 MB/sec
checksum=0
compress [AVG 20 runs] : 54598 (± 393) ops/sec; 213.3 (± 1.5) MB/sec
compress [MEDIAN 20 runs] : 54592 ops/sec; 213.3 MB/sec
Decompression:
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=uncompress[-X20] --compression_type=zstd --compression_checksum=1
checksum = 0
uncompress [AVG 20 runs] : 167499 (± 962) ops/sec; 654.3 (± 3.8) MB/sec
uncompress [MEDIAN 20 runs] : 167210 ops/sec; 653.2 MB/sec
checksum = 1
uncompress [AVG 20 runs] : 167980 (± 924) ops/sec; 656.2 (± 3.6) MB/sec
uncompress [MEDIAN 20 runs] : 168465 ops/sec; 658.1 MB/sec
```
Reviewed By: ajkr
Differential Revision: D48019378
Pulled By: cbi42
fbshipit-source-id: 674120c6e1853c2ced1436ac8138559d0204feba
2023-08-18 22:01:59 +00:00
|
|
|
#ifdef ZSTD_ADVANCED
|
2022-02-24 07:45:04 +00:00
|
|
|
cctx_ = ZSTD_createCCtx();
|
2022-07-13 22:29:20 +00:00
|
|
|
// Each compressed frame will have a checksum
|
|
|
|
ZSTD_CCtx_setParameter(cctx_, ZSTD_c_checksumFlag, 1);
|
2022-02-24 07:45:04 +00:00
|
|
|
assert(cctx_ != nullptr);
|
|
|
|
input_buffer_ = {/*src=*/nullptr, /*size=*/0, /*pos=*/0};
|
|
|
|
#endif
|
|
|
|
}
|
|
|
|
~ZSTDStreamingCompress() override {
|
Add `CompressionOptions::checksum` for enabling ZSTD checksum (#11666)
Summary:
Optionally enable zstd checksum flag (https://github.com/facebook/zstd/blob/d857369028d997c92ff1f1861a4d7f679a125464/lib/zstd.h#L428) to detect corruption during decompression. Main changes are in compression.h:
* User can set CompressionOptions::checksum to true to enable this feature.
* We enable this feature in ZSTD by setting the checksum flag in ZSTD compression context: `ZSTD_CCtx`.
* Uses `ZSTD_compress2()` to do compression since it supports frame parameter like the checksum flag. Compression level is also set in compression context as a flag.
* Error handling during decompression to propagate error message from ZSTD.
* Updated microbench to test read performance impact.
About compatibility, the current compression decoders should continue to work with the data created by the new compression API `ZSTD_compress2()`: https://github.com/facebook/zstd/issues/3711.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11666
Test Plan:
* Existing unit tests for zstd compression
* Add unit test `DBTest2.ZSTDChecksum` to test the corruption case
* Manually tested that compression levels, parallel compression, dictionary compression, index compression all work with the new ZSTD_compress2() API.
* Manually tested with `sst_dump --command=recompress` that different compression levels and dictionary compression settings all work.
* Manually tested compiling with older versions of ZSTD: v1.3.8, v1.1.0, v0.6.2.
* Perf impact: from public benchmark data: http://fastcompression.blogspot.com/2019/03/presenting-xxh3.html for checksum and https://github.com/facebook/zstd#benchmarks, if decompression is 1700MB/s and checksum computation is 70000MB/s, checksum computation is an additional ~2.4% time for decompression. Compression is slower and checksumming should be less noticeable.
* Microbench:
```
TEST_TMPDIR=/dev/shm ./branch_db_basic_bench --benchmark_filter=DBGet/comp_style:0/max_data:1048576/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:0/compression_type:7/compression_checksum:1/no_blockcache:1/iterations:10000/threads:1 --benchmark_repetitions=100
Min out of 100 runs:
Main:
10390 10436 10456 10484 10499 10535 10544 10545 10565 10568
After this PR, checksum=false
10285 10397 10503 10508 10515 10557 10562 10635 10640 10660
After this PR, checksum=true
10827 10876 10925 10949 10971 11052 11061 11063 11100 11109
```
* db_bench:
```
Write perf
TEST_TMPDIR=/dev/shm/ ./db_bench_ichecksum --benchmarks=fillseq[-X10] --compression_type=zstd --num=10000000 --compression_checksum=..
[FillSeq checksum=0]
fillseq [AVG 10 runs] : 281635 (± 31711) ops/sec; 31.2 (± 3.5) MB/sec
fillseq [MEDIAN 10 runs] : 294027 ops/sec; 32.5 MB/sec
[FillSeq checksum=1]
fillseq [AVG 10 runs] : 286961 (± 34700) ops/sec; 31.7 (± 3.8) MB/sec
fillseq [MEDIAN 10 runs] : 283278 ops/sec; 31.3 MB/sec
Read perf
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=readrandom[-X20] --num=100000000 --reads=1000000 --use_existing_db=true --readonly=1
[Readrandom checksum=1]
readrandom [AVG 20 runs] : 360928 (± 3579) ops/sec; 4.0 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 362468 ops/sec; 4.0 MB/sec
[Readrandom checksum=0]
readrandom [AVG 20 runs] : 380365 (± 2384) ops/sec; 4.2 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 379800 ops/sec; 4.2 MB/sec
Compression
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=compress[-X20] --compression_type=zstd --num=100000000 --compression_checksum=1
checksum=1
compress [AVG 20 runs] : 54074 (± 634) ops/sec; 211.2 (± 2.5) MB/sec
compress [MEDIAN 20 runs] : 54396 ops/sec; 212.5 MB/sec
checksum=0
compress [AVG 20 runs] : 54598 (± 393) ops/sec; 213.3 (± 1.5) MB/sec
compress [MEDIAN 20 runs] : 54592 ops/sec; 213.3 MB/sec
Decompression:
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=uncompress[-X20] --compression_type=zstd --compression_checksum=1
checksum = 0
uncompress [AVG 20 runs] : 167499 (± 962) ops/sec; 654.3 (± 3.8) MB/sec
uncompress [MEDIAN 20 runs] : 167210 ops/sec; 653.2 MB/sec
checksum = 1
uncompress [AVG 20 runs] : 167980 (± 924) ops/sec; 656.2 (± 3.6) MB/sec
uncompress [MEDIAN 20 runs] : 168465 ops/sec; 658.1 MB/sec
```
Reviewed By: ajkr
Differential Revision: D48019378
Pulled By: cbi42
fbshipit-source-id: 674120c6e1853c2ced1436ac8138559d0204feba
2023-08-18 22:01:59 +00:00
|
|
|
#ifdef ZSTD_ADVANCED
|
2022-02-24 07:45:04 +00:00
|
|
|
ZSTD_freeCCtx(cctx_);
|
|
|
|
#endif
|
|
|
|
}
|
|
|
|
int Compress(const char* input, size_t input_size, char* output,
|
2022-03-09 23:49:53 +00:00
|
|
|
size_t* output_pos) override;
|
2022-02-24 07:45:04 +00:00
|
|
|
void Reset() override;
|
Add `CompressionOptions::checksum` for enabling ZSTD checksum (#11666)
Summary:
Optionally enable zstd checksum flag (https://github.com/facebook/zstd/blob/d857369028d997c92ff1f1861a4d7f679a125464/lib/zstd.h#L428) to detect corruption during decompression. Main changes are in compression.h:
* User can set CompressionOptions::checksum to true to enable this feature.
* We enable this feature in ZSTD by setting the checksum flag in ZSTD compression context: `ZSTD_CCtx`.
* Uses `ZSTD_compress2()` to do compression since it supports frame parameter like the checksum flag. Compression level is also set in compression context as a flag.
* Error handling during decompression to propagate error message from ZSTD.
* Updated microbench to test read performance impact.
About compatibility, the current compression decoders should continue to work with the data created by the new compression API `ZSTD_compress2()`: https://github.com/facebook/zstd/issues/3711.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11666
Test Plan:
* Existing unit tests for zstd compression
* Add unit test `DBTest2.ZSTDChecksum` to test the corruption case
* Manually tested that compression levels, parallel compression, dictionary compression, index compression all work with the new ZSTD_compress2() API.
* Manually tested with `sst_dump --command=recompress` that different compression levels and dictionary compression settings all work.
* Manually tested compiling with older versions of ZSTD: v1.3.8, v1.1.0, v0.6.2.
* Perf impact: from public benchmark data: http://fastcompression.blogspot.com/2019/03/presenting-xxh3.html for checksum and https://github.com/facebook/zstd#benchmarks, if decompression is 1700MB/s and checksum computation is 70000MB/s, checksum computation is an additional ~2.4% time for decompression. Compression is slower and checksumming should be less noticeable.
* Microbench:
```
TEST_TMPDIR=/dev/shm ./branch_db_basic_bench --benchmark_filter=DBGet/comp_style:0/max_data:1048576/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:0/compression_type:7/compression_checksum:1/no_blockcache:1/iterations:10000/threads:1 --benchmark_repetitions=100
Min out of 100 runs:
Main:
10390 10436 10456 10484 10499 10535 10544 10545 10565 10568
After this PR, checksum=false
10285 10397 10503 10508 10515 10557 10562 10635 10640 10660
After this PR, checksum=true
10827 10876 10925 10949 10971 11052 11061 11063 11100 11109
```
* db_bench:
```
Write perf
TEST_TMPDIR=/dev/shm/ ./db_bench_ichecksum --benchmarks=fillseq[-X10] --compression_type=zstd --num=10000000 --compression_checksum=..
[FillSeq checksum=0]
fillseq [AVG 10 runs] : 281635 (± 31711) ops/sec; 31.2 (± 3.5) MB/sec
fillseq [MEDIAN 10 runs] : 294027 ops/sec; 32.5 MB/sec
[FillSeq checksum=1]
fillseq [AVG 10 runs] : 286961 (± 34700) ops/sec; 31.7 (± 3.8) MB/sec
fillseq [MEDIAN 10 runs] : 283278 ops/sec; 31.3 MB/sec
Read perf
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=readrandom[-X20] --num=100000000 --reads=1000000 --use_existing_db=true --readonly=1
[Readrandom checksum=1]
readrandom [AVG 20 runs] : 360928 (± 3579) ops/sec; 4.0 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 362468 ops/sec; 4.0 MB/sec
[Readrandom checksum=0]
readrandom [AVG 20 runs] : 380365 (± 2384) ops/sec; 4.2 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 379800 ops/sec; 4.2 MB/sec
Compression
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=compress[-X20] --compression_type=zstd --num=100000000 --compression_checksum=1
checksum=1
compress [AVG 20 runs] : 54074 (± 634) ops/sec; 211.2 (± 2.5) MB/sec
compress [MEDIAN 20 runs] : 54396 ops/sec; 212.5 MB/sec
checksum=0
compress [AVG 20 runs] : 54598 (± 393) ops/sec; 213.3 (± 1.5) MB/sec
compress [MEDIAN 20 runs] : 54592 ops/sec; 213.3 MB/sec
Decompression:
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=uncompress[-X20] --compression_type=zstd --compression_checksum=1
checksum = 0
uncompress [AVG 20 runs] : 167499 (± 962) ops/sec; 654.3 (± 3.8) MB/sec
uncompress [MEDIAN 20 runs] : 167210 ops/sec; 653.2 MB/sec
checksum = 1
uncompress [AVG 20 runs] : 167980 (± 924) ops/sec; 656.2 (± 3.6) MB/sec
uncompress [MEDIAN 20 runs] : 168465 ops/sec; 658.1 MB/sec
```
Reviewed By: ajkr
Differential Revision: D48019378
Pulled By: cbi42
fbshipit-source-id: 674120c6e1853c2ced1436ac8138559d0204feba
2023-08-18 22:01:59 +00:00
|
|
|
#ifdef ZSTD_ADVANCED
|
2022-02-24 07:45:04 +00:00
|
|
|
ZSTD_CCtx* cctx_;
|
|
|
|
ZSTD_inBuffer input_buffer_;
|
|
|
|
#endif
|
|
|
|
};
|
|
|
|
|
|
|
|
class ZSTDStreamingUncompress final : public StreamingUncompress {
|
|
|
|
public:
|
|
|
|
explicit ZSTDStreamingUncompress(uint32_t compress_format_version,
|
|
|
|
size_t max_output_len)
|
|
|
|
: StreamingUncompress(kZSTD, compress_format_version, max_output_len) {
|
Add `CompressionOptions::checksum` for enabling ZSTD checksum (#11666)
Summary:
Optionally enable zstd checksum flag (https://github.com/facebook/zstd/blob/d857369028d997c92ff1f1861a4d7f679a125464/lib/zstd.h#L428) to detect corruption during decompression. Main changes are in compression.h:
* User can set CompressionOptions::checksum to true to enable this feature.
* We enable this feature in ZSTD by setting the checksum flag in ZSTD compression context: `ZSTD_CCtx`.
* Uses `ZSTD_compress2()` to do compression since it supports frame parameter like the checksum flag. Compression level is also set in compression context as a flag.
* Error handling during decompression to propagate error message from ZSTD.
* Updated microbench to test read performance impact.
About compatibility, the current compression decoders should continue to work with the data created by the new compression API `ZSTD_compress2()`: https://github.com/facebook/zstd/issues/3711.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11666
Test Plan:
* Existing unit tests for zstd compression
* Add unit test `DBTest2.ZSTDChecksum` to test the corruption case
* Manually tested that compression levels, parallel compression, dictionary compression, index compression all work with the new ZSTD_compress2() API.
* Manually tested with `sst_dump --command=recompress` that different compression levels and dictionary compression settings all work.
* Manually tested compiling with older versions of ZSTD: v1.3.8, v1.1.0, v0.6.2.
* Perf impact: from public benchmark data: http://fastcompression.blogspot.com/2019/03/presenting-xxh3.html for checksum and https://github.com/facebook/zstd#benchmarks, if decompression is 1700MB/s and checksum computation is 70000MB/s, checksum computation is an additional ~2.4% time for decompression. Compression is slower and checksumming should be less noticeable.
* Microbench:
```
TEST_TMPDIR=/dev/shm ./branch_db_basic_bench --benchmark_filter=DBGet/comp_style:0/max_data:1048576/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:0/compression_type:7/compression_checksum:1/no_blockcache:1/iterations:10000/threads:1 --benchmark_repetitions=100
Min out of 100 runs:
Main:
10390 10436 10456 10484 10499 10535 10544 10545 10565 10568
After this PR, checksum=false
10285 10397 10503 10508 10515 10557 10562 10635 10640 10660
After this PR, checksum=true
10827 10876 10925 10949 10971 11052 11061 11063 11100 11109
```
* db_bench:
```
Write perf
TEST_TMPDIR=/dev/shm/ ./db_bench_ichecksum --benchmarks=fillseq[-X10] --compression_type=zstd --num=10000000 --compression_checksum=..
[FillSeq checksum=0]
fillseq [AVG 10 runs] : 281635 (± 31711) ops/sec; 31.2 (± 3.5) MB/sec
fillseq [MEDIAN 10 runs] : 294027 ops/sec; 32.5 MB/sec
[FillSeq checksum=1]
fillseq [AVG 10 runs] : 286961 (± 34700) ops/sec; 31.7 (± 3.8) MB/sec
fillseq [MEDIAN 10 runs] : 283278 ops/sec; 31.3 MB/sec
Read perf
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=readrandom[-X20] --num=100000000 --reads=1000000 --use_existing_db=true --readonly=1
[Readrandom checksum=1]
readrandom [AVG 20 runs] : 360928 (± 3579) ops/sec; 4.0 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 362468 ops/sec; 4.0 MB/sec
[Readrandom checksum=0]
readrandom [AVG 20 runs] : 380365 (± 2384) ops/sec; 4.2 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 379800 ops/sec; 4.2 MB/sec
Compression
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=compress[-X20] --compression_type=zstd --num=100000000 --compression_checksum=1
checksum=1
compress [AVG 20 runs] : 54074 (± 634) ops/sec; 211.2 (± 2.5) MB/sec
compress [MEDIAN 20 runs] : 54396 ops/sec; 212.5 MB/sec
checksum=0
compress [AVG 20 runs] : 54598 (± 393) ops/sec; 213.3 (± 1.5) MB/sec
compress [MEDIAN 20 runs] : 54592 ops/sec; 213.3 MB/sec
Decompression:
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=uncompress[-X20] --compression_type=zstd --compression_checksum=1
checksum = 0
uncompress [AVG 20 runs] : 167499 (± 962) ops/sec; 654.3 (± 3.8) MB/sec
uncompress [MEDIAN 20 runs] : 167210 ops/sec; 653.2 MB/sec
checksum = 1
uncompress [AVG 20 runs] : 167980 (± 924) ops/sec; 656.2 (± 3.6) MB/sec
uncompress [MEDIAN 20 runs] : 168465 ops/sec; 658.1 MB/sec
```
Reviewed By: ajkr
Differential Revision: D48019378
Pulled By: cbi42
fbshipit-source-id: 674120c6e1853c2ced1436ac8138559d0204feba
2023-08-18 22:01:59 +00:00
|
|
|
#ifdef ZSTD_ADVANCED
|
2022-02-24 07:45:04 +00:00
|
|
|
dctx_ = ZSTD_createDCtx();
|
|
|
|
assert(dctx_ != nullptr);
|
|
|
|
input_buffer_ = {/*src=*/nullptr, /*size=*/0, /*pos=*/0};
|
|
|
|
#endif
|
|
|
|
}
|
|
|
|
~ZSTDStreamingUncompress() override {
|
Add `CompressionOptions::checksum` for enabling ZSTD checksum (#11666)
Summary:
Optionally enable zstd checksum flag (https://github.com/facebook/zstd/blob/d857369028d997c92ff1f1861a4d7f679a125464/lib/zstd.h#L428) to detect corruption during decompression. Main changes are in compression.h:
* User can set CompressionOptions::checksum to true to enable this feature.
* We enable this feature in ZSTD by setting the checksum flag in ZSTD compression context: `ZSTD_CCtx`.
* Uses `ZSTD_compress2()` to do compression since it supports frame parameter like the checksum flag. Compression level is also set in compression context as a flag.
* Error handling during decompression to propagate error message from ZSTD.
* Updated microbench to test read performance impact.
About compatibility, the current compression decoders should continue to work with the data created by the new compression API `ZSTD_compress2()`: https://github.com/facebook/zstd/issues/3711.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11666
Test Plan:
* Existing unit tests for zstd compression
* Add unit test `DBTest2.ZSTDChecksum` to test the corruption case
* Manually tested that compression levels, parallel compression, dictionary compression, index compression all work with the new ZSTD_compress2() API.
* Manually tested with `sst_dump --command=recompress` that different compression levels and dictionary compression settings all work.
* Manually tested compiling with older versions of ZSTD: v1.3.8, v1.1.0, v0.6.2.
* Perf impact: from public benchmark data: http://fastcompression.blogspot.com/2019/03/presenting-xxh3.html for checksum and https://github.com/facebook/zstd#benchmarks, if decompression is 1700MB/s and checksum computation is 70000MB/s, checksum computation is an additional ~2.4% time for decompression. Compression is slower and checksumming should be less noticeable.
* Microbench:
```
TEST_TMPDIR=/dev/shm ./branch_db_basic_bench --benchmark_filter=DBGet/comp_style:0/max_data:1048576/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:0/compression_type:7/compression_checksum:1/no_blockcache:1/iterations:10000/threads:1 --benchmark_repetitions=100
Min out of 100 runs:
Main:
10390 10436 10456 10484 10499 10535 10544 10545 10565 10568
After this PR, checksum=false
10285 10397 10503 10508 10515 10557 10562 10635 10640 10660
After this PR, checksum=true
10827 10876 10925 10949 10971 11052 11061 11063 11100 11109
```
* db_bench:
```
Write perf
TEST_TMPDIR=/dev/shm/ ./db_bench_ichecksum --benchmarks=fillseq[-X10] --compression_type=zstd --num=10000000 --compression_checksum=..
[FillSeq checksum=0]
fillseq [AVG 10 runs] : 281635 (± 31711) ops/sec; 31.2 (± 3.5) MB/sec
fillseq [MEDIAN 10 runs] : 294027 ops/sec; 32.5 MB/sec
[FillSeq checksum=1]
fillseq [AVG 10 runs] : 286961 (± 34700) ops/sec; 31.7 (± 3.8) MB/sec
fillseq [MEDIAN 10 runs] : 283278 ops/sec; 31.3 MB/sec
Read perf
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=readrandom[-X20] --num=100000000 --reads=1000000 --use_existing_db=true --readonly=1
[Readrandom checksum=1]
readrandom [AVG 20 runs] : 360928 (± 3579) ops/sec; 4.0 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 362468 ops/sec; 4.0 MB/sec
[Readrandom checksum=0]
readrandom [AVG 20 runs] : 380365 (± 2384) ops/sec; 4.2 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 379800 ops/sec; 4.2 MB/sec
Compression
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=compress[-X20] --compression_type=zstd --num=100000000 --compression_checksum=1
checksum=1
compress [AVG 20 runs] : 54074 (± 634) ops/sec; 211.2 (± 2.5) MB/sec
compress [MEDIAN 20 runs] : 54396 ops/sec; 212.5 MB/sec
checksum=0
compress [AVG 20 runs] : 54598 (± 393) ops/sec; 213.3 (± 1.5) MB/sec
compress [MEDIAN 20 runs] : 54592 ops/sec; 213.3 MB/sec
Decompression:
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=uncompress[-X20] --compression_type=zstd --compression_checksum=1
checksum = 0
uncompress [AVG 20 runs] : 167499 (± 962) ops/sec; 654.3 (± 3.8) MB/sec
uncompress [MEDIAN 20 runs] : 167210 ops/sec; 653.2 MB/sec
checksum = 1
uncompress [AVG 20 runs] : 167980 (± 924) ops/sec; 656.2 (± 3.6) MB/sec
uncompress [MEDIAN 20 runs] : 168465 ops/sec; 658.1 MB/sec
```
Reviewed By: ajkr
Differential Revision: D48019378
Pulled By: cbi42
fbshipit-source-id: 674120c6e1853c2ced1436ac8138559d0204feba
2023-08-18 22:01:59 +00:00
|
|
|
#ifdef ZSTD_ADVANCED
|
2022-02-24 07:45:04 +00:00
|
|
|
ZSTD_freeDCtx(dctx_);
|
|
|
|
#endif
|
|
|
|
}
|
|
|
|
int Uncompress(const char* input, size_t input_size, char* output,
|
|
|
|
size_t* output_size) override;
|
|
|
|
void Reset() override;
|
|
|
|
|
|
|
|
private:
|
Add `CompressionOptions::checksum` for enabling ZSTD checksum (#11666)
Summary:
Optionally enable zstd checksum flag (https://github.com/facebook/zstd/blob/d857369028d997c92ff1f1861a4d7f679a125464/lib/zstd.h#L428) to detect corruption during decompression. Main changes are in compression.h:
* User can set CompressionOptions::checksum to true to enable this feature.
* We enable this feature in ZSTD by setting the checksum flag in ZSTD compression context: `ZSTD_CCtx`.
* Uses `ZSTD_compress2()` to do compression since it supports frame parameter like the checksum flag. Compression level is also set in compression context as a flag.
* Error handling during decompression to propagate error message from ZSTD.
* Updated microbench to test read performance impact.
About compatibility, the current compression decoders should continue to work with the data created by the new compression API `ZSTD_compress2()`: https://github.com/facebook/zstd/issues/3711.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11666
Test Plan:
* Existing unit tests for zstd compression
* Add unit test `DBTest2.ZSTDChecksum` to test the corruption case
* Manually tested that compression levels, parallel compression, dictionary compression, index compression all work with the new ZSTD_compress2() API.
* Manually tested with `sst_dump --command=recompress` that different compression levels and dictionary compression settings all work.
* Manually tested compiling with older versions of ZSTD: v1.3.8, v1.1.0, v0.6.2.
* Perf impact: from public benchmark data: http://fastcompression.blogspot.com/2019/03/presenting-xxh3.html for checksum and https://github.com/facebook/zstd#benchmarks, if decompression is 1700MB/s and checksum computation is 70000MB/s, checksum computation is an additional ~2.4% time for decompression. Compression is slower and checksumming should be less noticeable.
* Microbench:
```
TEST_TMPDIR=/dev/shm ./branch_db_basic_bench --benchmark_filter=DBGet/comp_style:0/max_data:1048576/per_key_size:256/enable_statistics:0/negative_query:0/enable_filter:0/mmap:0/compression_type:7/compression_checksum:1/no_blockcache:1/iterations:10000/threads:1 --benchmark_repetitions=100
Min out of 100 runs:
Main:
10390 10436 10456 10484 10499 10535 10544 10545 10565 10568
After this PR, checksum=false
10285 10397 10503 10508 10515 10557 10562 10635 10640 10660
After this PR, checksum=true
10827 10876 10925 10949 10971 11052 11061 11063 11100 11109
```
* db_bench:
```
Write perf
TEST_TMPDIR=/dev/shm/ ./db_bench_ichecksum --benchmarks=fillseq[-X10] --compression_type=zstd --num=10000000 --compression_checksum=..
[FillSeq checksum=0]
fillseq [AVG 10 runs] : 281635 (± 31711) ops/sec; 31.2 (± 3.5) MB/sec
fillseq [MEDIAN 10 runs] : 294027 ops/sec; 32.5 MB/sec
[FillSeq checksum=1]
fillseq [AVG 10 runs] : 286961 (± 34700) ops/sec; 31.7 (± 3.8) MB/sec
fillseq [MEDIAN 10 runs] : 283278 ops/sec; 31.3 MB/sec
Read perf
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=readrandom[-X20] --num=100000000 --reads=1000000 --use_existing_db=true --readonly=1
[Readrandom checksum=1]
readrandom [AVG 20 runs] : 360928 (± 3579) ops/sec; 4.0 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 362468 ops/sec; 4.0 MB/sec
[Readrandom checksum=0]
readrandom [AVG 20 runs] : 380365 (± 2384) ops/sec; 4.2 (± 0.0) MB/sec
readrandom [MEDIAN 20 runs] : 379800 ops/sec; 4.2 MB/sec
Compression
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=compress[-X20] --compression_type=zstd --num=100000000 --compression_checksum=1
checksum=1
compress [AVG 20 runs] : 54074 (± 634) ops/sec; 211.2 (± 2.5) MB/sec
compress [MEDIAN 20 runs] : 54396 ops/sec; 212.5 MB/sec
checksum=0
compress [AVG 20 runs] : 54598 (± 393) ops/sec; 213.3 (± 1.5) MB/sec
compress [MEDIAN 20 runs] : 54592 ops/sec; 213.3 MB/sec
Decompression:
TEST_TMPDIR=/dev/shm ./db_bench_ichecksum --benchmarks=uncompress[-X20] --compression_type=zstd --compression_checksum=1
checksum = 0
uncompress [AVG 20 runs] : 167499 (± 962) ops/sec; 654.3 (± 3.8) MB/sec
uncompress [MEDIAN 20 runs] : 167210 ops/sec; 653.2 MB/sec
checksum = 1
uncompress [AVG 20 runs] : 167980 (± 924) ops/sec; 656.2 (± 3.6) MB/sec
uncompress [MEDIAN 20 runs] : 168465 ops/sec; 658.1 MB/sec
```
Reviewed By: ajkr
Differential Revision: D48019378
Pulled By: cbi42
fbshipit-source-id: 674120c6e1853c2ced1436ac8138559d0204feba
2023-08-18 22:01:59 +00:00
|
|
|
#ifdef ZSTD_ADVANCED
|
2022-02-24 07:45:04 +00:00
|
|
|
ZSTD_DCtx* dctx_;
|
|
|
|
ZSTD_inBuffer input_buffer_;
|
|
|
|
#endif
|
|
|
|
};
|
|
|
|
|
2020-02-20 20:07:53 +00:00
|
|
|
} // namespace ROCKSDB_NAMESPACE
|