rocksdb/util
Peter Dillinger d79be3dca2 Changes and enhancements to compression stats, thresholds (#11388)
Summary:
## Option API updates
* Add new CompressionOptions::max_compressed_bytes_per_kb, which corresponds to 1024.0 divided by the minimum allowable compression ratio. This makes configurable what was previously a hard-coded minimum ratio of 8/7.
* Remove unnecessary constructor for CompressionOptions.
* Document undocumented CompressionOptions. Use idiom for default values shown clearly in one place (not precariously repeated).
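The relationship between a minimum compression ratio and the new bytes-per-KiB limit can be sketched as below. Both helpers are hypothetical illustrations and not RocksDB functions. The rounding to the nearest integer and the choice to accept a block whose compressed size lands exactly on the limit are assumptions of this sketch, not confirmed details of the builder's check.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper (not a RocksDB API): convert a minimum acceptable
// compression ratio (uncompressed size / compressed size) into a limit on
// compressed bytes per 1024 uncompressed bytes, rounded to the nearest int.
inline int MaxCompressedBytesPerKbFromRatio(double min_ratio) {
  assert(min_ratio > 0.0);
  return static_cast<int>(std::lround(1024.0 / min_ratio));
}

// Hypothetical acceptance check (not RocksDB's code): keep the compressed
// output only if compressed_size / uncompressed_size does not exceed
// max_compressed_bytes_per_kb / 1024. Cross-multiplying keeps the comparison
// in integer arithmetic.
inline bool KeepCompressed(uint64_t uncompressed_size,
                           uint64_t compressed_size,
                           int max_compressed_bytes_per_kb) {
  assert(max_compressed_bytes_per_kb > 0);
  return compressed_size * 1024 <=
         uncompressed_size *
             static_cast<uint64_t>(max_compressed_bytes_per_kb);
}
```

For example, `MaxCompressedBytesPerKbFromRatio(8.0 / 7.0)` returns 896, the limit equivalent to the old hard-coded 8/7 ratio. Under that limit, a 4096-byte block is kept compressed only if it compresses to at most 3584 bytes.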

## Stat API updates
* Deprecate the BYTES_COMPRESSED, BYTES_DECOMPRESSED histograms. Histograms incur substantial extra space & time costs compared to tickers, and the distribution of uncompressed data block sizes tends to be uninteresting. If we're interested in that distribution, I don't see why it should be limited to blocks stored as compressed.
* Deprecate the NUMBER_BLOCK_NOT_COMPRESSED ticker, because the name is very confusing.
* New or existing tickers relevant to compression:
  * BYTES_COMPRESSED_FROM
  * BYTES_COMPRESSED_TO
  * BYTES_COMPRESSION_BYPASSED
  * BYTES_COMPRESSION_REJECTED
  * COMPACT_WRITE_BYTES + FLUSH_WRITE_BYTES (both existing)
  * NUMBER_BLOCK_COMPRESSED (existing)
  * NUMBER_BLOCK_COMPRESSION_BYPASSED
  * NUMBER_BLOCK_COMPRESSION_REJECTED
  * BYTES_DECOMPRESSED_FROM
  * BYTES_DECOMPRESSED_TO
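The claim above, that histograms cost more than tickers, can be illustrated with a simplified, single-threaded model. This is an assumption-laden sketch and not RocksDB's statistics implementation: it ignores concurrency and per-core storage, and the power-of-two bucket layout is hypothetical. It shows that a ticker is one counter updated by one addition, while a histogram carries count, sum, min, max, and a bucket array, and each recorded value needs a bucket search plus several updates.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified model of a ticker: a single counter; recording is one add.
struct TickerModel {
  uint64_t count = 0;
  void Record(uint64_t n) { count += n; }
};

// Simplified model of a histogram (hypothetical layout): bucket 0 holds the
// values 0 and 1, bucket i >= 1 holds [2^i, 2^(i+1)), and values beyond the
// last bound are clamped into the last bucket.
template <std::size_t kNumBuckets>
struct HistogramModel {
  uint64_t num = 0, sum = 0, min = UINT64_MAX, max = 0;
  std::array<uint64_t, kNumBuckets> buckets{};  // zero-initialized

  static std::size_t BucketFor(uint64_t v) {
    std::size_t b = 0;
    // Move up while v reaches the next bucket's lower bound 2^(b+1).
    while (b + 1 < kNumBuckets && b + 1 < 64 && (v >> (b + 1)) != 0) ++b;
    return b;
  }

  // Each record touches several fields plus one bucket.
  void Record(uint64_t v) {
    ++num;
    sum += v;
    if (v < min) min = v;
    if (v > max) max = v;
    const std::size_t b = BucketFor(v);
    assert(b < kNumBuckets);
    ++buckets[b];
  }
};
```

With 32 buckets, this model already occupies many times the space of a ticker, before accounting for any per-thread or per-core replication a real implementation may use.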

We can compute a number of things with these stats:
* "Successful" compression ratio: BYTES_COMPRESSED_FROM / BYTES_COMPRESSED_TO
* Compression ratio of data on which compression was attempted: (BYTES_COMPRESSED_FROM + BYTES_COMPRESSION_REJECTED) / (BYTES_COMPRESSED_TO + BYTES_COMPRESSION_REJECTED)
* Compression ratio of data that could be eligible for compression: (BYTES_COMPRESSED_FROM + X) / (BYTES_COMPRESSED_TO + X) where X = BYTES_COMPRESSION_REJECTED + BYTES_COMPRESSION_BYPASSED
* Overall SST compression ratio (compression disabled vs. actual): (Y - BYTES_COMPRESSED_TO + BYTES_COMPRESSED_FROM) / Y where Y = COMPACT_WRITE_BYTES + FLUSH_WRITE_BYTES
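The derived ratios above can be computed from a snapshot of ticker values as in the sketch below. The `CompressionTickers` struct and the function names are hypothetical (reading the values from a live `Statistics` object is out of scope here). For the "eligible" ratio, this sketch takes X as the sum of the two byte-count tickers BYTES_COMPRESSION_REJECTED and BYTES_COMPRESSION_BYPASSED, since X is added to byte counts; returning 0.0 when no data has been recorded is also a choice of this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical snapshot of ticker values; field names mirror the tickers.
struct CompressionTickers {
  uint64_t bytes_compressed_from = 0;
  uint64_t bytes_compressed_to = 0;
  uint64_t bytes_compression_bypassed = 0;
  uint64_t bytes_compression_rejected = 0;
  uint64_t compact_write_bytes = 0;
  uint64_t flush_write_bytes = 0;
};

// Returns num / den, or 0.0 when den is 0 (nothing recorded yet).
inline double SafeRatio(uint64_t num, uint64_t den) {
  return den == 0 ? 0.0 : static_cast<double>(num) / static_cast<double>(den);
}

// "Successful" compression ratio.
inline double SuccessfulRatio(const CompressionTickers& t) {
  return SafeRatio(t.bytes_compressed_from, t.bytes_compressed_to);
}

// Data on which compression was attempted: rejected blocks are stored
// uncompressed, so their bytes count equally on both sides.
inline double AttemptedRatio(const CompressionTickers& t) {
  const uint64_t r = t.bytes_compression_rejected;
  return SafeRatio(t.bytes_compressed_from + r, t.bytes_compressed_to + r);
}

// Data eligible for compression: X = rejected bytes + bypassed bytes.
inline double EligibleRatio(const CompressionTickers& t) {
  const uint64_t x = t.bytes_compression_rejected + t.bytes_compression_bypassed;
  return SafeRatio(t.bytes_compressed_from + x, t.bytes_compressed_to + x);
}

// Overall SST ratio: estimated size with compression disabled vs. actual.
inline double OverallSstRatio(const CompressionTickers& t) {
  const uint64_t y = t.compact_write_bytes + t.flush_write_bytes;
  // Assumes compressed block bytes are included in the SST write bytes.
  assert(t.bytes_compressed_to <= y);
  // Add before subtracting to avoid unsigned underflow.
  return SafeRatio(y + t.bytes_compressed_from - t.bytes_compressed_to, y);
}
```

Because every ratio adds the same uncompressed-stored bytes to numerator and denominator, each successive ratio is pulled closer to 1.0 as more incompressible or bypassed data is included.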

Keeping _REJECTED separate from _BYPASSED helps us understand CPU time "wasted" on compression attempts whose output is discarded.
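The distinction can be sketched as a per-block classification that updates the block and byte counters. Everything here is a hypothetical model: the enum, the struct, `RecordBlock`, and its `attempted` flag are not RocksDB code, and the ratio test stands in for whatever check the real builder applies. A bypassed block costs no compression CPU; a rejected block paid for compression and then stored the uncompressed bytes anyway.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical outcome of handling one data block.
enum class BlockOutcome { kBypassed, kRejected, kCompressed };

// Hypothetical counters mirroring the compression tickers.
struct CompressionCounters {
  uint64_t number_block_compressed = 0;
  uint64_t number_block_compression_bypassed = 0;
  uint64_t number_block_compression_rejected = 0;
  uint64_t bytes_compressed_from = 0;
  uint64_t bytes_compressed_to = 0;
  uint64_t bytes_compression_bypassed = 0;
  uint64_t bytes_compression_rejected = 0;
};

// Records one block. `attempted` says whether a compressor actually ran;
// `compressed_size` is only meaningful when it did.
inline BlockOutcome RecordBlock(CompressionCounters& c, uint64_t raw_size,
                                bool attempted, uint64_t compressed_size,
                                int max_compressed_bytes_per_kb) {
  assert(max_compressed_bytes_per_kb > 0);
  if (!attempted) {
    // No compression CPU spent; stored uncompressed.
    ++c.number_block_compression_bypassed;
    c.bytes_compression_bypassed += raw_size;
    return BlockOutcome::kBypassed;
  }
  if (compressed_size * 1024 >
      raw_size * static_cast<uint64_t>(max_compressed_bytes_per_kb)) {
    // Compression ran but the result was too large: "wasted" CPU.
    ++c.number_block_compression_rejected;
    c.bytes_compression_rejected += raw_size;
    return BlockOutcome::kRejected;
  }
  ++c.number_block_compressed;
  c.bytes_compressed_from += raw_size;
  c.bytes_compressed_to += compressed_size;
  return BlockOutcome::kCompressed;
}
```

In this model, a high bytes_compression_rejected relative to bytes_compressed_from suggests that compression is often attempted on data that does not compress well enough to keep.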

## BlockBasedTableBuilder
Various small refactorings, optimizations, and name clean-ups.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11388

Test Plan:
unit tests added

* `options_settable_test.cc`: use non-deprecated idiom for configuring CompressionOptions from string. The old idiom is tested elsewhere and does not need to be updated to support the new field.

Reviewed By: ajkr

Differential Revision: D45128202

Pulled By: pdillinger

fbshipit-source-id: 5a652bf5c022b7ec340cf79018cccf0686962803
2023-04-21 21:57:40 -07:00
aligned_buffer.h remove dependency on options.h for port_posix.h and port_win.h (#11214) 2023-02-13 02:21:38 -08:00
async_file_reader.cc Return any errors returned by ReadAsync to the MultiGet caller (#11171) 2023-02-02 16:35:27 -08:00
async_file_reader.h Add some missing headers (#10519) 2022-08-11 12:45:50 -07:00
autovector.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
autovector_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
bloom_impl.h FilterPolicy API changes for 7.0 (#9501) 2022-02-08 13:56:46 -08:00
bloom_test.cc Fix an uninitialized variable warning for g++ 12.2.0 (#10995) 2022-11-30 19:27:28 -08:00
build_version.cc.in Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
cast_util.h More refactoring ahead of footer & meta changes (#9240) 2021-12-10 08:13:26 -08:00
channel.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
cleanable.cc Fix compile error in Clang 13 (#10033) 2022-05-28 00:15:28 -07:00
coding.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
coding.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
coding_lean.h New stable, fixed-length cache keys (#9126) 2021-12-16 17:15:13 -08:00
coding_test.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
compaction_job_stats_impl.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
comparator.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
compression.cc Fix bug in WAL streaming uncompression (#11198) 2023-02-08 12:05:49 -08:00
compression.h Fix bug in WAL streaming uncompression (#11198) 2023-02-08 12:05:49 -08:00
compression_context_cache.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
compression_context_cache.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
concurrent_task_limiter_impl.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
concurrent_task_limiter_impl.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
core_local.h remove dependency on options.h for port_posix.h and port_win.h (#11214) 2023-02-13 02:21:38 -08:00
coro_utils.h Add some missing headers (#10519) 2022-08-11 12:45:50 -07:00
crc32c.cc Fix use of crc32c 3way on portable builds using MSVC (#10667) 2022-11-08 11:56:55 -08:00
crc32c.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
crc32c_arm64.cc Add OpenBSD/arm64 support for detection of CRC32 and PMULL (#10902) 2022-11-02 14:35:27 -07:00
crc32c_arm64.h Manual interventions for clang-format util/ (#10870) 2022-10-26 12:08:20 -07:00
crc32c_ppc.c Fix Compilation on ppc64le using Clang 11 (#7713) 2020-12-01 11:21:44 -08:00
crc32c_ppc.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
crc32c_ppc_asm.S Fix Compilation on ppc64le using Clang 11 (#7713) 2020-12-01 11:21:44 -08:00
crc32c_ppc_constants.h Remove PATENTS text from a few straggler files (#5326) 2019-05-21 16:22:35 -07:00
crc32c_test.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
data_structure.cc Improve SmallEnumSet (#11178) 2023-02-08 20:14:57 -08:00
defer.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
defer_test.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
distributed_mutex.h remove dependency on options.h for port_posix.h and port_win.h (#11214) 2023-02-13 02:21:38 -08:00
duplicate_detector.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
dynamic_bloom.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
dynamic_bloom.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
dynamic_bloom_test.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
fastrange.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
file_checksum_helper.cc Group rocksdb.sst.read.micros stat by IOActivity flush and compaction (#11288) 2023-04-21 09:07:18 -07:00
file_checksum_helper.h remove dependency on options.h for port_posix.h and port_win.h (#11214) 2023-02-13 02:21:38 -08:00
file_reader_writer_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
filelock_test.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
filter_bench.cc Use user-provided ReadOptions for metadata block reads more often (#11208) 2023-04-04 16:53:14 -07:00
gflags_compat.h Fix gflags_compat.h (#11346) 2023-04-03 10:41:00 -07:00
hash.cc Experimental support for SST unique IDs (#8990) 2021-10-18 23:32:01 -07:00
hash.h Experimental support for SST unique IDs (#8990) 2021-10-18 23:32:01 -07:00
hash128.h Upgrade xxhash, add Hash128 (#8634) 2021-08-20 18:41:51 -07:00
hash_containers.h Meta-internal folly integration with F14FastMap (#9546) 2022-04-13 07:34:01 -07:00
hash_map.h Change HashMap::Insert()'s value to a const reference (#6567) 2020-03-20 14:59:54 -07:00
hash_test.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
heap.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
heap_test.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
kv_map.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
log_write_bench.cc Add a SystemClock class to capture the time functions of an Env (#7858) 2021-01-25 22:09:11 -08:00
math.h Derive cache keys from SST unique IDs (#10394) 2022-08-12 13:49:49 -07:00
math128.h Derive cache keys from SST unique IDs (#10394) 2022-08-12 13:49:49 -07:00
murmurhash.cc Manual interventions for clang-format util/ (#10870) 2022-10-26 12:08:20 -07:00
murmurhash.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
mutexlock.h remove dependency on options.h for port_posix.h and port_win.h (#11214) 2023-02-13 02:21:38 -08:00
ppc-opcode.h Remove PATENTS text from a few straggler files (#5326) 2019-05-21 16:22:35 -07:00
random.cc remove dependency on options.h for port_posix.h and port_win.h (#11214) 2023-02-13 02:21:38 -08:00
random.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
random_test.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
rate_limiter.cc Fix race conditions in GenericRateLimiter (#10374) 2022-07-19 09:31:14 -07:00
rate_limiter.h Fix race conditions in GenericRateLimiter (#10374) 2022-07-19 09:31:14 -07:00
rate_limiter_test.cc Print stack traces on frozen tests in CI (#10828) 2022-10-18 00:35:35 -07:00
repeatable_thread.h Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
repeatable_thread_test.cc Print stack traces on frozen tests in CI (#10828) 2022-10-18 00:35:35 -07:00
ribbon_alg.h Use only ASCII in source files (#10164) 2022-06-15 14:44:43 -07:00
ribbon_config.cc Refine Ribbon configuration, improve testing, add Homogeneous (#7879) 2021-02-26 08:50:42 -08:00
ribbon_config.h Refine Ribbon configuration, improve testing, add Homogeneous (#7879) 2021-02-26 08:50:42 -08:00
ribbon_impl.h Account Bloom/Ribbon filter construction memory in global memory limit (#9073) 2021-11-18 09:42:20 -08:00
ribbon_test.cc util/ribbon_test.cc: avoid ambiguous reversed operator error in c++20 (#11371) 2023-04-12 13:24:34 -07:00
set_comparator.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
single_thread_executor.h Add some missing headers (#10519) 2022-08-11 12:45:50 -07:00
slice.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
slice_test.cc Improve SmallEnumSet (#11178) 2023-02-08 20:14:57 -08:00
slice_transform_test.cc Print stack traces on frozen tests in CI (#10828) 2022-10-18 00:35:35 -07:00
status.cc Merge operator failed subcode (#11231) 2023-02-17 10:58:46 -08:00
stderr_logger.cc Fix an import issue in fbcode. (#10604) 2022-08-29 21:09:36 -07:00
stderr_logger.h Fix an import issue in fbcode. (#10604) 2022-08-29 21:09:36 -07:00
stop_watch.h Changes and enhancements to compression stats, thresholds (#11388) 2023-04-21 21:57:40 -07:00
string_util.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
string_util.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
thread_guard.h Introduce a ThreadGuard class and use it in ExternalSSTFileTest.PickedLevelBug (#8112) 2021-03-25 22:08:58 -07:00
thread_list_test.cc Group rocksdb.sst.read.micros stat by IOActivity flush and compaction (#11288) 2023-04-21 09:07:18 -07:00
thread_local.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
thread_local.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
thread_local_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
thread_operation.h Group rocksdb.sst.read.micros stat by IOActivity flush and compaction (#11288) 2023-04-21 09:07:18 -07:00
threadpool_imp.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
threadpool_imp.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
timer.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
timer_queue.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
timer_queue_test.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
timer_test.cc Print stack traces on frozen tests in CI (#10828) 2022-10-18 00:35:35 -07:00
user_comparator_wrapper.h Make UserComparatorWrapper not Customizable (#10837) 2022-10-21 12:27:50 -07:00
vector_iterator.h Make InternalKeyComparator not configurable (#10342) 2022-07-14 10:09:31 -07:00
work_queue.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
work_queue_test.cc Print stack traces on frozen tests in CI (#10828) 2022-10-18 00:35:35 -07:00
xxhash.cc Manual interventions for clang-format util/ (#10870) 2022-10-26 12:08:20 -07:00
xxhash.h Upgrade xxhash.h to latest dev (#11098) 2023-01-19 12:07:50 -08:00
xxph3.h Manual interventions for clang-format util/ (#10870) 2022-10-26 12:08:20 -07:00