rocksdb/util
Jay Huh f22557886e Fix Compaction Stats (#13071)
Summary:
Compaction stats code is not so straightforward to understand. Here's a bit of context for this PR and why this change was made.

- **CompactionStats (compaction_stats_.stats):** Internal stats about the compaction used for logging and public metrics.
- **CompactionJobStats (compaction_job_stats_)**: The public stats at job level. It's part of Compaction event listener and included in the CompactionResult.
- **CompactionOutputsStats**: output stats only. resides in CompactionOutputs. It gets aggregated toward the CompactionStats (internal stats).

The internal stats, `compaction_stats_.stats`, has the output information recorded from the compaction iterator, but it does not have any input information (input records, input output files) until `UpdateCompactionStats()` gets called. We cannot simply call `UpdateCompactionStats()` to fill in the input information in the remote compaction (which is a subcompaction of the primary host's compaction) because the `compaction->inputs()` have the full list of input files and `UpdateCompactionStats()` takes the entire list of records in all files. `num_input_records` gets double-counted if multiple sub-compactions are submitted to the remote worker.

The job level stats (in the case of remote compaction, it's subcompaction level stat), `compaction_job_stats_`, has the correct input records, but has no output information. We can use `UpdateCompactionJobStats(compaction_stats_.stats)` to set the output information (num_output_records, num_output_files, etc.) from the `compaction_stats_.stats`, but it also sets all other fields including the input information which sets all back to 0.

Therefore, we are overriding `UpdateCompactionJobStats()` in remote worker only to update job level stats, `compaction_job_stats_`, with output information of the internal stats.

Baiscally, we are merging the aggregated output info from the internal stats and aggregated input info from the compaction job stats.

In this PR we are also fixing how we are setting `is_remote_compaction` in CompactionJobStats.
- OnCompactionBegin event, if options.compaction_service is set, `is_remote_compaction=true` for all compactions except for trivial moves
- OnCompactionCompleted event, if any of the sub_compactions were done remotely, compaction level stats's `is_remote_compaction` will be true

Other minor changes
- num_output_records is already available in CompactionJobStats. No need to store separately in CompactionResult.
- total_bytes is not needed.
- Renamed `SubcompactionState::AggregateCompactionStats()` to `SubcompactionState::AggregateCompactionOutputStats()` to make it clear that it's only aggregating output stats.
- Renamed `SetTotalBytes()` to `AddBytesWritten()` to make it more clear that it's adding total written bytes from the compaction output.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/13071

Test Plan:
Unit Tests added and updated
```
./compaction_service_test
```

Reviewed By: anand1976

Differential Revision: D64479657

Pulled By: jaykorean

fbshipit-source-id: a7a776a00dc718abae95d856b661bcbafd3b0ed5
2024-10-16 19:20:37 -07:00
..
aligned_buffer.h remove dependency on options.h for port_posix.h andport_win.h (#11214) 2023-02-13 02:21:38 -08:00
aligned_storage.h Fix compile errors in C++23 (#12106) 2024-05-28 15:33:57 -07:00
async_file_reader.cc Change ReadAsync callback API to remove const from FSReadRequest (#11649) 2024-02-16 09:14:55 -08:00
async_file_reader.h Deshim coro in fbcode/internal_repo_rocksdb 2024-09-14 09:48:21 -07:00
atomic.h Safer wrapper for std::atomic, use in HCC (#12051) 2023-11-08 13:28:43 -08:00
autovector.h Make autovector call default constructor explicitly before move/copy (#12499) 2024-04-04 12:33:05 -07:00
autovector_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
bloom_impl.h Clean up some FastRange calls (#11707) 2023-08-17 11:52:38 -07:00
bloom_test.cc Set optimize_filters_for_memory by default (#12377) 2024-04-30 08:33:31 -07:00
build_version.cc.in Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
cast_util.h Create an UnownedPtr type (#12447) 2024-03-15 11:43:28 -07:00
channel.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
cleanable.cc Fix compile error in Clang 13 (#10033) 2022-05-28 00:15:28 -07:00
coding.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
coding.h Remove unnecessary, confusing 'extern' (#12300) 2024-01-29 10:38:08 -08:00
coding_lean.h New stable, fixed-length cache keys (#9126) 2021-12-16 17:15:13 -08:00
coding_test.cc Remove extra semi colon from internal_repo_rocksdb/repo/util/coding_test.cc 2024-03-30 07:17:52 -07:00
compaction_job_stats_impl.cc Fix Compaction Stats (#13071) 2024-10-16 19:20:37 -07:00
comparator.cc Support read timestamp in ldb (#12641) 2024-05-13 15:43:12 -07:00
compression.cc Add CompressionOptions::checksum for enabling ZSTD checksum (#11666) 2023-08-18 15:01:59 -07:00
compression.h LZ4 set acceleration parameter (#11844) 2023-09-18 09:26:29 -07:00
compression_context_cache.cc internal_repo_rocksdb (435146444452818992) (#12115) 2023-12-01 11:15:17 -08:00
compression_context_cache.h
concurrent_task_limiter_impl.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
concurrent_task_limiter_impl.h Remove 'virtual' when implied by 'override' (#12319) 2024-01-31 13:14:42 -08:00
core_local.h Add some more bit operations to internal APIs (#11660) 2023-08-02 11:30:10 -07:00
coro_utils.h Deshim coro in fbcode/internal_repo_rocksdb 2024-09-14 09:48:21 -07:00
crc32c.cc Run internal cpp modernizer on RocksDB repo (#12398) 2024-03-04 10:08:32 -08:00
crc32c.h Remove unnecessary, confusing 'extern' (#12300) 2024-01-29 10:38:08 -08:00
crc32c_arm64.cc Minimal RocksJava compliance with Java 8 language level (EB 1046) (#10951) 2023-05-17 19:44:24 -07:00
crc32c_arm64.h Remove unnecessary, confusing 'extern' (#12300) 2024-01-29 10:38:08 -08:00
crc32c_ppc.c Fix Compilation on ppc64le using Clang 11 (#7713) 2020-12-01 11:21:44 -08:00
crc32c_ppc.h Remove unnecessary, confusing 'extern' (#12300) 2024-01-29 10:38:08 -08:00
crc32c_ppc_asm.S Fix Compilation on ppc64le using Clang 11 (#7713) 2020-12-01 11:21:44 -08:00
crc32c_ppc_constants.h
crc32c_test.cc Run internal cpp modernizer on RocksDB repo (#12398) 2024-03-04 10:08:32 -08:00
data_structure.cc Run internal cpp modernizer on RocksDB repo (#12398) 2024-03-04 10:08:32 -08:00
defer.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
defer_test.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
distributed_mutex.h Fix race in options taking effect (#11929) 2023-10-12 10:05:23 -07:00
duplicate_detector.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
dynamic_bloom.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
dynamic_bloom.h Clean up some FastRange calls (#11707) 2023-08-17 11:52:38 -07:00
dynamic_bloom_test.cc internal_repo_rocksdb (435146444452818992) (#12115) 2023-12-01 11:15:17 -08:00
fastrange.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
file_checksum_helper.cc Remove 'virtual' when implied by 'override' (#12319) 2024-01-31 13:14:42 -08:00
file_checksum_helper.h remove dependency on options.h for port_posix.h andport_win.h (#11214) 2023-02-13 02:21:38 -08:00
file_reader_writer_test.cc Remove WritableFile(FSWritableFile)::GetFileSize default implementation (#12303) 2024-01-30 09:49:32 -08:00
filelock_test.cc Remove extra semi colon from internal_repo_rocksdb/repo/util/filelock_test.cc 2024-03-19 16:17:57 -07:00
filter_bench.cc Remove redundant no_io parameters to filter functions (#12762) 2024-06-12 18:47:11 -07:00
gflags_compat.h Fix gflags_compat.h (#11346) 2023-04-03 10:41:00 -07:00
hash.cc Experimental support for SST unique IDs (#8990) 2021-10-18 23:32:01 -07:00
hash.h Remove unnecessary, confusing 'extern' (#12300) 2024-01-29 10:38:08 -08:00
hash128.h Upgrade xxhash, add Hash128 (#8634) 2021-08-20 18:41:51 -07:00
hash_containers.h Meta-internal folly integration with F14FastMap (#9546) 2022-04-13 07:34:01 -07:00
hash_map.h Change HashMap::Insert()'s value to a const reference (#6567) 2020-03-20 14:59:54 -07:00
hash_test.cc Add some more bit operations to internal APIs (#11660) 2023-08-02 11:30:10 -07:00
heap.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
heap_test.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
kv_map.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
log_write_bench.cc Group SST write in flush, compaction and db open with new stats (#11910) 2023-12-29 15:29:23 -08:00
math.h Add some more bit operations to internal APIs (#11660) 2023-08-02 11:30:10 -07:00
math128.h Add some more bit operations to internal APIs (#11660) 2023-08-02 11:30:10 -07:00
murmurhash.cc Remove extra semi colon from internal_repo_rocksdb/repo/util/murmurhash.cc (#12270) 2024-01-24 07:39:59 -08:00
murmurhash.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
mutexlock.h Improve memory efficiency of many OptimisticTransactionDBs (#11439) 2023-05-24 11:57:15 -07:00
overload.h Fix copyright header in util/overload.h (#11826) 2023-09-13 09:50:44 -07:00
ppc-opcode.h
random.cc Fix compile errors in C++23 (#12106) 2024-05-28 15:33:57 -07:00
random.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
random_test.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
rate_limiter.cc Decouple RateLimiter burst size and refill period (#12379) 2024-02-26 16:55:13 -08:00
rate_limiter_impl.h Decouple RateLimiter burst size and refill period (#12379) 2024-02-26 16:55:13 -08:00
rate_limiter_test.cc Decouple RateLimiter burst size and refill period (#12379) 2024-02-26 16:55:13 -08:00
repeatable_thread.h Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
repeatable_thread_test.cc Prefer static_cast in place of most reinterpret_cast (#12308) 2024-02-07 10:44:11 -08:00
ribbon_alg.h Use only ASCII in source files (#10164) 2022-06-15 14:44:43 -07:00
ribbon_config.cc Run internal cpp modernizer on RocksDB repo (#12398) 2024-03-04 10:08:32 -08:00
ribbon_config.h Refine Ribbon configuration, improve testing, add Homogeneous (#7879) 2021-02-26 08:50:42 -08:00
ribbon_impl.h Remove extra semi colon from internal_repo_rocksdb/repo/util/ribbon_impl.h (#12269) 2024-01-24 08:20:50 -08:00
ribbon_test.cc util/ribbon_test.cc: avoid ambiguous reversed operator error in c++20 (#11371) 2023-04-12 13:24:34 -07:00
set_comparator.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
single_thread_executor.h add a missing include (#11624) 2023-07-18 15:38:52 -07:00
slice.cc Run internal cpp modernizer on RocksDB repo (#12398) 2024-03-04 10:08:32 -08:00
slice_test.cc Create an UnownedPtr type (#12447) 2024-03-15 11:43:28 -07:00
slice_transform_test.cc Much better stats for seeks and prefix filtering (#11460) 2023-05-19 15:25:49 -07:00
status.cc Run internal cpp modernizer on RocksDB repo (#12398) 2024-03-04 10:08:32 -08:00
stderr_logger.cc Fix truncating last character in the StderrLogger (#12620) 2024-05-06 08:53:06 -07:00
stderr_logger.h fix the non initialized bug in StderrLogger. (#12839) 2024-07-08 15:59:02 -07:00
stop_watch.h Fix StopWatch bug; Remove setting record_read_stats (#11474) 2023-05-25 10:16:58 -07:00
string_util.cc Run internal cpp modernizer on RocksDB repo (#12398) 2024-03-04 10:08:32 -08:00
string_util.h Remove unnecessary, confusing 'extern' (#12300) 2024-01-29 10:38:08 -08:00
string_util_test.cc Fix gcc12 build failure caused by INT_MIN in NumberToHumanString (#12215) 2024-01-10 10:17:31 -08:00
thread_guard.h Introduce a ThreadGuard class and use it in ExternalSSTFileTest.PickedLevelBug (#8112) 2021-03-25 22:08:58 -07:00
thread_list_test.cc Run internal cpp modernizer on RocksDB repo (#12398) 2024-03-04 10:08:32 -08:00
thread_local.cc Run internal cpp modernizer on RocksDB repo (#12398) 2024-03-04 10:08:32 -08:00
thread_local.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
thread_local_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
thread_operation.h GetEntity Support for ReadOnlyDB and SecondaryDB (#11799) 2023-09-15 08:30:44 -07:00
threadpool_imp.cc Run internal cpp modernizer on RocksDB repo (#12398) 2024-03-04 10:08:32 -08:00
threadpool_imp.h Remove 'virtual' when implied by 'override' (#12319) 2024-01-31 13:14:42 -08:00
timer.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
timer_queue.h clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
timer_queue_test.cc clang-format cache/ and util/ directories (#10867) 2022-10-26 12:08:20 -07:00
timer_test.cc Print stack traces on frozen tests in CI (#10828) 2022-10-18 00:35:35 -07:00
udt_util.cc Add timestamp support in dump_wal/dump/idump (#12690) 2024-05-23 20:26:57 -07:00
udt_util.h Add timestamp support in dump_wal/dump/idump (#12690) 2024-05-23 20:26:57 -07:00
udt_util_test.cc Add timestamp support in dump_wal/dump/idump (#12690) 2024-05-23 20:26:57 -07:00
user_comparator_wrapper.h Make UserComparatorWrapper not Customizable (#10837) 2022-10-21 12:27:50 -07:00
vector_iterator.h Remove 'virtual' when implied by 'override' (#12319) 2024-01-31 13:14:42 -08:00
work_queue.h Fix and detect headers with missing dependencies (#8893) 2021-09-10 10:00:26 -07:00
work_queue_test.cc Print stack traces on frozen tests in CI (#10828) 2022-10-18 00:35:35 -07:00
write_batch_util.cc Add utils to use for handling user defined timestamp size record in WAL (#11451) 2023-05-22 14:28:58 -07:00
write_batch_util.h Add timestamp support in dump_wal/dump/idump (#12690) 2024-05-23 20:26:57 -07:00
xxhash.cc Manual interventions for clang-format util/ (#10870) 2022-10-26 12:08:20 -07:00
xxhash.h Remove extra semi colon from internal_repo_rocksdb/repo/util/xxhash.h 2024-06-26 07:26:20 -07:00
xxph3.h Fix bad include (#11797) 2023-09-05 14:44:17 -07:00