rocksdb/db
Andrew Kryczka 9402a77a37 No filesystem reads during Merge() writes (#12365)
Summary:
This occasional filesystem read in the write path has caused user pain. It doesn't seem very useful considering it only limits one component's merge chain length, and only helps merge uncached (i.e., infrequently read) values. This PR proposes allowing `max_successive_merges` to be exceeded when the value cannot be read from in-memory components. I included a rollback flag (`strict_max_successive_merges`) just in case.
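
The decision described above can be sketched as a small standalone model. This is not RocksDB source: `MergeWriteState`, `MergeWriteAction`, and `DecideMergeWrite` are hypothetical names for illustration. The model also assumes that a limit of 0 disables the check and that the limit is reached when the operand count is greater than or equal to `max_successive_merges`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of the write-path decision; names are illustrative only.
enum class MergeWriteAction {
  kAppendOperand,       // store the merge operand as-is; the chain keeps growing
  kCollapseFromMemory,  // base value is readable in memory; fold the chain now
  kCollapseWithFsRead,  // base value requires a filesystem read before folding
};

struct MergeWriteState {
  std::size_t successive_merges;  // operands already buffered for this key
  bool base_value_in_memory;      // readable without touching the filesystem
};

MergeWriteAction DecideMergeWrite(const MergeWriteState& state,
                                  std::size_t max_successive_merges,
                                  bool strict_max_successive_merges) {
  // Assumption: 0 means the limit is disabled.
  if (max_successive_merges == 0 ||
      state.successive_merges < max_successive_merges) {
    return MergeWriteAction::kAppendOperand;
  }
  if (state.base_value_in_memory) {
    return MergeWriteAction::kCollapseFromMemory;
  }
  // Limit reached but the value is not in memory: only the strict
  // (rollback) setting pays for a filesystem read in the write path.
  return strict_max_successive_merges ? MergeWriteAction::kCollapseWithFsRead
                                      : MergeWriteAction::kAppendOperand;
}

// Example: at the limit with an uncached value, the two settings diverge.
void CheckExamples() {
  const MergeWriteState uncached{100, false};
  assert(DecideMergeWrite(uncached, 100, false) ==
         MergeWriteAction::kAppendOperand);
  assert(DecideMergeWrite(uncached, 100, true) ==
         MergeWriteAction::kCollapseWithFsRead);
}
```

In this model the non-strict setting trades a possibly longer merge chain for a write path that never waits on the filesystem.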

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12365

Test Plan:
"rocksdb.block.cache.data.add" is the number of data blocks read from the filesystem. Since the benchmark is write-only, compaction is disabled, and flush doesn't read data blocks, any nonzero value means the user write issued the read.

```
$ for s in false true; do echo -n "strict_max_successive_merges=$s: " && ./db_bench -value_size=64 -write_buffer_size=131072 -writes=128 -num=1 -benchmarks=mergerandom,flush,mergerandom -merge_operator=stringappend -disable_auto_compactions=true -compression_type=none -strict_max_successive_merges=$s -max_successive_merges=100 -statistics=true |& grep 'block.cache.data.add COUNT' ; done
strict_max_successive_merges=false: rocksdb.block.cache.data.add COUNT : 0
strict_max_successive_merges=true: rocksdb.block.cache.data.add COUNT : 1
```

Reviewed By: hx235

Differential Revision: D53982520

Pulled By: ajkr

fbshipit-source-id: e40f761a60bd601f232417ac0058e4a33ee9c0f4
2024-02-21 13:41:44 -08:00
blob Refactor FilePrefetchBuffer code (#12097) 2024-01-05 09:29:01 -08:00
compaction Rate-limit un-ratelimited flush/compaction code paths (#12290) 2024-01-25 13:29:13 -08:00
db_impl Fix/cleanup SeqnoToTimeMapping (#12253) 2024-01-19 21:50:38 -08:00
wide internal_repo_rocksdb (4372117296613874540) (#12117) 2023-12-04 11:17:32 -08:00
arena_wrapped_db_iter.cc Invalidate threadlocal SV before incrementing super_version_number_ (#11848) 2023-09-18 09:37:40 -07:00
arena_wrapped_db_iter.h Add new Iterator API Refresh(const snapshot*) (#10594) 2023-09-15 10:44:43 -07:00
builder.cc Fix/cleanup SeqnoToTimeMapping (#12253) 2024-01-19 21:50:38 -08:00
builder.h Group SST write in flush, compaction and db open with new stats (#11910) 2023-12-29 15:29:23 -08:00
c.cc Expose Options::ttl through C API (#12170) 2023-12-21 15:04:53 -08:00
c_test.c Expose Options::ttl through C API (#12170) 2023-12-21 15:04:53 -08:00
column_family.cc Detect compaction pressure at lower debt ratios (#12236) 2024-01-15 22:41:18 -08:00
column_family.h Clean up WriteBatchWithIndexInternal a bit (#11930) 2023-10-09 15:25:35 -07:00
column_family_test.cc Speedup based on pending compaction bytes relative to data size (#12130) 2023-12-13 10:37:27 -08:00
compact_files_test.cc Add missing status check when compiling with ASSERT_STATUS_CHECKED=1 (#11686) 2023-08-09 15:46:44 -07:00
comparator_db_test.cc Enforce status checking after Valid() returns false for IteratorWrapper (#11975) 2023-10-18 09:38:38 -07:00
convenience.cc Group SST write in flush, compaction and db open with new stats (#11910) 2023-12-29 15:29:23 -08:00
convenience_impl.h Group rocksdb.sst.read.micros stat by different user read IOActivity + misc (#11444) 2023-08-08 17:26:50 -07:00
corruption_test.cc Make option level_compaction_dynamic_level_bytes true by default (#11525) 2023-06-15 21:12:39 -07:00
cuckoo_table_db_test.cc Make option level_compaction_dynamic_level_bytes true by default (#11525) 2023-06-15 21:12:39 -07:00
db_basic_test.cc Add some asserts in FilePickerMultiGet for debugging (#12241) 2024-01-16 17:08:58 -08:00
db_block_cache_test.cc Enforce status checking after Valid() returns false for IteratorWrapper (#11975) 2023-10-18 09:38:38 -07:00
db_bloom_filter_test.cc Enforce status checking after Valid() returns false for IteratorWrapper (#11975) 2023-10-18 09:38:38 -07:00
db_clip_test.cc Support Clip DB to KeyRange (#11379) 2023-05-18 13:25:01 -07:00
db_compaction_filter_test.cc Enforce status checking after Valid() returns false for IteratorWrapper (#11975) 2023-10-18 09:38:38 -07:00
db_compaction_test.cc Detect compaction pressure at lower debt ratios (#12236) 2024-01-15 22:41:18 -08:00
db_dynamic_level_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
db_encryption_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
db_filesnapshot.cc Remove the default force behavior for EnableFileDeletion API (#12001) 2023-11-10 14:35:54 -08:00
db_flush_test.cc Fix leak or crash on failure in automatic atomic flush (#12176) 2023-12-26 11:04:25 -08:00
db_info_dumper.cc Automated modernization (#12210) 2024-01-05 11:53:57 -08:00
db_info_dumper.h
db_inplace_update_test.cc
db_io_failure_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
db_iter.cc Fix bug in auto_readahead_size that returned wrong key (#12229) 2024-01-16 11:30:36 -08:00
db_iter.h Fix bug in auto_readahead_size that returned wrong key (#12229) 2024-01-16 11:30:36 -08:00
db_iter_stress_test.cc Automated modernization (#12210) 2024-01-05 11:53:57 -08:00
db_iter_test.cc Automated modernization (#12210) 2024-01-05 11:53:57 -08:00
db_iterator_test.cc Automated modernization (#12210) 2024-01-05 11:53:57 -08:00
db_kv_checksum_test.cc Automated modernization (#12210) 2024-01-05 11:53:57 -08:00
db_log_iter_test.cc Remove the default force behavior for EnableFileDeletion API (#12001) 2023-11-10 14:35:54 -08:00
db_logical_block_size_cache_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
db_memtable_test.cc
db_merge_operand_test.cc Make option level_compaction_dynamic_level_bytes true by default (#11525) 2023-06-15 21:12:39 -07:00
db_merge_operator_test.cc Fix the handling of wide-column base values in the max_successive_merges logic (#11913) 2023-10-02 16:25:25 -07:00
db_options_test.cc Automated modernization (#12210) 2024-01-05 11:53:57 -08:00
db_properties_test.cc Automated modernization (#12210) 2024-01-05 11:53:57 -08:00
db_range_del_test.cc Automated modernization (#12210) 2024-01-05 11:53:57 -08:00
db_rate_limiter_test.cc Enforce status checking after Valid() returns false for IteratorWrapper (#11975) 2023-10-18 09:38:38 -07:00
db_readonly_with_timestamp_test.cc Enforce status checking after Valid() returns false for IteratorWrapper (#11975) 2023-10-18 09:38:38 -07:00
db_secondary_test.cc Automated modernization (#12210) 2024-01-05 11:53:57 -08:00
db_sst_test.cc Automated modernization (#12210) 2024-01-05 11:53:57 -08:00
db_statistics_test.cc Fix double counting of BYTES_WRITTEN ticker (#12111) 2023-12-08 17:12:11 -08:00
db_table_properties_test.cc Automated modernization (#12210) 2024-01-05 11:53:57 -08:00
db_tailing_iter_test.cc Automated modernization (#12210) 2024-01-05 11:53:57 -08:00
db_test.cc Automated modernization (#12210) 2024-01-05 11:53:57 -08:00
db_test2.cc Rate-limit un-ratelimited flush/compaction code paths (#12290) 2024-01-25 13:29:13 -08:00
db_test_util.cc Automated modernization (#12210) 2024-01-05 11:53:57 -08:00
db_test_util.h test: WritableFile derived class: add missing GetFileSize() override (#11726) 2023-09-29 15:58:08 -07:00
db_universal_compaction_test.cc Automated modernization (#12210) 2024-01-05 11:53:57 -08:00
db_wal_test.cc Automated modernization (#12210) 2024-01-05 11:53:57 -08:00
db_with_timestamp_basic_test.cc Automated modernization (#12210) 2024-01-05 11:53:57 -08:00
db_with_timestamp_compaction_test.cc Automated modernization (#12210) 2024-01-05 11:53:57 -08:00
db_with_timestamp_test_util.cc
db_with_timestamp_test_util.h
db_write_buffer_manager_test.cc Add missing status check when compiling with ASSERT_STATUS_CHECKED=1 (#11686) 2023-08-09 15:46:44 -07:00
db_write_test.cc Automated modernization (#12210) 2024-01-05 11:53:57 -08:00
dbformat.cc Automated modernization (#12210) 2024-01-05 11:53:57 -08:00
dbformat.h InternalKey::Set: remove redundant assign (#12194) 2024-01-02 11:17:39 -08:00
dbformat_test.cc Logically strip timestamp during flush (#11557) 2023-06-29 15:50:50 -07:00
deletefile_test.cc Automated modernization (#12210) 2024-01-05 11:53:57 -08:00
error_handler.cc Automated modernization (#12210) 2024-01-05 11:53:57 -08:00
error_handler.h Quarantine files in a limbo state after a manifest error (#12030) 2023-11-11 08:11:11 -08:00
error_handler_fs_test.cc Quarantine files in a limbo state after a manifest error (#12030) 2023-11-11 08:11:11 -08:00
event_helpers.cc Fix/cleanup SeqnoToTimeMapping (#12253) 2024-01-19 21:50:38 -08:00
event_helpers.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
experimental.cc Group SST write in flush, compaction and db open with new stats (#11910) 2023-12-29 15:29:23 -08:00
external_sst_file_basic_test.cc Fix bug of newer ingested data assigned with an older seqno (#12257) 2024-01-25 13:28:39 -08:00
external_sst_file_ingestion_job.cc Fix bug of newer ingested data assigned with an older seqno (#12257) 2024-01-25 13:28:39 -08:00
external_sst_file_ingestion_job.h
external_sst_file_test.cc Automated modernization (#12210) 2024-01-05 11:53:57 -08:00
fault_injection_test.cc Group SST write in flush, compaction and db open with new stats (#11910) 2023-12-29 15:29:23 -08:00
file_indexer.cc Simplify conditional judgment (#11580) 2023-07-03 09:41:48 -07:00
file_indexer.h
file_indexer_test.cc
filename_test.cc
flush_job.cc Rate-limit un-ratelimited flush/compaction code paths (#12290) 2024-01-25 13:29:13 -08:00
flush_job.h Rate-limit un-ratelimited flush/compaction code paths (#12290) 2024-01-25 13:29:13 -08:00
flush_job_test.cc Rate-limit un-ratelimited flush/compaction code paths (#12290) 2024-01-25 13:29:13 -08:00
flush_scheduler.cc
flush_scheduler.h
forward_iterator.cc Add an interface to provide support for underlying FS to pass their own buffer during reads (#11324) 2023-06-23 11:48:49 -07:00
forward_iterator.h Ignore async_io ReadOption if FileSystem doesn't support it (#11296) 2023-03-17 14:57:09 -07:00
forward_iterator_bench.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
history_trimming_iterator.h Refactor AddRangeDels() + consider range tombstone during compaction file cutting (#11113) 2023-02-22 12:28:18 -08:00
import_column_family_job.cc Group SST write in flush, compaction and db open with new stats (#11910) 2023-12-29 15:29:23 -08:00
import_column_family_job.h Support to create a CF by importing multiple non-overlapping CFs (#11378) 2023-06-15 12:25:04 -07:00
import_column_family_test.cc internal_repo_rocksdb (4372117296613874540) (#12117) 2023-12-04 11:17:32 -08:00
internal_stats.cc Group SST write in flush, compaction and db open with new stats (#11910) 2023-12-29 15:29:23 -08:00
internal_stats.h add property "rocksdb.obsolete-sst-files-size" (#11533) 2023-06-13 15:52:45 -07:00
job_context.h Quarantine files in a limbo state after a manifest error (#12030) 2023-11-11 08:11:11 -08:00
kv_checksum.h Block per key-value checksum (#11287) 2023-04-25 12:08:23 -07:00
listener_test.cc internal_repo_rocksdb (4372117296613874540) (#12117) 2023-12-04 11:17:32 -08:00
log_format.h Add support in log writer and reader for a user-defined timestamp size record (#11433) 2023-05-11 17:26:19 -07:00
log_reader.cc internal_repo_rocksdb (4372117296613874540) (#12117) 2023-12-04 11:17:32 -08:00
log_reader.h switch to use RocksDB UnorderedMap (#11507) 2023-06-05 13:36:26 -07:00
log_test.cc Group SST write in flush, compaction and db open with new stats (#11910) 2023-12-29 15:29:23 -08:00
log_writer.cc Group SST write in flush, compaction and db open with new stats (#11910) 2023-12-29 15:29:23 -08:00
log_writer.h Group SST write in flush, compaction and db open with new stats (#11910) 2023-12-29 15:29:23 -08:00
logs_with_prep_tracker.cc
logs_with_prep_tracker.h
lookup_key.h
malloc_stats.cc internal_repo_rocksdb (4372117296613874540) (#12117) 2023-12-04 11:17:32 -08:00
malloc_stats.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
manual_compaction_test.cc internal_repo_rocksdb (4372117296613874540) (#12117) 2023-12-04 11:17:32 -08:00
memtable.cc No filesystem reads during Merge() writes (#12365) 2024-02-21 13:41:44 -08:00
memtable.h No filesystem reads during Merge() writes (#12365) 2024-02-21 13:41:44 -08:00
memtable_list.cc Group SST write in flush, compaction and db open with new stats (#11910) 2023-12-29 15:29:23 -08:00
memtable_list.h Give retry flushes their own functions (#11903) 2023-10-02 16:26:24 -07:00
memtable_list_test.cc Lightweight verification of MANIFEST file after close on shutdown (#12174) 2023-12-28 18:25:29 -08:00
merge_context.h
merge_helper.cc Eliminate some code duplication in MergeHelper (#12121) 2023-12-05 14:07:42 -08:00
merge_helper.h Eliminate some code duplication in MergeHelper (#12121) 2023-12-05 14:07:42 -08:00
merge_helper_test.cc
merge_operator.cc Add helper methods WideColumnsHelper::{Has,Get}DefaultColumn (#11813) 2023-09-11 16:32:32 -07:00
merge_test.cc internal_repo_rocksdb (4372117296613874540) (#12117) 2023-12-04 11:17:32 -08:00
obsolete_files_test.cc internal_repo_rocksdb (4372117296613874540) (#12117) 2023-12-04 11:17:32 -08:00
options_file_test.cc internal_repo_rocksdb (4372117296613874540) (#12117) 2023-12-04 11:17:32 -08:00
output_validator.cc
output_validator.h
perf_context_test.cc Deflake PerfContextTest.CPUTimer (#12252) 2024-01-19 10:13:52 -08:00
periodic_task_scheduler.cc Turn the default Timer in PeriodicTaskScheduler into a leaky Meyers singleton (#12128) 2023-12-08 10:34:07 -08:00
periodic_task_scheduler.h Improve efficiency of create_missing_column_families, light refactor (#11920) 2023-10-04 14:14:22 -07:00
periodic_task_scheduler_test.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
pinned_iterators_manager.h
plain_table_db_test.cc internal_repo_rocksdb (4372117296613874540) (#12117) 2023-12-04 11:17:32 -08:00
post_memtable_callback.h
pre_release_callback.h
prefix_test.cc internal_repo_rocksdb (4372117296613874540) (#12117) 2023-12-04 11:17:32 -08:00
range_del_aggregator.cc internal_repo_rocksdb (4372117296613874540) (#12117) 2023-12-04 11:17:32 -08:00
range_del_aggregator.h Add new Iterator API Refresh(const snapshot*) (#10594) 2023-09-15 10:44:43 -07:00
range_del_aggregator_bench.cc
range_del_aggregator_test.cc Refactor AddRangeDels() + consider range tombstone during compaction file cutting (#11113) 2023-02-22 12:28:18 -08:00
range_tombstone_fragmenter.cc
range_tombstone_fragmenter.h Add new Iterator API Refresh(const snapshot*) (#10594) 2023-09-15 10:44:43 -07:00
range_tombstone_fragmenter_test.cc
read_callback.h
repair.cc Group SST write in flush, compaction and db open with new stats (#11910) 2023-12-29 15:29:23 -08:00
repair_test.cc internal_repo_rocksdb (4372117296613874540) (#12117) 2023-12-04 11:17:32 -08:00
seqno_time_test.cc Fix UB/crash in new SeqnoToTimeMapping::CopyFromSeqnoRange (#12293) 2024-01-25 13:26:31 -08:00
seqno_to_time_mapping.cc Fix UB/crash in new SeqnoToTimeMapping::CopyFromSeqnoRange (#12293) 2024-01-25 13:26:31 -08:00
seqno_to_time_mapping.h Fix UB/crash in new SeqnoToTimeMapping::CopyFromSeqnoRange (#12293) 2024-01-25 13:26:31 -08:00
snapshot_checker.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
snapshot_impl.cc
snapshot_impl.h
table_cache.cc internal_repo_rocksdb (4372117296613874540) (#12117) 2023-12-04 11:17:32 -08:00
table_cache.h Fix row cache falsely return kNotFound when timestamp enabled (#11816) 2023-09-20 11:34:38 -07:00
table_cache_sync_and_async.h Fix StopWatch bug; Remove setting record_read_stats (#11474) 2023-05-25 10:16:58 -07:00
table_properties_collector.cc
table_properties_collector.h Allow TablePropertiesCollectorFactory to return null collector (#12129) 2023-12-11 12:02:56 -08:00
table_properties_collector_test.cc Group SST write in flush, compaction and db open with new stats (#11910) 2023-12-29 15:29:23 -08:00
transaction_log_impl.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
transaction_log_impl.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
trim_history_scheduler.cc
trim_history_scheduler.h
version_builder.cc Fix blob files not reclaimed after deleting all SSTs (#12235) 2024-01-16 11:15:23 -08:00
version_builder.h Block per key-value checksum (#11287) 2023-04-25 12:08:23 -07:00
version_builder_test.cc Fix blob files not reclaimed after deleting all SSTs (#12235) 2024-01-16 11:15:23 -08:00
version_edit.cc internal_repo_rocksdb (4372117296613874540) (#12117) 2023-12-04 11:17:32 -08:00
version_edit.h Quarantine files in a limbo state after a manifest error (#12030) 2023-11-11 08:11:11 -08:00
version_edit_handler.cc Remove VersionEdit's friends pattern (#12024) 2023-11-01 12:04:11 -07:00
version_edit_handler.h Remove VersionEdit's friends pattern (#12024) 2023-11-01 12:04:11 -07:00
version_edit_test.cc internal_repo_rocksdb (4372117296613874540) (#12117) 2023-12-04 11:17:32 -08:00
version_set.cc Add some asserts in FilePickerMultiGet for debugging (#12241) 2024-01-16 17:08:58 -08:00
version_set.h Group SST write in flush, compaction and db open with new stats (#11910) 2023-12-29 15:29:23 -08:00
version_set_sync_and_async.h Add helper methods WideColumnsHelper::{Has,Get}DefaultColumn (#11813) 2023-09-11 16:32:32 -07:00
version_set_test.cc Group SST write in flush, compaction and db open with new stats (#11910) 2023-12-29 15:29:23 -08:00
version_util.h Group SST write in flush, compaction and db open with new stats (#11910) 2023-12-29 15:29:23 -08:00
wal_edit.cc
wal_edit.h
wal_edit_test.cc
wal_manager.cc Replace push_back by emplace_back in wal manager (#10805) 2023-12-27 10:40:33 -08:00
wal_manager.h Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
wal_manager_test.cc Group SST write in flush, compaction and db open with new stats (#11910) 2023-12-29 15:29:23 -08:00
write_batch.cc No filesystem reads during Merge() writes (#12365) 2024-02-21 13:41:44 -08:00
write_batch_base.cc
write_batch_internal.h Set default cf ts sz for a reused transaction (#11685) 2023-08-09 13:49:42 -07:00
write_batch_test.cc Stubs for piping write time (#12043) 2023-11-09 15:58:07 -08:00
write_callback.h
write_callback_test.cc internal_repo_rocksdb (4372117296613874540) (#12117) 2023-12-04 11:17:32 -08:00
write_controller.cc
write_controller.h
write_controller_test.cc
write_stall_stats.cc Fix initialization-order-fiasco in write_stall_stats.cc (#11355) 2023-04-05 14:42:31 -07:00
write_stall_stats.h Fix initialization-order-fiasco in write_stall_stats.cc (#11355) 2023-04-05 14:42:31 -07:00
write_thread.cc Ensure LockWAL() stall cleared for UnlockWAL() return (#11172) 2023-02-03 12:08:37 -08:00
write_thread.h Group SST write in flush, compaction and db open with new stats (#11910) 2023-12-29 15:29:23 -08:00