rocksdb/db
Andrew Kryczka 91166012c8 Prevent a case of WriteBufferManager flush thrashing (#6364)
Summary:
Previously, the flushes triggered by `WriteBufferManager` could affect
the same CF repeatedly if it happens to get consecutive writes. Such
flushes are not particularly useful for reducing memory usage since
they switch nearly-empty memtables to immutable while they've just begun
filling their first arena block. In fact they may not even reduce the
mutable memory count if they involve replacing one mutable memtable containing
one arena block with a new mutable memtable containing one arena block.
Further, if such switches happen even a few times before a flush finishes,
the immutable memtable limit will be reached and writes will stall.

This PR adds a heuristic to not switch memtables to immutable for CFs
that already have one or more immutable memtables awaiting flush. There
is a memory usage regression if the user continues writing to the same
CF, that DB does not have any CFs eligible for switching, flushes
are not finishing, and the `WriteBufferManager` was constructed with
`allow_stall=false`. Before, it would grow by switching nearly empty
memtables until writes stall. Now, it would grow by filling memtables
until writes stall. This feels like an acceptable behavior change because
users who prefer to stall over violate the memory limit should be using
`allow_stall=true`, which is unaffected by this PR.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/6364

Test Plan:
- Command:

`rm -rf /dev/shm/dbbench/ && TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=fillrandom -num_multi_db=8 -num_column_families=2 -write_buffer_size=4194304 -db_write_buffer_size=16777216 -compression_type=none -statistics=true -target_file_size_base=4194304 -max_bytes_for_level_base=16777216`

- `rocksdb.db.write.stall` count before this PR: 175
- `rocksdb.db.write.stall` count after this PR: 0

Reviewed By: jay-zhuang

Differential Revision: D20167197

Pulled By: ajkr

fbshipit-source-id: 4a64064e9bc33d57c0a35f15547542d0191d0cb7
2022-08-17 15:53:40 -07:00
..
blob Add a blob-specific cache priority (#10461) 2022-08-12 17:59:06 -07:00
compaction Add memtable per key-value checksum (#10281) 2022-08-12 13:51:32 -07:00
db_impl Prevent a case of WriteBufferManager flush thrashing (#6364) 2022-08-17 15:53:40 -07:00
wide Make queries return the value of the default column for wide-column entities (#10483) 2022-08-08 16:10:08 -07:00
arena_wrapped_db_iter.cc Fragment memtable range tombstone in the write path (#10380) 2022-08-05 12:02:33 -07:00
arena_wrapped_db_iter.h
builder.cc Support prepopulating/warming the blob cache (#10298) 2022-07-17 07:13:59 -07:00
builder.h Add seqno to time mapping (#10338) 2022-07-14 21:49:34 -07:00
c.cc Support prepopulating/warming the blob cache (#10298) 2022-07-17 07:13:59 -07:00
c_test.c Support prepopulating/warming the blob cache (#10298) 2022-07-17 07:13:59 -07:00
column_family.cc Fragment memtable range tombstone in the write path (#10380) 2022-08-05 12:02:33 -07:00
column_family.h Add seqno to time mapping (#10338) 2022-07-14 21:49:34 -07:00
column_family_test.cc Migrate to docker for CI run (#10496) 2022-08-10 17:34:38 -07:00
compact_files_test.cc Migrate to docker for CI run (#10496) 2022-08-10 17:34:38 -07:00
comparator_db_test.cc Document design/specification bugs with auto_prefix_mode (#10144) 2022-06-13 11:08:50 -07:00
convenience.cc Specify largest_seqno in VerifyChecksum (#9919) 2022-05-02 10:22:08 -07:00
corruption_test.cc Migrate to docker for CI run (#10496) 2022-08-10 17:34:38 -07:00
cuckoo_table_db_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
db_basic_test.cc Fix range deletion handling in async MultiGet (#10534) 2022-08-17 13:51:39 -07:00
db_block_cache_test.cc Add a blob-specific cache priority (#10461) 2022-08-12 17:59:06 -07:00
db_bloom_filter_test.cc Update/clarify required properties for prefix extractors (#10245) 2022-06-28 16:08:30 -07:00
db_compaction_filter_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
db_compaction_test.cc Allow manual compactions to run in parallel by default (#10317) 2022-07-28 17:07:36 -07:00
db_dynamic_level_test.cc Remove deprecated API AdvancedColumnFamilyOptions::soft_rate_limit/hard_rate_limit (#9452) 2022-01-27 13:01:09 -08:00
db_encryption_test.cc
db_filesnapshot.cc Reduce risk of backup or checkpoint missing a WAL file (#10083) 2022-06-01 11:02:27 -07:00
db_flush_test.cc Dynamically changeable `MemPurge` option (#10011) 2022-06-23 09:42:18 -07:00
db_info_dumper.cc Printing IO Error in DumpDBFileSummary (#9940) 2022-05-04 10:19:53 -07:00
db_info_dumper.h
db_inplace_update_test.cc Fix in-place updates for value types other than kTypeValue (#10254) 2022-06-27 16:37:09 -07:00
db_io_failure_test.cc
db_iter.cc Reset blob value as soon as it's not needed in DBIter (#10490) 2022-08-09 11:39:57 -07:00
db_iter.h Reset blob value as soon as it's not needed in DBIter (#10490) 2022-08-09 11:39:57 -07:00
db_iter_stress_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
db_iter_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
db_iterator_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
db_kv_checksum_test.cc Add memtable per key-value checksum (#10281) 2022-08-12 13:51:32 -07:00
db_log_iter_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
db_logical_block_size_cache_test.cc Attempt to deflake DBLogicalBlockSizeCacheTest.CreateColumnFamilies (#9516) 2022-03-04 11:35:28 -08:00
db_memtable_test.cc Fragment memtable range tombstone in the write path (#10380) 2022-08-05 12:02:33 -07:00
db_merge_operand_test.cc Avoid allocations/copies for large `GetMergeOperands()` results (#10458) 2022-08-04 00:42:13 -07:00
db_merge_operator_test.cc
db_options_test.cc Fix typo in comments and code (#10233) 2022-06-22 15:45:21 -07:00
db_properties_test.cc Add blob cache tickers, perf context statistics, and DB properties (#10203) 2022-06-28 13:52:35 -07:00
db_range_del_test.cc Try to trivial move more than one files (#10190) 2022-07-05 10:10:37 -07:00
db_rate_limiter_test.cc Add rate-limiting support to batched MultiGet() (#10159) 2022-06-17 16:40:47 -07:00
db_readonly_with_timestamp_test.cc Add timestamp support to CompactedDBImpl (#10030) 2022-05-24 12:14:10 -07:00
db_secondary_test.cc Fail DB::Open() if logger cannot be created (#9984) 2022-05-27 07:23:31 -07:00
db_sst_test.cc Pass the size of blob files to SstFileManager during DB open (#10062) 2022-05-27 05:58:43 -07:00
db_statistics_test.cc
db_table_properties_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
db_tailing_iter_test.cc
db_test.cc Re-enable SuggestCompactRangeTest and add Universal Compaction test (#10473) 2022-08-05 13:16:58 -07:00
db_test2.cc Add a blob-specific cache priority (#10461) 2022-08-12 17:59:06 -07:00
db_test_util.cc Tiered compaction: integrate Seqno time mapping with per key placement (#10370) 2022-07-15 19:01:30 -07:00
db_test_util.h Tiered compaction: integrate Seqno time mapping with per key placement (#10370) 2022-07-15 19:01:30 -07:00
db_universal_compaction_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
db_wal_test.cc Deflake DBWALTest.RaceInstallFlushResultsWithWalObsoletion (#10456) 2022-08-04 12:14:28 -07:00
db_with_timestamp_basic_test.cc Fix assertion error with read_opts.iter_start_ts (#10279) 2022-06-30 10:16:03 -07:00
db_with_timestamp_compaction_test.cc Provide support for subcompactions with user-defined timestamps (#10344) 2022-07-31 11:39:16 -07:00
db_with_timestamp_test_util.cc Add timestamp support to DBImplReadOnly (#10004) 2022-05-19 18:39:41 -07:00
db_with_timestamp_test_util.h Add timestamp support to DBImplReadOnly (#10004) 2022-05-19 18:39:41 -07:00
db_write_buffer_manager_test.cc Prevent a case of WriteBufferManager flush thrashing (#6364) 2022-08-17 15:53:40 -07:00
db_write_test.cc Fix race in ExitAsBatchGroupLeader with pipelined writes (#9944) 2022-08-02 14:52:10 -07:00
dbformat.cc Make InternalKeyComparator not configurable (#10342) 2022-07-14 10:09:31 -07:00
dbformat.h Add helper function to get debug type name (#10243) 2022-07-15 14:42:00 -07:00
dbformat_test.cc Make InternalKeyComparator not configurable (#10342) 2022-07-14 10:09:31 -07:00
deletefile_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
error_handler.cc Do not hold mutex when write keys if not necessary (#7516) 2022-07-21 13:35:36 -07:00
error_handler.h Do not hold mutex when write keys if not necessary (#7516) 2022-07-21 13:35:36 -07:00
error_handler_fs_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
event_helpers.cc LOG more info on oldest snapshot and sequence numbers (#10454) 2022-08-12 13:08:50 -07:00
event_helpers.h Add a listener callback for end of auto error recovery (#9244) 2021-12-08 14:30:57 -08:00
experimental.cc Remove unused fields from FileMetaData (temporarily) (#10443) 2022-08-01 17:56:13 -07:00
external_sst_file_basic_test.cc Tiered compaction: integrate Seqno time mapping with per key placement (#10370) 2022-07-15 19:01:30 -07:00
external_sst_file_ingestion_job.cc Remove unused fields from FileMetaData (temporarily) (#10443) 2022-08-01 17:56:13 -07:00
external_sst_file_ingestion_job.h Track SST unique id in MANIFEST and verify (#9990) 2022-05-19 11:04:21 -07:00
external_sst_file_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
fault_injection_test.cc Fix a bug causing duplicate trailing entries in WritableFile (buffered IO) (#9236) 2021-12-13 09:00:36 -08:00
file_indexer.cc
file_indexer.h Use std::numeric_limits<> (#9954) 2022-05-05 13:08:21 -07:00
file_indexer_test.cc
filename_test.cc fixing issue #8345 RocksDB does not work when using UNC network paths (#9384) 2022-03-30 15:55:31 -07:00
flush_job.cc Fragment memtable range tombstone in the write path (#10380) 2022-08-05 12:02:33 -07:00
flush_job.h Add seqno to time mapping (#10338) 2022-07-14 21:49:34 -07:00
flush_job_test.cc Fragment memtable range tombstone in the write path (#10380) 2022-08-05 12:02:33 -07:00
flush_scheduler.cc
flush_scheduler.h
forward_iterator.cc Fragment memtable range tombstone in the write path (#10380) 2022-08-05 12:02:33 -07:00
forward_iterator.h Make InternalKeyComparator not configurable (#10342) 2022-07-14 10:09:31 -07:00
forward_iterator_bench.cc Remove using namespace (#9369) 2022-01-12 09:31:12 -08:00
history_trimming_iterator.h Add OpenAndTrimHistory API to support trimming data with specified timestamp (#9410) 2022-03-11 16:13:23 -08:00
import_column_family_job.cc Add some missing headers (#10519) 2022-08-11 12:45:50 -07:00
import_column_family_job.h Add some missing headers (#10519) 2022-08-11 12:45:50 -07:00
import_column_family_test.cc Add some missing headers (#10519) 2022-08-11 12:45:50 -07:00
internal_stats.cc Add blob cache tickers, perf context statistics, and DB properties (#10203) 2022-06-28 13:52:35 -07:00
internal_stats.h Tiered Compaction: per key placement support (#9964) 2022-07-13 20:54:49 -07:00
job_context.h CompactionIterator sees consistent view of which keys are committed (#9830) 2022-04-14 11:11:04 -07:00
kv_checksum.h Add memtable per key-value checksum (#10281) 2022-08-12 13:51:32 -07:00
listener_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
log_format.h Add record to set WAL compression type if enabled (#9556) 2022-02-17 16:19:31 -08:00
log_reader.cc Add checksum handshake for WAL fragment decompression (#10339) 2022-07-25 16:27:26 -07:00
log_reader.h Add checksum handshake for WAL fragment decompression (#10339) 2022-07-25 16:27:26 -07:00
log_test.cc Add checksum handshake for WAL fragment decompression (#10339) 2022-07-25 16:27:26 -07:00
log_writer.cc WritableFileWriter tries to skip operations after failure (#10489) 2022-08-10 10:19:20 -07:00
log_writer.h Fix typo in comments and code (#10233) 2022-06-22 15:45:21 -07:00
logs_with_prep_tracker.cc
logs_with_prep_tracker.h
lookup_key.h
malloc_stats.cc
malloc_stats.h
manual_compaction_test.cc Migrate to docker for CI run (#10496) 2022-08-10 17:34:38 -07:00
memtable.cc Add memtable per key-value checksum (#10281) 2022-08-12 13:51:32 -07:00
memtable.h Add memtable per key-value checksum (#10281) 2022-08-12 13:51:32 -07:00
memtable_list.cc Prevent a case of WriteBufferManager flush thrashing (#6364) 2022-08-17 15:53:40 -07:00
memtable_list.h Prevent a case of WriteBufferManager flush thrashing (#6364) 2022-08-17 15:53:40 -07:00
memtable_list_test.cc Fragment memtable range tombstone in the write path (#10380) 2022-08-05 12:02:33 -07:00
merge_context.h
merge_helper.cc Add API for writing wide-column entities (#10242) 2022-06-25 15:30:47 -07:00
merge_helper.h
merge_helper_test.cc
merge_operator.cc
merge_test.cc Make the Env class Customizable (#9293) 2022-01-04 16:45:49 -08:00
obsolete_files_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
options_file_test.cc
output_validator.cc
output_validator.h
perf_context_test.cc Migrate to docker for CI run (#10496) 2022-08-10 17:34:38 -07:00
periodic_work_scheduler.cc Fix seqno->time worker not scheduled with multi DB instances (#10383) 2022-07-18 19:08:39 -07:00
periodic_work_scheduler.h Fix seqno->time worker not scheduled with multi DB instances (#10383) 2022-07-18 19:08:39 -07:00
periodic_work_scheduler_test.cc Add seqno to time mapping (#10338) 2022-07-14 21:49:34 -07:00
pinned_iterators_manager.h Avoid allocations/copies for large `GetMergeOperands()` results (#10458) 2022-08-04 00:42:13 -07:00
plain_table_db_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
post_memtable_callback.h Snapshots with user-specified timestamps (#9879) 2022-06-10 16:07:03 -07:00
pre_release_callback.h
prefix_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
range_del_aggregator.cc
range_del_aggregator.h
range_del_aggregator_bench.cc
range_del_aggregator_test.cc
range_tombstone_fragmenter.cc
range_tombstone_fragmenter.h
range_tombstone_fragmenter_test.cc
read_callback.h
repair.cc Fragment memtable range tombstone in the write path (#10380) 2022-08-05 12:02:33 -07:00
repair_test.cc Remove customized naming from InternalKeyComparator (#10343) 2022-07-12 13:30:35 -07:00
seqno_time_test.cc Change `bottommost_temperture` to `last_level_temperture` (#10471) 2022-08-08 14:36:34 -07:00
seqno_to_time_mapping.cc Tiered compaction: integrate Seqno time mapping with per key placement (#10370) 2022-07-15 19:01:30 -07:00
seqno_to_time_mapping.h Tiered compaction: integrate Seqno time mapping with per key placement (#10370) 2022-07-15 19:01:30 -07:00
snapshot_checker.h Use STATIC_AVOID_DESTRUCTION for static objects with non-trivial destructors (#9958) 2022-05-17 09:39:22 -07:00
snapshot_impl.cc
snapshot_impl.h Snapshots with user-specified timestamps (#9879) 2022-06-10 16:07:03 -07:00
table_cache.cc Fix range deletion handling in async MultiGet (#10534) 2022-08-17 13:51:39 -07:00
table_cache.h Fix range deletion handling in async MultiGet (#10534) 2022-08-17 13:51:39 -07:00
table_cache_sync_and_async.h Fix range deletion handling in async MultiGet (#10534) 2022-08-17 13:51:39 -07:00
table_properties_collector.cc
table_properties_collector.h
table_properties_collector_test.cc WritableFileWriter tries to skip operations after failure (#10489) 2022-08-10 10:19:20 -07:00
transaction_log_impl.cc Support read rate-limiting in SequentialFileReader (#9973) 2022-05-24 10:28:57 -07:00
transaction_log_impl.h Add checks to GetUpdatesSince (#9459) 2022-04-14 17:12:16 -07:00
trim_history_scheduler.cc
trim_history_scheduler.h
version_builder.cc Add basic kRoundRobin compaction policy (#10107) 2022-06-21 11:56:53 -07:00
version_builder.h Account memory of FileMetaData in global memory limit (#9924) 2022-06-14 13:06:40 -07:00
version_builder_test.cc Remove unused fields from FileMetaData (temporarily) (#10443) 2022-08-01 17:56:13 -07:00
version_edit.cc Remove unused fields from FileMetaData (temporarily) (#10443) 2022-08-01 17:56:13 -07:00
version_edit.h Remove unused fields from FileMetaData (temporarily) (#10443) 2022-08-01 17:56:13 -07:00
version_edit_handler.cc Add blob source to retrieve blobs in RocksDB (#10198) 2022-06-20 20:58:11 -07:00
version_edit_handler.h
version_edit_test.cc Remove unused fields from FileMetaData (temporarily) (#10443) 2022-08-01 17:56:13 -07:00
version_set.cc Fix range deletion handling in async MultiGet (#10534) 2022-08-17 13:51:39 -07:00
version_set.h Fix range deletion handling in async MultiGet (#10534) 2022-08-17 13:51:39 -07:00
version_set_sync_and_async.h Fix range deletion handling in async MultiGet (#10534) 2022-08-17 13:51:39 -07:00
version_set_test.cc WritableFileWriter tries to skip operations after failure (#10489) 2022-08-10 10:19:20 -07:00
version_util.h Add blob source to retrieve blobs in RocksDB (#10198) 2022-06-20 20:58:11 -07:00
wal_edit.cc Do not hold mutex when write keys if not necessary (#7516) 2022-07-21 13:35:36 -07:00
wal_edit.h Do not hold mutex when write keys if not necessary (#7516) 2022-07-21 13:35:36 -07:00
wal_edit_test.cc Do not hold mutex when write keys if not necessary (#7516) 2022-07-21 13:35:36 -07:00
wal_manager.cc Fix bug for WalManager with compressed WAL (#10130) 2022-06-08 14:16:43 -07:00
wal_manager.h Fix bug for WalManager with compressed WAL (#10130) 2022-06-08 14:16:43 -07:00
wal_manager_test.cc Migrate to docker for CI run (#10496) 2022-08-10 17:34:38 -07:00
write_batch.cc Handoff checksum during WAL replay (#10212) 2022-07-05 15:44:35 -07:00
write_batch_base.cc
write_batch_internal.h Handoff checksum during WAL replay (#10212) 2022-07-05 15:44:35 -07:00
write_batch_test.cc Fragment memtable range tombstone in the write path (#10380) 2022-08-05 12:02:33 -07:00
write_callback.h
write_callback_test.cc Migrate to docker for CI run (#10496) 2022-08-10 17:34:38 -07:00
write_controller.cc
write_controller.h Set Write rate limiter priority dynamically and pass it to FS (#9988) 2022-05-18 00:41:41 -07:00
write_controller_test.cc
write_thread.cc Fix race in ExitAsBatchGroupLeader with pipelined writes (#9944) 2022-08-02 14:52:10 -07:00
write_thread.h Fix race in ExitAsBatchGroupLeader with pipelined writes (#9944) 2022-08-02 14:52:10 -07:00