rocksdb/db
Baptiste Lemaire e3a96c4823 Memtable sampling for mempurge heuristic. (#8628)
Summary:
Changes the API of the MemPurge process: the `bool experimental_allow_mempurge` and `experimental_mempurge_policy` flags have been replaced by a `double experimental_mempurge_threshold` option.
This change of API reflects another major change introduced in this PR: the MemPurgeDecider() function now works by sampling the memtables being flushed to estimate the overall amount of useful payload (payload minus the garbage), and then comparing this useful payload estimate with the `double experimental_mempurge_threshold` value.
Therefore, when the value of this flag is `0.0` (default value), mempurge is simply deactivated. On the other hand, a value of `DBL_MAX` would be equivalent to always going through a mempurge regardless of the garbage ratio estimate.
At the moment, a `double experimental_mempurge_threshold` value other than `0.0` or `DBL_MAX` is only supported with the `SkipList` memtable representation.
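The threshold semantics described above can be illustrated with a small standalone sketch. It is not the RocksDB implementation: `SampledEntry`, `EstimateUsefulRatio`, `ShouldMempurge` and the two constants are hypothetical names, and the sketch assumes that the useful payload is expressed as a fraction of the sampled bytes and that mempurge is chosen when this fraction is strictly below the threshold. That assumption is at least consistent with `0.0` disabling mempurge and `DBL_MAX` always enabling it.

```cpp
#include <cassert>
#include <cfloat>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one sampled memtable entry (not a RocksDB type).
struct SampledEntry {
  std::size_t payload_bytes;  // size of the entry's key and value
  bool is_garbage;            // true if the entry would be dropped on flush
};

// Illustrative names for the two special threshold values in the summary.
constexpr double kMempurgeDisabled = 0.0;
constexpr double kMempurgeAlways = DBL_MAX;

// Fraction of sampled bytes that is useful (payload minus garbage).
// Assumption: an empty sample is treated as fully useful.
double EstimateUsefulRatio(const std::vector<SampledEntry>& sample) {
  std::size_t total = 0;
  std::size_t useful = 0;
  for (const SampledEntry& e : sample) {
    total += e.payload_bytes;
    if (!e.is_garbage) useful += e.payload_bytes;
  }
  if (total == 0) return 1.0;
  return static_cast<double>(useful) / static_cast<double>(total);
}

// Assumed decision rule: mempurge when the useful ratio is strictly below
// the threshold. A ratio is never below 0.0, so 0.0 disables mempurge, and
// any finite ratio is below DBL_MAX, so DBL_MAX always selects it.
bool ShouldMempurge(const std::vector<SampledEntry>& sample, double threshold) {
  assert(threshold >= 0.0);
  return EstimateUsefulRatio(sample) < threshold;
}
```

With a sample that is half garbage, this sketch would choose mempurge for a threshold of `0.75` and a regular flush for `0.25`.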
Regarding the sampling, this PR includes the introduction of a `MemTable::UniqueRandomSample` function that collects (approximately) random entries from the memtable by using the new `SkipList::Iterator::RandomSeek()` under the hood, or by iterating through each memtable entry, depending on the target sample size and the total number of entries.
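The two sampling strategies mentioned above (random seeks versus a full pass over the entries) could look roughly like the following sketch. It works on entry indices rather than on a real skip list; `UniqueRandomSampleIndices` is a hypothetical helper, and the cutoff of half the entries for switching strategies is an assumption made for illustration, not the value used by `MemTable::UniqueRandomSample`.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <unordered_set>

// Returns `target` distinct indices in [0, num_entries), or all indices if
// `target` is not smaller than `num_entries`. Hypothetical helper.
std::unordered_set<std::size_t> UniqueRandomSampleIndices(
    std::size_t num_entries, std::size_t target, std::mt19937& rng) {
  std::unordered_set<std::size_t> picked;
  if (num_entries == 0 || target == 0) return picked;
  if (target >= num_entries) {
    // Sample covers everything: take every entry.
    for (std::size_t i = 0; i < num_entries; ++i) picked.insert(i);
    return picked;
  }
  if (target > num_entries / 2) {
    // Large sample (assumed cutoff): walk every entry once and keep each
    // with probability needed / remaining (selection sampling).
    std::size_t needed = target;
    for (std::size_t i = 0; i < num_entries && needed > 0; ++i) {
      std::size_t remaining = num_entries - i;
      std::uniform_int_distribution<std::size_t> dist(0, remaining - 1);
      if (dist(rng) < needed) {
        picked.insert(i);
        --needed;
      }
    }
  } else {
    // Small sample: jump to random positions (standing in for a random
    // seek) and retry whenever a position was already picked.
    std::uniform_int_distribution<std::size_t> dist(0, num_entries - 1);
    while (picked.size() < target) picked.insert(dist(rng));
  }
  assert(picked.size() == target);
  return picked;
}
```

The retry loop stays cheap only while the sample is small relative to the number of entries, which is the motivation for falling back to a single full pass for large samples.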
The unit tests have been adapted to support this new API.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8628

Reviewed By: pdillinger

Differential Revision: D30149315

Pulled By: bjlemaire

fbshipit-source-id: 1feef5390c95db6f4480ab4434716533d3947f27
2021-08-10 18:09:03 -07:00
blob Fix a issue with initializing blob header buffer (#8537) 2021-08-02 17:15:06 -07:00
compaction Move old files to warm tier in FIFO compactions (#8310) 2021-08-09 12:51:14 -07:00
db_impl Memtable sampling for mempurge heuristic. (#8628) 2021-08-10 18:09:03 -07:00
arena_wrapped_db_iter.cc Rename ImmutableOptions variables (#8409) 2021-06-16 16:51:38 -07:00
arena_wrapped_db_iter.h Rename ImmutableOptions variables (#8409) 2021-06-16 16:51:38 -07:00
builder.cc Memtable "MemPurge" prototype (#8454) 2021-07-02 05:23:02 -07:00
builder.h Added memtable garbage statistics (#8411) 2021-06-18 04:57:27 -07:00
c.cc Memtable sampling for mempurge heuristic. (#8628) 2021-08-10 18:09:03 -07:00
c_test.c Add ribbon filter to C API (#8486) 2021-07-09 16:22:48 -07:00
column_family.cc Fix a race in ColumnFamilyData::UnrefAndTryDelete (#8605) 2021-08-02 18:12:11 -07:00
column_family.h Fix a race in ColumnFamilyData::UnrefAndTryDelete (#8605) 2021-08-02 18:12:11 -07:00
column_family_test.cc Add CreateFrom methods to Env/FileSystem (#8174) 2021-06-15 03:43:48 -07:00
compact_files_test.cc Compaction should not move data to up level (#8116) 2021-03-29 17:10:42 -07:00
comparator_db_test.cc Remove unused includes (#7604) 2020-10-28 23:22:27 -07:00
convenience.cc Make ImmutableOptions struct that inherits from ImmutableCFOptions and ImmutableDBOptions (#8262) 2021-05-05 14:00:17 -07:00
corruption_test.cc Add CreateFrom methods to Env/FileSystem (#8174) 2021-06-15 03:43:48 -07:00
cuckoo_table_db_test.cc Revert "Turn on memtable bloom filter by default. (#6584)" (#7939) 2021-02-06 22:34:30 -08:00
db_basic_test.cc Fix the sorting of KeyContexts for batched MultiGet (#8633) 2021-08-06 16:27:42 -07:00
db_block_cache_test.cc Dynamically configure BlockBasedTableOptions.prepopulate_block_cache (#8620) 2021-08-05 19:44:51 -07:00
db_bloom_filter_test.cc Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
db_compaction_filter_test.cc Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
db_compaction_test.cc Move old files to warm tier in FIFO compactions (#8310) 2021-08-09 12:51:14 -07:00
db_dynamic_level_test.cc Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
db_encryption_test.cc Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
db_filesnapshot.cc DB::GetSortedWalFiles() to ensure file deletion is disabled (#8591) 2021-07-29 11:51:08 -07:00
db_flush_test.cc Memtable sampling for mempurge heuristic. (#8628) 2021-08-10 18:09:03 -07:00
db_info_dumper.cc Allow WAL dir to change with db dir (#8582) 2021-07-30 12:16:44 -07:00
db_info_dumper.h
db_inplace_update_test.cc Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
db_io_failure_test.cc Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
db_iter.cc Rename ImmutableOptions variables (#8409) 2021-06-16 16:51:38 -07:00
db_iter.h Rename ImmutableOptions variables (#8409) 2021-06-16 16:51:38 -07:00
db_iter_stress_test.cc Make ImmutableOptions struct that inherits from ImmutableCFOptions and ImmutableDBOptions (#8262) 2021-05-05 14:00:17 -07:00
db_iter_test.cc Rename ImmutableOptions variables (#8409) 2021-06-16 16:51:38 -07:00
db_iterator_test.cc Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
db_kv_checksum_test.cc Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
db_log_iter_test.cc Attempt to deflake DBTestXactLogIterator.TransactionLogIteratorCorruptedLog (#8627) 2021-08-10 11:10:07 -07:00
db_logical_block_size_cache_test.cc Add further tests to ASSERT_STATUS_CHECKED (2) (#7698) 2020-12-09 21:21:16 -08:00
db_memtable_test.cc Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
db_merge_operand_test.cc Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
db_merge_operator_test.cc Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
db_options_test.cc Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
db_properties_test.cc Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
db_range_del_test.cc Fix missing Handle release in TableCache::GetRangeTombstoneIterator (#8589) 2021-07-27 21:32:11 -07:00
db_secondary_test.cc Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
db_sst_test.cc Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
db_statistics_test.cc Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
db_table_properties_test.cc Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
db_tailing_iter_test.cc Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
db_test.cc Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
db_test2.cc Add an unittest for tiered storage universal compaction (#8631) 2021-08-09 13:44:23 -07:00
db_test_util.cc Make EncryptionProvider and BlockCipher into Customizable objects (#8354) 2021-07-16 07:58:51 -07:00
db_test_util.h Do not attempt to rename non-existent info log (#8622) 2021-08-04 17:25:00 -07:00
db_universal_compaction_test.cc Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
db_wal_test.cc Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
db_with_timestamp_basic_test.cc Move slow valgrind tests behind -DROCKSDB_FULL_VALGRIND_RUN (#8475) 2021-07-07 11:14:05 -07:00
db_with_timestamp_compaction_test.cc Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
db_write_buffer_manager_test.cc Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
db_write_test.cc Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
dbformat.cc Make CompactRange and GetApproximateSizes work with timestamp (#7684) 2020-12-02 13:00:53 -08:00
dbformat.h Fix some typos in comments (#8066) 2021-03-25 21:18:08 -07:00
dbformat_test.cc Remove unused includes (#7604) 2020-10-28 23:22:27 -07:00
deletefile_test.cc Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
error_handler.cc DB::GetSortedWalFiles() to ensure file deletion is disabled (#8591) 2021-07-29 11:51:08 -07:00
error_handler.h Fix some typos in comments (#8066) 2021-03-25 21:18:08 -07:00
error_handler_fs_test.cc Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
event_helpers.cc Make EventListener into a Customizable Class (#8473) 2021-07-27 07:47:02 -07:00
event_helpers.h
experimental.cc
external_sst_file_basic_test.cc Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
external_sst_file_ingestion_job.cc Sync ingested files only if reopen is supported by the FS (#8296) 2021-05-18 19:33:55 -07:00
external_sst_file_ingestion_job.h Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
external_sst_file_test.cc Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
fault_injection_test.cc Add more tests for assert status checked (#7524) 2020-12-22 23:45:58 -08:00
file_indexer.cc
file_indexer.h
file_indexer_test.cc
filename_test.cc Remove unused includes (#7604) 2020-10-28 23:22:27 -07:00
flush_job.cc Memtable sampling for mempurge heuristic. (#8628) 2021-08-10 18:09:03 -07:00
flush_job.h Memtable sampling for mempurge heuristic. (#8628) 2021-08-10 18:09:03 -07:00
flush_job_test.cc Fix NotifyOnFlushCompleted() for atomic flush (#8585) 2021-08-03 13:31:10 -07:00
flush_scheduler.cc
flush_scheduler.h Include C++ standard library headers instead of C compatibility headers (#8068) 2021-03-19 12:09:47 -07:00
forward_iterator.cc Fix some typos in comments (#8066) 2021-03-25 21:18:08 -07:00
forward_iterator.h
forward_iterator_bench.cc
import_column_family_job.cc Add a SystemClock class to capture the time functions of an Env (#7858) 2021-01-25 22:09:11 -08:00
import_column_family_job.h Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
import_column_family_test.cc Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
internal_stats.cc Don't hold DB mutex for block cache entry stat scans (#8538) 2021-07-16 14:13:08 -07:00
internal_stats.h Don't hold DB mutex for block cache entry stat scans (#8538) 2021-07-16 14:13:08 -07:00
job_context.h Rename ImmutableOptions variables (#8409) 2021-06-16 16:51:38 -07:00
kv_checksum.h Integrity protection for live updates to WriteBatch (#7748) 2021-01-29 12:18:58 -08:00
listener_test.cc Fix NotifyOnFlushCompleted() for atomic flush (#8585) 2021-08-03 13:31:10 -07:00
log_format.h
log_reader.cc Fix kPointInTimeRecovery handling of truncated WAL (#7701) 2020-11-30 18:11:38 -08:00
log_reader.h
log_test.cc Make StringEnv, StringSink, StringSource use FS classes (#7786) 2021-01-04 16:01:01 -08:00
log_writer.cc Using existing crc32c checksum in checksum handoff for Manifest and WAL (#8412) 2021-06-25 00:47:17 -07:00
log_writer.h Include C++ standard library headers instead of C compatibility headers (#8068) 2021-03-19 12:09:47 -07:00
logs_with_prep_tracker.cc
logs_with_prep_tracker.h Include C++ standard library headers instead of C compatibility headers (#8068) 2021-03-19 12:09:47 -07:00
lookup_key.h
malloc_stats.cc
malloc_stats.h
manual_compaction_test.cc Add more tests for assert status checked (#7524) 2020-12-22 23:45:58 -08:00
memtable.cc Retire superfluous functions introduced in earlier mempurge PRs. (#8558) 2021-07-22 18:29:13 -07:00
memtable.h Memtable sampling for mempurge heuristic. (#8628) 2021-08-10 18:09:03 -07:00
memtable_list.cc Fix NotifyOnFlushCompleted() for atomic flush (#8585) 2021-08-03 13:31:10 -07:00
memtable_list.h Memtable sampling for mempurge heuristic. (#8628) 2021-08-10 18:09:03 -07:00
memtable_list_test.cc Fix NotifyOnFlushCompleted() for atomic flush (#8585) 2021-08-03 13:31:10 -07:00
merge_context.h Add Merge Operator support to WriteBatchWithIndex (#8135) 2021-05-10 12:50:25 -07:00
merge_helper.cc Add support for Merge with base value during Compaction in IntegratedBlobDB (#8445) 2021-06-24 18:11:30 -07:00
merge_helper.h Add support for Merge with base value during Compaction in IntegratedBlobDB (#8445) 2021-06-24 18:11:30 -07:00
merge_helper_test.cc
merge_operator.cc
merge_test.cc MergeHelper::FilterMerge() calling ElapsedNanosSafe() upon exit even … (#7867) 2021-01-21 13:13:02 -08:00
obsolete_files_test.cc Attempt to deflake ObsoleteFilesTest.DeleteObsoleteOptionsFile (#8624) 2021-08-05 18:36:16 -07:00
options_file_test.cc No elide constructors (#7798) 2020-12-23 16:55:53 -08:00
output_validator.cc Use NPHash64 in more places (#7632) 2020-11-10 23:42:13 -08:00
output_validator.h Add remote compaction public API (#8300) 2021-05-19 21:41:31 -07:00
perf_context_test.cc Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
periodic_work_scheduler.cc Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
periodic_work_scheduler.h Add a SystemClock class to capture the time functions of an Env (#7858) 2021-01-25 22:09:11 -08:00
periodic_work_scheduler_test.cc Fix a minor issue with initializing the test path (#8555) 2021-07-23 08:38:45 -07:00
pinned_iterators_manager.h
plain_table_db_test.cc Make ImmutableOptions struct that inherits from ImmutableCFOptions and ImmutableDBOptions (#8262) 2021-05-05 14:00:17 -07:00
pre_release_callback.h
prefix_test.cc Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
range_del_aggregator.cc In ParseInternalKey(), include corrupt key info in Status (#7515) 2020-10-28 10:12:58 -07:00
range_del_aggregator.h Fix some typos in comments (#8066) 2021-03-25 21:18:08 -07:00
range_del_aggregator_bench.cc Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033) 2021-03-15 04:34:11 -07:00
range_del_aggregator_test.cc
range_tombstone_fragmenter.cc Added memtable garbage statistics (#8411) 2021-06-18 04:57:27 -07:00
range_tombstone_fragmenter.h Added memtable garbage statistics (#8411) 2021-06-18 04:57:27 -07:00
range_tombstone_fragmenter_test.cc
read_callback.h
repair.cc Allow WAL dir to change with db dir (#8582) 2021-07-30 12:16:44 -07:00
repair_test.cc Some fixes and enhancements to `ldb repair` (#8544) 2021-07-28 16:44:14 -07:00
snapshot_checker.h
snapshot_impl.cc
snapshot_impl.h Fix some typos in comments (#8066) 2021-03-25 21:18:08 -07:00
table_cache.cc Fix use-after-free on implicit temporary FileOptions (#8571) 2021-07-27 21:49:14 -07:00
table_cache.h Fix use-after-free on implicit temporary FileOptions (#8571) 2021-07-27 21:49:14 -07:00
table_properties_collector.cc Apply `sample_for_compression` to all block-based tables (#8105) 2021-03-25 15:00:45 -07:00
table_properties_collector.h Partially revert the "apply subrange of table property collectors" change (#8465) 2021-07-06 10:14:32 -07:00
table_properties_collector_test.cc Make it possible to apply only a subrange of table property collectors (#8298) 2021-05-17 18:28:39 -07:00
transaction_log_impl.cc Add more tests for assert status checked (#7524) 2020-12-22 23:45:58 -08:00
transaction_log_impl.h
trim_history_scheduler.cc
trim_history_scheduler.h
version_builder.cc Handle blob files when options.best_efforts_recovery is true (#8180) 2021-04-19 11:56:14 -07:00
version_builder.h Handle blob files when options.best_efforts_recovery is true (#8180) 2021-04-19 11:56:14 -07:00
version_builder_test.cc Print blob file checksums as hex (#8437) 2021-06-22 09:49:44 -07:00
version_edit.cc Write file temperature information to manifest (#8284) 2021-05-17 15:15:23 -07:00
version_edit.h Write file temperature information to manifest (#8284) 2021-05-17 15:15:23 -07:00
version_edit_handler.cc Retire superfluous functions introduced in earlier mempurge PRs. (#8558) 2021-07-22 18:29:13 -07:00
version_edit_handler.h Fixed manifest_dump issues when printing keys and values containing null characters (#8378) 2021-06-10 12:55:20 -07:00
version_edit_test.cc Make it able to ignore WAL related VersionEdits in older versions (#7873) 2021-01-19 19:27:53 -08:00
version_set.cc Move old files to warm tier in FIFO compactions (#8310) 2021-08-09 12:51:14 -07:00
version_set.h Retire superfluous functions introduced in earlier mempurge PRs. (#8558) 2021-07-22 18:29:13 -07:00
version_set_test.cc Print blob file checksums as hex (#8437) 2021-06-22 09:49:44 -07:00
wal_edit.cc Always track WAL obsoletion (#7759) 2020-12-09 16:02:12 -08:00
wal_edit.h Always track WAL obsoletion (#7759) 2020-12-09 16:02:12 -08:00
wal_edit_test.cc Always track WAL obsoletion (#7759) 2020-12-09 16:02:12 -08:00
wal_manager.cc Allow WAL dir to change with db dir (#8582) 2021-07-30 12:16:44 -07:00
wal_manager.h Allow WAL dir to change with db dir (#8582) 2021-07-30 12:16:44 -07:00
wal_manager_test.cc Use DbSessionId as cache key prefix when secondary cache is enabled (#8360) 2021-06-10 11:02:43 -07:00
write_batch.cc Several simple local code clean-ups (#8565) 2021-07-30 12:07:49 -07:00
write_batch_base.cc
write_batch_internal.h Several simple local code clean-ups (#8565) 2021-07-30 12:07:49 -07:00
write_batch_test.cc Make ImmutableOptions struct that inherits from ImmutableCFOptions and ImmutableDBOptions (#8262) 2021-05-05 14:00:17 -07:00
write_callback.h
write_callback_test.cc Move slow valgrind tests behind -DROCKSDB_FULL_VALGRIND_RUN (#8475) 2021-07-07 11:14:05 -07:00
write_controller.cc Revamp WriteController (#8064) 2021-03-18 09:47:31 -07:00
write_controller.h Revamp WriteController (#8064) 2021-03-18 09:47:31 -07:00
write_controller_test.cc Revamp WriteController (#8064) 2021-03-18 09:47:31 -07:00
write_thread.cc Stall writes in WriteBufferManager when memory_usage exceeds buffer_size (#7898) 2021-04-21 13:54:02 -07:00
write_thread.h typo: fix typo in db/write_thread's state (#8423) 2021-06-18 17:14:51 -07:00