mirror of https://github.com/facebook/rocksdb.git
Improve universal compaction sorted-run trigger (#12477)
Summary: Universal compaction currently uses `level0_file_num_compaction_trigger` for two purposes:
1. the trigger for checking whether there is any compaction to do, and
2. the limit on the number of sorted runs: RocksDB compacts to keep the number of sorted runs no higher than the value of this option.

This makes the option inflexible. A value that is too small causes higher write amplification: more compactions run just to reduce the number of sorted runs. A value that is too big delays potential compaction work and hurts read performance.

This PR introduces an option `CompactionOptionsUniversal::max_read_amp` that serves only the second purpose: it specifies the hard limit on the number of sorted runs. For backward compatibility, `max_read_amp = -1` by default, which means fall back to the current behavior. When `max_read_amp > 0`, `level0_file_num_compaction_trigger` only serves as a trigger to look for potential compactions. When `max_read_amp = 0`, RocksDB auto-tunes the limit on the number of sorted runs. The estimate is based on DB size, write_buffer_size and size_ratio, so it adapts as the DB grows or shrinks. See more in `UniversalCompactionBuilder::PickCompaction()`. A usage sketch follows the benchmark results below.

Alternatively, users can now configure `max_read_amp` to a very large value and keep `level0_file_num_compaction_trigger` small. This lets `size_ratio` and `max_size_amplification_percent` control the number of sorted runs, and essentially disables compactions with reason kUniversalSortedRunNum.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12477

Test Plan:
* new unit test
* existing unit tests for the default behavior
* updated crash test with the new option
* benchmark:
  * Create a DB that is roughly 24GB in the last level. With `max_read_amp = 0`, we estimate that the DB needs 9 levels to avoid excessive compactions to reduce the number of sorted runs.
  * We then run fillrandom to ingest another 24GB of data and compare write amplification:
  * case 1: small level0 trigger: `level0_file_num_compaction_trigger=5, max_read_amp=-1`
    * write-amp: 4.8
  * case 2: auto-tune: `level0_file_num_compaction_trigger=5, max_read_amp=0`
    * write-amp: 3.6
  * case 3: auto-tune with minimal trigger: `level0_file_num_compaction_trigger=1, max_read_amp=0`
    * write-amp: 3.8
  * case 4: hard-code a good value for trigger: `level0_file_num_compaction_trigger=9`
    * write-amp: 2.8

```
Case 1:
** Compaction Stats [default] **
Level Files Size Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop Rblob(GB) Wblob(GB)
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
L0 0/0 0.00 KB 1.0 0.0 0.0 0.0 22.6 22.6 0.0 1.0 0.0 163.2 141.94 111.10 108 1.314 0 0 0.0 0.0
L45 8/0 1.81 GB 0.0 39.6 11.1 28.5 39.3 10.8 0.0 3.5 209.0 207.3 194.25 191.29 43 4.517 348M 2498K 0.0 0.0
L46 13/0 3.12 GB 0.0 15.3 9.5 5.8 15.0 9.3 0.0 1.6 203.1 199.3 77.13 75.88 16 4.821 134M 2362K 0.0 0.0
L47 19/0 4.68 GB 0.0 15.4 10.5 4.9 14.7 9.8 0.0 1.4 204.0 194.9 77.38 76.15 8 9.673 135M 5920K 0.0 0.0
L48 38/0 9.42 GB 0.0 19.6 11.7 7.9 17.3 9.4 0.0 1.5 206.5 182.3 97.15 95.02 4 24.287 172M 20M 0.0 0.0
L49 91/0 22.70 GB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.00 0 0.000 0 0 0.0 0.0
Sum 169/0 41.74 GB 0.0 89.9 42.9 47.0 109.0 61.9 0.0 4.8 156.7 189.8 587.85 549.45 179 3.284 791M 31M 0.0 0.0

Case 2:
** Compaction Stats [default] **
Level Files Size Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop Rblob(GB) Wblob(GB)
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
L0 1/0 214.47 MB 1.2 0.0 0.0 0.0 22.6 22.6 0.0 1.0 0.0 164.5 140.81 109.98 108 1.304 0 0 0.0 0.0
L44 0/0 0.00 KB 0.0 1.3 1.3 0.0 1.2 1.2 0.0 1.0 206.1 204.9 6.24 5.98 3 2.081 11M 51K 0.0 0.0
L45 4/0 844.36 MB 0.0 7.1 5.4 1.7 7.0 5.4 0.0 1.3 194.6 192.9 37.41 36.00 13 2.878 62M 489K 0.0 0.0
L46 11/0 2.57 GB 0.0 14.6 9.8 4.8 14.3 9.5 0.0 1.5 193.7 189.8 77.09 73.54 17 4.535 128M 2411K 0.0 0.0
L47 24/0 5.81 GB 0.0 19.8 12.0 7.8 18.8 11.0 0.0 1.6 191.4 181.1 106.19 101.21 9 11.799 174M 9166K 0.0 0.0
L48 38/0 9.42 GB 0.0 19.6 11.8 7.9 17.3 9.4 0.0 1.5 197.3 173.6 101.97 97.23 4 25.491 172M 20M 0.0 0.0
L49 91/0 22.70 GB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.00 0 0.000 0 0 0.0 0.0
Sum 169/0 41.54 GB 0.0 62.4 40.3 22.1 81.3 59.2 0.0 3.6 136.1 177.2 469.71 423.94 154 3.050 549M 32M 0.0 0.0

Case 3:
** Compaction Stats [default] **
Level Files Size Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop Rblob(GB) Wblob(GB)
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
L0 0/0 0.00 KB 5.0 0.0 0.0 0.0 22.6 22.6 0.0 1.0 0.0 163.8 141.43 111.13 108 1.310 0 0 0.0 0.0
L44 0/0 0.00 KB 0.0 0.8 0.8 0.0 0.8 0.8 0.0 1.0 201.4 200.2 4.26 4.19 2 2.130 7360K 33K 0.0 0.0
L45 4/0 844.38 MB 0.0 6.3 5.0 1.2 6.2 5.0 0.0 1.2 202.0 200.3 31.81 31.50 12 2.651 55M 403K 0.0 0.0
L46 7/0 1.62 GB 0.0 13.3 8.8 4.6 13.1 8.6 0.0 1.5 198.9 195.7 68.72 67.89 17 4.042 117M 1696K 0.0 0.0
L47 24/0 5.81 GB 0.0 21.7 12.9 8.8 20.6 11.8 0.0 1.6 198.5 188.6 112.04 109.97 12 9.336 191M 9352K 0.0 0.0
L48 41/0 10.14 GB 0.0 24.8 13.0 11.8 21.9 10.1 0.0 1.7 198.6 175.6 127.88 125.36 6 21.313 218M 25M 0.0 0.0
L49 91/0 22.70 GB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.00 0 0.000 0 0 0.0 0.0
Sum 167/0 41.10 GB 0.0 67.0 40.5 26.4 85.4 58.9 0.0 3.8 141.1 179.8 486.13 450.04 157 3.096 589M 36M 0.0 0.0

Case 4:
** Compaction Stats [default] **
Level Files Size Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop Rblob(GB) Wblob(GB)
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
L0 0/0 0.00 KB 0.7 0.0 0.0 0.0 22.6 22.6 0.0 1.0 0.0 158.6 146.02 114.68 108 1.352 0 0 0.0 0.0
L42 0/0 0.00 KB 0.0 1.7 1.7 0.0 1.7 1.7 0.0 1.0 185.4 184.3 9.25 8.96 4 2.314 14M 67K 0.0 0.0
L43 0/0 0.00 KB 0.0 2.5 2.5 0.0 2.5 2.5 0.0 1.0 197.8 195.6 13.01 12.65 4 3.253 22M 202K 0.0 0.0
L44 4/0 844.40 MB 0.0 4.2 4.2 0.0 4.1 4.1 0.0 1.0 188.1 185.1 22.81 21.89 5 4.562 36M 503K 0.0 0.0
L45 13/0 3.12 GB 0.0 7.5 6.5 1.0 7.2 6.2 0.0 1.1 188.7 181.8 40.69 39.32 5 8.138 65M 2282K 0.0 0.0
L46 17/0 4.18 GB 0.0 8.3 7.1 1.2 7.9 6.6 0.0 1.1 192.2 181.8 44.23 43.06 4 11.058 73M 3846K 0.0 0.0
L47 22/0 5.34 GB 0.0 8.9 7.5 1.4 8.2 6.8 0.0 1.1 189.1 174.1 48.12 45.37 3 16.041 78M 6098K 0.0 0.0
L48 27/0 6.58 GB 0.0 9.2 7.6 1.6 8.2 6.6 0.0 1.1 195.2 172.9 48.52 47.11 2 24.262 81M 9217K 0.0 0.0
L49 91/0 22.70 GB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.00 0 0.000 0 0 0.0 0.0
Sum 174/0 42.74 GB 0.0 42.3 37.0 5.3 62.4 57.1 0.0 2.8 116.3 171.3 372.66 333.04 135 2.760 372M 22M 0.0 0.0

setup: ./db_bench --benchmarks=fillseq,compactall,waitforcompaction --num=200000000 --compression_type=none --disable_wal=1 --compaction_style=1 --num_levels=50 --target_file_size_base=268435456 --max_compaction_bytes=6710886400 --level0_file_num_compaction_trigger=10 --write_buffer_size=268435456 --seed 1708494134896523

benchmark: ./db_bench --benchmarks=overwrite,waitforcompaction,stats --num=200000000 --compression_type=none --disable_wal=1 --compaction_style=1 --write_buffer_size=268435456 --level0_file_num_compaction_trigger=5 --target_file_size_base=268435456 --use_existing_db=1 --num_levels=50 --writes=200000000 --universal_max_read_amp=-1 --seed=1716488324800233
```

Reviewed By: ajkr

Differential Revision: D55370922

Pulled By: cbi42

fbshipit-source-id: 9be69979126b840d08e93e7059260e76a878bb2a
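A minimal usage sketch (not part of the patch) of how a user might opt into the new behavior. The option and field names are the ones this PR introduces or relies on; the DB path, the trigger value of 2, and the error handling are illustrative assumptions:

```cpp
#include "rocksdb/db.h"
#include "rocksdb/options.h"

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;
  options.compaction_style = rocksdb::kCompactionStyleUniversal;
  // Keep the compaction-check trigger small...
  options.level0_file_num_compaction_trigger = 2;
  // ...and let RocksDB auto-tune the limit on the number of sorted runs
  // based on DB size, size_ratio and write_buffer_size (max_read_amp = 0).
  options.compaction_options_universal.max_read_amp = 0;

  rocksdb::DB* db = nullptr;
  rocksdb::Status s =
      rocksdb::DB::Open(options, "/tmp/rocksdb_universal_max_read_amp", &db);
  if (!s.ok()) {
    return 1;
  }
  delete db;
  return 0;
}
```

With db_bench, the equivalent configuration would pass the `--universal_max_read_amp=0` flag added by this PR together with a small `--level0_file_num_compaction_trigger`.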
@@ -1528,6 +1528,18 @@ Status ColumnFamilyData::ValidateOptions(
      }
    }
  }

  if (cf_options.compaction_style == kCompactionStyleUniversal) {
    int max_read_amp = cf_options.compaction_options_universal.max_read_amp;
    if (max_read_amp < -1) {
      return Status::NotSupported("max_read_amp should be at least -1.");
    } else if (0 < max_read_amp &&
               max_read_amp < cf_options.level0_file_num_compaction_trigger) {
      return Status::NotSupported(
          "max_read_amp limits the number of sorted runs but is smaller than "
          "the compaction trigger level0_file_num_compaction_trigger.");
    }
  }
  return s;
}
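For illustration, a hedged sketch of an options combination that the validation added above is expected to reject at open time. The surrounding program, path, and assert are assumptions, not patch code:

```cpp
#include <cassert>

#include "rocksdb/db.h"
#include "rocksdb/options.h"

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;
  options.compaction_style = rocksdb::kCompactionStyleUniversal;
  options.level0_file_num_compaction_trigger = 8;
  // 0 < max_read_amp < level0_file_num_compaction_trigger, which the new
  // ValidateOptions() check rejects with Status::NotSupported().
  options.compaction_options_universal.max_read_amp = 4;

  rocksdb::DB* db = nullptr;
  rocksdb::Status s =
      rocksdb::DB::Open(options, "/tmp/rocksdb_max_read_amp_invalid", &db);
  assert(s.IsNotSupported());
  (void)s;
  return 0;
}
```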
@@ -4330,6 +4330,118 @@ TEST_F(CompactionPickerTest, IntraL0WhenL0IsSmall) {
  }
}

TEST_F(CompactionPickerTest, UniversalMaxReadAmpLargeDB) {
  ioptions_.compaction_style = kCompactionStyleUniversal;
  ioptions_.num_levels = 50;
  mutable_cf_options_.RefreshDerivedOptions(ioptions_);
  mutable_cf_options_.compaction_options_universal.size_ratio = 10;
  mutable_cf_options_.write_buffer_size = 256 << 20;
  // Avoid space amp compaction
  mutable_cf_options_.compaction_options_universal
      .max_size_amplification_percent = 200;
  const int kMaxRuns = 8;
  for (int max_read_amp : {kMaxRuns, 0, -1}) {
    SCOPED_TRACE("max_read_amp = " + std::to_string(max_read_amp));
    if (max_read_amp == -1) {
      mutable_cf_options_.level0_file_num_compaction_trigger = kMaxRuns;
    } else {
      mutable_cf_options_.level0_file_num_compaction_trigger = 4;
    }
    mutable_cf_options_.compaction_options_universal.max_read_amp =
        max_read_amp;
    UniversalCompactionPicker universal_compaction_picker(ioptions_, &icmp_);
    uint64_t max_run_size = 20ull << 30;
    // When max_read_amp = 0, we estimate the number of levels needed based on
    // size_ratio and write_buffer_size. See more in
    // UniversalCompactionBuilder::PickCompaction().
    // With a 20GB last level, we estimate that 8 levels are needed:
    // L0 256MB
    // L1 256MB * 1.1 (size_ratio) = 282MB
    // L2 (256MB + 282MB) * 1.1 = 592MB
    // L3 1243MB
    // L4 2610MB
    // L5 5481MB
    // L6 11510MB
    // L7 24171MB > 20GB
    for (int i = 0; i <= kMaxRuns; ++i) {
      SCOPED_TRACE("i = " + std::to_string(i));
      NewVersionStorage(/*num_levels=*/50, kCompactionStyleUniversal);
      Add(/*level=*/49, /*file_number=*/10, /*smallest=*/"100",
          /*largest=*/"200", /*file_size=*/max_run_size, /*path_id=*/0,
          /*smallest_seq=*/0, /*largest_seq=*/0,
          /*compensated_file_size=*/max_run_size);
      // Besides the last sorted run, we add additional `i` sorted runs
      // without triggering space-amp or size-amp compactions.
      uint64_t file_size = 1 << 20;
      for (int j = 0; j < i; ++j) {
        Add(/*level=*/j, /*file_number=*/100 - j, /*smallest=*/"100",
            /*largest=*/"200", /*file_size=*/file_size, /*path_id=*/0,
            /*smallest_seq=*/100 - j, /*largest_seq=*/100 - j,
            /*compensated_file_size=*/file_size);
        // to avoid space-amp and size-amp compaction
        file_size *= 2;
      }
      UpdateVersionStorageInfo();
      // level0_file_num_compaction_trigger is still used as trigger to
      // check potential compactions
      ASSERT_EQ(
          universal_compaction_picker.NeedsCompaction(vstorage_.get()),
          i + 1 >= mutable_cf_options_.level0_file_num_compaction_trigger);
      std::unique_ptr<Compaction> compaction(
          universal_compaction_picker.PickCompaction(
              cf_name_, mutable_cf_options_, mutable_db_options_,
              vstorage_.get(), &log_buffer_));
      if (i == kMaxRuns) {
        // There are in total i + 1 > kMaxRuns sorted runs.
        // This triggers compaction ignoring size_ratio.
        ASSERT_NE(nullptr, compaction);
        ASSERT_EQ(CompactionReason::kUniversalSortedRunNum,
                  compaction->compaction_reason());
        // First two runs are compacted
        ASSERT_EQ(0, compaction->start_level());
        ASSERT_EQ(1, compaction->output_level());
        ASSERT_EQ(1U, compaction->num_input_files(0));
        ASSERT_EQ(1U, compaction->num_input_files(1));
      } else {
        ASSERT_EQ(nullptr, compaction);
      }
    }
  }
}

TEST_F(CompactionPickerTest, UniversalMaxReadAmpSmallDB) {
  ioptions_.compaction_style = kCompactionStyleUniversal;
  ioptions_.num_levels = 50;
  mutable_cf_options_.RefreshDerivedOptions(ioptions_);
  mutable_cf_options_.level0_file_num_compaction_trigger = 1;
  mutable_cf_options_.compaction_options_universal.size_ratio = 10;
  mutable_cf_options_.write_buffer_size = 256 << 20;
  mutable_cf_options_.compaction_options_universal
      .max_size_amplification_percent = 200;
  const int kMaxRuns = 1;
  for (int max_read_amp : {-1, kMaxRuns, 0}) {
    SCOPED_TRACE("max_read_amp = " + std::to_string(max_read_amp));
    mutable_cf_options_.compaction_options_universal.max_read_amp =
        max_read_amp;
    UniversalCompactionPicker universal_compaction_picker(ioptions_, &icmp_);
    NewVersionStorage(/*num_levels=*/50, kCompactionStyleUniversal);
    // max_run_size is much smaller than write_buffer_size,
    // only 1 level is needed.
    uint64_t max_run_size = 8 << 10;
    Add(/*level=*/49, /*file_number=*/10, /*smallest=*/"100",
        /*largest=*/"200", /*file_size=*/max_run_size, /*path_id=*/0,
        /*smallest_seq=*/0, /*largest_seq=*/0,
        /*compensated_file_size=*/max_run_size);
    UpdateVersionStorageInfo();
    ASSERT_TRUE(universal_compaction_picker.NeedsCompaction(vstorage_.get()));
    std::unique_ptr<Compaction> compaction(
        universal_compaction_picker.PickCompaction(
            cf_name_, mutable_cf_options_, mutable_db_options_,
            vstorage_.get(), &log_buffer_));
    ASSERT_EQ(nullptr, compaction);
  }
}

} // namespace ROCKSDB_NAMESPACE

int main(int argc, char** argv) {
@@ -227,6 +227,7 @@ class UniversalCompactionBuilder {
  const InternalKeyComparator* icmp_;
  double score_;
  std::vector<SortedRun> sorted_runs_;
  uint64_t max_run_size_;
  const std::string& cf_name_;
  const MutableCFOptions& mutable_cf_options_;
  const MutableDBOptions& mutable_db_options_;

@@ -235,7 +236,8 @@ class UniversalCompactionBuilder {
  LogBuffer* log_buffer_;

  static std::vector<UniversalCompactionBuilder::SortedRun> CalculateSortedRuns(
      const VersionStorageInfo& vstorage, int last_level);
      const VersionStorageInfo& vstorage, int last_level,
      uint64_t* max_run_size);

  // Pick a path ID to place a newly generated file, with its estimated file
  // size.

@@ -440,11 +442,15 @@ void UniversalCompactionBuilder::SortedRun::DumpSizeInfo(

std::vector<UniversalCompactionBuilder::SortedRun>
UniversalCompactionBuilder::CalculateSortedRuns(
    const VersionStorageInfo& vstorage, int last_level) {
    const VersionStorageInfo& vstorage, int last_level,
    uint64_t* max_run_size) {
  assert(max_run_size);
  *max_run_size = 0;
  std::vector<UniversalCompactionBuilder::SortedRun> ret;
  for (FileMetaData* f : vstorage.LevelFiles(0)) {
    ret.emplace_back(0, f, f->fd.GetFileSize(), f->compensated_file_size,
                     f->being_compacted);
    *max_run_size = std::max(*max_run_size, f->fd.GetFileSize());
  }
  for (int level = 1; level <= last_level; level++) {
    uint64_t total_compensated_size = 0U;

@@ -466,6 +472,7 @@ UniversalCompactionBuilder::CalculateSortedRuns(
      ret.emplace_back(level, nullptr, total_size, total_compensated_size,
                       being_compacted);
    }
    *max_run_size = std::max(*max_run_size, total_size);
  }
  return ret;
}
@@ -477,13 +484,16 @@ Compaction* UniversalCompactionBuilder::PickCompaction() {
  score_ = vstorage_->CompactionScore(kLevel0);
  int max_output_level =
      vstorage_->MaxOutputLevel(ioptions_.allow_ingest_behind);
  sorted_runs_ = CalculateSortedRuns(*vstorage_, max_output_level);
  max_run_size_ = 0;
  sorted_runs_ =
      CalculateSortedRuns(*vstorage_, max_output_level, &max_run_size_);
  int file_num_compaction_trigger =
      mutable_cf_options_.level0_file_num_compaction_trigger;

  if (sorted_runs_.size() == 0 ||
      (vstorage_->FilesMarkedForPeriodicCompaction().empty() &&
       vstorage_->FilesMarkedForCompaction().empty() &&
       sorted_runs_.size() < (unsigned int)mutable_cf_options_
                                 .level0_file_num_compaction_trigger)) {
       sorted_runs_.size() < (unsigned int)file_num_compaction_trigger)) {
    ROCKS_LOG_BUFFER(log_buffer_, "[%s] Universal: nothing to do\n",
                     cf_name_.c_str());
    TEST_SYNC_POINT_CALLBACK(

@@ -505,11 +515,9 @@ Compaction* UniversalCompactionBuilder::PickCompaction() {
    TEST_SYNC_POINT_CALLBACK("PostPickPeriodicCompaction", c);
  }

  // Check for size amplification.
  if (c == nullptr &&
      sorted_runs_.size() >=
          static_cast<size_t>(
              mutable_cf_options_.level0_file_num_compaction_trigger)) {
      sorted_runs_.size() >= static_cast<size_t>(file_num_compaction_trigger)) {
    // Check for size amplification.
    if ((c = PickCompactionToReduceSizeAmp()) != nullptr) {
      TEST_SYNC_POINT("PickCompactionToReduceSizeAmpReturnNonnullptr");
      ROCKS_LOG_BUFFER(log_buffer_, "[%s] Universal: compacting for size amp\n",
@@ -527,13 +535,48 @@ Compaction* UniversalCompactionBuilder::PickCompaction() {
                       cf_name_.c_str());
    } else {
      // Size amplification and file size ratios are within configured limits.
      // If max read amplification is exceeding configured limits, then force
      // compaction without looking at filesize ratios and try to reduce
      // the number of files to fewer than level0_file_num_compaction_trigger.
      // If max read amplification exceeds configured limits, then force
      // compaction to reduce the number sorted runs without looking at file
      // size ratios.

      // This is guaranteed by NeedsCompaction()
      assert(sorted_runs_.size() >=
             static_cast<size_t>(
                 mutable_cf_options_.level0_file_num_compaction_trigger));
             static_cast<size_t>(file_num_compaction_trigger));
      int max_num_runs =
          mutable_cf_options_.compaction_options_universal.max_read_amp;
      if (max_num_runs < 0) {
        // any value < -1 is not valid
        assert(max_num_runs == -1);
        // By default, fall back to `level0_file_num_compaction_trigger`
        max_num_runs = file_num_compaction_trigger;
      } else if (max_num_runs == 0) {
        if (mutable_cf_options_.compaction_options_universal.stop_style ==
            kCompactionStopStyleTotalSize) {
          // 0 means auto-tuning by RocksDB. We estimate max num run based on
          // max_run_size, size_ratio and write buffer size:
          // Assume the size of the lowest level size is equal to
          // write_buffer_size. Each subsequent level is the max size without
          // triggering size_ratio compaction. `max_num_runs` is the minimum
          // number of levels required such that the target size of the
          // largest level is at least `max_run_size_`.
          max_num_runs = 1;
          double cur_level_max_size =
              static_cast<double>(mutable_cf_options_.write_buffer_size);
          double total_run_size = 0;
          while (cur_level_max_size < static_cast<double>(max_run_size_)) {
            // This loop should not take too many iterations since
            // cur_level_max_size at least doubles each iteration.
            total_run_size += cur_level_max_size;
            cur_level_max_size = (100.0 + ratio) / 100.0 * total_run_size;
            ++max_num_runs;
          }
        } else {
          // TODO: implement the auto-tune logic for this stop style
          max_num_runs = file_num_compaction_trigger;
        }
      } else {
        // max_num_runs > 0, it's the limit on the number of sorted run
      }
      // Get the total number of sorted runs that are not being compacted
      int num_sr_not_compacted = 0;
      for (size_t i = 0; i < sorted_runs_.size(); i++) {
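For reference, a self-contained sketch of the auto-tuning estimate implemented by the loop above (max_read_amp = 0 with kCompactionStopStyleTotalSize). This is an illustration, not patch code: the parameters stand in for `mutable_cf_options_.write_buffer_size`, the universal `size_ratio`, and `max_run_size_`. With the unit test's inputs (256MB write buffer, size_ratio = 10, a 20GB largest sorted run) it estimates 8 sorted runs:

```cpp
#include <iostream>

// Local stand-in for the estimation loop in PickCompaction().
int EstimateMaxNumRuns(double write_buffer_size, double size_ratio,
                       double max_run_size) {
  int max_num_runs = 1;
  // The smallest run is assumed to be one memtable flush; each subsequent
  // run is the largest size that does not trigger a size_ratio compaction
  // against all smaller runs combined.
  double cur_level_max_size = write_buffer_size;
  double total_run_size = 0;
  while (cur_level_max_size < max_run_size) {
    total_run_size += cur_level_max_size;
    cur_level_max_size = (100.0 + size_ratio) / 100.0 * total_run_size;
    ++max_num_runs;
  }
  return max_num_runs;
}

int main() {
  // Inputs from the unit test in this patch: 256MB write buffer,
  // size_ratio = 10, 20GB largest sorted run. Prints 8.
  std::cout << EstimateMaxNumRuns(256.0 * (1 << 20), 10.0,
                                  20.0 * (1ull << 30))
            << "\n";
  return 0;
}
```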
@@ -544,17 +587,25 @@ Compaction* UniversalCompactionBuilder::PickCompaction() {

      // The number of sorted runs that are not being compacted is greater
      // than the maximum allowed number of sorted runs
      if (num_sr_not_compacted >
          mutable_cf_options_.level0_file_num_compaction_trigger) {
        unsigned int num_files =
            num_sr_not_compacted -
            mutable_cf_options_.level0_file_num_compaction_trigger + 1;
      if (num_sr_not_compacted > max_num_runs) {
        unsigned int num_files = num_sr_not_compacted - max_num_runs + 1;
        if ((c = PickCompactionToReduceSortedRuns(UINT_MAX, num_files)) !=
            nullptr) {
          ROCKS_LOG_BUFFER(log_buffer_,
                           "[%s] Universal: compacting for file num -- %u\n",
                           cf_name_.c_str(), num_files);
                           "[%s] Universal: compacting for file num, to "
                           "compact file num -- %u, max num runs allowed"
                           "-- %d, max_run_size -- %" PRIu64 "\n",
                           cf_name_.c_str(), num_files, max_num_runs,
                           max_run_size_);
        }
      } else {
        ROCKS_LOG_BUFFER(
            log_buffer_,
            "[%s] Universal: skipping compaction for file num, num runs not "
            "being compacted -- %u, max num runs allowed -- %d, max_run_size "
            "-- %" PRIu64 "\n",
            cf_name_.c_str(), num_sr_not_compacted, max_num_runs,
            max_run_size_);
      }
    }
  }
@@ -5645,6 +5645,8 @@ TEST_F(DBTest, DynamicUniversalCompactionOptions) {
  ASSERT_EQ(
      dbfull()->GetOptions().compaction_options_universal.allow_trivial_move,
      false);
  ASSERT_EQ(dbfull()->GetOptions().compaction_options_universal.max_read_amp,
            -1);

  ASSERT_OK(dbfull()->SetOptions(
      {{"compaction_options_universal", "{size_ratio=7;}"}}));

@@ -5666,9 +5668,11 @@ TEST_F(DBTest, DynamicUniversalCompactionOptions) {
  ASSERT_EQ(
      dbfull()->GetOptions().compaction_options_universal.allow_trivial_move,
      false);
  ASSERT_EQ(dbfull()->GetOptions().compaction_options_universal.max_read_amp,
            -1);

  ASSERT_OK(dbfull()->SetOptions(
      {{"compaction_options_universal", "{min_merge_width=11;}"}}));
  ASSERT_OK(dbfull()->SetOptions({{"compaction_options_universal",
                                   "{min_merge_width=11;max_read_amp=0;}"}}));
  ASSERT_EQ(dbfull()->GetOptions().compaction_options_universal.size_ratio, 7u);
  ASSERT_EQ(dbfull()->GetOptions().compaction_options_universal.min_merge_width,
            11u);

@@ -5687,6 +5691,8 @@ TEST_F(DBTest, DynamicUniversalCompactionOptions) {
  ASSERT_EQ(
      dbfull()->GetOptions().compaction_options_universal.allow_trivial_move,
      false);
  ASSERT_EQ(dbfull()->GetOptions().compaction_options_universal.max_read_amp,
            0);
}

TEST_F(DBTest, FileCreationRandomFailure) {
@@ -3501,6 +3501,10 @@ void VersionStorageInfo::ComputeCompactionScore(
        score = kScoreForNeedCompaction;
      }
    } else {
      // For universal compaction, if a user configures `max_read_amp`, then
      // the score may be a false positive signal.
      // `level0_file_num_compaction_trigger` is used as a trigger to check
      // if there is any compaction work to do.
      score = static_cast<double>(num_sorted_runs) /
              mutable_cf_options.level0_file_num_compaction_trigger;
      if (compaction_style_ == kCompactionStyleLevel && num_levels() > 1) {
@@ -136,6 +136,7 @@ DECLARE_int32(universal_size_ratio);
DECLARE_int32(universal_min_merge_width);
DECLARE_int32(universal_max_merge_width);
DECLARE_int32(universal_max_size_amplification_percent);
DECLARE_int32(universal_max_read_amp);
DECLARE_int32(clear_column_family_one_in);
DECLARE_int32(get_live_files_apis_one_in);
DECLARE_int32(get_all_column_family_metadata_one_in);

@@ -306,6 +306,9 @@ DEFINE_int32(universal_max_merge_width, 0,
DEFINE_int32(universal_max_size_amplification_percent, 0,
             "The max size amplification for universal style compaction");

DEFINE_int32(universal_max_read_amp, -1,
             "The limit on the number of sorted runs");

DEFINE_int32(clear_column_family_one_in, 1000000,
             "With a chance of 1/N, delete a column family and then recreate "
             "it again. If N == 0, never drop/create column families. "

@@ -3712,6 +3712,8 @@ void InitializeOptionsFromFlags(
      FLAGS_universal_max_merge_width;
  options.compaction_options_universal.max_size_amplification_percent =
      FLAGS_universal_max_size_amplification_percent;
  options.compaction_options_universal.max_read_amp =
      FLAGS_universal_max_read_amp;
  options.atomic_flush = FLAGS_atomic_flush;
  options.manual_wal_flush = FLAGS_manual_wal_flush_one_in > 0 ? true : false;
  options.avoid_unnecessary_blocking_io = FLAGS_avoid_unnecessary_blocking_io;
@@ -234,6 +234,12 @@ struct ColumnFamilyOptions : public AdvancedColumnFamilyOptions {
  // Number of files to trigger level-0 compaction. A value <0 means that
  // level-0 compaction will not be triggered by number of files at all.
  //
  // Universal compaction: RocksDB will try to keep the number of sorted runs
  // no more than this number. If CompactionOptionsUniversal::max_read_amp is
  // set, then this option will be used only as a trigger to look for
  // compaction. CompactionOptionsUniversal::max_read_amp will be the limit
  // on the number of sorted runs.
  //
  // Default: 4
  //
  // Dynamically changeable through SetOptions() API

@@ -65,6 +65,36 @@ class CompactionOptionsUniversal {
  // Default: -1
  int compression_size_percent;

  // The limit on the number of sorted runs. RocksDB will try to keep
  // the number of sorted runs at most this number. While compactions are
  // running, the number of sorted runs may be temporarily higher than
  // this number.
  //
  // Since universal compaction checks if there is compaction to do when
  // the number of sorted runs is at least level0_file_num_compaction_trigger,
  // it is suggested to set level0_file_num_compaction_trigger to be no larger
  // than max_read_amp.
  //
  // Values:
  // -1: special flag to let RocksDB pick default. Currently,
  // RocksDB will fall back to the behavior before this option is introduced,
  // which is to use level0_file_num_compaction_trigger as the limit.
  // This may change in the future to behave as 0 below.
  // 0: Let RocksDB auto-tune. Currently, we determine the max number of
  // sorted runs based on the current DB size, size_ratio and
  // write_buffer_size. Note that this is only supported for the default
  // stop_style kCompactionStopStyleTotalSize. For
  // kCompactionStopStyleSimilarSize, this behaves as if -1 is configured.
  // N > 0: limit the number of sorted runs to be at most N.
  // N should be at least the compaction trigger specified by
  // level0_file_num_compaction_trigger. If 0 < max_read_amp <
  // level0_file_num_compaction_trigger, Status::NotSupported() will be
  // returned during DB open.
  // N < -1: Status::NotSupported() will be returned during DB open.
  //
  // Default: -1
  int max_read_amp;

  // The algorithm used to stop picking files into a single compaction run
  // Default: kCompactionStopStyleTotalSize
  CompactionStopStyle stop_style;
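Because max_read_amp is registered as a mutable option (see the OptionTypeFlags::kMutable entry later in this patch), it can also be changed at runtime through SetOptions(), as the DBTest change above does. A minimal sketch, with an assumed DB path and minimal error handling:

```cpp
#include "rocksdb/db.h"
#include "rocksdb/options.h"

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;
  options.compaction_style = rocksdb::kCompactionStyleUniversal;

  rocksdb::DB* db = nullptr;
  // Path is illustrative.
  rocksdb::Status s =
      rocksdb::DB::Open(options, "/tmp/rocksdb_dynamic_max_read_amp", &db);
  if (!s.ok()) {
    return 1;
  }
  // Switch from the default (-1, legacy behavior) to auto-tuning (0) at
  // runtime, mirroring the SetOptions() call in the DBTest update.
  s = db->SetOptions({{"compaction_options_universal", "{max_read_amp=0;}"}});
  delete db;
  return s.ok() ? 0 : 1;
}
```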
@@ -88,6 +118,7 @@ class CompactionOptionsUniversal {
        max_merge_width(UINT_MAX),
        max_size_amplification_percent(200),
        compression_size_percent(-1),
        max_read_amp(-1),
        stop_style(kCompactionStopStyleTotalSize),
        allow_trivial_move(false),
        incremental(false) {}

@@ -239,6 +239,10 @@ static std::unordered_map<std::string, OptionTypeInfo>
     {offsetof(class CompactionOptionsUniversal, compression_size_percent),
      OptionType::kInt, OptionVerificationType::kNormal,
      OptionTypeFlags::kMutable}},
    {"max_read_amp",
     {offsetof(class CompactionOptionsUniversal, max_read_amp),
      OptionType::kInt, OptionVerificationType::kNormal,
      OptionTypeFlags::kMutable}},
    {"stop_style",
     {offsetof(class CompactionOptionsUniversal, stop_style),
      OptionType::kCompactionStopStyle, OptionVerificationType::kNormal,

@@ -1137,6 +1141,8 @@ void MutableCFOptions::Dump(Logger* log) const {
  ROCKS_LOG_INFO(log,
                 "compaction_options_universal.compression_size_percent : %d",
                 compaction_options_universal.compression_size_percent);
  ROCKS_LOG_INFO(log, "compaction_options_universal.max_read_amp: %d",
                 compaction_options_universal.max_read_amp);
  ROCKS_LOG_INFO(log, "compaction_options_universal.stop_style : %d",
                 compaction_options_universal.stop_style);
  ROCKS_LOG_INFO(

@@ -360,6 +360,9 @@ void ColumnFamilyOptions::Dump(Logger* log) const {
  ROCKS_LOG_HEADER(log,
                   "Options.compaction_options_universal.stop_style: %s",
                   str_compaction_stop_style.c_str());
  ROCKS_LOG_HEADER(log,
                   "Options.compaction_options_universal.max_read_amp: %d",
                   compaction_options_universal.max_read_amp);
  ROCKS_LOG_HEADER(
      log, "Options.compaction_options_fifo.max_table_files_size: %" PRIu64,
      compaction_options_fifo.max_table_files_size);
@@ -544,12 +544,20 @@ DEFINE_int32(universal_compression_size_percent, -1,
             "The percentage of the database to compress for universal "
             "compaction. -1 means compress everything.");

DEFINE_int32(universal_max_read_amp, -1,
             "The limit on the number of sorted runs");

DEFINE_bool(universal_allow_trivial_move, false,
            "Allow trivial move in universal compaction.");

DEFINE_bool(universal_incremental, false,
            "Enable incremental compactions in universal compaction.");

DEFINE_int32(
    universal_stop_style,
    (int32_t)ROCKSDB_NAMESPACE::CompactionOptionsUniversal().stop_style,
    "Universal compaction stop style.");

DEFINE_int64(cache_size, 32 << 20, // 32MB
             "Number of bytes to use as a cache of uncompressed data");

@@ -4664,10 +4672,14 @@ class Benchmark {
      options.compaction_options_universal.compression_size_percent =
          FLAGS_universal_compression_size_percent;
    }
    options.compaction_options_universal.max_read_amp =
        FLAGS_universal_max_read_amp;
    options.compaction_options_universal.allow_trivial_move =
        FLAGS_universal_allow_trivial_move;
    options.compaction_options_universal.incremental =
        FLAGS_universal_incremental;
    options.compaction_options_universal.stop_style =
        static_cast<CompactionStopStyle>(FLAGS_universal_stop_style);
    if (FLAGS_thread_status_per_interval > 0) {
      options.enable_thread_tracking = true;
    }

@@ -312,6 +312,7 @@ default_params = {
    "check_multiget_consistency": lambda: random.choice([0, 0, 0, 1]),
    "check_multiget_entity_consistency": lambda: random.choice([0, 0, 0, 1]),
    "use_timed_put_one_in": lambda: random.choice([0] * 7 + [1, 5, 10]),
    "universal_max_read_amp": lambda : random.choice([-1] * 3 + [0, 3, 10]),
}

_TEST_DIR_ENV_VAR = "TEST_TMPDIR"
# If TEST_TMPDIR_EXPECTED is not specified, default value will be TEST_TMPDIR

@@ -0,0 +1 @@
* Introduce a new universal compaction option CompactionOptionsUniversal::max_read_amp which allows user to define the limit on the number of sorted runs separately from the trigger for compaction (`level0_file_num_compaction_trigger`) #12477.