Improve universal compaction sorted-run trigger (#12477)

Summary:
Universal compaction currently uses `level0_file_num_compaction_trigger` for two purposes:
1. the trigger for checking if there is any compaction to do, and
2. the limit on the number of sorted runs. RocksDB will do compaction to keep the number of sorted runs no more than the value of this option.

This can make the option inflexible. A value that is too small causes higher write amp, because more compactions run to reduce the number of sorted runs. A value that is too big delays potential compaction work and causes worse read performance. This PR introduces an option `CompactionOptionsUniversal::max_read_amp` for only the second purpose: to specify the hard limit on the number of sorted runs.

For backward compatibility, `max_read_amp = -1` by default, which falls back to the current behavior.
When `max_read_amp > 0`, `level0_file_num_compaction_trigger` will only serve as a trigger to find potential compaction.
When `max_read_amp = 0`, RocksDB will auto-tune the limit on the number of sorted runs. The estimate is based on DB size, `write_buffer_size` and `size_ratio`, so it adapts as the DB size changes. See more in `UniversalCompactionBuilder::PickCompaction()`.
Alternatively, users can now configure `max_read_amp` to a very big value and keep `level0_file_num_compaction_trigger` small. This allows `size_ratio` and `max_size_amplification_percent` to control the number of sorted runs, and essentially disables compactions with reason `kUniversalSortedRunNum`.
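
The three modes can be sketched as a small stand-alone function. This only illustrates the logic described above and is not RocksDB's API: `EstimateMaxNumRuns` and `ResolveMaxNumRuns` are hypothetical names, and the sketch ignores `stop_style` (the auto-tuned estimate only applies to the total-size stop style).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-alone helper, not a RocksDB API. It mirrors the estimate
// described above: the smallest run is assumed to be write_buffer_size, and
// each next run is as large as size_ratio allows relative to all smaller runs.
int EstimateMaxNumRuns(uint64_t write_buffer_size, unsigned int size_ratio,
                       uint64_t max_run_size) {
  assert(write_buffer_size > 0);  // otherwise the loop below cannot advance
  int max_num_runs = 1;
  double cur_run_max_size = static_cast<double>(write_buffer_size);
  double total_run_size = 0;
  while (cur_run_max_size < static_cast<double>(max_run_size)) {
    total_run_size += cur_run_max_size;
    cur_run_max_size = (100.0 + size_ratio) / 100.0 * total_run_size;
    ++max_num_runs;
  }
  return max_num_runs;
}

// Hypothetical helper: resolves the effective limit on sorted runs.
int ResolveMaxNumRuns(int max_read_amp, int level0_trigger,
                      uint64_t write_buffer_size, unsigned int size_ratio,
                      uint64_t max_run_size) {
  assert(max_read_amp >= -1);  // smaller values are rejected at DB open
  if (max_read_amp == -1) {
    return level0_trigger;  // default: previous behavior
  }
  if (max_read_amp == 0) {
    return EstimateMaxNumRuns(write_buffer_size, size_ratio, max_run_size);
  }
  return max_read_amp;  // explicit hard limit
}
```

With a 256MB `write_buffer_size`, a `size_ratio` of 10 and a 20GB largest run, the loop stops at 8, which matches the worked example in the new unit test. Assuming `size_ratio` is left at its default of 1, a 24GB largest run gives 9, consistent with the estimate quoted in the test plan.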

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12477

Test Plan:
* new unit test
* existing unit test for default behavior
* updated crash test with the new option
* benchmark:
  * Create a DB that is roughly 24GB in the last level. When `max_read_amp = 0`, we estimate that the DB needs 9 levels to avoid excessive compactions to reduce the number of sorted runs.
  * We then run fillrandom to ingest another 24GB of data to compare write amp.
    * case 1: small level0 trigger: `level0_file_num_compaction_trigger=5, max_read_amp=-1`
      * write-amp: 4.8
    * case 2: auto-tune: `level0_file_num_compaction_trigger=5, max_read_amp=0`
      * write-amp: 3.6
    * case 3: auto-tune with minimal trigger: `level0_file_num_compaction_trigger=1, max_read_amp=0`
      * write-amp: 3.8
    * case 4: hard-code a good value for trigger: `level0_file_num_compaction_trigger=9`
      * write-amp: 2.8
```
Case 1:
** Compaction Stats [default] **
Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop Rblob(GB) Wblob(GB)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0      0/0    0.00 KB   1.0      0.0     0.0      0.0      22.6     22.6       0.0   1.0      0.0    163.2    141.94            111.10       108    1.314       0      0       0.0       0.0
 L45      8/0    1.81 GB   0.0     39.6    11.1     28.5      39.3     10.8       0.0   3.5    209.0    207.3    194.25            191.29        43    4.517    348M  2498K       0.0       0.0
 L46     13/0    3.12 GB   0.0     15.3     9.5      5.8      15.0      9.3       0.0   1.6    203.1    199.3     77.13             75.88        16    4.821    134M  2362K       0.0       0.0
 L47     19/0    4.68 GB   0.0     15.4    10.5      4.9      14.7      9.8       0.0   1.4    204.0    194.9     77.38             76.15         8    9.673    135M  5920K       0.0       0.0
 L48     38/0    9.42 GB   0.0     19.6    11.7      7.9      17.3      9.4       0.0   1.5    206.5    182.3     97.15             95.02         4   24.287    172M    20M       0.0       0.0
 L49     91/0   22.70 GB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0       0.0       0.0
 Sum    169/0   41.74 GB   0.0     89.9    42.9     47.0     109.0     61.9       0.0   4.8    156.7    189.8    587.85            549.45       179    3.284    791M    31M       0.0       0.0

Case 2:
** Compaction Stats [default] **
Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop Rblob(GB) Wblob(GB)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0      1/0   214.47 MB   1.2      0.0     0.0      0.0      22.6     22.6       0.0   1.0      0.0    164.5    140.81            109.98       108    1.304       0      0       0.0       0.0
 L44      0/0    0.00 KB   0.0      1.3     1.3      0.0       1.2      1.2       0.0   1.0    206.1    204.9      6.24              5.98         3    2.081     11M    51K       0.0       0.0
 L45      4/0   844.36 MB   0.0      7.1     5.4      1.7       7.0      5.4       0.0   1.3    194.6    192.9     37.41             36.00        13    2.878     62M   489K       0.0       0.0
 L46     11/0    2.57 GB   0.0     14.6     9.8      4.8      14.3      9.5       0.0   1.5    193.7    189.8     77.09             73.54        17    4.535    128M  2411K       0.0       0.0
 L47     24/0    5.81 GB   0.0     19.8    12.0      7.8      18.8     11.0       0.0   1.6    191.4    181.1    106.19            101.21         9   11.799    174M  9166K       0.0       0.0
 L48     38/0    9.42 GB   0.0     19.6    11.8      7.9      17.3      9.4       0.0   1.5    197.3    173.6    101.97             97.23         4   25.491    172M    20M       0.0       0.0
 L49     91/0   22.70 GB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0       0.0       0.0
 Sum    169/0   41.54 GB   0.0     62.4    40.3     22.1      81.3     59.2       0.0   3.6    136.1    177.2    469.71            423.94       154    3.050    549M    32M       0.0       0.0

Case 3:
** Compaction Stats [default] **
Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop Rblob(GB) Wblob(GB)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0      0/0    0.00 KB   5.0      0.0     0.0      0.0      22.6     22.6       0.0   1.0      0.0    163.8    141.43            111.13       108    1.310       0      0       0.0       0.0
 L44      0/0    0.00 KB   0.0      0.8     0.8      0.0       0.8      0.8       0.0   1.0    201.4    200.2      4.26              4.19         2    2.130   7360K    33K       0.0       0.0
 L45      4/0   844.38 MB   0.0      6.3     5.0      1.2       6.2      5.0       0.0   1.2    202.0    200.3     31.81             31.50        12    2.651     55M   403K       0.0       0.0
 L46      7/0    1.62 GB   0.0     13.3     8.8      4.6      13.1      8.6       0.0   1.5    198.9    195.7     68.72             67.89        17    4.042    117M  1696K       0.0       0.0
 L47     24/0    5.81 GB   0.0     21.7    12.9      8.8      20.6     11.8       0.0   1.6    198.5    188.6    112.04            109.97        12    9.336    191M  9352K       0.0       0.0
 L48     41/0   10.14 GB   0.0     24.8    13.0     11.8      21.9     10.1       0.0   1.7    198.6    175.6    127.88            125.36         6   21.313    218M    25M       0.0       0.0
 L49     91/0   22.70 GB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0       0.0       0.0
 Sum    167/0   41.10 GB   0.0     67.0    40.5     26.4      85.4     58.9       0.0   3.8    141.1    179.8    486.13            450.04       157    3.096    589M    36M       0.0       0.0

Case 4:
** Compaction Stats [default] **
Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop Rblob(GB) Wblob(GB)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0      0/0    0.00 KB   0.7      0.0     0.0      0.0      22.6     22.6       0.0   1.0      0.0    158.6    146.02            114.68       108    1.352       0      0       0.0       0.0
 L42      0/0    0.00 KB   0.0      1.7     1.7      0.0       1.7      1.7       0.0   1.0    185.4    184.3      9.25              8.96         4    2.314     14M    67K       0.0       0.0
 L43      0/0    0.00 KB   0.0      2.5     2.5      0.0       2.5      2.5       0.0   1.0    197.8    195.6     13.01             12.65         4    3.253     22M   202K       0.0       0.0
 L44      4/0   844.40 MB   0.0      4.2     4.2      0.0       4.1      4.1       0.0   1.0    188.1    185.1     22.81             21.89         5    4.562     36M   503K       0.0       0.0
 L45     13/0    3.12 GB   0.0      7.5     6.5      1.0       7.2      6.2       0.0   1.1    188.7    181.8     40.69             39.32         5    8.138     65M  2282K       0.0       0.0
 L46     17/0    4.18 GB   0.0      8.3     7.1      1.2       7.9      6.6       0.0   1.1    192.2    181.8     44.23             43.06         4   11.058     73M  3846K       0.0       0.0
 L47     22/0    5.34 GB   0.0      8.9     7.5      1.4       8.2      6.8       0.0   1.1    189.1    174.1     48.12             45.37         3   16.041     78M  6098K       0.0       0.0
 L48     27/0    6.58 GB   0.0      9.2     7.6      1.6       8.2      6.6       0.0   1.1    195.2    172.9     48.52             47.11         2   24.262     81M  9217K       0.0       0.0
 L49     91/0   22.70 GB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0       0.0       0.0
 Sum    174/0   42.74 GB   0.0     42.3    37.0      5.3      62.4     57.1       0.0   2.8    116.3    171.3    372.66            333.04       135    2.760    372M    22M       0.0       0.0

setup:
./db_bench --benchmarks=fillseq,compactall,waitforcompaction --num=200000000 --compression_type=none --disable_wal=1 --compaction_style=1 --num_levels=50 --target_file_size_base=268435456 --max_compaction_bytes=6710886400 --level0_file_num_compaction_trigger=10 --write_buffer_size=268435456 --seed 1708494134896523

benchmark:
./db_bench --benchmarks=overwrite,waitforcompaction,stats --num=200000000 --compression_type=none --disable_wal=1 --compaction_style=1 --write_buffer_size=268435456 --level0_file_num_compaction_trigger=5 --target_file_size_base=268435456 --use_existing_db=1 --num_levels=50 --writes=200000000 --universal_max_read_amp=-1 --seed=1716488324800233

```
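
The write-amp figures quoted for the four cases are consistent with dividing the Sum row's Write(GB) by the data flushed into L0 (the L0 row's Write(GB), 22.6 in every case). A minimal sketch of that arithmetic follows. The helper names are hypothetical, and the formula is inferred from the tables rather than taken from RocksDB's source.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: total GB written by flushes and compactions divided by
// the GB flushed into L0. Inferred from the stats tables, not a RocksDB API.
double WriteAmp(double total_write_gb, double flushed_gb) {
  assert(flushed_gb > 0.0);
  return total_write_gb / flushed_gb;
}

// Rounds to one decimal place, the precision of the W-Amp column.
double RoundToTenth(double value) { return std::round(value * 10.0) / 10.0; }
```

For case 1, `RoundToTenth(WriteAmp(109.0, 22.6))` gives 4.8. The same calculation gives 3.6, 3.8 and 2.8 for cases 2 to 4 (81.3, 85.4 and 62.4 GB written).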

Reviewed By: ajkr

Differential Revision: D55370922

Pulled By: cbi42

fbshipit-source-id: 9be69979126b840d08e93e7059260e76a878bb2a
Changyu Bi 2024-05-24 10:10:31 -07:00 committed by Facebook GitHub Bot
parent 9a72cf1a61
commit fecb10c2fa
15 changed files with 274 additions and 23 deletions

@@ -1528,6 +1528,18 @@ Status ColumnFamilyData::ValidateOptions(
       }
     }
   }
+  if (cf_options.compaction_style == kCompactionStyleUniversal) {
+    int max_read_amp = cf_options.compaction_options_universal.max_read_amp;
+    if (max_read_amp < -1) {
+      return Status::NotSupported("max_read_amp should be at least -1.");
+    } else if (0 < max_read_amp &&
+               max_read_amp < cf_options.level0_file_num_compaction_trigger) {
+      return Status::NotSupported(
+          "max_read_amp limits the number of sorted runs but is smaller than "
+          "the compaction trigger level0_file_num_compaction_trigger.");
+    }
+  }
   return s;
 }
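
The validation rule in this hunk can be exercised without opening a DB. The sketch below is a hypothetical stand-alone checker, not RocksDB code: `CheckMaxReadAmp` and its string return value are assumptions made for illustration, and the messages are abbreviated.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-alone checker. Returns an empty string when the
// combination is accepted and an error message otherwise.
std::string CheckMaxReadAmp(int max_read_amp, int level0_trigger) {
  if (max_read_amp < -1) {
    return "max_read_amp should be at least -1.";
  }
  if (max_read_amp > 0 && max_read_amp < level0_trigger) {
    return "max_read_amp is smaller than level0_file_num_compaction_trigger.";
  }
  return "";
}

// Usage example with a trigger of 4.
inline void CheckMaxReadAmpExamples() {
  assert(CheckMaxReadAmp(-1, 4).empty());   // default, falls back to trigger
  assert(CheckMaxReadAmp(0, 4).empty());    // auto-tune
  assert(CheckMaxReadAmp(8, 4).empty());    // explicit limit above trigger
  assert(!CheckMaxReadAmp(2, 4).empty());   // limit below trigger: rejected
  assert(!CheckMaxReadAmp(-2, 4).empty());  // below -1: rejected
}
```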

@@ -4330,6 +4330,118 @@ TEST_F(CompactionPickerTest, IntraL0WhenL0IsSmall) {
   }
 }

+TEST_F(CompactionPickerTest, UniversalMaxReadAmpLargeDB) {
+  ioptions_.compaction_style = kCompactionStyleUniversal;
+  ioptions_.num_levels = 50;
+  mutable_cf_options_.RefreshDerivedOptions(ioptions_);
+  mutable_cf_options_.compaction_options_universal.size_ratio = 10;
+  mutable_cf_options_.write_buffer_size = 256 << 20;
+  // Avoid space amp compaction
+  mutable_cf_options_.compaction_options_universal
+      .max_size_amplification_percent = 200;
+  const int kMaxRuns = 8;
+  for (int max_read_amp : {kMaxRuns, 0, -1}) {
+    SCOPED_TRACE("max_read_amp = " + std::to_string(max_read_amp));
+    if (max_read_amp == -1) {
+      mutable_cf_options_.level0_file_num_compaction_trigger = kMaxRuns;
+    } else {
+      mutable_cf_options_.level0_file_num_compaction_trigger = 4;
+    }
+    mutable_cf_options_.compaction_options_universal.max_read_amp =
+        max_read_amp;
+    UniversalCompactionPicker universal_compaction_picker(ioptions_, &icmp_);
+    uint64_t max_run_size = 20ull << 30;
+    // When max_read_amp = 0, we estimate the number of levels needed based on
+    // size_ratio and write_buffer_size. See more in
+    // UniversalCompactionBuilder::PickCompaction().
+    // With a 20GB last level, we estimate that 8 levels are needed:
+    // L0 256MB
+    // L1 256MB * 1.1 (size_ratio) = 282MB
+    // L2 (256MB + 282MB) * 1.1 = 592MB
+    // L3 1243MB
+    // L4 2610MB
+    // L5 5481MB
+    // L6 11510MB
+    // L7 24171MB > 20GB
+    for (int i = 0; i <= kMaxRuns; ++i) {
+      SCOPED_TRACE("i = " + std::to_string(i));
+      NewVersionStorage(/*num_levels=*/50, kCompactionStyleUniversal);
+      Add(/*level=*/49, /*file_number=*/10, /*smallest=*/"100",
+          /*largest=*/"200", /*file_size=*/max_run_size, /*path_id=*/0,
+          /*smallest_seq=*/0, /*largest_seq=*/0,
+          /*compensated_file_size=*/max_run_size);
+      // Besides the last sorted run, we add additional `i` sorted runs
+      // without triggering space-amp or size-amp compactions.
+      uint64_t file_size = 1 << 20;
+      for (int j = 0; j < i; ++j) {
+        Add(/*level=*/j, /*file_number=*/100 - j, /*smallest=*/"100",
+            /*largest=*/"200", /*file_size=*/file_size, /*path_id=*/0,
+            /*smallest_seq=*/100 - j, /*largest_seq=*/100 - j,
+            /*compensated_file_size=*/file_size);
+        // to avoid space-amp and size-amp compaction
+        file_size *= 2;
+      }
+      UpdateVersionStorageInfo();
+      // level0_file_num_compaction_trigger is still used as trigger to
+      // check potential compactions
+      ASSERT_EQ(
+          universal_compaction_picker.NeedsCompaction(vstorage_.get()),
+          i + 1 >= mutable_cf_options_.level0_file_num_compaction_trigger);
+      std::unique_ptr<Compaction> compaction(
+          universal_compaction_picker.PickCompaction(
+              cf_name_, mutable_cf_options_, mutable_db_options_,
+              vstorage_.get(), &log_buffer_));
+      if (i == kMaxRuns) {
+        // There are in total i + 1 > kMaxRuns sorted runs.
+        // This triggers compaction ignoring size_ratio.
+        ASSERT_NE(nullptr, compaction);
+        ASSERT_EQ(CompactionReason::kUniversalSortedRunNum,
+                  compaction->compaction_reason());
+        // First two runs are compacted
+        ASSERT_EQ(0, compaction->start_level());
+        ASSERT_EQ(1, compaction->output_level());
+        ASSERT_EQ(1U, compaction->num_input_files(0));
+        ASSERT_EQ(1U, compaction->num_input_files(1));
+      } else {
+        ASSERT_EQ(nullptr, compaction);
+      }
+    }
+  }
+}
+
+TEST_F(CompactionPickerTest, UniversalMaxReadAmpSmallDB) {
+  ioptions_.compaction_style = kCompactionStyleUniversal;
+  ioptions_.num_levels = 50;
+  mutable_cf_options_.RefreshDerivedOptions(ioptions_);
+  mutable_cf_options_.level0_file_num_compaction_trigger = 1;
+  mutable_cf_options_.compaction_options_universal.size_ratio = 10;
+  mutable_cf_options_.write_buffer_size = 256 << 20;
+  mutable_cf_options_.compaction_options_universal
+      .max_size_amplification_percent = 200;
+  const int kMaxRuns = 1;
+  for (int max_read_amp : {-1, kMaxRuns, 0}) {
+    SCOPED_TRACE("max_read_amp = " + std::to_string(max_read_amp));
+    mutable_cf_options_.compaction_options_universal.max_read_amp =
+        max_read_amp;
+    UniversalCompactionPicker universal_compaction_picker(ioptions_, &icmp_);
+    NewVersionStorage(/*num_levels=*/50, kCompactionStyleUniversal);
+    // max_run_size is much smaller than write_buffer_size,
+    // only 1 level is needed.
+    uint64_t max_run_size = 8 << 10;
+    Add(/*level=*/49, /*file_number=*/10, /*smallest=*/"100",
+        /*largest=*/"200", /*file_size=*/max_run_size, /*path_id=*/0,
+        /*smallest_seq=*/0, /*largest_seq=*/0,
+        /*compensated_file_size=*/max_run_size);
+    UpdateVersionStorageInfo();
+    ASSERT_TRUE(universal_compaction_picker.NeedsCompaction(vstorage_.get()));
+    std::unique_ptr<Compaction> compaction(
+        universal_compaction_picker.PickCompaction(
+            cf_name_, mutable_cf_options_, mutable_db_options_, vstorage_.get(),
+            &log_buffer_));
+    ASSERT_EQ(nullptr, compaction);
+  }
+}
+
 }  // namespace ROCKSDB_NAMESPACE

 int main(int argc, char** argv) {

@@ -227,6 +227,7 @@ class UniversalCompactionBuilder {
   const InternalKeyComparator* icmp_;
   double score_;
   std::vector<SortedRun> sorted_runs_;
+  uint64_t max_run_size_;
   const std::string& cf_name_;
   const MutableCFOptions& mutable_cf_options_;
   const MutableDBOptions& mutable_db_options_;
@@ -235,7 +236,8 @@ class UniversalCompactionBuilder {
   LogBuffer* log_buffer_;

   static std::vector<UniversalCompactionBuilder::SortedRun> CalculateSortedRuns(
-      const VersionStorageInfo& vstorage, int last_level);
+      const VersionStorageInfo& vstorage, int last_level,
+      uint64_t* max_run_size);

   // Pick a path ID to place a newly generated file, with its estimated file
   // size.
@@ -440,11 +442,15 @@ void UniversalCompactionBuilder::SortedRun::DumpSizeInfo(
 std::vector<UniversalCompactionBuilder::SortedRun>
 UniversalCompactionBuilder::CalculateSortedRuns(
-    const VersionStorageInfo& vstorage, int last_level) {
+    const VersionStorageInfo& vstorage, int last_level,
+    uint64_t* max_run_size) {
+  assert(max_run_size);
+  *max_run_size = 0;
   std::vector<UniversalCompactionBuilder::SortedRun> ret;
   for (FileMetaData* f : vstorage.LevelFiles(0)) {
     ret.emplace_back(0, f, f->fd.GetFileSize(), f->compensated_file_size,
                      f->being_compacted);
+    *max_run_size = std::max(*max_run_size, f->fd.GetFileSize());
   }
   for (int level = 1; level <= last_level; level++) {
     uint64_t total_compensated_size = 0U;
@@ -466,6 +472,7 @@ UniversalCompactionBuilder::CalculateSortedRuns(
       ret.emplace_back(level, nullptr, total_size, total_compensated_size,
                        being_compacted);
     }
+    *max_run_size = std::max(*max_run_size, total_size);
   }
   return ret;
 }
@@ -477,13 +484,16 @@ Compaction* UniversalCompactionBuilder::PickCompaction() {
   score_ = vstorage_->CompactionScore(kLevel0);
   int max_output_level =
       vstorage_->MaxOutputLevel(ioptions_.allow_ingest_behind);
-  sorted_runs_ = CalculateSortedRuns(*vstorage_, max_output_level);
+  max_run_size_ = 0;
+  sorted_runs_ =
+      CalculateSortedRuns(*vstorage_, max_output_level, &max_run_size_);
+  int file_num_compaction_trigger =
+      mutable_cf_options_.level0_file_num_compaction_trigger;

   if (sorted_runs_.size() == 0 ||
       (vstorage_->FilesMarkedForPeriodicCompaction().empty() &&
        vstorage_->FilesMarkedForCompaction().empty() &&
-       sorted_runs_.size() < (unsigned int)mutable_cf_options_
-                                 .level0_file_num_compaction_trigger)) {
+       sorted_runs_.size() < (unsigned int)file_num_compaction_trigger)) {
     ROCKS_LOG_BUFFER(log_buffer_, "[%s] Universal: nothing to do\n",
                      cf_name_.c_str());
     TEST_SYNC_POINT_CALLBACK(
@@ -505,11 +515,9 @@ Compaction* UniversalCompactionBuilder::PickCompaction() {
     TEST_SYNC_POINT_CALLBACK("PostPickPeriodicCompaction", c);
   }

-  // Check for size amplification.
   if (c == nullptr &&
-      sorted_runs_.size() >=
-          static_cast<size_t>(
-              mutable_cf_options_.level0_file_num_compaction_trigger)) {
+      sorted_runs_.size() >= static_cast<size_t>(file_num_compaction_trigger)) {
+    // Check for size amplification.
     if ((c = PickCompactionToReduceSizeAmp()) != nullptr) {
       TEST_SYNC_POINT("PickCompactionToReduceSizeAmpReturnNonnullptr");
       ROCKS_LOG_BUFFER(log_buffer_, "[%s] Universal: compacting for size amp\n",
@@ -527,13 +535,48 @@ Compaction* UniversalCompactionBuilder::PickCompaction() {
                          cf_name_.c_str());
       } else {
         // Size amplification and file size ratios are within configured limits.
-        // If max read amplification is exceeding configured limits, then force
-        // compaction without looking at filesize ratios and try to reduce
-        // the number of files to fewer than level0_file_num_compaction_trigger.
+        // If max read amplification exceeds configured limits, then force
+        // compaction to reduce the number of sorted runs without looking at
+        // file size ratios.
         // This is guaranteed by NeedsCompaction()
         assert(sorted_runs_.size() >=
-               static_cast<size_t>(
-                   mutable_cf_options_.level0_file_num_compaction_trigger));
+               static_cast<size_t>(file_num_compaction_trigger));
+        int max_num_runs =
+            mutable_cf_options_.compaction_options_universal.max_read_amp;
+        if (max_num_runs < 0) {
+          // any value < -1 is not valid
+          assert(max_num_runs == -1);
+          // By default, fall back to `level0_file_num_compaction_trigger`
+          max_num_runs = file_num_compaction_trigger;
+        } else if (max_num_runs == 0) {
+          if (mutable_cf_options_.compaction_options_universal.stop_style ==
+              kCompactionStopStyleTotalSize) {
+            // 0 means auto-tuning by RocksDB. We estimate max num run based on
+            // max_run_size, size_ratio and write buffer size:
+            // Assume the size of the lowest level is equal to
+            // write_buffer_size. Each subsequent level is the max size without
+            // triggering size_ratio compaction. `max_num_runs` is the minimum
+            // number of levels required such that the target size of the
+            // largest level is at least `max_run_size_`.
+            max_num_runs = 1;
+            double cur_level_max_size =
+                static_cast<double>(mutable_cf_options_.write_buffer_size);
+            double total_run_size = 0;
+            while (cur_level_max_size < static_cast<double>(max_run_size_)) {
+              // This loop should not take too many iterations since
+              // cur_level_max_size at least doubles each iteration.
+              total_run_size += cur_level_max_size;
+              cur_level_max_size = (100.0 + ratio) / 100.0 * total_run_size;
+              ++max_num_runs;
+            }
+          } else {
+            // TODO: implement the auto-tune logic for this stop style
+            max_num_runs = file_num_compaction_trigger;
+          }
+        } else {
+          // max_num_runs > 0, it's the limit on the number of sorted runs
+        }
         // Get the total number of sorted runs that are not being compacted
         int num_sr_not_compacted = 0;
         for (size_t i = 0; i < sorted_runs_.size(); i++) {
@@ -544,17 +587,25 @@ Compaction* UniversalCompactionBuilder::PickCompaction() {
         // The number of sorted runs that are not being compacted is greater
         // than the maximum allowed number of sorted runs
-        if (num_sr_not_compacted >
-            mutable_cf_options_.level0_file_num_compaction_trigger) {
-          unsigned int num_files =
-              num_sr_not_compacted -
-              mutable_cf_options_.level0_file_num_compaction_trigger + 1;
+        if (num_sr_not_compacted > max_num_runs) {
+          unsigned int num_files = num_sr_not_compacted - max_num_runs + 1;
           if ((c = PickCompactionToReduceSortedRuns(UINT_MAX, num_files)) !=
               nullptr) {
             ROCKS_LOG_BUFFER(log_buffer_,
-                             "[%s] Universal: compacting for file num -- %u\n",
-                             cf_name_.c_str(), num_files);
+                             "[%s] Universal: compacting for file num, to "
+                             "compact file num -- %u, max num runs allowed"
+                             "-- %d, max_run_size -- %" PRIu64 "\n",
+                             cf_name_.c_str(), num_files, max_num_runs,
+                             max_run_size_);
           }
+        } else {
+          ROCKS_LOG_BUFFER(
+              log_buffer_,
+              "[%s] Universal: skipping compaction for file num, num runs not "
+              "being compacted -- %u, max num runs allowed -- %d, max_run_size "
+              "-- %" PRIu64 "\n",
+              cf_name_.c_str(), num_sr_not_compacted, max_num_runs,
+              max_run_size_);
         }
       }
     }
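
The `num_files` arithmetic in this hunk can be checked in isolation. Merging k sorted runs into one removes k - 1 runs, so picking `n - max_num_runs + 1` runs brings n back down to `max_num_runs`. The helper below is a hypothetical stand-alone sketch, not part of RocksDB. In the diff the value is passed to `PickCompactionToReduceSortedRuns()`.

```cpp
#include <cassert>

// Hypothetical helper mirroring the arithmetic in the hunk above: how many
// sorted runs to merge so that at most max_num_runs remain afterwards.
// Returns 0 when the limit is not exceeded.
int NumRunsToMerge(int num_sr_not_compacted, int max_num_runs) {
  assert(max_num_runs > 0);
  if (num_sr_not_compacted <= max_num_runs) {
    return 0;
  }
  return num_sr_not_compacted - max_num_runs + 1;
}

// Number of sorted runs left after merging `merged` runs into one.
int RunsAfterMerge(int num_runs, int merged) {
  return merged == 0 ? num_runs : num_runs - merged + 1;
}
```

In the new unit test, 9 sorted runs with a limit of 8 give `NumRunsToMerge(9, 8) == 2`, which is why the test expects the first two runs to be compacted.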

@@ -5645,6 +5645,8 @@ TEST_F(DBTest, DynamicUniversalCompactionOptions) {
   ASSERT_EQ(
       dbfull()->GetOptions().compaction_options_universal.allow_trivial_move,
       false);
+  ASSERT_EQ(dbfull()->GetOptions().compaction_options_universal.max_read_amp,
+            -1);

   ASSERT_OK(dbfull()->SetOptions(
       {{"compaction_options_universal", "{size_ratio=7;}"}}));
@@ -5666,9 +5668,11 @@ TEST_F(DBTest, DynamicUniversalCompactionOptions) {
   ASSERT_EQ(
       dbfull()->GetOptions().compaction_options_universal.allow_trivial_move,
       false);
+  ASSERT_EQ(dbfull()->GetOptions().compaction_options_universal.max_read_amp,
+            -1);

-  ASSERT_OK(dbfull()->SetOptions(
-      {{"compaction_options_universal", "{min_merge_width=11;}"}}));
+  ASSERT_OK(dbfull()->SetOptions({{"compaction_options_universal",
+                                   "{min_merge_width=11;max_read_amp=0;}"}}));
   ASSERT_EQ(dbfull()->GetOptions().compaction_options_universal.size_ratio, 7u);
   ASSERT_EQ(dbfull()->GetOptions().compaction_options_universal.min_merge_width,
             11u);
@@ -5687,6 +5691,8 @@ TEST_F(DBTest, DynamicUniversalCompactionOptions) {
   ASSERT_EQ(
       dbfull()->GetOptions().compaction_options_universal.allow_trivial_move,
       false);
+  ASSERT_EQ(dbfull()->GetOptions().compaction_options_universal.max_read_amp,
+            0);
 }

 TEST_F(DBTest, FileCreationRandomFailure) {

@@ -3501,6 +3501,10 @@ void VersionStorageInfo::ComputeCompactionScore(
             score = kScoreForNeedCompaction;
           }
         } else {
+          // For universal compaction, if a user configures `max_read_amp`, then
+          // the score may be a false positive signal.
+          // `level0_file_num_compaction_trigger` is used as a trigger to check
+          // if there is any compaction work to do.
           score = static_cast<double>(num_sorted_runs) /
                   mutable_cf_options.level0_file_num_compaction_trigger;
           if (compaction_style_ == kCompactionStyleLevel && num_levels() > 1) {

@@ -136,6 +136,7 @@ DECLARE_int32(universal_size_ratio);
 DECLARE_int32(universal_min_merge_width);
 DECLARE_int32(universal_max_merge_width);
 DECLARE_int32(universal_max_size_amplification_percent);
+DECLARE_int32(universal_max_read_amp);
 DECLARE_int32(clear_column_family_one_in);
 DECLARE_int32(get_live_files_apis_one_in);
 DECLARE_int32(get_all_column_family_metadata_one_in);

@@ -306,6 +306,9 @@ DEFINE_int32(universal_max_merge_width, 0,
 DEFINE_int32(universal_max_size_amplification_percent, 0,
              "The max size amplification for universal style compaction");

+DEFINE_int32(universal_max_read_amp, -1,
+             "The limit on the number of sorted runs");
+
 DEFINE_int32(clear_column_family_one_in, 1000000,
              "With a chance of 1/N, delete a column family and then recreate "
              "it again. If N == 0, never drop/create column families. "

@@ -3712,6 +3712,8 @@ void InitializeOptionsFromFlags(
       FLAGS_universal_max_merge_width;
   options.compaction_options_universal.max_size_amplification_percent =
       FLAGS_universal_max_size_amplification_percent;
+  options.compaction_options_universal.max_read_amp =
+      FLAGS_universal_max_read_amp;
   options.atomic_flush = FLAGS_atomic_flush;
   options.manual_wal_flush = FLAGS_manual_wal_flush_one_in > 0 ? true : false;
   options.avoid_unnecessary_blocking_io = FLAGS_avoid_unnecessary_blocking_io;

@@ -234,6 +234,12 @@ struct ColumnFamilyOptions : public AdvancedColumnFamilyOptions {
   // Number of files to trigger level-0 compaction. A value <0 means that
   // level-0 compaction will not be triggered by number of files at all.
   //
+  // Universal compaction: RocksDB will try to keep the number of sorted runs
+  // no more than this number. If CompactionOptionsUniversal::max_read_amp is
+  // set, then this option will be used only as a trigger to look for
+  // compaction. CompactionOptionsUniversal::max_read_amp will be the limit
+  // on the number of sorted runs.
+  //
   // Default: 4
   //
   // Dynamically changeable through SetOptions() API

@ -65,6 +65,36 @@ class CompactionOptionsUniversal {
// Default: -1 // Default: -1
int compression_size_percent; int compression_size_percent;
// The limit on the number of sorted runs. RocksDB will try to keep
// the number of sorted runs at most this number. While compactions are
// running, the number of sorted runs may be temporarily higher than
// this number.
//
// Since universal compaction checks if there is compaction to do when
// the number of sorted runs is at least level0_file_num_compaction_trigger,
// it is suggested to set level0_file_num_compaction_trigger to be no larger
// than max_read_amp.
//
// Values:
// -1: special flag to let RocksDB pick default. Currently,
// RocksDB will fall back to the behavior before this option is introduced,
// which is to use level0_file_num_compaction_trigger as the limit.
// This may change in the future to behave as 0 below.
// 0: Let RocksDB auto-tune. Currently, we determine the max number of
// sorted runs based on the current DB size, size_ratio and
// write_buffer_size. Note that this is only supported for the default
// stop_style kCompactionStopStyleTotalSize. For
// kCompactionStopStyleSimilarSize, this behaves as if -1 is configured.
// N > 0: limit the number of sorted runs to be at most N.
// N should be at least the compaction trigger specified by
// level0_file_num_compaction_trigger. If 0 < max_read_amp <
// level0_file_num_compaction_trigger, Status::NotSupported() will be
// returned during DB open.
// N < -1: Status::NotSupported() will be returned during DB open.
//
// Default: -1
int max_read_amp;
// The algorithm used to stop picking files into a single compaction run
// Default: kCompactionStopStyleTotalSize
CompactionStopStyle stop_style;
@@ -88,6 +118,7 @@ class CompactionOptionsUniversal {
max_merge_width(UINT_MAX),
max_size_amplification_percent(200),
compression_size_percent(-1),
max_read_amp(-1),
stop_style(kCompactionStopStyleTotalSize),
allow_trivial_move(false),
incremental(false) {}
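
The value table documented for `max_read_amp` can be read as a small decision procedure. The sketch below restates it under stated assumptions: `StopStyle`, `LimitSource`, `SortedRunLimit`, and `ResolveSortedRunLimit` are hypothetical stand-ins written for illustration, not RocksDB types or functions, and the auto-tuned case is only flagged, not computed, because the estimation itself is not shown in this diff.

```cpp
// Hypothetical stand-ins (not RocksDB types) for illustration only.
enum class StopStyle { kSimilarSize, kTotalSize };
enum class LimitSource { kInvalid, kCompactionTrigger, kAutoTuned, kExplicit };

struct SortedRunLimit {
  LimitSource source;
  int value;  // Meaningful only for kCompactionTrigger and kExplicit.
};

// Maps a configured max_read_amp to the source of the sorted-run limit,
// following the documented values. kInvalid corresponds to the cases where
// the comment says Status::NotSupported() is returned during DB open.
inline SortedRunLimit ResolveSortedRunLimit(int max_read_amp, int l0_trigger,
                                            StopStyle stop_style) {
  if (max_read_amp < -1) {
    return {LimitSource::kInvalid, 0};
  }
  if (max_read_amp > 0 && max_read_amp < l0_trigger) {
    return {LimitSource::kInvalid, 0};
  }
  if (max_read_amp == -1) {
    // Behavior before the option existed: the trigger doubles as the limit.
    return {LimitSource::kCompactionTrigger, l0_trigger};
  }
  if (max_read_amp == 0) {
    // Auto-tuning is documented only for kCompactionStopStyleTotalSize;
    // for kCompactionStopStyleSimilarSize it behaves as if -1 were set.
    if (stop_style == StopStyle::kSimilarSize) {
      return {LimitSource::kCompactionTrigger, l0_trigger};
    }
    return {LimitSource::kAutoTuned, 0};
  }
  return {LimitSource::kExplicit, max_read_amp};
}
```

For example, `ResolveSortedRunLimit(10, 4, StopStyle::kTotalSize)` yields an explicit limit of 10, while `ResolveSortedRunLimit(3, 4, StopStyle::kTotalSize)` is rejected because the limit is below the trigger.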

@@ -239,6 +239,10 @@ static std::unordered_map<std::string, OptionTypeInfo>
{offsetof(class CompactionOptionsUniversal, compression_size_percent),
OptionType::kInt, OptionVerificationType::kNormal,
OptionTypeFlags::kMutable}},
{"max_read_amp",
{offsetof(class CompactionOptionsUniversal, max_read_amp),
OptionType::kInt, OptionVerificationType::kNormal,
OptionTypeFlags::kMutable}},
{"stop_style",
{offsetof(class CompactionOptionsUniversal, stop_style),
OptionType::kCompactionStopStyle, OptionVerificationType::kNormal,
@@ -1137,6 +1141,8 @@ void MutableCFOptions::Dump(Logger* log) const {
ROCKS_LOG_INFO(log,
"compaction_options_universal.compression_size_percent : %d",
compaction_options_universal.compression_size_percent);
ROCKS_LOG_INFO(log, "compaction_options_universal.max_read_amp: %d",
compaction_options_universal.max_read_amp);
ROCKS_LOG_INFO(log, "compaction_options_universal.stop_style : %d",
compaction_options_universal.stop_style);
ROCKS_LOG_INFO(

@@ -360,6 +360,9 @@ void ColumnFamilyOptions::Dump(Logger* log) const {
ROCKS_LOG_HEADER(log,
"Options.compaction_options_universal.stop_style: %s",
str_compaction_stop_style.c_str());
ROCKS_LOG_HEADER(log,
"Options.compaction_options_universal.max_read_amp: %d",
compaction_options_universal.max_read_amp);
ROCKS_LOG_HEADER(
log, "Options.compaction_options_fifo.max_table_files_size: %" PRIu64,
compaction_options_fifo.max_table_files_size);

@@ -544,12 +544,20 @@ DEFINE_int32(universal_compression_size_percent, -1,
"The percentage of the database to compress for universal "
"compaction. -1 means compress everything.");
DEFINE_int32(universal_max_read_amp, -1,
"The limit on the number of sorted runs");
DEFINE_bool(universal_allow_trivial_move, false,
"Allow trivial move in universal compaction.");
DEFINE_bool(universal_incremental, false,
"Enable incremental compactions in universal compaction.");
DEFINE_int32(
universal_stop_style,
(int32_t)ROCKSDB_NAMESPACE::CompactionOptionsUniversal().stop_style,
"Universal compaction stop style.");
DEFINE_int64(cache_size, 32 << 20, // 32MB
"Number of bytes to use as a cache of uncompressed data");
@@ -4664,10 +4672,14 @@ class Benchmark {
options.compaction_options_universal.compression_size_percent =
FLAGS_universal_compression_size_percent;
}
options.compaction_options_universal.max_read_amp =
FLAGS_universal_max_read_amp;
options.compaction_options_universal.allow_trivial_move =
FLAGS_universal_allow_trivial_move;
options.compaction_options_universal.incremental =
FLAGS_universal_incremental;
options.compaction_options_universal.stop_style =
static_cast<CompactionStopStyle>(FLAGS_universal_stop_style);
if (FLAGS_thread_status_per_interval > 0) {
options.enable_thread_tracking = true;
}

@@ -312,6 +312,7 @@ default_params = {
"check_multiget_consistency": lambda: random.choice([0, 0, 0, 1]),
"check_multiget_entity_consistency": lambda: random.choice([0, 0, 0, 1]),
"use_timed_put_one_in": lambda: random.choice([0] * 7 + [1, 5, 10]),
"universal_max_read_amp": lambda: random.choice([-1] * 3 + [0, 3, 10]),
}
_TEST_DIR_ENV_VAR = "TEST_TMPDIR"
# If TEST_TMPDIR_EXPECTED is not specified, default value will be TEST_TMPDIR

@@ -0,0 +1 @@
* Introduce a new universal compaction option `CompactionOptionsUniversal::max_read_amp`, which allows users to define the limit on the number of sorted runs separately from the trigger for compaction (`level0_file_num_compaction_trigger`) #12477.