rocksdb/db/history_trimming_iterator.h
Changyu Bi 229297d1b8 Refactor AddRangeDels() + consider range tombstone during compaction file cutting (#11113)
Summary:
A second attempt after https://github.com/facebook/rocksdb/issues/10802, with bug fixes and refactoring. This PR updates compaction logic to take range tombstones into account when determining whether to cut the current compaction output file (https://github.com/facebook/rocksdb/issues/4811). Before this change, only point keys were considered, and range tombstones could cause large compactions. For example, if the current compaction outputs is a range tombstone [a, b) and 2 point keys y, z, they would be added to the same file, and may overlap with too many files in the next level and cause a large compaction in the future. This PR also includes ajkr's effort to simplify the logic to add range tombstones to compaction output files in `AddRangeDels()` ([https://github.com/facebook/rocksdb/issues/11078](https://github.com/facebook/rocksdb/pull/11078#issuecomment-1386078861)).

The main change is for `CompactionIterator` to emit range tombstone start keys to be processed by `CompactionOutputs`. A new class `CompactionMergingIterator` is introduced to replace `MergingIterator` under `CompactionIterator` to enable emitting of range tombstone start keys. Further improvement after this PR include cutting compaction output at some grandparent boundary key (instead of the next output key) when cutting within a range tombstone to reduce overlap with grandparents.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11113

Test Plan:
* added unit test in db_range_del_test
* crash test with a small key range: `python3 tools/db_crashtest.py blackbox --simple --max_key=100 --interval=600 --write_buffer_size=262144 --target_file_size_base=256 --max_bytes_for_level_base=262144 --block_size=128 --value_size_mult=33 --subcompactions=10 --use_multiget=1 --delpercent=3 --delrangepercent=2 --verify_iterator_with_expected_state_one_in=2 --num_iterations=10`

Reviewed By: ajkr

Differential Revision: D42655709

Pulled By: cbi42

fbshipit-source-id: 8367e36ef5640e8f21c14a3855d4a8d6e360a34c
2023-02-22 12:28:18 -08:00

96 lines
2.3 KiB
C++

// Copyright (c) 2011-present, Facebook, Inc. All rights reserved.
// This source code is licensed under both the GPLv2 (found in the
// COPYING file in the root directory) and Apache 2.0 License
// (found in the LICENSE.Apache file in the root directory).
#pragma once
#include <string>
#include <vector>
#include "db/dbformat.h"
#include "rocksdb/iterator.h"
#include "rocksdb/slice.h"
#include "table/internal_iterator.h"
namespace ROCKSDB_NAMESPACE {
class HistoryTrimmingIterator : public InternalIterator {
public:
explicit HistoryTrimmingIterator(InternalIterator* input,
const Comparator* cmp, const std::string& ts)
: input_(input), filter_ts_(ts), cmp_(cmp) {
assert(cmp_->timestamp_size() > 0 && !ts.empty());
}
bool filter() const {
if (!input_->Valid()) {
return true;
}
Slice current_ts = ExtractTimestampFromKey(key(), cmp_->timestamp_size());
return cmp_->CompareTimestamp(current_ts, Slice(filter_ts_)) <= 0;
}
bool Valid() const override { return input_->Valid(); }
void SeekToFirst() override {
input_->SeekToFirst();
while (!filter()) {
input_->Next();
}
}
void SeekToLast() override {
input_->SeekToLast();
while (!filter()) {
input_->Prev();
}
}
void Seek(const Slice& target) override {
input_->Seek(target);
while (!filter()) {
input_->Next();
}
}
void SeekForPrev(const Slice& target) override {
input_->SeekForPrev(target);
while (!filter()) {
input_->Prev();
}
}
void Next() override {
do {
input_->Next();
} while (!filter());
}
void Prev() override {
do {
input_->Prev();
} while (!filter());
}
Slice key() const override { return input_->key(); }
Slice value() const override { return input_->value(); }
Status status() const override { return input_->status(); }
bool IsKeyPinned() const override { return input_->IsKeyPinned(); }
bool IsValuePinned() const override { return input_->IsValuePinned(); }
bool IsDeleteRangeSentinelKey() const override {
return input_->IsDeleteRangeSentinelKey();
}
private:
InternalIterator* input_;
const std::string filter_ts_;
const Comparator* const cmp_;
};
} // namespace ROCKSDB_NAMESPACE