mirror of
https://github.com/facebook/rocksdb.git
synced 2024-12-02 20:52:55 +00:00
229297d1b8
Summary: A second attempt after https://github.com/facebook/rocksdb/issues/10802, with bug fixes and refactoring. This PR updates compaction logic to take range tombstones into account when determining whether to cut the current compaction output file (https://github.com/facebook/rocksdb/issues/4811). Before this change, only point keys were considered, and range tombstones could cause large compactions. For example, if the current compaction output is a range tombstone [a, b) and 2 point keys y, z, they would all be added to the same file, which may then overlap with too many files in the next level and cause a large compaction in the future.

This PR also includes ajkr's effort to simplify the logic to add range tombstones to compaction output files in `AddRangeDels()` ([https://github.com/facebook/rocksdb/issues/11078](https://github.com/facebook/rocksdb/pull/11078#issuecomment-1386078861)).

The main change is for `CompactionIterator` to emit range tombstone start keys to be processed by `CompactionOutputs`. A new class `CompactionMergingIterator` is introduced to replace `MergingIterator` under `CompactionIterator` so that range tombstone start keys can be emitted. Further improvements after this PR include cutting compaction output at some grandparent boundary key (instead of the next output key) when cutting within a range tombstone, to reduce overlap with grandparents.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11113

Test Plan:
* added unit test in db_range_del_test
* crash test with a small key range: `python3 tools/db_crashtest.py blackbox --simple --max_key=100 --interval=600 --write_buffer_size=262144 --target_file_size_base=256 --max_bytes_for_level_base=262144 --block_size=128 --value_size_mult=33 --subcompactions=10 --use_multiget=1 --delpercent=3 --delrangepercent=2 --verify_iterator_with_expected_state_one_in=2 --num_iterations=10`

Reviewed By: ajkr

Differential Revision: D42655709

Pulled By: cbi42

fbshipit-source-id: 8367e36ef5640e8f21c14a3855d4a8d6e360a34c
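To make the file-cutting argument in the summary concrete, here is a minimal sketch of why emitting the range tombstone start key helps. It is not the logic in `CompactionOutputs`: the names `GrandparentFile`, `CountOverlap`, `CutOutputs`, and the `max_overlap` limit are hypothetical, and real RocksDB bounds overlap in bytes rather than in a count of files. The sketch only assumes that the output stream is a sorted list of user keys and that a new output file is started once the current one would span too many grandparent files.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a file in the grandparent level (not a RocksDB
// type). Both bounds are inclusive user keys.
struct GrandparentFile {
  std::string smallest;
  std::string largest;
};

// Counts grandparent files whose key range intersects [lo, hi].
inline size_t CountOverlap(const std::vector<GrandparentFile>& grandparents,
                           const std::string& lo, const std::string& hi) {
  size_t n = 0;
  for (const GrandparentFile& f : grandparents) {
    if (f.largest >= lo && f.smallest <= hi) {
      ++n;
    }
  }
  return n;
}

// Splits sorted output keys into files. A new file is started before a key
// if adding that key would make the current file span more than
// `max_overlap` grandparent files. `keys` is assumed to already contain the
// range tombstone start keys, as the summary above describes.
inline std::vector<std::vector<std::string>> CutOutputs(
    const std::vector<std::string>& keys,
    const std::vector<GrandparentFile>& grandparents, size_t max_overlap) {
  assert(std::is_sorted(keys.begin(), keys.end()));
  std::vector<std::vector<std::string>> files;
  for (const std::string& key : keys) {
    if (files.empty() ||
        CountOverlap(grandparents, files.back().front(), key) > max_overlap) {
      files.emplace_back();  // cut: start a new output file at `key`
    }
    files.back().push_back(key);
  }
  return files;
}
```

With the start key `a` of tombstone [a, b) present in the stream, the cut decision sees that a file spanning from `a` to `y` would cover every grandparent file in between, so `y` and `z` go to a second file. If only the point keys `y` and `z` were visible, no cut would be considered, and the tombstone would later widen that single file back to `a`.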
45 lines
1.9 KiB
C++
// Copyright (c) Meta Platforms, Inc. and affiliates.
//
// This source code is licensed under both the GPLv2 (found in the
// COPYING file in the root directory) and Apache 2.0 License
// (found in the LICENSE.Apache file in the root directory).

#pragma once

#include "db/range_del_aggregator.h"
#include "rocksdb/slice.h"
#include "rocksdb/types.h"
#include "table/merging_iterator.h"

namespace ROCKSDB_NAMESPACE {

/*
 * This is a simplified version of MergingIterator and is specifically used for
 * compaction. It merges the input `children` iterators into a sorted stream of
 * keys. Range tombstone start keys are also emitted to prevent oversize
 * compactions. For example, consider an L1 file with content [a, b), y, z,
 * where [a, b) is a range tombstone and y and z are point keys. This could
 * cause an oversize compaction as it can overlap with a wide range of key space
 * in L2.
 *
 * CompactionMergingIterator emits range tombstone start keys from each LSM
 * level's range tombstone iterator, and for each range tombstone
 * [start,end)@seqno, the key will be start@seqno with op_type
 * kTypeRangeDeletion unless truncated at file boundary (see detail in
 * TruncatedRangeDelIterator::start_key()).
 *
 * Caller should use CompactionMergingIterator::IsDeleteRangeSentinelKey() to
 * check if the current key is a range tombstone key.
 * TODO(cbi): IsDeleteRangeSentinelKey() is used for two kinds of keys at
 * different layers: file boundary and range tombstone keys. Separate them into
 * two APIs for clarity.
 */
class CompactionMergingIterator;

InternalIterator* NewCompactionMergingIterator(
    const InternalKeyComparator* comparator, InternalIterator** children, int n,
    std::vector<std::pair<TruncatedRangeDelIterator*,
                          TruncatedRangeDelIterator***>>& range_tombstone_iters,
    Arena* arena = nullptr);
}  // namespace ROCKSDB_NAMESPACE
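
The merge described in the header comment can be illustrated with a small standalone sketch. It is not the `CompactionMergingIterator` implementation and uses no RocksDB types: `Entry`, `RangeTombstone`, `Level`, `SortsBefore`, and `MergeWithTombstoneStarts` are hypothetical names, keys are plain strings, and truncation at file boundaries is ignored. It assumes each level supplies one sorted stream of point keys and one sorted stream of range tombstones, and it merges them with a min-heap so that every tombstone [start, end)@seqno shows up in the output as a flagged `start@seqno` entry.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins; none of these are RocksDB types.
struct Entry {
  std::string user_key;
  uint64_t seqno;
  bool is_range_del_start;  // plays the role of the sentinel-key check
};

struct RangeTombstone {
  std::string start;  // inclusive
  std::string end;    // exclusive
  uint64_t seqno;
};

struct Level {
  std::vector<Entry> points;               // sorted in merge order
  std::vector<RangeTombstone> tombstones;  // sorted by start key
};

// Merge order used by this sketch: user key ascending, then seqno
// descending, then tombstone start keys before point keys on a full tie.
inline bool SortsBefore(const Entry& a, const Entry& b) {
  if (a.user_key != b.user_key) return a.user_key < b.user_key;
  if (a.seqno != b.seqno) return a.seqno > b.seqno;
  return a.is_range_del_start && !b.is_range_del_start;
}

// K-way merge over two streams per level: point keys and tombstone starts.
inline std::vector<Entry> MergeWithTombstoneStarts(
    const std::vector<Level>& levels) {
  std::vector<std::vector<Entry>> streams;
  for (const Level& level : levels) {
    streams.push_back(level.points);
    std::vector<Entry> starts;
    for (const RangeTombstone& t : level.tombstones) {
      assert(t.start < t.end);  // a tombstone must cover a non-empty range
      starts.push_back({t.start, t.seqno, true});
    }
    streams.push_back(std::move(starts));
  }

  struct Cursor {
    size_t stream;
    size_t pos;
  };
  // Inverted comparison turns std::priority_queue into a min-heap.
  auto greater = [&streams](const Cursor& a, const Cursor& b) {
    return SortsBefore(streams[b.stream][b.pos], streams[a.stream][a.pos]);
  };
  std::priority_queue<Cursor, std::vector<Cursor>, decltype(greater)> heap(
      greater);
  for (size_t i = 0; i < streams.size(); ++i) {
    if (!streams[i].empty()) heap.push({i, 0});
  }

  std::vector<Entry> out;
  while (!heap.empty()) {
    Cursor c = heap.top();
    heap.pop();
    out.push_back(streams[c.stream][c.pos]);
    if (++c.pos < streams[c.stream].size()) heap.push(c);
  }
  return out;
}
```

A consumer of the merged stream would check `is_range_del_start` on each entry, much as the header asks callers to check `IsDeleteRangeSentinelKey()`, and treat flagged entries as hints for where to cut output files rather than as data to write as point keys.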