rocksdb/utilities/cassandra/merge_operator.cc
Pengchao Wang e4234fbdcf collecting kValue type tombstone
Summary:
In our testing cluster, we found that a large number of tombstones had been promoted from kMerge to kValue type after reaching the top level of compaction. Since we used to collect tombstones only in the merge operator, those tombstones could never be collected.

This PR addresses the issue by adding a GC step in the compaction filter that applies only to kValue type records. Since those records have already reached the top level of compaction (no earlier data exists beneath them), we can safely remove them in the compaction filter without worrying that older data will reappear.
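
As a rough illustration of that GC step, here is a minimal compaction-filter sketch. The class name CassandraGCFilter is hypothetical (the actual filter is a separate class under utilities/cassandra), and it leans on the RowValue helpers from format.h; details may differ from the real implementation.

// Hypothetical sketch of the kValue-only GC step described above; the class
// name is illustrative, not the actual filter shipped in utilities/cassandra.
#include "rocksdb/compaction_filter.h"
#include "utilities/cassandra/format.h"

namespace rocksdb {
namespace cassandra {

class CassandraGCFilter : public CompactionFilter {
 public:
  explicit CassandraGCFilter(int32_t gc_grace_period_in_seconds)
      : gc_grace_period_in_seconds_(gc_grace_period_in_seconds) {}

  const char* Name() const override { return "CassandraGCFilter"; }

  Decision FilterV2(int /*level*/, const Slice& /*key*/, ValueType value_type,
                    const Slice& existing_value, std::string* new_value,
                    std::string* /*skip_until*/) const override {
    // Only fully-merged kValue records are eligible: a kMerge operand may
    // still have earlier data underneath it, so GC there is left to the
    // merge operator.
    if (value_type != ValueType::kValue) {
      return Decision::kKeep;
    }
    RowValue row =
        RowValue::Deserialize(existing_value.data(), existing_value.size());
    RowValue compacted = row.RemoveTombstones(gc_grace_period_in_seconds_);
    if (compacted.Empty()) {
      return Decision::kRemove;  // the whole row was expired tombstones
    }
    compacted.Serialize(new_value);
    return Decision::kChangeValue;
  }

 private:
  int32_t gc_grace_period_in_seconds_;
};

}  // namespace cassandra
}  // namespace rocksdb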

This PR also removes an old optimization in the Cassandra merge operator for single merge operands. We need to do GC even on a single operand, so the optimization no longer makes sense.
Closes https://github.com/facebook/rocksdb/pull/2855

Reviewed By: sagar0

Differential Revision: D5806445

Pulled By: wpc

fbshipit-source-id: 6eb25629d4ce917eb5e8b489f64a6aa78c7d270b
2017-09-18 16:27:12 -07:00

// Copyright (c) 2017-present, Facebook, Inc. All rights reserved.
// This source code is licensed under both the GPLv2 (found in the
// COPYING file in the root directory) and Apache 2.0 License
// (found in the LICENSE.Apache file in the root directory).
#include "merge_operator.h"
#include <memory>
#include <assert.h>
#include "rocksdb/slice.h"
#include "rocksdb/merge_operator.h"
#include "utilities/merge_operators.h"
#include "utilities/cassandra/format.h"
namespace rocksdb {
namespace cassandra {
// Implementation for the merge operation (merges two Cassandra values)
bool CassandraValueMergeOperator::FullMergeV2(
    const MergeOperationInput& merge_in,
    MergeOperationOutput* merge_out) const {
  // Clear the *new_value for writing.
  merge_out->new_value.clear();

  std::vector<RowValue> row_values;
  if (merge_in.existing_value) {
    row_values.push_back(
        RowValue::Deserialize(merge_in.existing_value->data(),
                              merge_in.existing_value->size()));
  }

  for (auto& operand : merge_in.operand_list) {
    row_values.push_back(RowValue::Deserialize(operand.data(), operand.size()));
  }

  RowValue merged = RowValue::Merge(std::move(row_values));
  // A full merge sees all data for the key, so expired tombstones can be
  // dropped here without risking resurrection of older data.
  merged = merged.RemoveTombstones(gc_grace_period_in_seconds_);
  merge_out->new_value.reserve(merged.Size());
  merged.Serialize(&(merge_out->new_value));
  return true;
}
bool CassandraValueMergeOperator::PartialMergeMulti(
    const Slice& key,
    const std::deque<Slice>& operand_list,
    std::string* new_value,
    Logger* logger) const {
  // Clear the *new_value for writing.
  assert(new_value);
  new_value->clear();

  std::vector<RowValue> row_values;
  for (auto& operand : operand_list) {
    row_values.push_back(RowValue::Deserialize(operand.data(), operand.size()));
  }

  // Note: no tombstone GC here. A partial merge may not cover all operands
  // for the key, so dropping tombstones at this stage could expose older data.
  RowValue merged = RowValue::Merge(std::move(row_values));
  new_value->reserve(merged.Size());
  merged.Serialize(new_value);
  return true;
}
const char* CassandraValueMergeOperator::Name() const {
  return "CassandraValueMergeOperator";
}
} // namespace cassandra
} // namespace rocksdb
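
For context, a minimal sketch of wiring this merge operator into a database, assuming a constructor that takes the GC grace period as declared in merge_operator.h; the database path and grace-period value below are illustrative, not taken from this file.

// Hypothetical usage sketch; names and values are assumptions for
// illustration only.
#include <memory>

#include "rocksdb/db.h"
#include "rocksdb/options.h"
#include "utilities/cassandra/merge_operator.h"

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;
  // Tombstones older than the grace period (in seconds) are dropped during
  // full merges; 864000 mirrors Cassandra's default gc_grace_seconds.
  options.merge_operator.reset(
      new rocksdb::cassandra::CassandraValueMergeOperator(
          /*gc_grace_period_in_seconds=*/864000));

  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/cassandra_demo", &db);
  if (s.ok()) {
    // Values written via Merge must be serialized cassandra::RowValue blobs;
    // a real call would look like:
    //   db->Merge(rocksdb::WriteOptions(), "row_key", serialized_row_value);
    delete db;
  }
  return 0;
}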