rocksdb/db/multi_cf_iterator_impl.h

297 lines
9.2 KiB
C
Raw Normal View History

MultiCFIterator Refactor - CoalescingIterator & AttributeGroupIterator (#12480) Summary: There are a couple of reasons to modify the current implementation of the MultiCfIterator, which implements the generic `Iterator` interface. - The default behavior of `value()`/`columns()` returning data from different Column Families for different keys can be prone to errors, even though there might be valid use cases where users do not care about the origin of the value/columns. - The `attribute_groups()` API, which is not yet implemented, will not be useful for a single-CF iterator. In this PR, we are implementing the following changes: - `IteratorBase` introduced, which includes all basic iterator functions except `value()` and `columns()`. - `Iterator`, which now inherits from `IteratorBase`, includes `value()` and `columns()`. - New public interface `AttributeGroupIterator` inherits from `IteratorBase` and additionally includes `attribute_groups()` (to be implemented). - Renamed former `MultiCfIterator` to `CoalescingIterator` which inherits from `Iterator` - Existing MultiCfIteratorTest has been split into two - `CoalescingIteratorTest` and `AttributeGroupIteratorTest`. - Moved AttributeGroup related code from `wide_columns.h` to a new file, `attribute_groups.h`. Some Implementation Details - `MultiCfIteratorImpl` takes two functions - `populate_func` and `reset_func` and use them to populate `value_` and `columns_` in CoalescingIterator and `attribute_groups_` in AttributeGroupIterator. In CoalescingIterator, populate_func is `Coalesce()`, in AttributeGroupIterator populate_func is `AddToAttributeGroups()`. `reset_func` clears populated value_, columns_ and attribute_groups_ accordingly. - `Coalesce()` merge sorts columns from multiple CFs when a key exists in more than on CFs. column that appears in later CF overwrites the prior ones. For example, if CF1 has `"key_1" ==> {"col_1": "foo", "col_2", "baz"}` and CF2 has `"key_1" ==> {"col_2": "quux", "col_3", "bla"}`, and when the iterator is at `key_1`, `columns()` will return `{"col_1": "foo", "col_2", "quux", "col_3", "bla"}` In this example, `value()` will be empty, because none of them have values for `kDefaultColumnName` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12480 Test Plan: ## Unit Test ``` ./multi_cf_iterator_test ``` ## Performance Test To make sure this change does not impact existing `Iterator` performance **Build** ``` $> make -j64 release ``` **Setup** ``` $> TEST_TMPDIR=/dev/shm/db_bench ./db_bench -benchmarks="filluniquerandom" -key_size=32 -value_size=512 -num=1000000 -compression_type=none ``` **Run** ``` TEST_TMPDIR=/dev/shm/db_bench ./db_bench -use_existing_db=1 -benchmarks="newiterator,seekrandom" -cache_size=10485760000 ``` **Before the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.519 micros/op 1927904 ops/sec 0.519 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.302 micros/op 188589 ops/sec 5.303 seconds 1000000 operations; (0 of 1000000 found) ``` **After the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.497 micros/op 2011012 ops/sec 0.497 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.252 micros/op 190405 ops/sec 5.252 seconds 1000000 operations; (0 of 1000000 found) ``` Reviewed By: ltamasi Differential Revision: D55353909 Pulled By: jaykorean fbshipit-source-id: 8d7786ffee09e022261ce34aa60e8633685e1946
2024-04-11 18:34:04 +00:00
// Copyright (c) Meta Platforms, Inc. and affiliates.
// This source code is licensed under both the GPLv2 (found in the
// COPYING file in the root directory) and Apache 2.0 License
// (found in the LICENSE.Apache file in the root directory).
#pragma once
#include <functional>
#include <variant>
#include "rocksdb/comparator.h"
#include "rocksdb/iterator.h"
#include "rocksdb/options.h"
#include "util/heap.h"
namespace ROCKSDB_NAMESPACE {
MultiCfIterator - AttributeGroupIter Impl & CoalescingIter Optimization (#12534) Summary: Continuing from the previous MultiCfIterator Implementations - (https://github.com/facebook/rocksdb/issues/12422, https://github.com/facebook/rocksdb/issues/12480 #12465), this PR completes the `AttributeGroupIterator` by implementing `AttributeGroupIteratorImpl::AddToAttributeGroups()`. While implementing the `AttributeGroupIterator`, we had to make some changes in `MultiCfIteratorImpl` and found an opportunity to improve `Coalesce()` in `CoalescingIterator`. Lifting `UNDER CONSTRUCTION - DO NOT USE` comment by replacing it with `EXPERIMENTAL` Here are some implementation details: - `IteratorAttributeGroups` is introduced to avoid having to copy all `WideColumn` objects during iteration. - `PopulateIterator()` no longer advances non-top iterators that have the same key as the top iterator in the heap. - `AdvanceIterator()` needs to advance the non-top iterators when they have the same key as the top iterator in the heap. - Instead of populating one by one, `PopulateIterator()` now collects all items with the same key and calls `populate_func(items)` at once. - This allowed optimization in `Coalesce()` such that we no longer do K-1 rounds of 2-way merge, but do one K-way merge instead. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12534 Test Plan: Uncommented the assertions in `verifyAttributeGroupIterator()` ``` ./multi_cf_iterator_test ``` Reviewed By: ltamasi Differential Revision: D56089019 Pulled By: jaykorean fbshipit-source-id: 6b0b4247e221f69b40b147d41492008cc9b15054
2024-04-16 15:45:38 +00:00
struct MultiCfIteratorInfo {
ColumnFamilyHandle* cfh;
Iterator* iterator;
int order;
};
MultiCFIterator Refactor - CoalescingIterator & AttributeGroupIterator (#12480) Summary: There are a couple of reasons to modify the current implementation of the MultiCfIterator, which implements the generic `Iterator` interface. - The default behavior of `value()`/`columns()` returning data from different Column Families for different keys can be prone to errors, even though there might be valid use cases where users do not care about the origin of the value/columns. - The `attribute_groups()` API, which is not yet implemented, will not be useful for a single-CF iterator. In this PR, we are implementing the following changes: - `IteratorBase` introduced, which includes all basic iterator functions except `value()` and `columns()`. - `Iterator`, which now inherits from `IteratorBase`, includes `value()` and `columns()`. - New public interface `AttributeGroupIterator` inherits from `IteratorBase` and additionally includes `attribute_groups()` (to be implemented). - Renamed former `MultiCfIterator` to `CoalescingIterator` which inherits from `Iterator` - Existing MultiCfIteratorTest has been split into two - `CoalescingIteratorTest` and `AttributeGroupIteratorTest`. - Moved AttributeGroup related code from `wide_columns.h` to a new file, `attribute_groups.h`. Some Implementation Details - `MultiCfIteratorImpl` takes two functions - `populate_func` and `reset_func` and use them to populate `value_` and `columns_` in CoalescingIterator and `attribute_groups_` in AttributeGroupIterator. In CoalescingIterator, populate_func is `Coalesce()`, in AttributeGroupIterator populate_func is `AddToAttributeGroups()`. `reset_func` clears populated value_, columns_ and attribute_groups_ accordingly. - `Coalesce()` merge sorts columns from multiple CFs when a key exists in more than on CFs. column that appears in later CF overwrites the prior ones. For example, if CF1 has `"key_1" ==> {"col_1": "foo", "col_2", "baz"}` and CF2 has `"key_1" ==> {"col_2": "quux", "col_3", "bla"}`, and when the iterator is at `key_1`, `columns()` will return `{"col_1": "foo", "col_2", "quux", "col_3", "bla"}` In this example, `value()` will be empty, because none of them have values for `kDefaultColumnName` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12480 Test Plan: ## Unit Test ``` ./multi_cf_iterator_test ``` ## Performance Test To make sure this change does not impact existing `Iterator` performance **Build** ``` $> make -j64 release ``` **Setup** ``` $> TEST_TMPDIR=/dev/shm/db_bench ./db_bench -benchmarks="filluniquerandom" -key_size=32 -value_size=512 -num=1000000 -compression_type=none ``` **Run** ``` TEST_TMPDIR=/dev/shm/db_bench ./db_bench -use_existing_db=1 -benchmarks="newiterator,seekrandom" -cache_size=10485760000 ``` **Before the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.519 micros/op 1927904 ops/sec 0.519 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.302 micros/op 188589 ops/sec 5.303 seconds 1000000 operations; (0 of 1000000 found) ``` **After the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.497 micros/op 2011012 ops/sec 0.497 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.252 micros/op 190405 ops/sec 5.252 seconds 1000000 operations; (0 of 1000000 found) ``` Reviewed By: ltamasi Differential Revision: D55353909 Pulled By: jaykorean fbshipit-source-id: 8d7786ffee09e022261ce34aa60e8633685e1946
2024-04-11 18:34:04 +00:00
class MultiCfIteratorImpl {
public:
MultiCfIteratorImpl(
const Comparator* comparator,
const std::vector<ColumnFamilyHandle*>& column_families,
const std::vector<Iterator*>& child_iterators,
std::function<void()> reset_func,
MultiCfIterator - AttributeGroupIter Impl & CoalescingIter Optimization (#12534) Summary: Continuing from the previous MultiCfIterator Implementations - (https://github.com/facebook/rocksdb/issues/12422, https://github.com/facebook/rocksdb/issues/12480 #12465), this PR completes the `AttributeGroupIterator` by implementing `AttributeGroupIteratorImpl::AddToAttributeGroups()`. While implementing the `AttributeGroupIterator`, we had to make some changes in `MultiCfIteratorImpl` and found an opportunity to improve `Coalesce()` in `CoalescingIterator`. Lifting `UNDER CONSTRUCTION - DO NOT USE` comment by replacing it with `EXPERIMENTAL` Here are some implementation details: - `IteratorAttributeGroups` is introduced to avoid having to copy all `WideColumn` objects during iteration. - `PopulateIterator()` no longer advances non-top iterators that have the same key as the top iterator in the heap. - `AdvanceIterator()` needs to advance the non-top iterators when they have the same key as the top iterator in the heap. - Instead of populating one by one, `PopulateIterator()` now collects all items with the same key and calls `populate_func(items)` at once. - This allowed optimization in `Coalesce()` such that we no longer do K-1 rounds of 2-way merge, but do one K-way merge instead. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12534 Test Plan: Uncommented the assertions in `verifyAttributeGroupIterator()` ``` ./multi_cf_iterator_test ``` Reviewed By: ltamasi Differential Revision: D56089019 Pulled By: jaykorean fbshipit-source-id: 6b0b4247e221f69b40b147d41492008cc9b15054
2024-04-16 15:45:38 +00:00
std::function<void(const autovector<MultiCfIteratorInfo>&)> populate_func)
MultiCFIterator Refactor - CoalescingIterator & AttributeGroupIterator (#12480) Summary: There are a couple of reasons to modify the current implementation of the MultiCfIterator, which implements the generic `Iterator` interface. - The default behavior of `value()`/`columns()` returning data from different Column Families for different keys can be prone to errors, even though there might be valid use cases where users do not care about the origin of the value/columns. - The `attribute_groups()` API, which is not yet implemented, will not be useful for a single-CF iterator. In this PR, we are implementing the following changes: - `IteratorBase` introduced, which includes all basic iterator functions except `value()` and `columns()`. - `Iterator`, which now inherits from `IteratorBase`, includes `value()` and `columns()`. - New public interface `AttributeGroupIterator` inherits from `IteratorBase` and additionally includes `attribute_groups()` (to be implemented). - Renamed former `MultiCfIterator` to `CoalescingIterator` which inherits from `Iterator` - Existing MultiCfIteratorTest has been split into two - `CoalescingIteratorTest` and `AttributeGroupIteratorTest`. - Moved AttributeGroup related code from `wide_columns.h` to a new file, `attribute_groups.h`. Some Implementation Details - `MultiCfIteratorImpl` takes two functions - `populate_func` and `reset_func` and use them to populate `value_` and `columns_` in CoalescingIterator and `attribute_groups_` in AttributeGroupIterator. In CoalescingIterator, populate_func is `Coalesce()`, in AttributeGroupIterator populate_func is `AddToAttributeGroups()`. `reset_func` clears populated value_, columns_ and attribute_groups_ accordingly. - `Coalesce()` merge sorts columns from multiple CFs when a key exists in more than on CFs. column that appears in later CF overwrites the prior ones. For example, if CF1 has `"key_1" ==> {"col_1": "foo", "col_2", "baz"}` and CF2 has `"key_1" ==> {"col_2": "quux", "col_3", "bla"}`, and when the iterator is at `key_1`, `columns()` will return `{"col_1": "foo", "col_2", "quux", "col_3", "bla"}` In this example, `value()` will be empty, because none of them have values for `kDefaultColumnName` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12480 Test Plan: ## Unit Test ``` ./multi_cf_iterator_test ``` ## Performance Test To make sure this change does not impact existing `Iterator` performance **Build** ``` $> make -j64 release ``` **Setup** ``` $> TEST_TMPDIR=/dev/shm/db_bench ./db_bench -benchmarks="filluniquerandom" -key_size=32 -value_size=512 -num=1000000 -compression_type=none ``` **Run** ``` TEST_TMPDIR=/dev/shm/db_bench ./db_bench -use_existing_db=1 -benchmarks="newiterator,seekrandom" -cache_size=10485760000 ``` **Before the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.519 micros/op 1927904 ops/sec 0.519 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.302 micros/op 188589 ops/sec 5.303 seconds 1000000 operations; (0 of 1000000 found) ``` **After the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.497 micros/op 2011012 ops/sec 0.497 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.252 micros/op 190405 ops/sec 5.252 seconds 1000000 operations; (0 of 1000000 found) ``` Reviewed By: ltamasi Differential Revision: D55353909 Pulled By: jaykorean fbshipit-source-id: 8d7786ffee09e022261ce34aa60e8633685e1946
2024-04-11 18:34:04 +00:00
: comparator_(comparator),
heap_(MultiCfMinHeap(
MultiCfHeapItemComparator<std::greater<int>>(comparator_))),
reset_func_(std::move(reset_func)),
populate_func_(std::move(populate_func)) {
assert(column_families.size() > 0 &&
column_families.size() == child_iterators.size());
cfh_iter_pairs_.reserve(column_families.size());
for (size_t i = 0; i < column_families.size(); ++i) {
cfh_iter_pairs_.emplace_back(
column_families[i], std::unique_ptr<Iterator>(child_iterators[i]));
}
}
~MultiCfIteratorImpl() { status_.PermitUncheckedError(); }
// No copy allowed
MultiCfIteratorImpl(const MultiCfIteratorImpl&) = delete;
MultiCfIteratorImpl& operator=(const MultiCfIteratorImpl&) = delete;
Slice key() const {
assert(Valid());
return current()->key();
}
bool Valid() const {
if (std::holds_alternative<MultiCfMaxHeap>(heap_)) {
auto& max_heap = std::get<MultiCfMaxHeap>(heap_);
return !max_heap.empty() && status_.ok();
}
auto& min_heap = std::get<MultiCfMinHeap>(heap_);
return !min_heap.empty() && status_.ok();
}
Status status() const { return status_; }
void SeekToFirst() {
auto& min_heap = GetHeap<MultiCfMinHeap>([this]() { InitMinHeap(); });
MultiCfIterator - AttributeGroupIter Impl & CoalescingIter Optimization (#12534) Summary: Continuing from the previous MultiCfIterator Implementations - (https://github.com/facebook/rocksdb/issues/12422, https://github.com/facebook/rocksdb/issues/12480 #12465), this PR completes the `AttributeGroupIterator` by implementing `AttributeGroupIteratorImpl::AddToAttributeGroups()`. While implementing the `AttributeGroupIterator`, we had to make some changes in `MultiCfIteratorImpl` and found an opportunity to improve `Coalesce()` in `CoalescingIterator`. Lifting `UNDER CONSTRUCTION - DO NOT USE` comment by replacing it with `EXPERIMENTAL` Here are some implementation details: - `IteratorAttributeGroups` is introduced to avoid having to copy all `WideColumn` objects during iteration. - `PopulateIterator()` no longer advances non-top iterators that have the same key as the top iterator in the heap. - `AdvanceIterator()` needs to advance the non-top iterators when they have the same key as the top iterator in the heap. - Instead of populating one by one, `PopulateIterator()` now collects all items with the same key and calls `populate_func(items)` at once. - This allowed optimization in `Coalesce()` such that we no longer do K-1 rounds of 2-way merge, but do one K-way merge instead. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12534 Test Plan: Uncommented the assertions in `verifyAttributeGroupIterator()` ``` ./multi_cf_iterator_test ``` Reviewed By: ltamasi Differential Revision: D56089019 Pulled By: jaykorean fbshipit-source-id: 6b0b4247e221f69b40b147d41492008cc9b15054
2024-04-16 15:45:38 +00:00
SeekCommon(min_heap, [](Iterator* iter) { iter->SeekToFirst(); });
MultiCFIterator Refactor - CoalescingIterator & AttributeGroupIterator (#12480) Summary: There are a couple of reasons to modify the current implementation of the MultiCfIterator, which implements the generic `Iterator` interface. - The default behavior of `value()`/`columns()` returning data from different Column Families for different keys can be prone to errors, even though there might be valid use cases where users do not care about the origin of the value/columns. - The `attribute_groups()` API, which is not yet implemented, will not be useful for a single-CF iterator. In this PR, we are implementing the following changes: - `IteratorBase` introduced, which includes all basic iterator functions except `value()` and `columns()`. - `Iterator`, which now inherits from `IteratorBase`, includes `value()` and `columns()`. - New public interface `AttributeGroupIterator` inherits from `IteratorBase` and additionally includes `attribute_groups()` (to be implemented). - Renamed former `MultiCfIterator` to `CoalescingIterator` which inherits from `Iterator` - Existing MultiCfIteratorTest has been split into two - `CoalescingIteratorTest` and `AttributeGroupIteratorTest`. - Moved AttributeGroup related code from `wide_columns.h` to a new file, `attribute_groups.h`. Some Implementation Details - `MultiCfIteratorImpl` takes two functions - `populate_func` and `reset_func` and use them to populate `value_` and `columns_` in CoalescingIterator and `attribute_groups_` in AttributeGroupIterator. In CoalescingIterator, populate_func is `Coalesce()`, in AttributeGroupIterator populate_func is `AddToAttributeGroups()`. `reset_func` clears populated value_, columns_ and attribute_groups_ accordingly. - `Coalesce()` merge sorts columns from multiple CFs when a key exists in more than on CFs. column that appears in later CF overwrites the prior ones. For example, if CF1 has `"key_1" ==> {"col_1": "foo", "col_2", "baz"}` and CF2 has `"key_1" ==> {"col_2": "quux", "col_3", "bla"}`, and when the iterator is at `key_1`, `columns()` will return `{"col_1": "foo", "col_2", "quux", "col_3", "bla"}` In this example, `value()` will be empty, because none of them have values for `kDefaultColumnName` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12480 Test Plan: ## Unit Test ``` ./multi_cf_iterator_test ``` ## Performance Test To make sure this change does not impact existing `Iterator` performance **Build** ``` $> make -j64 release ``` **Setup** ``` $> TEST_TMPDIR=/dev/shm/db_bench ./db_bench -benchmarks="filluniquerandom" -key_size=32 -value_size=512 -num=1000000 -compression_type=none ``` **Run** ``` TEST_TMPDIR=/dev/shm/db_bench ./db_bench -use_existing_db=1 -benchmarks="newiterator,seekrandom" -cache_size=10485760000 ``` **Before the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.519 micros/op 1927904 ops/sec 0.519 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.302 micros/op 188589 ops/sec 5.303 seconds 1000000 operations; (0 of 1000000 found) ``` **After the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.497 micros/op 2011012 ops/sec 0.497 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.252 micros/op 190405 ops/sec 5.252 seconds 1000000 operations; (0 of 1000000 found) ``` Reviewed By: ltamasi Differential Revision: D55353909 Pulled By: jaykorean fbshipit-source-id: 8d7786ffee09e022261ce34aa60e8633685e1946
2024-04-11 18:34:04 +00:00
}
void Seek(const Slice& target) {
auto& min_heap = GetHeap<MultiCfMinHeap>([this]() { InitMinHeap(); });
MultiCfIterator - AttributeGroupIter Impl & CoalescingIter Optimization (#12534) Summary: Continuing from the previous MultiCfIterator Implementations - (https://github.com/facebook/rocksdb/issues/12422, https://github.com/facebook/rocksdb/issues/12480 #12465), this PR completes the `AttributeGroupIterator` by implementing `AttributeGroupIteratorImpl::AddToAttributeGroups()`. While implementing the `AttributeGroupIterator`, we had to make some changes in `MultiCfIteratorImpl` and found an opportunity to improve `Coalesce()` in `CoalescingIterator`. Lifting `UNDER CONSTRUCTION - DO NOT USE` comment by replacing it with `EXPERIMENTAL` Here are some implementation details: - `IteratorAttributeGroups` is introduced to avoid having to copy all `WideColumn` objects during iteration. - `PopulateIterator()` no longer advances non-top iterators that have the same key as the top iterator in the heap. - `AdvanceIterator()` needs to advance the non-top iterators when they have the same key as the top iterator in the heap. - Instead of populating one by one, `PopulateIterator()` now collects all items with the same key and calls `populate_func(items)` at once. - This allowed optimization in `Coalesce()` such that we no longer do K-1 rounds of 2-way merge, but do one K-way merge instead. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12534 Test Plan: Uncommented the assertions in `verifyAttributeGroupIterator()` ``` ./multi_cf_iterator_test ``` Reviewed By: ltamasi Differential Revision: D56089019 Pulled By: jaykorean fbshipit-source-id: 6b0b4247e221f69b40b147d41492008cc9b15054
2024-04-16 15:45:38 +00:00
SeekCommon(min_heap, [&target](Iterator* iter) { iter->Seek(target); });
MultiCFIterator Refactor - CoalescingIterator & AttributeGroupIterator (#12480) Summary: There are a couple of reasons to modify the current implementation of the MultiCfIterator, which implements the generic `Iterator` interface. - The default behavior of `value()`/`columns()` returning data from different Column Families for different keys can be prone to errors, even though there might be valid use cases where users do not care about the origin of the value/columns. - The `attribute_groups()` API, which is not yet implemented, will not be useful for a single-CF iterator. In this PR, we are implementing the following changes: - `IteratorBase` introduced, which includes all basic iterator functions except `value()` and `columns()`. - `Iterator`, which now inherits from `IteratorBase`, includes `value()` and `columns()`. - New public interface `AttributeGroupIterator` inherits from `IteratorBase` and additionally includes `attribute_groups()` (to be implemented). - Renamed former `MultiCfIterator` to `CoalescingIterator` which inherits from `Iterator` - Existing MultiCfIteratorTest has been split into two - `CoalescingIteratorTest` and `AttributeGroupIteratorTest`. - Moved AttributeGroup related code from `wide_columns.h` to a new file, `attribute_groups.h`. Some Implementation Details - `MultiCfIteratorImpl` takes two functions - `populate_func` and `reset_func` and use them to populate `value_` and `columns_` in CoalescingIterator and `attribute_groups_` in AttributeGroupIterator. In CoalescingIterator, populate_func is `Coalesce()`, in AttributeGroupIterator populate_func is `AddToAttributeGroups()`. `reset_func` clears populated value_, columns_ and attribute_groups_ accordingly. - `Coalesce()` merge sorts columns from multiple CFs when a key exists in more than on CFs. column that appears in later CF overwrites the prior ones. For example, if CF1 has `"key_1" ==> {"col_1": "foo", "col_2", "baz"}` and CF2 has `"key_1" ==> {"col_2": "quux", "col_3", "bla"}`, and when the iterator is at `key_1`, `columns()` will return `{"col_1": "foo", "col_2", "quux", "col_3", "bla"}` In this example, `value()` will be empty, because none of them have values for `kDefaultColumnName` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12480 Test Plan: ## Unit Test ``` ./multi_cf_iterator_test ``` ## Performance Test To make sure this change does not impact existing `Iterator` performance **Build** ``` $> make -j64 release ``` **Setup** ``` $> TEST_TMPDIR=/dev/shm/db_bench ./db_bench -benchmarks="filluniquerandom" -key_size=32 -value_size=512 -num=1000000 -compression_type=none ``` **Run** ``` TEST_TMPDIR=/dev/shm/db_bench ./db_bench -use_existing_db=1 -benchmarks="newiterator,seekrandom" -cache_size=10485760000 ``` **Before the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.519 micros/op 1927904 ops/sec 0.519 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.302 micros/op 188589 ops/sec 5.303 seconds 1000000 operations; (0 of 1000000 found) ``` **After the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.497 micros/op 2011012 ops/sec 0.497 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.252 micros/op 190405 ops/sec 5.252 seconds 1000000 operations; (0 of 1000000 found) ``` Reviewed By: ltamasi Differential Revision: D55353909 Pulled By: jaykorean fbshipit-source-id: 8d7786ffee09e022261ce34aa60e8633685e1946
2024-04-11 18:34:04 +00:00
}
void SeekToLast() {
auto& max_heap = GetHeap<MultiCfMaxHeap>([this]() { InitMaxHeap(); });
MultiCfIterator - AttributeGroupIter Impl & CoalescingIter Optimization (#12534) Summary: Continuing from the previous MultiCfIterator Implementations - (https://github.com/facebook/rocksdb/issues/12422, https://github.com/facebook/rocksdb/issues/12480 #12465), this PR completes the `AttributeGroupIterator` by implementing `AttributeGroupIteratorImpl::AddToAttributeGroups()`. While implementing the `AttributeGroupIterator`, we had to make some changes in `MultiCfIteratorImpl` and found an opportunity to improve `Coalesce()` in `CoalescingIterator`. Lifting `UNDER CONSTRUCTION - DO NOT USE` comment by replacing it with `EXPERIMENTAL` Here are some implementation details: - `IteratorAttributeGroups` is introduced to avoid having to copy all `WideColumn` objects during iteration. - `PopulateIterator()` no longer advances non-top iterators that have the same key as the top iterator in the heap. - `AdvanceIterator()` needs to advance the non-top iterators when they have the same key as the top iterator in the heap. - Instead of populating one by one, `PopulateIterator()` now collects all items with the same key and calls `populate_func(items)` at once. - This allowed optimization in `Coalesce()` such that we no longer do K-1 rounds of 2-way merge, but do one K-way merge instead. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12534 Test Plan: Uncommented the assertions in `verifyAttributeGroupIterator()` ``` ./multi_cf_iterator_test ``` Reviewed By: ltamasi Differential Revision: D56089019 Pulled By: jaykorean fbshipit-source-id: 6b0b4247e221f69b40b147d41492008cc9b15054
2024-04-16 15:45:38 +00:00
SeekCommon(max_heap, [](Iterator* iter) { iter->SeekToLast(); });
MultiCFIterator Refactor - CoalescingIterator & AttributeGroupIterator (#12480) Summary: There are a couple of reasons to modify the current implementation of the MultiCfIterator, which implements the generic `Iterator` interface. - The default behavior of `value()`/`columns()` returning data from different Column Families for different keys can be prone to errors, even though there might be valid use cases where users do not care about the origin of the value/columns. - The `attribute_groups()` API, which is not yet implemented, will not be useful for a single-CF iterator. In this PR, we are implementing the following changes: - `IteratorBase` introduced, which includes all basic iterator functions except `value()` and `columns()`. - `Iterator`, which now inherits from `IteratorBase`, includes `value()` and `columns()`. - New public interface `AttributeGroupIterator` inherits from `IteratorBase` and additionally includes `attribute_groups()` (to be implemented). - Renamed former `MultiCfIterator` to `CoalescingIterator` which inherits from `Iterator` - Existing MultiCfIteratorTest has been split into two - `CoalescingIteratorTest` and `AttributeGroupIteratorTest`. - Moved AttributeGroup related code from `wide_columns.h` to a new file, `attribute_groups.h`. Some Implementation Details - `MultiCfIteratorImpl` takes two functions - `populate_func` and `reset_func` and use them to populate `value_` and `columns_` in CoalescingIterator and `attribute_groups_` in AttributeGroupIterator. In CoalescingIterator, populate_func is `Coalesce()`, in AttributeGroupIterator populate_func is `AddToAttributeGroups()`. `reset_func` clears populated value_, columns_ and attribute_groups_ accordingly. - `Coalesce()` merge sorts columns from multiple CFs when a key exists in more than on CFs. column that appears in later CF overwrites the prior ones. For example, if CF1 has `"key_1" ==> {"col_1": "foo", "col_2", "baz"}` and CF2 has `"key_1" ==> {"col_2": "quux", "col_3", "bla"}`, and when the iterator is at `key_1`, `columns()` will return `{"col_1": "foo", "col_2", "quux", "col_3", "bla"}` In this example, `value()` will be empty, because none of them have values for `kDefaultColumnName` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12480 Test Plan: ## Unit Test ``` ./multi_cf_iterator_test ``` ## Performance Test To make sure this change does not impact existing `Iterator` performance **Build** ``` $> make -j64 release ``` **Setup** ``` $> TEST_TMPDIR=/dev/shm/db_bench ./db_bench -benchmarks="filluniquerandom" -key_size=32 -value_size=512 -num=1000000 -compression_type=none ``` **Run** ``` TEST_TMPDIR=/dev/shm/db_bench ./db_bench -use_existing_db=1 -benchmarks="newiterator,seekrandom" -cache_size=10485760000 ``` **Before the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.519 micros/op 1927904 ops/sec 0.519 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.302 micros/op 188589 ops/sec 5.303 seconds 1000000 operations; (0 of 1000000 found) ``` **After the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.497 micros/op 2011012 ops/sec 0.497 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.252 micros/op 190405 ops/sec 5.252 seconds 1000000 operations; (0 of 1000000 found) ``` Reviewed By: ltamasi Differential Revision: D55353909 Pulled By: jaykorean fbshipit-source-id: 8d7786ffee09e022261ce34aa60e8633685e1946
2024-04-11 18:34:04 +00:00
}
void SeekForPrev(const Slice& target) {
auto& max_heap = GetHeap<MultiCfMaxHeap>([this]() { InitMaxHeap(); });
MultiCfIterator - AttributeGroupIter Impl & CoalescingIter Optimization (#12534) Summary: Continuing from the previous MultiCfIterator Implementations - (https://github.com/facebook/rocksdb/issues/12422, https://github.com/facebook/rocksdb/issues/12480 #12465), this PR completes the `AttributeGroupIterator` by implementing `AttributeGroupIteratorImpl::AddToAttributeGroups()`. While implementing the `AttributeGroupIterator`, we had to make some changes in `MultiCfIteratorImpl` and found an opportunity to improve `Coalesce()` in `CoalescingIterator`. Lifting `UNDER CONSTRUCTION - DO NOT USE` comment by replacing it with `EXPERIMENTAL` Here are some implementation details: - `IteratorAttributeGroups` is introduced to avoid having to copy all `WideColumn` objects during iteration. - `PopulateIterator()` no longer advances non-top iterators that have the same key as the top iterator in the heap. - `AdvanceIterator()` needs to advance the non-top iterators when they have the same key as the top iterator in the heap. - Instead of populating one by one, `PopulateIterator()` now collects all items with the same key and calls `populate_func(items)` at once. - This allowed optimization in `Coalesce()` such that we no longer do K-1 rounds of 2-way merge, but do one K-way merge instead. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12534 Test Plan: Uncommented the assertions in `verifyAttributeGroupIterator()` ``` ./multi_cf_iterator_test ``` Reviewed By: ltamasi Differential Revision: D56089019 Pulled By: jaykorean fbshipit-source-id: 6b0b4247e221f69b40b147d41492008cc9b15054
2024-04-16 15:45:38 +00:00
SeekCommon(max_heap,
[&target](Iterator* iter) { iter->SeekForPrev(target); });
MultiCFIterator Refactor - CoalescingIterator & AttributeGroupIterator (#12480) Summary: There are a couple of reasons to modify the current implementation of the MultiCfIterator, which implements the generic `Iterator` interface. - The default behavior of `value()`/`columns()` returning data from different Column Families for different keys can be prone to errors, even though there might be valid use cases where users do not care about the origin of the value/columns. - The `attribute_groups()` API, which is not yet implemented, will not be useful for a single-CF iterator. In this PR, we are implementing the following changes: - `IteratorBase` introduced, which includes all basic iterator functions except `value()` and `columns()`. - `Iterator`, which now inherits from `IteratorBase`, includes `value()` and `columns()`. - New public interface `AttributeGroupIterator` inherits from `IteratorBase` and additionally includes `attribute_groups()` (to be implemented). - Renamed former `MultiCfIterator` to `CoalescingIterator` which inherits from `Iterator` - Existing MultiCfIteratorTest has been split into two - `CoalescingIteratorTest` and `AttributeGroupIteratorTest`. - Moved AttributeGroup related code from `wide_columns.h` to a new file, `attribute_groups.h`. Some Implementation Details - `MultiCfIteratorImpl` takes two functions - `populate_func` and `reset_func` and use them to populate `value_` and `columns_` in CoalescingIterator and `attribute_groups_` in AttributeGroupIterator. In CoalescingIterator, populate_func is `Coalesce()`, in AttributeGroupIterator populate_func is `AddToAttributeGroups()`. `reset_func` clears populated value_, columns_ and attribute_groups_ accordingly. - `Coalesce()` merge sorts columns from multiple CFs when a key exists in more than on CFs. column that appears in later CF overwrites the prior ones. For example, if CF1 has `"key_1" ==> {"col_1": "foo", "col_2", "baz"}` and CF2 has `"key_1" ==> {"col_2": "quux", "col_3", "bla"}`, and when the iterator is at `key_1`, `columns()` will return `{"col_1": "foo", "col_2", "quux", "col_3", "bla"}` In this example, `value()` will be empty, because none of them have values for `kDefaultColumnName` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12480 Test Plan: ## Unit Test ``` ./multi_cf_iterator_test ``` ## Performance Test To make sure this change does not impact existing `Iterator` performance **Build** ``` $> make -j64 release ``` **Setup** ``` $> TEST_TMPDIR=/dev/shm/db_bench ./db_bench -benchmarks="filluniquerandom" -key_size=32 -value_size=512 -num=1000000 -compression_type=none ``` **Run** ``` TEST_TMPDIR=/dev/shm/db_bench ./db_bench -use_existing_db=1 -benchmarks="newiterator,seekrandom" -cache_size=10485760000 ``` **Before the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.519 micros/op 1927904 ops/sec 0.519 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.302 micros/op 188589 ops/sec 5.303 seconds 1000000 operations; (0 of 1000000 found) ``` **After the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.497 micros/op 2011012 ops/sec 0.497 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.252 micros/op 190405 ops/sec 5.252 seconds 1000000 operations; (0 of 1000000 found) ``` Reviewed By: ltamasi Differential Revision: D55353909 Pulled By: jaykorean fbshipit-source-id: 8d7786ffee09e022261ce34aa60e8633685e1946
2024-04-11 18:34:04 +00:00
}
void Next() {
assert(Valid());
auto& min_heap = GetHeap<MultiCfMinHeap>([this]() {
Fix heap-use-after-free in MultiCfIteratorImpl (#12784) Summary: # Summary When changing the direction of the multi-cf-iter, we do this by `Seek(current_key)` (if changing from backward to forward) or `SeekForPrev(current_key)` (if forward -> backward) in the child iters and rebuild the heap. `Slice target` is just a pointer and contents are not guaranteed to be the same after re-init the heap. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12784 Test Plan: I was able to steadily repro by building with `COMPILE_WITH_ASAN=1` running db_stress. ``` COMPILE_WITH_ASAN=1 make -j64 dbg ``` ``` ./db_stress --WAL_size_limit_MB=1 --WAL_ttl_seconds=60 --acquire_snapshot_one_in=10000 --adaptive_readahead=0 --adm_policy=2 --advise_random_on_open=1 --allow_data_in_errors=True --allow_fallocate=0 --async_io=0 --auto_readahead_size=1 --avoid_flush_during_recovery=0 --avoid_flush_during_shutdown=1 --avoid_unnecessary_blocking_io=1 --backup_max_size=104857600 --backup_one_in=1000 --batch_protection_bytes_per_key=0 --bgerror_resume_retry_interval=100 --block_align=1 --block_protection_bytes_per_key=8 --block_size=16384 --bloom_before_level=2147483646 --bloom_bits=62.9095874568401 --bottommost_compression_type=none --bottommost_file_compaction_delay=600 --bytes_per_sync=0 --cache_index_and_filter_blocks=0 --cache_index_and_filter_blocks_with_high_priority=0 --cache_size=33554432 --cache_type=lru_cache --charge_compression_dictionary_building_buffer=0 --charge_file_metadata=1 --charge_filter_construction=1 --charge_table_reader=0 --check_multiget_consistency=0 --check_multiget_entity_consistency=0 --checkpoint_one_in=10000 --checksum_type=kxxHash64 --clear_column_family_one_in=0 --compact_files_one_in=1000000 --compact_range_one_in=1000000 --compaction_pri=1 --compaction_readahead_size=0 --compaction_ttl=100 --compress_format_version=2 --compressed_secondary_cache_size=8388608 --compression_checksum=1 --compression_max_dict_buffer_bytes=1099511627775 --compression_max_dict_bytes=16384 --compression_parallel_threads=1 --compression_type=none --compression_use_zstd_dict_trainer=1 --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --daily_offpeak_time_utc= --data_block_index_type=1 --db=/dev/shm/rocksdb_test/rocksdb_crashtest_whitebox --db_write_buffer_size=0 --default_temperature=kUnknown --default_write_temperature=kWarm --delete_obsolete_files_period_micros=21600000000 --delpercent=4 --delrangepercent=1 --destroy_db_initially=0 --detect_filter_construct_corruption=1 --disable_file_deletions_one_in=1000000 --disable_manual_compaction_one_in=10000 --disable_wal=0 --dump_malloc_stats=1 --enable_checksum_handoff=0 --enable_compaction_filter=0 --enable_custom_split_merge=1 --enable_do_not_compress_roles=0 --enable_index_compression=0 --enable_memtable_insert_with_hint_prefix_extractor=0 --enable_pipelined_write=1 --enable_sst_partitioner_factory=1 --enable_thread_tracking=1 --enable_write_thread_adaptive_yield=1 --error_recovery_with_no_fault_injection=0 --expected_values_dir=/dev/shm/rocksdb_test/rocksdb_crashtest_expected --fail_if_options_file_error=0 --fifo_allow_compaction=1 --file_checksum_impl=crc32c --fill_cache=0 --flush_one_in=1000000 --format_version=4 --get_all_column_family_metadata_one_in=1000000 --get_current_wal_file_one_in=0 --get_live_files_apis_one_in=10000 --get_properties_of_all_tables_one_in=1000000 --get_property_one_in=1000000 --get_sorted_wal_files_one_in=0 --hard_pending_compaction_bytes_limit=274877906944 --high_pri_pool_ratio=0 --index_block_restart_interval=4 --index_shortening=1 --index_type=0 --ingest_external_file_one_in=0 --initial_auto_readahead_size=524288 --inplace_update_support=0 --iterpercent=10 --key_len_percent_dist=1,30,69 --key_may_exist_one_in=100000 --kill_random_test=888887 --last_level_temperature=kHot --level_compaction_dynamic_level_bytes=1 --lock_wal_one_in=10000 --log2_keys_per_lock=10 --log_file_time_to_roll=60 --log_readahead_size=0 --long_running_snapshots=1 --low_pri_pool_ratio=0 --lowest_used_cache_tier=0 --manifest_preallocation_size=5120 --manual_wal_flush_one_in=0 --mark_for_compaction_one_file_in=0 --max_auto_readahead_size=16384 --max_background_compactions=20 --max_bytes_for_level_base=10485760 --max_key=100000 --max_key_len=3 --max_log_file_size=0 --max_manifest_file_size=1073741824 --max_sequential_skip_in_iterations=1 --max_total_wal_size=0 --max_write_batch_group_size_bytes=64 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=0 --memtable_insert_hint_per_batch=0 --memtable_max_range_deletions=0 --memtable_prefix_bloom_size_ratio=0 --memtable_protection_bytes_per_key=8 --memtable_whole_key_filtering=0 --memtablerep=skip_list --metadata_charge_policy=0 --metadata_read_fault_one_in=1000 --metadata_write_fault_one_in=128 --min_write_buffer_number_to_merge=1 --mmap_read=0 --mock_direct_io=True --nooverwritepercent=1 --num_file_reads_for_auto_readahead=1 --open_files=-1 --open_metadata_read_fault_one_in=0 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=0 --open_write_fault_one_in=16 --ops_per_thread=20000000 --optimize_filters_for_hits=0 --optimize_filters_for_memory=0 --optimize_multiget_for_io=0 --paranoid_file_checks=1 --partition_filters=0 --partition_pinning=3 --pause_background_one_in=1000000 --periodic_compaction_seconds=0 --persist_user_defined_timestamps=1 --prefix_size=-1 --prefixpercent=0 --prepopulate_block_cache=1 --preserve_internal_time_seconds=36000 --progress_reports=0 --promote_l0_one_in=0 --read_amp_bytes_per_bit=0 --read_fault_one_in=0 --readahead_size=0 --readpercent=50 --recycle_log_file_num=0 --reopen=20 --report_bg_io_stats=1 --reset_stats_one_in=10000 --sample_for_compression=0 --secondary_cache_fault_one_in=0 --secondary_cache_uri= --skip_stats_update_on_db_open=0 --snapshot_hold_ops=100000 --soft_pending_compaction_bytes_limit=68719476736 --sqfc_name=bar --sqfc_version=0 --sst_file_manager_bytes_per_sec=0 --sst_file_manager_bytes_per_truncate=0 --stats_dump_period_sec=10 --stats_history_buffer_size=1048576 --strict_bytes_per_sync=0 --subcompactions=1 --sync=0 --sync_fault_injection=1 --table_cache_numshardbits=6 --target_file_size_base=2097152 --target_file_size_multiplier=2 --test_batches_snapshots=0 --test_cf_consistency=0 --top_level_index_pinning=0 --uncache_aggressiveness=14 --universal_max_read_amp=-1 --unpartitioned_pinning=2 --use_adaptive_mutex=1 --use_adaptive_mutex_lru=0 --use_attribute_group=0 --use_delta_encoding=1 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=1 --use_full_merge_v1=0 --use_get_entity=1 --use_merge=0 --use_multi_cf_iterator=1 --use_multi_get_entity=1 --use_multiget=1 --use_put_entity_one_in=0 --use_sqfc_for_range_queries=1 --use_timed_put_one_in=0 --use_txn=0 --use_write_buffer_manager=0 --user_timestamp_size=8 --value_size_mult=32 --verification_only=0 --verify_checksum=1 --verify_checksum_one_in=1000000 --verify_compression=1 --verify_db_one_in=10000 --verify_file_checksums_one_in=1000 --verify_iterator_with_expected_state_one_in=5 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=0 --wal_compression=zstd --write_buffer_size=4194304 --write_dbid_to_manifest=1 --write_fault_one_in=0 --writepercent=35 ``` ``` ==1606272==ERROR: AddressSanitizer: heap-use-after-free on address 0x6060000b0cc0 at pc 0x7f733469c7de bp 0x7f7311bfcfe0 sp 0x7f7311bfc790 READ of size 40 at 0x6060000b0cc0 thread T57 #0 0x7f733469c7dd in __interceptor_memcpy /home/engshare/third-party2/gcc/11.x/src/gcc-11.x/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:827 https://github.com/facebook/rocksdb/issues/1 0x7f7331f65f7e in rocksdb::IterKey::SetInternalKey(rocksdb::Slice const&, rocksdb::Slice const&, unsigned long, rocksdb::ValueType, rocksdb::Slice const*) db/dbformat.h:761 https://github.com/facebook/rocksdb/issues/2 0x7f7331f661ee in rocksdb::IterKey::SetInternalKey(rocksdb::Slice const&, unsigned long, rocksdb::ValueType, rocksdb::Slice const*) db/dbformat.h:776 https://github.com/facebook/rocksdb/issues/3 0x7f73323039ff in rocksdb::DBIter::SetSavedKeyToSeekTarget(rocksdb::Slice const&) db/db_iter.cc:1462 https://github.com/facebook/rocksdb/issues/4 0x7f7332304eb8 in rocksdb::DBIter::Seek(rocksdb::Slice const&) db/db_iter.cc:1540 https://github.com/facebook/rocksdb/issues/5 0x7f7331d94abd in rocksdb::ArenaWrappedDBIter::Seek(rocksdb::Slice const&) (/data/users/jewoongh/rocksdb/librocksdb.so.9.4+0x1394abd) https://github.com/facebook/rocksdb/issues/6 0x7f73320f1a52 in rocksdb::MultiCfIteratorImpl::Seek(rocksdb::Slice const&)::{lambda(rocksdb::Iterator*)https://github.com/facebook/rocksdb/issues/2}::operator()(rocksdb::Iterator*) const db/multi_cf_iterator_impl.h:73 https://github.com/facebook/rocksdb/issues/7 0x7f73320fccf0 in void rocksdb::MultiCfIteratorImpl::SeekCommon<rocksdb::BinaryHeap<rocksdb::MultiCfIteratorInfo, rocksdb::MultiCfIteratorImpl::MultiCfHeapItemComparator<std::greater<int> > >, rocksdb::MultiCfIteratorImpl::Seek(rocksdb::Slice const&)::{lambda(rocksdb::Iterator*)https://github.com/facebook/rocksdb/issues/2}>(rocksdb::BinaryHeap<rocksdb::MultiCfIteratorInfo, rocksdb::MultiCfIteratorImpl::MultiCfHeapItemComparator<std::greater<int> > >&, rocksdb::MultiCfIteratorImpl::Seek(rocksdb::Slice const&)::{lambda(rocksdb::Iterator*)https://github.com/facebook/rocksdb/issues/2}) (/data/users/jewoongh/rocksdb/librocksdb.so.9.4+0x16fccf0) https://github.com/facebook/rocksdb/issues/8 0x7f73320f1a93 in rocksdb::MultiCfIteratorImpl::Seek(rocksdb::Slice const&) db/multi_cf_iterator_impl.h:73 https://github.com/facebook/rocksdb/issues/9 0x7f73320f1dbe in rocksdb::MultiCfIteratorImpl::Next()::{lambda()https://github.com/facebook/rocksdb/issues/1}::operator()() const db/multi_cf_iterator_impl.h:90 https://github.com/facebook/rocksdb/issues/10 0x7f73320fe159 in rocksdb::BinaryHeap<rocksdb::MultiCfIteratorInfo, rocksdb::MultiCfIteratorImpl::MultiCfHeapItemComparator<std::greater<int> > >& rocksdb::MultiCfIteratorImpl::GetHeap<rocksdb::BinaryHeap<rocksdb::MultiCfIteratorInfo, rocksdb::MultiCfIteratorImpl::MultiCfHeapItemComparator<std::greater<int> > >, rocksdb::MultiCfIteratorImpl::Next()::{lambda()https://github.com/facebook/rocksdb/issues/1}>(rocksdb::MultiCfIteratorImpl::Next()::{lambda()https://github.com/facebook/rocksdb/issues/1}) (/data/users/jewoongh/rocksdb/librocksdb.so.9.4+0x16fe159) https://github.com/facebook/rocksdb/issues/11 0x7f73320f1ec9 in rocksdb::MultiCfIteratorImpl::Next() db/multi_cf_iterator_impl.h:87 https://github.com/facebook/rocksdb/issues/12 0x7f73320f3255 in rocksdb::CoalescingIterator::Next() db/coalescing_iterator.h:34 https://github.com/facebook/rocksdb/issues/13 0x66f28a in TestIterateImpl<rocksdb::Iterator, rocksdb::StressTest::TestIterate(rocksdb::ThreadState*, const rocksdb::ReadOptions&, const std::vector<int>&, const std::vector<long int>&)::<lambda(const rocksdb::ReadOptions&)>, rocksdb::StressTest::TestIterate(rocksdb::ThreadState*, const rocksdb::ReadOptions&, const std::vector<int>&, const std::vector<long int>&)::<lambda(rocksdb::Iterator*)> > db_stress_tool/db_stress_test_base.cc:1718 https://github.com/facebook/rocksdb/issues/14 0x6440b4 in rocksdb::StressTest::TestIterate(rocksdb::ThreadState*, rocksdb::ReadOptions const&, std::vector<int, std::allocator<int> > const&, std::vector<long, std::allocator<long> > const&) db_stress_tool/db_stress_test_base.cc:1504 https://github.com/facebook/rocksdb/issues/15 0x640cb0 in rocksdb::StressTest::OperateDb(rocksdb::ThreadState*) db_stress_tool/db_stress_test_base.cc:1376 https://github.com/facebook/rocksdb/issues/16 0x6004f6 in rocksdb::ThreadBody(void*) db_stress_tool/db_stress_driver.cc:39 https://github.com/facebook/rocksdb/issues/17 0x7f73327caed4 in StartThreadWrapper env/env_posix.cc:469 https://github.com/facebook/rocksdb/issues/18 0x7f733029abc8 in start_thread /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/nptl/pthread_create.c:434 https://github.com/facebook/rocksdb/issues/19 0x7f733032cf5b in __GI___clone3 (/usr/local/fbcode/platform010/lib/libc.so.6+0x12cf5b) 0x6060000b0cc0 is located 0 bytes inside of 55-byte region [0x6060000b0cc0,0x6060000b0cf7) freed by thread T57 here: #0 0x7f73346d1d77 in operator delete[](void*) /home/engshare/third-party2/gcc/11.x/src/gcc-11.x/libsanitizer/asan/asan_new_delete.cpp:163 https://github.com/facebook/rocksdb/issues/1 0x7f7331d9274b in rocksdb::IterKey::ResetBuffer() db/dbformat.h:830 https://github.com/facebook/rocksdb/issues/2 0x7f73323146b9 in rocksdb::IterKey::EnlargeBuffer(unsigned long) db/dbformat.cc:278 https://github.com/facebook/rocksdb/issues/3 0x7f7331f33031 in rocksdb::IterKey::EnlargeBufferIfNeeded(unsigned long) db/dbformat.h:846 https://github.com/facebook/rocksdb/issues/4 0x7f7331f65ee0 in rocksdb::IterKey::SetInternalKey(rocksdb::Slice const&, rocksdb::Slice const&, unsigned long, rocksdb::ValueType, rocksdb::Slice const*) db/dbformat.h:757 https://github.com/facebook/rocksdb/issues/5 0x7f7331f661ee in rocksdb::IterKey::SetInternalKey(rocksdb::Slice const&, unsigned long, rocksdb::ValueType, rocksdb::Slice const*) db/dbformat.h:776 https://github.com/facebook/rocksdb/issues/6 0x7f73323039ff in rocksdb::DBIter::SetSavedKeyToSeekTarget(rocksdb::Slice const&) db/db_iter.cc:1462 https://github.com/facebook/rocksdb/issues/7 0x7f7332304eb8 in rocksdb::DBIter::Seek(rocksdb::Slice const&) db/db_iter.cc:1540 https://github.com/facebook/rocksdb/issues/8 0x7f7331d94abd in rocksdb::ArenaWrappedDBIter::Seek(rocksdb::Slice const&) (/data/users/jewoongh/rocksdb/librocksdb.so.9.4+0x1394abd) https://github.com/facebook/rocksdb/issues/9 0x7f73320f1a52 in rocksdb::MultiCfIteratorImpl::Seek(rocksdb::Slice const&)::{lambda(rocksdb::Iterator*)https://github.com/facebook/rocksdb/issues/2}::operator()(rocksdb::Iterator*) const db/multi_cf_iterator_impl.h:73 https://github.com/facebook/rocksdb/issues/10 0x7f73320fccf0 in void rocksdb::MultiCfIteratorImpl::SeekCommon<rocksdb::BinaryHeap<rocksdb::MultiCfIteratorInfo, rocksdb::MultiCfIteratorImpl::MultiCfHeapItemComparator<std::greater<int> > >, rocksdb::MultiCfIteratorImpl::Seek(rocksdb::Slice const&)::{lambda(rocksdb::Iterator*)https://github.com/facebook/rocksdb/issues/2}>(rocksdb::BinaryHeap<rocksdb::MultiCfIteratorInfo, rocksdb::MultiCfIteratorImpl::MultiCfHeapItemComparator<std::greater<int> > >&, rocksdb::MultiCfIteratorImpl::Seek(rocksdb::Slice const&)::{lambda(rocksdb::Iterator*)https://github.com/facebook/rocksdb/issues/2}) (/data/users/jewoongh/rocksdb/librocksdb.so.9.4+0x16fccf0) https://github.com/facebook/rocksdb/issues/11 0x7f73320f1a93 in rocksdb::MultiCfIteratorImpl::Seek(rocksdb::Slice const&) db/multi_cf_iterator_impl.h:73 https://github.com/facebook/rocksdb/issues/12 0x7f73320f1dbe in rocksdb::MultiCfIteratorImpl::Next()::{lambda()https://github.com/facebook/rocksdb/issues/1}::operator()() const db/multi_cf_iterator_impl.h:90 https://github.com/facebook/rocksdb/issues/13 0x7f73320fe159 in rocksdb::BinaryHeap<rocksdb::MultiCfIteratorInfo, rocksdb::MultiCfIteratorImpl::MultiCfHeapItemComparator<std::greater<int> > >& rocksdb::MultiCfIteratorImpl::GetHeap<rocksdb::BinaryHeap<rocksdb::MultiCfIteratorInfo, rocksdb::MultiCfIteratorImpl::MultiCfHeapItemComparator<std::greater<int> > >, rocksdb::MultiCfIteratorImpl::Next()::{lambda()https://github.com/facebook/rocksdb/issues/1}>(rocksdb::MultiCfIteratorImpl::Next()::{lambda()https://github.com/facebook/rocksdb/issues/1}) (/data/users/jewoongh/rocksdb/librocksdb.so.9.4+0x16fe159) https://github.com/facebook/rocksdb/issues/14 0x7f73320f1ec9 in rocksdb::MultiCfIteratorImpl::Next() db/multi_cf_iterator_impl.h:87 https://github.com/facebook/rocksdb/issues/15 0x7f73320f3255 in rocksdb::CoalescingIterator::Next() db/coalescing_iterator.h:34 https://github.com/facebook/rocksdb/issues/16 0x66f28a in TestIterateImpl<rocksdb::Iterator, rocksdb::StressTest::TestIterate(rocksdb::ThreadState*, const rocksdb::ReadOptions&, const std::vector<int>&, const std::vector<long int>&)::<lambda(const rocksdb::ReadOptions&)>, rocksdb::StressTest::TestIterate(rocksdb::ThreadState*, const rocksdb::ReadOptions&, const std::vector<int>&, const std::vector<long int>&)::<lambda(rocksdb::Iterator*)> > db_stress_tool/db_stress_test_base.cc:1718 https://github.com/facebook/rocksdb/issues/17 0x6440b4 in rocksdb::StressTest::TestIterate(rocksdb::ThreadState*, rocksdb::ReadOptions const&, std::vector<int, std::allocator<int> > const&, std::vector<long, std::allocator<long> > const&) db_stress_tool/db_stress_test_base.cc:1504 https://github.com/facebook/rocksdb/issues/18 0x640cb0 in rocksdb::StressTest::OperateDb(rocksdb::ThreadState*) db_stress_tool/db_stress_test_base.cc:1376 https://github.com/facebook/rocksdb/issues/19 0x6004f6 in rocksdb::ThreadBody(void*) db_stress_tool/db_stress_driver.cc:39 https://github.com/facebook/rocksdb/issues/20 0x7f73327caed4 in StartThreadWrapper env/env_posix.cc:469 https://github.com/facebook/rocksdb/issues/21 0x7f733029abc8 in start_thread /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/nptl/pthread_create.c:434 previously allocated by thread T57 here: #0 0x7f73346d13b7 in operator new[](unsigned long) /home/engshare/third-party2/gcc/11.x/src/gcc-11.x/libsanitizer/asan/asan_new_delete.cpp:102 https://github.com/facebook/rocksdb/issues/1 0x7f73323146c5 in rocksdb::IterKey::EnlargeBuffer(unsigned long) db/dbformat.cc:279 https://github.com/facebook/rocksdb/issues/2 0x7f7331f33031 in rocksdb::IterKey::EnlargeBufferIfNeeded(unsigned long) db/dbformat.h:846 https://github.com/facebook/rocksdb/issues/3 0x7f7331f65ee0 in rocksdb::IterKey::SetInternalKey(rocksdb::Slice const&, rocksdb::Slice const&, unsigned long, rocksdb::ValueType, rocksdb::Slice const*) db/dbformat.h:757 https://github.com/facebook/rocksdb/issues/4 0x7f7331f661ee in rocksdb::IterKey::SetInternalKey(rocksdb::Slice const&, unsigned long, rocksdb::ValueType, rocksdb::Slice const*) db/dbformat.h:776 https://github.com/facebook/rocksdb/issues/5 0x7f7332303e1e in rocksdb::DBIter::SetSavedKeyToSeekForPrevTarget(rocksdb::Slice const&) db/db_iter.cc:1479 https://github.com/facebook/rocksdb/issues/6 0x7f7332306302 in rocksdb::DBIter::SeekForPrev(rocksdb::Slice const&) db/db_iter.cc:1615 https://github.com/facebook/rocksdb/issues/7 0x7f7331d94b0f in rocksdb::ArenaWrappedDBIter::SeekForPrev(rocksdb::Slice const&) (/data/users/jewoongh/rocksdb/librocksdb.so.9.4+0x1394b0f) https://github.com/facebook/rocksdb/issues/8 0x7f73320f1c5a in rocksdb::MultiCfIteratorImpl::SeekForPrev(rocksdb::Slice const&)::{lambda(rocksdb::Iterator*)https://github.com/facebook/rocksdb/issues/2}::operator()(rocksdb::Iterator*) const db/multi_cf_iterator_impl.h:82 https://github.com/facebook/rocksdb/issues/9 0x7f73320fdc1e in void rocksdb::MultiCfIteratorImpl::SeekCommon<rocksdb::BinaryHeap<rocksdb::MultiCfIteratorInfo, rocksdb::MultiCfIteratorImpl::MultiCfHeapItemComparator<std::less<int> > >, rocksdb::MultiCfIteratorImpl::SeekForPrev(rocksdb::Slice const&)::{lambda(rocksdb::Iterator*)https://github.com/facebook/rocksdb/issues/2}>(rocksdb::BinaryHeap<rocksdb::MultiCfIteratorInfo, rocksdb::MultiCfIteratorImpl::MultiCfHeapItemComparator<std::less<int> > >&, rocksdb::MultiCfIteratorImpl::SeekForPrev(rocksdb::Slice const&)::{lambda(rocksdb::Iterator*)https://github.com/facebook/rocksdb/issues/2}) (/data/users/jewoongh/rocksdb/librocksdb.so.9.4+0x16fdc1e) https://github.com/facebook/rocksdb/issues/10 0x7f73320f1c9b in rocksdb::MultiCfIteratorImpl::SeekForPrev(rocksdb::Slice const&) db/multi_cf_iterator_impl.h:81 https://github.com/facebook/rocksdb/issues/11 0x7f73320f2002 in rocksdb::MultiCfIteratorImpl::Prev()::{lambda()https://github.com/facebook/rocksdb/issues/1}::operator()() const db/multi_cf_iterator_impl.h:99 https://github.com/facebook/rocksdb/issues/12 0x7f73320ff223 in rocksdb::BinaryHeap<rocksdb::MultiCfIteratorInfo, rocksdb::MultiCfIteratorImpl::MultiCfHeapItemComparator<std::less<int> > >& rocksdb::MultiCfIteratorImpl::GetHeap<rocksdb::BinaryHeap<rocksdb::MultiCfIteratorInfo, rocksdb::MultiCfIteratorImpl::MultiCfHeapItemComparator<std::less<int> > >, rocksdb::MultiCfIteratorImpl::Prev()::{lambda()https://github.com/facebook/rocksdb/issues/1}>(rocksdb::MultiCfIteratorImpl::Prev()::{lambda()https://github.com/facebook/rocksdb/issues/1}) (/data/users/jewoongh/rocksdb/librocksdb.so.9.4+0x16ff223) https://github.com/facebook/rocksdb/issues/13 0x7f73320f210d in rocksdb::MultiCfIteratorImpl::Prev() db/multi_cf_iterator_impl.h:96 https://github.com/facebook/rocksdb/issues/14 0x7f73320f3275 in rocksdb::CoalescingIterator::Prev() db/coalescing_iterator.h:35 https://github.com/facebook/rocksdb/issues/15 0x66f440 in TestIterateImpl<rocksdb::Iterator, rocksdb::StressTest::TestIterate(rocksdb::ThreadState*, const rocksdb::ReadOptions&, const std::vector<int>&, const std::vector<long int>&)::<lambda(const rocksdb::ReadOptions&)>, rocksdb::StressTest::TestIterate(rocksdb::ThreadState*, const rocksdb::ReadOptions&, const std::vector<int>&, const std::vector<long int>&)::<lambda(rocksdb::Iterator*)> > db_stress_tool/db_stress_test_base.cc:1725 https://github.com/facebook/rocksdb/issues/16 0x6440b4 in rocksdb::StressTest::TestIterate(rocksdb::ThreadState*, rocksdb::ReadOptions const&, std::vector<int, std::allocator<int> > const&, std::vector<long, std::allocator<long> > const&) db_stress_tool/db_stress_test_base.cc:1504 https://github.com/facebook/rocksdb/issues/17 0x640cb0 in rocksdb::StressTest::OperateDb(rocksdb::ThreadState*) db_stress_tool/db_stress_test_base.cc:1376 https://github.com/facebook/rocksdb/issues/18 0x6004f6 in rocksdb::ThreadBody(void*) db_stress_tool/db_stress_driver.cc:39 https://github.com/facebook/rocksdb/issues/19 0x7f73327caed4 in StartThreadWrapper env/env_posix.cc:469 https://github.com/facebook/rocksdb/issues/20 0x7f733029abc8 in start_thread /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/nptl/pthread_create.c:434 Thread T57 created by T0 here: #0 0x7f7334642136 in __interceptor_pthread_create /home/engshare/third-party2/gcc/11.x/src/gcc-11.x/libsanitizer/asan/asan_interceptors.cpp:216 https://github.com/facebook/rocksdb/issues/1 0x7f73327cb008 in StartThread env/env_posix.cc:479 https://github.com/facebook/rocksdb/issues/2 0x7f733276b406 in rocksdb::CompositeEnvWrapper::StartThread(void (*)(void*), void*) env/composite_env_wrapper.h:316 https://github.com/facebook/rocksdb/issues/3 0x7f733276b406 in rocksdb::CompositeEnvWrapper::StartThread(void (*)(void*), void*) env/composite_env_wrapper.h:316 https://github.com/facebook/rocksdb/issues/4 0x6013d9 in rocksdb::RunStressTestImpl(rocksdb::SharedState*) db_stress_tool/db_stress_driver.cc:108 https://github.com/facebook/rocksdb/issues/5 0x603083 in rocksdb::RunStressTest(rocksdb::SharedState*) db_stress_tool/db_stress_driver.cc:248 https://github.com/facebook/rocksdb/issues/6 0x4e6ab3 in rocksdb::db_stress_tool(int, char**) db_stress_tool/db_stress_tool.cc:365 https://github.com/facebook/rocksdb/issues/7 0x4e260a in main db_stress_tool/db_stress.cc:23 https://github.com/facebook/rocksdb/issues/8 0x7f733022c656 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 https://github.com/facebook/rocksdb/issues/9 0x7f733022c717 in __libc_start_main_impl ../csu/libc-start.c:409 https://github.com/facebook/rocksdb/issues/10 0x4e2530 in _start (/data/users/jewoongh/rocksdb/db_stress+0x4e2530) ``` `heap-use-after-free` was no longer happening with the same command after making the change. Reviewed By: pdillinger Differential Revision: D58871081 Pulled By: jaykorean fbshipit-source-id: 0194c34ffec5f16a6556c6bf3941a27253a4ecb4
2024-06-21 18:56:10 +00:00
std::string target(key().data(), key().size());
MultiCFIterator Refactor - CoalescingIterator & AttributeGroupIterator (#12480) Summary: There are a couple of reasons to modify the current implementation of the MultiCfIterator, which implements the generic `Iterator` interface. - The default behavior of `value()`/`columns()` returning data from different Column Families for different keys can be prone to errors, even though there might be valid use cases where users do not care about the origin of the value/columns. - The `attribute_groups()` API, which is not yet implemented, will not be useful for a single-CF iterator. In this PR, we are implementing the following changes: - `IteratorBase` introduced, which includes all basic iterator functions except `value()` and `columns()`. - `Iterator`, which now inherits from `IteratorBase`, includes `value()` and `columns()`. - New public interface `AttributeGroupIterator` inherits from `IteratorBase` and additionally includes `attribute_groups()` (to be implemented). - Renamed former `MultiCfIterator` to `CoalescingIterator` which inherits from `Iterator` - Existing MultiCfIteratorTest has been split into two - `CoalescingIteratorTest` and `AttributeGroupIteratorTest`. - Moved AttributeGroup related code from `wide_columns.h` to a new file, `attribute_groups.h`. Some Implementation Details - `MultiCfIteratorImpl` takes two functions - `populate_func` and `reset_func` and use them to populate `value_` and `columns_` in CoalescingIterator and `attribute_groups_` in AttributeGroupIterator. In CoalescingIterator, populate_func is `Coalesce()`, in AttributeGroupIterator populate_func is `AddToAttributeGroups()`. `reset_func` clears populated value_, columns_ and attribute_groups_ accordingly. - `Coalesce()` merge sorts columns from multiple CFs when a key exists in more than on CFs. column that appears in later CF overwrites the prior ones. For example, if CF1 has `"key_1" ==> {"col_1": "foo", "col_2", "baz"}` and CF2 has `"key_1" ==> {"col_2": "quux", "col_3", "bla"}`, and when the iterator is at `key_1`, `columns()` will return `{"col_1": "foo", "col_2", "quux", "col_3", "bla"}` In this example, `value()` will be empty, because none of them have values for `kDefaultColumnName` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12480 Test Plan: ## Unit Test ``` ./multi_cf_iterator_test ``` ## Performance Test To make sure this change does not impact existing `Iterator` performance **Build** ``` $> make -j64 release ``` **Setup** ``` $> TEST_TMPDIR=/dev/shm/db_bench ./db_bench -benchmarks="filluniquerandom" -key_size=32 -value_size=512 -num=1000000 -compression_type=none ``` **Run** ``` TEST_TMPDIR=/dev/shm/db_bench ./db_bench -use_existing_db=1 -benchmarks="newiterator,seekrandom" -cache_size=10485760000 ``` **Before the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.519 micros/op 1927904 ops/sec 0.519 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.302 micros/op 188589 ops/sec 5.303 seconds 1000000 operations; (0 of 1000000 found) ``` **After the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.497 micros/op 2011012 ops/sec 0.497 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.252 micros/op 190405 ops/sec 5.252 seconds 1000000 operations; (0 of 1000000 found) ``` Reviewed By: ltamasi Differential Revision: D55353909 Pulled By: jaykorean fbshipit-source-id: 8d7786ffee09e022261ce34aa60e8633685e1946
2024-04-11 18:34:04 +00:00
InitMinHeap();
Seek(target);
});
AdvanceIterator(min_heap, [](Iterator* iter) { iter->Next(); });
}
void Prev() {
assert(Valid());
auto& max_heap = GetHeap<MultiCfMaxHeap>([this]() {
Fix heap-use-after-free in MultiCfIteratorImpl (#12784) Summary: # Summary When changing the direction of the multi-cf-iter, we do this by `Seek(current_key)` (if changing from backward to forward) or `SeekForPrev(current_key)` (if forward -> backward) in the child iters and rebuild the heap. `Slice target` is just a pointer and contents are not guaranteed to be the same after re-init the heap. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12784 Test Plan: I was able to steadily repro by building with `COMPILE_WITH_ASAN=1` running db_stress. ``` COMPILE_WITH_ASAN=1 make -j64 dbg ``` ``` ./db_stress --WAL_size_limit_MB=1 --WAL_ttl_seconds=60 --acquire_snapshot_one_in=10000 --adaptive_readahead=0 --adm_policy=2 --advise_random_on_open=1 --allow_data_in_errors=True --allow_fallocate=0 --async_io=0 --auto_readahead_size=1 --avoid_flush_during_recovery=0 --avoid_flush_during_shutdown=1 --avoid_unnecessary_blocking_io=1 --backup_max_size=104857600 --backup_one_in=1000 --batch_protection_bytes_per_key=0 --bgerror_resume_retry_interval=100 --block_align=1 --block_protection_bytes_per_key=8 --block_size=16384 --bloom_before_level=2147483646 --bloom_bits=62.9095874568401 --bottommost_compression_type=none --bottommost_file_compaction_delay=600 --bytes_per_sync=0 --cache_index_and_filter_blocks=0 --cache_index_and_filter_blocks_with_high_priority=0 --cache_size=33554432 --cache_type=lru_cache --charge_compression_dictionary_building_buffer=0 --charge_file_metadata=1 --charge_filter_construction=1 --charge_table_reader=0 --check_multiget_consistency=0 --check_multiget_entity_consistency=0 --checkpoint_one_in=10000 --checksum_type=kxxHash64 --clear_column_family_one_in=0 --compact_files_one_in=1000000 --compact_range_one_in=1000000 --compaction_pri=1 --compaction_readahead_size=0 --compaction_ttl=100 --compress_format_version=2 --compressed_secondary_cache_size=8388608 --compression_checksum=1 --compression_max_dict_buffer_bytes=1099511627775 --compression_max_dict_bytes=16384 --compression_parallel_threads=1 --compression_type=none --compression_use_zstd_dict_trainer=1 --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --daily_offpeak_time_utc= --data_block_index_type=1 --db=/dev/shm/rocksdb_test/rocksdb_crashtest_whitebox --db_write_buffer_size=0 --default_temperature=kUnknown --default_write_temperature=kWarm --delete_obsolete_files_period_micros=21600000000 --delpercent=4 --delrangepercent=1 --destroy_db_initially=0 --detect_filter_construct_corruption=1 --disable_file_deletions_one_in=1000000 --disable_manual_compaction_one_in=10000 --disable_wal=0 --dump_malloc_stats=1 --enable_checksum_handoff=0 --enable_compaction_filter=0 --enable_custom_split_merge=1 --enable_do_not_compress_roles=0 --enable_index_compression=0 --enable_memtable_insert_with_hint_prefix_extractor=0 --enable_pipelined_write=1 --enable_sst_partitioner_factory=1 --enable_thread_tracking=1 --enable_write_thread_adaptive_yield=1 --error_recovery_with_no_fault_injection=0 --expected_values_dir=/dev/shm/rocksdb_test/rocksdb_crashtest_expected --fail_if_options_file_error=0 --fifo_allow_compaction=1 --file_checksum_impl=crc32c --fill_cache=0 --flush_one_in=1000000 --format_version=4 --get_all_column_family_metadata_one_in=1000000 --get_current_wal_file_one_in=0 --get_live_files_apis_one_in=10000 --get_properties_of_all_tables_one_in=1000000 --get_property_one_in=1000000 --get_sorted_wal_files_one_in=0 --hard_pending_compaction_bytes_limit=274877906944 --high_pri_pool_ratio=0 --index_block_restart_interval=4 --index_shortening=1 --index_type=0 --ingest_external_file_one_in=0 --initial_auto_readahead_size=524288 --inplace_update_support=0 --iterpercent=10 --key_len_percent_dist=1,30,69 --key_may_exist_one_in=100000 --kill_random_test=888887 --last_level_temperature=kHot --level_compaction_dynamic_level_bytes=1 --lock_wal_one_in=10000 --log2_keys_per_lock=10 --log_file_time_to_roll=60 --log_readahead_size=0 --long_running_snapshots=1 --low_pri_pool_ratio=0 --lowest_used_cache_tier=0 --manifest_preallocation_size=5120 --manual_wal_flush_one_in=0 --mark_for_compaction_one_file_in=0 --max_auto_readahead_size=16384 --max_background_compactions=20 --max_bytes_for_level_base=10485760 --max_key=100000 --max_key_len=3 --max_log_file_size=0 --max_manifest_file_size=1073741824 --max_sequential_skip_in_iterations=1 --max_total_wal_size=0 --max_write_batch_group_size_bytes=64 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=0 --memtable_insert_hint_per_batch=0 --memtable_max_range_deletions=0 --memtable_prefix_bloom_size_ratio=0 --memtable_protection_bytes_per_key=8 --memtable_whole_key_filtering=0 --memtablerep=skip_list --metadata_charge_policy=0 --metadata_read_fault_one_in=1000 --metadata_write_fault_one_in=128 --min_write_buffer_number_to_merge=1 --mmap_read=0 --mock_direct_io=True --nooverwritepercent=1 --num_file_reads_for_auto_readahead=1 --open_files=-1 --open_metadata_read_fault_one_in=0 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=0 --open_write_fault_one_in=16 --ops_per_thread=20000000 --optimize_filters_for_hits=0 --optimize_filters_for_memory=0 --optimize_multiget_for_io=0 --paranoid_file_checks=1 --partition_filters=0 --partition_pinning=3 --pause_background_one_in=1000000 --periodic_compaction_seconds=0 --persist_user_defined_timestamps=1 --prefix_size=-1 --prefixpercent=0 --prepopulate_block_cache=1 --preserve_internal_time_seconds=36000 --progress_reports=0 --promote_l0_one_in=0 --read_amp_bytes_per_bit=0 --read_fault_one_in=0 --readahead_size=0 --readpercent=50 --recycle_log_file_num=0 --reopen=20 --report_bg_io_stats=1 --reset_stats_one_in=10000 --sample_for_compression=0 --secondary_cache_fault_one_in=0 --secondary_cache_uri= --skip_stats_update_on_db_open=0 --snapshot_hold_ops=100000 --soft_pending_compaction_bytes_limit=68719476736 --sqfc_name=bar --sqfc_version=0 --sst_file_manager_bytes_per_sec=0 --sst_file_manager_bytes_per_truncate=0 --stats_dump_period_sec=10 --stats_history_buffer_size=1048576 --strict_bytes_per_sync=0 --subcompactions=1 --sync=0 --sync_fault_injection=1 --table_cache_numshardbits=6 --target_file_size_base=2097152 --target_file_size_multiplier=2 --test_batches_snapshots=0 --test_cf_consistency=0 --top_level_index_pinning=0 --uncache_aggressiveness=14 --universal_max_read_amp=-1 --unpartitioned_pinning=2 --use_adaptive_mutex=1 --use_adaptive_mutex_lru=0 --use_attribute_group=0 --use_delta_encoding=1 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=1 --use_full_merge_v1=0 --use_get_entity=1 --use_merge=0 --use_multi_cf_iterator=1 --use_multi_get_entity=1 --use_multiget=1 --use_put_entity_one_in=0 --use_sqfc_for_range_queries=1 --use_timed_put_one_in=0 --use_txn=0 --use_write_buffer_manager=0 --user_timestamp_size=8 --value_size_mult=32 --verification_only=0 --verify_checksum=1 --verify_checksum_one_in=1000000 --verify_compression=1 --verify_db_one_in=10000 --verify_file_checksums_one_in=1000 --verify_iterator_with_expected_state_one_in=5 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=0 --wal_compression=zstd --write_buffer_size=4194304 --write_dbid_to_manifest=1 --write_fault_one_in=0 --writepercent=35 ``` ``` ==1606272==ERROR: AddressSanitizer: heap-use-after-free on address 0x6060000b0cc0 at pc 0x7f733469c7de bp 0x7f7311bfcfe0 sp 0x7f7311bfc790 READ of size 40 at 0x6060000b0cc0 thread T57 #0 0x7f733469c7dd in __interceptor_memcpy /home/engshare/third-party2/gcc/11.x/src/gcc-11.x/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:827 https://github.com/facebook/rocksdb/issues/1 0x7f7331f65f7e in rocksdb::IterKey::SetInternalKey(rocksdb::Slice const&, rocksdb::Slice const&, unsigned long, rocksdb::ValueType, rocksdb::Slice const*) db/dbformat.h:761 https://github.com/facebook/rocksdb/issues/2 0x7f7331f661ee in rocksdb::IterKey::SetInternalKey(rocksdb::Slice const&, unsigned long, rocksdb::ValueType, rocksdb::Slice const*) db/dbformat.h:776 https://github.com/facebook/rocksdb/issues/3 0x7f73323039ff in rocksdb::DBIter::SetSavedKeyToSeekTarget(rocksdb::Slice const&) db/db_iter.cc:1462 https://github.com/facebook/rocksdb/issues/4 0x7f7332304eb8 in rocksdb::DBIter::Seek(rocksdb::Slice const&) db/db_iter.cc:1540 https://github.com/facebook/rocksdb/issues/5 0x7f7331d94abd in rocksdb::ArenaWrappedDBIter::Seek(rocksdb::Slice const&) (/data/users/jewoongh/rocksdb/librocksdb.so.9.4+0x1394abd) https://github.com/facebook/rocksdb/issues/6 0x7f73320f1a52 in rocksdb::MultiCfIteratorImpl::Seek(rocksdb::Slice const&)::{lambda(rocksdb::Iterator*)https://github.com/facebook/rocksdb/issues/2}::operator()(rocksdb::Iterator*) const db/multi_cf_iterator_impl.h:73 https://github.com/facebook/rocksdb/issues/7 0x7f73320fccf0 in void rocksdb::MultiCfIteratorImpl::SeekCommon<rocksdb::BinaryHeap<rocksdb::MultiCfIteratorInfo, rocksdb::MultiCfIteratorImpl::MultiCfHeapItemComparator<std::greater<int> > >, rocksdb::MultiCfIteratorImpl::Seek(rocksdb::Slice const&)::{lambda(rocksdb::Iterator*)https://github.com/facebook/rocksdb/issues/2}>(rocksdb::BinaryHeap<rocksdb::MultiCfIteratorInfo, rocksdb::MultiCfIteratorImpl::MultiCfHeapItemComparator<std::greater<int> > >&, rocksdb::MultiCfIteratorImpl::Seek(rocksdb::Slice const&)::{lambda(rocksdb::Iterator*)https://github.com/facebook/rocksdb/issues/2}) (/data/users/jewoongh/rocksdb/librocksdb.so.9.4+0x16fccf0) https://github.com/facebook/rocksdb/issues/8 0x7f73320f1a93 in rocksdb::MultiCfIteratorImpl::Seek(rocksdb::Slice const&) db/multi_cf_iterator_impl.h:73 https://github.com/facebook/rocksdb/issues/9 0x7f73320f1dbe in rocksdb::MultiCfIteratorImpl::Next()::{lambda()https://github.com/facebook/rocksdb/issues/1}::operator()() const db/multi_cf_iterator_impl.h:90 https://github.com/facebook/rocksdb/issues/10 0x7f73320fe159 in rocksdb::BinaryHeap<rocksdb::MultiCfIteratorInfo, rocksdb::MultiCfIteratorImpl::MultiCfHeapItemComparator<std::greater<int> > >& rocksdb::MultiCfIteratorImpl::GetHeap<rocksdb::BinaryHeap<rocksdb::MultiCfIteratorInfo, rocksdb::MultiCfIteratorImpl::MultiCfHeapItemComparator<std::greater<int> > >, rocksdb::MultiCfIteratorImpl::Next()::{lambda()https://github.com/facebook/rocksdb/issues/1}>(rocksdb::MultiCfIteratorImpl::Next()::{lambda()https://github.com/facebook/rocksdb/issues/1}) (/data/users/jewoongh/rocksdb/librocksdb.so.9.4+0x16fe159) https://github.com/facebook/rocksdb/issues/11 0x7f73320f1ec9 in rocksdb::MultiCfIteratorImpl::Next() db/multi_cf_iterator_impl.h:87 https://github.com/facebook/rocksdb/issues/12 0x7f73320f3255 in rocksdb::CoalescingIterator::Next() db/coalescing_iterator.h:34 https://github.com/facebook/rocksdb/issues/13 0x66f28a in TestIterateImpl<rocksdb::Iterator, rocksdb::StressTest::TestIterate(rocksdb::ThreadState*, const rocksdb::ReadOptions&, const std::vector<int>&, const std::vector<long int>&)::<lambda(const rocksdb::ReadOptions&)>, rocksdb::StressTest::TestIterate(rocksdb::ThreadState*, const rocksdb::ReadOptions&, const std::vector<int>&, const std::vector<long int>&)::<lambda(rocksdb::Iterator*)> > db_stress_tool/db_stress_test_base.cc:1718 https://github.com/facebook/rocksdb/issues/14 0x6440b4 in rocksdb::StressTest::TestIterate(rocksdb::ThreadState*, rocksdb::ReadOptions const&, std::vector<int, std::allocator<int> > const&, std::vector<long, std::allocator<long> > const&) db_stress_tool/db_stress_test_base.cc:1504 https://github.com/facebook/rocksdb/issues/15 0x640cb0 in rocksdb::StressTest::OperateDb(rocksdb::ThreadState*) db_stress_tool/db_stress_test_base.cc:1376 https://github.com/facebook/rocksdb/issues/16 0x6004f6 in rocksdb::ThreadBody(void*) db_stress_tool/db_stress_driver.cc:39 https://github.com/facebook/rocksdb/issues/17 0x7f73327caed4 in StartThreadWrapper env/env_posix.cc:469 https://github.com/facebook/rocksdb/issues/18 0x7f733029abc8 in start_thread /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/nptl/pthread_create.c:434 https://github.com/facebook/rocksdb/issues/19 0x7f733032cf5b in __GI___clone3 (/usr/local/fbcode/platform010/lib/libc.so.6+0x12cf5b) 0x6060000b0cc0 is located 0 bytes inside of 55-byte region [0x6060000b0cc0,0x6060000b0cf7) freed by thread T57 here: #0 0x7f73346d1d77 in operator delete[](void*) /home/engshare/third-party2/gcc/11.x/src/gcc-11.x/libsanitizer/asan/asan_new_delete.cpp:163 https://github.com/facebook/rocksdb/issues/1 0x7f7331d9274b in rocksdb::IterKey::ResetBuffer() db/dbformat.h:830 https://github.com/facebook/rocksdb/issues/2 0x7f73323146b9 in rocksdb::IterKey::EnlargeBuffer(unsigned long) db/dbformat.cc:278 https://github.com/facebook/rocksdb/issues/3 0x7f7331f33031 in rocksdb::IterKey::EnlargeBufferIfNeeded(unsigned long) db/dbformat.h:846 https://github.com/facebook/rocksdb/issues/4 0x7f7331f65ee0 in rocksdb::IterKey::SetInternalKey(rocksdb::Slice const&, rocksdb::Slice const&, unsigned long, rocksdb::ValueType, rocksdb::Slice const*) db/dbformat.h:757 https://github.com/facebook/rocksdb/issues/5 0x7f7331f661ee in rocksdb::IterKey::SetInternalKey(rocksdb::Slice const&, unsigned long, rocksdb::ValueType, rocksdb::Slice const*) db/dbformat.h:776 https://github.com/facebook/rocksdb/issues/6 0x7f73323039ff in rocksdb::DBIter::SetSavedKeyToSeekTarget(rocksdb::Slice const&) db/db_iter.cc:1462 https://github.com/facebook/rocksdb/issues/7 0x7f7332304eb8 in rocksdb::DBIter::Seek(rocksdb::Slice const&) db/db_iter.cc:1540 https://github.com/facebook/rocksdb/issues/8 0x7f7331d94abd in rocksdb::ArenaWrappedDBIter::Seek(rocksdb::Slice const&) (/data/users/jewoongh/rocksdb/librocksdb.so.9.4+0x1394abd) https://github.com/facebook/rocksdb/issues/9 0x7f73320f1a52 in rocksdb::MultiCfIteratorImpl::Seek(rocksdb::Slice const&)::{lambda(rocksdb::Iterator*)https://github.com/facebook/rocksdb/issues/2}::operator()(rocksdb::Iterator*) const db/multi_cf_iterator_impl.h:73 https://github.com/facebook/rocksdb/issues/10 0x7f73320fccf0 in void rocksdb::MultiCfIteratorImpl::SeekCommon<rocksdb::BinaryHeap<rocksdb::MultiCfIteratorInfo, rocksdb::MultiCfIteratorImpl::MultiCfHeapItemComparator<std::greater<int> > >, rocksdb::MultiCfIteratorImpl::Seek(rocksdb::Slice const&)::{lambda(rocksdb::Iterator*)https://github.com/facebook/rocksdb/issues/2}>(rocksdb::BinaryHeap<rocksdb::MultiCfIteratorInfo, rocksdb::MultiCfIteratorImpl::MultiCfHeapItemComparator<std::greater<int> > >&, rocksdb::MultiCfIteratorImpl::Seek(rocksdb::Slice const&)::{lambda(rocksdb::Iterator*)https://github.com/facebook/rocksdb/issues/2}) (/data/users/jewoongh/rocksdb/librocksdb.so.9.4+0x16fccf0) https://github.com/facebook/rocksdb/issues/11 0x7f73320f1a93 in rocksdb::MultiCfIteratorImpl::Seek(rocksdb::Slice const&) db/multi_cf_iterator_impl.h:73 https://github.com/facebook/rocksdb/issues/12 0x7f73320f1dbe in rocksdb::MultiCfIteratorImpl::Next()::{lambda()https://github.com/facebook/rocksdb/issues/1}::operator()() const db/multi_cf_iterator_impl.h:90 https://github.com/facebook/rocksdb/issues/13 0x7f73320fe159 in rocksdb::BinaryHeap<rocksdb::MultiCfIteratorInfo, rocksdb::MultiCfIteratorImpl::MultiCfHeapItemComparator<std::greater<int> > >& rocksdb::MultiCfIteratorImpl::GetHeap<rocksdb::BinaryHeap<rocksdb::MultiCfIteratorInfo, rocksdb::MultiCfIteratorImpl::MultiCfHeapItemComparator<std::greater<int> > >, rocksdb::MultiCfIteratorImpl::Next()::{lambda()https://github.com/facebook/rocksdb/issues/1}>(rocksdb::MultiCfIteratorImpl::Next()::{lambda()https://github.com/facebook/rocksdb/issues/1}) (/data/users/jewoongh/rocksdb/librocksdb.so.9.4+0x16fe159) https://github.com/facebook/rocksdb/issues/14 0x7f73320f1ec9 in rocksdb::MultiCfIteratorImpl::Next() db/multi_cf_iterator_impl.h:87 https://github.com/facebook/rocksdb/issues/15 0x7f73320f3255 in rocksdb::CoalescingIterator::Next() db/coalescing_iterator.h:34 https://github.com/facebook/rocksdb/issues/16 0x66f28a in TestIterateImpl<rocksdb::Iterator, rocksdb::StressTest::TestIterate(rocksdb::ThreadState*, const rocksdb::ReadOptions&, const std::vector<int>&, const std::vector<long int>&)::<lambda(const rocksdb::ReadOptions&)>, rocksdb::StressTest::TestIterate(rocksdb::ThreadState*, const rocksdb::ReadOptions&, const std::vector<int>&, const std::vector<long int>&)::<lambda(rocksdb::Iterator*)> > db_stress_tool/db_stress_test_base.cc:1718 https://github.com/facebook/rocksdb/issues/17 0x6440b4 in rocksdb::StressTest::TestIterate(rocksdb::ThreadState*, rocksdb::ReadOptions const&, std::vector<int, std::allocator<int> > const&, std::vector<long, std::allocator<long> > const&) db_stress_tool/db_stress_test_base.cc:1504 https://github.com/facebook/rocksdb/issues/18 0x640cb0 in rocksdb::StressTest::OperateDb(rocksdb::ThreadState*) db_stress_tool/db_stress_test_base.cc:1376 https://github.com/facebook/rocksdb/issues/19 0x6004f6 in rocksdb::ThreadBody(void*) db_stress_tool/db_stress_driver.cc:39 https://github.com/facebook/rocksdb/issues/20 0x7f73327caed4 in StartThreadWrapper env/env_posix.cc:469 https://github.com/facebook/rocksdb/issues/21 0x7f733029abc8 in start_thread /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/nptl/pthread_create.c:434 previously allocated by thread T57 here: #0 0x7f73346d13b7 in operator new[](unsigned long) /home/engshare/third-party2/gcc/11.x/src/gcc-11.x/libsanitizer/asan/asan_new_delete.cpp:102 https://github.com/facebook/rocksdb/issues/1 0x7f73323146c5 in rocksdb::IterKey::EnlargeBuffer(unsigned long) db/dbformat.cc:279 https://github.com/facebook/rocksdb/issues/2 0x7f7331f33031 in rocksdb::IterKey::EnlargeBufferIfNeeded(unsigned long) db/dbformat.h:846 https://github.com/facebook/rocksdb/issues/3 0x7f7331f65ee0 in rocksdb::IterKey::SetInternalKey(rocksdb::Slice const&, rocksdb::Slice const&, unsigned long, rocksdb::ValueType, rocksdb::Slice const*) db/dbformat.h:757 https://github.com/facebook/rocksdb/issues/4 0x7f7331f661ee in rocksdb::IterKey::SetInternalKey(rocksdb::Slice const&, unsigned long, rocksdb::ValueType, rocksdb::Slice const*) db/dbformat.h:776 https://github.com/facebook/rocksdb/issues/5 0x7f7332303e1e in rocksdb::DBIter::SetSavedKeyToSeekForPrevTarget(rocksdb::Slice const&) db/db_iter.cc:1479 https://github.com/facebook/rocksdb/issues/6 0x7f7332306302 in rocksdb::DBIter::SeekForPrev(rocksdb::Slice const&) db/db_iter.cc:1615 https://github.com/facebook/rocksdb/issues/7 0x7f7331d94b0f in rocksdb::ArenaWrappedDBIter::SeekForPrev(rocksdb::Slice const&) (/data/users/jewoongh/rocksdb/librocksdb.so.9.4+0x1394b0f) https://github.com/facebook/rocksdb/issues/8 0x7f73320f1c5a in rocksdb::MultiCfIteratorImpl::SeekForPrev(rocksdb::Slice const&)::{lambda(rocksdb::Iterator*)https://github.com/facebook/rocksdb/issues/2}::operator()(rocksdb::Iterator*) const db/multi_cf_iterator_impl.h:82 https://github.com/facebook/rocksdb/issues/9 0x7f73320fdc1e in void rocksdb::MultiCfIteratorImpl::SeekCommon<rocksdb::BinaryHeap<rocksdb::MultiCfIteratorInfo, rocksdb::MultiCfIteratorImpl::MultiCfHeapItemComparator<std::less<int> > >, rocksdb::MultiCfIteratorImpl::SeekForPrev(rocksdb::Slice const&)::{lambda(rocksdb::Iterator*)https://github.com/facebook/rocksdb/issues/2}>(rocksdb::BinaryHeap<rocksdb::MultiCfIteratorInfo, rocksdb::MultiCfIteratorImpl::MultiCfHeapItemComparator<std::less<int> > >&, rocksdb::MultiCfIteratorImpl::SeekForPrev(rocksdb::Slice const&)::{lambda(rocksdb::Iterator*)https://github.com/facebook/rocksdb/issues/2}) (/data/users/jewoongh/rocksdb/librocksdb.so.9.4+0x16fdc1e) https://github.com/facebook/rocksdb/issues/10 0x7f73320f1c9b in rocksdb::MultiCfIteratorImpl::SeekForPrev(rocksdb::Slice const&) db/multi_cf_iterator_impl.h:81 https://github.com/facebook/rocksdb/issues/11 0x7f73320f2002 in rocksdb::MultiCfIteratorImpl::Prev()::{lambda()https://github.com/facebook/rocksdb/issues/1}::operator()() const db/multi_cf_iterator_impl.h:99 https://github.com/facebook/rocksdb/issues/12 0x7f73320ff223 in rocksdb::BinaryHeap<rocksdb::MultiCfIteratorInfo, rocksdb::MultiCfIteratorImpl::MultiCfHeapItemComparator<std::less<int> > >& rocksdb::MultiCfIteratorImpl::GetHeap<rocksdb::BinaryHeap<rocksdb::MultiCfIteratorInfo, rocksdb::MultiCfIteratorImpl::MultiCfHeapItemComparator<std::less<int> > >, rocksdb::MultiCfIteratorImpl::Prev()::{lambda()https://github.com/facebook/rocksdb/issues/1}>(rocksdb::MultiCfIteratorImpl::Prev()::{lambda()https://github.com/facebook/rocksdb/issues/1}) (/data/users/jewoongh/rocksdb/librocksdb.so.9.4+0x16ff223) https://github.com/facebook/rocksdb/issues/13 0x7f73320f210d in rocksdb::MultiCfIteratorImpl::Prev() db/multi_cf_iterator_impl.h:96 https://github.com/facebook/rocksdb/issues/14 0x7f73320f3275 in rocksdb::CoalescingIterator::Prev() db/coalescing_iterator.h:35 https://github.com/facebook/rocksdb/issues/15 0x66f440 in TestIterateImpl<rocksdb::Iterator, rocksdb::StressTest::TestIterate(rocksdb::ThreadState*, const rocksdb::ReadOptions&, const std::vector<int>&, const std::vector<long int>&)::<lambda(const rocksdb::ReadOptions&)>, rocksdb::StressTest::TestIterate(rocksdb::ThreadState*, const rocksdb::ReadOptions&, const std::vector<int>&, const std::vector<long int>&)::<lambda(rocksdb::Iterator*)> > db_stress_tool/db_stress_test_base.cc:1725 https://github.com/facebook/rocksdb/issues/16 0x6440b4 in rocksdb::StressTest::TestIterate(rocksdb::ThreadState*, rocksdb::ReadOptions const&, std::vector<int, std::allocator<int> > const&, std::vector<long, std::allocator<long> > const&) db_stress_tool/db_stress_test_base.cc:1504 https://github.com/facebook/rocksdb/issues/17 0x640cb0 in rocksdb::StressTest::OperateDb(rocksdb::ThreadState*) db_stress_tool/db_stress_test_base.cc:1376 https://github.com/facebook/rocksdb/issues/18 0x6004f6 in rocksdb::ThreadBody(void*) db_stress_tool/db_stress_driver.cc:39 https://github.com/facebook/rocksdb/issues/19 0x7f73327caed4 in StartThreadWrapper env/env_posix.cc:469 https://github.com/facebook/rocksdb/issues/20 0x7f733029abc8 in start_thread /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/nptl/pthread_create.c:434 Thread T57 created by T0 here: #0 0x7f7334642136 in __interceptor_pthread_create /home/engshare/third-party2/gcc/11.x/src/gcc-11.x/libsanitizer/asan/asan_interceptors.cpp:216 https://github.com/facebook/rocksdb/issues/1 0x7f73327cb008 in StartThread env/env_posix.cc:479 https://github.com/facebook/rocksdb/issues/2 0x7f733276b406 in rocksdb::CompositeEnvWrapper::StartThread(void (*)(void*), void*) env/composite_env_wrapper.h:316 https://github.com/facebook/rocksdb/issues/3 0x7f733276b406 in rocksdb::CompositeEnvWrapper::StartThread(void (*)(void*), void*) env/composite_env_wrapper.h:316 https://github.com/facebook/rocksdb/issues/4 0x6013d9 in rocksdb::RunStressTestImpl(rocksdb::SharedState*) db_stress_tool/db_stress_driver.cc:108 https://github.com/facebook/rocksdb/issues/5 0x603083 in rocksdb::RunStressTest(rocksdb::SharedState*) db_stress_tool/db_stress_driver.cc:248 https://github.com/facebook/rocksdb/issues/6 0x4e6ab3 in rocksdb::db_stress_tool(int, char**) db_stress_tool/db_stress_tool.cc:365 https://github.com/facebook/rocksdb/issues/7 0x4e260a in main db_stress_tool/db_stress.cc:23 https://github.com/facebook/rocksdb/issues/8 0x7f733022c656 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58 https://github.com/facebook/rocksdb/issues/9 0x7f733022c717 in __libc_start_main_impl ../csu/libc-start.c:409 https://github.com/facebook/rocksdb/issues/10 0x4e2530 in _start (/data/users/jewoongh/rocksdb/db_stress+0x4e2530) ``` `heap-use-after-free` was no longer happening with the same command after making the change. Reviewed By: pdillinger Differential Revision: D58871081 Pulled By: jaykorean fbshipit-source-id: 0194c34ffec5f16a6556c6bf3941a27253a4ecb4
2024-06-21 18:56:10 +00:00
std::string target(key().data(), key().size());
MultiCFIterator Refactor - CoalescingIterator & AttributeGroupIterator (#12480) Summary: There are a couple of reasons to modify the current implementation of the MultiCfIterator, which implements the generic `Iterator` interface. - The default behavior of `value()`/`columns()` returning data from different Column Families for different keys can be prone to errors, even though there might be valid use cases where users do not care about the origin of the value/columns. - The `attribute_groups()` API, which is not yet implemented, will not be useful for a single-CF iterator. In this PR, we are implementing the following changes: - `IteratorBase` introduced, which includes all basic iterator functions except `value()` and `columns()`. - `Iterator`, which now inherits from `IteratorBase`, includes `value()` and `columns()`. - New public interface `AttributeGroupIterator` inherits from `IteratorBase` and additionally includes `attribute_groups()` (to be implemented). - Renamed former `MultiCfIterator` to `CoalescingIterator` which inherits from `Iterator` - Existing MultiCfIteratorTest has been split into two - `CoalescingIteratorTest` and `AttributeGroupIteratorTest`. - Moved AttributeGroup related code from `wide_columns.h` to a new file, `attribute_groups.h`. Some Implementation Details - `MultiCfIteratorImpl` takes two functions - `populate_func` and `reset_func` and use them to populate `value_` and `columns_` in CoalescingIterator and `attribute_groups_` in AttributeGroupIterator. In CoalescingIterator, populate_func is `Coalesce()`, in AttributeGroupIterator populate_func is `AddToAttributeGroups()`. `reset_func` clears populated value_, columns_ and attribute_groups_ accordingly. - `Coalesce()` merge sorts columns from multiple CFs when a key exists in more than on CFs. column that appears in later CF overwrites the prior ones. For example, if CF1 has `"key_1" ==> {"col_1": "foo", "col_2", "baz"}` and CF2 has `"key_1" ==> {"col_2": "quux", "col_3", "bla"}`, and when the iterator is at `key_1`, `columns()` will return `{"col_1": "foo", "col_2", "quux", "col_3", "bla"}` In this example, `value()` will be empty, because none of them have values for `kDefaultColumnName` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12480 Test Plan: ## Unit Test ``` ./multi_cf_iterator_test ``` ## Performance Test To make sure this change does not impact existing `Iterator` performance **Build** ``` $> make -j64 release ``` **Setup** ``` $> TEST_TMPDIR=/dev/shm/db_bench ./db_bench -benchmarks="filluniquerandom" -key_size=32 -value_size=512 -num=1000000 -compression_type=none ``` **Run** ``` TEST_TMPDIR=/dev/shm/db_bench ./db_bench -use_existing_db=1 -benchmarks="newiterator,seekrandom" -cache_size=10485760000 ``` **Before the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.519 micros/op 1927904 ops/sec 0.519 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.302 micros/op 188589 ops/sec 5.303 seconds 1000000 operations; (0 of 1000000 found) ``` **After the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.497 micros/op 2011012 ops/sec 0.497 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.252 micros/op 190405 ops/sec 5.252 seconds 1000000 operations; (0 of 1000000 found) ``` Reviewed By: ltamasi Differential Revision: D55353909 Pulled By: jaykorean fbshipit-source-id: 8d7786ffee09e022261ce34aa60e8633685e1946
2024-04-11 18:34:04 +00:00
InitMaxHeap();
SeekForPrev(target);
});
AdvanceIterator(max_heap, [](Iterator* iter) { iter->Prev(); });
}
private:
std::vector<std::pair<ColumnFamilyHandle*, std::unique_ptr<Iterator>>>
cfh_iter_pairs_;
Status status_;
template <typename CompareOp>
class MultiCfHeapItemComparator {
public:
explicit MultiCfHeapItemComparator(const Comparator* comparator)
: comparator_(comparator) {}
bool operator()(const MultiCfIteratorInfo& a,
const MultiCfIteratorInfo& b) const {
assert(a.iterator);
assert(b.iterator);
assert(a.iterator->Valid());
assert(b.iterator->Valid());
int c = comparator_->Compare(a.iterator->key(), b.iterator->key());
assert(c != 0 || a.order != b.order);
return c == 0 ? a.order - b.order > 0 : CompareOp()(c, 0);
}
private:
const Comparator* comparator_;
};
const Comparator* comparator_;
using MultiCfMinHeap =
BinaryHeap<MultiCfIteratorInfo,
MultiCfHeapItemComparator<std::greater<int>>>;
using MultiCfMaxHeap = BinaryHeap<MultiCfIteratorInfo,
MultiCfHeapItemComparator<std::less<int>>>;
using MultiCfIterHeap = std::variant<MultiCfMinHeap, MultiCfMaxHeap>;
MultiCfIterHeap heap_;
std::function<void()> reset_func_;
MultiCfIterator - AttributeGroupIter Impl & CoalescingIter Optimization (#12534) Summary: Continuing from the previous MultiCfIterator Implementations - (https://github.com/facebook/rocksdb/issues/12422, https://github.com/facebook/rocksdb/issues/12480 #12465), this PR completes the `AttributeGroupIterator` by implementing `AttributeGroupIteratorImpl::AddToAttributeGroups()`. While implementing the `AttributeGroupIterator`, we had to make some changes in `MultiCfIteratorImpl` and found an opportunity to improve `Coalesce()` in `CoalescingIterator`. Lifting `UNDER CONSTRUCTION - DO NOT USE` comment by replacing it with `EXPERIMENTAL` Here are some implementation details: - `IteratorAttributeGroups` is introduced to avoid having to copy all `WideColumn` objects during iteration. - `PopulateIterator()` no longer advances non-top iterators that have the same key as the top iterator in the heap. - `AdvanceIterator()` needs to advance the non-top iterators when they have the same key as the top iterator in the heap. - Instead of populating one by one, `PopulateIterator()` now collects all items with the same key and calls `populate_func(items)` at once. - This allowed optimization in `Coalesce()` such that we no longer do K-1 rounds of 2-way merge, but do one K-way merge instead. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12534 Test Plan: Uncommented the assertions in `verifyAttributeGroupIterator()` ``` ./multi_cf_iterator_test ``` Reviewed By: ltamasi Differential Revision: D56089019 Pulled By: jaykorean fbshipit-source-id: 6b0b4247e221f69b40b147d41492008cc9b15054
2024-04-16 15:45:38 +00:00
std::function<void(autovector<MultiCfIteratorInfo>)> populate_func_;
MultiCFIterator Refactor - CoalescingIterator & AttributeGroupIterator (#12480) Summary: There are a couple of reasons to modify the current implementation of the MultiCfIterator, which implements the generic `Iterator` interface. - The default behavior of `value()`/`columns()` returning data from different Column Families for different keys can be prone to errors, even though there might be valid use cases where users do not care about the origin of the value/columns. - The `attribute_groups()` API, which is not yet implemented, will not be useful for a single-CF iterator. In this PR, we are implementing the following changes: - `IteratorBase` introduced, which includes all basic iterator functions except `value()` and `columns()`. - `Iterator`, which now inherits from `IteratorBase`, includes `value()` and `columns()`. - New public interface `AttributeGroupIterator` inherits from `IteratorBase` and additionally includes `attribute_groups()` (to be implemented). - Renamed former `MultiCfIterator` to `CoalescingIterator` which inherits from `Iterator` - Existing MultiCfIteratorTest has been split into two - `CoalescingIteratorTest` and `AttributeGroupIteratorTest`. - Moved AttributeGroup related code from `wide_columns.h` to a new file, `attribute_groups.h`. Some Implementation Details - `MultiCfIteratorImpl` takes two functions - `populate_func` and `reset_func` and use them to populate `value_` and `columns_` in CoalescingIterator and `attribute_groups_` in AttributeGroupIterator. In CoalescingIterator, populate_func is `Coalesce()`, in AttributeGroupIterator populate_func is `AddToAttributeGroups()`. `reset_func` clears populated value_, columns_ and attribute_groups_ accordingly. - `Coalesce()` merge sorts columns from multiple CFs when a key exists in more than on CFs. column that appears in later CF overwrites the prior ones. For example, if CF1 has `"key_1" ==> {"col_1": "foo", "col_2", "baz"}` and CF2 has `"key_1" ==> {"col_2": "quux", "col_3", "bla"}`, and when the iterator is at `key_1`, `columns()` will return `{"col_1": "foo", "col_2", "quux", "col_3", "bla"}` In this example, `value()` will be empty, because none of them have values for `kDefaultColumnName` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12480 Test Plan: ## Unit Test ``` ./multi_cf_iterator_test ``` ## Performance Test To make sure this change does not impact existing `Iterator` performance **Build** ``` $> make -j64 release ``` **Setup** ``` $> TEST_TMPDIR=/dev/shm/db_bench ./db_bench -benchmarks="filluniquerandom" -key_size=32 -value_size=512 -num=1000000 -compression_type=none ``` **Run** ``` TEST_TMPDIR=/dev/shm/db_bench ./db_bench -use_existing_db=1 -benchmarks="newiterator,seekrandom" -cache_size=10485760000 ``` **Before the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.519 micros/op 1927904 ops/sec 0.519 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.302 micros/op 188589 ops/sec 5.303 seconds 1000000 operations; (0 of 1000000 found) ``` **After the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.497 micros/op 2011012 ops/sec 0.497 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.252 micros/op 190405 ops/sec 5.252 seconds 1000000 operations; (0 of 1000000 found) ``` Reviewed By: ltamasi Differential Revision: D55353909 Pulled By: jaykorean fbshipit-source-id: 8d7786ffee09e022261ce34aa60e8633685e1946
2024-04-11 18:34:04 +00:00
Iterator* current() const {
if (std::holds_alternative<MultiCfMaxHeap>(heap_)) {
auto& max_heap = std::get<MultiCfMaxHeap>(heap_);
return max_heap.top().iterator;
}
auto& min_heap = std::get<MultiCfMinHeap>(heap_);
return min_heap.top().iterator;
}
void considerStatus(Status s) {
if (!s.ok() && status_.ok()) {
status_ = std::move(s);
}
}
template <typename HeapType, typename InitFunc>
HeapType& GetHeap(InitFunc initFunc) {
if (!std::holds_alternative<HeapType>(heap_)) {
initFunc();
}
return std::get<HeapType>(heap_);
}
void InitMinHeap() {
heap_.emplace<MultiCfMinHeap>(
MultiCfHeapItemComparator<std::greater<int>>(comparator_));
}
void InitMaxHeap() {
heap_.emplace<MultiCfMaxHeap>(
MultiCfHeapItemComparator<std::less<int>>(comparator_));
}
MultiCfIterator - AttributeGroupIter Impl & CoalescingIter Optimization (#12534) Summary: Continuing from the previous MultiCfIterator Implementations - (https://github.com/facebook/rocksdb/issues/12422, https://github.com/facebook/rocksdb/issues/12480 #12465), this PR completes the `AttributeGroupIterator` by implementing `AttributeGroupIteratorImpl::AddToAttributeGroups()`. While implementing the `AttributeGroupIterator`, we had to make some changes in `MultiCfIteratorImpl` and found an opportunity to improve `Coalesce()` in `CoalescingIterator`. Lifting `UNDER CONSTRUCTION - DO NOT USE` comment by replacing it with `EXPERIMENTAL` Here are some implementation details: - `IteratorAttributeGroups` is introduced to avoid having to copy all `WideColumn` objects during iteration. - `PopulateIterator()` no longer advances non-top iterators that have the same key as the top iterator in the heap. - `AdvanceIterator()` needs to advance the non-top iterators when they have the same key as the top iterator in the heap. - Instead of populating one by one, `PopulateIterator()` now collects all items with the same key and calls `populate_func(items)` at once. - This allowed optimization in `Coalesce()` such that we no longer do K-1 rounds of 2-way merge, but do one K-way merge instead. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12534 Test Plan: Uncommented the assertions in `verifyAttributeGroupIterator()` ``` ./multi_cf_iterator_test ``` Reviewed By: ltamasi Differential Revision: D56089019 Pulled By: jaykorean fbshipit-source-id: 6b0b4247e221f69b40b147d41492008cc9b15054
2024-04-16 15:45:38 +00:00
template <typename BinaryHeap, typename ChildSeekFuncType>
void SeekCommon(BinaryHeap& heap, ChildSeekFuncType child_seek_func) {
MultiCFIterator Refactor - CoalescingIterator & AttributeGroupIterator (#12480) Summary: There are a couple of reasons to modify the current implementation of the MultiCfIterator, which implements the generic `Iterator` interface. - The default behavior of `value()`/`columns()` returning data from different Column Families for different keys can be prone to errors, even though there might be valid use cases where users do not care about the origin of the value/columns. - The `attribute_groups()` API, which is not yet implemented, will not be useful for a single-CF iterator. In this PR, we are implementing the following changes: - `IteratorBase` introduced, which includes all basic iterator functions except `value()` and `columns()`. - `Iterator`, which now inherits from `IteratorBase`, includes `value()` and `columns()`. - New public interface `AttributeGroupIterator` inherits from `IteratorBase` and additionally includes `attribute_groups()` (to be implemented). - Renamed former `MultiCfIterator` to `CoalescingIterator` which inherits from `Iterator` - Existing MultiCfIteratorTest has been split into two - `CoalescingIteratorTest` and `AttributeGroupIteratorTest`. - Moved AttributeGroup related code from `wide_columns.h` to a new file, `attribute_groups.h`. Some Implementation Details - `MultiCfIteratorImpl` takes two functions - `populate_func` and `reset_func` and use them to populate `value_` and `columns_` in CoalescingIterator and `attribute_groups_` in AttributeGroupIterator. In CoalescingIterator, populate_func is `Coalesce()`, in AttributeGroupIterator populate_func is `AddToAttributeGroups()`. `reset_func` clears populated value_, columns_ and attribute_groups_ accordingly. - `Coalesce()` merge sorts columns from multiple CFs when a key exists in more than on CFs. column that appears in later CF overwrites the prior ones. For example, if CF1 has `"key_1" ==> {"col_1": "foo", "col_2", "baz"}` and CF2 has `"key_1" ==> {"col_2": "quux", "col_3", "bla"}`, and when the iterator is at `key_1`, `columns()` will return `{"col_1": "foo", "col_2", "quux", "col_3", "bla"}` In this example, `value()` will be empty, because none of them have values for `kDefaultColumnName` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12480 Test Plan: ## Unit Test ``` ./multi_cf_iterator_test ``` ## Performance Test To make sure this change does not impact existing `Iterator` performance **Build** ``` $> make -j64 release ``` **Setup** ``` $> TEST_TMPDIR=/dev/shm/db_bench ./db_bench -benchmarks="filluniquerandom" -key_size=32 -value_size=512 -num=1000000 -compression_type=none ``` **Run** ``` TEST_TMPDIR=/dev/shm/db_bench ./db_bench -use_existing_db=1 -benchmarks="newiterator,seekrandom" -cache_size=10485760000 ``` **Before the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.519 micros/op 1927904 ops/sec 0.519 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.302 micros/op 188589 ops/sec 5.303 seconds 1000000 operations; (0 of 1000000 found) ``` **After the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.497 micros/op 2011012 ops/sec 0.497 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.252 micros/op 190405 ops/sec 5.252 seconds 1000000 operations; (0 of 1000000 found) ``` Reviewed By: ltamasi Differential Revision: D55353909 Pulled By: jaykorean fbshipit-source-id: 8d7786ffee09e022261ce34aa60e8633685e1946
2024-04-11 18:34:04 +00:00
reset_func_();
heap.clear();
int i = 0;
MultiCfIterator - AttributeGroupIter Impl & CoalescingIter Optimization (#12534) Summary: Continuing from the previous MultiCfIterator Implementations - (https://github.com/facebook/rocksdb/issues/12422, https://github.com/facebook/rocksdb/issues/12480 #12465), this PR completes the `AttributeGroupIterator` by implementing `AttributeGroupIteratorImpl::AddToAttributeGroups()`. While implementing the `AttributeGroupIterator`, we had to make some changes in `MultiCfIteratorImpl` and found an opportunity to improve `Coalesce()` in `CoalescingIterator`. Lifting `UNDER CONSTRUCTION - DO NOT USE` comment by replacing it with `EXPERIMENTAL` Here are some implementation details: - `IteratorAttributeGroups` is introduced to avoid having to copy all `WideColumn` objects during iteration. - `PopulateIterator()` no longer advances non-top iterators that have the same key as the top iterator in the heap. - `AdvanceIterator()` needs to advance the non-top iterators when they have the same key as the top iterator in the heap. - Instead of populating one by one, `PopulateIterator()` now collects all items with the same key and calls `populate_func(items)` at once. - This allowed optimization in `Coalesce()` such that we no longer do K-1 rounds of 2-way merge, but do one K-way merge instead. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12534 Test Plan: Uncommented the assertions in `verifyAttributeGroupIterator()` ``` ./multi_cf_iterator_test ``` Reviewed By: ltamasi Differential Revision: D56089019 Pulled By: jaykorean fbshipit-source-id: 6b0b4247e221f69b40b147d41492008cc9b15054
2024-04-16 15:45:38 +00:00
for (auto& [cfh, iter] : cfh_iter_pairs_) {
MultiCFIterator Refactor - CoalescingIterator & AttributeGroupIterator (#12480) Summary: There are a couple of reasons to modify the current implementation of the MultiCfIterator, which implements the generic `Iterator` interface. - The default behavior of `value()`/`columns()` returning data from different Column Families for different keys can be prone to errors, even though there might be valid use cases where users do not care about the origin of the value/columns. - The `attribute_groups()` API, which is not yet implemented, will not be useful for a single-CF iterator. In this PR, we are implementing the following changes: - `IteratorBase` introduced, which includes all basic iterator functions except `value()` and `columns()`. - `Iterator`, which now inherits from `IteratorBase`, includes `value()` and `columns()`. - New public interface `AttributeGroupIterator` inherits from `IteratorBase` and additionally includes `attribute_groups()` (to be implemented). - Renamed former `MultiCfIterator` to `CoalescingIterator` which inherits from `Iterator` - Existing MultiCfIteratorTest has been split into two - `CoalescingIteratorTest` and `AttributeGroupIteratorTest`. - Moved AttributeGroup related code from `wide_columns.h` to a new file, `attribute_groups.h`. Some Implementation Details - `MultiCfIteratorImpl` takes two functions - `populate_func` and `reset_func` and use them to populate `value_` and `columns_` in CoalescingIterator and `attribute_groups_` in AttributeGroupIterator. In CoalescingIterator, populate_func is `Coalesce()`, in AttributeGroupIterator populate_func is `AddToAttributeGroups()`. `reset_func` clears populated value_, columns_ and attribute_groups_ accordingly. - `Coalesce()` merge sorts columns from multiple CFs when a key exists in more than on CFs. column that appears in later CF overwrites the prior ones. For example, if CF1 has `"key_1" ==> {"col_1": "foo", "col_2", "baz"}` and CF2 has `"key_1" ==> {"col_2": "quux", "col_3", "bla"}`, and when the iterator is at `key_1`, `columns()` will return `{"col_1": "foo", "col_2", "quux", "col_3", "bla"}` In this example, `value()` will be empty, because none of them have values for `kDefaultColumnName` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12480 Test Plan: ## Unit Test ``` ./multi_cf_iterator_test ``` ## Performance Test To make sure this change does not impact existing `Iterator` performance **Build** ``` $> make -j64 release ``` **Setup** ``` $> TEST_TMPDIR=/dev/shm/db_bench ./db_bench -benchmarks="filluniquerandom" -key_size=32 -value_size=512 -num=1000000 -compression_type=none ``` **Run** ``` TEST_TMPDIR=/dev/shm/db_bench ./db_bench -use_existing_db=1 -benchmarks="newiterator,seekrandom" -cache_size=10485760000 ``` **Before the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.519 micros/op 1927904 ops/sec 0.519 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.302 micros/op 188589 ops/sec 5.303 seconds 1000000 operations; (0 of 1000000 found) ``` **After the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.497 micros/op 2011012 ops/sec 0.497 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.252 micros/op 190405 ops/sec 5.252 seconds 1000000 operations; (0 of 1000000 found) ``` Reviewed By: ltamasi Differential Revision: D55353909 Pulled By: jaykorean fbshipit-source-id: 8d7786ffee09e022261ce34aa60e8633685e1946
2024-04-11 18:34:04 +00:00
child_seek_func(iter.get());
if (iter->Valid()) {
assert(iter->status().ok());
MultiCfIterator - AttributeGroupIter Impl & CoalescingIter Optimization (#12534) Summary: Continuing from the previous MultiCfIterator Implementations - (https://github.com/facebook/rocksdb/issues/12422, https://github.com/facebook/rocksdb/issues/12480 #12465), this PR completes the `AttributeGroupIterator` by implementing `AttributeGroupIteratorImpl::AddToAttributeGroups()`. While implementing the `AttributeGroupIterator`, we had to make some changes in `MultiCfIteratorImpl` and found an opportunity to improve `Coalesce()` in `CoalescingIterator`. Lifting `UNDER CONSTRUCTION - DO NOT USE` comment by replacing it with `EXPERIMENTAL` Here are some implementation details: - `IteratorAttributeGroups` is introduced to avoid having to copy all `WideColumn` objects during iteration. - `PopulateIterator()` no longer advances non-top iterators that have the same key as the top iterator in the heap. - `AdvanceIterator()` needs to advance the non-top iterators when they have the same key as the top iterator in the heap. - Instead of populating one by one, `PopulateIterator()` now collects all items with the same key and calls `populate_func(items)` at once. - This allowed optimization in `Coalesce()` such that we no longer do K-1 rounds of 2-way merge, but do one K-way merge instead. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12534 Test Plan: Uncommented the assertions in `verifyAttributeGroupIterator()` ``` ./multi_cf_iterator_test ``` Reviewed By: ltamasi Differential Revision: D56089019 Pulled By: jaykorean fbshipit-source-id: 6b0b4247e221f69b40b147d41492008cc9b15054
2024-04-16 15:45:38 +00:00
heap.push(MultiCfIteratorInfo{cfh, iter.get(), i});
MultiCFIterator Refactor - CoalescingIterator & AttributeGroupIterator (#12480) Summary: There are a couple of reasons to modify the current implementation of the MultiCfIterator, which implements the generic `Iterator` interface. - The default behavior of `value()`/`columns()` returning data from different Column Families for different keys can be prone to errors, even though there might be valid use cases where users do not care about the origin of the value/columns. - The `attribute_groups()` API, which is not yet implemented, will not be useful for a single-CF iterator. In this PR, we are implementing the following changes: - `IteratorBase` introduced, which includes all basic iterator functions except `value()` and `columns()`. - `Iterator`, which now inherits from `IteratorBase`, includes `value()` and `columns()`. - New public interface `AttributeGroupIterator` inherits from `IteratorBase` and additionally includes `attribute_groups()` (to be implemented). - Renamed former `MultiCfIterator` to `CoalescingIterator` which inherits from `Iterator` - Existing MultiCfIteratorTest has been split into two - `CoalescingIteratorTest` and `AttributeGroupIteratorTest`. - Moved AttributeGroup related code from `wide_columns.h` to a new file, `attribute_groups.h`. Some Implementation Details - `MultiCfIteratorImpl` takes two functions - `populate_func` and `reset_func` and use them to populate `value_` and `columns_` in CoalescingIterator and `attribute_groups_` in AttributeGroupIterator. In CoalescingIterator, populate_func is `Coalesce()`, in AttributeGroupIterator populate_func is `AddToAttributeGroups()`. `reset_func` clears populated value_, columns_ and attribute_groups_ accordingly. - `Coalesce()` merge sorts columns from multiple CFs when a key exists in more than on CFs. column that appears in later CF overwrites the prior ones. For example, if CF1 has `"key_1" ==> {"col_1": "foo", "col_2", "baz"}` and CF2 has `"key_1" ==> {"col_2": "quux", "col_3", "bla"}`, and when the iterator is at `key_1`, `columns()` will return `{"col_1": "foo", "col_2", "quux", "col_3", "bla"}` In this example, `value()` will be empty, because none of them have values for `kDefaultColumnName` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12480 Test Plan: ## Unit Test ``` ./multi_cf_iterator_test ``` ## Performance Test To make sure this change does not impact existing `Iterator` performance **Build** ``` $> make -j64 release ``` **Setup** ``` $> TEST_TMPDIR=/dev/shm/db_bench ./db_bench -benchmarks="filluniquerandom" -key_size=32 -value_size=512 -num=1000000 -compression_type=none ``` **Run** ``` TEST_TMPDIR=/dev/shm/db_bench ./db_bench -use_existing_db=1 -benchmarks="newiterator,seekrandom" -cache_size=10485760000 ``` **Before the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.519 micros/op 1927904 ops/sec 0.519 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.302 micros/op 188589 ops/sec 5.303 seconds 1000000 operations; (0 of 1000000 found) ``` **After the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.497 micros/op 2011012 ops/sec 0.497 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.252 micros/op 190405 ops/sec 5.252 seconds 1000000 operations; (0 of 1000000 found) ``` Reviewed By: ltamasi Differential Revision: D55353909 Pulled By: jaykorean fbshipit-source-id: 8d7786ffee09e022261ce34aa60e8633685e1946
2024-04-11 18:34:04 +00:00
} else {
considerStatus(iter->status());
if (!status_.ok()) {
// Non-OK status from the iterator. Bail out early
heap.clear();
break;
}
}
++i;
}
if (!heap.empty()) {
MultiCfIterator - AttributeGroupIter Impl & CoalescingIter Optimization (#12534) Summary: Continuing from the previous MultiCfIterator Implementations - (https://github.com/facebook/rocksdb/issues/12422, https://github.com/facebook/rocksdb/issues/12480 #12465), this PR completes the `AttributeGroupIterator` by implementing `AttributeGroupIteratorImpl::AddToAttributeGroups()`. While implementing the `AttributeGroupIterator`, we had to make some changes in `MultiCfIteratorImpl` and found an opportunity to improve `Coalesce()` in `CoalescingIterator`. Lifting `UNDER CONSTRUCTION - DO NOT USE` comment by replacing it with `EXPERIMENTAL` Here are some implementation details: - `IteratorAttributeGroups` is introduced to avoid having to copy all `WideColumn` objects during iteration. - `PopulateIterator()` no longer advances non-top iterators that have the same key as the top iterator in the heap. - `AdvanceIterator()` needs to advance the non-top iterators when they have the same key as the top iterator in the heap. - Instead of populating one by one, `PopulateIterator()` now collects all items with the same key and calls `populate_func(items)` at once. - This allowed optimization in `Coalesce()` such that we no longer do K-1 rounds of 2-way merge, but do one K-way merge instead. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12534 Test Plan: Uncommented the assertions in `verifyAttributeGroupIterator()` ``` ./multi_cf_iterator_test ``` Reviewed By: ltamasi Differential Revision: D56089019 Pulled By: jaykorean fbshipit-source-id: 6b0b4247e221f69b40b147d41492008cc9b15054
2024-04-16 15:45:38 +00:00
PopulateIterator(heap);
MultiCFIterator Refactor - CoalescingIterator & AttributeGroupIterator (#12480) Summary: There are a couple of reasons to modify the current implementation of the MultiCfIterator, which implements the generic `Iterator` interface. - The default behavior of `value()`/`columns()` returning data from different Column Families for different keys can be prone to errors, even though there might be valid use cases where users do not care about the origin of the value/columns. - The `attribute_groups()` API, which is not yet implemented, will not be useful for a single-CF iterator. In this PR, we are implementing the following changes: - `IteratorBase` introduced, which includes all basic iterator functions except `value()` and `columns()`. - `Iterator`, which now inherits from `IteratorBase`, includes `value()` and `columns()`. - New public interface `AttributeGroupIterator` inherits from `IteratorBase` and additionally includes `attribute_groups()` (to be implemented). - Renamed former `MultiCfIterator` to `CoalescingIterator` which inherits from `Iterator` - Existing MultiCfIteratorTest has been split into two - `CoalescingIteratorTest` and `AttributeGroupIteratorTest`. - Moved AttributeGroup related code from `wide_columns.h` to a new file, `attribute_groups.h`. Some Implementation Details - `MultiCfIteratorImpl` takes two functions - `populate_func` and `reset_func` and use them to populate `value_` and `columns_` in CoalescingIterator and `attribute_groups_` in AttributeGroupIterator. In CoalescingIterator, populate_func is `Coalesce()`, in AttributeGroupIterator populate_func is `AddToAttributeGroups()`. `reset_func` clears populated value_, columns_ and attribute_groups_ accordingly. - `Coalesce()` merge sorts columns from multiple CFs when a key exists in more than on CFs. column that appears in later CF overwrites the prior ones. For example, if CF1 has `"key_1" ==> {"col_1": "foo", "col_2", "baz"}` and CF2 has `"key_1" ==> {"col_2": "quux", "col_3", "bla"}`, and when the iterator is at `key_1`, `columns()` will return `{"col_1": "foo", "col_2", "quux", "col_3", "bla"}` In this example, `value()` will be empty, because none of them have values for `kDefaultColumnName` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12480 Test Plan: ## Unit Test ``` ./multi_cf_iterator_test ``` ## Performance Test To make sure this change does not impact existing `Iterator` performance **Build** ``` $> make -j64 release ``` **Setup** ``` $> TEST_TMPDIR=/dev/shm/db_bench ./db_bench -benchmarks="filluniquerandom" -key_size=32 -value_size=512 -num=1000000 -compression_type=none ``` **Run** ``` TEST_TMPDIR=/dev/shm/db_bench ./db_bench -use_existing_db=1 -benchmarks="newiterator,seekrandom" -cache_size=10485760000 ``` **Before the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.519 micros/op 1927904 ops/sec 0.519 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.302 micros/op 188589 ops/sec 5.303 seconds 1000000 operations; (0 of 1000000 found) ``` **After the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.497 micros/op 2011012 ops/sec 0.497 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.252 micros/op 190405 ops/sec 5.252 seconds 1000000 operations; (0 of 1000000 found) ``` Reviewed By: ltamasi Differential Revision: D55353909 Pulled By: jaykorean fbshipit-source-id: 8d7786ffee09e022261ce34aa60e8633685e1946
2024-04-11 18:34:04 +00:00
}
}
template <typename BinaryHeap, typename AdvanceFuncType>
void AdvanceIterator(BinaryHeap& heap, AdvanceFuncType advance_func) {
reset_func_();
// It is possible for one or more child iters are at invalid keys due to
// manual prefix iteration. For such cases, we consider the result of the
// multi-cf-iter is also undefined.
// https://github.com/facebook/rocksdb/wiki/Prefix-Seek#manual-prefix-iterating
// for details about manual prefix iteration
if (heap.empty()) {
return;
}
MultiCFIterator Refactor - CoalescingIterator & AttributeGroupIterator (#12480) Summary: There are a couple of reasons to modify the current implementation of the MultiCfIterator, which implements the generic `Iterator` interface. - The default behavior of `value()`/`columns()` returning data from different Column Families for different keys can be prone to errors, even though there might be valid use cases where users do not care about the origin of the value/columns. - The `attribute_groups()` API, which is not yet implemented, will not be useful for a single-CF iterator. In this PR, we are implementing the following changes: - `IteratorBase` introduced, which includes all basic iterator functions except `value()` and `columns()`. - `Iterator`, which now inherits from `IteratorBase`, includes `value()` and `columns()`. - New public interface `AttributeGroupIterator` inherits from `IteratorBase` and additionally includes `attribute_groups()` (to be implemented). - Renamed former `MultiCfIterator` to `CoalescingIterator` which inherits from `Iterator` - Existing MultiCfIteratorTest has been split into two - `CoalescingIteratorTest` and `AttributeGroupIteratorTest`. - Moved AttributeGroup related code from `wide_columns.h` to a new file, `attribute_groups.h`. Some Implementation Details - `MultiCfIteratorImpl` takes two functions - `populate_func` and `reset_func` and use them to populate `value_` and `columns_` in CoalescingIterator and `attribute_groups_` in AttributeGroupIterator. In CoalescingIterator, populate_func is `Coalesce()`, in AttributeGroupIterator populate_func is `AddToAttributeGroups()`. `reset_func` clears populated value_, columns_ and attribute_groups_ accordingly. - `Coalesce()` merge sorts columns from multiple CFs when a key exists in more than on CFs. column that appears in later CF overwrites the prior ones. For example, if CF1 has `"key_1" ==> {"col_1": "foo", "col_2", "baz"}` and CF2 has `"key_1" ==> {"col_2": "quux", "col_3", "bla"}`, and when the iterator is at `key_1`, `columns()` will return `{"col_1": "foo", "col_2", "quux", "col_3", "bla"}` In this example, `value()` will be empty, because none of them have values for `kDefaultColumnName` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12480 Test Plan: ## Unit Test ``` ./multi_cf_iterator_test ``` ## Performance Test To make sure this change does not impact existing `Iterator` performance **Build** ``` $> make -j64 release ``` **Setup** ``` $> TEST_TMPDIR=/dev/shm/db_bench ./db_bench -benchmarks="filluniquerandom" -key_size=32 -value_size=512 -num=1000000 -compression_type=none ``` **Run** ``` TEST_TMPDIR=/dev/shm/db_bench ./db_bench -use_existing_db=1 -benchmarks="newiterator,seekrandom" -cache_size=10485760000 ``` **Before the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.519 micros/op 1927904 ops/sec 0.519 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.302 micros/op 188589 ops/sec 5.303 seconds 1000000 operations; (0 of 1000000 found) ``` **After the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.497 micros/op 2011012 ops/sec 0.497 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.252 micros/op 190405 ops/sec 5.252 seconds 1000000 operations; (0 of 1000000 found) ``` Reviewed By: ltamasi Differential Revision: D55353909 Pulled By: jaykorean fbshipit-source-id: 8d7786ffee09e022261ce34aa60e8633685e1946
2024-04-11 18:34:04 +00:00
MultiCfIterator - AttributeGroupIter Impl & CoalescingIter Optimization (#12534) Summary: Continuing from the previous MultiCfIterator Implementations - (https://github.com/facebook/rocksdb/issues/12422, https://github.com/facebook/rocksdb/issues/12480 #12465), this PR completes the `AttributeGroupIterator` by implementing `AttributeGroupIteratorImpl::AddToAttributeGroups()`. While implementing the `AttributeGroupIterator`, we had to make some changes in `MultiCfIteratorImpl` and found an opportunity to improve `Coalesce()` in `CoalescingIterator`. Lifting `UNDER CONSTRUCTION - DO NOT USE` comment by replacing it with `EXPERIMENTAL` Here are some implementation details: - `IteratorAttributeGroups` is introduced to avoid having to copy all `WideColumn` objects during iteration. - `PopulateIterator()` no longer advances non-top iterators that have the same key as the top iterator in the heap. - `AdvanceIterator()` needs to advance the non-top iterators when they have the same key as the top iterator in the heap. - Instead of populating one by one, `PopulateIterator()` now collects all items with the same key and calls `populate_func(items)` at once. - This allowed optimization in `Coalesce()` such that we no longer do K-1 rounds of 2-way merge, but do one K-way merge instead. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12534 Test Plan: Uncommented the assertions in `verifyAttributeGroupIterator()` ``` ./multi_cf_iterator_test ``` Reviewed By: ltamasi Differential Revision: D56089019 Pulled By: jaykorean fbshipit-source-id: 6b0b4247e221f69b40b147d41492008cc9b15054
2024-04-16 15:45:38 +00:00
// 1. Keep the top iterator (by popping it from the heap)
// 2. Make sure all others have iterated past the top iterator key slice
// 3. Advance the top iterator, and add it back to the heap if valid
MultiCFIterator Refactor - CoalescingIterator & AttributeGroupIterator (#12480) Summary: There are a couple of reasons to modify the current implementation of the MultiCfIterator, which implements the generic `Iterator` interface. - The default behavior of `value()`/`columns()` returning data from different Column Families for different keys can be prone to errors, even though there might be valid use cases where users do not care about the origin of the value/columns. - The `attribute_groups()` API, which is not yet implemented, will not be useful for a single-CF iterator. In this PR, we are implementing the following changes: - `IteratorBase` introduced, which includes all basic iterator functions except `value()` and `columns()`. - `Iterator`, which now inherits from `IteratorBase`, includes `value()` and `columns()`. - New public interface `AttributeGroupIterator` inherits from `IteratorBase` and additionally includes `attribute_groups()` (to be implemented). - Renamed former `MultiCfIterator` to `CoalescingIterator` which inherits from `Iterator` - Existing MultiCfIteratorTest has been split into two - `CoalescingIteratorTest` and `AttributeGroupIteratorTest`. - Moved AttributeGroup related code from `wide_columns.h` to a new file, `attribute_groups.h`. Some Implementation Details - `MultiCfIteratorImpl` takes two functions - `populate_func` and `reset_func` and use them to populate `value_` and `columns_` in CoalescingIterator and `attribute_groups_` in AttributeGroupIterator. In CoalescingIterator, populate_func is `Coalesce()`, in AttributeGroupIterator populate_func is `AddToAttributeGroups()`. `reset_func` clears populated value_, columns_ and attribute_groups_ accordingly. - `Coalesce()` merge sorts columns from multiple CFs when a key exists in more than on CFs. column that appears in later CF overwrites the prior ones. For example, if CF1 has `"key_1" ==> {"col_1": "foo", "col_2", "baz"}` and CF2 has `"key_1" ==> {"col_2": "quux", "col_3", "bla"}`, and when the iterator is at `key_1`, `columns()` will return `{"col_1": "foo", "col_2", "quux", "col_3", "bla"}` In this example, `value()` will be empty, because none of them have values for `kDefaultColumnName` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12480 Test Plan: ## Unit Test ``` ./multi_cf_iterator_test ``` ## Performance Test To make sure this change does not impact existing `Iterator` performance **Build** ``` $> make -j64 release ``` **Setup** ``` $> TEST_TMPDIR=/dev/shm/db_bench ./db_bench -benchmarks="filluniquerandom" -key_size=32 -value_size=512 -num=1000000 -compression_type=none ``` **Run** ``` TEST_TMPDIR=/dev/shm/db_bench ./db_bench -use_existing_db=1 -benchmarks="newiterator,seekrandom" -cache_size=10485760000 ``` **Before the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.519 micros/op 1927904 ops/sec 0.519 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.302 micros/op 188589 ops/sec 5.303 seconds 1000000 operations; (0 of 1000000 found) ``` **After the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.497 micros/op 2011012 ops/sec 0.497 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.252 micros/op 190405 ops/sec 5.252 seconds 1000000 operations; (0 of 1000000 found) ``` Reviewed By: ltamasi Differential Revision: D55353909 Pulled By: jaykorean fbshipit-source-id: 8d7786ffee09e022261ce34aa60e8633685e1946
2024-04-11 18:34:04 +00:00
auto top = heap.top();
MultiCfIterator - AttributeGroupIter Impl & CoalescingIter Optimization (#12534) Summary: Continuing from the previous MultiCfIterator Implementations - (https://github.com/facebook/rocksdb/issues/12422, https://github.com/facebook/rocksdb/issues/12480 #12465), this PR completes the `AttributeGroupIterator` by implementing `AttributeGroupIteratorImpl::AddToAttributeGroups()`. While implementing the `AttributeGroupIterator`, we had to make some changes in `MultiCfIteratorImpl` and found an opportunity to improve `Coalesce()` in `CoalescingIterator`. Lifting `UNDER CONSTRUCTION - DO NOT USE` comment by replacing it with `EXPERIMENTAL` Here are some implementation details: - `IteratorAttributeGroups` is introduced to avoid having to copy all `WideColumn` objects during iteration. - `PopulateIterator()` no longer advances non-top iterators that have the same key as the top iterator in the heap. - `AdvanceIterator()` needs to advance the non-top iterators when they have the same key as the top iterator in the heap. - Instead of populating one by one, `PopulateIterator()` now collects all items with the same key and calls `populate_func(items)` at once. - This allowed optimization in `Coalesce()` such that we no longer do K-1 rounds of 2-way merge, but do one K-way merge instead. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12534 Test Plan: Uncommented the assertions in `verifyAttributeGroupIterator()` ``` ./multi_cf_iterator_test ``` Reviewed By: ltamasi Differential Revision: D56089019 Pulled By: jaykorean fbshipit-source-id: 6b0b4247e221f69b40b147d41492008cc9b15054
2024-04-16 15:45:38 +00:00
heap.pop();
if (!heap.empty()) {
auto current = heap.top();
assert(current.iterator);
while (current.iterator->Valid() &&
comparator_->Compare(top.iterator->key(),
current.iterator->key()) == 0) {
assert(current.iterator->status().ok());
advance_func(current.iterator);
if (current.iterator->Valid()) {
heap.replace_top(heap.top());
} else {
considerStatus(current.iterator->status());
if (!status_.ok()) {
heap.clear();
return;
} else {
heap.pop();
}
}
if (!heap.empty()) {
current = heap.top();
}
}
}
MultiCFIterator Refactor - CoalescingIterator & AttributeGroupIterator (#12480) Summary: There are a couple of reasons to modify the current implementation of the MultiCfIterator, which implements the generic `Iterator` interface. - The default behavior of `value()`/`columns()` returning data from different Column Families for different keys can be prone to errors, even though there might be valid use cases where users do not care about the origin of the value/columns. - The `attribute_groups()` API, which is not yet implemented, will not be useful for a single-CF iterator. In this PR, we are implementing the following changes: - `IteratorBase` introduced, which includes all basic iterator functions except `value()` and `columns()`. - `Iterator`, which now inherits from `IteratorBase`, includes `value()` and `columns()`. - New public interface `AttributeGroupIterator` inherits from `IteratorBase` and additionally includes `attribute_groups()` (to be implemented). - Renamed former `MultiCfIterator` to `CoalescingIterator` which inherits from `Iterator` - Existing MultiCfIteratorTest has been split into two - `CoalescingIteratorTest` and `AttributeGroupIteratorTest`. - Moved AttributeGroup related code from `wide_columns.h` to a new file, `attribute_groups.h`. Some Implementation Details - `MultiCfIteratorImpl` takes two functions - `populate_func` and `reset_func` and use them to populate `value_` and `columns_` in CoalescingIterator and `attribute_groups_` in AttributeGroupIterator. In CoalescingIterator, populate_func is `Coalesce()`, in AttributeGroupIterator populate_func is `AddToAttributeGroups()`. `reset_func` clears populated value_, columns_ and attribute_groups_ accordingly. - `Coalesce()` merge sorts columns from multiple CFs when a key exists in more than on CFs. column that appears in later CF overwrites the prior ones. For example, if CF1 has `"key_1" ==> {"col_1": "foo", "col_2", "baz"}` and CF2 has `"key_1" ==> {"col_2": "quux", "col_3", "bla"}`, and when the iterator is at `key_1`, `columns()` will return `{"col_1": "foo", "col_2", "quux", "col_3", "bla"}` In this example, `value()` will be empty, because none of them have values for `kDefaultColumnName` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12480 Test Plan: ## Unit Test ``` ./multi_cf_iterator_test ``` ## Performance Test To make sure this change does not impact existing `Iterator` performance **Build** ``` $> make -j64 release ``` **Setup** ``` $> TEST_TMPDIR=/dev/shm/db_bench ./db_bench -benchmarks="filluniquerandom" -key_size=32 -value_size=512 -num=1000000 -compression_type=none ``` **Run** ``` TEST_TMPDIR=/dev/shm/db_bench ./db_bench -use_existing_db=1 -benchmarks="newiterator,seekrandom" -cache_size=10485760000 ``` **Before the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.519 micros/op 1927904 ops/sec 0.519 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.302 micros/op 188589 ops/sec 5.303 seconds 1000000 operations; (0 of 1000000 found) ``` **After the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.497 micros/op 2011012 ops/sec 0.497 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.252 micros/op 190405 ops/sec 5.252 seconds 1000000 operations; (0 of 1000000 found) ``` Reviewed By: ltamasi Differential Revision: D55353909 Pulled By: jaykorean fbshipit-source-id: 8d7786ffee09e022261ce34aa60e8633685e1946
2024-04-11 18:34:04 +00:00
advance_func(top.iterator);
if (top.iterator->Valid()) {
assert(top.iterator->status().ok());
MultiCfIterator - AttributeGroupIter Impl & CoalescingIter Optimization (#12534) Summary: Continuing from the previous MultiCfIterator Implementations - (https://github.com/facebook/rocksdb/issues/12422, https://github.com/facebook/rocksdb/issues/12480 #12465), this PR completes the `AttributeGroupIterator` by implementing `AttributeGroupIteratorImpl::AddToAttributeGroups()`. While implementing the `AttributeGroupIterator`, we had to make some changes in `MultiCfIteratorImpl` and found an opportunity to improve `Coalesce()` in `CoalescingIterator`. Lifting `UNDER CONSTRUCTION - DO NOT USE` comment by replacing it with `EXPERIMENTAL` Here are some implementation details: - `IteratorAttributeGroups` is introduced to avoid having to copy all `WideColumn` objects during iteration. - `PopulateIterator()` no longer advances non-top iterators that have the same key as the top iterator in the heap. - `AdvanceIterator()` needs to advance the non-top iterators when they have the same key as the top iterator in the heap. - Instead of populating one by one, `PopulateIterator()` now collects all items with the same key and calls `populate_func(items)` at once. - This allowed optimization in `Coalesce()` such that we no longer do K-1 rounds of 2-way merge, but do one K-way merge instead. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12534 Test Plan: Uncommented the assertions in `verifyAttributeGroupIterator()` ``` ./multi_cf_iterator_test ``` Reviewed By: ltamasi Differential Revision: D56089019 Pulled By: jaykorean fbshipit-source-id: 6b0b4247e221f69b40b147d41492008cc9b15054
2024-04-16 15:45:38 +00:00
heap.push(top);
MultiCFIterator Refactor - CoalescingIterator & AttributeGroupIterator (#12480) Summary: There are a couple of reasons to modify the current implementation of the MultiCfIterator, which implements the generic `Iterator` interface. - The default behavior of `value()`/`columns()` returning data from different Column Families for different keys can be prone to errors, even though there might be valid use cases where users do not care about the origin of the value/columns. - The `attribute_groups()` API, which is not yet implemented, will not be useful for a single-CF iterator. In this PR, we are implementing the following changes: - `IteratorBase` introduced, which includes all basic iterator functions except `value()` and `columns()`. - `Iterator`, which now inherits from `IteratorBase`, includes `value()` and `columns()`. - New public interface `AttributeGroupIterator` inherits from `IteratorBase` and additionally includes `attribute_groups()` (to be implemented). - Renamed former `MultiCfIterator` to `CoalescingIterator` which inherits from `Iterator` - Existing MultiCfIteratorTest has been split into two - `CoalescingIteratorTest` and `AttributeGroupIteratorTest`. - Moved AttributeGroup related code from `wide_columns.h` to a new file, `attribute_groups.h`. Some Implementation Details - `MultiCfIteratorImpl` takes two functions - `populate_func` and `reset_func` and use them to populate `value_` and `columns_` in CoalescingIterator and `attribute_groups_` in AttributeGroupIterator. In CoalescingIterator, populate_func is `Coalesce()`, in AttributeGroupIterator populate_func is `AddToAttributeGroups()`. `reset_func` clears populated value_, columns_ and attribute_groups_ accordingly. - `Coalesce()` merge sorts columns from multiple CFs when a key exists in more than on CFs. column that appears in later CF overwrites the prior ones. For example, if CF1 has `"key_1" ==> {"col_1": "foo", "col_2", "baz"}` and CF2 has `"key_1" ==> {"col_2": "quux", "col_3", "bla"}`, and when the iterator is at `key_1`, `columns()` will return `{"col_1": "foo", "col_2", "quux", "col_3", "bla"}` In this example, `value()` will be empty, because none of them have values for `kDefaultColumnName` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12480 Test Plan: ## Unit Test ``` ./multi_cf_iterator_test ``` ## Performance Test To make sure this change does not impact existing `Iterator` performance **Build** ``` $> make -j64 release ``` **Setup** ``` $> TEST_TMPDIR=/dev/shm/db_bench ./db_bench -benchmarks="filluniquerandom" -key_size=32 -value_size=512 -num=1000000 -compression_type=none ``` **Run** ``` TEST_TMPDIR=/dev/shm/db_bench ./db_bench -use_existing_db=1 -benchmarks="newiterator,seekrandom" -cache_size=10485760000 ``` **Before the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.519 micros/op 1927904 ops/sec 0.519 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.302 micros/op 188589 ops/sec 5.303 seconds 1000000 operations; (0 of 1000000 found) ``` **After the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.497 micros/op 2011012 ops/sec 0.497 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.252 micros/op 190405 ops/sec 5.252 seconds 1000000 operations; (0 of 1000000 found) ``` Reviewed By: ltamasi Differential Revision: D55353909 Pulled By: jaykorean fbshipit-source-id: 8d7786ffee09e022261ce34aa60e8633685e1946
2024-04-11 18:34:04 +00:00
} else {
considerStatus(top.iterator->status());
if (!status_.ok()) {
heap.clear();
return;
}
}
MultiCfIterator - AttributeGroupIter Impl & CoalescingIter Optimization (#12534) Summary: Continuing from the previous MultiCfIterator Implementations - (https://github.com/facebook/rocksdb/issues/12422, https://github.com/facebook/rocksdb/issues/12480 #12465), this PR completes the `AttributeGroupIterator` by implementing `AttributeGroupIteratorImpl::AddToAttributeGroups()`. While implementing the `AttributeGroupIterator`, we had to make some changes in `MultiCfIteratorImpl` and found an opportunity to improve `Coalesce()` in `CoalescingIterator`. Lifting `UNDER CONSTRUCTION - DO NOT USE` comment by replacing it with `EXPERIMENTAL` Here are some implementation details: - `IteratorAttributeGroups` is introduced to avoid having to copy all `WideColumn` objects during iteration. - `PopulateIterator()` no longer advances non-top iterators that have the same key as the top iterator in the heap. - `AdvanceIterator()` needs to advance the non-top iterators when they have the same key as the top iterator in the heap. - Instead of populating one by one, `PopulateIterator()` now collects all items with the same key and calls `populate_func(items)` at once. - This allowed optimization in `Coalesce()` such that we no longer do K-1 rounds of 2-way merge, but do one K-way merge instead. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12534 Test Plan: Uncommented the assertions in `verifyAttributeGroupIterator()` ``` ./multi_cf_iterator_test ``` Reviewed By: ltamasi Differential Revision: D56089019 Pulled By: jaykorean fbshipit-source-id: 6b0b4247e221f69b40b147d41492008cc9b15054
2024-04-16 15:45:38 +00:00
MultiCFIterator Refactor - CoalescingIterator & AttributeGroupIterator (#12480) Summary: There are a couple of reasons to modify the current implementation of the MultiCfIterator, which implements the generic `Iterator` interface. - The default behavior of `value()`/`columns()` returning data from different Column Families for different keys can be prone to errors, even though there might be valid use cases where users do not care about the origin of the value/columns. - The `attribute_groups()` API, which is not yet implemented, will not be useful for a single-CF iterator. In this PR, we are implementing the following changes: - `IteratorBase` introduced, which includes all basic iterator functions except `value()` and `columns()`. - `Iterator`, which now inherits from `IteratorBase`, includes `value()` and `columns()`. - New public interface `AttributeGroupIterator` inherits from `IteratorBase` and additionally includes `attribute_groups()` (to be implemented). - Renamed former `MultiCfIterator` to `CoalescingIterator` which inherits from `Iterator` - Existing MultiCfIteratorTest has been split into two - `CoalescingIteratorTest` and `AttributeGroupIteratorTest`. - Moved AttributeGroup related code from `wide_columns.h` to a new file, `attribute_groups.h`. Some Implementation Details - `MultiCfIteratorImpl` takes two functions - `populate_func` and `reset_func` and use them to populate `value_` and `columns_` in CoalescingIterator and `attribute_groups_` in AttributeGroupIterator. In CoalescingIterator, populate_func is `Coalesce()`, in AttributeGroupIterator populate_func is `AddToAttributeGroups()`. `reset_func` clears populated value_, columns_ and attribute_groups_ accordingly. - `Coalesce()` merge sorts columns from multiple CFs when a key exists in more than on CFs. column that appears in later CF overwrites the prior ones. For example, if CF1 has `"key_1" ==> {"col_1": "foo", "col_2", "baz"}` and CF2 has `"key_1" ==> {"col_2": "quux", "col_3", "bla"}`, and when the iterator is at `key_1`, `columns()` will return `{"col_1": "foo", "col_2", "quux", "col_3", "bla"}` In this example, `value()` will be empty, because none of them have values for `kDefaultColumnName` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12480 Test Plan: ## Unit Test ``` ./multi_cf_iterator_test ``` ## Performance Test To make sure this change does not impact existing `Iterator` performance **Build** ``` $> make -j64 release ``` **Setup** ``` $> TEST_TMPDIR=/dev/shm/db_bench ./db_bench -benchmarks="filluniquerandom" -key_size=32 -value_size=512 -num=1000000 -compression_type=none ``` **Run** ``` TEST_TMPDIR=/dev/shm/db_bench ./db_bench -use_existing_db=1 -benchmarks="newiterator,seekrandom" -cache_size=10485760000 ``` **Before the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.519 micros/op 1927904 ops/sec 0.519 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.302 micros/op 188589 ops/sec 5.303 seconds 1000000 operations; (0 of 1000000 found) ``` **After the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.497 micros/op 2011012 ops/sec 0.497 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.252 micros/op 190405 ops/sec 5.252 seconds 1000000 operations; (0 of 1000000 found) ``` Reviewed By: ltamasi Differential Revision: D55353909 Pulled By: jaykorean fbshipit-source-id: 8d7786ffee09e022261ce34aa60e8633685e1946
2024-04-11 18:34:04 +00:00
if (!heap.empty()) {
MultiCfIterator - AttributeGroupIter Impl & CoalescingIter Optimization (#12534) Summary: Continuing from the previous MultiCfIterator Implementations - (https://github.com/facebook/rocksdb/issues/12422, https://github.com/facebook/rocksdb/issues/12480 #12465), this PR completes the `AttributeGroupIterator` by implementing `AttributeGroupIteratorImpl::AddToAttributeGroups()`. While implementing the `AttributeGroupIterator`, we had to make some changes in `MultiCfIteratorImpl` and found an opportunity to improve `Coalesce()` in `CoalescingIterator`. Lifting `UNDER CONSTRUCTION - DO NOT USE` comment by replacing it with `EXPERIMENTAL` Here are some implementation details: - `IteratorAttributeGroups` is introduced to avoid having to copy all `WideColumn` objects during iteration. - `PopulateIterator()` no longer advances non-top iterators that have the same key as the top iterator in the heap. - `AdvanceIterator()` needs to advance the non-top iterators when they have the same key as the top iterator in the heap. - Instead of populating one by one, `PopulateIterator()` now collects all items with the same key and calls `populate_func(items)` at once. - This allowed optimization in `Coalesce()` such that we no longer do K-1 rounds of 2-way merge, but do one K-way merge instead. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12534 Test Plan: Uncommented the assertions in `verifyAttributeGroupIterator()` ``` ./multi_cf_iterator_test ``` Reviewed By: ltamasi Differential Revision: D56089019 Pulled By: jaykorean fbshipit-source-id: 6b0b4247e221f69b40b147d41492008cc9b15054
2024-04-16 15:45:38 +00:00
PopulateIterator(heap);
MultiCFIterator Refactor - CoalescingIterator & AttributeGroupIterator (#12480) Summary: There are a couple of reasons to modify the current implementation of the MultiCfIterator, which implements the generic `Iterator` interface. - The default behavior of `value()`/`columns()` returning data from different Column Families for different keys can be prone to errors, even though there might be valid use cases where users do not care about the origin of the value/columns. - The `attribute_groups()` API, which is not yet implemented, will not be useful for a single-CF iterator. In this PR, we are implementing the following changes: - `IteratorBase` introduced, which includes all basic iterator functions except `value()` and `columns()`. - `Iterator`, which now inherits from `IteratorBase`, includes `value()` and `columns()`. - New public interface `AttributeGroupIterator` inherits from `IteratorBase` and additionally includes `attribute_groups()` (to be implemented). - Renamed former `MultiCfIterator` to `CoalescingIterator` which inherits from `Iterator` - Existing MultiCfIteratorTest has been split into two - `CoalescingIteratorTest` and `AttributeGroupIteratorTest`. - Moved AttributeGroup related code from `wide_columns.h` to a new file, `attribute_groups.h`. Some Implementation Details - `MultiCfIteratorImpl` takes two functions - `populate_func` and `reset_func` and use them to populate `value_` and `columns_` in CoalescingIterator and `attribute_groups_` in AttributeGroupIterator. In CoalescingIterator, populate_func is `Coalesce()`, in AttributeGroupIterator populate_func is `AddToAttributeGroups()`. `reset_func` clears populated value_, columns_ and attribute_groups_ accordingly. - `Coalesce()` merge sorts columns from multiple CFs when a key exists in more than on CFs. column that appears in later CF overwrites the prior ones. For example, if CF1 has `"key_1" ==> {"col_1": "foo", "col_2", "baz"}` and CF2 has `"key_1" ==> {"col_2": "quux", "col_3", "bla"}`, and when the iterator is at `key_1`, `columns()` will return `{"col_1": "foo", "col_2", "quux", "col_3", "bla"}` In this example, `value()` will be empty, because none of them have values for `kDefaultColumnName` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12480 Test Plan: ## Unit Test ``` ./multi_cf_iterator_test ``` ## Performance Test To make sure this change does not impact existing `Iterator` performance **Build** ``` $> make -j64 release ``` **Setup** ``` $> TEST_TMPDIR=/dev/shm/db_bench ./db_bench -benchmarks="filluniquerandom" -key_size=32 -value_size=512 -num=1000000 -compression_type=none ``` **Run** ``` TEST_TMPDIR=/dev/shm/db_bench ./db_bench -use_existing_db=1 -benchmarks="newiterator,seekrandom" -cache_size=10485760000 ``` **Before the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.519 micros/op 1927904 ops/sec 0.519 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.302 micros/op 188589 ops/sec 5.303 seconds 1000000 operations; (0 of 1000000 found) ``` **After the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.497 micros/op 2011012 ops/sec 0.497 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.252 micros/op 190405 ops/sec 5.252 seconds 1000000 operations; (0 of 1000000 found) ``` Reviewed By: ltamasi Differential Revision: D55353909 Pulled By: jaykorean fbshipit-source-id: 8d7786ffee09e022261ce34aa60e8633685e1946
2024-04-11 18:34:04 +00:00
}
}
MultiCfIterator - AttributeGroupIter Impl & CoalescingIter Optimization (#12534) Summary: Continuing from the previous MultiCfIterator Implementations - (https://github.com/facebook/rocksdb/issues/12422, https://github.com/facebook/rocksdb/issues/12480 #12465), this PR completes the `AttributeGroupIterator` by implementing `AttributeGroupIteratorImpl::AddToAttributeGroups()`. While implementing the `AttributeGroupIterator`, we had to make some changes in `MultiCfIteratorImpl` and found an opportunity to improve `Coalesce()` in `CoalescingIterator`. Lifting `UNDER CONSTRUCTION - DO NOT USE` comment by replacing it with `EXPERIMENTAL` Here are some implementation details: - `IteratorAttributeGroups` is introduced to avoid having to copy all `WideColumn` objects during iteration. - `PopulateIterator()` no longer advances non-top iterators that have the same key as the top iterator in the heap. - `AdvanceIterator()` needs to advance the non-top iterators when they have the same key as the top iterator in the heap. - Instead of populating one by one, `PopulateIterator()` now collects all items with the same key and calls `populate_func(items)` at once. - This allowed optimization in `Coalesce()` such that we no longer do K-1 rounds of 2-way merge, but do one K-way merge instead. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12534 Test Plan: Uncommented the assertions in `verifyAttributeGroupIterator()` ``` ./multi_cf_iterator_test ``` Reviewed By: ltamasi Differential Revision: D56089019 Pulled By: jaykorean fbshipit-source-id: 6b0b4247e221f69b40b147d41492008cc9b15054
2024-04-16 15:45:38 +00:00
template <typename BinaryHeap>
void PopulateIterator(BinaryHeap& heap) {
// 1. Keep the top iterator (by popping it from the heap) and add it to list
// to populate
// 2. For all non-top iterators having the same key as top iter popped
// from the previous step, add them to the same list and pop it
// temporarily from the heap
// 3. Once no other iters have the same key as the top iter from step 1,
// populate the value/columns and attribute_groups from the list
// collected in step 1 and 2 and add all the iters back to the heap
MultiCFIterator Refactor - CoalescingIterator & AttributeGroupIterator (#12480) Summary: There are a couple of reasons to modify the current implementation of the MultiCfIterator, which implements the generic `Iterator` interface. - The default behavior of `value()`/`columns()` returning data from different Column Families for different keys can be prone to errors, even though there might be valid use cases where users do not care about the origin of the value/columns. - The `attribute_groups()` API, which is not yet implemented, will not be useful for a single-CF iterator. In this PR, we are implementing the following changes: - `IteratorBase` introduced, which includes all basic iterator functions except `value()` and `columns()`. - `Iterator`, which now inherits from `IteratorBase`, includes `value()` and `columns()`. - New public interface `AttributeGroupIterator` inherits from `IteratorBase` and additionally includes `attribute_groups()` (to be implemented). - Renamed former `MultiCfIterator` to `CoalescingIterator` which inherits from `Iterator` - Existing MultiCfIteratorTest has been split into two - `CoalescingIteratorTest` and `AttributeGroupIteratorTest`. - Moved AttributeGroup related code from `wide_columns.h` to a new file, `attribute_groups.h`. Some Implementation Details - `MultiCfIteratorImpl` takes two functions - `populate_func` and `reset_func` and use them to populate `value_` and `columns_` in CoalescingIterator and `attribute_groups_` in AttributeGroupIterator. In CoalescingIterator, populate_func is `Coalesce()`, in AttributeGroupIterator populate_func is `AddToAttributeGroups()`. `reset_func` clears populated value_, columns_ and attribute_groups_ accordingly. - `Coalesce()` merge sorts columns from multiple CFs when a key exists in more than on CFs. column that appears in later CF overwrites the prior ones. For example, if CF1 has `"key_1" ==> {"col_1": "foo", "col_2", "baz"}` and CF2 has `"key_1" ==> {"col_2": "quux", "col_3", "bla"}`, and when the iterator is at `key_1`, `columns()` will return `{"col_1": "foo", "col_2", "quux", "col_3", "bla"}` In this example, `value()` will be empty, because none of them have values for `kDefaultColumnName` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12480 Test Plan: ## Unit Test ``` ./multi_cf_iterator_test ``` ## Performance Test To make sure this change does not impact existing `Iterator` performance **Build** ``` $> make -j64 release ``` **Setup** ``` $> TEST_TMPDIR=/dev/shm/db_bench ./db_bench -benchmarks="filluniquerandom" -key_size=32 -value_size=512 -num=1000000 -compression_type=none ``` **Run** ``` TEST_TMPDIR=/dev/shm/db_bench ./db_bench -use_existing_db=1 -benchmarks="newiterator,seekrandom" -cache_size=10485760000 ``` **Before the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.519 micros/op 1927904 ops/sec 0.519 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.302 micros/op 188589 ops/sec 5.303 seconds 1000000 operations; (0 of 1000000 found) ``` **After the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.497 micros/op 2011012 ops/sec 0.497 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.252 micros/op 190405 ops/sec 5.252 seconds 1000000 operations; (0 of 1000000 found) ``` Reviewed By: ltamasi Differential Revision: D55353909 Pulled By: jaykorean fbshipit-source-id: 8d7786ffee09e022261ce34aa60e8633685e1946
2024-04-11 18:34:04 +00:00
assert(!heap.empty());
auto top = heap.top();
heap.pop();
MultiCfIterator - AttributeGroupIter Impl & CoalescingIter Optimization (#12534) Summary: Continuing from the previous MultiCfIterator Implementations - (https://github.com/facebook/rocksdb/issues/12422, https://github.com/facebook/rocksdb/issues/12480 #12465), this PR completes the `AttributeGroupIterator` by implementing `AttributeGroupIteratorImpl::AddToAttributeGroups()`. While implementing the `AttributeGroupIterator`, we had to make some changes in `MultiCfIteratorImpl` and found an opportunity to improve `Coalesce()` in `CoalescingIterator`. Lifting `UNDER CONSTRUCTION - DO NOT USE` comment by replacing it with `EXPERIMENTAL` Here are some implementation details: - `IteratorAttributeGroups` is introduced to avoid having to copy all `WideColumn` objects during iteration. - `PopulateIterator()` no longer advances non-top iterators that have the same key as the top iterator in the heap. - `AdvanceIterator()` needs to advance the non-top iterators when they have the same key as the top iterator in the heap. - Instead of populating one by one, `PopulateIterator()` now collects all items with the same key and calls `populate_func(items)` at once. - This allowed optimization in `Coalesce()` such that we no longer do K-1 rounds of 2-way merge, but do one K-way merge instead. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12534 Test Plan: Uncommented the assertions in `verifyAttributeGroupIterator()` ``` ./multi_cf_iterator_test ``` Reviewed By: ltamasi Differential Revision: D56089019 Pulled By: jaykorean fbshipit-source-id: 6b0b4247e221f69b40b147d41492008cc9b15054
2024-04-16 15:45:38 +00:00
autovector<MultiCfIteratorInfo> to_populate;
to_populate.push_back(top);
MultiCFIterator Refactor - CoalescingIterator & AttributeGroupIterator (#12480) Summary: There are a couple of reasons to modify the current implementation of the MultiCfIterator, which implements the generic `Iterator` interface. - The default behavior of `value()`/`columns()` returning data from different Column Families for different keys can be prone to errors, even though there might be valid use cases where users do not care about the origin of the value/columns. - The `attribute_groups()` API, which is not yet implemented, will not be useful for a single-CF iterator. In this PR, we are implementing the following changes: - `IteratorBase` introduced, which includes all basic iterator functions except `value()` and `columns()`. - `Iterator`, which now inherits from `IteratorBase`, includes `value()` and `columns()`. - New public interface `AttributeGroupIterator` inherits from `IteratorBase` and additionally includes `attribute_groups()` (to be implemented). - Renamed former `MultiCfIterator` to `CoalescingIterator` which inherits from `Iterator` - Existing MultiCfIteratorTest has been split into two - `CoalescingIteratorTest` and `AttributeGroupIteratorTest`. - Moved AttributeGroup related code from `wide_columns.h` to a new file, `attribute_groups.h`. Some Implementation Details - `MultiCfIteratorImpl` takes two functions - `populate_func` and `reset_func` and use them to populate `value_` and `columns_` in CoalescingIterator and `attribute_groups_` in AttributeGroupIterator. In CoalescingIterator, populate_func is `Coalesce()`, in AttributeGroupIterator populate_func is `AddToAttributeGroups()`. `reset_func` clears populated value_, columns_ and attribute_groups_ accordingly. - `Coalesce()` merge sorts columns from multiple CFs when a key exists in more than on CFs. column that appears in later CF overwrites the prior ones. For example, if CF1 has `"key_1" ==> {"col_1": "foo", "col_2", "baz"}` and CF2 has `"key_1" ==> {"col_2": "quux", "col_3", "bla"}`, and when the iterator is at `key_1`, `columns()` will return `{"col_1": "foo", "col_2", "quux", "col_3", "bla"}` In this example, `value()` will be empty, because none of them have values for `kDefaultColumnName` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12480 Test Plan: ## Unit Test ``` ./multi_cf_iterator_test ``` ## Performance Test To make sure this change does not impact existing `Iterator` performance **Build** ``` $> make -j64 release ``` **Setup** ``` $> TEST_TMPDIR=/dev/shm/db_bench ./db_bench -benchmarks="filluniquerandom" -key_size=32 -value_size=512 -num=1000000 -compression_type=none ``` **Run** ``` TEST_TMPDIR=/dev/shm/db_bench ./db_bench -use_existing_db=1 -benchmarks="newiterator,seekrandom" -cache_size=10485760000 ``` **Before the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.519 micros/op 1927904 ops/sec 0.519 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.302 micros/op 188589 ops/sec 5.303 seconds 1000000 operations; (0 of 1000000 found) ``` **After the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.497 micros/op 2011012 ops/sec 0.497 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.252 micros/op 190405 ops/sec 5.252 seconds 1000000 operations; (0 of 1000000 found) ``` Reviewed By: ltamasi Differential Revision: D55353909 Pulled By: jaykorean fbshipit-source-id: 8d7786ffee09e022261ce34aa60e8633685e1946
2024-04-11 18:34:04 +00:00
if (!heap.empty()) {
MultiCfIterator - AttributeGroupIter Impl & CoalescingIter Optimization (#12534) Summary: Continuing from the previous MultiCfIterator Implementations - (https://github.com/facebook/rocksdb/issues/12422, https://github.com/facebook/rocksdb/issues/12480 #12465), this PR completes the `AttributeGroupIterator` by implementing `AttributeGroupIteratorImpl::AddToAttributeGroups()`. While implementing the `AttributeGroupIterator`, we had to make some changes in `MultiCfIteratorImpl` and found an opportunity to improve `Coalesce()` in `CoalescingIterator`. Lifting `UNDER CONSTRUCTION - DO NOT USE` comment by replacing it with `EXPERIMENTAL` Here are some implementation details: - `IteratorAttributeGroups` is introduced to avoid having to copy all `WideColumn` objects during iteration. - `PopulateIterator()` no longer advances non-top iterators that have the same key as the top iterator in the heap. - `AdvanceIterator()` needs to advance the non-top iterators when they have the same key as the top iterator in the heap. - Instead of populating one by one, `PopulateIterator()` now collects all items with the same key and calls `populate_func(items)` at once. - This allowed optimization in `Coalesce()` such that we no longer do K-1 rounds of 2-way merge, but do one K-way merge instead. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12534 Test Plan: Uncommented the assertions in `verifyAttributeGroupIterator()` ``` ./multi_cf_iterator_test ``` Reviewed By: ltamasi Differential Revision: D56089019 Pulled By: jaykorean fbshipit-source-id: 6b0b4247e221f69b40b147d41492008cc9b15054
2024-04-16 15:45:38 +00:00
auto current = heap.top();
assert(current.iterator);
while (current.iterator->Valid() &&
comparator_->Compare(top.iterator->key(),
current.iterator->key()) == 0) {
assert(current.iterator->status().ok());
to_populate.push_back(current);
heap.pop();
MultiCFIterator Refactor - CoalescingIterator & AttributeGroupIterator (#12480) Summary: There are a couple of reasons to modify the current implementation of the MultiCfIterator, which implements the generic `Iterator` interface. - The default behavior of `value()`/`columns()` returning data from different Column Families for different keys can be prone to errors, even though there might be valid use cases where users do not care about the origin of the value/columns. - The `attribute_groups()` API, which is not yet implemented, will not be useful for a single-CF iterator. In this PR, we are implementing the following changes: - `IteratorBase` introduced, which includes all basic iterator functions except `value()` and `columns()`. - `Iterator`, which now inherits from `IteratorBase`, includes `value()` and `columns()`. - New public interface `AttributeGroupIterator` inherits from `IteratorBase` and additionally includes `attribute_groups()` (to be implemented). - Renamed former `MultiCfIterator` to `CoalescingIterator` which inherits from `Iterator` - Existing MultiCfIteratorTest has been split into two - `CoalescingIteratorTest` and `AttributeGroupIteratorTest`. - Moved AttributeGroup related code from `wide_columns.h` to a new file, `attribute_groups.h`. Some Implementation Details - `MultiCfIteratorImpl` takes two functions - `populate_func` and `reset_func` and use them to populate `value_` and `columns_` in CoalescingIterator and `attribute_groups_` in AttributeGroupIterator. In CoalescingIterator, populate_func is `Coalesce()`, in AttributeGroupIterator populate_func is `AddToAttributeGroups()`. `reset_func` clears populated value_, columns_ and attribute_groups_ accordingly. - `Coalesce()` merge sorts columns from multiple CFs when a key exists in more than on CFs. column that appears in later CF overwrites the prior ones. For example, if CF1 has `"key_1" ==> {"col_1": "foo", "col_2", "baz"}` and CF2 has `"key_1" ==> {"col_2": "quux", "col_3", "bla"}`, and when the iterator is at `key_1`, `columns()` will return `{"col_1": "foo", "col_2", "quux", "col_3", "bla"}` In this example, `value()` will be empty, because none of them have values for `kDefaultColumnName` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12480 Test Plan: ## Unit Test ``` ./multi_cf_iterator_test ``` ## Performance Test To make sure this change does not impact existing `Iterator` performance **Build** ``` $> make -j64 release ``` **Setup** ``` $> TEST_TMPDIR=/dev/shm/db_bench ./db_bench -benchmarks="filluniquerandom" -key_size=32 -value_size=512 -num=1000000 -compression_type=none ``` **Run** ``` TEST_TMPDIR=/dev/shm/db_bench ./db_bench -use_existing_db=1 -benchmarks="newiterator,seekrandom" -cache_size=10485760000 ``` **Before the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.519 micros/op 1927904 ops/sec 0.519 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.302 micros/op 188589 ops/sec 5.303 seconds 1000000 operations; (0 of 1000000 found) ``` **After the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.497 micros/op 2011012 ops/sec 0.497 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.252 micros/op 190405 ops/sec 5.252 seconds 1000000 operations; (0 of 1000000 found) ``` Reviewed By: ltamasi Differential Revision: D55353909 Pulled By: jaykorean fbshipit-source-id: 8d7786ffee09e022261ce34aa60e8633685e1946
2024-04-11 18:34:04 +00:00
if (!heap.empty()) {
MultiCfIterator - AttributeGroupIter Impl & CoalescingIter Optimization (#12534) Summary: Continuing from the previous MultiCfIterator Implementations - (https://github.com/facebook/rocksdb/issues/12422, https://github.com/facebook/rocksdb/issues/12480 #12465), this PR completes the `AttributeGroupIterator` by implementing `AttributeGroupIteratorImpl::AddToAttributeGroups()`. While implementing the `AttributeGroupIterator`, we had to make some changes in `MultiCfIteratorImpl` and found an opportunity to improve `Coalesce()` in `CoalescingIterator`. Lifting `UNDER CONSTRUCTION - DO NOT USE` comment by replacing it with `EXPERIMENTAL` Here are some implementation details: - `IteratorAttributeGroups` is introduced to avoid having to copy all `WideColumn` objects during iteration. - `PopulateIterator()` no longer advances non-top iterators that have the same key as the top iterator in the heap. - `AdvanceIterator()` needs to advance the non-top iterators when they have the same key as the top iterator in the heap. - Instead of populating one by one, `PopulateIterator()` now collects all items with the same key and calls `populate_func(items)` at once. - This allowed optimization in `Coalesce()` such that we no longer do K-1 rounds of 2-way merge, but do one K-way merge instead. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12534 Test Plan: Uncommented the assertions in `verifyAttributeGroupIterator()` ``` ./multi_cf_iterator_test ``` Reviewed By: ltamasi Differential Revision: D56089019 Pulled By: jaykorean fbshipit-source-id: 6b0b4247e221f69b40b147d41492008cc9b15054
2024-04-16 15:45:38 +00:00
current = heap.top();
} else {
break;
MultiCFIterator Refactor - CoalescingIterator & AttributeGroupIterator (#12480) Summary: There are a couple of reasons to modify the current implementation of the MultiCfIterator, which implements the generic `Iterator` interface. - The default behavior of `value()`/`columns()` returning data from different Column Families for different keys can be prone to errors, even though there might be valid use cases where users do not care about the origin of the value/columns. - The `attribute_groups()` API, which is not yet implemented, will not be useful for a single-CF iterator. In this PR, we are implementing the following changes: - `IteratorBase` introduced, which includes all basic iterator functions except `value()` and `columns()`. - `Iterator`, which now inherits from `IteratorBase`, includes `value()` and `columns()`. - New public interface `AttributeGroupIterator` inherits from `IteratorBase` and additionally includes `attribute_groups()` (to be implemented). - Renamed former `MultiCfIterator` to `CoalescingIterator` which inherits from `Iterator` - Existing MultiCfIteratorTest has been split into two - `CoalescingIteratorTest` and `AttributeGroupIteratorTest`. - Moved AttributeGroup related code from `wide_columns.h` to a new file, `attribute_groups.h`. Some Implementation Details - `MultiCfIteratorImpl` takes two functions - `populate_func` and `reset_func` and use them to populate `value_` and `columns_` in CoalescingIterator and `attribute_groups_` in AttributeGroupIterator. In CoalescingIterator, populate_func is `Coalesce()`, in AttributeGroupIterator populate_func is `AddToAttributeGroups()`. `reset_func` clears populated value_, columns_ and attribute_groups_ accordingly. - `Coalesce()` merge sorts columns from multiple CFs when a key exists in more than on CFs. column that appears in later CF overwrites the prior ones. For example, if CF1 has `"key_1" ==> {"col_1": "foo", "col_2", "baz"}` and CF2 has `"key_1" ==> {"col_2": "quux", "col_3", "bla"}`, and when the iterator is at `key_1`, `columns()` will return `{"col_1": "foo", "col_2", "quux", "col_3", "bla"}` In this example, `value()` will be empty, because none of them have values for `kDefaultColumnName` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12480 Test Plan: ## Unit Test ``` ./multi_cf_iterator_test ``` ## Performance Test To make sure this change does not impact existing `Iterator` performance **Build** ``` $> make -j64 release ``` **Setup** ``` $> TEST_TMPDIR=/dev/shm/db_bench ./db_bench -benchmarks="filluniquerandom" -key_size=32 -value_size=512 -num=1000000 -compression_type=none ``` **Run** ``` TEST_TMPDIR=/dev/shm/db_bench ./db_bench -use_existing_db=1 -benchmarks="newiterator,seekrandom" -cache_size=10485760000 ``` **Before the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.519 micros/op 1927904 ops/sec 0.519 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.302 micros/op 188589 ops/sec 5.303 seconds 1000000 operations; (0 of 1000000 found) ``` **After the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.497 micros/op 2011012 ops/sec 0.497 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.252 micros/op 190405 ops/sec 5.252 seconds 1000000 operations; (0 of 1000000 found) ``` Reviewed By: ltamasi Differential Revision: D55353909 Pulled By: jaykorean fbshipit-source-id: 8d7786ffee09e022261ce34aa60e8633685e1946
2024-04-11 18:34:04 +00:00
}
}
}
MultiCfIterator - AttributeGroupIter Impl & CoalescingIter Optimization (#12534) Summary: Continuing from the previous MultiCfIterator Implementations - (https://github.com/facebook/rocksdb/issues/12422, https://github.com/facebook/rocksdb/issues/12480 #12465), this PR completes the `AttributeGroupIterator` by implementing `AttributeGroupIteratorImpl::AddToAttributeGroups()`. While implementing the `AttributeGroupIterator`, we had to make some changes in `MultiCfIteratorImpl` and found an opportunity to improve `Coalesce()` in `CoalescingIterator`. Lifting `UNDER CONSTRUCTION - DO NOT USE` comment by replacing it with `EXPERIMENTAL` Here are some implementation details: - `IteratorAttributeGroups` is introduced to avoid having to copy all `WideColumn` objects during iteration. - `PopulateIterator()` no longer advances non-top iterators that have the same key as the top iterator in the heap. - `AdvanceIterator()` needs to advance the non-top iterators when they have the same key as the top iterator in the heap. - Instead of populating one by one, `PopulateIterator()` now collects all items with the same key and calls `populate_func(items)` at once. - This allowed optimization in `Coalesce()` such that we no longer do K-1 rounds of 2-way merge, but do one K-way merge instead. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12534 Test Plan: Uncommented the assertions in `verifyAttributeGroupIterator()` ``` ./multi_cf_iterator_test ``` Reviewed By: ltamasi Differential Revision: D56089019 Pulled By: jaykorean fbshipit-source-id: 6b0b4247e221f69b40b147d41492008cc9b15054
2024-04-16 15:45:38 +00:00
// Add the items back to the heap
for (auto& item : to_populate) {
heap.push(item);
}
populate_func_(to_populate);
MultiCFIterator Refactor - CoalescingIterator & AttributeGroupIterator (#12480) Summary: There are a couple of reasons to modify the current implementation of the MultiCfIterator, which implements the generic `Iterator` interface. - The default behavior of `value()`/`columns()` returning data from different Column Families for different keys can be prone to errors, even though there might be valid use cases where users do not care about the origin of the value/columns. - The `attribute_groups()` API, which is not yet implemented, will not be useful for a single-CF iterator. In this PR, we are implementing the following changes: - `IteratorBase` introduced, which includes all basic iterator functions except `value()` and `columns()`. - `Iterator`, which now inherits from `IteratorBase`, includes `value()` and `columns()`. - New public interface `AttributeGroupIterator` inherits from `IteratorBase` and additionally includes `attribute_groups()` (to be implemented). - Renamed former `MultiCfIterator` to `CoalescingIterator` which inherits from `Iterator` - Existing MultiCfIteratorTest has been split into two - `CoalescingIteratorTest` and `AttributeGroupIteratorTest`. - Moved AttributeGroup related code from `wide_columns.h` to a new file, `attribute_groups.h`. Some Implementation Details - `MultiCfIteratorImpl` takes two functions - `populate_func` and `reset_func` and use them to populate `value_` and `columns_` in CoalescingIterator and `attribute_groups_` in AttributeGroupIterator. In CoalescingIterator, populate_func is `Coalesce()`, in AttributeGroupIterator populate_func is `AddToAttributeGroups()`. `reset_func` clears populated value_, columns_ and attribute_groups_ accordingly. - `Coalesce()` merge sorts columns from multiple CFs when a key exists in more than on CFs. column that appears in later CF overwrites the prior ones. For example, if CF1 has `"key_1" ==> {"col_1": "foo", "col_2", "baz"}` and CF2 has `"key_1" ==> {"col_2": "quux", "col_3", "bla"}`, and when the iterator is at `key_1`, `columns()` will return `{"col_1": "foo", "col_2", "quux", "col_3", "bla"}` In this example, `value()` will be empty, because none of them have values for `kDefaultColumnName` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12480 Test Plan: ## Unit Test ``` ./multi_cf_iterator_test ``` ## Performance Test To make sure this change does not impact existing `Iterator` performance **Build** ``` $> make -j64 release ``` **Setup** ``` $> TEST_TMPDIR=/dev/shm/db_bench ./db_bench -benchmarks="filluniquerandom" -key_size=32 -value_size=512 -num=1000000 -compression_type=none ``` **Run** ``` TEST_TMPDIR=/dev/shm/db_bench ./db_bench -use_existing_db=1 -benchmarks="newiterator,seekrandom" -cache_size=10485760000 ``` **Before the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.519 micros/op 1927904 ops/sec 0.519 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.302 micros/op 188589 ops/sec 5.303 seconds 1000000 operations; (0 of 1000000 found) ``` **After the change** ``` DB path: [/dev/shm/db_bench/dbbench] newiterator : 0.497 micros/op 2011012 ops/sec 0.497 seconds 1000000 operations; DB path: [/dev/shm/db_bench/dbbench] seekrandom : 5.252 micros/op 190405 ops/sec 5.252 seconds 1000000 operations; (0 of 1000000 found) ``` Reviewed By: ltamasi Differential Revision: D55353909 Pulled By: jaykorean fbshipit-source-id: 8d7786ffee09e022261ce34aa60e8633685e1946
2024-04-11 18:34:04 +00:00
}
};
} // namespace ROCKSDB_NAMESPACE