MultiCFIterator Refactor - CoalescingIterator & AttributeGroupIterator (#12480)
Summary:
There are a couple of reasons to modify the current implementation of the MultiCfIterator, which implements the generic `Iterator` interface.
- The default behavior of `value()`/`columns()` returning data from different Column Families for different keys can be prone to errors, even though there might be valid use cases where users do not care about the origin of the value/columns.
- The `attribute_groups()` API, which is not yet implemented, will not be useful for a single-CF iterator.
In this PR, we are implementing the following changes:
- `IteratorBase` introduced, which includes all basic iterator functions except `value()` and `columns()`.
- `Iterator`, which now inherits from `IteratorBase`, includes `value()` and `columns()`.
- New public interface `AttributeGroupIterator` inherits from `IteratorBase` and additionally includes `attribute_groups()` (to be implemented).
- Renamed the former `MultiCfIterator` to `CoalescingIterator`, which inherits from `Iterator`.
- Existing MultiCfIteratorTest has been split into two - `CoalescingIteratorTest` and `AttributeGroupIteratorTest`.
- Moved AttributeGroup related code from `wide_columns.h` to a new file, `attribute_groups.h`.
Some Implementation Details
- `MultiCfIteratorImpl` takes two functions, `populate_func` and `reset_func`, and uses them to populate `value_` and `columns_` in `CoalescingIterator` and `attribute_groups_` in `AttributeGroupIterator`. In `CoalescingIterator`, `populate_func` is `Coalesce()`; in `AttributeGroupIterator`, `populate_func` is `AddToAttributeGroups()`. `reset_func` clears the populated `value_`, `columns_` and `attribute_groups_` accordingly.
- `Coalesce()` merge sorts columns from multiple CFs when a key exists in more than one CF. A column that appears in a later CF overwrites the same column from earlier CFs.
For example, if CF1 has `"key_1" ==> {"col_1": "foo", "col_2": "baz"}` and CF2 has `"key_1" ==> {"col_2": "quux", "col_3": "bla"}`, then when the iterator is at `key_1`, `columns()` will return `{"col_1": "foo", "col_2": "quux", "col_3": "bla"}`.
In this example, `value()` will be empty, because neither CF has a value for `kDefaultColumnName`.
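The overwrite rule described above can be illustrated with a minimal, self-contained sketch. It is not the PR's `Coalesce()` implementation: the `Columns` alias and the names `CoalesceSketch` and `DemoCoalesce` are hypothetical, and a `std::map` stands in for the merge sort so that the "later CF wins" behavior is easy to see.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one key's wide columns in a single CF:
// (column name, column value) pairs.
using Columns = std::vector<std::pair<std::string, std::string>>;

// Combines the columns found for one key across several CFs. The outer
// vector is in CF order; when a column name appears in more than one CF,
// the value from the later CF replaces the earlier one. The result is
// sorted by column name because std::map keeps its keys ordered.
Columns CoalesceSketch(const std::vector<Columns>& per_cf_columns) {
  std::map<std::string, std::string> merged;
  for (const Columns& columns : per_cf_columns) {
    for (const auto& column : columns) {
      merged[column.first] = column.second;  // later CF overwrites
    }
  }
  return Columns(merged.begin(), merged.end());
}

// Replays the example from the summary above.
void DemoCoalesce() {
  const Columns cf1 = {{"col_1", "foo"}, {"col_2", "baz"}};
  const Columns cf2 = {{"col_2", "quux"}, {"col_3", "bla"}};
  const Columns result = CoalesceSketch({cf1, cf2});
  assert(result.size() == 3);
  assert(result[1].first == "col_2" && result[1].second == "quux");
}
```

The sketch trades the linear merge of already-sorted inputs for a map insert per column, which is simpler to read but does more work per key than a merge would.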
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12480
Test Plan:
## Unit Test
```
./multi_cf_iterator_test
```
## Performance Test
To make sure this change does not impact existing `Iterator` performance
**Build**
```
$> make -j64 release
```
**Setup**
```
$> TEST_TMPDIR=/dev/shm/db_bench ./db_bench -benchmarks="filluniquerandom" -key_size=32 -value_size=512 -num=1000000 -compression_type=none
```
**Run**
```
TEST_TMPDIR=/dev/shm/db_bench ./db_bench -use_existing_db=1 -benchmarks="newiterator,seekrandom" -cache_size=10485760000
```
**Before the change**
```
DB path: [/dev/shm/db_bench/dbbench]
newiterator : 0.519 micros/op 1927904 ops/sec 0.519 seconds 1000000 operations;
DB path: [/dev/shm/db_bench/dbbench]
seekrandom : 5.302 micros/op 188589 ops/sec 5.303 seconds 1000000 operations; (0 of 1000000 found)
```
**After the change**
```
DB path: [/dev/shm/db_bench/dbbench]
newiterator : 0.497 micros/op 2011012 ops/sec 0.497 seconds 1000000 operations;
DB path: [/dev/shm/db_bench/dbbench]
seekrandom : 5.252 micros/op 190405 ops/sec 5.252 seconds 1000000 operations; (0 of 1000000 found)
```
Reviewed By: ltamasi
Differential Revision: D55353909
Pulled By: jaykorean
fbshipit-source-id: 8d7786ffee09e022261ce34aa60e8633685e1946
2024-04-11 18:34:04 +00:00
// Copyright (c) Meta Platforms, Inc. and affiliates.
// This source code is licensed under both the GPLv2 (found in the
// COPYING file in the root directory) and Apache 2.0 License
// (found in the LICENSE.Apache file in the root directory).

#pragma once

#include <functional>
#include <variant>

#include "rocksdb/comparator.h"
#include "rocksdb/iterator.h"
#include "rocksdb/options.h"
#include "util/heap.h"

namespace ROCKSDB_NAMESPACE {

2024-04-16 15:45:38 +00:00
struct MultiCfIteratorInfo {
  ColumnFamilyHandle* cfh;
  Iterator* iterator;
  int order;
};

class MultiCfIteratorImpl {
 public:
  MultiCfIteratorImpl(
      const Comparator* comparator,
      const std::vector<ColumnFamilyHandle*>& column_families,
      const std::vector<Iterator*>& child_iterators,
      std::function<void()> reset_func,
      std::function<void(const autovector<MultiCfIteratorInfo>&)> populate_func)
      : comparator_(comparator),
        heap_(MultiCfMinHeap(
            MultiCfHeapItemComparator<std::greater<int>>(comparator_))),
        reset_func_(std::move(reset_func)),
        populate_func_(std::move(populate_func)) {
    assert(column_families.size() > 0 &&
           column_families.size() == child_iterators.size());
    cfh_iter_pairs_.reserve(column_families.size());
    for (size_t i = 0; i < column_families.size(); ++i) {
      cfh_iter_pairs_.emplace_back(
          column_families[i], std::unique_ptr<Iterator>(child_iterators[i]));
    }
  }
  ~MultiCfIteratorImpl() { status_.PermitUncheckedError(); }

  // No copy allowed
  MultiCfIteratorImpl(const MultiCfIteratorImpl&) = delete;
  MultiCfIteratorImpl& operator=(const MultiCfIteratorImpl&) = delete;

  Slice key() const {
    assert(Valid());
    return current()->key();
  }

  bool Valid() const {
    if (std::holds_alternative<MultiCfMaxHeap>(heap_)) {
      auto& max_heap = std::get<MultiCfMaxHeap>(heap_);
      return !max_heap.empty() && status_.ok();
    }
    auto& min_heap = std::get<MultiCfMinHeap>(heap_);
    return !min_heap.empty() && status_.ok();
  }

  Status status() const { return status_; }

  void SeekToFirst() {
    auto& min_heap = GetHeap<MultiCfMinHeap>([this]() { InitMinHeap(); });
    SeekCommon(min_heap, [](Iterator* iter) { iter->SeekToFirst(); });
  }
  void Seek(const Slice& target) {
    auto& min_heap = GetHeap<MultiCfMinHeap>([this]() { InitMinHeap(); });
    SeekCommon(min_heap, [&target](Iterator* iter) { iter->Seek(target); });
  }
  void SeekToLast() {
    auto& max_heap = GetHeap<MultiCfMaxHeap>([this]() { InitMaxHeap(); });
    SeekCommon(max_heap, [](Iterator* iter) { iter->SeekToLast(); });
  }
  void SeekForPrev(const Slice& target) {
    auto& max_heap = GetHeap<MultiCfMaxHeap>([this]() { InitMaxHeap(); });
    SeekCommon(max_heap,
               [&target](Iterator* iter) { iter->SeekForPrev(target); });
  }

  void Next() {
    assert(Valid());
    auto& min_heap = GetHeap<MultiCfMinHeap>([this]() {
Fix heap-use-after-free in MultiCfIteratorImpl (#12784)
Summary:
# Summary
To change the direction of the multi-CF iterator, we call `Seek(current_key)` (backward to forward) or `SeekForPrev(current_key)` (forward to backward) on the child iterators and rebuild the heap.
`Slice target` is only a pointer to the key bytes, and its contents are not guaranteed to stay the same once the heap has been re-initialized.
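One way to picture the hazard and a safe pattern is the sketch below. It is an illustration under stated assumptions, not the fix from this PR: `FakeIter` and `ReseekAll` are hypothetical names, and `std::string_view` stands in for `Slice`. The point is that the current key is copied into an owning `std::string` before any child is re-seeked, because re-seeking overwrites the buffer the view points into.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical child iterator over a sorted key list. key() returns a view
// into an internal buffer that every Seek() overwrites.
class FakeIter {
 public:
  explicit FakeIter(std::vector<std::string> keys) : keys_(std::move(keys)) {}

  // Positions on the first key >= target (empty key if there is none).
  void Seek(std::string_view target) {
    auto it = std::lower_bound(
        keys_.begin(), keys_.end(), target,
        [](const std::string& key, std::string_view t) {
          return std::string_view(key) < t;
        });
    current_ = (it != keys_.end()) ? *it : std::string();
  }

  std::string_view key() const { return current_; }

 private:
  std::vector<std::string> keys_;  // must be sorted
  std::string current_;
};

// Re-seeks every child to the key the first child is currently on.
void ReseekAll(std::vector<FakeIter>& children) {
  assert(!children.empty());
  // Copy first: key() is only a view into children[0]'s buffer, and the
  // Seek() calls below overwrite that buffer.
  const std::string target(children[0].key());
  for (FakeIter& child : children) {
    child.Seek(target);
  }
}
```

Passing `children[0].key()` straight into the loop instead of the copy would hand later children a view whose backing storage may already have been reassigned.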
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12784
Test Plan:
I was able to steadily repro by building with `COMPILE_WITH_ASAN=1` running db_stress.
```
COMPILE_WITH_ASAN=1 make -j64 dbg
```
```
./db_stress --WAL_size_limit_MB=1 --WAL_ttl_seconds=60 --acquire_snapshot_one_in=10000 --adaptive_readahead=0 --adm_policy=2 --advise_random_on_open=1 --allow_data_in_errors=True --allow_fallocate=0 --async_io=0 --auto_readahead_size=1 --avoid_flush_during_recovery=0 --avoid_flush_during_shutdown=1 --avoid_unnecessary_blocking_io=1 --backup_max_size=104857600 --backup_one_in=1000 --batch_protection_bytes_per_key=0 --bgerror_resume_retry_interval=100 --block_align=1 --block_protection_bytes_per_key=8 --block_size=16384 --bloom_before_level=2147483646 --bloom_bits=62.9095874568401 --bottommost_compression_type=none --bottommost_file_compaction_delay=600 --bytes_per_sync=0 --cache_index_and_filter_blocks=0 --cache_index_and_filter_blocks_with_high_priority=0 --cache_size=33554432 --cache_type=lru_cache --charge_compression_dictionary_building_buffer=0 --charge_file_metadata=1 --charge_filter_construction=1 --charge_table_reader=0 --check_multiget_consistency=0 --check_multiget_entity_consistency=0 --checkpoint_one_in=10000 --checksum_type=kxxHash64 --clear_column_family_one_in=0 --compact_files_one_in=1000000 --compact_range_one_in=1000000 --compaction_pri=1 --compaction_readahead_size=0 --compaction_ttl=100 --compress_format_version=2 --compressed_secondary_cache_size=8388608 --compression_checksum=1 --compression_max_dict_buffer_bytes=1099511627775 --compression_max_dict_bytes=16384 --compression_parallel_threads=1 --compression_type=none --compression_use_zstd_dict_trainer=1 --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --daily_offpeak_time_utc= --data_block_index_type=1 --db=/dev/shm/rocksdb_test/rocksdb_crashtest_whitebox --db_write_buffer_size=0 --default_temperature=kUnknown --default_write_temperature=kWarm --delete_obsolete_files_period_micros=21600000000 --delpercent=4 --delrangepercent=1 --destroy_db_initially=0 --detect_filter_construct_corruption=1 --disable_file_deletions_one_in=1000000 --disable_manual_compaction_one_in=10000 
--disable_wal=0 --dump_malloc_stats=1 --enable_checksum_handoff=0 --enable_compaction_filter=0 --enable_custom_split_merge=1 --enable_do_not_compress_roles=0 --enable_index_compression=0 --enable_memtable_insert_with_hint_prefix_extractor=0 --enable_pipelined_write=1 --enable_sst_partitioner_factory=1 --enable_thread_tracking=1 --enable_write_thread_adaptive_yield=1 --error_recovery_with_no_fault_injection=0 --expected_values_dir=/dev/shm/rocksdb_test/rocksdb_crashtest_expected --fail_if_options_file_error=0 --fifo_allow_compaction=1 --file_checksum_impl=crc32c --fill_cache=0 --flush_one_in=1000000 --format_version=4 --get_all_column_family_metadata_one_in=1000000 --get_current_wal_file_one_in=0 --get_live_files_apis_one_in=10000 --get_properties_of_all_tables_one_in=1000000 --get_property_one_in=1000000 --get_sorted_wal_files_one_in=0 --hard_pending_compaction_bytes_limit=274877906944 --high_pri_pool_ratio=0 --index_block_restart_interval=4 --index_shortening=1 --index_type=0 --ingest_external_file_one_in=0 --initial_auto_readahead_size=524288 --inplace_update_support=0 --iterpercent=10 --key_len_percent_dist=1,30,69 --key_may_exist_one_in=100000 --kill_random_test=888887 --last_level_temperature=kHot --level_compaction_dynamic_level_bytes=1 --lock_wal_one_in=10000 --log2_keys_per_lock=10 --log_file_time_to_roll=60 --log_readahead_size=0 --long_running_snapshots=1 --low_pri_pool_ratio=0 --lowest_used_cache_tier=0 --manifest_preallocation_size=5120 --manual_wal_flush_one_in=0 --mark_for_compaction_one_file_in=0 --max_auto_readahead_size=16384 --max_background_compactions=20 --max_bytes_for_level_base=10485760 --max_key=100000 --max_key_len=3 --max_log_file_size=0 --max_manifest_file_size=1073741824 --max_sequential_skip_in_iterations=1 --max_total_wal_size=0 --max_write_batch_group_size_bytes=64 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=0 --memtable_insert_hint_per_batch=0 --memtable_max_range_deletions=0 --memtable_prefix_bloom_size_ratio=0 
--memtable_protection_bytes_per_key=8 --memtable_whole_key_filtering=0 --memtablerep=skip_list --metadata_charge_policy=0 --metadata_read_fault_one_in=1000 --metadata_write_fault_one_in=128 --min_write_buffer_number_to_merge=1 --mmap_read=0 --mock_direct_io=True --nooverwritepercent=1 --num_file_reads_for_auto_readahead=1 --open_files=-1 --open_metadata_read_fault_one_in=0 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=0 --open_write_fault_one_in=16 --ops_per_thread=20000000 --optimize_filters_for_hits=0 --optimize_filters_for_memory=0 --optimize_multiget_for_io=0 --paranoid_file_checks=1 --partition_filters=0 --partition_pinning=3 --pause_background_one_in=1000000 --periodic_compaction_seconds=0 --persist_user_defined_timestamps=1 --prefix_size=-1 --prefixpercent=0 --prepopulate_block_cache=1 --preserve_internal_time_seconds=36000 --progress_reports=0 --promote_l0_one_in=0 --read_amp_bytes_per_bit=0 --read_fault_one_in=0 --readahead_size=0 --readpercent=50 --recycle_log_file_num=0 --reopen=20 --report_bg_io_stats=1 --reset_stats_one_in=10000 --sample_for_compression=0 --secondary_cache_fault_one_in=0 --secondary_cache_uri= --skip_stats_update_on_db_open=0 --snapshot_hold_ops=100000 --soft_pending_compaction_bytes_limit=68719476736 --sqfc_name=bar --sqfc_version=0 --sst_file_manager_bytes_per_sec=0 --sst_file_manager_bytes_per_truncate=0 --stats_dump_period_sec=10 --stats_history_buffer_size=1048576 --strict_bytes_per_sync=0 --subcompactions=1 --sync=0 --sync_fault_injection=1 --table_cache_numshardbits=6 --target_file_size_base=2097152 --target_file_size_multiplier=2 --test_batches_snapshots=0 --test_cf_consistency=0 --top_level_index_pinning=0 --uncache_aggressiveness=14 --universal_max_read_amp=-1 --unpartitioned_pinning=2 --use_adaptive_mutex=1 --use_adaptive_mutex_lru=0 --use_attribute_group=0 --use_delta_encoding=1 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=1 --use_full_merge_v1=0 --use_get_entity=1 --use_merge=0 
--use_multi_cf_iterator=1 --use_multi_get_entity=1 --use_multiget=1 --use_put_entity_one_in=0 --use_sqfc_for_range_queries=1 --use_timed_put_one_in=0 --use_txn=0 --use_write_buffer_manager=0 --user_timestamp_size=8 --value_size_mult=32 --verification_only=0 --verify_checksum=1 --verify_checksum_one_in=1000000 --verify_compression=1 --verify_db_one_in=10000 --verify_file_checksums_one_in=1000 --verify_iterator_with_expected_state_one_in=5 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=0 --wal_compression=zstd --write_buffer_size=4194304 --write_dbid_to_manifest=1 --write_fault_one_in=0 --writepercent=35
```
```
==1606272==ERROR: AddressSanitizer: heap-use-after-free on address 0x6060000b0cc0 at pc 0x7f733469c7de bp 0x7f7311bfcfe0 sp 0x7f7311bfc790
READ of size 40 at 0x6060000b0cc0 thread T57
#0 0x7f733469c7dd in __interceptor_memcpy /home/engshare/third-party2/gcc/11.x/src/gcc-11.x/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:827
#1 0x7f7331f65f7e in rocksdb::IterKey::SetInternalKey(rocksdb::Slice const&, rocksdb::Slice const&, unsigned long, rocksdb::ValueType, rocksdb::Slice const*) db/dbformat.h:761
#2 0x7f7331f661ee in rocksdb::IterKey::SetInternalKey(rocksdb::Slice const&, unsigned long, rocksdb::ValueType, rocksdb::Slice const*) db/dbformat.h:776
#3 0x7f73323039ff in rocksdb::DBIter::SetSavedKeyToSeekTarget(rocksdb::Slice const&) db/db_iter.cc:1462
#4 0x7f7332304eb8 in rocksdb::DBIter::Seek(rocksdb::Slice const&) db/db_iter.cc:1540
#5 0x7f7331d94abd in rocksdb::ArenaWrappedDBIter::Seek(rocksdb::Slice const&) (/data/users/jewoongh/rocksdb/librocksdb.so.9.4+0x1394abd)
#6 0x7f73320f1a52 in rocksdb::MultiCfIteratorImpl::Seek(rocksdb::Slice const&)::{lambda(rocksdb::Iterator*)#2}::operator()(rocksdb::Iterator*) const db/multi_cf_iterator_impl.h:73
#7 0x7f73320fccf0 in void rocksdb::MultiCfIteratorImpl::SeekCommon<rocksdb::BinaryHeap<rocksdb::MultiCfIteratorInfo, rocksdb::MultiCfIteratorImpl::MultiCfHeapItemComparator<std::greater<int> > >, rocksdb::MultiCfIteratorImpl::Seek(rocksdb::Slice const&)::{lambda(rocksdb::Iterator*)#2}>(rocksdb::BinaryHeap<rocksdb::MultiCfIteratorInfo, rocksdb::MultiCfIteratorImpl::MultiCfHeapItemComparator<std::greater<int> > >&, rocksdb::MultiCfIteratorImpl::Seek(rocksdb::Slice const&)::{lambda(rocksdb::Iterator*)#2}) (/data/users/jewoongh/rocksdb/librocksdb.so.9.4+0x16fccf0)
#8 0x7f73320f1a93 in rocksdb::MultiCfIteratorImpl::Seek(rocksdb::Slice const&) db/multi_cf_iterator_impl.h:73
#9 0x7f73320f1dbe in rocksdb::MultiCfIteratorImpl::Next()::{lambda()#1}::operator()() const db/multi_cf_iterator_impl.h:90
#10 0x7f73320fe159 in rocksdb::BinaryHeap<rocksdb::MultiCfIteratorInfo, rocksdb::MultiCfIteratorImpl::MultiCfHeapItemComparator<std::greater<int> > >& rocksdb::MultiCfIteratorImpl::GetHeap<rocksdb::BinaryHeap<rocksdb::MultiCfIteratorInfo, rocksdb::MultiCfIteratorImpl::MultiCfHeapItemComparator<std::greater<int> > >, rocksdb::MultiCfIteratorImpl::Next()::{lambda()#1}>(rocksdb::MultiCfIteratorImpl::Next()::{lambda()#1}) (/data/users/jewoongh/rocksdb/librocksdb.so.9.4+0x16fe159)
#11 0x7f73320f1ec9 in rocksdb::MultiCfIteratorImpl::Next() db/multi_cf_iterator_impl.h:87
#12 0x7f73320f3255 in rocksdb::CoalescingIterator::Next() db/coalescing_iterator.h:34
#13 0x66f28a in TestIterateImpl<rocksdb::Iterator, rocksdb::StressTest::TestIterate(rocksdb::ThreadState*, const rocksdb::ReadOptions&, const std::vector<int>&, const std::vector<long int>&)::<lambda(const rocksdb::ReadOptions&)>, rocksdb::StressTest::TestIterate(rocksdb::ThreadState*, const rocksdb::ReadOptions&, const std::vector<int>&, const std::vector<long int>&)::<lambda(rocksdb::Iterator*)> > db_stress_tool/db_stress_test_base.cc:1718
#14 0x6440b4 in rocksdb::StressTest::TestIterate(rocksdb::ThreadState*, rocksdb::ReadOptions const&, std::vector<int, std::allocator<int> > const&, std::vector<long, std::allocator<long> > const&) db_stress_tool/db_stress_test_base.cc:1504
#15 0x640cb0 in rocksdb::StressTest::OperateDb(rocksdb::ThreadState*) db_stress_tool/db_stress_test_base.cc:1376
#16 0x6004f6 in rocksdb::ThreadBody(void*) db_stress_tool/db_stress_driver.cc:39
#17 0x7f73327caed4 in StartThreadWrapper env/env_posix.cc:469
#18 0x7f733029abc8 in start_thread /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/nptl/pthread_create.c:434
#19 0x7f733032cf5b in __GI___clone3 (/usr/local/fbcode/platform010/lib/libc.so.6+0x12cf5b)
0x6060000b0cc0 is located 0 bytes inside of 55-byte region [0x6060000b0cc0,0x6060000b0cf7)
freed by thread T57 here:
#0 0x7f73346d1d77 in operator delete[](void*) /home/engshare/third-party2/gcc/11.x/src/gcc-11.x/libsanitizer/asan/asan_new_delete.cpp:163
#1 0x7f7331d9274b in rocksdb::IterKey::ResetBuffer() db/dbformat.h:830
#2 0x7f73323146b9 in rocksdb::IterKey::EnlargeBuffer(unsigned long) db/dbformat.cc:278
#3 0x7f7331f33031 in rocksdb::IterKey::EnlargeBufferIfNeeded(unsigned long) db/dbformat.h:846
#4 0x7f7331f65ee0 in rocksdb::IterKey::SetInternalKey(rocksdb::Slice const&, rocksdb::Slice const&, unsigned long, rocksdb::ValueType, rocksdb::Slice const*) db/dbformat.h:757
#5 0x7f7331f661ee in rocksdb::IterKey::SetInternalKey(rocksdb::Slice const&, unsigned long, rocksdb::ValueType, rocksdb::Slice const*) db/dbformat.h:776
#6 0x7f73323039ff in rocksdb::DBIter::SetSavedKeyToSeekTarget(rocksdb::Slice const&) db/db_iter.cc:1462
#7 0x7f7332304eb8 in rocksdb::DBIter::Seek(rocksdb::Slice const&) db/db_iter.cc:1540
#8 0x7f7331d94abd in rocksdb::ArenaWrappedDBIter::Seek(rocksdb::Slice const&) (/data/users/jewoongh/rocksdb/librocksdb.so.9.4+0x1394abd)
#9 0x7f73320f1a52 in rocksdb::MultiCfIteratorImpl::Seek(rocksdb::Slice const&)::{lambda(rocksdb::Iterator*)#2}::operator()(rocksdb::Iterator*) const db/multi_cf_iterator_impl.h:73
#10 0x7f73320fccf0 in void rocksdb::MultiCfIteratorImpl::SeekCommon<rocksdb::BinaryHeap<rocksdb::MultiCfIteratorInfo, rocksdb::MultiCfIteratorImpl::MultiCfHeapItemComparator<std::greater<int> > >, rocksdb::MultiCfIteratorImpl::Seek(rocksdb::Slice const&)::{lambda(rocksdb::Iterator*)#2}>(rocksdb::BinaryHeap<rocksdb::MultiCfIteratorInfo, rocksdb::MultiCfIteratorImpl::MultiCfHeapItemComparator<std::greater<int> > >&, rocksdb::MultiCfIteratorImpl::Seek(rocksdb::Slice const&)::{lambda(rocksdb::Iterator*)#2}) (/data/users/jewoongh/rocksdb/librocksdb.so.9.4+0x16fccf0)
#11 0x7f73320f1a93 in rocksdb::MultiCfIteratorImpl::Seek(rocksdb::Slice const&) db/multi_cf_iterator_impl.h:73
#12 0x7f73320f1dbe in rocksdb::MultiCfIteratorImpl::Next()::{lambda()#1}::operator()() const db/multi_cf_iterator_impl.h:90
#13 0x7f73320fe159 in rocksdb::BinaryHeap<rocksdb::MultiCfIteratorInfo, rocksdb::MultiCfIteratorImpl::MultiCfHeapItemComparator<std::greater<int> > >& rocksdb::MultiCfIteratorImpl::GetHeap<rocksdb::BinaryHeap<rocksdb::MultiCfIteratorInfo, rocksdb::MultiCfIteratorImpl::MultiCfHeapItemComparator<std::greater<int> > >, rocksdb::MultiCfIteratorImpl::Next()::{lambda()#1}>(rocksdb::MultiCfIteratorImpl::Next()::{lambda()#1}) (/data/users/jewoongh/rocksdb/librocksdb.so.9.4+0x16fe159)
#14 0x7f73320f1ec9 in rocksdb::MultiCfIteratorImpl::Next() db/multi_cf_iterator_impl.h:87
#15 0x7f73320f3255 in rocksdb::CoalescingIterator::Next() db/coalescing_iterator.h:34
#16 0x66f28a in TestIterateImpl<rocksdb::Iterator, rocksdb::StressTest::TestIterate(rocksdb::ThreadState*, const rocksdb::ReadOptions&, const std::vector<int>&, const std::vector<long int>&)::<lambda(const rocksdb::ReadOptions&)>, rocksdb::StressTest::TestIterate(rocksdb::ThreadState*, const rocksdb::ReadOptions&, const std::vector<int>&, const std::vector<long int>&)::<lambda(rocksdb::Iterator*)> > db_stress_tool/db_stress_test_base.cc:1718
#17 0x6440b4 in rocksdb::StressTest::TestIterate(rocksdb::ThreadState*, rocksdb::ReadOptions const&, std::vector<int, std::allocator<int> > const&, std::vector<long, std::allocator<long> > const&) db_stress_tool/db_stress_test_base.cc:1504
#18 0x640cb0 in rocksdb::StressTest::OperateDb(rocksdb::ThreadState*) db_stress_tool/db_stress_test_base.cc:1376
#19 0x6004f6 in rocksdb::ThreadBody(void*) db_stress_tool/db_stress_driver.cc:39
#20 0x7f73327caed4 in StartThreadWrapper env/env_posix.cc:469
#21 0x7f733029abc8 in start_thread /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/nptl/pthread_create.c:434
previously allocated by thread T57 here:
#0 0x7f73346d13b7 in operator new[](unsigned long) /home/engshare/third-party2/gcc/11.x/src/gcc-11.x/libsanitizer/asan/asan_new_delete.cpp:102
#1 0x7f73323146c5 in rocksdb::IterKey::EnlargeBuffer(unsigned long) db/dbformat.cc:279
#2 0x7f7331f33031 in rocksdb::IterKey::EnlargeBufferIfNeeded(unsigned long) db/dbformat.h:846
#3 0x7f7331f65ee0 in rocksdb::IterKey::SetInternalKey(rocksdb::Slice const&, rocksdb::Slice const&, unsigned long, rocksdb::ValueType, rocksdb::Slice const*) db/dbformat.h:757
#4 0x7f7331f661ee in rocksdb::IterKey::SetInternalKey(rocksdb::Slice const&, unsigned long, rocksdb::ValueType, rocksdb::Slice const*) db/dbformat.h:776
#5 0x7f7332303e1e in rocksdb::DBIter::SetSavedKeyToSeekForPrevTarget(rocksdb::Slice const&) db/db_iter.cc:1479
#6 0x7f7332306302 in rocksdb::DBIter::SeekForPrev(rocksdb::Slice const&) db/db_iter.cc:1615
#7 0x7f7331d94b0f in rocksdb::ArenaWrappedDBIter::SeekForPrev(rocksdb::Slice const&) (/data/users/jewoongh/rocksdb/librocksdb.so.9.4+0x1394b0f)
#8 0x7f73320f1c5a in rocksdb::MultiCfIteratorImpl::SeekForPrev(rocksdb::Slice const&)::{lambda(rocksdb::Iterator*)#2}::operator()(rocksdb::Iterator*) const db/multi_cf_iterator_impl.h:82
#9 0x7f73320fdc1e in void rocksdb::MultiCfIteratorImpl::SeekCommon<rocksdb::BinaryHeap<rocksdb::MultiCfIteratorInfo, rocksdb::MultiCfIteratorImpl::MultiCfHeapItemComparator<std::less<int> > >, rocksdb::MultiCfIteratorImpl::SeekForPrev(rocksdb::Slice const&)::{lambda(rocksdb::Iterator*)#2}>(rocksdb::BinaryHeap<rocksdb::MultiCfIteratorInfo, rocksdb::MultiCfIteratorImpl::MultiCfHeapItemComparator<std::less<int> > >&, rocksdb::MultiCfIteratorImpl::SeekForPrev(rocksdb::Slice const&)::{lambda(rocksdb::Iterator*)#2}) (/data/users/jewoongh/rocksdb/librocksdb.so.9.4+0x16fdc1e)
#10 0x7f73320f1c9b in rocksdb::MultiCfIteratorImpl::SeekForPrev(rocksdb::Slice const&) db/multi_cf_iterator_impl.h:81
#11 0x7f73320f2002 in rocksdb::MultiCfIteratorImpl::Prev()::{lambda()#1}::operator()() const db/multi_cf_iterator_impl.h:99
#12 0x7f73320ff223 in rocksdb::BinaryHeap<rocksdb::MultiCfIteratorInfo, rocksdb::MultiCfIteratorImpl::MultiCfHeapItemComparator<std::less<int> > >& rocksdb::MultiCfIteratorImpl::GetHeap<rocksdb::BinaryHeap<rocksdb::MultiCfIteratorInfo, rocksdb::MultiCfIteratorImpl::MultiCfHeapItemComparator<std::less<int> > >, rocksdb::MultiCfIteratorImpl::Prev()::{lambda()#1}>(rocksdb::MultiCfIteratorImpl::Prev()::{lambda()#1}) (/data/users/jewoongh/rocksdb/librocksdb.so.9.4+0x16ff223)
#13 0x7f73320f210d in rocksdb::MultiCfIteratorImpl::Prev() db/multi_cf_iterator_impl.h:96
#14 0x7f73320f3275 in rocksdb::CoalescingIterator::Prev() db/coalescing_iterator.h:35
#15 0x66f440 in TestIterateImpl<rocksdb::Iterator, rocksdb::StressTest::TestIterate(rocksdb::ThreadState*, const rocksdb::ReadOptions&, const std::vector<int>&, const std::vector<long int>&)::<lambda(const rocksdb::ReadOptions&)>, rocksdb::StressTest::TestIterate(rocksdb::ThreadState*, const rocksdb::ReadOptions&, const std::vector<int>&, const std::vector<long int>&)::<lambda(rocksdb::Iterator*)> > db_stress_tool/db_stress_test_base.cc:1725
#16 0x6440b4 in rocksdb::StressTest::TestIterate(rocksdb::ThreadState*, rocksdb::ReadOptions const&, std::vector<int, std::allocator<int> > const&, std::vector<long, std::allocator<long> > const&) db_stress_tool/db_stress_test_base.cc:1504
#17 0x640cb0 in rocksdb::StressTest::OperateDb(rocksdb::ThreadState*) db_stress_tool/db_stress_test_base.cc:1376
#18 0x6004f6 in rocksdb::ThreadBody(void*) db_stress_tool/db_stress_driver.cc:39
#19 0x7f73327caed4 in StartThreadWrapper env/env_posix.cc:469
#20 0x7f733029abc8 in start_thread /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/nptl/pthread_create.c:434
Thread T57 created by T0 here:
#0 0x7f7334642136 in __interceptor_pthread_create /home/engshare/third-party2/gcc/11.x/src/gcc-11.x/libsanitizer/asan/asan_interceptors.cpp:216
#1 0x7f73327cb008 in StartThread env/env_posix.cc:479
#2 0x7f733276b406 in rocksdb::CompositeEnvWrapper::StartThread(void (*)(void*), void*) env/composite_env_wrapper.h:316
#3 0x7f733276b406 in rocksdb::CompositeEnvWrapper::StartThread(void (*)(void*), void*) env/composite_env_wrapper.h:316
#4 0x6013d9 in rocksdb::RunStressTestImpl(rocksdb::SharedState*) db_stress_tool/db_stress_driver.cc:108
#5 0x603083 in rocksdb::RunStressTest(rocksdb::SharedState*) db_stress_tool/db_stress_driver.cc:248
#6 0x4e6ab3 in rocksdb::db_stress_tool(int, char**) db_stress_tool/db_stress_tool.cc:365
#7 0x4e260a in main db_stress_tool/db_stress.cc:23
#8 0x7f733022c656 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
#9 0x7f733022c717 in __libc_start_main_impl ../csu/libc-start.c:409
#10 0x4e2530 in _start (/data/users/jewoongh/rocksdb/db_stress+0x4e2530)
```
After the change, the same command no longer triggered `heap-use-after-free`.
Reviewed By: pdillinger
Differential Revision: D58871081
Pulled By: jaykorean
fbshipit-source-id: 0194c34ffec5f16a6556c6bf3941a27253a4ecb4
2024-06-21 18:56:10 +00:00
std::string target(key().data(), key().size());
MultiCFIterator Refactor - CoalescingIterator & AttributeGroupIterator (#12480)
2024-04-11 18:34:04 +00:00
InitMinHeap();
Seek(target);
});
AdvanceIterator(min_heap, [](Iterator* iter) { iter->Next(); });
}
void Prev() {
assert(Valid());
auto& max_heap = GetHeap<MultiCfMaxHeap>([this]() {
Fix heap-use-after-free in MultiCfIteratorImpl (#12784)
Summary:
# Summary
To change the direction of the multi-CF iterator, we call `Seek(current_key)` (backward to forward) or `SeekForPrev(current_key)` (forward to backward) on the child iterators and rebuild the heap.
`Slice target` is only a pointer and a length, so the bytes it points to are not guaranteed to stay the same once the heap is re-initialized. The fix copies the current key into a `std::string` before re-initializing the heap and seeking.
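The lifetime issue can be illustrated with a small sketch. Everything in it is an assumption made for illustration: `ToyChildIter`, `ReseekWithView`, and `ReseekWithCopy` are hypothetical names, not RocksDB APIs, and `std::string_view` stands in for `Slice`. The toy only detects that a seek target aliases the iterator's own key buffer; it deliberately avoids reading freed memory so that it stays well-defined.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical stand-in for a child iterator that owns its key buffer and
// replaces that buffer on every Seek().
class ToyChildIter {
 public:
  // Returns true if `target` pointed into this iterator's own key buffer,
  // i.e. the caller passed a non-owning view of memory that Seek() replaces.
  bool Seek(std::string_view target) {
    const bool aliased = buf_ != nullptr && target.data() == buf_.get();
    // The toy copies into a fresh buffer before dropping the old one, so it
    // stays well-defined even for aliased input.
    auto fresh = std::make_unique<char[]>(target.size() + 1);
    if (!target.empty()) {
      std::memcpy(fresh.get(), target.data(), target.size());
    }
    buf_ = std::move(fresh);
    size_ = target.size();
    return aliased;
  }

  // Non-owning view of the current key; invalidated by the next Seek().
  std::string_view key() const { return std::string_view(buf_.get(), size_); }

 private:
  std::unique_ptr<char[]> buf_;
  std::size_t size_ = 0;
};

// Risky pattern: hands the iterator a view of its own key.
bool ReseekWithView(ToyChildIter& it) { return it.Seek(it.key()); }

// Safer pattern: copy the key into an owning std::string before seeking.
bool ReseekWithCopy(ToyChildIter& it) {
  const std::string target(it.key().data(), it.key().size());
  assert(target.data() != it.key().data());  // the copy has its own storage
  return it.Seek(target);
}
```

In this sketch `ReseekWithView` reports an aliased target while `ReseekWithCopy` does not, which is the property the owning copy is meant to guarantee.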
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12784
Test Plan:
I was able to reproduce the issue consistently by building with `COMPILE_WITH_ASAN=1` and running `db_stress`.
```
COMPILE_WITH_ASAN=1 make -j64 dbg
```
```
./db_stress --WAL_size_limit_MB=1 --WAL_ttl_seconds=60 --acquire_snapshot_one_in=10000 --adaptive_readahead=0 --adm_policy=2 --advise_random_on_open=1 --allow_data_in_errors=True --allow_fallocate=0 --async_io=0 --auto_readahead_size=1 --avoid_flush_during_recovery=0 --avoid_flush_during_shutdown=1 --avoid_unnecessary_blocking_io=1 --backup_max_size=104857600 --backup_one_in=1000 --batch_protection_bytes_per_key=0 --bgerror_resume_retry_interval=100 --block_align=1 --block_protection_bytes_per_key=8 --block_size=16384 --bloom_before_level=2147483646 --bloom_bits=62.9095874568401 --bottommost_compression_type=none --bottommost_file_compaction_delay=600 --bytes_per_sync=0 --cache_index_and_filter_blocks=0 --cache_index_and_filter_blocks_with_high_priority=0 --cache_size=33554432 --cache_type=lru_cache --charge_compression_dictionary_building_buffer=0 --charge_file_metadata=1 --charge_filter_construction=1 --charge_table_reader=0 --check_multiget_consistency=0 --check_multiget_entity_consistency=0 --checkpoint_one_in=10000 --checksum_type=kxxHash64 --clear_column_family_one_in=0 --compact_files_one_in=1000000 --compact_range_one_in=1000000 --compaction_pri=1 --compaction_readahead_size=0 --compaction_ttl=100 --compress_format_version=2 --compressed_secondary_cache_size=8388608 --compression_checksum=1 --compression_max_dict_buffer_bytes=1099511627775 --compression_max_dict_bytes=16384 --compression_parallel_threads=1 --compression_type=none --compression_use_zstd_dict_trainer=1 --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --daily_offpeak_time_utc= --data_block_index_type=1 --db=/dev/shm/rocksdb_test/rocksdb_crashtest_whitebox --db_write_buffer_size=0 --default_temperature=kUnknown --default_write_temperature=kWarm --delete_obsolete_files_period_micros=21600000000 --delpercent=4 --delrangepercent=1 --destroy_db_initially=0 --detect_filter_construct_corruption=1 --disable_file_deletions_one_in=1000000 --disable_manual_compaction_one_in=10000 
--disable_wal=0 --dump_malloc_stats=1 --enable_checksum_handoff=0 --enable_compaction_filter=0 --enable_custom_split_merge=1 --enable_do_not_compress_roles=0 --enable_index_compression=0 --enable_memtable_insert_with_hint_prefix_extractor=0 --enable_pipelined_write=1 --enable_sst_partitioner_factory=1 --enable_thread_tracking=1 --enable_write_thread_adaptive_yield=1 --error_recovery_with_no_fault_injection=0 --expected_values_dir=/dev/shm/rocksdb_test/rocksdb_crashtest_expected --fail_if_options_file_error=0 --fifo_allow_compaction=1 --file_checksum_impl=crc32c --fill_cache=0 --flush_one_in=1000000 --format_version=4 --get_all_column_family_metadata_one_in=1000000 --get_current_wal_file_one_in=0 --get_live_files_apis_one_in=10000 --get_properties_of_all_tables_one_in=1000000 --get_property_one_in=1000000 --get_sorted_wal_files_one_in=0 --hard_pending_compaction_bytes_limit=274877906944 --high_pri_pool_ratio=0 --index_block_restart_interval=4 --index_shortening=1 --index_type=0 --ingest_external_file_one_in=0 --initial_auto_readahead_size=524288 --inplace_update_support=0 --iterpercent=10 --key_len_percent_dist=1,30,69 --key_may_exist_one_in=100000 --kill_random_test=888887 --last_level_temperature=kHot --level_compaction_dynamic_level_bytes=1 --lock_wal_one_in=10000 --log2_keys_per_lock=10 --log_file_time_to_roll=60 --log_readahead_size=0 --long_running_snapshots=1 --low_pri_pool_ratio=0 --lowest_used_cache_tier=0 --manifest_preallocation_size=5120 --manual_wal_flush_one_in=0 --mark_for_compaction_one_file_in=0 --max_auto_readahead_size=16384 --max_background_compactions=20 --max_bytes_for_level_base=10485760 --max_key=100000 --max_key_len=3 --max_log_file_size=0 --max_manifest_file_size=1073741824 --max_sequential_skip_in_iterations=1 --max_total_wal_size=0 --max_write_batch_group_size_bytes=64 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=0 --memtable_insert_hint_per_batch=0 --memtable_max_range_deletions=0 --memtable_prefix_bloom_size_ratio=0 
--memtable_protection_bytes_per_key=8 --memtable_whole_key_filtering=0 --memtablerep=skip_list --metadata_charge_policy=0 --metadata_read_fault_one_in=1000 --metadata_write_fault_one_in=128 --min_write_buffer_number_to_merge=1 --mmap_read=0 --mock_direct_io=True --nooverwritepercent=1 --num_file_reads_for_auto_readahead=1 --open_files=-1 --open_metadata_read_fault_one_in=0 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=0 --open_write_fault_one_in=16 --ops_per_thread=20000000 --optimize_filters_for_hits=0 --optimize_filters_for_memory=0 --optimize_multiget_for_io=0 --paranoid_file_checks=1 --partition_filters=0 --partition_pinning=3 --pause_background_one_in=1000000 --periodic_compaction_seconds=0 --persist_user_defined_timestamps=1 --prefix_size=-1 --prefixpercent=0 --prepopulate_block_cache=1 --preserve_internal_time_seconds=36000 --progress_reports=0 --promote_l0_one_in=0 --read_amp_bytes_per_bit=0 --read_fault_one_in=0 --readahead_size=0 --readpercent=50 --recycle_log_file_num=0 --reopen=20 --report_bg_io_stats=1 --reset_stats_one_in=10000 --sample_for_compression=0 --secondary_cache_fault_one_in=0 --secondary_cache_uri= --skip_stats_update_on_db_open=0 --snapshot_hold_ops=100000 --soft_pending_compaction_bytes_limit=68719476736 --sqfc_name=bar --sqfc_version=0 --sst_file_manager_bytes_per_sec=0 --sst_file_manager_bytes_per_truncate=0 --stats_dump_period_sec=10 --stats_history_buffer_size=1048576 --strict_bytes_per_sync=0 --subcompactions=1 --sync=0 --sync_fault_injection=1 --table_cache_numshardbits=6 --target_file_size_base=2097152 --target_file_size_multiplier=2 --test_batches_snapshots=0 --test_cf_consistency=0 --top_level_index_pinning=0 --uncache_aggressiveness=14 --universal_max_read_amp=-1 --unpartitioned_pinning=2 --use_adaptive_mutex=1 --use_adaptive_mutex_lru=0 --use_attribute_group=0 --use_delta_encoding=1 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=1 --use_full_merge_v1=0 --use_get_entity=1 --use_merge=0 
--use_multi_cf_iterator=1 --use_multi_get_entity=1 --use_multiget=1 --use_put_entity_one_in=0 --use_sqfc_for_range_queries=1 --use_timed_put_one_in=0 --use_txn=0 --use_write_buffer_manager=0 --user_timestamp_size=8 --value_size_mult=32 --verification_only=0 --verify_checksum=1 --verify_checksum_one_in=1000000 --verify_compression=1 --verify_db_one_in=10000 --verify_file_checksums_one_in=1000 --verify_iterator_with_expected_state_one_in=5 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=0 --wal_compression=zstd --write_buffer_size=4194304 --write_dbid_to_manifest=1 --write_fault_one_in=0 --writepercent=35
```
```
==1606272==ERROR: AddressSanitizer: heap-use-after-free on address 0x6060000b0cc0 at pc 0x7f733469c7de bp 0x7f7311bfcfe0 sp 0x7f7311bfc790
READ of size 40 at 0x6060000b0cc0 thread T57
#0 0x7f733469c7dd in __interceptor_memcpy /home/engshare/third-party2/gcc/11.x/src/gcc-11.x/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:827
#1 0x7f7331f65f7e in rocksdb::IterKey::SetInternalKey(rocksdb::Slice const&, rocksdb::Slice const&, unsigned long, rocksdb::ValueType, rocksdb::Slice const*) db/dbformat.h:761
#2 0x7f7331f661ee in rocksdb::IterKey::SetInternalKey(rocksdb::Slice const&, unsigned long, rocksdb::ValueType, rocksdb::Slice const*) db/dbformat.h:776
#3 0x7f73323039ff in rocksdb::DBIter::SetSavedKeyToSeekTarget(rocksdb::Slice const&) db/db_iter.cc:1462
#4 0x7f7332304eb8 in rocksdb::DBIter::Seek(rocksdb::Slice const&) db/db_iter.cc:1540
#5 0x7f7331d94abd in rocksdb::ArenaWrappedDBIter::Seek(rocksdb::Slice const&) (/data/users/jewoongh/rocksdb/librocksdb.so.9.4+0x1394abd)
#6 0x7f73320f1a52 in rocksdb::MultiCfIteratorImpl::Seek(rocksdb::Slice const&)::{lambda(rocksdb::Iterator*)#2}::operator()(rocksdb::Iterator*) const db/multi_cf_iterator_impl.h:73
#7 0x7f73320fccf0 in void rocksdb::MultiCfIteratorImpl::SeekCommon<rocksdb::BinaryHeap<rocksdb::MultiCfIteratorInfo, rocksdb::MultiCfIteratorImpl::MultiCfHeapItemComparator<std::greater<int> > >, rocksdb::MultiCfIteratorImpl::Seek(rocksdb::Slice const&)::{lambda(rocksdb::Iterator*)#2}>(rocksdb::BinaryHeap<rocksdb::MultiCfIteratorInfo, rocksdb::MultiCfIteratorImpl::MultiCfHeapItemComparator<std::greater<int> > >&, rocksdb::MultiCfIteratorImpl::Seek(rocksdb::Slice const&)::{lambda(rocksdb::Iterator*)#2}) (/data/users/jewoongh/rocksdb/librocksdb.so.9.4+0x16fccf0)
#8 0x7f73320f1a93 in rocksdb::MultiCfIteratorImpl::Seek(rocksdb::Slice const&) db/multi_cf_iterator_impl.h:73
#9 0x7f73320f1dbe in rocksdb::MultiCfIteratorImpl::Next()::{lambda()#1}::operator()() const db/multi_cf_iterator_impl.h:90
#10 0x7f73320fe159 in rocksdb::BinaryHeap<rocksdb::MultiCfIteratorInfo, rocksdb::MultiCfIteratorImpl::MultiCfHeapItemComparator<std::greater<int> > >& rocksdb::MultiCfIteratorImpl::GetHeap<rocksdb::BinaryHeap<rocksdb::MultiCfIteratorInfo, rocksdb::MultiCfIteratorImpl::MultiCfHeapItemComparator<std::greater<int> > >, rocksdb::MultiCfIteratorImpl::Next()::{lambda()#1}>(rocksdb::MultiCfIteratorImpl::Next()::{lambda()#1}) (/data/users/jewoongh/rocksdb/librocksdb.so.9.4+0x16fe159)
#11 0x7f73320f1ec9 in rocksdb::MultiCfIteratorImpl::Next() db/multi_cf_iterator_impl.h:87
#12 0x7f73320f3255 in rocksdb::CoalescingIterator::Next() db/coalescing_iterator.h:34
#13 0x66f28a in TestIterateImpl<rocksdb::Iterator, rocksdb::StressTest::TestIterate(rocksdb::ThreadState*, const rocksdb::ReadOptions&, const std::vector<int>&, const std::vector<long int>&)::<lambda(const rocksdb::ReadOptions&)>, rocksdb::StressTest::TestIterate(rocksdb::ThreadState*, const rocksdb::ReadOptions&, const std::vector<int>&, const std::vector<long int>&)::<lambda(rocksdb::Iterator*)> > db_stress_tool/db_stress_test_base.cc:1718
#14 0x6440b4 in rocksdb::StressTest::TestIterate(rocksdb::ThreadState*, rocksdb::ReadOptions const&, std::vector<int, std::allocator<int> > const&, std::vector<long, std::allocator<long> > const&) db_stress_tool/db_stress_test_base.cc:1504
#15 0x640cb0 in rocksdb::StressTest::OperateDb(rocksdb::ThreadState*) db_stress_tool/db_stress_test_base.cc:1376
#16 0x6004f6 in rocksdb::ThreadBody(void*) db_stress_tool/db_stress_driver.cc:39
#17 0x7f73327caed4 in StartThreadWrapper env/env_posix.cc:469
#18 0x7f733029abc8 in start_thread /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/nptl/pthread_create.c:434
#19 0x7f733032cf5b in __GI___clone3 (/usr/local/fbcode/platform010/lib/libc.so.6+0x12cf5b)
0x6060000b0cc0 is located 0 bytes inside of 55-byte region [0x6060000b0cc0,0x6060000b0cf7)
freed by thread T57 here:
#0 0x7f73346d1d77 in operator delete[](void*) /home/engshare/third-party2/gcc/11.x/src/gcc-11.x/libsanitizer/asan/asan_new_delete.cpp:163
#1 0x7f7331d9274b in rocksdb::IterKey::ResetBuffer() db/dbformat.h:830
#2 0x7f73323146b9 in rocksdb::IterKey::EnlargeBuffer(unsigned long) db/dbformat.cc:278
#3 0x7f7331f33031 in rocksdb::IterKey::EnlargeBufferIfNeeded(unsigned long) db/dbformat.h:846
#4 0x7f7331f65ee0 in rocksdb::IterKey::SetInternalKey(rocksdb::Slice const&, rocksdb::Slice const&, unsigned long, rocksdb::ValueType, rocksdb::Slice const*) db/dbformat.h:757
#5 0x7f7331f661ee in rocksdb::IterKey::SetInternalKey(rocksdb::Slice const&, unsigned long, rocksdb::ValueType, rocksdb::Slice const*) db/dbformat.h:776
#6 0x7f73323039ff in rocksdb::DBIter::SetSavedKeyToSeekTarget(rocksdb::Slice const&) db/db_iter.cc:1462
#7 0x7f7332304eb8 in rocksdb::DBIter::Seek(rocksdb::Slice const&) db/db_iter.cc:1540
#8 0x7f7331d94abd in rocksdb::ArenaWrappedDBIter::Seek(rocksdb::Slice const&) (/data/users/jewoongh/rocksdb/librocksdb.so.9.4+0x1394abd)
#9 0x7f73320f1a52 in rocksdb::MultiCfIteratorImpl::Seek(rocksdb::Slice const&)::{lambda(rocksdb::Iterator*)#2}::operator()(rocksdb::Iterator*) const db/multi_cf_iterator_impl.h:73
#10 0x7f73320fccf0 in void rocksdb::MultiCfIteratorImpl::SeekCommon<rocksdb::BinaryHeap<rocksdb::MultiCfIteratorInfo, rocksdb::MultiCfIteratorImpl::MultiCfHeapItemComparator<std::greater<int> > >, rocksdb::MultiCfIteratorImpl::Seek(rocksdb::Slice const&)::{lambda(rocksdb::Iterator*)#2}>(rocksdb::BinaryHeap<rocksdb::MultiCfIteratorInfo, rocksdb::MultiCfIteratorImpl::MultiCfHeapItemComparator<std::greater<int> > >&, rocksdb::MultiCfIteratorImpl::Seek(rocksdb::Slice const&)::{lambda(rocksdb::Iterator*)#2}) (/data/users/jewoongh/rocksdb/librocksdb.so.9.4+0x16fccf0)
#11 0x7f73320f1a93 in rocksdb::MultiCfIteratorImpl::Seek(rocksdb::Slice const&) db/multi_cf_iterator_impl.h:73
#12 0x7f73320f1dbe in rocksdb::MultiCfIteratorImpl::Next()::{lambda()#1}::operator()() const db/multi_cf_iterator_impl.h:90
#13 0x7f73320fe159 in rocksdb::BinaryHeap<rocksdb::MultiCfIteratorInfo, rocksdb::MultiCfIteratorImpl::MultiCfHeapItemComparator<std::greater<int> > >& rocksdb::MultiCfIteratorImpl::GetHeap<rocksdb::BinaryHeap<rocksdb::MultiCfIteratorInfo, rocksdb::MultiCfIteratorImpl::MultiCfHeapItemComparator<std::greater<int> > >, rocksdb::MultiCfIteratorImpl::Next()::{lambda()#1}>(rocksdb::MultiCfIteratorImpl::Next()::{lambda()#1}) (/data/users/jewoongh/rocksdb/librocksdb.so.9.4+0x16fe159)
#14 0x7f73320f1ec9 in rocksdb::MultiCfIteratorImpl::Next() db/multi_cf_iterator_impl.h:87
#15 0x7f73320f3255 in rocksdb::CoalescingIterator::Next() db/coalescing_iterator.h:34
#16 0x66f28a in TestIterateImpl<rocksdb::Iterator, rocksdb::StressTest::TestIterate(rocksdb::ThreadState*, const rocksdb::ReadOptions&, const std::vector<int>&, const std::vector<long int>&)::<lambda(const rocksdb::ReadOptions&)>, rocksdb::StressTest::TestIterate(rocksdb::ThreadState*, const rocksdb::ReadOptions&, const std::vector<int>&, const std::vector<long int>&)::<lambda(rocksdb::Iterator*)> > db_stress_tool/db_stress_test_base.cc:1718
#17 0x6440b4 in rocksdb::StressTest::TestIterate(rocksdb::ThreadState*, rocksdb::ReadOptions const&, std::vector<int, std::allocator<int> > const&, std::vector<long, std::allocator<long> > const&) db_stress_tool/db_stress_test_base.cc:1504
#18 0x640cb0 in rocksdb::StressTest::OperateDb(rocksdb::ThreadState*) db_stress_tool/db_stress_test_base.cc:1376
#19 0x6004f6 in rocksdb::ThreadBody(void*) db_stress_tool/db_stress_driver.cc:39
#20 0x7f73327caed4 in StartThreadWrapper env/env_posix.cc:469
#21 0x7f733029abc8 in start_thread /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/nptl/pthread_create.c:434
previously allocated by thread T57 here:
#0 0x7f73346d13b7 in operator new[](unsigned long) /home/engshare/third-party2/gcc/11.x/src/gcc-11.x/libsanitizer/asan/asan_new_delete.cpp:102
#1 0x7f73323146c5 in rocksdb::IterKey::EnlargeBuffer(unsigned long) db/dbformat.cc:279
#2 0x7f7331f33031 in rocksdb::IterKey::EnlargeBufferIfNeeded(unsigned long) db/dbformat.h:846
#3 0x7f7331f65ee0 in rocksdb::IterKey::SetInternalKey(rocksdb::Slice const&, rocksdb::Slice const&, unsigned long, rocksdb::ValueType, rocksdb::Slice const*) db/dbformat.h:757
#4 0x7f7331f661ee in rocksdb::IterKey::SetInternalKey(rocksdb::Slice const&, unsigned long, rocksdb::ValueType, rocksdb::Slice const*) db/dbformat.h:776
#5 0x7f7332303e1e in rocksdb::DBIter::SetSavedKeyToSeekForPrevTarget(rocksdb::Slice const&) db/db_iter.cc:1479
#6 0x7f7332306302 in rocksdb::DBIter::SeekForPrev(rocksdb::Slice const&) db/db_iter.cc:1615
#7 0x7f7331d94b0f in rocksdb::ArenaWrappedDBIter::SeekForPrev(rocksdb::Slice const&) (/data/users/jewoongh/rocksdb/librocksdb.so.9.4+0x1394b0f)
#8 0x7f73320f1c5a in rocksdb::MultiCfIteratorImpl::SeekForPrev(rocksdb::Slice const&)::{lambda(rocksdb::Iterator*)#2}::operator()(rocksdb::Iterator*) const db/multi_cf_iterator_impl.h:82
#9 0x7f73320fdc1e in void rocksdb::MultiCfIteratorImpl::SeekCommon<rocksdb::BinaryHeap<rocksdb::MultiCfIteratorInfo, rocksdb::MultiCfIteratorImpl::MultiCfHeapItemComparator<std::less<int> > >, rocksdb::MultiCfIteratorImpl::SeekForPrev(rocksdb::Slice const&)::{lambda(rocksdb::Iterator*)#2}>(rocksdb::BinaryHeap<rocksdb::MultiCfIteratorInfo, rocksdb::MultiCfIteratorImpl::MultiCfHeapItemComparator<std::less<int> > >&, rocksdb::MultiCfIteratorImpl::SeekForPrev(rocksdb::Slice const&)::{lambda(rocksdb::Iterator*)#2}) (/data/users/jewoongh/rocksdb/librocksdb.so.9.4+0x16fdc1e)
#10 0x7f73320f1c9b in rocksdb::MultiCfIteratorImpl::SeekForPrev(rocksdb::Slice const&) db/multi_cf_iterator_impl.h:81
#11 0x7f73320f2002 in rocksdb::MultiCfIteratorImpl::Prev()::{lambda()#1}::operator()() const db/multi_cf_iterator_impl.h:99
#12 0x7f73320ff223 in rocksdb::BinaryHeap<rocksdb::MultiCfIteratorInfo, rocksdb::MultiCfIteratorImpl::MultiCfHeapItemComparator<std::less<int> > >& rocksdb::MultiCfIteratorImpl::GetHeap<rocksdb::BinaryHeap<rocksdb::MultiCfIteratorInfo, rocksdb::MultiCfIteratorImpl::MultiCfHeapItemComparator<std::less<int> > >, rocksdb::MultiCfIteratorImpl::Prev()::{lambda()#1}>(rocksdb::MultiCfIteratorImpl::Prev()::{lambda()#1}) (/data/users/jewoongh/rocksdb/librocksdb.so.9.4+0x16ff223)
#13 0x7f73320f210d in rocksdb::MultiCfIteratorImpl::Prev() db/multi_cf_iterator_impl.h:96
#14 0x7f73320f3275 in rocksdb::CoalescingIterator::Prev() db/coalescing_iterator.h:35
#15 0x66f440 in TestIterateImpl<rocksdb::Iterator, rocksdb::StressTest::TestIterate(rocksdb::ThreadState*, const rocksdb::ReadOptions&, const std::vector<int>&, const std::vector<long int>&)::<lambda(const rocksdb::ReadOptions&)>, rocksdb::StressTest::TestIterate(rocksdb::ThreadState*, const rocksdb::ReadOptions&, const std::vector<int>&, const std::vector<long int>&)::<lambda(rocksdb::Iterator*)> > db_stress_tool/db_stress_test_base.cc:1725
#16 0x6440b4 in rocksdb::StressTest::TestIterate(rocksdb::ThreadState*, rocksdb::ReadOptions const&, std::vector<int, std::allocator<int> > const&, std::vector<long, std::allocator<long> > const&) db_stress_tool/db_stress_test_base.cc:1504
#17 0x640cb0 in rocksdb::StressTest::OperateDb(rocksdb::ThreadState*) db_stress_tool/db_stress_test_base.cc:1376
#18 0x6004f6 in rocksdb::ThreadBody(void*) db_stress_tool/db_stress_driver.cc:39
#19 0x7f73327caed4 in StartThreadWrapper env/env_posix.cc:469
#20 0x7f733029abc8 in start_thread /home/engshare/third-party2/glibc/2.34/src/glibc-2.34/nptl/pthread_create.c:434
Thread T57 created by T0 here:
#0 0x7f7334642136 in __interceptor_pthread_create /home/engshare/third-party2/gcc/11.x/src/gcc-11.x/libsanitizer/asan/asan_interceptors.cpp:216
#1 0x7f73327cb008 in StartThread env/env_posix.cc:479
#2 0x7f733276b406 in rocksdb::CompositeEnvWrapper::StartThread(void (*)(void*), void*) env/composite_env_wrapper.h:316
#3 0x7f733276b406 in rocksdb::CompositeEnvWrapper::StartThread(void (*)(void*), void*) env/composite_env_wrapper.h:316
#4 0x6013d9 in rocksdb::RunStressTestImpl(rocksdb::SharedState*) db_stress_tool/db_stress_driver.cc:108
#5 0x603083 in rocksdb::RunStressTest(rocksdb::SharedState*) db_stress_tool/db_stress_driver.cc:248
#6 0x4e6ab3 in rocksdb::db_stress_tool(int, char**) db_stress_tool/db_stress_tool.cc:365
#7 0x4e260a in main db_stress_tool/db_stress.cc:23
#8 0x7f733022c656 in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
#9 0x7f733022c717 in __libc_start_main_impl ../csu/libc-start.c:409
#10 0x4e2530 in _start (/data/users/jewoongh/rocksdb/db_stress+0x4e2530)
```
After the change, the same command no longer reproduces the `heap-use-after-free`.
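The trace is consistent with a seek target that aliases the buffer being reallocated: the region read in `SetInternalKey` was freed by `EnlargeBuffer` inside the same `Seek` call, and it had been allocated by the earlier `SeekForPrev`. The following is a minimal sketch of the copy-first remedy, assuming that reading of the trace. `KeyBuffer` and `ReseekSafely` are hypothetical names for illustration, not RocksDB's `IterKey` or iterator API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <string_view>

// Hypothetical stand-in for a key buffer that reallocates when it grows.
class KeyBuffer {
 public:
  // Stores prefix + key. Growing releases the old allocation before copying,
  // so `key` must not point into this object's current buffer.
  void Set(std::string_view prefix, std::string_view key) {
    const std::size_t needed = prefix.size() + key.size();
    if (needed > capacity_) {
      buf_.reset(new char[needed]);  // the old buffer is freed here
      capacity_ = needed;
    }
    std::copy(prefix.begin(), prefix.end(), buf_.get());
    std::copy(key.begin(), key.end(), buf_.get() + prefix.size());
    size_ = needed;
  }

  // Non-owning view into the buffer; invalidated by the next growing Set().
  std::string_view View() const { return {buf_.get(), size_}; }

 private:
  std::unique_ptr<char[]> buf_;
  std::size_t capacity_ = 0;
  std::size_t size_ = 0;
};

// Copies the current key into an owned string before re-seeking, so the
// target stays valid even if Set() reallocates the buffer.
std::string ReseekSafely(KeyBuffer& buf, std::string_view prefix) {
  std::string target(buf.View());  // owned copy, independent of buf
  buf.Set(prefix, target);
  return std::string(buf.View());
}
```

Calling `buf.Set(prefix, buf.View())` directly would read from memory that `Set()` has just released whenever the buffer has to grow, which is the pattern the sanitizer reports flag.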
Reviewed By: pdillinger
Differential Revision: D58871081
Pulled By: jaykorean
fbshipit-source-id: 0194c34ffec5f16a6556c6bf3941a27253a4ecb4
2024-06-21 18:56:10 +00:00
      std::string target(key().data(), key().size());
MultiCFIterator Refactor - CoalescingIterator & AttributeGroupIterator (#12480)
Summary:
There are a couple of reasons to modify the current implementation of the MultiCfIterator, which implements the generic `Iterator` interface.
- The default behavior of `value()`/`columns()` returning data from different Column Families for different keys can be prone to errors, even though there might be valid use cases where users do not care about the origin of the value/columns.
- The `attribute_groups()` API, which is not yet implemented, will not be useful for a single-CF iterator.
In this PR, we are implementing the following changes:
- `IteratorBase` introduced, which includes all basic iterator functions except `value()` and `columns()`.
- `Iterator`, which now inherits from `IteratorBase`, includes `value()` and `columns()`.
- New public interface `AttributeGroupIterator` inherits from `IteratorBase` and additionally includes `attribute_groups()` (to be implemented).
- Renamed former `MultiCfIterator` to `CoalescingIterator` which inherits from `Iterator`
- Existing MultiCfIteratorTest has been split into two - `CoalescingIteratorTest` and `AttributeGroupIteratorTest`.
- Moved AttributeGroup related code from `wide_columns.h` to a new file, `attribute_groups.h`.
Some Implementation Details
- `MultiCfIteratorImpl` takes two functions, `populate_func` and `reset_func`, and uses them to populate `value_` and `columns_` in CoalescingIterator and `attribute_groups_` in AttributeGroupIterator. In CoalescingIterator, `populate_func` is `Coalesce()`; in AttributeGroupIterator, `populate_func` is `AddToAttributeGroups()`. `reset_func` clears the populated `value_`, `columns_` and `attribute_groups_` accordingly.
- `Coalesce()` merge-sorts columns from multiple CFs when a key exists in more than one CF. A column that appears in a later CF overwrites the same column from earlier CFs.
For example, if CF1 has `"key_1" ==> {"col_1": "foo", "col_2": "baz"}` and CF2 has `"key_1" ==> {"col_2": "quux", "col_3": "bla"}`, then when the iterator is at `key_1`, `columns()` will return `{"col_1": "foo", "col_2": "quux", "col_3": "bla"}`.
In this example, `value()` will be empty, because neither CF has a value for `kDefaultColumnName`.
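The overwrite rule in the example can be sketched as follows. This is not RocksDB's `Coalesce()`: it is a simplified illustration that uses `std::map` in place of a merge over sorted wide columns, and the names `Columns` and `CoalesceColumns` are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Column name -> column value for a single key in a single column family.
using Columns = std::map<std::string, std::string>;

// Merges the per-CF columns of one key. The vector order is the CF order,
// so a column from a later CF overwrites the same column from an earlier one.
Columns CoalesceColumns(const std::vector<Columns>& per_cf_columns) {
  Columns result;
  for (const Columns& columns : per_cf_columns) {
    for (const auto& [name, value] : columns) {
      result[name] = value;  // later CF wins on a name collision
    }
  }
  return result;
}
```

With CF1 and CF2 from the example passed in that order, the result holds `col_1`, `col_2` and `col_3`, with `col_2` taking CF2's value `quux`.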
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12480
Test Plan:
## Unit Test
```
./multi_cf_iterator_test
```
## Performance Test
To make sure this change does not impact existing `Iterator` performance
**Build**
```
$> make -j64 release
```
**Setup**
```
$> TEST_TMPDIR=/dev/shm/db_bench ./db_bench -benchmarks="filluniquerandom" -key_size=32 -value_size=512 -num=1000000 -compression_type=none
```
**Run**
```
TEST_TMPDIR=/dev/shm/db_bench ./db_bench -use_existing_db=1 -benchmarks="newiterator,seekrandom" -cache_size=10485760000
```
**Before the change**
```
DB path: [/dev/shm/db_bench/dbbench]
newiterator : 0.519 micros/op 1927904 ops/sec 0.519 seconds 1000000 operations;
DB path: [/dev/shm/db_bench/dbbench]
seekrandom : 5.302 micros/op 188589 ops/sec 5.303 seconds 1000000 operations; (0 of 1000000 found)
```
**After the change**
```
DB path: [/dev/shm/db_bench/dbbench]
newiterator : 0.497 micros/op 2011012 ops/sec 0.497 seconds 1000000 operations;
DB path: [/dev/shm/db_bench/dbbench]
seekrandom : 5.252 micros/op 190405 ops/sec 5.252 seconds 1000000 operations; (0 of 1000000 found)
```
Reviewed By: ltamasi
Differential Revision: D55353909
Pulled By: jaykorean
fbshipit-source-id: 8d7786ffee09e022261ce34aa60e8633685e1946
2024-04-11 18:34:04 +00:00
      InitMaxHeap();
      SeekForPrev(target);
    });
    AdvanceIterator(max_heap, [](Iterator* iter) { iter->Prev(); });
  }

 private:
  std::vector<std::pair<ColumnFamilyHandle*, std::unique_ptr<Iterator>>>
      cfh_iter_pairs_;
  Status status_;

  template <typename CompareOp>
  class MultiCfHeapItemComparator {
   public:
    explicit MultiCfHeapItemComparator(const Comparator* comparator)
        : comparator_(comparator) {}
    bool operator()(const MultiCfIteratorInfo& a,
                    const MultiCfIteratorInfo& b) const {
      assert(a.iterator);
      assert(b.iterator);
      assert(a.iterator->Valid());
      assert(b.iterator->Valid());
      int c = comparator_->Compare(a.iterator->key(), b.iterator->key());
      assert(c != 0 || a.order != b.order);
      return c == 0 ? a.order - b.order > 0 : CompareOp()(c, 0);
    }

   private:
    const Comparator* comparator_;
  };
  const Comparator* comparator_;
  using MultiCfMinHeap =
      BinaryHeap<MultiCfIteratorInfo,
                 MultiCfHeapItemComparator<std::greater<int>>>;
  using MultiCfMaxHeap = BinaryHeap<MultiCfIteratorInfo,
                                    MultiCfHeapItemComparator<std::less<int>>>;

  using MultiCfIterHeap = std::variant<MultiCfMinHeap, MultiCfMaxHeap>;

  MultiCfIterHeap heap_;

  std::function<void()> reset_func_;
2024-04-16 15:45:38 +00:00
  std::function<void(autovector<MultiCfIteratorInfo>)> populate_func_;
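How the `CompareOp` parameter turns one comparator into both heap orderings can be sketched with `std::priority_queue`. The sketch assumes that RocksDB's `BinaryHeap`, like `std::priority_queue`, keeps at the top the element that the comparator ranks last. `Item`, `ItemComparator`, `MinHeap` and `MaxHeap` are hypothetical stand-ins that compare plain strings rather than iterator keys.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <string>
#include <vector>

struct Item {
  std::string key;
  int order;  // position of the source in the input list
};

template <typename CompareOp>
struct ItemComparator {
  bool operator()(const Item& a, const Item& b) const {
    const int c = a.key.compare(b.key);
    // Equal keys: the item with the larger order ranks lower, so the item
    // with the smaller order reaches the top in both heap directions.
    return c == 0 ? a.order > b.order : CompareOp()(c, 0);
  }
};

template <typename CompareOp>
using ItemHeap =
    std::priority_queue<Item, std::vector<Item>, ItemComparator<CompareOp>>;

// std::greater<int> puts the smallest key on top (forward iteration);
// std::less<int> puts the largest key on top (backward iteration).
using MinHeap = ItemHeap<std::greater<int>>;
using MaxHeap = ItemHeap<std::less<int>>;
```

Because ties are broken on `order` independently of `CompareOp`, entries for the same key come off either heap in input order.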
  Iterator* current() const {
    if (std::holds_alternative<MultiCfMaxHeap>(heap_)) {
      auto& max_heap = std::get<MultiCfMaxHeap>(heap_);
      return max_heap.top().iterator;
    }
    auto& min_heap = std::get<MultiCfMinHeap>(heap_);
    return min_heap.top().iterator;
  }

  void considerStatus(Status s) {
    if (!s.ok() && status_.ok()) {
      status_ = std::move(s);
    }
  }

  template <typename HeapType, typename InitFunc>
  HeapType& GetHeap(InitFunc initFunc) {
    if (!std::holds_alternative<HeapType>(heap_)) {
      initFunc();
    }
    return std::get<HeapType>(heap_);
  }

  void InitMinHeap() {
    heap_.emplace<MultiCfMinHeap>(
        MultiCfHeapItemComparator<std::greater<int>>(comparator_));
  }
  void InitMaxHeap() {
    heap_.emplace<MultiCfMaxHeap>(
        MultiCfHeapItemComparator<std::less<int>>(comparator_));
  }
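`GetHeap` runs its init callback only when the `std::variant` does not already hold the requested heap type, so repeated moves in one direction reuse the existing heap. A minimal standalone sketch of that pattern follows; `Forward`, `Backward` and `DirectionState` are hypothetical names, and the switch counter exists only to make the behavior observable.

```cpp
#include <cassert>
#include <variant>

struct Forward {};   // stands in for the min-heap alternative
struct Backward {};  // stands in for the max-heap alternative

class DirectionState {
 public:
  Forward& GetForward() {
    return Get<Forward>([this]() {
      state_.emplace<Forward>();
      ++switches_;
    });
  }
  Backward& GetBackward() {
    return Get<Backward>([this]() {
      state_.emplace<Backward>();
      ++switches_;
    });
  }
  int switches() const { return switches_; }

 private:
  // Runs init_func only if T is not the active alternative; init_func is
  // expected to emplace T, otherwise std::get throws bad_variant_access.
  template <typename T, typename InitFunc>
  T& Get(InitFunc init_func) {
    if (!std::holds_alternative<T>(state_)) {
      init_func();
    }
    return std::get<T>(state_);
  }

  std::variant<Forward, Backward> state_;  // starts out holding Forward
  int switches_ = 0;
};
```

In this sketch, as in the listing, the callback is where the cost of a direction change is paid; the listing's `Prev()` callback calls `InitMaxHeap()` and then re-seeks.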

  template <typename BinaryHeap, typename ChildSeekFuncType>
  void SeekCommon(BinaryHeap& heap, ChildSeekFuncType child_seek_func) {
    reset_func_();
    heap.clear();
    int i = 0;
    for (auto& [cfh, iter] : cfh_iter_pairs_) {
      child_seek_func(iter.get());
      if (iter->Valid()) {
        assert(iter->status().ok());
        heap.push(MultiCfIteratorInfo{cfh, iter.get(), i});
**Run**
```
TEST_TMPDIR=/dev/shm/db_bench ./db_bench -use_existing_db=1 -benchmarks="newiterator,seekrandom" -cache_size=10485760000
```
**Before the change**
```
DB path: [/dev/shm/db_bench/dbbench]
newiterator : 0.519 micros/op 1927904 ops/sec 0.519 seconds 1000000 operations;
DB path: [/dev/shm/db_bench/dbbench]
seekrandom : 5.302 micros/op 188589 ops/sec 5.303 seconds 1000000 operations; (0 of 1000000 found)
```
**After the change**
```
DB path: [/dev/shm/db_bench/dbbench]
newiterator : 0.497 micros/op 2011012 ops/sec 0.497 seconds 1000000 operations;
DB path: [/dev/shm/db_bench/dbbench]
seekrandom : 5.252 micros/op 190405 ops/sec 5.252 seconds 1000000 operations; (0 of 1000000 found)
```
Reviewed By: ltamasi
Differential Revision: D55353909
Pulled By: jaykorean
fbshipit-source-id: 8d7786ffee09e022261ce34aa60e8633685e1946
2024-04-11 18:34:04 +00:00
|
|
|
      } else {
        considerStatus(iter->status());
        if (!status_.ok()) {
          // Non-OK status from the iterator. Bail out early
          heap.clear();
          break;
        }
      }
      ++i;
    }
    if (!heap.empty()) {
      PopulateIterator(heap);
    }
  }
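The seek routine above re-seeks every child iterator, pushes the valid ones onto a heap, and abandons the seek as soon as a child reports a non-OK status. The following is a minimal, self-contained sketch of that pattern, not the RocksDB implementation: `ToyIter`, `HeapItem`, `HeapItemGreater`, `MinHeap` and `SeekAll` are hypothetical names, `std::priority_queue` stands in for the `BinaryHeap` template parameter, and breaking key ties by column family order is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <vector>

// Hypothetical stand-in for a per-column-family child iterator.
struct ToyIter {
  std::vector<std::string> keys;  // assumed sorted ascending
  std::size_t pos = 0;
  bool ok = true;  // models status().ok()

  bool Valid() const { return ok && pos < keys.size(); }
  const std::string& key() const { return keys[pos]; }
  // Position at the first key >= target.
  void Seek(const std::string& target) {
    pos = 0;
    while (pos < keys.size() && keys[pos] < target) ++pos;
  }
};

struct HeapItem {
  ToyIter* iter;
  int order;  // index of the column family in the input list
};

// Orders the heap so that the smallest key is on top; ties are broken by
// column family order (an assumption made for this sketch).
struct HeapItemGreater {
  bool operator()(const HeapItem& a, const HeapItem& b) const {
    if (a.iter->key() != b.iter->key()) return a.iter->key() > b.iter->key();
    return a.order > b.order;
  }
};

using MinHeap =
    std::priority_queue<HeapItem, std::vector<HeapItem>, HeapItemGreater>;

// Seeks every child to `target` and collects the valid ones in `heap`.
// Returns false, with an empty heap, as soon as a child reports an error.
bool SeekAll(std::vector<ToyIter>& children, const std::string& target,
             MinHeap& heap) {
  heap = MinHeap();
  int i = 0;
  for (ToyIter& child : children) {
    child.Seek(target);
    if (child.Valid()) {
      heap.push(HeapItem{&child, i});
    } else if (!child.ok) {
      heap = MinHeap();  // bail out early on a non-OK child
      return false;
    }
    ++i;  // exhausted children are skipped but still consume an index
  }
  return true;
}
```

A child that is merely exhausted is left out of the heap without failing the seek; only an error status clears the heap, so the caller sees either a consistent merged view or none at all.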

  template <typename BinaryHeap, typename AdvanceFuncType>
  void AdvanceIterator(BinaryHeap& heap, AdvanceFuncType advance_func) {
    reset_func_();
    // It is possible for one or more child iters to be at invalid keys due to
    // manual prefix iteration. For such cases, we consider the result of the
    // multi-cf-iter to be undefined as well.
    // https://github.com/facebook/rocksdb/wiki/Prefix-Seek#manual-prefix-iterating
    // for details about manual prefix iteration
    if (heap.empty()) {
      return;
    }
    // 1. Keep the top iterator (by popping it from the heap)
    // 2. Make sure all others have iterated past the top iterator key slice
    // 3. Advance the top iterator, and add it back to the heap if valid
    auto top = heap.top();
    heap.pop();
    if (!heap.empty()) {
      auto current = heap.top();
      assert(current.iterator);
      while (current.iterator->Valid() &&
             comparator_->Compare(top.iterator->key(),
                                  current.iterator->key()) == 0) {
        assert(current.iterator->status().ok());
        advance_func(current.iterator);
        if (current.iterator->Valid()) {
          heap.replace_top(heap.top());
        } else {
          considerStatus(current.iterator->status());
          if (!status_.ok()) {
            heap.clear();
            return;
          } else {
            heap.pop();
          }
        }
        if (!heap.empty()) {
          current = heap.top();
        }
      }
    }
    advance_func(top.iterator);
    if (top.iterator->Valid()) {
      assert(top.iterator->status().ok());
      heap.push(top);
    } else {
      considerStatus(top.iterator->status());
      if (!status_.ok()) {
        heap.clear();
        return;
      }
    }

    if (!heap.empty()) {
      PopulateIterator(heap);
    }
  }
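The three steps in `AdvanceIterator` amount to one step of a k-way merge that skips duplicate keys: every child positioned at the current key moves forward before the next key is exposed. The following is a minimal sketch of that idea under simplifying assumptions, not the RocksDB code: `Cursor`, `CursorGreater`, `CursorHeap`, `AdvancePastTopKey` and `MergedKeys` are hypothetical names, each column family is modelled as a sorted vector of unique keys, error handling is omitted, and `std::priority_queue` (which has no `replace_top`) uses pop followed by push instead.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <vector>

// Hypothetical cursor over one column family's sorted, unique keys.
struct Cursor {
  const std::vector<std::string>* keys;
  std::size_t pos;
  int order;  // index of the column family

  bool Valid() const { return pos < keys->size(); }
  const std::string& key() const { return (*keys)[pos]; }
};

// Smallest key on top; ties broken by column family order (assumed).
struct CursorGreater {
  bool operator()(const Cursor& a, const Cursor& b) const {
    if (a.key() != b.key()) return a.key() > b.key();
    return a.order > b.order;
  }
};

using CursorHeap =
    std::priority_queue<Cursor, std::vector<Cursor>, CursorGreater>;

// One forward step of the merged view.
void AdvancePastTopKey(CursorHeap& heap) {
  if (heap.empty()) return;
  // 1. Keep the top cursor by popping it from the heap.
  Cursor top = heap.top();
  heap.pop();
  const std::string current_key = top.key();
  // 2. Move every other cursor that sits on the same key past it.
  while (!heap.empty() && heap.top().key() == current_key) {
    Cursor other = heap.top();
    heap.pop();
    ++other.pos;
    if (other.Valid()) heap.push(other);
  }
  // 3. Advance the top cursor and re-insert it if it is still valid.
  ++top.pos;
  if (top.Valid()) heap.push(top);
}

// Collects the distinct keys across all column families in sorted order.
std::vector<std::string> MergedKeys(
    const std::vector<std::vector<std::string>>& cfs) {
  CursorHeap heap;
  for (std::size_t i = 0; i < cfs.size(); ++i) {
    Cursor c{&cfs[i], 0, static_cast<int>(i)};
    if (c.Valid()) heap.push(c);
  }
  std::vector<std::string> out;
  while (!heap.empty()) {
    out.push_back(heap.top().key());
    AdvancePastTopKey(heap);
  }
  return out;
}
```

Because all cursors on the current key are advanced together, a key that exists in several column families is surfaced once per step rather than once per column family.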

  template <typename BinaryHeap>
  void PopulateIterator(BinaryHeap& heap) {
    // 1. Keep the top iterator (by popping it from the heap) and add it to the
    //    list to populate
    // 2. For all non-top iterators having the same key as the top iter popped
    //    in the previous step, add them to the same list and pop them
    //    temporarily from the heap
    // 3. Once no other iters have the same key as the top iter from step 1,
    //    populate the value/columns and attribute_groups from the list
    //    collected in steps 1 and 2 and add all the iters back to the heap
    assert(!heap.empty());
    auto top = heap.top();
    heap.pop();
    autovector<MultiCfIteratorInfo> to_populate;
    to_populate.push_back(top);
    if (!heap.empty()) {
      auto current = heap.top();
      assert(current.iterator);
      while (current.iterator->Valid() &&
             comparator_->Compare(top.iterator->key(),
                                  current.iterator->key()) == 0) {
        assert(current.iterator->status().ok());
        to_populate.push_back(current);
        heap.pop();
        if (!heap.empty()) {
          current = heap.top();
        } else {
          break;
        }
      }
    }
    // Add the items back to the heap
    for (auto& item : to_populate) {
      heap.push(item);
    }
    populate_func_(to_populate);
  }
};

}  // namespace ROCKSDB_NAMESPACE
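The pop, collect, and push-back pattern used above can be tried outside RocksDB. The following is a minimal standalone sketch, not the library's implementation: `Entry`, `EntryGreater`, `MinHeap`, and `CollectSameKey` are hypothetical names, plain `std::string` comparison stands in for `comparator_`, and `std::priority_queue` stands in for the heap type passed in as the template parameter. Breaking ties by column family index is also an assumption of the sketch.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <vector>

// Hypothetical stand-in for MultiCfIteratorInfo: the key a child iterator is
// currently positioned at, plus the index of its column family.
struct Entry {
  std::string key;
  int cf_index;
};

// Orders the heap so that the smallest key is on top. Ties are broken by
// column family index (an assumption of this sketch).
struct EntryGreater {
  bool operator()(const Entry& a, const Entry& b) const {
    if (a.key != b.key) return a.key > b.key;
    return a.cf_index > b.cf_index;
  }
};

using MinHeap = std::priority_queue<Entry, std::vector<Entry>, EntryGreater>;

// Returns every entry that shares the key at the top of the heap. All popped
// entries are pushed back, so the heap holds the same elements on return.
std::vector<Entry> CollectSameKey(MinHeap& heap) {
  assert(!heap.empty());
  Entry top = heap.top();
  heap.pop();
  std::vector<Entry> to_populate{top};
  // Keep popping while the next entry has the same key as the first one.
  while (!heap.empty() && heap.top().key == top.key) {
    to_populate.push_back(heap.top());
    heap.pop();
  }
  // Add the items back to the heap.
  for (const Entry& item : to_populate) {
    heap.push(item);
  }
  return to_populate;
}
```

In this sketch the caller receives the collected group and decides what to do with it, which loosely corresponds to handing `to_populate` to `populate_func_` in the code above.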