Jay Huh 58a98bded9 MultiCFIterator Refactor - CoalescingIterator & AttributeGroupIterator (#12480)
Summary:
There are a couple of reasons to modify the current implementation of the MultiCfIterator, which implements the generic `Iterator` interface.
- The default behavior of `value()`/`columns()` returning data from different Column Families for different keys can be prone to errors, even though there might be valid use cases where users do not care about the origin of the value/columns.
- The `attribute_groups()` API, which is not yet implemented, will not be useful for a single-CF iterator.

In this PR, we are implementing the following changes:
- `IteratorBase` introduced, which includes all basic iterator functions except `value()` and `columns()`.
- `Iterator`, which now inherits from `IteratorBase`, includes `value()` and `columns()`.
- New public interface `AttributeGroupIterator` inherits from `IteratorBase` and additionally includes `attribute_groups()` (to be implemented).
- Renamed the former `MultiCfIterator` to `CoalescingIterator`, which inherits from `Iterator`.
- The existing `MultiCfIteratorTest` has been split into two: `CoalescingIteratorTest` and `AttributeGroupIteratorTest`.
- Moved AttributeGroup related code from `wide_columns.h` to a new file, `attribute_groups.h`.
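
The hierarchy described above can be pictured roughly as follows. This is only a simplified sketch, not the actual RocksDB headers: every name ending in `Sketch`, the `CountKeys` helper, and the use of `std::string` in place of the real return types are assumptions made for illustration.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for IteratorBase: positioning and key access only.
class IteratorBaseSketch {
 public:
  virtual ~IteratorBaseSketch() = default;
  virtual bool Valid() const = 0;
  virtual void SeekToFirst() = 0;
  virtual void Next() = 0;
  virtual std::string key() const = 0;
};

// Hypothetical stand-in for Iterator: adds value access on top of the base.
class IteratorSketch : public IteratorBaseSketch {
 public:
  virtual std::string value() const = 0;
};

// Hypothetical stand-in for AttributeGroupIterator: no value(), only
// per-group data (modelled here as a plain list of strings).
class AttributeGroupIteratorSketch : public IteratorBaseSketch {
 public:
  virtual std::vector<std::string> attribute_groups() const = 0;
};

// Code that only moves the iterator can accept the base type, so it works
// with either derived interface.
size_t CountKeys(IteratorBaseSketch& it) {
  size_t n = 0;
  for (it.SeekToFirst(); it.Valid(); it.Next()) {
    ++n;
  }
  return n;
}

// Toy implementation over an in-memory list of (key, value) pairs.
class VectorIteratorSketch : public IteratorSketch {
 public:
  explicit VectorIteratorSketch(
      std::vector<std::pair<std::string, std::string>> entries)
      : entries_(std::move(entries)), pos_(entries_.size()) {}

  bool Valid() const override { return pos_ < entries_.size(); }
  void SeekToFirst() override { pos_ = 0; }
  void Next() override { ++pos_; }
  std::string key() const override { return entries_[pos_].first; }
  std::string value() const override { return entries_[pos_].second; }

 private:
  std::vector<std::pair<std::string, std::string>> entries_;
  size_t pos_;  // starts past the end, i.e. not valid until a seek
};
```

Keeping `value()` and `columns()` out of the base means a caller holding an attribute-group iterator cannot accidentally ask for a single coalesced value.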

Some Implementation Details
- `MultiCfIteratorImpl` takes two functions, `populate_func` and `reset_func`, and uses them to populate `value_` and `columns_` in `CoalescingIterator` and `attribute_groups_` in `AttributeGroupIterator`. In `CoalescingIterator`, `populate_func` is `Coalesce()`; in `AttributeGroupIterator`, `populate_func` is `AddToAttributeGroups()`. `reset_func` clears the populated `value_`, `columns_` and `attribute_groups_` accordingly.
- `Coalesce()` merge-sorts columns from multiple CFs when a key exists in more than one CF. A column that appears in a later CF overwrites the prior ones.

For example, if CF1 has `"key_1" ==> {"col_1": "foo", "col_2": "baz"}` and CF2 has `"key_1" ==> {"col_2": "quux", "col_3": "bla"}`, then when the iterator is at `key_1`, `columns()` will return `{"col_1": "foo", "col_2": "quux", "col_3": "bla"}`.

In this example, `value()` will be empty, because neither CF has a value for `kDefaultColumnName`.
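
The coalescing rule in the example can be sketched as a repeated two-way merge. This is a minimal illustration and not the `Coalesce()` implementation from this PR: `CoalesceSketch`, `Column` and `Columns` are hypothetical names, and columns are modelled as `(name, value)` string pairs that are assumed to be sorted by name within each CF.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Column = std::pair<std::string, std::string>;  // {name, value}
using Columns = std::vector<Column>;                 // sorted by name

// Merges per-CF column lists in CF order. When the same column name appears
// in more than one CF, the column from the later CF replaces the earlier one.
Columns CoalesceSketch(const std::vector<Columns>& per_cf) {
  Columns result;
  for (const Columns& cf_columns : per_cf) {
    // Assumption of this sketch: each input is already sorted by name.
    assert(std::is_sorted(cf_columns.begin(), cf_columns.end()));
    Columns merged;
    merged.reserve(result.size() + cf_columns.size());
    size_t i = 0;
    size_t j = 0;
    while (i < result.size() && j < cf_columns.size()) {
      if (result[i].first < cf_columns[j].first) {
        merged.push_back(result[i++]);
      } else if (cf_columns[j].first < result[i].first) {
        merged.push_back(cf_columns[j++]);
      } else {
        // Same column name: keep the later CF's column, drop the earlier.
        merged.push_back(cf_columns[j++]);
        ++i;
      }
    }
    while (i < result.size()) merged.push_back(result[i++]);
    while (j < cf_columns.size()) merged.push_back(cf_columns[j++]);
    result = std::move(merged);
  }
  return result;
}
```

Called with the CF1 and CF2 columns from the example, in that order, the sketch yields `col_1: foo`, `col_2: quux`, `col_3: bla`.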

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12480

Test Plan:
## Unit Test
```
./multi_cf_iterator_test
```

## Performance Test

This test checks that the change does not impact existing `Iterator` performance.

**Build**
```
$> make -j64 release
```
**Setup**
```
$> TEST_TMPDIR=/dev/shm/db_bench ./db_bench -benchmarks="filluniquerandom" -key_size=32 -value_size=512 -num=1000000 -compression_type=none
```
**Run**
```
TEST_TMPDIR=/dev/shm/db_bench ./db_bench -use_existing_db=1 -benchmarks="newiterator,seekrandom" -cache_size=10485760000
```

**Before the change**
```
DB path: [/dev/shm/db_bench/dbbench]
newiterator  :       0.519 micros/op 1927904 ops/sec 0.519 seconds 1000000 operations;
DB path: [/dev/shm/db_bench/dbbench]
seekrandom   :       5.302 micros/op 188589 ops/sec 5.303 seconds 1000000 operations; (0 of 1000000 found)
```
**After the change**
```
DB path: [/dev/shm/db_bench/dbbench]
newiterator  :       0.497 micros/op 2011012 ops/sec 0.497 seconds 1000000 operations;
DB path: [/dev/shm/db_bench/dbbench]
seekrandom   :       5.252 micros/op 190405 ops/sec 5.252 seconds 1000000 operations; (0 of 1000000 found)
```

Reviewed By: ltamasi

Differential Revision: D55353909

Pulled By: jaykorean

fbshipit-source-id: 8d7786ffee09e022261ce34aa60e8633685e1946
2024-04-11 11:34:04 -07:00
.circleci Enable io_uring in stress test (#12313) 2024-01-31 12:37:42 -08:00
.github Fix windows build and CI (#12426) 2024-03-14 12:04:41 -07:00
buckifier Error out in case of std errors in blackbox test and export file in TARGETS 2023-10-24 11:46:18 -07:00
build_tools switch to using centos8-native (#12367) 2024-02-21 12:03:40 -08:00
cache Remove extra semi colon from icsp/lib/logging/IcspLogRpcMessage.cpp 2024-03-31 10:26:34 -07:00
cmake Fix zstd typo in cmake (#12309) 2024-02-22 14:39:05 -08:00
coverage Remove platform009 and default to platform010 (#11333) 2023-03-30 09:56:37 -07:00
db MultiCFIterator Refactor - CoalescingIterator & AttributeGroupIterator (#12480) 2024-04-11 11:34:04 -07:00
db_stress_tool Fix "assertion failed - iter != ROCKSDB_NAMESPACE::OptionsHelper::temperature_to_string.end()" (#12519) 2024-04-08 13:45:41 -07:00
docs Java FFI blog post - Post-publication issues with images (2) (#12372) 2024-02-22 15:01:55 -08:00
env Change ReadAsync callback API to remove const from FSReadRequest (#11649) 2024-02-16 09:14:55 -08:00
examples Prefer static_cast in place of most reinterpret_cast (#12308) 2024-02-07 10:44:11 -08:00
file Fix/improve temperature handling for file ingestion (#12402) 2024-03-05 16:56:08 -08:00
fuzz Block per key-value checksum (#11287) 2023-04-25 12:08:23 -07:00
include/rocksdb MultiCFIterator Refactor - CoalescingIterator & AttributeGroupIterator (#12480) 2024-04-11 11:34:04 -07:00
java Fix exception on RocksDB.getColumnFamilyMetaData() (#12474) 2024-04-05 13:55:18 -07:00
logging Fix data race in AutoRollLogger (#12436) 2024-03-14 14:28:33 -07:00
memory Replace ScopedArenaIterator with ScopedArenaPtr<InternalIterator> (#12470) 2024-03-22 13:40:42 -07:00
memtable Prefer static_cast in place of most reinterpret_cast (#12308) 2024-02-07 10:44:11 -08:00
microbench internal_repo_rocksdb (-8794174668376270091) (#12114) 2023-12-01 11:10:30 -08:00
monitoring New PerfContext counters for block cache bytes read (#12459) 2024-03-21 10:46:46 -07:00
options Add more missing options in crash test (#12508) 2024-04-08 09:48:03 -07:00
plugin Add initial CMake support to plugin (#9214) 2021-11-30 17:16:53 -08:00
port Run internal cpp modernizer on RocksDB repo (#12398) 2024-03-04 10:08:32 -08:00
table Add support in SstFileReader to get a raw table iterator (#12385) 2024-04-02 21:23:06 -07:00
test_util Run internal cpp modernizer on RocksDB repo (#12398) 2024-03-04 10:08:32 -08:00
third-party fix optimization-disabled test builds with platform010 (#11361) 2023-04-10 13:59:44 -07:00
tools Temporarily disable `kAdmPolicyThreeQueue` in crash test (#12521) 2024-04-09 16:15:54 -07:00
trace_replay Remove 'virtual' when implied by 'override' (#12319) 2024-01-31 13:14:42 -08:00
unreleased_history Add support in SstFileReader to get a raw table iterator (#12385) 2024-04-02 21:23:06 -07:00
util Make autovector call default constructor explicitly before move/copy (#12499) 2024-04-04 12:33:05 -07:00
utilities Add GetLiveFilesStorageInfo to legacy BlobDB (#12468) 2024-04-05 13:50:27 -07:00
.clang-format
.gitignore Add .arcconfig to .gitignore (fb internal use) (#11803) 2023-09-07 14:57:39 -07:00
.lgtm.yml Create lgtm.yml for LGTM.com C/C++ analysis (#4058) 2018-06-26 12:43:04 -07:00
.watchmanconfig Added .watchmanconfig file to rocksdb repo (#5593) 2019-07-19 15:00:33 -07:00
AUTHORS Update RocksDB Authors File 2017-10-18 14:42:10 -07:00
CMakeLists.txt MultiCFIterator Refactor - CoalescingIterator & AttributeGroupIterator (#12480) 2024-04-11 11:34:04 -07:00
CODE_OF_CONDUCT.md Adopt Contributor Covenant 2019-08-29 23:21:01 -07:00
CONTRIBUTING.md Add Code of Conduct 2017-12-05 18:42:35 -08:00
COPYING Add GPLv2 as an alternative license. 2017-04-27 18:06:12 -07:00
DEFAULT_OPTIONS_HISTORY.md Add Options::DisableExtraChecks, clarify force_consistency_checks (#9363) 2022-01-18 17:31:03 -08:00
DUMP_FORMAT.md
HISTORY.md Branch cut 9.1.fb (#12476) 2024-03-25 15:07:43 -07:00
INSTALL.md fix out of date macos instructions in INSTALL.md (#12393) 2024-02-28 12:38:15 -08:00
LANGUAGE-BINDINGS.md Add grocksdb in Go language bindings (#10498) 2022-08-23 15:02:10 -07:00
LICENSE.Apache Change RocksDB License 2017-07-15 16:11:23 -07:00
LICENSE.leveldb Add back the LevelDB license file 2017-07-16 18:42:18 -07:00
Makefile Add support in SstFileReader to get a raw table iterator (#12385) 2024-04-02 21:23:06 -07:00
PLUGINS.md Add encfs plugin link (#12070) 2023-11-14 07:33:21 -08:00
README.md Remove deprecated integration tests from README.md (#11354) 2023-04-07 16:52:50 -07:00
TARGETS MultiCFIterator Refactor - CoalescingIterator & AttributeGroupIterator (#12480) 2024-04-11 11:34:04 -07:00
USERS.md Add Qdrant to USERS.md (#12072) 2023-11-16 10:35:08 -08:00
Vagrantfile Adding CentOS 7 Vagrantfile & build script 2018-02-26 15:27:17 -08:00
WINDOWS_PORT.md Update branch name in WINDOWS_PORT.md (#8745) 2021-09-01 19:26:39 -07:00
common.mk Clean up variables for temporary directory (#9961) 2022-05-06 16:38:06 -07:00
crash_test.mk Stress/Crash Test for OptimisticTransactionDB (#11513) 2023-06-17 16:27:37 -07:00
issue_template.md Add Google Group to Issue Template 2020-01-28 14:40:37 -08:00
rocksdb.pc.in build: fix pkg-config file generation (#9953) 2022-05-30 12:46:40 -07:00
src.mk MultiCFIterator Refactor - CoalescingIterator & AttributeGroupIterator (#12480) 2024-04-11 11:34:04 -07:00
thirdparty.inc Fix build jemalloc api (#5470) 2019-06-24 17:40:32 -07:00

README.md

RocksDB: A Persistent Key-Value Store for Flash and RAM Storage


RocksDB is developed and maintained by the Facebook Database Engineering Team. It is built on earlier work on LevelDB by Sanjay Ghemawat (sanjay@google.com) and Jeff Dean (jeff@google.com).

This code is a library that forms the core building block for a fast key-value server, especially suited for storing data on flash drives. It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF). It has multi-threaded compactions, making it especially suitable for storing multiple terabytes of data in a single database.

Start with example usage here: https://github.com/facebook/rocksdb/tree/main/examples

See the GitHub wiki for more explanation.

The public interface is in include/. Callers should not include or rely on the details of any other header files in this package. Those internal APIs may be changed without warning.

Questions and discussions are welcome on the RocksDB Developers Public Facebook group and email list on Google Groups.

License

RocksDB is dual-licensed under both the GPLv2 (found in the COPYING file in the root directory) and Apache 2.0 License (found in the LICENSE.Apache file in the root directory). You may select, at your option, one of the above-listed licenses.