Update files for version 8.8 (#11878)

Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/11878

Reviewed By: ajkr

Differential Revision: D49568389

Pulled By: cbi42

fbshipit-source-id: b2022735799be9b5e81e03dfb418f8b104632ecf
Changyu Bi 2023-09-23 11:02:19 -07:00 committed by Facebook GitHub Bot
parent 3d67b5e8e5
commit 49da91ec09
37 changed files with 47 additions and 36 deletions

@@ -1,6 +1,51 @@
# Rocksdb Change Log
> NOTE: Entries for next release do not go here. Follow instructions in `unreleased_history/README.txt`
## 8.7.0 (09/22/2023)
### New Features
* Added an experimental new "automatic" variant of HyperClockCache that does not require a prior estimate of the average size of cache entries. This variant is activated when `HyperClockCacheOptions::estimated_entry_charge = 0` and has essentially the same concurrency benefits as the existing HyperClockCache.
* Add a new statistic `COMPACTION_CPU_TOTAL_TIME` that records cumulative compaction CPU time. This ticker is updated regularly while a compaction is running.
* Add `GetEntity()` API for ReadOnly DB and Secondary DB.
* Add a new iterator API `Iterator::Refresh(const Snapshot *)` that allows the iterator to be refreshed while reading from the given snapshot.
* Added a new read option `merge_operand_count_threshold`. When the number of merge operands applied during a successful point lookup exceeds this threshold, the query will return a special OK status with a new subcode `kMergeOperandThresholdExceeded`. Applications might use this signal to take action to reduce the number of merge operands for the affected key(s), for example by running a compaction (see the sketch after this list).
* For `NewRibbonFilterPolicy()`, made the `bloom_before_level` option mutable through the Configurable interface and the SetOptions API, allowing dynamic switching between all-Bloom and all-Ribbon configurations, and configurations in between. See comments on `NewRibbonFilterPolicy()`.
* RocksDB now allows the block cache to be stacked on top of a compressed secondary cache and a non-volatile secondary cache, thus creating a three-tier cache. To set it up, use the `NewTieredCache()` API in rocksdb/cache.h.
* Added a new wide-column aware full merge API called `FullMergeV3` to `MergeOperator`. `FullMergeV3` supports wide columns both as base value and merge result, which enables the application to perform more general transformations during merges. For backward compatibility, the default implementation implements the earlier logic of applying the merge operation to the default column of any wide-column entities. Specifically, if there is no base value or the base value is a plain key-value, the default implementation falls back to `FullMergeV2`. If the base value is a wide-column entity, the default implementation invokes `FullMergeV2` to perform the merge on the default column, and leaves any other columns unchanged.
* Add wide column support to ldb commands (scan, dump, idump, dump_wal) and the sst_dump tool's scan command.
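
To make the merge-operand threshold above concrete, here is a minimal sketch. The key name and threshold value are hypothetical, and the `std::optional`-style field assignment and the `Status::subcode()` checking idiom are assumptions to verify against the headers:

```cpp
#include <string>

#include "rocksdb/db.h"

// Sketch only: detect point lookups that apply too many merge operands.
void CheckMergeOperandPressure(rocksdb::DB* db) {
  rocksdb::ReadOptions read_opts;
  // Ask for the special OK subcode once a lookup applies more than 100
  // operands (field name per the changelog entry above).
  read_opts.merge_operand_count_threshold = 100;

  std::string value;
  rocksdb::Status s = db->Get(read_opts, "key_with_many_merges", &value);
  if (s.ok() &&
      s.subcode() ==
          rocksdb::Status::SubCode::kMergeOperandThresholdExceeded) {
    // Too many operands were applied; one possible reaction is compacting
    // the affected range, as the entry suggests.
    db->CompactRange(rocksdb::CompactRangeOptions(), nullptr, nullptr);
  }
}
```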
### Public API Changes
* Expose more information about input files used in table creation (if any) in `CompactionFilter::Context`. See `CompactionFilter::Context::input_start_level` and `CompactionFilter::Context::input_table_properties` for more details.
* The default value of `Options::compaction_readahead_size` is changed from 0 to 2MB.
* When using LZ4 compression, the `acceleration` parameter is configurable by setting the negated value in `CompressionOptions::level`. For example, `CompressionOptions::level=-10` will set `acceleration=10` (see the sketch after this list).
* The `NewTieredCache` API has been changed to take the total cache capacity (inclusive of both the primary and the compressed secondary cache) and the ratio of total capacity to allocate to the compressed cache. These are specified in `TieredCacheOptions`. Any capacity specified in `LRUCacheOptions`, `HyperClockCacheOptions` and `CompressedSecondaryCacheOptions` is ignored. A new API, `UpdateTieredCache`, is provided to dynamically update the total capacity, the compressed cache ratio, and the admission policy.
* The `NewTieredVolatileCache()` API in rocksdb/cache.h has been renamed to `NewTieredCache()`.
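
A minimal sketch of the LZ4 `acceleration` mapping described in this list (the option values are illustrative):

```cpp
#include "rocksdb/options.h"

// Sketch: per the entry above, a negative CompressionOptions::level is
// reinterpreted as LZ4 acceleration (faster compression, lower ratio).
rocksdb::Options MakeLz4AcceleratedOptions() {
  rocksdb::Options options;
  options.compression = rocksdb::kLZ4Compression;
  options.compression_opts.level = -10;  // selects acceleration = 10
  return options;
}
```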
### Behavior Changes
* Compaction read performance will regress when `Options::compaction_readahead_size` is explicitly set to 0.
* Universal size amp compaction will conditionally exclude some of the newest L0 files when selecting input, with a small negative impact on size amp. This is to prevent a large number of L0 files from being locked by a size amp compaction, which could potentially lead to a write stop after a few more flushes.
* Change ldb scan command delimiter from ':' to '==>'.
### Bug Fixes
* Fix a bug where if there is an error reading from offset 0 of a file from L1+, and the file is not the first file in the sorted run, data can be lost in compaction and reads/scans can return incorrect results.
* Fix a bug where an iterator may return incorrect results for DeleteRange() users if there was an error reading from a file.
* Fix a bug with atomic_flush=true that can cause the DB to become stuck after a flush fails (#11872).
* Fix a bug where RocksDB (with atomic_flush=false) can delete output SST files of pending flushes when a previous concurrent flush fails (#11865). This can result in the DB entering a read-only state with an error message like `IO error: No such file or directory: While open a file for random read: /tmp/rocksdbtest-501/db_flush_test_87732_4230653031040984171/000013.sst`.
* Fix an assertion failure during seek with async_io when readahead trimming is enabled.
* When the compressed secondary cache capacity is reduced to 0, it should be completely disabled. Before this fix, inserts and lookups would still go to the backing `LRUCache` before returning, thus incurring locking overhead. With this fix, inserts and lookups are no-ops and do not add any overhead.
* Updating the tiered cache (cache allocated using NewTieredCache()) by calling SetCapacity() on it was not working properly. The initial creation would set the primary cache capacity to the combined primary and compressed secondary cache capacity. But SetCapacity() would just set the primary cache capacity. With this fix, the user always specifies the total budget and compressed secondary cache ratio on creation. Subsequently, SetCapacity() will distribute the new capacity across the two caches by the same ratio.
* Fixed a bug in `MultiGet` where a SuperVersion acquired while holding the DB mutex was not cleaned up correctly.
* Fix a bug where the row cache can falsely return kNotFound even though the row cache entry is hit.
* Fixed a race condition in `GenericRateLimiter` that could cause it to stop granting requests.
* Fix a bug (Issue #10257) where the DB can hang after a write stall because no compaction is scheduled (#11764).
* Add a fix for async_io where, during seek, reading a block for seeking a target key in a file without any readahead would align the read to a page boundary and read more than necessary, increasing storage read bandwidth usage.
* Fix an issue in the sst_dump tool so that it handles bounds specified for data with user-defined timestamps.
* When auto_readahead_size is enabled, the readahead upper bound is now updated during readahead trimming when a reseek dynamically changes iterate_upper_bound.
* Fixed a bug where `rocksdb.file.read.verify.file.checksums.micros` was not populated.
### Performance Improvements
* Added additional improvements to tuning readahead_size during scans when auto_readahead_size is enabled. Note that it is not supported with the Iterator::Prev operation and will return a NotSupported error.
* During async_io, a seek happens in 2 phases. Phase 1 starts an asynchronous read on a block cache miss, and phase 2 waits for it to complete and finishes the seek. Both phases previously tried to look up the data block in the block cache before looking in the prefetch buffer. This is now optimized to do the block cache lookup only in the first phase, saving some CPU.
## 8.6.0 (08/18/2023)
### New Features
* Added enhanced data integrity checking on SST files with new format_version=6. Performance impact is very small or negligible. Previously if SST data was misplaced or re-arranged by the storage layer, it could pass block checksum with higher than 1 in 4 billion probability. With format_version=6, block checksums depend on what file they are in and location within the file. This way, misplaced SST data is no more likely to pass checksum verification than randomly corrupted data. Also in format_version=6, SST footers are checksum-protected.
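
For illustration, opting into the new format version is a one-line table options change. This is a sketch; note that files written with format_version=6 cannot be read by RocksDB releases that predate it:

```cpp
#include "rocksdb/options.h"
#include "rocksdb/table.h"

// Sketch: enable format_version=6, which ties block checksums to the file
// identity and location and checksum-protects SST footers, per the entry
// above.
rocksdb::Options MakeFormatVersion6Options() {
  rocksdb::BlockBasedTableOptions table_options;
  table_options.format_version = 6;

  rocksdb::Options options;
  options.table_factory.reset(
      rocksdb::NewBlockBasedTableFactory(table_options));
  return options;
}
```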

@@ -12,7 +12,7 @@
// NOTE: in 'main' development branch, this should be the *next*
// minor or major version number planned for release.
#define ROCKSDB_MAJOR 8
#define ROCKSDB_MINOR 7
#define ROCKSDB_MINOR 8
#define ROCKSDB_PATCH 0
// Do not use these. We made the mistake of declaring macros starting with

@@ -125,7 +125,7 @@ EOF
# To check for DB forward compatibility with loading options (old version
# reading data from new), as well as backward compatibility
declare -a db_forward_with_options_refs=("6.27.fb" "6.28.fb" "6.29.fb" "7.0.fb" "7.1.fb" "7.2.fb" "7.3.fb" "7.4.fb" "7.5.fb" "7.6.fb" "7.7.fb" "7.8.fb" "7.9.fb" "7.10.fb" "8.0.fb" "8.1.fb" "8.2.fb" "8.3.fb" "8.4.fb" "8.5.fb" "8.6.fb")
declare -a db_forward_with_options_refs=("6.27.fb" "6.28.fb" "6.29.fb" "7.0.fb" "7.1.fb" "7.2.fb" "7.3.fb" "7.4.fb" "7.5.fb" "7.6.fb" "7.7.fb" "7.8.fb" "7.9.fb" "7.10.fb" "8.0.fb" "8.1.fb" "8.2.fb" "8.3.fb" "8.4.fb" "8.5.fb" "8.6.fb" "8.7.fb")
# To check for DB forward compatibility without loading options (in addition
# to the "with loading options" set), as well as backward compatibility
declare -a db_forward_no_options_refs=() # N/A at the moment

@@ -1 +0,0 @@
Compaction read performance will regress when `Options::compaction_readahead_size` is explicitly set to 0.

@@ -1 +0,0 @@
Universal size amp compaction will conditionally exclude some of the newest L0 files when selecting input, with a small negative impact on size amp. This is to prevent a large number of L0 files from being locked by a size amp compaction, which could potentially lead to a write stop after a few more flushes.

@@ -1 +0,0 @@
Change ldb scan command delimiter from ':' to '==>'.

@@ -1 +0,0 @@
* Fix a bug where if there is an error reading from offset 0 of a file from L1+, and the file is not the first file in the sorted run, data can be lost in compaction and reads/scans can return incorrect results.

@@ -1 +0,0 @@
* Fix a bug where an iterator may return incorrect results for DeleteRange() users if there was an error reading from a file.

@@ -1 +0,0 @@
* Fix a bug with atomic_flush=true that can cause the DB to become stuck after a flush fails (#11872).

@@ -1 +0,0 @@
* Fix a bug where RocksDB (with atomic_flush=false) can delete output SST files of pending flushes when a previous concurrent flush fails (#11865). This can result in the DB entering a read-only state with an error message like `IO error: No such file or directory: While open a file for random read: /tmp/rocksdbtest-501/db_flush_test_87732_4230653031040984171/000013.sst`.

@@ -1 +0,0 @@
Fix an assertion failure during seek with async_io when readahead trimming is enabled.

@@ -1 +0,0 @@
When the compressed secondary cache capacity is reduced to 0, it should be completely disabled. Before this fix, inserts and lookups would still go to the backing `LRUCache` before returning, thus incurring locking overhead. With this fix, inserts and lookups are no-ops and do not add any overhead.

@@ -1 +0,0 @@
Updating the tiered cache (cache allocated using NewTieredCache()) by calling SetCapacity() on it was not working properly. The initial creation would set the primary cache capacity to the combined primary and compressed secondary cache capacity. But SetCapacity() would just set the primary cache capacity. With this fix, the user always specifies the total budget and compressed secondary cache ratio on creation. Subsequently, SetCapacity() will distribute the new capacity across the two caches by the same ratio.

@@ -1 +0,0 @@
Fixed a bug in `MultiGet` where a SuperVersion acquired while holding the DB mutex was not cleaned up correctly.

@@ -1 +0,0 @@
* Fix a bug where the row cache can falsely return kNotFound even though the row cache entry is hit.

@@ -1 +0,0 @@
Fixed a race condition in `GenericRateLimiter` that could cause it to stop granting requests.

@@ -1 +0,0 @@
* Fix a bug (Issue #10257) where the DB can hang after a write stall because no compaction is scheduled (#11764).

@@ -1 +0,0 @@
Add a fix for async_io where, during seek, reading a block for seeking a target key in a file without any readahead would align the read to a page boundary and read more than necessary, increasing storage read bandwidth usage.

@@ -1 +0,0 @@
Fix an issue in the sst_dump tool so that it handles bounds specified for data with user-defined timestamps.

@@ -1 +0,0 @@
* When auto_readahead_size is enabled, the readahead upper bound is now updated during readahead trimming when a reseek dynamically changes iterate_upper_bound.

@@ -1 +0,0 @@
Fixed a bug where `rocksdb.file.read.verify.file.checksums.micros` was not populated.

@@ -1 +0,0 @@
Added an experimental new "automatic" variant of HyperClockCache that does not require a prior estimate of the average size of cache entries. This variant is activated when `HyperClockCacheOptions::estimated_entry_charge = 0` and has essentially the same concurrency benefits as the existing HyperClockCache.
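
A sketch of enabling the automatic variant. The constructor argument order follows `HyperClockCacheOptions`; the capacity value is illustrative:

```cpp
#include <memory>

#include "rocksdb/cache.h"

// Sketch: estimated_entry_charge == 0 selects the new "automatic" variant,
// which needs no prior estimate of the average entry size.
std::shared_ptr<rocksdb::Cache> MakeAutoHyperClockCache() {
  rocksdb::HyperClockCacheOptions cache_opts(
      /*capacity=*/1024 * 1024 * 1024,
      /*estimated_entry_charge=*/0);
  return cache_opts.MakeSharedCache();
}
```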

@@ -1 +0,0 @@
* Add a new statistic `COMPACTION_CPU_TOTAL_TIME` that records cumulative compaction CPU time. This ticker is updated regularly while a compaction is running.
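
One way to observe the new ticker, as a sketch (assumes statistics were enabled at open time; check statistics.h for the reported unit):

```cpp
#include <cstdint>
#include <memory>

#include "rocksdb/options.h"
#include "rocksdb/statistics.h"

// Sketch: report the cumulative compaction CPU time recorded by the new
// COMPACTION_CPU_TOTAL_TIME ticker.
uint64_t CompactionCpuTotalTime(const rocksdb::Options& options) {
  // Assumes options.statistics was set, e.g. via CreateDBStatistics().
  std::shared_ptr<rocksdb::Statistics> stats = options.statistics;
  return stats ? stats->getTickerCount(rocksdb::COMPACTION_CPU_TOTAL_TIME)
               : 0;
}
```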

@@ -1 +0,0 @@
Add `GetEntity()` API for ReadOnly DB and Secondary DB.
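
A sketch of the read-only usage; the `GetEntity()` signature mirrors the primary-DB wide-column API, and the key is hypothetical:

```cpp
#include <iostream>
#include <string>

#include "rocksdb/db.h"
#include "rocksdb/wide_columns.h"

// Sketch: read a wide-column entity from a DB opened read-only.
void DumpEntityReadOnly(const std::string& db_path) {
  rocksdb::DB* db = nullptr;
  rocksdb::Options options;
  rocksdb::Status s = rocksdb::DB::OpenForReadOnly(options, db_path, &db);
  if (!s.ok()) return;

  rocksdb::PinnableWideColumns columns;
  s = db->GetEntity(rocksdb::ReadOptions(), db->DefaultColumnFamily(),
                    "some_key", &columns);
  if (s.ok()) {
    for (const rocksdb::WideColumn& column : columns.columns()) {
      std::cout << column.name().ToString() << " = "
                << column.value().ToString() << "\n";
    }
  }
  delete db;
}
```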

@@ -1 +0,0 @@
Add a new iterator API `Iterator::Refresh(const Snapshot *)` that allows the iterator to be refreshed while reading from the given snapshot.
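
A minimal sketch of the new overload, reusing one iterator across snapshots instead of recreating it:

```cpp
#include "rocksdb/db.h"
#include "rocksdb/iterator.h"

// Sketch: rebind an existing iterator to read as of a given snapshot.
void ScanAtSnapshot(rocksdb::DB* db) {
  rocksdb::Iterator* it = db->NewIterator(rocksdb::ReadOptions());

  const rocksdb::Snapshot* snap = db->GetSnapshot();
  rocksdb::Status s = it->Refresh(snap);  // new overload per the entry above
  if (s.ok()) {
    for (it->SeekToFirst(); it->Valid(); it->Next()) {
      // ... consume it->key() / it->value() as of `snap` ...
    }
  }
  db->ReleaseSnapshot(snap);
  delete it;
}
```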

@@ -1 +0,0 @@
Added a new read option `merge_operand_count_threshold`. When the number of merge operands applied during a successful point lookup exceeds this threshold, the query will return a special OK status with a new subcode `kMergeOperandThresholdExceeded`. Applications might use this signal to take action to reduce the number of merge operands for the affected key(s), for example by running a compaction.

@@ -1 +0,0 @@
For `NewRibbonFilterPolicy()`, made the `bloom_before_level` option mutable through the Configurable interface and the SetOptions API, allowing dynamic switching between all-Bloom and all-Ribbon configurations, and configurations in between. See comments on `NewRibbonFilterPolicy()`.
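
A sketch of starting hybrid and later switching to all-Ribbon via `SetOptions()`. The option-string syntax below is an assumption; the authoritative format is in the comments on `NewRibbonFilterPolicy()`:

```cpp
#include "rocksdb/db.h"
#include "rocksdb/filter_policy.h"
#include "rocksdb/table.h"

// Sketch: Bloom below level 1, Ribbon at level 1 and beyond.
rocksdb::Options MakeHybridRibbonOptions() {
  rocksdb::BlockBasedTableOptions table_options;
  table_options.filter_policy.reset(rocksdb::NewRibbonFilterPolicy(
      /*bloom_equivalent_bits_per_key=*/10, /*bloom_before_level=*/1));
  rocksdb::Options options;
  options.table_factory.reset(
      rocksdb::NewBlockBasedTableFactory(table_options));
  return options;
}

// Assumption: the option-string form accepted by SetOptions(); here -1 is
// meant to select Ribbon on all levels.
rocksdb::Status SwitchToAllRibbon(rocksdb::DB* db) {
  return db->SetOptions(
      {{"block_based_table_factory", "{filter_policy=ribbonfilter:10:-1;}"}});
}
```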

@@ -1 +0,0 @@
RocksDB now allows the block cache to be stacked on top of a compressed secondary cache and a non-volatile secondary cache, thus creating a three-tier cache. To set it up, use the `NewTieredCache()` API in rocksdb/cache.h.
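
A sketch of the three-tier setup. The `TieredCacheOptions` field names follow the 8.7 entries in this log, but treat the exact struct layout and enum names as assumptions to verify in rocksdb/cache.h:

```cpp
#include <memory>

#include "rocksdb/cache.h"
#include "rocksdb/secondary_cache.h"

// Sketch: block cache (HyperClockCache) over a compressed secondary cache
// over an optional non-volatile secondary cache.
std::shared_ptr<rocksdb::Cache> MakeThreeTierCache(
    std::shared_ptr<rocksdb::SecondaryCache> nvm_cache) {
  // Per the API-change entry, any capacity set here is ignored; the tiered
  // total_capacity below governs sizing.
  rocksdb::HyperClockCacheOptions primary_opts(/*capacity=*/0,
                                               /*estimated_entry_charge=*/0);
  rocksdb::TieredCacheOptions tiered_opts;
  tiered_opts.cache_opts = &primary_opts;
  tiered_opts.cache_type = rocksdb::PrimaryCacheType::kCacheTypeHCC;
  tiered_opts.total_capacity = 1024 * 1024 * 1024;  // primary + compressed
  tiered_opts.compressed_secondary_ratio = 0.25;    // 25% compressed tier
  tiered_opts.nvm_sec_cache = nvm_cache;            // optional third tier
  return rocksdb::NewTieredCache(tiered_opts);
}
```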

@@ -1 +0,0 @@
Added a new wide-column aware full merge API called `FullMergeV3` to `MergeOperator`. `FullMergeV3` supports wide columns both as base value and merge result, which enables the application to perform more general transformations during merges. For backward compatibility, the default implementation implements the earlier logic of applying the merge operation to the default column of any wide-column entities. Specifically, if there is no base value or the base value is a plain key-value, the default implementation falls back to `FullMergeV2`. If the base value is a wide-column entity, the default implementation invokes `FullMergeV2` to perform the merge on the default column, and leaves any other columns unchanged.

@@ -1 +0,0 @@
Add wide column support to ldb commands (scan, dump, idump, dump_wal) and the sst_dump tool's scan command.

@@ -1 +0,0 @@
Added additional improvements to tuning readahead_size during scans when auto_readahead_size is enabled. Note that it is not supported with the Iterator::Prev operation and will return a NotSupported error.
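
A sketch of opting a forward scan into the automatic tuning:

```cpp
#include "rocksdb/db.h"

// Sketch: enable automatic readahead_size tuning for a forward scan.
void ForwardScanWithAutoReadahead(rocksdb::DB* db) {
  rocksdb::ReadOptions read_opts;
  read_opts.auto_readahead_size = true;
  rocksdb::Iterator* it = db->NewIterator(read_opts);
  for (it->SeekToFirst(); it->Valid(); it->Next()) {
    // Forward iteration only: per the entry above, Prev() is not supported
    // with this option and returns a NotSupported error.
  }
  delete it;
}
```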

@@ -1 +0,0 @@
During async_io, a seek happens in 2 phases. Phase 1 starts an asynchronous read on a block cache miss, and phase 2 waits for it to complete and finishes the seek. Both phases previously tried to look up the data block in the block cache before looking in the prefetch buffer. This is now optimized to do the block cache lookup only in the first phase, saving some CPU.

@@ -1 +0,0 @@
Expose more information about input files used in table creation (if any) in `CompactionFilter::Context`. See `CompactionFilter::Context::input_start_level` and `CompactionFilter::Context::input_table_properties` for more details.
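
A sketch of a factory that inspects the new context fields. The field names come from the entry above; their exact types, and returning a null filter to mean "no filtering", are assumptions to verify in compaction_filter.h:

```cpp
#include <memory>

#include "rocksdb/compaction_filter.h"

// Sketch: choose per-compaction behavior from the new context fields.
class ContextAwareFilterFactory : public rocksdb::CompactionFilterFactory {
 public:
  std::unique_ptr<rocksdb::CompactionFilter> CreateCompactionFilter(
      const rocksdb::CompactionFilter::Context& context) override {
    // New in 8.7: the level where the compaction input starts, and the
    // table properties of the input files (if any).
    if (context.input_start_level == 0 &&
        !context.input_table_properties.empty()) {
      // ... e.g. pick a filtering strategy based on the input files ...
    }
    return nullptr;  // assumed to mean: no filtering for this compaction
  }
  const char* Name() const override { return "ContextAwareFilterFactory"; }
};
```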

@@ -1 +0,0 @@
The default value of `Options::compaction_readahead_size` is changed from 0 to 2MB.

@@ -1 +0,0 @@
* When using LZ4 compression, the `acceleration` parameter is configurable by setting the negated value in `CompressionOptions::level`. For example, `CompressionOptions::level=-10` will set `acceleration=10`.

@@ -1 +0,0 @@
The `NewTieredCache` API has been changed to take the total cache capacity (inclusive of both the primary and the compressed secondary cache) and the ratio of total capacity to allocate to the compressed cache. These are specified in `TieredCacheOptions`. Any capacity specified in `LRUCacheOptions`, `HyperClockCacheOptions` and `CompressedSecondaryCacheOptions` is ignored. A new API, `UpdateTieredCache`, is provided to dynamically update the total capacity, the compressed cache ratio, and the admission policy.
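
A sketch of the dynamic update. `UpdateTieredCache()` is named in the entry above, but its exact signature is an assumption to check in rocksdb/cache.h; the capacity and ratio values are illustrative:

```cpp
#include <memory>

#include "rocksdb/cache.h"

// Sketch: raise the total budget to 2GB and give 30% of it to the
// compressed secondary tier; per the entry above, the new capacity is
// distributed across the tiers by this ratio.
void GrowTieredCache(const std::shared_ptr<rocksdb::Cache>& tiered_cache) {
  rocksdb::UpdateTieredCache(tiered_cache,
                             /*total_capacity=*/2LL * 1024 * 1024 * 1024,
                             /*compressed_secondary_ratio=*/0.3);
}
```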

@@ -1 +0,0 @@
The `NewTieredVolatileCache()` API in rocksdb/cache.h has been renamed to `NewTieredCache()`.