Find a file
Andrew Kryczka a43481b3d0 Decouple RateLimiter burst size and refill period (#12379)
Summary:
When the rate limiter does not have any waiting requests, the first request to arrive may consume all of the available bandwidth, despite potentially having lower priority than requests that arrive later in the same refill interval. Then, those higher priority requests must wait for a refill. So even in scenarios in which we have an overall bandwidth surplus, the highest priority requests can be sporadically delayed up to a whole refill period.

Alone, this isn't necessarily problematic as the refill period is configurable via `refill_period_us` and can be tuned down as needed until the max sporadic delay is tolerable. However, tuning down `refill_period_us` had a side effect of reducing burst size. Some users require a certain burst size to issue optimal I/O sizes to the underlying storage system.

To satisfy those users, this PR decouples the refill period from the burst size. That way, the max sporadic delay can be limited without impacting I/O sizes issued to the underlying storage system.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12379

Test Plan:
The goal is to show we can now limit the max sporadic delay without impacting compaction's I/O size.

The benchmark runs compaction with a large I/O size, while user reads simultaneously run at a low rate that does not consume all of the available bandwidth. The max sporadic delay is measured using the P100 of rocksdb.file.read.get.micros. I just used strace to verify the compaction reads follow `rate_limiter_single_burst_bytes`

Setup: `./db_bench -benchmarks=fillrandom,flush -write_buffer_size=67108864 -disable_auto_compactions=true -value_size=256 -num=1048576`

Benchmark: `./db_bench -benchmarks=readrandom -use_existing_db=true -num=1048576 -duration=10 -benchmark_read_rate_limit=4096 -rate_limiter_bytes_per_sec=67108864 -rate_limiter_refill_period_us=$refill_micros -rate_limiter_single_burst_bytes=16777216 -rate_limit_bg_reads=true -rate_limit_user_ops=true -statistics=true -cache_size=0 -stats_level=5 -compaction_readahead_size=16777216 -use_direct_reads=true`

Results:

refill_micros | rocksdb.file.read.get.micros (P100)
-- | --
10000 | 10802
100000 | 100240
1000000 | 922061

For verifying compaction read sizes: `strace -fye pread64 ./db_bench -benchmarks=compact -use_existing_db=true -rate_limiter_bytes_per_sec=67108864 -rate_limiter_refill_period_us=$refill_micros -rate_limiter_single_burst_bytes=16777216 -rate_limit_bg_reads=true -compaction_readahead_size=16777216 -use_direct_reads=true`

Reviewed By: hx235

Differential Revision: D54165675

Pulled By: ajkr

fbshipit-source-id: c5968486316cbfb7ff8e5b7d75d3589883dd1105
2024-02-26 16:55:13 -08:00
.circleci Enable io_uring in stress test (#12313) 2024-01-31 12:37:42 -08:00
.github Enable io_uring in stress test (#12313) 2024-01-31 12:37:42 -08:00
buckifier Error out in case of std errors in blackbox test and export file in TARGETS 2023-10-24 11:46:18 -07:00
build_tools switch to using centos8-native (#12367) 2024-02-21 12:03:40 -08:00
cache Prefer static_cast in place of most reinterpret_cast (#12308) 2024-02-07 10:44:11 -08:00
cmake Fix zstd typo in cmake (#12309) 2024-02-22 14:39:05 -08:00
coverage Remove platform009 and default to platform010 (#11333) 2023-03-30 09:56:37 -07:00
db Add a perf level for measuring user thread block time (#12368) 2024-02-22 12:14:53 -08:00
db_stress_tool No filesystem reads during Merge() writes (#12365) 2024-02-21 13:15:27 -08:00
docs Java FFI blog post - Post-publication issues with images (2) (#12372) 2024-02-22 15:01:55 -08:00
env Change ReadAsync callback API to remove const from FSReadRequest (#11649) 2024-02-16 09:14:55 -08:00
examples Prefer static_cast in place of most reinterpret_cast (#12308) 2024-02-07 10:44:11 -08:00
file Change ReadAsync callback API to remove const from FSReadRequest (#11649) 2024-02-16 09:14:55 -08:00
fuzz Block per key-value checksum (#11287) 2023-04-25 12:08:23 -07:00
include/rocksdb Decouple RateLimiter burst size and refill period (#12379) 2024-02-26 16:55:13 -08:00
java Persist table options use_delta_encoding in options file (#11987) 2024-02-23 14:13:28 -08:00
logging Remove 'virtual' when implied by 'override' (#12319) 2024-01-31 13:14:42 -08:00
memory Fix ArenaTest.UnmappedAllocation in some cases (#12378) 2024-02-26 16:08:21 -08:00
memtable Prefer static_cast in place of most reinterpret_cast (#12308) 2024-02-07 10:44:11 -08:00
microbench internal_repo_rocksdb (-8794174668376270091) (#12114) 2023-12-01 11:10:30 -08:00
monitoring Add a perf level for measuring user thread block time (#12368) 2024-02-22 12:14:53 -08:00
options Persist table options use_delta_encoding in options file (#11987) 2024-02-23 14:13:28 -08:00
plugin Add initial CMake support to plugin (#9214) 2021-11-30 17:16:53 -08:00
port Prefer static_cast in place of most reinterpret_cast (#12308) 2024-02-07 10:44:11 -08:00
table Persist table options use_delta_encoding in options file (#11987) 2024-02-23 14:13:28 -08:00
test_util No filesystem reads during Merge() writes (#12365) 2024-02-21 13:15:27 -08:00
third-party fix optimization-disabled test builds with platform010 (#11361) 2023-04-10 13:59:44 -07:00
tools Decouple RateLimiter burst size and refill period (#12379) 2024-02-26 16:55:13 -08:00
trace_replay Remove 'virtual' when implied by 'override' (#12319) 2024-01-31 13:14:42 -08:00
unreleased_history Decouple RateLimiter burst size and refill period (#12379) 2024-02-26 16:55:13 -08:00
util Decouple RateLimiter burst size and refill period (#12379) 2024-02-26 16:55:13 -08:00
utilities Decouple RateLimiter burst size and refill period (#12379) 2024-02-26 16:55:13 -08:00
.clang-format
.gitignore Add .arcconfig to .gitignore (fb internal use) (#11803) 2023-09-07 14:57:39 -07:00
.lgtm.yml Create lgtm.yml for LGTM.com C/C++ analysis (#4058) 2018-06-26 12:43:04 -07:00
.watchmanconfig Added .watchmanconfig file to rocksdb repo (#5593) 2019-07-19 15:00:33 -07:00
AUTHORS Update RocksDB Authors File 2017-10-18 14:42:10 -07:00
CMakeLists.txt Fix typo in CMakeList. (#12247) 2024-01-25 12:35:27 -08:00
CODE_OF_CONDUCT.md Adopt Contributor Covenant 2019-08-29 23:21:01 -07:00
common.mk Clean up variables for temporary directory (#9961) 2022-05-06 16:38:06 -07:00
CONTRIBUTING.md Add Code of Conduct 2017-12-05 18:42:35 -08:00
COPYING Add GPLv2 as an alternative license. 2017-04-27 18:06:12 -07:00
crash_test.mk Stress/Crash Test for OptimisticTransactionDB (#11513) 2023-06-17 16:27:37 -07:00
DEFAULT_OPTIONS_HISTORY.md Add Options::DisableExtraChecks, clarify force_consistency_checks (#9363) 2022-01-18 17:31:03 -08:00
DUMP_FORMAT.md
HISTORY.md Start 9.1.0 release (#12360) 2024-02-16 18:26:48 -08:00
INSTALL.md Fix build on OpenBSD i386 (#12142) 2024-01-29 16:19:59 -08:00
issue_template.md Add Google Group to Issue Template 2020-01-28 14:40:37 -08:00
LANGUAGE-BINDINGS.md Add grocksdb in Go language bindings (#10498) 2022-08-23 15:02:10 -07:00
LICENSE.Apache Change RocksDB License 2017-07-15 16:11:23 -07:00
LICENSE.leveldb Add back the LevelDB license file 2017-07-16 18:42:18 -07:00
Makefile Update ZLib to 1.3.1 (#12358) 2024-02-16 10:26:32 -08:00
PLUGINS.md Add encfs plugin link (#12070) 2023-11-14 07:33:21 -08:00
README.md Remove deprecated integration tests from README.md (#11354) 2023-04-07 16:52:50 -07:00
rocksdb.pc.in build: fix pkg-config file generation (#9953) 2022-05-30 12:46:40 -07:00
src.mk Add native logger support to RocksJava (#12213) 2024-01-17 17:51:36 -08:00
TARGETS Make OffpeakTimeInfo available in VersionSet (#12018) 2023-10-27 15:56:48 -07:00
thirdparty.inc Fix build jemalloc api (#5470) 2019-06-24 17:40:32 -07:00
USERS.md Add Qdrant to USERS.md (#12072) 2023-11-16 10:35:08 -08:00
Vagrantfile Adding CentOS 7 Vagrantfile & build script 2018-02-26 15:27:17 -08:00
WINDOWS_PORT.md Update branch name in WINDOWS_PORT.md (#8745) 2021-09-01 19:26:39 -07:00

RocksDB: A Persistent Key-Value Store for Flash and RAM Storage

CircleCI Status

RocksDB is developed and maintained by Facebook Database Engineering Team. It is built on earlier work on LevelDB by Sanjay Ghemawat (sanjay@google.com) and Jeff Dean (jeff@google.com)

This code is a library that forms the core building block for a fast key-value server, especially suited for storing data on flash drives. It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF). It has multi-threaded compactions, making it especially suitable for storing multiple terabytes of data in a single database.

Start with example usage here: https://github.com/facebook/rocksdb/tree/main/examples

See the github wiki for more explanation.

The public interface is in include/. Callers should not include or rely on the details of any other header files in this package. Those internal APIs may be changed without warning.

Questions and discussions are welcome on the RocksDB Developers Public Facebook group and email list on Google Groups.

License

RocksDB is dual-licensed under both the GPLv2 (found in the COPYING file in the root directory) and Apache 2.0 License (found in the LICENSE.Apache file in the root directory). You may select, at your option, one of the above-listed licenses.