rocksdb

Hui Xiao 06e593376c Group SST write in flush, compaction and db open with new stats (#11910)	2023-12-29 15:29:23 -08:00

Summary:

## Context/Summary

Similar to https://github.com/facebook/rocksdb/pull/11288, https://github.com/facebook/rocksdb/pull/11444, categorizing SST/blob file write according to different io activities allows more insight into the activity.

For that, this PR does the following:
- Tag different write IOs by passing down and converting WriteOptions to IOOptions
- Add new SST_WRITE_MICROS histogram in WritableFileWriter::Append() and breakdown FILE_WRITE_{FLUSH|COMPACTION|DB_OPEN}_MICROS

Some related code refactoring to make implementation cleaner:
- Blob stats
   - Replace high-level write measurement with low-level WritableFileWriter::Append() measurement for BLOB_DB_BLOB_FILE_WRITE_MICROS. This is to make FILE_WRITE_{FLUSH|COMPACTION|DB_OPEN}_MICROS include blob file. As a consequence, this introduces some behavioral changes on it, see HISTORY and db bench test plan below for more info.
   - Fix bugs where BLOB_DB_BLOB_FILE_SYNCED/BLOB_DB_BLOB_FILE_BYTES_WRITTEN include files that failed to sync and bytes that failed to write.
- Refactor WriteOptions constructor for easier construction with io_activity and rate_limiter_priority
- Refactor DBImpl::~DBImpl()/BlobDBImpl::Close() to bypass thread op verification
- Build table
   - TableBuilderOptions now includes Read/WriteOptions so BuildTable() does not need to take these two variables
   - Replace the io_priority passed into BuildTable() with TableBuilderOptions::WriteOptions::rate_limiter_priority. Similar for BlobFileBuilder. This parameter is used for dynamically changing file io priority for flush, see https://github.com/facebook/rocksdb/pull/9988?fbclid=IwAR1DtKel6c-bRJAdesGo0jsbztRtciByNlvokbxkV6h_L-AE9MACzqRTT5s for more
- Update ThreadStatus::FLUSH_BYTES_WRITTEN to use io_activity to track flush IO in flush job and db open instead of io_priority

## Test

### db bench

Flush
```
./db_bench --statistics=1 --benchmarks=fillseq --num=100000 --write_buffer_size=100

rocksdb.sst.write.micros P50 : 1.830863 P95 : 4.094720 P99 : 6.578947 P100 : 26.000000 COUNT : 7875 SUM : 20377
rocksdb.file.write.flush.micros P50 : 1.830863 P95 : 4.094720 P99 : 6.578947 P100 : 26.000000 COUNT : 7875 SUM : 20377
rocksdb.file.write.compaction.micros P50 : 0.000000 P95 : 0.000000 P99 : 0.000000 P100 : 0.000000 COUNT : 0 SUM : 0
rocksdb.file.write.db.open.micros P50 : 0.000000 P95 : 0.000000 P99 : 0.000000 P100 : 0.000000 COUNT : 0 SUM : 0
```

compaction, db open
```
Setup: ./db_bench --statistics=1 --benchmarks=fillseq --num=10000 --disable_auto_compactions=1 -write_buffer_size=100 --db=../db_bench

Run: ./db_bench --statistics=1 --benchmarks=compact --db=../db_bench --use_existing_db=1

rocksdb.sst.write.micros P50 : 2.675325 P95 : 9.578788 P99 : 18.780000 P100 : 314.000000 COUNT : 638 SUM : 3279
rocksdb.file.write.flush.micros P50 : 0.000000 P95 : 0.000000 P99 : 0.000000 P100 : 0.000000 COUNT : 0 SUM : 0
rocksdb.file.write.compaction.micros P50 : 2.757353 P95 : 9.610687 P99 : 19.316667 P100 : 314.000000 COUNT : 615 SUM : 3213
rocksdb.file.write.db.open.micros P50 : 2.055556 P95 : 3.925000 P99 : 9.000000 P100 : 9.000000 COUNT : 23 SUM : 66
```

blob stats - just to make sure they aren't broken by this PR
```
Integrated Blob DB

Setup: ./db_bench --enable_blob_files=1 --statistics=1 --benchmarks=fillseq --num=10000 --disable_auto_compactions=1 -write_buffer_size=100 --db=../db_bench

Run: ./db_bench --enable_blob_files=1 --statistics=1 --benchmarks=compact --db=../db_bench --use_existing_db=1

pre-PR:
rocksdb.blobdb.blob.file.write.micros P50 : 7.298246 P95 : 9.771930 P99 : 9.991813 P100 : 16.000000 COUNT : 235 SUM : 1600
rocksdb.blobdb.blob.file.synced COUNT : 1
rocksdb.blobdb.blob.file.bytes.written COUNT : 34842

post-PR:
rocksdb.blobdb.blob.file.write.micros P50 : 2.000000 P95 : 2.829360 P99 : 2.993779 P100 : 9.000000 COUNT : 707 SUM : 1614
- COUNT is higher and values are smaller as it includes header and footer write
- COUNT is 3X higher due to each Append() counting as one post-PR, while in pre-PR, 3 Append()s count as one. See https://github.com/facebook/rocksdb/pull/11910/files#diff-32b811c0a1c000768cfb2532052b44dc0b3bf82253f3eab078e15ff201a0dabfL157-L164
rocksdb.blobdb.blob.file.synced COUNT : 1 (stay the same)
rocksdb.blobdb.blob.file.bytes.written COUNT : 34842 (stay the same)
```

```
Stacked Blob DB

Run: ./db_bench --use_blob_db=1 --statistics=1 --benchmarks=fillseq --num=10000 --disable_auto_compactions=1 -write_buffer_size=100 --db=../db_bench

pre-PR:
rocksdb.blobdb.blob.file.write.micros P50 : 12.808042 P95 : 19.674497 P99 : 28.539683 P100 : 51.000000 COUNT : 10000 SUM : 140876
rocksdb.blobdb.blob.file.synced COUNT : 8
rocksdb.blobdb.blob.file.bytes.written COUNT : 1043445

post-PR:
rocksdb.blobdb.blob.file.write.micros P50 : 1.657370 P95 : 2.952175 P99 : 3.877519 P100 : 24.000000 COUNT : 30001 SUM : 67924
- COUNT is higher and values are smaller as it includes header and footer write
- COUNT is 3X higher due to each Append() counting as one post-PR, while in pre-PR, 3 Append()s count as one. See https://github.com/facebook/rocksdb/pull/11910/files#diff-32b811c0a1c000768cfb2532052b44dc0b3bf82253f3eab078e15ff201a0dabfL157-L164
rocksdb.blobdb.blob.file.synced COUNT : 8 (stay the same)
rocksdb.blobdb.blob.file.bytes.written COUNT : 1043445 (stay the same)
```

### Rehearsal CI stress test

Trigger 3 full runs of all our CI stress tests

### Performance

Flush
```
TEST_TMPDIR=/dev/shm ./db_basic_bench_pre_pr --benchmark_filter=ManualFlush/key_num:524288/per_key_size:256 --benchmark_repetitions=1000
-- default: 1 thread is used to run benchmark; enable_statistics = true

Pre-pr: avg 507515519.3 ns
497686074,499444327,500862543,501389862,502994471,503744435,504142123,504224056,505724198,506610393,506837742,506955122,507695561,507929036,508307733,508312691,508999120,509963561,510142147,510698091,510743096,510769317,510957074,511053311,511371367,511409911,511432960,511642385,511691964,511730908,

Post-pr: avg 511971266.5 ns, regressed 0.88%
502744835,506502498,507735420,507929724,508313335,509548582,509994942,510107257,510715603,511046955,511352639,511458478,512117521,512317380,512766303,512972652,513059586,513804934,513808980,514059409,514187369,514389494,514447762,514616464,514622882,514641763,514666265,514716377,514990179,515502408,
```

Compaction
```
TEST_TMPDIR=/dev/shm ./db_basic_bench_{pre|post}_pr --benchmark_filter=ManualCompaction/comp_style:0/max_data:134217728/per_key_size:256/enable_statistics:1 --benchmark_repetitions=1000
-- default: 1 thread is used to run benchmark

Pre-pr: avg 495346098.30 ns
492118301,493203526,494201411,494336607,495269217,495404950,496402598,497012157,497358370,498153846

Post-pr: avg 504528077.20, regressed 1.85%. "ManualCompaction" includes flush so the isolated regression for compaction should be around 1.85-0.88 = 0.97%
502465338,502485945,502541789,502909283,503438601,504143885,506113087,506629423,507160414,507393007
```

Put with WAL (in case passing WriteOptions slows down this path even without collecting SST write stats)
```
TEST_TMPDIR=/dev/shm ./db_basic_bench_pre_pr --benchmark_filter=DBPut/comp_style:0/max_data:107374182400/per_key_size:256/enable_statistics:1/wal:1 --benchmark_repetitions=1000
-- default: 1 thread is used to run benchmark

Pre-pr: avg 3848.10 ns
3814,3838,3839,3848,3854,3854,3854,3860,3860,3860

Post-pr: avg 3874.20 ns, regressed 0.68%
3863,3867,3871,3874,3875,3877,3877,3877,3880,3881
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11910

Reviewed By: ajkr

Differential Revision: D49788060

Pulled By: hx235

fbshipit-source-id: 79e73699cda5be3b66461687e5147c2484fc5eff
.circleci	Initial CircleCI -> GitHub Actions migration (#12163 )	2023-12-21 15:40:21 -08:00
.github	Disable GitHub Actions jobs on forks (#12191 )	2023-12-28 17:23:18 -08:00
buckifier	Error out in case of std errors in blackbox test and export file in TARGETS	2023-10-24 11:46:18 -07:00
build_tools	Initial CircleCI -> GitHub Actions migration (#12163 )	2023-12-21 15:40:21 -08:00
cache	Add some compressed and tiered secondary cache stats (#12150 )	2023-12-15 11:34:08 -08:00
cmake	gcc-11 and cmake related cleanup (#9286 )	2021-12-17 17:04:35 -08:00
coverage	Remove platform009 and default to platform010 (#11333 )	2023-03-30 09:56:37 -07:00
db	Group SST write in flush, compaction and db open with new stats (#11910 )	2023-12-29 15:29:23 -08:00
db_stress_tool	Group SST write in flush, compaction and db open with new stats (#11910 )	2023-12-29 15:29:23 -08:00
docs	FIX new blog post (JNI performance) Locate images correctly (#12050 )	2023-11-07 11:58:58 -08:00
env	Group SST write in flush, compaction and db open with new stats (#11910 )	2023-12-29 15:29:23 -08:00
examples	Fix compact_files_example (#12084 )	2023-11-21 09:34:59 -08:00
file	Group SST write in flush, compaction and db open with new stats (#11910 )	2023-12-29 15:29:23 -08:00
fuzz	Block per key-value checksum (#11287 )	2023-04-25 12:08:23 -07:00
include/rocksdb	Group SST write in flush, compaction and db open with new stats (#11910 )	2023-12-29 15:29:23 -08:00
java	Group SST write in flush, compaction and db open with new stats (#11910 )	2023-12-29 15:29:23 -08:00
logging	Group SST write in flush, compaction and db open with new stats (#11910 )	2023-12-29 15:29:23 -08:00
memory	internal_repo_rocksdb (-8794174668376270091) (#12114 )	2023-12-01 11:10:30 -08:00
memtable	internal_repo_rocksdb (-8794174668376270091) (#12114 )	2023-12-01 11:10:30 -08:00
microbench	internal_repo_rocksdb (-8794174668376270091) (#12114 )	2023-12-01 11:10:30 -08:00
monitoring	Group SST write in flush, compaction and db open with new stats (#11910 )	2023-12-29 15:29:23 -08:00
options	Group SST write in flush, compaction and db open with new stats (#11910 )	2023-12-29 15:29:23 -08:00
plugin	Add initial CMake support to plugin (#9214 )	2021-11-30 17:16:53 -08:00
port	internal_repo_rocksdb (-8794174668376270091) (#12114 )	2023-12-01 11:10:30 -08:00
table	Group SST write in flush, compaction and db open with new stats (#11910 )	2023-12-29 15:29:23 -08:00
test_util	Group SST write in flush, compaction and db open with new stats (#11910 )	2023-12-29 15:29:23 -08:00
third-party	fix optimization-disabled test builds with platform010 (#11361 )	2023-04-10 13:59:44 -07:00
tools	Group SST write in flush, compaction and db open with new stats (#11910 )	2023-12-29 15:29:23 -08:00
trace_replay	Trace analyzer: replace number with enumeration type (#10827 )	2023-12-27 10:38:53 -08:00
unreleased_history	Group SST write in flush, compaction and db open with new stats (#11910 )	2023-12-29 15:29:23 -08:00
util	Group SST write in flush, compaction and db open with new stats (#11910 )	2023-12-29 15:29:23 -08:00
utilities	Group SST write in flush, compaction and db open with new stats (#11910 )	2023-12-29 15:29:23 -08:00
.clang-format	A script that automatically reformat affected lines	2014-01-14 12:21:24 -08:00
.gitignore	Add .arcconfig to .gitignore (fb internal use) (#11803 )	2023-09-07 14:57:39 -07:00
.lgtm.yml	Create lgtm.yml for LGTM.com C/C++ analysis (#4058 )	2018-06-26 12:43:04 -07:00
.watchmanconfig	Added .watchmanconfig file to rocksdb repo (#5593 )	2019-07-19 15:00:33 -07:00
AUTHORS	Update RocksDB Authors File	2017-10-18 14:42:10 -07:00
CMakeLists.txt	Initial CircleCI -> GitHub Actions migration (#12163 )	2023-12-21 15:40:21 -08:00
CODE_OF_CONDUCT.md	Adopt Contributor Covenant	2019-08-29 23:21:01 -07:00
CONTRIBUTING.md	Add Code of Conduct	2017-12-05 18:42:35 -08:00
COPYING	Add GPLv2 as an alternative license.	2017-04-27 18:06:12 -07:00
DEFAULT_OPTIONS_HISTORY.md	Add Options::DisableExtraChecks, clarify force_consistency_checks (#9363 )	2022-01-18 17:31:03 -08:00
DUMP_FORMAT.md	First version of rocksdb_dump and rocksdb_undump.	2015-06-19 16:24:36 -07:00
HISTORY.md	Update HISTORY/version/format compatibility script for the 8.10 release (#12154 )	2023-12-15 14:44:23 -08:00
INSTALL.md	RocksDB now requires gflags v2.2.0 (#10933 )	2023-10-03 09:58:49 -07:00
LANGUAGE-BINDINGS.md	Add grocksdb in Go language bindings (#10498 )	2022-08-23 15:02:10 -07:00
LICENSE.Apache	Change RocksDB License	2017-07-15 16:11:23 -07:00
LICENSE.leveldb	Add back the LevelDB license file	2017-07-16 18:42:18 -07:00
Makefile	Add support for linux-riscv64 (#12139 )	2023-12-14 11:27:17 -08:00
PLUGINS.md	Add encfs plugin link (#12070 )	2023-11-14 07:33:21 -08:00
README.md	Remove deprecated integration tests from README.md (#11354 )	2023-04-07 16:52:50 -07:00
TARGETS	Make OffpeakTimeInfo available in VersionSet (#12018 )	2023-10-27 15:56:48 -07:00
USERS.md	Add Qdrant to USERS.md (#12072 )	2023-11-16 10:35:08 -08:00
Vagrantfile	Adding CentOS 7 Vagrantfile & build script	2018-02-26 15:27:17 -08:00
WINDOWS_PORT.md	Update branch name in WINDOWS_PORT.md (#8745 )	2021-09-01 19:26:39 -07:00
common.mk	Clean up variables for temporary directory (#9961 )	2022-05-06 16:38:06 -07:00
crash_test.mk	Stress/Crash Test for OptimisticTransactionDB (#11513 )	2023-06-17 16:27:37 -07:00
issue_template.md	Add Google Group to Issue Template	2020-01-28 14:40:37 -08:00
rocksdb.pc.in	build: fix pkg-config file generation (#9953 )	2022-05-30 12:46:40 -07:00
src.mk	Add deletion-triggered compaction to RocksJava (#12028 )	2023-12-18 13:43:01 -08:00
thirdparty.inc	Fix build jemalloc api (#5470 )	2019-06-24 17:40:32 -07:00

README.md

RocksDB: A Persistent Key-Value Store for Flash and RAM Storage

RocksDB is developed and maintained by the Facebook Database Engineering Team. It is built on earlier work on LevelDB by Sanjay Ghemawat (sanjay@google.com) and Jeff Dean (jeff@google.com).

This code is a library that forms the core building block for a fast key-value server, especially suited for storing data on flash drives. It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF). It has multi-threaded compactions, making it especially suitable for storing multiple terabytes of data in a single database.
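To make the amplification tradeoff concrete, here is a minimal toy sketch of the log-structured-merge idea in C++. It is not RocksDB code and does not use the RocksDB API: the class `ToyLsm`, its parameters `memtable_limit` and `max_runs`, and the "merge everything into one run" compaction policy are all assumptions chosen for illustration. Writes land in a sorted in-memory buffer, full buffers are flushed as immutable sorted runs, and runs are merged once too many accumulate. Counting entries written to runs versus entries written by the user gives a rough write-amplification figure, and counting structures probed per lookup gives a rough read-amplification figure.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy LSM store (illustrative only, not RocksDB code): a sorted in-memory
// buffer that is flushed into immutable sorted runs, which are merged once
// too many of them accumulate.
class ToyLsm {
 public:
  // Both parameters are hypothetical knobs invented for this sketch.
  ToyLsm(std::size_t memtable_limit, std::size_t max_runs)
      : memtable_limit_(memtable_limit), max_runs_(max_runs) {
    assert(memtable_limit_ > 0 && max_runs_ > 0);
  }

  void Put(const std::string& key, const std::string& value) {
    ++user_writes_;
    memtable_[key] = value;
    if (memtable_.size() >= memtable_limit_) Flush();
  }

  // Looks in the buffer first, then in the runs from newest to oldest, so
  // the most recent value for a key wins.
  bool Get(const std::string& key, std::string* value) {
    last_get_probes_ = 1;  // the in-memory buffer
    auto it = memtable_.find(key);
    if (it != memtable_.end()) {
      *value = it->second;
      return true;
    }
    for (auto run = runs_.rbegin(); run != runs_.rend(); ++run) {
      ++last_get_probes_;
      auto hit = run->find(key);
      if (hit != run->end()) {
        *value = hit->second;
        return true;
      }
    }
    return false;
  }

  std::size_t RunCount() const { return runs_.size(); }
  std::size_t EntriesWritten() const { return entries_written_; }
  std::size_t LastGetProbes() const { return last_get_probes_; }

  // Entries written to runs (flush + compaction) per user write.
  double WriteAmplification() const {
    return user_writes_ == 0
               ? 0.0
               : static_cast<double>(entries_written_) / user_writes_;
  }

 private:
  using Run = std::map<std::string, std::string>;

  void Flush() {
    if (memtable_.empty()) return;
    entries_written_ += memtable_.size();
    runs_.push_back(std::move(memtable_));
    memtable_.clear();  // restore a known-empty state after the move
    if (runs_.size() > max_runs_) Compact();
  }

  // Merges all runs into one; iterating oldest first lets newer values
  // overwrite older ones for the same key.
  void Compact() {
    Run merged;
    for (const auto& run : runs_) {
      for (const auto& kv : run) merged[kv.first] = kv.second;
    }
    entries_written_ += merged.size();
    runs_.clear();
    runs_.push_back(std::move(merged));
  }

  std::size_t memtable_limit_;
  std::size_t max_runs_;
  std::size_t user_writes_ = 0;
  std::size_t entries_written_ = 0;
  std::size_t last_get_probes_ = 0;
  Run memtable_;
  std::vector<Run> runs_;
};
```

In this toy model, with `memtable_limit = 2` and `max_runs = 2`, six `Put` calls on keys a, b, a, c, d, e trigger three flushes (6 entries) and one compaction (5 merged entries), so 11 entries are written for 6 user writes, a write amplification of about 1.83. Raising `max_runs` lowers that figure but makes `Get` probe more runs and leaves more stale entries on "disk", which is the kind of tradeoff the paragraph above refers to. RocksDB's actual compaction styles are considerably more elaborate.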

Start with example usage here: https://github.com/facebook/rocksdb/tree/main/examples

See the GitHub wiki for more explanation.

The public interface is in include/. Callers should not include or rely on the details of any other header files in this package. Those internal APIs may be changed without warning.

Questions and discussions are welcome on the RocksDB Developers Public Facebook group and email list on Google Groups.

License

RocksDB is dual-licensed under both the GPLv2 (found in the COPYING file in the root directory) and Apache 2.0 License (found in the LICENSE.Apache file in the root directory). You may select, at your option, one of the above-listed licenses.