Go to file
Changyu Bi cc6f323705 Include estimated bytes deleted by range tombstones in compensated file size (#10734)
Summary:
compensate file sizes in compaction picking so files with range tombstones are preferred, such that they get compacted down earlier as they tend to delete a lot of data. This PR adds a `compensated_range_deletion_size` field in FileMeta that is computed during Flush/Compaction and persisted in MANIFEST. This value is added to `compensated_file_size` which will be used for compaction picking. Currently, for a file in level L, `compensated_range_deletion_size` is set to the estimated bytes deleted by range tombstone of this file in all levels > L. This helps to reduce space amp when data in older levels are covered by range tombstones in level L.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10734

Test Plan:
- Added unit tests.
- benchmark to check if the above definition `compensated_range_deletion_size` is reducing space amp as intended, without affecting write amp too much. The experiment set up favorable for this optimization: large range tombstone issued infrequently. Command used:
```
./db_bench -benchmarks=fillrandom,waitforcompaction,stats,levelstats -use_existing_db=false -avoid_flush_during_recovery=true -write_buffer_size=33554432 -level_compaction_dynamic_level_bytes=true -max_background_jobs=8 -max_bytes_for_level_base=134217728 -target_file_size_base=33554432 -writes_per_range_tombstone=500000 -range_tombstone_width=5000000 -num=50000000 -benchmark_write_rate_limit=8388608 -threads=16 -duration=1800 --max_num_range_tombstones=1000000000
```

In this experiment, each thread wrote 16 range tombstones over the duration of 30 minutes, each range tombstone has width 5M that is the 10% of the key space width. Results shows this PR generates a smaller DB size.

Compaction stats from this PR:
```
Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop Rblob(GB) Wblob(GB)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0      2/0   31.54 MB   0.5      0.0     0.0      0.0       8.4      8.4       0.0   1.0      0.0     63.4    135.56            110.94       544    0.249       0      0       0.0       0.0
  L4      3/0   96.55 MB   0.8     18.5     6.7     11.8      18.4      6.6       0.0   2.7     65.3     64.9    290.08            284.03       108    2.686    284M  1957K       0.0       0.0
  L5     15/0   404.41 MB   1.0     19.1     7.7     11.4      18.8      7.4       0.3   2.5     66.6     65.7    292.93            285.34       220    1.332    293M  3808K       0.0       0.0
  L6    143/0    4.12 GB   0.0     45.0     7.5     37.5      41.6      4.1       0.0   5.5     71.2     65.9    647.00            632.66       251    2.578    739M    47M       0.0       0.0
 Sum    163/0    4.64 GB   0.0     82.6    21.9     60.7      87.2     26.5       0.3  10.4     61.9     65.4   1365.58           1312.97      1123    1.216   1318M    52M       0.0       0.0
```

Compaction stats from main:
```
Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop Rblob(GB) Wblob(GB)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0      0/0    0.00 KB   0.0      0.0     0.0      0.0       8.4      8.4       0.0   1.0      0.0     60.5    142.12            115.89       569    0.250       0      0       0.0       0.0
  L4      3/0   85.68 MB   1.0     17.7     6.8     10.9      17.6      6.7       0.0   2.6     62.7     62.3    289.05            281.79       112    2.581    272M  2309K       0.0       0.0
  L5     11/0   293.73 MB   1.0     18.8     7.5     11.2      18.5      7.2       0.5   2.5     64.9     63.9    296.07            288.50       220    1.346    288M  4365K       0.0       0.0
  L6    130/0    3.94 GB   0.0     51.5     7.6     43.9      47.9      3.9       0.0   6.3     67.2     62.4    784.95            765.92       258    3.042    848M    51M       0.0       0.0
 Sum    144/0    4.31 GB   0.0     88.0    21.9     66.0      92.3     26.3       0.5  11.0     59.6     62.5   1512.19           1452.09      1159    1.305   1409M    58M       0.0       0.0```

Reviewed By: ajkr

Differential Revision: D39834713

Pulled By: cbi42

fbshipit-source-id: fe9341040b8704a8fbb10cad5cf5c43e962c7e6b
2022-12-29 13:28:24 -08:00
.circleci Upgrade CircleCI Windows Build (#10090) 2022-10-28 09:14:47 -07:00
.github/workflows ci: add GitHub token permissions for workflow (#10549) 2022-10-04 12:10:30 -07:00
buckifier Enable BLACK for internal_repo_rocksdb (#10710) 2022-09-20 17:47:52 -07:00
build_tools Print stack traces on frozen tests in CI (#10828) 2022-10-18 00:35:35 -07:00
cache Add a SecondaryCache::InsertSaved() API, use in CacheDumper impl (#10945) 2022-11-21 16:17:36 -08:00
cmake gcc-11 and cmake related cleanup (#9286) 2021-12-17 17:04:35 -08:00
coverage Enable BLACK for internal_repo_rocksdb (#10710) 2022-09-20 17:47:52 -07:00
db Include estimated bytes deleted by range tombstones in compensated file size (#10734) 2022-12-29 13:28:24 -08:00
db_stress_tool Prevent db_stress failure when io_uring is disabled (#11045) 2022-12-19 11:38:42 -08:00
docs Bump nokogiri from 1.13.9 to 1.13.10 in /docs (#11024) 2022-12-08 09:21:36 -08:00
env Added placeholders for MADV defines (#10881) 2022-11-02 14:42:42 -07:00
examples Add rocksdb_backup_restore_example to examples/.gitignore (#10825) 2022-11-02 15:02:09 -07:00
file Fix async prefetch heap use after free (#11049) 2022-12-21 09:15:53 -08:00
fuzz Add some missing headers (#10519) 2022-08-11 12:45:50 -07:00
include/rocksdb Add BackupEngine feature to exclude files (#11030) 2022-12-29 10:42:50 -08:00
java Improve Java API get() performance by reducing copies (#10970) 2022-12-21 11:54:24 -08:00
logging Observe and warn about misconfigured HyperClockCache (#10965) 2022-11-21 12:08:21 -08:00
memory Fix use of make_unique in Arena::AllocateNewBlock (#11012) 2022-12-01 13:18:40 -08:00
memtable Run clang format against files under example/, memory/ and memtable/ folders (#10893) 2022-10-28 13:16:50 -07:00
microbench Avoid allocations/copies for large `GetMergeOperands()` results (#10458) 2022-08-04 00:42:13 -07:00
monitoring Add an unittest for Periodic compaction conflict with ongoing compaction (#10908) 2022-12-12 10:37:55 -08:00
options Ignore max_compaction_bytes for compaction input that are within output key-range (#10835) 2022-10-21 10:22:41 -07:00
plugin Add initial CMake support to plugin (#9214) 2021-11-30 17:16:53 -08:00
port Fix include of windows.h in mmap.h (#10885) 2022-10-26 18:07:57 -07:00
table Avoid mixing sync and async prefetch (#11050) 2022-12-21 22:42:19 -08:00
test_util ~SleepingBackgroundTask() to wake up the sleeping task (#11036) 2022-12-14 12:06:24 -08:00
third-party Meta-internal folly integration with F14FastMap (#9546) 2022-04-13 07:34:01 -07:00
tools Revert PR 10777 "Fix FIFO causing overlapping seqnos in L0 files due to overla…" (#10999) 2022-11-29 10:56:42 -08:00
trace_replay fix compile warnings (#10976) 2022-11-22 15:51:01 -08:00
util Improve error messages for SST footer and size errors (#11009) 2022-12-09 10:03:47 -08:00
utilities Add BackupEngine feature to exclude files (#11030) 2022-12-29 10:42:50 -08:00
.clang-format A script that automatically reformat affected lines 2014-01-14 12:21:24 -08:00
.gitignore Git ignore .clangd/ (#10817) 2022-10-17 08:33:58 -07:00
.lgtm.yml Create lgtm.yml for LGTM.com C/C++ analysis (#4058) 2018-06-26 12:43:04 -07:00
.watchmanconfig Added .watchmanconfig file to rocksdb repo (#5593) 2019-07-19 15:00:33 -07:00
AUTHORS Update RocksDB Authors File 2017-10-18 14:42:10 -07:00
CMakeLists.txt Consider range tombstone in compaction output file cutting (#10802) 2022-12-15 09:11:54 -08:00
CODE_OF_CONDUCT.md Adopt Contributor Covenant 2019-08-29 23:21:01 -07:00
CONTRIBUTING.md Add Code of Conduct 2017-12-05 18:42:35 -08:00
COPYING Add GPLv2 as an alternative license. 2017-04-27 18:06:12 -07:00
DEFAULT_OPTIONS_HISTORY.md Add Options::DisableExtraChecks, clarify force_consistency_checks (#9363) 2022-01-18 17:31:03 -08:00
DUMP_FORMAT.md First version of rocksdb_dump and rocksdb_undump. 2015-06-19 16:24:36 -07:00
HISTORY.md Add BackupEngine feature to exclude files (#11030) 2022-12-29 10:42:50 -08:00
INSTALL.md Update supported VS versions in INSTALL.md (#9823) 2022-04-13 13:03:40 -07:00
LANGUAGE-BINDINGS.md Add grocksdb in Go language bindings (#10498) 2022-08-23 15:02:10 -07:00
LICENSE.Apache Change RocksDB License 2017-07-15 16:11:23 -07:00
LICENSE.leveldb Add back the LevelDB license file 2017-07-16 18:42:18 -07:00
Makefile Fix broken dependency: update zlib from 1.2.12 to 1.2.13 (#10833) 2022-11-14 11:49:06 -08:00
PLUGINS.md Add pmem-rocksdb-plugin link in PLUGINs.md (#9934) 2022-05-12 22:02:28 -07:00
README.md Remove Travis CI (#10407) 2022-07-22 20:16:45 -07:00
ROCKSDB_LITE.md Fix remaining uses of "backupable" (#9792) 2022-04-05 09:52:33 -07:00
TARGETS Consider range tombstone in compaction output file cutting (#10802) 2022-12-15 09:11:54 -08:00
USERS.md Add Apache Spark as a user (#10993) 2022-11-28 09:42:42 -08:00
Vagrantfile Adding CentOS 7 Vagrantfile & build script 2018-02-26 15:27:17 -08:00
WINDOWS_PORT.md Update branch name in WINDOWS_PORT.md (#8745) 2021-09-01 19:26:39 -07:00
common.mk Clean up variables for temporary directory (#9961) 2022-05-06 16:38:06 -07:00
crash_test.mk Allow a custom DB cleanup command to be passed to db_crashtest.py (#10883) 2022-10-27 19:47:01 -07:00
issue_template.md Add Google Group to Issue Template 2020-01-28 14:40:37 -08:00
rocksdb.pc.in build: fix pkg-config file generation (#9953) 2022-05-30 12:46:40 -07:00
src.mk Consider range tombstone in compaction output file cutting (#10802) 2022-12-15 09:11:54 -08:00
thirdparty.inc Fix build jemalloc api (#5470) 2019-06-24 17:40:32 -07:00

README.md

RocksDB: A Persistent Key-Value Store for Flash and RAM Storage

CircleCI Status Appveyor Build status PPC64le Build Status

RocksDB is developed and maintained by Facebook Database Engineering Team. It is built on earlier work on LevelDB by Sanjay Ghemawat (sanjay@google.com) and Jeff Dean (jeff@google.com)

This code is a library that forms the core building block for a fast key-value server, especially suited for storing data on flash drives. It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF). It has multi-threaded compactions, making it especially suitable for storing multiple terabytes of data in a single database.

Start with example usage here: https://github.com/facebook/rocksdb/tree/main/examples

See the github wiki for more explanation.

The public interface is in include/. Callers should not include or rely on the details of any other header files in this package. Those internal APIs may be changed without warning.

Questions and discussions are welcome on the RocksDB Developers Public Facebook group and email list on Google Groups.

License

RocksDB is dual-licensed under both the GPLv2 (found in the COPYING file in the root directory) and Apache 2.0 License (found in the LICENSE.Apache file in the root directory). You may select, at your option, one of the above-listed licenses.