Go to file
Peter Dillinger d33d25f903 Disable WAL recycling in crash test; reproducer for recovery data loss (#12918)
Summary:
I was investigating a crash test failure with "Corruption: SST file is ahead of WALs" which I haven't reproduced, but I did reproduce a data loss issue on recovery which I suspect could be the same root problem. The problem is already somewhat known (see https://github.com/facebook/rocksdb/issues/12403 and https://github.com/facebook/rocksdb/issues/12639) where it's only safe to recovery multiple recycled WAL files with trailing old data if the sequence numbers between them are adjacent (to ensure we didn't lose anything in the corrupt/obsolete WAL tail).

However, aside from disableWAL=true, there are features like external file ingestion that can increment the sequence numbers without writing to the WAL. It is simply unsustainable to worry about this kind of feature interaction limiting where we can consume sequence numbers. It is very hard to test and audit as well. For reliable crash recovery of recycled WALs, we need a better way of detecting that we didn't drop data from one WAL to the next.

Until then, let's disable WAL recycling in the crash test, to help stabilize it.

Ideas for follow-up to fix the underlying problem:
(a) With recycling, we could always sync the WAL before opening the next one. HOWEVER, this potentially very large sync could cause a big hiccup in writes (vs. O(1) sized manifest sync).
(a1) The WAL sync could ensure it is truncated to size, or
(a2) By requiring track_and_verify_wals_in_manifest, we could assume that the last synced size in the manifest is the final usable size of the WAL. (It might also be worth avoiding truncating recycled WALs.)
(b) Add a new mechanism to record and verify the final size of a WAL without requiring a sync.
(b1) By requiring track_and_verify_wals_in_manifest, this could be new WAL metadata recorded in the manifest (at the time of switching WALs). Note that new fields of WalMetadata are not forward-compatible, but a new kind of manifest record (next to WalAddition, WalDeletion; e.g. WalCompletion) is IIRC forward-compatible.
(b2) A new kind of WAL header entry (not forward compatible, unfortunately) could record the final size of the previous WAL.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12918

Test Plan: Added disabled reproducer for non-linear data loss on recovery

Reviewed By: hx235

Differential Revision: D60917527

Pulled By: pdillinger

fbshipit-source-id: 3663d79aec81851f5cf41669f84a712bb4563fd7
2024-08-07 14:20:45 -07:00
.circleci Enable io_uring in stress test (#12313) 2024-01-31 12:37:42 -08:00
.github Attempt to fix the nightly build-linux-clang-13-asan-ubsan-with-folly build 2024-08-01 13:29:56 -07:00
buckifier add export_file to rockdb TARGETS generator and re-gen 2024-05-25 17:10:12 -07:00
build_tools Fix folly build (#12795) 2024-06-22 15:15:02 -07:00
cache Support pro-actively erasing obsolete block cache entries (#12694) 2024-06-07 08:57:11 -07:00
cmake Fix zstd typo in cmake (#12309) 2024-02-22 14:39:05 -08:00
coverage Remove platform009 and default to platform010 (#11333) 2023-03-30 09:56:37 -07:00
db Disable WAL recycling in crash test; reproducer for recovery data loss (#12918) 2024-08-07 14:20:45 -07:00
db_stress_tool SyncWAL() before Close() when FLAGS_avoid_flush_during_shutdown=true in crash test (#12900) 2024-08-02 10:45:34 -07:00
docs Java FFI blog post - Post-publication issues with images (2) (#12372) 2024-02-22 15:01:55 -08:00
env Add some documentation for Env related interfaces (#12813) 2024-06-28 18:56:40 -07:00
examples Prefer static_cast in place of most reinterpret_cast (#12308) 2024-02-07 10:44:11 -08:00
file Fix file deletions in DestroyDB not rate limited (#12891) 2024-08-02 19:31:55 -07:00
fuzz Block per key-value checksum (#11287) 2023-04-25 12:08:23 -07:00
include/rocksdb Add a TransactionOptions to enable tracking timestamp size info inside WriteBatch (#12864) 2024-08-05 13:06:45 -07:00
java Fix incorrect refillPeriodMicros unit in the document (#12832) 2024-07-08 18:08:53 -07:00
logging Fix data race in AutoRollLogger (#12436) 2024-03-14 14:28:33 -07:00
memory Set optimize_filters_for_memory by default (#12377) 2024-04-30 08:33:31 -07:00
memtable Prefer static_cast in place of most reinterpret_cast (#12308) 2024-02-07 10:44:11 -08:00
microbench internal_repo_rocksdb (-8794174668376270091) (#12114) 2023-12-01 11:10:30 -08:00
monitoring New PerfContext counters for block cache bytes read (#12459) 2024-03-21 10:46:46 -07:00
options Remove unreachable code (#12846) 2024-07-09 09:24:43 -07:00
plugin Add initial CMake support to plugin (#9214) 2021-11-30 17:16:53 -08:00
port Fix CondVar::TimedWait for Windows (#12815) 2024-07-08 21:38:21 -07:00
table Fix filter partition size logic (#12904) 2024-08-02 14:49:02 -07:00
test_util Remove redundant no_io parameters to filter functions (#12762) 2024-06-12 18:47:11 -07:00
third-party fix optimization-disabled test builds with platform010 (#11361) 2023-04-10 13:59:44 -07:00
tools Disable WAL recycling in crash test; reproducer for recovery data loss (#12918) 2024-08-07 14:20:45 -07:00
trace_replay Remove 'virtual' when implied by 'override' (#12319) 2024-01-31 13:14:42 -08:00
unreleased_history Fix file deletions in DestroyDB not rate limited (#12891) 2024-08-02 19:31:55 -07:00
util fix the non initialized bug in StderrLogger. (#12839) 2024-07-08 15:59:02 -07:00
utilities Fix CreateCheckpoint not handling failed CleanStagingDirectory well (#12894) 2024-08-06 11:24:29 -07:00
.clang-format A script that automatically reformat affected lines 2014-01-14 12:21:24 -08:00
.gitignore add gtags files ignore (#12747) 2024-06-12 21:46:40 -07:00
.lgtm.yml Create lgtm.yml for LGTM.com C/C++ analysis (#4058) 2018-06-26 12:43:04 -07:00
.watchmanconfig Added .watchmanconfig file to rocksdb repo (#5593) 2019-07-19 15:00:33 -07:00
AUTHORS Update RocksDB Authors File 2017-10-18 14:42:10 -07:00
CMakeLists.txt Fix folly build (#12795) 2024-06-22 15:15:02 -07:00
CODE_OF_CONDUCT.md Adopt Contributor Covenant 2019-08-29 23:21:01 -07:00
CONTRIBUTING.md Add Code of Conduct 2017-12-05 18:42:35 -08:00
COPYING Add GPLv2 as an alternative license. 2017-04-27 18:06:12 -07:00
DEFAULT_OPTIONS_HISTORY.md Add Options::DisableExtraChecks, clarify force_consistency_checks (#9363) 2022-01-18 17:31:03 -08:00
DUMP_FORMAT.md First version of rocksdb_dump and rocksdb_undump. 2015-06-19 16:24:36 -07:00
HISTORY.md Update history and version for 9.5.fb release (#12880) 2024-07-22 13:15:09 -07:00
INSTALL.md fix out of date macos instructions in INSTALL.md (#12393) 2024-02-28 12:38:15 -08:00
LANGUAGE-BINDINGS.md Add grocksdb in Go language bindings (#10498) 2022-08-23 15:02:10 -07:00
LICENSE.Apache Change RocksDB License 2017-07-15 16:11:23 -07:00
LICENSE.leveldb Add back the LevelDB license file 2017-07-16 18:42:18 -07:00
Makefile Update snappy dependency for Java releases. (#12207) 2024-07-05 09:30:28 -07:00
PLUGINS.md Add encfs plugin link (#12070) 2023-11-14 07:33:21 -08:00
README.md Remove deprecated integration tests from README.md (#11354) 2023-04-07 16:52:50 -07:00
TARGETS Add experimental range filters to stress/crash test (#12769) 2024-06-18 16:16:09 -07:00
USERS.md Add Qdrant to USERS.md (#12072) 2023-11-16 10:35:08 -08:00
Vagrantfile Adding CentOS 7 Vagrantfile & build script 2018-02-26 15:27:17 -08:00
WINDOWS_PORT.md Update branch name in WINDOWS_PORT.md (#8745) 2021-09-01 19:26:39 -07:00
common.mk Clean up variables for temporary directory (#9961) 2022-05-06 16:38:06 -07:00
crash_test.mk Stress/Crash Test for OptimisticTransactionDB (#11513) 2023-06-17 16:27:37 -07:00
issue_template.md Add Google Group to Issue Template 2020-01-28 14:40:37 -08:00
rocksdb.pc.in build: fix pkg-config file generation (#9953) 2022-05-30 12:46:40 -07:00
src.mk Fix folly build (#12795) 2024-06-22 15:15:02 -07:00
thirdparty.inc Fix build jemalloc api (#5470) 2019-06-24 17:40:32 -07:00

README.md

RocksDB: A Persistent Key-Value Store for Flash and RAM Storage

CircleCI Status

RocksDB is developed and maintained by Facebook Database Engineering Team. It is built on earlier work on LevelDB by Sanjay Ghemawat (sanjay@google.com) and Jeff Dean (jeff@google.com)

This code is a library that forms the core building block for a fast key-value server, especially suited for storing data on flash drives. It has a Log-Structured-Merge-Database (LSM) design with flexible tradeoffs between Write-Amplification-Factor (WAF), Read-Amplification-Factor (RAF) and Space-Amplification-Factor (SAF). It has multi-threaded compactions, making it especially suitable for storing multiple terabytes of data in a single database.

Start with example usage here: https://github.com/facebook/rocksdb/tree/main/examples

See the github wiki for more explanation.

The public interface is in include/. Callers should not include or rely on the details of any other header files in this package. Those internal APIs may be changed without warning.

Questions and discussions are welcome on the RocksDB Developers Public Facebook group and email list on Google Groups.

License

RocksDB is dual-licensed under both the GPLv2 (found in the COPYING file in the root directory) and Apache 2.0 License (found in the LICENSE.Apache file in the root directory). You may select, at your option, one of the above-listed licenses.