rocksdb

mirror of https://github.com/facebook/rocksdb.git synced 2024-12-03 05:54:17 +00:00

Author	SHA1	Message	Date
Andrew Kryczka	933ac0e05c	Fix locking for `ColumnFamilyOptions::inplace_update_support` (#12624 ) Summary: In `SaveValue()`, the read lock needs to be obtained before `VerifyEntryChecksum()` because the KV checksum verification reads the entire value metadata+data, which is all mutable when `ColumnFamilyOptions::inplace_update_support == true`. In `MemTable::Update()`, the write lock needs to be obtained before mutating the value metadata (changing the value size) because it can be read concurrently. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12624 Test Plan: ``` $ make COMPILE_WITH_TSAN=1 -j56 db_stress ... $ python3 tools/db_crashtest.py blackbox --simple --max_key=10 --inplace_update_support=1 --interval=10 --allow_concurrent_memtable_write=0 ``` Reviewed By: cbi42 Differential Revision: D57034571 Pulled By: ajkr fbshipit-source-id: 3dddf881ad87923143acdf6bfec12ce47bb13a48	2024-05-08 08:30:12 -07:00
Changyu Bi	5bf2c00a35	Clarify manual compaction and file ingestion behavior with FIFO compaction (#12618 ) Summary: For manual compaction, FIFO compaction will always skip key range overlapping checking with SST files. If CompactRange() is called with CompactionRangeOptions::change_level=true, a CF with FIFO compaction will now return Status::NotSupported. For file ingestion, we will always ingest into L0. Previously, it's possible to ingest files into non-L0 levels with FIFO compaction. These changes also help to fix [this](`a178d15baf/db/db_impl/db_impl_compaction_flush.cc (L1269)`) assertion failure in crash tests. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12618 Test Plan: added unit tests to verify the new behavior. Reviewed By: hx235 Differential Revision: D56962401 Pulled By: cbi42 fbshipit-source-id: 19812a1509650b4162b379ca5bee02f2e9d9569d	2024-05-07 12:00:15 -07:00
Hui Xiao	b312dbe920	Remove duplicate inplace_update_support crash/stress test flag (#12610 ) Summary: Context/Summary: As titled. There were two flags serving the same purpose so removed one of them. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12610 Test Plan: CI Reviewed By: jaykorean, ajkr Differential Revision: D56916119 Pulled By: hx235 fbshipit-source-id: 011140a7945782cc613ca86d4b542db0cf7fb444	2024-05-03 11:33:33 -07:00
Yu Zhang	8b3d9e6bfe	Add TimedPut to stress test (#12559 ) Summary: This also updates WriteBatch's protection info to include write time since there are several places in memtable that by default protects the whole value slice. This PR is stacked on https://github.com/facebook/rocksdb/issues/12543 Pull Request resolved: https://github.com/facebook/rocksdb/pull/12559 Reviewed By: pdillinger Differential Revision: D56308285 Pulled By: jowlyzhang fbshipit-source-id: 5524339fe0dd6c918dc940ca2f0657b5f2111c56	2024-04-30 15:40:35 -07:00
Hui Xiao	24a35b6e57	Add more public APIs to crash/stress test (#12541 ) Summary: Context/Summary: This PR includes some public DB APIs not tested in crash/stress yet can be added in a straightforward way. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12541 Test Plan: - Locally run crash test heavily stressing on these new APIs - CI Reviewed By: jowlyzhang Differential Revision: D56164892 Pulled By: hx235 fbshipit-source-id: 8bb568c3e65aec39d642987033f1d76c52f69bd8	2024-04-16 15:43:26 -07:00
Jay Huh	dfdc3b158e	Add offpeak feature to crash test (#12549 ) Summary: As title. Add offpeak feature in stress test. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12549 Test Plan: Ran stress test locally with the flag set ``` Running db_stress with pid=701060: ./db_stress ... --daily_offpeak_time_utc=04:00-08:00 ... --periodic_compaction_seconds=10 ... ... KILLED 701060 stdout: Choosing random keys with no overwrite Creating 6250000 locks 2024/04/16-11:38:19 Initializing db_stress RocksDB version : 9.2 Format version : 5 TransactionDB : false Stacked BlobDB : false Read only mode : false Atomic flush : false Manual WAL flush : true Column families : 1 Clear CFs one in : 0 Number of threads : 32 Ops per thread : 100000000 Time to live(sec) : unused Read percentage : 60% Prefix percentage : 0% Write percentage : 35% Delete percentage : 4% Delete range percentage : 1% No overwrite percentage : 1% Iterate percentage : 0% Custom ops percentage : 0% DB-write-buffer-size : 0 Write-buffer-size : 4194304 Iterations : 10 Max key : 25000000 Ratio #ops/#keys : 128.000000 Num times DB reopens : 0 Batches/snapshots : 0 Do update in place : 0 Num keys per lock : 4 Compression : LZ4 Bottommost Compression : DisableOption Checksum type : kxxHash File checksum impl : none Bloom bits / key : 18.000000 Max subcompactions : 4 Use MultiGet : false Use GetEntity : false Use MultiGetEntity : false Verification only : false Memtablerep : skip_list Test kill odd : 0 Periodic Compaction Secs : 10 Daily Offpeak UTC : 04:00-08:00 <<<<<<<<<<<<<<< Newly added Compaction TTL : 0 Compaction Pri : kMinOverlappingRatio Background Purge : 0 Write DB ID to manifest : 0 Max Write Batch Group Size: 16 Use dynamic level : 1 Read fault one in : 0 Write fault one in : 1000 Open metadata write fault one in: 8 Sync fault injection : 0 Best efforts recovery : 0 Fail if OPTIONS file error: 0 User timestamp size bytes : 0 Persist user defined timestamps : 1 WAL compression : zstd Try verify sst unique id : 1 ------------------------------------------------ ``` Reviewed By: hx235 Differential Revision: D56203102 Pulled By: jaykorean fbshipit-source-id: 11a9be7362b3b26940d74d41c8bf4ebac3f03a2d	2024-04-16 12:44:44 -07:00
Hui Xiao	d41e568b1c	Add inplace_update_support to crash/stress test (#12535 ) Summary: Context/Summary: `inplace_update_support=true` is not tested in crash/stress test. Since it's not compatible with snapshots like compaction_filter, we need to sanitize its value in presence of snapshots-related options. A minor refactoring is added to centralize such sanitization in db_crashtest.py - see `check_multiget_consistency` and `check_multiget_entity_consistency` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12535 Test Plan: CI Reviewed By: ajkr Differential Revision: D56102978 Pulled By: hx235 fbshipit-source-id: 2e2ab6685a65123b14a321b99f45f60bc6509c6b	2024-04-15 16:11:58 -07:00
Hui Xiao	ef26d68e8d	Renable kAdmPolicyThreeQueue in crash test (#12524 ) Summary: Context/Summary: We need a `nvm_sec_cache` when `kAdmPolicyThreeQueue` is used otherwise a nullptr cache will be accessed causing us segfault in https://github.com/facebook/rocksdb/pull/12521 Pull Request resolved: https://github.com/facebook/rocksdb/pull/12524 Test Plan: - Re-enabled `kAdmPolicyThreeQueue` and rehearsed stress test that failed before this fix and pass after Reviewed By: jowlyzhang Differential Revision: D55997093 Pulled By: hx235 fbshipit-source-id: e1c6f1015091b4cff0ce6a3fff981d5dece52a62	2024-04-11 14:53:11 -07:00
Hui Xiao	72c1376fcf	Fix "assertion failed - iter != ROCKSDB_NAMESPACE::OptionsHelper::temperature_to_string.end()" (#12519 ) Summary: Context/Summary: for unknown reason, calling a db stress common function in db stress flag file for temperature-related flags will cause some weird behavior in some compilation/build. ``` assertion failed - iter != ROCKSDB_NAMESPACE::OptionsHelper::temperature_to_string.end() ``` For now, we decide not to call such function by hard-coding their default stress test values. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12519 Test Plan: - Run a rehearsal stress test with this fix and weird behavior is gone. Reviewed By: jowlyzhang Differential Revision: D55884693 Pulled By: hx235 fbshipit-source-id: ba5135f5b37a9fa686b3ccae8d3f77e62d6562c9	2024-04-08 13:45:41 -07:00
Hui Xiao	ad423abbd1	Add more missing options in crash test (#12508 ) Summary: Context/Summary: This is to improve our crash test coverage. Bonus change: - Added the missing Options string mapping for `CacheTier::kVolatileCompressedTier` - Deprecated crash test options `enable_tiered_storage` mainly for setting `last_level_temperature` which is now covered in crash test by itself - Intensified `verify_checksum_one_in\verify_file_checksums_one_in` as I found out these together with new coverage surface more issues Pull Request resolved: https://github.com/facebook/rocksdb/pull/12508 Test Plan: CI to look out for trivial failures Reviewed By: jowlyzhang Differential Revision: D55768594 Pulled By: hx235 fbshipit-source-id: 9b829da0309a7db3fcdb17992a524dd64498325c	2024-04-08 09:48:03 -07:00
Hui Xiao	d985902ef4	Disallow refitting more than 1 file from non-L0 to L0 (#12481 ) Summary: Context/Summary: We recently discovered that `CompactRange(change_level=true, target_level=0)` can possibly refit more than 1 files to L0. This refitting can cause read performance regression as we need to go through every file in L0, corruption in some edge case and false positive corruption caught by force consistency check. We decided to explicitly disallow such behavior. A related change to OptionChangeMigration(): - When migrating to FIFO with `compaction_options_fifo.max_table_files_size > 0`, RocksDB will [CompactRange() all the to-be-migrate data into a couple L0 files](https://github.com/facebook/rocksdb/blob/main/utilities/option_change_migration/option_change_migration.cc#L164-L169) to avoid dropping all the data upon migration finishes when the migrated data is larger than max_table_files_size. This is achieved by first compacting all the data into a couple non-L0 files and refitting those files from non-L0 to L0 if needed. In that way, only some data instead of all data will be dropped immediately after migration to FIFO with a max_table_files_size. - Since this type of refitting behavior is disallowed from now on, we won't do this trick anymore and explicitly state such risk in API comment. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12481 Test Plan: - New UT - Modified UT Reviewed By: cbi42 Differential Revision: D55351178 Pulled By: hx235 fbshipit-source-id: 9d8854f2f81d7e8aff859c3a4e53b7d688048e80	2024-03-29 10:52:36 -07:00
Changyu Bi	3d5be596a5	Fix a bug in iterator with UDT + `ReadOptions::pin_data` (#12451 ) Summary: with https://github.com/facebook/rocksdb/issues/12414 enabling `ReadOptions::pin_data`, this bug surfaced as corrupted per key-value checksum during crash test. `saved_key_.GetUserKey()` could be pinned user key, so DBIter should not overwrite it. In one case, it only surfaces when iterator skips many keys of the same user key. To stress that code path, this PR also added `max_sequential_skip_in_iterations` to crash test. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12451 Test Plan: - Set ReadOptions::pin_data to true, the bug can be reproed quickly with `./db_stress --persist_user_defined_timestamps=1 --user_timestamp_size=8 --writepercent=35 --delpercent=4 --delrangepercent=1 --iterpercent=20 --nooverwritepercent=1 --prefix_size=8 --prefixpercent=10 --readpercent=30 --memtable_protection_bytes_per_key=8 --block_protection_bytes_per_key=2 --clear_column_family_one_in=0`. - Set max_sequential_skip_in_iterations to 1 for the other occurrence of the bug. Reviewed By: jowlyzhang Differential Revision: D55003766 Pulled By: cbi42 fbshipit-source-id: 23e1049129456684dafb028b6132b70e0afc07fb	2024-03-18 09:05:11 -07:00
Hui Xiao	fa4978c566	Re-suppress tolerable manual compaction stress test failures (#12437 ) Summary: Context/Summary: Previously manual compaction stress test failures won't terminate stress test. https://github.com/facebook/rocksdb/pull/12414 made more manual compaction failures terminate the stress test for signal boosting. A downside to that PR: some tolerable manual compaction stress test failures also unnecessarily terminate stress test. Ideally we should exclude exactly those tolerable errors (left as a TODO) from being able to terminate. For now we approximate those errors by Aborted(), InvalidArgument(), NotSupported() etc. It's still an improvement to pre-https://github.com/facebook/rocksdb/pull/12414 situation. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12437 Test Plan: - No more tolerable manual compaction stress test failures terminating stress test. Reviewed By: cbi42 Differential Revision: D54913010 Pulled By: hx235 fbshipit-source-id: c43fa79d8f9c1c8b4f8786f8f46508b0ad619a9e	2024-03-14 14:50:56 -07:00
Hui Xiao	30243c6573	Add missing db crash options (#12414 ) Summary: Context/Summary: We are doing a sweep in all public options, including but not limited to the `Options`, `Read/WriteOptions`, `IngestExternalFileOptions`, cache options.., to find and add the uncovered ones into db crash. The options included in this PR require minimum changes to db crash other than adding the options themselves. A bonus change: to surface new issues by improved coverage in stderror, we decided to fail/terminate crash test for manual compactions (CompactFiles, CompactRange()) on meaningful errors. See https://github.com/facebook/rocksdb/pull/12414/files#diff-5c4ced6afb6a90e27fec18ab03b2cd89e8f99db87791b4ecc6fa2694284d50c0R2528-R2532, https://github.com/facebook/rocksdb/pull/12414/files#diff-5c4ced6afb6a90e27fec18ab03b2cd89e8f99db87791b4ecc6fa2694284d50c0R2330-R2336 for more. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12414 Test Plan: - Run `python3 ./tools/db_crashtest.py --simple blackbox` for 10 minutes to ensure no trivial failure - Run `python3 tools/db_crashtest.py --simple blackbox --compact_files_one_in=1 --compact_range_one_in=1 --read_fault_one_in=1 --write_fault_one_in=1 --interval=50` for a while to ensure the bonus change does not result in trivial crash/termination of stress test Reviewed By: ajkr, jowlyzhang, cbi42 Differential Revision: D54691774 Pulled By: hx235 fbshipit-source-id: 50443dfb6aaabd8e24c79a2e42b68c6de877be88	2024-03-12 17:24:12 -07:00
Peter Dillinger	a53ed91691	Fix/improve temperature handling for file ingestion (#12402 ) Summary: Partly following up on leftovers from https://github.com/facebook/rocksdb/issues/12388 In terms of public API: * Make it clear that IngestExternalFileArg::file_temperature is just a hint for opening the existing file, though it was previously used for both copy-from temp hint and copy-to temp, which was bizarre. * Specify how IngestExternalFile assigns temperature to file ingested into DB. (See details in comments.) This approach is not perfect in terms of matching how the DB assigns temperatures, but was the simplest way to get close. The key complication for matching DB temperature assignments is that ingestion files are copied (to a destination temp) before their target level is determined (in general). * Add a temperature option to SstFileWriter::Open so that files intended for ingestion can be initially written to a chosen temperature. * Note that "fail_if_not_bottommost_level" is obsolete/confusing use of "bottommost" In terms of the implementation, there was a similar bit of oddness with the internal CopyFile API, which only took one temperature, ambiguously applicable to the source, destination, or both. This is also fixed. Eventual suggested follow-up: * Before copying files for ingestion, determine a tentative level assignment to use for destination temperature, and keep that even if final level assignment happens to be different at commit time (rare). * More temperature handling for CreateColumnFamilyWithImport and Checkpoints. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12402 Test Plan: Deeply revamped ExternalSSTFileBasicTest.IngestWithTemperature to test the new changes. Previously this test was insufficient because it was only looking at temperatures according to the DB manifest. Incorporating FileTemperatureTestFS allows us to also test the temperatures in the storage layer. Used macros instead of functions for better tracing to critical source location on test failures. Some enhancements to FileTemperatureTestFS in the process of developing the revamped test. Reviewed By: jowlyzhang Differential Revision: D54442794 Pulled By: pdillinger fbshipit-source-id: 41d9d0afdc073e6a983304c10bbc07c70cc7e995	2024-03-05 16:56:08 -08:00
Peter Dillinger	d780e7a561	Remove `bottommost_temperature` (#12389 ) Summary: deprecated option already replaced by `last_level_temperature`. (Keeping recognition of the option in old options files.) Pull Request resolved: https://github.com/facebook/rocksdb/pull/12389 Test Plan: tests updated Reviewed By: jowlyzhang, cbi42 Differential Revision: D54267946 Pulled By: pdillinger fbshipit-source-id: 65c49b15e7394829c1f3b44edd4179d2daff6017	2024-02-27 14:48:00 -08:00
Andrew Kryczka	8e29f243c9	No filesystem reads during `Merge()` writes (#12365 ) Summary: This occasional filesystem read in the write path has caused user pain. It doesn't seem very useful considering it only limits one component's merge chain length, and only helps merge uncached (i.e., infrequently read) values. This PR proposes allowing `max_successive_merges` to be exceeded when the value cannot be read from in-memory components. I included a rollback flag (`strict_max_successive_merges`) just in case. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12365 Test Plan: "rocksdb.block.cache.data.add" is number of data blocks read from filesystem. Since the benchmark is write-only, compaction is disabled, and flush doesn't read data blocks, any nonzero value means the user write issued the read. ``` $ for s in false true; do echo -n "strict_max_successive_merges=$s: " && ./db_bench -value_size=64 -write_buffer_size=131072 -writes=128 -num=1 -benchmarks=mergerandom,flush,mergerandom -merge_operator=stringappend -disable_auto_compactions=true -compression_type=none -strict_max_successive_merges=$s -max_successive_merges=100 -statistics=true \|& grep 'block.cache.data.add COUNT' ; done strict_max_successive_merges=false: rocksdb.block.cache.data.add COUNT : 0 strict_max_successive_merges=true: rocksdb.block.cache.data.add COUNT : 1 ``` Reviewed By: hx235 Differential Revision: D53982520 Pulled By: ajkr fbshipit-source-id: e40f761a60bd601f232417ac0058e4a33ee9c0f4	2024-02-21 13:15:27 -08:00
Akanksha Mahajan	956f1dfde3	Change ReadAsync callback API to remove const from FSReadRequest (#11649 ) Summary: Modify ReadAsync callback API to remove const from FSReadRequest as const doesn't let to fs_scratch to move the ownership. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11649 Test Plan: CircleCI jobs Reviewed By: anand1976 Differential Revision: D53585309 Pulled By: akankshamahajan15 fbshipit-source-id: 3bff9035db0e6fbbe34721a5963443355807420d	2024-02-16 09:14:55 -08:00
Changyu Bi	42a8e583c9	Print zstd warning to stdout in stress test (#12338 ) Summary: so the stress test does not fail. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12338 Reviewed By: jaykorean Differential Revision: D53542941 Pulled By: cbi42 fbshipit-source-id: 83b2eb3cb5cc4c5a268da386c22c4aadeb039a74	2024-02-07 14:17:51 -08:00
Peter Dillinger	54cb9c77d9	Prefer static_cast in place of most reinterpret_cast (#12308 ) Summary: The following are risks associated with pointer-to-pointer reinterpret_cast: * Can produce the "wrong result" (crash or memory corruption). IIRC, in theory this can happen for any up-cast or down-cast for a non-standard-layout type, though in practice would only happen for multiple inheritance cases (where the base class pointer might be "inside" the derived object). We don't use multiple inheritance a lot, but we do. * Can mask useful compiler errors upon code change, including converting between unrelated pointer types that you are expecting to be related, and converting between pointer and scalar types unintentionally. I can only think of some obscure cases where static_cast could be troublesome when it compiles as a replacement: * Going through `void` could plausibly cause unnecessary or broken pointer arithmetic. Suppose we have `struct Derived: public Base1, public Base2`. If we have `Derived` -> `void` -> `Base2` -> `Derived` through reinterpret casts, this could plausibly work (though technical UB) assuming the `Base2` is not dereferenced. Changing to static cast could introduce breaking pointer arithmetic. * Unnecessary (but safe) pointer arithmetic could arise in a case like `Derived` -> `Base2` -> `Derived` where before the Base2 pointer might not have been dereferenced. This could potentially affect performance. With some light scripting, I tried replacing pointer-to-pointer reinterpret_casts with static_cast and kept the cases that still compile. Most occurrences of reinterpret_cast have successfully been changed (except for java/ and third-party/). 294 changed, 257 remain. A couple of related interventions included here: Previously Cache::Handle was not actually derived from in the implementations and just used as a `void` stand-in with reinterpret_cast. Now there is a relationship to allow static_cast. In theory, this could introduce pointer arithmetic (as described above) but is unlikely without multiple inheritance AND non-empty Cache::Handle. Remove some unnecessary casts to void* as this is allowed to be implicit (for better or worse). Most of the remaining reinterpret_casts are for converting to/from raw bytes of objects. We could consider better idioms for these patterns in follow-up work. I wish there were a way to implement a template variant of static_cast that would only compile if no pointer arithmetic is generated, but best I can tell, this is not possible. AFAIK the best you could do is a dynamic check that the void* conversion after the static cast is unchanged. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12308 Test Plan: existing tests, CI Reviewed By: ltamasi Differential Revision: D53204947 Pulled By: pdillinger fbshipit-source-id: 9de23e618263b0d5b9820f4e15966876888a16e2	2024-02-07 10:44:11 -08:00
Peter Dillinger	76c834e441	Remove 'virtual' when implied by 'override' (#12319 ) Summary: ... to follow modern C++ style / idioms. Used this hack: ``` for FILE in `cat my_list_of_files`; do perl -pi -e 'BEGIN{undef $/;} s/ virtual( [^;{]* override)/$1/smg' $FILE; done ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12319 Test Plan: existing tests, CI Reviewed By: jaykorean Differential Revision: D53275303 Pulled By: pdillinger fbshipit-source-id: bc0881af270aa8ef4d0ae4f44c5a6614b6407377	2024-01-31 13:14:42 -08:00
Akanksha Mahajan	95d582e0cc	Enable io_uring in stress test (#12313 ) Summary: Enable io_uring in stress test Pull Request resolved: https://github.com/facebook/rocksdb/pull/12313 Test Plan: Crash test Reviewed By: anand1976 Differential Revision: D53238319 Pulled By: akankshamahajan15 fbshipit-source-id: c0c8e6a6479f6977210370606e9d551c1299ba62	2024-01-31 12:37:42 -08:00
Peter Dillinger	acf77e1bfe	Fix possible crash test segfault in FileExpectedStateManager::Restore() (#12314 ) Summary: `replayer` could be `nullptr` if `!s.ok()` from an earlier failure. Also consider status returned from `record->Accept()`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12314 Test Plan: blackbox_crash_test run Reviewed By: hx235 Differential Revision: D53241506 Pulled By: pdillinger fbshipit-source-id: fd330417c23391ca819c3ee0f69e4156d81934dc	2024-01-30 16:16:04 -08:00
Jay Huh	0d68aff3a1	StressTest - Move some stderr messages to stdout (#12304 ) Summary: Moving some of the messages that we print out in `stderr` to `stdout` to make `stderr` more strictly related to errors. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12304 Test Plan: ``` $> python3 tools/db_crashtest.py blackbox --simple --max_key=25000000 --write_buffer_size=4194304 ``` Before ``` ... stderr: WARNING: prefix_size is non-zero but memtablerep != prefix_hash Error : jewoongh injected test error This is not a real failure. Verification failed :( ``` After No longer seeing the `WARNING: prefix_size is non-zero but memtablerep != prefix_hash` message in stderr, but still appears in stdout. ``` WARNING: prefix_size is non-zero but memtablerep != prefix_hash Integrated BlobDB: blob files enabled 0, min blob size 0, blob file size 268435456, blob compression type NoCompression, blob GC enabled 0, cutoff 0.250000, force threshold 1.000000, blob compaction readahead size 0, blob file starting level 0 Integrated BlobDB: blob cache disabled DB path: [/tmp/jewoongh/rocksdb_crashtest_blackboxzztp281q] (Re-)verified 0 unique IDs 2024/01/29-11:57:51 Initializing worker threads Crash-recovery verification passed :) 2024/01/29-11:57:58 Starting database operations 2024/01/29-11:57:58 Starting verification Stress Test : 245.167 micros/op 10221 ops/sec : Wrote 0.00 MB (0.10 MB/sec) (16% of 6 ops) : Wrote 1 times : Deleted 0 times : Single deleted 0 times : 4 read and 0 found the key : Prefix scanned 0 times : Iterator size sum is 0 : Iterated 3 times : Deleted 0 key-ranges : Range deletions covered 0 keys : Got errors 0 times : 0 CompactFiles() succeed : 0 CompactFiles() did not succeed stderr: Error : jewoongh injected test error This is not a real failure. Error : jewoongh injected test error This is not a real failure. Error : jewoongh injected test error This is not a real failure. Verification failed :( ``` Reviewed By: akankshamahajan15 Differential Revision: D53193587 Pulled By: jaykorean fbshipit-source-id: 40d59f4c993c5ce043c571a207ccc9b74a0180c6	2024-01-29 12:52:59 -08:00
Peter Dillinger	4e60663b31	Remove unnecessary, confusing 'extern' (#12300 ) Summary: In C++, `extern` is redundant in a number of cases: * "Global" function declarations and definitions * "Global" variable definitions when already declared `extern` For consistency and simplicity, I've removed these in code that we own. In a couple of cases, I removed obsolete declarations, and for MagicNumber constants, I have consolidated the declarations into a header file (format.h) as standard best practice would prescribe. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12300 Test Plan: no functional changes, CI Reviewed By: ajkr Differential Revision: D53148629 Pulled By: pdillinger fbshipit-source-id: fb8d927959892e03af09b0c0d542b0a3b38fd886	2024-01-29 10:38:08 -08:00
Hui Xiao	438fc3d9b7	No consistency check when compaction filter is enabled in stress test (#12291 ) Summary: Context/Summary: [Consistency check between Multiget and Get](`d82d179a5e/db_stress_tool/no_batched_ops_stress.cc (L585-L591)`) requires snapshot to be repeatable, that is, no keys being protected by this snapshot is deleted. However compaction filter can remove keys under snapshot, e,g,[DBStressCompactionFilter](`d82d179a5e/db_stress_tool/db_stress_compaction_filter.h (L59)`), which makes consistency check fail meaninglessly. This is noted in https://github.com/facebook/rocksdb/wiki/Compaction-Filter - "Since release 6.0, with compaction filter enabled, RocksDB always invoke filtering for any key, even if it knows it will make a snapshot not repeatable." This PR makes consistency check happens only when compaction filter is not enabled in stress test Pull Request resolved: https://github.com/facebook/rocksdb/pull/12291 Test Plan: - Make the stress test command that fails on consistency check pass ``` ./db_stress --preserve_unverified_changes=1 --acquire_snapshot_one_in=0 --adaptive_readahead=0 --allow_concurrent_memtable_write=0 --allow_data_in_errors=True --async_io=0 --auto_readahead_size=0 --avoid_flush_during_recovery=0 --avoid_unnecessary_blocking_io=1 --backup_max_size=104857600 --backup_one_in=0 --batch_protection_bytes_per_key=0 --block_protection_bytes_per_key=0 --block_size=16384 --bloom_before_level=2147483647 --bloom_bits=30.729729833325962 --bottommost_compression_type=disable --bottommost_file_compaction_delay=0 --bytes_per_sync=0 --cache_index_and_filter_blocks=1 --cache_size=33554432 --cache_type=lru_cache --charge_compression_dictionary_building_buffer=0 --charge_file_metadata=1 --charge_filter_construction=0 --charge_table_reader=1 --checkpoint_one_in=0 --checksum_type=kCRC32c --clear_column_family_one_in=0 --column_families=1 --compact_files_one_in=0 --compact_range_one_in=0 --compaction_pri=4 --compaction_readahead_size=0 --compaction_ttl=0 --compressed_secondary_cache_size=8388608 --compression_checksum=0 --compression_max_dict_buffer_bytes=0 --compression_max_dict_bytes=0 --compression_parallel_threads=1 --compression_type=zlib --compression_use_zstd_dict_trainer=0 --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --data_block_index_type=0 --db=$db --db_write_buffer_size=0 --delpercent=0 --delrangepercent=50 --destroy_db_initially=0 --detect_filter_construct_corruption=1 --disable_wal=0 --enable_compaction_filter=1 --disable_auto_compactions=1 --enable_pipelined_write=0 --enable_thread_tracking=0 --expected_values_dir=$expected --fail_if_options_file_error=1 --fifo_allow_compaction=0 --file_checksum_impl=xxh64 --flush_one_in=0 --format_version=5 --get_current_wal_file_one_in=0 --get_live_files_one_in=0 --get_property_one_in=0 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=3 --index_type=0 --ingest_external_file_one_in=100 --initial_auto_readahead_size=0 --iterpercent=0 --key_len_percent_dist=1,30,69 --level_compaction_dynamic_level_bytes=1 --lock_wal_one_in=0 --long_running_snapshots=0 --manual_wal_flush_one_in=0 --mark_for_compaction_one_file_in=0 --max_auto_readahead_size=0 --max_background_compactions=1 --max_bytes_for_level_base=67108864 --max_key=1000 --max_key_len=3 --max_manifest_file_size=1073741824 --max_write_batch_group_size_bytes=64 --max_write_buffer_number=10 --max_write_buffer_size_to_maintain=2097152 --memtable_max_range_deletions=0 --memtable_prefix_bloom_size_ratio=0.01 --memtable_protection_bytes_per_key=8 --memtable_whole_key_filtering=0 --memtablerep=skip_list --min_write_buffer_number_to_merge=1 --mmap_read=0 --mock_direct_io=True --nooverwritepercent=1 --num_file_reads_for_auto_readahead=1 --open_files=-1 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=0 --open_write_fault_one_in=0 --ops_per_thread=100000 --optimize_filters_for_memory=0 --paranoid_file_checks=0 --partition_filters=0 --partition_pinning=2 --pause_background_one_in=0 --periodic_compaction_seconds=0 --prefix_size=8 --prefixpercent=0 --prepopulate_block_cache=0 --preserve_internal_time_seconds=0 --progress_reports=0 --read_fault_one_in=0 --readahead_size=0 --readpercent=45 --recycle_log_file_num=0 --reopen=0 --secondary_cache_fault_one_in=0 --secondary_cache_uri= --set_options_one_in=0 --snapshot_hold_ops=0 --sst_file_manager_bytes_per_sec=104857600 --sst_file_manager_bytes_per_truncate=1048576 --stats_dump_period_sec=0 --subcompactions=1 --sync=0 --sync_fault_injection=0 --target_file_size_base=167772 --target_file_size_multiplier=1 --test_batches_snapshots=0 --top_level_index_pinning=0 --unpartitioned_pinning=3 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_full_merge_v1=0 --use_get_entity=0 --use_merge=0 --use_multi_get_entity=0 --use_multiget=1 --use_put_entity_one_in=0 --use_write_buffer_manager=0 --user_timestamp_size=0 --value_size_mult=32 --verification_only=0 --verify_checksum=0 --verify_checksum_one_in=0 --verify_db_one_in=0 --verify_file_checksums_one_in=0 --verify_iterator_with_expected_state_one_in=0 --verify_sst_unique_id_in_manifest=0 --wal_bytes_per_sync=0 --wal_compression=none --write_buffer_size=335544 --write_dbid_to_manifest=1 --write_fault_one_in=0 --writepercent=5 ``` Reviewed By: jaykorean, ajkr Differential Revision: D53075223 Pulled By: hx235 fbshipit-source-id: 61aa4a79de5d123a55eb5ac08b449a8362cc91ae	2024-01-25 09:58:25 -08:00
Yu Zhang	9243f1b668	Ensures PendingExpectedValue either Commit or Rollback (#12244 ) Summary: This PR adds automatic checks in the `PendingExpectedValue` class to make sure it's either committed or rolled back before being destructed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12244 Reviewed By: hx235 Differential Revision: D52853794 Pulled By: jowlyzhang fbshipit-source-id: 1dcd7695f2c52b79695be0abe11e861047637dc4	2024-01-24 11:04:40 -08:00
Changyu Bi	a29db3048f	Fix TestGetEntity failure with UDT (#12264 ) Summary: Use the read option with right timestamp and skip verification when using old timestamps. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12264 Test Plan: I can repro with small keyspace: ``` ./db_stress --acquire_snapshot_one_in=10000 --adaptive_readahead=0 --allow_data_in_errors=True --async_io=0 --auto_readahead_size=1 --avoid_flush_during_recovery=0 --avoid_unnecessary_blocking_io=0 --backup_max_size=104857600 --backup_one_in=1000 --batch_protection_bytes_per_key=0 --block_protection_bytes_per_key=4 --block_size=16384 --bloom_before_level=7 --bloom_bits=15 --bottommost_compression_type=xpress --bottommost_file_compaction_delay=0 --bytes_per_sync=262144 --cache_index_and_filter_blocks=1 --cache_size=33554432 --cache_type=fixed_hyper_clock_cache --charge_compression_dictionary_building_buffer=1 --charge_file_metadata=0 --charge_filter_construction=1 --charge_table_reader=1 --checkpoint_one_in=10000 --checksum_type=kXXH3 --clear_column_family_one_in=0 --compact_files_one_in=1000000 --compact_range_one_in=1000000 --compaction_pri=4 --compaction_readahead_size=0 --compaction_style=1 --compaction_ttl=0 --compressed_secondary_cache_size=16777216 --compression_checksum=1 --compression_max_dict_buffer_bytes=0 --compression_max_dict_bytes=0 --compression_parallel_threads=1 --compression_type=snappy --compression_use_zstd_dict_trainer=0 --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --data_block_index_type=1 --db_write_buffer_size=0 --delpercent=4 --delrangepercent=1 --destroy_db_initially=1 --detect_filter_construct_corruption=1 --disable_wal=0 --enable_compaction_filter=0 --enable_pipelined_write=1 --enable_thread_tracking=0 --fail_if_options_file_error=1 --fifo_allow_compaction=1 --file_checksum_impl=big --flush_one_in=1000 --format_version=2 --get_current_wal_file_one_in=0 --get_live_files_one_in=10000 --get_property_one_in=100000 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=13 --index_type=0 --ingest_external_file_one_in=0 --initial_auto_readahead_size=16384 --iterpercent=10 --key_len_percent_dist=1,30,69 --level_compaction_dynamic_level_bytes=0 --lock_wal_one_in=10000 --log2_keys_per_lock=10 --long_running_snapshots=0 --manual_wal_flush_one_in=1000 --mark_for_compaction_one_file_in=0 --max_auto_readahead_size=0 --max_background_compactions=20 --max_bytes_for_level_base=10485760 --max_key=1000 --max_key_len=3 --max_manifest_file_size=16384 --max_write_batch_group_size_bytes=1048576 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=2097152 --memtable_max_range_deletions=0 --memtable_prefix_bloom_size_ratio=0.01 --memtable_protection_bytes_per_key=0 --memtable_whole_key_filtering=0 --memtablerep=skip_list --min_write_buffer_number_to_merge=2 --mmap_read=1 --mock_direct_io=False --nooverwritepercent=1 --num_file_reads_for_auto_readahead=1 --open_files=100 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=32 --open_write_fault_one_in=16 --ops_per_thread=200000 --optimize_filters_for_memory=0 --paranoid_file_checks=0 --partition_filters=0 --partition_pinning=2 --pause_background_one_in=1000000 --periodic_compaction_seconds=0 --persist_user_defined_timestamps=1 --prefix_size=8 --prefixpercent=5 --prepopulate_block_cache=1 --preserve_internal_time_seconds=3600 --progress_reports=0 --read_fault_one_in=0 --readahead_size=16384 --readpercent=45 --recycle_log_file_num=1 --reopen=20 --secondary_cache_fault_one_in=32 --secondary_cache_uri= --snapshot_hold_ops=100000 --sst_file_manager_bytes_per_sec=0 --sst_file_manager_bytes_per_truncate=0 --stats_dump_period_sec=0 --subcompactions=3 --sync=0 --sync_fault_injection=0 --target_file_size_base=524288 --target_file_size_multiplier=2 --test_batches_snapshots=0 --test_cf_consistency=0 --top_level_index_pinning=1 --unpartitioned_pinning=1 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_full_merge_v1=0 --use_get_entity=1 --use_merge=0 --use_multi_get_entity=0 --use_multiget=1 --use_put_entity_one_in=0 --use_txn=0 --use_write_buffer_manager=0 --user_timestamp_size=8 --value_size_mult=32 --verification_only=0 --verify_checksum=1 --verify_checksum_one_in=1000000 --verify_db_one_in=100000 --verify_file_checksums_one_in=100000 --verify_iterator_with_expected_state_one_in=5 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=524288 --wal_compression=zstd --write_buffer_size=4194304 --write_dbid_to_manifest=0 --write_fault_one_in=0 --writepercent=35 --db=/dev/shm/rocksdb_test/rocksdb_crashtest_whitebox --expected_values_dir=/dev/shm/rocksdb_test/rocksdb_crashtest_expected Errors when run with main: error : inconsistent values for key 0x00000000000000E5000000000000012B000000000000014D: expected state has the key, GetEntity returns NotFound. error : inconsistent values for key 0x0000000000000009000000000000012B0000000000000254: GetEntity returns :0x010000000504070609080B0A0D0C0F0E111013121514171619181B1A1D1C1F1E212023222524272629282B2A2D2C2F2E313033323534373639383B3A3D3C3F3E, expected state does not have the key. ``` Reviewed By: jaykorean Differential Revision: D52966251 Pulled By: cbi42 fbshipit-source-id: 09436a1b747f1ac545140fc83a2fa4555fef51c1	2024-01-22 12:15:17 -08:00
Yu Zhang	c4228abdc0	Fix backup/checkpoint stress test failure (#12227 ) Summary: This PR fixes this type of stress test failure that could happen in either checkpoint or backup. Example failure messages are like this: `Failure in a backup/restore operation with: Corruption: 0x00000000000001D5000000000000012B00000000000000FD exists in original db but not in restore` `A checkpoint operation failed with: Corruption: 0x0000000000000365000000000000012B0000000000000067 exists in original db but not in checkpoint /...` The internal task has an example test command to quickly reproduce this type of error. The common symptom of these test failures are these expected keys do not exist in the original db either. The root cause is `TestCheckpoint` and `TestBackupRestore` both use the expected state as a proxy for the state of the original db when it comes to check a key's existence. `0758271d51/db_stress_tool/db_stress_test_base.cc (L1838)` This `ExpectedState::Exists` API returns true if a key has a pending write, such as a pending put. In usual case, this pending put should either soon materialize to an actual write when `PendingExpectedValue::Commit` is called to reflect a successful write to the DB, or test should be safely terminated if write to DB fails. All of which happens while a key is locked. So checkpoint and backup usually won't see the discrepancy between db and expected state caused by pending writes. However, the external file ingestion test currently has a path that will proceed the test after a failed ingestion caused by injected errors, leaving the pending put in the expected state. `0758271d51/db_stress_tool/no_batched_ops_stress.cc (L1577-L1589)` I think a proper and future proof fix for this is to explicitly rollback a pending state when a db write operation failed so that expected state do not diverge from db in the first place. I added a `PendingExpectedValue::Rollback` API so that we don't implicitly depend on thread termination to prevent test failures. Another place that could cause same divergence as external file ingestion is `PreloadDbAndReopenAsReadOnly`. `0758271d51/db_stress_tool/db_stress_test_base.cc (L616-L619)` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12227 Reviewed By: hx235 Differential Revision: D52705470 Pulled By: jowlyzhang fbshipit-source-id: b21586b037caeeba29a2cff8c2fdc6f1d0bda9cf	2024-01-16 13:28:51 -08:00
Yu Zhang	8d0c09d7e6	Abort verification when expected state has pending writes / db return non OK(NotFound) status (#12232 ) Summary: In the current flow, the verification will pass and continue the test when db return non Ok(NotFound) status while expected state has pending writes. `fdfd044bb2/db_stress_tool/no_batched_ops_stress.cc (L2054-L2065)` We can just abort when such a db status is ever encountered. This can prevent follow up tests like `TestCheckpoint` and `TestBackupRestore` to consider such a key as existing in the db via the `ExpectedState::Exists` API. This could be a reason for some recent test failures in this path. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12232 Reviewed By: cbi42 Differential Revision: D52737393 Pulled By: jowlyzhang fbshipit-source-id: f2658c5332ccd42f6190783960e2dc6fcd81ccc5	2024-01-12 12:23:09 -08:00
Jay Huh	fdfd044bb2	Logging for test failure due to get/multiget inconsistency (#12228 ) Summary: Additional logging for debugging purpose Pull Request resolved: https://github.com/facebook/rocksdb/pull/12228 Test Plan: CI Reviewed By: cbi42 Differential Revision: D52713401 Pulled By: jaykorean fbshipit-source-id: 535972d60debb70c220887f0f4c06a32f7668f72	2024-01-11 18:15:17 -08:00
Jay Huh	0758271d51	Fix TestGetEntity in stress test when UDT is enabled (#12222 ) Summary: Similar to https://github.com/facebook/rocksdb/issues/11249 , we started to get failures from `TestGetEntity` when the User-defined-timestamp was enabled. Applying the same fix as the `TestGet` _Scenario copied from #11249_ <table> <tr> <th>TestGet thread</th> <th> A writing thread</th> </tr> <tr> <td>read_opts.timestamp = GetNow()</td> <td></td> </tr> <tr> <td></td> <td>Lock key, do write</td> </tr> <tr> <td>Lock key, read(read_opts) return NotFound</td> <td></td> </tr> </table> Pull Request resolved: https://github.com/facebook/rocksdb/pull/12222 Reviewed By: jowlyzhang Differential Revision: D52678830 Pulled By: jaykorean fbshipit-source-id: 6e154f67bb32968add8fea0b7ae7c4858ea64ee7	2024-01-10 16:35:54 -08:00
Changyu Bi	cd15331711	Print status when VerifyOrSyncValue() fails with non-OK status (#12217 ) Summary: This should print more helpful message when a non-ok status like Corruption is returned. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12217 Test Plan: CI passes. Reviewed By: jaykorean Differential Revision: D52637595 Pulled By: cbi42 fbshipit-source-id: e810eeb4cba633d4d4c5d198da4468995e4ed427	2024-01-09 14:20:08 -08:00
Hui Xiao	06e593376c	Group SST write in flush, compaction and db open with new stats (#11910 ) Summary: ## Context/Summary Similar to https://github.com/facebook/rocksdb/pull/11288, https://github.com/facebook/rocksdb/pull/11444, categorizing SST/blob file write according to different io activities allows more insight into the activity. For that, this PR does the following: - Tag different write IOs by passing down and converting WriteOptions to IOOptions - Add new SST_WRITE_MICROS histogram in WritableFileWriter::Append() and breakdown FILE_WRITE_{FLUSH\|COMPACTION\|DB_OPEN}_MICROS Some related code refactory to make implementation cleaner: - Blob stats - Replace high-level write measurement with low-level WritableFileWriter::Append() measurement for BLOB_DB_BLOB_FILE_WRITE_MICROS. This is to make FILE_WRITE_{FLUSH\|COMPACTION\|DB_OPEN}_MICROS include blob file. As a consequence, this introduces some behavioral changes on it, see HISTORY and db bench test plan below for more info. - Fix bugs where BLOB_DB_BLOB_FILE_SYNCED/BLOB_DB_BLOB_FILE_BYTES_WRITTEN include file failed to sync and bytes failed to write. - Refactor WriteOptions constructor for easier construction with io_activity and rate_limiter_priority - Refactor DBImpl::~DBImpl()/BlobDBImpl::Close() to bypass thread op verification - Build table - TableBuilderOptions now includes Read/WriteOpitons so BuildTable() do not need to take these two variables - Replace the io_priority passed into BuildTable() with TableBuilderOptions::WriteOpitons::rate_limiter_priority. Similar for BlobFileBuilder. This parameter is used for dynamically changing file io priority for flush, see https://github.com/facebook/rocksdb/pull/9988?fbclid=IwAR1DtKel6c-bRJAdesGo0jsbztRtciByNlvokbxkV6h_L-AE9MACzqRTT5s for more - Update ThreadStatus::FLUSH_BYTES_WRITTEN to use io_activity to track flush IO in flush job and db open instead of io_priority ## Test ### db bench Flush ``` ./db_bench --statistics=1 --benchmarks=fillseq --num=100000 --write_buffer_size=100 rocksdb.sst.write.micros P50 : 1.830863 P95 : 4.094720 P99 : 6.578947 P100 : 26.000000 COUNT : 7875 SUM : 20377 rocksdb.file.write.flush.micros P50 : 1.830863 P95 : 4.094720 P99 : 6.578947 P100 : 26.000000 COUNT : 7875 SUM : 20377 rocksdb.file.write.compaction.micros P50 : 0.000000 P95 : 0.000000 P99 : 0.000000 P100 : 0.000000 COUNT : 0 SUM : 0 rocksdb.file.write.db.open.micros P50 : 0.000000 P95 : 0.000000 P99 : 0.000000 P100 : 0.000000 COUNT : 0 SUM : 0 ``` compaction, db oopen ``` Setup: ./db_bench --statistics=1 --benchmarks=fillseq --num=10000 --disable_auto_compactions=1 -write_buffer_size=100 --db=../db_bench Run:./db_bench --statistics=1 --benchmarks=compact --db=../db_bench --use_existing_db=1 rocksdb.sst.write.micros P50 : 2.675325 P95 : 9.578788 P99 : 18.780000 P100 : 314.000000 COUNT : 638 SUM : 3279 rocksdb.file.write.flush.micros P50 : 0.000000 P95 : 0.000000 P99 : 0.000000 P100 : 0.000000 COUNT : 0 SUM : 0 rocksdb.file.write.compaction.micros P50 : 2.757353 P95 : 9.610687 P99 : 19.316667 P100 : 314.000000 COUNT : 615 SUM : 3213 rocksdb.file.write.db.open.micros P50 : 2.055556 P95 : 3.925000 P99 : 9.000000 P100 : 9.000000 COUNT : 23 SUM : 66 ``` blob stats - just to make sure they aren't broken by this PR ``` Integrated Blob DB Setup: ./db_bench --enable_blob_files=1 --statistics=1 --benchmarks=fillseq --num=10000 --disable_auto_compactions=1 -write_buffer_size=100 --db=../db_bench Run:./db_bench --enable_blob_files=1 --statistics=1 --benchmarks=compact --db=../db_bench --use_existing_db=1 pre-PR: rocksdb.blobdb.blob.file.write.micros P50 : 7.298246 P95 : 9.771930 P99 : 9.991813 P100 : 16.000000 COUNT : 235 SUM : 1600 rocksdb.blobdb.blob.file.synced COUNT : 1 rocksdb.blobdb.blob.file.bytes.written COUNT : 34842 post-PR: rocksdb.blobdb.blob.file.write.micros P50 : 2.000000 P95 : 2.829360 P99 : 2.993779 P100 : 9.000000 COUNT : 707 SUM : 1614 - COUNT is higher and values are smaller as it includes header and footer write - COUNT is 3X higher due to each Append() count as one post-PR, while in pre-PR, 3 Append()s counts as one. See https://github.com/facebook/rocksdb/pull/11910/files#diff-32b811c0a1c000768cfb2532052b44dc0b3bf82253f3eab078e15ff201a0dabfL157-L164 rocksdb.blobdb.blob.file.synced COUNT : 1 (stay the same) rocksdb.blobdb.blob.file.bytes.written COUNT : 34842 (stay the same) ``` ``` Stacked Blob DB Run: ./db_bench --use_blob_db=1 --statistics=1 --benchmarks=fillseq --num=10000 --disable_auto_compactions=1 -write_buffer_size=100 --db=../db_bench pre-PR: rocksdb.blobdb.blob.file.write.micros P50 : 12.808042 P95 : 19.674497 P99 : 28.539683 P100 : 51.000000 COUNT : 10000 SUM : 140876 rocksdb.blobdb.blob.file.synced COUNT : 8 rocksdb.blobdb.blob.file.bytes.written COUNT : 1043445 post-PR: rocksdb.blobdb.blob.file.write.micros P50 : 1.657370 P95 : 2.952175 P99 : 3.877519 P100 : 24.000000 COUNT : 30001 SUM : 67924 - COUNT is higher and values are smaller as it includes header and footer write - COUNT is 3X higher due to each Append() count as one post-PR, while in pre-PR, 3 Append()s counts as one. See https://github.com/facebook/rocksdb/pull/11910/files#diff-32b811c0a1c000768cfb2532052b44dc0b3bf82253f3eab078e15ff201a0dabfL157-L164 rocksdb.blobdb.blob.file.synced COUNT : 8 (stay the same) rocksdb.blobdb.blob.file.bytes.written COUNT : 1043445 (stay the same) ``` ### Rehearsal CI stress test Trigger 3 full runs of all our CI stress tests ### Performance Flush ``` TEST_TMPDIR=/dev/shm ./db_basic_bench_pre_pr --benchmark_filter=ManualFlush/key_num:524288/per_key_size:256 --benchmark_repetitions=1000 -- default: 1 thread is used to run benchmark; enable_statistics = true Pre-pr: avg 507515519.3 ns 497686074,499444327,500862543,501389862,502994471,503744435,504142123,504224056,505724198,506610393,506837742,506955122,507695561,507929036,508307733,508312691,508999120,509963561,510142147,510698091,510743096,510769317,510957074,511053311,511371367,511409911,511432960,511642385,511691964,511730908, Post-pr: avg 511971266.5 ns, regressed 0.88% 502744835,506502498,507735420,507929724,508313335,509548582,509994942,510107257,510715603,511046955,511352639,511458478,512117521,512317380,512766303,512972652,513059586,513804934,513808980,514059409,514187369,514389494,514447762,514616464,514622882,514641763,514666265,514716377,514990179,515502408, ``` Compaction ``` TEST_TMPDIR=/dev/shm ./db_basic_bench_{pre\|post}_pr --benchmark_filter=ManualCompaction/comp_style:0/max_data:134217728/per_key_size:256/enable_statistics:1 --benchmark_repetitions=1000 -- default: 1 thread is used to run benchmark Pre-pr: avg 495346098.30 ns 492118301,493203526,494201411,494336607,495269217,495404950,496402598,497012157,497358370,498153846 Post-pr: avg 504528077.20, regressed 1.85%. "ManualCompaction" include flush so the isolated regression for compaction should be around 1.85-0.88 = 0.97% 502465338,502485945,502541789,502909283,503438601,504143885,506113087,506629423,507160414,507393007 ``` Put with WAL (in case passing WriteOptions slows down this path even without collecting SST write stats) ``` TEST_TMPDIR=/dev/shm ./db_basic_bench_pre_pr --benchmark_filter=DBPut/comp_style:0/max_data:107374182400/per_key_size:256/enable_statistics:1/wal:1 --benchmark_repetitions=1000 -- default: 1 thread is used to run benchmark Pre-pr: avg 3848.10 ns 3814,3838,3839,3848,3854,3854,3854,3860,3860,3860 Post-pr: avg 3874.20 ns, regressed 0.68% 3863,3867,3871,3874,3875,3877,3877,3877,3880,3881 ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/11910 Reviewed By: ajkr Differential Revision: D49788060 Pulled By: hx235 fbshipit-source-id: 79e73699cda5be3b66461687e5147c2484fc5eff	2023-12-29 15:29:23 -08:00
Yu Zhang	c2ab4e754b	Add initial support to stress test persist_user_defined_timestamps (#12124 ) Summary: This PR adds initial stress testing for the user-defined timestamps in memtable only feature. Each flavor of the `*_ts` crash test get a 1 in 3 chance to run with timestamps not persisted, this setting is initialized once and kept consistent across the following re-runs. This initial stress test included these things besides disabling incompatible feature combinations to make the test run more stably: 1) It currently only run test methods that validates db state with expected state. Not the ones that validate db state by comparing result from one API to another API. Such as `TestMultiGet` (compared with `Get`), similarly `TestMultiGetEntity`, `TestIterate` (compare src iterator to a control iterator). Due to timestamps being removed, results from one API to another API is not directly comparable as it is now. More test logic to handle that need to be added, will do that in a follow up. 2) Even when comparing db state to expected state, sometimes the db can receive `InvalidArgument` too due to timestamps getting flushed and removed. Added some logic to handle that. 3) When timestamps are not persisted, we don't try to read with older timestamp. Since that's making it easier to get `InvalidArgument`. And this capability is not yet needed by our customer so it's disabled for now. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12124 Test Plan: running multiple flavor of this test on continuous run for sometime before checkin Reviewed By: ltamasi Differential Revision: D51916267 Pulled By: jowlyzhang fbshipit-source-id: 3f3eb5f9618d05d296062820e0ef5cb8edc7c2b2	2023-12-12 09:35:29 -08:00
Yu Zhang	ba8fa0f546	internal_repo_rocksdb (4372117296613874540) (#12117 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12117 Reviewed By: ajkr Differential Revision: D51745846 Pulled By: jowlyzhang fbshipit-source-id: 51c806a484b3b43d174b06d2cfe9499191d09914	2023-12-04 11:17:32 -08:00
anand76	4d04138512	Add dynamic disabling of compressed cache to db_stress (#12102 ) Summary: We now support re-enabling the compressed portion of the `TieredCache` after dynamically disabling it. Add it to db_stress for testing purposes. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12102 Reviewed By: akankshamahajan15 Differential Revision: D51594259 Pulled By: anand1976 fbshipit-source-id: ea544e30a5ebd6290fc9ed46a241f09634764d2a	2023-11-27 13:00:15 -08:00
anand76	2222caec9e	Make CacheWithSecondaryAdapter reservation accounting more robust (#12059 ) Summary: `CacheWithSecondaryAdapter` can distribute placeholder reservations across the primary and secondary caches. The current implementation of the accounting is quite complicated in order to avoid using a mutex. This may cause the accounting to be slightly off after changes to the cache capacity and ratio, resulting in assertion failures. There's also a bug in the unlikely event that the total reservation exceeds the cache capacity. Furthermore, the current implementation is difficult to reason about. This PR simplifies it by doing the accounting while holding a mutex. The reservations are processed in 1MB chunks in order to avoid taking a lock too frequently. As a side effect, this also removes the restriction of not allowing to increase the compressed secondary cache capacity after decreasing it to 0. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12059 Test Plan: Existing unit tests, and a new test for capacity increase from 0 Reviewed By: pdillinger Differential Revision: D51278686 Pulled By: anand1976 fbshipit-source-id: 7e1ad2c50694772997072dd59cab35c93c12ba4f	2023-11-14 16:25:52 -08:00
Changyu Bi	b48480cfd0	Enable `TestIterateAgainstExpected()` in more crash tests (#12040 ) Summary: db_stress flag `verify_iterator_with_expected_state_one_in` is only enabled for in crash test if --simple flag is set. This PR enables it for all supported crash tests by enabling it by default. This adds coverage for --txn and --enable_ts crash tests. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12040 Test Plan: ran crash tests that disabled this flag before for a few hours ``` python3 ./tools/db_crashtest.py blackbox --verify_iterator_with_expected_state_one_in=1 --txn --txn_write_policy=[0,1,2] python3 ./tools/db_crashtest.py blackbox --verify_iterator_with_expected_state_one_in=1 --enable_ts ``` Reviewed By: ajkr, hx235 Differential Revision: D50980001 Pulled By: cbi42 fbshipit-source-id: 3daf6b4c32bdddc5df057240068162aa1a907587	2023-11-03 16:27:11 -07:00
Changyu Bi	8505b26db1	Fix stress test error message for black/whitebox test to catch failures (#12039 ) Summary: black/whitebox crash test relies on error/fail keyword in stderr to catch stress test failure. If a db_stress run prints an error message without these keyword, and then is killed before it graceful exits and prints out "Verification failed" here (`2648e0a747/db_stress_tool/db_stress_driver.cc (L256)`), the error won't be caught. This is more likely to happen if db_stress is printing a stack trace. This PR fixes some error messages. Ideally in the future we should not rely on searching for keywords in stderr to determine failed stress tests. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12039 Test Plan: ``` Added the following change on top of this PR to simulate exit without relevant keyword: @@ -1586,6 +1587,8 @@ class NonBatchedOpsStressTest : public StressTest { assert(thread); assert(!rand_column_families.empty()); assert(!rand_keys.empty()); + fprintf(stderr, "Inconsistency"); + thread->shared->SafeTerminate(); python3 ./tools/db_crashtest.py blackbox --simple --verify_iterator_with_expected_state_one_in=1 --interval=10 will print a stack trace but continue to run db_stress. ``` Reviewed By: jaykorean Differential Revision: D50960076 Pulled By: cbi42 fbshipit-source-id: 5c60a1be04ce4a43adbd33f040d54434f2ae24c9	2023-11-03 09:53:22 -07:00
Yu Zhang	a42910537d	Save the correct user comparator name in OPTIONS file (#12037 ) Summary: I noticed the user comparator name in OPTIONS file can be incorrect when working on a recent stress test failure. The name of the comparator retrieved via the "Comparator::GetRootComparator" API is saved in OPTIONS file as the user comparator. The intention was to get the user comparator wrapped in the internal comparator. However `ImmutableCFOptions.user_comparator` has always been a user comparator of type `Comparator`. The corresponding `GetRootComparator` API is also defined only for user comparator type `Comparator`, not the internal key comparator type `InternalKeyComparator`. For built in comparator `BytewiseComparator` and `ReverseBytewiseComparator`, there is no difference between `Comparator::Name` and `Comparator::GetRootComparator::Name` because these built in comparators' root comparator is themselves. However, for built in comparator `BytewiseComparatorWithU64Ts` and `ReverseBytewiseComparatorWithU64Ts`, there are differences. So this change update the logic to persist the user comparator's name, not its root comparator's name. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12037 Test Plan: The restore flow in stress test, which relies on converting Options object to string and back to Options object is updated to help validate comparator object can be correctly serialized and deserialized with the OPTIONS file mechanism Updated unit test to use a comparator that has a root comparator that is not itself. Reviewed By: cbi42 Differential Revision: D50909750 Pulled By: jowlyzhang fbshipit-source-id: 9086d7135c7a6f4b5565fb47fce194ea0a024f52	2023-11-02 13:27:59 -07:00
Yu Zhang	0b057a7acc	Initialize comparator explicitly in PrepareOptionsForRestoredDB() (#12034 ) Summary: This is to fix below error seeing in stress test: ``` Failure in DB::Open in backup/restore with: Invalid argument: Cannot open a column family and disable user-defined timestamps feature if its existing persist_user_defined_timestamps flag is not false. ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12034 Reviewed By: cbi42 Differential Revision: D50860689 Pulled By: jowlyzhang fbshipit-source-id: ebc6cf0a75caa43d3d3bd58e3d5c2ac754cc637c	2023-10-31 16:10:48 -07:00
Changyu Bi	2818a74b95	Initialize merge operator explicitly in PrepareOptionsForRestoredDB() (#12033 ) Summary: We are seeing the following stress test failure: `Failure in DB::Get in backup/restore with: Invalid argument: merge_operator is not properly initialized. Verification failed: Backup/restore failed: Invalid argument: merge_operator is not properly initialized.`. The reason is likely that `GetColumnFamilyOptionsFromString()` does not set merge operator if it's a customized merge operator. Fixing it by initializing merge operator explicitly. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12033 Test Plan: this repro gives the error consistently before this PR ``` ./db_stress --acquire_snapshot_one_in=10000 --adaptive_readahead=0 --allow_concurrent_memtable_write=1 --allow_data_in_errors=True --async_io=0 --atomic_flush=1 --auto_readahead_size=1 --avoid_flush_during_recovery=0 --avoid_unnecessary_blocking_io=1 --backup_max_size=1048576000000 --backup_one_in=50 --batch_protection_bytes_per_key=8 --block_protection_bytes_per_key=2 --block_size=16384 --bloom_before_level=2147483646 --bloom_bits=31.014388066505518 --bottommost_compression_type=lz4hc --bottommost_file_compaction_delay=0 --bytes_per_sync=0 --cache_index_and_filter_blocks=0 --cache_size=33554432 --cache_type=fixed_hyper_clock_cache --charge_compression_dictionary_building_buffer=1 --charge_file_metadata=1 --charge_filter_construction=0 --charge_table_reader=1 --checkpoint_one_in=1000000 --checksum_type=kxxHash --clear_column_family_one_in=0 --column_families=1 --compact_files_one_in=1000000 --compact_range_one_in=1000000 --compaction_pri=3 --compaction_readahead_size=0 --compaction_ttl=10 --compressed_secondary_cache_ratio=0.0 --compressed_secondary_cache_size=0 --compression_checksum=1 --compression_max_dict_buffer_bytes=4095 --compression_max_dict_bytes=16384 --compression_parallel_threads=1 --compression_type=none --compression_use_zstd_dict_trainer=1 --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --data_block_index_type=0 --db=/dev/shm/rocksdb_test/rocksdb_crashtest_blackbox --db_write_buffer_size=0 --delpercent=4 --delrangepercent=1 --destroy_db_initially=1 --detect_filter_construct_corruption=0 --disable_wal=1 --enable_compaction_filter=0 --enable_pipelined_write=0 --enable_thread_tracking=0 --expected_values_dir=/dev/shm/rocksdb_test/rocksdb_crashtest_expected --fail_if_options_file_error=1 --fifo_allow_compaction=0 --file_checksum_impl=xxh64 --flush_one_in=1000000 --format_version=2 --get_current_wal_file_one_in=0 --get_live_files_one_in=1000000 --get_property_one_in=1000000 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=10 --index_type=2 --ingest_external_file_one_in=0 --initial_auto_readahead_size=16384 --iterpercent=10 --key_len_percent_dist=1,30,69 --level_compaction_dynamic_level_bytes=1 --lock_wal_one_in=1000000 --long_running_snapshots=1 --manual_wal_flush_one_in=0 --mark_for_compaction_one_file_in=10 --max_auto_readahead_size=524288 --max_background_compactions=1 --max_bytes_for_level_base=67108864 --max_key=100 --max_key_len=3 --max_manifest_file_size=1073741824 --max_write_batch_group_size_bytes=1048576 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=8388608 --memtable_max_range_deletions=1000 --memtable_prefix_bloom_size_ratio=0 --memtable_protection_bytes_per_key=2 --memtable_whole_key_filtering=0 --memtablerep=skip_list --min_write_buffer_number_to_merge=2 --mmap_read=1 --mock_direct_io=False --nooverwritepercent=1 --num_file_reads_for_auto_readahead=0 --open_files=-1 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=0 --open_write_fault_one_in=16 --ops_per_thread=100000000 --optimize_filters_for_memory=1 --paranoid_file_checks=0 --partition_filters=0 --partition_pinning=0 --pause_background_one_in=1000000 --periodic_compaction_seconds=0 --prefix_size=-1 --prefixpercent=0 --prepopulate_block_cache=1 --preserve_internal_time_seconds=0 --progress_reports=0 --read_fault_one_in=0 --readahead_size=0 --readpercent=50 --recycle_log_file_num=0 --reopen=0 --secondary_cache_fault_one_in=0 --set_options_one_in=0 --snapshot_hold_ops=100000 --sst_file_manager_bytes_per_sec=0 --sst_file_manager_bytes_per_truncate=0 --stats_dump_period_sec=0 --subcompactions=1 --sync=0 --sync_fault_injection=1 --target_file_size_base=16777216 --target_file_size_multiplier=1 --test_batches_snapshots=0 --top_level_index_pinning=0 --unpartitioned_pinning=1 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_full_merge_v1=0 --use_get_entity=0 --use_merge=1 --use_multi_get_entity=0 --use_multiget=1 --use_put_entity_one_in=10 --use_write_buffer_manager=0 --user_timestamp_size=0 --value_size_mult=32 --verification_only=0 --verify_checksum=1 --verify_checksum_one_in=1000000 --verify_db_one_in=100000 --verify_file_checksums_one_in=1000000 --verify_iterator_with_expected_state_one_in=5 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=524288 --wal_compression=zstd --write_buffer_size=33554432 --write_dbid_to_manifest=0 --write_fault_one_in=0 --writepercent=35 ``` Reviewed By: hx235 Differential Revision: D50825558 Pulled By: cbi42 fbshipit-source-id: 8468dc0444c112415a515af8291ef3abec8a42de	2023-10-31 07:39:41 -07:00
Hui Xiao	212b5bf826	Deep-copy Options in restored db for stress test to avoid race with SetOptions() (#12015 ) Summary: Context DB open will persist the `Options` in memory to options file and verify the file right after the write. The verification is done by comparing the options from parsing the written options file against the `Options` object in memory. Upon inconsistency, corruption such as https://github.com/facebook/rocksdb/blob/main/options/options_parser.cc#L725 will be returned. This verification assumes the `Options` object in memory is not changed from before the write till the verification. This assumption can break during [opening the restored db in stress test](`0f141352d8/db_stress_tool/db_stress_test_base.cc (L1784-L1799)`). This [line](`0f141352d8/db_stress_tool/db_stress_test_base.cc (L1770)`) makes it shares some pointer options (e.g, `std::shared_ptr<const FilterPolicy> filter_policy`) with other threads (e.g, SetOptions()) in db stress. And since https://github.com/facebook/rocksdb/pull/11838, filter_policy's field `bloom_before_level ` has now been mutable by SetOptions(). Therefore we started to see stress test failure like below: ``` Failure in DB::Open in backup/restore with: IO error: DB::Open() failed --- Unable to persist Options file: IO error: Unable to persist options.: Corruption: [RocksDBOptionsParser]:failed the verification on BlockBasedTable::: filter_policy.id Verification failed: Backup/restore failed: IO error: DB::Open() failed --- Unable to persist Options file: IO error: Unable to persist options.: Corruption: [RocksDBOptionsParser]:failed the verification on BlockBasedTable::: filter_policy.id db_stress: db_stress_tool/db_stress_test_base.cc:479: void rocksdb::StressTest::ProcessStatus(rocksdb::SharedState, std::string, rocksdb::Status) const: Assertion `false' failed. ``` Summary* This PR uses "deep copy" of the `options_` by CreateXXXFromString() to avoid sharing pointer options. Test plan Run the below db stress command that failed before this PR and pass after ``` ./db_stress --column_families=1 --threads=2 --preserve_unverified_changes=0 --acquire_snapshot_one_in=10000 --adaptive_readahead=0 --allow_data_in_errors=True --async_io=0 --auto_readahead_size=1 --avoid_flush_during_recovery=0 --avoid_unnecessary_blocking_io=0 --backup_max_size=104857600 --backup_one_in=10 --batch_protection_bytes_per_key=0 --block_protection_bytes_per_key=1 --block_size=16384 --bloom_before_level=2147483646 --bloom_bits=0 --bottommost_compression_type=disable --bottommost_file_compaction_delay=86400 --bytes_per_sync=0 --cache_index_and_filter_blocks=0 --cache_size=33554432 --cache_type=tiered_auto_hyper_clock_cache --charge_compression_dictionary_building_buffer=0 --charge_file_metadata=0 --charge_filter_construction=0 --charge_table_reader=1 --checkpoint_one_in=1000000 --checksum_type=kXXH3 --clear_column_family_one_in=0 --compact_files_one_in=1000000 --compact_range_one_in=1000000 --compaction_pri=2 --compaction_readahead_size=0 --compaction_ttl=0 --compressed_secondary_cache_ratio=0.3333333333333333 --compressed_secondary_cache_size=0 --compression_checksum=1 --compression_max_dict_buffer_bytes=0 --compression_max_dict_bytes=0 --compression_parallel_threads=1 --compression_type=lz4 --compression_use_zstd_dict_trainer=1 --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --data_block_index_type=1 --db=/dev/shm/rocksdb_test/rocksdb_crashtest_blackbox --db_write_buffer_size=8388608 --delpercent=4 --delrangepercent=1 --destroy_db_initially=1 --detect_filter_construct_corruption=0 --disable_wal=0 --enable_compaction_filter=0 --enable_pipelined_write=0 --enable_thread_tracking=0 --expected_values_dir=/dev/shm/rocksdb_test/rocksdb_crashtest_expected --fail_if_options_file_error=1 --fifo_allow_compaction=1 --file_checksum_impl=big --flush_one_in=1000000 --format_version=2 --get_current_wal_file_one_in=0 --get_live_files_one_in=1000000 --get_property_one_in=1000000 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=14 --index_type=0 --ingest_external_file_one_in=0 --initial_auto_readahead_size=524288 --iterpercent=10 --key_len_percent_dist=1,30,69 --level_compaction_dynamic_level_bytes=1 --lock_wal_one_in=1000000 --long_running_snapshots=1 --manual_wal_flush_one_in=1000 --mark_for_compaction_one_file_in=0 --max_auto_readahead_size=0 --max_background_compactions=20 --max_bytes_for_level_base=10485760 --max_key=2500 --max_key_len=3 --max_manifest_file_size=1073741824 --max_write_batch_group_size_bytes=16777216 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=4194304 --memtable_max_range_deletions=0 --memtable_prefix_bloom_size_ratio=0 --memtable_protection_bytes_per_key=0 --memtable_whole_key_filtering=0 --memtablerep=skip_list --min_write_buffer_number_to_merge=1 --mmap_read=1 --mock_direct_io=False --nooverwritepercent=1 --num_file_reads_for_auto_readahead=1 --open_files=500000 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=0 --open_write_fault_one_in=0 --ops_per_thread=100000000 --optimize_filters_for_memory=0 --paranoid_file_checks=1 --partition_filters=0 --partition_pinning=2 --pause_background_one_in=1000000 --periodic_compaction_seconds=0 --prefix_size=-1 --prefixpercent=0 --prepopulate_block_cache=0 --preserve_internal_time_seconds=3600 --progress_reports=0 --read_fault_one_in=1000 --readahead_size=0 --readpercent=50 --recycle_log_file_num=0 --reopen=0 --secondary_cache_fault_one_in=0 --secondary_cache_uri= --set_options_one_in=5 --snapshot_hold_ops=100000 --sst_file_manager_bytes_per_sec=104857600 --sst_file_manager_bytes_per_truncate=1048576 --stats_dump_period_sec=600 --subcompactions=3 --sync=0 --sync_fault_injection=0 --target_file_size_base=2097152 --target_file_size_multiplier=2 --test_batches_snapshots=0 --top_level_index_pinning=2 --unpartitioned_pinning=0 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_full_merge_v1=0 --use_get_entity=0 --use_merge=1 --use_multi_get_entity=0 --use_multiget=1 --use_put_entity_one_in=0 --use_write_buffer_manager=0 --user_timestamp_size=0 --value_size_mult=32 --verification_only=0 --verify_checksum=1 --verify_checksum_one_in=1000000 --verify_db_one_in=100000 --verify_file_checksums_one_in=1000000 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=524288 --wal_compression=zstd --write_buffer_size=4194304 --write_dbid_to_manifest=1 --write_fault_one_in=0 --writepercent=35 ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12015 Reviewed By: pdillinger Differential Revision: D50666136 Pulled By: hx235 fbshipit-source-id: 804acc23aecb4eedfe5c44f732e86291f2420b2b	2023-10-27 17:07:39 -07:00
Changyu Bi	9ded9f789f	Fix db_stress FaultInjectionTestFS set up before DB open (#11958 ) Summary: We saw frequent stress test failures with error messages like: ``` Verification failed for column family 0 key ...: value_from_db: , value_from_expected: ..., msg: GetEntity verification: Value not found: NotFound: ``` One cause for this is that data in WAL is lost after a crash. We initialize FaultInjectionTestFS to be not direct writable when write_fault_injection is enabled (see code change). This can cause the first WAL created during DB open to be lost if a db_stress is killed before the first WAL is synced. This PR initializes FaultInjectionTestFS to be direct writable. Note that FaultInjectionTestFS will be configured propertly for write fault injection after DB open in `RunStressTestImpl()`. So this change should not affect write fault injection coverage. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11958 Test Plan: a repro for the above bug: ``` Simulate crash before first WAL is sealed: --- a/db_stress_tool/db_stress_driver.cc +++ b/db_stress_tool/db_stress_driver.cc @@ -256,6 +256,7 @@ bool RunStressTestImpl(SharedState* shared) { fprintf(stderr, "Verification failed :(\n"); return false; } + exit(1); return true; } ./db_stress --clear_column_family_one_in=0 --column_families=1 --preserve_internal_time_seconds=60 --destroy_db_initially=0 --db=/dev/shm/rocksdb_crashtest_blackbox --db_write_buffer_size=2097152 --destroy_db_initially=0 --expected_values_dir=/dev/shm/rocksdb_crashtest_expected --reopen=0 --test_batches_snapshots=0 --threads=1 --ops_per_thread=100 --write_fault_one_in=1000 --sync_fault_injection=0 ./db_stress_main --clear_column_family_one_in=0 --column_families=1 --preserve_internal_time_seconds=60 --destroy_db_initially=0 --db=/dev/shm/rocksdb_crashtest_blackbox --db_write_buffer_size=2097152 --destroy_db_initially=0 --expected_values_dir=/dev/shm/rocksdb_crashtest_expected --reopen=0 --test_batches_snapshots=0 --sync_fault_injection=1 ``` Reviewed By: akankshamahajan15 Differential Revision: D50300347 Pulled By: cbi42 fbshipit-source-id: 3a4881d72197f5ece82364382a0100912e16c2d6	2023-10-14 13:33:55 -07:00
Changyu Bi	50b0879d50	Do not fail stress test when file ingestion return injected error (#11956 ) Summary: Currently, if file ingestion hit injected error, stress test is considered failed since it prints a message to stderr containing the keyword "error" and db_crashtest.py looks for it in stderr. This PR fixes it by print injected error to stdout. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11956 Test Plan: Check future stress test runs. Reviewed By: akankshamahajan15 Differential Revision: D50293537 Pulled By: cbi42 fbshipit-source-id: e74915b1b3c6876a61ab6933c4529780362ec02b	2023-10-14 10:08:03 -07:00
Jay Huh	d2daa10afc	Fix crash_test_with_best_efforts_recovery (#11938 ) Summary: Thanks ltamasi and ajkr for initial investigations on the test failure. Per the investigations, the following scenario is likely causing the test to fail. 1. Recovery is needed (could be any reason during crash test) 2. Trying to recover from the latest manifest fails (likely due to read error injection) 3. DB opens with recovery from the next manifest which is different from step 2. 4. Expected state is based on the manifest we tried and failed in step 2. 5. Two manifests used in step 2 and 3 are confirmed to have difference in LSM trees (Thanks ltamasi again for the finding). ``` 2023/10/05-11:24:18.942189 56341 [db/version_set.cc:6079] Trying to recover from manifest: /dev/shm/rocksdb_test/rocksdb_crashtest_blackbox/MANIFEST-007184 ... 2023/10/05-11:24:18.978007 56341 [db/version_set.cc:6079] Trying to recover from manifest: /dev/shm/rocksdb_test/rocksdb_crashtest_blackbox/MANIFEST-007180 ``` ``` [ltamasi@devbig1024.prn1 /tmp/x]$ ldb manifest_dump --hex --path=MANIFEST-007184_renamed_ > 2 [ltamasi@devbig1024.prn1 /tmp/x]$ ldb manifest_dump --hex --path=MANIFEST-007180_renamed_ > 1 [ltamasi@devbig1024.prn1 /tmp/x]$ diff 1 2 --- 1 2023-10-09 10:29:16.966215207 -0700 +++ 2 2023-10-09 10:29:11.984241645 -0700 @@ -13,7 +13,7 @@ 7174:3950254[1875617 .. 2203952]['000000000003415B000000000000012B000000000000007D' seq:1906214, type:1 .. '000000000003CA59000000000000012B000000000000005C' seq:2039838, type:1] 7175:88060[2074748 .. 2203892]['000000000003CA6300000000000000CF78787878787878' seq:2167539, type:2 .. '000000000003D08F000000000000012B0000000000000130' seq:2112478, type:0] --- level 6 --- version# 1 --- - 7057:3132633[0 .. 2046144]['0000000000000009000000000000000978' seq:0, type:1 .. '0000000000005F8B000000000000012B00000000000002AC' seq:0, type:1] + 7219:2135565[0 .. 2046144]['0000000000000009000000000000000978' seq:0, type:1 .. '0000000000005F8B000000000000012B00000000000002AC' seq:0, type:1] 7061:827724[0 .. 2046131]['0000000000005F95000000000000000778787878787878' seq:0, type:1 .. '000000000000784F000000000000012B0000000000000113' seq:0, type:1] 6763:1352[0 .. 0]['000000000000784F000000000000012B0000000000000129' seq:0, type:1 .. '000000000000784F000000000000012B0000000000000129' seq:0, type:1] 7173:4812291[0 .. 2203957]['000000000000784F000000000000012B0000000000000138' seq:0, type:1 .. '0000000000020FAE787878787878' seq:0, type:1] @@ -77,4 +77,4 @@ --- level 61 --- version# 1 --- --- level 62 --- version# 1 --- --- level 63 --- version# 1 --- -next_file_number 7182 last_sequence 2203963 prev_log_number 0 max_column_family 0 min_log_number_to_keep 7015 +next_file_number 7221 last_sequence 2203963 prev_log_number 0 max_column_family 0 min_log_number_to_keep 7015 ``` We have two options to fix this. Either skip verification against expected state or disable read injection when BE recovery is enabled. I chose to skip verification against expected state per discussion. (See comments in this PR) Please note that some linter changes were included in this PR. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11938 Test Plan: ``` TEST_TMPDIR=/dev/shm/rocksdb make crash_test_with_best_efforts_recovery ``` Reviewed By: ltamasi Differential Revision: D50136341 Pulled By: jaykorean fbshipit-source-id: ac7434d592aebc148bfc3a4fcaa34936f136b95c	2023-10-11 14:26:10 -07:00
anand76	20b4f1356e	Enable write fault injection in db_stress (#11924 ) Summary: This PR depends on https://github.com/facebook/rocksdb/issues/11879 . Enable write fault injection for the basic whitebox, blackbox, and cf_consistency modes. For other test modes like multiops_txn, best_efforts_recovery etc., leave it disabled for now until we can do more testing. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11924 Reviewed By: ajkr Differential Revision: D50178252 Pulled By: anand1976 fbshipit-source-id: 5794f81c14cded1eb28762b2de818dfff1c1a34c	2023-10-11 11:28:00 -07:00
anand76	5b11f5a3a2	Add TieredCache and compressed cache capacity change to db_stress (#11935 ) Summary: Add `TieredCache` to the cache types tested by db_stress. Also add compressed secondary cache capacity change, and `WriteBufferManager` integration with `TieredCache` for memory charging. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11935 Test Plan: Run whitebox/blackbox crash tests locally Reviewed By: akankshamahajan15 Differential Revision: D50135365 Pulled By: anand1976 fbshipit-source-id: 7d73ed00c00a0953d86e49f35cce6bd550ba00f1	2023-10-10 13:12:18 -07:00
Andrew Kryczka	8a9cfd5292	Make stopped writes block on recovery (#11879 ) Summary: Relaxed the constraints for blocking when writes are stopped. When a recovery is already being attempted, we might as well let `!no_slowdown` writes wait on it in case it succeeds. This makes the user-visible behavior consistent across recovery flush and non-recovery flush. This enables `db_stress` to inject retryable (soft) flush read errors without having to handle user write failures. I changed `db_stress` a bit to permit injected errors in much more foreground operations as more admin operations (like `GetLiveFiles()`) can fail on a retryable error during flush. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11879 Reviewed By: anand1976 Differential Revision: D49571196 Pulled By: ajkr fbshipit-source-id: 5d516d6faf20d2c6bfe0594ab4f2706bca6d69b0	2023-10-10 06:29:01 -07:00

1 2 3 4 5 ...

405 commits