rocksdb

Commit Graph

Author	SHA1	Message	Date
Jay Zhuang	1e86d424e4	Tiered storage stress test (#10493 ) Summary: Add Tiered storage stress test and db_bench option Pull Request resolved: https://github.com/facebook/rocksdb/pull/10493 Test Plan: new crashtest: https://app.circleci.com/pipelines/github/facebook/rocksdb/16905/workflows/68c2967c-9274-434f-8506-1403cf441ead Reviewed By: ajkr Differential Revision: D38481892 Pulled By: jay-zhuang fbshipit-source-id: 217a0be4acb93d420222e6ede2a1290d9f464776	2022-08-08 13:08:35 -07:00
gitbw95	e1b176d274	Add CompressedSecondaryCache into stress test (#10442 ) Summary: The secondary cache is randomly disabled or enabled with CompressedSecondaryCache. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10442 Test Plan: - To test that the CompressedSecondaryCache is used and the stress test runs successfully, run `make -j24 CRASH_TEST_EXT_ARGS=—duration=960 blackbox_crash_test ` Reviewed By: anand1976 Differential Revision: D38290796 Pulled By: gitbw95 fbshipit-source-id: bb7027b39e0ed9c0c62835abe09e759898130ec8	2022-08-01 11:01:03 -07:00
Guido Tagliavini Ponce	9d7de6517c	Towards a production-quality ClockCache (#10418 ) Summary: In this PR we bring ClockCache closer to production quality. We implement the following changes: 1. Fixed a few bugs in ClockCache. 2. ClockCache now fully supports ``strict_capacity_limit == false``: When an insertion over capacity is commanded, we allocate a handle separately from the hash table. 3. ClockCache now runs on almost every test in cache_test. The only exceptions are a test where either the LRU policy is required, and a test that dynamically increases the table capacity. 4. ClockCache now supports dynamically decreasing capacity via SetCapacity. (This is easy: we shrink the capacity upper bound and run the clock algorithm.) 5. Old FastLRUCache tests in lru_cache_test.cc are now also used on ClockCache. As a byproduct of 1. and 2. we are able to turn on ClockCache in the stress tests. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10418 Test Plan: - ``make -j24 USE_CLANG=1 COMPILE_WITH_ASAN=1 COMPILE_WITH_UBSAN=1 check`` - ``make -j24 USE_CLANG=1 COMPILE_WITH_TSAN=1 check`` - ``make -j24 USE_CLANG=1 COMPILE_WITH_ASAN=1 COMPILE_WITH_UBSAN=1 CRASH_TEST_EXT_ARGS="--duration=960 --cache_type=clock_cache" blackbox_crash_test_with_atomic_flush`` - ``make -j24 USE_CLANG=1 COMPILE_WITH_TSAN=1 CRASH_TEST_EXT_ARGS="--duration=960 --cache_type=clock_cache" blackbox_crash_test_with_atomic_flush`` Reviewed By: pdillinger Differential Revision: D38170673 Pulled By: guidotag fbshipit-source-id: 508987b9dc9d9d68f1a03eefac769820b680340a	2022-07-26 17:42:03 -07:00
Gang Liao	ec4ebeff30	Support prepopulating/warming the blob cache (#10298 ) Summary: Many workloads have temporal locality, where recently written items are read back in a short period of time. When using remote file systems, this is inefficient since it involves network traffic and higher latencies. Because of this, we would like to support prepopulating the blob cache during flush. This task is a part of https://github.com/facebook/rocksdb/issues/10156 Pull Request resolved: https://github.com/facebook/rocksdb/pull/10298 Reviewed By: ltamasi Differential Revision: D37908743 Pulled By: gangliao fbshipit-source-id: 9feaed234bc719d38f0c02975c1ad19fa4bb37d1	2022-07-17 07:13:59 -07:00
Yanqin Jin	7679f22a89	Add coverage for the combination of write-prepared and WAL recycling (#10350 ) Summary: as title. Test plan - make check - CI on PR - TEST_TMPDIR=/dev/shm make crash_test_with_multiops_wp_txn (tested with successful run) Pull Request resolved: https://github.com/facebook/rocksdb/pull/10350 Reviewed By: ajkr Differential Revision: D37792872 Pulled By: riversand963 fbshipit-source-id: ff064093b7f715d0acf387af2e3ae87b1278b52b	2022-07-12 13:17:21 -07:00
Yanqin Jin	2f13f5f7d0	Add coverage for timestamped snapshot to MultiOpsTxnsStressTest (#10325 ) Summary: As title. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10325 Test Plan: ```bash TEST_TMPDIR=/dev/shm/rocksdb/ make crash_test_with_multiops_wc_txn TEST_TMPDIR=/dev/shm/rocksdb/ make crash_test_with_txn ``` Reviewed By: akankshamahajan15 Differential Revision: D37688742 Pulled By: riversand963 fbshipit-source-id: e198ace921898af63f99e869568c1a7bbf69f1a4	2022-07-11 12:44:08 -07:00
zczhu	e716bda010	Add FLAGS_compaction_pri into crash_test (#10255 ) Summary: Add FLAGS_compaction_pri into correctness test Pull Request resolved: https://github.com/facebook/rocksdb/pull/10255 Test Plan: run crash_test with FLAGS_compaction_pri Reviewed By: ajkr Differential Revision: D37510372 Pulled By: littlepig2013 fbshipit-source-id: 73d93a0a047d0c3993c8a512383dd6ee6acef641	2022-06-30 22:56:58 -07:00
Guido Tagliavini Ponce	57a0e2f304	Clock cache (#10273 ) Summary: This is the initial step in the development of a lock-free clock cache. This PR includes the base hash table design (which we mostly ported over from FastLRUCache) and the clock eviction algorithm. Importantly, it's still _not_ lock-free---all operations use a shard lock. Besides the locking, there are other features left as future work: - Remove keys from the handles. Instead, use 128-bit bijective hashes of them for handle comparisons, probing (we need two 32-bit hashes of the key for double hashing) and sharding (we need one 6-bit hash). - Remove the clock_usage_ field, which is updated on every lookup. Even if it were atomically updated, it could cause memory invalidations across cores. - Middle insertions into the clock list. - A test that exercises the clock eviction policy. - Update the Java API of ClockCache and Java calls to C++. Along the way, we improved the code and comments quality of FastLRUCache. These changes are relatively minor. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10273 Test Plan: ``make -j24 check`` Reviewed By: pdillinger Differential Revision: D37522461 Pulled By: guidotag fbshipit-source-id: 3d70b737dbb70dcf662f00cef8c609750f083943	2022-06-29 21:50:39 -07:00
Yanqin Jin	d3de59255a	Enable compaction filter for db_stress with user-defined timestamp (#10259 ) Summary: Before this PR, when user-defined timestamp is enabled, db_stress disables compaction filter. This is no longer necessary after this PR, since the `DbStressCompactionFilter` is now aware of the presence of timestamps. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10259 Test Plan: TEST_TMPDIR=/dev/shm make crash_test_with_ts Reviewed By: ajkr Differential Revision: D37459692 Pulled By: riversand963 fbshipit-source-id: 8fe62e90a63bd9317fe1bb95a2b4984080c9e5ef	2022-06-27 11:53:09 -07:00
Andrew Kryczka	f322f273b0	Temporarily disable mempurge in crash test (#10252 ) Summary: Need to disable it for now as CI is failing, particularly `MultiOpsTxnsStressTest`. Investigation details in internal task T124324915. This PR disables mempurge more widely than `MultiOpsTxnsStressTest` until we know the issue is contained to that particular test. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10252 Reviewed By: riversand963 Differential Revision: D37432948 Pulled By: ajkr fbshipit-source-id: d0cf5b0e0ec7c3142c382a0347f35a4c34f4607a	2022-06-24 17:11:27 -07:00
Gang Liao	2352e2dfda	Add the blob cache to the stress tests and the benchmarking tool (#10202 ) Summary: In order to facilitate correctness and performance testing, we would like to add the new blob cache to our stress test tool `db_stress` and our continuously running crash test script `db_crashtest.py`, as well as our synthetic benchmarking tool `db_bench` and the BlobDB performance testing script `run_blob_bench.sh`. As part of this task, we would also like to utilize these benchmarking tools to get some initial performance numbers about the effectiveness of caching blobs. This PR is a part of https://github.com/facebook/rocksdb/issues/10156 Pull Request resolved: https://github.com/facebook/rocksdb/pull/10202 Reviewed By: ltamasi Differential Revision: D37325739 Pulled By: gangliao fbshipit-source-id: deb65d0d414502270dd4c324d987fd5469869fa8	2022-06-22 16:04:03 -07:00
Peter Dillinger	84210c9489	Add data block hash index to crash test, fix MultiGet issue (#10220 ) Summary: There was a bug in the MultiGet enhancement in https://github.com/facebook/rocksdb/issues/9899 with data block hash index, which was not caught because data block hash index was never added to stress tests. This change fixes both issues. Fixes https://github.com/facebook/rocksdb/issues/10186 I intend to pick this into the 7.4.0 release candidate Pull Request resolved: https://github.com/facebook/rocksdb/pull/10220 Test Plan: Failure quickly reproduces in crash test with kDataBlockBinaryAndHash, and does not seem to with the fix. Reproducing the failure with a unit test I believe would be too tricky and fragile to be worthwhile. Reviewed By: anand1976 Differential Revision: D37315647 Pulled By: pdillinger fbshipit-source-id: 9f648265bba867275edc752f7a56611a59401cba	2022-06-21 16:23:58 -07:00
Guido Tagliavini Ponce	3afed7408c	Replace per-shard chained hash tables with open-addressing scheme (#10194 ) Summary: In FastLRUCache, we replace the current chained per-shard hash table by an open-addressing hash table. In particular, this allows us to preallocate all handles. Because all handles are preallocated, this implementation doesn't support strict_capacity_limit = false (i.e., allowing insertions beyond the predefined capacity). This clashes with current assumptions of some tests, namely two tests in cache_test and the crash tests. We have disabled these for now. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10194 Test Plan: ``make -j24 check`` Reviewed By: pdillinger Differential Revision: D37296770 Pulled By: guidotag fbshipit-source-id: 232ff1b8260331d868ebf4e3e5d8ad709390b0ad	2022-06-21 08:45:04 -07:00
Andrew Kryczka	5d6005c780	Add WriteOptions::protection_bytes_per_key (#10037 ) Summary: Added an option, `WriteOptions::protection_bytes_per_key`, that controls how many bytes per key we use for integrity protection in `WriteBatch`. It takes effect when `WriteBatch::GetProtectionBytesPerKey() == 0`. Currently the only supported value is eight. Invoking a user API with it set to any other nonzero value will result in `Status::NotSupported` returned to the user. There is also a bug fix for integrity protection with `inplace_callback`, where we forgot to take into account the possible change in varint length when calculating KV checksum for the final encoded buffer. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10037 Test Plan: - Manual - Set default value of `WriteOptions::protection_bytes_per_key` to eight and ran `make check -j24` - Enabled in MyShadow for 1+ week - Automated - Unit tests have a `WriteMode` that enables the integrity protection via `WriteOptions` - Crash test - in most cases, use `WriteOptions::protection_bytes_per_key` to enable integrity protection Reviewed By: cbi42 Differential Revision: D36614569 Pulled By: ajkr fbshipit-source-id: 8650087ceac9b61b560f1e5fafe5e1baf9c725fb	2022-06-16 23:10:07 -07:00
Peter Dillinger	126c223714	Remove deprecated block-based filter (#10184 ) Summary: In https://github.com/facebook/rocksdb/issues/9535, release 7.0, we hid the old block-based filter from being created using the public API, because of its inefficiency. Although we normally maintain read compatibility on old DBs forever, filters are not required for reading a DB, only for optimizing read performance. Thus, it should be acceptable to remove this code and the substantial maintenance burden it carries as useful features are developed and validated (such as user timestamp). This change completely removes the code for reading and writing the old block-based filters, net removing about 1370 lines of code no longer needed. Options removed from testing / benchmarking tools. The prior existence is only evident in a couple of places: * `CacheEntryRole::kDeprecatedFilterBlock` - We can update this public API enum in a major release to minimize source code incompatibilities. * A warning is logged when an old table file is opened that used the old block-based filter. This is provided as a courtesy, and would be a pain to unit test, so manual testing should suffice. Unfortunately, sst_dump does not tell you whether a file uses block-based filter, and the structure of the code makes it very difficult to fix. * To detect that case, `kObsoleteFilterBlockPrefix` (renamed from `kFilterBlockPrefix`) for metaindex is maintained (for now). Other notes: * In some cases where numbers are associated with filter configurations, we have had to update the assigned numbers so that they all correspond to something that exists. * Fixed potential stat counting bug by assuming `filter_checked = false` for cases like `filter == nullptr` rather than assuming `filter_checked = true` * Removed obsolete `block_offset` and `prefix_extractor` parameters from several functions. * Removed some unnecessary checks `if (!table_prefix_extractor() && !prefix_extractor)` because the caller guarantees the prefix extractor exists and is compatible Pull Request resolved: https://github.com/facebook/rocksdb/pull/10184 Test Plan: tests updated, manually test new warning in LOG using base version to generate a DB Reviewed By: riversand963 Differential Revision: D37212647 Pulled By: pdillinger fbshipit-source-id: 06ee020d8de3b81260ffc36ad0c1202cbf463a80	2022-06-16 15:51:33 -07:00
Yanqin Jin	ce419c0f10	Allow db_bench and db_stress to set `allow_data_in_errors` (#10171 ) Summary: There is `Options::allow_data_in_errors` that controls whether RocksDB is allowed to log data, e.g. key, value, etc in LOG files. It is false by default. However, in db_bench and db_stress, it is often ok to log data because there is no concern about privacy. This PR allows db_stress and db_bench to set this option on the command line, while it remains false by default. Furthermore, make crash/recovery test driven by db_crashtest.py to opt-in. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10171 Test Plan: Stress test and db_bench Reviewed By: hx235 Differential Revision: D37163787 Pulled By: riversand963 fbshipit-source-id: 0242f24d292ba15b6faf8ff903963b85d3e011f8	2022-06-15 12:38:04 -07:00
Hui Xiao	d665afdbf3	Account memory of FileMetaData in global memory limit (#9924 ) Summary: Context/Summary: As revealed by heap profiling, allocation of `FileMetaData` for [newly created file added to a Version](https://github.com/facebook/rocksdb/pull/9924/files#diff-a6aa385940793f95a2c5b39cc670bd440c4547fa54fd44622f756382d5e47e43R774) can consume significant heap memory. This PR is to account that toward our global memory limit based on block cache capacity. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9924 Test Plan: - Previous `make check` verified there are only 2 places where the memory of the allocated `FileMetaData` can be released - New unit test `TEST_P(ChargeFileMetadataTestWithParam, Basic)` - db bench (CPU cost of `charge_file_metadata` in write and compact) - write micros/op: -0.24% : `TEST_TMPDIR=/dev/shm/testdb ./db_bench -benchmarks=fillseq -db=$TEST_TMPDIR -charge_file_metadata=1 (remove this option for pre-PR) -disable_auto_compactions=1 -write_buffer_size=100000 -num=4000000 \| egrep 'fillseq'` - compact micros/op -0.87% : `TEST_TMPDIR=/dev/shm/testdb ./db_bench -benchmarks=fillseq -db=$TEST_TMPDIR -charge_file_metadata=1 -disable_auto_compactions=1 -write_buffer_size=100000 -num=4000000 -numdistinct=1000 && ./db_bench -benchmarks=compact -db=$TEST_TMPDIR -use_existing_db=1 -charge_file_metadata=1 -disable_auto_compactions=1 \| egrep 'compact'` table 1 - write #-run \| (pre-PR) avg micros/op \| std micros/op \| (post-PR) micros/op \| std micros/op \| change (%) -- \| -- \| -- \| -- \| -- \| -- 10 \| 3.9711 \| 0.264408 \| 3.9914 \| 0.254563 \| 0.5111933721 20 \| 3.83905 \| 0.0664488 \| 3.8251 \| 0.0695456 \| -0.3633711465 40 \| 3.86625 \| 0.136669 \| 3.8867 \| 0.143765 \| 0.5289363078 80 \| 3.87828 \| 0.119007 \| 3.86791 \| 0.115674 \| -0.2673865734 160 \| 3.87677 \| 0.162231 \| 3.86739 \| 0.16663 \| -0.2419539978 table 2 - compact #-run \| (pre-PR) avg micros/op \| std micros/op \| (post-PR) micros/op \| std micros/op \| change (%) -- \| -- \| -- \| -- \| -- \| -- 10 \| 2,399,650.00 \| 96,375.80 \| 2,359,537.00 \| 53,243.60 \| -1.67 20 \| 2,410,480.00 \| 89,988.00 \| 2,433,580.00 \| 91,121.20 \| 0.96 40 \| 2.41E+06 \| 121811 \| 2.39E+06 \| 131525 \| -0.96 80 \| 2.40E+06 \| 134503 \| 2.39E+06 \| 108799 \| -0.78 - stress test: `python3 tools/db_crashtest.py blackbox --charge_file_metadata=1 --cache_size=1` killed as normal Reviewed By: ajkr Differential Revision: D36055583 Pulled By: hx235 fbshipit-source-id: b60eab94707103cb1322cf815f05810ef0232625	2022-06-14 13:06:40 -07:00
Yanqin Jin	1777e5f7e9	Snapshots with user-specified timestamps (#9879 ) Summary: In RocksDB, keys are associated with (internal) sequence numbers which denote when the keys are written to the database. Sequence numbers in different RocksDB instances are unrelated, thus not comparable. It is nice if we can associate sequence numbers with their corresponding actual timestamps. One thing we can do is to support user-defined timestamp, which allows the applications to specify the format of custom timestamps and encode a timestamp with each key. More details can be found at https://github.com/facebook/rocksdb/wiki/User-defined-Timestamp-%28Experimental%29. This PR provides a different but complementary approach. We can associate rocksdb snapshots (defined in https://github.com/facebook/rocksdb/blob/7.2.fb/include/rocksdb/snapshot.h#L20) with user-specified timestamps. Since a snapshot is essentially an object representing a sequence number, this PR establishes a bi-directional mapping between sequence numbers and timestamps. In the past, snapshots are usually taken by readers. The current super-version is grabbed, and a `rocksdb::Snapshot` object is created with the last published sequence number of the super-version. You can see that the reader actually has no good idea of what timestamp to assign to this snapshot, because by the time the `GetSnapshot()` is called, an arbitrarily long period of time may have already elapsed since the last write, which is when the last published sequence number is written. This observation motivates the creation of "timestamped" snapshots on the write path. Currently, this functionality is exposed only to the layer of `TransactionDB`. Application can tell RocksDB to create a snapshot when a transaction commits, effectively associating the last sequence number with a timestamp. It is also assumed that application will ensure any two snapshots with timestamps should satisfy the following: ``` snapshot1.seq < snapshot2.seq iff. snapshot1.ts < snapshot2.ts ``` If the application can guarantee that when a reader takes a timestamped snapshot, there is no active writes going on in the database, then we also allow the user to use a new API `TransactionDB::CreateTimestampedSnapshot()` to create a snapshot with associated timestamp. Code example ```cpp // Create a timestamped snapshot when committing transaction. txn->SetCommitTimestamp(100); txn->SetSnapshotOnNextOperation(); txn->Commit(); // A wrapper API for convenience Status Transaction::CommitAndTryCreateSnapshot( std::shared_ptr<TransactionNotifier> notifier, TxnTimestamp ts, std::shared_ptr<const Snapshot>* ret); // Create a timestamped snapshot if caller guarantees no concurrent writes std::pair<Status, std::shared_ptr<const Snapshot>> snapshot = txn_db->CreateTimestampedSnapshot(100); ``` The snapshots created in this way will be managed by RocksDB with ref-counting and potentially shared with other readers. We provide the following APIs for readers to retrieve a snapshot given a timestamp. ```cpp // Return the timestamped snapshot correponding to given timestamp. If ts is // kMaxTxnTimestamp, then we return the latest timestamped snapshot if present. // Othersise, we return the snapshot whose timestamp is equal to `ts`. If no // such snapshot exists, then we return null. std::shared_ptr<const Snapshot> TransactionDB::GetTimestampedSnapshot(TxnTimestamp ts) const; // Return the latest timestamped snapshot if present. std::shared_ptr<const Snapshot> TransactionDB::GetLatestTimestampedSnapshot() const; ``` We also provide two additional APIs for stats collection and reporting purposes. ```cpp Status TransactionDB::GetAllTimestampedSnapshots( std::vector<std::shared_ptr<const Snapshot>>& snapshots) const; // Return timestamped snapshots whose timestamps fall in [ts_lb, ts_ub) and store them in `snapshots`. Status TransactionDB::GetTimestampedSnapshots( TxnTimestamp ts_lb, TxnTimestamp ts_ub, std::vector<std::shared_ptr<const Snapshot>>& snapshots) const; ``` To prevent the number of timestamped snapshots from growing infinitely, we provide the following API to release timestamped snapshots whose timestamps are older than or equal to a given threshold. ```cpp void TransactionDB::ReleaseTimestampedSnapshotsOlderThan(TxnTimestamp ts); ``` Before shutdown, RocksDB will release all timestamped snapshots. Comparison with user-defined timestamp and how they can be combined: User-defined timestamp persists every key with a timestamp, while timestamped snapshots maintain a volatile mapping between snapshots (sequence numbers) and timestamps. Different internal keys with the same user key but different timestamps will be treated as different by compaction, thus a newer version will not hide older versions (with smaller timestamps) unless they are eligible for garbage collection. In contrast, taking a timestamped snapshot at a certain sequence number and timestamp prevents all the keys visible in this snapshot from been dropped by compaction. Here, visible means (seq < snapshot and most recent). The timestamped snapshot supports the semantics of reading at an exact point in time. Timestamped snapshots can also be used with user-defined timestamp. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9879 Test Plan: ``` make check TEST_TMPDIR=/dev/shm make crash_test_with_txn ``` Reviewed By: siying Differential Revision: D35783919 Pulled By: riversand963 fbshipit-source-id: 586ad905e169189e19d3bfc0cb0177a7239d1bd4	2022-06-10 16:07:03 -07:00
Akanksha Mahajan	ecfd4aef0c	Enable wal_compression in crash_tests (#10141 ) Summary: Same as title Pull Request resolved: https://github.com/facebook/rocksdb/pull/10141 Test Plan: ``` export CRASH_TEST_EXT_ARGS=" --wal_compression=zstd" make crash_test -j ``` Reviewed By: riversand963 Differential Revision: D37042810 Pulled By: akankshamahajan15 fbshipit-source-id: 53f0793d78241f1b5c954dcc808cb4c0a3e9172a	2022-06-09 12:08:01 -07:00
Yanqin Jin	f890527b16	Update test for secondary instance in stress test (#10121 ) Summary: This PR updates secondary instance testing in stress test by default. A background thread will be started (disabled by default), running a secondary instance tailing the logs of the primary. Periodically (every 1 sec), this thread calls `TryCatchUpWithPrimary()` and uses point lookup or range scan to read some random keys with only very basic verification to make sure no assertion failure is triggered. Thanks to https://github.com/facebook/rocksdb/issues/10061 , we can enable secondary instance when user-defined timestamp is enabled. Also removed a less useful test configuration, `secondary_catch_up_one_in`. This is very similar to the periodic catch-up. In the last commit, I decided not to enable it now, but just update the tests, since secondary instance does not work well when the underlying file is renamed by primary, e.g. SstFileManager. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10121 Test Plan: ``` TEST_TMPDIR=/dev/shm/rocksdb make crash_test TEST_TMPDIR=/dev/shm/rocksdb make crash_test_with_ts TEST_TMPDIR=/dev/shm/rocksdb make crash_test_with_atomic_flush ``` Reviewed By: ajkr Differential Revision: D36939458 Pulled By: riversand963 fbshipit-source-id: 1c065b7efc3690fc341569b9d369a5cbd8ef6b3e	2022-06-07 21:07:47 -07:00
Yanqin Jin	2b3c50c429	Temporarily disable wal compression (#10108 ) Summary: Will re-enable after fixing the bug in https://github.com/facebook/rocksdb/issues/10099 and https://github.com/facebook/rocksdb/issues/10097. Right now, the priority is https://github.com/facebook/rocksdb/issues/10087, but the bug in WAL compression prevents the mini crash test from passing. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10108 Reviewed By: pdillinger Differential Revision: D36897214 Pulled By: riversand963 fbshipit-source-id: d64dc52738222d5f66003f7731dc46eaeed812be	2022-06-03 10:22:52 -07:00
Gang Liao	e6432dfd4c	Make it possible to enable blob files starting from a certain LSM tree level (#10077 ) Summary: Currently, if blob files are enabled (i.e. `enable_blob_files` is true), large values are extracted both during flush/recovery (when SST files are written into level 0 of the LSM tree) and during compaction into any LSM tree level. For certain use cases that have a mix of short-lived and long-lived values, it might make sense to support extracting large values only during compactions whose output level is greater than or equal to a specified LSM tree level (e.g. compactions into L1/L2/... or above). This could reduce the space amplification caused by large values that are turned into garbage shortly after being written at the price of some write amplification incurred by long-lived values whose extraction to blob files is delayed. In order to achieve this, we would like to do the following: - Add a new configuration option `blob_file_starting_level` (default: 0) to `AdvancedColumnFamilyOptions` (and `MutableCFOptions` and extend the related logic) - Instantiate `BlobFileBuilder` in `BuildTable` (used during flush and recovery, where the LSM tree level is L0) and `CompactionJob` iff `enable_blob_files` is set and the LSM tree level is `>= blob_file_starting_level` - Add unit tests for the new functionality, and add the new option to our stress tests (`db_stress` and `db_crashtest.py` ) - Add the new option to our benchmarking tool `db_bench` and the BlobDB benchmark script `run_blob_bench.sh` - Add the new option to the `ldb` tool (see https://github.com/facebook/rocksdb/wiki/Administration-and-Data-Access-Tool) - Ideally extend the C and Java bindings with the new option - Update the BlobDB wiki to document the new option. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10077 Reviewed By: ltamasi Differential Revision: D36884156 Pulled By: gangliao fbshipit-source-id: 942bab025f04633edca8564ed64791cb5e31627d	2022-06-02 20:04:33 -07:00
Guido Tagliavini Ponce	b4d0e041d0	Add support for FastLRUCache in stress and crash tests. (#10081 ) Summary: Stress tests can run with the experimental FastLRUCache. Crash tests randomly choose between LRUCache and FastLRUCache. Since only LRUCache supports a secondary cache, we validate the `--secondary_cache_uri` and `--cache_type` flags---when `--secondary_cache_uri` is set, the `--cache_type` is set to `lru_cache`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10081 Test Plan: - To test that the FastLRUCache is used and the stress test runs successfully, run `make -j24 CRASH_TEST_EXT_ARGS=—duration=960 blackbox_crash_test_with_atomic_flush`. The cache type should sometimes be `fast_lru_cache`. - To test the flag validation, run `make -j24 CRASH_TEST_EXT_ARGS="--duration=960 --secondary_cache_uri=x" blackbox_crash_test_with_atomic_flush` multiple times. The test will always be aborted (which is okay). Check that the cache type is always `lru_cache`. Reviewed By: anand1976 Differential Revision: D36839908 Pulled By: guidotag fbshipit-source-id: ebcdfdcd12ec04c96c09ae5b9c9d1e613bdd1725	2022-06-01 18:00:28 -07:00
Andrew Kryczka	f6e45382e9	Disable file ingestion in crash test for CF consistency (#10067 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10067 Reviewed By: jay-zhuang Differential Revision: D36727948 Pulled By: ajkr fbshipit-source-id: a3502730412c01ba63d822a5d4bf56f8bae8fcb2	2022-05-26 17:41:30 -07:00
Andrew Kryczka	91ba7837b7	Enable IngestExternalFile() in crash test (#9357 ) Summary: Thanks to https://github.com/facebook/rocksdb/issues/9919 and https://github.com/facebook/rocksdb/issues/10051 the known bugs in file ingestion (besides mmap read + file checksum) are fixed. Now we can try again to enable file ingestion in crash test. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9357 Test Plan: stress file ingestion heavily for an hour: `$ TEST_TMPDIR=/dev/shm python3 tools/db_crashtest.py blackbox --max_key=1000000 --ingest_external_file_one_in=100 --duration=3600 --interval=20 --write_buffer_size=524288 --target_file_size_base=524288 --max_bytes_for_level_base=2097152` Reviewed By: riversand963 Differential Revision: D33410746 Pulled By: ajkr fbshipit-source-id: d276431390995a67f68390d61c06a40945fdd280	2022-05-26 10:31:37 -07:00
Yanqin Jin	9901e7f681	Enable checkpoint and backup in db_stress when timestamp is enabled (#10047 ) Summary: After https://github.com/facebook/rocksdb/issues/10030 and https://github.com/facebook/rocksdb/issues/10004, we can enable checkpoint and backup in stress tests when user-defined timestamp is enabled. This PR has no production risk. Pull Request resolved: https://github.com/facebook/rocksdb/pull/10047 Test Plan: ``` TEST_TMPDIR=/dev/shm make crash_test_with_ts ``` Reviewed By: jowlyzhang Differential Revision: D36641565 Pulled By: riversand963 fbshipit-source-id: d86c9d87efcc34c32d1aa176af691d32b897644a	2022-05-24 18:25:43 -07:00
Changyu Bi	cc23b46da1	Support using ZDICT_finalizeDictionary to generate zstd dictionary (#9857 ) Summary: An untrained dictionary is currently simply the concatenation of several samples. The ZSTD API, ZDICT_finalizeDictionary(), can improve such a dictionary's effectiveness at low cost. This PR changes how dictionary is created by calling the ZSTD ZDICT_finalizeDictionary() API instead of creating raw content dictionary (when max_dict_buffer_bytes > 0), and pass in all buffered uncompressed data blocks as samples. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9857 Test Plan: #### db_bench test for cpu/memory of compression+decompression and space saving on synthetic data: Set up: change the parameter [here](`fb9a167a55/tools/db_bench_tool.cc (L1766)`) to 16384 to make synthetic data more compressible. ``` # linked local ZSTD with version 1.5.2 # DEBUG_LEVEL=0 ROCKSDB_NO_FBCODE=1 ROCKSDB_DISABLE_ZSTD=1 EXTRA_CXXFLAGS="-DZSTD_STATIC_LINKING_ONLY -DZSTD -I/data/users/changyubi/install/include/" EXTRA_LDFLAGS="-L/data/users/changyubi/install/lib/ -l:libzstd.a" make -j32 db_bench dict_bytes=16384 train_bytes=1048576 echo "========== No Dictionary ==========" TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=filluniquerandom,compact -num=10000000 -compression_type=zstd -compression_max_dict_bytes=0 -block_size=4096 -max_background_jobs=24 -memtablerep=vector -allow_concurrent_memtable_write=false -disable_wal=true -max_write_buffer_number=8 >/dev/null 2>&1 TEST_TMPDIR=/dev/shm /usr/bin/time ./db_bench -use_existing_db=true -benchmarks=compact -compression_type=zstd -compression_max_dict_bytes=0 -block_size=4096 2>&1 \| grep elapsed du -hc /dev/shm/dbbench/sst \| grep total echo "========== Raw Content Dictionary ==========" TEST_TMPDIR=/dev/shm ./db_bench_main -benchmarks=filluniquerandom,compact -num=10000000 -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -block_size=4096 -max_background_jobs=24 -memtablerep=vector -allow_concurrent_memtable_write=false -disable_wal=true -max_write_buffer_number=8 >/dev/null 2>&1 TEST_TMPDIR=/dev/shm /usr/bin/time ./db_bench_main -use_existing_db=true -benchmarks=compact -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -block_size=4096 2>&1 \| grep elapsed du -hc /dev/shm/dbbench/sst \| grep total echo "========== FinalizeDictionary ==========" TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=filluniquerandom,compact -num=10000000 -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes -compression_use_zstd_dict_trainer=false -block_size=4096 -max_background_jobs=24 -memtablerep=vector -allow_concurrent_memtable_write=false -disable_wal=true -max_write_buffer_number=8 >/dev/null 2>&1 TEST_TMPDIR=/dev/shm /usr/bin/time ./db_bench -use_existing_db=true -benchmarks=compact -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes -compression_use_zstd_dict_trainer=false -block_size=4096 2>&1 \| grep elapsed du -hc /dev/shm/dbbench/sst \| grep total echo "========== TrainDictionary ==========" TEST_TMPDIR=/dev/shm ./db_bench -benchmarks=filluniquerandom,compact -num=10000000 -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes -block_size=4096 -max_background_jobs=24 -memtablerep=vector -allow_concurrent_memtable_write=false -disable_wal=true -max_write_buffer_number=8 >/dev/null 2>&1 TEST_TMPDIR=/dev/shm /usr/bin/time ./db_bench -use_existing_db=true -benchmarks=compact -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes -block_size=4096 2>&1 \| grep elapsed du -hc /dev/shm/dbbench/sst \| grep total # Result: TrainDictionary is much better on space saving, but FinalizeDictionary seems to use less memory. # before compression data size: 1.2GB dict_bytes=16384 max_dict_buffer_bytes = 1048576 space cpu/memory No Dictionary 468M 14.93user 1.00system 0:15.92elapsed 100%CPU (0avgtext+0avgdata 23904maxresident)k Raw Dictionary 251M 15.81user 0.80system 0:16.56elapsed 100%CPU (0avgtext+0avgdata 156808maxresident)k FinalizeDictionary 236M 11.93user 0.64system 0:12.56elapsed 100%CPU (0avgtext+0avgdata 89548maxresident)k TrainDictionary 84M 7.29user 0.45system 0:07.75elapsed 100%CPU (0avgtext+0avgdata 97288maxresident)k ``` #### Benchmark on 10 sample SST files for spacing saving and CPU time on compression: FinalizeDictionary is comparable to TrainDictionary in terms of space saving, and takes less time in compression. ``` dict_bytes=16384 train_bytes=1048576 for sst_file in `ls ../temp/myrock-sst/` do echo "******** $sst_file ********" echo "========== No Dictionary ==========" ./sst_dump --file="../temp/myrock-sst/$sst_file" --command=recompress --compression_level_from=6 --compression_level_to=6 --compression_types=kZSTD echo "========== Raw Content Dictionary ==========" ./sst_dump --file="../temp/myrock-sst/$sst_file" --command=recompress --compression_level_from=6 --compression_level_to=6 --compression_types=kZSTD --compression_max_dict_bytes=$dict_bytes echo "========== FinalizeDictionary ==========" ./sst_dump --file="../temp/myrock-sst/$sst_file" --command=recompress --compression_level_from=6 --compression_level_to=6 --compression_types=kZSTD --compression_max_dict_bytes=$dict_bytes --compression_zstd_max_train_bytes=$train_bytes --compression_use_zstd_finalize_dict echo "========== TrainDictionary ==========" ./sst_dump --file="../temp/myrock-sst/$sst_file" --command=recompress --compression_level_from=6 --compression_level_to=6 --compression_types=kZSTD --compression_max_dict_bytes=$dict_bytes --compression_zstd_max_train_bytes=$train_bytes done 010240.sst (Size/Time) 011029.sst 013184.sst 021552.sst 185054.sst 185137.sst 191666.sst 7560381.sst 7604174.sst 7635312.sst No Dictionary 28165569 / 2614419 32899411 / 2976832 32977848 / 3055542 31966329 / 2004590 33614351 / 1755877 33429029 / 1717042 33611933 / 1776936 33634045 / 2771417 33789721 / 2205414 33592194 / 388254 Raw Content Dictionary 28019950 / 2697961 33748665 / 3572422 33896373 / 3534701 26418431 / 2259658 28560825 / 1839168 28455030 / 1846039 28494319 / 1861349 32391599 / 3095649 33772142 / 2407843 33592230 / 474523 FinalizeDictionary 27896012 / 2650029 33763886 / 3719427 33904283 / 3552793 26008225 / 2198033 28111872 / 1869530 28014374 / 1789771 28047706 / 1848300 32296254 / 3204027 33698698 / 2381468 33592344 / 517433 TrainDictionary 28046089 / 2740037 33706480 / 3679019 33885741 / 3629351 25087123 / 2204558 27194353 / 1970207 27234229 / 1896811 27166710 / 1903119 32011041 / 3322315 32730692 / 2406146 33608631 / 570593 ``` #### Decompression/Read test: With FinalizeDictionary/TrainDictionary, some data structure used for decompression are in stored in dictionary, so they are expected to be faster in terms of decompression/reads. ``` dict_bytes=16384 train_bytes=1048576 echo "No Dictionary" TEST_TMPDIR=/dev/shm/ ./db_bench -benchmarks=filluniquerandom,compact -compression_type=zstd -compression_max_dict_bytes=0 > /dev/null 2>&1 TEST_TMPDIR=/dev/shm/ ./db_bench -use_existing_db=true -benchmarks=readrandom -cache_size=0 -compression_type=zstd -compression_max_dict_bytes=0 2>&1 \| grep MB/s echo "Raw Dictionary" TEST_TMPDIR=/dev/shm/ ./db_bench -benchmarks=filluniquerandom,compact -compression_type=zstd -compression_max_dict_bytes=$dict_bytes > /dev/null 2>&1 TEST_TMPDIR=/dev/shm/ ./db_bench -use_existing_db=true -benchmarks=readrandom -cache_size=0 -compression_type=zstd -compression_max_dict_bytes=$dict_bytes 2>&1 \| grep MB/s echo "FinalizeDict" TEST_TMPDIR=/dev/shm/ ./db_bench -benchmarks=filluniquerandom,compact -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes -compression_use_zstd_dict_trainer=false > /dev/null 2>&1 TEST_TMPDIR=/dev/shm/ ./db_bench -use_existing_db=true -benchmarks=readrandom -cache_size=0 -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes -compression_use_zstd_dict_trainer=false 2>&1 \| grep MB/s echo "Train Dictionary" TEST_TMPDIR=/dev/shm/ ./db_bench -benchmarks=filluniquerandom,compact -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes > /dev/null 2>&1 TEST_TMPDIR=/dev/shm/ ./db_bench -use_existing_db=true -benchmarks=readrandom -cache_size=0 -compression_type=zstd -compression_max_dict_bytes=$dict_bytes -compression_zstd_max_train_bytes=$train_bytes 2>&1 \| grep MB/s No Dictionary readrandom : 12.183 micros/op 82082 ops/sec 12.183 seconds 1000000 operations; 9.1 MB/s (1000000 of 1000000 found) Raw Dictionary readrandom : 12.314 micros/op 81205 ops/sec 12.314 seconds 1000000 operations; 9.0 MB/s (1000000 of 1000000 found) FinalizeDict readrandom : 9.787 micros/op 102180 ops/sec 9.787 seconds 1000000 operations; 11.3 MB/s (1000000 of 1000000 found) Train Dictionary readrandom : 9.698 micros/op 103108 ops/sec 9.699 seconds 1000000 operations; 11.4 MB/s (1000000 of 1000000 found) ``` Reviewed By: ajkr Differential Revision: D35720026 Pulled By: cbi42 fbshipit-source-id: 24d230fdff0fd28a1bb650658798f00dfcfb2a1f	2022-05-20 12:09:09 -07:00
Jay Zhuang	c6d326d3d7	Track SST unique id in MANIFEST and verify (#9990 ) Summary: Start tracking SST unique id in MANIFEST, which is used to verify with SST properties to make sure the SST file is not overwritten or misplaced. A DB option `try_verify_sst_unique_id` is introduced to enable/disable the verification, if enabled, it opens all SST files during DB-open to read the unique_id from table properties (default is false), so it's recommended to use it with `max_open_files = -1` to pre-open the files. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9990 Test Plan: unittests, format-compatible test, mini-crash Reviewed By: anand1976 Differential Revision: D36381863 Pulled By: jay-zhuang fbshipit-source-id: 89ea2eb6b35ed3e80ead9c724eb096083eaba63f	2022-05-19 11:04:21 -07:00
Hui Xiao	3573558ec5	Rewrite memory-charging feature's option API (#9926 ) Summary: Context: Previous PR https://github.com/facebook/rocksdb/pull/9748, https://github.com/facebook/rocksdb/pull/9073, https://github.com/facebook/rocksdb/pull/8428 added separate flag for each charged memory area. Such API design is not scalable as we charge more and more memory areas. Also, we foresee an opportunity to consolidate this feature with other cache usage related features such as `cache_index_and_filter_blocks` using `CacheEntryRole`. Therefore we decided to consolidate all these flags with `CacheUsageOptions cache_usage_options` and this PR serves as the first step by consolidating memory-charging related flags. Summary: - Replaced old API reference with new ones, including making `kCompressionDictionaryBuildingBuffer` opt-out and added a unit test for that - Added missing db bench/stress test for some memory charging features - Renamed related test suite to indicate they are under the same theme of memory charging - Refactored a commonly used mocked cache component in memory charging related tests to reduce code duplication - Replaced the phrases "memory tracking" / "cache reservation" (other than CacheReservationManager-related ones) with "memory charging" for standard description of this feature. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9926 Test Plan: - New unit test for opt-out `kCompressionDictionaryBuildingBuffer` `TEST_F(ChargeCompressionDictionaryBuildingBufferTest, Basic)` - New unit test for option validation/sanitization `TEST_F(CacheUsageOptionsOverridesTest, SanitizeAndValidateOptions)` - CI - db bench (in case querying new options introduces regression) +0.5% micros/op: `TEST_TMPDIR=/dev/shm/testdb ./db_bench -benchmarks=fillseq -db=$TEST_TMPDIR -charge_compression_dictionary_building_buffer=1(remove this for comparison) -compression_max_dict_bytes=10000 -disable_auto_compactions=1 -write_buffer_size=100000 -num=4000000 \| egrep 'fillseq'` #-run \| (pre-PR) avg micros/op \| std micros/op \| (post-PR) micros/op \| std micros/op \| change (%) -- \| -- \| -- \| -- \| -- \| -- 10 \| 3.9711 \| 0.264408 \| 3.9914 \| 0.254563 \| 0.5111933721 20 \| 3.83905 \| 0.0664488 \| 3.8251 \| 0.0695456 \| -0.3633711465 40 \| 3.86625 \| 0.136669 \| 3.8867 \| 0.143765 \| 0.5289363078 - db_stress: `python3 tools/db_crashtest.py blackbox -charge_compression_dictionary_building_buffer=1 -charge_filter_construction=1 -charge_table_reader=1 -cache_size=1` killed as normal Reviewed By: ajkr Differential Revision: D36054712 Pulled By: hx235 fbshipit-source-id: d406e90f5e0c5ea4dbcb585a484ad9302d4302af	2022-05-17 15:01:51 -07:00
Yanqin Jin	f6d9730ea1	Fix stress test with best-efforts-recovery (#9986 ) Summary: This PR - since we are testing with disable_wal = true and best_efforts_recovery, we should set column family count to 1, due to the requirement of `ExpectedState` tracking and replaying logic. - during backup and checkpoint restore, disable best-efforts-recovery. This does not matter now because db_crashtest.py always disables wal when testing best-efforts-recovery. In the future, if we enable wal, then not setting `restore_opitions.best_efforts_recovery` will cause backup db not to recover the WALs, and differ from db (that enables WAL). - during verification of backup and checkpoint restore, print the key where inconsistency exists between expected state and db. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9986 Test Plan: TEST_TMPDIR=/dev/shm/rocksdb make crash_test_with_best_efforts_recovery Reviewed By: siying Differential Revision: D36353105 Pulled By: riversand963 fbshipit-source-id: a484da161273e6216a1f7e245bac15a349693917	2022-05-13 12:29:20 -07:00
Andrew Kryczka	e943bbdd2f	Temporarily disable sync_fault_injection (#9979 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9979 Reviewed By: siying Differential Revision: D36301555 Pulled By: ajkr fbshipit-source-id: ed298d3484b6aad3ef19746e984bf4c52be33a9f	2022-05-11 12:19:07 -07:00
Andrew Kryczka	62d84e2a2b	db_stress fault injection in release mode (#9957 ) Summary: Previously all fault injection was ignored in release mode. This PR adds it back except for read fault injection (`--read_fault_one_in > 0`) since its dependency (`IGNORE_STATUS_IF_ERROR`) is unavailable in release mode. Other notable changes include: - Moved `EnableWriteErrorInjection()` for `--write_fault_one_in > 0` so it's after `DB::Open()` without depending on `SyncPoint` - Made `--read_fault_one_in > 0` return an error in release mode - Updated `db_crashtest.py` to always set `--read_fault_one_in=0` in release mode Pull Request resolved: https://github.com/facebook/rocksdb/pull/9957 Test Plan: ``` $ DEBUG_LEVEL=0 make -j24 db_stress $ DEBUG_LEVEL=0 TEST_TMPDIR=/dev/shm python3 tools/db_crashtest.py blackbox ``` Reviewed By: anand1976 Differential Revision: D36193830 Pulled By: ajkr fbshipit-source-id: 0b97946b4e3f06e3e0f6e7833c2763da08ec5321	2022-05-06 11:17:08 -07:00
Andrew Kryczka	a62506aee2	Enable unsynced data loss in crash test (#9947 ) Summary: `db_stress` already tracks expected state history to verify prefix-recoverability when `sync_fault_injection` is enabled. This PR enables `sync_fault_injection` in `db_crashtest.py`. Previously enabling `sync_fault_injection` would cause whole unsynced files to be dropped. This PR adds a more interesting case of losing only the tail of unsynced data by implementing `TestFSWritableFile::RangeSync()` and enabling `{wal_,}bytes_per_sync`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9947 Test Plan: - regular blackbox, blackbox --simple - various commands to stress this new case, such as `TEST_TMPDIR=/dev/shm python3 tools/db_crashtest.py blackbox --max_key=100000 --write_buffer_size=2097152 --avoid_flush_during_recovery=1 --disable_wal=0 --interval=10 --db_write_buffer_size=0 --sync_fault_injection=1 --wal_compression=none --delpercent=0 --delrangepercent=0 --prefixpercent=0 --iterpercent=0 --writepercent=100 --readpercent=0 --wal_bytes_per_sync=131072 --duration=36000 --sync=0 --open_write_fault_one_in=16` Reviewed By: riversand963 Differential Revision: D36152775 Pulled By: ajkr fbshipit-source-id: 44b68a7fad0a4cf74af9fe1f39be01baab8141d8	2022-05-05 13:21:03 -07:00
Yanqin Jin	06394ff4e7	Fix a bug of CompactionIterator/CompactionFilter using `Delete` (#9929 ) Summary: When compaction filter determines that a key should be removed, it updates the internal key's type to `Delete`. If this internal key is preserved in current compaction but seen by a later compaction together with `SingleDelete`, it will cause compaction iterator to return Corruption. To fix the issue, compaction filter should return more information in addition to the intention of removing a key. Therefore, we add a new `kRemoveWithSingleDelete` to `CompactionFilter::Decision`. Seeing `kRemoveWithSingleDelete`, compaction iterator will update the op type of the internal key to `kTypeSingleDelete`. In addition, I updated db_stress_shared_state.[cc\|h] so that `no_overwrite_ids_` becomes `const`. It is easier to reason about thread-safety if accessed from multiple threads. This information is passed to `PrepareTxnDBOptions()` when calling from `Open()` so that we can set up the rollback deletion type callback for transactions. Finally, disable compaction filter for multiops_txn because the key removal logic of `DbStressCompactionFilter` does not quite work with `MultiOpsTxnsStressTest`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9929 Test Plan: make check make crash_test make crash_test_with_txn Reviewed By: anand1976 Differential Revision: D36069678 Pulled By: riversand963 fbshipit-source-id: cedd2f1ba958af59ad3916f1ba6f424307955f92	2022-05-02 13:25:45 -07:00
Yanqin Jin	94e245a14d	Improve stress test for MultiOpsTxnsStressTest (#9829 ) Summary: Adds more coverage to `MultiOpsTxnsStressTest` with a focus on write-prepared transactions. 1. Add a hack to manually evict commit cache entries. We currently cannot assign small values to `wp_commit_cache_bits` because it requires a prepared transaction to commit within a certain range of sequence numbers, otherwise it will throw. 2. Add coverage for commit-time-write-batch. If write policy is write-prepared, we need to set `use_only_the_last_commit_time_batch_for_recovery` to true. 3. After each flush/compaction, verify data consistency. This is possible since data size can be small: default numbers of primary/secondary keys are just 1000. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9829 Test Plan: ``` TEST_TMPDIR=/dev/shm/rocksdb_crashtest_blackbox/ make blackbox_crash_test_with_multiops_wp_txn ``` Reviewed By: pdillinger Differential Revision: D35806678 Pulled By: riversand963 fbshipit-source-id: d7fde7a29fda0fb481a61f553e0ca0c47da93616	2022-04-27 17:50:54 -07:00
anand76	c3d7e16252	Add WAL compression to stress tests (#9811 ) Summary: Add the WAL compression feature to the stress test. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9811 Reviewed By: riversand963 Differential Revision: D35414316 Pulled By: anand1976 fbshipit-source-id: 0c17b1ec55679a52f088ad368798b57139bd921a	2022-04-06 15:47:09 -07:00
Hui Xiao	49623f9c8e	Account memory of big memory users in BlockBasedTable in global memory limit (#9748 ) Summary: Context: Through heap profiling, we discovered that `BlockBasedTableReader` objects can accumulate and lead to high memory usage (e.g, `max_open_file = -1`). These memories are currently not saved, not tracked, not constrained and not cache evict-able. As a first step to improve this, similar to https://github.com/facebook/rocksdb/pull/8428, this PR is to track an estimate of `BlockBasedTableReader` object's memory in block cache and fail future creation if the memory usage exceeds the available space of cache at the time of creation. Summary: - Approximate big memory users (`BlockBasedTable::Rep` and `TableProperties` )' memory usage in addition to the existing estimated ones (filter block/index block/un-compression dictionary) - Charge all of these memory usages to block cache on `BlockBasedTable::Open()` and release them on `~BlockBasedTable()` as there is no memory usage fluctuation of concern in between - Refactor on CacheReservationManager (and its call-sites) to add concurrent support for BlockBasedTable used in this PR. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9748 Test Plan: - New unit tests - db bench: `OpenDb` : -0.52% in ms - Setup `./db_bench -benchmarks=fillseq -db=/dev/shm/testdb -disable_auto_compactions=1 -write_buffer_size=1048576` - Repeated run with pre-change w/o feature and post-change with feature, benchmark `OpenDb`: `./db_bench -benchmarks=readrandom -use_existing_db=1 -db=/dev/shm/testdb -reserve_table_reader_memory=true (remove this when running w/o feature) -file_opening_threads=3 -open_files=-1 -report_open_timing=true\| egrep 'OpenDb:'` #-run \| (feature-off) avg milliseconds \| std milliseconds \| (feature-on) avg milliseconds \| std milliseconds \| change (%) -- \| -- \| -- \| -- \| -- \| -- 10 \| 11.4018 \| 5.95173 \| 9.47788 \| 1.57538 \| -16.87382694 20 \| 9.23746 \| 0.841053 \| 9.32377 \| 1.14074 \| 0.9343477536 40 \| 9.0876 \| 0.671129 \| 9.35053 \| 1.11713 \| 2.893283155 80 \| 9.72514 \| 2.28459 \| 9.52013 \| 1.0894 \| -2.108041632 160 \| 9.74677 \| 0.991234 \| 9.84743 \| 1.73396 \| 1.032752389 320 \| 10.7297 \| 5.11555 \| 10.547 \| 1.97692 \| -1.70275031 640 \| 11.7092 \| 2.36565 \| 11.7869 \| 2.69377 \| 0.6635807741 - db bench on write with cost to cache in WriteBufferManager (just in case this PR's CRM refactoring accidentally slows down anything in WBM) : `fillseq` : +0.54% in micros/op `./db_bench -benchmarks=fillseq -db=/dev/shm/testdb -disable_auto_compactions=1 -cost_write_buffer_to_cache=true -write_buffer_size=10000000000 \| egrep 'fillseq'` #-run \| (pre-PR) avg micros/op \| std micros/op \| (post-PR) avg micros/op \| std micros/op \| change (%) -- \| -- \| -- \| -- \| -- \| -- 10 \| 6.15 \| 0.260187 \| 6.289 \| 0.371192 \| 2.260162602 20 \| 7.28025 \| 0.465402 \| 7.37255 \| 0.451256 \| 1.267813605 40 \| 7.06312 \| 0.490654 \| 7.13803 \| 0.478676 \| 1.060579461 80 \| 7.14035 \| 0.972831 \| 7.14196 \| 0.92971 \| 0.02254791432 - filter bench: `bloom filter`: -0.78% in ms/key - ` ./filter_bench -impl=2 -quick -reserve_table_builder_memory=true \| grep 'Build avg'` #-run \| (pre-PR) avg ns/key \| std ns/key \| (post-PR) ns/key \| std ns/key \| change (%) -- \| -- \| -- \| -- \| -- \| -- 10 \| 26.4369 \| 0.442182 \| 26.3273 \| 0.422919 \| -0.4145720565 20 \| 26.4451 \| 0.592787 \| 26.1419 \| 0.62451 \| -1.1465262 - Crash test `python3 tools/db_crashtest.py blackbox --reserve_table_reader_memory=1 --cache_size=1` killed as normal Reviewed By: ajkr Differential Revision: D35136549 Pulled By: hx235 fbshipit-source-id: 146978858d0f900f43f4eb09bfd3e83195e3be28	2022-04-06 10:33:00 -07:00
Chen Lixiang	cd59b139fc	Fix some typos in comments and HISTORY.md (#9798 ) Summary: compation --> compaction Pull Request resolved: https://github.com/facebook/rocksdb/pull/9798 Reviewed By: ajkr Differential Revision: D35341611 Pulled By: jay-zhuang fbshipit-source-id: 5ea07527c311de75cade219456b6ee52b23020f6	2022-04-04 09:32:57 -07:00
Akanksha Mahajan	fd66005628	Add 'adaptive_readahead' and 'async_io' options to db_stress (#9750 ) Summary: Same as title Pull Request resolved: https://github.com/facebook/rocksdb/pull/9750 Test Plan: export CRASH_TEST_EXT_ARGS=" --async_io=1 --adaptive_readahead=1; make -j crash_test Reviewed By: jay-zhuang Differential Revision: D35114326 Pulled By: akankshamahajan15 fbshipit-source-id: 8b05c95be09f7aff6cb9eb757aa20a6520349d45	2022-03-30 13:52:37 -07:00
Yanqin Jin	c18c4a081c	Add new determinators for multiops transactions stress test (#9708 ) Summary: Add determinators for multiops transactions stress test with write-committed and write-prepared policies. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9708 Test Plan: Internal CI Reviewed By: jay-zhuang Differential Revision: D34967263 Pulled By: riversand963 fbshipit-source-id: 170a0842d56dccb6ed6bc0c5adfd33849acd6b31	2022-03-23 22:29:50 -07:00
Yanqin Jin	5894761056	Improve stress test for transactions (#9568 ) Summary: Test only, no change to functionality. Extremely low risk of library regression. Update test key generation by maintaining existing and non-existing keys. Update db_crashtest.py to drive multiops_txn stress test for both write-committed and write-prepared. Add a make target 'blackbox_crash_test_with_multiops_txn'. Running the following commands caught the bug exposed in https://github.com/facebook/rocksdb/issues/9571. ``` $rm -rf /tmp/rocksdbtest/* $./db_stress -progress_reports=0 -test_multi_ops_txns -use_txn -clear_column_family_one_in=0 \ -column_families=1 -writepercent=0 -delpercent=0 -delrangepercent=0 -customopspercent=60 \ -readpercent=20 -prefixpercent=0 -iterpercent=20 -reopen=0 -ops_per_thread=1000 -ub_a=10000 \ -ub_c=100 -destroy_db_initially=0 -key_spaces_path=/dev/shm/key_spaces_desc -threads=32 -read_fault_one_in=0 $./db_stress -progress_reports=0 -test_multi_ops_txns -use_txn -clear_column_family_one_in=0 -column_families=1 -writepercent=0 -delpercent=0 -delrangepercent=0 -customopspercent=60 -readpercent=20 \ -prefixpercent=0 -iterpercent=20 -reopen=0 -ops_per_thread=1000 -ub_a=10000 -ub_c=100 -destroy_db_initially=0 \ -key_spaces_path=/dev/shm/key_spaces_desc -threads=32 -read_fault_one_in=0 ``` Running the following command caught a bug which will be fixed in https://github.com/facebook/rocksdb/issues/9648 . ``` $TEST_TMPDIR=/dev/shm make blackbox_crash_test_with_multiops_wc_txn ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/9568 Reviewed By: jay-zhuang Differential Revision: D34308154 Pulled By: riversand963 fbshipit-source-id: 99ff1b65c19b46c471d2f2d3b47adcd342a1b9e7	2022-03-16 19:00:04 -07:00
Baptiste Lemaire	e4c87773e1	Reactivate Mempurge feature in crash test. (#9684 ) Summary: Set `experimental_mempurge_threshold` back to `lambda: 10.0*random.random()` in crash test, reverting https://github.com/facebook/rocksdb/issues/8958 after fix provided in https://github.com/facebook/rocksdb/issues/9671 . Pull Request resolved: https://github.com/facebook/rocksdb/pull/9684 Reviewed By: pdillinger Differential Revision: D34820257 Pulled By: bjlemaire fbshipit-source-id: 1e5ae8c872c4ac4c4267c990ac5e3e793d77908c	2022-03-11 15:47:30 -08:00
Andrew Kryczka	ad2cab8f0c	minor tweaks to db_crashtest.py settings (#9483 ) Summary: I did another pass through running CI jobs. It is uncommon now to see `db_stress` stuck in the setup phase but still happen. One reason was repeatedly reading/verifying checksum on filter blocks when `-cache_index_and_filter_blocks=1` and `-cache_size=1048576`. To address that I increased the cache size. Another reason was having a WAL with many range tombstones and every `db_stress` run using `-avoid_flush_during_recovery=1` (in that scenario, the setup phase spent too much CPU in `rocksdb::MemTable::NewRangeTombstoneIteratorInternal()`). To address that I fixed the `-avoid_flush_during_recovery` setting so it is reevaluated for every `db_stress` run. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9483 Reviewed By: riversand963 Differential Revision: D33922929 Pulled By: ajkr fbshipit-source-id: 0a298ec7c4df6f6b44620233996047a2dc7ee5f3	2022-02-15 13:56:27 -08:00
Yanqin Jin	8b62abcc21	Disable backup/restore for ts-stress test (#9497 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9497 Reviewed By: ajkr Differential Revision: D33990256 Pulled By: riversand963 fbshipit-source-id: 268ce16b037e23e42b14fa0fcb45535582e1a0d6	2022-02-03 16:18:34 -08:00
Hui Xiao	920386f2b7	Detect (new) Bloom/Ribbon Filter construction corruption (#9342 ) Summary: Note: rebase on and merge after https://github.com/facebook/rocksdb/pull/9349, https://github.com/facebook/rocksdb/pull/9345, (optional) https://github.com/facebook/rocksdb/pull/9393 Context: (Quoted from pdillinger) Layers of information during new Bloom/Ribbon Filter construction in building block-based tables includes the following: a) set of keys to add to filter b) set of hashes to add to filter (64-bit hash applied to each key) c) set of Bloom indices to set in filter, with duplicates d) set of Bloom indices to set in filter, deduplicated e) final filter and its checksum This PR aims to detect corruption (e.g, unexpected hardware/software corruption on data structures residing in the memory for a long time) from b) to e) and leave a) as future works for application level. - b)'s corruption is detected by verifying the xor checksum of the hash entries calculated as the entries accumulate before being added to the filter. (i.e, `XXPH3FilterBitsBuilder::MaybeVerifyHashEntriesChecksum()`) - c) - e)'s corruption is detected by verifying the hash entries indeed exists in the constructed filter by re-querying these hash entries in the filter (i.e, `FilterBitsBuilder::MaybePostVerify()`) after computing the block checksum (except for PartitionFilter, which is done right after each `FilterBitsBuilder::Finish` for impl simplicity - see code comment for more). For this stage of detection, we assume hash entries are not corrupted after checking on b) since the time interval from b) to c) is relatively short IMO. Option to enable this feature of detection is `BlockBasedTableOptions::detect_filter_construct_corruption` which is false by default. Summary: - Implemented new functions `XXPH3FilterBitsBuilder::MaybeVerifyHashEntriesChecksum()` and `FilterBitsBuilder::MaybePostVerify()` - Ensured hash entries, final filter and banding and their [cache reservation ](https://github.com/facebook/rocksdb/issues/9073) are released properly despite corruption - See [Filter.construction.artifacts.release.point.pdf ](https://github.com/facebook/rocksdb/files/7923487/Design.Filter.construction.artifacts.release.point.pdf) for high-level design - Bundled and refactored hash entries's related artifact in XXPH3FilterBitsBuilder into `HashEntriesInfo` for better control on lifetime of these artifact during `SwapEntires`, `ResetEntries` - Ensured RocksDB block-based table builder calls `FilterBitsBuilder::MaybePostVerify()` after constructing the filter by `FilterBitsBuilder::Finish()` - When encountering such filter construction corruption, stop writing the filter content to files and mark such a block-based table building non-ok by storing the corruption status in the builder. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9342 Test Plan: - Added new unit test `DBFilterConstructionCorruptionTestWithParam.DetectCorruption` - Included this new feature in `DBFilterConstructionReserveMemoryTestWithParam.ReserveMemory` as this feature heavily touch ReserveMemory's impl - For fallback case, I run `./filter_bench -impl=3 -detect_filter_construct_corruption=true -reserve_table_builder_memory=true -strict_capacity_limit=true -quick -runs 10 \| grep 'Build avg'` to make sure nothing break. - Added to `filter_bench`: increased filter construction time by 30%, mostly by `MaybePostVerify()` - FastLocalBloom - Before change: `./filter_bench -impl=2 -quick -runs 10 \| grep 'Build avg'`: 28.86643s - After change: - `./filter_bench -impl=2 -detect_filter_construct_corruption=false -quick -runs 10 \| grep 'Build avg'` (expect a tiny increase due to MaybePostVerify is always called regardless): 27.6644s (-4% perf improvement might be due to now we don't drop bloom hash entry in `AddAllEntries` along iteration but in bulk later, same with the bypassing-MaybePostVerify case below) - `./filter_bench -impl=2 -detect_filter_construct_corruption=true -quick -runs 10 \| grep 'Build avg'` (expect acceptable increase): 34.41159s (+20%) - `./filter_bench -impl=2 -detect_filter_construct_corruption=true -quick -runs 10 \| grep 'Build avg'` (by-passing MaybePostVerify, expect minor increase): 27.13431s (-6%) - Standard128Ribbon - Before change: `./filter_bench -impl=3 -quick -runs 10 \| grep 'Build avg'`: 122.5384s - After change: - `./filter_bench -impl=3 -detect_filter_construct_corruption=false -quick -runs 10 \| grep 'Build avg'` (expect a tiny increase due to MaybePostVerify is always called regardless - verified by removing MaybePostVerify under this case and found only +-1ns difference): 124.3588s (+2%) - `./filter_bench -impl=3 -detect_filter_construct_corruption=true -quick -runs 10 \| grep 'Build avg'`(expect acceptable increase): 159.4946s (+30%) - `./filter_bench -impl=3 -detect_filter_construct_corruption=true -quick -runs 10 \| grep 'Build avg'`(by-passing MaybePostVerify, expect minor increase) : 125.258s (+2%) - Added to `db_stress`: `make crash_test`, `./db_stress --detect_filter_construct_corruption=true` - Manually smoke-tested: manually corrupted the filter construction in some db level tests with basic PUT and background flush. As expected, the error did get returned to users in subsequent PUT and Flush status. Reviewed By: pdillinger Differential Revision: D33746928 Pulled By: hx235 fbshipit-source-id: cb056426be5a7debc1cd16f23bc250f36a08ca57	2022-02-01 17:42:35 -08:00
Andrew Kryczka	8dbd0bd11f	db_crashtest.py use cheaper settings (#9476 ) Summary: Despite attempts to optimize `db_stress` setup phase (i.e., pre-`OperateDb()`) latency in https://github.com/facebook/rocksdb/issues/9470 and https://github.com/facebook/rocksdb/issues/9475, it still always took tens of seconds. Since we still aren't able to setup a 100M key `db_stress` quickly, we should reduce the number of keys. This PR reduces it 4x while increasing `value_size_mult` 4x (from its default value of 8) so that memtables and SST files fill at a similar rate compared to before this PR. Also disabled bzip2 compression since we'll probably never use it and I noticed many CI runs spending majority of CPU on bzip2 decompression. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9476 Reviewed By: siying Differential Revision: D33898520 Pulled By: ajkr fbshipit-source-id: 855021784ad9664f2be5bce21f0339a1cf93230d	2022-01-31 13:21:24 -08:00
Peter Dillinger	c11fe94000	Fix^2 prefix extractor testing in crash test (#9463 ) Summary: Even after https://github.com/facebook/rocksdb/issues/9461 could see ``` Error: please specify prefix_size for test_batches_snapshots test! ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/9463 Test Plan: run `make blackbox_crashtest` for a long time. (Unfortunately, it's taking a long time to reproduce these failures) Reviewed By: akankshamahajan15 Differential Revision: D33838152 Pulled By: pdillinger fbshipit-source-id: b9a73c5bbb68df53f14c22b9b52f61d1f7ef38af	2022-01-27 23:11:11 -08:00
Peter Dillinger	981e8c621f	Fix/expand prefix extractor testing in crash test (#9461 ) Summary: Changes in https://github.com/facebook/rocksdb/issues/9453 could trigger ``` stderr: Error: prefixpercent is non-zero while prefix_size is not positive! ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/9461 Test Plan: run `make blackbox_crashtest` for a long time Reviewed By: ajkr Differential Revision: D33830751 Pulled By: pdillinger fbshipit-source-id: be88377dcaa47e4bb7adb0347762639eff8f1476	2022-01-27 16:37:55 -08:00
Peter Dillinger	ea89c77f27	Fix major bug with MultiGet, DeleteRange, and memtable Bloom (#9453 ) Summary: MemTable::MultiGet was not considering range tombstones before querying Bloom filter. This means range tombstones would be skipped for keys (or prefixes) with no other entries in the memtable. This could cause old values for a key (in SST files) to still show up until the range tombstone covering it has been flushed. This is fixed by essentially disabling the memtable Bloom filter when there are any range tombstones. (This could be better optimized in the future, but good enough for now.) Did some other cleanup/optimization in the same code to (more than) offset the cost of checking on range tombstones in more cases. There is now notable improvement when memtable_whole_key_filtering and prefix_extractor are used together (unusual), and this makes MultiGet closer to the Get implementation. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9453 Test Plan: new unit test added. Added memtable Bloom to crash test. Performance testing -------------------- Build WAL-only DB (recovers to memtable): ``` TEST_TMPDIR=/dev/shm/rocksdb ./db_bench -benchmarks=fillrandom -num=1000000 -write_buffer_size=250000000 ``` Query test command, to maximize sensitivity to the changed code: ``` TEST_TMPDIR=/dev/shm/rocksdb ./db_bench -use_existing_db -readonly -benchmarks=multireadrandom -num=10000000 -write_buffer_size=250000000 -memtable_bloom_size_ratio=0.015 -multiread_batched -batch_size=24 -threads=8 -memtable_whole_key_filtering=$MWKF -prefix_size=$PXS ``` (Note -num here is 10x larger for mostly memtable misses) Before & after run simultaneously, average over 10 iterations per data point, ops/sec. MWKF=0 PXS=0 (Bloom disabled) Before: 5724844 After: 6722066 MWKF=0 PXS=7 (prefixes hardly unique; Bloom not useful) Before: 9981319 After: 10237990 MWKF=0 PXS=8 (prefixes unique; Bloom useful) Before: 12081715 After: 12117603 MWKF=1 PXS=0 (whole key Bloom useful) Before: 11944354 After: 12096085 MWKF=1 PXS=7 (whole key Bloom useful in new version; prefixes not useful in old version) Before: 9444299 After: 11826029 MWKF=1 PXS=7 (whole key Bloom useful in new version; prefixes useful in old version) Before: 11784465 After: 11778591 Only in this last case is the 'before' slightly faster, perhaps because hashing prefixes is slightly faster than hashing whole keys. Otherwise, 'after' is faster. Reviewed By: ajkr Differential Revision: D33805025 Pulled By: pdillinger fbshipit-source-id: 597523cae4f4eafdf6ae6bb2bc6cb46f83b017bf	2022-01-27 14:55:04 -08:00
Andrew Kryczka	6892f19b11	Test correctness with WAL disabled in non-txn blackbox crash tests (#9338 ) Summary: Recently we added the ability to verify some prefix of operations are recovered (AKA no "hole" in the recovered data) (https://github.com/facebook/rocksdb/issues/8966). Besides testing unsynced data loss scenarios, it is also useful to test WAL disabled use cases, where unflushed writes are expected to be lost. Note RocksDB only offers the prefix-recovery guarantee to WAL-disabled use cases that use atomic flush, so crash test always enables atomic flush when WAL is disabled. To verify WAL-disabled crash-recovery correctness globally, i.e., also in whitebox and blackbox transaction tests, it is possible but requires further changes. I added TODOs in db_crashtest.py. Depends on https://github.com/facebook/rocksdb/issues/9305. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9338 Test Plan: Running all crash tests and many instances of blackbox. Sandcastle links are in Phabricator diff test plan. Reviewed By: riversand963 Differential Revision: D33345333 Pulled By: ajkr fbshipit-source-id: f56dd7d2e5a78d59301bf4fc3fedb980eb31e0ce	2022-01-05 16:23:37 -08:00
Andrew Kryczka	5383f1eec4	Verify recovery correctness in multi-CF blackbox crash test (#9303 ) Summary: db_crashtest.py uses multiple CFs only when run without flag `--simple`. The previous config set `-test_batches_snapshots=1` in that case for blackbox mode. But `-test_batches_snapshots=1` cannot verify recovery correctness, so it should not always be set for multi-CF blackbox tests. We can instead randomly toggle it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9303 Reviewed By: riversand963 Differential Revision: D33155229 Pulled By: ajkr fbshipit-source-id: 4a6fdc4eddccc8ece664063baf6393ce1c5de6b7	2021-12-16 09:05:40 -08:00
Akanksha Mahajan	9e4d56f2c9	Fix segmentation fault in table_options.prepopulate_block_cache when used with partition_filters (#9263 ) Summary: When table_options.prepopulate_block_cache is set to BlockBasedTableOptions::PrepopulateBlockCache::kFlushOnly and table_options.partition_filters is also set true, then there is segmentation failure when top level filter is fetched because its entered with wrong type in cache. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9263 Test Plan: Updated unit tests; Ran db_stress: make crash_test -j32 Reviewed By: pdillinger Differential Revision: D32936566 Pulled By: akankshamahajan15 fbshipit-source-id: 8bd79e53830d3e3c1bb79787e1ffbc3cb46d4426	2021-12-08 12:44:38 -08:00
Levi Tamasi	dc5de45af8	Support readahead during compaction for blob files (#9187 ) Summary: The patch adds a new BlobDB configuration option `blob_compaction_readahead_size` that can be used to enable prefetching data from blob files during compaction. This is important when using storage with higher latencies like HDDs or remote filesystems. If enabled, prefetching is used for all cases when blobs are read during compaction, namely garbage collection, compaction filters (when the existing value has to be read from a blob file), and `Merge` (when the value of the base `Put` is stored in a blob file). Pull Request resolved: https://github.com/facebook/rocksdb/pull/9187 Test Plan: Ran `make check` and the stress/crash test. Reviewed By: riversand963 Differential Revision: D32565512 Pulled By: ltamasi fbshipit-source-id: 87be9cebc3aa01cc227bec6b5f64d827b8164f5d	2021-11-19 17:53:47 -08:00
anand76	78556c14dd	Secondary cache error injection (#9002 ) Summary: Implement secondary cache error injection in db_stress. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9002 Reviewed By: zhichao-cao Differential Revision: D31874896 Pulled By: anand1976 fbshipit-source-id: 8cf04c061a4a44efa0fe88423d05cade67b85f73	2021-11-08 10:27:27 -08:00
Peter Dillinger	a7d4bea43a	Implement XXH3 block checksum type (#9069 ) Summary: XXH3 - latest hash function that is extremely fast on large data, easily faster than crc32c on most any x86_64 hardware. In integrating this hash function, I have handled the compression type byte in a non-standard way to avoid using the streaming API (extra data movement and active code size because of hash function complexity). This approach got a thumbs-up from Yann Collet. Existing functionality change: * reject bad ChecksumType in options with InvalidArgument This change split off from https://github.com/facebook/rocksdb/issues/9058 because context-aware checksum is likely to be handled through different configuration than ChecksumType. Pull Request resolved: https://github.com/facebook/rocksdb/pull/9069 Test Plan: tests updated, and substantially expanded. Unit tests now check that we don't accidentally change the values generated by the checksum algorithms ("schema test") and that we properly handle invalid/unrecognized checksum types in options or in file footer. DBTestBase::ChangeOptions (etc.) updated from two to one configuration changing from default CRC32c ChecksumType. The point of this test code is to detect possible interactions among features, and the likelihood of some bad interaction being detected by including configurations other than XXH3 and CRC32c--and then not detected by stress/crash test--is extremely low. Stress/crash test also updated (manual run long enough to see it accepts new checksum type). db_bench also updated for microbenchmarking checksums. ### Performance microbenchmark (PORTABLE=0 DEBUG_LEVEL=0, Broadwell processor) ./db_bench -benchmarks=crc32c,xxhash,xxhash64,xxh3,crc32c,xxhash,xxhash64,xxh3,crc32c,xxhash,xxhash64,xxh3 crc32c : 0.200 micros/op 5005220 ops/sec; 19551.6 MB/s (4096 per op) xxhash : 0.807 micros/op 1238408 ops/sec; 4837.5 MB/s (4096 per op) xxhash64 : 0.421 micros/op 2376514 ops/sec; 9283.3 MB/s (4096 per op) xxh3 : 0.171 micros/op 5858391 ops/sec; 22884.3 MB/s (4096 per op) crc32c : 0.206 micros/op 4859566 ops/sec; 18982.7 MB/s (4096 per op) xxhash : 0.793 micros/op 1260850 ops/sec; 4925.2 MB/s (4096 per op) xxhash64 : 0.410 micros/op 2439182 ops/sec; 9528.1 MB/s (4096 per op) xxh3 : 0.161 micros/op 6202872 ops/sec; 24230.0 MB/s (4096 per op) crc32c : 0.203 micros/op 4924686 ops/sec; 19237.1 MB/s (4096 per op) xxhash : 0.839 micros/op 1192388 ops/sec; 4657.8 MB/s (4096 per op) xxhash64 : 0.424 micros/op 2357391 ops/sec; 9208.6 MB/s (4096 per op) xxh3 : 0.162 micros/op 6182678 ops/sec; 24151.1 MB/s (4096 per op) As you can see, especially once warmed up, xxh3 is fastest. ### Performance macrobenchmark (PORTABLE=0 DEBUG_LEVEL=0, Broadwell processor) Test for I in `seq 1 50`; do for CHK in 0 1 2 3 4; do TEST_TMPDIR=/dev/shm/rocksdb$CHK ./db_bench -benchmarks=fillseq -memtablerep=vector -allow_concurrent_memtable_write=false -num=30000000 -checksum_type=$CHK 2>&1 \| grep 'micros/op' \| tee -a results-$CHK & done; wait; done Results (ops/sec) for FILE in results; do echo -n "$FILE "; awk '{ s += $5; c++; } END { print 1.0 s / c; }' < $FILE; done results-0 252118 # kNoChecksum results-1 251588 # kCRC32c results-2 251863 # kxxHash results-3 252016 # kxxHash64 results-4 252038 # kXXH3 Reviewed By: mrambacher Differential Revision: D31905249 Pulled By: pdillinger fbshipit-source-id: cb9b998ebe2523fc7c400eedf62124a78bf4b4d1	2021-10-28 22:15:17 -07:00
Levi Tamasi	3e1bf771a3	Make it possible to force the garbage collection of the oldest blob files (#8994 ) Summary: The current BlobDB garbage collection logic works by relocating the valid blobs from the oldest blob files as they are encountered during compaction, and cleaning up blob files once they contain nothing but garbage. However, with sufficiently skewed workloads, it is theoretically possible to end up in a situation when few or no compactions get scheduled for the SST files that contain references to the oldest blob files, which can lead to increased space amp due to the lack of GC. In order to efficiently handle such workloads, the patch adds a new BlobDB configuration option called `blob_garbage_collection_force_threshold`, which signals to BlobDB to schedule targeted compactions for the SST files that keep alive the oldest batch of blob files if the overall ratio of garbage in the given blob files meets the threshold and all the given blob files are eligible for GC based on `blob_garbage_collection_age_cutoff`. (For example, if the new option is set to 0.9, targeted compactions will get scheduled if the sum of garbage bytes meets or exceeds 90% of the sum of total bytes in the oldest blob files, assuming all affected blob files are below the age-based cutoff.) The net result of these targeted compactions is that the valid blobs in the oldest blob files are relocated and the oldest blob files themselves cleaned up (since all SST files that rely on them get compacted away). These targeted compactions are similar to periodic compactions in the sense that they force certain SST files that otherwise would not get picked up to undergo compaction and also in the sense that instead of merging files from multiple levels, they target a single file. (Note: such compactions might still include neighboring files from the same level due to the need of having a "clean cut" boundary but they never include any files from any other level.) This functionality is currently only supported with the leveled compaction style and is inactive by default (since the default value is set to 1.0, i.e. 100%). Pull Request resolved: https://github.com/facebook/rocksdb/pull/8994 Test Plan: Ran `make check` and tested using `db_bench` and the stress/crash tests. Reviewed By: riversand963 Differential Revision: D31489850 Pulled By: ltamasi fbshipit-source-id: 44057d511726a0e2a03c5d9313d7511b3f0c4eab	2021-10-11 18:03:01 -07:00
Andrew Kryczka	a282eff3d1	Protect existing files in `FaultInjectionTest{Env,FS}::ReopenWritableFile()` (#8995 ) Summary: `FaultInjectionTest{Env,FS}::ReopenWritableFile()` functions were accidentally deleting WALs from previous `db_stress` runs causing verification to fail. They were operating under the assumption that `ReopenWritableFile()` would delete any existing file. It was a reasonable assumption considering the `{Env,FileSystem}::ReopenWritableFile()` documentation stated that would happen. The only problem was neither the implementations we offer nor the "real" clients in RocksDB code followed that contract. So, this PR updates the contract as well as fixing the fault injection client usage. The fault injection change exposed that `ExternalSSTFileBasicTest.SyncFailure` was relying on a fault injection `Env` dropping unsynced data written by a regular `Env`. I changed that test to make its `SstFileWriter` use fault injection `Env`, and also implemented `LinkFile()` in fault injection so the unsynced data is tracked under the new name. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8995 Test Plan: - Verified it fixes the following failure: ``` $ ./db_stress --clear_column_family_one_in=0 --column_families=1 --db=/dev/shm/rocksdb_crashtest_whitebox --delpercent=5 --expected_values_dir=/dev/shm/rocksdb_crashtest_expected --iterpercent=0 --key_len_percent_dist=1,30,69 --max_key=100000 --max_key_len=3 --nooverwritepercent=1 --ops_per_thread=1000 --prefixpercent=0 --readpercent=60 --reopen=0 --target_file_size_base=1048576 --test_batches_snapshots=0 --write_buffer_size=1048576 --writepercent=35 --value_size_mult=33 -threads=1 ... $ ./db_stress --avoid_flush_during_recovery=1 --clear_column_family_one_in=0 --column_families=1 --db=/dev/shm/rocksdb_crashtest_whitebox --delpercent=5 --destroy_db_initially=0 --expected_values_dir=/dev/shm/rocksdb_crashtest_expected --iterpercent=10 --key_len_percent_dist=1,30,69 --max_bytes_for_level_base=4194304 --max_key=100000 --max_key_len=3 --nooverwritepercent=1 --open_files=-1 --open_metadata_write_fault_one_in=8 --open_write_fault_one_in=16 --ops_per_thread=1000 --prefix_size=-1 --prefixpercent=0 --readpercent=50 --sync=1 --target_file_size_base=1048576 --test_batches_snapshots=0 --write_buffer_size=1048576 --writepercent=35 --value_size_mult=33 -threads=1 ... Verification failed for column family 0 key 000000000000001300000000000000857878787878 (1143): Value not found: NotFound: Crash-recovery verification failed :( ... ``` - `make check -j48` Reviewed By: ltamasi Differential Revision: D31495388 Pulled By: ajkr fbshipit-source-id: 7886ccb6a07cb8b78ad7b6c1c341ccf40bb68385	2021-10-11 16:23:18 -07:00
Akanksha Mahajan	84d71f30c4	Enable SingleDelete with user defined ts in db_bench and crash tests (#8971 ) Summary: Enable SingleDelete with user defined timestamp in db_bench, db_stress and crash test Pull Request resolved: https://github.com/facebook/rocksdb/pull/8971 Test Plan: 1. For db_stress, ran the command for full duration: i) python3 -u tools/db_crashtest.py --enable_ts whitebox --nooverwritepercent=100 ii) make crash_test_with_ts 2. For db_bench, ran: ./db_bench -benchmarks=randomreplacekeys -user_timestamp_size=8 -use_single_deletes=true Reviewed By: riversand963 Differential Revision: D31246558 Pulled By: akankshamahajan15 fbshipit-source-id: 29cd8740c9921341e52f09242fca3c44d75a12b7	2021-10-01 16:48:01 -07:00
Andrew Kryczka	559943cdc0	Refactor expected state in stress/crash test (#8913 ) Summary: This is a precursor refactoring to enable an upcoming feature: persistence failure correctness testing. - Changed `--expected_values_path` to `--expected_values_dir` and migrated "db_crashtest.py" to use the new flag. For persistence failure correctness testing there are multiple possible correct states since unsynced data is allowed to be dropped. Making it possible to restore all these possible correct states will eventually involve files containing snapshots of expected values and DB trace files. - The expected values directory is managed by an `ExpectedStateManager` instance. Managing expected state files is separated out of `SharedState` to prevent `SharedState` from becoming too complex when the new files and features (snapshotting, tracing, and restoring) are introduced. - Migrated expected values file access/management out of `SharedState` into a separate class called `ExpectedState`. This is not exposed directly to the test but rather the `ExpectedState` for the latest values file is accessed via a pass-through API on `ExpectedStateManager`. This forces the test to always access the single latest `ExpectedState`. - Changed the initialization of the latest expected values file to use a tempfile followed by rename, and also add cleanup logic for possible stranded tempfiles. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8913 Test Plan: run in several ways; try to make sure it's not obviously broken. - crashtest blackbox without TEST_TMPDIR ``` $ python3 tools/db_crashtest.py blackbox --simple --write_buffer_size=1048576 --target_file_size_base=1048576 --max_bytes_for_level_base=4194304 --max_key=100000 --value_size_mult=33 --compression_type=none --duration=120 --interval=10 --compression_type=none --blob_compression_type=none ``` - crashtest blackbox with TEST_TMPDIR ``` $ TEST_TMPDIR=/dev/shm python3 tools/db_crashtest.py blackbox --simple --write_buffer_size=1048576 --target_file_size_base=1048576 --max_bytes_for_level_base=4194304 --max_key=100000 --value_size_mult=33 --compression_type=none --duration=120 --interval=10 --compression_type=none --blob_compression_type=none ``` - crashtest whitebox with TEST_TMPDIR ``` $ TEST_TMPDIR=/dev/shm python3 tools/db_crashtest.py whitebox --simple --write_buffer_size=1048576 --target_file_size_base=1048576 --max_bytes_for_level_base=4194304 --max_key=100000 --value_size_mult=33 --compression_type=none --duration=120 --interval=10 --compression_type=none --blob_compression_type=none --random_kill_odd=88887 ``` - db_stress without expected_values_dir ``` $ ./db_stress --write_buffer_size=1048576 --target_file_size_base=1048576 --max_bytes_for_level_base=4194304 --max_key=100000 --value_size_mult=33 --compression_type=none --ops_per_thread=10000 --clear_column_family_one_in=0 --destroy_db_initially=true ``` - db_stress with expected_values_dir and manual corruption ``` $ ./db_stress --write_buffer_size=1048576 --target_file_size_base=1048576 --max_bytes_for_level_base=4194304 --max_key=100000 --value_size_mult=33 --compression_type=none --ops_per_thread=10000 --clear_column_family_one_in=0 --destroy_db_initially=true --expected_values_dir=./ // modify one byte in "./LATEST.state" $ ./db_stress --write_buffer_size=1048576 --target_file_size_base=1048576 --max_bytes_for_level_base=4194304 --max_key=100000 --value_size_mult=33 --compression_type=none --ops_per_thread=10000 --clear_column_family_one_in=0 --destroy_db_initially=false --expected_values_dir=./ ... Verification failed for column family 0 key 0000000000000000 (0): Value not found: NotFound: ... ``` Reviewed By: riversand963 Differential Revision: D30921951 Pulled By: ajkr fbshipit-source-id: babfe218062e55d018c9b046536c0289fb78f41c	2021-09-28 14:13:33 -07:00
Andrew Kryczka	6d424be910	Temporarily set experimental_mempurge_threshold=0 in crash test (#8958 ) Summary: For now, disable it since the below command indicates it can cause a failure. Running that command with `-experimental_mempurge_threshold=0` has been running successfully for several minutes, whereas before it failed in seconds. ``` $ while rm -rf /dev/shm/single_stress && ./db_stress --clear_column_family_one_in=0 --column_families=1 --db=/dev/shm/single_stress --experimental_mempurge_threshold=5.493146827397074 --flush_one_in=10000 --reopen=0 --write_buffer_size=262144 --value_size_mult=33 --max_write_buffer_number=3 -ops_per_thread=10000; do : ; done ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/8958 Reviewed By: ltamasi Differential Revision: D31187059 Pulled By: ajkr fbshipit-source-id: 04d5bfb4fcc4f5b66233e691427dfd940c67037f	2021-09-24 18:29:48 -07:00
sdong	9320067703	Improve fault injection to MultiRead (#8937 ) Summary: Several improvements to MultiRead: 1. Fix a bug in stress test which causes false positive when both MultiRead() return and individual read request have failure injected. 2. Add two more types of fault that should be handled: empty read results and checksum mismatch 3. Add a message indicating which type of fault is injected 4. Increase the failure rate Pull Request resolved: https://github.com/facebook/rocksdb/pull/8937 Reviewed By: anand1976 Differential Revision: D31085930 fbshipit-source-id: 3a04994a3cadebf9a64d25e1fe12b14b7a272fba	2021-09-21 14:48:15 -07:00
Yanqin Jin	d8eb824325	Temporarily disable block-based filter when stress testing timestamp (#8703 ) Summary: Current implementation does not support user-defined timestamp when block-based filter is used. Will implement the support in the future, or wait to see if block-based filter can be deprecated and removed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8703 Test Plan: make whitebox_crash_test_with_ts Reviewed By: pdillinger Differential Revision: D30528931 Pulled By: riversand963 fbshipit-source-id: 60dd74ee0a6194e69072069d8c4bd876f249f38d	2021-08-24 19:04:58 -07:00
Peter Dillinger	2a383f21f4	Add Bloom/Ribbon hybrid API support (#8679 ) Summary: This is essentially resurrection and fixing of the part of https://github.com/facebook/rocksdb/issues/8198 that was reverted in https://github.com/facebook/rocksdb/issues/8212, using data added in https://github.com/facebook/rocksdb/issues/8246. Basically, when configuring Ribbon filter, you can specify an LSM level before which Bloom will be used instead of Ribbon. But Bloom is only considered for Leveled and Universal compaction styles and file going into a known LSM level. This way, SST file writer, FIFO compaction, etc. use Ribbon filter as you would expect with NewRibbonFilterPolicy. So that this can be controlled with a single int value and so that flushes can be distinguished from intra-L0, we consider flush to go to level -1 for the purposes of this option. (Explained in API comment.) I also expect the most common and recommended Ribbon configuration to use Bloom during flush, to minimize slowing down writes and because according to my estimates, Ribbon only pays off if the structure lives in memory for more than an hour. Thus, I have changed the default for NewRibbonFilterPolicy to be this mild hybrid configuration. I don't really want to add something like NewHybridFilterPolicy because at least the mild hybrid configuration (Bloom for flush, Ribbon otherwise) should be considered a natural choice. C APIs also updated, but because they don't support overloading, rocksdb_filterpolicy_create_ribbon is kept pure ribbon for clarity and rocksdb_filterpolicy_create_ribbon_hybrid must be called for a hybrid configuration. While touching C API, I changed bits per key options from int to double. BuiltinFilterPolicy is needed so that LevelThresholdFilterPolicy doesn't inherit unused fields from BloomFilterPolicy. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8679 Test Plan: new + updated tests, including crash test Reviewed By: jay-zhuang Differential Revision: D30445797 Pulled By: pdillinger fbshipit-source-id: 6f5aeddfd6d79f7e55493b563c2d1d2d568892e1	2021-08-20 18:00:16 -07:00
Baptiste Lemaire	e3a96c4823	Memtable sampling for mempurge heuristic. (#8628 ) Summary: Changes the API of the MemPurge process: the `bool experimental_allow_mempurge` and `experimental_mempurge_policy` flags have been replaced by a `double experimental_mempurge_threshold` option. This change of API reflects another major change introduced in this PR: the MemPurgeDecider() function now works by sampling the memtables being flushed to estimate the overall amount of useful payload (payload minus the garbage), and then compare this useful payload estimate with the `double experimental_mempurge_threshold` value. Therefore, when the value of this flag is `0.0` (default value), mempurge is simply deactivated. On the other hand, a value of `DBL_MAX` would be equivalent to always going through a mempurge regardless of the garbage ratio estimate. At the moment, a `double experimental_mempurge_threshold` value else than 0.0 or `DBL_MAX` is opnly supported`with the `SkipList` memtable representation. Regarding the sampling, this PR includes the introduction of a `MemTable::UniqueRandomSample` function that collects (approximately) random entries from the memtable by using the new `SkipList::Iterator::RandomSeek()` under the hood, or by iterating through each memtable entry, depending on the target sample size and the total number of entries. The unit tests have been readapted to support this new API. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8628 Reviewed By: pdillinger Differential Revision: D30149315 Pulled By: bjlemaire fbshipit-source-id: 1feef5390c95db6f4480ab4434716533d3947f27	2021-08-10 18:09:03 -07:00
Peter (Stig) Edwards	543a201b93	Remove unused variable - run_had_errors (#8599 ) Summary: Unused since `ab718b415f` . Noticed on `b215f1a832/files/tools/db_crashtest.py`?sort=name&dir=ASC&mode=heatmap#xf254f528ad18f108:1 Pull Request resolved: https://github.com/facebook/rocksdb/pull/8599 Reviewed By: ajkr Differential Revision: D30057041 Pulled By: zhichao-cao fbshipit-source-id: e80438cf9717086d2bf67461e19393d426a7676e	2021-08-06 14:46:37 -07:00
Baptiste Lemaire	d6006f9c9b	Add experimental mempurge policy flag to db_stress. (#8588 ) Summary: Add `experimental_mempurge_policy` flag to `db_stress` and `db_crashtest.py`. This flag is only read if the `experimental_allow_mempurge` flag is set to `true`. This flag can take the following values: `kAlways`, and `kAlternate` (default). - `kAlways`: a flush is always redirected to a mempurge. If the mempurge aborts, the a regular flush proceeds. - `kAlternate`: if one or more of the flush input memtables is an mempurge output memtable, then a flush is performed, else a mempurge is carried out. Similar to kAlways, if a mempurge aborts, the FlushJob proceeds to a regular flush to storage. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8588 Reviewed By: pdillinger Differential Revision: D29934251 Pulled By: bjlemaire fbshipit-source-id: 90c1debed2029b9915d066914556547507c33dae	2021-07-28 13:27:58 -07:00
Baptiste Lemaire	0229a88dfe	Crashtest mempurge (#8545 ) Summary: Add `experiemental_allow_mempurge` flag support for `db_stress` and `db_crashtest.py`, with a `false` default value. I succesfully tested locally both `whitebox` and `blackbox` crash tests with `experiemental_allow_mempurge` flag set as true. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8545 Reviewed By: akankshamahajan15 Differential Revision: D29734513 Pulled By: bjlemaire fbshipit-source-id: 24316c0eccf6caf409e95c035f31d822c66714ae	2021-07-16 10:20:22 -07:00
sdong	f33611d5e9	Stress test to inject read failures in DB reopen (#8476 ) Summary: Inject read failures in DB reopen, just as what we do for metadata writes and writes. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8476 Test Plan: Some manual tests and make sure failures are triggered. Reviewed By: anand1976 Differential Revision: D29507283 fbshipit-source-id: d04da0163973447041038bd87701686a417c4e0c	2021-07-06 11:05:27 -07:00
sdong	ba224b75c7	Stress Test to inject write failures in reopen (#8474 ) Summary: Previously Stress can inject metadata write failures when reopening a DB. We extend it to file append too, in the same way. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8474 Test Plan: manually run crash test with various setting and make sure the failures are triggered as expected. Reviewed By: zhichao-cao Differential Revision: D29503116 fbshipit-source-id: e73a446e80ccbd09301a579280e56ff949381fab	2021-06-30 16:46:41 -07:00
anand76	6f9ed59b1d	Allow db_stress to use a secondary cache (#8455 ) Summary: Add a ```-secondary_cache_uri``` to db_stress to allow the user to specify a custom ```SecondaryCache``` object from the object registry. Also allow db_crashtest.py to be run with an alternate db_stress location. Together, these changes will allow us to run db_stress using FB internal components. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8455 Reviewed By: zhichao-cao Differential Revision: D29371972 Pulled By: anand1976 fbshipit-source-id: dd1b1fd80ebbedc11aa63d9246ea6ae49edb77c4	2021-06-27 23:54:39 -07:00
Akanksha Mahajan	be8199cdb9	Run Merge with Integrated BlobDB in stress, crash and db_bench (#8461 ) Summary: Run Merge with Intergrated BlobDB in stress tests, crash tests and db_bench. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8461 Test Plan: 1. python3 -u tools/db_crashtest.py --simple whitebox ---use_merge=1 --enable_blob_files=1 2. ./db_bench --benchmarks="readwhilemerging" --merge_operator=uint64add --enable_blob_files=true Reviewed By: ltamasi Differential Revision: D29394824 Pulled By: akankshamahajan15 fbshipit-source-id: 0a8e492b13129673e088fb8af3402ab678bb473a	2021-06-25 10:45:52 -07:00
sdong	ab718b415f	Kill whitebox crash test if it is 15 minutes over the limit (#8341 ) Summary: Whitebox crash test can run significantly over the time limit for test slowness or no kiling points. This indefinite job can create problem when this test is periodically scheduled as a job. Instead, kill the job if it is 15 minutes over the limit. Refactor the code slightly to consolidate the code for executing commands for white and black box tests. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8341 Test Plan: Run both of black and white box tests with both of natual and explicit kill condition. Reviewed By: jay-zhuang Differential Revision: D28756170 fbshipit-source-id: f253149890e62ace78f871be927e093e9b12f49b	2021-06-01 09:34:53 -07:00
Peter Dillinger	ecd63b9262	Revert accidental enabling broken ClockCache in stress test (#8277 ) Summary: From https://github.com/facebook/rocksdb/issues/8261 Pull Request resolved: https://github.com/facebook/rocksdb/pull/8277 Test Plan: briefly make blackbox_crash_test Reviewed By: zhichao-cao Differential Revision: D28270648 Pulled By: pdillinger fbshipit-source-id: 9bfd46c5a1a449165f6597bddb17af910331773f	2021-05-06 16:31:51 -07:00
Andrew Kryczka	b71b4597e7	Permit stdout "fail"/"error" in whitebox crash test (#8272 ) Summary: In https://github.com/facebook/rocksdb/issues/8268, the `db_stress` stdout began containing both the strings "fail" and "error" (case-insensitive). The whitebox crash test failed upon seeing either of those strings. I checked that all other occurrences of "fail" and "error" (case-insensitive) that `db_stress` produces are printed to `stderr`. So this PR separates the handling of `db_stress`'s stdout and stderr, and only fails when one those bad strings are found in stderr. The downside of this PR is `db_stress`'s original interleaving of stdout/stderr is not preserved in `db_crashtest.py`'s output. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8272 Test Plan: run it; see it succeeds for several runs until encountering a real error ``` $ python3 tools/db_crashtest.py whitebox --simple --random_kill_odd=8887 --max_key=1000000 --value_size_mult=33 ... db_stress: cache/clock_cache.cc:483: bool rocksdb::{anonymous}::ClockCacheShard::Unref(rocksdb::{anonymous}::CacheHandle, bool, rocksdb::{anonymous}::CleanupContext): Assertion `CountRefs(flags) > 0' failed. TEST FAILED. Output has 'fail'!!! ``` Reviewed By: zhichao-cao Differential Revision: D28239233 Pulled By: ajkr fbshipit-source-id: 3b8602a0d570466a7e2c81bb9c49468f7716091e	2021-05-05 17:54:13 -07:00
Andrew Kryczka	0f42e50fec	Fix `GetLiveFiles()` returning OPTIONS-000000 (#8268 ) Summary: See release note in HISTORY.md. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8268 Test Plan: unit test repro Reviewed By: siying Differential Revision: D28227901 Pulled By: ajkr fbshipit-source-id: faf61d13b9e43a761e3d5dcf8203923126b51339	2021-05-05 12:54:46 -07:00
Peter Dillinger	3b981eaa1d	Fix use-after-free threading bug in ClockCache (#8261 ) Summary: In testing for https://github.com/facebook/rocksdb/issues/8225 I found cache_bench would crash with -use_clock_cache, as well as db_bench -use_clock_cache, but not single-threaded. Smaller cache size hits failure much faster. ASAN reported the failuer as calling malloc_usable_size on the `key` pointer of a ClockCache handle after it was reportedly freed. On detailed inspection I found this bad sequence of operations for a cache entry: state=InCache=1,refs=1 [thread 1] Start ClockCacheShard::Unref (from Release, no mutex) [thread 1] Decrement ref count state=InCache=1,refs=0 [thread 1] Suspend before CalcTotalCharge (no mutex) [thread 2] Start UnsetInCache (from Insert, mutex held) [thread 2] clear InCache bit state=InCache=0,refs=0 [thread 2] Calls RecycleHandle (based on pre-updated state) [thread 2] Returns to Insert which calls Cleanup which deletes `key` [thread 1] Resume ClockCacheShard::Unref [thread 1] Read `key` in CalcTotalCharge To fix this, I've added a field to the handle to store the metadata charge so that we can efficiently remember everything we need from the handle in Unref. We must not read from the handle again if we decrement the count to zero with InCache=1, which means we don't own the entry and someone else could eject/overwrite it immediately. Note before this change, on amd64 sizeof(Handle) == 56 even though there are only 48 bytes of data. Grouping together the uint32_t fields would cut it down to 48, but I've added another uint32_t, which takes it back up to 56. Not a big deal. Also fixed DisownData to cooperate with ASAN as in LRUCache. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8261 Test Plan: Manual + adding use_clock_cache to db_crashtest.py Base performance ./cache_bench -use_clock_cache Complete in 17.060 s; QPS = 2458513 New performance ./cache_bench -use_clock_cache Complete in 17.052 s; QPS = 2459695 Any difference is easily buried in small noise. Crash test shows still more bug(s) in ClockCache, so I'm expecting to disable ClockCache from production code in a follow-up PR (if we can't find and fix the bug(s)) Reviewed By: mrambacher Differential Revision: D28207358 Pulled By: pdillinger fbshipit-source-id: aa7a9322afc6f18f30e462c75dbbe4a1206eb294	2021-05-04 22:18:00 -07:00
sdong	cde69a7cfd	db_stress to add --open_metadata_write_fault_one_in (#8235 ) Summary: DB Stress to add --open_metadata_write_fault_one_in which would randomly fail in some file metadata modification operations during DB Open, including file creation, close, renaming and directory sync. Some operations can fail before and after the operations take place. If DB open fails, db_stress would retry without the failure ingestion, and DB is expected to open successfully. This option is enabled in crash test in half of the time. Some follow up changes would allow write failures in open time, and ingesting those failures in non-DB open cases. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8235 Test Plan: Run stress tests for a while and see failures got triggered. This can reproduce the bug fixed by https://github.com/facebook/rocksdb/pull/8192 and a similar one that fails when fsyncing parent directory. Reviewed By: anand1976 Differential Revision: D28010944 fbshipit-source-id: 36a96da4dc3633e5f7680cef3ea0a900fcdb5558	2021-04-28 10:58:05 -07:00
Peter Dillinger	95f6add746	Revert Ribbon starting level support from #8198 (#8212 ) Summary: This partially reverts commit `10196d7edc`. The problem with this change is because of important filter use cases: FIFO compaction and SST writer. FIFO "compaction" always uses level 0 so would only use Ribbon filters if specifically including level 0 for the Ribbon filter policy. SST writer sets level_at_creation=-1 to indicate unknown level, and this would be treated the same as level 0 unless fixed. We are keeping the part about committing to permanent schema, which is only changes to API comments and HISTORY.md. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8212 Test Plan: CI Reviewed By: jay-zhuang Differential Revision: D27896468 Pulled By: pdillinger fbshipit-source-id: 50a775f7cba5d64fb729d9b982e355864020596e	2021-04-20 19:46:40 -07:00
Peter Dillinger	10196d7edc	Ribbon long-term support, starting level support (#8198 ) Summary: Since the Ribbon filter schema seems good (compatible back to 6.15.0), this change commits to long term support of the SST schema, even though we expect the API for enabling Ribbon to change (still called NewExperimentalRibbonFilterPolicy). This also adds support for "hybrid" configuration in which some levels use Bloom (higher levels, lower numbered) for speed and the rest use Ribbon (lower levels, higher numbered) for memory space efficiency. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8198 Test Plan: unit test added, crash test support Reviewed By: jay-zhuang Differential Revision: D27831232 Pulled By: pdillinger fbshipit-source-id: 90e528677689474d293ed6710b42ba89fbd5b5ab	2021-04-16 15:43:08 -07:00
Akanksha Mahajan	0be89e87fd	Enable backup/restore for Integrated BlobDB in stress and crash tests (#8165 ) Summary: Enable backup/restore functionality with Integrated BlobDB in db_stress and crash test. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8165 Test Plan: Ran python3 -u tools/db_crashtest.py --simple whitebox along with : 1. decreased "backup_in_one" value for backups to be more frequent and 2. manually changed code for "enable_blob_file" to be always true and apply blobdb params 100% for testing purpose. Reviewed By: ltamasi Differential Revision: D27636025 Pulled By: akankshamahajan15 fbshipit-source-id: 0d0e0d1479ced163f992872dc998e79c581bfc99	2021-04-07 17:57:24 -07:00
Yanqin Jin	09528f9fa1	Fix a bug for SeekForPrev with partitioned filter and prefix (#8137 ) Summary: According to https://github.com/facebook/rocksdb/issues/5907, each filter partition "should include the bloom of the prefix of the last key in the previous partition" so that SeekForPrev() in prefix mode can return correct result. The prefix of the last key in the previous partition does not necessarily have the same prefix as the first key in the current partition. Regardless of the first key in current partition, the prefix of the last key in the previous partition should be added. The existing code, however, does not follow this. Furthermore, there is another issue: when finishing current filter partition, `FullFilterBlockBuilder::AddPrefix()` is called for the first key in next filter partition, which effectively overwrites `last_prefix_str_` prematurely. Consequently, when the filter block builder proceeds to the next partition, `last_prefix_str_` will be the prefix of its first key, leaving no way of adding the bloom of the prefix of the last key of the previous partition. Prefix extractor is FixedLength.2. ``` [ filter part 1 ] [ filter part 2 ] abc d ``` When SeekForPrev("abcd"), checking the filter partition will land on filter part 2 because "abcd" > "abc" but smaller than "d". If the filter in filter part 2 happens to return false for the test for "ab", then SeekForPrev("abcd") will build incorrect iterator tree in non-total-order mode. Also fix a unit test which starts to fail following this PR. `InDomain` should not fail due to assertion error when checking on an arbitrary key. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8137 Test Plan: ``` make check ``` Without this fix, the following command will fail pretty soon. ``` ./db_stress --acquire_snapshot_one_in=10000 --avoid_flush_during_recovery=0 \ --avoid_unnecessary_blocking_io=0 --backup_max_size=104857600 --backup_one_in=0 \ --batch_protection_bytes_per_key=0 --block_size=16384 --bloom_bits=17 \ --bottommost_compression_type=disable --cache_index_and_filter_blocks=1 --cache_size=1048576 \ --checkpoint_one_in=0 --checksum_type=kxxHash64 --clear_column_family_one_in=0 \ --compact_files_one_in=1000000 --compact_range_one_in=1000000 --compaction_ttl=0 \ --compression_max_dict_buffer_bytes=0 --compression_max_dict_bytes=0 \ --compression_parallel_threads=1 --compression_type=zstd --compression_zstd_max_train_bytes=0 \ --continuous_verification_interval=0 --db=/dev/shm/rocksdb/rocksdb_crashtest_whitebox \ --db_write_buffer_size=8388608 --delpercent=5 --delrangepercent=0 --destroy_db_initially=0 --enable_blob_files=0 \ --enable_compaction_filter=0 --enable_pipelined_write=1 --file_checksum_impl=big --flush_one_in=1000000 \ --format_version=5 --get_current_wal_file_one_in=0 --get_live_files_one_in=1000000 --get_property_one_in=1000000 \ --get_sorted_wal_files_one_in=0 --index_block_restart_interval=4 --index_type=2 --ingest_external_file_one_in=0 \ --iterpercent=10 --key_len_percent_dist=1,30,69 --level_compaction_dynamic_level_bytes=True \ --log2_keys_per_lock=10 --long_running_snapshots=1 --mark_for_compaction_one_file_in=0 \ --max_background_compactions=20 --max_bytes_for_level_base=10485760 --max_key=100000000 --max_key_len=3 \ --max_manifest_file_size=1073741824 --max_write_batch_group_size_bytes=16777216 --max_write_buffer_number=3 \ --max_write_buffer_size_to_maintain=8388608 --memtablerep=skip_list --mmap_read=1 --mock_direct_io=False \ --nooverwritepercent=0 --open_files=500000 --ops_per_thread=20000000 --optimize_filters_for_memory=0 --paranoid_file_checks=1 --partition_filters=1 --partition_pinning=0 --pause_background_one_in=1000000 \ --periodic_compaction_seconds=0 --prefixpercent=5 --progress_reports=0 --read_fault_one_in=0 --read_only=0 \ --readpercent=45 --recycle_log_file_num=0 --reopen=20 --secondary_catch_up_one_in=0 \ --snapshot_hold_ops=100000 --sst_file_manager_bytes_per_sec=104857600 \ --sst_file_manager_bytes_per_truncate=0 --subcompactions=2 --sync=0 --sync_fault_injection=False \ --target_file_size_base=2097152 --target_file_size_multiplier=2 --test_batches_snapshots=0 --test_cf_consistency=0 \ --top_level_index_pinning=0 --unpartitioned_pinning=1 --use_blob_db=0 --use_block_based_filter=0 \ --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_full_merge_v1=0 --use_merge=0 \ --use_multiget=0 --use_ribbon_filter=0 --use_txn=0 --user_timestamp_size=8 --verify_checksum=1 \ --verify_checksum_one_in=1000000 --verify_db_one_in=100000 --write_buffer_size=4194304 \ --write_dbid_to_manifest=1 --writepercent=35 ``` Reviewed By: pdillinger Differential Revision: D27553054 Pulled By: riversand963 fbshipit-source-id: 60e391e4a2d8d98a9a3172ec5d6176b90ec3de98	2021-04-06 12:14:08 -07:00
Yanqin Jin	ae7a795686	Disable partitioned filters in ts stress test (#8127 ) Summary: Currently, partitioned filter does not support user-defined timestamp. Disable it for now in ts stress test so that the contrun jobs can proceed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8127 Test Plan: make crash_test_with_ts Reviewed By: ajkr Differential Revision: D27388488 Pulled By: riversand963 fbshipit-source-id: 5ccff18121cb537bd82f2ac072cd25efb625c666	2021-03-29 00:40:52 -07:00
Yanqin Jin	e1aa8c160f	Fix an error while running db_crashtest for non-user-ts tests (#8091 ) Summary: Fix the following error while running `make crash_test` ``` Traceback (most recent call last): File "tools/db_crashtest.py", line 705, in <module> main() File "tools/db_crashtest.py", line 696, in main blackbox_crash_main(args, unknown_args) File "tools/db_crashtest.py", line 479, in blackbox_crash_main + list({'db': dbname}.items())), unknown_args) File "tools/db_crashtest.py", line 414, in gen_cmd finalzied_params = finalize_and_sanitize(params) File "tools/db_crashtest.py", line 331, in finalize_and_sanitize dest_params.get("user_timestamp_size") > 0): TypeError: '>' not supported between instances of 'NoneType' and 'int' ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/8091 Test Plan: make crash_test Reviewed By: ltamasi Differential Revision: D27268276 Pulled By: riversand963 fbshipit-source-id: ed2873b9587ecc51e24abc35ef2bd3d91fb1ed1b	2021-03-23 12:45:20 -07:00
Yanqin Jin	08144bc2f5	Add user-defined timestamps to db_stress (#8061 ) Summary: Add some basic test for user-defined timestamp to db_stress. Currently, read with timestamp always tries to read using the current timestamp. Due to the per-key timestamp-sequence ordering constraint, we only add timestamp- related tests to the `NonBatchedOpsStressTest` since this test serializes accesses to the same key and uses a file to cross-check data correctness. The timestamp feature is not supported in a number of components, e.g. Merge, SingleDelete, DeleteRange, CompactionFilter, Readonly instance, secondary instance, SST file ingestion, transaction, etc. Therefore, db_stress should exit if user enables both timestamp and these features at the same time. The (currently) incompatible features can be found in `CheckAndSetOptionsForUserTimestamp`. This PR also fixes a bug triggered when timestamp is enabled together with `index_type=kBinarySearchWithFirstKey`. This bug fix will also be in another separate PR with more unit tests coverage. Fixing it here because I do not want to exclude the index type from crash test. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8061 Test Plan: make crash_test_with_ts Reviewed By: jay-zhuang Differential Revision: D27056282 Pulled By: riversand963 fbshipit-source-id: c3e00ad1023fdb9ebbdf9601ec18270c5e2925a9	2021-03-23 05:13:30 -07:00
Levi Tamasi	0d800dadea	Adjust the set of potential min_blob_size values in stress/crash tests (#8085 ) Summary: Since our stress/crash tests by default generate values of size 8, 16, or 24, it does not make much sense to set `min_blob_size` to 256. The patch updates the set of potential `min_blob_size` values in the crash test script and in `db_stress` where it might be set dynamically using `SetOptions`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8085 Test Plan: Ran `make check` and tried the crash test script. Reviewed By: riversand963 Differential Revision: D27238620 Pulled By: ltamasi fbshipit-source-id: 4a96f9944b1ed9220d3045c5ab0b34c49009aeee	2021-03-22 14:38:09 -07:00
Yanqin Jin	1f11d07f24	Enable compact filter for blob in dbstress and dbbench (#8011 ) Summary: As title. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8011 Test Plan: ``` ./db_bench -enable_blob_files=1 -use_keep_filter=1 -disable_auto_compactions=1 /db_stress -enable_blob_files=1 -enable_compaction_filter=1 -acquire_snapshot_one_in=0 -compact_range_one_in=0 -iterpercent=0 -test_batches_snapshots=0 -readpercent=10 -prefixpercent=20 -writepercent=55 -delpercent=15 -continuous_verification_interval=0 ``` Reviewed By: ltamasi Differential Revision: D26736061 Pulled By: riversand963 fbshipit-source-id: 1c7834903c28431ce23324c4f259ed71255614e2	2021-03-01 17:24:47 -08:00
Andrew Kryczka	d904233d2f	Limit buffering for collecting samples for compression dictionary (#7970 ) Summary: For dictionary compression, we need to collect some representative samples of the data to be compressed, which we use to either generate or train (when `CompressionOptions::zstd_max_train_bytes > 0`) a dictionary. Previously, the strategy was to buffer all the data blocks during flush, and up to the target file size during compaction. That strategy allowed us to randomly pick samples from as wide a range as possible that'd be guaranteed to land in a single output file. However, some users try to make huge files in memory-constrained environments, where this strategy can cause OOM. This PR introduces an option, `CompressionOptions::max_dict_buffer_bytes`, that limits how much data blocks are buffered before we switch to unbuffered mode (which means creating the per-SST dictionary, writing out the buffered data, and compressing/writing new blocks as soon as they are built). It is not strict as we currently buffer more than just data blocks -- also keys are buffered. But it does make a step towards giving users predictable memory usage. Related changes include: - Changed sampling for dictionary compression to select unique data blocks when there is limited availability of data blocks - Made use of `BlockBuilder::SwapAndReset()` to save an allocation+memcpy when buffering data blocks for building a dictionary - Changed `ParseBoolean()` to accept an input containing characters after the boolean. This is necessary since, with this PR, a value for `CompressionOptions::enabled` is no longer necessarily the final component in the `CompressionOptions` string. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7970 Test Plan: - updated `CompressionOptions` unit tests to verify limit is respected (to the extent expected in the current implementation) in various scenarios of flush/compaction to bottommost/non-bottommost level - looked at jemalloc heap profiles right before and after switching to unbuffered mode during flush/compaction. Verified memory usage in buffering is proportional to the limit set. Reviewed By: pdillinger Differential Revision: D26467994 Pulled By: ajkr fbshipit-source-id: 3da4ef9fba59974e4ef40e40c01611002c861465	2021-02-19 14:09:54 -08:00
Levi Tamasi	dab4fe5bcd	Add checkpoint support to BlobDB (#7959 ) Summary: The patch adds checkpoint support to BlobDB. Blob files are hard linked or copied, depending on whether the checkpoint directory is on the same filesystem or not, similarly to table files. TODO: Add support for blob files to `ExportColumnFamily` and to the checksum verification logic used by backup/restore. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7959 Test Plan: Ran `make check` and the crash test for a while. Reviewed By: riversand963 Differential Revision: D26434768 Pulled By: ltamasi fbshipit-source-id: 994be55a8dc08133028250760fca440d2c7c4dc5	2021-02-17 12:42:36 -08:00
Levi Tamasi	0288bdbc53	Add the integrated BlobDB to the stress/crash tests (#7900 ) Summary: The patch adds support for the options related to the new BlobDB implementation to `db_stress`, including support for dynamically adjusting them using `SetOptions` when `set_options_one_in` and a new flag `allow_setting_blob_options_dynamically` are specified. (The latter is used to prevent the options from being enabled when incompatible features are in use.) The patch also updates the `db_stress` help messages of the existing stacked BlobDB related options to clarify that they pertain to the old implementation. In addition, it adds the new BlobDB to the crash test script. In order to prevent a combinatorial explosion of jobs and still perform whitebox/blackbox testing (including under ASAN/TSAN/UBSAN), and to also test BlobDB in conjunction with atomic flush and transactions, the script sets the BlobDB options in 10% of normal/`cf_consistency`/`txn` crash test runs. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7900 Test Plan: Ran `make check` and `db_stress`/`db_crashtest.py` with various options. Reviewed By: jay-zhuang Differential Revision: D26094913 Pulled By: ltamasi fbshipit-source-id: c2ef3391a05e43a9687f24e297df05f4a5584814	2021-02-02 11:41:18 -08:00
Andrew Kryczka	78ee8564ad	Integrity protection for live updates to WriteBatch (#7748 ) Summary: This PR adds the foundation classes for key-value integrity protection and the first use case: protecting live updates from the source buffers added to `WriteBatch` through the destination buffer in `MemTable`. The width of the protection info is not yet configurable -- only eight bytes per key is supported. This PR allows users to enable protection by constructing `WriteBatch` with `protection_bytes_per_key == 8`. It does not yet expose a way for users to get integrity protection via other write APIs (e.g., `Put()`, `Merge()`, `Delete()`, etc.). The foundation classes (`ProtectionInfo.`) embed the coverage info in their type, and provide `Protect.()` and `Strip.()` functions to navigate between types with different coverage. For making bytes per key configurable (for powers of two up to eight) in the future, these classes are templated on the unsigned integer type used to store the protection info. That integer contains the XOR'd result of hashes with independent seeds for all covered fields. For integer fields, the hash is computed on the raw unadjusted bytes, so the result is endian-dependent. The most significant bytes are truncated when the hash value (8 bytes) is wider than the protection integer. When `WriteBatch` is constructed with `protection_bytes_per_key == 8`, we hold a `ProtectionInfoKVOTC` (i.e., one that covers key, value, optype aka `ValueType`, timestamp, and CF ID) for each entry added to the batch. The protection info is generated from the original buffers passed by the user, as well as the original metadata generated internally. When writing to memtable, each entry is transformed to a `ProtectionInfoKVOTS` (i.e., dropping coverage of CF ID and adding coverage of sequence number), since at that point we know the sequence number, and have already selected a memtable corresponding to a particular CF. This protection info is verified once the entry is encoded in the `MemTable` buffer. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7748 Test Plan: - an integration test to verify a wide variety of single-byte changes to the encoded `MemTable` buffer are caught - add to stress/crash test to verify it works in variety of configs/operations without intentional corruption - [deferred] unit tests for `ProtectionInfo.` classes for edge cases like KV swap, `SliceParts` and `Slice` APIs are interchangeable, etc. Reviewed By: pdillinger Differential Revision: D25754492 Pulled By: ajkr fbshipit-source-id: e481bac6c03c2ab268be41359730f1ceb9964866	2021-01-29 12:18:58 -08:00
Zhichao Cao	04b3524ad0	Inject the random write error to stress test (#7653 ) Summary: Inject the random write error to stress test, it requires set reopen=0 and disable_wal=true. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7653 Test Plan: pass db_stress and python3 db_crashtest.py blackbox Reviewed By: ajkr Differential Revision: D25354132 Pulled By: zhichao-cao fbshipit-source-id: 44721104eecb416e27f65f854912c40e301dd669	2020-12-17 11:52:28 -08:00
Peter Dillinger	60af964372	Experimental (production candidate) SST schema for Ribbon filter (#7658 ) Summary: Added experimental public API for Ribbon filter: NewExperimentalRibbonFilterPolicy(). This experimental API will take a "Bloom equivalent" bits per key, and configure the Ribbon filter for the same FP rate as Bloom would have but ~30% space savings. (Note: optimize_filters_for_memory is not yet implemented for Ribbon filter. That can be added with no effect on schema.) Internally, the Ribbon filter is configured using a "one_in_fp_rate" value, which is 1 over desired FP rate. For example, use 100 for 1% FP rate. I'm expecting this will be used in the future for configuring Bloom-like filters, as I expect people to more commonly hold constant the filter accuracy and change the space vs. time trade-off, rather than hold constant the space (per key) and change the accuracy vs. time trade-off, though we might make that available. ### Benchmarking ``` $ ./filter_bench -impl=2 -quick -m_keys_total_max=200 -average_keys_per_filter=100000 -net_includes_hashing Building... Build avg ns/key: 34.1341 Number of filters: 1993 Total size (MB): 238.488 Reported total allocated memory (MB): 262.875 Reported internal fragmentation: 10.2255% Bits/key stored: 10.0029 ---------------------------- Mixed inside/outside queries... Single filter net ns/op: 18.7508 Random filter net ns/op: 258.246 Average FP rate %: 0.968672 ---------------------------- Done. (For more info, run with -legend or -help.) $ ./filter_bench -impl=3 -quick -m_keys_total_max=200 -average_keys_per_filter=100000 -net_includes_hashing Building... Build avg ns/key: 130.851 Number of filters: 1993 Total size (MB): 168.166 Reported total allocated memory (MB): 183.211 Reported internal fragmentation: 8.94626% Bits/key stored: 7.05341 ---------------------------- Mixed inside/outside queries... Single filter net ns/op: 58.4523 Random filter net ns/op: 363.717 Average FP rate %: 0.952978 ---------------------------- Done. (For more info, run with -legend or -help.) ``` 168.166 / 238.488 = 0.705 -> 29.5% space reduction 130.851 / 34.1341 = 3.83x construction time for this Ribbon filter vs. lastest Bloom filter (could make that as little as about 2.5x for less space reduction) ### Working around a hashing "flaw" bloom_test discovered a flaw in the simple hashing applied in StandardHasher when num_starts == 1 (num_slots == 128), showing an excessively high FP rate. The problem is that when many entries, on the order of number of hash bits or kCoeffBits, are associated with the same start location, the correlation between the CoeffRow and ResultRow (for efficiency) can lead to a solution that is "universal," or nearly so, for entries mapping to that start location. (Normally, variance in start location breaks the effective association between CoeffRow and ResultRow; the same value for CoeffRow is effectively different if start locations are different.) Without kUseSmash and with num_starts > 1 (thus num_starts ~= num_slots), this flaw should be completely irrelevant. Even with 10M slots, the chances of a single slot having just 16 (or more) entries map to it--not enough to cause an FP problem, which would be local to that slot if it happened--is 1 in millions. This spreadsheet formula shows that: =1/(10000000(1 - POISSON(15, 1, TRUE))) As kUseSmash==false (the setting for Standard128RibbonBitsBuilder) is intended for CPU efficiency of filters with many more entries/slots than kCoeffBits, a very reasonable work-around is to disallow num_starts==1 when !kUseSmash, by making the minimum non-zero number of slots 2kCoeffBits. This is the work-around I've applied. This also means that the new Ribbon filter schema (Standard128RibbonBitsBuilder) is not space-efficient for less than a few hundred entries. Because of this, I have made it fall back on constructing a Bloom filter, under existing schema, when that is more space efficient for small filters. (We can change this in the future if we want.) TODO: better unit tests for this case in ribbon_test, and probably update StandardHasher for kUseSmash case so that it can scale nicely to small filters. ### Other related changes * Add Ribbon filter to stress/crash test * Add Ribbon filter to filter_bench as -impl=3 * Add option string support, as in "filter_policy=experimental_ribbon:5.678;" where 5.678 is the Bloom equivalent bits per key. * Rename internal mode BloomFilterPolicy::kAuto to kAutoBloom * Add a general BuiltinFilterBitsBuilder::CalculateNumEntry based on binary searching CalculateSpace (inefficient), so that subclasses (especially experimental ones) don't have to provide an efficient implementation inverting CalculateSpace. * Minor refactor FastLocalBloomBitsBuilder for new base class XXH3pFilterBitsBuilder shared with new Standard128RibbonBitsBuilder, which allows the latter to fall back on Bloom construction in some extreme cases. * Mostly updated bloom_test for Ribbon filter, though a test like FullBloomTest::Schema is a next TODO to ensure schema stability (in case this becomes production-ready schema as it is). * Add some APIs to ribbon_impl.h for configuring Ribbon filters. Although these are reasonably covered by bloom_test, TODO more unit tests in ribbon_test * Added a "tool" FindOccupancyForSuccessRate to ribbon_test to get data for constructing the linear approximations in GetNumSlotsFor95PctSuccess. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7658 Test Plan: Some unit tests updated but other testing is left TODO. This is considered experimental but laying down schema compatibility as early as possible in case it proves production-quality. Also tested in stress/crash test. Reviewed By: jay-zhuang Differential Revision: D24899349 Pulled By: pdillinger fbshipit-source-id: 9715f3e6371c959d923aea8077c9423c7a9f82b8	2020-11-12 20:46:14 -08:00
Akanksha Mahajan	202605143b	Fix crash test to run in DEBUG_LEVEL=0 mode in tmpfs (#7643 ) Summary: crash tests donot run in DEBUG_MODE=0 on tmpfs when use_direct_reads/use_direct_io_for_flush_and_compaction is set randomly because direct I/O is not supported on tmpfs and tests exit. Fix: Sanitize direct I/O read options in DEBUG_LEVEL=0 so that crash tests can run in tmpfs. When mmap_reads is set, direct I/O reads options are unset so we can sanitize direct I/O reads options in case of tmpfs as well. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7643 Test Plan: 1. export DEBUG_LEVEL=0; export TEST_TMPDIR="/dev/shm"; export CRASH_TEST_EXT_ARGS="--use_direct_reads=1 --mmap_read=0"; make crash_test -j64 2. In DEBUG_LEVEL=1 mode: make crash_test -j64 Reviewed By: jay-zhuang Differential Revision: D24766550 Pulled By: akankshamahajan15 fbshipit-source-id: 021720b2343c12c72004f84b26147625d3991d9e	2020-11-10 10:50:34 -08:00
Akanksha Mahajan	06a92fcf5c	Add "max_write_buffer_size_to_maintain" to crash test (#7634 ) Summary: Add "max_write_buffer_size_to_maintain" to crash test Pull Request resolved: https://github.com/facebook/rocksdb/pull/7634 Test Plan: make crash_test -j64 Reviewed By: zhichao-cao Differential Revision: D24710401 Pulled By: akankshamahajan15 fbshipit-source-id: 89e0412aaa56b2ef5a75603971b82f4b0b494ab7	2020-11-03 13:55:18 -08:00
Jay Zhuang	9e03e4dd52	db_crashtest preserves expected values file for failed crash tests (#7534 ) Summary: If crash test fails, don't delete the `expected_values_file` for later debug. More details: https://github.com/facebook/rocksdb/issues/7530 Pull Request resolved: https://github.com/facebook/rocksdb/pull/7534 Test Plan: local host Reviewed By: ajkr Differential Revision: D24239655 Pulled By: jay-zhuang fbshipit-source-id: 3566f91a30aae1e27d2f51d910cddd08edb7d4cf	2020-10-12 14:10:14 -07:00
Andrew Kryczka	75d3b6fdf0	Redesign block cache pinning API (#7520 ) Summary: The old flag-based APIs (`BlockBasedTableOptions::pin_l0_filter_and_index_blocks_in_cache` and `BlockBasedTableOptions::pin_top_level_index_and_filter`) were insufficient for our needs. For example, it was impossible to pin only unpartitioned meta-blocks, which could prevent block cache contention when turning on dictionary compression or during a migration to partitioned indexes/filters. It was also impossible to pin all meta-blocks in memory while having predictable memory usage via block cache. If we had continued adding flags to address these scenarios, they would have had significant overlap causing confusion. Instead, this PR deprecates the flags and starts a new API with non-overlapping options. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7520 Test Plan: - new unit test - added new options to stress/crash test and ran for a while: `$ python tools/db_crashtest.py blackbox --simple --max_key=1000000 -write_buffer_size=1048576 -target_file_size_base=1048576 -max_bytes_for_level_base=4194304 --interval=10 -value_size_mult=33 -column_families=1 -reopen=0` Reviewed By: pdillinger Differential Revision: D24200034 Pulled By: ajkr fbshipit-source-id: 3fa7cfc71e7960f7a867511dd6ae5834dd73b13e	2020-10-11 14:58:24 -07:00
sdong	82d42606ad	Enable paranoid_file_checks in crash test (#7489 ) Summary: Cover paranoid_file_checks in crash test. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7489 Test Plan: Run crash tests for hours and didn't see any failure. Reviewed By: ajkr Differential Revision: D24063868 fbshipit-source-id: 7b48b110e66ce78ae5d0c99a9f32af86edd34c1e	2020-10-09 08:32:22 -07:00
sdong	aedcaaef99	Stress test to support paranoid_file_checks (#7473 ) Summary: It's important to make sure no false positive is reported when options.paranoid_file_checks is used. Add it to stress test and a place holder in crash test. It is disabled in crash test as there appears to be a bug causing false positive. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7473 Test Plan: Run crash test Reviewed By: ajkr Differential Revision: D24026939 fbshipit-source-id: 89102acb45cf041776775ce44a4eef4b0f3a380c	2020-09-30 14:41:33 -07:00
Peter Dillinger	4e258d3e63	Fix backup/restore in stress/crash test (#7357 ) Summary: (1) Skip check on specific key if restoring an old backup (small minority of cases) because it can fail in those cases. (2) Remove an old assertion about number of column families and number of keys passed in, which is broken by atomic flush (cf_consistency) test. Like other code (for better or worse) assume a single key and iterate over column families. (3) Apply mock_direct_io to NewSequentialFile so that db_stress backup works on /dev/shm. Also add more context to output in case of backup/restore db_stress failure. Also a minor fix to BackupEngine to report first failure status in creating new backup, and drop another clue about the potential source of a "Backup failed" status. Reverts "Disable backup/restore stress test (https://github.com/facebook/rocksdb/issues/7350)" Pull Request resolved: https://github.com/facebook/rocksdb/pull/7357 Test Plan: Using backup_one_in=10000, "USE_CLANG=1 make crash_test_with_atomic_flush" for 30+ minutes "USE_CLANG=1 make blackbox_crash_test" for 30+ minutes And with use_direct_reads with TEST_TMPDIR=/dev/shm/rocksdb Reviewed By: riversand963 Differential Revision: D23567244 Pulled By: pdillinger fbshipit-source-id: e77171c2e8394d173917e36898c02dead1c40b77	2020-09-08 10:50:19 -07:00
Jay Zhuang	ef32f11004	Disable backup/restore stress test (#7350 ) Summary: Seems it's causing some tests failures. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7350 Reviewed By: riversand963 Differential Revision: D23544109 Pulled By: jay-zhuang fbshipit-source-id: 798a0ca374a20b6c2d0f29582729ff101c6a2e99	2020-09-04 11:58:18 -07:00
Peter Dillinger	06ad5dd293	Add file checksum to stress/crash test (#7343 ) Summary: This change has the crash test randomly select from a few file checksum implementations, or nullptr, for DB file_checksum_gen_factory. For compatibility across runs on same DB, each non-null factory can understand all the other functions, but the default changes. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7343 Test Plan: 'make blackbox_crash_test' for a while, including with some debug output to ensure code is being exercised. Reviewed By: zhichao-cao Differential Revision: D23494580 Pulled By: pdillinger fbshipit-source-id: 73bbc7ca32c1adaf619134c0c830f12894880b8a	2020-09-03 23:50:33 -07:00
Peter Dillinger	499c9448d0	Fix, enable, and enhance backup/restore in db_stress (#7348 ) Summary: Although added to db_stress, testing of backup/restore was never integrated into the crash test, originally concerned about performance. I've enabled it now and to address the peformance concern, testing backup/restore is always skipped once the db exceeds a certain size threshold, default 100MB. This should provide sufficient opportunity for testing BackupEngine without bogging down everything else with heavier and heavier operations. Also fixed backup/restore in db_stress by making sure PurgeOldBackups can remove manifest files, which are normally kept around for db_stress. Added more coverage of backup options, and up to three backups being saved in one backup directory (in some cases). Pull Request resolved: https://github.com/facebook/rocksdb/pull/7348 Test Plan: ran 'make blackbox_crash_test' for a while, with heightened probabilitly of taking backups (1/10k). Also confirmed with some debug output that the code is being covered, TestBackupRestore only takes a few seconds to complete when triggered, and even at 1/10k and ~50MB database, there's <,~ 1 thread testing backups at any time. Reviewed By: ajkr Differential Revision: D23510835 Pulled By: pdillinger fbshipit-source-id: b6b8735591808141f81f10773ac31634cf03b6c0	2020-09-03 20:13:15 -07:00
Andrew Kryczka	7eebe6d38a	Mark files for compaction in stress/crash tests (#7231 ) Summary: The mechanism to mark files for compaction is most commonly used in delete-triggered compaction. This PR adds an option to exercise the marking mechanism on random files created by db_stress. This PR also enables that option in db_crashtest.py on its db_stress runs at random. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7231 Test Plan: - ran some minified crash tests; verified they succeed and we see `"compaction_reason": "FilesMarkedForCompaction"` regularly in the logs. ``` $ TEST_TMPDIR=/dev/shm python tools/db_crashtest.py blackbox --duration=600 --interval=30 --max_key=10000000 --write_buffer_size=1048576 --target_file_size_base=1048576 --max_bytes_for_level_base=4194304 --value_size_mult=33 $ TEST_TMPDIR=/dev/shm python tools/db_crashtest.py whitebox --duration=600 --interval=30 --max_key=1000000 --write_buffer_size=1048576 --target_file_size_base=1048576 --max_bytes_for_level_base=4194304 --value_size_mult=33 --random_kill_odd=8887 ``` Reviewed By: anand1976 Differential Revision: D23025156 Pulled By: ajkr fbshipit-source-id: a404c467ebc12afa94dae35956ea9b372f592a96	2020-08-10 16:17:56 -07:00
Jay Zhuang	fc4d5f5065	Add stress test for GetProperty (#7111 ) Summary: Add stress test coverage for `DB::GetProperty()`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7111 Test Plan: ``` ./db_stress -get_property_one_in=1 make crash_test ``` Reviewed By: ajkr Differential Revision: D22487906 Pulled By: jay-zhuang fbshipit-source-id: c118d95cc9b4e2fa669a06e6aa531541fa885dc5	2020-07-14 12:12:36 -07:00
Andrew Kryczka	8efb5cfb6f	add SstFileManager to crash test (#6993 ) Summary: SstFileManager is already supported in the stress test as of https://github.com/facebook/rocksdb/issues/6454. This PR enables the SstFileManager in some of the crash test runs. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6993 Reviewed By: riversand963 Differential Revision: D22084406 Pulled By: ajkr fbshipit-source-id: 78b8642682e7570ff6ec3a1c3ccd9940f4362289	2020-06-23 16:27:20 -07:00
Peter Dillinger	5b2bbacb6f	Minimize memory internal fragmentation for Bloom filters (#6427 ) Summary: New experimental option BBTO::optimize_filters_for_memory builds filters that maximize their use of "usable size" from malloc_usable_size, which is also used to compute block cache charges. Rather than always "rounding up," we track state in the BloomFilterPolicy object to mix essentially "rounding down" and "rounding up" so that the average FP rate of all generated filters is the same as without the option. (YMMV as heavily accessed filters might be unluckily lower accuracy.) Thus, the option near-minimizes what the block cache considers as "memory used" for a given target Bloom filter false positive rate and Bloom filter implementation. There are no forward or backward compatibility issues with this change, though it only works on the format_version=5 Bloom filter. With Jemalloc, we see about 10% reduction in memory footprint (and block cache charge) for Bloom filters, but 1-2% increase in storage footprint, due to encoding efficiency losses (FP rate is non-linear with bits/key). Why not weighted random round up/down rather than state tracking? By only requiring malloc_usable_size, we don't actually know what the next larger and next smaller usable sizes for the allocator are. We pick a requested size, accept and use whatever usable size it has, and use the difference to inform our next choice. This allows us to narrow in on the right balance without tracking/predicting usable sizes. Why not weight history of generated filter false positive rates by number of keys? This could lead to excess skew in small filters after generating a large filter. Results from filter_bench with jemalloc (irrelevant details omitted): (normal keys/filter, but high variance) $ ./filter_bench -quick -impl=2 -average_keys_per_filter=30000 -vary_key_count_ratio=0.9 Build avg ns/key: 29.6278 Number of filters: 5516 Total size (MB): 200.046 Reported total allocated memory (MB): 220.597 Reported internal fragmentation: 10.2732% Bits/key stored: 10.0097 Average FP rate %: 0.965228 $ ./filter_bench -quick -impl=2 -average_keys_per_filter=30000 -vary_key_count_ratio=0.9 -optimize_filters_for_memory Build avg ns/key: 30.5104 Number of filters: 5464 Total size (MB): 200.015 Reported total allocated memory (MB): 200.322 Reported internal fragmentation: 0.153709% Bits/key stored: 10.1011 Average FP rate %: 0.966313 (very few keys / filter, optimization not as effective due to ~59 byte internal fragmentation in blocked Bloom filter representation) $ ./filter_bench -quick -impl=2 -average_keys_per_filter=1000 -vary_key_count_ratio=0.9 Build avg ns/key: 29.5649 Number of filters: 162950 Total size (MB): 200.001 Reported total allocated memory (MB): 224.624 Reported internal fragmentation: 12.3117% Bits/key stored: 10.2951 Average FP rate %: 0.821534 $ ./filter_bench -quick -impl=2 -average_keys_per_filter=1000 -vary_key_count_ratio=0.9 -optimize_filters_for_memory Build avg ns/key: 31.8057 Number of filters: 159849 Total size (MB): 200 Reported total allocated memory (MB): 208.846 Reported internal fragmentation: 4.42297% Bits/key stored: 10.4948 Average FP rate %: 0.811006 (high keys/filter) $ ./filter_bench -quick -impl=2 -average_keys_per_filter=1000000 -vary_key_count_ratio=0.9 Build avg ns/key: 29.7017 Number of filters: 164 Total size (MB): 200.352 Reported total allocated memory (MB): 221.5 Reported internal fragmentation: 10.5552% Bits/key stored: 10.0003 Average FP rate %: 0.969358 $ ./filter_bench -quick -impl=2 -average_keys_per_filter=1000000 -vary_key_count_ratio=0.9 -optimize_filters_for_memory Build avg ns/key: 30.7131 Number of filters: 160 Total size (MB): 200.928 Reported total allocated memory (MB): 200.938 Reported internal fragmentation: 0.00448054% Bits/key stored: 10.1852 Average FP rate %: 0.963387 And from db_bench (block cache) with jemalloc: $ ./db_bench -db=/dev/shm/dbbench.no_optimize -benchmarks=fillrandom -format_version=5 -value_size=90 -bloom_bits=10 -num=2000000 -threads=8 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=false $ ./db_bench -db=/dev/shm/dbbench -benchmarks=fillrandom -format_version=5 -value_size=90 -bloom_bits=10 -num=2000000 -threads=8 -optimize_filters_for_memory -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=false $ (for FILE in /dev/shm/dbbench.no_optimize/.sst; do ./sst_dump --file=$FILE --show_properties \| grep 'filter block' ; done) \| awk '{ t += $4; } END { print t; }' 17063835 $ (for FILE in /dev/shm/dbbench/.sst; do ./sst_dump --file=$FILE --show_properties \| grep 'filter block' ; done) \| awk '{ t += $4; } END { print t; }' 17430747 $ #^ 2.1% additional filter storage $ ./db_bench -db=/dev/shm/dbbench.no_optimize -use_existing_db -benchmarks=readrandom,stats -statistics -bloom_bits=10 -num=2000000 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=false -duration=10 -cache_index_and_filter_blocks -cache_size=1000000000 rocksdb.block.cache.index.add COUNT : 33 rocksdb.block.cache.index.bytes.insert COUNT : 8440400 rocksdb.block.cache.filter.add COUNT : 33 rocksdb.block.cache.filter.bytes.insert COUNT : 21087528 rocksdb.bloom.filter.useful COUNT : 4963889 rocksdb.bloom.filter.full.positive COUNT : 1214081 rocksdb.bloom.filter.full.true.positive COUNT : 1161999 $ #^ 1.04 % observed FP rate $ ./db_bench -db=/dev/shm/dbbench -use_existing_db -benchmarks=readrandom,stats -statistics -bloom_bits=10 -num=2000000 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=false -optimize_filters_for_memory -duration=10 -cache_index_and_filter_blocks -cache_size=1000000000 rocksdb.block.cache.index.add COUNT : 33 rocksdb.block.cache.index.bytes.insert COUNT : 8448592 rocksdb.block.cache.filter.add COUNT : 33 rocksdb.block.cache.filter.bytes.insert COUNT : 18220328 rocksdb.bloom.filter.useful COUNT : 5360933 rocksdb.bloom.filter.full.positive COUNT : 1321315 rocksdb.bloom.filter.full.true.positive COUNT : 1262999 $ #^ 1.08 % observed FP rate, 13.6% less memory usage for filters (Due to specific key density, this example tends to generate filters that are "worse than average" for internal fragmentation. "Better than average" cases can show little or no improvement.) Pull Request resolved: https://github.com/facebook/rocksdb/pull/6427 Test Plan: unit test added, 'make check' with gcc, clang and valgrind Reviewed By: siying Differential Revision: D22124374 Pulled By: pdillinger fbshipit-source-id: f3e3aa152f9043ddf4fae25799e76341d0d8714e	2020-06-22 13:32:07 -07:00
Andrew Kryczka	d76eed4839	minor fixes for stress/crash contruns (#7006 ) Summary: Avoid using `cf_consistency` together with `enable_compaction_filter` as the former heavily uses snapshots while the latter is incompatible with snapshots. Also fix a clang-analyze error for a write to a variable that is never read. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7006 Reviewed By: zhichao-cao Differential Revision: D22141679 Pulled By: ajkr fbshipit-source-id: 1840ae238168818a9ab5973f90fd78c067399447	2020-06-19 16:05:17 -07:00
Peter Dillinger	88b4210701	Remove racially charged terms "whitelist" and "blacklist" (#7008 ) Summary: We don't need them. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7008 Test Plan: "make check" and ensure "make crash_test" starts Reviewed By: ajkr Differential Revision: D22143838 Pulled By: pdillinger fbshipit-source-id: 72c8e16603abc59f4954e304466bc4dc1f58f94e	2020-06-19 15:27:32 -07:00
Andrew Kryczka	775dc623ad	add `CompactionFilter` to stress/crash tests (#6988 ) Summary: Added a `CompactionFilter` that is aware of the stress test's expected state. It only drops key versions that are already covered according to the expected state. It is incompatible with snapshots (same as all `CompactionFilter`s), so disables all snapshot-related features when used in the crash test. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6988 Test Plan: running a minified blackbox crash test ``` $ TEST_TMPDIR=/dev/shm python tools/db_crashtest.py blackbox --max_key=1000000 -write_buffer_size=1048576 -max_bytes_for_level_base=4194304 -target_file_size_base=1048576 -value_size_mult=33 --interval=10 --duration=3600 ``` Reviewed By: anand1976 Differential Revision: D22072888 Pulled By: ajkr fbshipit-source-id: 727b9d7a90d5eab18be0ec6cd5a810712ac13320	2020-06-18 09:54:55 -07:00
Yanqin Jin	15d9f28da5	Add stress test for best-efforts recovery (#6819 ) Summary: Add crash test for the case of best-efforts recovery. After a certain amount of time, we kill the db_stress process, randomly delete some certain table files and restart db_stress. Given the randomness of file deletion, it is difficult to verify against a reference for data correctness. Therefore, we just check that the db can restart successfully. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6819 Test Plan: ``` ./db_stress -best_efforts_recovery=true -disable_wal=1 -reopen=0 ./db_stress -best_efforts_recovery=true -disable_wal=0 -skip_verifydb=1 -verify_db_one_in=0 -continuous_verification_interval=0 make crash_test_with_best_efforts_recovery ``` Reviewed By: anand1976 Differential Revision: D21436753 Pulled By: riversand963 fbshipit-source-id: 0b3605c922a16c37ed17d5ab6682ca4240e47926	2020-06-12 19:27:46 -07:00
Peter Dillinger	0c56fc4d66	Allow missing "unversioned" python, as in CentOS 8 (#6883 ) Summary: RocksDB Makefile was assuming existence of 'python' command, which is not present in CentOS 8. We avoid using 'python' if 'python3' is available. Also added fancy logic to format-diff.sh to make clang-format-diff.py for Python2 work even with Python3 only (as some CentOS 8 FB machines come equipped) Also, now use just 'python3' for PYTHON if not found so that an informative "command not found" error will result rather than something weird. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6883 Test Plan: manually tried some variants, 'make check' on a fresh CentOS 8 machine without 'python' executable or Python2 but with clang-format-diff.py for Python2. Reviewed By: gg814 Differential Revision: D21767029 Pulled By: pdillinger fbshipit-source-id: 54761b376b140a3922407bdc462f3572f461d0e9	2020-05-29 11:29:23 -07:00
sdong	12825894a2	Disable "compression_parallel_threads" in crash test for now (#6816 ) Summary: "compressio_parallel_threads" caused several test failure tests. To keep crash test clean, disable it for now. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6816 Test Plan: "make crash_test" to make sure the python script doesn't break Reviewed By: zhichao-cao Differential Revision: D21462112 fbshipit-source-id: 9eecc764800da82cd19665dc8b167eacead3310b	2020-05-07 20:52:29 -07:00
Andrew Kryczka	1f20df2f38	cover single level universal in crash test (#6818 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6818 Test Plan: fast whitebox test and verify there are some single-level universal and some multi-level universal runs. ``` $ python ./tools/db_crashtest.py whitebox --simple -max_key=1000000 -value_size_mult=33 -write_buffer_size=524288 -target_file_size_base=524288 -max_bytes_for_level_base=2097152 --duration=120 --interval=10 --ops_per_thread=1000 --random_kill_odd=887 ``` Reviewed By: riversand963 Differential Revision: D21432138 Pulled By: ajkr fbshipit-source-id: 2fc5ba9f3dfa49bb11e81da7dd00a17b476e64d7	2020-05-06 18:08:09 -07:00
Ziyue Yang	e619a20e93	Add an option for parallel compression in for db_stress (#6722 ) Summary: This commit adds an `compression_parallel_threads` option in db_stress. It also fixes the naming of parallel compression option in db_bench to keep it aligned with others. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6722 Reviewed By: pdillinger Differential Revision: D21091385 fbshipit-source-id: c9ba8c4e5cc327ff9e6094a6dc6a15fcff70f100	2020-04-30 10:49:07 -07:00
Cheng Chang	0a77617820	Disable O_DIRECT in stress test when db directory does not support direct IO (#6727 ) Summary: In crash test, the db directory might be set to /dev/shm or /tmp, in certain environments such as internal testing infrastructure, neither of these directories support direct IO, so direct IO is never enabled in crash test. This PR sets up SyncPoints in direct IO related code paths to disable O_DIRECT flag in calls to `open`, so the direct IO code paths will be executed, all direct IO related assertions will be checked, but no real direct IO request will be issued to the file system. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6727 Test Plan: export CRASH_TEST_EXT_ARGS="--use_direct_reads=1 --mmap_read=0" make -j24 crash_test Reviewed By: zhichao-cao Differential Revision: D21139250 Pulled By: cheng-chang fbshipit-source-id: db9adfe78d91aa4759835b1af91c5db7b27b62ee	2020-04-25 00:01:03 -07:00
sdong	fe206f4f7c	crash_test to cover index_type kBinarySearchWithFirstKey (#6721 ) Summary: Recently index_type kBinarySearchWithFirstKey is improved so that the API guarantee is exactly the same as other types and it is ready for wide production. We should cover it in crash tst. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6721 Test Plan: Run crash_test Reviewed By: anand1976 Differential Revision: D21099781 fbshipit-source-id: fda91eba831d9eacbb140c703e9768bb1701f935	2020-04-20 12:57:15 -07:00
sdong	63d82e57b9	crash_test to cover small max_open_files (#6719 ) Summary: RocksDB behavior is different while max_open_files is small or large. Add the coverage to small max_open_files. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6719 Test Plan: Run crash_test Reviewed By: pdillinger Differential Revision: D21081021 fbshipit-source-id: e3e211761a9bd25d93d19a61c1f7b62d48cf5e3c	2020-04-17 11:00:07 -07:00
sdong	73523baeb1	crash_test to cover options.avoid_flush_during_recovery (#6712 ) Summary: Options.avoid_flush_during_recovery is uncovered in crash_test. Add the coverage with a chance of 1/8, as it is a less frequently used options. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6712 Test Plan: Run crash_test and see the option can be used or not used by chance. Reviewed By: ltamasi Differential Revision: D21056566 fbshipit-source-id: c3b1521517cfc204786e6ef8c6acd7fffda64793	2020-04-16 12:11:45 -07:00
Yueh-Hsuan Chiang	5801af4646	Add env_fault_injection argument to db_stress (#6687 ) Summary: Add env_fault_injection argument to db_stress. When enabled, FaultInjectionTestEnv will be used instead. Currently this option does not support running with other env setting. This will allow us to later manually produce error when running db_crashtest. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6687 Test Plan: make db_stress -j32 ./db_stress --env_fault_injection ./db_stress --env_fault_injection --hdfs // expect error message Reviewed By: ajkr Differential Revision: D21014683 Pulled By: yhchiang fbshipit-source-id: 0724aeac37efd57adb72a37defe6dbd3bfa8106a	2020-04-16 11:13:44 -07:00
anand76	610a09ccff	Remove a printf from db_stress that's not useful info (#6705 ) Summary: This was causing db_crashtest.py to wrongly assume an error by parsing the output. Hopefully this will stabilize the crash tests. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6705 Test Plan: make blackbox_crash_test Reviewed By: ltamasi Differential Revision: D21043335 Pulled By: anand1976 fbshipit-source-id: 5cddd112b124d4e2ebd11724a17d4ef0f50c1cf8	2020-04-15 12:13:35 -07:00
anand76	79c838eb0f	Fix a few bugs in db_stress fault injection (#6693 ) Summary: Fix the following issues - 1. Output parsing error in db_crashtest.py 2. Memory leak on exit 3. False alarm on filter block read error Pull Request resolved: https://github.com/facebook/rocksdb/pull/6693 Test Plan: asan_crash Reviewed By: cheng-chang Differential Revision: D20990399 Pulled By: anand1976 fbshipit-source-id: 178ee0dd7c69a4bc5db698379db0dedb29281699	2020-04-13 11:01:03 -07:00
anand76	5c19a441c4	Fault injection in db_stress (#6538 ) Summary: This PR implements a fault injection mechanism for injecting errors in reads in db_stress. The FaultInjectionTestFS is used for this purpose. A thread local structure is used to track the errors, so that each db_stress thread can independently enable/disable error injection and verify observed errors against expected errors. This is initially enabled only for Get and MultiGet, but can be extended to iterator as well once its proven stable. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6538 Test Plan: crash_test make check Reviewed By: riversand963 Differential Revision: D20714347 Pulled By: anand1976 fbshipit-source-id: d7598321d4a2d72bda0ced57411a337a91d87dc7	2020-04-10 17:21:26 -07:00
Yanqin Jin	ccf7676455	Update a few scripts to be python3 compatible (#6525 ) Summary: There are a few scripts with python3 compatibility issues that were not detected by automated tool before. Update them now. Test Plan (devserver): python2 tools/ldb_test.py python3 tools/ldb_test.py python2 tools/write_stress_runner.py --runtime_sec=30 python3 tools/write_stress_runner.py --runtime_sec=30 python2 tools/db_crashtest.py --simple --interval=2 --duration=10 blackbox python3 tools/db_crashtest.py --simple --interval=2 --duration=10 blackbox python2 tools/db_crashtest.py --simple --duration=10 --random_kill_odd=1000 --ops_per_thread=1000 whitebox python3 tools/db_crashtest.py --simple --duration=10 --random_kill_odd=1000 --ops_per_thread=1000 whitebox Pull Request resolved: https://github.com/facebook/rocksdb/pull/6525 Reviewed By: cheng-chang Differential Revision: D20627820 Pulled By: riversand963 fbshipit-source-id: 4b25a7bd4d001c7f868be8b640ef876523be6ca3	2020-03-24 21:00:27 -07:00
Levi Tamasi	217ce20021	Remove GetSortedWalFiles/GetCurrentWalFile from the crash test (#6491 ) Summary: Currently, `db_stress` tests a randomly picked one of `GetLiveFiles`, `GetSortedWalFiles`, and `GetCurrentWalFile` with a 1/N chance when the command line parameter `get_live_files_and_wal_files_one_in` is specified. The problem is that `GetSortedWalFiles` and `GetCurrentWalFile` are unreliable in the sense that they can return errors if another thread removes a WAL file while they are executing (which is a perfectly plausible and legitimate scenario). The patch splits this command line parameter into three (one for each API), and changes the crash test script so that only `GetLiveFiles` is tested during our continuous crash test runs. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6491 Test Plan: ``` make check python tools/db_crashtest.py whitebox ``` Reviewed By: siying Differential Revision: D20312200 Pulled By: ltamasi fbshipit-source-id: e7c3481eddfe3bd3d5349476e34abc9eee5b7dc8	2020-03-18 17:14:15 -07:00
Maysam Yabandeh	967a2d953f	Revert "crash_test to enable block-based table hash index (#6310 )" (#6327 ) Summary: This reverts commit `8e309b35bb`. The stress tests are failing . Revert it until we figure the root cause. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6327 Differential Revision: D19537657 Pulled By: maysamyabandeh fbshipit-source-id: bf34a5dd720825957729e136e9a5a729a240e61a	2020-01-23 09:09:17 -08:00
Maysam Yabandeh	cb1142e00d	Set index_block_restart_interval of kHashSearch to 1 in stress test (#6324 ) Summary: kHashSearch is incompatible with larger than 1 values for index_block_restart_interval. Setting it to 1 in stress tests would avoid confusion about the test parameters. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6324 Differential Revision: D19525669 Pulled By: maysamyabandeh fbshipit-source-id: fbf3a797e0ebcebb4d32eba3728cf3583906fc8a	2020-01-22 16:33:21 -08:00
sdong	8e309b35bb	crash_test to enable block-based table hash index (#6310 ) Summary: Block-based table has index has been disabled in crash test due to bugs. We fixed a bug and re-enable it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6310 Test Plan: Finish one round of "crash_test_with_atomic_flush" test successfully while exclusively running has index. Another run also ran for several hours without failure. Differential Revision: D19455856 fbshipit-source-id: 1192752d2c1e81ed7e5c5c7a9481c841582d5274	2020-01-21 12:27:30 -08:00
sdong	6b64aed4c0	Fix bug which causes crash_test to always run on sync mode (#6304 ) Summary: A previous change meant to make db_stress to run on sync=1 mode for 1/20 of the time in crash_test, but a bug caused to to always run on sync=1 mode. Fix it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6304 Test Plan: Start and kill "python -u tools/db_crashtest.py --simple whitebox" multiple times and observe that most times sync=0 is used while some times sync=1 is used. Differential Revision: D19433000 fbshipit-source-id: 7a0adba39b17a1b3acbbd791bb0cdb743b91fa95	2020-01-17 01:46:48 -08:00
anand76	687119aeaf	Variable key length in db_stress (#6273 ) Summary: Undo https://github.com/facebook/rocksdb/issues/6243 and fix the crash test failures. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6273 Test Plan: Run make ubsan_crash_test Differential Revision: D19331472 Pulled By: anand1976 fbshipit-source-id: 30aa4a36c1b0f77a97159d82bbfd1cd767878e28	2020-01-09 21:27:18 -08:00
Peter Dillinger	83108d23e8	Re-enable level_compaction_dynamic_level_bytes in crash test (#6251 ) Summary: Was probably a false signal suggesting a problem in https://github.com/facebook/rocksdb/issues/6217 Pull Request resolved: https://github.com/facebook/rocksdb/pull/6251 Test Plan: 'make crash_test' Differential Revision: D19246951 Pulled By: pdillinger fbshipit-source-id: 3e4fafcd9a7cf5f19ffd207b90279ba615145d6f	2019-12-30 10:15:49 -08:00
Peter Dillinger	37fd2b9694	Revert "Generate variable length keys in db_stress (#6165 )" and follow-ups (#6243 ) Summary: This commit is suspected in some crash test failures such as Verification failed for column family 0 key 78438077: Value not found: NotFound: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6243 Test Plan: 'make check' and start 'make crash_test' Differential Revision: D19220495 Pulled By: pdillinger fbshipit-source-id: 6c4709cee80ab4344e06ce360f51e947d79fb3fa	2019-12-23 16:32:57 -08:00
anand76	3160edfdc7	Generate variable length keys in db_stress (#6165 ) Summary: Currently, db_stress generates fixed length keys of 8 bytes. This patch adds the ability to generate variable length keys. Most of the db_stress code continues to work with a numeric key randomly generated, and the numeric key also acts as an index into the values_ array. The numeric key is mapped to a variable length string key in a deterministic way. Furthermore, the ordering is preserved. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6165 Test Plan: run make crash_test Differential Revision: D19204646 Pulled By: anand1976 fbshipit-source-id: d2d46a96615b4832a8be2a981f5913905f0e1ca7	2019-12-20 21:10:33 -08:00
sdong	338c149b92	crash_test to cover bottommost compression and some other changes (#6215 ) Summary: Several improvements to crash_test/stress_test: (1) Stress_test to support an parameter of bottommost compression (2) Rename those FLAGS_* variables that are not gflags to avoid confusion (3) Crash_test to randomly generate compression type for bottommost compression with half the chance. (4) Stress_test to sanitize unsupported compression type to snappy, so that crash_test to cover all possible compression types and people don't need to worry about they don't support all comrpession types in their environment. (5) In crash_test, when generating db_stress command, sort arguments in alphabeta order, so that it is easier to find value for a specific argument. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6215 Test Plan: Run "make crash_test" for a while and see the botommost option shown in LOG files. Differential Revision: D19171255 fbshipit-source-id: d7001e246c4ff9ee5760776eea0be97738650735	2019-12-20 16:14:52 -08:00
Zhichao Cao	f89dea4fec	db_stress: Added the verification for GetLiveFiles, GetSortedWalFiles and GetCurrentWalFile (#6224 ) Summary: Add the verification in operateDB to verify GetLiveFiles, GetSortedWalFiles and GetCurrentWalFile. The test will be called every 1 out of N, N is decided by get_live_files_and_wal_files_one_i, whose default is 1000000. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6224 Test Plan: pass db_stress default run. Differential Revision: D19183358 Pulled By: zhichao-cao fbshipit-source-id: 20073cf72ede77a3e0d3cf5f28304f1f605d2b1a	2019-12-20 12:07:30 -08:00
Yanqin Jin	670a916d01	Add more verification to db_stress (#6173 ) Summary: Currently, db_stress performs verification by calling `VerifyDb()` at the end of test and optionally before tests start. In case of corruption or incorrect result, it will be too late. This PR adds more verification in two ways. 1. For cf consistency test, each test thread takes a snapshot and verifies every N ops. N is configurable via `-verify_db_one_in`. This option is not supported in other stress tests. 2. For cf consistency test, we use another background thread in which a secondary instance periodically tails the primary (interval is configurable). We verify the secondary. Once an error is detected, we terminate the test and report. This does not affect other stress tests. Test plan (devserver) ``` $./db_stress -test_cf_consistency -verify_db_one_in=0 -ops_per_thread=100000 -continuous_verification_interval=100 $./db_stress -test_cf_consistency -verify_db_one_in=1000 -ops_per_thread=10000 -continuous_verification_interval=0 $make crash_test ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/6173 Differential Revision: D19047367 Pulled By: riversand963 fbshipit-source-id: aeed584ad71f9310c111445f34975e5ab47a0615	2019-12-20 08:49:29 -08:00
Maysam Yabandeh	77d5ba7887	Revert "Add kHashSearch to stress tests (#6210 )" (#6220 ) Summary: This reverts commit `54f9092b0c`. It making our daily stress tests fail. Revert it until the issues are fixed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6220 Differential Revision: D19179881 Pulled By: maysamyabandeh fbshipit-source-id: 99de0eaf776567fa81110b9ad2608234a16083ce	2019-12-19 10:46:55 -08:00
Peter Dillinger	9ff569bdfc	Temporarily disable level_compaction_dynamic_level_bytes in crash test (#6217 ) Summary: We're seeing assertion violations like this in crash test: db_stress: table/block_based/block_based_table_reader.cc:4129: virtual uint64_t rocksdb::BlockBasedTable::ApproximateSize(const rocksdb::Slice&, const rocksdb::Slice&, rocksdb::TableReaderCaller): Assertion `end_offset >= start_offset' failed.*** And ApproximateSize appears only to be called with the level_compaction_dynamic_level_bytes option. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6217 Test Plan: temporarily put an assert(false) in ApproximateSize and briefly run 'make crash_test' Differential Revision: D19179174 Pulled By: pdillinger fbshipit-source-id: 506e6549aea0da19b363a1a6da04373c364d92e4	2019-12-19 10:24:49 -08:00
Maysam Yabandeh	54f9092b0c	Add kHashSearch to stress tests (#6210 ) Summary: Beside extending index_type to kHashSearch, it clarifies in the code base that this feature is incompatible with index_block_restart_interval > 1. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6210 Test Plan: ``` make -j32 crash_test Differential Revision: D19166567 Pulled By: maysamyabandeh fbshipit-source-id: 3aaf75a70a8b462d372d43aac69dbd10df303ec7	2019-12-18 18:09:30 -08:00
Peter Dillinger	dfb259e48d	Fix syntax error (!) in db_crashtest.py (#6207 ) Summary: Fixes syntax error from https://github.com/facebook/rocksdb/pull/6203 Pull Request resolved: https://github.com/facebook/rocksdb/pull/6207 Test Plan: make blackbox_crash_test -> no more syntax error Differential Revision: D19161752 Pulled By: pdillinger fbshipit-source-id: b3032f296041ab56307762622b9ef6c03a8379aa	2019-12-18 09:32:52 -08:00
anand76	2afea29762	Add VerifyChecksum() to db_stress (#6203 ) Summary: Add an option to db_stress, verify_checksum_one_in, to call DB::VerifyChecksum() once every N ops. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6203 Differential Revision: D19145753 Pulled By: anand1976 fbshipit-source-id: d09edf21f309ad53aa40dd25b7a563d50665fd8b	2019-12-17 20:44:58 -08:00
sdong	9f250dd88e	crash_test: two fixes (#6200 ) Summary: Fix two crash test issues: 1. sync mode should not run with disable_wal=true 2. disable "compaction_readahead_size" for now. With it on, some block checksum verification failure will happen in compaction paths. Not sure why, but disable it for now to keep the test clean. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6200 Test Plan: Run "make crash_test" and "make crash_test_with_atomic_flush" and see it runs way longer than before the fix without failing. Differential Revision: D19143493 fbshipit-source-id: 438fad52fbda60aafd142e1b65578addbe7d72b1	2019-12-17 18:25:04 -08:00
sdong	bcc372c0c3	Add some new options to crash_test (#6176 ) Summary: Several options are trivially added to crash test and random values are picked. Made simple test run non-dynamic level and normal test run dynamic level. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6176 Test Plan: Run crash_test and watch the printing Differential Revision: D19053955 fbshipit-source-id: 958cb43c968541ebd87ed4d91e778bd1d40e7502	2019-12-16 15:43:13 -08:00
Maysam Yabandeh	4b97812da8	Add long-running snapshots to stress tests (#6171 ) Summary: Current implementation holds on to 10% of snapshots for 10x longer, and 1% of snapshots 100x longer. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6171 Test Plan: ``` make -j32 crash_test Differential Revision: D19038399 Pulled By: maysamyabandeh fbshipit-source-id: 75da2dbb5c47a0b3f37d299b8719e392b73b42c0	2019-12-14 15:22:40 -08:00
Maysam Yabandeh	fec7302a9d	Enable unordered_write in stress tests (#6164 ) Summary: With WritePrepared transactions configured with two_write_queues, unordered_write will offer the same guarantees as vanilla rocksdb and thus can be enabled in stress tests. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6164 Test Plan: ``` make -j32 crash_test_with_txn Differential Revision: D18991899 Pulled By: maysamyabandeh fbshipit-source-id: eece5e96b4169b67d7931e5c0afca88540a113e1	2019-12-13 10:25:04 -08:00
Maysam Yabandeh	8613ee2e94	Enable all txn write policies in crash test (#6158 ) Summary: Currently the default txn write policy in crash tests is WRITE_PREPARED. The patch randomly picks the write policy at the start of the crash test. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6158 Test Plan: ``` make -j32 crash_test_with_txn ``` Differential Revision: D18946307 Pulled By: maysamyabandeh fbshipit-source-id: f77d7a94f99a08791ef9626da153d284bf521950	2019-12-12 10:43:49 -08:00
Maysam Yabandeh	1ad6fa9cc7	Enable txn in crash tests (#6155 ) Summary: Start daily crash tests with use_txn flag. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6155 Differential Revision: D18943630 Pulled By: maysamyabandeh fbshipit-source-id: eea99a6ffd5f57fb9651f6ca7dab8fbf70379c87	2019-12-11 16:01:55 -08:00
Peter Dillinger	a653857178	Add PauseBackgroundWork() to db_stress (#6148 ) Summary: Worker thread will occasionally call PauseBackgroundWork(), briefly sleep (to avoid stalling itself) and then call ContinueBackgroundWork(). Pull Request resolved: https://github.com/facebook/rocksdb/pull/6148 Test Plan: some running of 'make blackbox_crash_test' with temporary printf output to confirm code occasionally reached. Differential Revision: D18913886 Pulled By: pdillinger fbshipit-source-id: ae9356a803390929f3165dfb6a00194692ba92be	2019-12-10 15:46:48 -08:00
Peter Dillinger	6380df5e10	Vary bloom_bits in db_crashtest (#6103 ) Summary: Especially with non-integral bits/key now supported, db_crashtest should vary the bloom_bits configuration. The probabilities look like this: 1/2 chance of a uniform int from 0 to 19. This includes overall 1/40 chance of 0 which disables the bloom filter. 1/2 chance of a float from a lognormal distribution with a median of 10. This always produces positive values but with a decent chance of < 1 (overall ~1/40) or > 100 (overall ~1/40), the enforced/coerced implementation limits. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6103 Test Plan: start 'make blackbox_crash_test' several times and look at configuration output Differential Revision: D18734877 Pulled By: pdillinger fbshipit-source-id: 4a38cb057d3b3fc1327f93199f65b9a9ffbd7316	2019-12-10 08:39:50 -08:00
Peter Dillinger	f19faf7814	Add format_version=5 to db_crashtest (#6102 ) Summary: format_version=5 enables new Bloom filter. Using 2/5 probability for "latest and greatest" rather than naive 1/4. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6102 Test Plan: start 'make blackbox_crash_test' Differential Revision: D18735685 Pulled By: pdillinger fbshipit-source-id: e81529c8a3f53560d246086ee5f92ee7d79a2eab	2019-11-27 13:19:11 -08:00
sdong	6123611c42	crash_test: use large max_manifest_file_size most of the time. (#6034 ) Summary: Right now, crash_test always uses 16KB max_manifest_file_size value. It is good to cover logic of manifest file switch. However, information stored in manifest files might be useful in debugging failures. Switch to only use small manifest file size in 1/15 of the time. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6034 Test Plan: Observe command generated by db_crash_test.py multiple times and see the --max_manifest_file_size value distribution. Differential Revision: D18513824 fbshipit-source-id: 7b3ae6dbe521a0918df41064e3fa5ecbf2466e04	2019-11-14 14:01:06 -08:00
sdong	5b656584af	crash_test: disable periodic compaction in FIFO compaction. (#5993 ) Summary: A recent commit make periodic compaction option valid in FIFO, which means TTL. But we fail to disable it in crash test, causing assert failure. Fix it by having it disabled. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5993 Test Plan: Restart "make crash_test" many times and make sure --periodic_compaction_seconds=0 is always the case when --compaction_style=2 Differential Revision: D18263223 fbshipit-source-id: c91a802017d83ae89ac43827d1b0012861933814	2019-10-31 17:28:03 -07:00
sdong	0337d87b42	crash_test: disable atomic flush with pipelined write (#5986 ) Summary: Recently, pipelined write is enabled even if atomic flush is enabled, which causing sanitizing failure in db_stress. Revert this change. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5986 Test Plan: Run "make crash_test_with_atomic_flush" and see it to run for some while so that the old sanitizing error (which showed up quickly) doesn't show up. Differential Revision: D18228278 fbshipit-source-id: 27fdf2f8e3e77068c9725a838b9bef4ab25a2553	2019-10-30 11:36:55 -07:00
sdong	a3960fc875	Move pipeline write waiting logic into WaitForPendingWrites() (#5716 ) Summary: In pipeline writing mode, memtable switching needs to wait for memtable writing to finish to make sure that when memtables are made immutable, inserts are not going to them. This is currently done in DBImpl::SwitchMemtable(). This is done after flush_scheduler_.TakeNextColumnFamily() is called to fetch the list of column families to switch. The function flush_scheduler_.TakeNextColumnFamily() itself, however, is not thread-safe when being called together with flush_scheduler_.ScheduleFlush(). This change provides a fix, which moves the waiting logic before flush_scheduler_.TakeNextColumnFamily(). WaitForPendingWrites() is a natural place where the logic can happen. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5716 Test Plan: Run all tests with ASAN and TSAN. Differential Revision: D18217658 fbshipit-source-id: b9c5e765c9989645bf10afda7c5c726c3f82f6c3	2019-10-29 18:16:36 -07:00
Maysam Yabandeh	a6e615a7ba	Enable partitioned index/filter in stress tests (#5918 ) Summary: This is the 3rd attempt after the revert of https://github.com/facebook/rocksdb/issues/4020 and https://github.com/facebook/rocksdb/issues/5895 The last bug is fixed https://github.com/facebook/rocksdb/pull/5907 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5918 Test Plan: ``` make -j32 crash_test ``` Differential Revision: D17909489 Pulled By: maysamyabandeh fbshipit-source-id: 7dfb8cf998c2d295c86465dd21734593d277887e	2019-10-14 10:35:18 -07:00
Yanqin Jin	bc8b05cb77	Revert "Enable partitioned index/filter in stress tests (#5895 )" (#5904 ) Summary: This reverts commit `2f4e288143`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5904 Differential Revision: D17871282 Pulled By: riversand963 fbshipit-source-id: d210725f8f3b26d8eac25892094da09d9694337e	2019-10-10 19:19:39 -07:00
Maysam Yabandeh	2f4e288143	Enable partitioned index/filter in stress tests (#5895 ) Summary: This is the 2nd attempt after the revert of https://github.com/facebook/rocksdb/pull/4020 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5895 Test Plan: ``` ./tools/db_crashtest.py blackbox --simple --interval=10 --max_key=10000000 ``` Differential Revision: D17822137 Pulled By: maysamyabandeh fbshipit-source-id: 3d148c0d8cc129080410ff859c04b544223c8ea3	2019-10-08 16:50:21 -07:00
sdong	679a45d0cb	crash_test to do some verification for prefix extractor and iterator bounds. (#5846 ) Summary: For now, crash_test is not able to report any failure for the logic related to iterator upper, lower bounds or iterators, or reseek. These are features prone to errors. Improve db_stress in several ways: (1) For each iterator run, reseek up to 3 times. (2) For every iterator, create control iterator with upper or lower bound, with total order seek. Compare the results with the iterator. (3) Make simple crash test to avoid prefix size to have more coverage. (4) make prefix_size = 0 a valid size and -1 to indicate disabling prefix extractor. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5846 Test Plan: Manually hack the code to create wrong results and see they are caught by the tool. Differential Revision: D17631760 fbshipit-source-id: acd460a177bd2124a5ffd7fff490702dba63030b	2019-09-27 11:10:44 -07:00
Maysam Yabandeh	6ec6a4a9a4	Remove snap_refresh_nanos option (#5826 ) Summary: The snap_refresh_nanos option didn't bring much benefit. Remove the feature to simplify the code. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5826 Differential Revision: D17467147 Pulled By: maysamyabandeh fbshipit-source-id: 4f950b046990d0d1292d7fc04c2ccafaf751c7f0	2019-09-18 20:26:04 -07:00
Levi Tamasi	94d62d771e	Temporarily disable partitioned index/filter in stress test (#5811 ) Summary: PR https://github.com/facebook/rocksdb/issues/4020 enabled partitioned indexes/filters in stress tests; however, this causes assertion failures in BatchedOpsStressTest. This patch disables them until we can root cause the failures. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5811 Test Plan: Ran the script and made sure it only uses the binary search index. Differential Revision: D17399366 Pulled By: ltamasi fbshipit-source-id: adb116e6297f9c6ccd7ac15b6a16c9aa91f21ac5	2019-09-16 11:41:35 -07:00
Levi Tamasi	d35ffd569c	Temporarily disable hash index in stress tests (#5792 ) Summary: PR https://github.com/facebook/rocksdb/issues/4020 implicitly enabled the hash index as well in stress/crash tests, resulting in assertion failures in Block. This patch disables the hash index until we can pinpoint the root cause of these issues. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5792 Test Plan: Ran tools/db_crashtest.py and made sure it only uses index types 0 and 2 (binary search and partitioned index). Differential Revision: D17346777 Pulled By: ltamasi fbshipit-source-id: b4318f37f1fda3ee1bbff4ef2c2f556ca9e6b551	2019-09-12 12:11:34 -07:00
Andrew Kryczka	dd2a35f13f	Support partitioned index and filters in stress/crash tests (#4020 ) Summary: - In `db_stress`, support choosing index type and whether to enable filter partitioning, and randomly set those options in crash test - When partitioned filter is enabled by crash test, force partitioned index to also be enabled since it's a prerequisite Pull Request resolved: https://github.com/facebook/rocksdb/pull/4020 Test Plan: currently this is blocked on fixing the bug that crash test caught: ``` $ TEST_TMPDIR=/data/compaction_bench python ./tools/db_crashtest.py blackbox --simple --interval=10 --max_key=10000000 ... Verification failed for column family 0 key 937501: Value not found: NotFound: Crash-recovery verification failed :( ``` Differential Revision: D8508683 Pulled By: maysamyabandeh fbshipit-source-id: 0337e5d0558bcef26b1f3699f47265a2c1e99629	2019-09-11 14:13:38 -07:00
sdong	1daff8f85a	crash_test to skip compaction TTL for FIFO compaction. (#5749 ) Summary: https://github.com/facebook/rocksdb/pull/5741 added compaction TTL to crash test, but it causes assertion fails for FIFO compaction. Disable this combination for now while we debug the assertion failure. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5749 Test Plan: Run crash test and observe that when compaction_style=2, compaction_ttl is always 0. Differential Revision: D17078292 fbshipit-source-id: 446821a3b9739956094d5e4f9be1251a15b57f5d	2019-08-27 17:55:37 -07:00
sdong	1d6a10f52d	Extend stress test to cover periodic compaction and compaction TTL (#5741 ) Summary: Covering periodic compaction and compaction TTL can help us expose potential issues. Add it there. Randomly select value for these two options. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5741 Test Plan: Run crash_test and see the perameters generated. Differential Revision: D17059515 fbshipit-source-id: 8213974846a0b6a22fc13be705825c9054d1d097	2019-08-26 15:03:25 -07:00
sdong	d8a27d9331	Atomic Flush Crash Test also covers the case that WAL is enabled. (#5729 ) Summary: AtomicFlushStressTest is a powerful test, but right now we only run it for atomic_flush=true + disable_wal=true. We further extend it to the case where atomic_flush=false + disable_wal = false. All the workload generation and validation can stay the same. Atomic flush crash test is also changed to switch between the two test scenarios. It makes the name "atomic flush crash test" out of sync from what it really does. We leave it as it is to avoid troubles with continous test set-up. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5729 Test Plan: Run "CRASH_TEST_KILL_ODD=188 TEST_TMPDIR=/dev/shm/ USE_CLANG=1 make whitebox_crash_test_with_atomic_flush", observe the settings used and see it passed. Differential Revision: D16969791 fbshipit-source-id: 56e37487000ae631e31b0100acd7bdc441c04163	2019-08-22 16:32:55 -07:00
sdong	8e12638f3d	Slightly adjust atomic white box test's kill odd (#5717 ) Summary: Atomic white box test's kill odd is the same as normal test. However, in the scenario that only WritableFileWriter::Append() is blacklisted, WritableFileWriter::Flush() dominates the killing odds. Normally, most of WritableFileWriter::Flush() are called in WAL writes, where every write triggers a WAL flush. In atomic test, WAL is disabled, so the kill happens less frequently than we antipated. In some rare cases, the kill didn't end up with happening (for reasons I still don't fully understand) and cause the stress test timeout. If WAL is disabled, make the odds 5x likely to trigger. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5717 Test Plan: Run whitebox_crash_test_with_atomic_flush and whitebox_crash_test and observe the kill odds printed out. Differential Revision: D16897237 fbshipit-source-id: cbf5d96f6fc0e980523d0f1f94bf4e72cdb82d1c	2019-08-19 10:51:59 -07:00
Yanqin Jin	a78503bd6c	Temporarily disable snapshot list refresh for atomic flush stress test (#5581 ) Summary: Atomic flush test started to fail after https://github.com/facebook/rocksdb/issues/5099. Then https://github.com/facebook/rocksdb/issues/5278 provided a fix after which the same error occurred much less frequently. However it still occur occasionally. Not sure what the root cause is. This PR disables the feature of snapshot list refresh, and we should keep an eye on the failure in the future. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5581 Differential Revision: D16295985 Pulled By: riversand963 fbshipit-source-id: c9e62e65133c52c21b07097de359632ca62571e4	2019-07-22 14:38:16 -07:00
Tim Hatch	a6a9213a36	Fix interpreter lines for files with python2-only syntax. Reviewed By: lisroach Differential Revision: D15362271 fbshipit-source-id: 48fab12ab6e55a8537b19b4623d2545ca9950ec5	2019-07-09 10:51:37 -07:00
Maysam Yabandeh	f9842869cf	Disable pipeline writes in stress test (#5445 ) Summary: The tsan crash tests are failing with a data race compliant with pipelined write option. Temporarily disable it until its concurrency issue are fixed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5445 Differential Revision: D15783824 Pulled By: maysamyabandeh fbshipit-source-id: 413a0c3230b86f524fc7eeea2cf8e8375406e65b	2019-06-12 11:12:36 -07:00
anand76	181bb43f08	Fix bugs in FilePickerMultiGet (#5292 ) Summary: This PR fixes a couple of bugs in FilePickerMultiGet that were causing db_stress test failures. The failures were caused by - 1. Improper handling of a key that matches the user key portion of an L0 file's largest key. In this case, the curr_index_in_curr_level file index in L0 for that key was getting incremented, but batch_iter_ was not advanced. By design, all keys in a batch are supposed to be checked against an L0 file before advancing to the next L0 file. Not advancing to the next key in the batch was causing a double increment of curr_index_in_curr_level due to the same key being processed again 2. Improper handling of a key that matches the user key portion of the largest key in the last file of L1 and higher. This was resulting in a premature end to the processing of the batch for that level when the next key in the batch is a duplicate. Typically, the keys in MultiGet will not be duplicates, but its good to handle that case correctly Test - asan_crash make check Pull Request resolved: https://github.com/facebook/rocksdb/pull/5292 Differential Revision: D15282530 Pulled By: anand1976 fbshipit-source-id: d1a6a86e0af273169c3632db22a44d79c66a581f	2019-05-09 13:18:00 -07:00
anand76	930bfa5750	Disable MultiGet from db_stress (#5284 ) Summary: Disable it for now until we can get stress tests to pass consistently. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5284 Differential Revision: D15230727 Pulled By: anand1976 fbshipit-source-id: 239baacdb3c4cd4fb7c4447f7582b9042501d752	2019-05-06 18:26:50 -07:00
Maysam Yabandeh	6a40ee5eb1	Refresh snapshot list during long compactions (2nd attempt) (#5278 ) Summary: Part of compaction cpu goes to processing snapshot list, the larger the list the bigger the overhead. Although the lifetime of most of the snapshots is much shorter than the lifetime of compactions, the compaction conservatively operates on the list of snapshots that it initially obtained. This patch allows the snapshot list to be updated via a callback if the compaction is taking long. This should let the compaction to continue more efficiently with much smaller snapshot list. For simplicity, to avoid the feature is disabled in two cases: i) When more than one sub-compaction are sharing the same snapshot list, ii) when Range Delete is used in which the range delete aggregator has its own copy of snapshot list. This fixes the reverted https://github.com/facebook/rocksdb/pull/5099 issue with range deletes. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5278 Differential Revision: D15203291 Pulled By: maysamyabandeh fbshipit-source-id: fa645611e606aa222c7ce53176dc5bb6f259c258	2019-05-03 17:30:22 -07:00
anand76	434ccf2df4	Add option to use MultiGet in db_stress (#5264 ) Summary: The new option will pick a batch size randomly in the range 1-64. It will then space the keys in the batch by random intervals. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5264 Differential Revision: D15175522 Pulled By: anand1976 fbshipit-source-id: c16baa69d0f1ff4cf53c55c813ddd82c8aeb58fc	2019-05-01 23:06:56 -07:00
Andrew Kryczka	b02d0c238d	Init compression dict handle before reading meta-blocks (#5267 ) Summary: At least one of the meta-block loading functions (`ReadRangeDelBlock`) uses the same block reading function (`NewDataBlockIterator`) as data block reads, which means it uses the dictionary handle. However, the dictionary handle was uninitialized while reading meta-blocks, causing readers to receive an error. This situation was only noticed when `cache_index_and_filter_blocks=true`. This PR initializes the handle to null while reading meta-blocks to prevent the error. It also adds support to `db_stress` / `db_crashtest.py` for `cache_index_and_filter_blocks`. Fixes #5263. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5267 Differential Revision: D15149264 Pulled By: maysamyabandeh fbshipit-source-id: 991d38a306c62db5976778bfb050fa3cd4a0671b	2019-04-30 09:50:49 -07:00
Yanqin Jin	210b49cac9	Disable pipelined write in atomic flush stress test (#5266 ) Summary: Since currently pipelined write allows one thread to perform memtable writes while another thread is traversing the `flush_scheduler_`, it will cause an assertion failure in `FlushScheduler::Clear`. To unblock crash recoery tests, we temporarily disable pipelined write when atomic flush is enabled. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5266 Differential Revision: D15142285 Pulled By: riversand963 fbshipit-source-id: a0c20fe4ac543e08feaed602414f982054df7831	2019-04-30 08:12:42 -07:00
Fosco Marotto	6c2bf9e916	Add copyright headers per FB open-source checkup tool. (#5199 ) Summary: internal task: T35568575 Pull Request resolved: https://github.com/facebook/rocksdb/pull/5199 Differential Revision: D14962794 Pulled By: gfosco fbshipit-source-id: 93838ede6d0235eaecff90d200faed9a8515bbbe	2019-04-18 10:55:01 -07:00
Andrew Kryczka	2263f86901	exercise WAL recycling in crash test (#5070 ) Summary: Since this feature affects the WAL behavior, it seems important our crash-recovery tests cover it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/5070 Differential Revision: D14470085 Pulled By: miasantreble fbshipit-source-id: 9b9682a718a926d57d055e0a5ec867efbd2eb9c1	2019-03-15 12:03:26 -07:00
Andrew Kryczka	1218704b61	Fix `compression_zstd_max_train_bytes` coverage in stress test (#4957 ) Summary: Previously `finalize_and_sanitize` function was always zeroing out `compression_zstd_max_train_bytes`. It was only supposed to do that when non-ZSTD compression was used. But since `--compression_type` was an unknown argument (i.e., one that `db_crashtest.py` does not recognize and blindly forwards to `db_stress`), `finalize_and_sanitize` could not tell whether ZSTD was used. This PR fixes it simply by making `--compression_type` a known argument with snappy as default (same as `db_stress`). Pull Request resolved: https://github.com/facebook/rocksdb/pull/4957 Differential Revision: D13994302 Pulled By: ajkr fbshipit-source-id: 1b0baea7331397822830970d3698642eb7a7df65	2019-02-11 14:56:39 -08:00
Andrew Kryczka	68d949b3e3	Enable DeleteRange in stress/crash tests (#4483 ) Summary: Set `delrangepercent=1` when `test_batches_snapshots=false`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4483 Differential Revision: D10324361 Pulled By: ajkr fbshipit-source-id: 0cde1f1504f9493408a0c6493b976d7e5f5b2d23	2018-12-18 13:42:49 -08:00
Andrew Kryczka	8d2b74d287	Refine db_stress params for atomic flush (#4781 ) Summary: Separate flag for enabling option from flag for enabling dedicated atomic stress test. I have found setting the former without setting the latter can detect different problems. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4781 Differential Revision: D13463211 Pulled By: ajkr fbshipit-source-id: 054f777885b2dc7d5ea99faafa21d6537eee45fd	2018-12-13 22:10:38 -08:00
Yanqin Jin	912bbbbc72	Enable crash-recovery stress test for atomic flush (#4605 ) Summary: This PR adds test of atomic flush to our continuous stress tests. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4605 Differential Revision: D12840607 Pulled By: riversand963 fbshipit-source-id: 0da187572791a59530065a7952697c05b1197ad9	2018-10-30 14:03:36 -07:00
Zhongyi Xie	9b3cf908a6	add missing range in random.choice argument (#4397 ) Summary: This will fix the broken asan crash test: > Traceback (most recent call last): File "tools/db_crashtest.py", line 384, in <module> main() File "tools/db_crashtest.py", line 368, in main parser.add_argument("--" + k, type=type(v() if callable(v) else v)) File "tools/db_crashtest.py", line 59, in <lambda> "index_block_restart_interval": lambda: random.choice(1, 16), TypeError: choice() takes exactly 2 arguments (3 given) Pull Request resolved: https://github.com/facebook/rocksdb/pull/4397 Differential Revision: D9933041 Pulled By: miasantreble fbshipit-source-id: 10998e5bc6b6a5cea3e4088b18465affc246e639	2018-09-19 12:13:20 -07:00
Maysam Yabandeh	a0ebec3804	Extend crash test with index_block_restart_interval (#4383 ) Summary: The default for index_block_restart_interval is 1 but some use 16 in production. The patch extends crash test to test both values. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4383 Differential Revision: D9887304 Pulled By: maysamyabandeh fbshipit-source-id: a8d00fea974a79ad563f9f4d9d7b069e9f746a8f	2018-09-18 15:43:29 -07:00
Andrew Kryczka	8c25204633	Support manual flush in stress/crash tests (#4368 ) Summary: - Made stress test call `Flush()` periodically according to `--flush_one_in` flag. - Enabled by default in crash test. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4368 Differential Revision: D9838593 Pulled By: ajkr fbshipit-source-id: fe5a6e49b36e5ea752acc3aa8be364f8ef34d9cc	2018-09-17 12:27:55 -07:00
Maysam Yabandeh	d122025891	Extend stress test to format_version 4 (#4265 ) Summary: Stress tests currently cover format_version 2 and 3. The patch adds 4 as well. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4265 Differential Revision: D9323185 Pulled By: maysamyabandeh fbshipit-source-id: 54d11e41ecae09bae14cadd7313f07c9a3db5a57	2018-08-14 14:13:33 -07:00
Andrew Kryczka	6175b4b294	Support dictionary compression in stress/crash tests (#4234 ) Summary: - Add `--compression_max_dict_bytes` and `--compression_zstd_max_train_bytes` flags to stress test - Randomly enable/disable the above flags in crash test - Set `--compression_type=zstd` in FB-specific crash test runs Pull Request resolved: https://github.com/facebook/rocksdb/pull/4234 Differential Revision: D9187207 Pulled By: ajkr fbshipit-source-id: 8d78cf8d8e1165f2cd1c32e069b73726b5bc1fd2	2018-08-06 15:27:29 -07:00
Siying Dong	4b0a43574a	db_stress to cover upper bound in iterators (#4162 ) Summary: db_stress doesn't cover upper or lower bound in iterators. Try to cover it by randomly assigning a random one. Also in prefix scan tests, with 50% of the chance, set next prefix as the upper bound. Pull Request resolved: https://github.com/facebook/rocksdb/pull/4162 Differential Revision: D8953507 Pulled By: siying fbshipit-source-id: f0f04e9cb6c07cbebbb82b892ca23e0daeea708b	2018-07-23 10:45:29 -07:00
Peter (Stig) Edwards	2694b6dc26	Remove unused imports, from python scripts. (#4057 ) Summary: Also remove redefined variable. As reported on https://lgtm.com/projects/g/facebook/rocksdb/ Closes https://github.com/facebook/rocksdb/pull/4057 Differential Revision: D8648342 Pulled By: ajkr fbshipit-source-id: afd2ba84d1364d316010179edd44777e64ca9183	2018-06-26 12:43:04 -07:00
Andrew Kryczka	7f3a634e06	Support pipelined write in stress/crash tests Summary: Closes https://github.com/facebook/rocksdb/pull/4019 Differential Revision: D8508681 Pulled By: ajkr fbshipit-source-id: 23a3c07d642386446e322b02e69cdf70d12ef009	2018-06-19 09:14:12 -07:00
Andrew Kryczka	8585059ae0	Support backup and checkpoint in db_stress (#4005 ) Summary: Add the `backup_one_in` and `checkpoint_one_in` options to periodically trigger backups and checkpoints. The directory names contain thread ID to avoid clashing with parallel backups/checkpoints. Enable checkpoint in crash test so our CI runs will use it. Didn't enable backup in crash test since it copies all the files which is too slow. Closes https://github.com/facebook/rocksdb/pull/4005 Differential Revision: D8472275 Pulled By: ajkr fbshipit-source-id: ff91bdc37caac4ffd97aea8df96b3983313ac1d5	2018-06-18 19:28:18 -07:00
Andrew Kryczka	de2c6fb158	Fix stderr processing in crash test (#4006 ) Summary: Fixed bug where `db_stress` output a line with a warning followed by a line with an error, and `db_crashtest.py` considered that a success. For example: ``` WARNING: prefix_size is non-zero but memtablerep != prefix_hash open error: Corruption: SST file is ahead of WALs ``` Closes https://github.com/facebook/rocksdb/pull/4006 Differential Revision: D8473463 Pulled By: ajkr fbshipit-source-id: 60461bdd7491d9d26c63f7d4ee522a0f88ba3de7	2018-06-18 17:58:13 -07:00
Andrew Kryczka	7497f992e0	Run manual compaction in stress/crash tests (#3936 ) Summary: - Add support to `db_stress` for `CompactRange` - Enable `CompactRange` and `CompactFiles` in crash tests Closes https://github.com/facebook/rocksdb/pull/3936 Differential Revision: D8230953 Pulled By: ajkr fbshipit-source-id: 208f9980b5bc8c204b1fa726e83791ad674e21e8	2018-06-13 16:45:28 -07:00
Maysam Yabandeh	d0c38c0c8c	Extend some tests to format_version=3 (#3942 ) Summary: format_version=3 changes the format of SST index. This is however not being tested currently since tests only work with the default format_version which is currently 2. The patch extends the most related tests to also test for format_version=3. Closes https://github.com/facebook/rocksdb/pull/3942 Differential Revision: D8238413 Pulled By: maysamyabandeh fbshipit-source-id: 915725f55753dd8e9188e802bf471c23645ad035	2018-06-04 20:13:00 -07:00
Andrew Kryczka	4f297ad05f	Fix crash test check for direct I/O Summary: We need to keep the DB directory around since the direct IO check in "db_crashtest.py" relies on it existing. This PR fixes an issue where it was removed after each stress test run during the second half of whitebox crash testing. Closes https://github.com/facebook/rocksdb/pull/3946 Differential Revision: D8247998 Pulled By: ajkr fbshipit-source-id: 4e7cffbdab9b40df125e7842d0d59916e76261d3	2018-06-03 21:42:12 -07:00
Andrew Kryczka	88c3ee2d31	Configure direct I/O statically in db_stress Summary: Previously `db_stress` attempted to configure direct I/O dynamically in `SetOptions()` which had multiple problems (ummm must've never been tested): - It's a DB option so SetDBOptions should've been called instead - It's not a dynamic option so even SetDBOptions would fail - It required enabling SyncPoint to mask O_DIRECT since it had no way to detect whether the DB directory was in tmpfs or not. This required locking that consumed ~80% of db_stress CPU. In this PR I delete the broken dynamic config and instead configure it statically, only enabling it if the DB directory truly supports O_DIRECT. Closes https://github.com/facebook/rocksdb/pull/3939 Differential Revision: D8238120 Pulled By: ajkr fbshipit-source-id: 60bb2deebe6c9b54a3f788079261715b4a229279	2018-06-01 16:42:34 -07:00
Andrew Kryczka	d19f568abf	Refactor argument handling in db_crashtest.py Summary: - Any options unknown to `db_crashtest.py` are now passed directly to `db_stress`. This way, we won't need to update `db_crashtest.py` every time `db_stress` gets a new option. - Remove `db_crashtest.py` redundant arguments where the value is the same as `db_stress`'s default - Remove `db_crashtest.py` redundant arguments where the value is the same in a previously applied options map. For example, default_params are always applied before whitebox_default_params, so if they require the same value for an argument, that value only needs to be provided in default_params. - Made the simple option maps applied in addition to the regular option maps. Previously they were exclusive which led to lots of duplication Closes https://github.com/facebook/rocksdb/pull/3809 Differential Revision: D7885779 Pulled By: ajkr fbshipit-source-id: 3a3243b55724d6d5bff36e939b582b9b62c538a8	2018-05-09 13:42:41 -07:00
Andrew Kryczka	46152d53bf	Second attempt at db_stress crash-recovery verification Summary: - Original commit: `a4fb1f8c04` - Revert commit (we reverted as a quick fix to get crash tests passing): `6afe22db2e` This PR includes the contents of the original commit plus two bug fixes, which are: - In whitebox crash test, only set `--expected_values_path` for `db_stress` runs in the first half of the crash test's duration. In the second half, a fresh DB is created for each `db_stress` run, so we cannot maintain expected state across `db_stress` runs. - Made `Exists()` return true for `UNKNOWN_SENTINEL` values. I previously had an assert in `Exists()` that value was not `UNKNOWN_SENTINEL`. But it is possible for post-crash-recovery expected values to be `UNKNOWN_SENTINEL` (i.e., if the crash happens in the middle of an update), in which case this assertion would be tripped. The effect of returning true in this case is there may be cases where a `SingleDelete` deletes no data. But if we had returned false, the effect would be calling `SingleDelete` on a key with multiple older versions, which is not supported. Closes https://github.com/facebook/rocksdb/pull/3793 Differential Revision: D7811671 Pulled By: ajkr fbshipit-source-id: 67e0295bfb1695ff9674837f2e05bb29c50efc30	2018-04-30 12:27:34 -07:00
Andrew Kryczka	6afe22db2e	revert db_stress crash-recovery verification Summary: crash-recovery verification is failing in the whitebox testing, which may or may not be a valid correctness issue -- need more time to investigate. In the meantime, reverting so we don't mask other failures. Closes https://github.com/facebook/rocksdb/pull/3786 Differential Revision: D7794516 Pulled By: ajkr fbshipit-source-id: 28ccdfdb9ec9b3b0fb08c15cbf9d2e282201ff33	2018-04-27 12:57:01 -07:00
Andrew Kryczka	db36f222d8	Allow options file in db_stress and db_crashtest Summary: - When options file is provided to db_stress, take supported options from the file instead of from flags - Call `BuildOptionsTable` after `Open` so it can use `options_` once it has been populated either from flags or from file - Allow options filename to be passed via `db_crashtest.py` Closes https://github.com/facebook/rocksdb/pull/3768 Differential Revision: D7755331 Pulled By: ajkr fbshipit-source-id: 5205cc5deb0d74d677b9832174153812bab9a60a	2018-04-26 18:42:07 -07:00
Andrew Kryczka	a4fb1f8c04	Add crash-recovery correctness check to db_stress Summary: Previously, our `db_stress` tool held the expected state of the DB in-memory, so after crash-recovery, there was no way to verify data correctness. This PR adds an option, `--expected_values_file`, which specifies a file holding the expected values. In black-box testing, the `db_stress` process can be killed arbitrarily, so updates to the `--expected_values_file` must be atomic. We achieve this by `mmap`ing the file and relying on `std::atomic<uint32_t>` for atomicity. Actually this doesn't provide a total guarantee on what we want as `std::atomic<uint32_t>` could, in theory, be translated into multiple stores surrounded by a mutex. We can verify our assumption by looking at `std::atomic::is_always_lock_free`. For the `mmap`'d file, we didn't have an existing way to expose its contents as a raw memory buffer. This PR adds it in the `Env::NewMemoryMappedFileBuffer` function, and `MemoryMappedFileBuffer` class. `db_crashtest.py` is updated to use an expected values file for black-box testing. On the first iteration (when the DB is created), an empty file is provided as `db_stress` will populate it when it runs. On subsequent iterations, that same filename is provided so `db_stress` can check the data is as expected on startup. Closes https://github.com/facebook/rocksdb/pull/3629 Differential Revision: D7463144 Pulled By: ajkr fbshipit-source-id: c8f3e82c93e045a90055e2468316be155633bd8b	2018-04-24 15:58:22 -07:00
Andrew Kryczka	b058a33705	Reduce default --nooverwritepercent in black-box crash tests Summary: Previously `python tools/db_crashtest.py blackbox` would do no useful work as the crash interval (two minutes) was shorter than the preparation phase. The preparation phase is slow because of the ridiculously inefficient way it computes which keys should not be overwritten. It was doing this for 60M keys since default values were `FLAGS_nooverwritepercent == 60` and `FLAGS_max_key == 100000000`. Move the "nooverwritepercent" override from whitebox-specific to the general options so it also applies to blackbox test runs. Now preparation phase takes a few seconds. Closes https://github.com/facebook/rocksdb/pull/3671 Differential Revision: D7457732 Pulled By: ajkr fbshipit-source-id: 601f4461a6a7e49e50449dcf15aebc9b8a98d6f0	2018-04-03 15:28:40 -07:00

... 2 3 4 5 6 ...

397 Commits