rocksdb

Commit Graph

Author	SHA1	Message	Date
sdong	f33611d5e9	Stress test to inject read failures in DB reopen (#8476 ) Summary: Inject read failures in DB reopen, just as what we do for metadata writes and writes. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8476 Test Plan: Some manual tests and make sure failures are triggered. Reviewed By: anand1976 Differential Revision: D29507283 fbshipit-source-id: d04da0163973447041038bd87701686a417c4e0c	2021-07-06 11:05:27 -07:00
mrambacher	570248aeff	Make SecondaryCache Customizable (#8480 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/8480 Reviewed By: zhichao-cao Differential Revision: D29528740 Pulled By: mrambacher fbshipit-source-id: fd0f70d15f66611c8498257a9973f7e98ca13839	2021-07-06 09:18:08 -07:00
Zhichao Cao	a95a776d75	Inject fatal write failures to db_stress when DB is running (#8479 ) Summary: add the injest_error_severity to control if it is a retryable IO Error or a fatal or unrecoverable error. Use a flag to indicate, if fatal error comes, the flag is set and db is stopped (but not corrupted). Pull Request resolved: https://github.com/facebook/rocksdb/pull/8479 Test Plan: run ./db_stress --reopen=0 --read_fault_one_in=1000 --write_fault_one_in=5 --disable_wal=true --write_buffer_size=3000000 -writepercent=5 -readpercent=50 --injest_error_severity=2 --column_families=1, make check Reviewed By: anand1976 Differential Revision: D29524271 Pulled By: zhichao-cao fbshipit-source-id: 1aa9fb9b5655b0adba6f5ad12005ca8c074c795b	2021-07-01 14:16:47 -07:00
sdong	ba224b75c7	Stress Test to inject write failures in reopen (#8474 ) Summary: Previously Stress can inject metadata write failures when reopening a DB. We extend it to file append too, in the same way. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8474 Test Plan: manually run crash test with various setting and make sure the failures are triggered as expected. Reviewed By: zhichao-cao Differential Revision: D29503116 fbshipit-source-id: e73a446e80ccbd09301a579280e56ff949381fab	2021-06-30 16:46:41 -07:00
anand76	6f9ed59b1d	Allow db_stress to use a secondary cache (#8455 ) Summary: Add a ```-secondary_cache_uri``` to db_stress to allow the user to specify a custom ```SecondaryCache``` object from the object registry. Also allow db_crashtest.py to be run with an alternate db_stress location. Together, these changes will allow us to run db_stress using FB internal components. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8455 Reviewed By: zhichao-cao Differential Revision: D29371972 Pulled By: anand1976 fbshipit-source-id: dd1b1fd80ebbedc11aa63d9246ea6ae49edb77c4	2021-06-27 23:54:39 -07:00
Akanksha Mahajan	be8199cdb9	Run Merge with Integrated BlobDB in stress, crash and db_bench (#8461 ) Summary: Run Merge with Intergrated BlobDB in stress tests, crash tests and db_bench. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8461 Test Plan: 1. python3 -u tools/db_crashtest.py --simple whitebox ---use_merge=1 --enable_blob_files=1 2. ./db_bench --benchmarks="readwhilemerging" --merge_operator=uint64add --enable_blob_files=true Reviewed By: ltamasi Differential Revision: D29394824 Pulled By: akankshamahajan15 fbshipit-source-id: 0a8e492b13129673e088fb8af3402ab678bb473a	2021-06-25 10:45:52 -07:00
Peter Dillinger	865a25101d	Mark Ribbon filter and optimize_filters_for_memory as production (#8408 ) Summary: Marked the Ribbon filter and optimize_filters_for_memory features as production-ready, each enabling memory savings for Bloom-like filters. Use `NewRibbonFilterPolicy` in place of `NewBloomFilterPolicy` to use Ribbon filters instead of Bloom, or `ribbonfilter` in place of `bloomfilter` in configuration string. Some small refactoring in db_stress. Removed/refactored unused code in db_bench, in part preparing for future default possibly being different from "disabled." Pull Request resolved: https://github.com/facebook/rocksdb/pull/8408 Test Plan: Lots of prior automated, ad-hoc, and "real world" testing. Updated tests for new API names. Quick db_bench test: bloom fillrandom 77730 ops/sec rocksdb.block.cache.filter.bytes.insert COUNT : 89929384 ribbon fillrandom 71492 ops/sec rocksdb.block.cache.filter.bytes.insert COUNT : 64531384 Reviewed By: mrambacher Differential Revision: D29140805 Pulled By: pdillinger fbshipit-source-id: d742c922722421678f95ad85eeb0aaebc9f5e49a	2021-06-17 12:29:16 -07:00
sdong	7f3a0f5bc6	db_stress: wait for compaction to finish after open with failure injection (#8270 ) Summary: When injecting in DB open, error can happen in background threads, causing DB open succeed, but DB is soon made read-only and subsequence writes will fail, which is not expected. To prevent it from happening, wait for compaction to finish before serving the traffic. If there is a failure, reopen. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8270 Test Plan: Run the test. Reviewed By: ajkr Differential Revision: D28230537 fbshipit-source-id: e2e97888904f9b9bb50c35ccf95b88c2319ef5c3	2021-05-05 16:41:45 -07:00
sdong	e19908cba6	Refactor kill point (#8241 ) Summary: Refactor kill point to one single class, rather than several extern variables. The intention was to drop unflushed data before killing to simulate some job, and I tried to a pointer to fault ingestion fs to the killing class, but it ended up with harder than I thought. Perhaps we'll need to do this in another way. But I thought the refactoring itself is good so I send it out. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8241 Test Plan: make release and run crash test for a while. Reviewed By: anand1976 Differential Revision: D28078486 fbshipit-source-id: f9182c1455f52e6851c13f88a21bade63bcec45f	2021-05-05 15:50:29 -07:00
Andrew Kryczka	0f42e50fec	Fix `GetLiveFiles()` returning OPTIONS-000000 (#8268 ) Summary: See release note in HISTORY.md. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8268 Test Plan: unit test repro Reviewed By: siying Differential Revision: D28227901 Pulled By: ajkr fbshipit-source-id: faf61d13b9e43a761e3d5dcf8203923126b51339	2021-05-05 12:54:46 -07:00
sdong	cde69a7cfd	db_stress to add --open_metadata_write_fault_one_in (#8235 ) Summary: DB Stress to add --open_metadata_write_fault_one_in which would randomly fail in some file metadata modification operations during DB Open, including file creation, close, renaming and directory sync. Some operations can fail before and after the operations take place. If DB open fails, db_stress would retry without the failure ingestion, and DB is expected to open successfully. This option is enabled in crash test in half of the time. Some follow up changes would allow write failures in open time, and ingesting those failures in non-DB open cases. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8235 Test Plan: Run stress tests for a while and see failures got triggered. This can reproduce the bug fixed by https://github.com/facebook/rocksdb/pull/8192 and a similar one that fails when fsyncing parent directory. Reviewed By: anand1976 Differential Revision: D28010944 fbshipit-source-id: 36a96da4dc3633e5f7680cef3ea0a900fcdb5558	2021-04-28 10:58:05 -07:00
Peter Dillinger	95f6add746	Revert Ribbon starting level support from #8198 (#8212 ) Summary: This partially reverts commit `10196d7edc`. The problem with this change is because of important filter use cases: FIFO compaction and SST writer. FIFO "compaction" always uses level 0 so would only use Ribbon filters if specifically including level 0 for the Ribbon filter policy. SST writer sets level_at_creation=-1 to indicate unknown level, and this would be treated the same as level 0 unless fixed. We are keeping the part about committing to permanent schema, which is only changes to API comments and HISTORY.md. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8212 Test Plan: CI Reviewed By: jay-zhuang Differential Revision: D27896468 Pulled By: pdillinger fbshipit-source-id: 50a775f7cba5d64fb729d9b982e355864020596e	2021-04-20 19:46:40 -07:00
Peter Dillinger	10196d7edc	Ribbon long-term support, starting level support (#8198 ) Summary: Since the Ribbon filter schema seems good (compatible back to 6.15.0), this change commits to long term support of the SST schema, even though we expect the API for enabling Ribbon to change (still called NewExperimentalRibbonFilterPolicy). This also adds support for "hybrid" configuration in which some levels use Bloom (higher levels, lower numbered) for speed and the rest use Ribbon (lower levels, higher numbered) for memory space efficiency. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8198 Test Plan: unit test added, crash test support Reviewed By: jay-zhuang Differential Revision: D27831232 Pulled By: pdillinger fbshipit-source-id: 90e528677689474d293ed6710b42ba89fbd5b5ab	2021-04-16 15:43:08 -07:00
Akanksha Mahajan	0be89e87fd	Enable backup/restore for Integrated BlobDB in stress and crash tests (#8165 ) Summary: Enable backup/restore functionality with Integrated BlobDB in db_stress and crash test. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8165 Test Plan: Ran python3 -u tools/db_crashtest.py --simple whitebox along with : 1. decreased "backup_in_one" value for backups to be more frequent and 2. manually changed code for "enable_blob_file" to be always true and apply blobdb params 100% for testing purpose. Reviewed By: ltamasi Differential Revision: D27636025 Pulled By: akankshamahajan15 fbshipit-source-id: 0d0e0d1479ced163f992872dc998e79c581bfc99	2021-04-07 17:57:24 -07:00
Peter Dillinger	879357fdb0	Make backups openable as read-only DBs (#8142 ) Summary: A current limitation of backups is that you don't know the exact database state of when the backup was taken. With this new feature, you can at least inspect the backup's DB state without restoring it by opening it as a read-only DB. Rather than add something like OpenAsReadOnlyDB to the BackupEngine API, which would inhibit opening stackable DB implementations read-only (if/when their APIs support it), we instead provide a DB name and Env that can be used to open as a read-only DB. Possible follow-up work: * Add a version of GetBackupInfo for a single backup. * Let CreateNewBackup return the BackupID of the newly-created backup. Implementation details: Refactored ChrootFileSystem to split off new base class RemapFileSystem, which allows more general remapping of files. We use this base class to implement BackupEngineImpl::RemapSharedFileSystem. To minimize API impact, I decided to just add these fields `name_for_open` and `env_for_open` to those set by GetBackupInfo when include_file_details=true. Creating the RemapSharedFileSystem adds a bit to the memory consumption, perhaps unnecessarily in some cases, but this has been mitigated by (a) only initialize the RemapSharedFileSystem lazily when GetBackupInfo with include_file_details=true is called, and (b) using the existing `shared_ptr<FileInfo>` objects to hold most of the mapping data. To enhance API safety, RemapSharedFileSystem is wrapped by new ReadOnlyFileSystem which rejects any attempts to write. This uncovered a couple of places in which DB::OpenForReadOnly would write to the filesystem, so I fixed these. Added a release note because this affects logging. Additional minor refactoring in backupable_db.cc to support the new functionality. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8142 Test Plan: new test (run with ASAN and UBSAN), added to stress test and ran it for a while with amplified backup_one_in Reviewed By: ajkr Differential Revision: D27535408 Pulled By: pdillinger fbshipit-source-id: 04666d310aa0261ef6b2385c43ca793ce1dfd148	2021-04-06 14:37:53 -07:00
Yanqin Jin	08144bc2f5	Add user-defined timestamps to db_stress (#8061 ) Summary: Add some basic test for user-defined timestamp to db_stress. Currently, read with timestamp always tries to read using the current timestamp. Due to the per-key timestamp-sequence ordering constraint, we only add timestamp- related tests to the `NonBatchedOpsStressTest` since this test serializes accesses to the same key and uses a file to cross-check data correctness. The timestamp feature is not supported in a number of components, e.g. Merge, SingleDelete, DeleteRange, CompactionFilter, Readonly instance, secondary instance, SST file ingestion, transaction, etc. Therefore, db_stress should exit if user enables both timestamp and these features at the same time. The (currently) incompatible features can be found in `CheckAndSetOptionsForUserTimestamp`. This PR also fixes a bug triggered when timestamp is enabled together with `index_type=kBinarySearchWithFirstKey`. This bug fix will also be in another separate PR with more unit tests coverage. Fixing it here because I do not want to exclude the index type from crash test. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8061 Test Plan: make crash_test_with_ts Reviewed By: jay-zhuang Differential Revision: D27056282 Pulled By: riversand963 fbshipit-source-id: c3e00ad1023fdb9ebbdf9601ec18270c5e2925a9	2021-03-23 05:13:30 -07:00
Levi Tamasi	0d800dadea	Adjust the set of potential min_blob_size values in stress/crash tests (#8085 ) Summary: Since our stress/crash tests by default generate values of size 8, 16, or 24, it does not make much sense to set `min_blob_size` to 256. The patch updates the set of potential `min_blob_size` values in the crash test script and in `db_stress` where it might be set dynamically using `SetOptions`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8085 Test Plan: Ran `make check` and tried the crash test script. Reviewed By: riversand963 Differential Revision: D27238620 Pulled By: ltamasi fbshipit-source-id: 4a96f9944b1ed9220d3045c5ab0b34c49009aeee	2021-03-22 14:38:09 -07:00
Peter Dillinger	3bfd3ed2f3	Begin forward compatibility for new backup meta schema (#8069 ) Summary: This does not add any new public APIs or published functionality, but adds the ability to read and use (and in tests, write) backups with a new meta file schema, based on the old schema but not forward-compatible (before this change). The new schema enables some capabilities not in the old: * Explicit versioning, so that users get clean error messages the next time we want to break forward compatibility. * Ignoring unrecognized fields (with warning), so that new non-critical features can be added without breaking forward compatibility. * Rejecting future "non-ignorable" fields, so that new features critical to some use-cases could potentially be added outside of linear schema versions, with broken forward compatibility. * Fields at the end of the meta file, such as for checksum of the meta file's contents (up to that point) * New optional 'size' field for each file, which is checked when present * Optionally omitting 'crc32' field, so that we aren't required to have a crc32c checksum for files to take a backup. (E.g. to support backup via hard links and to better support file custom checksums.) Because we do not have a JSON parser and to share code, the new schema is simply derived from the old schema. BackupEngine code is updated to allow missing checksums in some places, and to make that easier, `has_checksum` and `verify_checksum_after_work` are eliminated. Empty `checksum_hex` indicates checksum is unknown. I'm not too afraid of regressing on data integrity, because (a) we have pretty good test coverage of corruption detection in backups, and (b) we are increasingly relying on the DB itself for data integrity rather than it being an exclusive feature of backups. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8069 Test Plan: new unit tests, added to crash test (some local run with boosted backup probability) Reviewed By: ajkr Differential Revision: D27139824 Pulled By: pdillinger fbshipit-source-id: 9e0e4decfb42bb84783d64d2d246456d97e8e8c5	2021-03-19 20:15:40 -07:00
mrambacher	3dff28cf9b	Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033 ) Summary: For performance purposes, the lower level routines were changed to use a SystemClock* instead of a std::shared_ptr<SystemClock>. The shared ptr has some performance degradation on certain hardware classes. For most of the system, there is no risk of the pointer being deleted/invalid because the shared_ptr will be stored elsewhere. For example, the ImmutableDBOptions stores the Env which has a std::shared_ptr<SystemClock> in it. The SystemClock* within the ImmutableDBOptions is essentially a "short cut" to gain access to this constant resource. There were a few classes (PeriodicWorkScheduler?) where the "short cut" property did not hold. In those cases, the shared pointer was preserved. Using db_bench readrandom perf_level=3 on my EC2 box, this change performed as well or better than 6.17: 6.17: readrandom : 28.046 micros/op 854902 ops/sec; 61.3 MB/s (355999 of 355999 found) 6.18: readrandom : 32.615 micros/op 735306 ops/sec; 52.7 MB/s (290999 of 290999 found) PR: readrandom : 27.500 micros/op 871909 ops/sec; 62.5 MB/s (367999 of 367999 found) (Note that the times for 6.18 are prior to revert of the SystemClock). Pull Request resolved: https://github.com/facebook/rocksdb/pull/8033 Reviewed By: pdillinger Differential Revision: D27014563 Pulled By: mrambacher fbshipit-source-id: ad0459eba03182e454391b5926bf5cdd45657b67	2021-03-15 04:34:11 -07:00
Peter Dillinger	847ca9f964	Make default share_files_with_checksum=true (#8020 ) Summary: New comment for share_files_with_checksum: // Only used if share_table_files is set to true. Setting to false is // DEPRECATED and potentially dangerous because in that case BackupEngine // can lose data if backing up databases with distinct or divergent // history, for example if restoring from a backup other than the latest, // writing to the DB, and creating another backup. Setting to true (default) // prevents these issues by ensuring that different table files (SSTs) with // the same number are treated as distinct. See // share_files_with_checksum_naming and ShareFilesNaming. I have also removed interim option kFlagMatchInterimNaming, which is no longer needed and was never needed for correct+compatible operation (just performance). Pull Request resolved: https://github.com/facebook/rocksdb/pull/8020 Test Plan: tests updated. Backward+forward compatibility verified with SHORT_TEST=1 check_format_compatible.sh. ldb uses default backup options, and I manually verified shared_checksum in /tmp/rocksdb_format_compatible_peterd/bak/current/ after run. Reviewed By: ajkr Differential Revision: D26786331 Pulled By: pdillinger fbshipit-source-id: 36f968dfef1f5cacbd65154abe1d846151a55130	2021-03-09 16:27:13 -08:00
Yanqin Jin	1f11d07f24	Enable compact filter for blob in dbstress and dbbench (#8011 ) Summary: As title. Pull Request resolved: https://github.com/facebook/rocksdb/pull/8011 Test Plan: ``` ./db_bench -enable_blob_files=1 -use_keep_filter=1 -disable_auto_compactions=1 /db_stress -enable_blob_files=1 -enable_compaction_filter=1 -acquire_snapshot_one_in=0 -compact_range_one_in=0 -iterpercent=0 -test_batches_snapshots=0 -readpercent=10 -prefixpercent=20 -writepercent=55 -delpercent=15 -continuous_verification_interval=0 ``` Reviewed By: ltamasi Differential Revision: D26736061 Pulled By: riversand963 fbshipit-source-id: 1c7834903c28431ce23324c4f259ed71255614e2	2021-03-01 17:24:47 -08:00
Andrew Kryczka	d904233d2f	Limit buffering for collecting samples for compression dictionary (#7970 ) Summary: For dictionary compression, we need to collect some representative samples of the data to be compressed, which we use to either generate or train (when `CompressionOptions::zstd_max_train_bytes > 0`) a dictionary. Previously, the strategy was to buffer all the data blocks during flush, and up to the target file size during compaction. That strategy allowed us to randomly pick samples from as wide a range as possible that'd be guaranteed to land in a single output file. However, some users try to make huge files in memory-constrained environments, where this strategy can cause OOM. This PR introduces an option, `CompressionOptions::max_dict_buffer_bytes`, that limits how much data blocks are buffered before we switch to unbuffered mode (which means creating the per-SST dictionary, writing out the buffered data, and compressing/writing new blocks as soon as they are built). It is not strict as we currently buffer more than just data blocks -- also keys are buffered. But it does make a step towards giving users predictable memory usage. Related changes include: - Changed sampling for dictionary compression to select unique data blocks when there is limited availability of data blocks - Made use of `BlockBuilder::SwapAndReset()` to save an allocation+memcpy when buffering data blocks for building a dictionary - Changed `ParseBoolean()` to accept an input containing characters after the boolean. This is necessary since, with this PR, a value for `CompressionOptions::enabled` is no longer necessarily the final component in the `CompressionOptions` string. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7970 Test Plan: - updated `CompressionOptions` unit tests to verify limit is respected (to the extent expected in the current implementation) in various scenarios of flush/compaction to bottommost/non-bottommost level - looked at jemalloc heap profiles right before and after switching to unbuffered mode during flush/compaction. Verified memory usage in buffering is proportional to the limit set. Reviewed By: pdillinger Differential Revision: D26467994 Pulled By: ajkr fbshipit-source-id: 3da4ef9fba59974e4ef40e40c01611002c861465	2021-02-19 14:09:54 -08:00
Levi Tamasi	dab4fe5bcd	Add checkpoint support to BlobDB (#7959 ) Summary: The patch adds checkpoint support to BlobDB. Blob files are hard linked or copied, depending on whether the checkpoint directory is on the same filesystem or not, similarly to table files. TODO: Add support for blob files to `ExportColumnFamily` and to the checksum verification logic used by backup/restore. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7959 Test Plan: Ran `make check` and the crash test for a while. Reviewed By: riversand963 Differential Revision: D26434768 Pulled By: ltamasi fbshipit-source-id: 994be55a8dc08133028250760fca440d2c7c4dc5	2021-02-17 12:42:36 -08:00
Levi Tamasi	0288bdbc53	Add the integrated BlobDB to the stress/crash tests (#7900 ) Summary: The patch adds support for the options related to the new BlobDB implementation to `db_stress`, including support for dynamically adjusting them using `SetOptions` when `set_options_one_in` and a new flag `allow_setting_blob_options_dynamically` are specified. (The latter is used to prevent the options from being enabled when incompatible features are in use.) The patch also updates the `db_stress` help messages of the existing stacked BlobDB related options to clarify that they pertain to the old implementation. In addition, it adds the new BlobDB to the crash test script. In order to prevent a combinatorial explosion of jobs and still perform whitebox/blackbox testing (including under ASAN/TSAN/UBSAN), and to also test BlobDB in conjunction with atomic flush and transactions, the script sets the BlobDB options in 10% of normal/`cf_consistency`/`txn` crash test runs. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7900 Test Plan: Ran `make check` and `db_stress`/`db_crashtest.py` with various options. Reviewed By: jay-zhuang Differential Revision: D26094913 Pulled By: ltamasi fbshipit-source-id: c2ef3391a05e43a9687f24e297df05f4a5584814	2021-02-02 11:41:18 -08:00
Cheng Chang	30cd38c687	Increase the txn lock timeout in stress test (#7823 ) Summary: We recently encounter two cases of txn lock timeout in stress test. It might be caused due to latencies of resource scheduling in the internal infrastructure. Hopefully increasing the timeout can make the related tests less flaky. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7823 Test Plan: watch internal stress test to pass. Reviewed By: siying Differential Revision: D25739233 Pulled By: cheng-chang fbshipit-source-id: 84a5a8ae820db24dacd0cfc05928b26505fab89d	2020-12-30 20:31:35 -08:00
Zhichao Cao	601585bca4	fix memory leak in db_stress checkpoint test (#7813 ) Summary: fix memory leak in db_stress checkpoint test. If s is not ok, checkpoint is not deleted, may cause memory leak. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7813 Test Plan: make asan_check Reviewed By: cheng-chang Differential Revision: D25702999 Pulled By: zhichao-cao fbshipit-source-id: 08253b0852835acb8cfd412503cdabf720afb678	2020-12-25 13:15:48 -08:00
Zhichao Cao	04b3524ad0	Inject the random write error to stress test (#7653 ) Summary: Inject the random write error to stress test, it requires set reopen=0 and disable_wal=true. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7653 Test Plan: pass db_stress and python3 db_crashtest.py blackbox Reviewed By: ajkr Differential Revision: D25354132 Pulled By: zhichao-cao fbshipit-source-id: 44721104eecb416e27f65f854912c40e301dd669	2020-12-17 11:52:28 -08:00
Peter Dillinger	60af964372	Experimental (production candidate) SST schema for Ribbon filter (#7658 ) Summary: Added experimental public API for Ribbon filter: NewExperimentalRibbonFilterPolicy(). This experimental API will take a "Bloom equivalent" bits per key, and configure the Ribbon filter for the same FP rate as Bloom would have but ~30% space savings. (Note: optimize_filters_for_memory is not yet implemented for Ribbon filter. That can be added with no effect on schema.) Internally, the Ribbon filter is configured using a "one_in_fp_rate" value, which is 1 over desired FP rate. For example, use 100 for 1% FP rate. I'm expecting this will be used in the future for configuring Bloom-like filters, as I expect people to more commonly hold constant the filter accuracy and change the space vs. time trade-off, rather than hold constant the space (per key) and change the accuracy vs. time trade-off, though we might make that available. ### Benchmarking ``` $ ./filter_bench -impl=2 -quick -m_keys_total_max=200 -average_keys_per_filter=100000 -net_includes_hashing Building... Build avg ns/key: 34.1341 Number of filters: 1993 Total size (MB): 238.488 Reported total allocated memory (MB): 262.875 Reported internal fragmentation: 10.2255% Bits/key stored: 10.0029 ---------------------------- Mixed inside/outside queries... Single filter net ns/op: 18.7508 Random filter net ns/op: 258.246 Average FP rate %: 0.968672 ---------------------------- Done. (For more info, run with -legend or -help.) $ ./filter_bench -impl=3 -quick -m_keys_total_max=200 -average_keys_per_filter=100000 -net_includes_hashing Building... Build avg ns/key: 130.851 Number of filters: 1993 Total size (MB): 168.166 Reported total allocated memory (MB): 183.211 Reported internal fragmentation: 8.94626% Bits/key stored: 7.05341 ---------------------------- Mixed inside/outside queries... Single filter net ns/op: 58.4523 Random filter net ns/op: 363.717 Average FP rate %: 0.952978 ---------------------------- Done. (For more info, run with -legend or -help.) ``` 168.166 / 238.488 = 0.705 -> 29.5% space reduction 130.851 / 34.1341 = 3.83x construction time for this Ribbon filter vs. lastest Bloom filter (could make that as little as about 2.5x for less space reduction) ### Working around a hashing "flaw" bloom_test discovered a flaw in the simple hashing applied in StandardHasher when num_starts == 1 (num_slots == 128), showing an excessively high FP rate. The problem is that when many entries, on the order of number of hash bits or kCoeffBits, are associated with the same start location, the correlation between the CoeffRow and ResultRow (for efficiency) can lead to a solution that is "universal," or nearly so, for entries mapping to that start location. (Normally, variance in start location breaks the effective association between CoeffRow and ResultRow; the same value for CoeffRow is effectively different if start locations are different.) Without kUseSmash and with num_starts > 1 (thus num_starts ~= num_slots), this flaw should be completely irrelevant. Even with 10M slots, the chances of a single slot having just 16 (or more) entries map to it--not enough to cause an FP problem, which would be local to that slot if it happened--is 1 in millions. This spreadsheet formula shows that: =1/(10000000(1 - POISSON(15, 1, TRUE))) As kUseSmash==false (the setting for Standard128RibbonBitsBuilder) is intended for CPU efficiency of filters with many more entries/slots than kCoeffBits, a very reasonable work-around is to disallow num_starts==1 when !kUseSmash, by making the minimum non-zero number of slots 2kCoeffBits. This is the work-around I've applied. This also means that the new Ribbon filter schema (Standard128RibbonBitsBuilder) is not space-efficient for less than a few hundred entries. Because of this, I have made it fall back on constructing a Bloom filter, under existing schema, when that is more space efficient for small filters. (We can change this in the future if we want.) TODO: better unit tests for this case in ribbon_test, and probably update StandardHasher for kUseSmash case so that it can scale nicely to small filters. ### Other related changes * Add Ribbon filter to stress/crash test * Add Ribbon filter to filter_bench as -impl=3 * Add option string support, as in "filter_policy=experimental_ribbon:5.678;" where 5.678 is the Bloom equivalent bits per key. * Rename internal mode BloomFilterPolicy::kAuto to kAutoBloom * Add a general BuiltinFilterBitsBuilder::CalculateNumEntry based on binary searching CalculateSpace (inefficient), so that subclasses (especially experimental ones) don't have to provide an efficient implementation inverting CalculateSpace. * Minor refactor FastLocalBloomBitsBuilder for new base class XXH3pFilterBitsBuilder shared with new Standard128RibbonBitsBuilder, which allows the latter to fall back on Bloom construction in some extreme cases. * Mostly updated bloom_test for Ribbon filter, though a test like FullBloomTest::Schema is a next TODO to ensure schema stability (in case this becomes production-ready schema as it is). * Add some APIs to ribbon_impl.h for configuring Ribbon filters. Although these are reasonably covered by bloom_test, TODO more unit tests in ribbon_test * Added a "tool" FindOccupancyForSuccessRate to ribbon_test to get data for constructing the linear approximations in GetNumSlotsFor95PctSuccess. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7658 Test Plan: Some unit tests updated but other testing is left TODO. This is considered experimental but laying down schema compatibility as early as possible in case it proves production-quality. Also tested in stress/crash test. Reviewed By: jay-zhuang Differential Revision: D24899349 Pulled By: pdillinger fbshipit-source-id: 9715f3e6371c959d923aea8077c9423c7a9f82b8	2020-11-12 20:46:14 -08:00
Cheng Chang	c3911f1a72	Track WAL in MANIFEST: Track deleted WALs in MANIFEST after recovering from the WALs (#7649 ) Summary: After replaying the WALs, the memtables are flushed synchronously to L0 instead of being flushed in background. Currently, we only track WAL obsoletion events in the code path of background flush jobs. This PR tracks these events in RecoverLogFiles. After this change, we can enable `track_and_verify_wal_in_manifest` in `db_stress`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7649 Test Plan: `python tools/db_crashtest.py whitebox` Reviewed By: riversand963 Differential Revision: D24824501 Pulled By: cheng-chang fbshipit-source-id: 207129f7b845c50b333680ce6818a68a2fad54b9	2020-11-09 10:25:43 -08:00
Cheng Chang	5e794b0841	Fix a recovery corner case (#7621 ) Summary: Consider the following sequence of events: 1. Db flushed an SST with file number N, appended to MANIFEST, and tried to sync the MANIFEST. 2. Syncing MANIFEST failed and db crashed. 3. Db tried to recover with this MANIFEST. In the meantime, no entry about the newly-flushed SST was found in the MANIFEST. Therefore, RocksDB replayed WAL and tried to flush to an SST file reusing the same file number N. This failed because file system does not support overwrite. Then Db deleted this file. 4. Db crashed again. 5. Db tried to recover. When db read the MANIFEST, there was an entry referencing N.sst. This could happen probably because the append in step 1 finally reached the MANIFEST and became visible. Since N.sst had been deleted in step 3, recovery failed. It is possible that N.sst created in step 1 is valid. Although step 3 would still fail since the MANIFEST was not synced properly in step 1 and 2, deleting N.sst would make it impossible for the db to recover even if the remaining part of MANIFEST was appended and visible after step 5. After this PR, in step 3, immediately after recovering from MANIFEST, a new MANIFEST is created, then we find that N.sst is not referenced in the MANIFEST, so we delete it, and we'll not reuse N as file number. Then in step 5, since the new MANIFEST does not contain N.sst, the recovery failure situation in step 5 won't happen. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7621 Test Plan: 1. some tests are updated, because these tests assume that new MANIFEST is created after WAL recovery. 2. a new unit test is added in db_basic_test to simulate step 3. Reviewed By: riversand963 Differential Revision: D24668144 Pulled By: cheng-chang fbshipit-source-id: 90d7487fbad2bc3714f5ede46ea949895b15ae3b	2020-11-07 22:23:27 -08:00
Cheng Chang	1e40696dd1	Track WAL in MANIFEST: LogAndApply WAL events to MANIFEST (#7601 ) Summary: When a WAL is synced, an edit is written to MANIFEST. After flushing memtables, the obsoleted WALs are piggybacked to MANIFEST while writing the new L0 files to MANIFEST. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7601 Test Plan: `track_and_verify_wals_in_manifest` is enabled by default for all tests extending `DBBasicTest`, and in db_stress_test. Unit test `wal_edit_test`, `version_edit_test`, and `version_set_test` are also updated. Watch all tests to pass. Reviewed By: ltamasi Differential Revision: D24553957 Pulled By: cheng-chang fbshipit-source-id: 66a569ff1bdced38e22900bd240b73113906e040	2020-11-06 17:22:36 -08:00
Jay Zhuang	4a6840bd00	db_stress prints key in Hex (#7533 ) Summary: db_stress prints key in both id and hex. https://github.com/facebook/rocksdb/issues/7531 Pull Request resolved: https://github.com/facebook/rocksdb/pull/7533 Test Plan: local test Reviewed By: akankshamahajan15 Differential Revision: D24252725 Pulled By: jay-zhuang fbshipit-source-id: f0c1409a0568874df36949d5da139316d978fa98	2020-10-13 12:38:59 -07:00
Andrew Kryczka	75d3b6fdf0	Redesign block cache pinning API (#7520 ) Summary: The old flag-based APIs (`BlockBasedTableOptions::pin_l0_filter_and_index_blocks_in_cache` and `BlockBasedTableOptions::pin_top_level_index_and_filter`) were insufficient for our needs. For example, it was impossible to pin only unpartitioned meta-blocks, which could prevent block cache contention when turning on dictionary compression or during a migration to partitioned indexes/filters. It was also impossible to pin all meta-blocks in memory while having predictable memory usage via block cache. If we had continued adding flags to address these scenarios, they would have had significant overlap causing confusion. Instead, this PR deprecates the flags and starts a new API with non-overlapping options. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7520 Test Plan: - new unit test - added new options to stress/crash test and ran for a while: `$ python tools/db_crashtest.py blackbox --simple --max_key=1000000 -write_buffer_size=1048576 -target_file_size_base=1048576 -max_bytes_for_level_base=4194304 --interval=10 -value_size_mult=33 -column_families=1 -reopen=0` Reviewed By: pdillinger Differential Revision: D24200034 Pulled By: ajkr fbshipit-source-id: 3fa7cfc71e7960f7a867511dd6ae5834dd73b13e	2020-10-11 14:58:24 -07:00
sdong	aedcaaef99	Stress test to support paranoid_file_checks (#7473 ) Summary: It's important to make sure no false positive is reported when options.paranoid_file_checks is used. Add it to stress test and a place holder in crash test. It is disabled in crash test as there appears to be a bug causing false positive. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7473 Test Plan: Run crash test Reviewed By: ajkr Differential Revision: D24026939 fbshipit-source-id: 89102acb45cf041776775ce44a4eef4b0f3a380c	2020-09-30 14:41:33 -07:00
Peter Dillinger	b475a83f9d	Postponing custom checksum support in BackupEngine (#7411 ) Summary: This change reverts BackupEngine to 6.12 state to accommodate a higher-priority fix that does not easily merge with this custom checksum support. We intend to reinstate this support soon, by merging a revert of this change. For backupable_db_test, I've removed the tests depending on this feature. I've also removed relevant HISTORY.md entry. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7411 Test Plan: unit tests Reviewed By: ajkr Differential Revision: D23793835 Pulled By: pdillinger fbshipit-source-id: 7e861436539584799b13d1a8ae559b81b6d08052	2020-09-18 15:27:03 -07:00
Peter Dillinger	93719fc953	Restore file size in backup table file names (and other cleanup) (#7400 ) Summary: Prior to 6.12, backup files using share_files_with_checksum had the file size encoded in the file name, after the last '\_' and before the last '.'. We considered this an implementation detail subject to change, and indeed removed this information from the file name (with an option to use old behavior) because it was considered ineffective/inefficient for file name uniqueness. However, some downstream RocksDB users were relying on this information since the file size is not explicitly in the backup manifest file. This primary purpose of this change is "retrofitting" the 6.12 release (not yet a public release) to simultaneously support the benefits of the new naming scheme (I/O performance and data correctness at scale) and preserve the file size information, both as default behaviors. With this change, we are essentially making the file size information encoded in the file name an official, though obscure, extension of the backup meta file format. We preserve an option (kLegacyCrc32cAndFileSize) to use the original "legacy" naming scheme, with its caveats, and make it easy to omit the file size information (no kFlagIncludeFileSize), for more compact file names. But note that changing the naming scheme used on an existing db and backup directory can lead to transient space amplification, as some files will be stored under two names in the shared_checksum directory. Because some backups were saved using the original 6.12 naming scheme, we offer two ways of dealing with those files: SST files generated by older 6.12 versions can either use the default naming scheme in effect when the SST files were generated (kFlagMatchInterimNaming, default, no transient space amplification) or can use a new naming scheme (no kFlagMatchInterimNaming, potential space amplification because some already stored files getting a new name). We don't have a natural way to detect which files were generated by previous 6.12 versions, but this change hacks one in by changing DB session ids to now use a more concise encoding, reducing file name length, saving ~dozen bytes from SST files, and making them visually distinct from DB ids so that they are less likely to be mixed up. Two final auxiliary notes: Recognizing that the backup file names have become a de facto part of the backup meta schema, this change makes them easier to parse and extend by putting a distinct marker, 's', before DB session ids embedded in the name. When we extend this to allow custom checksums in the name, they can get their own marker to ensure safe parsing. For backward compatibility, file size does not get a marker but is assumed for `_[0-9]+[.]` Another change from initial 6.12 default behavior is never including file custom checksum in the file name. Looking ahead to 6.13, we do not want the default behavior to cause backup space amplification for someone turning on file custom checksum checking in BackupEngine; we want that to be an easy decision. When implemented, including file custom checksums in backup file names will be a non-default option. Actual file name patterns and priorities, as regexes: kLegacyCrc32cAndFileSize OR pre-6.12 SST file -> [0-9]+_[0-9]+_[0-9]+[.]sst kFlagMatchInterimNaming set (default) AND early 6.12 SST file -> [0-9]+_[0-9a-fA-F-]+[.]sst kUseDbSessionId AND NOT kFlagIncludeFileSize -> [0-9]+_s[0-9A-Z]{20}[.]sst kUseDbSessionId AND kFlagIncludeFileSize (default) -> [0-9]+_s[0-9A-Z]{20}_[0-9]+[.]sst We might add opt-in options for more '\_' separated data in the name, but embedded file size, if present, will always be after last '\_' and before '.sst'. This change was originally applied to version 6.12. (See https://github.com/facebook/rocksdb/issues/7390) Pull Request resolved: https://github.com/facebook/rocksdb/pull/7400 Test Plan: unit tests included. Sync point callbacks are used to mimic previous version SST files. Reviewed By: ajkr Differential Revision: D23759587 Pulled By: pdillinger fbshipit-source-id: f62d8af4e0978de0a34f26288cfbe66049b70025	2020-09-17 10:24:22 -07:00
Peter Dillinger	a0ac71aae1	Disable sst_file_manager in stress testing backup restore (#7384 ) Summary: This is potentially the cause of failures: Failure in Destroy restore dir with: IO error: file rmdir: /dev/shm/rocksdb/rocksdb_crashtest_whitebox/.restore13: Directory not empty Pull Request resolved: https://github.com/facebook/rocksdb/pull/7384 Test Plan: smoke test blackbox_crash_test Reviewed By: jay-zhuang Differential Revision: D23685087 Pulled By: pdillinger fbshipit-source-id: 55f62e9853ce84be1d5ca7d856de867f0f2596ee	2020-09-14 14:21:06 -07:00
Peter Dillinger	c4e2066dbd	Fix cf_consistency_stress for backup/restore, harmonize (#7373 ) Summary: We can only check key on restored backup if in a stress test configuration locking the key. (Fixes mismatch seen in backup/restore with atomic flush.) TestCheckpoint used a very ugly solution to the same problem: copy-paste dozens of lines of code with some changes and removals. I removed the unnecessary implementation and made the existing one simply adaptive, like TestBackupRestore. Also made TestBackupRestore clean up dead backup/restore directories on success. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7373 Test Plan: blackbox_crash_test_with_atomic_flush for a while, blackbox_crash_test for a while, with backup and checkpoint 1 in 5k and only 1k max_keys to stress this area Reviewed By: ajkr Differential Revision: D23629057 Pulled By: pdillinger fbshipit-source-id: d7fe7e2be75aaf3cf974be9540a7c5c5de8b371b	2020-09-10 22:55:06 -07:00
Peter Dillinger	e3149358a5	More backup/restore stress test fixes (#7361 ) Summary: (a) Missed a case in updating handling of rand_keys (b) Only opening restored db with DB::Open so don't (yet) attempt to open restored BlobDB or TransactionDB. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7361 Test Plan: better than being broken Reviewed By: ajkr Differential Revision: D23592570 Pulled By: pdillinger fbshipit-source-id: dd1d999bcc0c852ee77cb6041964ec4abc0fd4fd	2020-09-09 11:19:05 -07:00
Peter Dillinger	4e258d3e63	Fix backup/restore in stress/crash test (#7357 ) Summary: (1) Skip check on specific key if restoring an old backup (small minority of cases) because it can fail in those cases. (2) Remove an old assertion about number of column families and number of keys passed in, which is broken by atomic flush (cf_consistency) test. Like other code (for better or worse) assume a single key and iterate over column families. (3) Apply mock_direct_io to NewSequentialFile so that db_stress backup works on /dev/shm. Also add more context to output in case of backup/restore db_stress failure. Also a minor fix to BackupEngine to report first failure status in creating new backup, and drop another clue about the potential source of a "Backup failed" status. Reverts "Disable backup/restore stress test (https://github.com/facebook/rocksdb/issues/7350)" Pull Request resolved: https://github.com/facebook/rocksdb/pull/7357 Test Plan: Using backup_one_in=10000, "USE_CLANG=1 make crash_test_with_atomic_flush" for 30+ minutes "USE_CLANG=1 make blackbox_crash_test" for 30+ minutes And with use_direct_reads with TEST_TMPDIR=/dev/shm/rocksdb Reviewed By: riversand963 Differential Revision: D23567244 Pulled By: pdillinger fbshipit-source-id: e77171c2e8394d173917e36898c02dead1c40b77	2020-09-08 10:50:19 -07:00
Peter Dillinger	06ad5dd293	Add file checksum to stress/crash test (#7343 ) Summary: This change has the crash test randomly select from a few file checksum implementations, or nullptr, for DB file_checksum_gen_factory. For compatibility across runs on same DB, each non-null factory can understand all the other functions, but the default changes. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7343 Test Plan: 'make blackbox_crash_test' for a while, including with some debug output to ensure code is being exercised. Reviewed By: zhichao-cao Differential Revision: D23494580 Pulled By: pdillinger fbshipit-source-id: 73bbc7ca32c1adaf619134c0c830f12894880b8a	2020-09-03 23:50:33 -07:00
Peter Dillinger	499c9448d0	Fix, enable, and enhance backup/restore in db_stress (#7348 ) Summary: Although added to db_stress, testing of backup/restore was never integrated into the crash test, originally concerned about performance. I've enabled it now and to address the peformance concern, testing backup/restore is always skipped once the db exceeds a certain size threshold, default 100MB. This should provide sufficient opportunity for testing BackupEngine without bogging down everything else with heavier and heavier operations. Also fixed backup/restore in db_stress by making sure PurgeOldBackups can remove manifest files, which are normally kept around for db_stress. Added more coverage of backup options, and up to three backups being saved in one backup directory (in some cases). Pull Request resolved: https://github.com/facebook/rocksdb/pull/7348 Test Plan: ran 'make blackbox_crash_test' for a while, with heightened probabilitly of taking backups (1/10k). Also confirmed with some debug output that the code is being covered, TestBackupRestore only takes a few seconds to complete when triggered, and even at 1/10k and ~50MB database, there's <,~ 1 thread testing backups at any time. Reviewed By: ajkr Differential Revision: D23510835 Pulled By: pdillinger fbshipit-source-id: b6b8735591808141f81f10773ac31634cf03b6c0	2020-09-03 20:13:15 -07:00
Andrew Kryczka	7eebe6d38a	Mark files for compaction in stress/crash tests (#7231 ) Summary: The mechanism to mark files for compaction is most commonly used in delete-triggered compaction. This PR adds an option to exercise the marking mechanism on random files created by db_stress. This PR also enables that option in db_crashtest.py on its db_stress runs at random. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7231 Test Plan: - ran some minified crash tests; verified they succeed and we see `"compaction_reason": "FilesMarkedForCompaction"` regularly in the logs. ``` $ TEST_TMPDIR=/dev/shm python tools/db_crashtest.py blackbox --duration=600 --interval=30 --max_key=10000000 --write_buffer_size=1048576 --target_file_size_base=1048576 --max_bytes_for_level_base=4194304 --value_size_mult=33 $ TEST_TMPDIR=/dev/shm python tools/db_crashtest.py whitebox --duration=600 --interval=30 --max_key=1000000 --write_buffer_size=1048576 --target_file_size_base=1048576 --max_bytes_for_level_base=4194304 --value_size_mult=33 --random_kill_odd=8887 ``` Reviewed By: anand1976 Differential Revision: D23025156 Pulled By: ajkr fbshipit-source-id: a404c467ebc12afa94dae35956ea9b372f592a96	2020-08-10 16:17:56 -07:00
Jay Zhuang	0c5bb10f06	Remove redundant ROCKSDB_LITE check (#7172 ) Summary: It's already inside of a `#ifdef ROCKSDB_LITE` block. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7172 Reviewed By: gg814 Differential Revision: D22736057 Pulled By: jay-zhuang fbshipit-source-id: 31f4aa05aba98e2e42fa6f890fa72acf3a0f12f2	2020-07-24 14:14:14 -07:00
Jay Zhuang	fc4d5f5065	Add stress test for GetProperty (#7111 ) Summary: Add stress test coverage for `DB::GetProperty()`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7111 Test Plan: ``` ./db_stress -get_property_one_in=1 make crash_test ``` Reviewed By: ajkr Differential Revision: D22487906 Pulled By: jay-zhuang fbshipit-source-id: c118d95cc9b4e2fa669a06e6aa531541fa885dc5	2020-07-14 12:12:36 -07:00
mrambacher	c7c7b07f06	More Makefile Cleanup (#7097 ) Summary: Cleans up some of the dependencies on test code in the Makefile while building tools: - Moves the test::RandomString, DBBaseTest::RandomString into Random - Moves the test::RandomHumanReadableString into Random - Moves the DestroyDir method into file_utils - Moves the SetupSyncPointsToMockDirectIO into sync_point. - Moves the FaultInjection Env and FS classes under env These changes allow all of the tools to build without dependencies on test_util, thereby simplifying the build dependencies. By moving the FaultInjection code, the dependency in db_stress on different libraries for debug vs release was eliminated. Tested both release and debug builds via Make and CMake for both static and shared libraries. More work remains to clean up how the tools are built and remove some unnecessary dependencies. There is also more work that should be done to get the Makefile and CMake to align in their builds -- what is in the libraries and the sizes of the executables are different. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7097 Reviewed By: riversand963 Differential Revision: D22463160 Pulled By: pdillinger fbshipit-source-id: e19462b53324ab3f0b7c72459dbc73165cc382b2	2020-07-09 14:35:17 -07:00
Yanqin Jin	842bd2742a	Running ./ldb without any extra arg print usage (#7107 ) Summary: as title. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7107 Test Plan: make ldb && ./ldb Reviewed By: pdillinger Differential Revision: D22451399 Pulled By: riversand963 fbshipit-source-id: 797645e06473bb9cf139c533877e5161281515e8	2020-07-09 10:20:06 -07:00
Jay Zhuang	ca7659e2c4	Fix release build caused by #7067 (#7077 ) Summary: The issue is introduced by https://github.com/facebook/rocksdb/issues/7067 Pull Request resolved: https://github.com/facebook/rocksdb/pull/7077 Test Plan: `make release` Reviewed By: pdillinger Differential Revision: D22370835 Pulled By: jay-zhuang fbshipit-source-id: 44326bae07809c4518371b6a7d1f47124e24a4f3	2020-07-02 20:53:08 -07:00
Jay Zhuang	00de699096	Replace reinterpret_cast with static_cast_with_check (#7067 ) Summary: Replace `reinterpret_cast` with `static_cast_with_check` for `DBImpl` and `ColumnFamilyHandleImpl`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7067 Reviewed By: siying Differential Revision: D22361587 Pulled By: jay-zhuang fbshipit-source-id: dfe9e8f3af39c3d27cc372c55ab9ad905eb0a5a1	2020-07-02 19:25:41 -07:00
Cheng Chang	f045ee6422	Increase transaction timeout and enable deadlock detection in stress test (#7056 ) Summary: There are errors like `Transaction put: Operation timed out: Timeout waiting to lock key terminate called without an active exception`, based on experiment on devserver, increasing timeouts can resolve the issue. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7056 Test Plan: watch stress test with txn. Reviewed By: anand1976 Differential Revision: D22317265 Pulled By: cheng-chang fbshipit-source-id: 2dc3352def5e78d2c39a18d7262a3a65ca98bbba	2020-06-30 14:29:17 -07:00
sdong	2d1d51d385	db_stress: deep clean directory before checkpoint (#7039 ) Summary: We see crash test occassionally fails with "A checkpoint operation failed with: Invalid argument: Directory exists". The suspicious is that the directory fails to be deleted because some trash files. Deep clean the directory after a DestroyDB() call. Also add more debugging printf in case it fails. Also, preserve the DB if verification fails. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7039 Test Plan: Run db_stress with low --checkpoint_one_in value Reviewed By: riversand963 Differential Revision: D22271694 fbshipit-source-id: 6a9b2abb664fc69a4dc666741df4f6b23703cd6d	2020-06-30 12:01:34 -07:00
Peter Dillinger	5b2bbacb6f	Minimize memory internal fragmentation for Bloom filters (#6427 ) Summary: New experimental option BBTO::optimize_filters_for_memory builds filters that maximize their use of "usable size" from malloc_usable_size, which is also used to compute block cache charges. Rather than always "rounding up," we track state in the BloomFilterPolicy object to mix essentially "rounding down" and "rounding up" so that the average FP rate of all generated filters is the same as without the option. (YMMV as heavily accessed filters might be unluckily lower accuracy.) Thus, the option near-minimizes what the block cache considers as "memory used" for a given target Bloom filter false positive rate and Bloom filter implementation. There are no forward or backward compatibility issues with this change, though it only works on the format_version=5 Bloom filter. With Jemalloc, we see about 10% reduction in memory footprint (and block cache charge) for Bloom filters, but 1-2% increase in storage footprint, due to encoding efficiency losses (FP rate is non-linear with bits/key). Why not weighted random round up/down rather than state tracking? By only requiring malloc_usable_size, we don't actually know what the next larger and next smaller usable sizes for the allocator are. We pick a requested size, accept and use whatever usable size it has, and use the difference to inform our next choice. This allows us to narrow in on the right balance without tracking/predicting usable sizes. Why not weight history of generated filter false positive rates by number of keys? This could lead to excess skew in small filters after generating a large filter. Results from filter_bench with jemalloc (irrelevant details omitted): (normal keys/filter, but high variance) $ ./filter_bench -quick -impl=2 -average_keys_per_filter=30000 -vary_key_count_ratio=0.9 Build avg ns/key: 29.6278 Number of filters: 5516 Total size (MB): 200.046 Reported total allocated memory (MB): 220.597 Reported internal fragmentation: 10.2732% Bits/key stored: 10.0097 Average FP rate %: 0.965228 $ ./filter_bench -quick -impl=2 -average_keys_per_filter=30000 -vary_key_count_ratio=0.9 -optimize_filters_for_memory Build avg ns/key: 30.5104 Number of filters: 5464 Total size (MB): 200.015 Reported total allocated memory (MB): 200.322 Reported internal fragmentation: 0.153709% Bits/key stored: 10.1011 Average FP rate %: 0.966313 (very few keys / filter, optimization not as effective due to ~59 byte internal fragmentation in blocked Bloom filter representation) $ ./filter_bench -quick -impl=2 -average_keys_per_filter=1000 -vary_key_count_ratio=0.9 Build avg ns/key: 29.5649 Number of filters: 162950 Total size (MB): 200.001 Reported total allocated memory (MB): 224.624 Reported internal fragmentation: 12.3117% Bits/key stored: 10.2951 Average FP rate %: 0.821534 $ ./filter_bench -quick -impl=2 -average_keys_per_filter=1000 -vary_key_count_ratio=0.9 -optimize_filters_for_memory Build avg ns/key: 31.8057 Number of filters: 159849 Total size (MB): 200 Reported total allocated memory (MB): 208.846 Reported internal fragmentation: 4.42297% Bits/key stored: 10.4948 Average FP rate %: 0.811006 (high keys/filter) $ ./filter_bench -quick -impl=2 -average_keys_per_filter=1000000 -vary_key_count_ratio=0.9 Build avg ns/key: 29.7017 Number of filters: 164 Total size (MB): 200.352 Reported total allocated memory (MB): 221.5 Reported internal fragmentation: 10.5552% Bits/key stored: 10.0003 Average FP rate %: 0.969358 $ ./filter_bench -quick -impl=2 -average_keys_per_filter=1000000 -vary_key_count_ratio=0.9 -optimize_filters_for_memory Build avg ns/key: 30.7131 Number of filters: 160 Total size (MB): 200.928 Reported total allocated memory (MB): 200.938 Reported internal fragmentation: 0.00448054% Bits/key stored: 10.1852 Average FP rate %: 0.963387 And from db_bench (block cache) with jemalloc: $ ./db_bench -db=/dev/shm/dbbench.no_optimize -benchmarks=fillrandom -format_version=5 -value_size=90 -bloom_bits=10 -num=2000000 -threads=8 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=false $ ./db_bench -db=/dev/shm/dbbench -benchmarks=fillrandom -format_version=5 -value_size=90 -bloom_bits=10 -num=2000000 -threads=8 -optimize_filters_for_memory -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=false $ (for FILE in /dev/shm/dbbench.no_optimize/.sst; do ./sst_dump --file=$FILE --show_properties \| grep 'filter block' ; done) \| awk '{ t += $4; } END { print t; }' 17063835 $ (for FILE in /dev/shm/dbbench/.sst; do ./sst_dump --file=$FILE --show_properties \| grep 'filter block' ; done) \| awk '{ t += $4; } END { print t; }' 17430747 $ #^ 2.1% additional filter storage $ ./db_bench -db=/dev/shm/dbbench.no_optimize -use_existing_db -benchmarks=readrandom,stats -statistics -bloom_bits=10 -num=2000000 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=false -duration=10 -cache_index_and_filter_blocks -cache_size=1000000000 rocksdb.block.cache.index.add COUNT : 33 rocksdb.block.cache.index.bytes.insert COUNT : 8440400 rocksdb.block.cache.filter.add COUNT : 33 rocksdb.block.cache.filter.bytes.insert COUNT : 21087528 rocksdb.bloom.filter.useful COUNT : 4963889 rocksdb.bloom.filter.full.positive COUNT : 1214081 rocksdb.bloom.filter.full.true.positive COUNT : 1161999 $ #^ 1.04 % observed FP rate $ ./db_bench -db=/dev/shm/dbbench -use_existing_db -benchmarks=readrandom,stats -statistics -bloom_bits=10 -num=2000000 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=false -optimize_filters_for_memory -duration=10 -cache_index_and_filter_blocks -cache_size=1000000000 rocksdb.block.cache.index.add COUNT : 33 rocksdb.block.cache.index.bytes.insert COUNT : 8448592 rocksdb.block.cache.filter.add COUNT : 33 rocksdb.block.cache.filter.bytes.insert COUNT : 18220328 rocksdb.bloom.filter.useful COUNT : 5360933 rocksdb.bloom.filter.full.positive COUNT : 1321315 rocksdb.bloom.filter.full.true.positive COUNT : 1262999 $ #^ 1.08 % observed FP rate, 13.6% less memory usage for filters (Due to specific key density, this example tends to generate filters that are "worse than average" for internal fragmentation. "Better than average" cases can show little or no improvement.) Pull Request resolved: https://github.com/facebook/rocksdb/pull/6427 Test Plan: unit test added, 'make check' with gcc, clang and valgrind Reviewed By: siying Differential Revision: D22124374 Pulled By: pdillinger fbshipit-source-id: f3e3aa152f9043ddf4fae25799e76341d0d8714e	2020-06-22 13:32:07 -07:00
Peter Dillinger	88b4210701	Remove racially charged terms "whitelist" and "blacklist" (#7008 ) Summary: We don't need them. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7008 Test Plan: "make check" and ensure "make crash_test" starts Reviewed By: ajkr Differential Revision: D22143838 Pulled By: pdillinger fbshipit-source-id: 72c8e16603abc59f4954e304466bc4dc1f58f94e	2020-06-19 15:27:32 -07:00
Andrew Kryczka	775dc623ad	add `CompactionFilter` to stress/crash tests (#6988 ) Summary: Added a `CompactionFilter` that is aware of the stress test's expected state. It only drops key versions that are already covered according to the expected state. It is incompatible with snapshots (same as all `CompactionFilter`s), so disables all snapshot-related features when used in the crash test. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6988 Test Plan: running a minified blackbox crash test ``` $ TEST_TMPDIR=/dev/shm python tools/db_crashtest.py blackbox --max_key=1000000 -write_buffer_size=1048576 -max_bytes_for_level_base=4194304 -target_file_size_base=1048576 -value_size_mult=33 --interval=10 --duration=3600 ``` Reviewed By: anand1976 Differential Revision: D22072888 Pulled By: ajkr fbshipit-source-id: 727b9d7a90d5eab18be0ec6cd5a810712ac13320	2020-06-18 09:54:55 -07:00
Yanqin Jin	15d9f28da5	Add stress test for best-efforts recovery (#6819 ) Summary: Add crash test for the case of best-efforts recovery. After a certain amount of time, we kill the db_stress process, randomly delete some certain table files and restart db_stress. Given the randomness of file deletion, it is difficult to verify against a reference for data correctness. Therefore, we just check that the db can restart successfully. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6819 Test Plan: ``` ./db_stress -best_efforts_recovery=true -disable_wal=1 -reopen=0 ./db_stress -best_efforts_recovery=true -disable_wal=0 -skip_verifydb=1 -verify_db_one_in=0 -continuous_verification_interval=0 make crash_test_with_best_efforts_recovery ``` Reviewed By: anand1976 Differential Revision: D21436753 Pulled By: riversand963 fbshipit-source-id: 0b3605c922a16c37ed17d5ab6682ca4240e47926	2020-06-12 19:27:46 -07:00
Ziyue Yang	e619a20e93	Add an option for parallel compression in for db_stress (#6722 ) Summary: This commit adds an `compression_parallel_threads` option in db_stress. It also fixes the naming of parallel compression option in db_bench to keep it aligned with others. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6722 Reviewed By: pdillinger Differential Revision: D21091385 fbshipit-source-id: c9ba8c4e5cc327ff9e6094a6dc6a15fcff70f100	2020-04-30 10:49:07 -07:00
sdong	73523baeb1	crash_test to cover options.avoid_flush_during_recovery (#6712 ) Summary: Options.avoid_flush_during_recovery is uncovered in crash_test. Add the coverage with a chance of 1/8, as it is a less frequently used options. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6712 Test Plan: Run crash_test and see the option can be used or not used by chance. Reviewed By: ltamasi Differential Revision: D21056566 fbshipit-source-id: c3b1521517cfc204786e6ef8c6acd7fffda64793	2020-04-16 12:11:45 -07:00
Yueh-Hsuan Chiang	5801af4646	Add env_fault_injection argument to db_stress (#6687 ) Summary: Add env_fault_injection argument to db_stress. When enabled, FaultInjectionTestEnv will be used instead. Currently this option does not support running with other env setting. This will allow us to later manually produce error when running db_crashtest. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6687 Test Plan: make db_stress -j32 ./db_stress --env_fault_injection ./db_stress --env_fault_injection --hdfs // expect error message Reviewed By: ajkr Differential Revision: D21014683 Pulled By: yhchiang fbshipit-source-id: 0724aeac37efd57adb72a37defe6dbd3bfa8106a	2020-04-16 11:13:44 -07:00
anand76	5c19a441c4	Fault injection in db_stress (#6538 ) Summary: This PR implements a fault injection mechanism for injecting errors in reads in db_stress. The FaultInjectionTestFS is used for this purpose. A thread local structure is used to track the errors, so that each db_stress thread can independently enable/disable error injection and verify observed errors against expected errors. This is initially enabled only for Get and MultiGet, but can be extended to iterator as well once its proven stable. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6538 Test Plan: crash_test make check Reviewed By: riversand963 Differential Revision: D20714347 Pulled By: anand1976 fbshipit-source-id: d7598321d4a2d72bda0ced57411a337a91d87dc7	2020-04-10 17:21:26 -07:00
Levi Tamasi	217ce20021	Remove GetSortedWalFiles/GetCurrentWalFile from the crash test (#6491 ) Summary: Currently, `db_stress` tests a randomly picked one of `GetLiveFiles`, `GetSortedWalFiles`, and `GetCurrentWalFile` with a 1/N chance when the command line parameter `get_live_files_and_wal_files_one_in` is specified. The problem is that `GetSortedWalFiles` and `GetCurrentWalFile` are unreliable in the sense that they can return errors if another thread removes a WAL file while they are executing (which is a perfectly plausible and legitimate scenario). The patch splits this command line parameter into three (one for each API), and changes the crash test script so that only `GetLiveFiles` is tested during our continuous crash test runs. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6491 Test Plan: ``` make check python tools/db_crashtest.py whitebox ``` Reviewed By: siying Differential Revision: D20312200 Pulled By: ltamasi fbshipit-source-id: e7c3481eddfe3bd3d5349476e34abc9eee5b7dc8	2020-03-18 17:14:15 -07:00
Yanqin Jin	58918d4ccc	Use correct Env for DestroyDB in stress test (#6539 ) Summary: When using custom Env, trying to call DestroyDB() with default Options will fail. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6539 Test Plan: ./db_stress Differential Revision: D20476204 Pulled By: riversand963 fbshipit-source-id: 612c6754660cc9b5bb3e9c2dbb2f6ecd7f648797	2020-03-16 16:57:48 -07:00
Andrew Kryczka	f52db84650	support SstFileManager in db_stress (#6454 ) Summary: Add some flags for configuring an SstFileManager. An SstFileManager is only created when one or more of these flags are set. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6454 Test Plan: - ran it a while: ``` $ python ./tools/db_crashtest.py blackbox --simple -max_key=100000 -write_buffer_size=131072 -target_file_size_base=131072 -max_bytes_for_level_base=524288 -value_size_mult=33 --interval=10 -max_background_compactions=4 -max_background_flushes=2 -sst_file_manager_bytes_per_sec=1048576 ``` - verified with strace the SstFileManager is behaving as configured: ``` $ strace -fp `pidof db_stress` -e ftruncate,unlink ... [pid 3074805] ftruncate(9</tmp/rocksdb_crashtest_blackbox6OJywh/000070.sst.trash>, 67423) = 0 [pid 3074805] ftruncate(9</tmp/rocksdb_crashtest_blackbox6OJywh/000070.sst.trash>, 51039) = 0 [pid 3074805] ftruncate(9</tmp/rocksdb_crashtest_blackbox6OJywh/000070.sst.trash>, 34655) = 0 [pid 3074805] ftruncate(9</tmp/rocksdb_crashtest_blackbox6OJywh/000070.sst.trash>, 18271) = 0 [pid 3074805] ftruncate(9</tmp/rocksdb_crashtest_blackbox6OJywh/000070.sst.trash>, 1887) = 0 [pid 3074805] unlink("/tmp/rocksdb_crashtest_blackbox6OJywh/000070.sst.trash") = 0 ... ``` Differential Revision: D20103315 Pulled By: ajkr fbshipit-source-id: b3e1092747157459d244b047947a979b85c98f48	2020-02-25 16:45:30 -08:00
sdong	fdf882ded2	Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433 ) Summary: When dynamically linking two binaries together, different builds of RocksDB from two sources might cause errors. To provide a tool for user to solve the problem, the RocksDB namespace is changed to a flag which can be overridden in build time. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6433 Test Plan: Build release, all and jtest. Try to build with ROCKSDB_NAMESPACE with another flag. Differential Revision: D19977691 fbshipit-source-id: aa7f2d0972e1c31d75339ac48478f34f6cfcfb3e	2020-02-20 12:09:57 -08:00
Maysam Yabandeh	01ab882ba3	Fix release warning for unused bg_canceled Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6357 Differential Revision: D19670931 Pulled By: maysamyabandeh fbshipit-source-id: d528c4c7f9450f1f38b9d2a36e0d5d0865b39be9	2020-01-31 15:09:10 -08:00
Maysam Yabandeh	2243030bc5	Cancel bg jobs before deleting WritePrepared DB in stress tests (#6355 ) Summary: Background jobs in WritePrepared DB might access the db via a snapshot checker callback. The stress tests therefore should cancel background jobs before deleting the db in ::Reopen. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6355 Differential Revision: D19664132 Pulled By: maysamyabandeh fbshipit-source-id: 6060a830e8aad0015c10448286ad37c8a346ac01	2020-01-31 10:29:11 -08:00
Burton Li	c9a5e48762	fix build warnnings on MSVC (#6309 ) Summary: Fix build warnings on MSVC. siying Pull Request resolved: https://github.com/facebook/rocksdb/pull/6309 Differential Revision: D19455012 Pulled By: ltamasi fbshipit-source-id: 940739f2c92de60e47cc2bed8dd7f921459545a9	2020-01-30 16:07:26 -08:00
sdong	8f2bee6747	Add ReadOptions.auto_prefix_mode (#6314 ) Summary: Add a new option ReadOptions.auto_prefix_mode. When set to true, iterator should return the same result as total order seek, but may choose to do prefix seek internally, based on iterator upper bounds. Also fix two previous bugs when handling prefix extrator changes: (1) reverse iterator should not rely on upper bound to determine prefix. Fix it with skipping prefix check. (2) block-based filter is not handled properly. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6314 Test Plan: (1) add a unit test; (2) add the check to stress test and run see whether it can pass at least one run. Differential Revision: D19458717 fbshipit-source-id: 51c1bcc5cdd826c2469af201979a39600e779bce	2020-01-28 14:44:05 -08:00
sdong	1244abef66	Stress Test: relax prefix iterator check condition (#6269 ) Summary: Right now, when validating prefix iterator, if control iterator is invalidate but prefix iterator shows value, we determine it as a test failure. However, this fails to consider the case where a file or memtable containing a tombstone is filtered out by a prefix bloom filter. The fix is to relax the check in this case. If we are out of prefix range, then ignore the check. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6269 Test Plan: Run crash_test for a short while and it still passes. Differential Revision: D19317594 fbshipit-source-id: b964a1cdc1df5efe439d4b32f8023e1fbc8598c1	2020-01-08 13:32:06 -08:00
Maysam Yabandeh	f4a378be3e	Print out non-ok DB::Open status in db_stress (#6272 ) Summary: The crash test is failing with non-ok status after TransactionDB::Open. This patch adds more debugging information. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6272 Differential Revision: D19314527 Pulled By: maysamyabandeh fbshipit-source-id: d45ecb0f2144e052fb4b5fdd483150440991a3b4	2020-01-08 12:10:55 -08:00
Maysam Yabandeh	7c98d71567	Print before AddErrors in stress tests (#6256 ) Summary: Stress tests count number of errors and report them at the end. Not all the cases are accompanied with a log line which makes debugging difficult. The patch adds a log line to the remaining cases. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6256 Differential Revision: D19268785 Pulled By: maysamyabandeh fbshipit-source-id: bdabcaa5c5c7edcb4ce4f25e38fd8a3fd9c7700b	2020-01-02 16:46:23 -08:00
Maysam Yabandeh	48a678b7c9	Prevent an incompatible combination of options (#6254 ) Summary: allow_concurrent_memtable_write is incompatible with non-zero max_successive_merges. Although we check this at runtime, we currently don't prevent the user from setting this combination in options. This has led to stress tests to fail with this combination is tried in ::SetOptions. The patch fixes that. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6254 Differential Revision: D19265819 Pulled By: maysamyabandeh fbshipit-source-id: 47f2e2dc26fe0972c7152f4da15dadb9703f1179	2020-01-02 16:15:06 -08:00
anand76	d4da412864	Add Transaction::MultiGet to db_stress (#6227 ) Summary: Call Transaction::MultiGet from TestMultiGet() in db_stress. We add some Puts/Merges/Deletes into the transaction in order to exercise MultiGetFromBatchAndDB. There is no data verification on read, just check status. There is currently no read data verification in db_stress as it requires synchronization with writes. It needs to be tackled separately. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6227 Test Plan: make crash_test_with_txn Differential Revision: D19204611 Pulled By: anand1976 fbshipit-source-id: 770d0e30d002e88626c264c58103f1d709bb060c	2019-12-20 23:12:51 -08:00
sdong	79cc8dc29b	db_stress: cover approximate size (#6213 ) Summary: db_stress to execute DB::GetApproximateSizes() with randomized keys and options. Return value is not validated but error will be reported. Two ways to generate the range keys: (1) two random keys; (2) a small range. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6213 Test Plan: (1) run "make crash_test" for a while; (2) hack the code to ingest some errors to see it is reported. Differential Revision: D19204665 fbshipit-source-id: 652db36f13bcb5a3bd8fe4a10c0aa22a77a0bce2	2019-12-20 21:43:35 -08:00
sdong	338c149b92	crash_test to cover bottommost compression and some other changes (#6215 ) Summary: Several improvements to crash_test/stress_test: (1) Stress_test to support an parameter of bottommost compression (2) Rename those FLAGS_* variables that are not gflags to avoid confusion (3) Crash_test to randomly generate compression type for bottommost compression with half the chance. (4) Stress_test to sanitize unsupported compression type to snappy, so that crash_test to cover all possible compression types and people don't need to worry about they don't support all comrpession types in their environment. (5) In crash_test, when generating db_stress command, sort arguments in alphabeta order, so that it is easier to find value for a specific argument. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6215 Test Plan: Run "make crash_test" for a while and see the botommost option shown in LOG files. Differential Revision: D19171255 fbshipit-source-id: d7001e246c4ff9ee5760776eea0be97738650735	2019-12-20 16:14:52 -08:00
sdong	e55c2b3f0b	db_stress: improvements in TestIterator (#6166 ) Summary: 1. Cover SeekToFirst() and SeekToLast(). 2. Try to record the history of iterator operations. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6166 Test Plan: Do some manual changes in the code to cover the failure cases and see the error printing is correct and SeekToFirst() and SeekToLast() sometimes show up. Differential Revision: D19047079 fbshipit-source-id: 1ed616f919fe4d32c0a021fc37932a7bd3063bcd	2019-12-20 14:56:15 -08:00
Zhichao Cao	f89dea4fec	db_stress: Added the verification for GetLiveFiles, GetSortedWalFiles and GetCurrentWalFile (#6224 ) Summary: Add the verification in operateDB to verify GetLiveFiles, GetSortedWalFiles and GetCurrentWalFile. The test will be called every 1 out of N, N is decided by get_live_files_and_wal_files_one_i, whose default is 1000000. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6224 Test Plan: pass db_stress default run. Differential Revision: D19183358 Pulled By: zhichao-cao fbshipit-source-id: 20073cf72ede77a3e0d3cf5f28304f1f605d2b1a	2019-12-20 12:07:30 -08:00
Levi Tamasi	786c3d45ed	Support BlobDB in db_stress (#6230 ) Summary: The patch adds support for BlobDB to `db_stress`. Note that BlobDB currently does not support (amongst other features) Column Families or the `SingleDelete` API, so for now, those should be disabled on the command line when running `db_stress` in BlobDB mode (using `-column_families=1` and `-nooverwritepercent=0`, respectively). Also, some BlobDB features that do not go well with the verification logic in `db_stress` like TTL and FIFO eviction are not supported currently. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6230 Test Plan: ``` ./db_stress -max_key=100000 -use_blob_db -column_families=1 -nooverwritepercent=0 -reopen=1 -blob_db_file_size=1000000 -target_file_size_base=1000000 -blob_db_enable_gc -blob_db_gc_cutoff=0.1 -blob_db_min_blob_size=10 -blob_db_bytes_per_sync=16384 ``` Differential Revision: D19191476 Pulled By: ltamasi fbshipit-source-id: 35840452af8c5e6095249c7fd9a53a119a0985fc	2019-12-20 10:27:56 -08:00
Yanqin Jin	670a916d01	Add more verification to db_stress (#6173 ) Summary: Currently, db_stress performs verification by calling `VerifyDb()` at the end of test and optionally before tests start. In case of corruption or incorrect result, it will be too late. This PR adds more verification in two ways. 1. For cf consistency test, each test thread takes a snapshot and verifies every N ops. N is configurable via `-verify_db_one_in`. This option is not supported in other stress tests. 2. For cf consistency test, we use another background thread in which a secondary instance periodically tails the primary (interval is configurable). We verify the secondary. Once an error is detected, we terminate the test and report. This does not affect other stress tests. Test plan (devserver) ``` $./db_stress -test_cf_consistency -verify_db_one_in=0 -ops_per_thread=100000 -continuous_verification_interval=100 $./db_stress -test_cf_consistency -verify_db_one_in=1000 -ops_per_thread=10000 -continuous_verification_interval=0 $make crash_test ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/6173 Differential Revision: D19047367 Pulled By: riversand963 fbshipit-source-id: aeed584ad71f9310c111445f34975e5ab47a0615	2019-12-20 08:49:29 -08:00
Peter Dillinger	873331fe49	Refactor pulling out parts of StressTest::OperateDb (#6195 ) Summary: Complete some refactoring called for in https://github.com/facebook/rocksdb/issues/6148. Somehow I got some 'make format' in here for files I didn't change, but that should be OK. (I'm not sure why "hide whitespace changes" doesn't seem to help in review.) Not addressed in this PR: some operations simply print to stdout rather than failing on discovering a bad status or inconsistency. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6195 Differential Revision: D19131067 Pulled By: pdillinger fbshipit-source-id: 4f416e6b792023989ef119f385fe122426cb825d	2019-12-19 14:04:49 -08:00
anand76	2afea29762	Add VerifyChecksum() to db_stress (#6203 ) Summary: Add an option to db_stress, verify_checksum_one_in, to call DB::VerifyChecksum() once every N ops. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6203 Differential Revision: D19145753 Pulled By: anand1976 fbshipit-source-id: d09edf21f309ad53aa40dd25b7a563d50665fd8b	2019-12-17 20:44:58 -08:00
Maysam Yabandeh	68d5d82d1f	Wait for CancelAllBackgroundWork before Close in db stress (#6191 ) Summary: In https://github.com/facebook/rocksdb/issues/6174 we fixed the stress test to respect the CancelAllBackgroundWork + Close order for WritePrepared transactions. The fix missed to take into account that some invocation of CancelAllBackgroundWork are with wait=false parameter which essentially breaks the order. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6191 Differential Revision: D19102709 Pulled By: maysamyabandeh fbshipit-source-id: f4e7b5fdae47ff1c1ac284ba1cf67d5d3f3d03eb	2019-12-16 18:33:09 -08:00
sdong	bcc372c0c3	Add some new options to crash_test (#6176 ) Summary: Several options are trivially added to crash test and random values are picked. Made simple test run non-dynamic level and normal test run dynamic level. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6176 Test Plan: Run crash_test and watch the printing Differential Revision: D19053955 fbshipit-source-id: 958cb43c968541ebd87ed4d91e778bd1d40e7502	2019-12-16 15:43:13 -08:00
sdong	35126dd874	db_stress: preserve all historic manifest files (#6142 ) Summary: compaction history is stored in manifest files. Preserve all of them in db_stress would help debugging. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6142 Test Plan: Run db_stress and observe that manifest files are preserved. Run whole crash_test and see how DB directory looks like. Differential Revision: D19047026 fbshipit-source-id: f83c3e0bb5332b1b4768be5dcee56a24f9b760a9	2019-12-16 14:32:34 -08:00
Maysam Yabandeh	4b97812da8	Add long-running snapshots to stress tests (#6171 ) Summary: Current implementation holds on to 10% of snapshots for 10x longer, and 1% of snapshots 100x longer. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6171 Test Plan: ``` make -j32 crash_test Differential Revision: D19038399 Pulled By: maysamyabandeh fbshipit-source-id: 75da2dbb5c47a0b3f37d299b8719e392b73b42c0	2019-12-14 15:22:40 -08:00
Maysam Yabandeh	349bd3ed82	CancelAllBackgroundWork before Close in db stress (#6174 ) Summary: Close asserts that there is no unreleased snapshots. For WritePrepared transaction, this means that the background work that holds on a snapshot must be canceled first. Update the stress tests to respect the sequence. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6174 Test Plan: ``` make -j32 crash_test Differential Revision: D19057322 Pulled By: maysamyabandeh fbshipit-source-id: c9e9e24f779bbfb0ab72c2717e34576c01bc6362	2019-12-13 18:22:50 -08:00
Peter Dillinger	58d46d1915	Add useful idioms to Random API (OneInOpt, PercentTrue) (#6154 ) Summary: And clean up related code, especially in stress test. (More clean up of db_stress_test_base.cc coming after this.) Pull Request resolved: https://github.com/facebook/rocksdb/pull/6154 Test Plan: make check, make blackbox_crash_test for a bit Differential Revision: D18938180 Pulled By: pdillinger fbshipit-source-id: 524d27621b8dbb25f6dff40f1081e7c00630357e	2019-12-13 14:30:14 -08:00
Maysam Yabandeh	fec7302a9d	Enable unordered_write in stress tests (#6164 ) Summary: With WritePrepared transactions configured with two_write_queues, unordered_write will offer the same guarantees as vanilla rocksdb and thus can be enabled in stress tests. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6164 Test Plan: ``` make -j32 crash_test_with_txn Differential Revision: D18991899 Pulled By: maysamyabandeh fbshipit-source-id: eece5e96b4169b67d7931e5c0afca88540a113e1	2019-12-13 10:25:04 -08:00
Maysam Yabandeh	8613ee2e94	Enable all txn write policies in crash test (#6158 ) Summary: Currently the default txn write policy in crash tests is WRITE_PREPARED. The patch randomly picks the write policy at the start of the crash test. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6158 Test Plan: ``` make -j32 crash_test_with_txn ``` Differential Revision: D18946307 Pulled By: maysamyabandeh fbshipit-source-id: f77d7a94f99a08791ef9626da153d284bf521950	2019-12-12 10:43:49 -08:00
Yanqin Jin	383f5071f0	Add SyncWAL to db_stress (#6149 ) Summary: Add SyncWAL to db_stress. Specify with `-sync_wal_one_in=N` so that it will be called once every N operations on average. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6149 Test Plan: ``` $make db_stress $./db_stress -sync_wal_one_in=100 -ops_per_thread=100000 ``` Differential Revision: D18922529 Pulled By: riversand963 fbshipit-source-id: 4c0b8cb8fa21852722cffd957deddf688f12ea56	2019-12-10 21:55:25 -08:00
sdong	7a99162a74	db_stress: sometimes call CancelAllBackgroundWork() and Close() before closing DB (#6141 ) Summary: CancelAllBackgroundWork() and Close() are frequently used features but we don't cover it in stress test. Simply execute them before closing the DB with 1/2 chance. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6141 Test Plan: Run "db_stress". Differential Revision: D18900861 fbshipit-source-id: 49b46ccfae120d0f9de3e0543b82fb6d715949d0	2019-12-10 20:04:52 -08:00
Peter Dillinger	a653857178	Add PauseBackgroundWork() to db_stress (#6148 ) Summary: Worker thread will occasionally call PauseBackgroundWork(), briefly sleep (to avoid stalling itself) and then call ContinueBackgroundWork(). Pull Request resolved: https://github.com/facebook/rocksdb/pull/6148 Test Plan: some running of 'make blackbox_crash_test' with temporary printf output to confirm code occasionally reached. Differential Revision: D18913886 Pulled By: pdillinger fbshipit-source-id: ae9356a803390929f3165dfb6a00194692ba92be	2019-12-10 15:46:48 -08:00
sdong	14c38baca0	db_stress: sometimes validate compact range data (#6140 ) Summary: Right now, in db_stress, compact range is simply executed without any immediate data validation. Add a simply validation which compares hash for all keys within the compact range to stay the same against the same snapshot before and after the compaction. Also, randomly tune most knobs of CompactRangeOptions. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6140 Test Plan: Run db_stress with "--compact_range_one_in=2000 --compact_range_width=100000000" for a while. Manually ingest some hacky code and observe the error path. Differential Revision: D18900230 fbshipit-source-id: d96e75bc8c38dd5ec702571ffe7cf5f4ea93ee10	2019-12-10 11:41:50 -08:00
Peter Dillinger	6380df5e10	Vary bloom_bits in db_crashtest (#6103 ) Summary: Especially with non-integral bits/key now supported, db_crashtest should vary the bloom_bits configuration. The probabilities look like this: 1/2 chance of a uniform int from 0 to 19. This includes overall 1/40 chance of 0 which disables the bloom filter. 1/2 chance of a float from a lognormal distribution with a median of 10. This always produces positive values but with a decent chance of < 1 (overall ~1/40) or > 100 (overall ~1/40), the enforced/coerced implementation limits. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6103 Test Plan: start 'make blackbox_crash_test' several times and look at configuration output Differential Revision: D18734877 Pulled By: pdillinger fbshipit-source-id: 4a38cb057d3b3fc1327f93199f65b9a9ffbd7316	2019-12-10 08:39:50 -08:00
sdong	7d79b32618	Break db_stress_tool.cc to a list of source files (#6134 ) Summary: db_stress_tool.cc now is a giant file. In order to main it easier to improve and maintain, break it down to multiple source files. Most classes are turned into their own files. Separate .h and .cc files are created for gflag definiations. Another .h and .cc files are created for some common functions. Some test execution logic that is only loosely related to class StressTest is moved to db_stress_driver.h and db_stress_driver.cc. All the files are located under db_stress_tool/. The directory name is created as such because if we end it with either stress or test, .gitignore will ignore any file under it and makes it prone to issues in developements. Pull Request resolved: https://github.com/facebook/rocksdb/pull/6134 Test Plan: Build under GCC7 with and without LITE on using GNU Make. Build with GCC 4.8. Build with cmake with -DWITH_TOOL=1 Differential Revision: D18876064 fbshipit-source-id: b25d0a7451840f31ac0f5ebb0068785f783fdf7d	2019-12-08 23:51:01 -08:00

1 2 3 4 5

244 Commits