Commit Graph

33 Commits

Author SHA1 Message Date
Andrew Kryczka 6ce782beaf move db_stress locking to `StressTest::Test*()` functions (#10678)
Summary:
One problem of the previous strategy was `NonBatchedOpsStressTest::TestIngestExternalFile()` could release the lock for `rand_keys[0]` in `rand_column_families[0]`, and then subsequent operations in the same loop iteration (e.g., `TestPut()`) would run without locking. This PR changes the strategy so each `Test*()` function is responsible for acquiring and releasing its own locks.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10678

Reviewed By: hx235

Differential Revision: D39516401

Pulled By: ajkr

fbshipit-source-id: bf67f12ebbd293ba8c24fdf8754ff28737bcd758
2022-09-15 15:55:37 -07:00
Yanqin Jin 3613d862ba print value when verification fails (#10587)
Summary:
When verification fails for db_stress, print more information about
value read from the db and expected state.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10587

Test Plan:
make check
./db_stress

Reviewed By: akankshamahajan15, hx235

Differential Revision: D39078511

Pulled By: riversand963

fbshipit-source-id: 77ac8ffae01fc3a9b58a02c2e7bbe141e1a18f0b
2022-08-29 14:13:06 -07:00
Changyu Bi d140fbfd7d Add Iterator test against expected state to stress test (#10538)
Summary:
As mentioned in https://github.com/facebook/rocksdb/pull/5506#issuecomment-506021913,
`db_stress` does not have much verification for iterator correctness.
It has a `TestIterate()` function, but that is mainly for comparing results
between two iterators, one with `total_order_seek` and the other optionally
sets auto_prefix, upper/lower bounds. Commit 49a0581ad2462e31aa3f768afa769e0d33390f33
added a new `TestIterateAgainstExpected()` function that compares iterator against
expected state. It locks a range of keys, creates an iterator, does
a random sequence of `Next/Prev` and compares against expected state.
This PR is based on that commit, the main changes include some logs
(for easier debugging if a test fails), a forward and backward scan to
cover the entire locked key range, and a flag for optionally turning on
this version of Iterator testing.

Added constraint that the checks against expected state in
`TestIterateAgainstExpected()` and in `TestGet()` are only turned on
when `--skip_verifydb` flag is not set.
Remove the change log introduced in https://github.com/facebook/rocksdb/issues/10553.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10538

Test Plan:
Run `db_stress` with `--verify_iterator_with_expected_state_one_in=1`,
and a large `--iterpercent` and `--num_iterations`. Checked `op_logs`
manually to ensure expected coverage. Tweaked part of the code in
https://github.com/facebook/rocksdb/issues/10449 and stress test was able to catch it.
- internally run various flavor of crash test

Reviewed By: ajkr

Differential Revision: D38847269

Pulled By: cbi42

fbshipit-source-id: 8b4402a9bba9f6cfa08051943cd672579d489599
2022-08-24 14:59:50 -07:00
Yanqin Jin b443d24f4d Stop operating on DB in a stress test background thread (#10373)
Summary:
Stress test background threads do not coordinate with test worker
threads for db reopen in the middle of a test run, thus accessing db
obj in a stress test bg thread can race with test workers. Remove the
TimestampedSnapshotThread.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10373

Test Plan:
```
./db_stress --acquire_snapshot_one_in=0 --adaptive_readahead=0 --allow_concurrent_memtable_write=1 \
--allow_data_in_errors=True --async_io=0 --avoid_flush_during_recovery=0 --avoid_unnecessary_blocking_io=1 \
--backup_max_size=104857600 --backup_one_in=100000 --batch_protection_bytes_per_key=8 \
--block_size=16384 --bloom_bits=7.580319535285394 --bottommost_compression_type=disable \
--bytes_per_sync=262144 --cache_index_and_filter_blocks=0 --cache_size=8388608 --cache_type=lru_cache \
--charge_compression_dictionary_building_buffer=1 --charge_file_metadata=0 --charge_filter_construction=1 \
--charge_table_reader=0 --checkpoint_one_in=0 --checksum_type=kxxHash64 --clear_column_family_one_in=0 \
--compact_files_one_in=1000000 --compact_range_one_in=0 --compaction_pri=1 --compaction_ttl=0 \
--compression_max_dict_buffer_bytes=0 --compression_max_dict_bytes=0 --compression_parallel_threads=1 \
--compression_type=xpress --compression_use_zstd_dict_trainer=1 --compression_zstd_max_train_bytes=0 \
--continuous_verification_interval=0 --create_timestamped_snapshot_one_in=20 --data_block_index_type=0 \
--db=/dev/shm/rocksdb/ --db_write_buffer_size=0 --delpercent=5 --delrangepercent=0 --destroy_db_initially=1 \
--detect_filter_construct_corruption=0 --disable_wal=0 --enable_compaction_filter=1 --enable_pipelined_write=0 \
--fail_if_options_file_error=1 --file_checksum_impl=xxh64 --flush_one_in=1000000 --format_version=2 \
--get_current_wal_file_one_in=0 --get_live_files_one_in=1000000 --get_property_one_in=1000000 \
--get_sorted_wal_files_one_in=0 --index_block_restart_interval=11 --index_type=0 --ingest_external_file_one_in=0 \
--iterpercent=0 --key_len_percent_dist=1,30,69 --level_compaction_dynamic_level_bytes=True \
--log2_keys_per_lock=10 --long_running_snapshots=0 --mark_for_compaction_one_file_in=10 \
--max_background_compactions=20 --max_bytes_for_level_base=10485760 --max_key=25000000 \
--max_key_len=3 --max_manifest_file_size=1073741824 --max_write_batch_group_size_bytes=64 \
--max_write_buffer_number=3 --max_write_buffer_size_to_maintain=0 --memtable_prefix_bloom_size_ratio=0.5 \
--memtable_whole_key_filtering=1 --memtablerep=skip_list --mmap_read=0 --mock_direct_io=True \
--nooverwritepercent=1 --open_files=500000 --open_metadata_write_fault_one_in=0 \
--open_read_fault_one_in=0 --open_write_fault_one_in=0 --ops_per_thread=20000 \
--optimize_filters_for_memory=1 --paranoid_file_checks=1 --partition_filters=0 --partition_pinning=2 \
--pause_background_one_in=1000000 --periodic_compaction_seconds=0 --prefix_size=1 \
--prefixpercent=5 --prepopulate_block_cache=0 --progress_reports=0 --read_fault_one_in=1000 \
--readpercent=55 --recycle_log_file_num=0 --reopen=100 --ribbon_starting_level=8 \
--secondary_cache_fault_one_in=0 --secondary_cache_uri= --snapshot_hold_ops=100000 \
--sst_file_manager_bytes_per_sec=104857600 --sst_file_manager_bytes_per_truncate=0 \
--subcompactions=3 --sync=0 --sync_fault_injection=0 --target_file_size_base=2097152 \
--target_file_size_multiplier=2 --test_batches_snapshots=0 --top_level_index_pinning=1 \
--txn_write_policy=0 --unordered_write=0 --unpartitioned_pinning=0 \
--use_direct_io_for_flush_and_compaction=0 --use_direct_reads=1 --use_full_merge_v1=1 \
--use_merge=1 --use_multiget=0 --use_txn=1 --user_timestamp_size=0 --value_size_mult=32 \
--verify_checksum=1 --verify_checksum_one_in=1000000 --verify_db_one_in=100000 \
--verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=0 --wal_compression=none \
--write_buffer_size=4194304 --write_dbid_to_manifest=0 --writepercent=35
```
make crash_test_with_txn
make crash_test_with_multiops_wc_txn

Reviewed By: jay-zhuang

Differential Revision: D37903189

Pulled By: riversand963

fbshipit-source-id: cd1728ad7ba4ce4cf47af23c4f65dda0956744f9
2022-07-19 11:25:43 -07:00
Yanqin Jin 2f13f5f7d0 Add coverage for timestamped snapshot to MultiOpsTxnsStressTest (#10325)
Summary:
As title.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10325

Test Plan:
```bash
TEST_TMPDIR=/dev/shm/rocksdb/ make crash_test_with_multiops_wc_txn
TEST_TMPDIR=/dev/shm/rocksdb/ make crash_test_with_txn
```

Reviewed By: akankshamahajan15

Differential Revision: D37688742

Pulled By: riversand963

fbshipit-source-id: e198ace921898af63f99e869568c1a7bbf69f1a4
2022-07-11 12:44:08 -07:00
Yanqin Jin caced09e79 Expand stress test coverage for user-defined timestamp (#10280)
Summary:
Before this PR, we call `now()` to get the wall time before performing point-lookup and range
scans when user-defined timestamp is enabled.

With this PR, we expand the coverage to:
- read with an older timestamp which is larger then the wall time when the process starts but potentially smaller than now()
- add coverage for `ReadOptions::iter_start_ts != nullptr`

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10280

Test Plan:
```bash
make check
```

Also,
```bash
TEST_TMPDIR=/dev/shm/rocksdb make crash_test_with_ts
```

So far, we have had four successful runs of the above

In addition,
```bash
TEST_TMPDIR=/dev/shm/rocksdb make crash_test
```
Succeeded twice showing no regression.

Reviewed By: ltamasi

Differential Revision: D37539805

Pulled By: riversand963

fbshipit-source-id: f2d9887ad95245945ce17a014d55bb93f00e1cb5
2022-07-05 13:30:15 -07:00
Yanqin Jin bfaf8291c5 Fix a race condition in transaction stress test (#10157)
Summary:
Before this PR, there can be a race condition between the thread calling
`StressTest::Open()` and a background compaction thread calling
`MultiOpsTxnsStressTest::VerifyPkSkFast()`.

```
Time   thread1                             bg_compact_thr
 |     TransactionDB::Open(..., &txn_db_)
 |     db_ is still nullptr
 |                                         db_->GetSnapshot()  // segfault
 |     db_ = txn_db_
 V
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10157

Test Plan: CI

Reviewed By: akankshamahajan15

Differential Revision: D37121653

Pulled By: riversand963

fbshipit-source-id: 6a53117f958e9ee86f77297fdeb843e5160a9331
2022-06-13 18:54:38 -07:00
Yanqin Jin 1777e5f7e9 Snapshots with user-specified timestamps (#9879)
Summary:
In RocksDB, keys are associated with (internal) sequence numbers which denote when the keys are written
to the database. Sequence numbers in different RocksDB instances are unrelated, thus not comparable.

It is nice if we can associate sequence numbers with their corresponding actual timestamps. One thing we can
do is to support user-defined timestamp, which allows the applications to specify the format of custom timestamps
and encode a timestamp with each key. More details can be found at https://github.com/facebook/rocksdb/wiki/User-defined-Timestamp-%28Experimental%29.

This PR provides a different but complementary approach. We can associate rocksdb snapshots (defined in
https://github.com/facebook/rocksdb/blob/7.2.fb/include/rocksdb/snapshot.h#L20) with **user-specified** timestamps.
Since a snapshot is essentially an object representing a sequence number, this PR establishes a bi-directional mapping between sequence numbers and timestamps.

In the past, snapshots are usually taken by readers. The current super-version is grabbed, and a `rocksdb::Snapshot`
object is created with the last published sequence number of the super-version. You can see that the reader actually
has no good idea of what timestamp to assign to this snapshot, because by the time the `GetSnapshot()` is called,
an arbitrarily long period of time may have already elapsed since the last write, which is when the last published
sequence number is written.

This observation motivates the creation of "timestamped" snapshots on the write path. Currently, this functionality is
exposed only to the layer of `TransactionDB`. Application can tell RocksDB to create a snapshot when a transaction
commits, effectively associating the last sequence number with a timestamp. It is also assumed that application will
ensure any two snapshots with timestamps should satisfy the following:
```
snapshot1.seq < snapshot2.seq iff. snapshot1.ts < snapshot2.ts
```

If the application can guarantee that when a reader takes a timestamped snapshot, there is no active writes going on
in the database, then we also allow the user to use a new API `TransactionDB::CreateTimestampedSnapshot()` to create
a snapshot with associated timestamp.

Code example
```cpp
// Create a timestamped snapshot when committing transaction.
txn->SetCommitTimestamp(100);
txn->SetSnapshotOnNextOperation();
txn->Commit();

// A wrapper API for convenience
Status Transaction::CommitAndTryCreateSnapshot(
    std::shared_ptr<TransactionNotifier> notifier,
    TxnTimestamp ts,
    std::shared_ptr<const Snapshot>* ret);

// Create a timestamped snapshot if caller guarantees no concurrent writes
std::pair<Status, std::shared_ptr<const Snapshot>> snapshot = txn_db->CreateTimestampedSnapshot(100);
```

The snapshots created in this way will be managed by RocksDB with ref-counting and potentially shared with
other readers. We provide the following APIs for readers to retrieve a snapshot given a timestamp.
```cpp
// Return the timestamped snapshot correponding to given timestamp. If ts is
// kMaxTxnTimestamp, then we return the latest timestamped snapshot if present.
// Othersise, we return the snapshot whose timestamp is equal to `ts`. If no
// such snapshot exists, then we return null.
std::shared_ptr<const Snapshot> TransactionDB::GetTimestampedSnapshot(TxnTimestamp ts) const;
// Return the latest timestamped snapshot if present.
std::shared_ptr<const Snapshot> TransactionDB::GetLatestTimestampedSnapshot() const;
```

We also provide two additional APIs for stats collection and reporting purposes.

```cpp
Status TransactionDB::GetAllTimestampedSnapshots(
    std::vector<std::shared_ptr<const Snapshot>>& snapshots) const;
// Return timestamped snapshots whose timestamps fall in [ts_lb, ts_ub) and store them in `snapshots`.
Status TransactionDB::GetTimestampedSnapshots(
    TxnTimestamp ts_lb,
    TxnTimestamp ts_ub,
    std::vector<std::shared_ptr<const Snapshot>>& snapshots) const;
```

To prevent the number of timestamped snapshots from growing infinitely, we provide the following API to release
timestamped snapshots whose timestamps are older than or equal to a given threshold.
```cpp
void TransactionDB::ReleaseTimestampedSnapshotsOlderThan(TxnTimestamp ts);
```

Before shutdown, RocksDB will release all timestamped snapshots.

Comparison with user-defined timestamp and how they can be combined:
User-defined timestamp persists every key with a timestamp, while timestamped snapshots maintain a volatile
mapping between snapshots (sequence numbers) and timestamps.
Different internal keys with the same user key but different timestamps will be treated as different by compaction,
thus a newer version will not hide older versions (with smaller timestamps) unless they are eligible for garbage collection.
In contrast, taking a timestamped snapshot at a certain sequence number and timestamp prevents all the keys visible in
this snapshot from been dropped by compaction. Here, visible means (seq < snapshot and most recent).
The timestamped snapshot supports the semantics of reading at an exact point in time.

Timestamped snapshots can also be used with user-defined timestamp.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9879

Test Plan:
```
make check
TEST_TMPDIR=/dev/shm make crash_test_with_txn
```

Reviewed By: siying

Differential Revision: D35783919

Pulled By: riversand963

fbshipit-source-id: 586ad905e169189e19d3bfc0cb0177a7239d1bd4
2022-06-10 16:07:03 -07:00
Yanqin Jin f890527b16 Update test for secondary instance in stress test (#10121)
Summary:
This PR updates secondary instance testing in stress test by default.

A background thread will be started (disabled by default), running a secondary instance tailing the logs of the primary.

Periodically (every 1 sec), this thread calls `TryCatchUpWithPrimary()` and uses point lookup or range scan
to read some random keys with only very basic verification to make sure no assertion failure is triggered.

Thanks to https://github.com/facebook/rocksdb/issues/10061 , we can enable secondary instance when user-defined timestamp is enabled.

Also removed a less useful test configuration, `secondary_catch_up_one_in`. This is very similar to the periodic
catch-up.

In the last commit, I decided not to enable it now, but just update the tests, since secondary instance does not
work well when the underlying file is renamed by primary, e.g. SstFileManager.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10121

Test Plan:
```
TEST_TMPDIR=/dev/shm/rocksdb make crash_test
TEST_TMPDIR=/dev/shm/rocksdb make crash_test_with_ts
TEST_TMPDIR=/dev/shm/rocksdb make crash_test_with_atomic_flush
```

Reviewed By: ajkr

Differential Revision: D36939458

Pulled By: riversand963

fbshipit-source-id: 1c065b7efc3690fc341569b9d369a5cbd8ef6b3e
2022-06-07 21:07:47 -07:00
Yanqin Jin e3a3dbf2be Avoid overwriting options loaded from OPTIONS (#9943)
Summary:
This is similar to https://github.com/facebook/rocksdb/issues/9862, including the following fixes/refactoring:

1. If OPTIONS file is specified via `-options_file`, majority of options will be loaded from the file. We should not
overwrite options that have been loaded from the file. Instead, we configure only fields of options which are
shared objects and not set by the OPTIONS file. We also configure a few fields, e.g. `create_if_missing` necessary
for stress test to run.

2. Refactor options initialization into three functions, `InitializeOptionsFromFile()`, `InitializeOptionsFromFlags()`
and `InitializeOptionsGeneral()` similar to db_bench. I hope they can be shared in the future. The high-level logic is
as follows:
```cpp
if (!InitializeOptionsFromFile(...)) {
  InitializeOptionsFromFlags(...);
}
InitializeOptionsGeneral(...);
```

3. Currently, the setting for `block_cache_compressed` does not seem correct because it by default specifies a
size of `numeric_limits<size_t>::max()` ((size_t)-1). According to code comments, `-1` indicates default value,
which should be referring to `num_shard_bits` argument.

4. Clarify `fail_if_options_file_error`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9943

Test Plan:
1. make check
2. Run stress tests, and manually check generated OPTIONS file and compare them with input OPTIONS files

Reviewed By: jay-zhuang

Differential Revision: D36133769

Pulled By: riversand963

fbshipit-source-id: 35dacdc090a0a72c922907170cd132b9ecaa073e
2022-05-18 12:43:50 -07:00
Yanqin Jin 06394ff4e7 Fix a bug of CompactionIterator/CompactionFilter using `Delete` (#9929)
Summary:
When compaction filter determines that a key should be removed, it updates the internal key's type
to `Delete`. If this internal key is preserved in current compaction but seen by a later compaction
together with `SingleDelete`, it will cause compaction iterator to return Corruption.

To fix the issue, compaction filter should return more information in addition to the intention of removing
a key. Therefore, we add a new `kRemoveWithSingleDelete` to `CompactionFilter::Decision`. Seeing
`kRemoveWithSingleDelete`, compaction iterator will update the op type of the internal key to `kTypeSingleDelete`.

In addition, I updated db_stress_shared_state.[cc|h] so that `no_overwrite_ids_` becomes `const`. It is easier to
reason about thread-safety if accessed from multiple threads. This information is passed to `PrepareTxnDBOptions()`
when calling from `Open()` so that we can set up the rollback deletion type callback for transactions.

Finally, disable compaction filter for multiops_txn because the key removal logic of `DbStressCompactionFilter` does
not quite work with `MultiOpsTxnsStressTest`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9929

Test Plan:
make check
make crash_test
make crash_test_with_txn

Reviewed By: anand1976

Differential Revision: D36069678

Pulled By: riversand963

fbshipit-source-id: cedd2f1ba958af59ad3916f1ba6f424307955f92
2022-05-02 13:25:45 -07:00
Yanqin Jin 94e245a14d Improve stress test for MultiOpsTxnsStressTest (#9829)
Summary:
Adds more coverage to `MultiOpsTxnsStressTest` with a focus on write-prepared transactions.

1. Add a hack to manually evict commit cache entries. We currently cannot assign small values to `wp_commit_cache_bits` because it requires a prepared transaction to commit within a certain range of sequence numbers, otherwise it will throw.
2. Add coverage for commit-time-write-batch. If write policy is write-prepared, we need to set `use_only_the_last_commit_time_batch_for_recovery` to true.
3. After each flush/compaction, verify data consistency. This is possible since data size can be small: default numbers of primary/secondary keys are just 1000.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9829

Test Plan:
```
TEST_TMPDIR=/dev/shm/rocksdb_crashtest_blackbox/ make blackbox_crash_test_with_multiops_wp_txn
```

Reviewed By: pdillinger

Differential Revision: D35806678

Pulled By: riversand963

fbshipit-source-id: d7fde7a29fda0fb481a61f553e0ca0c47da93616
2022-04-27 17:50:54 -07:00
Andrew Kryczka c7ce03dce1 db_stress begin tracking expected state after verification (#9470)
Summary:
Previously we enabled tracking expected state changes during
`FinishInitDb()`, as soon as the DB was opened. This meant tracing was
enabled during `VerifyDb()`. This cost extra CPU by requiring
`DBImpl::trace_mutex_` to be acquired on each read operation. It was
unnecessary since we know there are no expected state changes during the
`VerifyDb()` phase. So, this PR delays tracking expected state changes
until after the `VerifyDb()` phase has completed.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9470

Test Plan:
Measured this PR reduced `VerifyDb()` 76% (387 -> 92 seconds) with
`-disable_wal=1` (i.e., expected state tracking enabled).

- benchmark command: `./db_stress -max_key=100000000 -ops_per_thread=1 -destroy_db_initially=1 -expected_values_dir=/dev/shm/dbstress_expected/ -db=/dev/shm/dbstress/ --clear_column_family_one_in=0 --disable_wal=1 --reopen=0`
- without this PR, `VerifyDb()` takes 387 seconds:

```
2022/01/30-21:43:04  Initializing worker threads
Crash-recovery verification passed :)
2022/01/30-21:49:31  Starting database operations
```

- with this PR, `VerifyDb()` takes 92 seconds

```
2022/01/30-21:59:06  Initializing worker threads
Crash-recovery verification passed :)
2022/01/30-22:00:38  Starting database operations
```

Reviewed By: riversand963

Differential Revision: D33884596

Pulled By: ajkr

fbshipit-source-id: 5f259de8087de5b0531f088e11297f37ed2f7685
2022-01-31 13:35:32 -08:00
Yanqin Jin e05c2bb549 Stress test for RocksDB transactions (#8936)
Summary:
Current db_stress does not cover complex read-write transactions. Therefore, this PR adds
coverage for emulated MyRocks-style transactions in `MultiOpsTxnsStressTest`. To achieve this, we need:

- Add a new operation type 'customops' so that we can add new complex groups of operations, e.g. transactions involving multiple read-write operations.
- Implement three read-write transactions and two read-only ones to emulate MyRocks-style transactions.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8936

Test Plan:
```
make check
./db_stress -test_multi_ops_txns -use_txn -clear_column_family_one_in=0 -column_families=1 -writepercent=0 -delpercent=0 -delrangepercent=0 -customopspercent=60 -readpercent=20 -prefixpercent=0 -iterpercent=20 -reopen=0 -ops_per_thread=100000
```

Next step is to add more configurability and refine input generation and result reporting, which will done in separate follow-up PRs.

Reviewed By: zhichao-cao

Differential Revision: D31071795

Pulled By: riversand963

fbshipit-source-id: 50d7c828346ec643311336b904848a1588a37006
2021-12-14 13:34:43 -08:00
Andrew Kryczka a6a6aad74e db_stress support tracking historical values (#8960)
Summary:
When `--sync_fault_injection` is set, this PR takes a snapshot of the expected values and starts an operation trace when the DB is opened. These files are stored in `--expected_values_dir`. They will be used for recovering the expected state of the DB following a crash where a suffix of unsynced operations are allowed to be lost.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8960

Test Plan: injected crashed at various points in `FileExpectedStateManager` and verified the next run recovers the state/trace file with highest seqno and removes all older/temporary files. Note we don't use sync_fault_injection in CI crash tests yet.

Reviewed By: pdillinger

Differential Revision: D31194941

Pulled By: ajkr

fbshipit-source-id: b0f935a529a0186c5a9c7709fcaa8829de8a84cf
2021-12-07 13:41:48 -08:00
Zhichao Cao a95a776d75 Inject fatal write failures to db_stress when DB is running (#8479)
Summary:
add the injest_error_severity to control if it is a retryable IO Error or a fatal or unrecoverable error. Use a flag to indicate, if fatal error comes, the flag is set and db is stopped (but not corrupted).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8479

Test Plan: run  ./db_stress --reopen=0 --read_fault_one_in=1000 --write_fault_one_in=5 --disable_wal=true --write_buffer_size=3000000 -writepercent=5 -readpercent=50 --injest_error_severity=2 --column_families=1, make check

Reviewed By: anand1976

Differential Revision: D29524271

Pulled By: zhichao-cao

fbshipit-source-id: 1aa9fb9b5655b0adba6f5ad12005ca8c074c795b
2021-07-01 14:16:47 -07:00
anand76 6f9ed59b1d Allow db_stress to use a secondary cache (#8455)
Summary:
Add a ```-secondary_cache_uri``` to db_stress to allow the user to specify a custom ```SecondaryCache``` object from the object registry. Also allow db_crashtest.py to be run with an alternate db_stress location. Together, these changes will allow us to run db_stress using FB internal components.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8455

Reviewed By: zhichao-cao

Differential Revision: D29371972

Pulled By: anand1976

fbshipit-source-id: dd1b1fd80ebbedc11aa63d9246ea6ae49edb77c4
2021-06-27 23:54:39 -07:00
Yanqin Jin 08144bc2f5 Add user-defined timestamps to db_stress (#8061)
Summary:
Add some basic test for user-defined timestamp to db_stress. Currently,
read with timestamp always tries to read using the current timestamp.
Due to the per-key timestamp-sequence ordering constraint, we only add timestamp-
related tests to the `NonBatchedOpsStressTest` since this test serializes accesses
to the same key and uses a file to cross-check data correctness.
The timestamp feature is not supported in a number of components, e.g. Merge, SingleDelete,
DeleteRange, CompactionFilter, Readonly instance, secondary instance, SST file ingestion, transaction,
etc. Therefore, db_stress should exit if user enables both timestamp and these features at the same
time. The (currently) incompatible features can be found in
`CheckAndSetOptionsForUserTimestamp`.

This PR also fixes a bug triggered when timestamp is enabled together with
`index_type=kBinarySearchWithFirstKey`. This bug fix will also be in another separate PR
with more unit tests coverage. Fixing it here because I do not want to exclude the index type
from crash test.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8061

Test Plan: make crash_test_with_ts

Reviewed By: jay-zhuang

Differential Revision: D27056282

Pulled By: riversand963

fbshipit-source-id: c3e00ad1023fdb9ebbdf9601ec18270c5e2925a9
2021-03-23 05:13:30 -07:00
mrambacher 3dff28cf9b Use SystemClock* instead of std::shared_ptr<SystemClock> in lower level routines (#8033)
Summary:
For performance purposes, the lower level routines were changed to use a SystemClock* instead of a std::shared_ptr<SystemClock>.  The shared ptr has some performance degradation on certain hardware classes.

For most of the system, there is no risk of the pointer being deleted/invalid because the shared_ptr will be stored elsewhere.  For example, the ImmutableDBOptions stores the Env which has a std::shared_ptr<SystemClock> in it.  The SystemClock* within the ImmutableDBOptions is essentially a "short cut" to gain access to this constant resource.

There were a few classes (PeriodicWorkScheduler?) where the "short cut" property did not hold.  In those cases, the shared pointer was preserved.

Using db_bench readrandom perf_level=3 on my EC2 box, this change performed as well or better than 6.17:

6.17: readrandom   :      28.046 micros/op 854902 ops/sec;   61.3 MB/s (355999 of 355999 found)
6.18: readrandom   :      32.615 micros/op 735306 ops/sec;   52.7 MB/s (290999 of 290999 found)
PR: readrandom   :      27.500 micros/op 871909 ops/sec;   62.5 MB/s (367999 of 367999 found)

(Note that the times for 6.18 are prior to revert of the SystemClock).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8033

Reviewed By: pdillinger

Differential Revision: D27014563

Pulled By: mrambacher

fbshipit-source-id: ad0459eba03182e454391b5926bf5cdd45657b67
2021-03-15 04:34:11 -07:00
Levi Tamasi 0288bdbc53 Add the integrated BlobDB to the stress/crash tests (#7900)
Summary:
The patch adds support for the options related to the new BlobDB implementation
to `db_stress`, including support for dynamically adjusting them using `SetOptions`
when `set_options_one_in` and a new flag `allow_setting_blob_options_dynamically`
are specified. (The latter is used to prevent the options from being enabled when
incompatible features are in use.)

The patch also updates the `db_stress` help messages of the existing stacked BlobDB
related options to clarify that they pertain to the old implementation. In addition, it
adds the new BlobDB to the crash test script. In order to prevent a combinatorial explosion
of jobs and still perform whitebox/blackbox testing (including under ASAN/TSAN/UBSAN),
and to also test BlobDB in conjunction with atomic flush and transactions, the script sets
the BlobDB options in 10% of normal/`cf_consistency`/`txn` crash test runs.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7900

Test Plan: Ran `make check` and `db_stress`/`db_crashtest.py` with various options.

Reviewed By: jay-zhuang

Differential Revision: D26094913

Pulled By: ltamasi

fbshipit-source-id: c2ef3391a05e43a9687f24e297df05f4a5584814
2021-02-02 11:41:18 -08:00
Jay Zhuang fc4d5f5065 Add stress test for GetProperty (#7111)
Summary:
Add stress test coverage for `DB::GetProperty()`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7111

Test Plan:
```
./db_stress -get_property_one_in=1
make crash_test
```

Reviewed By: ajkr

Differential Revision: D22487906

Pulled By: jay-zhuang

fbshipit-source-id: c118d95cc9b4e2fa669a06e6aa531541fa885dc5
2020-07-14 12:12:36 -07:00
Andrew Kryczka 775dc623ad add `CompactionFilter` to stress/crash tests (#6988)
Summary:
Added a `CompactionFilter` that is aware of the stress test's expected state. It only drops key versions that are already covered according to the expected state. It is incompatible with snapshots (same as all `CompactionFilter`s), so disables all snapshot-related features when used in the crash test.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6988

Test Plan:
running a minified blackbox crash test

```
$ TEST_TMPDIR=/dev/shm python tools/db_crashtest.py blackbox --max_key=1000000 -write_buffer_size=1048576 -max_bytes_for_level_base=4194304 -target_file_size_base=1048576 -value_size_mult=33 --interval=10 --duration=3600
```

Reviewed By: anand1976

Differential Revision: D22072888

Pulled By: ajkr

fbshipit-source-id: 727b9d7a90d5eab18be0ec6cd5a810712ac13320
2020-06-18 09:54:55 -07:00
Levi Tamasi 217ce20021 Remove GetSortedWalFiles/GetCurrentWalFile from the crash test (#6491)
Summary:
Currently, `db_stress` tests a randomly picked one of `GetLiveFiles`,
`GetSortedWalFiles`, and `GetCurrentWalFile` with a 1/N chance when the
command line parameter `get_live_files_and_wal_files_one_in` is specified.
The problem is that `GetSortedWalFiles` and `GetCurrentWalFile` are unreliable
in the sense that they can return errors if another thread removes a WAL file
while they are executing (which is a perfectly plausible and legitimate scenario).
The patch splits this command line parameter into three (one for each API),
and changes the crash test script so that only `GetLiveFiles` is tested during
our continuous crash test runs.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6491

Test Plan:
```
make check
python tools/db_crashtest.py whitebox
```

Reviewed By: siying

Differential Revision: D20312200

Pulled By: ltamasi

fbshipit-source-id: e7c3481eddfe3bd3d5349476e34abc9eee5b7dc8
2020-03-18 17:14:15 -07:00
sdong fdf882ded2 Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433)
Summary:
When dynamically linking two binaries together, different builds of RocksDB from two sources might cause errors. To provide a tool for user to solve the problem, the RocksDB namespace is changed to a flag which can be overridden in build time.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6433

Test Plan: Build release, all and jtest. Try to build with ROCKSDB_NAMESPACE with another flag.

Differential Revision: D19977691

fbshipit-source-id: aa7f2d0972e1c31d75339ac48478f34f6cfcfb3e
2020-02-20 12:09:57 -08:00
anand76 d4da412864 Add Transaction::MultiGet to db_stress (#6227)
Summary:
Call Transaction::MultiGet from TestMultiGet() in db_stress. We add some Puts/Merges/Deletes into the transaction in order to exercise MultiGetFromBatchAndDB. There is no data verification on read, just check status. There is currently no read data verification in db_stress as it requires synchronization with writes. It needs to be tackled separately.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6227

Test Plan: make crash_test_with_txn

Differential Revision: D19204611

Pulled By: anand1976

fbshipit-source-id: 770d0e30d002e88626c264c58103f1d709bb060c
2019-12-20 23:12:51 -08:00
sdong 79cc8dc29b db_stress: cover approximate size (#6213)
Summary:
db_stress to execute DB::GetApproximateSizes() with randomized keys and options. Return value is not validated but error will be reported.
Two ways to generate the range keys: (1) two random keys; (2) a small range.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6213

Test Plan: (1) run "make crash_test" for a while; (2) hack the code to ingest some errors to see it is reported.

Differential Revision: D19204665

fbshipit-source-id: 652db36f13bcb5a3bd8fe4a10c0aa22a77a0bce2
2019-12-20 21:43:35 -08:00
sdong e55c2b3f0b db_stress: improvements in TestIterator (#6166)
Summary:
1. Cover SeekToFirst() and SeekToLast().
2. Try to record the history of iterator operations.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6166

Test Plan: Do some manual changes in the code to cover the failure cases and see the error printing is correct and SeekToFirst() and SeekToLast() sometimes show up.

Differential Revision: D19047079

fbshipit-source-id: 1ed616f919fe4d32c0a021fc37932a7bd3063bcd
2019-12-20 14:56:15 -08:00
Zhichao Cao f89dea4fec db_stress: Added the verification for GetLiveFiles, GetSortedWalFiles and GetCurrentWalFile (#6224)
Summary:
Add the verification in operateDB to verify GetLiveFiles, GetSortedWalFiles and GetCurrentWalFile. The test will be called every 1 out of N, N is decided by get_live_files_and_wal_files_one_i, whose default is 1000000.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6224

Test Plan: pass db_stress default run.

Differential Revision: D19183358

Pulled By: zhichao-cao

fbshipit-source-id: 20073cf72ede77a3e0d3cf5f28304f1f605d2b1a
2019-12-20 12:07:30 -08:00
Yanqin Jin 670a916d01 Add more verification to db_stress (#6173)
Summary:
Currently, db_stress performs verification by calling `VerifyDb()` at the end of test and optionally before tests start. In case of corruption or incorrect result, it will be too late. This PR adds more verification in two ways.
1. For cf consistency test, each test thread takes a snapshot and verifies every N ops. N is configurable via `-verify_db_one_in`. This option is not supported in other stress tests.
2. For cf consistency test, we use another background thread in which a secondary instance periodically tails the primary (interval is configurable). We verify the secondary. Once an error is detected, we terminate the test and report. This does not affect other stress tests.

Test plan (devserver)
```
$./db_stress -test_cf_consistency -verify_db_one_in=0 -ops_per_thread=100000 -continuous_verification_interval=100
$./db_stress -test_cf_consistency -verify_db_one_in=1000 -ops_per_thread=10000 -continuous_verification_interval=0
$make crash_test
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6173

Differential Revision: D19047367

Pulled By: riversand963

fbshipit-source-id: aeed584ad71f9310c111445f34975e5ab47a0615
2019-12-20 08:49:29 -08:00
Peter Dillinger 873331fe49 Refactor pulling out parts of StressTest::OperateDb (#6195)
Summary:
Complete some refactoring called for in https://github.com/facebook/rocksdb/issues/6148. Somehow I got some 'make format' in here for files I didn't change, but that should be OK. (I'm not sure why "hide whitespace changes" doesn't seem to help in review.)

Not addressed in this PR: some operations simply print to stdout rather than failing on discovering a bad status or inconsistency.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6195

Differential Revision: D19131067

Pulled By: pdillinger

fbshipit-source-id: 4f416e6b792023989ef119f385fe122426cb825d
2019-12-19 14:04:49 -08:00
sdong 7a99162a74 db_stress: sometimes call CancelAllBackgroundWork() and Close() before closing DB (#6141)
Summary:
CancelAllBackgroundWork() and Close() are frequently used features but we don't cover it in stress test. Simply execute them before closing the DB with 1/2 chance.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6141

Test Plan: Run "db_stress".

Differential Revision: D18900861

fbshipit-source-id: 49b46ccfae120d0f9de3e0543b82fb6d715949d0
2019-12-10 20:04:52 -08:00
sdong 14c38baca0 db_stress: sometimes validate compact range data (#6140)
Summary:
Right now, in db_stress, compact range is simply executed without any immediate data validation. Add a simply validation which compares hash for all keys within the compact range to stay the same against the same snapshot before and after the compaction.

Also, randomly tune most knobs of CompactRangeOptions.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6140

Test Plan: Run db_stress with "--compact_range_one_in=2000 --compact_range_width=100000000" for a while. Manually ingest some hacky code and observe the error path.

Differential Revision: D18900230

fbshipit-source-id: d96e75bc8c38dd5ec702571ffe7cf5f4ea93ee10
2019-12-10 11:41:50 -08:00
sdong 7d79b32618 Break db_stress_tool.cc to a list of source files (#6134)
Summary:
db_stress_tool.cc now is a giant file. In order to main it easier to improve and maintain, break it down to multiple source files.
Most classes are turned into their own files. Separate .h and .cc files are created for gflag definiations. Another .h and .cc files are created for some common functions. Some test execution logic that is only loosely related to class StressTest is moved to db_stress_driver.h and db_stress_driver.cc. All the files are located under db_stress_tool/. The directory name is created as such because if we end it with either stress or test, .gitignore will ignore any file under it and makes it prone to issues in developements.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6134

Test Plan: Build under GCC7 with and without LITE on using GNU Make. Build with GCC 4.8. Build with cmake with -DWITH_TOOL=1

Differential Revision: D18876064

fbshipit-source-id: b25d0a7451840f31ac0f5ebb0068785f783fdf7d
2019-12-08 23:51:01 -08:00