Commit Graph

295 Commits

Author SHA1 Message Date
Levi Tamasi dc5de45af8 Support readahead during compaction for blob files (#9187)
Summary:
The patch adds a new BlobDB configuration option `blob_compaction_readahead_size`
that can be used to enable prefetching data from blob files during compaction.
This is important when using storage with higher latencies like HDDs or remote filesystems.
If enabled, prefetching is used for all cases when blobs are read during compaction,
namely garbage collection, compaction filters (when the existing value has to be read from
a blob file), and `Merge` (when the value of the base `Put` is stored in a blob file).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9187

Test Plan: Ran `make check` and the stress/crash test.

Reviewed By: riversand963

Differential Revision: D32565512

Pulled By: ltamasi

fbshipit-source-id: 87be9cebc3aa01cc227bec6b5f64d827b8164f5d
2021-11-19 17:53:47 -08:00
anand76 78556c14dd Secondary cache error injection (#9002)
Summary:
Implement secondary cache error injection in db_stress.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9002

Reviewed By: zhichao-cao

Differential Revision: D31874896

Pulled By: anand1976

fbshipit-source-id: 8cf04c061a4a44efa0fe88423d05cade67b85f73
2021-11-08 10:27:27 -08:00
Peter Dillinger a7d4bea43a Implement XXH3 block checksum type (#9069)
Summary:
XXH3 - latest hash function that is extremely fast on large
data, easily faster than crc32c on most any x86_64 hardware. In
integrating this hash function, I have handled the compression type byte
in a non-standard way to avoid using the streaming API (extra data
movement and active code size because of hash function complexity). This
approach got a thumbs-up from Yann Collet.

Existing functionality change:
* reject bad ChecksumType in options with InvalidArgument

This change split off from https://github.com/facebook/rocksdb/issues/9058 because context-aware checksum is
likely to be handled through different configuration than ChecksumType.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9069

Test Plan:
tests updated, and substantially expanded. Unit tests now check
that we don't accidentally change the values generated by the checksum
algorithms ("schema test") and that we properly handle
invalid/unrecognized checksum types in options or in file footer.

DBTestBase::ChangeOptions (etc.) updated from two to one configuration
changing from default CRC32c ChecksumType. The point of this test code
is to detect possible interactions among features, and the likelihood of
some bad interaction being detected by including configurations other
than XXH3 and CRC32c--and then not detected by stress/crash test--is
extremely low.

Stress/crash test also updated (manual run long enough to see it accepts
new checksum type). db_bench also updated for microbenchmarking
checksums.

 ### Performance microbenchmark (PORTABLE=0 DEBUG_LEVEL=0, Broadwell processor)

./db_bench -benchmarks=crc32c,xxhash,xxhash64,xxh3,crc32c,xxhash,xxhash64,xxh3,crc32c,xxhash,xxhash64,xxh3
crc32c       :       0.200 micros/op 5005220 ops/sec; 19551.6 MB/s (4096 per op)
xxhash       :       0.807 micros/op 1238408 ops/sec; 4837.5 MB/s (4096 per op)
xxhash64     :       0.421 micros/op 2376514 ops/sec; 9283.3 MB/s (4096 per op)
xxh3         :       0.171 micros/op 5858391 ops/sec; 22884.3 MB/s (4096 per op)
crc32c       :       0.206 micros/op 4859566 ops/sec; 18982.7 MB/s (4096 per op)
xxhash       :       0.793 micros/op 1260850 ops/sec; 4925.2 MB/s (4096 per op)
xxhash64     :       0.410 micros/op 2439182 ops/sec; 9528.1 MB/s (4096 per op)
xxh3         :       0.161 micros/op 6202872 ops/sec; 24230.0 MB/s (4096 per op)
crc32c       :       0.203 micros/op 4924686 ops/sec; 19237.1 MB/s (4096 per op)
xxhash       :       0.839 micros/op 1192388 ops/sec; 4657.8 MB/s (4096 per op)
xxhash64     :       0.424 micros/op 2357391 ops/sec; 9208.6 MB/s (4096 per op)
xxh3         :       0.162 micros/op 6182678 ops/sec; 24151.1 MB/s (4096 per op)

As you can see, especially once warmed up, xxh3 is fastest.

 ### Performance macrobenchmark (PORTABLE=0 DEBUG_LEVEL=0, Broadwell processor)

Test

    for I in `seq 1 50`; do for CHK in 0 1 2 3 4; do TEST_TMPDIR=/dev/shm/rocksdb$CHK ./db_bench -benchmarks=fillseq -memtablerep=vector -allow_concurrent_memtable_write=false -num=30000000 -checksum_type=$CHK 2>&1 | grep 'micros/op' | tee -a results-$CHK & done; wait; done

Results (ops/sec)

    for FILE in results*; do echo -n "$FILE "; awk '{ s += $5; c++; } END { print 1.0 * s / c; }' < $FILE; done

results-0 252118 # kNoChecksum
results-1 251588 # kCRC32c
results-2 251863 # kxxHash
results-3 252016 # kxxHash64
results-4 252038 # kXXH3

Reviewed By: mrambacher

Differential Revision: D31905249

Pulled By: pdillinger

fbshipit-source-id: cb9b998ebe2523fc7c400eedf62124a78bf4b4d1
2021-10-28 22:15:17 -07:00
Levi Tamasi 3e1bf771a3 Make it possible to force the garbage collection of the oldest blob files (#8994)
Summary:
The current BlobDB garbage collection logic works by relocating the valid
blobs from the oldest blob files as they are encountered during compaction,
and cleaning up blob files once they contain nothing but garbage. However,
with sufficiently skewed workloads, it is theoretically possible to end up in a
situation when few or no compactions get scheduled for the SST files that contain
references to the oldest blob files, which can lead to increased space amp due
to the lack of GC.

In order to efficiently handle such workloads, the patch adds a new BlobDB
configuration option called `blob_garbage_collection_force_threshold`,
which signals to BlobDB to schedule targeted compactions for the SST files
that keep alive the oldest batch of blob files if the overall ratio of garbage in
the given blob files meets the threshold *and* all the given blob files are
eligible for GC based on `blob_garbage_collection_age_cutoff`. (For example,
if the new option is set to 0.9, targeted compactions will get scheduled if the
sum of garbage bytes meets or exceeds 90% of the sum of total bytes in the
oldest blob files, assuming all affected blob files are below the age-based cutoff.)
The net result of these targeted compactions is that the valid blobs in the oldest
blob files are relocated and the oldest blob files themselves cleaned up (since
*all* SST files that rely on them get compacted away).

These targeted compactions are similar to periodic compactions in the sense
that they force certain SST files that otherwise would not get picked up to undergo
compaction and also in the sense that instead of merging files from multiple levels,
they target a single file. (Note: such compactions might still include neighboring files
from the same level due to the need of having a "clean cut" boundary but they never
include any files from any other level.)

This functionality is currently only supported with the leveled compaction style
and is inactive by default (since the default value is set to 1.0, i.e. 100%).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8994

Test Plan: Ran `make check` and tested using `db_bench` and the stress/crash tests.

Reviewed By: riversand963

Differential Revision: D31489850

Pulled By: ltamasi

fbshipit-source-id: 44057d511726a0e2a03c5d9313d7511b3f0c4eab
2021-10-11 18:03:01 -07:00
Andrew Kryczka a282eff3d1 Protect existing files in `FaultInjectionTest{Env,FS}::ReopenWritableFile()` (#8995)
Summary:
`FaultInjectionTest{Env,FS}::ReopenWritableFile()` functions were accidentally deleting WALs from previous `db_stress` runs causing verification to fail. They were operating under the assumption that `ReopenWritableFile()` would delete any existing file. It was a reasonable assumption considering the `{Env,FileSystem}::ReopenWritableFile()` documentation stated that would happen. The only problem was neither the implementations we offer nor the "real" clients in RocksDB code followed that contract. So, this PR updates the contract as well as fixing the fault injection client usage.

The fault injection change exposed that `ExternalSSTFileBasicTest.SyncFailure` was relying on a fault injection `Env` dropping unsynced data written by a regular `Env`. I changed that test to make its `SstFileWriter` use fault injection `Env`, and also implemented `LinkFile()` in fault injection so the unsynced data is tracked under the new name.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8995

Test Plan:
- Verified it fixes the following failure:

```
$ ./db_stress --clear_column_family_one_in=0 --column_families=1 --db=/dev/shm/rocksdb_crashtest_whitebox --delpercent=5 --expected_values_dir=/dev/shm/rocksdb_crashtest_expected --iterpercent=0 --key_len_percent_dist=1,30,69 --max_key=100000 --max_key_len=3 --nooverwritepercent=1 --ops_per_thread=1000 --prefixpercent=0 --readpercent=60 --reopen=0 --target_file_size_base=1048576 --test_batches_snapshots=0 --write_buffer_size=1048576 --writepercent=35 --value_size_mult=33 -threads=1
...
$ ./db_stress --avoid_flush_during_recovery=1 --clear_column_family_one_in=0 --column_families=1 --db=/dev/shm/rocksdb_crashtest_whitebox --delpercent=5 --destroy_db_initially=0 --expected_values_dir=/dev/shm/rocksdb_crashtest_expected --iterpercent=10 --key_len_percent_dist=1,30,69 --max_bytes_for_level_base=4194304 --max_key=100000 --max_key_len=3 --nooverwritepercent=1 --open_files=-1 --open_metadata_write_fault_one_in=8 --open_write_fault_one_in=16 --ops_per_thread=1000 --prefix_size=-1 --prefixpercent=0 --readpercent=50 --sync=1 --target_file_size_base=1048576 --test_batches_snapshots=0 --write_buffer_size=1048576 --writepercent=35 --value_size_mult=33 -threads=1
...
Verification failed for column family 0 key 000000000000001300000000000000857878787878 (1143): Value not found: NotFound:
Crash-recovery verification failed :(
...
```

- `make check -j48`

Reviewed By: ltamasi

Differential Revision: D31495388

Pulled By: ajkr

fbshipit-source-id: 7886ccb6a07cb8b78ad7b6c1c341ccf40bb68385
2021-10-11 16:23:18 -07:00
Akanksha Mahajan 84d71f30c4 Enable SingleDelete with user defined ts in db_bench and crash tests (#8971)
Summary:
Enable SingleDelete with user defined timestamp in db_bench,
db_stress and crash test

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8971

Test Plan:
1. For db_stress, ran the command for full duration: i) python3 -u tools/db_crashtest.py
--enable_ts whitebox --nooverwritepercent=100
ii) make crash_test_with_ts

2. For db_bench, ran:  ./db_bench -benchmarks=randomreplacekeys
-user_timestamp_size=8 -use_single_deletes=true

Reviewed By: riversand963

Differential Revision: D31246558

Pulled By: akankshamahajan15

fbshipit-source-id: 29cd8740c9921341e52f09242fca3c44d75a12b7
2021-10-01 16:48:01 -07:00
Andrew Kryczka 559943cdc0 Refactor expected state in stress/crash test (#8913)
Summary:
This is a precursor refactoring to enable an upcoming feature: persistence failure correctness testing.

- Changed `--expected_values_path` to `--expected_values_dir` and migrated "db_crashtest.py" to use the new flag. For persistence failure correctness testing there are multiple possible correct states since unsynced data is allowed to be dropped. Making it possible to restore all these possible correct states will eventually involve files containing snapshots of expected values and DB trace files.
- The expected values directory is managed by an `ExpectedStateManager` instance. Managing expected state files is separated out of `SharedState` to prevent `SharedState` from becoming too complex when the new files and features (snapshotting, tracing, and restoring) are introduced.
- Migrated expected values file access/management out of `SharedState` into a separate class called `ExpectedState`. This is not exposed directly to the test but rather the `ExpectedState` for the latest values file is accessed via a pass-through API on `ExpectedStateManager`. This forces the test to always access the single latest `ExpectedState`.
- Changed the initialization of the latest expected values file to use a tempfile followed by rename, and also add cleanup logic for possible stranded tempfiles.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8913

Test Plan:
run in several ways; try to make sure it's not obviously broken.

- crashtest blackbox without TEST_TMPDIR
```
$ python3 tools/db_crashtest.py blackbox --simple --write_buffer_size=1048576 --target_file_size_base=1048576 --max_bytes_for_level_base=4194304 --max_key=100000 --value_size_mult=33 --compression_type=none --duration=120 --interval=10 --compression_type=none --blob_compression_type=none
```
- crashtest blackbox with TEST_TMPDIR
```
$ TEST_TMPDIR=/dev/shm python3 tools/db_crashtest.py blackbox --simple --write_buffer_size=1048576 --target_file_size_base=1048576 --max_bytes_for_level_base=4194304 --max_key=100000 --value_size_mult=33 --compression_type=none --duration=120 --interval=10 --compression_type=none --blob_compression_type=none
```
- crashtest whitebox with TEST_TMPDIR
```
$ TEST_TMPDIR=/dev/shm python3 tools/db_crashtest.py whitebox --simple --write_buffer_size=1048576 --target_file_size_base=1048576 --max_bytes_for_level_base=4194304 --max_key=100000 --value_size_mult=33 --compression_type=none --duration=120 --interval=10 --compression_type=none --blob_compression_type=none --random_kill_odd=88887
```
- db_stress without expected_values_dir
```
$ ./db_stress --write_buffer_size=1048576 --target_file_size_base=1048576 --max_bytes_for_level_base=4194304 --max_key=100000 --value_size_mult=33 --compression_type=none --ops_per_thread=10000 --clear_column_family_one_in=0 --destroy_db_initially=true
```
- db_stress with expected_values_dir and manual corruption
```
$ ./db_stress --write_buffer_size=1048576 --target_file_size_base=1048576 --max_bytes_for_level_base=4194304 --max_key=100000 --value_size_mult=33 --compression_type=none --ops_per_thread=10000 --clear_column_family_one_in=0 --destroy_db_initially=true --expected_values_dir=./
// modify one byte in "./LATEST.state"
$ ./db_stress --write_buffer_size=1048576 --target_file_size_base=1048576 --max_bytes_for_level_base=4194304 --max_key=100000 --value_size_mult=33 --compression_type=none --ops_per_thread=10000 --clear_column_family_one_in=0 --destroy_db_initially=false --expected_values_dir=./
...
Verification failed for column family 0 key 0000000000000000 (0): Value not found: NotFound:
...
```

Reviewed By: riversand963

Differential Revision: D30921951

Pulled By: ajkr

fbshipit-source-id: babfe218062e55d018c9b046536c0289fb78f41c
2021-09-28 14:13:33 -07:00
Andrew Kryczka 6d424be910 Temporarily set experimental_mempurge_threshold=0 in crash test (#8958)
Summary:
For now, disable it since the below command indicates it can cause a
failure. Running that command with `-experimental_mempurge_threshold=0`
has been running successfully for several minutes, whereas before it
failed in seconds.

```
$ while rm -rf /dev/shm/single_stress && ./db_stress --clear_column_family_one_in=0 --column_families=1 --db=/dev/shm/single_stress --experimental_mempurge_threshold=5.493146827397074 --flush_one_in=10000 --reopen=0 --write_buffer_size=262144 --value_size_mult=33 --max_write_buffer_number=3 -ops_per_thread=10000; do : ; done
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8958

Reviewed By: ltamasi

Differential Revision: D31187059

Pulled By: ajkr

fbshipit-source-id: 04d5bfb4fcc4f5b66233e691427dfd940c67037f
2021-09-24 18:29:48 -07:00
sdong 9320067703 Improve fault injection to MultiRead (#8937)
Summary:
Several improvements to MultiRead:
1. Fix a bug in stress test which causes false positive when both MultiRead() return and individual read request have failure injected.
2. Add two more types of fault that should be handled: empty read results and checksum mismatch
3. Add a message indicating which type of fault is injected
4. Increase the failure rate

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8937

Reviewed By: anand1976

Differential Revision: D31085930

fbshipit-source-id: 3a04994a3cadebf9a64d25e1fe12b14b7a272fba
2021-09-21 14:48:15 -07:00
Yanqin Jin d8eb824325 Temporarily disable block-based filter when stress testing timestamp (#8703)
Summary:
Current implementation does not support user-defined timestamp when
block-based filter is used. Will implement the support in the future, or
wait to see if block-based filter can be deprecated and removed.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8703

Test Plan: make whitebox_crash_test_with_ts

Reviewed By: pdillinger

Differential Revision: D30528931

Pulled By: riversand963

fbshipit-source-id: 60dd74ee0a6194e69072069d8c4bd876f249f38d
2021-08-24 19:04:58 -07:00
Peter Dillinger 2a383f21f4 Add Bloom/Ribbon hybrid API support (#8679)
Summary:
This is essentially resurrection and fixing of the part of
https://github.com/facebook/rocksdb/issues/8198 that was reverted in https://github.com/facebook/rocksdb/issues/8212, using data added in https://github.com/facebook/rocksdb/issues/8246. Basically,
when configuring Ribbon filter, you can specify an LSM level before which
Bloom will be used instead of Ribbon. But Bloom is only considered for
Leveled and Universal compaction styles and file going into a known LSM
level. This way, SST file writer, FIFO compaction, etc. use Ribbon filter as
you would expect with NewRibbonFilterPolicy.

So that this can be controlled with a single int value and so that flushes
can be distinguished from intra-L0, we consider flush to go to level -1 for
the purposes of this option. (Explained in API comment.)

I also expect the most common and recommended Ribbon configuration to
use Bloom during flush, to minimize slowing down writes and because according
to my estimates, Ribbon only pays off if the structure lives in memory for
more than an hour. Thus, I have changed the default for NewRibbonFilterPolicy
to be this mild hybrid configuration. I don't really want to add something like
NewHybridFilterPolicy because at least the mild hybrid configuration (Bloom for
flush, Ribbon otherwise) should be considered a natural choice.

C APIs also updated, but because they don't support overloading,
rocksdb_filterpolicy_create_ribbon is kept pure ribbon for clarity and
rocksdb_filterpolicy_create_ribbon_hybrid must be called for a hybrid
configuration. While touching C API, I changed bits per key options from
int to double.

BuiltinFilterPolicy is needed so that LevelThresholdFilterPolicy doesn't inherit
unused fields from BloomFilterPolicy.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8679

Test Plan: new + updated tests, including crash test

Reviewed By: jay-zhuang

Differential Revision: D30445797

Pulled By: pdillinger

fbshipit-source-id: 6f5aeddfd6d79f7e55493b563c2d1d2d568892e1
2021-08-20 18:00:16 -07:00
Baptiste Lemaire e3a96c4823 Memtable sampling for mempurge heuristic. (#8628)
Summary:
Changes the API of the MemPurge process: the `bool experimental_allow_mempurge` and `experimental_mempurge_policy` flags have been replaced by a `double experimental_mempurge_threshold` option.
This change of API reflects another major change introduced in this PR: the MemPurgeDecider() function now works by sampling the memtables being flushed to estimate the overall amount of useful payload (payload minus the garbage), and then compare this useful payload estimate with the `double experimental_mempurge_threshold` value.
Therefore, when the value of this flag is `0.0` (default value), mempurge is simply deactivated. On the other hand, a value of `DBL_MAX` would be equivalent to always going through a mempurge regardless of the garbage ratio estimate.
At the moment, a `double experimental_mempurge_threshold` value else than 0.0 or `DBL_MAX` is opnly supported`with the `SkipList` memtable representation.
Regarding the sampling, this PR includes the introduction of a `MemTable::UniqueRandomSample` function that collects (approximately) random entries from the memtable by using the new `SkipList::Iterator::RandomSeek()` under the hood, or by iterating through each memtable entry, depending on the target sample size and the total number of entries.
The unit tests have been readapted to support this new API.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8628

Reviewed By: pdillinger

Differential Revision: D30149315

Pulled By: bjlemaire

fbshipit-source-id: 1feef5390c95db6f4480ab4434716533d3947f27
2021-08-10 18:09:03 -07:00
Peter (Stig) Edwards 543a201b93 Remove unused variable - run_had_errors (#8599)
Summary:
Unused since ab718b415f .
Noticed on b215f1a832/files/tools/db_crashtest.py?sort=name&dir=ASC&mode=heatmap#xf254f528ad18f108:1

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8599

Reviewed By: ajkr

Differential Revision: D30057041

Pulled By: zhichao-cao

fbshipit-source-id: e80438cf9717086d2bf67461e19393d426a7676e
2021-08-06 14:46:37 -07:00
Baptiste Lemaire d6006f9c9b Add experimental mempurge policy flag to db_stress. (#8588)
Summary:
Add `experimental_mempurge_policy` flag to `db_stress` and `db_crashtest.py`.
This flag is only read if the `experimental_allow_mempurge` flag is set to `true`. This flag can take the following values: `kAlways`, and `kAlternate` (default).
- `kAlways`: a flush is always redirected to a mempurge. If the mempurge aborts, the a regular flush proceeds.
- `kAlternate`: if one or more of the flush input memtables is an mempurge output memtable, then a flush is performed, else a mempurge is carried out. Similar to kAlways, if a mempurge aborts, the FlushJob proceeds to a regular flush to storage.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8588

Reviewed By: pdillinger

Differential Revision: D29934251

Pulled By: bjlemaire

fbshipit-source-id: 90c1debed2029b9915d066914556547507c33dae
2021-07-28 13:27:58 -07:00
Baptiste Lemaire 0229a88dfe Crashtest mempurge (#8545)
Summary:
Add `experiemental_allow_mempurge` flag support for `db_stress` and `db_crashtest.py`, with a `false` default value.
I succesfully tested locally both `whitebox` and `blackbox` crash tests with `experiemental_allow_mempurge` flag set as true.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8545

Reviewed By: akankshamahajan15

Differential Revision: D29734513

Pulled By: bjlemaire

fbshipit-source-id: 24316c0eccf6caf409e95c035f31d822c66714ae
2021-07-16 10:20:22 -07:00
sdong f33611d5e9 Stress test to inject read failures in DB reopen (#8476)
Summary:
Inject read failures in DB reopen, just as what we do for metadata writes and writes.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8476

Test Plan: Some manual tests and make sure failures are triggered.

Reviewed By: anand1976

Differential Revision: D29507283

fbshipit-source-id: d04da0163973447041038bd87701686a417c4e0c
2021-07-06 11:05:27 -07:00
sdong ba224b75c7 Stress Test to inject write failures in reopen (#8474)
Summary:
Previously Stress can inject metadata write failures when reopening a DB. We extend it to file append too, in the same way.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8474

Test Plan: manually run crash test with various setting and make sure the failures are triggered as expected.

Reviewed By: zhichao-cao

Differential Revision: D29503116

fbshipit-source-id: e73a446e80ccbd09301a579280e56ff949381fab
2021-06-30 16:46:41 -07:00
anand76 6f9ed59b1d Allow db_stress to use a secondary cache (#8455)
Summary:
Add a ```-secondary_cache_uri``` to db_stress to allow the user to specify a custom ```SecondaryCache``` object from the object registry. Also allow db_crashtest.py to be run with an alternate db_stress location. Together, these changes will allow us to run db_stress using FB internal components.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8455

Reviewed By: zhichao-cao

Differential Revision: D29371972

Pulled By: anand1976

fbshipit-source-id: dd1b1fd80ebbedc11aa63d9246ea6ae49edb77c4
2021-06-27 23:54:39 -07:00
Akanksha Mahajan be8199cdb9 Run Merge with Integrated BlobDB in stress, crash and db_bench (#8461)
Summary:
Run Merge with Intergrated BlobDB in stress tests, crash tests and db_bench.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8461

Test Plan:
1. python3 -u tools/db_crashtest.py --simple whitebox
---use_merge=1 --enable_blob_files=1
           2.  ./db_bench --benchmarks="readwhilemerging"
--merge_operator=uint64add --enable_blob_files=true

Reviewed By: ltamasi

Differential Revision: D29394824

Pulled By: akankshamahajan15

fbshipit-source-id: 0a8e492b13129673e088fb8af3402ab678bb473a
2021-06-25 10:45:52 -07:00
sdong ab718b415f Kill whitebox crash test if it is 15 minutes over the limit (#8341)
Summary:
Whitebox crash test can run significantly over the time limit for test slowness or no kiling points. This indefinite job can create problem when this test is periodically scheduled as a job. Instead, kill the job if it is 15 minutes over the limit.
Refactor the code slightly to consolidate the code for executing commands for white and black box tests.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8341

Test Plan: Run both of black and white box tests with both of natual and explicit kill condition.

Reviewed By: jay-zhuang

Differential Revision: D28756170

fbshipit-source-id: f253149890e62ace78f871be927e093e9b12f49b
2021-06-01 09:34:53 -07:00
Peter Dillinger ecd63b9262 Revert accidental enabling broken ClockCache in stress test (#8277)
Summary:
From https://github.com/facebook/rocksdb/issues/8261

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8277

Test Plan: briefly make blackbox_crash_test

Reviewed By: zhichao-cao

Differential Revision: D28270648

Pulled By: pdillinger

fbshipit-source-id: 9bfd46c5a1a449165f6597bddb17af910331773f
2021-05-06 16:31:51 -07:00
Andrew Kryczka b71b4597e7 Permit stdout "fail"/"error" in whitebox crash test (#8272)
Summary:
In https://github.com/facebook/rocksdb/issues/8268, the `db_stress` stdout began containing both the strings
"fail" and "error" (case-insensitive). The whitebox crash test
failed upon seeing either of those strings.

I checked that all other occurrences of "fail" and "error"
(case-insensitive) that `db_stress` produces are printed to `stderr`. So
this PR separates the handling of `db_stress`'s stdout and stderr, and
only fails when one those bad strings are found in stderr.

The downside of this PR is `db_stress`'s original interleaving of stdout/stderr is not preserved in `db_crashtest.py`'s output.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8272

Test Plan:
run it; see it succeeds for several runs until encountering a real error

```
$ python3 tools/db_crashtest.py whitebox --simple --random_kill_odd=8887 --max_key=1000000 --value_size_mult=33
...
db_stress: cache/clock_cache.cc:483: bool rocksdb::{anonymous}::ClockCacheShard::Unref(rocksdb::{anonymous}::CacheHandle*, bool, rocksdb::{anonymous}::CleanupContext*): Assertion `CountRefs(flags) > 0' failed.

TEST FAILED. Output has 'fail'!!!
```

Reviewed By: zhichao-cao

Differential Revision: D28239233

Pulled By: ajkr

fbshipit-source-id: 3b8602a0d570466a7e2c81bb9c49468f7716091e
2021-05-05 17:54:13 -07:00
Andrew Kryczka 0f42e50fec Fix `GetLiveFiles()` returning OPTIONS-000000 (#8268)
Summary:
See release note in HISTORY.md.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8268

Test Plan: unit test repro

Reviewed By: siying

Differential Revision: D28227901

Pulled By: ajkr

fbshipit-source-id: faf61d13b9e43a761e3d5dcf8203923126b51339
2021-05-05 12:54:46 -07:00
Peter Dillinger 3b981eaa1d Fix use-after-free threading bug in ClockCache (#8261)
Summary:
In testing for https://github.com/facebook/rocksdb/issues/8225 I found cache_bench would crash with
-use_clock_cache, as well as db_bench -use_clock_cache, but not
single-threaded. Smaller cache size hits failure much faster. ASAN
reported the failuer as calling malloc_usable_size on the `key` pointer
of a ClockCache handle after it was reportedly freed. On detailed
inspection I found this bad sequence of operations for a cache entry:

state=InCache=1,refs=1
[thread 1] Start ClockCacheShard::Unref (from Release, no mutex)
[thread 1] Decrement ref count
state=InCache=1,refs=0
[thread 1] Suspend before CalcTotalCharge (no mutex)

[thread 2] Start UnsetInCache (from Insert, mutex held)
[thread 2] clear InCache bit
state=InCache=0,refs=0
[thread 2] Calls RecycleHandle (based on pre-updated state)
[thread 2] Returns to Insert which calls Cleanup which deletes `key`

[thread 1] Resume ClockCacheShard::Unref
[thread 1] Read `key` in CalcTotalCharge

To fix this, I've added a field to the handle to store the metadata
charge so that we can efficiently remember everything we need from
the handle in Unref. We must not read from the handle again if we
decrement the count to zero with InCache=1, which means we don't own
the entry and someone else could eject/overwrite it immediately.

Note before this change, on amd64 sizeof(Handle) == 56 even though there
are only 48 bytes of data. Grouping together the uint32_t fields would
cut it down to 48, but I've added another uint32_t, which takes it
back up to 56. Not a big deal.

Also fixed DisownData to cooperate with ASAN as in LRUCache.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8261

Test Plan:
Manual + adding use_clock_cache to db_crashtest.py

Base performance
./cache_bench -use_clock_cache
Complete in 17.060 s; QPS = 2458513
New performance
./cache_bench -use_clock_cache
Complete in 17.052 s; QPS = 2459695

Any difference is easily buried in small noise.

Crash test shows still more bug(s) in ClockCache, so I'm expecting to
disable ClockCache from production code in a follow-up PR (if we
can't find and fix the bug(s))

Reviewed By: mrambacher

Differential Revision: D28207358

Pulled By: pdillinger

fbshipit-source-id: aa7a9322afc6f18f30e462c75dbbe4a1206eb294
2021-05-04 22:18:00 -07:00
sdong cde69a7cfd db_stress to add --open_metadata_write_fault_one_in (#8235)
Summary:
DB Stress to add --open_metadata_write_fault_one_in which would randomly fail in some file metadata modification operations during DB Open, including file creation, close, renaming and directory sync. Some operations can fail before and after the operations take place.
If DB open fails, db_stress would retry without the failure ingestion, and DB is expected to open successfully.
This option is enabled in crash test in half of the time.
Some follow up changes would allow write failures in open time, and ingesting those failures in non-DB open cases.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8235

Test Plan: Run stress tests for a while and see failures got triggered. This can reproduce the bug fixed by https://github.com/facebook/rocksdb/pull/8192 and a similar one that fails when fsyncing parent directory.

Reviewed By: anand1976

Differential Revision: D28010944

fbshipit-source-id: 36a96da4dc3633e5f7680cef3ea0a900fcdb5558
2021-04-28 10:58:05 -07:00
Peter Dillinger 95f6add746 Revert Ribbon starting level support from #8198 (#8212)
Summary:
This partially reverts commit 10196d7edc.

The problem with this change is because of important filter use cases:
FIFO compaction and SST writer. FIFO "compaction" always uses level 0 so
would only use Ribbon filters if specifically including level 0 for the
Ribbon filter policy. SST writer sets level_at_creation=-1 to indicate
unknown level, and this would be treated the same as level 0 unless
fixed.

We are keeping the part about committing to permanent schema, which is
only changes to API comments and HISTORY.md.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8212

Test Plan: CI

Reviewed By: jay-zhuang

Differential Revision: D27896468

Pulled By: pdillinger

fbshipit-source-id: 50a775f7cba5d64fb729d9b982e355864020596e
2021-04-20 19:46:40 -07:00
Peter Dillinger 10196d7edc Ribbon long-term support, starting level support (#8198)
Summary:
Since the Ribbon filter schema seems good (compatible back to
6.15.0), this change commits to long term support of the SST schema,
even though we expect the API for enabling Ribbon to change (still
called NewExperimentalRibbonFilterPolicy).

This also adds support for "hybrid" configuration in which some levels
use Bloom (higher levels, lower numbered) for speed and the rest use
Ribbon (lower levels, higher numbered) for memory space efficiency.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8198

Test Plan: unit test added, crash test support

Reviewed By: jay-zhuang

Differential Revision: D27831232

Pulled By: pdillinger

fbshipit-source-id: 90e528677689474d293ed6710b42ba89fbd5b5ab
2021-04-16 15:43:08 -07:00
Akanksha Mahajan 0be89e87fd Enable backup/restore for Integrated BlobDB in stress and crash tests (#8165)
Summary:
Enable backup/restore functionality with Integrated BlobDB in
db_stress and crash test.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8165

Test Plan:
Ran python3 -u tools/db_crashtest.py --simple whitebox along
with :
  1. decreased "backup_in_one" value for backups to be more frequent and
  2. manually changed code for "enable_blob_file" to be always true and
     apply blobdb params 100% for testing purpose.

Reviewed By: ltamasi

Differential Revision: D27636025

Pulled By: akankshamahajan15

fbshipit-source-id: 0d0e0d1479ced163f992872dc998e79c581bfc99
2021-04-07 17:57:24 -07:00
Yanqin Jin 09528f9fa1 Fix a bug for SeekForPrev with partitioned filter and prefix (#8137)
Summary:
According to https://github.com/facebook/rocksdb/issues/5907, each filter partition "should include the bloom of the prefix of the last
key in the previous partition" so that SeekForPrev() in prefix mode can return correct result.
The prefix of the last key in the previous partition does not necessarily have the same prefix
as the first key in the current partition. Regardless of the first key in current partition, the
prefix of the last key in the previous partition should be added. The existing code, however,
does not follow this. Furthermore, there is another issue: when finishing current filter partition,
`FullFilterBlockBuilder::AddPrefix()` is called for the first key in next filter partition, which effectively
overwrites `last_prefix_str_` prematurely. Consequently, when the filter block builder proceeds
to the next partition, `last_prefix_str_` will be the prefix of its first key, leaving no way of adding
the bloom of the prefix of the last key of the previous partition.

Prefix extractor is FixedLength.2.
```
[  filter part 1   ]    [  filter part 2    ]
                  abc    d
```
When SeekForPrev("abcd"), checking the filter partition will land on filter part 2 because "abcd" > "abc"
but smaller than "d".
If the filter in filter part 2 happens to return false for the test for "ab", then SeekForPrev("abcd") will build
incorrect iterator tree in non-total-order mode.

Also fix a unit test which starts to fail following this PR. `InDomain` should not fail due to assertion
error when checking on an arbitrary key.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8137

Test Plan:
```
make check
```

Without this fix, the following command will fail pretty soon.
```
./db_stress --acquire_snapshot_one_in=10000 --avoid_flush_during_recovery=0 \
--avoid_unnecessary_blocking_io=0 --backup_max_size=104857600 --backup_one_in=0 \
--batch_protection_bytes_per_key=0 --block_size=16384 --bloom_bits=17 \
--bottommost_compression_type=disable --cache_index_and_filter_blocks=1 --cache_size=1048576 \
--checkpoint_one_in=0 --checksum_type=kxxHash64 --clear_column_family_one_in=0 \
--compact_files_one_in=1000000 --compact_range_one_in=1000000 --compaction_ttl=0 \
--compression_max_dict_buffer_bytes=0 --compression_max_dict_bytes=0 \
--compression_parallel_threads=1 --compression_type=zstd --compression_zstd_max_train_bytes=0 \
--continuous_verification_interval=0 --db=/dev/shm/rocksdb/rocksdb_crashtest_whitebox \
--db_write_buffer_size=8388608 --delpercent=5 --delrangepercent=0 --destroy_db_initially=0 --enable_blob_files=0 \
--enable_compaction_filter=0 --enable_pipelined_write=1 --file_checksum_impl=big --flush_one_in=1000000 \
--format_version=5 --get_current_wal_file_one_in=0 --get_live_files_one_in=1000000 --get_property_one_in=1000000 \
--get_sorted_wal_files_one_in=0 --index_block_restart_interval=4 --index_type=2 --ingest_external_file_one_in=0 \
--iterpercent=10 --key_len_percent_dist=1,30,69 --level_compaction_dynamic_level_bytes=True \
--log2_keys_per_lock=10 --long_running_snapshots=1 --mark_for_compaction_one_file_in=0 \
--max_background_compactions=20 --max_bytes_for_level_base=10485760 --max_key=100000000 --max_key_len=3 \
--max_manifest_file_size=1073741824 --max_write_batch_group_size_bytes=16777216 --max_write_buffer_number=3 \
--max_write_buffer_size_to_maintain=8388608 --memtablerep=skip_list --mmap_read=1 --mock_direct_io=False \
--nooverwritepercent=0 --open_files=500000 --ops_per_thread=20000000 --optimize_filters_for_memory=0 --paranoid_file_checks=1 --partition_filters=1 --partition_pinning=0 --pause_background_one_in=1000000 \
--periodic_compaction_seconds=0 --prefixpercent=5 --progress_reports=0 --read_fault_one_in=0 --read_only=0 \
--readpercent=45 --recycle_log_file_num=0 --reopen=20 --secondary_catch_up_one_in=0 \
--snapshot_hold_ops=100000 --sst_file_manager_bytes_per_sec=104857600 \
--sst_file_manager_bytes_per_truncate=0 --subcompactions=2 --sync=0 --sync_fault_injection=False \
--target_file_size_base=2097152 --target_file_size_multiplier=2 --test_batches_snapshots=0 --test_cf_consistency=0 \
--top_level_index_pinning=0 --unpartitioned_pinning=1 --use_blob_db=0 --use_block_based_filter=0 \
--use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_full_merge_v1=0 --use_merge=0 \
--use_multiget=0 --use_ribbon_filter=0 --use_txn=0 --user_timestamp_size=8 --verify_checksum=1 \
--verify_checksum_one_in=1000000 --verify_db_one_in=100000 --write_buffer_size=4194304 \
--write_dbid_to_manifest=1 --writepercent=35
```

Reviewed By: pdillinger

Differential Revision: D27553054

Pulled By: riversand963

fbshipit-source-id: 60e391e4a2d8d98a9a3172ec5d6176b90ec3de98
2021-04-06 12:14:08 -07:00
Yanqin Jin ae7a795686 Disable partitioned filters in ts stress test (#8127)
Summary:
Currently, partitioned filter does not support user-defined timestamp. Disable it for now in ts stress test so that
the contrun jobs can proceed.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8127

Test Plan: make crash_test_with_ts

Reviewed By: ajkr

Differential Revision: D27388488

Pulled By: riversand963

fbshipit-source-id: 5ccff18121cb537bd82f2ac072cd25efb625c666
2021-03-29 00:40:52 -07:00
Yanqin Jin e1aa8c160f Fix an error while running db_crashtest for non-user-ts tests (#8091)
Summary:
Fix the following error while running `make crash_test`
```
Traceback (most recent call last):
  File "tools/db_crashtest.py", line 705, in <module>
    main()
  File "tools/db_crashtest.py", line 696, in main
    blackbox_crash_main(args, unknown_args)
  File "tools/db_crashtest.py", line 479, in blackbox_crash_main
    + list({'db': dbname}.items())), unknown_args)
  File "tools/db_crashtest.py", line 414, in gen_cmd
    finalzied_params = finalize_and_sanitize(params)
  File "tools/db_crashtest.py", line 331, in finalize_and_sanitize
    dest_params.get("user_timestamp_size") > 0):
TypeError: '>' not supported between instances of 'NoneType' and 'int'
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8091

Test Plan: make crash_test

Reviewed By: ltamasi

Differential Revision: D27268276

Pulled By: riversand963

fbshipit-source-id: ed2873b9587ecc51e24abc35ef2bd3d91fb1ed1b
2021-03-23 12:45:20 -07:00
Yanqin Jin 08144bc2f5 Add user-defined timestamps to db_stress (#8061)
Summary:
Add some basic test for user-defined timestamp to db_stress. Currently,
read with timestamp always tries to read using the current timestamp.
Due to the per-key timestamp-sequence ordering constraint, we only add timestamp-
related tests to the `NonBatchedOpsStressTest` since this test serializes accesses
to the same key and uses a file to cross-check data correctness.
The timestamp feature is not supported in a number of components, e.g. Merge, SingleDelete,
DeleteRange, CompactionFilter, Readonly instance, secondary instance, SST file ingestion, transaction,
etc. Therefore, db_stress should exit if user enables both timestamp and these features at the same
time. The (currently) incompatible features can be found in
`CheckAndSetOptionsForUserTimestamp`.

This PR also fixes a bug triggered when timestamp is enabled together with
`index_type=kBinarySearchWithFirstKey`. This bug fix will also be in another separate PR
with more unit tests coverage. Fixing it here because I do not want to exclude the index type
from crash test.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8061

Test Plan: make crash_test_with_ts

Reviewed By: jay-zhuang

Differential Revision: D27056282

Pulled By: riversand963

fbshipit-source-id: c3e00ad1023fdb9ebbdf9601ec18270c5e2925a9
2021-03-23 05:13:30 -07:00
Levi Tamasi 0d800dadea Adjust the set of potential min_blob_size values in stress/crash tests (#8085)
Summary:
Since our stress/crash tests by default generate values of size 8, 16, or 24,
it does not make much sense to set `min_blob_size` to 256. The patch
updates the set of potential `min_blob_size` values in the crash test
script and in `db_stress` where it might be set dynamically using
`SetOptions`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8085

Test Plan: Ran `make check` and tried the crash test script.

Reviewed By: riversand963

Differential Revision: D27238620

Pulled By: ltamasi

fbshipit-source-id: 4a96f9944b1ed9220d3045c5ab0b34c49009aeee
2021-03-22 14:38:09 -07:00
Yanqin Jin 1f11d07f24 Enable compact filter for blob in dbstress and dbbench (#8011)
Summary:
As title.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8011

Test Plan:
```
./db_bench -enable_blob_files=1 -use_keep_filter=1 -disable_auto_compactions=1
/db_stress -enable_blob_files=1 -enable_compaction_filter=1 -acquire_snapshot_one_in=0 -compact_range_one_in=0 -iterpercent=0 -test_batches_snapshots=0 -readpercent=10 -prefixpercent=20 -writepercent=55 -delpercent=15 -continuous_verification_interval=0
```

Reviewed By: ltamasi

Differential Revision: D26736061

Pulled By: riversand963

fbshipit-source-id: 1c7834903c28431ce23324c4f259ed71255614e2
2021-03-01 17:24:47 -08:00
Andrew Kryczka d904233d2f Limit buffering for collecting samples for compression dictionary (#7970)
Summary:
For dictionary compression, we need to collect some representative samples of the data to be compressed, which we use to either generate or train (when `CompressionOptions::zstd_max_train_bytes > 0`) a dictionary. Previously, the strategy was to buffer all the data blocks during flush, and up to the target file size during compaction. That strategy allowed us to randomly pick samples from as wide a range as possible that'd be guaranteed to land in a single output file.

However, some users try to make huge files in memory-constrained environments, where this strategy can cause OOM. This PR introduces an option, `CompressionOptions::max_dict_buffer_bytes`, that limits how much data blocks are buffered before we switch to unbuffered mode (which means creating the per-SST dictionary, writing out the buffered data, and compressing/writing new blocks as soon as they are built). It is not strict as we currently buffer more than just data blocks -- also keys are buffered. But it does make a step towards giving users predictable memory usage.

Related changes include:

- Changed sampling for dictionary compression to select unique data blocks when there is limited availability of data blocks
- Made use of `BlockBuilder::SwapAndReset()` to save an allocation+memcpy when buffering data blocks for building a dictionary
- Changed `ParseBoolean()` to accept an input containing characters after the boolean. This is necessary since, with this PR, a value for `CompressionOptions::enabled` is no longer necessarily the final component in the `CompressionOptions` string.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7970

Test Plan:
- updated `CompressionOptions` unit tests to verify limit is respected (to the extent expected in the current implementation) in various scenarios of flush/compaction to bottommost/non-bottommost level
- looked at jemalloc heap profiles right before and after switching to unbuffered mode during flush/compaction. Verified memory usage in buffering is proportional to the limit set.

Reviewed By: pdillinger

Differential Revision: D26467994

Pulled By: ajkr

fbshipit-source-id: 3da4ef9fba59974e4ef40e40c01611002c861465
2021-02-19 14:09:54 -08:00
Levi Tamasi dab4fe5bcd Add checkpoint support to BlobDB (#7959)
Summary:
The patch adds checkpoint support to BlobDB. Blob files are hard linked or
copied, depending on whether the checkpoint directory is on the same filesystem
or not, similarly to table files.

TODO: Add support for blob files to `ExportColumnFamily` and to the checksum
verification logic used by backup/restore.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7959

Test Plan: Ran `make check` and the crash test for a while.

Reviewed By: riversand963

Differential Revision: D26434768

Pulled By: ltamasi

fbshipit-source-id: 994be55a8dc08133028250760fca440d2c7c4dc5
2021-02-17 12:42:36 -08:00
Levi Tamasi 0288bdbc53 Add the integrated BlobDB to the stress/crash tests (#7900)
Summary:
The patch adds support for the options related to the new BlobDB implementation
to `db_stress`, including support for dynamically adjusting them using `SetOptions`
when `set_options_one_in` and a new flag `allow_setting_blob_options_dynamically`
are specified. (The latter is used to prevent the options from being enabled when
incompatible features are in use.)

The patch also updates the `db_stress` help messages of the existing stacked BlobDB
related options to clarify that they pertain to the old implementation. In addition, it
adds the new BlobDB to the crash test script. In order to prevent a combinatorial explosion
of jobs and still perform whitebox/blackbox testing (including under ASAN/TSAN/UBSAN),
and to also test BlobDB in conjunction with atomic flush and transactions, the script sets
the BlobDB options in 10% of normal/`cf_consistency`/`txn` crash test runs.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7900

Test Plan: Ran `make check` and `db_stress`/`db_crashtest.py` with various options.

Reviewed By: jay-zhuang

Differential Revision: D26094913

Pulled By: ltamasi

fbshipit-source-id: c2ef3391a05e43a9687f24e297df05f4a5584814
2021-02-02 11:41:18 -08:00
Andrew Kryczka 78ee8564ad Integrity protection for live updates to WriteBatch (#7748)
Summary:
This PR adds the foundation classes for key-value integrity protection and the first use case: protecting live updates from the source buffers added to `WriteBatch` through the destination buffer in `MemTable`. The width of the protection info is not yet configurable -- only eight bytes per key is supported. This PR allows users to enable protection by constructing `WriteBatch` with `protection_bytes_per_key == 8`. It does not yet expose a way for users to get integrity protection via other write APIs (e.g., `Put()`, `Merge()`, `Delete()`, etc.).

The foundation classes (`ProtectionInfo.*`) embed the coverage info in their type, and provide `Protect.*()` and `Strip.*()` functions to navigate between types with different coverage. For making bytes per key configurable (for powers of two up to eight) in the future, these classes are templated on the unsigned integer type used to store the protection info. That integer contains the XOR'd result of hashes with independent seeds for all covered fields. For integer fields, the hash is computed on the raw unadjusted bytes, so the result is endian-dependent. The most significant bytes are truncated when the hash value (8 bytes) is wider than the protection integer.

When `WriteBatch` is constructed with `protection_bytes_per_key == 8`, we hold a `ProtectionInfoKVOTC` (i.e., one that covers key, value, optype aka `ValueType`, timestamp, and CF ID) for each entry added to the batch. The protection info is generated from the original buffers passed by the user, as well as the original metadata generated internally. When writing to memtable, each entry is transformed to a `ProtectionInfoKVOTS` (i.e., dropping coverage of CF ID and adding coverage of sequence number), since at that point we know the sequence number, and have already selected a memtable corresponding to a particular CF. This protection info is verified once the entry is encoded in the `MemTable` buffer.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7748

Test Plan:
- an integration test to verify a wide variety of single-byte changes to the encoded `MemTable` buffer are caught
- add to stress/crash test to verify it works in variety of configs/operations without intentional corruption
- [deferred] unit tests for `ProtectionInfo.*` classes for edge cases like KV swap, `SliceParts` and `Slice` APIs are interchangeable, etc.

Reviewed By: pdillinger

Differential Revision: D25754492

Pulled By: ajkr

fbshipit-source-id: e481bac6c03c2ab268be41359730f1ceb9964866
2021-01-29 12:18:58 -08:00
Zhichao Cao 04b3524ad0 Inject the random write error to stress test (#7653)
Summary:
Inject the random write error to stress test, it requires set reopen=0 and disable_wal=true.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7653

Test Plan: pass db_stress and python3 db_crashtest.py blackbox

Reviewed By: ajkr

Differential Revision: D25354132

Pulled By: zhichao-cao

fbshipit-source-id: 44721104eecb416e27f65f854912c40e301dd669
2020-12-17 11:52:28 -08:00
Peter Dillinger 60af964372 Experimental (production candidate) SST schema for Ribbon filter (#7658)
Summary:
Added experimental public API for Ribbon filter:
NewExperimentalRibbonFilterPolicy(). This experimental API will
take a "Bloom equivalent" bits per key, and configure the Ribbon
filter for the same FP rate as Bloom would have but ~30% space
savings. (Note: optimize_filters_for_memory is not yet implemented
for Ribbon filter. That can be added with no effect on schema.)

Internally, the Ribbon filter is configured using a "one_in_fp_rate"
value, which is 1 over desired FP rate. For example, use 100 for 1%
FP rate. I'm expecting this will be used in the future for configuring
Bloom-like filters, as I expect people to more commonly hold constant
the filter accuracy and change the space vs. time trade-off, rather than
hold constant the space (per key) and change the accuracy vs. time
trade-off, though we might make that available.

### Benchmarking

```
$ ./filter_bench -impl=2 -quick -m_keys_total_max=200 -average_keys_per_filter=100000 -net_includes_hashing
Building...
Build avg ns/key: 34.1341
Number of filters: 1993
Total size (MB): 238.488
Reported total allocated memory (MB): 262.875
Reported internal fragmentation: 10.2255%
Bits/key stored: 10.0029
----------------------------
Mixed inside/outside queries...
  Single filter net ns/op: 18.7508
  Random filter net ns/op: 258.246
    Average FP rate %: 0.968672
----------------------------
Done. (For more info, run with -legend or -help.)
$ ./filter_bench -impl=3 -quick -m_keys_total_max=200 -average_keys_per_filter=100000 -net_includes_hashing
Building...
Build avg ns/key: 130.851
Number of filters: 1993
Total size (MB): 168.166
Reported total allocated memory (MB): 183.211
Reported internal fragmentation: 8.94626%
Bits/key stored: 7.05341
----------------------------
Mixed inside/outside queries...
  Single filter net ns/op: 58.4523
  Random filter net ns/op: 363.717
    Average FP rate %: 0.952978
----------------------------
Done. (For more info, run with -legend or -help.)
```

168.166 / 238.488 = 0.705  -> 29.5% space reduction

130.851 / 34.1341 = 3.83x construction time for this Ribbon filter vs. lastest Bloom filter (could make that as little as about 2.5x for less space reduction)

### Working around a hashing "flaw"

bloom_test discovered a flaw in the simple hashing applied in
StandardHasher when num_starts == 1 (num_slots == 128), showing an
excessively high FP rate.  The problem is that when many entries, on the
order of number of hash bits or kCoeffBits, are associated with the same
start location, the correlation between the CoeffRow and ResultRow (for
efficiency) can lead to a solution that is "universal," or nearly so, for
entries mapping to that start location. (Normally, variance in start
location breaks the effective association between CoeffRow and
ResultRow; the same value for CoeffRow is effectively different if start
locations are different.) Without kUseSmash and with num_starts > 1 (thus
num_starts ~= num_slots), this flaw should be completely irrelevant.  Even
with 10M slots, the chances of a single slot having just 16 (or more)
entries map to it--not enough to cause an FP problem, which would be local
to that slot if it happened--is 1 in millions. This spreadsheet formula
shows that: =1/(10000000*(1 - POISSON(15, 1, TRUE)))

As kUseSmash==false (the setting for Standard128RibbonBitsBuilder) is
intended for CPU efficiency of filters with many more entries/slots than
kCoeffBits, a very reasonable work-around is to disallow num_starts==1
when !kUseSmash, by making the minimum non-zero number of slots
2*kCoeffBits. This is the work-around I've applied. This also means that
the new Ribbon filter schema (Standard128RibbonBitsBuilder) is not
space-efficient for less than a few hundred entries. Because of this, I
have made it fall back on constructing a Bloom filter, under existing
schema, when that is more space efficient for small filters. (We can
change this in the future if we want.)

TODO: better unit tests for this case in ribbon_test, and probably
update StandardHasher for kUseSmash case so that it can scale nicely to
small filters.

### Other related changes

* Add Ribbon filter to stress/crash test
* Add Ribbon filter to filter_bench as -impl=3
* Add option string support, as in "filter_policy=experimental_ribbon:5.678;"
where 5.678 is the Bloom equivalent bits per key.
* Rename internal mode BloomFilterPolicy::kAuto to kAutoBloom
* Add a general BuiltinFilterBitsBuilder::CalculateNumEntry based on
binary searching CalculateSpace (inefficient), so that subclasses
(especially experimental ones) don't have to provide an efficient
implementation inverting CalculateSpace.
* Minor refactor FastLocalBloomBitsBuilder for new base class
XXH3pFilterBitsBuilder shared with new Standard128RibbonBitsBuilder,
which allows the latter to fall back on Bloom construction in some
extreme cases.
* Mostly updated bloom_test for Ribbon filter, though a test like
FullBloomTest::Schema is a next TODO to ensure schema stability
(in case this becomes production-ready schema as it is).
* Add some APIs to ribbon_impl.h for configuring Ribbon filters.
Although these are reasonably covered by bloom_test, TODO more unit
tests in ribbon_test
* Added a "tool" FindOccupancyForSuccessRate to ribbon_test to get data
for constructing the linear approximations in GetNumSlotsFor95PctSuccess.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7658

Test Plan:
Some unit tests updated but other testing is left TODO. This
is considered experimental but laying down schema compatibility as early
as possible in case it proves production-quality. Also tested in
stress/crash test.

Reviewed By: jay-zhuang

Differential Revision: D24899349

Pulled By: pdillinger

fbshipit-source-id: 9715f3e6371c959d923aea8077c9423c7a9f82b8
2020-11-12 20:46:14 -08:00
Akanksha Mahajan 202605143b Fix crash test to run in DEBUG_LEVEL=0 mode in tmpfs (#7643)
Summary:
crash tests donot run in DEBUG_MODE=0 on tmpfs when
use_direct_reads/use_direct_io_for_flush_and_compaction is set randomly because
direct I/O is not supported on tmpfs and tests exit.

Fix: Sanitize direct I/O read options in DEBUG_LEVEL=0 so that crash
tests can run in tmpfs. When mmap_reads is set, direct I/O reads options are
unset so we can sanitize direct I/O reads options in case of tmpfs as well.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7643

Test Plan:
1. export DEBUG_LEVEL=0; export TEST_TMPDIR="/dev/shm";
           export CRASH_TEST_EXT_ARGS="--use_direct_reads=1 --mmap_read=0";
           make crash_test -j64
           2. In DEBUG_LEVEL=1 mode:  make crash_test -j64

Reviewed By: jay-zhuang

Differential Revision: D24766550

Pulled By: akankshamahajan15

fbshipit-source-id: 021720b2343c12c72004f84b26147625d3991d9e
2020-11-10 10:50:34 -08:00
Akanksha Mahajan 06a92fcf5c Add "max_write_buffer_size_to_maintain" to crash test (#7634)
Summary:
Add "max_write_buffer_size_to_maintain" to crash test

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7634

Test Plan: make crash_test -j64

Reviewed By: zhichao-cao

Differential Revision: D24710401

Pulled By: akankshamahajan15

fbshipit-source-id: 89e0412aaa56b2ef5a75603971b82f4b0b494ab7
2020-11-03 13:55:18 -08:00
Jay Zhuang 9e03e4dd52 db_crashtest preserves expected values file for failed crash tests (#7534)
Summary:
If crash test fails, don't delete the `expected_values_file` for later
debug. More details: https://github.com/facebook/rocksdb/issues/7530

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7534

Test Plan: local host

Reviewed By: ajkr

Differential Revision: D24239655

Pulled By: jay-zhuang

fbshipit-source-id: 3566f91a30aae1e27d2f51d910cddd08edb7d4cf
2020-10-12 14:10:14 -07:00
Andrew Kryczka 75d3b6fdf0 Redesign block cache pinning API (#7520)
Summary:
The old flag-based APIs (`BlockBasedTableOptions::pin_l0_filter_and_index_blocks_in_cache` and `BlockBasedTableOptions::pin_top_level_index_and_filter`) were insufficient for our needs. For example, it was impossible to pin only unpartitioned meta-blocks, which could prevent block cache contention when turning on dictionary compression or during a migration to partitioned indexes/filters. It was also impossible to pin all meta-blocks in memory while having predictable memory usage via block cache. If we had continued adding flags to address these scenarios, they would have had significant overlap causing confusion. Instead, this PR deprecates the flags and starts a new API with non-overlapping options.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7520

Test Plan:
- new unit test
- added new options to stress/crash test and ran for a while: `$ python tools/db_crashtest.py blackbox --simple --max_key=1000000 -write_buffer_size=1048576 -target_file_size_base=1048576 -max_bytes_for_level_base=4194304 --interval=10 -value_size_mult=33 -column_families=1 -reopen=0`

Reviewed By: pdillinger

Differential Revision: D24200034

Pulled By: ajkr

fbshipit-source-id: 3fa7cfc71e7960f7a867511dd6ae5834dd73b13e
2020-10-11 14:58:24 -07:00
sdong 82d42606ad Enable paranoid_file_checks in crash test (#7489)
Summary:
Cover paranoid_file_checks in crash test.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7489

Test Plan: Run crash tests for hours and didn't see any failure.

Reviewed By: ajkr

Differential Revision: D24063868

fbshipit-source-id: 7b48b110e66ce78ae5d0c99a9f32af86edd34c1e
2020-10-09 08:32:22 -07:00
sdong aedcaaef99 Stress test to support paranoid_file_checks (#7473)
Summary:
It's important to make sure no false positive is reported when options.paranoid_file_checks is used. Add it to stress test and a place holder in crash test. It is disabled in crash test as there appears to be a bug causing false positive.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7473

Test Plan: Run crash test

Reviewed By: ajkr

Differential Revision: D24026939

fbshipit-source-id: 89102acb45cf041776775ce44a4eef4b0f3a380c
2020-09-30 14:41:33 -07:00
Peter Dillinger 4e258d3e63 Fix backup/restore in stress/crash test (#7357)
Summary:
(1) Skip check on specific key if restoring an old backup
(small minority of cases) because it can fail in those cases. (2) Remove
an old assertion about number of column families and number of keys
passed in, which is broken by atomic flush (cf_consistency) test. Like
other code (for better or worse) assume a single key and iterate over
column families. (3) Apply mock_direct_io to NewSequentialFile so that
db_stress backup works on /dev/shm.

Also add more context to output in case of backup/restore db_stress
failure.

Also a minor fix to BackupEngine to report first failure status in
creating new backup, and drop another clue about the potential
source of a "Backup failed" status.

Reverts "Disable backup/restore stress test (https://github.com/facebook/rocksdb/issues/7350)"

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7357

Test Plan:
Using backup_one_in=10000,
"USE_CLANG=1 make crash_test_with_atomic_flush" for 30+ minutes
"USE_CLANG=1 make blackbox_crash_test" for 30+ minutes
And with use_direct_reads with TEST_TMPDIR=/dev/shm/rocksdb

Reviewed By: riversand963

Differential Revision: D23567244

Pulled By: pdillinger

fbshipit-source-id: e77171c2e8394d173917e36898c02dead1c40b77
2020-09-08 10:50:19 -07:00
Jay Zhuang ef32f11004 Disable backup/restore stress test (#7350)
Summary:
Seems it's causing some tests failures.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7350

Reviewed By: riversand963

Differential Revision: D23544109

Pulled By: jay-zhuang

fbshipit-source-id: 798a0ca374a20b6c2d0f29582729ff101c6a2e99
2020-09-04 11:58:18 -07:00
Peter Dillinger 06ad5dd293 Add file checksum to stress/crash test (#7343)
Summary:
This change has the crash test randomly select from a few file
checksum implementations, or nullptr, for DB file_checksum_gen_factory.
For compatibility across runs on same DB, each non-null factory can
understand all the other functions, but the default changes.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7343

Test Plan:
'make blackbox_crash_test' for a while, including with some
debug output to ensure code is being exercised.

Reviewed By: zhichao-cao

Differential Revision: D23494580

Pulled By: pdillinger

fbshipit-source-id: 73bbc7ca32c1adaf619134c0c830f12894880b8a
2020-09-03 23:50:33 -07:00
Peter Dillinger 499c9448d0 Fix, enable, and enhance backup/restore in db_stress (#7348)
Summary:
Although added to db_stress, testing of backup/restore
was never integrated into the crash test, originally concerned about
performance. I've enabled it now and to address the peformance concern,
testing backup/restore is always skipped once the db exceeds a certain
size threshold, default 100MB. This should provide sufficient
opportunity for testing BackupEngine without bogging down everything
else with heavier and heavier operations.

Also fixed backup/restore in db_stress by making sure PurgeOldBackups
can remove manifest files, which are normally kept around for db_stress.

Added more coverage of backup options, and up to three backups being
saved in one backup directory (in some cases).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7348

Test Plan:
ran 'make blackbox_crash_test' for a while, with heightened
probabilitly of taking backups (1/10k). Also confirmed with some debug
output that the code is being covered, TestBackupRestore only takes
a few seconds to complete when triggered, and even at 1/10k and ~50MB
database, there's <,~ 1 thread testing backups at any time.

Reviewed By: ajkr

Differential Revision: D23510835

Pulled By: pdillinger

fbshipit-source-id: b6b8735591808141f81f10773ac31634cf03b6c0
2020-09-03 20:13:15 -07:00
Andrew Kryczka 7eebe6d38a Mark files for compaction in stress/crash tests (#7231)
Summary:
The mechanism to mark files for compaction is most commonly used in
delete-triggered compaction. This PR adds an option to exercise the
marking mechanism on random files created by db_stress. This PR also
enables that option in db_crashtest.py on its db_stress runs at random.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7231

Test Plan:
- ran some minified crash tests; verified they succeed and we see `"compaction_reason": "FilesMarkedForCompaction"` regularly in the logs.

```
$ TEST_TMPDIR=/dev/shm python tools/db_crashtest.py blackbox --duration=600 --interval=30 --max_key=10000000 --write_buffer_size=1048576 --target_file_size_base=1048576 --max_bytes_for_level_base=4194304 --value_size_mult=33
$ TEST_TMPDIR=/dev/shm python tools/db_crashtest.py whitebox --duration=600 --interval=30 --max_key=1000000 --write_buffer_size=1048576 --target_file_size_base=1048576 --max_bytes_for_level_base=4194304 --value_size_mult=33 --random_kill_odd=8887
```

Reviewed By: anand1976

Differential Revision: D23025156

Pulled By: ajkr

fbshipit-source-id: a404c467ebc12afa94dae35956ea9b372f592a96
2020-08-10 16:17:56 -07:00
Jay Zhuang fc4d5f5065 Add stress test for GetProperty (#7111)
Summary:
Add stress test coverage for `DB::GetProperty()`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7111

Test Plan:
```
./db_stress -get_property_one_in=1
make crash_test
```

Reviewed By: ajkr

Differential Revision: D22487906

Pulled By: jay-zhuang

fbshipit-source-id: c118d95cc9b4e2fa669a06e6aa531541fa885dc5
2020-07-14 12:12:36 -07:00
Andrew Kryczka 8efb5cfb6f add SstFileManager to crash test (#6993)
Summary:
SstFileManager is already supported in the stress test as of https://github.com/facebook/rocksdb/issues/6454. This
PR enables the SstFileManager in some of the crash test runs.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6993

Reviewed By: riversand963

Differential Revision: D22084406

Pulled By: ajkr

fbshipit-source-id: 78b8642682e7570ff6ec3a1c3ccd9940f4362289
2020-06-23 16:27:20 -07:00
Peter Dillinger 5b2bbacb6f Minimize memory internal fragmentation for Bloom filters (#6427)
Summary:
New experimental option BBTO::optimize_filters_for_memory builds
filters that maximize their use of "usable size" from malloc_usable_size,
which is also used to compute block cache charges.

Rather than always "rounding up," we track state in the
BloomFilterPolicy object to mix essentially "rounding down" and
"rounding up" so that the average FP rate of all generated filters is
the same as without the option. (YMMV as heavily accessed filters might
be unluckily lower accuracy.)

Thus, the option near-minimizes what the block cache considers as
"memory used" for a given target Bloom filter false positive rate and
Bloom filter implementation. There are no forward or backward
compatibility issues with this change, though it only works on the
format_version=5 Bloom filter.

With Jemalloc, we see about 10% reduction in memory footprint (and block
cache charge) for Bloom filters, but 1-2% increase in storage footprint,
due to encoding efficiency losses (FP rate is non-linear with bits/key).

Why not weighted random round up/down rather than state tracking? By
only requiring malloc_usable_size, we don't actually know what the next
larger and next smaller usable sizes for the allocator are. We pick a
requested size, accept and use whatever usable size it has, and use the
difference to inform our next choice. This allows us to narrow in on the
right balance without tracking/predicting usable sizes.

Why not weight history of generated filter false positive rates by
number of keys? This could lead to excess skew in small filters after
generating a large filter.

Results from filter_bench with jemalloc (irrelevant details omitted):

    (normal keys/filter, but high variance)
    $ ./filter_bench -quick -impl=2 -average_keys_per_filter=30000 -vary_key_count_ratio=0.9
    Build avg ns/key: 29.6278
    Number of filters: 5516
    Total size (MB): 200.046
    Reported total allocated memory (MB): 220.597
    Reported internal fragmentation: 10.2732%
    Bits/key stored: 10.0097
    Average FP rate %: 0.965228
    $ ./filter_bench -quick -impl=2 -average_keys_per_filter=30000 -vary_key_count_ratio=0.9 -optimize_filters_for_memory
    Build avg ns/key: 30.5104
    Number of filters: 5464
    Total size (MB): 200.015
    Reported total allocated memory (MB): 200.322
    Reported internal fragmentation: 0.153709%
    Bits/key stored: 10.1011
    Average FP rate %: 0.966313

    (very few keys / filter, optimization not as effective due to ~59 byte
     internal fragmentation in blocked Bloom filter representation)
    $ ./filter_bench -quick -impl=2 -average_keys_per_filter=1000 -vary_key_count_ratio=0.9
    Build avg ns/key: 29.5649
    Number of filters: 162950
    Total size (MB): 200.001
    Reported total allocated memory (MB): 224.624
    Reported internal fragmentation: 12.3117%
    Bits/key stored: 10.2951
    Average FP rate %: 0.821534
    $ ./filter_bench -quick -impl=2 -average_keys_per_filter=1000 -vary_key_count_ratio=0.9 -optimize_filters_for_memory
    Build avg ns/key: 31.8057
    Number of filters: 159849
    Total size (MB): 200
    Reported total allocated memory (MB): 208.846
    Reported internal fragmentation: 4.42297%
    Bits/key stored: 10.4948
    Average FP rate %: 0.811006

    (high keys/filter)
    $ ./filter_bench -quick -impl=2 -average_keys_per_filter=1000000 -vary_key_count_ratio=0.9
    Build avg ns/key: 29.7017
    Number of filters: 164
    Total size (MB): 200.352
    Reported total allocated memory (MB): 221.5
    Reported internal fragmentation: 10.5552%
    Bits/key stored: 10.0003
    Average FP rate %: 0.969358
    $ ./filter_bench -quick -impl=2 -average_keys_per_filter=1000000 -vary_key_count_ratio=0.9 -optimize_filters_for_memory
    Build avg ns/key: 30.7131
    Number of filters: 160
    Total size (MB): 200.928
    Reported total allocated memory (MB): 200.938
    Reported internal fragmentation: 0.00448054%
    Bits/key stored: 10.1852
    Average FP rate %: 0.963387

And from db_bench (block cache) with jemalloc:

    $ ./db_bench -db=/dev/shm/dbbench.no_optimize -benchmarks=fillrandom -format_version=5 -value_size=90 -bloom_bits=10 -num=2000000 -threads=8 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=false
    $ ./db_bench -db=/dev/shm/dbbench -benchmarks=fillrandom -format_version=5 -value_size=90 -bloom_bits=10 -num=2000000 -threads=8 -optimize_filters_for_memory -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=false
    $ (for FILE in /dev/shm/dbbench.no_optimize/*.sst; do ./sst_dump --file=$FILE --show_properties | grep 'filter block' ; done) | awk '{ t += $4; } END { print t; }'
    17063835
    $ (for FILE in /dev/shm/dbbench/*.sst; do ./sst_dump --file=$FILE --show_properties | grep 'filter block' ; done) | awk '{ t += $4; } END { print t; }'
    17430747
    $ #^ 2.1% additional filter storage
    $ ./db_bench -db=/dev/shm/dbbench.no_optimize -use_existing_db -benchmarks=readrandom,stats -statistics -bloom_bits=10 -num=2000000 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=false -duration=10 -cache_index_and_filter_blocks -cache_size=1000000000
    rocksdb.block.cache.index.add COUNT : 33
    rocksdb.block.cache.index.bytes.insert COUNT : 8440400
    rocksdb.block.cache.filter.add COUNT : 33
    rocksdb.block.cache.filter.bytes.insert COUNT : 21087528
    rocksdb.bloom.filter.useful COUNT : 4963889
    rocksdb.bloom.filter.full.positive COUNT : 1214081
    rocksdb.bloom.filter.full.true.positive COUNT : 1161999
    $ #^ 1.04 % observed FP rate
    $ ./db_bench -db=/dev/shm/dbbench -use_existing_db -benchmarks=readrandom,stats -statistics -bloom_bits=10 -num=2000000 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=false -optimize_filters_for_memory -duration=10 -cache_index_and_filter_blocks -cache_size=1000000000
    rocksdb.block.cache.index.add COUNT : 33
    rocksdb.block.cache.index.bytes.insert COUNT : 8448592
    rocksdb.block.cache.filter.add COUNT : 33
    rocksdb.block.cache.filter.bytes.insert COUNT : 18220328
    rocksdb.bloom.filter.useful COUNT : 5360933
    rocksdb.bloom.filter.full.positive COUNT : 1321315
    rocksdb.bloom.filter.full.true.positive COUNT : 1262999
    $ #^ 1.08 % observed FP rate, 13.6% less memory usage for filters

(Due to specific key density, this example tends to generate filters that are "worse than average" for internal fragmentation. "Better than average" cases can show little or no improvement.)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6427

Test Plan: unit test added, 'make check' with gcc, clang and valgrind

Reviewed By: siying

Differential Revision: D22124374

Pulled By: pdillinger

fbshipit-source-id: f3e3aa152f9043ddf4fae25799e76341d0d8714e
2020-06-22 13:32:07 -07:00
Andrew Kryczka d76eed4839 minor fixes for stress/crash contruns (#7006)
Summary:
Avoid using `cf_consistency` together with `enable_compaction_filter` as
the former heavily uses snapshots while the latter is incompatible with
snapshots.

Also fix a clang-analyze error for a write to a variable that is never
read.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7006

Reviewed By: zhichao-cao

Differential Revision: D22141679

Pulled By: ajkr

fbshipit-source-id: 1840ae238168818a9ab5973f90fd78c067399447
2020-06-19 16:05:17 -07:00
Peter Dillinger 88b4210701 Remove racially charged terms "whitelist" and "blacklist" (#7008)
Summary:
We don't need them.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7008

Test Plan: "make check" and ensure "make crash_test" starts

Reviewed By: ajkr

Differential Revision: D22143838

Pulled By: pdillinger

fbshipit-source-id: 72c8e16603abc59f4954e304466bc4dc1f58f94e
2020-06-19 15:27:32 -07:00
Andrew Kryczka 775dc623ad add `CompactionFilter` to stress/crash tests (#6988)
Summary:
Added a `CompactionFilter` that is aware of the stress test's expected state. It only drops key versions that are already covered according to the expected state. It is incompatible with snapshots (same as all `CompactionFilter`s), so disables all snapshot-related features when used in the crash test.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6988

Test Plan:
running a minified blackbox crash test

```
$ TEST_TMPDIR=/dev/shm python tools/db_crashtest.py blackbox --max_key=1000000 -write_buffer_size=1048576 -max_bytes_for_level_base=4194304 -target_file_size_base=1048576 -value_size_mult=33 --interval=10 --duration=3600
```

Reviewed By: anand1976

Differential Revision: D22072888

Pulled By: ajkr

fbshipit-source-id: 727b9d7a90d5eab18be0ec6cd5a810712ac13320
2020-06-18 09:54:55 -07:00
Yanqin Jin 15d9f28da5 Add stress test for best-efforts recovery (#6819)
Summary:
Add crash test for the case of best-efforts recovery.
After a certain amount of time, we kill the db_stress process, randomly delete some certain table files and restart db_stress. Given the randomness of file deletion, it is difficult to verify against a reference for data correctness. Therefore, we just check that the db can restart successfully.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6819

Test Plan:
```
./db_stress -best_efforts_recovery=true -disable_wal=1 -reopen=0
./db_stress -best_efforts_recovery=true -disable_wal=0 -skip_verifydb=1 -verify_db_one_in=0 -continuous_verification_interval=0
make crash_test_with_best_efforts_recovery
```

Reviewed By: anand1976

Differential Revision: D21436753

Pulled By: riversand963

fbshipit-source-id: 0b3605c922a16c37ed17d5ab6682ca4240e47926
2020-06-12 19:27:46 -07:00
Peter Dillinger 0c56fc4d66 Allow missing "unversioned" python, as in CentOS 8 (#6883)
Summary:
RocksDB Makefile was assuming existence of 'python' command,
which is not present in CentOS 8. We avoid using 'python' if 'python3' is available.

Also added fancy logic to format-diff.sh to make clang-format-diff.py for Python2 work even with Python3 only (as some CentOS 8 FB machines come equipped)

Also, now use just 'python3' for PYTHON if not found so that an informative
"command not found" error will result rather than something weird.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6883

Test Plan: manually tried some variants, 'make check' on a fresh CentOS 8 machine without 'python' executable or Python2 but with clang-format-diff.py for Python2.

Reviewed By: gg814

Differential Revision: D21767029

Pulled By: pdillinger

fbshipit-source-id: 54761b376b140a3922407bdc462f3572f461d0e9
2020-05-29 11:29:23 -07:00
sdong 12825894a2 Disable "compression_parallel_threads" in crash test for now (#6816)
Summary:
"compressio_parallel_threads" caused several test failure tests. To keep crash test clean, disable it for now.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6816

Test Plan: "make crash_test" to make sure the python script doesn't break

Reviewed By: zhichao-cao

Differential Revision: D21462112

fbshipit-source-id: 9eecc764800da82cd19665dc8b167eacead3310b
2020-05-07 20:52:29 -07:00
Andrew Kryczka 1f20df2f38 cover single level universal in crash test (#6818)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6818

Test Plan:
fast whitebox test and verify there are some single-level universal and
some multi-level universal runs.

```
$ python ./tools/db_crashtest.py whitebox --simple -max_key=1000000 -value_size_mult=33 -write_buffer_size=524288 -target_file_size_base=524288 -max_bytes_for_level_base=2097152 --duration=120 --interval=10 --ops_per_thread=1000 --random_kill_odd=887
```

Reviewed By: riversand963

Differential Revision: D21432138

Pulled By: ajkr

fbshipit-source-id: 2fc5ba9f3dfa49bb11e81da7dd00a17b476e64d7
2020-05-06 18:08:09 -07:00
Ziyue Yang e619a20e93 Add an option for parallel compression in for db_stress (#6722)
Summary:
This commit adds an `compression_parallel_threads` option in
db_stress. It also fixes the naming of parallel compression
option in db_bench to keep it aligned with others.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6722

Reviewed By: pdillinger

Differential Revision: D21091385

fbshipit-source-id: c9ba8c4e5cc327ff9e6094a6dc6a15fcff70f100
2020-04-30 10:49:07 -07:00
Cheng Chang 0a77617820 Disable O_DIRECT in stress test when db directory does not support direct IO (#6727)
Summary:
In crash test, the db directory might be set to /dev/shm or /tmp, in certain environments such as internal testing infrastructure, neither of these directories support direct IO, so direct IO is never enabled in crash test.

This PR sets up SyncPoints in direct IO related code paths to disable O_DIRECT flag in calls to `open`, so the direct IO code paths will be executed, all direct IO related assertions will be checked, but no real direct IO request will be issued to the file system.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6727

Test Plan:
export CRASH_TEST_EXT_ARGS="--use_direct_reads=1 --mmap_read=0"
make -j24 crash_test

Reviewed By: zhichao-cao

Differential Revision: D21139250

Pulled By: cheng-chang

fbshipit-source-id: db9adfe78d91aa4759835b1af91c5db7b27b62ee
2020-04-25 00:01:03 -07:00
sdong fe206f4f7c crash_test to cover index_type kBinarySearchWithFirstKey (#6721)
Summary:
Recently index_type kBinarySearchWithFirstKey is improved so that the API guarantee is exactly the same as other types and it is ready for wide production. We should cover it in crash tst.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6721

Test Plan: Run crash_test

Reviewed By: anand1976

Differential Revision: D21099781

fbshipit-source-id: fda91eba831d9eacbb140c703e9768bb1701f935
2020-04-20 12:57:15 -07:00
sdong 63d82e57b9 crash_test to cover small max_open_files (#6719)
Summary:
RocksDB behavior is different while max_open_files is small or large. Add the coverage to small max_open_files.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6719

Test Plan: Run crash_test

Reviewed By: pdillinger

Differential Revision: D21081021

fbshipit-source-id: e3e211761a9bd25d93d19a61c1f7b62d48cf5e3c
2020-04-17 11:00:07 -07:00
sdong 73523baeb1 crash_test to cover options.avoid_flush_during_recovery (#6712)
Summary:
Options.avoid_flush_during_recovery is uncovered in crash_test. Add the coverage with a chance of 1/8, as it is a less frequently used options.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6712

Test Plan: Run crash_test and see the option can be used or not used by chance.

Reviewed By: ltamasi

Differential Revision: D21056566

fbshipit-source-id: c3b1521517cfc204786e6ef8c6acd7fffda64793
2020-04-16 12:11:45 -07:00
Yueh-Hsuan Chiang 5801af4646 Add env_fault_injection argument to db_stress (#6687)
Summary:
Add env_fault_injection argument to db_stress.  When enabled,
FaultInjectionTestEnv will be used instead.  Currently this
option does not support running with other env setting.

This will allow
us to later manually produce error when running db_crashtest.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6687

Test Plan:
make db_stress -j32
./db_stress --env_fault_injection
./db_stress --env_fault_injection --hdfs   // expect error message

Reviewed By: ajkr

Differential Revision: D21014683

Pulled By: yhchiang

fbshipit-source-id: 0724aeac37efd57adb72a37defe6dbd3bfa8106a
2020-04-16 11:13:44 -07:00
anand76 610a09ccff Remove a printf from db_stress that's not useful info (#6705)
Summary:
This was causing db_crashtest.py to wrongly assume an error by parsing the output. Hopefully this will stabilize the crash tests.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6705

Test Plan: make blackbox_crash_test

Reviewed By: ltamasi

Differential Revision: D21043335

Pulled By: anand1976

fbshipit-source-id: 5cddd112b124d4e2ebd11724a17d4ef0f50c1cf8
2020-04-15 12:13:35 -07:00
anand76 79c838eb0f Fix a few bugs in db_stress fault injection (#6693)
Summary:
Fix the following issues -
1. Output parsing error in db_crashtest.py
2. Memory leak on exit
3. False alarm on filter block read error
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6693

Test Plan: asan_crash

Reviewed By: cheng-chang

Differential Revision: D20990399

Pulled By: anand1976

fbshipit-source-id: 178ee0dd7c69a4bc5db698379db0dedb29281699
2020-04-13 11:01:03 -07:00
anand76 5c19a441c4 Fault injection in db_stress (#6538)
Summary:
This PR implements a fault injection mechanism for injecting errors in reads in db_stress. The FaultInjectionTestFS is used for this purpose. A thread local structure is used to track the errors, so that each db_stress thread can independently enable/disable error injection and verify observed errors against expected errors. This is initially enabled only for Get and MultiGet, but can be extended to iterator as well once its proven stable.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6538

Test Plan:
crash_test
make check

Reviewed By: riversand963

Differential Revision: D20714347

Pulled By: anand1976

fbshipit-source-id: d7598321d4a2d72bda0ced57411a337a91d87dc7
2020-04-10 17:21:26 -07:00
Yanqin Jin ccf7676455 Update a few scripts to be python3 compatible (#6525)
Summary:
There are a few scripts with python3 compatibility issues that were not
detected by automated tool before. Update them now.

Test Plan (devserver):
python2 tools/ldb_test.py
python3 tools/ldb_test.py

python2 tools/write_stress_runner.py --runtime_sec=30
python3 tools/write_stress_runner.py --runtime_sec=30

python2 tools/db_crashtest.py --simple --interval=2 --duration=10 blackbox
python3 tools/db_crashtest.py --simple --interval=2 --duration=10 blackbox

python2 tools/db_crashtest.py --simple --duration=10 --random_kill_odd=1000 --ops_per_thread=1000 whitebox
python3 tools/db_crashtest.py --simple --duration=10 --random_kill_odd=1000 --ops_per_thread=1000 whitebox
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6525

Reviewed By: cheng-chang

Differential Revision: D20627820

Pulled By: riversand963

fbshipit-source-id: 4b25a7bd4d001c7f868be8b640ef876523be6ca3
2020-03-24 21:00:27 -07:00
Levi Tamasi 217ce20021 Remove GetSortedWalFiles/GetCurrentWalFile from the crash test (#6491)
Summary:
Currently, `db_stress` tests a randomly picked one of `GetLiveFiles`,
`GetSortedWalFiles`, and `GetCurrentWalFile` with a 1/N chance when the
command line parameter `get_live_files_and_wal_files_one_in` is specified.
The problem is that `GetSortedWalFiles` and `GetCurrentWalFile` are unreliable
in the sense that they can return errors if another thread removes a WAL file
while they are executing (which is a perfectly plausible and legitimate scenario).
The patch splits this command line parameter into three (one for each API),
and changes the crash test script so that only `GetLiveFiles` is tested during
our continuous crash test runs.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6491

Test Plan:
```
make check
python tools/db_crashtest.py whitebox
```

Reviewed By: siying

Differential Revision: D20312200

Pulled By: ltamasi

fbshipit-source-id: e7c3481eddfe3bd3d5349476e34abc9eee5b7dc8
2020-03-18 17:14:15 -07:00
Maysam Yabandeh 967a2d953f Revert "crash_test to enable block-based table hash index (#6310)" (#6327)
Summary:
This reverts commit 8e309b35bb.
The stress tests are failing . Revert it until we figure the root cause.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6327

Differential Revision: D19537657

Pulled By: maysamyabandeh

fbshipit-source-id: bf34a5dd720825957729e136e9a5a729a240e61a
2020-01-23 09:09:17 -08:00
Maysam Yabandeh cb1142e00d Set index_block_restart_interval of kHashSearch to 1 in stress test (#6324)
Summary:
kHashSearch is incompatible with larger than 1 values for index_block_restart_interval. Setting it to 1 in stress tests would avoid confusion about the test parameters.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6324

Differential Revision: D19525669

Pulled By: maysamyabandeh

fbshipit-source-id: fbf3a797e0ebcebb4d32eba3728cf3583906fc8a
2020-01-22 16:33:21 -08:00
sdong 8e309b35bb crash_test to enable block-based table hash index (#6310)
Summary:
Block-based table has index has been disabled in crash test due to bugs. We fixed a bug and re-enable it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6310

Test Plan: Finish one round of "crash_test_with_atomic_flush" test successfully while exclusively running has index. Another run also ran for several hours without failure.

Differential Revision: D19455856

fbshipit-source-id: 1192752d2c1e81ed7e5c5c7a9481c841582d5274
2020-01-21 12:27:30 -08:00
sdong 6b64aed4c0 Fix bug which causes crash_test to always run on sync mode (#6304)
Summary:
A previous change meant to make db_stress to run on sync=1 mode for 1/20 of the time in crash_test, but a bug caused to to always run on sync=1 mode. Fix it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6304

Test Plan: Start and kill "python -u tools/db_crashtest.py --simple whitebox" multiple times and observe that most times sync=0 is used while some times sync=1 is used.

Differential Revision: D19433000

fbshipit-source-id: 7a0adba39b17a1b3acbbd791bb0cdb743b91fa95
2020-01-17 01:46:48 -08:00
anand76 687119aeaf Variable key length in db_stress (#6273)
Summary:
Undo https://github.com/facebook/rocksdb/issues/6243 and fix the crash test failures.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6273

Test Plan: Run make ubsan_crash_test

Differential Revision: D19331472

Pulled By: anand1976

fbshipit-source-id: 30aa4a36c1b0f77a97159d82bbfd1cd767878e28
2020-01-09 21:27:18 -08:00
Peter Dillinger 83108d23e8 Re-enable level_compaction_dynamic_level_bytes in crash test (#6251)
Summary:
Was probably a false signal suggesting a problem in https://github.com/facebook/rocksdb/issues/6217
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6251

Test Plan: 'make crash_test'

Differential Revision: D19246951

Pulled By: pdillinger

fbshipit-source-id: 3e4fafcd9a7cf5f19ffd207b90279ba615145d6f
2019-12-30 10:15:49 -08:00
Peter Dillinger 37fd2b9694 Revert "Generate variable length keys in db_stress (#6165)" and follow-ups (#6243)
Summary:
This commit is suspected in some crash test failures such as

Verification failed for column family 0 key 78438077: Value not found: NotFound:
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6243

Test Plan: 'make check' and start 'make crash_test'

Differential Revision: D19220495

Pulled By: pdillinger

fbshipit-source-id: 6c4709cee80ab4344e06ce360f51e947d79fb3fa
2019-12-23 16:32:57 -08:00
anand76 3160edfdc7 Generate variable length keys in db_stress (#6165)
Summary:
Currently, db_stress generates fixed length keys of 8 bytes. This patch adds the ability to generate variable length keys. Most of the db_stress code continues to work with a numeric key randomly generated, and the numeric key also acts as an index into the values_ array. The numeric key is mapped to a variable length string key in a deterministic way. Furthermore, the ordering is preserved.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6165

Test Plan: run make crash_test

Differential Revision: D19204646

Pulled By: anand1976

fbshipit-source-id: d2d46a96615b4832a8be2a981f5913905f0e1ca7
2019-12-20 21:10:33 -08:00
sdong 338c149b92 crash_test to cover bottommost compression and some other changes (#6215)
Summary:
Several improvements to crash_test/stress_test:
(1) Stress_test to support an parameter of bottommost compression
(2) Rename those FLAGS_* variables that are not gflags to avoid confusion
(3) Crash_test to randomly generate compression type for bottommost compression with half the chance.
(4) Stress_test to sanitize unsupported compression type to snappy, so that crash_test to cover all possible compression types and people don't need to worry about they don't support all comrpession types in their environment.
(5) In crash_test, when generating db_stress command, sort arguments in alphabeta order, so that it is easier to find value for a specific argument.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6215

Test Plan: Run "make crash_test" for a while and see the botommost option shown in LOG files.

Differential Revision: D19171255

fbshipit-source-id: d7001e246c4ff9ee5760776eea0be97738650735
2019-12-20 16:14:52 -08:00
Zhichao Cao f89dea4fec db_stress: Added the verification for GetLiveFiles, GetSortedWalFiles and GetCurrentWalFile (#6224)
Summary:
Add the verification in operateDB to verify GetLiveFiles, GetSortedWalFiles and GetCurrentWalFile. The test will be called every 1 out of N, N is decided by get_live_files_and_wal_files_one_i, whose default is 1000000.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6224

Test Plan: pass db_stress default run.

Differential Revision: D19183358

Pulled By: zhichao-cao

fbshipit-source-id: 20073cf72ede77a3e0d3cf5f28304f1f605d2b1a
2019-12-20 12:07:30 -08:00
Yanqin Jin 670a916d01 Add more verification to db_stress (#6173)
Summary:
Currently, db_stress performs verification by calling `VerifyDb()` at the end of test and optionally before tests start. In case of corruption or incorrect result, it will be too late. This PR adds more verification in two ways.
1. For cf consistency test, each test thread takes a snapshot and verifies every N ops. N is configurable via `-verify_db_one_in`. This option is not supported in other stress tests.
2. For cf consistency test, we use another background thread in which a secondary instance periodically tails the primary (interval is configurable). We verify the secondary. Once an error is detected, we terminate the test and report. This does not affect other stress tests.

Test plan (devserver)
```
$./db_stress -test_cf_consistency -verify_db_one_in=0 -ops_per_thread=100000 -continuous_verification_interval=100
$./db_stress -test_cf_consistency -verify_db_one_in=1000 -ops_per_thread=10000 -continuous_verification_interval=0
$make crash_test
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6173

Differential Revision: D19047367

Pulled By: riversand963

fbshipit-source-id: aeed584ad71f9310c111445f34975e5ab47a0615
2019-12-20 08:49:29 -08:00
Maysam Yabandeh 77d5ba7887 Revert "Add kHashSearch to stress tests (#6210)" (#6220)
Summary:
This reverts commit 54f9092b0c.
It making our daily stress tests fail. Revert it until the issues are fixed.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6220

Differential Revision: D19179881

Pulled By: maysamyabandeh

fbshipit-source-id: 99de0eaf776567fa81110b9ad2608234a16083ce
2019-12-19 10:46:55 -08:00
Peter Dillinger 9ff569bdfc Temporarily disable level_compaction_dynamic_level_bytes in crash test (#6217)
Summary:
We're seeing assertion violations like this in crash test:

db_stress: table/block_based/block_based_table_reader.cc:4129: virtual uint64_t rocksdb::BlockBasedTable::ApproximateSize(const rocksdb::Slice&, const rocksdb::Slice&, rocksdb::TableReaderCaller): Assertion `end_offset >= start_offset' failed.***

And ApproximateSize appears only to be called with the level_compaction_dynamic_level_bytes option.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6217

Test Plan:
temporarily put an assert(false) in ApproximateSize and
briefly run 'make crash_test'

Differential Revision: D19179174

Pulled By: pdillinger

fbshipit-source-id: 506e6549aea0da19b363a1a6da04373c364d92e4
2019-12-19 10:24:49 -08:00
Maysam Yabandeh 54f9092b0c Add kHashSearch to stress tests (#6210)
Summary:
Beside extending index_type to kHashSearch, it clarifies in the code base that this feature is incompatible with index_block_restart_interval > 1.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6210

Test Plan:
```
make -j32 crash_test

Differential Revision: D19166567

Pulled By: maysamyabandeh

fbshipit-source-id: 3aaf75a70a8b462d372d43aac69dbd10df303ec7
2019-12-18 18:09:30 -08:00
Peter Dillinger dfb259e48d Fix syntax error (!) in db_crashtest.py (#6207)
Summary:
Fixes syntax error from https://github.com/facebook/rocksdb/pull/6203

Pull Request resolved: https://github.com/facebook/rocksdb/pull/6207

Test Plan: make blackbox_crash_test -> no more syntax error

Differential Revision: D19161752

Pulled By: pdillinger

fbshipit-source-id: b3032f296041ab56307762622b9ef6c03a8379aa
2019-12-18 09:32:52 -08:00
anand76 2afea29762 Add VerifyChecksum() to db_stress (#6203)
Summary:
Add an option to db_stress, verify_checksum_one_in, to call DB::VerifyChecksum() once every N ops.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6203

Differential Revision: D19145753

Pulled By: anand1976

fbshipit-source-id: d09edf21f309ad53aa40dd25b7a563d50665fd8b
2019-12-17 20:44:58 -08:00
sdong 9f250dd88e crash_test: two fixes (#6200)
Summary:
Fix two crash test issues:
1. sync mode should not run with disable_wal=true
2. disable "compaction_readahead_size" for now. With it on, some block checksum verification failure will happen in compaction paths. Not sure why, but disable it for now to keep the test clean.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6200

Test Plan: Run "make crash_test" and "make crash_test_with_atomic_flush" and see it runs way longer than before the fix without failing.

Differential Revision: D19143493

fbshipit-source-id: 438fad52fbda60aafd142e1b65578addbe7d72b1
2019-12-17 18:25:04 -08:00
sdong bcc372c0c3 Add some new options to crash_test (#6176)
Summary:
Several options are trivially added to crash test and random values are picked.
Made simple test run non-dynamic level and normal test run dynamic level.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6176

Test Plan: Run crash_test and watch the printing

Differential Revision: D19053955

fbshipit-source-id: 958cb43c968541ebd87ed4d91e778bd1d40e7502
2019-12-16 15:43:13 -08:00
Maysam Yabandeh 4b97812da8 Add long-running snapshots to stress tests (#6171)
Summary:
Current implementation holds on to 10% of snapshots for 10x longer, and 1% of snapshots 100x longer.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6171

Test Plan:
```
make -j32 crash_test

Differential Revision: D19038399

Pulled By: maysamyabandeh

fbshipit-source-id: 75da2dbb5c47a0b3f37d299b8719e392b73b42c0
2019-12-14 15:22:40 -08:00
Maysam Yabandeh fec7302a9d Enable unordered_write in stress tests (#6164)
Summary:
With WritePrepared transactions configured with two_write_queues, unordered_write will offer the same guarantees as vanilla rocksdb and thus can be enabled in stress tests.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6164

Test Plan:
```
make -j32 crash_test_with_txn

Differential Revision: D18991899

Pulled By: maysamyabandeh

fbshipit-source-id: eece5e96b4169b67d7931e5c0afca88540a113e1
2019-12-13 10:25:04 -08:00
Maysam Yabandeh 8613ee2e94 Enable all txn write policies in crash test (#6158)
Summary:
Currently the default txn write policy in crash tests is WRITE_PREPARED. The patch randomly picks the write policy at the start of the crash test.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6158

Test Plan:
```
make -j32 crash_test_with_txn
```

Differential Revision: D18946307

Pulled By: maysamyabandeh

fbshipit-source-id: f77d7a94f99a08791ef9626da153d284bf521950
2019-12-12 10:43:49 -08:00
Maysam Yabandeh 1ad6fa9cc7 Enable txn in crash tests (#6155)
Summary:
Start daily crash tests with use_txn flag.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6155

Differential Revision: D18943630

Pulled By: maysamyabandeh

fbshipit-source-id: eea99a6ffd5f57fb9651f6ca7dab8fbf70379c87
2019-12-11 16:01:55 -08:00
Peter Dillinger a653857178 Add PauseBackgroundWork() to db_stress (#6148)
Summary:
Worker thread will occasionally call PauseBackgroundWork(),
briefly sleep (to avoid stalling itself) and then call
ContinueBackgroundWork().
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6148

Test Plan:
some running of 'make blackbox_crash_test' with temporary
printf output to confirm code occasionally reached.

Differential Revision: D18913886

Pulled By: pdillinger

fbshipit-source-id: ae9356a803390929f3165dfb6a00194692ba92be
2019-12-10 15:46:48 -08:00
Peter Dillinger 6380df5e10 Vary bloom_bits in db_crashtest (#6103)
Summary:
Especially with non-integral bits/key now supported,
db_crashtest should vary the bloom_bits configuration. The probabilities
look like this:

1/2 chance of a uniform int from 0 to 19. This includes overall 1/40
chance of 0 which disables the bloom filter.

1/2 chance of a float from a lognormal distribution with a median of 10.
This always produces positive values but with a decent chance of < 1
(overall ~1/40) or > 100 (overall ~1/40), the enforced/coerced
implementation limits.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6103

Test Plan:
start 'make blackbox_crash_test' several times and look at
configuration output

Differential Revision: D18734877

Pulled By: pdillinger

fbshipit-source-id: 4a38cb057d3b3fc1327f93199f65b9a9ffbd7316
2019-12-10 08:39:50 -08:00
Peter Dillinger f19faf7814 Add format_version=5 to db_crashtest (#6102)
Summary:
format_version=5 enables new Bloom filter. Using 2/5
probability for "latest and greatest" rather than naive 1/4.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6102

Test Plan: start 'make blackbox_crash_test'

Differential Revision: D18735685

Pulled By: pdillinger

fbshipit-source-id: e81529c8a3f53560d246086ee5f92ee7d79a2eab
2019-11-27 13:19:11 -08:00
sdong 6123611c42 crash_test: use large max_manifest_file_size most of the time. (#6034)
Summary:
Right now, crash_test always uses 16KB max_manifest_file_size value. It is good to cover logic of manifest file switch. However, information stored in manifest files might be useful in debugging failures. Switch to only use small manifest file size in 1/15 of the time.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6034

Test Plan: Observe command generated by db_crash_test.py multiple times and see the --max_manifest_file_size value distribution.

Differential Revision: D18513824

fbshipit-source-id: 7b3ae6dbe521a0918df41064e3fa5ecbf2466e04
2019-11-14 14:01:06 -08:00
sdong 5b656584af crash_test: disable periodic compaction in FIFO compaction. (#5993)
Summary:
A recent commit make periodic compaction option valid in FIFO, which means TTL. But we fail to disable it in crash test, causing assert failure. Fix it by having it disabled.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5993

Test Plan: Restart "make crash_test" many times and make sure --periodic_compaction_seconds=0 is always the case when --compaction_style=2

Differential Revision: D18263223

fbshipit-source-id: c91a802017d83ae89ac43827d1b0012861933814
2019-10-31 17:28:03 -07:00
sdong 0337d87b42 crash_test: disable atomic flush with pipelined write (#5986)
Summary:
Recently, pipelined write is enabled even if atomic flush is enabled, which causing sanitizing failure in db_stress. Revert this change.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5986

Test Plan: Run "make crash_test_with_atomic_flush" and see it to run for some while so that the old sanitizing error (which showed up quickly) doesn't show up.

Differential Revision: D18228278

fbshipit-source-id: 27fdf2f8e3e77068c9725a838b9bef4ab25a2553
2019-10-30 11:36:55 -07:00
sdong a3960fc875 Move pipeline write waiting logic into WaitForPendingWrites() (#5716)
Summary:
In pipeline writing mode, memtable switching needs to wait for memtable writing to finish to make sure that when memtables are made immutable, inserts are not going to them. This is currently done in DBImpl::SwitchMemtable(). This is done after flush_scheduler_.TakeNextColumnFamily() is called to fetch the list of column families to switch. The function flush_scheduler_.TakeNextColumnFamily() itself, however, is not thread-safe when being called together with flush_scheduler_.ScheduleFlush().
This change provides a fix, which moves the waiting logic before flush_scheduler_.TakeNextColumnFamily(). WaitForPendingWrites() is a natural place where the logic can happen.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5716

Test Plan: Run all tests with ASAN and TSAN.

Differential Revision: D18217658

fbshipit-source-id: b9c5e765c9989645bf10afda7c5c726c3f82f6c3
2019-10-29 18:16:36 -07:00
Maysam Yabandeh a6e615a7ba Enable partitioned index/filter in stress tests (#5918)
Summary:
This is the 3rd attempt after the revert of https://github.com/facebook/rocksdb/issues/4020 and https://github.com/facebook/rocksdb/issues/5895
The last bug is fixed https://github.com/facebook/rocksdb/pull/5907
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5918

Test Plan:
```
make -j32 crash_test
```

Differential Revision: D17909489

Pulled By: maysamyabandeh

fbshipit-source-id: 7dfb8cf998c2d295c86465dd21734593d277887e
2019-10-14 10:35:18 -07:00
Yanqin Jin bc8b05cb77 Revert "Enable partitioned index/filter in stress tests (#5895)" (#5904)
Summary:
This reverts commit 2f4e288143.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5904

Differential Revision: D17871282

Pulled By: riversand963

fbshipit-source-id: d210725f8f3b26d8eac25892094da09d9694337e
2019-10-10 19:19:39 -07:00
Maysam Yabandeh 2f4e288143 Enable partitioned index/filter in stress tests (#5895)
Summary:
This is the 2nd attempt after the revert of https://github.com/facebook/rocksdb/pull/4020
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5895

Test Plan:
```
./tools/db_crashtest.py blackbox --simple --interval=10 --max_key=10000000
```

Differential Revision: D17822137

Pulled By: maysamyabandeh

fbshipit-source-id: 3d148c0d8cc129080410ff859c04b544223c8ea3
2019-10-08 16:50:21 -07:00
sdong 679a45d0cb crash_test to do some verification for prefix extractor and iterator bounds. (#5846)
Summary:
For now, crash_test is not able to report any failure for the logic related to iterator upper, lower bounds or iterators, or reseek. These are features prone to errors. Improve db_stress in several ways:
(1) For each iterator run, reseek up to 3 times.
(2) For every iterator, create control iterator with upper or lower bound, with total order seek. Compare the results with the iterator.
(3) Make simple crash test to avoid prefix size to have more coverage.
(4) make prefix_size = 0 a valid size and -1 to indicate disabling prefix extractor.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5846

Test Plan: Manually hack the code to create wrong results and see they are caught by the tool.

Differential Revision: D17631760

fbshipit-source-id: acd460a177bd2124a5ffd7fff490702dba63030b
2019-09-27 11:10:44 -07:00
Maysam Yabandeh 6ec6a4a9a4 Remove snap_refresh_nanos option (#5826)
Summary:
The snap_refresh_nanos option didn't bring much benefit. Remove the feature to simplify the code.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5826

Differential Revision: D17467147

Pulled By: maysamyabandeh

fbshipit-source-id: 4f950b046990d0d1292d7fc04c2ccafaf751c7f0
2019-09-18 20:26:04 -07:00
Levi Tamasi 94d62d771e Temporarily disable partitioned index/filter in stress test (#5811)
Summary:
PR https://github.com/facebook/rocksdb/issues/4020 enabled partitioned indexes/filters in stress tests; however,
this causes assertion failures in BatchedOpsStressTest. This patch
disables them until we can root cause the failures.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5811

Test Plan: Ran the script and made sure it only uses the binary search index.

Differential Revision: D17399366

Pulled By: ltamasi

fbshipit-source-id: adb116e6297f9c6ccd7ac15b6a16c9aa91f21ac5
2019-09-16 11:41:35 -07:00
Levi Tamasi d35ffd569c Temporarily disable hash index in stress tests (#5792)
Summary:
PR https://github.com/facebook/rocksdb/issues/4020 implicitly enabled the hash index as well in stress/crash
tests, resulting in assertion failures in Block. This patch disables
the hash index until we can pinpoint the root cause of these issues.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5792

Test Plan:
Ran tools/db_crashtest.py and made sure it only uses index types 0 and 2
(binary search and partitioned index).

Differential Revision: D17346777

Pulled By: ltamasi

fbshipit-source-id: b4318f37f1fda3ee1bbff4ef2c2f556ca9e6b551
2019-09-12 12:11:34 -07:00
Andrew Kryczka dd2a35f13f Support partitioned index and filters in stress/crash tests (#4020)
Summary:
- In `db_stress`, support choosing index type and whether to enable filter partitioning, and randomly set those options in crash test
- When partitioned filter is enabled by crash test, force partitioned index to also be enabled since it's a prerequisite
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4020

Test Plan:
currently this is blocked on fixing the bug that crash test caught:

```
$ TEST_TMPDIR=/data/compaction_bench python ./tools/db_crashtest.py blackbox --simple --interval=10 --max_key=10000000
...
Verification failed for column family 0 key 937501: Value not found: NotFound:
Crash-recovery verification failed :(
```

Differential Revision: D8508683

Pulled By: maysamyabandeh

fbshipit-source-id: 0337e5d0558bcef26b1f3699f47265a2c1e99629
2019-09-11 14:13:38 -07:00
sdong 1daff8f85a crash_test to skip compaction TTL for FIFO compaction. (#5749)
Summary:
https://github.com/facebook/rocksdb/pull/5741 added compaction TTL to crash test, but it causes assertion fails for FIFO compaction. Disable this combination for now while we debug the assertion failure.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5749

Test Plan: Run crash test and observe that when compaction_style=2, compaction_ttl is always 0.

Differential Revision: D17078292

fbshipit-source-id: 446821a3b9739956094d5e4f9be1251a15b57f5d
2019-08-27 17:55:37 -07:00
sdong 1d6a10f52d Extend stress test to cover periodic compaction and compaction TTL (#5741)
Summary:
Covering periodic compaction and compaction TTL can help us expose potential issues. Add it there.
Randomly select value for these two options.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5741

Test Plan: Run crash_test and see the perameters generated.

Differential Revision: D17059515

fbshipit-source-id: 8213974846a0b6a22fc13be705825c9054d1d097
2019-08-26 15:03:25 -07:00
sdong d8a27d9331 Atomic Flush Crash Test also covers the case that WAL is enabled. (#5729)
Summary:
AtomicFlushStressTest is a powerful test, but right now we only run it for atomic_flush=true + disable_wal=true. We further extend it to the case where atomic_flush=false + disable_wal = false. All the workload generation and validation can stay the same.
Atomic flush crash test is also changed to switch between the two test scenarios. It makes the name "atomic flush crash test" out of sync from what it really does. We leave it as it is to avoid troubles with continous test set-up.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5729

Test Plan: Run "CRASH_TEST_KILL_ODD=188 TEST_TMPDIR=/dev/shm/ USE_CLANG=1 make whitebox_crash_test_with_atomic_flush", observe the settings used and see it passed.

Differential Revision: D16969791

fbshipit-source-id: 56e37487000ae631e31b0100acd7bdc441c04163
2019-08-22 16:32:55 -07:00
sdong 8e12638f3d Slightly adjust atomic white box test's kill odd (#5717)
Summary:
Atomic white box test's kill odd is the same as normal test. However, in the scenario that only WritableFileWriter::Append() is blacklisted, WritableFileWriter::Flush() dominates the killing odds. Normally, most of WritableFileWriter::Flush() are called in WAL writes, where every write triggers a WAL flush. In atomic test, WAL is disabled, so the kill happens less frequently than we antipated. In some rare cases, the kill didn't end up with happening (for reasons I still don't fully understand) and cause the stress test timeout.

If WAL is disabled, make the odds 5x likely to trigger.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5717

Test Plan: Run whitebox_crash_test_with_atomic_flush and whitebox_crash_test and observe the kill odds printed out.

Differential Revision: D16897237

fbshipit-source-id: cbf5d96f6fc0e980523d0f1f94bf4e72cdb82d1c
2019-08-19 10:51:59 -07:00
Yanqin Jin a78503bd6c Temporarily disable snapshot list refresh for atomic flush stress test (#5581)
Summary:
Atomic flush test started to fail after https://github.com/facebook/rocksdb/issues/5099. Then https://github.com/facebook/rocksdb/issues/5278 provided a fix after
which the same error occurred much less frequently. However it still occur
occasionally. Not sure what the root cause is. This PR disables the feature of
snapshot list refresh, and we should keep an eye on the failure in the future.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5581

Differential Revision: D16295985

Pulled By: riversand963

fbshipit-source-id: c9e62e65133c52c21b07097de359632ca62571e4
2019-07-22 14:38:16 -07:00
Tim Hatch a6a9213a36 Fix interpreter lines for files with python2-only syntax.
Reviewed By: lisroach

Differential Revision: D15362271

fbshipit-source-id: 48fab12ab6e55a8537b19b4623d2545ca9950ec5
2019-07-09 10:51:37 -07:00
Maysam Yabandeh f9842869cf Disable pipeline writes in stress test (#5445)
Summary:
The tsan crash tests are failing with a data race compliant with pipelined write option. Temporarily disable it until its concurrency issue are fixed.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5445

Differential Revision: D15783824

Pulled By: maysamyabandeh

fbshipit-source-id: 413a0c3230b86f524fc7eeea2cf8e8375406e65b
2019-06-12 11:12:36 -07:00
anand76 181bb43f08 Fix bugs in FilePickerMultiGet (#5292)
Summary:
This PR fixes a couple of bugs in FilePickerMultiGet that were causing db_stress test failures. The failures were caused by -
1. Improper handling of a key that matches the user key portion of an L0 file's largest key. In this case, the curr_index_in_curr_level file index in L0 for that key was getting incremented, but batch_iter_ was not advanced. By design, all keys in a batch are supposed to be checked against an L0 file before advancing to the next L0 file. Not advancing to the next key in the batch was causing a double increment of curr_index_in_curr_level due to the same key being processed again
2. Improper handling of a key that matches the user key portion of the largest key in the last file of L1 and higher. This was resulting in a premature end to the processing of the batch for that level when the next key in the batch is a duplicate. Typically, the keys in MultiGet will not be duplicates, but its good to handle that case correctly

Test -
asan_crash
make check
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5292

Differential Revision: D15282530

Pulled By: anand1976

fbshipit-source-id: d1a6a86e0af273169c3632db22a44d79c66a581f
2019-05-09 13:18:00 -07:00
anand76 930bfa5750 Disable MultiGet from db_stress (#5284)
Summary:
Disable it for now until we can get stress tests to pass consistently.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5284

Differential Revision: D15230727

Pulled By: anand1976

fbshipit-source-id: 239baacdb3c4cd4fb7c4447f7582b9042501d752
2019-05-06 18:26:50 -07:00
Maysam Yabandeh 6a40ee5eb1 Refresh snapshot list during long compactions (2nd attempt) (#5278)
Summary:
Part of compaction cpu goes to processing snapshot list, the larger the list the bigger the overhead. Although the lifetime of most of the snapshots is much shorter than the lifetime of compactions, the compaction conservatively operates on the list of snapshots that it initially obtained. This patch allows the snapshot list to be updated via a callback if the compaction is taking long. This should let the compaction to continue more efficiently with much smaller snapshot list.
For simplicity, to avoid the feature is disabled in two cases: i) When more than one sub-compaction are sharing the same snapshot list, ii) when Range Delete is used in which the range delete aggregator has its own copy of snapshot list.
This fixes the reverted https://github.com/facebook/rocksdb/pull/5099 issue with range deletes.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5278

Differential Revision: D15203291

Pulled By: maysamyabandeh

fbshipit-source-id: fa645611e606aa222c7ce53176dc5bb6f259c258
2019-05-03 17:30:22 -07:00
anand76 434ccf2df4 Add option to use MultiGet in db_stress (#5264)
Summary:
The new option will pick a batch size randomly in the range 1-64. It will then space the keys in the batch by random intervals.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5264

Differential Revision: D15175522

Pulled By: anand1976

fbshipit-source-id: c16baa69d0f1ff4cf53c55c813ddd82c8aeb58fc
2019-05-01 23:06:56 -07:00
Andrew Kryczka b02d0c238d Init compression dict handle before reading meta-blocks (#5267)
Summary:
At least one of the meta-block loading functions (`ReadRangeDelBlock`)
uses the same block reading function (`NewDataBlockIterator`) as data
block reads, which means it uses the dictionary handle. However, the
dictionary handle was uninitialized while reading meta-blocks, causing
readers to receive an error. This situation was only noticed when
`cache_index_and_filter_blocks=true`.

This PR initializes the handle to null while reading meta-blocks to
prevent the error. It also adds support to `db_stress` /
`db_crashtest.py` for `cache_index_and_filter_blocks`.

Fixes #5263.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5267

Differential Revision: D15149264

Pulled By: maysamyabandeh

fbshipit-source-id: 991d38a306c62db5976778bfb050fa3cd4a0671b
2019-04-30 09:50:49 -07:00
Yanqin Jin 210b49cac9 Disable pipelined write in atomic flush stress test (#5266)
Summary:
Since currently pipelined write allows one thread to perform memtable writes
while another thread is traversing the `flush_scheduler_`, it will cause an
assertion failure in `FlushScheduler::Clear`. To unblock crash recoery tests,
we temporarily disable pipelined write when atomic flush is enabled.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5266

Differential Revision: D15142285

Pulled By: riversand963

fbshipit-source-id: a0c20fe4ac543e08feaed602414f982054df7831
2019-04-30 08:12:42 -07:00
Fosco Marotto 6c2bf9e916 Add copyright headers per FB open-source checkup tool. (#5199)
Summary:
internal task: T35568575
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5199

Differential Revision: D14962794

Pulled By: gfosco

fbshipit-source-id: 93838ede6d0235eaecff90d200faed9a8515bbbe
2019-04-18 10:55:01 -07:00
Andrew Kryczka 2263f86901 exercise WAL recycling in crash test (#5070)
Summary:
Since this feature affects the WAL behavior, it seems important our crash-recovery tests cover it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5070

Differential Revision: D14470085

Pulled By: miasantreble

fbshipit-source-id: 9b9682a718a926d57d055e0a5ec867efbd2eb9c1
2019-03-15 12:03:26 -07:00
Andrew Kryczka 1218704b61 Fix `compression_zstd_max_train_bytes` coverage in stress test (#4957)
Summary:
Previously `finalize_and_sanitize` function was always zeroing out `compression_zstd_max_train_bytes`. It was only supposed to do that when non-ZSTD compression was used. But since `--compression_type` was an unknown argument (i.e., one that `db_crashtest.py` does not recognize and blindly forwards to `db_stress`), `finalize_and_sanitize` could not tell whether ZSTD was used. This PR fixes it simply by making `--compression_type` a known argument with snappy as default (same as `db_stress`).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4957

Differential Revision: D13994302

Pulled By: ajkr

fbshipit-source-id: 1b0baea7331397822830970d3698642eb7a7df65
2019-02-11 14:56:39 -08:00
Andrew Kryczka 68d949b3e3 Enable DeleteRange in stress/crash tests (#4483)
Summary:
Set `delrangepercent=1` when `test_batches_snapshots=false`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4483

Differential Revision: D10324361

Pulled By: ajkr

fbshipit-source-id: 0cde1f1504f9493408a0c6493b976d7e5f5b2d23
2018-12-18 13:42:49 -08:00
Andrew Kryczka 8d2b74d287 Refine db_stress params for atomic flush (#4781)
Summary:
Separate flag for enabling option from flag for enabling dedicated atomic stress test. I have found setting the former without setting the latter can detect different problems.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4781

Differential Revision: D13463211

Pulled By: ajkr

fbshipit-source-id: 054f777885b2dc7d5ea99faafa21d6537eee45fd
2018-12-13 22:10:38 -08:00
Yanqin Jin 912bbbbc72 Enable crash-recovery stress test for atomic flush (#4605)
Summary:
This PR adds test of atomic flush to our continuous stress tests.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4605

Differential Revision: D12840607

Pulled By: riversand963

fbshipit-source-id: 0da187572791a59530065a7952697c05b1197ad9
2018-10-30 14:03:36 -07:00
Zhongyi Xie 9b3cf908a6 add missing range in random.choice argument (#4397)
Summary:
This will fix the broken asan crash test:
> Traceback (most recent call last):
  File "tools/db_crashtest.py", line 384, in <module>
    main()
  File "tools/db_crashtest.py", line 368, in main
    parser.add_argument("--" + k, type=type(v() if callable(v) else v))
  File "tools/db_crashtest.py", line 59, in <lambda>
    "index_block_restart_interval": lambda: random.choice(1, 16),
TypeError: choice() takes exactly 2 arguments (3 given)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4397

Differential Revision: D9933041

Pulled By: miasantreble

fbshipit-source-id: 10998e5bc6b6a5cea3e4088b18465affc246e639
2018-09-19 12:13:20 -07:00
Maysam Yabandeh a0ebec3804 Extend crash test with index_block_restart_interval (#4383)
Summary:
The default for index_block_restart_interval is 1 but some use 16 in production. The patch extends crash test to test both values.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4383

Differential Revision: D9887304

Pulled By: maysamyabandeh

fbshipit-source-id: a8d00fea974a79ad563f9f4d9d7b069e9f746a8f
2018-09-18 15:43:29 -07:00
Andrew Kryczka 8c25204633 Support manual flush in stress/crash tests (#4368)
Summary:
- Made stress test call `Flush()` periodically according to `--flush_one_in` flag.
- Enabled by default in crash test.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4368

Differential Revision: D9838593

Pulled By: ajkr

fbshipit-source-id: fe5a6e49b36e5ea752acc3aa8be364f8ef34d9cc
2018-09-17 12:27:55 -07:00
Maysam Yabandeh d122025891 Extend stress test to format_version 4 (#4265)
Summary:
Stress tests currently cover format_version 2 and 3. The patch adds 4 as well.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4265

Differential Revision: D9323185

Pulled By: maysamyabandeh

fbshipit-source-id: 54d11e41ecae09bae14cadd7313f07c9a3db5a57
2018-08-14 14:13:33 -07:00
Andrew Kryczka 6175b4b294 Support dictionary compression in stress/crash tests (#4234)
Summary:
- Add `--compression_max_dict_bytes` and `--compression_zstd_max_train_bytes` flags to stress test
- Randomly enable/disable the above flags in crash test
- Set `--compression_type=zstd` in FB-specific crash test runs
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4234

Differential Revision: D9187207

Pulled By: ajkr

fbshipit-source-id: 8d78cf8d8e1165f2cd1c32e069b73726b5bc1fd2
2018-08-06 15:27:29 -07:00
Siying Dong 4b0a43574a db_stress to cover upper bound in iterators (#4162)
Summary:
db_stress doesn't cover upper or lower bound in iterators. Try to cover it by randomly assigning a random one. Also in prefix scan tests, with 50% of the chance, set next prefix as the upper bound.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4162

Differential Revision: D8953507

Pulled By: siying

fbshipit-source-id: f0f04e9cb6c07cbebbb82b892ca23e0daeea708b
2018-07-23 10:45:29 -07:00
Peter (Stig) Edwards 2694b6dc26 Remove unused imports, from python scripts. (#4057)
Summary:
Also remove redefined variable.
As reported on https://lgtm.com/projects/g/facebook/rocksdb/
Closes https://github.com/facebook/rocksdb/pull/4057

Differential Revision: D8648342

Pulled By: ajkr

fbshipit-source-id: afd2ba84d1364d316010179edd44777e64ca9183
2018-06-26 12:43:04 -07:00
Andrew Kryczka 7f3a634e06 Support pipelined write in stress/crash tests
Summary: Closes https://github.com/facebook/rocksdb/pull/4019

Differential Revision: D8508681

Pulled By: ajkr

fbshipit-source-id: 23a3c07d642386446e322b02e69cdf70d12ef009
2018-06-19 09:14:12 -07:00
Andrew Kryczka 8585059ae0 Support backup and checkpoint in db_stress (#4005)
Summary:
Add the `backup_one_in` and `checkpoint_one_in` options to periodically trigger backups and checkpoints. The directory names contain thread ID to avoid clashing with parallel backups/checkpoints. Enable checkpoint in crash test so our CI runs will use it. Didn't enable backup in crash test since it copies all the files which is too slow.
Closes https://github.com/facebook/rocksdb/pull/4005

Differential Revision: D8472275

Pulled By: ajkr

fbshipit-source-id: ff91bdc37caac4ffd97aea8df96b3983313ac1d5
2018-06-18 19:28:18 -07:00
Andrew Kryczka de2c6fb158 Fix stderr processing in crash test (#4006)
Summary:
Fixed bug where `db_stress` output a line with a warning followed by a line with an error, and `db_crashtest.py` considered that a success. For example:

```
WARNING: prefix_size is non-zero but memtablerep != prefix_hash
open error: Corruption: SST file is ahead of WALs
```
Closes https://github.com/facebook/rocksdb/pull/4006

Differential Revision: D8473463

Pulled By: ajkr

fbshipit-source-id: 60461bdd7491d9d26c63f7d4ee522a0f88ba3de7
2018-06-18 17:58:13 -07:00
Andrew Kryczka 7497f992e0 Run manual compaction in stress/crash tests (#3936)
Summary:
- Add support to `db_stress` for `CompactRange`
- Enable `CompactRange` and `CompactFiles` in crash tests
Closes https://github.com/facebook/rocksdb/pull/3936

Differential Revision: D8230953

Pulled By: ajkr

fbshipit-source-id: 208f9980b5bc8c204b1fa726e83791ad674e21e8
2018-06-13 16:45:28 -07:00
Maysam Yabandeh d0c38c0c8c Extend some tests to format_version=3 (#3942)
Summary:
format_version=3 changes the format of SST index. This is however not being tested currently since tests only work with the default format_version which is currently 2. The patch extends the most related tests to also test for format_version=3.
Closes https://github.com/facebook/rocksdb/pull/3942

Differential Revision: D8238413

Pulled By: maysamyabandeh

fbshipit-source-id: 915725f55753dd8e9188e802bf471c23645ad035
2018-06-04 20:13:00 -07:00
Andrew Kryczka 4f297ad05f Fix crash test check for direct I/O
Summary:
We need to keep the DB directory around since the direct IO check in "db_crashtest.py" relies on it existing. This PR fixes an issue where it was removed after each stress test run during the second half of whitebox crash testing.
Closes https://github.com/facebook/rocksdb/pull/3946

Differential Revision: D8247998

Pulled By: ajkr

fbshipit-source-id: 4e7cffbdab9b40df125e7842d0d59916e76261d3
2018-06-03 21:42:12 -07:00
Andrew Kryczka 88c3ee2d31 Configure direct I/O statically in db_stress
Summary:
Previously `db_stress` attempted to configure direct I/O dynamically in `SetOptions()` which had multiple problems (ummm must've never been tested):

- It's a DB option so SetDBOptions should've been called instead
- It's not a dynamic option so even SetDBOptions would fail
- It required enabling SyncPoint to mask O_DIRECT since it had no way to detect whether the DB directory was in tmpfs or not. This required locking that consumed ~80% of db_stress CPU.

In this PR I delete the broken dynamic config and instead configure it statically, only enabling it if the DB directory truly supports O_DIRECT.
Closes https://github.com/facebook/rocksdb/pull/3939

Differential Revision: D8238120

Pulled By: ajkr

fbshipit-source-id: 60bb2deebe6c9b54a3f788079261715b4a229279
2018-06-01 16:42:34 -07:00
Andrew Kryczka d19f568abf Refactor argument handling in db_crashtest.py
Summary:
- Any options unknown to `db_crashtest.py` are now passed directly to `db_stress`. This way, we won't need to update `db_crashtest.py` every time `db_stress` gets a new option.
- Remove `db_crashtest.py` redundant arguments where the value is the same as `db_stress`'s default
- Remove `db_crashtest.py` redundant arguments where the value is the same in a previously applied options map. For example, default_params are always applied before whitebox_default_params, so if they require the same value for an argument, that value only needs to be provided in default_params.
- Made the simple option maps applied in addition to the regular option maps. Previously they were exclusive which led to lots of duplication
Closes https://github.com/facebook/rocksdb/pull/3809

Differential Revision: D7885779

Pulled By: ajkr

fbshipit-source-id: 3a3243b55724d6d5bff36e939b582b9b62c538a8
2018-05-09 13:42:41 -07:00
Andrew Kryczka 46152d53bf Second attempt at db_stress crash-recovery verification
Summary:
- Original commit: a4fb1f8c04
- Revert commit (we reverted as a quick fix to get crash tests passing): 6afe22db2e

This PR includes the contents of the original commit plus two bug fixes, which are:

- In whitebox crash test, only set `--expected_values_path` for `db_stress` runs in the first half of the crash test's duration. In the second half, a fresh DB is created for each `db_stress` run, so we cannot maintain expected state across `db_stress` runs.
- Made `Exists()` return true for `UNKNOWN_SENTINEL` values. I previously had an assert in `Exists()` that value was not `UNKNOWN_SENTINEL`. But it is possible for post-crash-recovery expected values to be `UNKNOWN_SENTINEL` (i.e., if the crash happens in the middle of an update), in which case this assertion would be tripped. The effect of returning true in this case is there may be cases where a `SingleDelete` deletes no data. But if we had returned false, the effect would be calling `SingleDelete` on a key with multiple older versions, which is not supported.
Closes https://github.com/facebook/rocksdb/pull/3793

Differential Revision: D7811671

Pulled By: ajkr

fbshipit-source-id: 67e0295bfb1695ff9674837f2e05bb29c50efc30
2018-04-30 12:27:34 -07:00
Andrew Kryczka 6afe22db2e revert db_stress crash-recovery verification
Summary:
crash-recovery verification is failing in the whitebox testing, which may or may not be a valid correctness issue -- need more time to investigate. In the meantime, reverting so we don't mask other failures.
Closes https://github.com/facebook/rocksdb/pull/3786

Differential Revision: D7794516

Pulled By: ajkr

fbshipit-source-id: 28ccdfdb9ec9b3b0fb08c15cbf9d2e282201ff33
2018-04-27 12:57:01 -07:00
Andrew Kryczka db36f222d8 Allow options file in db_stress and db_crashtest
Summary:
- When options file is provided to db_stress, take supported options from the file instead of from flags
- Call `BuildOptionsTable` after `Open` so it can use `options_` once it has been populated either from flags or from file
- Allow options filename to be passed via `db_crashtest.py`
Closes https://github.com/facebook/rocksdb/pull/3768

Differential Revision: D7755331

Pulled By: ajkr

fbshipit-source-id: 5205cc5deb0d74d677b9832174153812bab9a60a
2018-04-26 18:42:07 -07:00
Andrew Kryczka a4fb1f8c04 Add crash-recovery correctness check to db_stress
Summary:
Previously, our `db_stress` tool held the expected state of the DB in-memory, so after crash-recovery, there was no way to verify data correctness. This PR adds an option, `--expected_values_file`, which specifies a file holding the expected values.

In black-box testing, the `db_stress` process can be killed arbitrarily, so updates to the `--expected_values_file` must be atomic. We achieve this by `mmap`ing the file and relying on `std::atomic<uint32_t>` for atomicity. Actually this doesn't provide a total guarantee on what we want as `std::atomic<uint32_t>` could, in theory, be translated into multiple stores surrounded by a mutex. We can verify our assumption by looking at `std::atomic::is_always_lock_free`.

For the `mmap`'d file, we didn't have an existing way to expose its contents as a raw memory buffer. This PR adds it in the `Env::NewMemoryMappedFileBuffer` function, and `MemoryMappedFileBuffer` class.

`db_crashtest.py` is updated to use an expected values file for black-box testing. On the first iteration (when the DB is created), an empty file is provided as `db_stress` will populate it when it runs. On subsequent iterations, that same filename is provided so `db_stress` can check the data is as expected on startup.
Closes https://github.com/facebook/rocksdb/pull/3629

Differential Revision: D7463144

Pulled By: ajkr

fbshipit-source-id: c8f3e82c93e045a90055e2468316be155633bd8b
2018-04-24 15:58:22 -07:00
Andrew Kryczka b058a33705 Reduce default --nooverwritepercent in black-box crash tests
Summary:
Previously `python tools/db_crashtest.py blackbox` would do no useful work as the crash interval (two minutes) was shorter than the preparation phase. The preparation phase is slow because of the ridiculously inefficient way it computes which keys should not be overwritten. It was doing this for 60M keys since default values were `FLAGS_nooverwritepercent == 60` and `FLAGS_max_key == 100000000`.

Move the "nooverwritepercent" override from whitebox-specific to the general options so it also applies to blackbox test runs. Now preparation phase takes a few seconds.
Closes https://github.com/facebook/rocksdb/pull/3671

Differential Revision: D7457732

Pulled By: ajkr

fbshipit-source-id: 601f4461a6a7e49e50449dcf15aebc9b8a98d6f0
2018-04-03 15:28:40 -07:00
Mark Isaacson b8eb32f8cf Suppress lint in old files
Summary: Grandfather in super old lint issues to make a clean slate for moving forward that allows us to have stronger enforcement on new issues.

Reviewed By: yiwu-arbug

Differential Revision: D6821806

fbshipit-source-id: 22797d31ec58e9eb0255d3b66fedfcfcb0dc127c
2018-01-29 12:56:42 -08:00
Andrew Kryczka d75793d6b4 db_stress support long-held snapshots
Summary:
Add options to `db_stress` (correctness testing tool) to randomly acquire snapshot and release it after some period of time. It's useful for correctness testing of #3009, as well as other parts of compaction that behave differently depending on which snapshots are held.
Closes https://github.com/facebook/rocksdb/pull/3038

Differential Revision: D6086501

Pulled By: ajkr

fbshipit-source-id: 3ec0d8666c78ac507f1f808887c4ff759ba9b865
2017-10-20 15:26:59 -07:00