Commit Graph

259 Commits

Author SHA1 Message Date
Levi Tamasi 0288bdbc53 Add the integrated BlobDB to the stress/crash tests (#7900)
Summary:
The patch adds support for the options related to the new BlobDB implementation
to `db_stress`, including support for dynamically adjusting them using `SetOptions`
when `set_options_one_in` and a new flag `allow_setting_blob_options_dynamically`
are specified. (The latter is used to prevent the options from being enabled when
incompatible features are in use.)

The patch also updates the `db_stress` help messages of the existing stacked BlobDB
related options to clarify that they pertain to the old implementation. In addition, it
adds the new BlobDB to the crash test script. In order to prevent a combinatorial explosion
of jobs and still perform whitebox/blackbox testing (including under ASAN/TSAN/UBSAN),
and to also test BlobDB in conjunction with atomic flush and transactions, the script sets
the BlobDB options in 10% of normal/`cf_consistency`/`txn` crash test runs.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7900

Test Plan: Ran `make check` and `db_stress`/`db_crashtest.py` with various options.

Reviewed By: jay-zhuang

Differential Revision: D26094913

Pulled By: ltamasi

fbshipit-source-id: c2ef3391a05e43a9687f24e297df05f4a5584814
2021-02-02 11:41:18 -08:00
Andrew Kryczka 78ee8564ad Integrity protection for live updates to WriteBatch (#7748)
Summary:
This PR adds the foundation classes for key-value integrity protection and the first use case: protecting live updates from the source buffers added to `WriteBatch` through the destination buffer in `MemTable`. The width of the protection info is not yet configurable -- only eight bytes per key is supported. This PR allows users to enable protection by constructing `WriteBatch` with `protection_bytes_per_key == 8`. It does not yet expose a way for users to get integrity protection via other write APIs (e.g., `Put()`, `Merge()`, `Delete()`, etc.).

The foundation classes (`ProtectionInfo.*`) embed the coverage info in their type, and provide `Protect.*()` and `Strip.*()` functions to navigate between types with different coverage. For making bytes per key configurable (for powers of two up to eight) in the future, these classes are templated on the unsigned integer type used to store the protection info. That integer contains the XOR'd result of hashes with independent seeds for all covered fields. For integer fields, the hash is computed on the raw unadjusted bytes, so the result is endian-dependent. The most significant bytes are truncated when the hash value (8 bytes) is wider than the protection integer.

When `WriteBatch` is constructed with `protection_bytes_per_key == 8`, we hold a `ProtectionInfoKVOTC` (i.e., one that covers key, value, optype aka `ValueType`, timestamp, and CF ID) for each entry added to the batch. The protection info is generated from the original buffers passed by the user, as well as the original metadata generated internally. When writing to memtable, each entry is transformed to a `ProtectionInfoKVOTS` (i.e., dropping coverage of CF ID and adding coverage of sequence number), since at that point we know the sequence number, and have already selected a memtable corresponding to a particular CF. This protection info is verified once the entry is encoded in the `MemTable` buffer.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7748

Test Plan:
- an integration test to verify a wide variety of single-byte changes to the encoded `MemTable` buffer are caught
- add to stress/crash test to verify it works in variety of configs/operations without intentional corruption
- [deferred] unit tests for `ProtectionInfo.*` classes for edge cases like KV swap, `SliceParts` and `Slice` APIs are interchangeable, etc.

Reviewed By: pdillinger

Differential Revision: D25754492

Pulled By: ajkr

fbshipit-source-id: e481bac6c03c2ab268be41359730f1ceb9964866
2021-01-29 12:18:58 -08:00
Zhichao Cao 04b3524ad0 Inject the random write error to stress test (#7653)
Summary:
Inject the random write error to stress test, it requires set reopen=0 and disable_wal=true.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7653

Test Plan: pass db_stress and python3 db_crashtest.py blackbox

Reviewed By: ajkr

Differential Revision: D25354132

Pulled By: zhichao-cao

fbshipit-source-id: 44721104eecb416e27f65f854912c40e301dd669
2020-12-17 11:52:28 -08:00
Peter Dillinger 60af964372 Experimental (production candidate) SST schema for Ribbon filter (#7658)
Summary:
Added experimental public API for Ribbon filter:
NewExperimentalRibbonFilterPolicy(). This experimental API will
take a "Bloom equivalent" bits per key, and configure the Ribbon
filter for the same FP rate as Bloom would have but ~30% space
savings. (Note: optimize_filters_for_memory is not yet implemented
for Ribbon filter. That can be added with no effect on schema.)

Internally, the Ribbon filter is configured using a "one_in_fp_rate"
value, which is 1 over desired FP rate. For example, use 100 for 1%
FP rate. I'm expecting this will be used in the future for configuring
Bloom-like filters, as I expect people to more commonly hold constant
the filter accuracy and change the space vs. time trade-off, rather than
hold constant the space (per key) and change the accuracy vs. time
trade-off, though we might make that available.

### Benchmarking

```
$ ./filter_bench -impl=2 -quick -m_keys_total_max=200 -average_keys_per_filter=100000 -net_includes_hashing
Building...
Build avg ns/key: 34.1341
Number of filters: 1993
Total size (MB): 238.488
Reported total allocated memory (MB): 262.875
Reported internal fragmentation: 10.2255%
Bits/key stored: 10.0029
----------------------------
Mixed inside/outside queries...
  Single filter net ns/op: 18.7508
  Random filter net ns/op: 258.246
    Average FP rate %: 0.968672
----------------------------
Done. (For more info, run with -legend or -help.)
$ ./filter_bench -impl=3 -quick -m_keys_total_max=200 -average_keys_per_filter=100000 -net_includes_hashing
Building...
Build avg ns/key: 130.851
Number of filters: 1993
Total size (MB): 168.166
Reported total allocated memory (MB): 183.211
Reported internal fragmentation: 8.94626%
Bits/key stored: 7.05341
----------------------------
Mixed inside/outside queries...
  Single filter net ns/op: 58.4523
  Random filter net ns/op: 363.717
    Average FP rate %: 0.952978
----------------------------
Done. (For more info, run with -legend or -help.)
```

168.166 / 238.488 = 0.705  -> 29.5% space reduction

130.851 / 34.1341 = 3.83x construction time for this Ribbon filter vs. lastest Bloom filter (could make that as little as about 2.5x for less space reduction)

### Working around a hashing "flaw"

bloom_test discovered a flaw in the simple hashing applied in
StandardHasher when num_starts == 1 (num_slots == 128), showing an
excessively high FP rate.  The problem is that when many entries, on the
order of number of hash bits or kCoeffBits, are associated with the same
start location, the correlation between the CoeffRow and ResultRow (for
efficiency) can lead to a solution that is "universal," or nearly so, for
entries mapping to that start location. (Normally, variance in start
location breaks the effective association between CoeffRow and
ResultRow; the same value for CoeffRow is effectively different if start
locations are different.) Without kUseSmash and with num_starts > 1 (thus
num_starts ~= num_slots), this flaw should be completely irrelevant.  Even
with 10M slots, the chances of a single slot having just 16 (or more)
entries map to it--not enough to cause an FP problem, which would be local
to that slot if it happened--is 1 in millions. This spreadsheet formula
shows that: =1/(10000000*(1 - POISSON(15, 1, TRUE)))

As kUseSmash==false (the setting for Standard128RibbonBitsBuilder) is
intended for CPU efficiency of filters with many more entries/slots than
kCoeffBits, a very reasonable work-around is to disallow num_starts==1
when !kUseSmash, by making the minimum non-zero number of slots
2*kCoeffBits. This is the work-around I've applied. This also means that
the new Ribbon filter schema (Standard128RibbonBitsBuilder) is not
space-efficient for less than a few hundred entries. Because of this, I
have made it fall back on constructing a Bloom filter, under existing
schema, when that is more space efficient for small filters. (We can
change this in the future if we want.)

TODO: better unit tests for this case in ribbon_test, and probably
update StandardHasher for kUseSmash case so that it can scale nicely to
small filters.

### Other related changes

* Add Ribbon filter to stress/crash test
* Add Ribbon filter to filter_bench as -impl=3
* Add option string support, as in "filter_policy=experimental_ribbon:5.678;"
where 5.678 is the Bloom equivalent bits per key.
* Rename internal mode BloomFilterPolicy::kAuto to kAutoBloom
* Add a general BuiltinFilterBitsBuilder::CalculateNumEntry based on
binary searching CalculateSpace (inefficient), so that subclasses
(especially experimental ones) don't have to provide an efficient
implementation inverting CalculateSpace.
* Minor refactor FastLocalBloomBitsBuilder for new base class
XXH3pFilterBitsBuilder shared with new Standard128RibbonBitsBuilder,
which allows the latter to fall back on Bloom construction in some
extreme cases.
* Mostly updated bloom_test for Ribbon filter, though a test like
FullBloomTest::Schema is a next TODO to ensure schema stability
(in case this becomes production-ready schema as it is).
* Add some APIs to ribbon_impl.h for configuring Ribbon filters.
Although these are reasonably covered by bloom_test, TODO more unit
tests in ribbon_test
* Added a "tool" FindOccupancyForSuccessRate to ribbon_test to get data
for constructing the linear approximations in GetNumSlotsFor95PctSuccess.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7658

Test Plan:
Some unit tests updated but other testing is left TODO. This
is considered experimental but laying down schema compatibility as early
as possible in case it proves production-quality. Also tested in
stress/crash test.

Reviewed By: jay-zhuang

Differential Revision: D24899349

Pulled By: pdillinger

fbshipit-source-id: 9715f3e6371c959d923aea8077c9423c7a9f82b8
2020-11-12 20:46:14 -08:00
Akanksha Mahajan 202605143b Fix crash test to run in DEBUG_LEVEL=0 mode in tmpfs (#7643)
Summary:
crash tests donot run in DEBUG_MODE=0 on tmpfs when
use_direct_reads/use_direct_io_for_flush_and_compaction is set randomly because
direct I/O is not supported on tmpfs and tests exit.

Fix: Sanitize direct I/O read options in DEBUG_LEVEL=0 so that crash
tests can run in tmpfs. When mmap_reads is set, direct I/O reads options are
unset so we can sanitize direct I/O reads options in case of tmpfs as well.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7643

Test Plan:
1. export DEBUG_LEVEL=0; export TEST_TMPDIR="/dev/shm";
           export CRASH_TEST_EXT_ARGS="--use_direct_reads=1 --mmap_read=0";
           make crash_test -j64
           2. In DEBUG_LEVEL=1 mode:  make crash_test -j64

Reviewed By: jay-zhuang

Differential Revision: D24766550

Pulled By: akankshamahajan15

fbshipit-source-id: 021720b2343c12c72004f84b26147625d3991d9e
2020-11-10 10:50:34 -08:00
Akanksha Mahajan 06a92fcf5c Add "max_write_buffer_size_to_maintain" to crash test (#7634)
Summary:
Add "max_write_buffer_size_to_maintain" to crash test

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7634

Test Plan: make crash_test -j64

Reviewed By: zhichao-cao

Differential Revision: D24710401

Pulled By: akankshamahajan15

fbshipit-source-id: 89e0412aaa56b2ef5a75603971b82f4b0b494ab7
2020-11-03 13:55:18 -08:00
Jay Zhuang 9e03e4dd52 db_crashtest preserves expected values file for failed crash tests (#7534)
Summary:
If crash test fails, don't delete the `expected_values_file` for later
debug. More details: https://github.com/facebook/rocksdb/issues/7530

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7534

Test Plan: local host

Reviewed By: ajkr

Differential Revision: D24239655

Pulled By: jay-zhuang

fbshipit-source-id: 3566f91a30aae1e27d2f51d910cddd08edb7d4cf
2020-10-12 14:10:14 -07:00
Andrew Kryczka 75d3b6fdf0 Redesign block cache pinning API (#7520)
Summary:
The old flag-based APIs (`BlockBasedTableOptions::pin_l0_filter_and_index_blocks_in_cache` and `BlockBasedTableOptions::pin_top_level_index_and_filter`) were insufficient for our needs. For example, it was impossible to pin only unpartitioned meta-blocks, which could prevent block cache contention when turning on dictionary compression or during a migration to partitioned indexes/filters. It was also impossible to pin all meta-blocks in memory while having predictable memory usage via block cache. If we had continued adding flags to address these scenarios, they would have had significant overlap causing confusion. Instead, this PR deprecates the flags and starts a new API with non-overlapping options.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7520

Test Plan:
- new unit test
- added new options to stress/crash test and ran for a while: `$ python tools/db_crashtest.py blackbox --simple --max_key=1000000 -write_buffer_size=1048576 -target_file_size_base=1048576 -max_bytes_for_level_base=4194304 --interval=10 -value_size_mult=33 -column_families=1 -reopen=0`

Reviewed By: pdillinger

Differential Revision: D24200034

Pulled By: ajkr

fbshipit-source-id: 3fa7cfc71e7960f7a867511dd6ae5834dd73b13e
2020-10-11 14:58:24 -07:00
sdong 82d42606ad Enable paranoid_file_checks in crash test (#7489)
Summary:
Cover paranoid_file_checks in crash test.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7489

Test Plan: Run crash tests for hours and didn't see any failure.

Reviewed By: ajkr

Differential Revision: D24063868

fbshipit-source-id: 7b48b110e66ce78ae5d0c99a9f32af86edd34c1e
2020-10-09 08:32:22 -07:00
sdong aedcaaef99 Stress test to support paranoid_file_checks (#7473)
Summary:
It's important to make sure no false positive is reported when options.paranoid_file_checks is used. Add it to stress test and a place holder in crash test. It is disabled in crash test as there appears to be a bug causing false positive.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7473

Test Plan: Run crash test

Reviewed By: ajkr

Differential Revision: D24026939

fbshipit-source-id: 89102acb45cf041776775ce44a4eef4b0f3a380c
2020-09-30 14:41:33 -07:00
Peter Dillinger 4e258d3e63 Fix backup/restore in stress/crash test (#7357)
Summary:
(1) Skip check on specific key if restoring an old backup
(small minority of cases) because it can fail in those cases. (2) Remove
an old assertion about number of column families and number of keys
passed in, which is broken by atomic flush (cf_consistency) test. Like
other code (for better or worse) assume a single key and iterate over
column families. (3) Apply mock_direct_io to NewSequentialFile so that
db_stress backup works on /dev/shm.

Also add more context to output in case of backup/restore db_stress
failure.

Also a minor fix to BackupEngine to report first failure status in
creating new backup, and drop another clue about the potential
source of a "Backup failed" status.

Reverts "Disable backup/restore stress test (https://github.com/facebook/rocksdb/issues/7350)"

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7357

Test Plan:
Using backup_one_in=10000,
"USE_CLANG=1 make crash_test_with_atomic_flush" for 30+ minutes
"USE_CLANG=1 make blackbox_crash_test" for 30+ minutes
And with use_direct_reads with TEST_TMPDIR=/dev/shm/rocksdb

Reviewed By: riversand963

Differential Revision: D23567244

Pulled By: pdillinger

fbshipit-source-id: e77171c2e8394d173917e36898c02dead1c40b77
2020-09-08 10:50:19 -07:00
Jay Zhuang ef32f11004 Disable backup/restore stress test (#7350)
Summary:
Seems it's causing some tests failures.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7350

Reviewed By: riversand963

Differential Revision: D23544109

Pulled By: jay-zhuang

fbshipit-source-id: 798a0ca374a20b6c2d0f29582729ff101c6a2e99
2020-09-04 11:58:18 -07:00
Peter Dillinger 06ad5dd293 Add file checksum to stress/crash test (#7343)
Summary:
This change has the crash test randomly select from a few file
checksum implementations, or nullptr, for DB file_checksum_gen_factory.
For compatibility across runs on same DB, each non-null factory can
understand all the other functions, but the default changes.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7343

Test Plan:
'make blackbox_crash_test' for a while, including with some
debug output to ensure code is being exercised.

Reviewed By: zhichao-cao

Differential Revision: D23494580

Pulled By: pdillinger

fbshipit-source-id: 73bbc7ca32c1adaf619134c0c830f12894880b8a
2020-09-03 23:50:33 -07:00
Peter Dillinger 499c9448d0 Fix, enable, and enhance backup/restore in db_stress (#7348)
Summary:
Although added to db_stress, testing of backup/restore
was never integrated into the crash test, originally concerned about
performance. I've enabled it now and to address the peformance concern,
testing backup/restore is always skipped once the db exceeds a certain
size threshold, default 100MB. This should provide sufficient
opportunity for testing BackupEngine without bogging down everything
else with heavier and heavier operations.

Also fixed backup/restore in db_stress by making sure PurgeOldBackups
can remove manifest files, which are normally kept around for db_stress.

Added more coverage of backup options, and up to three backups being
saved in one backup directory (in some cases).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7348

Test Plan:
ran 'make blackbox_crash_test' for a while, with heightened
probabilitly of taking backups (1/10k). Also confirmed with some debug
output that the code is being covered, TestBackupRestore only takes
a few seconds to complete when triggered, and even at 1/10k and ~50MB
database, there's <,~ 1 thread testing backups at any time.

Reviewed By: ajkr

Differential Revision: D23510835

Pulled By: pdillinger

fbshipit-source-id: b6b8735591808141f81f10773ac31634cf03b6c0
2020-09-03 20:13:15 -07:00
Andrew Kryczka 7eebe6d38a Mark files for compaction in stress/crash tests (#7231)
Summary:
The mechanism to mark files for compaction is most commonly used in
delete-triggered compaction. This PR adds an option to exercise the
marking mechanism on random files created by db_stress. This PR also
enables that option in db_crashtest.py on its db_stress runs at random.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7231

Test Plan:
- ran some minified crash tests; verified they succeed and we see `"compaction_reason": "FilesMarkedForCompaction"` regularly in the logs.

```
$ TEST_TMPDIR=/dev/shm python tools/db_crashtest.py blackbox --duration=600 --interval=30 --max_key=10000000 --write_buffer_size=1048576 --target_file_size_base=1048576 --max_bytes_for_level_base=4194304 --value_size_mult=33
$ TEST_TMPDIR=/dev/shm python tools/db_crashtest.py whitebox --duration=600 --interval=30 --max_key=1000000 --write_buffer_size=1048576 --target_file_size_base=1048576 --max_bytes_for_level_base=4194304 --value_size_mult=33 --random_kill_odd=8887
```

Reviewed By: anand1976

Differential Revision: D23025156

Pulled By: ajkr

fbshipit-source-id: a404c467ebc12afa94dae35956ea9b372f592a96
2020-08-10 16:17:56 -07:00
Jay Zhuang fc4d5f5065 Add stress test for GetProperty (#7111)
Summary:
Add stress test coverage for `DB::GetProperty()`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7111

Test Plan:
```
./db_stress -get_property_one_in=1
make crash_test
```

Reviewed By: ajkr

Differential Revision: D22487906

Pulled By: jay-zhuang

fbshipit-source-id: c118d95cc9b4e2fa669a06e6aa531541fa885dc5
2020-07-14 12:12:36 -07:00
Andrew Kryczka 8efb5cfb6f add SstFileManager to crash test (#6993)
Summary:
SstFileManager is already supported in the stress test as of https://github.com/facebook/rocksdb/issues/6454. This
PR enables the SstFileManager in some of the crash test runs.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6993

Reviewed By: riversand963

Differential Revision: D22084406

Pulled By: ajkr

fbshipit-source-id: 78b8642682e7570ff6ec3a1c3ccd9940f4362289
2020-06-23 16:27:20 -07:00
Peter Dillinger 5b2bbacb6f Minimize memory internal fragmentation for Bloom filters (#6427)
Summary:
New experimental option BBTO::optimize_filters_for_memory builds
filters that maximize their use of "usable size" from malloc_usable_size,
which is also used to compute block cache charges.

Rather than always "rounding up," we track state in the
BloomFilterPolicy object to mix essentially "rounding down" and
"rounding up" so that the average FP rate of all generated filters is
the same as without the option. (YMMV as heavily accessed filters might
be unluckily lower accuracy.)

Thus, the option near-minimizes what the block cache considers as
"memory used" for a given target Bloom filter false positive rate and
Bloom filter implementation. There are no forward or backward
compatibility issues with this change, though it only works on the
format_version=5 Bloom filter.

With Jemalloc, we see about 10% reduction in memory footprint (and block
cache charge) for Bloom filters, but 1-2% increase in storage footprint,
due to encoding efficiency losses (FP rate is non-linear with bits/key).

Why not weighted random round up/down rather than state tracking? By
only requiring malloc_usable_size, we don't actually know what the next
larger and next smaller usable sizes for the allocator are. We pick a
requested size, accept and use whatever usable size it has, and use the
difference to inform our next choice. This allows us to narrow in on the
right balance without tracking/predicting usable sizes.

Why not weight history of generated filter false positive rates by
number of keys? This could lead to excess skew in small filters after
generating a large filter.

Results from filter_bench with jemalloc (irrelevant details omitted):

    (normal keys/filter, but high variance)
    $ ./filter_bench -quick -impl=2 -average_keys_per_filter=30000 -vary_key_count_ratio=0.9
    Build avg ns/key: 29.6278
    Number of filters: 5516
    Total size (MB): 200.046
    Reported total allocated memory (MB): 220.597
    Reported internal fragmentation: 10.2732%
    Bits/key stored: 10.0097
    Average FP rate %: 0.965228
    $ ./filter_bench -quick -impl=2 -average_keys_per_filter=30000 -vary_key_count_ratio=0.9 -optimize_filters_for_memory
    Build avg ns/key: 30.5104
    Number of filters: 5464
    Total size (MB): 200.015
    Reported total allocated memory (MB): 200.322
    Reported internal fragmentation: 0.153709%
    Bits/key stored: 10.1011
    Average FP rate %: 0.966313

    (very few keys / filter, optimization not as effective due to ~59 byte
     internal fragmentation in blocked Bloom filter representation)
    $ ./filter_bench -quick -impl=2 -average_keys_per_filter=1000 -vary_key_count_ratio=0.9
    Build avg ns/key: 29.5649
    Number of filters: 162950
    Total size (MB): 200.001
    Reported total allocated memory (MB): 224.624
    Reported internal fragmentation: 12.3117%
    Bits/key stored: 10.2951
    Average FP rate %: 0.821534
    $ ./filter_bench -quick -impl=2 -average_keys_per_filter=1000 -vary_key_count_ratio=0.9 -optimize_filters_for_memory
    Build avg ns/key: 31.8057
    Number of filters: 159849
    Total size (MB): 200
    Reported total allocated memory (MB): 208.846
    Reported internal fragmentation: 4.42297%
    Bits/key stored: 10.4948
    Average FP rate %: 0.811006

    (high keys/filter)
    $ ./filter_bench -quick -impl=2 -average_keys_per_filter=1000000 -vary_key_count_ratio=0.9
    Build avg ns/key: 29.7017
    Number of filters: 164
    Total size (MB): 200.352
    Reported total allocated memory (MB): 221.5
    Reported internal fragmentation: 10.5552%
    Bits/key stored: 10.0003
    Average FP rate %: 0.969358
    $ ./filter_bench -quick -impl=2 -average_keys_per_filter=1000000 -vary_key_count_ratio=0.9 -optimize_filters_for_memory
    Build avg ns/key: 30.7131
    Number of filters: 160
    Total size (MB): 200.928
    Reported total allocated memory (MB): 200.938
    Reported internal fragmentation: 0.00448054%
    Bits/key stored: 10.1852
    Average FP rate %: 0.963387

And from db_bench (block cache) with jemalloc:

    $ ./db_bench -db=/dev/shm/dbbench.no_optimize -benchmarks=fillrandom -format_version=5 -value_size=90 -bloom_bits=10 -num=2000000 -threads=8 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=false
    $ ./db_bench -db=/dev/shm/dbbench -benchmarks=fillrandom -format_version=5 -value_size=90 -bloom_bits=10 -num=2000000 -threads=8 -optimize_filters_for_memory -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=false
    $ (for FILE in /dev/shm/dbbench.no_optimize/*.sst; do ./sst_dump --file=$FILE --show_properties | grep 'filter block' ; done) | awk '{ t += $4; } END { print t; }'
    17063835
    $ (for FILE in /dev/shm/dbbench/*.sst; do ./sst_dump --file=$FILE --show_properties | grep 'filter block' ; done) | awk '{ t += $4; } END { print t; }'
    17430747
    $ #^ 2.1% additional filter storage
    $ ./db_bench -db=/dev/shm/dbbench.no_optimize -use_existing_db -benchmarks=readrandom,stats -statistics -bloom_bits=10 -num=2000000 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=false -duration=10 -cache_index_and_filter_blocks -cache_size=1000000000
    rocksdb.block.cache.index.add COUNT : 33
    rocksdb.block.cache.index.bytes.insert COUNT : 8440400
    rocksdb.block.cache.filter.add COUNT : 33
    rocksdb.block.cache.filter.bytes.insert COUNT : 21087528
    rocksdb.bloom.filter.useful COUNT : 4963889
    rocksdb.bloom.filter.full.positive COUNT : 1214081
    rocksdb.bloom.filter.full.true.positive COUNT : 1161999
    $ #^ 1.04 % observed FP rate
    $ ./db_bench -db=/dev/shm/dbbench -use_existing_db -benchmarks=readrandom,stats -statistics -bloom_bits=10 -num=2000000 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=false -optimize_filters_for_memory -duration=10 -cache_index_and_filter_blocks -cache_size=1000000000
    rocksdb.block.cache.index.add COUNT : 33
    rocksdb.block.cache.index.bytes.insert COUNT : 8448592
    rocksdb.block.cache.filter.add COUNT : 33
    rocksdb.block.cache.filter.bytes.insert COUNT : 18220328
    rocksdb.bloom.filter.useful COUNT : 5360933
    rocksdb.bloom.filter.full.positive COUNT : 1321315
    rocksdb.bloom.filter.full.true.positive COUNT : 1262999
    $ #^ 1.08 % observed FP rate, 13.6% less memory usage for filters

(Due to specific key density, this example tends to generate filters that are "worse than average" for internal fragmentation. "Better than average" cases can show little or no improvement.)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6427

Test Plan: unit test added, 'make check' with gcc, clang and valgrind

Reviewed By: siying

Differential Revision: D22124374

Pulled By: pdillinger

fbshipit-source-id: f3e3aa152f9043ddf4fae25799e76341d0d8714e
2020-06-22 13:32:07 -07:00
Andrew Kryczka d76eed4839 minor fixes for stress/crash contruns (#7006)
Summary:
Avoid using `cf_consistency` together with `enable_compaction_filter` as
the former heavily uses snapshots while the latter is incompatible with
snapshots.

Also fix a clang-analyze error for a write to a variable that is never
read.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7006

Reviewed By: zhichao-cao

Differential Revision: D22141679

Pulled By: ajkr

fbshipit-source-id: 1840ae238168818a9ab5973f90fd78c067399447
2020-06-19 16:05:17 -07:00
Peter Dillinger 88b4210701 Remove racially charged terms "whitelist" and "blacklist" (#7008)
Summary:
We don't need them.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7008

Test Plan: "make check" and ensure "make crash_test" starts

Reviewed By: ajkr

Differential Revision: D22143838

Pulled By: pdillinger

fbshipit-source-id: 72c8e16603abc59f4954e304466bc4dc1f58f94e
2020-06-19 15:27:32 -07:00
Andrew Kryczka 775dc623ad add `CompactionFilter` to stress/crash tests (#6988)
Summary:
Added a `CompactionFilter` that is aware of the stress test's expected state. It only drops key versions that are already covered according to the expected state. It is incompatible with snapshots (same as all `CompactionFilter`s), so disables all snapshot-related features when used in the crash test.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6988

Test Plan:
running a minified blackbox crash test

```
$ TEST_TMPDIR=/dev/shm python tools/db_crashtest.py blackbox --max_key=1000000 -write_buffer_size=1048576 -max_bytes_for_level_base=4194304 -target_file_size_base=1048576 -value_size_mult=33 --interval=10 --duration=3600
```

Reviewed By: anand1976

Differential Revision: D22072888

Pulled By: ajkr

fbshipit-source-id: 727b9d7a90d5eab18be0ec6cd5a810712ac13320
2020-06-18 09:54:55 -07:00
Yanqin Jin 15d9f28da5 Add stress test for best-efforts recovery (#6819)
Summary:
Add crash test for the case of best-efforts recovery.
After a certain amount of time, we kill the db_stress process, randomly delete some certain table files and restart db_stress. Given the randomness of file deletion, it is difficult to verify against a reference for data correctness. Therefore, we just check that the db can restart successfully.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6819

Test Plan:
```
./db_stress -best_efforts_recovery=true -disable_wal=1 -reopen=0
./db_stress -best_efforts_recovery=true -disable_wal=0 -skip_verifydb=1 -verify_db_one_in=0 -continuous_verification_interval=0
make crash_test_with_best_efforts_recovery
```

Reviewed By: anand1976

Differential Revision: D21436753

Pulled By: riversand963

fbshipit-source-id: 0b3605c922a16c37ed17d5ab6682ca4240e47926
2020-06-12 19:27:46 -07:00
Peter Dillinger 0c56fc4d66 Allow missing "unversioned" python, as in CentOS 8 (#6883)
Summary:
RocksDB Makefile was assuming existence of 'python' command,
which is not present in CentOS 8. We avoid using 'python' if 'python3' is available.

Also added fancy logic to format-diff.sh to make clang-format-diff.py for Python2 work even with Python3 only (as some CentOS 8 FB machines come equipped)

Also, now use just 'python3' for PYTHON if not found so that an informative
"command not found" error will result rather than something weird.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6883

Test Plan: manually tried some variants, 'make check' on a fresh CentOS 8 machine without 'python' executable or Python2 but with clang-format-diff.py for Python2.

Reviewed By: gg814

Differential Revision: D21767029

Pulled By: pdillinger

fbshipit-source-id: 54761b376b140a3922407bdc462f3572f461d0e9
2020-05-29 11:29:23 -07:00
sdong 12825894a2 Disable "compression_parallel_threads" in crash test for now (#6816)
Summary:
"compressio_parallel_threads" caused several test failure tests. To keep crash test clean, disable it for now.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6816

Test Plan: "make crash_test" to make sure the python script doesn't break

Reviewed By: zhichao-cao

Differential Revision: D21462112

fbshipit-source-id: 9eecc764800da82cd19665dc8b167eacead3310b
2020-05-07 20:52:29 -07:00
Andrew Kryczka 1f20df2f38 cover single level universal in crash test (#6818)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/6818

Test Plan:
fast whitebox test and verify there are some single-level universal and
some multi-level universal runs.

```
$ python ./tools/db_crashtest.py whitebox --simple -max_key=1000000 -value_size_mult=33 -write_buffer_size=524288 -target_file_size_base=524288 -max_bytes_for_level_base=2097152 --duration=120 --interval=10 --ops_per_thread=1000 --random_kill_odd=887
```

Reviewed By: riversand963

Differential Revision: D21432138

Pulled By: ajkr

fbshipit-source-id: 2fc5ba9f3dfa49bb11e81da7dd00a17b476e64d7
2020-05-06 18:08:09 -07:00
Ziyue Yang e619a20e93 Add an option for parallel compression in for db_stress (#6722)
Summary:
This commit adds an `compression_parallel_threads` option in
db_stress. It also fixes the naming of parallel compression
option in db_bench to keep it aligned with others.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6722

Reviewed By: pdillinger

Differential Revision: D21091385

fbshipit-source-id: c9ba8c4e5cc327ff9e6094a6dc6a15fcff70f100
2020-04-30 10:49:07 -07:00
Cheng Chang 0a77617820 Disable O_DIRECT in stress test when db directory does not support direct IO (#6727)
Summary:
In crash test, the db directory might be set to /dev/shm or /tmp, in certain environments such as internal testing infrastructure, neither of these directories support direct IO, so direct IO is never enabled in crash test.

This PR sets up SyncPoints in direct IO related code paths to disable O_DIRECT flag in calls to `open`, so the direct IO code paths will be executed, all direct IO related assertions will be checked, but no real direct IO request will be issued to the file system.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6727

Test Plan:
export CRASH_TEST_EXT_ARGS="--use_direct_reads=1 --mmap_read=0"
make -j24 crash_test

Reviewed By: zhichao-cao

Differential Revision: D21139250

Pulled By: cheng-chang

fbshipit-source-id: db9adfe78d91aa4759835b1af91c5db7b27b62ee
2020-04-25 00:01:03 -07:00
sdong fe206f4f7c crash_test to cover index_type kBinarySearchWithFirstKey (#6721)
Summary:
Recently index_type kBinarySearchWithFirstKey is improved so that the API guarantee is exactly the same as other types and it is ready for wide production. We should cover it in crash tst.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6721

Test Plan: Run crash_test

Reviewed By: anand1976

Differential Revision: D21099781

fbshipit-source-id: fda91eba831d9eacbb140c703e9768bb1701f935
2020-04-20 12:57:15 -07:00
sdong 63d82e57b9 crash_test to cover small max_open_files (#6719)
Summary:
RocksDB behavior is different while max_open_files is small or large. Add the coverage to small max_open_files.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6719

Test Plan: Run crash_test

Reviewed By: pdillinger

Differential Revision: D21081021

fbshipit-source-id: e3e211761a9bd25d93d19a61c1f7b62d48cf5e3c
2020-04-17 11:00:07 -07:00
sdong 73523baeb1 crash_test to cover options.avoid_flush_during_recovery (#6712)
Summary:
Options.avoid_flush_during_recovery is uncovered in crash_test. Add the coverage with a chance of 1/8, as it is a less frequently used options.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6712

Test Plan: Run crash_test and see the option can be used or not used by chance.

Reviewed By: ltamasi

Differential Revision: D21056566

fbshipit-source-id: c3b1521517cfc204786e6ef8c6acd7fffda64793
2020-04-16 12:11:45 -07:00
Yueh-Hsuan Chiang 5801af4646 Add env_fault_injection argument to db_stress (#6687)
Summary:
Add env_fault_injection argument to db_stress.  When enabled,
FaultInjectionTestEnv will be used instead.  Currently this
option does not support running with other env setting.

This will allow
us to later manually produce error when running db_crashtest.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6687

Test Plan:
make db_stress -j32
./db_stress --env_fault_injection
./db_stress --env_fault_injection --hdfs   // expect error message

Reviewed By: ajkr

Differential Revision: D21014683

Pulled By: yhchiang

fbshipit-source-id: 0724aeac37efd57adb72a37defe6dbd3bfa8106a
2020-04-16 11:13:44 -07:00
anand76 610a09ccff Remove a printf from db_stress that's not useful info (#6705)
Summary:
This was causing db_crashtest.py to wrongly assume an error by parsing the output. Hopefully this will stabilize the crash tests.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6705

Test Plan: make blackbox_crash_test

Reviewed By: ltamasi

Differential Revision: D21043335

Pulled By: anand1976

fbshipit-source-id: 5cddd112b124d4e2ebd11724a17d4ef0f50c1cf8
2020-04-15 12:13:35 -07:00
anand76 79c838eb0f Fix a few bugs in db_stress fault injection (#6693)
Summary:
Fix the following issues -
1. Output parsing error in db_crashtest.py
2. Memory leak on exit
3. False alarm on filter block read error
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6693

Test Plan: asan_crash

Reviewed By: cheng-chang

Differential Revision: D20990399

Pulled By: anand1976

fbshipit-source-id: 178ee0dd7c69a4bc5db698379db0dedb29281699
2020-04-13 11:01:03 -07:00
anand76 5c19a441c4 Fault injection in db_stress (#6538)
Summary:
This PR implements a fault injection mechanism for injecting errors in reads in db_stress. The FaultInjectionTestFS is used for this purpose. A thread local structure is used to track the errors, so that each db_stress thread can independently enable/disable error injection and verify observed errors against expected errors. This is initially enabled only for Get and MultiGet, but can be extended to iterator as well once its proven stable.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6538

Test Plan:
crash_test
make check

Reviewed By: riversand963

Differential Revision: D20714347

Pulled By: anand1976

fbshipit-source-id: d7598321d4a2d72bda0ced57411a337a91d87dc7
2020-04-10 17:21:26 -07:00
Yanqin Jin ccf7676455 Update a few scripts to be python3 compatible (#6525)
Summary:
There are a few scripts with python3 compatibility issues that were not
detected by automated tool before. Update them now.

Test Plan (devserver):
python2 tools/ldb_test.py
python3 tools/ldb_test.py

python2 tools/write_stress_runner.py --runtime_sec=30
python3 tools/write_stress_runner.py --runtime_sec=30

python2 tools/db_crashtest.py --simple --interval=2 --duration=10 blackbox
python3 tools/db_crashtest.py --simple --interval=2 --duration=10 blackbox

python2 tools/db_crashtest.py --simple --duration=10 --random_kill_odd=1000 --ops_per_thread=1000 whitebox
python3 tools/db_crashtest.py --simple --duration=10 --random_kill_odd=1000 --ops_per_thread=1000 whitebox
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6525

Reviewed By: cheng-chang

Differential Revision: D20627820

Pulled By: riversand963

fbshipit-source-id: 4b25a7bd4d001c7f868be8b640ef876523be6ca3
2020-03-24 21:00:27 -07:00
Levi Tamasi 217ce20021 Remove GetSortedWalFiles/GetCurrentWalFile from the crash test (#6491)
Summary:
Currently, `db_stress` tests a randomly picked one of `GetLiveFiles`,
`GetSortedWalFiles`, and `GetCurrentWalFile` with a 1/N chance when the
command line parameter `get_live_files_and_wal_files_one_in` is specified.
The problem is that `GetSortedWalFiles` and `GetCurrentWalFile` are unreliable
in the sense that they can return errors if another thread removes a WAL file
while they are executing (which is a perfectly plausible and legitimate scenario).
The patch splits this command line parameter into three (one for each API),
and changes the crash test script so that only `GetLiveFiles` is tested during
our continuous crash test runs.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6491

Test Plan:
```
make check
python tools/db_crashtest.py whitebox
```

Reviewed By: siying

Differential Revision: D20312200

Pulled By: ltamasi

fbshipit-source-id: e7c3481eddfe3bd3d5349476e34abc9eee5b7dc8
2020-03-18 17:14:15 -07:00
Maysam Yabandeh 967a2d953f Revert "crash_test to enable block-based table hash index (#6310)" (#6327)
Summary:
This reverts commit 8e309b35bb.
The stress tests are failing . Revert it until we figure the root cause.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6327

Differential Revision: D19537657

Pulled By: maysamyabandeh

fbshipit-source-id: bf34a5dd720825957729e136e9a5a729a240e61a
2020-01-23 09:09:17 -08:00
Maysam Yabandeh cb1142e00d Set index_block_restart_interval of kHashSearch to 1 in stress test (#6324)
Summary:
kHashSearch is incompatible with larger than 1 values for index_block_restart_interval. Setting it to 1 in stress tests would avoid confusion about the test parameters.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6324

Differential Revision: D19525669

Pulled By: maysamyabandeh

fbshipit-source-id: fbf3a797e0ebcebb4d32eba3728cf3583906fc8a
2020-01-22 16:33:21 -08:00
sdong 8e309b35bb crash_test to enable block-based table hash index (#6310)
Summary:
Block-based table has index has been disabled in crash test due to bugs. We fixed a bug and re-enable it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6310

Test Plan: Finish one round of "crash_test_with_atomic_flush" test successfully while exclusively running has index. Another run also ran for several hours without failure.

Differential Revision: D19455856

fbshipit-source-id: 1192752d2c1e81ed7e5c5c7a9481c841582d5274
2020-01-21 12:27:30 -08:00
sdong 6b64aed4c0 Fix bug which causes crash_test to always run on sync mode (#6304)
Summary:
A previous change meant to make db_stress to run on sync=1 mode for 1/20 of the time in crash_test, but a bug caused to to always run on sync=1 mode. Fix it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6304

Test Plan: Start and kill "python -u tools/db_crashtest.py --simple whitebox" multiple times and observe that most times sync=0 is used while some times sync=1 is used.

Differential Revision: D19433000

fbshipit-source-id: 7a0adba39b17a1b3acbbd791bb0cdb743b91fa95
2020-01-17 01:46:48 -08:00
anand76 687119aeaf Variable key length in db_stress (#6273)
Summary:
Undo https://github.com/facebook/rocksdb/issues/6243 and fix the crash test failures.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6273

Test Plan: Run make ubsan_crash_test

Differential Revision: D19331472

Pulled By: anand1976

fbshipit-source-id: 30aa4a36c1b0f77a97159d82bbfd1cd767878e28
2020-01-09 21:27:18 -08:00
Peter Dillinger 83108d23e8 Re-enable level_compaction_dynamic_level_bytes in crash test (#6251)
Summary:
Was probably a false signal suggesting a problem in https://github.com/facebook/rocksdb/issues/6217
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6251

Test Plan: 'make crash_test'

Differential Revision: D19246951

Pulled By: pdillinger

fbshipit-source-id: 3e4fafcd9a7cf5f19ffd207b90279ba615145d6f
2019-12-30 10:15:49 -08:00
Peter Dillinger 37fd2b9694 Revert "Generate variable length keys in db_stress (#6165)" and follow-ups (#6243)
Summary:
This commit is suspected in some crash test failures such as

Verification failed for column family 0 key 78438077: Value not found: NotFound:
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6243

Test Plan: 'make check' and start 'make crash_test'

Differential Revision: D19220495

Pulled By: pdillinger

fbshipit-source-id: 6c4709cee80ab4344e06ce360f51e947d79fb3fa
2019-12-23 16:32:57 -08:00
anand76 3160edfdc7 Generate variable length keys in db_stress (#6165)
Summary:
Currently, db_stress generates fixed length keys of 8 bytes. This patch adds the ability to generate variable length keys. Most of the db_stress code continues to work with a numeric key randomly generated, and the numeric key also acts as an index into the values_ array. The numeric key is mapped to a variable length string key in a deterministic way. Furthermore, the ordering is preserved.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6165

Test Plan: run make crash_test

Differential Revision: D19204646

Pulled By: anand1976

fbshipit-source-id: d2d46a96615b4832a8be2a981f5913905f0e1ca7
2019-12-20 21:10:33 -08:00
sdong 338c149b92 crash_test to cover bottommost compression and some other changes (#6215)
Summary:
Several improvements to crash_test/stress_test:
(1) Stress_test to support an parameter of bottommost compression
(2) Rename those FLAGS_* variables that are not gflags to avoid confusion
(3) Crash_test to randomly generate compression type for bottommost compression with half the chance.
(4) Stress_test to sanitize unsupported compression type to snappy, so that crash_test to cover all possible compression types and people don't need to worry about they don't support all comrpession types in their environment.
(5) In crash_test, when generating db_stress command, sort arguments in alphabeta order, so that it is easier to find value for a specific argument.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6215

Test Plan: Run "make crash_test" for a while and see the botommost option shown in LOG files.

Differential Revision: D19171255

fbshipit-source-id: d7001e246c4ff9ee5760776eea0be97738650735
2019-12-20 16:14:52 -08:00
Zhichao Cao f89dea4fec db_stress: Added the verification for GetLiveFiles, GetSortedWalFiles and GetCurrentWalFile (#6224)
Summary:
Add the verification in operateDB to verify GetLiveFiles, GetSortedWalFiles and GetCurrentWalFile. The test will be called every 1 out of N, N is decided by get_live_files_and_wal_files_one_i, whose default is 1000000.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6224

Test Plan: pass db_stress default run.

Differential Revision: D19183358

Pulled By: zhichao-cao

fbshipit-source-id: 20073cf72ede77a3e0d3cf5f28304f1f605d2b1a
2019-12-20 12:07:30 -08:00
Yanqin Jin 670a916d01 Add more verification to db_stress (#6173)
Summary:
Currently, db_stress performs verification by calling `VerifyDb()` at the end of test and optionally before tests start. In case of corruption or incorrect result, it will be too late. This PR adds more verification in two ways.
1. For cf consistency test, each test thread takes a snapshot and verifies every N ops. N is configurable via `-verify_db_one_in`. This option is not supported in other stress tests.
2. For cf consistency test, we use another background thread in which a secondary instance periodically tails the primary (interval is configurable). We verify the secondary. Once an error is detected, we terminate the test and report. This does not affect other stress tests.

Test plan (devserver)
```
$./db_stress -test_cf_consistency -verify_db_one_in=0 -ops_per_thread=100000 -continuous_verification_interval=100
$./db_stress -test_cf_consistency -verify_db_one_in=1000 -ops_per_thread=10000 -continuous_verification_interval=0
$make crash_test
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6173

Differential Revision: D19047367

Pulled By: riversand963

fbshipit-source-id: aeed584ad71f9310c111445f34975e5ab47a0615
2019-12-20 08:49:29 -08:00
Maysam Yabandeh 77d5ba7887 Revert "Add kHashSearch to stress tests (#6210)" (#6220)
Summary:
This reverts commit 54f9092b0c.
It making our daily stress tests fail. Revert it until the issues are fixed.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6220

Differential Revision: D19179881

Pulled By: maysamyabandeh

fbshipit-source-id: 99de0eaf776567fa81110b9ad2608234a16083ce
2019-12-19 10:46:55 -08:00
Peter Dillinger 9ff569bdfc Temporarily disable level_compaction_dynamic_level_bytes in crash test (#6217)
Summary:
We're seeing assertion violations like this in crash test:

db_stress: table/block_based/block_based_table_reader.cc:4129: virtual uint64_t rocksdb::BlockBasedTable::ApproximateSize(const rocksdb::Slice&, const rocksdb::Slice&, rocksdb::TableReaderCaller): Assertion `end_offset >= start_offset' failed.***

And ApproximateSize appears only to be called with the level_compaction_dynamic_level_bytes option.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6217

Test Plan:
temporarily put an assert(false) in ApproximateSize and
briefly run 'make crash_test'

Differential Revision: D19179174

Pulled By: pdillinger

fbshipit-source-id: 506e6549aea0da19b363a1a6da04373c364d92e4
2019-12-19 10:24:49 -08:00
Maysam Yabandeh 54f9092b0c Add kHashSearch to stress tests (#6210)
Summary:
Beside extending index_type to kHashSearch, it clarifies in the code base that this feature is incompatible with index_block_restart_interval > 1.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6210

Test Plan:
```
make -j32 crash_test

Differential Revision: D19166567

Pulled By: maysamyabandeh

fbshipit-source-id: 3aaf75a70a8b462d372d43aac69dbd10df303ec7
2019-12-18 18:09:30 -08:00
Peter Dillinger dfb259e48d Fix syntax error (!) in db_crashtest.py (#6207)
Summary:
Fixes syntax error from https://github.com/facebook/rocksdb/pull/6203

Pull Request resolved: https://github.com/facebook/rocksdb/pull/6207

Test Plan: make blackbox_crash_test -> no more syntax error

Differential Revision: D19161752

Pulled By: pdillinger

fbshipit-source-id: b3032f296041ab56307762622b9ef6c03a8379aa
2019-12-18 09:32:52 -08:00
anand76 2afea29762 Add VerifyChecksum() to db_stress (#6203)
Summary:
Add an option to db_stress, verify_checksum_one_in, to call DB::VerifyChecksum() once every N ops.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6203

Differential Revision: D19145753

Pulled By: anand1976

fbshipit-source-id: d09edf21f309ad53aa40dd25b7a563d50665fd8b
2019-12-17 20:44:58 -08:00
sdong 9f250dd88e crash_test: two fixes (#6200)
Summary:
Fix two crash test issues:
1. sync mode should not run with disable_wal=true
2. disable "compaction_readahead_size" for now. With it on, some block checksum verification failure will happen in compaction paths. Not sure why, but disable it for now to keep the test clean.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6200

Test Plan: Run "make crash_test" and "make crash_test_with_atomic_flush" and see it runs way longer than before the fix without failing.

Differential Revision: D19143493

fbshipit-source-id: 438fad52fbda60aafd142e1b65578addbe7d72b1
2019-12-17 18:25:04 -08:00
sdong bcc372c0c3 Add some new options to crash_test (#6176)
Summary:
Several options are trivially added to crash test and random values are picked.
Made simple test run non-dynamic level and normal test run dynamic level.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6176

Test Plan: Run crash_test and watch the printing

Differential Revision: D19053955

fbshipit-source-id: 958cb43c968541ebd87ed4d91e778bd1d40e7502
2019-12-16 15:43:13 -08:00
Maysam Yabandeh 4b97812da8 Add long-running snapshots to stress tests (#6171)
Summary:
Current implementation holds on to 10% of snapshots for 10x longer, and 1% of snapshots 100x longer.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6171

Test Plan:
```
make -j32 crash_test

Differential Revision: D19038399

Pulled By: maysamyabandeh

fbshipit-source-id: 75da2dbb5c47a0b3f37d299b8719e392b73b42c0
2019-12-14 15:22:40 -08:00
Maysam Yabandeh fec7302a9d Enable unordered_write in stress tests (#6164)
Summary:
With WritePrepared transactions configured with two_write_queues, unordered_write will offer the same guarantees as vanilla rocksdb and thus can be enabled in stress tests.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6164

Test Plan:
```
make -j32 crash_test_with_txn

Differential Revision: D18991899

Pulled By: maysamyabandeh

fbshipit-source-id: eece5e96b4169b67d7931e5c0afca88540a113e1
2019-12-13 10:25:04 -08:00
Maysam Yabandeh 8613ee2e94 Enable all txn write policies in crash test (#6158)
Summary:
Currently the default txn write policy in crash tests is WRITE_PREPARED. The patch randomly picks the write policy at the start of the crash test.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6158

Test Plan:
```
make -j32 crash_test_with_txn
```

Differential Revision: D18946307

Pulled By: maysamyabandeh

fbshipit-source-id: f77d7a94f99a08791ef9626da153d284bf521950
2019-12-12 10:43:49 -08:00
Maysam Yabandeh 1ad6fa9cc7 Enable txn in crash tests (#6155)
Summary:
Start daily crash tests with use_txn flag.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6155

Differential Revision: D18943630

Pulled By: maysamyabandeh

fbshipit-source-id: eea99a6ffd5f57fb9651f6ca7dab8fbf70379c87
2019-12-11 16:01:55 -08:00
Peter Dillinger a653857178 Add PauseBackgroundWork() to db_stress (#6148)
Summary:
Worker thread will occasionally call PauseBackgroundWork(),
briefly sleep (to avoid stalling itself) and then call
ContinueBackgroundWork().
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6148

Test Plan:
some running of 'make blackbox_crash_test' with temporary
printf output to confirm code occasionally reached.

Differential Revision: D18913886

Pulled By: pdillinger

fbshipit-source-id: ae9356a803390929f3165dfb6a00194692ba92be
2019-12-10 15:46:48 -08:00
Peter Dillinger 6380df5e10 Vary bloom_bits in db_crashtest (#6103)
Summary:
Especially with non-integral bits/key now supported,
db_crashtest should vary the bloom_bits configuration. The probabilities
look like this:

1/2 chance of a uniform int from 0 to 19. This includes overall 1/40
chance of 0 which disables the bloom filter.

1/2 chance of a float from a lognormal distribution with a median of 10.
This always produces positive values but with a decent chance of < 1
(overall ~1/40) or > 100 (overall ~1/40), the enforced/coerced
implementation limits.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6103

Test Plan:
start 'make blackbox_crash_test' several times and look at
configuration output

Differential Revision: D18734877

Pulled By: pdillinger

fbshipit-source-id: 4a38cb057d3b3fc1327f93199f65b9a9ffbd7316
2019-12-10 08:39:50 -08:00
Peter Dillinger f19faf7814 Add format_version=5 to db_crashtest (#6102)
Summary:
format_version=5 enables new Bloom filter. Using 2/5
probability for "latest and greatest" rather than naive 1/4.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6102

Test Plan: start 'make blackbox_crash_test'

Differential Revision: D18735685

Pulled By: pdillinger

fbshipit-source-id: e81529c8a3f53560d246086ee5f92ee7d79a2eab
2019-11-27 13:19:11 -08:00
sdong 6123611c42 crash_test: use large max_manifest_file_size most of the time. (#6034)
Summary:
Right now, crash_test always uses 16KB max_manifest_file_size value. It is good to cover logic of manifest file switch. However, information stored in manifest files might be useful in debugging failures. Switch to only use small manifest file size in 1/15 of the time.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6034

Test Plan: Observe command generated by db_crash_test.py multiple times and see the --max_manifest_file_size value distribution.

Differential Revision: D18513824

fbshipit-source-id: 7b3ae6dbe521a0918df41064e3fa5ecbf2466e04
2019-11-14 14:01:06 -08:00
sdong 5b656584af crash_test: disable periodic compaction in FIFO compaction. (#5993)
Summary:
A recent commit make periodic compaction option valid in FIFO, which means TTL. But we fail to disable it in crash test, causing assert failure. Fix it by having it disabled.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5993

Test Plan: Restart "make crash_test" many times and make sure --periodic_compaction_seconds=0 is always the case when --compaction_style=2

Differential Revision: D18263223

fbshipit-source-id: c91a802017d83ae89ac43827d1b0012861933814
2019-10-31 17:28:03 -07:00
sdong 0337d87b42 crash_test: disable atomic flush with pipelined write (#5986)
Summary:
Recently, pipelined write is enabled even if atomic flush is enabled, which causing sanitizing failure in db_stress. Revert this change.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5986

Test Plan: Run "make crash_test_with_atomic_flush" and see it to run for some while so that the old sanitizing error (which showed up quickly) doesn't show up.

Differential Revision: D18228278

fbshipit-source-id: 27fdf2f8e3e77068c9725a838b9bef4ab25a2553
2019-10-30 11:36:55 -07:00
sdong a3960fc875 Move pipeline write waiting logic into WaitForPendingWrites() (#5716)
Summary:
In pipeline writing mode, memtable switching needs to wait for memtable writing to finish to make sure that when memtables are made immutable, inserts are not going to them. This is currently done in DBImpl::SwitchMemtable(). This is done after flush_scheduler_.TakeNextColumnFamily() is called to fetch the list of column families to switch. The function flush_scheduler_.TakeNextColumnFamily() itself, however, is not thread-safe when being called together with flush_scheduler_.ScheduleFlush().
This change provides a fix, which moves the waiting logic before flush_scheduler_.TakeNextColumnFamily(). WaitForPendingWrites() is a natural place where the logic can happen.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5716

Test Plan: Run all tests with ASAN and TSAN.

Differential Revision: D18217658

fbshipit-source-id: b9c5e765c9989645bf10afda7c5c726c3f82f6c3
2019-10-29 18:16:36 -07:00
Maysam Yabandeh a6e615a7ba Enable partitioned index/filter in stress tests (#5918)
Summary:
This is the 3rd attempt after the revert of https://github.com/facebook/rocksdb/issues/4020 and https://github.com/facebook/rocksdb/issues/5895
The last bug is fixed https://github.com/facebook/rocksdb/pull/5907
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5918

Test Plan:
```
make -j32 crash_test
```

Differential Revision: D17909489

Pulled By: maysamyabandeh

fbshipit-source-id: 7dfb8cf998c2d295c86465dd21734593d277887e
2019-10-14 10:35:18 -07:00
Yanqin Jin bc8b05cb77 Revert "Enable partitioned index/filter in stress tests (#5895)" (#5904)
Summary:
This reverts commit 2f4e288143.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5904

Differential Revision: D17871282

Pulled By: riversand963

fbshipit-source-id: d210725f8f3b26d8eac25892094da09d9694337e
2019-10-10 19:19:39 -07:00
Maysam Yabandeh 2f4e288143 Enable partitioned index/filter in stress tests (#5895)
Summary:
This is the 2nd attempt after the revert of https://github.com/facebook/rocksdb/pull/4020
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5895

Test Plan:
```
./tools/db_crashtest.py blackbox --simple --interval=10 --max_key=10000000
```

Differential Revision: D17822137

Pulled By: maysamyabandeh

fbshipit-source-id: 3d148c0d8cc129080410ff859c04b544223c8ea3
2019-10-08 16:50:21 -07:00
sdong 679a45d0cb crash_test to do some verification for prefix extractor and iterator bounds. (#5846)
Summary:
For now, crash_test is not able to report any failure for the logic related to iterator upper, lower bounds or iterators, or reseek. These are features prone to errors. Improve db_stress in several ways:
(1) For each iterator run, reseek up to 3 times.
(2) For every iterator, create control iterator with upper or lower bound, with total order seek. Compare the results with the iterator.
(3) Make simple crash test to avoid prefix size to have more coverage.
(4) make prefix_size = 0 a valid size and -1 to indicate disabling prefix extractor.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5846

Test Plan: Manually hack the code to create wrong results and see they are caught by the tool.

Differential Revision: D17631760

fbshipit-source-id: acd460a177bd2124a5ffd7fff490702dba63030b
2019-09-27 11:10:44 -07:00
Maysam Yabandeh 6ec6a4a9a4 Remove snap_refresh_nanos option (#5826)
Summary:
The snap_refresh_nanos option didn't bring much benefit. Remove the feature to simplify the code.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5826

Differential Revision: D17467147

Pulled By: maysamyabandeh

fbshipit-source-id: 4f950b046990d0d1292d7fc04c2ccafaf751c7f0
2019-09-18 20:26:04 -07:00
Levi Tamasi 94d62d771e Temporarily disable partitioned index/filter in stress test (#5811)
Summary:
PR https://github.com/facebook/rocksdb/issues/4020 enabled partitioned indexes/filters in stress tests; however,
this causes assertion failures in BatchedOpsStressTest. This patch
disables them until we can root cause the failures.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5811

Test Plan: Ran the script and made sure it only uses the binary search index.

Differential Revision: D17399366

Pulled By: ltamasi

fbshipit-source-id: adb116e6297f9c6ccd7ac15b6a16c9aa91f21ac5
2019-09-16 11:41:35 -07:00
Levi Tamasi d35ffd569c Temporarily disable hash index in stress tests (#5792)
Summary:
PR https://github.com/facebook/rocksdb/issues/4020 implicitly enabled the hash index as well in stress/crash
tests, resulting in assertion failures in Block. This patch disables
the hash index until we can pinpoint the root cause of these issues.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5792

Test Plan:
Ran tools/db_crashtest.py and made sure it only uses index types 0 and 2
(binary search and partitioned index).

Differential Revision: D17346777

Pulled By: ltamasi

fbshipit-source-id: b4318f37f1fda3ee1bbff4ef2c2f556ca9e6b551
2019-09-12 12:11:34 -07:00
Andrew Kryczka dd2a35f13f Support partitioned index and filters in stress/crash tests (#4020)
Summary:
- In `db_stress`, support choosing index type and whether to enable filter partitioning, and randomly set those options in crash test
- When partitioned filter is enabled by crash test, force partitioned index to also be enabled since it's a prerequisite
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4020

Test Plan:
currently this is blocked on fixing the bug that crash test caught:

```
$ TEST_TMPDIR=/data/compaction_bench python ./tools/db_crashtest.py blackbox --simple --interval=10 --max_key=10000000
...
Verification failed for column family 0 key 937501: Value not found: NotFound:
Crash-recovery verification failed :(
```

Differential Revision: D8508683

Pulled By: maysamyabandeh

fbshipit-source-id: 0337e5d0558bcef26b1f3699f47265a2c1e99629
2019-09-11 14:13:38 -07:00
sdong 1daff8f85a crash_test to skip compaction TTL for FIFO compaction. (#5749)
Summary:
https://github.com/facebook/rocksdb/pull/5741 added compaction TTL to crash test, but it causes assertion fails for FIFO compaction. Disable this combination for now while we debug the assertion failure.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5749

Test Plan: Run crash test and observe that when compaction_style=2, compaction_ttl is always 0.

Differential Revision: D17078292

fbshipit-source-id: 446821a3b9739956094d5e4f9be1251a15b57f5d
2019-08-27 17:55:37 -07:00
sdong 1d6a10f52d Extend stress test to cover periodic compaction and compaction TTL (#5741)
Summary:
Covering periodic compaction and compaction TTL can help us expose potential issues. Add it there.
Randomly select value for these two options.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5741

Test Plan: Run crash_test and see the perameters generated.

Differential Revision: D17059515

fbshipit-source-id: 8213974846a0b6a22fc13be705825c9054d1d097
2019-08-26 15:03:25 -07:00
sdong d8a27d9331 Atomic Flush Crash Test also covers the case that WAL is enabled. (#5729)
Summary:
AtomicFlushStressTest is a powerful test, but right now we only run it for atomic_flush=true + disable_wal=true. We further extend it to the case where atomic_flush=false + disable_wal = false. All the workload generation and validation can stay the same.
Atomic flush crash test is also changed to switch between the two test scenarios. It makes the name "atomic flush crash test" out of sync from what it really does. We leave it as it is to avoid troubles with continous test set-up.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5729

Test Plan: Run "CRASH_TEST_KILL_ODD=188 TEST_TMPDIR=/dev/shm/ USE_CLANG=1 make whitebox_crash_test_with_atomic_flush", observe the settings used and see it passed.

Differential Revision: D16969791

fbshipit-source-id: 56e37487000ae631e31b0100acd7bdc441c04163
2019-08-22 16:32:55 -07:00
sdong 8e12638f3d Slightly adjust atomic white box test's kill odd (#5717)
Summary:
Atomic white box test's kill odd is the same as normal test. However, in the scenario that only WritableFileWriter::Append() is blacklisted, WritableFileWriter::Flush() dominates the killing odds. Normally, most of WritableFileWriter::Flush() are called in WAL writes, where every write triggers a WAL flush. In atomic test, WAL is disabled, so the kill happens less frequently than we antipated. In some rare cases, the kill didn't end up with happening (for reasons I still don't fully understand) and cause the stress test timeout.

If WAL is disabled, make the odds 5x likely to trigger.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5717

Test Plan: Run whitebox_crash_test_with_atomic_flush and whitebox_crash_test and observe the kill odds printed out.

Differential Revision: D16897237

fbshipit-source-id: cbf5d96f6fc0e980523d0f1f94bf4e72cdb82d1c
2019-08-19 10:51:59 -07:00
Yanqin Jin a78503bd6c Temporarily disable snapshot list refresh for atomic flush stress test (#5581)
Summary:
Atomic flush test started to fail after https://github.com/facebook/rocksdb/issues/5099. Then https://github.com/facebook/rocksdb/issues/5278 provided a fix after
which the same error occurred much less frequently. However it still occur
occasionally. Not sure what the root cause is. This PR disables the feature of
snapshot list refresh, and we should keep an eye on the failure in the future.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5581

Differential Revision: D16295985

Pulled By: riversand963

fbshipit-source-id: c9e62e65133c52c21b07097de359632ca62571e4
2019-07-22 14:38:16 -07:00
Tim Hatch a6a9213a36 Fix interpreter lines for files with python2-only syntax.
Reviewed By: lisroach

Differential Revision: D15362271

fbshipit-source-id: 48fab12ab6e55a8537b19b4623d2545ca9950ec5
2019-07-09 10:51:37 -07:00
Maysam Yabandeh f9842869cf Disable pipeline writes in stress test (#5445)
Summary:
The tsan crash tests are failing with a data race compliant with pipelined write option. Temporarily disable it until its concurrency issue are fixed.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5445

Differential Revision: D15783824

Pulled By: maysamyabandeh

fbshipit-source-id: 413a0c3230b86f524fc7eeea2cf8e8375406e65b
2019-06-12 11:12:36 -07:00
anand76 181bb43f08 Fix bugs in FilePickerMultiGet (#5292)
Summary:
This PR fixes a couple of bugs in FilePickerMultiGet that were causing db_stress test failures. The failures were caused by -
1. Improper handling of a key that matches the user key portion of an L0 file's largest key. In this case, the curr_index_in_curr_level file index in L0 for that key was getting incremented, but batch_iter_ was not advanced. By design, all keys in a batch are supposed to be checked against an L0 file before advancing to the next L0 file. Not advancing to the next key in the batch was causing a double increment of curr_index_in_curr_level due to the same key being processed again
2. Improper handling of a key that matches the user key portion of the largest key in the last file of L1 and higher. This was resulting in a premature end to the processing of the batch for that level when the next key in the batch is a duplicate. Typically, the keys in MultiGet will not be duplicates, but its good to handle that case correctly

Test -
asan_crash
make check
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5292

Differential Revision: D15282530

Pulled By: anand1976

fbshipit-source-id: d1a6a86e0af273169c3632db22a44d79c66a581f
2019-05-09 13:18:00 -07:00
anand76 930bfa5750 Disable MultiGet from db_stress (#5284)
Summary:
Disable it for now until we can get stress tests to pass consistently.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5284

Differential Revision: D15230727

Pulled By: anand1976

fbshipit-source-id: 239baacdb3c4cd4fb7c4447f7582b9042501d752
2019-05-06 18:26:50 -07:00
Maysam Yabandeh 6a40ee5eb1 Refresh snapshot list during long compactions (2nd attempt) (#5278)
Summary:
Part of compaction cpu goes to processing snapshot list, the larger the list the bigger the overhead. Although the lifetime of most of the snapshots is much shorter than the lifetime of compactions, the compaction conservatively operates on the list of snapshots that it initially obtained. This patch allows the snapshot list to be updated via a callback if the compaction is taking long. This should let the compaction to continue more efficiently with much smaller snapshot list.
For simplicity, to avoid the feature is disabled in two cases: i) When more than one sub-compaction are sharing the same snapshot list, ii) when Range Delete is used in which the range delete aggregator has its own copy of snapshot list.
This fixes the reverted https://github.com/facebook/rocksdb/pull/5099 issue with range deletes.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5278

Differential Revision: D15203291

Pulled By: maysamyabandeh

fbshipit-source-id: fa645611e606aa222c7ce53176dc5bb6f259c258
2019-05-03 17:30:22 -07:00
anand76 434ccf2df4 Add option to use MultiGet in db_stress (#5264)
Summary:
The new option will pick a batch size randomly in the range 1-64. It will then space the keys in the batch by random intervals.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5264

Differential Revision: D15175522

Pulled By: anand1976

fbshipit-source-id: c16baa69d0f1ff4cf53c55c813ddd82c8aeb58fc
2019-05-01 23:06:56 -07:00
Andrew Kryczka b02d0c238d Init compression dict handle before reading meta-blocks (#5267)
Summary:
At least one of the meta-block loading functions (`ReadRangeDelBlock`)
uses the same block reading function (`NewDataBlockIterator`) as data
block reads, which means it uses the dictionary handle. However, the
dictionary handle was uninitialized while reading meta-blocks, causing
readers to receive an error. This situation was only noticed when
`cache_index_and_filter_blocks=true`.

This PR initializes the handle to null while reading meta-blocks to
prevent the error. It also adds support to `db_stress` /
`db_crashtest.py` for `cache_index_and_filter_blocks`.

Fixes #5263.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5267

Differential Revision: D15149264

Pulled By: maysamyabandeh

fbshipit-source-id: 991d38a306c62db5976778bfb050fa3cd4a0671b
2019-04-30 09:50:49 -07:00
Yanqin Jin 210b49cac9 Disable pipelined write in atomic flush stress test (#5266)
Summary:
Since currently pipelined write allows one thread to perform memtable writes
while another thread is traversing the `flush_scheduler_`, it will cause an
assertion failure in `FlushScheduler::Clear`. To unblock crash recoery tests,
we temporarily disable pipelined write when atomic flush is enabled.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5266

Differential Revision: D15142285

Pulled By: riversand963

fbshipit-source-id: a0c20fe4ac543e08feaed602414f982054df7831
2019-04-30 08:12:42 -07:00
Fosco Marotto 6c2bf9e916 Add copyright headers per FB open-source checkup tool. (#5199)
Summary:
internal task: T35568575
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5199

Differential Revision: D14962794

Pulled By: gfosco

fbshipit-source-id: 93838ede6d0235eaecff90d200faed9a8515bbbe
2019-04-18 10:55:01 -07:00
Andrew Kryczka 2263f86901 exercise WAL recycling in crash test (#5070)
Summary:
Since this feature affects the WAL behavior, it seems important our crash-recovery tests cover it.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5070

Differential Revision: D14470085

Pulled By: miasantreble

fbshipit-source-id: 9b9682a718a926d57d055e0a5ec867efbd2eb9c1
2019-03-15 12:03:26 -07:00
Andrew Kryczka 1218704b61 Fix `compression_zstd_max_train_bytes` coverage in stress test (#4957)
Summary:
Previously `finalize_and_sanitize` function was always zeroing out `compression_zstd_max_train_bytes`. It was only supposed to do that when non-ZSTD compression was used. But since `--compression_type` was an unknown argument (i.e., one that `db_crashtest.py` does not recognize and blindly forwards to `db_stress`), `finalize_and_sanitize` could not tell whether ZSTD was used. This PR fixes it simply by making `--compression_type` a known argument with snappy as default (same as `db_stress`).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4957

Differential Revision: D13994302

Pulled By: ajkr

fbshipit-source-id: 1b0baea7331397822830970d3698642eb7a7df65
2019-02-11 14:56:39 -08:00
Andrew Kryczka 68d949b3e3 Enable DeleteRange in stress/crash tests (#4483)
Summary:
Set `delrangepercent=1` when `test_batches_snapshots=false`.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4483

Differential Revision: D10324361

Pulled By: ajkr

fbshipit-source-id: 0cde1f1504f9493408a0c6493b976d7e5f5b2d23
2018-12-18 13:42:49 -08:00
Andrew Kryczka 8d2b74d287 Refine db_stress params for atomic flush (#4781)
Summary:
Separate flag for enabling option from flag for enabling dedicated atomic stress test. I have found setting the former without setting the latter can detect different problems.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4781

Differential Revision: D13463211

Pulled By: ajkr

fbshipit-source-id: 054f777885b2dc7d5ea99faafa21d6537eee45fd
2018-12-13 22:10:38 -08:00
Yanqin Jin 912bbbbc72 Enable crash-recovery stress test for atomic flush (#4605)
Summary:
This PR adds test of atomic flush to our continuous stress tests.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4605

Differential Revision: D12840607

Pulled By: riversand963

fbshipit-source-id: 0da187572791a59530065a7952697c05b1197ad9
2018-10-30 14:03:36 -07:00
Zhongyi Xie 9b3cf908a6 add missing range in random.choice argument (#4397)
Summary:
This will fix the broken asan crash test:
> Traceback (most recent call last):
  File "tools/db_crashtest.py", line 384, in <module>
    main()
  File "tools/db_crashtest.py", line 368, in main
    parser.add_argument("--" + k, type=type(v() if callable(v) else v))
  File "tools/db_crashtest.py", line 59, in <lambda>
    "index_block_restart_interval": lambda: random.choice(1, 16),
TypeError: choice() takes exactly 2 arguments (3 given)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4397

Differential Revision: D9933041

Pulled By: miasantreble

fbshipit-source-id: 10998e5bc6b6a5cea3e4088b18465affc246e639
2018-09-19 12:13:20 -07:00
Maysam Yabandeh a0ebec3804 Extend crash test with index_block_restart_interval (#4383)
Summary:
The default for index_block_restart_interval is 1 but some use 16 in production. The patch extends crash test to test both values.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4383

Differential Revision: D9887304

Pulled By: maysamyabandeh

fbshipit-source-id: a8d00fea974a79ad563f9f4d9d7b069e9f746a8f
2018-09-18 15:43:29 -07:00
Andrew Kryczka 8c25204633 Support manual flush in stress/crash tests (#4368)
Summary:
- Made stress test call `Flush()` periodically according to `--flush_one_in` flag.
- Enabled by default in crash test.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4368

Differential Revision: D9838593

Pulled By: ajkr

fbshipit-source-id: fe5a6e49b36e5ea752acc3aa8be364f8ef34d9cc
2018-09-17 12:27:55 -07:00
Maysam Yabandeh d122025891 Extend stress test to format_version 4 (#4265)
Summary:
Stress tests currently cover format_version 2 and 3. The patch adds 4 as well.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4265

Differential Revision: D9323185

Pulled By: maysamyabandeh

fbshipit-source-id: 54d11e41ecae09bae14cadd7313f07c9a3db5a57
2018-08-14 14:13:33 -07:00
Andrew Kryczka 6175b4b294 Support dictionary compression in stress/crash tests (#4234)
Summary:
- Add `--compression_max_dict_bytes` and `--compression_zstd_max_train_bytes` flags to stress test
- Randomly enable/disable the above flags in crash test
- Set `--compression_type=zstd` in FB-specific crash test runs
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4234

Differential Revision: D9187207

Pulled By: ajkr

fbshipit-source-id: 8d78cf8d8e1165f2cd1c32e069b73726b5bc1fd2
2018-08-06 15:27:29 -07:00
Siying Dong 4b0a43574a db_stress to cover upper bound in iterators (#4162)
Summary:
db_stress doesn't cover upper or lower bound in iterators. Try to cover it by randomly assigning a random one. Also in prefix scan tests, with 50% of the chance, set next prefix as the upper bound.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/4162

Differential Revision: D8953507

Pulled By: siying

fbshipit-source-id: f0f04e9cb6c07cbebbb82b892ca23e0daeea708b
2018-07-23 10:45:29 -07:00
Peter (Stig) Edwards 2694b6dc26 Remove unused imports, from python scripts. (#4057)
Summary:
Also remove redefined variable.
As reported on https://lgtm.com/projects/g/facebook/rocksdb/
Closes https://github.com/facebook/rocksdb/pull/4057

Differential Revision: D8648342

Pulled By: ajkr

fbshipit-source-id: afd2ba84d1364d316010179edd44777e64ca9183
2018-06-26 12:43:04 -07:00
Andrew Kryczka 7f3a634e06 Support pipelined write in stress/crash tests
Summary: Closes https://github.com/facebook/rocksdb/pull/4019

Differential Revision: D8508681

Pulled By: ajkr

fbshipit-source-id: 23a3c07d642386446e322b02e69cdf70d12ef009
2018-06-19 09:14:12 -07:00
Andrew Kryczka 8585059ae0 Support backup and checkpoint in db_stress (#4005)
Summary:
Add the `backup_one_in` and `checkpoint_one_in` options to periodically trigger backups and checkpoints. The directory names contain thread ID to avoid clashing with parallel backups/checkpoints. Enable checkpoint in crash test so our CI runs will use it. Didn't enable backup in crash test since it copies all the files which is too slow.
Closes https://github.com/facebook/rocksdb/pull/4005

Differential Revision: D8472275

Pulled By: ajkr

fbshipit-source-id: ff91bdc37caac4ffd97aea8df96b3983313ac1d5
2018-06-18 19:28:18 -07:00
Andrew Kryczka de2c6fb158 Fix stderr processing in crash test (#4006)
Summary:
Fixed bug where `db_stress` output a line with a warning followed by a line with an error, and `db_crashtest.py` considered that a success. For example:

```
WARNING: prefix_size is non-zero but memtablerep != prefix_hash
open error: Corruption: SST file is ahead of WALs
```
Closes https://github.com/facebook/rocksdb/pull/4006

Differential Revision: D8473463

Pulled By: ajkr

fbshipit-source-id: 60461bdd7491d9d26c63f7d4ee522a0f88ba3de7
2018-06-18 17:58:13 -07:00
Andrew Kryczka 7497f992e0 Run manual compaction in stress/crash tests (#3936)
Summary:
- Add support to `db_stress` for `CompactRange`
- Enable `CompactRange` and `CompactFiles` in crash tests
Closes https://github.com/facebook/rocksdb/pull/3936

Differential Revision: D8230953

Pulled By: ajkr

fbshipit-source-id: 208f9980b5bc8c204b1fa726e83791ad674e21e8
2018-06-13 16:45:28 -07:00
Maysam Yabandeh d0c38c0c8c Extend some tests to format_version=3 (#3942)
Summary:
format_version=3 changes the format of SST index. This is however not being tested currently since tests only work with the default format_version which is currently 2. The patch extends the most related tests to also test for format_version=3.
Closes https://github.com/facebook/rocksdb/pull/3942

Differential Revision: D8238413

Pulled By: maysamyabandeh

fbshipit-source-id: 915725f55753dd8e9188e802bf471c23645ad035
2018-06-04 20:13:00 -07:00
Andrew Kryczka 4f297ad05f Fix crash test check for direct I/O
Summary:
We need to keep the DB directory around since the direct IO check in "db_crashtest.py" relies on it existing. This PR fixes an issue where it was removed after each stress test run during the second half of whitebox crash testing.
Closes https://github.com/facebook/rocksdb/pull/3946

Differential Revision: D8247998

Pulled By: ajkr

fbshipit-source-id: 4e7cffbdab9b40df125e7842d0d59916e76261d3
2018-06-03 21:42:12 -07:00
Andrew Kryczka 88c3ee2d31 Configure direct I/O statically in db_stress
Summary:
Previously `db_stress` attempted to configure direct I/O dynamically in `SetOptions()` which had multiple problems (ummm must've never been tested):

- It's a DB option so SetDBOptions should've been called instead
- It's not a dynamic option so even SetDBOptions would fail
- It required enabling SyncPoint to mask O_DIRECT since it had no way to detect whether the DB directory was in tmpfs or not. This required locking that consumed ~80% of db_stress CPU.

In this PR I delete the broken dynamic config and instead configure it statically, only enabling it if the DB directory truly supports O_DIRECT.
Closes https://github.com/facebook/rocksdb/pull/3939

Differential Revision: D8238120

Pulled By: ajkr

fbshipit-source-id: 60bb2deebe6c9b54a3f788079261715b4a229279
2018-06-01 16:42:34 -07:00
Andrew Kryczka d19f568abf Refactor argument handling in db_crashtest.py
Summary:
- Any options unknown to `db_crashtest.py` are now passed directly to `db_stress`. This way, we won't need to update `db_crashtest.py` every time `db_stress` gets a new option.
- Remove `db_crashtest.py` redundant arguments where the value is the same as `db_stress`'s default
- Remove `db_crashtest.py` redundant arguments where the value is the same in a previously applied options map. For example, default_params are always applied before whitebox_default_params, so if they require the same value for an argument, that value only needs to be provided in default_params.
- Made the simple option maps applied in addition to the regular option maps. Previously they were exclusive which led to lots of duplication
Closes https://github.com/facebook/rocksdb/pull/3809

Differential Revision: D7885779

Pulled By: ajkr

fbshipit-source-id: 3a3243b55724d6d5bff36e939b582b9b62c538a8
2018-05-09 13:42:41 -07:00
Andrew Kryczka 46152d53bf Second attempt at db_stress crash-recovery verification
Summary:
- Original commit: a4fb1f8c04
- Revert commit (we reverted as a quick fix to get crash tests passing): 6afe22db2e

This PR includes the contents of the original commit plus two bug fixes, which are:

- In whitebox crash test, only set `--expected_values_path` for `db_stress` runs in the first half of the crash test's duration. In the second half, a fresh DB is created for each `db_stress` run, so we cannot maintain expected state across `db_stress` runs.
- Made `Exists()` return true for `UNKNOWN_SENTINEL` values. I previously had an assert in `Exists()` that value was not `UNKNOWN_SENTINEL`. But it is possible for post-crash-recovery expected values to be `UNKNOWN_SENTINEL` (i.e., if the crash happens in the middle of an update), in which case this assertion would be tripped. The effect of returning true in this case is there may be cases where a `SingleDelete` deletes no data. But if we had returned false, the effect would be calling `SingleDelete` on a key with multiple older versions, which is not supported.
Closes https://github.com/facebook/rocksdb/pull/3793

Differential Revision: D7811671

Pulled By: ajkr

fbshipit-source-id: 67e0295bfb1695ff9674837f2e05bb29c50efc30
2018-04-30 12:27:34 -07:00
Andrew Kryczka 6afe22db2e revert db_stress crash-recovery verification
Summary:
crash-recovery verification is failing in the whitebox testing, which may or may not be a valid correctness issue -- need more time to investigate. In the meantime, reverting so we don't mask other failures.
Closes https://github.com/facebook/rocksdb/pull/3786

Differential Revision: D7794516

Pulled By: ajkr

fbshipit-source-id: 28ccdfdb9ec9b3b0fb08c15cbf9d2e282201ff33
2018-04-27 12:57:01 -07:00
Andrew Kryczka db36f222d8 Allow options file in db_stress and db_crashtest
Summary:
- When options file is provided to db_stress, take supported options from the file instead of from flags
- Call `BuildOptionsTable` after `Open` so it can use `options_` once it has been populated either from flags or from file
- Allow options filename to be passed via `db_crashtest.py`
Closes https://github.com/facebook/rocksdb/pull/3768

Differential Revision: D7755331

Pulled By: ajkr

fbshipit-source-id: 5205cc5deb0d74d677b9832174153812bab9a60a
2018-04-26 18:42:07 -07:00
Andrew Kryczka a4fb1f8c04 Add crash-recovery correctness check to db_stress
Summary:
Previously, our `db_stress` tool held the expected state of the DB in-memory, so after crash-recovery, there was no way to verify data correctness. This PR adds an option, `--expected_values_file`, which specifies a file holding the expected values.

In black-box testing, the `db_stress` process can be killed arbitrarily, so updates to the `--expected_values_file` must be atomic. We achieve this by `mmap`ing the file and relying on `std::atomic<uint32_t>` for atomicity. Actually this doesn't provide a total guarantee on what we want as `std::atomic<uint32_t>` could, in theory, be translated into multiple stores surrounded by a mutex. We can verify our assumption by looking at `std::atomic::is_always_lock_free`.

For the `mmap`'d file, we didn't have an existing way to expose its contents as a raw memory buffer. This PR adds it in the `Env::NewMemoryMappedFileBuffer` function, and `MemoryMappedFileBuffer` class.

`db_crashtest.py` is updated to use an expected values file for black-box testing. On the first iteration (when the DB is created), an empty file is provided as `db_stress` will populate it when it runs. On subsequent iterations, that same filename is provided so `db_stress` can check the data is as expected on startup.
Closes https://github.com/facebook/rocksdb/pull/3629

Differential Revision: D7463144

Pulled By: ajkr

fbshipit-source-id: c8f3e82c93e045a90055e2468316be155633bd8b
2018-04-24 15:58:22 -07:00
Andrew Kryczka b058a33705 Reduce default --nooverwritepercent in black-box crash tests
Summary:
Previously `python tools/db_crashtest.py blackbox` would do no useful work as the crash interval (two minutes) was shorter than the preparation phase. The preparation phase is slow because of the ridiculously inefficient way it computes which keys should not be overwritten. It was doing this for 60M keys since default values were `FLAGS_nooverwritepercent == 60` and `FLAGS_max_key == 100000000`.

Move the "nooverwritepercent" override from whitebox-specific to the general options so it also applies to blackbox test runs. Now preparation phase takes a few seconds.
Closes https://github.com/facebook/rocksdb/pull/3671

Differential Revision: D7457732

Pulled By: ajkr

fbshipit-source-id: 601f4461a6a7e49e50449dcf15aebc9b8a98d6f0
2018-04-03 15:28:40 -07:00
Mark Isaacson b8eb32f8cf Suppress lint in old files
Summary: Grandfather in super old lint issues to make a clean slate for moving forward that allows us to have stronger enforcement on new issues.

Reviewed By: yiwu-arbug

Differential Revision: D6821806

fbshipit-source-id: 22797d31ec58e9eb0255d3b66fedfcfcb0dc127c
2018-01-29 12:56:42 -08:00
Andrew Kryczka d75793d6b4 db_stress support long-held snapshots
Summary:
Add options to `db_stress` (correctness testing tool) to randomly acquire snapshot and release it after some period of time. It's useful for correctness testing of #3009, as well as other parts of compaction that behave differently depending on which snapshots are held.
Closes https://github.com/facebook/rocksdb/pull/3038

Differential Revision: D6086501

Pulled By: ajkr

fbshipit-source-id: 3ec0d8666c78ac507f1f808887c4ff759ba9b865
2017-10-20 15:26:59 -07:00
Siying Dong b87ee6f773 Use more keys per lock in daily TSAN crash test
Summary:
TSAN shows error when we grab too many locks at the same time. In TSAN crash test, make one shard key cover 2^22 keys so that no many keys will be hold at the same time.
Closes https://github.com/facebook/rocksdb/pull/2719

Differential Revision: D5609035

Pulled By: siying

fbshipit-source-id: 930e5d63fff92dbc193dc154c4c615efbdf06c6a
2017-08-10 17:56:57 -07:00
Daniel Black 67510eeff3 db_crashtest.py: remove need for shell
Summary:
Before:
$ ps -ef

build     1713    16  0 Jul11 ?        00:00:00 make crash_test
build     3437  1713  0 Jul11 ?        00:00:00 python -u tools/db_crashtest.py --simple blackbox
build     3438  3437  0 Jul11 ?        00:00:00 [sh] <defunct>
build     3440     1 99 Jul11 ?        5-03:01:25 ./db_stress --max_background_compactions=1 --max_write_buffer_number=3 --sync=0 --reopen=20 --write_buffer_size=33554432 --delpercent=5 --block_size=16384 --allow_concurrent_me

After:

build      1706     16  0 02:52 ?        00:00:01 make crash_test
build      3432   1706  0 02:55 ?        00:00:00 python -u tools/db_crashtest.py --simple blackbox
build      4452   3432 99 04:35 ?        00:01:42 ./db_stress --max_background_compactions=1 --max_write_buffer_number=3 --sync=0 --reopen=20 --write_buffer_size=33554432 --delpercent=5 --block_size=16384 --allow_concurr
Closes https://github.com/facebook/rocksdb/pull/2571

Differential Revision: D5421580

Pulled By: maysamyabandeh

fbshipit-source-id: d6c3970c38ea0fa23da653f4385e8e25d83f5c9f
2017-07-14 09:11:03 -07:00
Aaron Gao ba7da434ae fix db_stress crash caused by buggy kernel warning
Summary:
filter the warning out and only print it once.
Closes https://github.com/facebook/rocksdb/pull/2137

Differential Revision: D4870925

Pulled By: lightmark

fbshipit-source-id: 91b363ce7f70bce88b0780337f408fc4649139b8
2017-04-11 16:56:59 -07:00
Sagar Vemuri eb912a927e Remove disableDataSync option
Summary:
Remove disableDataSync, and another similarly named disable_data_sync options.
This is being done to simplify options, and also because the performance gains of this feature can be achieved by other methods.
Closes https://github.com/facebook/rocksdb/pull/1859

Differential Revision: D4541292

Pulled By: sagar0

fbshipit-source-id: 5b3a6ca
2017-02-13 11:09:13 -08:00
Yi Wu d76ddf327d Disable ClockCache db_crashtest
Summary: Tempororily disable clock cache in db_crashtest while we investigate data race issue with clock cache.

Test Plan:
    python ./tools/db_crashtest.py blackbox

Reviewers: sdong, lightmark, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D62481
2016-08-24 14:19:05 -07:00
Yi Wu 4cc37f59e5 Introduce ClockCache
Summary:
Clock-based cache implemenetation aim to have better concurreny than
default LRU cache. See inline comments for implementation details.

Test Plan:
Update cache_test to run on both LRUCache and ClockCache. Adding some
new tests to catch some of the bugs that I fixed while implementing the
cache.

Reviewers: kradhakrishnan, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D61647
2016-08-19 12:28:19 -07:00
sdong f85df120f2 Re-enable tsan crash white-box test with reduced killing odds
Summary:
Tsan crash white-box test was disabled because it never ends. Re-enable it with reduced killing odds.
Add a parameter in crash test script to allow we pass it through an environment variable.

Test Plan: Run it manually.

Reviewers: IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D61053
2016-07-22 12:51:27 -07:00
Islam AbdelRahman 68a8e6b8fa Introduce FullMergeV2 (eliminate memcpy from merge operators)
Summary:
This diff update the code to pin the merge operator operands while the merge operation is done, so that we can eliminate the memcpy cost, to do that we need a new public API for FullMerge that replace the std::deque<std::string> with std::vector<Slice>

This diff is stacked on top of D56493 and D56511

In this diff we
- Update FullMergeV2 arguments to be encapsulated in MergeOperationInput and MergeOperationOutput which will make it easier to add new arguments in the future
- Replace std::deque<std::string> with std::vector<Slice> to pass operands
- Replace MergeContext std::deque with std::vector (based on a simple benchmark I ran https://gist.github.com/IslamAbdelRahman/78fc86c9ab9f52b1df791e58943fb187)
- Allow FullMergeV2 output to be an existing operand

```
[Everything in Memtable | 10K operands | 10 KB each | 1 operand per key]

DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="mergerandom,readseq,readseq,readseq,readseq,readseq" --merge_operator="max" --merge_keys=10000 --num=10000 --disable_auto_compactions --value_size=10240 --write_buffer_size=1000000000

[FullMergeV2]
readseq      :       0.607 micros/op 1648235 ops/sec; 16121.2 MB/s
readseq      :       0.478 micros/op 2091546 ops/sec; 20457.2 MB/s
readseq      :       0.252 micros/op 3972081 ops/sec; 38850.5 MB/s
readseq      :       0.237 micros/op 4218328 ops/sec; 41259.0 MB/s
readseq      :       0.247 micros/op 4043927 ops/sec; 39553.2 MB/s

[master]
readseq      :       3.935 micros/op 254140 ops/sec; 2485.7 MB/s
readseq      :       3.722 micros/op 268657 ops/sec; 2627.7 MB/s
readseq      :       3.149 micros/op 317605 ops/sec; 3106.5 MB/s
readseq      :       3.125 micros/op 320024 ops/sec; 3130.1 MB/s
readseq      :       4.075 micros/op 245374 ops/sec; 2400.0 MB/s
```

```
[Everything in Memtable | 10K operands | 10 KB each | 10 operand per key]

DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="mergerandom,readseq,readseq,readseq,readseq,readseq" --merge_operator="max" --merge_keys=1000 --num=10000 --disable_auto_compactions --value_size=10240 --write_buffer_size=1000000000

[FullMergeV2]
readseq      :       3.472 micros/op 288018 ops/sec; 2817.1 MB/s
readseq      :       2.304 micros/op 434027 ops/sec; 4245.2 MB/s
readseq      :       1.163 micros/op 859845 ops/sec; 8410.0 MB/s
readseq      :       1.192 micros/op 838926 ops/sec; 8205.4 MB/s
readseq      :       1.250 micros/op 800000 ops/sec; 7824.7 MB/s

[master]
readseq      :      24.025 micros/op 41623 ops/sec;  407.1 MB/s
readseq      :      18.489 micros/op 54086 ops/sec;  529.0 MB/s
readseq      :      18.693 micros/op 53495 ops/sec;  523.2 MB/s
readseq      :      23.621 micros/op 42335 ops/sec;  414.1 MB/s
readseq      :      18.775 micros/op 53262 ops/sec;  521.0 MB/s

```

```
[Everything in Block cache | 10K operands | 10 KB each | 1 operand per key]

[FullMergeV2]
$ DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="readseq,readseq,readseq,readseq,readseq" --merge_operator="max" --num=100000 --db="/dev/shm/merge-random-10K-10KB" --cache_size=1000000000 --use_existing_db --disable_auto_compactions
readseq      :      14.741 micros/op 67837 ops/sec;  663.5 MB/s
readseq      :       1.029 micros/op 971446 ops/sec; 9501.6 MB/s
readseq      :       0.974 micros/op 1026229 ops/sec; 10037.4 MB/s
readseq      :       0.965 micros/op 1036080 ops/sec; 10133.8 MB/s
readseq      :       0.943 micros/op 1060657 ops/sec; 10374.2 MB/s

[master]
readseq      :      16.735 micros/op 59755 ops/sec;  584.5 MB/s
readseq      :       3.029 micros/op 330151 ops/sec; 3229.2 MB/s
readseq      :       3.136 micros/op 318883 ops/sec; 3119.0 MB/s
readseq      :       3.065 micros/op 326245 ops/sec; 3191.0 MB/s
readseq      :       3.014 micros/op 331813 ops/sec; 3245.4 MB/s
```

```
[Everything in Block cache | 10K operands | 10 KB each | 10 operand per key]

DEBUG_LEVEL=0 make db_bench -j64 && ./db_bench --benchmarks="readseq,readseq,readseq,readseq,readseq" --merge_operator="max" --num=100000 --db="/dev/shm/merge-random-10-operands-10K-10KB" --cache_size=1000000000 --use_existing_db --disable_auto_compactions

[FullMergeV2]
readseq      :      24.325 micros/op 41109 ops/sec;  402.1 MB/s
readseq      :       1.470 micros/op 680272 ops/sec; 6653.7 MB/s
readseq      :       1.231 micros/op 812347 ops/sec; 7945.5 MB/s
readseq      :       1.091 micros/op 916590 ops/sec; 8965.1 MB/s
readseq      :       1.109 micros/op 901713 ops/sec; 8819.6 MB/s

[master]
readseq      :      27.257 micros/op 36687 ops/sec;  358.8 MB/s
readseq      :       4.443 micros/op 225073 ops/sec; 2201.4 MB/s
readseq      :       5.830 micros/op 171526 ops/sec; 1677.7 MB/s
readseq      :       4.173 micros/op 239635 ops/sec; 2343.8 MB/s
readseq      :       4.150 micros/op 240963 ops/sec; 2356.8 MB/s
```

Test Plan: COMPILE_WITH_ASAN=1 make check -j64

Reviewers: yhchiang, andrewkr, sdong

Reviewed By: sdong

Subscribers: lovro, andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D57075
2016-07-20 09:49:03 -07:00
Islam AbdelRahman d6b79e2fd0 Remove filter_deletes from crash_test
Summary: filter_deletes option was removed, remove it from crash_test to fix it

Test Plan: make crash_test

Reviewers: yhchiang, andrewkr, sdong

Reviewed By: sdong

Subscribers: andrewkr, dhruba

Differential Revision: https://reviews.facebook.net/D59901
2016-06-21 17:30:21 -07:00
Yueh-Hsuan Chiang a2466c8851 [db_stress] Make subcompaction random in crash_test
Summary: Make subcompaction random in crash_test

Test Plan: make crash_test and verify whether subcompaction changes randomly

Reviewers: IslamAbdelRahman, kradhakrishnan, yiwu, sdong, andrewkr

Reviewed By: andrewkr

Subscribers: andrewkr, dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D56571
2016-04-18 14:43:33 -07:00
sdong 2608219cc9 crash_test: cover concurrent memtable insert in default crash test
Summary: Default crash test uses prefix hash memtable, which is not compatible to concurrent memtable. Allow prefix test run with skip list and use skip list memtable when concurrent insert is used.

Test Plan: Run "python -u tools/db_crashtest.py whitebox" and watch sometimes skip list is used.

Reviewers: anthony, yhchiang, kradhakrishnan, andrewkr, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D53907
2016-02-09 08:02:38 -08:00
sdong b1887c5dd9 Explictly fail when memtable doesn't support concurrent insert
Summary: If users turn on concurrent insert but the memtable doesn't support it, they might see unexcepted crash. Fix it by explicitly fail.

Test Plan:
Run different setting of stress_test and make sure it fails correctly.
Will add a unit test too.

Reviewers: anthony, kradhakrishnan, IslamAbdelRahman, yhchiang, andrewkr, ngbronson

Reviewed By: ngbronson

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D53895
2016-02-05 14:15:50 -08:00
sdong 34a40bf911 Add --allow_concurrent_memtable_write in stress test and run it in crash_test
Summary: Add an option of --allow_concurrent_memtable_write in stress test and cover it in crash test

Test Plan: Run crash test and make sure three combinations of the two options show up randomly.

Reviewers: IslamAbdelRahman, yhchiang, andrewkr, anthony, kradhakrishnan

Reviewed By: kradhakrishnan

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D53811
2016-02-04 16:15:18 -08:00
sdong fdbff42391 Crash test to make kill decision for every kill point
Summary:
In crash test, when coming to each kill point, we start a random class using seed as current second. With this approach, for every second, the random number used is the same. However, in each second, there are multiple kill points with different frequency. It makes it hard to reason about chance of kill point to trigger. With this commit, we use thread local random seed to generate the random number, so that it will take different values per second, hoping it makes chances of killing much easier to reason about.

Also significantly reduce the kill odd to make sure time before kiling is similar as before.

Test Plan: Run white box crash test and see the killing happens as expected and the run time time before killing reasonable.

Reviewers: kradhakrishnan, IslamAbdelRahman, rven, yhchiang, andrewkr, anthony

Reviewed By: anthony

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D52971
2016-01-19 18:11:24 -08:00
sdong ab0f3b964f crash_test to trigger some less frequent crash point more frequently
Summary: crash_test still has a very low chance to hit some crash point. Have another mode for covering them more likely.

Test Plan: Run crash_test and see db_stress is called with expected prameters.

Reviewers: kradhakrishnan, igor, anthony, rven, IslamAbdelRahman, yhchiang

Reviewed By: yhchiang

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D49473
2015-10-27 12:06:06 -07:00
Shusen Liu d0d13ebf67 fix bug in db_crashtest.py
Summary:
in tools/db_crashtest.py, cmd_params['db'] by default is a lambda expression, not the actual db_name.
fix by get the db_name before passing it to gen_cmd.

Test Plan: run `make crashtest`

Reviewers: sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D49119
2015-10-20 22:01:11 -07:00
Shusen Liu 033c6f1add T7916298, bug fix
Summary: dbname => cmd_params['db']

Test Plan: Run `make crash_test`

Reviewers: sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D49077
2015-10-19 21:09:35 -07:00
Shusen Liu 4575de5b9e #7916298: merge tools/db_crashtest2.py into tools/db_crashtest.py
Summary:
merge tools/db_crashtest2.py into tools/db_crashtest.py

python tools/db_crashtest.py -h  # show help message, ALL parameters can be overwrite by arguments

Example usages:
python tools/db_crashtest.py blackbox  # run blackbox with default parameters
python tools/db_crashtest.py blackbox --simple
python tools/db_crashtest.py whitebox  # run whitebox with default parameters
python tools/db_crashtest.py whitebox --simple

all default parameters are identical to previous version.

Test Plan: `make crash_test` and make sure it can run with expected parameters pased to db_stress.

Reviewers: igor, rven, anthony, IslamAbdelRahman, yhchiang, sdong

Reviewed By: sdong

Subscribers: dhruba

Differential Revision: https://reviews.facebook.net/D48567
2015-10-19 13:24:55 -07:00
sdong 680156ca61 crash_test to run with data sync on
Summary: Mode of data sync off is a much less used than the case of data sync on. Crash test should cover the more common case than a corner case. So turn data sync on in crash tests.

Test Plan: Run crash test and make sure it can run with expected parameters pased to db_stres.

Reviewers: IslamAbdelRahman, anthony, igor, rven, kradhakrishnan, yhchiang

Reviewed By: yhchiang

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D48729
2015-10-15 14:37:08 -07:00
sdong cf3e05304f crash_test cleans up directory before testing if TEST_TMPDIR is set
Summary: In a recent change, crash_test can put data under TEST_TMPDIR. However, the directory is not cleaned before running the test, which may cause unexpected results. Clean it.

Test Plan: Run white and black box crash test against non-existing, or non-empty but not compactible DBs, and make sure it works as expected.

Reviewers: kradhakrishnan, rven, yhchiang, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D43515
2015-08-04 14:59:28 -07:00
sdong 2e73bd4ff2 crash_test to put DB under TEST_TMPDIR
Summary: Currently crash_test only puts data under /tmp. It is less flexible if we want to cover different file systems or media. Make crash_test to appreciate TEST_TMPDIR so that users can run it against another file system.

Test Plan: Run blackbox_crash_test and whitebox_crash_test with or without TEST_TMPDIR set and make sure DBs are put in the right place

Reviewers: kradhakrishnan, yhchiang, rven, IslamAbdelRahman

Reviewed By: IslamAbdelRahman

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D43509
2015-08-04 12:10:55 -07:00
sdong 1205bdbcee crash_test to cover simply cases
Summary:
crash_test now only runs complicated options, multiple column families, prefix hash, frequently changing options, many compaction threads, etc. These options are good to cover new features but we loss coverage in most common use cases. Furthermore, by running only for multiple column families, we are not able to create LSM trees that are large enough to cover some stress cases.
Make half of crash_test runs the simply tests: single column family, default mem table, one compaction thread, no change options.

Test Plan: Run crash_test

Reviewers: rven, yhchiang, IslamAbdelRahman, kradhakrishnan

Reviewed By: kradhakrishnan

Subscribers: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D43461
2015-08-04 12:08:38 -07:00
Lei Jin 714c63c584 db_stress for dynamic options
Summary: Allow SetOptions() during db_stress test

Test Plan: make crash_test

Reviewers: sdong, yhchiang, rven, igor

Reviewed By: igor

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D25497
2014-10-27 12:11:16 -07:00
Igor Canadi d4a8423334 Remove seek compaction
Summary:
As discussed in our internal group, we don't get much use of seek compaction at the moment, while it's making code more complicated and slower in some cases.

This diff removes seek compaction and (hopefully) all code that was introduced to support seek compaction.

There is one test case that relied on didIO information. I'll try to find another way to implement it.

Test Plan: make check

Reviewers: sdong, haobo, yhchiang, ljin, dhruba

Reviewed By: ljin

Subscribers: leveldb

Differential Revision: https://reviews.facebook.net/D19161
2014-06-20 10:23:02 +02:00
Igor Canadi ac328a86b9 Merge branch 'master' into columnfamilies
Conflicts:
	db/db_impl.cc
	db/db_test.cc
2014-03-20 14:41:37 -07:00
Igor Canadi 4cfb0eb40f Delete rocksdb dir after crashtest
Summary:
We should clean up after ourselves. Last crashtest failed because the disk on the cibuild machine was full.

Also, db_crashtest is supposed to run the test on the same database in every iteration. This isn't the case currently, because every run creates a new tmp directory. It's fixed with this diff.

Test Plan: ran it

Reviewers: ljin

Reviewed By: ljin

CC: leveldb

Differential Revision: https://reviews.facebook.net/D17073
2014-03-20 11:11:08 -07:00
Igor Canadi 159928dfa5 Added flag progress_reports in db_stress 2014-03-19 09:58:41 -07:00
Igor Canadi 04a1035efe Revert "DB stress with normal skip list"
This reverts commit 86926d8c6a.
2014-03-12 22:21:13 -07:00
Igor Canadi 86926d8c6a DB stress with normal skip list
Summary:
Hash skip list has issues, causing db_stress to fail badly.

For now, switching back to skip_list by default before we figure out root cause.

Test Plan: db_stress is happy(ier)

Reviewers: ljin

Reviewed By: ljin

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16845
2014-03-12 17:35:27 -07:00
Igor Canadi 7b7793e97a Don't sync in stress test
Summary: Syncing in stress test makes it run much much much slower. It also doesn't add much value IMO.

Test Plan: no

Reviewers: ljin

Reviewed By: ljin

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16839
2014-03-12 15:12:09 -07:00
Igor Canadi 5ba028c179 DBStress cleanup
Summary:
*) fixed the comment
*) constant 1 was not casted to 64-bit, which (I think) might cause overflow if we shift it too much
*) default prefix size to be 7, like it was before

Test Plan: compiled

Reviewers: ljin

Reviewed By: ljin

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16827
2014-03-12 09:31:06 -07:00
Lei Jin 02dab3be11 fix db_stress test
Summary: Fix the db_stress test, let is run with HashSkipList for real

Test Plan:
python tools/db_crashtest.py
python tools/db_crashtest2.py

Reviewers: igor, haobo

Reviewed By: igor

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16773
2014-03-11 13:44:33 -07:00
Igor Canadi 97eddef235 Reopen DB in crash test
Summary:
Why don't we automatically reopen DB when running crash test (running in our nightly build)? If I understand correctly, crashtest is manually reopenning the DB, but then the DB does not check its consistency when you kill db_stress process and then re-run it again.
Does this make sense?

Test Plan: not reall

Reviewers: dhruba, haobo, emayanke

Reviewed By: emayanke

CC: leveldb

Differential Revision: https://reviews.facebook.net/D16167
2014-03-03 17:10:30 -08:00
Natalie Hildebrandt 9262061b0d Fixing crashing tests to include iterpercent param
Summary: Adding in the iterpercent flag to tests.

Test Plan: make crash_test

Reviewers: emayanke

Reviewed By: emayanke

Differential Revision: https://reviews.facebook.net/D13035
2013-09-20 16:27:22 -07:00
Xing Jin af732c7ba8 Add universal compaction to db_stress nightly build
Summary:
Most code change in this diff is code cleanup/rewrite. The logic changes include:

(1) add universal compaction to db_crashtest2.py
(2) randomly set --test_batches_snapshots to be 0 or 1 in db_crashtest2.py. Old codes always use 1.
(3) use different tmp directory as db directory in different runs. I saw some intermittent errors in my local tests. Use of different tmp directory seems to be able to solve the issue.

Test Plan: Have run "make crashtest" for multiple times. Also run "make all check"

Reviewers: emayanke, dhruba, haobo

Reviewed By: emayanke

Differential Revision: https://reviews.facebook.net/D12369
2013-08-20 17:37:49 -07:00
Tyler Harter 85d83a150b Update crashtests to match D12267
Summary: I changed the db_stress configs, but forgot to update the scripts using the old configs.

Test Plan: 'make blackbox_crash_test' and 'make whitebox_crash_test' start running normally now (I haven't run them til the end, though).

Reviewers: vamsi

Reviewed By: vamsi

CC: leveldb, dhruba

Differential Revision: https://reviews.facebook.net/D12303
2013-08-15 10:14:32 -07:00