Commit graph

426 commits

Author SHA1 Message Date
Po-Chuan Hsieh b03d415660 Fix build on i386 (#12719)
Summary:
Cited from https://pkg-status.freebsd.org/beefy21/data/140i386-default/02faf78f4c9b/logs/rocksdb-9.2.1.log
The error message is as follows:
```
mkdir -p db_stress_tool && clang++ -O2 -pipe  -DOS_FREEBSD -fstack-protector-strong -isystem /usr/local/include -fno-strict-aliasing   -Wno-inconsistent-missing-override -Wno-unused-parameter -Wno-unused-variable -Wno-unused-private-field -isystem /usr/local/include -std=c++17   -fPIC -DROCKSDB_DLL -DROCKSDB_USE_RTTI   -g -W -Wextra -Wall -Wsign-compare -Wshadow -Wunused-parameter -Werror -I. -I./include -std=c++17 -O2 -pipe  -DOS_FREEBSD -fstack-protector-strong -isystem /usr/local/include -fno-strict-aliasing   -Wno-inconsistent-missing-override -Wno-unused-parameter -Wno-unused-variable -Wno-unused-private-field -isystem /usr/local/include -std=c++17  -faligned-new -DHAVE_ALIGNED_NEW -DROCKSDB_PLATFORM_POSIX -DROCKSDB_LIB_IO_POSIX -O2 -pipe  -DOS_FREEBSD -fstack-protector-strong -isystem /usr/local/include -fno-strict-aliasing  -D_REENTRANT -DOS_FREEBSD -DSNAPPY -DGFLAGS=1 -DZLIB -DBZIP2 -DLZ4 -DZSTD -DROCKSDB_MALLOC_USABLE_SIZE -DROCKSDB_PTHREAD_ADAPTIVE_MUTEX -DROCKSDB_BACKTRACE -DROCKSDB_SCHED_GETCPU_PRESENT   -isystem third-party/gtest-1.8.1/fused-src -O2 -fno-omit-frame-pointer -momit-leaf-frame-pointer -DNDEBUG -Woverloaded-virtual -Wnon-virtual-dtor -Wno-missing-field-initializers -Wno-invalid-offsetof -c db_stress_tool/db_stress_common.cc -o db_stress_tool/db_stress_common.o
db_stress_tool/db_stress_common.cc:204:17: error: format specifies type 'unsigned long' but the argument has type 'size_t' (aka 'unsigned int') [-Werror,-Wformat]
                block_cache->GetCapacity());
                ^~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
```
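For context, a hedged sketch of the kind of portable fix this class of warning usually calls for (not necessarily the exact change in this PR): on 32-bit targets `size_t` is `unsigned int`, so a `%lu` format mismatches, while `%zu` or an explicit cast is correct on both 32-bit and 64-bit builds.
```cpp
// Illustrative only: printing a size_t (e.g. a block cache capacity) portably.
#include <cstddef>
#include <cstdio>

int main() {
  size_t capacity = static_cast<size_t>(8) << 20;
  std::printf("capacity: %zu\n", capacity);  // %zu matches size_t everywhere
  std::printf("capacity: %lu\n",
              static_cast<unsigned long>(capacity));  // or cast explicitly
  return 0;
}
```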

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12719

Reviewed By: hx235

Differential Revision: D58093539

Pulled By: jaykorean

fbshipit-source-id: 400cae3a4b0d23b168937a5388065ef1c4b8b56e
2024-06-04 09:36:00 -07:00
Levi Tamasi 023a808417 Disable iterator refresh for CoalescingIterator in TestIterateAgainstExpected (#12723)
Summary:
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12723

`CoalescingIterator` doesn't support `Refresh` currently; the patch adds a check that was missing from https://github.com/facebook/rocksdb/pull/12721 to disable this operation when multi-CF iterators are in use in the stress test.
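A hypothetical sketch of the kind of guard involved, with illustrative flag and function names rather than the actual stress-test code:
```cpp
// Skip iterator Refresh() whenever the multi-CF (coalescing) iterator is in
// use, since it does not support that operation yet.
bool ShouldRefreshIterator(bool use_multi_cf_iterator, int refresh_one_in,
                           int random_draw) {
  if (use_multi_cf_iterator) {
    return false;  // CoalescingIterator has no Refresh() support
  }
  return refresh_one_in > 0 && random_draw % refresh_one_in == 0;
}
```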

Reviewed By: jaykorean

Differential Revision: D58053334

fbshipit-source-id: 3146f0e7e87230b49b244cecdfcee345c0ce78fa
2024-06-01 09:52:48 -07:00
Jay Huh f3b7e959b3 Add CoalescingIterator to TestIterateAgainstExpected (#12721)
Summary:
Continuing from https://github.com/facebook/rocksdb/pull/12706. Adding the CoalescingIterator to `TestIterateAgainstExpected` as well when `use_multi_cf_iterator` is set to True.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12721

Test Plan:
```
python3 tools/db_crashtest.py blackbox --simple --max_key=25000000 --write_buffer_size=4194304 --use_attribute_group=0 --use_put_entity_one_in=1 --use_multi_get=1 --use_multi_cf_iterator=1 --verify_iterator_with_expected_state_one_in=2
```

Reviewed By: ltamasi

Differential Revision: D58033811

Pulled By: jaykorean

fbshipit-source-id: 7caf39883e277e695b653e295ad72b1004169ca0
2024-05-31 15:17:06 -07:00
Levi Tamasi 6f17056e40 Add transactional/read-your-own-write MultiGetEntity stress test (#12717)
Summary:
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12717

The PR adds `Transaction::MultiGetEntity` to the stress tests. Similarly to what we do for `Transaction::MultiGet`, in this mode we open a transaction and randomly add writes for some of the queried keys to it while keeping track of the values written on a per-key basis. The results of `Transaction::MultiGetEntity` can then be validated against these expected values (in order to test the read-your-own-writes functionality) as well as the results returned by `Transaction::GetEntity` for the same keys.
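A minimal sketch of the per-key bookkeeping described above, in plain C++ with illustrative names (the actual stress-test state tracking is more involved):
```cpp
#include <map>
#include <optional>
#include <string>

struct RywTracker {
  // Latest value written for each key through the transaction itself.
  std::map<std::string, std::string> written;

  void RecordWrite(const std::string& key, const std::string& value) {
    written[key] = value;  // later writes to the same key win
  }

  // If we wrote the key inside the transaction, MultiGetEntity must return
  // exactly this value; otherwise the caller falls back to the expected-state
  // check (and to comparing against GetEntity for the same key).
  std::optional<std::string> Expected(const std::string& key) const {
    auto it = written.find(key);
    if (it == written.end()) return std::nullopt;
    return it->second;
  }
};
```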

Reviewed By: jaykorean

Differential Revision: D57990210

fbshipit-source-id: 9bf3bb292051c2c57757f86b517919197b03c524
2024-05-31 12:01:59 -07:00
Jay Huh a901ef48f0 Introduce use_multi_cf_iterator in stress test (#12706)
Summary:
Introduce `use_multi_cf_iterator`, and when it's set, use `CoalescingIterator` in `TestIterate()`. Because all the column families contain the same data in today's Stress Test, we can compare `CoalescingIterator` against any `DBIter` from any of the column families. Currently, coalescing logic verification is done by unit tests, but we can extend the stress test to support different data in different column families in the future.
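A hedged sketch of the cross-check idea: since every column family holds the same data, a multi-CF coalescing iterator (used here through the plain `rocksdb::Iterator` interface) and a single-CF iterator should agree on every key/value pair. Names are illustrative, not the stress-test code.
```cpp
#include <cassert>
#include "rocksdb/iterator.h"

void CompareIterators(rocksdb::Iterator* multi_cf_iter,
                      rocksdb::Iterator* single_cf_iter) {
  multi_cf_iter->SeekToFirst();
  single_cf_iter->SeekToFirst();
  while (multi_cf_iter->Valid() && single_cf_iter->Valid()) {
    assert(multi_cf_iter->key().compare(single_cf_iter->key()) == 0);
    assert(multi_cf_iter->value().compare(single_cf_iter->value()) == 0);
    multi_cf_iter->Next();
    single_cf_iter->Next();
  }
  // Both iterators must run out of keys at the same time.
  assert(!multi_cf_iter->Valid() && !single_cf_iter->Valid());
}
```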

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12706

Test Plan:
```
python3 tools/db_crashtest.py blackbox --simple --max_key=25000000 --write_buffer_size=4194304 --use_attribute_group=0 --use_put_entity_one_in=1 --use_multi_get=1 --use_multi_cf_iterator=1
```

**More PRs to come**
- Use `AttributeGroupIterator` when both `use_multi_cf_iterator` and `use_attribute_group` are true
- Support `Refresh()` in `CoalescingIterator`
- Extend Stress Test to support different data in different CFs (Long-term)

Reviewed By: ltamasi

Differential Revision: D58020247

Pulled By: jaykorean

fbshipit-source-id: 8e2483b85cf2bb0f5a9bb44851601bbf063484ec
2024-05-31 10:50:15 -07:00
Levi Tamasi 01179678b2 Refactor the non-attribute-group/attribute-group code paths in NonBatchedOpsStressTest::TestMultiGetEntity (#12715)
Summary:
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12715

The patch refactors/deduplicates the non-attribute-group and attribute-group code paths in `NonBatchedOpsStressTest::TestMultiGetEntity` by introducing two new generic lambdas `verify_expected_errors` and `check_results` (the latter of which subsumes the existing `handle_results`) that can handle both types of APIs. This change also serves as groundwork for the upcoming transactional `MultiGetEntity` stress tests.
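A hedged sketch of the deduplication idea, with illustrative names rather than the actual helpers: a generic (auto-parameter) lambda can validate either API's output as long as both expose the members the check relies on.
```cpp
#include <cassert>
#include <vector>
#include "rocksdb/status.h"

template <typename Results>
void CheckEntityResults(const std::vector<rocksdb::Status>& statuses,
                        const Results& results) {
  // The same lambda body works for PinnableWideColumns-style results and for
  // attribute-group-style results because it only relies on size().
  auto check_results = [](const auto& sts, const auto& res) {
    assert(sts.size() == res.size());
    for (const auto& s : sts) {
      assert(s.ok() || s.IsNotFound());  // anything else is a test failure
    }
  };
  check_results(statuses, results);
}
```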

Reviewed By: jaykorean

Differential Revision: D57977700

fbshipit-source-id: 83a18a9e57f46ea92ba07b2f0dca3e9bc353f257
2024-05-30 13:31:25 -07:00
Levi Tamasi b6ea246333 Fix NonBatchOpsStressTest::TestGetEntity by adding fuzzy match (#12711)
Summary:
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12711

The patch adds the missing other half of https://github.com/facebook/rocksdb/pull/12709: when there is no locking in a read test, we have to be more permissive when it comes to values returned by queries. In particular, any expected state value in a small window around the read call should be allowed, and discrepancies in the presence/absence of a key should only be treated as a failure if the key is guaranteed to have not existed/existed during the above window.
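A hypothetical sketch of the "fuzzy match" rule, assuming the expected state is sampled in a small window around the read (names and types are illustrative):
```cpp
#include <algorithm>
#include <optional>
#include <string>
#include <vector>

// `window` holds the expected values observed just before and just after the
// read call; std::nullopt means "key absent". The read is only a failure if it
// matches none of them.
bool ReadIsAcceptable(const std::optional<std::string>& observed,
                      const std::vector<std::optional<std::string>>& window) {
  return std::any_of(window.begin(), window.end(),
                     [&](const std::optional<std::string>& expected) {
                       return expected == observed;
                     });
}
```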

Reviewed By: hx235

Differential Revision: D57938678

fbshipit-source-id: cd5c8bc2e014ec12ea4daf441965f3ec2115663e
2024-05-29 17:00:11 -07:00
Levi Tamasi d9316260e4 Remove unnecessary MutexLock from NonBatchedOpStressTest::TestGetEntity (#12709)
Summary:
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12709

This is most likely copypasta from `TestGet` from before https://github.com/facebook/rocksdb/pull/11058 . There is no need to lock the mutex for the key for reads; in fact, doing so is detrimental to test coverage since it locks out concurrent writers.

Reviewed By: jowlyzhang

Differential Revision: D57915207

fbshipit-source-id: eb0dbf6b84e5408b87d96dd47597511996e206a7
2024-05-29 10:16:37 -07:00
Levi Tamasi 5cec4bbcab Support PutEntity as write method in the transactional MultiGet stress test (#12699)
Summary:
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12699

The patch adds `PutEntity` to the potential write operations used in the read-your-own-writes tests for `Transaction::MultiGet`. Note that since the stress test generates wide-column structures which have the value returned by `GenerateValue` in the default column, this does not affect the results returned by the `MultiGet` API (unless we have a bug).

The wide-column entity is generated according to the usual rules based on the value base and the `use_put_entity_one_in` flag. The entire entity structure will be validated by the upcoming stress test for `Transaction::MultiGetEntity`, where we also plan to leverage this logic.
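A hedged sketch of the entity shape referred to above: the "plain" value sits in the default column, so point lookups still see it, while the extra columns (purely illustrative here) exercise the wide-column code paths.
```cpp
#include <string>
#include "rocksdb/wide_columns.h"

// Note: WideColumn holds non-owning Slices, so `default_value` must stay alive
// until the entity has been written (e.g. via PutEntity()).
rocksdb::WideColumns MakeEntity(const std::string& default_value) {
  rocksdb::WideColumns columns;
  columns.emplace_back(rocksdb::kDefaultWideColumnName, default_value);
  columns.emplace_back("attr_a", "value_a");  // illustrative extra columns
  columns.emplace_back("attr_b", "value_b");
  return columns;
}
```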

Reviewed By: jowlyzhang

Differential Revision: D57799075

fbshipit-source-id: 5f86c2b2b3ceee8e1b8bf7453c02f1f1b1b00751
2024-05-28 16:54:10 -07:00
Peter Dillinger d2ef70872f Rename, deprecate LogFile and VectorLogPtr (#12695)
Summary:
These names are confusing with `Logger` etc. so moving to `WalFile` etc.

Other small, related name refactorings.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12695

Test Plan: Left most unit tests using old names as an API compatibility test. Non-test code compiles with deprecated names removed. No functional changes.

Reviewed By: ajkr

Differential Revision: D57747458

Pulled By: pdillinger

fbshipit-source-id: 7b77596b9c20d865d43b9dc66c30c8bd2b3b424f
2024-05-28 09:24:49 -07:00
Levi Tamasi bd801bd98c Factor out the RYW transaction building logic into a helper (#12697)
Summary:
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12697

As groundwork for stress testing `Transaction::MultiGetEntity`, the patch factors out the logic for adding transactional writes for some of the keys in a `MultiGet` batch into a separate helper method called `MaybeAddKeyToTxnForRYW`.

Reviewed By: jowlyzhang

Differential Revision: D57791830

fbshipit-source-id: ef347ba6e6e82dfe5cedb4cf67dd6d1503901d89
2024-05-24 14:24:17 -07:00
Changyu Bi fecb10c2fa Improve universal compaction sorted-run trigger (#12477)
Summary:
Universal compaction currently uses `level0_file_num_compaction_trigger` for two purposes:
1. the trigger for checking if there is any compaction to do, and
2. the limit on the number of sorted runs. RocksDB will do compaction to keep the number of sorted runs no more than the value of this option.

This can make the option inflexible. A value that is too small causes higher write amp: more compactions to reduce the number of sorted runs. A value that is too big delays potential compaction work and causes worse read performance. This PR introduces an option `CompactionOptionsUniversal::max_read_amp` for only the second purpose: to specify the hard limit on the number of sorted runs.

For backward compatibility, `max_read_amp = -1` by default, which means falling back to the current behavior.
When `max_read_amp > 0`, `level0_file_num_compaction_trigger` will only serve as a trigger to find potential compaction.
When `max_read_amp = 0`, RocksDB will auto-tune the limit on the number of sorted runs. The estimation is based on DB size, write_buffer_size and size_ratio, so it is adaptive to the size change of the DB. See more in `UniversalCompactionBuilder::PickCompaction()`.
Alternatively, users now can configure `max_read_amp` to a very big value and keep `level0_file_num_compaction_trigger` small. This will allow `size_ratio` and `max_size_amplification_percent` to control the number of sorted runs. This essentially disables compactions with reason kUniversalSortedRunNum.
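For illustration, a minimal configuration sketch using the option names from this PR (the values themselves are arbitrary):
```cpp
#include "rocksdb/options.h"

rocksdb::Options MakeUniversalOptions() {
  rocksdb::Options options;
  options.compaction_style = rocksdb::kCompactionStyleUniversal;
  // Keep the compaction-check trigger low...
  options.level0_file_num_compaction_trigger = 5;
  // ...and control the sorted-run limit separately:
  //   -1 = fall back to the old behavior,
  //    0 = auto-tune the limit from DB size, write_buffer_size and size_ratio,
  //   >0 = explicit hard limit on the number of sorted runs.
  options.compaction_options_universal.max_read_amp = 0;
  return options;
}
```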

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12477

Test Plan:
* new unit test
* existing unit test for default behavior
* updated crash test with the new option
* benchmark:
  * Create a DB that is roughly 24GB in the last level. When `max_read_amp = 0`, we estimate that the DB needs 9 levels to avoid excessive compactions to reduce the number of sorted runs.
  * We then run fillrandom to ingest another 24GB data to compare write amp.
     * case 1: small level0 trigger: `level0_file_num_compaction_trigger=5, max_read_amp=-1`
       * write-amp: 4.8
     * case 2: auto-tune: `level0_file_num_compaction_trigger=5, max_read_amp=0`
       *  write-amp: 3.6
     * case 3: auto-tune with minimal trigger: `level0_file_num_compaction_trigger=1, max_read_amp=0`
       *  write-amp: 3.8
     * case 4: hard-code a good value for trigger: `level0_file_num_compaction_trigger=9`
       * write-amp: 2.8
```
Case 1:
** Compaction Stats [default] **
Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop Rblob(GB) Wblob(GB)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0      0/0    0.00 KB   1.0      0.0     0.0      0.0      22.6     22.6       0.0   1.0      0.0    163.2    141.94            111.10       108    1.314       0      0       0.0       0.0
 L45      8/0    1.81 GB   0.0     39.6    11.1     28.5      39.3     10.8       0.0   3.5    209.0    207.3    194.25            191.29        43    4.517    348M  2498K       0.0       0.0
 L46     13/0    3.12 GB   0.0     15.3     9.5      5.8      15.0      9.3       0.0   1.6    203.1    199.3     77.13             75.88        16    4.821    134M  2362K       0.0       0.0
 L47     19/0    4.68 GB   0.0     15.4    10.5      4.9      14.7      9.8       0.0   1.4    204.0    194.9     77.38             76.15         8    9.673    135M  5920K       0.0       0.0
 L48     38/0    9.42 GB   0.0     19.6    11.7      7.9      17.3      9.4       0.0   1.5    206.5    182.3     97.15             95.02         4   24.287    172M    20M       0.0       0.0
 L49     91/0   22.70 GB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0       0.0       0.0
 Sum    169/0   41.74 GB   0.0     89.9    42.9     47.0     109.0     61.9       0.0   4.8    156.7    189.8    587.85            549.45       179    3.284    791M    31M       0.0       0.0

Case 2:
** Compaction Stats [default] **
Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop Rblob(GB) Wblob(GB)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0      1/0   214.47 MB   1.2      0.0     0.0      0.0      22.6     22.6       0.0   1.0      0.0    164.5    140.81            109.98       108    1.304       0      0       0.0       0.0
 L44      0/0    0.00 KB   0.0      1.3     1.3      0.0       1.2      1.2       0.0   1.0    206.1    204.9      6.24              5.98         3    2.081     11M    51K       0.0       0.0
 L45      4/0   844.36 MB   0.0      7.1     5.4      1.7       7.0      5.4       0.0   1.3    194.6    192.9     37.41             36.00        13    2.878     62M   489K       0.0       0.0
 L46     11/0    2.57 GB   0.0     14.6     9.8      4.8      14.3      9.5       0.0   1.5    193.7    189.8     77.09             73.54        17    4.535    128M  2411K       0.0       0.0
 L47     24/0    5.81 GB   0.0     19.8    12.0      7.8      18.8     11.0       0.0   1.6    191.4    181.1    106.19            101.21         9   11.799    174M  9166K       0.0       0.0
 L48     38/0    9.42 GB   0.0     19.6    11.8      7.9      17.3      9.4       0.0   1.5    197.3    173.6    101.97             97.23         4   25.491    172M    20M       0.0       0.0
 L49     91/0   22.70 GB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0       0.0       0.0
 Sum    169/0   41.54 GB   0.0     62.4    40.3     22.1      81.3     59.2       0.0   3.6    136.1    177.2    469.71            423.94       154    3.050    549M    32M       0.0       0.0

Case 3:
** Compaction Stats [default] **
Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop Rblob(GB) Wblob(GB)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0      0/0    0.00 KB   5.0      0.0     0.0      0.0      22.6     22.6       0.0   1.0      0.0    163.8    141.43            111.13       108    1.310       0      0       0.0       0.0
 L44      0/0    0.00 KB   0.0      0.8     0.8      0.0       0.8      0.8       0.0   1.0    201.4    200.2      4.26              4.19         2    2.130   7360K    33K       0.0       0.0
 L45      4/0   844.38 MB   0.0      6.3     5.0      1.2       6.2      5.0       0.0   1.2    202.0    200.3     31.81             31.50        12    2.651     55M   403K       0.0       0.0
 L46      7/0    1.62 GB   0.0     13.3     8.8      4.6      13.1      8.6       0.0   1.5    198.9    195.7     68.72             67.89        17    4.042    117M  1696K       0.0       0.0
 L47     24/0    5.81 GB   0.0     21.7    12.9      8.8      20.6     11.8       0.0   1.6    198.5    188.6    112.04            109.97        12    9.336    191M  9352K       0.0       0.0
 L48     41/0   10.14 GB   0.0     24.8    13.0     11.8      21.9     10.1       0.0   1.7    198.6    175.6    127.88            125.36         6   21.313    218M    25M       0.0       0.0
 L49     91/0   22.70 GB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0       0.0       0.0
 Sum    167/0   41.10 GB   0.0     67.0    40.5     26.4      85.4     58.9       0.0   3.8    141.1    179.8    486.13            450.04       157    3.096    589M    36M       0.0       0.0

Case 4:
** Compaction Stats [default] **
Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop Rblob(GB) Wblob(GB)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  L0      0/0    0.00 KB   0.7      0.0     0.0      0.0      22.6     22.6       0.0   1.0      0.0    158.6    146.02            114.68       108    1.352       0      0       0.0       0.0
 L42      0/0    0.00 KB   0.0      1.7     1.7      0.0       1.7      1.7       0.0   1.0    185.4    184.3      9.25              8.96         4    2.314     14M    67K       0.0       0.0
 L43      0/0    0.00 KB   0.0      2.5     2.5      0.0       2.5      2.5       0.0   1.0    197.8    195.6     13.01             12.65         4    3.253     22M   202K       0.0       0.0
 L44      4/0   844.40 MB   0.0      4.2     4.2      0.0       4.1      4.1       0.0   1.0    188.1    185.1     22.81             21.89         5    4.562     36M   503K       0.0       0.0
 L45     13/0    3.12 GB   0.0      7.5     6.5      1.0       7.2      6.2       0.0   1.1    188.7    181.8     40.69             39.32         5    8.138     65M  2282K       0.0       0.0
 L46     17/0    4.18 GB   0.0      8.3     7.1      1.2       7.9      6.6       0.0   1.1    192.2    181.8     44.23             43.06         4   11.058     73M  3846K       0.0       0.0
 L47     22/0    5.34 GB   0.0      8.9     7.5      1.4       8.2      6.8       0.0   1.1    189.1    174.1     48.12             45.37         3   16.041     78M  6098K       0.0       0.0
 L48     27/0    6.58 GB   0.0      9.2     7.6      1.6       8.2      6.6       0.0   1.1    195.2    172.9     48.52             47.11         2   24.262     81M  9217K       0.0       0.0
 L49     91/0   22.70 GB   0.0      0.0     0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000       0      0       0.0       0.0
 Sum    174/0   42.74 GB   0.0     42.3    37.0      5.3      62.4     57.1       0.0   2.8    116.3    171.3    372.66            333.04       135    2.760    372M    22M       0.0       0.0

setup:
./db_bench --benchmarks=fillseq,compactall,waitforcompaction --num=200000000 --compression_type=none --disable_wal=1 --compaction_style=1 --num_levels=50 --target_file_size_base=268435456 --max_compaction_bytes=6710886400 --level0_file_num_compaction_trigger=10 --write_buffer_size=268435456 --seed 1708494134896523

benchmark:
./db_bench --benchmarks=overwrite,waitforcompaction,stats --num=200000000 --compression_type=none --disable_wal=1 --compaction_style=1 --write_buffer_size=268435456 --level0_file_num_compaction_trigger=5 --target_file_size_base=268435456 --use_existing_db=1 --num_levels=50 --writes=200000000 --universal_max_read_amp=-1 --seed=1716488324800233

```

Reviewed By: ajkr

Differential Revision: D55370922

Pulled By: cbi42

fbshipit-source-id: 9be69979126b840d08e93e7059260e76a878bb2a
2024-05-24 10:10:31 -07:00
Levi Tamasi f044b6a6ad Fix a couple of issues in the stress test for Transaction::MultiGet (#12696)
Summary:
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12696

Two fixes:
1) `Random::Uniform(n)` returns an integer from the interval [0, n - 1], so `Uniform(2)` returns 0 or 1, which means that we have apparently never covered transactions with deletions in the test. (To prevent similar issues, the patch cleans this write logic up a bit using an `enum class` for the type of write; see the sketch after this list.)
2) The keys passed in to `TestMultiGet` can have duplicates. What this boils down to is that we have to keep track of the latest expected values for read-your-own-writes on a per-key basis.
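A minimal sketch of the cleanup mentioned in item 1, with illustrative names rather than the actual patch:
```cpp
#include <cstdint>

enum class TxnWriteType : uint32_t { kPut = 0, kDelete, kNumTypes };

// `uniform_sample` is expected to come from something like
// Random::Uniform(kNumTypes), which yields a value in [0, kNumTypes - 1],
// so both kPut and kDelete are actually reachable.
TxnWriteType PickWriteType(uint32_t uniform_sample) {
  return static_cast<TxnWriteType>(
      uniform_sample % static_cast<uint32_t>(TxnWriteType::kNumTypes));
}
```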

Reviewed By: jowlyzhang

Differential Revision: D57750212

fbshipit-source-id: e8ab603252c32331f8db0dfb2affcca1e188c790
2024-05-23 16:47:39 -07:00
Levi Tamasi db0960800a Add Transaction::PutEntity to the stress tests (#12688)
Summary:
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12688

As a first step of covering the wide-column transaction APIs, the patch adds `PutEntity` to the optimistic and pessimistic transaction stress tests (for the latter, only when the WriteCommitted policy is utilized). Other APIs and the multi-operation transaction test will be covered by subsequent PRs.

Reviewed By: jaykorean

Differential Revision: D57675781

fbshipit-source-id: bfe062ec5f6ab48641cd99a70f239ce4aa39299c
2024-05-22 11:30:33 -07:00
Peter Dillinger d89ab23bec Disallow memtable flush and sst ingest while WAL is locked (#12652)
Summary:
We recently noticed that some memtable flushes and file
ingestions could proceed during LockWAL, in violation of its stated
contract. (Note: we aren't 100% sure it's actually needed by MySQL, but
we want it to be in a clean state nonetheless.)

Despite earlier skepticism that this could be done safely (https://github.com/facebook/rocksdb/issues/12666), I
found a place to wait for LockWAL to be cleared before allowing
these operations to proceed: WaitForPendingWrites()
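A hedged usage sketch of the contract being enforced (illustrative; not the db_stress validation code):
```cpp
#include <cassert>
#include "rocksdb/db.h"

void FreezeWalExample(rocksdb::DB* db) {
  rocksdb::Status s = db->LockWAL();
  assert(s.ok());
  // While the WAL is locked, flushes and external file ingestions now wait
  // (in WaitForPendingWrites()) instead of creating new files.
  s = db->UnLockWAL();
  assert(s.ok());
}
```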

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12652

Test Plan:
Added to unit tests. Extended how db_stress validates LockWAL
and re-enabled combination of ingestion and LockWAL in crash test, in
follow-up to https://github.com/facebook/rocksdb/issues/12642

Ran blackbox_crash_test for a long while with relevant features
amplified.

Suggested follow-up: fix FaultInjectionTestFS to report file sizes
consistent with what the user has requested to be flushed.

Reviewed By: jowlyzhang

Differential Revision: D57622142

Pulled By: pdillinger

fbshipit-source-id: aef265fce69465618974b4ec47f4636257c676ce
2024-05-21 10:17:34 -07:00
Hui Xiao d7b938882e Sync WAL during db Close() (#12556)
Summary:
**Context/Summary:**
The crash test command below found that we don't sync the WAL upon DB close, which can lead to unsynced data loss. This PR syncs it.
```
./db_stress --threads=1 --disable_auto_compactions=1 --WAL_size_limit_MB=0 --WAL_ttl_seconds=0 --acquire_snapshot_one_in=0 --adaptive_readahead=0 --adm_policy=1 --advise_random_on_open=1 --allow_concurrent_memtable_write=1 --allow_data_in_errors=True --allow_fallocate=0 --async_io=0 --auto_readahead_size=0 --avoid_flush_during_recovery=1 --avoid_flush_during_shutdown=0 --avoid_unnecessary_blocking_io=1 --backup_max_size=104857600 --backup_one_in=0 --batch_protection_bytes_per_key=0 --bgerror_resume_retry_interval=1000000 --block_align=0 --block_protection_bytes_per_key=2 --block_size=16384 --bloom_before_level=1 --bloom_bits=29.895303579352174 --bottommost_compression_type=disable --bottommost_file_compaction_delay=0 --bytes_per_sync=0 --cache_index_and_filter_blocks=0 --cache_index_and_filter_blocks_with_high_priority=1 --cache_size=33554432 --cache_type=lru_cache --charge_compression_dictionary_building_buffer=1 --charge_file_metadata=0 --charge_filter_construction=1 --charge_table_reader=1 --checkpoint_one_in=0 --checksum_type=kxxHash64 --clear_column_family_one_in=0 --column_families=1 --compact_files_one_in=0 --compact_range_one_in=0 --compaction_pri=0 --compaction_readahead_size=0 --compaction_style=0 --compaction_ttl=0 --compress_format_version=2 --compressed_secondary_cache_ratio=0 --compressed_secondary_cache_size=0 --compression_checksum=1 --compression_max_dict_buffer_bytes=0 --compression_max_dict_bytes=0 --compression_parallel_threads=4 --compression_type=zstd --compression_use_zstd_dict_trainer=1 --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --data_block_index_type=0 --db=/dev/shm/rocksdb_test/rocksdb_crashtest_whitebox --db_write_buffer_size=0 --default_temperature=kUnknown --default_write_temperature=kUnknown --delete_obsolete_files_period_micros=0 --delpercent=0 --delrangepercent=0 --destroy_db_initially=1 --detect_filter_construct_corruption=1 --disable_wal=0 --dump_malloc_stats=0 --enable_checksum_handoff=0 --enable_compaction_filter=0 --enable_custom_split_merge=0 --enable_do_not_compress_roles=1 --enable_index_compression=1 --enable_memtable_insert_with_hint_prefix_extractor=0 --enable_pipelined_write=0 --enable_sst_partitioner_factory=0 --enable_thread_tracking=1 --enable_write_thread_adaptive_yield=0 --expected_values_dir=/dev/shm/rocksdb_test/rocksdb_crashtest_expected --fail_if_options_file_error=0 --fifo_allow_compaction=1 --file_checksum_impl=none --fill_cache=0 --flush_one_in=1000 --format_version=5 --get_current_wal_file_one_in=0 --get_live_files_one_in=0 --get_property_one_in=0 --get_sorted_wal_files_one_in=0 --hard_pending_compaction_bytes_limit=274877906944 --high_pri_pool_ratio=0 --index_block_restart_interval=6 --index_shortening=0 --index_type=0 --ingest_external_file_one_in=0 --initial_auto_readahead_size=16384 --iterpercent=0 --key_len_percent_dist=1,30,69 --last_level_temperature=kUnknown --level_compaction_dynamic_level_bytes=1 --lock_wal_one_in=0 --log2_keys_per_lock=10 --log_file_time_to_roll=0 --log_readahead_size=16777216 --long_running_snapshots=0 --low_pri_pool_ratio=0 --lowest_used_cache_tier=0 --manifest_preallocation_size=5120 --manual_wal_flush_one_in=0 --mark_for_compaction_one_file_in=0 --max_auto_readahead_size=0 --max_background_compactions=1 --max_bytes_for_level_base=67108864 --max_key=2500000 --max_key_len=3 --max_log_file_size=0 --max_manifest_file_size=1073741824 --max_sequential_skip_in_iterations=8 --max_total_wal_size=0 --max_write_batch_group_size_bytes=64 --max_write_buffer_number=10 
--max_write_buffer_size_to_maintain=0 --memtable_insert_hint_per_batch=0 --memtable_max_range_deletions=0 --memtable_prefix_bloom_size_ratio=0.5 --memtable_protection_bytes_per_key=1 --memtable_whole_key_filtering=1 --memtablerep=skip_list --metadata_charge_policy=0 --min_write_buffer_number_to_merge=1 --mmap_read=0 --mock_direct_io=True --nooverwritepercent=1 --num_file_reads_for_auto_readahead=0 --num_levels=1 --open_files=-1 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=0 --open_write_fault_one_in=0 --ops_per_thread=3 --optimize_filters_for_hits=1 --optimize_filters_for_memory=1 --optimize_multiget_for_io=0 --paranoid_file_checks=0 --partition_filters=0 --partition_pinning=1 --pause_background_one_in=0 --periodic_compaction_seconds=0 --prefix_size=1 --prefixpercent=0 --prepopulate_block_cache=0 --preserve_internal_time_seconds=3600 --progress_reports=0 --read_amp_bytes_per_bit=0 --read_fault_one_in=0 --readahead_size=16384 --readpercent=0 --recycle_log_file_num=0 --reopen=2 --report_bg_io_stats=1 --sample_for_compression=5 --secondary_cache_fault_one_in=0 --secondary_cache_uri= --skip_stats_update_on_db_open=1 --snapshot_hold_ops=0 --soft_pending_compaction_bytes_limit=68719476736 --sst_file_manager_bytes_per_sec=0 --sst_file_manager_bytes_per_truncate=0 --stats_dump_period_sec=10 --stats_history_buffer_size=1048576 --strict_bytes_per_sync=0 --subcompactions=3 --sync=0 --sync_fault_injection=1 --table_cache_numshardbits=6 --target_file_size_base=16777216 --target_file_size_multiplier=1 --test_batches_snapshots=0 --top_level_index_pinning=0 --unpartitioned_pinning=3 --use_adaptive_mutex=1 --use_adaptive_mutex_lru=0 --use_delta_encoding=1 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_full_merge_v1=0 --use_get_entity=0 --use_merge=0 --use_multi_get_entity=0 --use_multiget=1 --use_put_entity_one_in=0 --use_write_buffer_manager=0 --user_timestamp_size=0 --value_size_mult=32 --verification_only=0 --verify_checksum=1 --verify_checksum_one_in=1000 --verify_compression=0 --verify_db_one_in=100000 --verify_file_checksums_one_in=0 --verify_iterator_with_expected_state_one_in=5 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=0 --wal_compression=zstd --write_buffer_size=33554432 --write_dbid_to_manifest=0 --write_fault_one_in=0 --writepercent=100

 Verification failed for column family 0 key 000000000000B9D1000000000000012B000000000000017D (4756691): value_from_db: , value_from_expected: 010000000504070609080B0A0D0C0F0E111013121514171619181B1A1D1C1F1E212023222524272629282B2A2D2C2F2E313033323534373639383B3A3D3C3F3E, msg: Iterator verification: Value not found: NotFound:
Verification failed :(
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12556

Test Plan:
- New UT
- The same stress test command failed before this fix but passes after it
- CI

Reviewed By: ajkr

Differential Revision: D56267964

Pulled By: hx235

fbshipit-source-id: af1b7e8769c129f64ba1c7f1ff17102f1239b929
2024-05-20 17:33:43 -07:00
Changyu Bi 2eb404de13 Print non-ok status for multi_ops_txns_stress test (#12660)
Summary:
Currently, `assert(s.ok())` does not offer much information for debugging.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12660

Test Plan:
revert https://github.com/facebook/rocksdb/issues/12639 and run `python3 ./tools/db_crashtest.py blackbox --write_policy write_prepared --recycle_log_file_num=1 --threads=1 --test_multiops_txn`

before this PR:
```
Exit Before Killing
stdout:

stderr:
 db_stress: db_stress_tool/multi_ops_txns_stress.cc:1529: void rocksdb::MultiOpsTxnsStressTest::PreloadDb(rocksdb::SharedState*, int, uint32_t, uint32_t, uint32_t, uint32_t): Assertion `s.ok()' failed.
```

after this PR:
```
Exit Before Killing
stdout:

stderr:
 Verification failed: PreloadDB failed: Invalid argument: WriteOptions::disableWAL option is not supported if DBOptions::recycle_log_file_num > 0
db_stress: db_stress_tool/db_stress_test_base.cc:517: void rocksdb::StressTest::ProcessStatus(rocksdb::SharedState*, std::string, rocksdb::Status, bool) const: Assertion `false' failed.
Received signal 6 (Aborted)
```

Reviewed By: ajkr

Differential Revision: D57410819

Pulled By: cbi42

fbshipit-source-id: a03c2202c3fd666eb2f58bae24e0c9e3e6ed4265
2024-05-15 21:14:41 -07:00
Jay Huh b4c6956a59 Add MultiGetEntity AttributeGroup API to stress test (#12640)
Summary:
Continuing from https://github.com/facebook/rocksdb/pull/12605, adding AttributeGroup `MultiGetEntity` API to stress tests.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12640

Test Plan:
**AttributeGroup Tests**

NonBatchOps
```
python3 tools/db_crashtest.py blackbox --simple --max_key=25000000 --write_buffer_size=4194304 --use_attribute_group=1 --use_put_entity_one_in=1 --use_multi_get=1
```

BatchOps
```
python3 tools/db_crashtest.py blackbox  --test_batches_snapshots=1 --max_key=25000000 --write_buffer_size=4194304 --use_attribute_group=1 --use_put_entity_one_in=1 --use_multi_get=1
```

CfConsistency Test
```
python3 tools/db_crashtest.py blackbox --cf_consistency --max_key=25000000 --write_buffer_size=4194304 --use_attribute_group=1 --use_put_entity_one_in=1 --use_multi_get=1
```

**Non-AttributeGroup Tests**

NonBatchOps
```
python3 tools/db_crashtest.py blackbox --simple --max_key=25000000 --write_buffer_size=4194304 --use_attribute_group=0 --use_put_entity_one_in=1 --use_multi_get=1
```

BatchOps
```
python3 tools/db_crashtest.py blackbox  --test_batches_snapshots=1 --max_key=25000000 --write_buffer_size=4194304 --use_attribute_group=0 --use_put_entity_one_in=1 --use_multi_get=1
```

CfConsistency Test
```
python3 tools/db_crashtest.py blackbox --cf_consistency --max_key=25000000 --write_buffer_size=4194304 --use_attribute_group=0 --use_put_entity_one_in=1 --use_multi_get=1
```

Reviewed By: ltamasi

Differential Revision: D57233931

Pulled By: jaykorean

fbshipit-source-id: 8cea771ac2e5749050bf5319360c6c5aa476d7d5
2024-05-14 16:33:44 -07:00
Yu Zhang 7d9642d876 Add logging for read timestamp during VerifyDB (#12638)
Summary:
As titled. To help debug some verification failures.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12638

Test Plan: manually tested

Reviewed By: ajkr

Differential Revision: D57219549

Pulled By: jowlyzhang

fbshipit-source-id: 59c05ac85fb1c24449e7394ea04172c855d86420
2024-05-10 12:34:53 -07:00
Jay Huh 1f2715d1d2 AttributeGroup APIs in stress test - PutEntity and GetEntity (#12605)
Summary:
Adding AttributeGroup APIs in stress test. This contains the following changes only. More PRs to follow.

- Introduce `use_attribute_group` flag
- AttributeGroup `PutEntity()` and `GetEntity()` are now used per `use_attribute_group` flag in BatchOps, NonBatchOps and CfConsistency tests

In the next PRs I plan to add
- AttributeGroup `MultiGetEntity()` in Stress Test
- AttributeGroupIterator in Stress Test (along with CoalescingIterator)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12605

Test Plan:
NonBatchOps
```
python3 tools/db_crashtest.py blackbox --simple --max_key=25000000 --write_buffer_size=4194304 --use_attribute_group=1 --use_put_entity_one_in=1
```

BatchOps
```
python3 tools/db_crashtest.py blackbox --test_batches_snapshots=1 --max_key=25000000 --write_buffer_size=4194304 --use_attribute_group=1 --use_put_entity_one_in=1
```

CfConsistency Test
```
python3 tools/db_crashtest.py blackbox --cf_consistency --max_key=25000000 --write_buffer_size=4194304 --use_attribute_group=1 --use_put_entity_one_in=1
```

Reviewed By: ltamasi

Differential Revision: D56916768

Pulled By: jaykorean

fbshipit-source-id: 8555d9e0d05927740a10e4e8301e44beec59a6f5
2024-05-09 16:40:22 -07:00
Hui Xiao 9bddac0dcf Add more public APIs to crash test (#12617)
Summary:
**Context/Summary:**
As titled. Bonus: found that PromoteL0 called concurrently with another PromoteL0 will return a non-OK error, so the API is clarified.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12617

Test Plan: CI

Reviewed By: ajkr

Differential Revision: D56954428

Pulled By: hx235

fbshipit-source-id: 0e056153c515003fd241ffec59b0d8a27529db4c
2024-05-09 15:37:38 -07:00
Andrew Kryczka 933ac0e05c Fix locking for ColumnFamilyOptions::inplace_update_support (#12624)
Summary:
In `SaveValue()`, the read lock needs to be obtained before `VerifyEntryChecksum()` because the KV checksum verification reads the entire value metadata+data, which is all mutable when `ColumnFamilyOptions::inplace_update_support == true`.

In `MemTable::Update()`, the write lock needs to be obtained before mutating the value metadata (changing the value size), because it can be read concurrently.
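An illustrative sketch of the locking rule using a plain `std::shared_mutex` (not the actual MemTable code): the reader locks before touching any value bytes, including for checksum verification, and the writer locks before changing the value size.
```cpp
#include <mutex>
#include <shared_mutex>
#include <string>

struct InplaceSlot {
  std::shared_mutex rwlock;
  std::string value;  // mutable in place when inplace updates are enabled

  std::string VerifyAndRead() {
    std::shared_lock<std::shared_mutex> read_lock(rwlock);  // lock first
    // ... verify the checksum over `value` here, under the lock ...
    return value;
  }

  void UpdateInPlace(const std::string& new_value) {
    std::unique_lock<std::shared_mutex> write_lock(rwlock);  // lock first
    value = new_value;  // the size change only happens under the write lock
  }
};
```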

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12624

Test Plan:
```
$ make COMPILE_WITH_TSAN=1 -j56 db_stress
...
$ python3 tools/db_crashtest.py blackbox --simple --max_key=10 --inplace_update_support=1 --interval=10 --allow_concurrent_memtable_write=0
```

Reviewed By: cbi42

Differential Revision: D57034571

Pulled By: ajkr

fbshipit-source-id: 3dddf881ad87923143acdf6bfec12ce47bb13a48
2024-05-08 08:30:12 -07:00
Changyu Bi 5bf2c00a35 Clarify manual compaction and file ingestion behavior with FIFO compaction (#12618)
Summary:
For manual compaction, FIFO compaction will always skip key range overlap checking with SST files. If CompactRange() is called with CompactRangeOptions::change_level=true, a CF with FIFO compaction will now return Status::NotSupported.

For file ingestion, we will always ingest into L0. Previously, it's possible to ingest files into non-L0 levels with FIFO compaction.

These changes also help to fix [this](a178d15baf/db/db_impl/db_impl_compaction_flush.cc (L1269)) assertion failure in crash tests.
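A hedged sketch of the clarified behavior, assuming a column family configured with FIFO compaction:
```cpp
#include "rocksdb/db.h"
#include "rocksdb/options.h"

rocksdb::Status TryChangeLevelUnderFifo(rocksdb::DB* db) {
  rocksdb::CompactRangeOptions cro;
  cro.change_level = true;
  cro.target_level = 0;
  // With FIFO compaction this is now expected to return
  // Status::NotSupported() instead of proceeding.
  return db->CompactRange(cro, /*begin=*/nullptr, /*end=*/nullptr);
}
```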

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12618

Test Plan: added unit tests to verify the new behavior.

Reviewed By: hx235

Differential Revision: D56962401

Pulled By: cbi42

fbshipit-source-id: 19812a1509650b4162b379ca5bee02f2e9d9569d
2024-05-07 12:00:15 -07:00
Hui Xiao b312dbe920 Remove duplicate inplace_update_support crash/stress test flag (#12610)
Summary:
**Context/Summary:**
As titled. There were two flags serving the same purpose, so one of them was removed.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12610

Test Plan: CI

Reviewed By: jaykorean, ajkr

Differential Revision: D56916119

Pulled By: hx235

fbshipit-source-id: 011140a7945782cc613ca86d4b542db0cf7fb444
2024-05-03 11:33:33 -07:00
Yu Zhang 8b3d9e6bfe Add TimedPut to stress test (#12559)
Summary:
This also updates WriteBatch's protection info to include write time, since there are several places in the memtable that by default protect the whole value slice.

This PR is stacked on https://github.com/facebook/rocksdb/issues/12543

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12559

Reviewed By: pdillinger

Differential Revision: D56308285

Pulled By: jowlyzhang

fbshipit-source-id: 5524339fe0dd6c918dc940ca2f0657b5f2111c56
2024-04-30 15:40:35 -07:00
Hui Xiao 24a35b6e57 Add more public APIs to crash/stress test (#12541)
Summary:
**Context/Summary:**
This PR includes some public DB APIs that are not yet tested in crash/stress but can be added in a straightforward way.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12541

Test Plan:
- Locally run crash test heavily stressing on these new APIs
- CI

Reviewed By: jowlyzhang

Differential Revision: D56164892

Pulled By: hx235

fbshipit-source-id: 8bb568c3e65aec39d642987033f1d76c52f69bd8
2024-04-16 15:43:26 -07:00
Jay Huh dfdc3b158e Add offpeak feature to crash test (#12549)
Summary:
As titled. Add the offpeak feature to the stress test.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12549

Test Plan:
Ran stress test locally with the flag set
```
Running db_stress with pid=701060: ./db_stress ... --daily_offpeak_time_utc=04:00-08:00 ... --periodic_compaction_seconds=10 ...
...
KILLED 701060

stdout:
 Choosing random keys with no overwrite
Creating 6250000 locks
2024/04/16-11:38:19  Initializing db_stress
RocksDB version           : 9.2
Format version            : 5
TransactionDB             : false
Stacked BlobDB            : false
Read only mode            : false
Atomic flush              : false
Manual WAL flush          : true
Column families           : 1
Clear CFs one in          : 0
Number of threads         : 32
Ops per thread            : 100000000
Time to live(sec)         : unused
Read percentage           : 60%
Prefix percentage         : 0%
Write percentage          : 35%
Delete percentage         : 4%
Delete range percentage   : 1%
No overwrite percentage   : 1%
Iterate percentage        : 0%
Custom ops percentage     : 0%
DB-write-buffer-size      : 0
Write-buffer-size         : 4194304
Iterations                : 10
Max key                   : 25000000
Ratio #ops/#keys          : 128.000000
Num times DB reopens      : 0
Batches/snapshots         : 0
Do update in place        : 0
Num keys per lock         : 4
Compression               : LZ4
Bottommost Compression    : DisableOption
Checksum type             : kxxHash
File checksum impl        : none
Bloom bits / key          : 18.000000
Max subcompactions        : 4
Use MultiGet              : false
Use GetEntity             : false
Use MultiGetEntity        : false
Verification only         : false
Memtablerep               : skip_list
Test kill odd             : 0
Periodic Compaction Secs  : 10
Daily Offpeak UTC         : 04:00-08:00  <<<<<<<<<<<<<<< Newly added
Compaction TTL            : 0
Compaction Pri            : kMinOverlappingRatio
Background Purge          : 0
Write DB ID to manifest   : 0
Max Write Batch Group Size: 16
Use dynamic level         : 1
Read fault one in         : 0
Write fault one in        : 1000
Open metadata write fault one in:
                            8
Sync fault injection      : 0
Best efforts recovery     : 0
Fail if OPTIONS file error: 0
User timestamp size bytes : 0
Persist user defined timestamps : 1
WAL compression           : zstd
Try verify sst unique id  : 1
------------------------------------------------
```

Reviewed By: hx235

Differential Revision: D56203102

Pulled By: jaykorean

fbshipit-source-id: 11a9be7362b3b26940d74d41c8bf4ebac3f03a2d
2024-04-16 12:44:44 -07:00
Hui Xiao d41e568b1c Add inplace_update_support to crash/stress test (#12535)
Summary:
**Context/Summary:**
`inplace_update_support=true` is not tested in crash/stress test. Since it's not compatible with snapshots (like compaction_filter), we need to sanitize its value in the presence of snapshot-related options. A minor refactoring is added to centralize such sanitization in db_crashtest.py - see `check_multiget_consistency` and `check_multiget_entity_consistency`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12535

Test Plan: CI

Reviewed By: ajkr

Differential Revision: D56102978

Pulled By: hx235

fbshipit-source-id: 2e2ab6685a65123b14a321b99f45f60bc6509c6b
2024-04-15 16:11:58 -07:00
Hui Xiao ef26d68e8d Re-enable kAdmPolicyThreeQueue in crash test (#12524)
Summary:
Context/Summary:

We need an `nvm_sec_cache` when `kAdmPolicyThreeQueue` is used; otherwise a nullptr cache will be accessed, causing a segfault in https://github.com/facebook/rocksdb/pull/12521

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12524

Test Plan: - Re-enabled `kAdmPolicyThreeQueue` and rehearsed the stress test that failed before this fix and passes after it

Reviewed By: jowlyzhang

Differential Revision: D55997093

Pulled By: hx235

fbshipit-source-id: e1c6f1015091b4cff0ce6a3fff981d5dece52a62
2024-04-11 14:53:11 -07:00
Hui Xiao 72c1376fcf Fix "assertion failed - iter != ROCKSDB_NAMESPACE::OptionsHelper::temperature_to_string.end()" (#12519)
Summary:
Context/Summary: for an unknown reason, calling a db_stress common function in the db_stress flag file for temperature-related flags will cause weird behavior in some compilations/builds.
```
assertion failed - iter != ROCKSDB_NAMESPACE::OptionsHelper::temperature_to_string.end()
```
For now, we decided not to call such functions and instead hard-code their default stress test values.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12519

Test Plan: - Ran a rehearsal stress test with this fix; the weird behavior is gone.

Reviewed By: jowlyzhang

Differential Revision: D55884693

Pulled By: hx235

fbshipit-source-id: ba5135f5b37a9fa686b3ccae8d3f77e62d6562c9
2024-04-08 13:45:41 -07:00
Hui Xiao ad423abbd1 Add more missing options in crash test (#12508)
Summary:
**Context/Summary:**
This is to improve our crash test coverage.

Bonus change:
- Added the missing Options string mapping for `CacheTier::kVolatileCompressedTier`
- Deprecated the crash test option `enable_tiered_storage`, which was mainly for setting `last_level_temperature`; that is now covered in the crash test by itself
- Intensified `verify_checksum_one_in/verify_file_checksums_one_in`, as I found that these, together with the new coverage, surface more issues

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12508

Test Plan: CI to look out for trivial failures

Reviewed By: jowlyzhang

Differential Revision: D55768594

Pulled By: hx235

fbshipit-source-id: 9b829da0309a7db3fcdb17992a524dd64498325c
2024-04-08 09:48:03 -07:00
Hui Xiao d985902ef4 Disallow refitting more than 1 file from non-L0 to L0 (#12481)
Summary:
**Context/Summary:**
We recently discovered that `CompactRange(change_level=true, target_level=0)` can possibly refit more than one file to L0. This refitting can cause a read performance regression (as we need to go through every file in L0), corruption in some edge cases, and false-positive corruption caught by the force consistency check. We decided to explicitly disallow such behavior.

A related change to OptionChangeMigration():
- When migrating to FIFO with `compaction_options_fifo.max_table_files_size > 0`, RocksDB will [CompactRange() all the to-be-migrated data into a couple of L0 files](https://github.com/facebook/rocksdb/blob/main/utilities/option_change_migration/option_change_migration.cc#L164-L169) to avoid dropping all the data when migration finishes and the migrated data is larger than max_table_files_size. This is achieved by first compacting all the data into a couple of non-L0 files and refitting those files from non-L0 to L0 if needed. That way, only some of the data instead of all of it will be dropped immediately after migration to FIFO with a max_table_files_size.
- Since this type of refitting behavior is disallowed from now on, we won't do this trick anymore and explicitly state this risk in the API comment.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12481

Test Plan:
- New UT
- Modified UT

Reviewed By: cbi42

Differential Revision: D55351178

Pulled By: hx235

fbshipit-source-id: 9d8854f2f81d7e8aff859c3a4e53b7d688048e80
2024-03-29 10:52:36 -07:00
Changyu Bi 3d5be596a5 Fix a bug in iterator with UDT + ReadOptions::pin_data (#12451)
Summary:
With https://github.com/facebook/rocksdb/issues/12414 enabling `ReadOptions::pin_data`, this bug surfaced as a corrupted per key-value checksum during the crash test. `saved_key_.GetUserKey()` could be a pinned user key, so DBIter should not overwrite it.

In one case, it only surfaces when the iterator skips many keys of the same user key. To stress that code path, this PR also added `max_sequential_skip_in_iterations` to the crash test.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12451

Test Plan:
- Set ReadOptions::pin_data to true, the bug can be reproed quickly with `./db_stress --persist_user_defined_timestamps=1 --user_timestamp_size=8 --writepercent=35 --delpercent=4 --delrangepercent=1 --iterpercent=20 --nooverwritepercent=1 --prefix_size=8 --prefixpercent=10 --readpercent=30 --memtable_protection_bytes_per_key=8 --block_protection_bytes_per_key=2 --clear_column_family_one_in=0`.
    - Set max_sequential_skip_in_iterations to 1 for the other occurrence of the bug.

Reviewed By: jowlyzhang

Differential Revision: D55003766

Pulled By: cbi42

fbshipit-source-id: 23e1049129456684dafb028b6132b70e0afc07fb
2024-03-18 09:05:11 -07:00
Hui Xiao fa4978c566 Re-suppress tolerable manual compaction stress test failures (#12437)
Summary:
**Context/Summary:**
Previously, manual compaction stress test failures wouldn't terminate the stress test. https://github.com/facebook/rocksdb/pull/12414 made more manual compaction failures terminate the stress test for signal boosting. A downside to that PR: some tolerable manual compaction stress test failures also unnecessarily terminate the stress test.

Ideally we should exclude exactly those tolerable errors (left as a TODO) from being able to terminate. For now, we approximate those errors by Aborted(), InvalidArgument(), NotSupported(), etc. It's still an improvement over the pre-https://github.com/facebook/rocksdb/pull/12414 situation.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12437

Test Plan: - No more tolerable manual compaction stress test failures terminating stress test.

Reviewed By: cbi42

Differential Revision: D54913010

Pulled By: hx235

fbshipit-source-id: c43fa79d8f9c1c8b4f8786f8f46508b0ad619a9e
2024-03-14 14:50:56 -07:00
Hui Xiao 30243c6573 Add missing db crash options (#12414)
Summary:
**Context/Summary:**
We are doing a sweep of all public options, including but not limited to `Options`, `Read/WriteOptions`, `IngestExternalFileOptions`, and cache options, to find and add the uncovered ones to db crash. The options included in this PR require minimum changes to db crash other than adding the options themselves.

A bonus change: to surface new issues via improved coverage of stderr, we decided to fail/terminate the crash test for manual compactions (CompactFiles, CompactRange()) on meaningful errors. See https://github.com/facebook/rocksdb/pull/12414/files#diff-5c4ced6afb6a90e27fec18ab03b2cd89e8f99db87791b4ecc6fa2694284d50c0R2528-R2532, https://github.com/facebook/rocksdb/pull/12414/files#diff-5c4ced6afb6a90e27fec18ab03b2cd89e8f99db87791b4ecc6fa2694284d50c0R2330-R2336 for more.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12414

Test Plan:
- Run `python3 ./tools/db_crashtest.py --simple blackbox` for 10 minutes to ensure no trivial failure
- Run `python3 tools/db_crashtest.py --simple blackbox --compact_files_one_in=1 --compact_range_one_in=1 --read_fault_one_in=1 --write_fault_one_in=1 --interval=50` for a while to ensure the bonus change does not result in trivial crash/termination of stress test

Reviewed By: ajkr, jowlyzhang, cbi42

Differential Revision: D54691774

Pulled By: hx235

fbshipit-source-id: 50443dfb6aaabd8e24c79a2e42b68c6de877be88
2024-03-12 17:24:12 -07:00
Peter Dillinger a53ed91691 Fix/improve temperature handling for file ingestion (#12402)
Summary:
Partly following up on leftovers from https://github.com/facebook/rocksdb/issues/12388

In terms of public API:
* Make it clear that IngestExternalFileArg::file_temperature is just a hint for opening the existing file, though it was previously used for both copy-from temp hint and copy-to temp, which was bizarre.
* Specify how IngestExternalFile assigns temperature to file ingested into DB. (See details in comments.) This approach is not perfect in terms of matching how the DB assigns temperatures, but was the simplest way to get close. The key complication for matching DB temperature assignments is that ingestion files are copied (to a destination temp) before their target level is determined (in general).
* Add a temperature option to SstFileWriter::Open so that files intended for ingestion can be initially written to a chosen temperature.
* Note that "fail_if_not_bottommost_level" is obsolete/confusing use of "bottommost"

In terms of the implementation, there was a similar bit of oddness with the internal CopyFile API, which only took one temperature, ambiguously applicable to the source, destination, or both. This is also fixed.

Eventual suggested follow-up:
* Before copying files for ingestion, determine a tentative level assignment to use for destination temperature, and keep that even if final level assignment happens to be different at commit time (rare).
* More temperature handling for CreateColumnFamilyWithImport and Checkpoints.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12402

Test Plan:
Deeply revamped
ExternalSSTFileBasicTest.IngestWithTemperature to test the new changes. Previously this test was insufficient because it was only looking at temperatures according to the DB manifest. Incorporating FileTemperatureTestFS allows us to also test the temperatures in the storage layer.

Used macros instead of functions for better tracing to critical source location on test failures.

Some enhancements to FileTemperatureTestFS in the process of developing the revamped test.

Reviewed By: jowlyzhang

Differential Revision: D54442794

Pulled By: pdillinger

fbshipit-source-id: 41d9d0afdc073e6a983304c10bbc07c70cc7e995
2024-03-05 16:56:08 -08:00
Peter Dillinger d780e7a561 Remove bottommost_temperature (#12389)
Summary:
This deprecated option was already replaced by `last_level_temperature`. (Keeping recognition of the option in old options files.)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12389

Test Plan: tests updated

Reviewed By: jowlyzhang, cbi42

Differential Revision: D54267946

Pulled By: pdillinger

fbshipit-source-id: 65c49b15e7394829c1f3b44edd4179d2daff6017
2024-02-27 14:48:00 -08:00
Andrew Kryczka 8e29f243c9 No filesystem reads during Merge() writes (#12365)
Summary:
This occasional filesystem read in the write path has caused user pain. It doesn't seem very useful considering it only limits one component's merge chain length, and only helps merge uncached (i.e., infrequently read) values. This PR proposes allowing `max_successive_merges` to be exceeded when the value cannot be read from in-memory components. I included a rollback flag (`strict_max_successive_merges`) just in case.
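A sketch of the options involved, using the option names from this PR (the values are arbitrary):
```cpp
#include "rocksdb/options.h"

rocksdb::Options MakeMergeOptions() {
  rocksdb::Options options;
  // (A merge operator must also be configured for Merge() to be usable.)
  options.max_successive_merges = 100;
  // false (relaxed behavior described above): the limit may be exceeded when
  // the existing merge chain cannot be read from in-memory components, so a
  // Merge() never has to hit the filesystem.
  // true: roll back to the old, strict enforcement.
  options.strict_max_successive_merges = false;
  return options;
}
```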

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12365

Test Plan:
"rocksdb.block.cache.data.add" is number of data blocks read from filesystem. Since the benchmark is write-only, compaction is disabled, and flush doesn't read data blocks, any nonzero value means the user write issued the read.

```
$ for s in false true; do echo -n "strict_max_successive_merges=$s: " && ./db_bench -value_size=64 -write_buffer_size=131072 -writes=128 -num=1 -benchmarks=mergerandom,flush,mergerandom -merge_operator=stringappend -disable_auto_compactions=true -compression_type=none -strict_max_successive_merges=$s -max_successive_merges=100 -statistics=true |& grep 'block.cache.data.add COUNT' ; done
strict_max_successive_merges=false: rocksdb.block.cache.data.add COUNT : 0
strict_max_successive_merges=true: rocksdb.block.cache.data.add COUNT : 1
```

Reviewed By: hx235

Differential Revision: D53982520

Pulled By: ajkr

fbshipit-source-id: e40f761a60bd601f232417ac0058e4a33ee9c0f4
2024-02-21 13:15:27 -08:00
Akanksha Mahajan 956f1dfde3 Change ReadAsync callback API to remove const from FSReadRequest (#11649)
Summary:
Modify the ReadAsync callback API to remove const from FSReadRequest, as const doesn't allow fs_scratch to move ownership.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11649

Test Plan: CircleCI jobs

Reviewed By: anand1976

Differential Revision: D53585309

Pulled By: akankshamahajan15

fbshipit-source-id: 3bff9035db0e6fbbe34721a5963443355807420d
2024-02-16 09:14:55 -08:00
Changyu Bi 42a8e583c9 Print zstd warning to stdout in stress test (#12338)
Summary:
so the stress test does not fail.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12338

Reviewed By: jaykorean

Differential Revision: D53542941

Pulled By: cbi42

fbshipit-source-id: 83b2eb3cb5cc4c5a268da386c22c4aadeb039a74
2024-02-07 14:17:51 -08:00
Peter Dillinger 54cb9c77d9 Prefer static_cast in place of most reinterpret_cast (#12308)
Summary:
The following are risks associated with pointer-to-pointer reinterpret_cast:
* Can produce the "wrong result" (crash or memory corruption). IIRC, in theory this can happen for any up-cast or down-cast for a non-standard-layout type, though in practice would only happen for multiple inheritance cases (where the base class pointer might be "inside" the derived object). We don't use multiple inheritance a lot, but we do.
* Can mask useful compiler errors upon code change, including converting between unrelated pointer types that you are expecting to be related, and converting between pointer and scalar types unintentionally.

I can only think of some obscure cases where static_cast could be troublesome when it compiles as a replacement:
* Going through `void*` could plausibly cause unnecessary or broken pointer arithmetic. Suppose we have
`struct Derived: public Base1, public Base2`.  If we have `Derived*` -> `void*` -> `Base2*` -> `Derived*` through reinterpret casts, this could plausibly work (though technically UB) assuming the `Base2*` is not dereferenced. Changing to static cast could introduce breaking pointer arithmetic (see the sketch after this list).
* Unnecessary (but safe) pointer arithmetic could arise in a case like `Derived*` -> `Base2*` -> `Derived*` where before the Base2 pointer might not have been dereferenced. This could potentially affect performance.
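A small self-contained illustration of the pointer-adjustment difference (not RocksDB code): with multiple inheritance, `static_cast` adjusts the pointer to the `Base2` subobject, while `reinterpret_cast` leaves the address unchanged.
```cpp
#include <cstdio>

struct Base1 { int a = 1; };
struct Base2 { int b = 2; };
struct Derived : public Base1, public Base2 { int c = 3; };

int main() {
  Derived d;
  Base2* via_static = static_cast<Base2*>(&d);            // adjusted to Base2
  Base2* via_reinterpret = reinterpret_cast<Base2*>(&d);  // same address as &d
  std::printf("via_static->b = %d\n", via_static->b);     // prints 2
  // On typical ABIs the two pointers differ, which is exactly the hazard:
  // dereferencing via_reinterpret here would read Base1's bytes instead.
  std::printf("pointers differ: %s\n",
              via_static != via_reinterpret ? "yes" : "no");
  return 0;
}
```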

With some light scripting, I tried replacing pointer-to-pointer reinterpret_casts with static_cast and kept the cases that still compile. Most occurrences of reinterpret_cast have successfully been changed (except for java/ and third-party/). 294 changed, 257 remain.

A couple of related interventions included here:
* Previously Cache::Handle was not actually derived from in the implementations and just used as a `void*` stand-in with reinterpret_cast. Now there is a relationship to allow static_cast. In theory, this could introduce pointer arithmetic (as described above) but is unlikely without multiple inheritance AND non-empty Cache::Handle.
* Remove some unnecessary casts to void* as this is allowed to be implicit (for better or worse).

Most of the remaining reinterpret_casts are for converting to/from raw bytes of objects. We could consider better idioms for these patterns in follow-up work.

I wish there were a way to implement a template variant of static_cast that would only compile if no pointer arithmetic is generated, but best I can tell, this is not possible. AFAIK the best you could do is a dynamic check that the void* conversion after the static cast is unchanged.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12308

Test Plan: existing tests, CI

Reviewed By: ltamasi

Differential Revision: D53204947

Pulled By: pdillinger

fbshipit-source-id: 9de23e618263b0d5b9820f4e15966876888a16e2
2024-02-07 10:44:11 -08:00
Peter Dillinger 76c834e441 Remove 'virtual' when implied by 'override' (#12319)
Summary:
... to follow modern C++ style / idioms.

Used this hack:
```
for FILE in `cat my_list_of_files`; do perl -pi -e 'BEGIN{undef $/;} s/ virtual( [^;{]* override)/$1/smg' $FILE; done
```
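For reference, the kind of change the script produces, shown on a hypothetical class:

```cpp
class FileReader {
 public:
  virtual ~FileReader() = default;
  virtual int Read() = 0;
};

// Before: 'virtual' is repeated even though 'override' already implies it.
class PosixFileReaderBefore : public FileReader {
 public:
  virtual int Read() override { return 0; }
};

// After: 'override' alone both documents and checks that a base method is overridden.
class PosixFileReaderAfter : public FileReader {
 public:
  int Read() override { return 0; }
};
```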

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12319

Test Plan: existing tests, CI

Reviewed By: jaykorean

Differential Revision: D53275303

Pulled By: pdillinger

fbshipit-source-id: bc0881af270aa8ef4d0ae4f44c5a6614b6407377
2024-01-31 13:14:42 -08:00
Akanksha Mahajan 95d582e0cc Enable io_uring in stress test (#12313)
Summary:
Enable io_uring in stress test

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12313

Test Plan: Crash test

Reviewed By: anand1976

Differential Revision: D53238319

Pulled By: akankshamahajan15

fbshipit-source-id: c0c8e6a6479f6977210370606e9d551c1299ba62
2024-01-31 12:37:42 -08:00
Peter Dillinger acf77e1bfe Fix possible crash test segfault in FileExpectedStateManager::Restore() (#12314)
Summary:
`replayer` could be `nullptr` if `!s.ok()` due to an earlier failure. Also consider the status returned from `record->Accept()`.
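A minimal sketch of the defensive pattern; the types and helper below are hypothetical stand-ins, not the actual `FileExpectedStateManager::Restore()` body:

```cpp
#include <memory>

#include "rocksdb/status.h"

using ROCKSDB_NAMESPACE::Status;

// Hypothetical stand-ins; the real types live in the trace replay code.
struct TraceRecord {
  Status Accept() { return Status::OK(); }
};

struct Replayer {
  Status Next(TraceRecord* /*record*/) { return Status::OK(); }
};

// Sketch: skip replay entirely if an earlier step failed (in which case
// `replayer` was never created), and propagate the status returned by
// record->Accept() instead of dropping it.
Status ReplayIfHealthy(Status s, const std::unique_ptr<Replayer>& replayer) {
  if (!s.ok() || replayer == nullptr) {
    return s;  // previously this path could dereference a null replayer
  }
  TraceRecord record;
  s = replayer->Next(&record);
  if (s.ok()) {
    s = record.Accept();  // previously this status was not considered
  }
  return s;
}
```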

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12314

Test Plan: blackbox_crash_test run

Reviewed By: hx235

Differential Revision: D53241506

Pulled By: pdillinger

fbshipit-source-id: fd330417c23391ca819c3ee0f69e4156d81934dc
2024-01-30 16:16:04 -08:00
Jay Huh 0d68aff3a1 StressTest - Move some stderr messages to stdout (#12304)
Summary:
Move some of the messages that we currently print to `stderr` over to `stdout`, so that `stderr` is more strictly reserved for errors.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12304

Test Plan:
```
$> python3 tools/db_crashtest.py blackbox --simple --max_key=25000000 --write_buffer_size=4194304
```

**Before**
```
...
stderr:
WARNING: prefix_size is non-zero but memtablerep != prefix_hash
Error : jewoongh injected test error This is not a real failure.
Verification failed :(
```

**After**

No longer seeing the `WARNING: prefix_size is non-zero but memtablerep != prefix_hash` message in stderr, but still appears in stdout.

```
WARNING: prefix_size is non-zero but memtablerep != prefix_hash
Integrated BlobDB: blob files enabled 0, min blob size 0, blob file size 268435456, blob compression type NoCompression, blob GC enabled 0, cutoff 0.250000, force threshold 1.000000, blob compaction readahead size 0, blob file starting level 0
Integrated BlobDB: blob cache disabled
DB path: [/tmp/jewoongh/rocksdb_crashtest_blackboxzztp281q]
(Re-)verified 0 unique IDs
2024/01/29-11:57:51  Initializing worker threads
Crash-recovery verification passed :)
2024/01/29-11:57:58  Starting database operations
2024/01/29-11:57:58  Starting verification
Stress Test : 245.167 micros/op 10221 ops/sec
            : Wrote 0.00 MB (0.10 MB/sec) (16% of 6 ops)
            : Wrote 1 times
            : Deleted 0 times
            : Single deleted 0 times
            : 4 read and 0 found the key
            : Prefix scanned 0 times
            : Iterator size sum is 0
            : Iterated 3 times
            : Deleted 0 key-ranges
            : Range deletions covered 0 keys
            : Got errors 0 times
            : 0 CompactFiles() succeed
            : 0 CompactFiles() did not succeed

stderr:
Error : jewoongh injected test error This is not a real failure.
Error : jewoongh injected test error This is not a real failure.
Error : jewoongh injected test error This is not a real failure.
Verification failed :(
```

Reviewed By: akankshamahajan15

Differential Revision: D53193587

Pulled By: jaykorean

fbshipit-source-id: 40d59f4c993c5ce043c571a207ccc9b74a0180c6
2024-01-29 12:52:59 -08:00
Peter Dillinger 4e60663b31 Remove unnecessary, confusing 'extern' (#12300)
Summary:
In C++, `extern` is redundant in a number of cases:
* "Global" function declarations and definitions
* "Global" variable definitions when already declared `extern`

For consistency and simplicity, I've removed these in code that *we own*. In a couple of cases, I removed obsolete declarations, and for MagicNumber constants, I have consolidated the declarations into a header file (format.h), as standard best practice would prescribe.
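A quick illustration of the two categories, using made-up names:

```cpp
// Function declarations at namespace scope have external linkage by default,
// so these two declarations are equivalent; the patch drops the 'extern'.
extern int ParseFooter(const char* data, int len);
int ParseFooter(const char* data, int len);

// For const variables, 'extern' is still needed on the header declaration...
extern const unsigned long long kExampleMagicNumber;

// ...but once that declaration is visible, repeating 'extern' on the
// definition adds nothing, so it can be dropped there as well.
const unsigned long long kExampleMagicNumber = 0x0123456789abcdefULL;
```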

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12300

Test Plan: no functional changes, CI

Reviewed By: ajkr

Differential Revision: D53148629

Pulled By: pdillinger

fbshipit-source-id: fb8d927959892e03af09b0c0d542b0a3b38fd886
2024-01-29 10:38:08 -08:00
Hui Xiao 438fc3d9b7 No consistency check when compaction filter is enabled in stress test (#12291)
Summary:
**Context/Summary:**
The [consistency check between MultiGet and Get](d82d179a5e/db_stress_tool/no_batched_ops_stress.cc (L585-L591)) requires the snapshot to be repeatable, that is, no key protected by the snapshot is deleted. However, a compaction filter can remove keys under a snapshot, e.g. [DBStressCompactionFilter](d82d179a5e/db_stress_tool/db_stress_compaction_filter.h (L59)), which makes the consistency check fail meaninglessly. This is noted in https://github.com/facebook/rocksdb/wiki/Compaction-Filter - "Since release 6.0, with compaction filter enabled, RocksDB always invoke filtering for any key, even if it knows it will make a snapshot not repeatable."

This PR makes the consistency check happen only when the compaction filter is not enabled in the stress test (see the sketch below).
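Roughly, the gating looks like this; the names are simplified stand-ins, and the real check lives in `db_stress_tool/no_batched_ops_stress.cc`:

```cpp
#include <string>
#include <vector>

// Simplified stand-in for the stress test's gflag.
struct StressFlags {
  bool enable_compaction_filter = false;
};

// The MultiGet-vs-Get cross-check assumes a repeatable snapshot. A compaction
// filter may drop keys under the snapshot, so in that configuration the check
// is skipped rather than reported as a (meaningless) inconsistency.
bool MultiGetMatchesGet(const StressFlags& flags,
                        const std::vector<std::string>& multiget_results,
                        const std::vector<std::string>& get_results) {
  if (flags.enable_compaction_filter) {
    return true;  // snapshot not repeatable; skip the comparison
  }
  return multiget_results == get_results;
}
```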

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12291

Test Plan:
- Make the stress test command that fails on consistency check pass
```
 ./db_stress --preserve_unverified_changes=1 --acquire_snapshot_one_in=0 --adaptive_readahead=0 --allow_concurrent_memtable_write=0 --allow_data_in_errors=True --async_io=0 --auto_readahead_size=0 --avoid_flush_during_recovery=0 --avoid_unnecessary_blocking_io=1 --backup_max_size=104857600 --backup_one_in=0 --batch_protection_bytes_per_key=0 --block_protection_bytes_per_key=0 --block_size=16384 --bloom_before_level=2147483647 --bloom_bits=30.729729833325962 --bottommost_compression_type=disable --bottommost_file_compaction_delay=0 --bytes_per_sync=0 --cache_index_and_filter_blocks=1 --cache_size=33554432 --cache_type=lru_cache --charge_compression_dictionary_building_buffer=0 --charge_file_metadata=1 --charge_filter_construction=0 --charge_table_reader=1 --checkpoint_one_in=0 --checksum_type=kCRC32c --clear_column_family_one_in=0 --column_families=1 --compact_files_one_in=0 --compact_range_one_in=0 --compaction_pri=4 --compaction_readahead_size=0 --compaction_ttl=0 --compressed_secondary_cache_size=8388608 --compression_checksum=0 --compression_max_dict_buffer_bytes=0 --compression_max_dict_bytes=0 --compression_parallel_threads=1 --compression_type=zlib --compression_use_zstd_dict_trainer=0 --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --data_block_index_type=0 --db=$db --db_write_buffer_size=0 --delpercent=0 --delrangepercent=50 --destroy_db_initially=0 --detect_filter_construct_corruption=1 --disable_wal=0 --enable_compaction_filter=1 --disable_auto_compactions=1 --enable_pipelined_write=0 --enable_thread_tracking=0 --expected_values_dir=$expected --fail_if_options_file_error=1 --fifo_allow_compaction=0 --file_checksum_impl=xxh64 --flush_one_in=0 --format_version=5 --get_current_wal_file_one_in=0 --get_live_files_one_in=0 --get_property_one_in=0 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=3 --index_type=0 --ingest_external_file_one_in=100 --initial_auto_readahead_size=0 --iterpercent=0 --key_len_percent_dist=1,30,69 --level_compaction_dynamic_level_bytes=1 --lock_wal_one_in=0 --long_running_snapshots=0 --manual_wal_flush_one_in=0 --mark_for_compaction_one_file_in=0 --max_auto_readahead_size=0 --max_background_compactions=1 --max_bytes_for_level_base=67108864 --max_key=1000 --max_key_len=3 --max_manifest_file_size=1073741824 --max_write_batch_group_size_bytes=64 --max_write_buffer_number=10 --max_write_buffer_size_to_maintain=2097152 --memtable_max_range_deletions=0 --memtable_prefix_bloom_size_ratio=0.01 --memtable_protection_bytes_per_key=8 --memtable_whole_key_filtering=0 --memtablerep=skip_list --min_write_buffer_number_to_merge=1 --mmap_read=0 --mock_direct_io=True --nooverwritepercent=1 --num_file_reads_for_auto_readahead=1 --open_files=-1 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=0 --open_write_fault_one_in=0 --ops_per_thread=100000 --optimize_filters_for_memory=0 --paranoid_file_checks=0 --partition_filters=0 --partition_pinning=2 --pause_background_one_in=0 --periodic_compaction_seconds=0 --prefix_size=8 --prefixpercent=0 --prepopulate_block_cache=0 --preserve_internal_time_seconds=0 --progress_reports=0 --read_fault_one_in=0 --readahead_size=0 --readpercent=45 --recycle_log_file_num=0 --reopen=0 --secondary_cache_fault_one_in=0 --secondary_cache_uri= --set_options_one_in=0 --snapshot_hold_ops=0 --sst_file_manager_bytes_per_sec=104857600 --sst_file_manager_bytes_per_truncate=1048576 --stats_dump_period_sec=0 --subcompactions=1 --sync=0 --sync_fault_injection=0 --target_file_size_base=167772 
--target_file_size_multiplier=1 --test_batches_snapshots=0 --top_level_index_pinning=0 --unpartitioned_pinning=3 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_full_merge_v1=0 --use_get_entity=0 --use_merge=0 --use_multi_get_entity=0 --use_multiget=1 --use_put_entity_one_in=0 --use_write_buffer_manager=0 --user_timestamp_size=0 --value_size_mult=32 --verification_only=0 --verify_checksum=0 --verify_checksum_one_in=0 --verify_db_one_in=0 --verify_file_checksums_one_in=0 --verify_iterator_with_expected_state_one_in=0 --verify_sst_unique_id_in_manifest=0 --wal_bytes_per_sync=0 --wal_compression=none --write_buffer_size=335544 --write_dbid_to_manifest=1 --write_fault_one_in=0 --writepercent=5
 ```

Reviewed By: jaykorean, ajkr

Differential Revision: D53075223

Pulled By: hx235

fbshipit-source-id: 61aa4a79de5d123a55eb5ac08b449a8362cc91ae
2024-01-25 09:58:25 -08:00
Yu Zhang 9243f1b668 Ensures PendingExpectedValue either Commit or Rollback (#12244)
Summary:
This PR adds automatic checks in the `PendingExpectedValue` class to make sure it's either committed or rolled back before being destructed.
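A minimal sketch of the idea on a simplified class; the real `PendingExpectedValue` tracks the actual expected values rather than a single flag:

```cpp
#include <cassert>

// Simplified model: the destructor asserts that the pending write was either
// committed or rolled back, so a forgotten call fails loudly in tests.
class PendingValue {
 public:
  ~PendingValue() { assert(resolved_); }

  void Commit() {
    // ... apply the pending value to the expected state ...
    resolved_ = true;
  }

  void Rollback() {
    // ... restore the previous expected value ...
    resolved_ = true;
  }

 private:
  bool resolved_ = false;
};

int main() {
  PendingValue pending;
  pending.Commit();  // omitting both Commit() and Rollback() would trip the assert
  return 0;
}
```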

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12244

Reviewed By: hx235

Differential Revision: D52853794

Pulled By: jowlyzhang

fbshipit-source-id: 1dcd7695f2c52b79695be0abe11e861047637dc4
2024-01-24 11:04:40 -08:00
Changyu Bi a29db3048f Fix TestGetEntity failure with UDT (#12264)
Summary:
Use the read option with the right timestamp, and skip verification when using old timestamps.
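For reference, a sketch of issuing a read at a specific user-defined timestamp; the helper and its encoding assumptions (8-byte timestamps) are illustrative:

```cpp
#include <string>

#include "rocksdb/db.h"

using namespace ROCKSDB_NAMESPACE;

// Reads `key` as of the timestamp encoded in `ts_buf`, which must match the
// column family comparator's timestamp size (8 bytes for uint64 timestamps).
Status GetAsOf(DB* db, ColumnFamilyHandle* cf, const Slice& key,
               const std::string& ts_buf, std::string* value) {
  ReadOptions read_opts;
  Slice ts(ts_buf);
  read_opts.timestamp = &ts;  // read at this timestamp rather than the latest
  return db->Get(read_opts, cf, key, value);
}
```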

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12264

Test Plan:
I can repro with a small keyspace:
```
./db_stress --acquire_snapshot_one_in=10000 --adaptive_readahead=0 --allow_data_in_errors=True --async_io=0 --auto_readahead_size=1 --avoid_flush_during_recovery=0 --avoid_unnecessary_blocking_io=0 --backup_max_size=104857600 --backup_one_in=1000 --batch_protection_bytes_per_key=0 --block_protection_bytes_per_key=4 --block_size=16384 --bloom_before_level=7 --bloom_bits=15 --bottommost_compression_type=xpress --bottommost_file_compaction_delay=0 --bytes_per_sync=262144 --cache_index_and_filter_blocks=1 --cache_size=33554432 --cache_type=fixed_hyper_clock_cache --charge_compression_dictionary_building_buffer=1 --charge_file_metadata=0 --charge_filter_construction=1 --charge_table_reader=1 --checkpoint_one_in=10000 --checksum_type=kXXH3 --clear_column_family_one_in=0 --compact_files_one_in=1000000 --compact_range_one_in=1000000 --compaction_pri=4 --compaction_readahead_size=0 --compaction_style=1 --compaction_ttl=0 --compressed_secondary_cache_size=16777216 --compression_checksum=1 --compression_max_dict_buffer_bytes=0 --compression_max_dict_bytes=0 --compression_parallel_threads=1 --compression_type=snappy --compression_use_zstd_dict_trainer=0 --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --data_block_index_type=1 --db_write_buffer_size=0 --delpercent=4 --delrangepercent=1 --destroy_db_initially=1 --detect_filter_construct_corruption=1 --disable_wal=0 --enable_compaction_filter=0 --enable_pipelined_write=1 --enable_thread_tracking=0 --fail_if_options_file_error=1 --fifo_allow_compaction=1 --file_checksum_impl=big --flush_one_in=1000 --format_version=2 --get_current_wal_file_one_in=0 --get_live_files_one_in=10000 --get_property_one_in=100000 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=13 --index_type=0 --ingest_external_file_one_in=0 --initial_auto_readahead_size=16384 --iterpercent=10 --key_len_percent_dist=1,30,69 --level_compaction_dynamic_level_bytes=0 --lock_wal_one_in=10000 --log2_keys_per_lock=10 --long_running_snapshots=0 --manual_wal_flush_one_in=1000 --mark_for_compaction_one_file_in=0 --max_auto_readahead_size=0 --max_background_compactions=20 --max_bytes_for_level_base=10485760 --max_key=1000 --max_key_len=3 --max_manifest_file_size=16384 --max_write_batch_group_size_bytes=1048576 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=2097152 --memtable_max_range_deletions=0 --memtable_prefix_bloom_size_ratio=0.01 --memtable_protection_bytes_per_key=0 --memtable_whole_key_filtering=0 --memtablerep=skip_list --min_write_buffer_number_to_merge=2 --mmap_read=1 --mock_direct_io=False --nooverwritepercent=1 --num_file_reads_for_auto_readahead=1 --open_files=100 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=32 --open_write_fault_one_in=16 --ops_per_thread=200000 --optimize_filters_for_memory=0 --paranoid_file_checks=0 --partition_filters=0 --partition_pinning=2 --pause_background_one_in=1000000 --periodic_compaction_seconds=0 --persist_user_defined_timestamps=1 --prefix_size=8 --prefixpercent=5 --prepopulate_block_cache=1 --preserve_internal_time_seconds=3600 --progress_reports=0 --read_fault_one_in=0 --readahead_size=16384 --readpercent=45 --recycle_log_file_num=1 --reopen=20 --secondary_cache_fault_one_in=32 --secondary_cache_uri= --snapshot_hold_ops=100000 --sst_file_manager_bytes_per_sec=0 --sst_file_manager_bytes_per_truncate=0 --stats_dump_period_sec=0 --subcompactions=3 --sync=0 --sync_fault_injection=0 --target_file_size_base=524288 --target_file_size_multiplier=2 --test_batches_snapshots=0 
--test_cf_consistency=0 --top_level_index_pinning=1 --unpartitioned_pinning=1 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_full_merge_v1=0 --use_get_entity=1 --use_merge=0 --use_multi_get_entity=0 --use_multiget=1 --use_put_entity_one_in=0 --use_txn=0 --use_write_buffer_manager=0 --user_timestamp_size=8 --value_size_mult=32 --verification_only=0 --verify_checksum=1 --verify_checksum_one_in=1000000 --verify_db_one_in=100000 --verify_file_checksums_one_in=100000 --verify_iterator_with_expected_state_one_in=5 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=524288 --wal_compression=zstd --write_buffer_size=4194304 --write_dbid_to_manifest=0 --write_fault_one_in=0 --writepercent=35 --db=/dev/shm/rocksdb_test/rocksdb_crashtest_whitebox --expected_values_dir=/dev/shm/rocksdb_test/rocksdb_crashtest_expected

Errors when run with main:
error : inconsistent values for key 0x00000000000000E5000000000000012B000000000000014D: expected state has the key, GetEntity returns NotFound.

error : inconsistent values for key 0x0000000000000009000000000000012B0000000000000254: GetEntity returns :0x010000000504070609080B0A0D0C0F0E111013121514171619181B1A1D1C1F1E212023222524272629282B2A2D2C2F2E313033323534373639383B3A3D3C3F3E, expected state does not have the key.
```

Reviewed By: jaykorean

Differential Revision: D52966251

Pulled By: cbi42

fbshipit-source-id: 09436a1b747f1ac545140fc83a2fa4555fef51c1
2024-01-22 12:15:17 -08:00
Yu Zhang c4228abdc0 Fix backup/checkpoint stress test failure (#12227)
Summary:
This PR fixes a type of stress test failure that could happen in either checkpoint or backup. Example failure messages look like this:

`Failure in a backup/restore operation with: Corruption: 0x00000000000001D5000000000000012B00000000000000FD exists in original db but not in restore`

`A checkpoint operation failed with: Corruption: 0x0000000000000365000000000000012B0000000000000067 exists in original db but not in checkpoint /...`

The internal task has an example test command to quickly reproduce this type of error.

The common symptom of these test failures is that the expected keys do not exist in the original db either. The root cause is that `TestCheckpoint` and `TestBackupRestore` both use the expected state as a proxy for the state of the original db when it comes to checking a key's existence. 0758271d51/db_stress_tool/db_stress_test_base.cc (L1838)

This `ExpectedState::Exists` API returns true if a key has a pending write, such as a pending put. In the usual case, this pending put should either soon materialize into an actual write when `PendingExpectedValue::Commit` is called to reflect a successful write to the DB, or the test should be safely terminated if the write to the DB fails. All of this happens while the key is locked, so checkpoint and backup usually won't see a discrepancy between the db and the expected state caused by pending writes. However, the external file ingestion test currently has a path that proceeds with the test after a failed ingestion caused by injected errors, leaving the pending put in the expected state. 0758271d51/db_stress_tool/no_batched_ops_stress.cc (L1577-L1589)

I think a proper and future-proof fix for this is to explicitly roll back a pending state when a db write operation fails, so that the expected state does not diverge from the db in the first place. I added a `PendingExpectedValue::Rollback` API so that we don't implicitly depend on thread termination to prevent test failures (see the sketch below). Another place that could cause the same divergence as external file ingestion is `PreloadDbAndReopenAsReadOnly`.
0758271d51/db_stress_tool/db_stress_test_base.cc (L616-L619)
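A rough sketch of the resulting pattern at a write site; the helper is hypothetical, it assumes parameterless `Commit()`/`Rollback()` as described above, and the `PendingExpectedValue` header path is assumed:

```cpp
#include "rocksdb/db.h"

#include "db_stress_tool/expected_state.h"  // assumed location of PendingExpectedValue

using namespace ROCKSDB_NAMESPACE;

// On a successful DB write, commit the pending expected value; on failure,
// roll it back so the expected state never silently diverges from the DB,
// which is what backup/checkpoint verification was tripping over.
Status PutWithExpectedState(DB* db, const WriteOptions& write_opts,
                            const Slice& key, const Slice& value,
                            PendingExpectedValue& pending) {
  Status s = db->Put(write_opts, key, value);
  if (s.ok()) {
    pending.Commit();    // expected state now reflects the real DB state
  } else {
    pending.Rollback();  // undo the pending put instead of leaving it dangling
  }
  return s;
}
```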

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12227

Reviewed By: hx235

Differential Revision: D52705470

Pulled By: jowlyzhang

fbshipit-source-id: b21586b037caeeba29a2cff8c2fdc6f1d0bda9cf
2024-01-16 13:28:51 -08:00