Commit Graph

1348 Commits

Author SHA1 Message Date
Jay Zhuang c6d326d3d7 Track SST unique id in MANIFEST and verify (#9990)
Summary:
Start tracking SST unique id in MANIFEST, which is used to verify with
SST properties to make sure the SST file is not overwritten or
misplaced. A DB option `try_verify_sst_unique_id` is introduced to
enable/disable the verification, if enabled, it opens all SST files
during DB-open to read the unique_id from table properties (default is
false), so it's recommended to use it with `max_open_files = -1` to
pre-open the files.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9990

Test Plan: unittests, format-compatible test, mini-crash

Reviewed By: anand1976

Differential Revision: D36381863

Pulled By: jay-zhuang

fbshipit-source-id: 89ea2eb6b35ed3e80ead9c724eb096083eaba63f
2022-05-19 11:04:21 -07:00
Hui Xiao 3573558ec5 Rewrite memory-charging feature's option API (#9926)
Summary:
**Context:**
Previous PR https://github.com/facebook/rocksdb/pull/9748, https://github.com/facebook/rocksdb/pull/9073, https://github.com/facebook/rocksdb/pull/8428 added separate flag for each charged memory area. Such API design is not scalable as we charge more and more memory areas. Also, we foresee an opportunity to consolidate this feature with other cache usage related features such as `cache_index_and_filter_blocks` using `CacheEntryRole`.

Therefore we decided to consolidate all these flags with `CacheUsageOptions cache_usage_options` and this PR serves as the first step by consolidating memory-charging related flags.

**Summary:**
- Replaced old API reference with new ones, including making `kCompressionDictionaryBuildingBuffer` opt-out and added a unit test for that
- Added missing db bench/stress test for some memory charging features
- Renamed related test suite to indicate they are under the same theme of memory charging
- Refactored a commonly used mocked cache component in memory charging related tests to reduce code duplication
- Replaced the phrases "memory tracking" / "cache reservation" (other than CacheReservationManager-related ones) with "memory charging" for standard description of this feature.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9926

Test Plan:
- New unit test for opt-out `kCompressionDictionaryBuildingBuffer` `TEST_F(ChargeCompressionDictionaryBuildingBufferTest, Basic)`
- New unit test for option validation/sanitization `TEST_F(CacheUsageOptionsOverridesTest, SanitizeAndValidateOptions)`
- CI
- db bench (in case querying new options introduces regression) **+0.5% micros/op**: `TEST_TMPDIR=/dev/shm/testdb ./db_bench -benchmarks=fillseq -db=$TEST_TMPDIR  -charge_compression_dictionary_building_buffer=1(remove this for comparison)  -compression_max_dict_bytes=10000 -disable_auto_compactions=1 -write_buffer_size=100000 -num=4000000 | egrep 'fillseq'`

#-run | (pre-PR) avg micros/op | std micros/op | (post-PR)  micros/op | std micros/op | change (%)
-- | -- | -- | -- | -- | --
10 | 3.9711 | 0.264408 | 3.9914 | 0.254563 | 0.5111933721
20 | 3.83905 | 0.0664488 | 3.8251 | 0.0695456 | **-0.3633711465**
40 | 3.86625 | 0.136669 | 3.8867 | 0.143765 | **0.5289363078**

- db_stress: `python3 tools/db_crashtest.py blackbox  -charge_compression_dictionary_building_buffer=1 -charge_filter_construction=1 -charge_table_reader=1 -cache_size=1` killed as normal

Reviewed By: ajkr

Differential Revision: D36054712

Pulled By: hx235

fbshipit-source-id: d406e90f5e0c5ea4dbcb585a484ad9302d4302af
2022-05-17 15:01:51 -07:00
Yanqin Jin f6d9730ea1 Fix stress test with best-efforts-recovery (#9986)
Summary:
This PR

- since we are testing with disable_wal = true and best_efforts_recovery, we should set column family count to 1, due to the requirement of `ExpectedState` tracking and replaying logic.
- during backup and checkpoint restore, disable best-efforts-recovery. This does not matter now because db_crashtest.py always disables wal when testing best-efforts-recovery. In the future, if we enable wal, then not setting `restore_opitions.best_efforts_recovery` will cause backup db not to recover the WALs, and differ from db (that enables WAL).
- during verification of backup and checkpoint restore, print the key where inconsistency exists between expected state and db.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9986

Test Plan: TEST_TMPDIR=/dev/shm/rocksdb make crash_test_with_best_efforts_recovery

Reviewed By: siying

Differential Revision: D36353105

Pulled By: riversand963

fbshipit-source-id: a484da161273e6216a1f7e245bac15a349693917
2022-05-13 12:29:20 -07:00
Andrew Kryczka e943bbdd2f Temporarily disable sync_fault_injection (#9979)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9979

Reviewed By: siying

Differential Revision: D36301555

Pulled By: ajkr

fbshipit-source-id: ed298d3484b6aad3ef19746e984bf4c52be33a9f
2022-05-11 12:19:07 -07:00
yaphet 26768edb65 Support single delete in ldb (#9469)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9469

Reviewed By: riversand963

Differential Revision: D33953484

fbshipit-source-id: f4e84a2d9865957d744c7e84ff02ffbb0a62b0a8
2022-05-10 16:37:19 -07:00
Peter Dillinger c5c58708db Fix format_compatible blowing away its TEST_TMPDIR (#9970)
Summary:
https://github.com/facebook/rocksdb/issues/9961 broke format_compatible check because of `make clean`
referencing TEST_TMPDIR. The Makefile behavior seems reasonable to me,
so here's a fix in check_format_compatible.sh

Apparently I also included removing a redundant part of our CircleCI config.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9970

Test Plan: manual run: SHORT_TEST=1 ./tools/check_format_compatible.sh

Reviewed By: riversand963

Differential Revision: D36258172

Pulled By: pdillinger

fbshipit-source-id: d46507f04614e888b414ff23b88d040ae2b5c294
2022-05-09 13:38:46 -07:00
sdong 736a7b5433 Remove own ToString() (#9955)
Summary:
ToString() is created as some platform doesn't support std::to_string(). However, we've already used std::to_string() by mistake for 16 months (in db/db_info_dumper.cc). This commit just remove ToString().

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9955

Test Plan: Watch CI tests

Reviewed By: riversand963

Differential Revision: D36176799

fbshipit-source-id: bdb6dcd0e3a3ab96a1ac810f5d0188f684064471
2022-05-06 13:03:58 -07:00
Andrew Kryczka 62d84e2a2b db_stress fault injection in release mode (#9957)
Summary:
Previously all fault injection was ignored in release mode. This PR adds it back except for read fault injection (`--read_fault_one_in > 0`) since its dependency (`IGNORE_STATUS_IF_ERROR`) is unavailable in release mode.

Other notable changes include:

- Moved `EnableWriteErrorInjection()` for `--write_fault_one_in > 0` so it's after `DB::Open()` without depending on `SyncPoint`
- Made `--read_fault_one_in > 0` return an error in release mode
- Updated `db_crashtest.py` to always set `--read_fault_one_in=0` in release mode

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9957

Test Plan:
```
$ DEBUG_LEVEL=0 make -j24 db_stress
$ DEBUG_LEVEL=0 TEST_TMPDIR=/dev/shm python3 tools/db_crashtest.py blackbox
```

Reviewed By: anand1976

Differential Revision: D36193830

Pulled By: ajkr

fbshipit-source-id: 0b97946b4e3f06e3e0f6e7833c2763da08ec5321
2022-05-06 11:17:08 -07:00
Andrew Kryczka a62506aee2 Enable unsynced data loss in crash test (#9947)
Summary:
`db_stress` already tracks expected state history to verify prefix-recoverability when `sync_fault_injection` is enabled. This PR enables `sync_fault_injection` in `db_crashtest.py`.

Previously enabling `sync_fault_injection` would cause whole unsynced files to be dropped. This PR adds a more interesting case of losing only the tail of unsynced data by implementing `TestFSWritableFile::RangeSync()` and enabling `{wal_,}bytes_per_sync`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9947

Test Plan:
- regular blackbox, blackbox --simple
- various commands to stress this new case, such as `TEST_TMPDIR=/dev/shm python3 tools/db_crashtest.py blackbox --max_key=100000 --write_buffer_size=2097152 --avoid_flush_during_recovery=1 --disable_wal=0 --interval=10 --db_write_buffer_size=0 --sync_fault_injection=1 --wal_compression=none --delpercent=0 --delrangepercent=0 --prefixpercent=0 --iterpercent=0 --writepercent=100 --readpercent=0 --wal_bytes_per_sync=131072 --duration=36000 --sync=0 --open_write_fault_one_in=16`

Reviewed By: riversand963

Differential Revision: D36152775

Pulled By: ajkr

fbshipit-source-id: 44b68a7fad0a4cf74af9fe1f39be01baab8141d8
2022-05-05 13:21:03 -07:00
sdong 49628c9a83 Use std::numeric_limits<> (#9954)
Summary:
Right now we still don't fully use std::numeric_limits but use a macro, mainly for supporting VS 2013. Right now we only support VS 2017 and up so it is not a problem. The code comment claims that MinGW still needs it. We don't have a CI running MinGW so it's hard to validate. since we now require C++17, it's hard to imagine MinGW would still build RocksDB but doesn't support std::numeric_limits<>.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9954

Test Plan: See CI Runs.

Reviewed By: riversand963

Differential Revision: D36173954

fbshipit-source-id: a35a73af17cdcae20e258cdef57fcf29a50b49e0
2022-05-05 13:08:21 -07:00
Mark Callaghan bf68d1c93d Print elapsed time and number of operations completed (#9886)
Summary:
This is inspired by debugging a regression test that runs for ~0.05 seconds and the short
running time makes it prone to variance. While db_bench ran for ~60 seconds, 59.95 seconds
was spent opening 128 databases (and doing recovery). So it was harder to notice that the
benchmark only ran for 0.05 seconds.

Normally I add output to the end of the line to make life easier for existing tools that parse it
but in this case the output near the end of the line has two optional parts and one of the optional
parts adds an extra newline.

This is for https://github.com/facebook/rocksdb/issues/9856

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9886

Test Plan:
./db_bench --benchmarks=overwrite,readrandom --num=1000000 --threads=4

old output:
 DB path: [/tmp/rocksdbtest-2260/dbbench]
 overwrite    :      14.108 micros/op 283338 ops/sec;   31.3 MB/s
 DB path: [/tmp/rocksdbtest-2260/dbbench]
 readrandom   :       7.994 micros/op 496788 ops/sec;   55.0 MB/s (1000000 of 1000000 found)

new output:
 DB path: [/tmp/rocksdbtest-2260/dbbench]
 overwrite    :      14.117 micros/op 282862 ops/sec 14.141 seconds 4000000 operations;   31.3 MB/s
 DB path: [/tmp/rocksdbtest-2260/dbbench]
 readrandom   :       8.649 micros/op 458475 ops/sec 8.725 seconds 4000000 operations;   49.8 MB/s (981548 of 1000000 found)

Reviewed By: ajkr

Differential Revision: D36102269

Pulled By: mdcallag

fbshipit-source-id: 5cd8a9e11f5cbe2a46809571afd83335b6b0caa0
2022-05-04 10:15:49 -07:00
Jay Zhuang 270179bb12 Default `try_load_options` to true when DB is specified (#9937)
Summary:
If the DB path is specified, the user would expect ldb loads the
options from the path, but it's not:
```
$ ldb list_live_files_metadata --db=`pwd`
```
Default `try_load_options` to true in that case. The user can still
disable that by:
```
$ ldb list_live_files_metadata --db=`pwd` --try_load_options=false
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9937

Test Plan:
`ldb list_live_files_metadata --db=`pwd`` is able to work for
a db generated with different options.num_levels.

Reviewed By: ajkr

Differential Revision: D36106708

Pulled By: jay-zhuang

fbshipit-source-id: 2732fdc027a4d172436b2c9b6a9787b56b10c710
2022-05-04 08:49:46 -07:00
Mark Callaghan b6ec3328af Make --benchmarks=flush flush the default column family (#9887)
Summary:
db_bench --benchmarks=flush wasn't flushing the default column family.

This is for https://github.com/facebook/rocksdb/issues/9880

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9887

Test Plan:
Confirm that flush works (*.log is empty) when "flush" added to benchmark list
Confirm that *.log is not empty otherwise.

Repeat for all combinations for: uses column families, uses multiple databases

./db_bench --benchmarks=overwrite --num=10000
ls -lrt /tmp/rocksdbtest-2260/dbbench/*.log
-rw-r--r-- 1 me users 1380286 Apr 21 10:47 /tmp/rocksdbtest-2260/dbbench/000004.log

./db_bench --benchmarks=overwrite,flush --num=10000
ls -lrt /tmp/rocksdbtest-2260/dbbench/*.log
 -rw-r--r-- 1 me users 0 Apr 21 10:48 /tmp/rocksdbtest-2260/dbbench/000008.log

./db_bench --benchmarks=overwrite --num=10000 --num_column_families=4
ls -lrt /tmp/rocksdbtest-2260/dbbench/*.log
  -rw-r--r-- 1 me users 1387823 Apr 21 10:49 /tmp/rocksdbtest-2260/dbbench/000004.log

./db_bench --benchmarks=overwrite,flush --num=10000 --num_column_families=4
ls -lrt /tmp/rocksdbtest-2260/dbbench/*.log
-rw-r--r-- 1 me users 0 Apr 21 10:51 /tmp/rocksdbtest-2260/dbbench/000014.log

./db_bench --benchmarks=overwrite --num=10000 --num_multi_db=2
ls -lrt /tmp/rocksdbtest-2260/dbbench/[01]/*.log
 -rw-r--r-- 1 me users 1380838 Apr 21 10:55 /tmp/rocksdbtest-2260/dbbench/0/000004.log
 -rw-r--r-- 1 me users 1379734 Apr 21 10:55 /tmp/rocksdbtest-2260/dbbench/1/000004.log

./db_bench --benchmarks=overwrite,flush --num=10000 --num_multi_db=2
ls -lrt /tmp/rocksdbtest-2260/dbbench/[01]/*.log
-rw-r--r-- 1 me users 0 Apr 21 10:57 /tmp/rocksdbtest-2260/dbbench/0/000013.log
-rw-r--r-- 1 me users 0 Apr 21 10:57 /tmp/rocksdbtest-2260/dbbench/1/000013.log

./db_bench --benchmarks=overwrite --num=10000 --num_column_families=4 --num_multi_db=2
ls -lrt /tmp/rocksdbtest-2260/dbbench/[01]/*.log
-rw-r--r-- 1 me users 1395108 Apr 21 10:52 /tmp/rocksdbtest-2260/dbbench/1/000004.log
-rw-r--r-- 1 me users 1380411 Apr 21 10:52 /tmp/rocksdbtest-2260/dbbench/0/000004.log

./db_bench --benchmarks=overwrite,flush --num=10000 --num_column_families=4 --num_multi_db=2
ls -lrt /tmp/rocksdbtest-2260/dbbench/[01]/*.log
-rw-r--r-- 1 me users 0 Apr 21 10:54 /tmp/rocksdbtest-2260/dbbench/0/000022.log
-rw-r--r-- 1 me users 0 Apr 21 10:54 /tmp/rocksdbtest-2260/dbbench/1/000022.log

Reviewed By: ajkr

Differential Revision: D36026777

Pulled By: mdcallag

fbshipit-source-id: d42d3d7efceea7b9a25bbbc0f04461d2b7301122
2022-05-03 09:37:49 -07:00
Yanqin Jin 06394ff4e7 Fix a bug of CompactionIterator/CompactionFilter using `Delete` (#9929)
Summary:
When compaction filter determines that a key should be removed, it updates the internal key's type
to `Delete`. If this internal key is preserved in current compaction but seen by a later compaction
together with `SingleDelete`, it will cause compaction iterator to return Corruption.

To fix the issue, compaction filter should return more information in addition to the intention of removing
a key. Therefore, we add a new `kRemoveWithSingleDelete` to `CompactionFilter::Decision`. Seeing
`kRemoveWithSingleDelete`, compaction iterator will update the op type of the internal key to `kTypeSingleDelete`.

In addition, I updated db_stress_shared_state.[cc|h] so that `no_overwrite_ids_` becomes `const`. It is easier to
reason about thread-safety if accessed from multiple threads. This information is passed to `PrepareTxnDBOptions()`
when calling from `Open()` so that we can set up the rollback deletion type callback for transactions.

Finally, disable compaction filter for multiops_txn because the key removal logic of `DbStressCompactionFilter` does
not quite work with `MultiOpsTxnsStressTest`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9929

Test Plan:
make check
make crash_test
make crash_test_with_txn

Reviewed By: anand1976

Differential Revision: D36069678

Pulled By: riversand963

fbshipit-source-id: cedd2f1ba958af59ad3916f1ba6f424307955f92
2022-05-02 13:25:45 -07:00
Anvesh Komuravelli aafb377bb5 Update protection info on recovered logs data (#9875)
Summary:
Update protection info on recovered logs data

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9875

Test Plan:
- Benchmark setup: `TEST_TMPDIR=/dev/shm/100MB_WAL_DB/ ./db_bench -benchmarks=fillrandom -write_buffer_size=1048576000`
- Benchmark command: `TEST_TMPDIR=/dev/shm/100MB_WAL_DB/ /usr/bin/time ./db_bench -use_existing_db=true -benchmarks=overwrite -write_buffer_size=1048576000 -writes=1 -report_open_timing=true`
- Results before this PR
```
OpenDb:     2350.14 milliseconds
OpenDb:     2296.94 milliseconds
OpenDb:     2184.29 milliseconds
OpenDb:     2167.59 milliseconds
OpenDb:     2231.24 milliseconds
OpenDb:     2109.57 milliseconds
OpenDb:     2197.71 milliseconds
OpenDb:     2120.8 milliseconds
OpenDb:     2148.12 milliseconds
OpenDb:     2207.95 milliseconds
```
- Results after this PR
```
OpenDb:     2424.52 milliseconds
OpenDb:     2359.84 milliseconds
OpenDb:     2317.68 milliseconds
OpenDb:     2339.4 milliseconds
OpenDb:     2325.36 milliseconds
OpenDb:     2321.06 milliseconds
OpenDb:     2353.98 milliseconds
OpenDb:     2344.64 milliseconds
OpenDb:     2384.09 milliseconds
OpenDb:     2428.58 milliseconds
```

Mean regressed 7.2% (2201.4 -> 2359.9)

Reviewed By: ajkr

Differential Revision: D36012787

Pulled By: akomurav

fbshipit-source-id: d2aba09f29c6beb2fd0fe8e1e359be910b4ef02a
2022-04-28 14:42:00 -07:00
Yanqin Jin 94e245a14d Improve stress test for MultiOpsTxnsStressTest (#9829)
Summary:
Adds more coverage to `MultiOpsTxnsStressTest` with a focus on write-prepared transactions.

1. Add a hack to manually evict commit cache entries. We currently cannot assign small values to `wp_commit_cache_bits` because it requires a prepared transaction to commit within a certain range of sequence numbers, otherwise it will throw.
2. Add coverage for commit-time-write-batch. If write policy is write-prepared, we need to set `use_only_the_last_commit_time_batch_for_recovery` to true.
3. After each flush/compaction, verify data consistency. This is possible since data size can be small: default numbers of primary/secondary keys are just 1000.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9829

Test Plan:
```
TEST_TMPDIR=/dev/shm/rocksdb_crashtest_blackbox/ make blackbox_crash_test_with_multiops_wp_txn
```

Reviewed By: pdillinger

Differential Revision: D35806678

Pulled By: riversand963

fbshipit-source-id: d7fde7a29fda0fb481a61f553e0ca0c47da93616
2022-04-27 17:50:54 -07:00
Jaromir Vanek fb9a167a55 Add 95% confidence intervals to db_bench output (#9882)
Summary:
Enhancing `db_bench` output with 95% statistical confidence intervals for better performance evaluation. The goal is to unambiguously separate random variance when running benchmark over multiple iterations.

Output enhanced with confidence intervals exposed in brackets:

```
$ ./db_bench --benchmarks=fillseq[-X10]

Running benchmark for 10 times
fillseq      :       4.961 micros/op 201578 ops/sec;   22.3 MB/s
fillseq      :       5.030 micros/op 198824 ops/sec;   22.0 MB/s
fillseq [AVG 2 runs] : 200201 (± 2698) ops/sec;   22.1 (± 0.3) MB/sec
fillseq      :       4.963 micros/op 201471 ops/sec;   22.3 MB/s
fillseq [AVG 3 runs] : 200624 (± 1765) ops/sec;   22.2 (± 0.2) MB/sec
fillseq      :       5.035 micros/op 198625 ops/sec;   22.0 MB/s
fillseq [AVG 4 runs] : 200124 (± 1586) ops/sec;   22.1 (± 0.2) MB/sec
fillseq      :       4.979 micros/op 200861 ops/sec;   22.2 MB/s
fillseq [AVG 5 runs] : 200272 (± 1262) ops/sec;   22.2 (± 0.1) MB/sec
fillseq      :       4.893 micros/op 204367 ops/sec;   22.6 MB/s
fillseq [AVG 6 runs] : 200954 (± 1688) ops/sec;   22.2 (± 0.2) MB/sec
fillseq      :       4.914 micros/op 203502 ops/sec;   22.5 MB/s
fillseq [AVG 7 runs] : 201318 (± 1595) ops/sec;   22.3 (± 0.2) MB/sec
fillseq      :       4.998 micros/op 200074 ops/sec;   22.1 MB/s
fillseq [AVG 8 runs] : 201163 (± 1415) ops/sec;   22.3 (± 0.2) MB/sec
fillseq      :       4.946 micros/op 202188 ops/sec;   22.4 MB/s
fillseq [AVG 9 runs] : 201277 (± 1267) ops/sec;   22.3 (± 0.1) MB/sec
fillseq      :       5.093 micros/op 196331 ops/sec;   21.7 MB/s
fillseq [AVG 10 runs] : 200782 (± 1491) ops/sec;   22.2 (± 0.2) MB/sec
fillseq [AVG    10 runs] : 200782 (± 1491) ops/sec;   22.2 (± 0.2) MB/sec
fillseq [MEDIAN 10 runs] : 201166 ops/sec;   22.3 MB/s
```

For more explicit interval representation, use `--confidence_interval_only` flag:

```
$ ./db_bench --benchmarks=fillseq[-X10] --confidence_interval_only

Running benchmark for 10 times
fillseq      :       4.935 micros/op 202648 ops/sec;   22.4 MB/s
fillseq      :       5.078 micros/op 196943 ops/sec;   21.8 MB/s
fillseq [CI95 2 runs] : (194205, 205385) ops/sec; (21.5, 22.7) MB/sec
fillseq      :       5.159 micros/op 193816 ops/sec;   21.4 MB/s
fillseq [CI95 3 runs] : (192735, 202869) ops/sec; (21.3, 22.4) MB/sec
fillseq      :       4.947 micros/op 202158 ops/sec;   22.4 MB/s
fillseq [CI95 4 runs] : (194721, 203061) ops/sec; (21.5, 22.5) MB/sec
fillseq      :       4.908 micros/op 203756 ops/sec;   22.5 MB/s
fillseq [CI95 5 runs] : (196113, 203615) ops/sec; (21.7, 22.5) MB/sec
fillseq      :       5.063 micros/op 197528 ops/sec;   21.9 MB/s
fillseq [CI95 6 runs] : (196319, 202631) ops/sec; (21.7, 22.4) MB/sec
fillseq      :       5.214 micros/op 191799 ops/sec;   21.2 MB/s
fillseq [CI95 7 runs] : (194953, 201803) ops/sec; (21.6, 22.3) MB/sec
fillseq      :       5.260 micros/op 190095 ops/sec;   21.0 MB/s
fillseq [CI95 8 runs] : (193749, 200937) ops/sec; (21.4, 22.2) MB/sec
fillseq      :       5.076 micros/op 196992 ops/sec;   21.8 MB/s
fillseq [CI95 9 runs] : (194134, 200474) ops/sec; (21.5, 22.2) MB/sec
fillseq      :       5.388 micros/op 185603 ops/sec;   20.5 MB/s
fillseq [CI95 10 runs] : (192487, 199781) ops/sec; (21.3, 22.1) MB/sec
fillseq [AVG    10 runs] : 196134 (± 3647) ops/sec;   21.7 (± 0.4) MB/sec
fillseq [MEDIAN 10 runs] : 196968 ops/sec;   21.8 MB/sec
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9882

Reviewed By: pdillinger

Differential Revision: D35796148

Pulled By: vanekjar

fbshipit-source-id: 8313712d16728ff982b8aff28195ee56622385b8
2022-04-25 14:49:54 -07:00
yuzhangyu ac29645743 Add blob dump support to the dump_live_files command (#9896)
Summary:
This patch completes the second part of the task: "Add blob support to the dump and dump_live_files command"

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9896

Reviewed By: ltamasi

Differential Revision: D35852667

Pulled By: jowlyzhang

fbshipit-source-id: a006456c881f468a92da689e895134762e9574e1
2022-04-22 16:54:43 -07:00
yuzhangyu fff28a7725 Add blob dump support to the dump command (#9881)
Summary:
This patch is the first part of adding blob dump support. It only adds blob dump support to the dump command. A follow up patch will add blob dump support to the dump_live_files command.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9881

Reviewed By: ltamasi

Differential Revision: D35796731

Pulled By: jowlyzhang

fbshipit-source-id: 2cc5973b222d505a331ac7b969edcf992b47c5ee
2022-04-21 20:37:07 -07:00
Jay Zhuang 2ea4205a69 Add 7.2 to compatible check (#9858)
Summary:
Add 7.2 to compatible check (should change it with version update).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9858

Reviewed By: riversand963

Differential Revision: D35722897

Pulled By: jay-zhuang

fbshipit-source-id: 08c782b9344599d7296543eb0c61afcd9a869a1a
2022-04-20 11:34:20 -07:00
yuzhangyu 9b5790f018 Add --decode_blob_index option to idump and dump commands (#9870)
Summary:
This patch completes the first part of the task: "Extend all three commands so they can decode and print blob references if a new option --decode_blob_index is specified"

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9870

Reviewed By: ltamasi

Differential Revision: D35753932

Pulled By: jowlyzhang

fbshipit-source-id: 9d2bbba0eef2ed86b982767eba9de1b4881f35c9
2022-04-20 11:10:20 -07:00
Andrew Kryczka 690f1edf37 Avoid overwriting OPTIONS file settings in db_bench (#9862)
Summary:
`InitializeOptionsGeneral()` was overwriting many options that were already configured by OPTIONS file, potentially with the flag default values. This PR changes that function to only overwrite options in limited scenarios, as described at the top of its definition. Block cache is still a violation.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9862

Test Plan: ran under various scenarios (multi-DB, single DB, OPTIONS file, flags) and verified options are set as expected

Reviewed By: jay-zhuang

Differential Revision: D35736960

Pulled By: ajkr

fbshipit-source-id: 75b77740af37e6f5741618f8a8f5685df2417d03
2022-04-18 23:46:16 -07:00
Peter Dillinger 41237dd306 Add "no compression" job to CircleCI (#9850)
Summary:
Since they operate at distinct abstraction layers, I thought it
was prudent to combine with EncryptedEnv CI test for each PR, for efficiency
in testing. Also added supported compressions to sst_dump --help output
so that CI job can verify no compiled-in compression support.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9850

Test Plan: CI, some manual stuff

Reviewed By: riversand963

Differential Revision: D35682346

Pulled By: pdillinger

fbshipit-source-id: be9879c1533fed304ee32c89fd9ba4b07c2b90cc
2022-04-18 12:47:16 -07:00
yuzhangyu 082eb04200 Add option --decode_blob_index to dump_live_files command (#9842)
Summary:
This change only add decode blob index support to dump_live_files command, which is part of a task to add blob support to a few commands.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9842

Reviewed By: ltamasi

Differential Revision: D35650167

Pulled By: jowlyzhang

fbshipit-source-id: a78151b98bc38ac6f52c6e01ca6927a3429ddd14
2022-04-15 09:04:04 -07:00
gitbw95 f241d082b6 Prevent double caching in the compressed secondary cache (#9747)
Summary:
###  **Summary:**
When both LRU Cache and CompressedSecondaryCache are configured together, there possibly are some data blocks double cached.

**Changes include:**
1. Update IS_PROMOTED to IS_IN_SECONDARY_CACHE to prevent confusions.
2. This PR updates SecondaryCacheResultHandle and use IsErasedFromSecondaryCache to determine whether the handle is erased in the secondary cache. Then, the caller can determine whether to SetIsInSecondaryCache().
3. Rename LRUSecondaryCache to CompressedSecondaryCache.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9747

Test Plan:
**Test Scripts:**
1. Populate a DB. The on disk footprint is 482 MB. The data is set to be 50% compressible, so the total decompressed size is expected to be 964 MB.
./db_bench --benchmarks=fillrandom --num=10000000 -db=/db_bench_1

2. overwrite it to a stable state:
./db_bench --benchmarks=overwrite,stats --num=10000000 -use_existing_db -duration=10 --benchmark_write_rate_limit=2000000 -db=/db_bench_1

4. Run read tests with diffeernt cache setting:

T1:
./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=520000000  --statistics -db=/db_bench_1

T2:
./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=320000000 -compressed_secondary_cache_size=400000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1

T3:
./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=520000000 -compressed_secondary_cache_size=400000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1

T4:
./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=20000000 -compressed_secondary_cache_size=500000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1

**Before this PR**
| Cache Size | Compressed Secondary Cache Size | Cache Hit Rate |
|------------|-------------------------------------|----------------|
|520 MB | 0 MB | 85.5% |
|320 MB | 400 MB | 96.2% |
|520 MB | 400 MB | 98.3% |
|20 MB | 500 MB | 98.8% |

**Before this PR**
| Cache Size | Compressed Secondary Cache Size | Cache Hit Rate |
|------------|-------------------------------------|----------------|
|520 MB | 0 MB | 85.5% |
|320 MB | 400 MB | 99.9% |
|520 MB | 400 MB | 99.9% |
|20 MB | 500 MB | 99.2% |

Reviewed By: anand1976

Differential Revision: D35117499

Pulled By: gitbw95

fbshipit-source-id: ea2657749fc13efebe91a8a1b56bc61d6a224a12
2022-04-11 13:28:33 -07:00
Duncan Bellamy 25e31d1a94 tools/db_bench_tool.cc use uint64_t instead of size_t (#9800)
Summary:
to fix compilation for 32bit

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9800

Reviewed By: riversand963

Differential Revision: D35404447

fbshipit-source-id: 6a1185bb38f3a718357aa120e3b26a1ea77f023d
2022-04-08 13:29:19 -07:00
anand76 c3d7e16252 Add WAL compression to stress tests (#9811)
Summary:
Add the WAL compression feature to the stress test.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9811

Reviewed By: riversand963

Differential Revision: D35414316

Pulled By: anand1976

fbshipit-source-id: 0c17b1ec55679a52f088ad368798b57139bd921a
2022-04-06 15:47:09 -07:00
Hui Xiao 49623f9c8e Account memory of big memory users in BlockBasedTable in global memory limit (#9748)
Summary:
**Context:**
Through heap profiling, we discovered that `BlockBasedTableReader` objects can accumulate and lead to high memory usage (e.g, `max_open_file = -1`). These memories are currently not saved, not tracked, not constrained and not cache evict-able. As a first step to improve this, similar to https://github.com/facebook/rocksdb/pull/8428,  this PR is to track an estimate of `BlockBasedTableReader` object's memory in block cache and fail future creation if the memory usage exceeds the available space of cache at the time of creation.

**Summary:**
- Approximate big memory users  (`BlockBasedTable::Rep` and `TableProperties` )' memory usage in addition to the existing estimated ones (filter block/index block/un-compression dictionary)
- Charge all of these memory usages to block cache on `BlockBasedTable::Open()` and release them on `~BlockBasedTable()` as there is no memory usage fluctuation of concern in between
- Refactor on CacheReservationManager (and its call-sites) to add concurrent support for BlockBasedTable  used in this PR.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9748

Test Plan:
- New unit tests
- db bench: `OpenDb` : **-0.52% in ms**
  - Setup `./db_bench -benchmarks=fillseq -db=/dev/shm/testdb -disable_auto_compactions=1 -write_buffer_size=1048576`
  - Repeated run with pre-change w/o feature and post-change with feature, benchmark `OpenDb`:  `./db_bench -benchmarks=readrandom -use_existing_db=1 -db=/dev/shm/testdb -reserve_table_reader_memory=true (remove this when running w/o feature) -file_opening_threads=3 -open_files=-1 -report_open_timing=true| egrep 'OpenDb:'`

#-run | (feature-off) avg milliseconds | std milliseconds | (feature-on) avg milliseconds | std milliseconds | change (%)
-- | -- | -- | -- | -- | --
10 | 11.4018 | 5.95173 | 9.47788 | 1.57538 | -16.87382694
20 | 9.23746 | 0.841053 | 9.32377 | 1.14074 | 0.9343477536
40 | 9.0876 | 0.671129 | 9.35053 | 1.11713 | 2.893283155
80 | 9.72514 | 2.28459 | 9.52013 | 1.0894 | -2.108041632
160 | 9.74677 | 0.991234 | 9.84743 | 1.73396 | 1.032752389
320 | 10.7297 | 5.11555 | 10.547 | 1.97692 | **-1.70275031**
640 | 11.7092 | 2.36565 | 11.7869 | 2.69377 | **0.6635807741**

-  db bench on write with cost to cache in WriteBufferManager (just in case this PR's CRM refactoring accidentally slows down anything in WBM) : `fillseq` : **+0.54% in micros/op**
`./db_bench -benchmarks=fillseq -db=/dev/shm/testdb -disable_auto_compactions=1 -cost_write_buffer_to_cache=true -write_buffer_size=10000000000 | egrep 'fillseq'`

#-run | (pre-PR) avg micros/op | std micros/op | (post-PR)  avg micros/op | std micros/op | change (%)
-- | -- | -- | -- | -- | --
10 | 6.15 | 0.260187 | 6.289 | 0.371192 | 2.260162602
20 | 7.28025 | 0.465402 | 7.37255 | 0.451256 | 1.267813605
40 | 7.06312 | 0.490654 | 7.13803 | 0.478676 | **1.060579461**
80 | 7.14035 | 0.972831 | 7.14196 | 0.92971 | **0.02254791432**

-  filter bench: `bloom filter`: **-0.78% in ms/key**
    - ` ./filter_bench -impl=2 -quick -reserve_table_builder_memory=true | grep 'Build avg'`

#-run | (pre-PR) avg ns/key | std ns/key | (post-PR)  ns/key | std ns/key | change (%)
-- | -- | -- | -- | -- | --
10 | 26.4369 | 0.442182 | 26.3273 | 0.422919 | **-0.4145720565**
20 | 26.4451 | 0.592787 | 26.1419 | 0.62451 | **-1.1465262**

- Crash test `python3 tools/db_crashtest.py blackbox --reserve_table_reader_memory=1 --cache_size=1` killed as normal

Reviewed By: ajkr

Differential Revision: D35136549

Pulled By: hx235

fbshipit-source-id: 146978858d0f900f43f4eb09bfd3e83195e3be28
2022-04-06 10:33:00 -07:00
Peter Dillinger 6534c6dea4 Fix remaining uses of "backupable" (#9792)
Summary:
Various renaming and fixes to get rid of remaining uses of
"backupable" which is terminology leftover from the original, flawed
design of BackupableDB. Now any DB can be backed up, using BackupEngine.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9792

Test Plan: CI

Reviewed By: ajkr

Differential Revision: D35334386

Pulled By: pdillinger

fbshipit-source-id: 2108a42b4575c8cccdfd791c549aae93ec2f3329
2022-04-05 09:52:33 -07:00
Chen Lixiang cd59b139fc Fix some typos in comments and HISTORY.md (#9798)
Summary:
compation --> compaction

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9798

Reviewed By: ajkr

Differential Revision: D35341611

Pulled By: jay-zhuang

fbshipit-source-id: 5ea07527c311de75cade219456b6ee52b23020f6
2022-04-04 09:32:57 -07:00
Bo Wang bcabee737f Improve comments for some files (#9793)
Summary:
Update the comments, e.g. fixing typo, formatting, etc.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9793

Reviewed By: jay-zhuang

Differential Revision: D35323989

Pulled By: gitbw95

fbshipit-source-id: 4a72fc02b67abaae8be0d1439b68f9967a68052d
2022-04-01 16:06:14 -07:00
Andrew Kryczka bfea9e7c02 Add benchmark for GetMergeOperands() (#9785)
Summary:
There's an existing benchmark, "getmergeoperands", but it is unconventional in that it has multiple phases and hardcoded setup parameters.

This PR adds a different one, "readrandomoperands", that follows the pattern of other benchmarks of having a single phase and taking its configuration from existing flags.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9785

Test Plan:
```
$ ./db_bench -benchmarks=mergerandom -merge_operator=StringAppendOperator -write_buffer_size=1048576 -max_bytes_for_level_base=4194304 -target_file_size_base=1048576 -compression_type=none -disable_auto_compactions=true
$ ./db_bench -use_existing_db=true -benchmarks=readrandomoperands -merge_operator=StringAppendOperator -disable_auto_compactions=true -duration=10
...
readrandomoperands :     542.082 micros/op 1844 ops/sec;    0.2 MB/s (11980 of 18999 found)
```

Reviewed By: jay-zhuang

Differential Revision: D35290412

Pulled By: ajkr

fbshipit-source-id: fb367ca614b128cef844a75f0e5d9dd7c3328d85
2022-03-31 21:23:58 -07:00
Akanksha Mahajan fd66005628 Add 'adaptive_readahead' and 'async_io' options to db_stress (#9750)
Summary:
Same as title

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9750

Test Plan:
export CRASH_TEST_EXT_ARGS=" --async_io=1 --adaptive_readahead=1;
make -j crash_test

Reviewed By: jay-zhuang

Differential Revision: D35114326

Pulled By: akankshamahajan15

fbshipit-source-id: 8b05c95be09f7aff6cb9eb757aa20a6520349d45
2022-03-30 13:52:37 -07:00
Hui Xiao 60106b91ac Add 7.0.fb/7.1.fb to check_format_compatible.sh (#9772)
Summary:
As titled

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9772

Test Plan: `./tools/check_format_compatible.sh 7.1.fb` (and manually removed 2.7.fb due to pre-existing assertion failure) passed compatibility test

Reviewed By: ajkr

Differential Revision: D35233659

Pulled By: hx235

fbshipit-source-id: 6b93263a5724d752347e04f1396628804c24a880
2022-03-30 11:11:39 -07:00
Mark Callaghan 37de4e1d08 Correctly set ThreadState::tid (#9757)
Summary:
Fixes a bug introduced by me in https://github.com/facebook/rocksdb/pull/9733
That PR added a counter so that the per-thread seeds in ThreadState would
be unique even when --benchmarks had more than one test. But it incorrectly
used this counter as the value for ThreadState::tid as well.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9757

Test Plan:
Confirm that unexpectedly good QPS results on the regression tests return
to normal with this fix. I have confirmed that the QPS increase starts with
the PR 9733 diff.

Reviewed By: jay-zhuang

Differential Revision: D35149303

Pulled By: mdcallag

fbshipit-source-id: dee5cc36b7faaba6c3be6d6a253d3c2eaad72864
2022-03-25 15:30:28 -07:00
Mark Callaghan 1a130fa3c1 db_bench should use a good seed when --seed is not set or set to 0 (#9740)
Summary:
This is for https://github.com/facebook/rocksdb/issues/9737

I have wasted more than a few hours running db_bench benchmarks where --seed was not set
and getting better than expected results because cache hit rates are great because
multiple invocations of db_bench used the same value for --seed or did not set it,
and then all used 0. The result is that all see the same sequence of keys.

Others have done the same. The problem is worse in that it is easy to miss and the result is a benchmark with results that are misleading.

A good way to avoid this is to set it to the equivalent of gettimeofday() when either
--seed is not set or it is set to 0 (the default).

With this change the actual seed is printed when it was 0 at process start:
  Set seed to 1647992570365606 because --seed was 0

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9740

Test Plan:
Perf results:

./db_bench --benchmarks=fillseq,readrandom --num=1000000 --reads=4000000
  readrandom   :       6.469 micros/op 154583 ops/sec;   17.1 MB/s (4000000 of 4000000 found)

./db_bench --benchmarks=fillseq,readrandom --num=1000000 --reads=4000000 --seed=0
  readrandom   :       6.565 micros/op 152321 ops/sec;   16.9 MB/s (4000000 of 4000000 found)

./db_bench --benchmarks=fillseq,readrandom --num=1000000 --reads=4000000 --seed=1
  readrandom   :       6.461 micros/op 154777 ops/sec;   17.1 MB/s (4000000 of 4000000 found)

./db_bench --benchmarks=fillseq,readrandom --num=1000000 --reads=4000000 --seed=2
  readrandom   :       6.525 micros/op 153244 ops/sec;   17.0 MB/s (4000000 of 4000000 found)

Reviewed By: jay-zhuang

Differential Revision: D35145361

Pulled By: mdcallag

fbshipit-source-id: 2b35b153ccec46b27d7c9405997523555fc51267
2022-03-25 10:12:27 -07:00
Mark Callaghan 409635cb2a Add --slow_usecs option to determine when long op message is printed (#9732)
Summary:
This adds the --slow_usecs option with a default value of 1M. Operations that
take this much time have a message printed when --histogram=1, --stats_interval=0
and --stats_interval_seconds=0. The current code hardwired this to 20,000 usecs
and for some stress tests that reduced throughput by 20% or more.

This is for https://github.com/facebook/rocksdb/issues/9620

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9732

Test Plan:
./db_bench --benchmarks=fillrandom,readrandom --compression_type=lz4 --slow_usecs=100 --histogram=1
./db_bench --benchmarks=fillrandom,readrandom --compression_type=lz4 --slow_usecs=100000 --histogram=1

Reviewed By: jay-zhuang

Differential Revision: D35121522

Pulled By: mdcallag

fbshipit-source-id: daf27f937efd748980545d6395db332712fc078b
2022-03-24 13:39:01 -07:00
Mark Callaghan f219e3d5d8 db_bench should fail on bad values for --compaction_fadvice and --value_size_distribution_type (#9741)
Summary:
db_bench quietly parses and ignores bad values for --compaction_fadvice and --value_size_distribution_type
I prefer that it fail for them as it does for bad option values in most other cases. Otherwise a benchmark
result will be provided for the wrong configuration and the result will be misleading.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9741

Test Plan:
These now fail:
./db_bench --compaction_fadvice=noney
Unknown compaction fadvice:noney

./db_bench --value_size_distribution_type=norma
Cannot parse distribution type 'norma'

While correct values continue to work:
 ./db_bench --value_size_distribution_type=normal
Initializing RocksDB Options from the specified file
Initializing RocksDB Options from command-line flags

./db_bench --compaction_fadvice=none
Initializing RocksDB Options from the specified file
Initializing RocksDB Options from command-line flags

Reviewed By: siying

Differential Revision: D35115973

Pulled By: mdcallag

fbshipit-source-id: c2b10de5c2d1ea7c7539e676f5bd556351f5d370
2022-03-24 11:46:27 -07:00
Mark Callaghan d583d23d86 Avoid seed reuse when --benchmarks has more than one test (#9733)
Summary:
When --benchmarks has more than one test then the threads in one benchmark
will use the same set of seeds as the threads in the previous benchmark.
This diff fixe that.

This fixes https://github.com/facebook/rocksdb/issues/9632

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9733

Test Plan:
For this command line the block cache is 8GB, so it caches at most 1024 8KB blocks. Note that without
this diff the second run of readrandom has a much better response time because seed reuse means the
second run reads the same 1000 blocks as the first run and they are cached at that point. But with
this diff that does not happen.

./db_bench --benchmarks=fillseq,flush,compact0,waitforcompaction,levelstats,readrandom,readrandom --compression_type=zlib --num=10000000 --reads=1000 --block_size=8192

...

```
Level Files Size(MB)
--------------------
  0        0        0
  1       11      238
  2        9      253
  3        0        0
  4        0        0
  5        0        0
  6        0        0
```

 --- perf results without this diff

DB path: [/tmp/rocksdbtest-2260/dbbench]
readrandom   :      46.212 micros/op 21618 ops/sec;    2.4 MB/s (1000 of 1000 found)

DB path: [/tmp/rocksdbtest-2260/dbbench]
readrandom   :      21.963 micros/op 45450 ops/sec;    5.0 MB/s (1000 of 1000 found)

 --- perf results with this diff

DB path: [/tmp/rocksdbtest-2260/dbbench]
readrandom   :      47.213 micros/op 21126 ops/sec;    2.3 MB/s (1000 of 1000 found)

DB path: [/tmp/rocksdbtest-2260/dbbench]
readrandom   :      42.880 micros/op 23299 ops/sec;    2.6 MB/s (1000 of 1000 found)

Reviewed By: jay-zhuang

Differential Revision: D35089763

Pulled By: mdcallag

fbshipit-source-id: 1b50143a07afe876b8c8e5fa50dd94a8ce57fc6b
2022-03-24 08:57:48 -07:00
Yanqin Jin c18c4a081c Add new determinators for multiops transactions stress test (#9708)
Summary:
Add determinators for multiops transactions stress test with
write-committed and write-prepared policies.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9708

Test Plan: Internal CI

Reviewed By: jay-zhuang

Differential Revision: D34967263

Pulled By: riversand963

fbshipit-source-id: 170a0842d56dccb6ed6bc0c5adfd33849acd6b31
2022-03-23 22:29:50 -07:00
Mark Callaghan 6904fd0c86 db_bench should fail when an option uses an invalid compression type (#9729)
Summary:
This changes db_bench to fail at startup for invalid compression types. It had been
changing them to Snappy. For other invalid options it fails at startup.

This is for https://github.com/facebook/rocksdb/issues/9621

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9729

Test Plan:
This continues to work:
./db_bench --benchmarks=fillrandom --compression_type=lz4

This now fails rather than changing the compression type to Snappy
./db_bench --benchmarks=fillrandom --compression_type=lz44
Cannot parse compression type 'lz44'

Reviewed By: jay-zhuang

Differential Revision: D35081323

Pulled By: mdcallag

fbshipit-source-id: 9b38c835abddce11aa7feb235df63f53cf829981
2022-03-23 12:26:34 -07:00
Mark Callaghan d71e5a5beb Add number of running flushes & compactions to --stats_per_interval output (#9726)
Summary:
This is for https://github.com/facebook/rocksdb/issues/9709 and add two lines to the end of DB Stats
for num-running-compactions and num-running-flushes.

For example ...

** DB Stats **
Uptime(secs): 6.0 total, 1.0 interval
Cumulative writes: 915K writes, 915K keys, 915K commit groups, 1.0 writes per commit group, ingest: 0.11 GB, 18.95 MB/s
Cumulative WAL: 915K writes, 0 syncs, 915000.00 writes per sync, written: 0.11 GB, 18.95 MB/s
Cumulative stall: 00:00:0.000 H:M:S, 0.0 percent
Interval writes: 133K writes, 133K keys, 133K commit groups, 1.0 writes per commit group, ingest: 16.62 MB, 16.53 MB/s
Interval WAL: 133K writes, 0 syncs, 133000.00 writes per sync, written: 0.02 GB, 16.53 MB/s
Interval stall: 00:00:0.000 H:M:S, 0.0 percent
num-running-compactions: 0
num-running-flushes: 0

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9726

Reviewed By: jay-zhuang

Differential Revision: D35066759

Pulled By: mdcallag

fbshipit-source-id: c161fadd3c15c5aa715a820dab6bfedb46dc099b
2022-03-23 09:33:41 -07:00
Akanksha Mahajan f07eec1bf8 Add async_io read option in db_bench (#9735)
Summary:
Add async_io Read option in db_bench

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9735

Test Plan:
./db_bench -use_existing_db=true
-db=/tmp/prefix_scan_prefetch_main -benchmarks="seekrandom" -key_size=32
-value_size=512 -num=5000000 -use_direct_reads=true -seek_nexts=327680
-duration=120 -ops_between_duration_checks=1 -async_io=1

Reviewed By: riversand963

Differential Revision: D35058482

Pulled By: akankshamahajan15

fbshipit-source-id: 1522b638c79f6d85bb7408c67f6ab76dbabeeee7
2022-03-22 17:21:35 -07:00
Mark Callaghan 63a284a6ad For db_bench --benchmarks=fillseq with --num_multi_db load databases … (#9713)
Summary:
…in order

This fixes https://github.com/facebook/rocksdb/issues/9650
For db_bench --benchmarks=fillseq --num_multi_db=X it loads databases in sequence
rather than randomly choosing a database per Put. The benefits are:
1) avoids long delays between flushing memtables
2) avoids flushing memtables for all of them at the same point in time
3) puts same number of keys per database so that query tests will find keys as expected

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9713

Test Plan:
Using db_bench.1 without the change and db_bench.2 with the change:

for i in 1 2; do rm -rf /data/m/rx/* ; time ./db_bench.$i --db=/data/m/rx --benchmarks=fillseq --num_multi_db=4 --num=10000000; du -hs /data/m/rx ; done

 --- without the change
    fillseq      :       3.188 micros/op 313682 ops/sec;   34.7 MB/s
    real    2m7.787s
    user    1m52.776s
    sys     0m46.549s
    2.7G    /data/m/rx

 --- with the change

    fillseq      :       3.149 micros/op 317563 ops/sec;   35.1 MB/s
    real    2m6.196s
    user    1m51.482s
    sys     0m46.003s
    2.7G    /data/m/rx

    Also, temporarily added a printf to confirm that the code switches to the next database at the right time
    ZZ switch to db 1 at 10000000
    ZZ switch to db 2 at 20000000
    ZZ switch to db 3 at 30000000

for i in 1 2; do rm -rf /data/m/rx/* ; time ./db_bench.$i --db=/data/m/rx --benchmarks=fillseq,readrandom --num_multi_db=4 --num=100000; du -hs /data/m/rx ; done

 --- without the change, smaller database, note that not all keys are found by readrandom because databases have < and > --num keys

    fillseq      :       3.176 micros/op 314805 ops/sec;   34.8 MB/s
    readrandom   :       1.913 micros/op 522616 ops/sec;   57.7 MB/s (99873 of 100000 found)

 --- with the change, smaller database, note that all keys are found by readrandom

    fillseq      :       3.110 micros/op 321566 ops/sec;   35.6 MB/s
    readrandom   :       1.714 micros/op 583257 ops/sec;   64.5 MB/s (100000 of 100000 found)

Reviewed By: jay-zhuang

Differential Revision: D35030168

Pulled By: mdcallag

fbshipit-source-id: 2a18c4ec571d954cf5a57b00a11802a3608823ee
2022-03-22 10:36:24 -07:00
Mark Callaghan 1ca1562e35 Make mixgraph easier to use (#9711)
Summary:
Changes:
* improves monitoring by displaying average size of a Put value and average scan length
* forces the minimum value size to be 10. Before this it was 0 if you didn't set the distribution parameters.
* uses reasonable defaults for the distribution parameters that determine value size and scan length
* includes seeks in "reads ... found" message, before this they were missing

This is for https://github.com/facebook/rocksdb/issues/9672

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9711

Test Plan:
Before this change:

./db_bench --benchmarks=fillseq,mixgraph --mix_get_ratio=50 --mix_put_ratio=25 --mix_seek_ratio=25 --num=100000 --value_k=0.2615 --value_sigma=25.45 --iter_k=2.517 --iter_sigma=14.236
fillseq      :       4.289 micros/op 233138 ops/sec;   25.8 MB/s
mixgraph     :      18.461 micros/op 54166 ops/sec;  755.0 MB/s ( Gets:50164 Puts:24919 Seek:24917 of 50164 in 75081 found)

After this change:

./db_bench --benchmarks=fillseq,mixgraph --mix_get_ratio=50 --mix_put_ratio=25 --mix_seek_ratio=25 --num=100000 --value_k=0.2615 --value_sigma=25.45 --iter_k=2.517 --iter_sigma=14.236
fillseq      :       3.974 micros/op 251553 ops/sec;   27.8 MB/s
mixgraph     :      16.722 micros/op 59795 ops/sec;  833.5 MB/s ( Gets:50164 Puts:24919 Seek:24917, reads 75081 in 75081 found, avg size: 36.0 value, 504.9 scan)

Reviewed By: jay-zhuang

Differential Revision: D35030190

Pulled By: mdcallag

fbshipit-source-id: d8f555f28d869f752ddb674a524108884511b151
2022-03-21 17:30:51 -07:00
Peter Dillinger a8a422e962 Add manifest fix-up utility for file temperatures (#9683)
Summary:
The goal of this change is to allow changes to the "current" (in
FileSystem) file temperatures to feed back into DB metadata, so that
they can inform decisions and stats reporting. In part because of
modular code factoring, it doesn't seem easy to do this automagically,
where opening an SST file and observing current Temperature different
from expected would trigger a change in metadata and DB manifest write
(essentially giving the deep read path access to the write path). It is also
difficult to do this while the DB is open because of the limitations of
LogAndApply.

This change allows updating file temperature metadata on a closed DB
using an experimental utility function UpdateManifestForFilesState()
or `ldb update_manifest --update_temperatures`. This should suffice for
"migration" scenarios where outside tooling has placed or re-arranged DB
files into a (different) tiered configuration without going through
RocksDB itself (currently, only compaction can change temperature
metadata).

Some details:
* Refactored and added unit test for `ldb unsafe_remove_sst_file` because
of shared functionality
* Pulled in autovector.h changes from https://github.com/facebook/rocksdb/issues/9546 to fix SuperVersionContext
move constructor (related to an older draft of this change)

Possible follow-up work:
* Support updating manifest with file checksums, such as when a
new checksum function is used and want existing DB metadata updated
for it.
* It's possible that for some repair scenarios, lighter weight than
full repair, we might want to support UpdateManifestForFilesState() to
modify critical file details like size or checksum using same
algorithm. But let's make sure these are differentiated from modifying
file details in ways that don't suspect corruption (or require extreme
trust).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9683

Test Plan: unit tests added

Reviewed By: jay-zhuang

Differential Revision: D34798828

Pulled By: pdillinger

fbshipit-source-id: cfd83e8fb10761d8c9e7f9c020d68c9106a95554
2022-03-18 16:35:51 -07:00
Peter Dillinger cff0d1e8e6 New backup meta schema, with file temperatures (#9660)
Summary:
The primary goal of this change is to add support for backing up and
restoring (applying on restore) file temperature metadata, without
committing to either the DB manifest or the FS reported "current"
temperatures being exclusive "source of truth".

To achieve this goal, we need to add temperature information to backup
metadata, which requires updated backup meta schema. Fortunately I
prepared for this in https://github.com/facebook/rocksdb/issues/8069, which began forward compatibility in version
6.19.0 for this kind of schema update. (Previously, backup meta schema
was not extensible! Making this schema update public will allow some
other "nice to have" features like taking backups with hard links, and
avoiding crc32c checksum computation when another checksum is already
available.) While schema version 2 is newly public, the default schema
version is still 1. Until we change the default, users will need to set
to 2 to enable features like temperature data backup+restore. New
metadata like temperature information will be ignored with a warning
in versions before this change and since 6.19.0. The metadata is
considered ignorable because a functioning DB can be restored without
it.

Some detail:
* Some renaming because "future schema" is now just public schema 2.
* Initialize some atomics in TestFs (linter reported)
* Add temperature hint support to SstFileDumper (used by BackupEngine)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9660

Test Plan:
related unit test majorly updated for the new functionality,
including some shared testing support for tracking temperatures in a FS.

Some other tests and testing hooks into production code also updated for
making the backup meta schema change public.

Reviewed By: ajkr

Differential Revision: D34686968

Pulled By: pdillinger

fbshipit-source-id: 3ac1fa3e67ee97ca8a5103d79cc87d872c1d862a
2022-03-18 11:06:17 -07:00
Yanqin Jin 5894761056 Improve stress test for transactions (#9568)
Summary:
Test only, no change to functionality.
Extremely low risk of library regression.

Update test key generation by maintaining existing and non-existing keys.
Update db_crashtest.py to drive multiops_txn stress test for both write-committed and write-prepared.
Add a make target 'blackbox_crash_test_with_multiops_txn'.

Running the following commands caught the bug exposed in https://github.com/facebook/rocksdb/issues/9571.
```
$rm -rf /tmp/rocksdbtest/*
$./db_stress -progress_reports=0 -test_multi_ops_txns -use_txn -clear_column_family_one_in=0 \
    -column_families=1 -writepercent=0 -delpercent=0 -delrangepercent=0 -customopspercent=60 \
   -readpercent=20 -prefixpercent=0 -iterpercent=20 -reopen=0 -ops_per_thread=1000 -ub_a=10000 \
   -ub_c=100 -destroy_db_initially=0 -key_spaces_path=/dev/shm/key_spaces_desc -threads=32 -read_fault_one_in=0
$./db_stress -progress_reports=0 -test_multi_ops_txns -use_txn -clear_column_family_one_in=0
   -column_families=1 -writepercent=0 -delpercent=0 -delrangepercent=0 -customopspercent=60 -readpercent=20 \
   -prefixpercent=0 -iterpercent=20 -reopen=0 -ops_per_thread=1000 -ub_a=10000 -ub_c=100 -destroy_db_initially=0 \
   -key_spaces_path=/dev/shm/key_spaces_desc -threads=32 -read_fault_one_in=0
```

Running the following command caught a bug which will be fixed in https://github.com/facebook/rocksdb/issues/9648 .
```
$TEST_TMPDIR=/dev/shm make blackbox_crash_test_with_multiops_wc_txn
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9568

Reviewed By: jay-zhuang

Differential Revision: D34308154

Pulled By: riversand963

fbshipit-source-id: 99ff1b65c19b46c471d2f2d3b47adcd342a1b9e7
2022-03-16 19:00:04 -07:00
Baptiste Lemaire e4c87773e1 Reactivate Mempurge feature in crash test. (#9684)
Summary:
Set `experimental_mempurge_threshold` back to `lambda: 10.0*random.random()` in crash test, reverting https://github.com/facebook/rocksdb/issues/8958 after fix provided in https://github.com/facebook/rocksdb/issues/9671 .

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9684

Reviewed By: pdillinger

Differential Revision: D34820257

Pulled By: bjlemaire

fbshipit-source-id: 1e5ae8c872c4ac4c4267c990ac5e3e793d77908c
2022-03-11 15:47:30 -08:00
Hui Xiao ca0ef54f16 Rate-limit automatic WAL flush after each user write (#9607)
Summary:
**Context:**
WAL flush is currently not rate-limited by `Options::rate_limiter`. This PR is to provide rate-limiting to auto WAL flush, the one that automatically happen after each user write operation (i.e, `Options::manual_wal_flush == false`), by adding `WriteOptions::rate_limiter_options`.

Note that we are NOT rate-limiting WAL flush that do NOT automatically happen after each user write, such as  `Options::manual_wal_flush == true + manual FlushWAL()` (rate-limiting multiple WAL flushes),  for the benefits of:
- being consistent with [ReadOptions::rate_limiter_priority](https://github.com/facebook/rocksdb/blob/7.0.fb/include/rocksdb/options.h#L515)
- being able to turn off some WAL flush's rate-limiting but not all (e.g, turn off specific the WAL flush of a critical user write like a service's heartbeat)

`WriteOptions::rate_limiter_options` only accept `Env::IO_USER` and `Env::IO_TOTAL` currently due to an implementation constraint.
- The constraint is that we currently queue parallel writes (including WAL writes) based on FIFO policy which does not factor rate limiter priority into this layer's scheduling. If we allow lower priorities such as `Env::IO_HIGH/MID/LOW` and such writes specified with lower priorities occurs before ones specified with higher priorities (even just by a tiny bit in arrival time), the former would have blocked the latter, leading to a "priority inversion" issue and contradictory to what we promise for rate-limiting priority. Therefore we only allow `Env::IO_USER` and `Env::IO_TOTAL`  right now before improving that scheduling.

A pre-requisite to this feature is to support operation-level rate limiting in `WritableFileWriter`, which is also included in this PR.

**Summary:**
- Renamed test suite `DBRateLimiterTest to DBRateLimiterOnReadTest` for adding a new test suite
- Accept `rate_limiter_priority` in `WritableFileWriter`'s private and public write functions
- Passed `WriteOptions::rate_limiter_options` to `WritableFileWriter` in the path of automatic WAL flush.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9607

Test Plan:
- Added new unit test to verify existing flush/compaction rate-limiting does not break, since `DBTest, RateLimitingTest` is disabled and current db-level rate-limiting tests focus on read only (e.g, `db_rate_limiter_test`, `DBTest2, RateLimitedCompactionReads`).
- Added new unit test `DBRateLimiterOnWriteWALTest, AutoWalFlush`
- `strace -ftt -e trace=write ./db_bench -benchmarks=fillseq -db=/dev/shm/testdb -rate_limit_auto_wal_flush=1 -rate_limiter_bytes_per_sec=15 -rate_limiter_refill_period_us=1000000 -write_buffer_size=100000000 -disable_auto_compactions=1 -num=100`
   - verified that WAL flush(i.e, system-call _write_) were chunked into 15 bytes and each _write_ was roughly 1 second apart
   - verified the chunking disappeared when `-rate_limit_auto_wal_flush=0`
- crash test: `python3 tools/db_crashtest.py blackbox --disable_wal=0  --rate_limit_auto_wal_flush=1 --rate_limiter_bytes_per_sec=10485760 --interval=10` killed as normal

**Benchmarked on flush/compaction to ensure no performance regression:**
- compaction with rate-limiting  (see table 1, avg over 1280-run):  pre-change: **915635 micros/op**; post-change:
   **907350 micros/op (improved by 0.106%)**
```
#!/bin/bash
TEST_TMPDIR=/dev/shm/testdb
START=1
NUM_DATA_ENTRY=8
N=10

rm -f compact_bmk_output.txt compact_bmk_output_2.txt dont_care_output.txt
for i in $(eval echo "{$START..$NUM_DATA_ENTRY}")
do
    NUM_RUN=$(($N*(2**($i-1))))
    for j in $(eval echo "{$START..$NUM_RUN}")
    do
       ./db_bench --benchmarks=fillrandom -db=$TEST_TMPDIR -disable_auto_compactions=1 -write_buffer_size=6710886 > dont_care_output.txt && ./db_bench --benchmarks=compact -use_existing_db=1 -db=$TEST_TMPDIR -level0_file_num_compaction_trigger=1 -rate_limiter_bytes_per_sec=100000000 | egrep 'compact'
    done > compact_bmk_output.txt && awk -v NUM_RUN=$NUM_RUN '{sum+=$3;sum_sqrt+=$3^2}END{print sum/NUM_RUN, sqrt(sum_sqrt/NUM_RUN-(sum/NUM_RUN)^2)}' compact_bmk_output.txt >> compact_bmk_output_2.txt
done
```
- compaction w/o rate-limiting  (see table 2, avg over 640-run):  pre-change: **822197 micros/op**; post-change: **823148 micros/op (regressed by 0.12%)**
```
Same as above script, except that -rate_limiter_bytes_per_sec=0
```
- flush with rate-limiting (see table 3, avg over 320-run, run on the [patch](ee5c6023a9) to augment current db_bench ): pre-change: **745752 micros/op**; post-change: **745331 micros/op (regressed by 0.06 %)**
```
 #!/bin/bash
TEST_TMPDIR=/dev/shm/testdb
START=1
NUM_DATA_ENTRY=8
N=10

rm -f flush_bmk_output.txt flush_bmk_output_2.txt

for i in $(eval echo "{$START..$NUM_DATA_ENTRY}")
do
    NUM_RUN=$(($N*(2**($i-1))))
    for j in $(eval echo "{$START..$NUM_RUN}")
    do
       ./db_bench -db=$TEST_TMPDIR -write_buffer_size=1048576000 -num=1000000 -rate_limiter_bytes_per_sec=100000000 -benchmarks=fillseq,flush | egrep 'flush'
    done > flush_bmk_output.txt && awk -v NUM_RUN=$NUM_RUN '{sum+=$3;sum_sqrt+=$3^2}END{print sum/NUM_RUN, sqrt(sum_sqrt/NUM_RUN-(sum/NUM_RUN)^2)}' flush_bmk_output.txt >> flush_bmk_output_2.txt
done

```
- flush w/o rate-limiting (see table 4, avg over 320-run, run on the [patch](ee5c6023a9) to augment current db_bench): pre-change: **487512 micros/op**, post-change: **485856 micors/ops (improved by 0.34%)**
```
Same as above script, except that -rate_limiter_bytes_per_sec=0
```

| table 1 - compact with rate-limiting|
#-run | (pre-change) avg micros/op | std micros/op | (post-change)  avg micros/op | std micros/op | change in avg micros/op  (%)
-- | -- | -- | -- | -- | --
10 | 896978 | 16046.9 | 901242 | 15670.9 | 0.475373978
20 | 893718 | 15813 | 886505 | 17544.7 | -0.8070778478
40 | 900426 | 23882.2 | 894958 | 15104.5 | -0.6072681153
80 | 906635 | 21761.5 | 903332 | 23948.3 | -0.3643141948
160 | 898632 | 21098.9 | 907583 | 21145 | 0.9960695813
3.20E+02 | 905252 | 22785.5 | 908106 | 25325.5 | 0.3152713278
6.40E+02 | 905213 | 23598.6 | 906741 | 21370.5 | 0.1688000504
**1.28E+03** | **908316** | **23533.1** | **907350** | **24626.8** | **-0.1063506533**
average over #-run | 901896.25 | 21064.9625 | 901977.125 | 20592.025 | 0.008967217682

| table 2 - compact w/o rate-limiting|
#-run | (pre-change) avg micros/op | std micros/op | (post-change)  avg micros/op | std micros/op | change in avg micros/op  (%)
-- | -- | -- | -- | -- | --
10 | 811211 | 26996.7 | 807586 | 28456.4 | -0.4468627768
20 | 815465 | 14803.7 | 814608 | 28719.7 | -0.105093413
40 | 809203 | 26187.1 | 797835 | 25492.1 | -1.404839082
80 | 822088 | 28765.3 | 822192 | 32840.4 | 0.01265071379
160 | 821719 | 36344.7 | 821664 | 29544.9 | -0.006693285661
3.20E+02 | 820921 | 27756.4 | 821403 | 28347.7 | 0.05871454135
**6.40E+02** | **822197** | **28960.6** | **823148** | **30055.1** | **0.1156657103**
average over #-run | 8.18E+05 | 2.71E+04 | 8.15E+05 | 2.91E+04 |  -0.25

| table 3 - flush with rate-limiting|
#-run | (pre-change) avg micros/op | std micros/op | (post-change)  avg micros/op | std micros/op | change in avg micros/op  (%)
-- | -- | -- | -- | -- | --
10 | 741721 | 11770.8 | 740345 | 5949.76 | -0.1855144994
20 | 735169 | 3561.83 | 743199 | 9755.77 | 1.09226586
40 | 743368 | 8891.03 | 742102 | 8683.22 | -0.1703059588
80 | 742129 | 8148.51 | 743417 | 9631.58| 0.1735547324
160 | 749045 | 9757.21 | 746256 | 9191.86 | -0.3723407806
**3.20E+02** | **745752** | **9819.65** | **745331** | **9840.62** | **-0.0564530836**
6.40E+02 | 749006 | 11080.5 | 748173 | 10578.7 | -0.1112140624
average over #-run | 743741.4286 | 9004.218571 | 744117.5714 | 9090.215714 | 0.05057441238

| table 4 - flush w/o rate-limiting|
#-run | (pre-change) avg micros/op | std micros/op | (post-change)  avg micros/op | std micros/op | change in avg micros/op (%)
-- | -- | -- | -- | -- | --
10 | 477283 | 24719.6 | 473864 | 12379 | -0.7163464863
20 | 486743 | 20175.2 | 502296 | 23931.3 | 3.195320734
40 | 482846 | 15309.2 | 489820 | 22259.5 | 1.444352858
80 | 491490 | 21883.1 | 490071 | 23085.7 | -0.2887139108
160 | 493347 | 28074.3 | 483609 | 21211.7 | -1.973864238
**3.20E+02** | **487512** | **21401.5** | **485856** | **22195.2** | **-0.3396839462**
6.40E+02 | 490307 | 25418.6 | 485435 | 22405.2 | -0.9936631539
average over #-run | 4.87E+05 | 2.24E+04 | 4.87E+05 | 2.11E+04 | 0.00E+00

Reviewed By: ajkr

Differential Revision: D34442441

Pulled By: hx235

fbshipit-source-id: 4790f13e1e5c0a95ae1d1cc93ffcf69dc6e78bdd
2022-03-08 13:19:39 -08:00
sdong 33742c2a9f Remove BlockBasedTableOptions.hash_index_allow_collision (#9454)
Summary:
BlockBasedTableOptions.hash_index_allow_collision is already deprecated and has no effect. Delete it for preparing 7.0 release.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9454

Test Plan: Run all existing tests.

Reviewed By: ajkr

Differential Revision: D33805827

fbshipit-source-id: ed8a436d1d083173ec6aef2a762ba02e1eefdc9d
2022-03-01 13:58:02 -08:00
Changneng Chen 9ed96703d1 Add support for BlobDB to ldb (#9630)
Summary:
Add the configuration options and help messages of BlobDB to `ldb`

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9630

Test Plan: `python ./tools/ldb_test.py`

Reviewed By: ltamasi

Differential Revision: D34443176

Pulled By: changneng

fbshipit-source-id: 5b3f185cdfc2561e06dd37215c7edfbca07dbe80
2022-02-25 23:13:11 -08:00
Bo Wang f706a9c199 Add a secondary cache implementation based on LRUCache 1 (#9518)
Summary:
**Summary:**
RocksDB uses a block cache to reduce IO and make queries more efficient. The block cache is based on the LRU algorithm (LRUCache) and keeps objects containing uncompressed data, such as Block, ParsedFullFilterBlock etc. It allows the user to configure a second level cache (rocksdb::SecondaryCache) to extend the primary block cache by holding items evicted from it. Some of the major RocksDB users, like MyRocks, use direct IO and would like to use a primary block cache for uncompressed data and a secondary cache for compressed data. The latter allows us to mitigate the loss of the Linux page cache due to direct IO.

This PR includes a concrete implementation of rocksdb::SecondaryCache that integrates with compression libraries such as LZ4 and implements an LRU cache to hold compressed blocks.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9518

Test Plan:
In this PR, the lru_secondary_cache_test.cc includes the following tests:
1. The unit tests for the secondary cache with either compression or no compression, such as basic tests, fails tests.
2. The integration tests with both primary cache and this secondary cache .

**Follow Up:**

1. Statistics (e.g. compression ratio) will be added in another PR.
2. Once this implementation is ready, I will do some shadow testing and benchmarking with UDB to measure the impact.

Reviewed By: anand1976

Differential Revision: D34430930

Pulled By: gitbw95

fbshipit-source-id: 218d78b672a2f914856d8a90ff32f2f5b5043ded
2022-02-23 16:06:27 -08:00
Siddhartha Roychowdhury 39b0d92153 Add record to set WAL compression type if enabled (#9556)
Summary:
When WAL compression is enabled, add a record (new record type) to store the compression type to indicate that all subsequent records are compressed. The log reader will store the compression type when this record is encountered and use the type to uncompress the subsequent records. Compress and uncompress to be implemented in subsequent diffs.
Enabled WAL compression in some WAL tests to check for regressions. Some tests that rely on offsets have been disabled.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9556

Reviewed By: anand1976

Differential Revision: D34308216

Pulled By: sidroyc

fbshipit-source-id: 7f10595e46f3277f1ea2d309fbf95e2e935a8705
2022-02-17 16:19:31 -08:00
Andrew Kryczka babe56ddba Add rate limiter priority to ReadOptions (#9424)
Summary:
Users can set the priority for file reads associated with their operation by setting `ReadOptions::rate_limiter_priority` to something other than `Env::IO_TOTAL`. Rate limiting `VerifyChecksum()` and `VerifyFileChecksums()` is the motivation for this PR, so it also includes benchmarks and minor bug fixes to get that working.

`RandomAccessFileReader::Read()` already had support for rate limiting compaction reads. I changed that rate limiting to be non-specific to compaction, but rather performed according to the passed in `Env::IOPriority`. Now the compaction read rate limiting is supported by setting `rate_limiter_priority = Env::IO_LOW` on its `ReadOptions`.

There is no default value for the new `Env::IOPriority` parameter to `RandomAccessFileReader::Read()`. That means this PR goes through all callers (in some cases multiple layers up the call stack) to find a `ReadOptions` to provide the priority. There are TODOs for cases I believe it would be good to let user control the priority some day (e.g., file footer reads), and no TODO in cases I believe it doesn't matter (e.g., trace file reads).

The API doc only lists the missing cases where a file read associated with a provided `ReadOptions` cannot be rate limited. For cases like file ingestion checksum calculation, there is no API to provide `ReadOptions` or `Env::IOPriority`, so I didn't count that as missing.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9424

Test Plan:
- new unit tests
- new benchmarks on ~50MB database with 1MB/s read rate limit and 100ms refill interval; verified with strace reads are chunked (at 0.1MB per chunk) and spaced roughly 100ms apart.
  - setup command: `./db_bench -benchmarks=fillrandom,compact -db=/tmp/testdb -target_file_size_base=1048576 -disable_auto_compactions=true -file_checksum=true`
  - benchmarks command: `strace -ttfe pread64 ./db_bench -benchmarks=verifychecksum,verifyfilechecksums -use_existing_db=true -db=/tmp/testdb -rate_limiter_bytes_per_sec=1048576 -rate_limit_bg_reads=1 -rate_limit_user_ops=true -file_checksum=true`
- crash test using IO_USER priority on non-validation reads with https://github.com/facebook/rocksdb/issues/9567 reverted: `python3 tools/db_crashtest.py blackbox --max_key=1000000 --write_buffer_size=524288 --target_file_size_base=524288 --level_compaction_dynamic_level_bytes=true --duration=3600 --rate_limit_bg_reads=true --rate_limit_user_ops=true --rate_limiter_bytes_per_sec=10485760 --interval=10`

Reviewed By: hx235

Differential Revision: D33747386

Pulled By: ajkr

fbshipit-source-id: a2d985e97912fba8c54763798e04f006ccc56e0c
2022-02-16 23:18:14 -08:00
sdong 8286469b9a LDB to add --secondary_path to help (#9582)
Summary:
Opening DB as seconeary instance has been supported in ldb but it is not mentioned in --help. Mention it there. The part of the help message after the modification:

```
commands MUST specify --db=<full_path_to_db_directory> when necessary

commands can optionally specify
  --env_uri=<uri_of_environment> or --fs_uri=<uri_of_filesystem> if necessary
  --secondary_path=<secondary_path> to open DB as secondary instance. Operations not supported in secondary instance will fail.
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9582

Test Plan: Build and run ldb --help

Reviewed By: riversand963

Differential Revision: D34286427

fbshipit-source-id: e56c5290d0548098ab6acc6dde2167f5a64f34f3
2022-02-16 17:07:37 -08:00
Andrew Kryczka ad2cab8f0c minor tweaks to db_crashtest.py settings (#9483)
Summary:
I did another pass through running CI jobs. It is uncommon now to see
`db_stress` stuck in the setup phase but still happen.

One reason was repeatedly reading/verifying checksum on filter blocks when
`-cache_index_and_filter_blocks=1` and `-cache_size=1048576`. To address
that I increased the cache size.

Another reason was having a WAL with many range tombstones and every
`db_stress` run using `-avoid_flush_during_recovery=1` (in that
scenario, the setup phase spent too much CPU in
`rocksdb::MemTable::NewRangeTombstoneIteratorInternal()`). To address
that I fixed the `-avoid_flush_during_recovery` setting so it is
reevaluated for every `db_stress` run.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9483

Reviewed By: riversand963

Differential Revision: D33922929

Pulled By: ajkr

fbshipit-source-id: 0a298ec7c4df6f6b44620233996047a2dc7ee5f3
2022-02-15 13:56:27 -08:00
Levi Tamasi ec0b1ff2bd Add blob compaction readahead size to the BlobDB benchmark script (#9566)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9566

Reviewed By: riversand963

Differential Revision: D34226256

Pulled By: ltamasi

fbshipit-source-id: 4374b819e937c35e3a866ba5b5eafba87ff20af3
2022-02-14 15:38:32 -08:00
Peter Dillinger 479eb1aad6 Hide deprecated, inefficient block-based filter from public API (#9535)
Summary:
This change removes the ability to configure the deprecated,
inefficient block-based filter in the public API. Options that would
have enabled it now use "full" (and optionally partitioned) filters.
Existing block-based filters can still be read and used, and a "back
door" way to build them still exists, for testing and in case of trouble.

About the only way this removal would cause an issue for users is if
temporary memory for filter construction greatly increases. In
HISTORY.md we suggest a few possible mitigations: partitioned filters,
smaller SST files, or setting reserve_table_builder_memory=true.

Or users who have customized a FilterPolicy using the
CreateFilter/KeyMayMatch mechanism removed in https://github.com/facebook/rocksdb/issues/9501 will have to upgrade
their code. (It's long past time for people to move to the new
builder/reader customization interface.)

This change also introduces some internal-use-only configuration strings
for testing specific filter implementations while bypassing some
compatibility / intelligence logic. This is intended to hint at a path
toward making FilterPolicy Customizable, but it also gives us a "back
door" way to configure block-based filter.

Aside: updated db_bench so that -readonly implies -use_existing_db

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9535

Test Plan:
Unit tests updated. Specifically,

* BlockBasedTableTest.BlockReadCountTest is tweaked to validate the back
door configuration interface and ignoring of `use_block_based_builder`.
* BlockBasedTableTest.TracingGetTest is migrated from testing
block-based filter access pattern to full filter access patter, by
re-ordering some things.
* Options test (pretty self-explanatory)

Performance test - create with `./db_bench -db=/dev/shm/rocksdb1 -bloom_bits=10 -cache_index_and_filter_blocks=1 -benchmarks=fillrandom -num=10000000 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0` with and without `-use_block_based_filter`, which creates a DB with 21 SST files in L0. Read with `./db_bench -db=/dev/shm/rocksdb1 -readonly -bloom_bits=10 -cache_index_and_filter_blocks=1 -benchmarks=readrandom -num=10000000 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0 -duration=30`

Without -use_block_based_filter: readrandom 464 ops/sec, 689280 KB DB
With -use_block_based_filter: readrandom 169 ops/sec, 690996 KB DB
No consistent difference with fillrandom

Reviewed By: jay-zhuang

Differential Revision: D34153871

Pulled By: pdillinger

fbshipit-source-id: 31f4a933c542f8f09aca47fa64aec67832a69738
2022-02-12 07:05:57 -08:00
Akanksha Mahajan 9745c68eb1 Remove deprecated option new_table_reader_for_compaction_inputs (#9443)
Summary:
In RocksDB option new_table_reader_for_compaction_inputs has
not effect on Compaction or on the behavior of RocksDB library.
Therefore, we are removing it in the upcoming 7.0 release.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9443

Test Plan: CircleCI

Reviewed By: ajkr

Differential Revision: D33788508

Pulled By: akankshamahajan15

fbshipit-source-id: 324ca6f12bfd019e9bd5e1b0cdac39be5c3cec7d
2022-02-08 19:31:28 -08:00
Akanksha Mahajan ddce0c3f11 Add releases till 6.29.fb to compatibility check (#9529)
Summary:
Add releases till 6.29.fb to compatibility check for forward and backward compatibility

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9529

Test Plan: run locally

Reviewed By: hx235

Differential Revision: D34086063

Pulled By: akankshamahajan15

fbshipit-source-id: 4ccff513c99cf2d0e41da0b76ab27ffcfdffe7df
2022-02-08 13:50:18 -08:00
satyajanga 036bbab6f7 Use the comparator from the sst file table properties in sst_dump_tool (#9491)
Summary:
We introduced a new Comparator for timestamp in user keys. In the sst_dump_tool by default we use BytewiseComparator to read sst files. This change allows us to read comparator_name from table properties in meta data block and use it to read.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9491

Test Plan:
added unittests for new functionality.
make check
![image](https://user-images.githubusercontent.com/4923556/152915444-28b88a1f-7b4e-47d0-815f-7011552bd9a2.png)
![image](https://user-images.githubusercontent.com/4923556/152916196-bea3d2a1-a3d5-4362-b911-036131b83e8d.png)

Reviewed By: riversand963

Differential Revision: D33993614

Pulled By: satyajanga

fbshipit-source-id: 4b5cf938e6d2cb3931d763bef5baccc900b8c536
2022-02-08 12:15:35 -08:00
Peter Dillinger d7c868b062 Work around snappy linker issue with newer compilers (#9517)
Summary:
After https://github.com/facebook/rocksdb/issues/9481, we are using newer default compiler for
build-format-compatible CircleCI nightly job, which fails on building
2.2.fb.branch branch because it tries to use a pre-compiled libsnappy.a
that is checked into the repo (!). This works around that by setting
SNAPPY_LDFLAGS=-lsnappy, which is only understood by such old versions.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9517

Test Plan:
Run check_format_compatible.sh on Ubuntu 20 AWS machine,
watch nightly run

Reviewed By: hx235

Differential Revision: D34055561

Pulled By: pdillinger

fbshipit-source-id: 45f9d428dd082f026773bfa8d9dd4dad66fc9378
2022-02-07 19:36:01 -08:00
Levi Tamasi 42e0751b3a Clean up VersionStorageInfo a bit (#9494)
Summary:
The patch does some cleanup in and around `VersionStorageInfo`:
* Renames the method `PrepareApply` to `PrepareAppend` in `Version`
to make it clear that it is to be called before appending the `Version` to
`VersionSet` (via `AppendVersion`), not before applying any `VersionEdit`s.
* Introduces a helper method `VersionStorageInfo::PrepareForVersionAppend`
(called by `Version::PrepareAppend`) that encapsulates the population of the
various derived data structures in `VersionStorageInfo`, and turns the
methods computing the derived structures (`UpdateNumNonEmptyLevels`,
`CalculateBaseBytes` etc.) into private helpers.
* Changes `Version::PrepareAppend` so it only calls `UpdateAccumulatedStats`
if the `update_stats` flag is set. (Earlier, this was checked by the callee.)
Related to this, it also moves the call to `ComputeCompensatedSizes` to
`VersionStorageInfo::PrepareForVersionAppend`.
* Updates and cleans up `version_builder_test`, `version_set_test`, and
`compaction_picker_test` so `PrepareForVersionAppend` is called anytime
a new `VersionStorageInfo` is set up or saved. This cleanup also involves
splitting `VersionStorageInfoTest.MaxBytesForLevelDynamic`
into multiple smaller test cases.
* Fixes up a bunch of comments that were outdated or just plain incorrect.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9494

Test Plan: Ran `make check` and the crash test script for a while.

Reviewed By: riversand963

Differential Revision: D33971666

Pulled By: ltamasi

fbshipit-source-id: fda52faac7783041126e4f8dec0fe01bdcadf65a
2022-02-04 08:19:20 -08:00
Yanqin Jin 8b62abcc21 Disable backup/restore for ts-stress test (#9497)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9497

Reviewed By: ajkr

Differential Revision: D33990256

Pulled By: riversand963

fbshipit-source-id: 268ce16b037e23e42b14fa0fcb45535582e1a0d6
2022-02-03 16:18:34 -08:00
mrambacher aae3093719 Introduce a CountedFileSystem for counting file operations (#9283)
Summary:
Added a CountedFileSystem that tracks a number of file operations (opens, closes, deletes, renames, flushes, syncs, fsyncs, reads, writes).    This class was based on the ReportFileOpEnv from db_bench.

This is a stepping stone PR to be able to change the SpecialEnv into a SpecialFileSystem, where several of the file varieties wish to do operation counting.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9283

Reviewed By: pdillinger

Differential Revision: D33062004

Pulled By: mrambacher

fbshipit-source-id: d0d297a7fb9c48c06cbf685e5fa755c27193b6f5
2022-02-03 15:01:23 -08:00
Yanqin Jin 3122cb4358 Revise APIs related to user-defined timestamp (#8946)
Summary:
ajkr reminded me that we have a rule of not including per-kv related data in `WriteOptions`.
Namely, `WriteOptions` should not include information about "what-to-write", but should just
include information about "how-to-write".

According to this rule, `WriteOptions::timestamp` (experimental) is clearly a violation. Therefore,
this PR removes `WriteOptions::timestamp` for compliance.
After the removal, we need to pass timestamp info via another set of APIs. This PR proposes a set
of overloaded functions `Put(write_opts, key, value, ts)`, `Delete(write_opts, key, ts)`, and
`SingleDelete(write_opts, key, ts)`. Planned to add `Write(write_opts, batch, ts)`, but its complexity
made me reconsider doing it in another PR (maybe).

For better checking and returning error early, we also add a new set of APIs to `WriteBatch` that take
extra `timestamp` information when writing to `WriteBatch`es.
These set of APIs in `WriteBatchWithIndex` are currently not supported, and are on our TODO list.

Removed `WriteBatch::AssignTimestamps()` and renamed `WriteBatch::AssignTimestamp()` to
`WriteBatch::UpdateTimestamps()` since this method require that all keys have space for timestamps
allocated already and multiple timestamps can be updated.

The constructor of `WriteBatch` now takes a fourth argument `default_cf_ts_sz` which is the timestamp
size of the default column family. This will be used to allocate space when calling APIs that do not
specify a column family handle.

Also, updated `DB::Get()`, `DB::MultiGet()`, `DB::NewIterator()`, `DB::NewIterators()` methods, replacing
some assertions about timestamp to returning Status code.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8946

Test Plan:
make check
./db_bench -benchmarks=fillseq,fillrandom,readrandom,readseq,deleterandom -user_timestamp_size=8
./db_stress --user_timestamp_size=8 -nooverwritepercent=0 -test_secondary=0 -secondary_catch_up_one_in=0 -continuous_verification_interval=0

Make sure there is no perf regression by running the following
```
./db_bench_opt -db=/dev/shm/rocksdb -use_existing_db=0 -level0_stop_writes_trigger=256 -level0_slowdown_writes_trigger=256 -level0_file_num_compaction_trigger=256 -disable_wal=1 -duration=10 -benchmarks=fillrandom
```

Before this PR
```
DB path: [/dev/shm/rocksdb]
fillrandom   :       1.831 micros/op 546235 ops/sec;   60.4 MB/s
```
After this PR
```
DB path: [/dev/shm/rocksdb]
fillrandom   :       1.820 micros/op 549404 ops/sec;   60.8 MB/s
```

Reviewed By: ltamasi

Differential Revision: D33721359

Pulled By: riversand963

fbshipit-source-id: c131561534272c120ffb80711d42748d21badf09
2022-02-01 22:19:01 -08:00
Hui Xiao 920386f2b7 Detect (new) Bloom/Ribbon Filter construction corruption (#9342)
Summary:
Note: rebase on and merge after https://github.com/facebook/rocksdb/pull/9349, https://github.com/facebook/rocksdb/pull/9345, (optional) https://github.com/facebook/rocksdb/pull/9393
**Context:**
(Quoted from pdillinger) Layers of information during new Bloom/Ribbon Filter construction in building block-based tables includes the following:
a) set of keys to add to filter
b) set of hashes to add to filter (64-bit hash applied to each key)
c) set of Bloom indices to set in filter, with duplicates
d) set of Bloom indices to set in filter, deduplicated
e) final filter and its checksum

This PR aims to detect corruption (e.g, unexpected hardware/software corruption on data structures residing in the memory for a long time) from b) to e) and leave a) as future works for application level.
- b)'s corruption is detected by verifying the xor checksum of the hash entries calculated as the entries accumulate before being added to the filter. (i.e, `XXPH3FilterBitsBuilder::MaybeVerifyHashEntriesChecksum()`)
- c) - e)'s corruption is detected by verifying the hash entries indeed exists in the constructed filter by re-querying these hash entries in the filter (i.e, `FilterBitsBuilder::MaybePostVerify()`) after computing the block checksum (except for PartitionFilter, which is done right after each `FilterBitsBuilder::Finish` for impl simplicity - see code comment for more). For this stage of detection, we assume hash entries are not corrupted after checking on b) since the time interval from b) to c) is relatively short IMO.

Option to enable this feature of detection is `BlockBasedTableOptions::detect_filter_construct_corruption` which is false by default.

**Summary:**
- Implemented new functions `XXPH3FilterBitsBuilder::MaybeVerifyHashEntriesChecksum()` and `FilterBitsBuilder::MaybePostVerify()`
- Ensured hash entries, final filter and banding and their [cache reservation ](https://github.com/facebook/rocksdb/issues/9073) are released properly despite corruption
   - See [Filter.construction.artifacts.release.point.pdf ](https://github.com/facebook/rocksdb/files/7923487/Design.Filter.construction.artifacts.release.point.pdf) for high-level design
   -  Bundled and refactored hash entries's related artifact in XXPH3FilterBitsBuilder into `HashEntriesInfo` for better control on lifetime of these artifact during `SwapEntires`, `ResetEntries`
- Ensured RocksDB block-based table builder calls `FilterBitsBuilder::MaybePostVerify()` after constructing the filter by `FilterBitsBuilder::Finish()`
- When encountering such filter construction corruption, stop writing the filter content to files and mark such a block-based table building non-ok by storing the corruption status in the builder.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9342

Test Plan:
- Added new unit test `DBFilterConstructionCorruptionTestWithParam.DetectCorruption`
- Included this new feature in `DBFilterConstructionReserveMemoryTestWithParam.ReserveMemory` as this feature heavily touch ReserveMemory's impl
   - For fallback case, I run `./filter_bench -impl=3 -detect_filter_construct_corruption=true -reserve_table_builder_memory=true -strict_capacity_limit=true  -quick -runs 10 | grep 'Build avg'` to make sure nothing break.
- Added to `filter_bench`: increased filter construction time by **30%**, mostly by `MaybePostVerify()`
   -  FastLocalBloom
       - Before change: `./filter_bench -impl=2 -quick -runs 10 | grep 'Build avg'`: **28.86643s**
       - After change:
          -  `./filter_bench -impl=2 -detect_filter_construct_corruption=false -quick -runs 10 | grep 'Build avg'` (expect a tiny increase due to MaybePostVerify is always called regardless): **27.6644s (-4% perf improvement might be due to now we don't drop bloom hash entry in `AddAllEntries` along iteration but in bulk later, same with the bypassing-MaybePostVerify case below)**
          - `./filter_bench -impl=2 -detect_filter_construct_corruption=true -quick -runs 10 | grep 'Build avg'` (expect acceptable increase): **34.41159s (+20%)**
          - `./filter_bench -impl=2 -detect_filter_construct_corruption=true -quick -runs 10 | grep 'Build avg'` (by-passing MaybePostVerify, expect minor increase): **27.13431s (-6%)**
    -  Standard128Ribbon
       - Before change: `./filter_bench -impl=3 -quick -runs 10 | grep 'Build avg'`: **122.5384s**
       - After change:
          - `./filter_bench -impl=3 -detect_filter_construct_corruption=false -quick -runs 10 | grep 'Build avg'` (expect a tiny increase due to MaybePostVerify is always called regardless - verified by removing MaybePostVerify under this case and found only +-1ns difference): **124.3588s (+2%)**
          - `./filter_bench -impl=3 -detect_filter_construct_corruption=true -quick -runs 10 | grep 'Build avg'`(expect acceptable increase): **159.4946s (+30%)**
          - `./filter_bench -impl=3 -detect_filter_construct_corruption=true -quick -runs 10 | grep 'Build avg'`(by-passing MaybePostVerify, expect minor increase) : **125.258s (+2%)**
- Added to `db_stress`: `make crash_test`, `./db_stress --detect_filter_construct_corruption=true`
- Manually smoke-tested: manually corrupted the filter construction in some db level tests with basic PUT and background flush. As expected, the error did get returned to users in subsequent PUT and Flush status.

Reviewed By: pdillinger

Differential Revision: D33746928

Pulled By: hx235

fbshipit-source-id: cb056426be5a7debc1cd16f23bc250f36a08ca57
2022-02-01 17:42:35 -08:00
Peter Dillinger f6d7ec1d02 Ignore `total_order_seek` in DB::Get (#9427)
Summary:
Apparently setting total_order_seek=true for DB::Get was
intended to allow accurate read semantics if the current prefix
extractor doesn't match what was used to generate SST files on
disk. But since prefix_extractor was made a mutable option in 5.14.0, we
have been able to detect this case and provide the correct semantics
regardless of the total_order_seek option. Since that time, the option
has only made Get() slower in a reasonably common case: prefix_extractor
unchanged and whole_key_filtering=false.

So this change primarily removes unnecessary effect of
total_order_seek on Get. Also cleans up some related comments.

Also adds a -total_order_seek option to db_bench and canonicalizes
handling of ReadOptions in db_bench so that command line options have
the expected association with library features. (There is potential
for change in regression test behavior, but the old behavior is likely
indefensible, or some other inconsistency would need to be fixed.)

TODO in follow-up work: there should be no reason for Get() to depend on
current prefix extractor at all.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9427

Test Plan:
Unit tests updated.

Performance (using db_bench update)

Create DB with `TEST_TMPDIR=/dev/shm/rocksdb ./db_bench -benchmarks=fillrandom -num=10000000 -disable_wal=1 -write_buffer_size=10000000 -bloom_bits=16 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0 -prefix_size=12 -whole_key_filtering=0`

Test with and without `-total_order_seek` on `TEST_TMPDIR=/dev/shm/rocksdb ./db_bench -use_existing_db -readonly -benchmarks=readrandom -num=10000000 -duration=40 -disable_wal=1 -bloom_bits=16 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0 -prefix_size=12`

Before this change, total_order_seek=false: 25188 ops/sec
Before this change, total_order_seek=true:   1222 ops/sec (~20x slower)

After this change, total_order_seek=false: 24570 ops/sec
After this change, total_order_seek=true:  25012 ops/sec (indistinguishable)

Reviewed By: siying

Differential Revision: D33753458

Pulled By: pdillinger

fbshipit-source-id: bf892f34907a5e407d9c40bd4d42f0adbcbe0014
2022-01-31 19:46:42 -08:00
Andrew Kryczka 8dbd0bd11f db_crashtest.py use cheaper settings (#9476)
Summary:
Despite attempts to optimize `db_stress` setup phase (i.e.,
pre-`OperateDb()`) latency in https://github.com/facebook/rocksdb/issues/9470 and https://github.com/facebook/rocksdb/issues/9475, it still always took tens
of seconds. Since we still aren't able to setup a 100M key `db_stress`
quickly, we should reduce the number of keys. This PR reduces it 4x
while increasing `value_size_mult` 4x (from its default value of 8) so
that memtables and SST files fill at a similar rate compared to before this PR.

Also disabled bzip2 compression since we'll probably never use it and
I noticed many CI runs spending majority of CPU on bzip2 decompression.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9476

Reviewed By: siying

Differential Revision: D33898520

Pulled By: ajkr

fbshipit-source-id: 855021784ad9664f2be5bce21f0339a1cf93230d
2022-01-31 13:21:24 -08:00
Hui Xiao 42cca28ebb Remove deprecated API AdvancedColumnFamilyOptions::rate_limit_delay_max_milliseconds (#9455)
Summary:
**Context/Summary:**
AdvancedColumnFamilyOptions::rate_limit_delay_max_milliseconds has been marked as deprecated and it's time to actually remove the code.
- Keep `soft_rate_limit`/`hard_rate_limit` in `cf_mutable_options_type_info` to prevent throwing `InvalidArgument` in `GetColumnFamilyOptionsFromMap` when reading an option file still with these options (e.g, old option file generated from RocksDB before the deprecation)
- Keep `soft_rate_limit`/`hard_rate_limit` in under `OptionsOldApiTest.GetOptionsFromMapTest` to test the case mentioned above.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9455

Test Plan: Rely on my eyeball and CI

Reviewed By: ajkr

Differential Revision: D33811664

Pulled By: hx235

fbshipit-source-id: 866859427fe710354a90f1095057f80116365ff0
2022-01-28 16:47:08 -08:00
Akanksha Mahajan 74ccd1931e Remove deprecated option DBOptions::skip_log_error_on_recovery (#9434)
Summary:
In  RocksDB DBOptions::skip_log_error_on_recovery is marked as
"NOT SUPPORTED" for a long time, and setting this option does not have
any effect on the behavior of RocksDB library. Therefore, we are removing it
in the upcoming 7.0 release.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9434

Test Plan: CircleCI

Reviewed By: ajkr

Differential Revision: D33763015

Pulled By: akankshamahajan15

fbshipit-source-id: 11f09643298da6c02d3dcdb090b996f4c3cfdd76
2022-01-28 01:46:04 -08:00
Peter Dillinger c11fe94000 Fix^2 prefix extractor testing in crash test (#9463)
Summary:
Even after https://github.com/facebook/rocksdb/issues/9461 could see
```
Error: please specify prefix_size for test_batches_snapshots test!
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9463

Test Plan:
run `make blackbox_crashtest` for a long time. (Unfortunately,
it's taking a long time to reproduce these failures)

Reviewed By: akankshamahajan15

Differential Revision: D33838152

Pulled By: pdillinger

fbshipit-source-id: b9a73c5bbb68df53f14c22b9b52f61d1f7ef38af
2022-01-27 23:11:11 -08:00
Jay Zhuang 22321e1027 Remove unused API base_background_compactions (#9462)
Summary:
The API is deprecated long time ago. Clean up the codebase by
removing it.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9462

Test Plan: CI, fake release: D33835220

Reviewed By: riversand963

Differential Revision: D33835103

Pulled By: jay-zhuang

fbshipit-source-id: 6d2dc12c8e7fdbe2700865a3e61f0e3f78bd8184
2022-01-27 21:05:18 -08:00
Peter Dillinger 981e8c621f Fix/expand prefix extractor testing in crash test (#9461)
Summary:
Changes in https://github.com/facebook/rocksdb/issues/9453 could trigger
```
stderr:
Error: prefixpercent is non-zero while prefix_size is not positive!
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9461

Test Plan: run `make blackbox_crashtest` for a long time

Reviewed By: ajkr

Differential Revision: D33830751

Pulled By: pdillinger

fbshipit-source-id: be88377dcaa47e4bb7adb0347762639eff8f1476
2022-01-27 16:37:55 -08:00
Peter Dillinger 78aee6fedc Remove obsolete backupable_db.h, utility_db.h (#9438)
Summary:
This also removes the obsolete names BackupableDBOptions
and UtilityDB. API users must now use BackupEngineOptions and
DBWithTTL::Open. In C API, `rocksdb_backupable_db_*` is replaced
`rocksdb_backup_engine_*`. Similar renaming in Java API.

In reference to https://github.com/facebook/rocksdb/issues/9389

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9438

Test Plan: CI

Reviewed By: mrambacher

Differential Revision: D33780269

Pulled By: pdillinger

fbshipit-source-id: 4a6cfc5c1b4c78bcad790b9d3dd13c5fdf4a1fac
2022-01-27 15:45:30 -08:00
Peter Dillinger ea89c77f27 Fix major bug with MultiGet, DeleteRange, and memtable Bloom (#9453)
Summary:
MemTable::MultiGet was not considering range tombstones before
querying Bloom filter. This means range tombstones would be skipped for
keys (or prefixes) with no other entries in the memtable. This could cause
old values for a key (in SST files) to still show up until the range tombstone
covering it has been flushed.

This is fixed by essentially disabling the memtable Bloom filter when there
are any range tombstones. (This could be better optimized in the future, but
good enough for now.)

Did some other cleanup/optimization in the same code to (more than) offset
the cost of checking on range tombstones in more cases. There is now
notable improvement when memtable_whole_key_filtering and prefix_extractor
are used together (unusual), and this makes MultiGet closer to the Get
implementation.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9453

Test Plan:
new unit test added. Added memtable Bloom to crash test.

Performance testing
--------------------

Build WAL-only DB (recovers to memtable):
```
TEST_TMPDIR=/dev/shm/rocksdb ./db_bench -benchmarks=fillrandom -num=1000000 -write_buffer_size=250000000
```

Query test command, to maximize sensitivity to the changed code:
```
TEST_TMPDIR=/dev/shm/rocksdb ./db_bench -use_existing_db -readonly -benchmarks=multireadrandom -num=10000000 -write_buffer_size=250000000 -memtable_bloom_size_ratio=0.015 -multiread_batched -batch_size=24 -threads=8 -memtable_whole_key_filtering=$MWKF -prefix_size=$PXS
```
(Note -num here is 10x larger for mostly memtable misses)

Before & after run simultaneously, average over 10 iterations per data point, ops/sec.

MWKF=0 PXS=0 (Bloom disabled)
Before: 5724844
After: 6722066

MWKF=0 PXS=7 (prefixes hardly unique; Bloom not useful)
Before: 9981319
After: 10237990

MWKF=0 PXS=8 (prefixes unique; Bloom useful)
Before:  12081715
After: 12117603

MWKF=1 PXS=0 (whole key Bloom useful)
Before: 11944354
After: 12096085

MWKF=1 PXS=7 (whole key Bloom useful in new version; prefixes not useful in old version)
Before: 9444299
After: 11826029

MWKF=1 PXS=7 (whole key Bloom useful in new version; prefixes useful in old version)
Before: 11784465
After: 11778591

Only in this last case is the 'before' *slightly* faster, perhaps because hashing prefixes is slightly faster than hashing whole keys. Otherwise, 'after' is faster.

Reviewed By: ajkr

Differential Revision: D33805025

Pulled By: pdillinger

fbshipit-source-id: 597523cae4f4eafdf6ae6bb2bc6cb46f83b017bf
2022-01-27 14:55:04 -08:00
Hui Xiao 1e0e883ca5 Remove deprecated API AdvancedColumnFamilyOptions::soft_rate_limit/hard_rate_limit (#9452)
Summary:
**Context/Summary:**
AdvancedColumnFamilyOptions::soft_rate_limit/hard_rate_limit have been marked as deprecated and it's time to actually remove the code.
- Keep `soft_rate_limit`/`hard_rate_limit` in `cf_mutable_options_type_info` to prevent throwing `InvalidArgument` in `GetColumnFamilyOptionsFromMap` when reading an option file still with these options (e.g, old option file generated from RocksDB before the deprecation)
- Keep `soft_rate_limit`/`hard_rate_limit` in under `OptionsOldApiTest.GetOptionsFromMapTest` to test the case mentioned above.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9452

Test Plan: Rely on my eyeball and CI

Reviewed By: ajkr

Differential Revision: D33804938

Pulled By: hx235

fbshipit-source-id: 133d49f7ec5238d7efceeb0a3122a5792a2b9945
2022-01-27 13:01:09 -08:00
yaphet 7fb723f581 Using back to get the last element (#9415)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9415

Reviewed By: ajkr

Differential Revision: D33773673

Pulled By: riversand963

fbshipit-source-id: 52b59ec5a6b01a91d3f990b7f2b0f16320afb49b
2022-01-27 11:35:33 -08:00
Siddhartha Roychowdhury c27ca23644 Add option for WAL compression algorithm (#9432)
Summary:
Add an option to set the WAL compression algorithm - wal_compression.

TODO: WAL compression is not implemented and will only support zstd initially. Will be added in subsequent diffs.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9432

Reviewed By: pdillinger

Differential Revision: D33797275

Pulled By: sidroyc

fbshipit-source-id: 8db81d9c9cea5e2e4f1445d3aecad8106137b8e7
2022-01-26 14:23:00 -08:00
Jay Zhuang 961d8dacf2 Remove unused option purge_redundant_kvs_while_flush (#9429)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/9429

Test Plan: fake release for test: D33754513

Reviewed By: riversand963

Differential Revision: D33753637

Pulled By: jay-zhuang

fbshipit-source-id: 18db4701e8f28dda8f1ab660c2be9890a8312c12
2022-01-26 10:24:16 -08:00
Yanqin Jin 5d30668cab Remove tools/rdb from main repo (#9399)
Summary:
This PR is one proposal to resolve https://github.com/facebook/rocksdb/issues/9382.

Looking at the code, I can't think of a reason why rdb is an internal component of RocksDB: it does not require
any header files NOT in `include/rocksdb`. It's a better idea to host it somewhere else.

Plus, rdb requires python2 which is not supported any more. No fixes or improvements will be made, even for potential
security bugs (https://www.python.org/doc/sunset-python-2/).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9399

Test Plan: make check

Reviewed By: ajkr

Differential Revision: D33641965

Pulled By: riversand963

fbshipit-source-id: 2a6a74693e5de36834f355e41d6865db206af48b
2022-01-24 21:23:03 -08:00
Yanqin Jin 50135c1bf3 Move HDFS support to separate repo (#9170)
Summary:
This PR moves HDFS support from RocksDB repo to a separate repo. The new (temporary?) repo
in this PR serves as an example before we finalize the decision on where and who to host hdfs support. At this point,
people can start from the example repo and fork.

Java/JNI is not included yet, and needs to be done later if necessary.

The goal is to include this commit in RocksDB 7.0 release.

Reference:
https://github.com/ajkr/dedupfs by ajkr

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9170

Test Plan:
Follow the instructions in https://github.com/riversand963/rocksdb-hdfs-env/blob/master/README.md. Build and run db_bench and db_stress.

make check

Reviewed By: ajkr

Differential Revision: D33751662

Pulled By: riversand963

fbshipit-source-id: 22b4db7f31762ed417a20239f5a08dcd1696244f
2022-01-24 20:23:54 -08:00
Peter Dillinger 5576ded762 Add Options::DisableExtraChecks, clarify force_consistency_checks (#9363)
Summary:
In response to https://github.com/facebook/rocksdb/issues/9354, this PR adds a way for users to "opt out"
of extra checks that can impact peak write performance, which
currently only includes force_consistency_checks. I considered including
some other options but did not see a db_bench performance difference.

Also clarify in comment for force_consistency_checks that it can "slow
down saturated writing."

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9363

Test Plan:
basic coverage in unit tests

Using my perf test in https://github.com/facebook/rocksdb/issues/9354 comment, I see

force_consistency_checks=true -> 725360 ops/s
force_consistency_checks=false -> 783072 ops/s

Reviewed By: mrambacher

Differential Revision: D33636559

Pulled By: pdillinger

fbshipit-source-id: 25bfd006f4844675e7669b342817dd4c6a641e84
2022-01-18 17:31:03 -08:00
Yanqin Jin 0376869f05 Remove using namespace (#9369)
Summary:
As title.
This is part of an fb-internal task.
First, remove all `using namespace` statements if applicable.
Next, utilize multiple build platforms and see if anything is broken.
Should anything become broken, fix the compilation errors with as little extra change as possible.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9369

Test Plan:
internal build and make check
make clean && make static_lib && cd examples && make all

Reviewed By: pdillinger

Differential Revision: D33517260

Pulled By: riversand963

fbshipit-source-id: 3fc4ce6402a073421dfd9a9b2d1c79441dca7a40
2022-01-12 09:31:12 -08:00
Andrew Kryczka 6892f19b11 Test correctness with WAL disabled in non-txn blackbox crash tests (#9338)
Summary:
Recently we added the ability to verify some prefix of operations are recovered (AKA no "hole" in the recovered data) (https://github.com/facebook/rocksdb/issues/8966). Besides testing unsynced data loss scenarios, it is also useful to test WAL disabled use cases, where unflushed writes are expected to be lost. Note RocksDB only offers the prefix-recovery guarantee to WAL-disabled use cases that use atomic flush, so crash test always enables atomic flush when WAL is disabled.

To verify WAL-disabled crash-recovery correctness globally, i.e., also in whitebox and blackbox transaction tests, it is possible but requires further changes. I added TODOs in db_crashtest.py.

Depends on https://github.com/facebook/rocksdb/issues/9305.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9338

Test Plan: Running all crash tests and many instances of blackbox. Sandcastle links are in Phabricator diff test plan.

Reviewed By: riversand963

Differential Revision: D33345333

Pulled By: ajkr

fbshipit-source-id: f56dd7d2e5a78d59301bf4fc3fedb980eb31e0ce
2022-01-05 16:23:37 -08:00
mrambacher fe31dc53ca Make the Env class Customizable (#9293)
Summary:
Allows the Env to have options (Configurable) and loads like other Customizable classes.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9293

Reviewed By: pdillinger, zhichao-cao

Differential Revision: D33181591

Pulled By: mrambacher

fbshipit-source-id: 55e823886c654d214eda9eedd45ccdc54dac14d7
2022-01-04 16:45:49 -08:00
sdong a931bacf5d Improve SimulatedHybridFileSystem (#9301)
Summary:
Several improvements to SimulatedHybridFileSystem:
(1) Allow a mode where all I/Os to all files simulate HDD. This can be enabled in db_bench using -simulate_hdd
(2) Latency calculation is slightly more accurate
(3) Allow to simulate more than one HDD spindles.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9301

Test Plan: Run db_bench and observe the results are reasonable.

Reviewed By: jay-zhuang

Differential Revision: D33141662

fbshipit-source-id: b736e58c4ba910d06899cc9ccec79b628275f4fa
2021-12-29 11:14:42 -08:00
Andrew Kryczka 5383f1eec4 Verify recovery correctness in multi-CF blackbox crash test (#9303)
Summary:
db_crashtest.py uses multiple CFs only when run without flag `--simple`.
The previous config set `-test_batches_snapshots=1` in that case for
blackbox mode. But `-test_batches_snapshots=1` cannot verify recovery
correctness, so it should not always be set for multi-CF blackbox tests.
We can instead randomly toggle it.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9303

Reviewed By: riversand963

Differential Revision: D33155229

Pulled By: ajkr

fbshipit-source-id: 4a6fdc4eddccc8ece664063baf6393ce1c5de6b7
2021-12-16 09:05:40 -08:00
sdong 806d8916da SimulatedHybridFileSystem to simulate HDD behavior more accurately (#9259)
Summary:
SimulatedHybridFileSystem now takes a more thorough simualtion of an HDD:
1. cover writes too, not just read
2. Latency and throughput is now simulated as seek + read time, using a rate limiter
This implementation can be modified to simulate full HDD behavior, which is not yet done.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9259

Test Plan: Run db_bench and observe the desired behavior.

Reviewed By: jay-zhuang

Differential Revision: D32903039

fbshipit-source-id: a83f5d72143e114d5e75edf39d647bf0b71978e1
2021-12-14 20:07:57 -08:00
Yanqin Jin bd513fd075 Add commit marker with timestamp (#9266)
Summary:
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9266

This diff adds a new tag `CommitWithTimestamp`. Currently, there is no API to trigger writing
this tag to WAL, thus it is unavailable to users.
This is an ongoing effort to add user-defined timestamp support to write-committed transactions.
This diff also indicates all column families that may potentially participate in the same
transaction must either disable timestamp or have the same timestamp format, since
`CommitWithTimestamp` tag is followed by a single byte-array denoting the commit
timestamp of the transaction. We will enforce this checking in a future diff. We keep this
diff small.

Reviewed By: ltamasi

Differential Revision: D31721350

fbshipit-source-id: e1450811443647feb6ca01adec4c8aaae270ffc6
2021-12-10 11:05:35 -08:00
Akanksha Mahajan 9e4d56f2c9 Fix segmentation fault in table_options.prepopulate_block_cache when used with partition_filters (#9263)
Summary:
When table_options.prepopulate_block_cache is set to
BlockBasedTableOptions::PrepopulateBlockCache::kFlushOnly and
table_options.partition_filters is also set true, then there is
segmentation failure when top level filter is fetched because its
entered with wrong type in cache.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9263

Test Plan:
Updated unit tests;
Ran db_stress: make crash_test -j32

Reviewed By: pdillinger

Differential Revision: D32936566

Pulled By: akankshamahajan15

fbshipit-source-id: 8bd79e53830d3e3c1bb79787e1ffbc3cb46d4426
2021-12-08 12:44:38 -08:00
sdong 88875df821 File temperature information should be preserved when restart the DB (#9242)
Summary:
Fix a bug that causes file temperature not preserved after DB is restarted, or options.max_manifest_file_size is hit.
Also, pass temperature information to NewRandomAccessFile() to allow users to hack a solution where they don't preserve tiering information.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9242

Test Plan: Add a unit test that would fail without the fix.

Reviewed By: jay-zhuang

Differential Revision: D32818150

fbshipit-source-id: 36aa3f148c60107f7b8e9d65b63b039f9e1a1eec
2021-12-03 14:43:14 -08:00
Yanqin Jin 42fef0224f Fix build for msvc (#9230)
Summary:
Test plan

With Visual Studio 2017.
```
cd rocksdb
mkdir build && cd build
cmake -G "Visual Studio 15 Win64" -DWITH_GFLAGS=1 ..
MSBuild rocksdb.sln /m /TARGET:cache_bench /TARGET:db_bench /TARGET:db_stress
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9230

Reviewed By: akankshamahajan15

Differential Revision: D32705095

Pulled By: riversand963

fbshipit-source-id: 101e3533f5178b24c0535ddc47a39347ccfcf92c
2021-11-29 14:27:48 -08:00
Zhichao Cao 4340e1ff6d Disable the QPS verification in test temporally (#9190)
Summary:
Disable the QPS verification in test temporally, which causes the test failure due to different system delays.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9190

Test Plan: make check

Reviewed By: siying

Differential Revision: D32576289

Pulled By: zhichao-cao

fbshipit-source-id: 1df972e77dd82eed5af3462e5db5e141aadf8fae
2021-11-19 23:47:43 -08:00
Levi Tamasi dc5de45af8 Support readahead during compaction for blob files (#9187)
Summary:
The patch adds a new BlobDB configuration option `blob_compaction_readahead_size`
that can be used to enable prefetching data from blob files during compaction.
This is important when using storage with higher latencies like HDDs or remote filesystems.
If enabled, prefetching is used for all cases when blobs are read during compaction,
namely garbage collection, compaction filters (when the existing value has to be read from
a blob file), and `Merge` (when the value of the base `Put` is stored in a blob file).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9187

Test Plan: Ran `make check` and the stress/crash test.

Reviewed By: riversand963

Differential Revision: D32565512

Pulled By: ltamasi

fbshipit-source-id: 87be9cebc3aa01cc227bec6b5f64d827b8164f5d
2021-11-19 17:53:47 -08:00
Zhichao Cao f4294669e0 Fix the analyzer test failure caused by inaccurate timing wait (#9181)
Summary:
Fix the analyzer test failure caused by inaccurate timing wait. The wait time at different system might be different or cause the delay, now we do not accurately count the lines. Only in a very rare extreme case, test will ignore the part exceed the timing of 1 second.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9181

Test Plan: make check

Reviewed By: pdillinger

Differential Revision: D32511319

Pulled By: zhichao-cao

fbshipit-source-id: e694c8cb465c750cfa5a43dab3eff6707b9a11c8
2021-11-18 11:28:38 -08:00
Akanksha Mahajan 17ce1ca48b Reuse internal auto readhead_size at each Level (expect L0) for Iterations (#9056)
Summary:
RocksDB does auto-readahead for iterators on noticing more than two sequential reads for a table file if user doesn't provide readahead_size. The readahead starts at 8KB and doubles on every additional read up to max_auto_readahead_size. However at each level, if iterator moves over next file, readahead_size starts again from 8KB.

This PR introduces a new ReadOption "adaptive_readahead" which when set true will maintain readahead_size  at each level. So when iterator moves from one file to another, new file's readahead_size will continue from previous file's readahead_size instead of scratch. However if reads are not sequential it will fall back to 8KB (default) with no prefetching for that block.

1. If block is found in cache but it was eligible for prefetch (block wasn't in Rocksdb's prefetch buffer),  readahead_size will decrease by 8KB.
2. It maintains readahead_size for L1 - Ln levels.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9056

Test Plan:
Added new unit tests
Ran db_bench for "readseq, seekrandom, seekrandomwhilewriting, readrandom" with --adaptive_readahead=true and there was no regression if new feature is enabled.

Reviewed By: anand1976

Differential Revision: D31773640

Pulled By: akankshamahajan15

fbshipit-source-id: 7332d16258b846ae5cea773009195a5af58f8f98
2021-11-10 16:20:04 -08:00
anand76 dddb791c18 Enable a few unit tests to use custom Env objects (#9087)
Summary:
Allow compaction_job_test, db_io_failure_test, dbformat_test, deletefile_test, and fault_injection_test to use a custom Env object. Also move ```RegisterCustomObjects``` declaration to a header file to simplify things.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9087

Test Plan: Run manually using "buck test rocksdb/src:compaction_job_test_fbcode" etc.

Reviewed By: riversand963

Differential Revision: D32007222

Pulled By: anand1976

fbshipit-source-id: 99af58559e25bf61563dfa95dc46e31fa7375792
2021-11-08 11:05:59 -08:00
anand76 78556c14dd Secondary cache error injection (#9002)
Summary:
Implement secondary cache error injection in db_stress.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9002

Reviewed By: zhichao-cao

Differential Revision: D31874896

Pulled By: anand1976

fbshipit-source-id: 8cf04c061a4a44efa0fe88423d05cade67b85f73
2021-11-08 10:27:27 -08:00