Commit Graph

11314 Commits

Author SHA1 Message Date
Alan Paxton 2f4a0ffef8 CI Benchmarking with CircleCI Runner and OpenSearch Dashboard (EB 1088) (#9723)
Summary:
CircleCI runner based benchmarking. A runner is a dedicate machine configured for CircleCI to perform work on. Our work is a repeatable benchmark, the `benchmark-linux` job in `config.yml`

A runner, in CircleCI terminology, is a machine that is managed by the client (us) rather than running on CircleCI resources in the cloud. This means that we define and configure the iron, and that therefore the performance is repeatable and predictable. Which is what we need for performance regression benchmarking.

On a time schedule (or on commit, during branch development) benchmarks are set off on the runner, and then a script is run `benchmark_log_tool.py` which parses the benchmark output and pushes it into a pre-configured OpenSearch document connected to an OpenSearch dashboard. Members of the team can examine benchmark performance changes on the dashboard.

As time progresses we can add different benchmarks to the suite which gets run.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9723

Reviewed By: pdillinger

Differential Revision: D35555626

Pulled By: jay-zhuang

fbshipit-source-id: c6a905ca04494495c3784cfbb991f5ab90c807ee
2022-06-04 09:31:47 -07:00
yite.gu 560906ab33 Add a simple example of backup and restore (#10054)
Summary:
Add a simple example of backup and restore

Signed-off-by: YiteGu <ess_gyt@qq.com>

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10054

Reviewed By: jay-zhuang

Differential Revision: D36678141

Pulled By: ajkr

fbshipit-source-id: 43545356baddb4c2c76c62cd63d7a3238d1f8a00
2022-06-03 23:25:31 -07:00
Levi Tamasi e9c74bc474 Add wide column serialization primitives (#9915)
Summary:
The patch adds some low-level logic that can be used to serialize/deserialize
a sorted vector of wide columns to/from a simple binary searchable string
representation. Currently, there is no user-facing API; this will be implemented in
subsequent stages.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9915

Test Plan: `make check`

Reviewed By: siying

Differential Revision: D35978076

Pulled By: ltamasi

fbshipit-source-id: 33f5f6628ec3bcd8c8beab363b1978ac047a8788
2022-06-03 20:54:48 -07:00
Yanqin Jin 3e02c6e05a Point-lookup returns timestamps of Delete and SingleDelete (#10056)
Summary:
If caller specifies a non-null `timestamp` argument in `DB::Get()` or a non-null `timestamps` in `DB::MultiGet()`,
RocksDB will return the timestamps of the point tombstones.

Note: DeleteRange is still unsupported.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10056

Test Plan: make check

Reviewed By: ltamasi

Differential Revision: D36677956

Pulled By: riversand963

fbshipit-source-id: 2d7af02cc7237b1829cd269086ea895a49d501ae
2022-06-03 20:00:42 -07:00
Hui Xiao 4bdcc80192 Increase ChargeTableReaderTest/ChargeTableReaderTest.Basic error tolerance rate from 1% to 5% (#10113)
Summary:
**Context:**
https://github.com/facebook/rocksdb/pull/9748 added support to charge table reader memory to block cache. In the test `ChargeTableReaderTest/ChargeTableReaderTest.Basic`, it estimated the table reader memory, calculated the expected number of table reader opened based on this estimation and asserted this number with actual number. The expected number of table reader opened calculated based on estimated table reader memory will not be 100% accurate and should have tolerance for error. It was previously set to 1% and recently encountered an assertion failure that `(opened_table_reader_num) <= (max_table_reader_num_capped_upper_bound), actual: 375 or 376 vs 374` where `opened_table_reader_num` is the actual opened one and `max_table_reader_num_capped_upper_bound` is the estimated opened one (=371 * 1.01). I believe it's safe to increase error tolerance from 1% to 5% hence there is this PR.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10113

Test Plan: - CI again succeeds.

Reviewed By: ajkr

Differential Revision: D36911556

Pulled By: hx235

fbshipit-source-id: 259687dd77b450fea0f5658a5b567a1d31d4b1f7
2022-06-03 19:42:22 -07:00
Zeyi (Rice) Fan c1018b7516 cmake: add an option to skip thirdparty.inc on Windows (#10110)
Summary:
When building RocksDB with getdeps on Windows, `thirdparty.inc` get in the way since `FindXXXX.cmake` are working properly now.

This PR adds an option to skip that file when building RocksDB so we can disable it.

FB: see [D36905191](https://www.internalfb.com/diff/D36905191).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10110

Reviewed By: siying

Differential Revision: D36913882

Pulled By: fanzeyi

fbshipit-source-id: 33d36841dc0d4fe87f51e1d9fd2b158a3adab88f
2022-06-03 19:20:34 -07:00
Levi Tamasi 7d36bc4273 Fix some bugs in verify_random_db.sh (#10112)
Summary:
The patch attempts to fix three bugs in `verify_random_db.sh`:
1) https://github.com/facebook/rocksdb/pull/9937 changed the default for
`--try_load_options` to true in the script's use case, so we have to
explicitly set it to false if the corresponding argument of the script
is 0. This should fix the issue we've been seeing with our forward
compatibility tests where 7.3 is unable to open a database created by
the version on main after adding a new configuration option.
2) The script seems to support two "extra parameters"; however,
in practice, if the second one was set, only that one was passed on to
`ldb`. Now both get forwarded.
3) When running the `diff` command, the base DB directory was passed as
the second argument instead of the file containing the `ldb` output
(this actually seems to work, probably accidentally though).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10112

Reviewed By: pdillinger

Differential Revision: D36911363

Pulled By: ltamasi

fbshipit-source-id: fe29db4e28d373cee51a12322c59050fc50e926d
2022-06-03 16:35:13 -07:00
Yanqin Jin d739de63e5 Fix a bug in WAL tracking (#10087)
Summary:
Closing https://github.com/facebook/rocksdb/issues/10080

When `SyncWAL()` calls `MarkLogsSynced()`, even if there is only one active WAL file,
this event should still be added to the MANIFEST.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10087

Test Plan: make check

Reviewed By: ajkr

Differential Revision: D36797580

Pulled By: riversand963

fbshipit-source-id: 24184c9dd606b3939a454ed41de6e868d1519999
2022-06-03 16:33:00 -07:00
Guido Tagliavini Ponce eb99e08076 Add support for FastLRUCache in cache_bench (#10095)
Summary:
cache_bench can now run with FastLRUCache.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10095

Test Plan:
- Temporarily add an ``assert(false)`` in the execution path that sets up the FastLRUCache. Run ``make -j24 cache_bench``. Then test the appropriate code is used by running ``./cache_bench -cache_type=fast_lru_cache`` and checking that the assert is called. Repeat for LRUCache.
- Verify that FastLRUCache (currently a clone of LRUCache) has similar latency distribution than LRUCache, by comparing the outputs of ``./cache_bench -cache_type=fast_lru_cache`` and ``./cache_bench -cache_type=lru_cache``.

Reviewed By: pdillinger

Differential Revision: D36875834

Pulled By: guidotag

fbshipit-source-id: eb2ad0bb32c2717a258a6ac66ed736e06f826cd8
2022-06-03 13:40:09 -07:00
zczhu 21906d66f6 Add default impl to dir close (#10101)
Summary:
As pointed by anand1976 in his [comment](https://github.com/facebook/rocksdb/pull/10049#pullrequestreview-994255819), previous implementation is not backward-compatible. In this implementation, the default implementation `return Status::NotSupported("Close")` or `return IOStatus::NotSupported("Close")` is added for `Close()` function for `*Directory` classes.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10101

Test Plan: DBBasicTest.DBCloseAllDirectoryFDs

Reviewed By: anand1976

Differential Revision: D36899346

Pulled By: littlepig2013

fbshipit-source-id: 430624793362f330cbb8837960f0e8712a944ab9
2022-06-03 12:53:28 -07:00
Guido Tagliavini Ponce cf85607795 Add support for FastLRUCache in db_bench. (#10096)
Summary:
db_bench can now run with FastLRUCache.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10096

Test Plan:
- Temporarily add an ``assert(false)`` in the execution path that sets up the FastLRUCache. Run ``make -j24 db_bench``. Then test the appropriate code is used by running ``./db_bench -cache_type=fast_lru_cache`` and checking that the assert is called. Repeat for LRUCache.
- Verify that FastLRUCache (currently a clone of LRUCache) produces similar benchmark data than LRUCache, by comparing the outputs of ``./db_bench -benchmarks=fillseq,fillrandom,readseq,readrandom -cache_type=fast_lru_cache`` and ``./db_bench -benchmarks=fillseq,fillrandom,readseq,readrandom -cache_type=lru_cache``.

Reviewed By: gitbw95

Differential Revision: D36898774

Pulled By: guidotag

fbshipit-source-id: f9f6b6f6da124f88b21b3c8dee742fbb04eff773
2022-06-03 11:16:49 -07:00
Yanqin Jin 2b3c50c429 Temporarily disable wal compression (#10108)
Summary:
Will re-enable after fixing the bug in https://github.com/facebook/rocksdb/issues/10099 and https://github.com/facebook/rocksdb/issues/10097.
Right now, the priority is https://github.com/facebook/rocksdb/issues/10087, but the bug in WAL compression prevents the mini crash test from passing.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10108

Reviewed By: pdillinger

Differential Revision: D36897214

Pulled By: riversand963

fbshipit-source-id: d64dc52738222d5f66003f7731dc46eaeed812be
2022-06-03 10:22:52 -07:00
Mark Callaghan 5506954b1f Enhance to support more tuning options, and universal and integrated… (#9704)
Summary:
… BlobDB for all tests

This does two big things:
* provides more tuning options
* supports universal and integrated BlobDB for all of the benchmarks that are leveled-only

It does several smaller things, and I will list a few
* sets l0_slowdown_writes_trigger which wasn't set before this diff.
* improves readability in report.tsv by using smaller field names in the header
* adds more columns to report.tsv

report.tsv before this diff:
```
ops_sec mb_sec  total_size_gb   level0_size_gb  sum_gb  write_amplification     write_mbps      usec_op percentile_50   percentile_75   percentile_99   percentile_99.9 percentile_99.99        uptime  stall_time      stall_percent   test_name       test_date      rocksdb_version  job_id
823294  329.8   0.0     21.5    21.5    1.0     183.4   1.2     1.0     1.0     3       6       14      120     00:00:0.000     0.0     fillseq.wal_disabled.v400       2022-03-16T15:46:45.000-07:00   7.0
326520  130.8   0.0     0.0     0.0     0.0     0       12.2    139.8   155.1   170     234     250     60      00:00:0.000     0.0     multireadrandom.t4      2022-03-16T15:48:47.000-07:00   7.0
86313   345.7   0.0     0.0     0.0     0.0     0       46.3    44.8    50.6    75      84      108     60      00:00:0.000     0.0     revrangewhilewriting.t4 2022-03-16T15:50:48.000-07:00   7.0
101294  405.7   0.0     0.1     0.1     1.0     1.6     39.5    40.4    45.9    64      75      103     62      00:00:0.000     0.0     fwdrangewhilewriting.t4 2022-03-16T15:52:50.000-07:00   7.0
258141  103.4   0.0     0.1     1.2     18.2    19.8    15.5    14.3    18.1    28      34      48      62      00:00:0.000     0.0     readwhilewriting.t4     2022-03-16T15:54:51.000-07:00   7.0
334690  134.1   0.0     7.6     18.7    4.2     308.8   12.0    11.8    13.7    21      30      62      62      00:00:0.000     0.0     overwrite.t4.s0 2022-03-16T15:56:53.000-07:00   7.0
```
report.tsv with this diff:
```
ops_sec mb_sec  lsm_sz  blob_sz c_wgb   w_amp   c_mbps  c_wsecs c_csecs b_rgb   b_wgb   usec_op p50     p99     p99.9   p99.99  pmax    uptime  stall%  Nstall  u_cpu   s_cpu   rss     test    date    version job_id
831144  332.9   22GB    0.0GB,  21.7    1.0     185.1   264     262     0       0       1.2     1.0     3       6       14      9198    120     0.0     0       0.4     0.0     0.7     fillseq.wal_disabled.v400       2022-03-16T16:21:23     7.0
325229  130.3   22GB    0.0GB,  0.0             0.0     0       0       0       0       12.3    139.8   170     237     249     572     60      0.0     0       0.4     0.1     1.2     multireadrandom.t4      2022-03-16T16:23:25     7.0
312920  125.3   26GB    0.0GB,  11.1    2.6     189.3   115     113     0       0       12.8    11.8    21      34      1255    6442    60      0.2     1       0.7     0.1     0.6     overwritesome.t4.s0     2022-03-16T16:25:27     7.0
81698   327.2   25GB    0.0GB,  0.0             0.0     0       0       0       0       48.9    46.2    79      246     369     9445    60      0.0     0       0.4     0.1     1.4     revrangewhilewriting.t4 2022-03-16T16:30:21     7.0
92484   370.4   25GB    0.0GB,  0.1     1.5     1.1     1       0       0       0       43.2    42.3    75      103     110     9512    62      0.0     0       0.4     0.1     1.4     fwdrangewhilewriting.t4 2022-03-16T16:32:24     7.0
241661  96.8    25GB    0.0GB,  0.1     1.5     1.1     1       0       0       0       16.5    17.1    30      34      49      9092    62      0.0     0       0.4     0.1     1.4     readwhilewriting.t4     2022-03-16T16:34:27     7.0
305234  122.3   30GB    0.0GB,  12.1    2.7     201.7   127     124     0       0       13.1    11.8    21      128     1934    6339    62      0.0     0       0.7     0.1     0.7     overwrite.t4.s0 2022-03-16T16:36:30     7.0
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9704

Test Plan: run it

Reviewed By: jay-zhuang

Differential Revision: D36864627

Pulled By: mdcallag

fbshipit-source-id: d5af1cfc258a16865210163fa6fd1b803ab1a7d3
2022-06-03 08:20:10 -07:00
Levi Tamasi 7b2c0140ba Fix Java build (#10105)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10105

Reviewed By: cbi42

Differential Revision: D36891073

Pulled By: ltamasi

fbshipit-source-id: 16487ec708fc96add2a1ebc2d98f6439dfc852ca
2022-06-03 08:11:31 -07:00
Levi Tamasi b8fe7df2e5 Fix LITE build (#10106)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10106

Reviewed By: cbi42

Differential Revision: D36891284

Pulled By: ltamasi

fbshipit-source-id: 304ffa84549201659feb0b74d6ba54a83f08906b
2022-06-02 23:42:41 -07:00
zczhu e88d8935ae Add comments/permit unchecked error to close_db_dir pull requests (#10093)
Summary:
In [close_db_dir](https://github.com/facebook/rocksdb/pull/10049) pull request, some merging conflicts occurred (some comments and one line `s.PermitUncheckedError()` are missing). This pull request aims to put them back.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10093

Reviewed By: ajkr

Differential Revision: D36884117

Pulled By: littlepig2013

fbshipit-source-id: 8c0e2a8793fc52804067c511843bd1ff4912c1c3
2022-06-02 21:52:35 -07:00
Yanqin Jin ed50ccd19a Install zstd on CircleCI linux (#10102)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10102

Reviewed By: jay-zhuang

Differential Revision: D36885468

Pulled By: riversand963

fbshipit-source-id: 6ed5b62dda8fe0f4be4b66d09bdec0134cf4500c
2022-06-02 21:38:29 -07:00
Gang Liao e6432dfd4c Make it possible to enable blob files starting from a certain LSM tree level (#10077)
Summary:
Currently, if blob files are enabled (i.e. `enable_blob_files` is true), large values are extracted both during flush/recovery (when SST files are written into level 0 of the LSM tree) and during compaction into any LSM tree level. For certain use cases that have a mix of short-lived and long-lived values, it might make sense to support extracting large values only during compactions whose output level is greater than or equal to a specified LSM tree level (e.g. compactions into L1/L2/... or above). This could reduce the space amplification caused by large values that are turned into garbage shortly after being written at the price of some write amplification incurred by long-lived values whose extraction to blob files is delayed.

In order to achieve this, we would like to do the following:
- Add a new configuration option `blob_file_starting_level` (default: 0) to `AdvancedColumnFamilyOptions` (and `MutableCFOptions` and extend the related logic)
- Instantiate `BlobFileBuilder` in `BuildTable` (used during flush and recovery, where the LSM tree level is L0) and `CompactionJob` iff `enable_blob_files` is set and the LSM tree level is `>= blob_file_starting_level`
- Add unit tests for the new functionality, and add the new option to our stress tests (`db_stress` and `db_crashtest.py` )
- Add the new option to our benchmarking tool `db_bench` and the BlobDB benchmark script `run_blob_bench.sh`
- Add the new option to the `ldb` tool (see https://github.com/facebook/rocksdb/wiki/Administration-and-Data-Access-Tool)
- Ideally extend the C and Java bindings with the new option
- Update the BlobDB wiki to document the new option.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10077

Reviewed By: ltamasi

Differential Revision: D36884156

Pulled By: gangliao

fbshipit-source-id: 942bab025f04633edca8564ed64791cb5e31627d
2022-06-02 20:04:33 -07:00
Jay Zhuang a020031552 Add kLastTemperature as temperature high bound (#10044)
Summary:
Only used as temperature high bound for current code, may
increase with more temperatures added.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10044

Test Plan: ci

Reviewed By: siying

Differential Revision: D36633410

Pulled By: jay-zhuang

fbshipit-source-id: eecdfa7623c31778c31d789902eacf78aad7b482
2022-06-02 13:10:49 -07:00
Gang Liao 3dc6ebaf74 Support specifying blob garbage collection parameters when CompactRange() (#10073)
Summary:
Garbage collection is generally controlled by the BlobDB configuration options `enable_blob_garbage_collection` and `blob_garbage_collection_age_cutoff`. However, there might be use cases where we would want to temporarily override these options while performing a manual compaction. (One use case would be doing a full key-space manual compaction with full=100% garbage collection age cutoff in order to minimize the space occupied by the database.) Our goal here is to make it possible to override the configured GC parameters when using the `CompactRange` API to perform manual compactions. This PR would involve:

- Extending the `CompactRangeOptions` structure so clients can both force-enable and force-disable GC, as well as use a different cutoff than what's currently configured
- Storing whether blob GC should actually be enabled during a certain manual compaction and the cutoff to use in the `Compaction` object (considering the above overrides) and passing it to `CompactionIterator` via `CompactionProxy`
- Updating the BlobDB wiki to document the new options.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10073

Test Plan: Adding unit tests and adding the new options to the stress test tool.

Reviewed By: ltamasi

Differential Revision: D36848700

Pulled By: gangliao

fbshipit-source-id: c878ef101d1c612429999f513453c319f75d78e9
2022-06-01 19:40:26 -07:00
Zichen Zhu 65893ad959 Explicitly closing all directory file descriptors (#10049)
Summary:
Currently, the DB directory file descriptor is left open until the deconstruction process (`DB::Close()` does not close the file descriptor). To verify this, comment out the lines between `db_ = nullptr` and `db_->Close()` (line 512, 513, 514, 515 in ldb_cmd.cc) to leak the ``db_'' object, build `ldb` tool and run
```
strace --trace=open,openat,close ./ldb --db=$TEST_TMPDIR --ignore_unknown_options put K1 V1 --create_if_missing
```
There is one directory file descriptor that is not closed in the strace log.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10049

Test Plan: Add a new unit test DBBasicTest.DBCloseAllDirectoryFDs: Open a database with different WAL directory and three different data directories, and all directory file descriptors should be closed after calling Close(). Explicitly call Close() after a directory file descriptor is not used so that the counter of directory open and close should be equivalent.

Reviewed By: ajkr, hx235

Differential Revision: D36722135

Pulled By: littlepig2013

fbshipit-source-id: 07bdc2abc417c6b30997b9bbef1f79aa757b21ff
2022-06-01 18:03:34 -07:00
Guido Tagliavini Ponce b4d0e041d0 Add support for FastLRUCache in stress and crash tests. (#10081)
Summary:
Stress tests can run with the experimental FastLRUCache. Crash tests randomly choose between LRUCache and FastLRUCache.

Since only LRUCache supports a secondary cache, we validate the `--secondary_cache_uri` and `--cache_type` flags---when `--secondary_cache_uri` is set, the `--cache_type` is set to `lru_cache`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10081

Test Plan:
- To test that the FastLRUCache is used and the stress test runs successfully, run `make -j24 CRASH_TEST_EXT_ARGS=—duration=960 blackbox_crash_test_with_atomic_flush`. The cache type should sometimes be `fast_lru_cache`.
- To test the flag validation, run `make -j24 CRASH_TEST_EXT_ARGS="--duration=960 --secondary_cache_uri=x" blackbox_crash_test_with_atomic_flush` multiple times. The test will always be aborted (which is okay). Check that the cache type is always `lru_cache`.

Reviewed By: anand1976

Differential Revision: D36839908

Pulled By: guidotag

fbshipit-source-id: ebcdfdcd12ec04c96c09ae5b9c9d1e613bdd1725
2022-06-01 18:00:28 -07:00
Akanksha Mahajan 45b1c788c4 Update History.md for #9922 (#10092)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10092

Reviewed By: riversand963

Differential Revision: D36832311

Pulled By: akankshamahajan15

fbshipit-source-id: 8fb1cf90b1d4dddebbfbeebeddb15f6905968e9b
2022-06-01 15:36:55 -07:00
Jay Zhuang 5864900cf4 Get current LogFileNumberSize the same as log_writer (#10086)
Summary:
`db_impl.alive_log_files_` is used to track the WAL size in `db_impl.logs_`.
Get the `LogFileNumberSize` obj in `alive_log_files_` the same time as `log_writer` to keep them consistent.
For this issue, it's not safe to do `deque::reverse_iterator::operator*` and `deque::pop_front()` concurrently,
so remove the tail cache.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10086

Test Plan:
```
# on Windows
gtest-parallel ./db_test --gtest_filter=DBTest.FileCreationRandomFailure -r 1000 -w 100
```

Reviewed By: riversand963

Differential Revision: D36822373

Pulled By: jay-zhuang

fbshipit-source-id: 5e738051dfc7bcf6a15d85ba25e6365df6b6a6af
2022-06-01 15:33:22 -07:00
Changyu Bi 463873f1bb Add bug fix to HISTORY.md (#10091)
Summary:
Add to HISTORY.md the bug fixed in https://github.com/facebook/rocksdb/issues/10051

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10091

Reviewed By: ajkr

Differential Revision: D36821861

Pulled By: cbi42

fbshipit-source-id: 598812fab88f65c0147ece53cff55cf4ea73aac6
2022-06-01 14:07:13 -07:00
Peter Dillinger a00cffaf69 Reduce risk of backup or checkpoint missing a WAL file (#10083)
Summary:
We recently saw a case in crash test in which a WAL file in the
middle of the list of live WALs was not included in the backup, so the
DB was not openable due to missing WAL. We are not sure why, but this
change should at least turn that into a backup-time failure by ensuring
all the WAL files expected by the manifest (according to VersionSet) are
included in `GetSortedWalFiles()` (used by `GetLiveFilesStorageInfo()`,
`BackupEngine`, and `Checkpoint`)

Related: to maximize the effectiveness of
track_and_verify_wals_in_manifest with GetSortedWalFiles() during
checkpoint/backup, we will now sync WAL in GetLiveFilesStorageInfo()
when track_and_verify_wals_in_manifest=true.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10083

Test Plan: added new unit test for the check in GetSortedWalFiles()

Reviewed By: ajkr

Differential Revision: D36791608

Pulled By: pdillinger

fbshipit-source-id: a27bcf0213fc7ab177760fede50d4375d579afa6
2022-06-01 11:02:27 -07:00
Akanksha Mahajan d04df2752a Persist the new MANIFEST after successfully syncing the new WAL during recovery (#9922)
Summary:
In case of non-TransactionDB and avoid_flush_during_recovery = true, RocksDB won't
flush the data from WAL to L0 for all column families if possible. As a
result, not all column families can increase their log_numbers, and
min_log_number_to_keep won't change.
For transaction DB (.allow_2pc), even with the flush, there may be old WAL files that it must not delete because they can contain data of uncommitted transactions and min_log_number_to_keep won't change.
If we persist a new MANIFEST with
advanced log_numbers for some column families, then during a second
crash after persisting the MANIFEST, RocksDB will see some column
families' log_numbers larger than the corrupted wal, and the "column family inconsistency" error will be hit, causing recovery to fail.

As a solution, RocksDB will persist the new MANIFEST after successfully syncing the new WAL.
If a future recovery starts from the new MANIFEST, then it means the new WAL is successfully synced. Due to the sentinel empty write batch at the beginning, kPointInTimeRecovery of WAL is guaranteed to go after this point.
If future recovery starts from the old MANIFEST, it means the writing the new MANIFEST failed. We won't have the "SST ahead of WAL" error.
Currently, RocksDB DB::Open() may creates and writes to two new MANIFEST files even before recovery succeeds. This PR buffers the edits in a structure and writes to a new MANIFEST after recovery is successful

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9922

Test Plan:
1. Update unit tests to fail without this change
2. make crast_test -j

Branch with unit test and no fix  https://github.com/facebook/rocksdb/pull/9942 to keep track of unit test (without fix)

Reviewed By: riversand963

Differential Revision: D36043701

Pulled By: akankshamahajan15

fbshipit-source-id: 5760970db0a0920fb73d3c054a4155733500acd9
2022-06-01 10:52:26 -07:00
Yanqin Jin 7c8c803938 Remove unused variable `single_column_family_mode_` (#10078)
Summary:
This variable is actually not being used for anything meaningful, thus remove it.

This can make https://github.com/facebook/rocksdb/issues/7516 slightly simpler by reducing the amount of state that must be made lock-free.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10078

Test Plan: make check

Reviewed By: ajkr

Differential Revision: D36779817

Pulled By: riversand963

fbshipit-source-id: ffb0d9ad6149616917ae5e02bb28102cb90fc406
2022-05-31 13:03:37 -07:00
Jay Zhuang 151dc0038a Bypass tests instead of skipping (#10076)
Summary:
Make fb test infra happy, more details: https://github.com/facebook/rocksdb/issues/8048

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10076

Test Plan: CI

Reviewed By: ajkr

Differential Revision: D36768766

Pulled By: jay-zhuang

fbshipit-source-id: 4f039a5c623abb6d4a7d09bbf97077618e7ec2c8
2022-05-31 13:02:50 -07:00
Yanqin Jin 5ab5537d79 Deflake unit test BackupEngineTest.Concurrency (#10069)
Summary:
After https://github.com/facebook/rocksdb/issues/9984, BackupEngineTest.Concurrency becomes flaky.

During DB::Open(), someone else can rename/remove the LOG file, causing
this thread's `CreateLoggerFromOptions()` to fail. The reason is that the operation sequence
of "FileExists -> Rename" is not atomic. It's possible that a FileExists() returns OK, but the file
gets deleted before Rename(), causing the latter to return IOError with PathNotFound subcode.

Although it's not encouraged to concurrently modify the contents of the directories managed by
the database instance in this case, we can still perform some simple handling to make DB::Open()
more robust. In this case, we can check if a racing thread has deleted the original LOG file, we can
allow this thread to continue creating a new LOG file.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10069

Test Plan: ~/gtest-parallel/gtest-parallel -r 100 ./backup_engine_test --gtest_filter=BackupEngineTest.Concurrency

Reviewed By: jay-zhuang

Differential Revision: D36736913

Pulled By: riversand963

fbshipit-source-id: 3cbe92d77ca175e55e586bdb1a32ac8107217ae6
2022-05-31 09:36:32 -07:00
Changyu Bi 9baeef712f Fix unittest ExternalSSTFileBasicTest.StableSnapshotWhileLoggingToManifest (#10066)
Summary:
Fix the unittest `ExternalSSTFileBasicTest.StableSnapshotWhileLoggingToManifest` introduced in https://github.com/facebook/rocksdb/issues/10051 that is failing.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10066

Test Plan: CI

Reviewed By: ajkr

Differential Revision: D36720669

Pulled By: cbi42

fbshipit-source-id: 47a6d2c161f27b605ede5c62d1776eecaf0d5363
2022-05-31 08:48:57 -07:00
Andrea Pappacoda a0f391cafc build: fix pkg-config file generation (#9953)
Summary:
- Instead of hardcoding "lib" and "include" in `libdir` and `includedir`, use the values from [`GNUInstallDirs`](https://cmake.org/cmake/help/latest/module/GNUInstallDirs.html).

- Use `PROJECT_DESCRIPTION` and `PROJECT_HOMEPAGE_URL` instead of their
`CMAKE_` conterparts to fix pkg-config generation when rocksdb is not the top-level project (see [`project()`](https://cmake.org/cmake/help/latest/command/project.html)).

- Drop explicit `CMAKE_CURRENT_SOURCE_DIR` and `CMAKE_CURRENT_BINARY_DIR` in [`configure_file()`](https://cmake.org/cmake/help/latest/command/configure_file.html) as that's implied by default (and quite intuitive).

See https://github.com/facebook/rocksdb/issues/9945
CC: trynity

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9953

Reviewed By: ajkr

Differential Revision: D36716373

Pulled By: jay-zhuang

fbshipit-source-id: 57840eeb4453099fa1fe861dc03366085dbca704
2022-05-30 12:46:40 -07:00
Jay Zhuang 0adac6f88e Deflake Transaction stress tests (#10063)
Summary:
TSAN test is slower, for `TransactionStressTest` and
`DeadlockStress`, they're reaching the timeout limit of 600 seconds.
Decreasing the transaction test number.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10063

Test Plan: CI

Reviewed By: ajkr

Differential Revision: D36711727

Pulled By: jay-zhuang

fbshipit-source-id: 600f82a6d32108f52fbe5572fcc7497607b7fe98
2022-05-30 12:34:43 -07:00
Jay Zhuang 460b44c07f Deflake column_family_test to avoid hang (#10060)
Summary:
Tests could hang because of flags are not test and set
atomiclly, so it's waiting for a sync point forever.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10060

Test Plan: CI

Reviewed By: ajkr

Differential Revision: D36706311

Pulled By: jay-zhuang

fbshipit-source-id: d54b8053ce51b2de74162b28f496c048519b6cde
2022-05-30 12:31:46 -07:00
Jaepil Jeong 4eb7b35f6d Fix compile error in Clang 13 (#10033)
Summary:
This PR fixes the following compilation error in Clang 13, which was tested on macOS 12.4.

```
❯ ninja clean && ninja
[1/1] Cleaning all built files...
Cleaning... 0 files.
[198/315] Building CXX object CMakeFiles/rocksdb.dir/util/cleanable.cc.o
FAILED: CMakeFiles/rocksdb.dir/util/cleanable.cc.o
ccache /opt/homebrew/opt/llvm/bin/clang++ -DGFLAGS=1 -DGFLAGS_IS_A_DLL=0 -DHAVE_FULLFSYNC -DJEMALLOC_NO_DEMANGLE -DLZ4 -DOS_MACOSX -DROCKSDB_JEMALLOC -DROCKSDB_LIB_IO_POSIX -DROCKSDB_NO_DYNAMIC_EXTENSION -DROCKSDB_PLATFORM_POSIX -DSNAPPY -DTBB -DZLIB -DZSTD -I/Users/jaepil/work/deepsearch/deps/cpp/rocksdb -I/Users/jaepil/work/deepsearch/deps/cpp/rocksdb/include -I/Users/jaepil/app/include -I/opt/homebrew/include -I/opt/homebrew/opt/llvm/include -W -Wextra -Wall -pthread -Wsign-compare -Wshadow -Wno-unused-parameter -Wno-unused-variable -Woverloaded-virtual -Wnon-virtual-dtor -Wno-missing-field-initializers -Wno-strict-aliasing -Wno-invalid-offsetof -fno-omit-frame-pointer -momit-leaf-frame-pointer -march=armv8-a+crc+crypto -Wno-unused-function -Werror -O3 -DNDEBUG -DROCKSDB_USE_RTTI -arch arm64 -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX12.3.sdk -std=gnu++17 -MD -MT CMakeFiles/rocksdb.dir/util/cleanable.cc.o -MF CMakeFiles/rocksdb.dir/util/cleanable.cc.o.d -o CMakeFiles/rocksdb.dir/util/cleanable.cc.o -c /Users/jaepil/work/deepsearch/deps/cpp/rocksdb/util/cleanable.cc
/Users/jaepil/work/deepsearch/deps/cpp/rocksdb/util/cleanable.cc:24:65: error: no member named 'move' in namespace 'std'
Cleanable::Cleanable(Cleanable&& other) noexcept { *this = std::move(other); }
                                                           ~~~~~^
/Users/jaepil/work/deepsearch/deps/cpp/rocksdb/util/cleanable.cc:126:16: error: no member named 'move' in namespace 'std'
  *this = std::move(from);
          ~~~~~^
2 errors generated.
[209/315] Building CXX object CMakeFiles/rocksdb.dir/tools/block_cache_analyzer/block_cache_trace_analyzer.cc.o
ninja: build stopped: subcommand failed.
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10033

Reviewed By: jay-zhuang

Differential Revision: D36580562

Pulled By: ajkr

fbshipit-source-id: 0f6b241d186ed528ad62d259af2857d2c2b4ded1
2022-05-28 00:15:28 -07:00
Yanqin Jin 514f0b0937 Fail DB::Open() if logger cannot be created (#9984)
Summary:
For regular db instance and secondary instance, we return error and refuse to open DB if Logger creation fails.

Our current code allows it, but it is really difficult to debug because
there will be no LOG files. The same for OPTIONS file, which will be explored in another PR.

Furthermore, Arena::AllocateAligned(size_t bytes, size_t huge_page_size, Logger* logger) has an
assertion as the following:

```cpp
#ifdef MAP_HUGETLB
if (huge_page_size > 0 && bytes > 0) {
  assert(logger != nullptr);
}
#endif
```

It can be removed.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9984

Test Plan: make check

Reviewed By: jay-zhuang

Differential Revision: D36347754

Pulled By: riversand963

fbshipit-source-id: 529798c0511d2eaa2f0fd40cf7e61c4cbc6bc57e
2022-05-27 07:23:31 -07:00
Gang Liao e228515740 Pass the size of blob files to SstFileManager during DB open (#10062)
Summary:
RocksDB uses the (no longer aptly named) SST file manager (see https://github.com/facebook/rocksdb/wiki/Managing-Disk-Space-Utilization) to track and potentially limit the space used by SST and blob files (as well as to rate-limit the deletion of these data files). The SST file manager tracks the SST and blob file sizes in an in-memory hash map, which has to be rebuilt during DB open. File sizes can be generally obtained by querying the file system; however, there is a performance optimization possibility here since the sizes of SST and blob files are also tracked in the RocksDB MANIFEST, so we can simply pass the file sizes stored there instead of consulting the file system for each file. Currently, this optimization is only implemented for SST files; we would like to extend it to blob files as well.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10062

Test Plan:
Add unit tests for the change to the test suite
ltamasi riversand963  akankshamahajan15

Reviewed By: ltamasi

Differential Revision: D36726621

Pulled By: gangliao

fbshipit-source-id: 4010dc46ef7306142f1c2e0d1c3bf75b196ef82a
2022-05-27 05:58:43 -07:00
Yu Zhang 8c4ea7b851 Add timestamp support to secondary instance (#10061)
Summary:
This PR adds timestamp support to the secondary DB instance.

With this, these timestamp related APIs are supported:

ReadOptions.timestamp : read should return the latest data visible to this specified timestamp
Iterator::timestamp() : returns the timestamp associated with the key, value
DB:Get(..., std::string* timestamp) : returns the timestamp associated with the key, value in timestamp

Test plan (on devserver):
```
$COMPILE_WITH_ASAN=1 make -j24 all
$./db_secondary_test --gtest_filter=DBSecondaryTestWithTimestamp*
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10061

Reviewed By: riversand963

Differential Revision: D36722915

Pulled By: jowlyzhang

fbshipit-source-id: 644ada39e4e51164a759593478c38285e0c1a666
2022-05-26 19:45:31 -07:00
Andrew Kryczka f6e45382e9 Disable file ingestion in crash test for CF consistency (#10067)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/10067

Reviewed By: jay-zhuang

Differential Revision: D36727948

Pulled By: ajkr

fbshipit-source-id: a3502730412c01ba63d822a5d4bf56f8bae8fcb2
2022-05-26 17:41:30 -07:00
tagliavini 6c50082654 Remove code that only compiles for Visual Studio versions older than 2015 (#10065)
Summary:
There are currently some preprocessor checks that assume support for Visual Studio versions older than 2015 (i.e., 0 < _MSC_VER < 1900), although we don't support them any more.

We removed all code that only compiles on those older versions, except third-party/ files.

The ROCKSDB_NOEXCEPT symbol is now obsolete, since it now always gets replaced by noexcept. We removed it.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10065

Reviewed By: pdillinger

Differential Revision: D36721901

Pulled By: guidotag

fbshipit-source-id: a2892d365ef53cce44a0a7d90dd6b72ee9b5e5f2
2022-05-26 16:55:08 -07:00
Andrew Kryczka 91ba7837b7 Enable IngestExternalFile() in crash test (#9357)
Summary:
Thanks to https://github.com/facebook/rocksdb/issues/9919 and https://github.com/facebook/rocksdb/issues/10051 the known bugs in file ingestion (besides mmap read + file checksum) are fixed. Now we can  try again to enable file ingestion in crash test.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9357

Test Plan: stress file ingestion heavily for an hour: `$ TEST_TMPDIR=/dev/shm python3 tools/db_crashtest.py blackbox --max_key=1000000 --ingest_external_file_one_in=100 --duration=3600 --interval=20 --write_buffer_size=524288 --target_file_size_base=524288 --max_bytes_for_level_base=2097152`

Reviewed By: riversand963

Differential Revision: D33410746

Pulled By: ajkr

fbshipit-source-id: d276431390995a67f68390d61c06a40945fdd280
2022-05-26 10:31:37 -07:00
Muthu Krishnan c9c58a320f Add C API for User Defined Timestamp (#9914)
Summary:
Fixes https://github.com/facebook/rocksdb/issues/9889

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9914

Reviewed By: akankshamahajan15

Differential Revision: D36599983

Pulled By: riversand963

fbshipit-source-id: 39000fb473f850d88359e90b287035257854af0d
2022-05-26 09:40:10 -07:00
Jie Liang Ang 4cf2f6723a Expose DisableManualCompaction and EnableManualCompaction to C api (#10052)
Summary:
Add `rocksdb_disable_manual_compaction` and `rocksdb_enable_manual_compaction`.

Note that `rocksdb_enable_manual_compaction` should be used with care and must not be called more times than `rocksdb_disable_manual_compaction` has been called.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10052

Reviewed By: ajkr

Differential Revision: D36665496

Pulled By: jay-zhuang

fbshipit-source-id: a4ae6e34694066feb21302ca1a5c365fb9de0ec7
2022-05-25 21:46:17 -07:00
Akanksha Mahajan 28ea1fb44a Provide support for IOTracing for ReadAsync API (#9833)
Summary:
Same as title

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9833

Test Plan:
Add unit test and manually check the output of tracing logs
For fixed readahead_size it logs as:
```
Access Time : 193352113447923     , File Name: 000026.sst          , File Operation: ReadAsync         , Latency: 15075     , IO Status: OK, Length: 12288, Offset: 659456
Access Time : 193352113465232     , File Name: 000026.sst          , File Operation: ReadAsync         , Latency: 14425     , IO Status: OK, Length: 12288, Offset: 671744
Access Time : 193352113481539     , File Name: 000026.sst          , File Operation: ReadAsync         , Latency: 13062     , IO Status: OK, Length: 12288, Offset: 684032
Access Time : 193352113497692     , File Name: 000026.sst          , File Operation: ReadAsync         , Latency: 13649     , IO Status: OK, Length: 12288, Offset: 696320
Access Time : 193352113520043     , File Name: 000026.sst          , File Operation: ReadAsync         , Latency: 19384     , IO Status: OK, Length: 12288, Offset: 708608
Access Time : 193352113538401     , File Name: 000026.sst          , File Operation: ReadAsync         , Latency: 15406     , IO Status: OK, Length: 12288, Offset: 720896
Access Time : 193352113554855     , File Name: 000026.sst          , File Operation: ReadAsync         , Latency: 13670     , IO Status: OK, Length: 12288, Offset: 733184
Access Time : 193352113571624     , File Name: 000026.sst          , File Operation: ReadAsync         , Latency: 13855     , IO Status: OK, Length: 12288, Offset: 745472
Access Time : 193352113587924     , File Name: 000026.sst          , File Operation: ReadAsync         , Latency: 13953     , IO Status: OK, Length: 12288, Offset: 757760
Access Time : 193352113603285     , File Name: 000026.sst          , File Operation: Prefetch          , Latency: 59        , IO Status: Not implemented: Prefetch not supported, Length: 8868, Offset: 898349
```

For implicit readahead:
```
Access Time : 193351865156587     , File Name: 000026.sst          , File Operation: Prefetch          , Latency: 48        , IO Status: Not implemented: Prefetch not supported, Length: 12266, Offset: 391174
Access Time : 193351865160354     , File Name: 000026.sst          , File Operation: Prefetch          , Latency: 51        , IO Status: Not implemented: Prefetch not supported, Length: 12266, Offset: 395248
Access Time : 193351865164253     , File Name: 000026.sst          , File Operation: Prefetch          , Latency: 49        , IO Status: Not implemented: Prefetch not supported, Length: 12266, Offset: 399322
Access Time : 193351865165461     , File Name: 000026.sst          , File Operation: ReadAsync         , Latency: 222871    , IO Status: OK, Length: 135168, Offset: 401408
```

Reviewed By: anand1976

Differential Revision: D35601634

Pulled By: akankshamahajan15

fbshipit-source-id: 5a4f32a850af878efa0767bd5706380152a1f26e
2022-05-25 19:47:03 -07:00
Jay Zhuang 5490da20a5 Fix flaky db_basic_bench caused by unreleased iterator (#10058)
Summary:
Iterator is not freed after test is done (after the main for
loop), which could cause db close failed.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10058

Test Plan:
Able to reproduce consistently with higher thread number,
like 100, make sure it passes after the fix

Reviewed By: ajkr

Differential Revision: D36685823

Pulled By: jay-zhuang

fbshipit-source-id: 4c98b8758d106bfe40cae670e689c3d284765bcf
2022-05-25 18:02:04 -07:00
Peter Dillinger bd170dda03 Abort RocksDB performance regression test on failure in test setup (#10053)
Summary:
Pull Request resolved: https://github.com/facebook/rocksdb/pull/10053

Need to exit if ldb command fails, to avoid running db_bench on
empty/bad DB and considering the results valid.

Reviewed By: jay-zhuang

Differential Revision: D36673200

fbshipit-source-id: e0d78a0d397e0e335d82d9349bfd612d38ffb552
2022-05-25 13:35:08 -07:00
sdong 356f8c5d81 FindObsoleteFiles() to directly check whether candidate files are live (#10040)
Summary:
Right now, in FindObsoleteFiles() we build a list of all live SST files from all existing Versions. This is all done in DB mutex, and is O(m*n) where m is number of versions and n is number of files. In some extereme cases, it can take very long. The list is used to see whether a candidate file still shows up in a version. With this commit, every candidate file is directly check against all the versions for file existance. This operation would be O(m*k) where k is number of candidate files. Since is usually small (except perhaps full compaction in universal compaction), this is usually much faster than the previous solution.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10040

Test Plan: TBD

Reviewed By: riversand963

Differential Revision: D36613391

fbshipit-source-id: 3f13b090f755d9b3ae417faec62cd6e798bac1eb
2022-05-25 12:43:48 -07:00
Changyu Bi b0e190604b Update VersionSet last seqno after LogAndApply (#10051)
Summary:
This PR fixes the issue of unstable snapshot during external SST file ingestion. Credit ajkr for the following walk through:  consider these relevant steps for of IngestExternalFile():

(1) increase seqno while holding mutex -- 677d2b4a8f/db/db_impl/db_impl.cc (L4768)
(2) LogAndApply() -- 677d2b4a8f/db/db_impl/db_impl.cc (L4797-L4798)
  (a) write to MANIFEST with mutex released a96a4a2f7b/db/version_set.cc (L4407)
  (b) apply to in-memory state with mutex held

A snapshot taken during (2a) will be unstable. In particular, queries against that snapshot will not include data from the ingested file before (2b), and will include data from the ingested file after (2b).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10051

Test Plan:
Added a new unit test: `ExternalSSTFileBasicTest.WriteAfterReopenStableSnapshotWhileLoggingToManifest`.
```
make external_sst_file_basic_test
./external_sst_file_basic_test
```

Reviewed By: ajkr

Differential Revision: D36654033

Pulled By: cbi42

fbshipit-source-id: bf720cca313e0cf211585960f3aff04853a31b96
2022-05-25 10:05:17 -07:00
Yiyuan Liu b71466e982 Improve transaction C-API (#9252)
Summary:
This PR wants to improve support for transaction in C-API:
* Support two-phase commit.
* Support `get_pinned` and `multi_get` in transaction.
* Add `rocksdb_transactiondb_flush`
* Support get writebatch from transaction and rebuild transaction from writebatch.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9252

Reviewed By: jay-zhuang

Differential Revision: D36459007

Pulled By: riversand963

fbshipit-source-id: 47371d527be821c496353a7fe2fd18d628069a98
2022-05-25 09:38:10 -07:00
Yanqin Jin 9901e7f681 Enable checkpoint and backup in db_stress when timestamp is enabled (#10047)
Summary:
After https://github.com/facebook/rocksdb/issues/10030 and https://github.com/facebook/rocksdb/issues/10004, we can enable checkpoint and backup in stress tests when
user-defined timestamp is enabled.

This PR has no production risk.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10047

Test Plan:
```
TEST_TMPDIR=/dev/shm make crash_test_with_ts
```

Reviewed By: jowlyzhang

Differential Revision: D36641565

Pulled By: riversand963

fbshipit-source-id: d86c9d87efcc34c32d1aa176af691d32b897644a
2022-05-24 18:25:43 -07:00