Summary:
`-Wextra-semi` or `-Wextra-semi-stmt`
If the code compiles, this is safe to land.
Reviewed By: dmm-fb
Differential Revision: D51778007
fbshipit-source-id: 5d1b20a3acc4bcc7cd7c204f2f73a14fc8f81883
Summary:
This is a simple refactor for the crash test script to put shared logic for parsing stderr into a function. There is no functional change.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12109
Test Plan: manually tested the script
Reviewed By: ajkr
Differential Revision: D51692172
Pulled By: jowlyzhang
fbshipit-source-id: d346d64e981d9c489c380ff6ce33296a224b5877
Summary:
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11982
The patch constitutes the first phase of adding wide-column support to `WriteBatchWithIndex`. Namely, it implements the `PutEntity` API in `WriteBatchWithIndex` on the write path, and the `Iterator::columns()` API in `BaseDeltaIterator` on the read path. In addition, it updates all existing read APIs (`GetFromBatch`, `GetFromBatchAndDB`, `MultiGetFromBatchAndDB`, and `BaseDeltaIterator`) so that they handle wide-column entities correctly. This includes returning the value of the default column of entities as appropriate and correctly applying merges to wide-column base values. I plan to add the wide-column specific point lookup APIs (`GetEntityFromBatch`, `GetEntityFromBatchAndDB`, and `MultiGetEntityFromBatchAndDB`) in subsequent patches.
Reviewed By: jaykorean
Differential Revision: D50439231
fbshipit-source-id: 59fd0f12c45249fecde8af249c5d3f509ba58bbe
Summary:
The optimization to not find and delete obsolete files when FlushRequest is re-scheduled also inadvertently skipped flushing the `LogBuffer`, resulting in missed logs. This PR fixes the issue.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12105
Test Plan:
manually check this test has the correct info log after the fix
`./column_family_test --gtest_filter=ColumnFamilyRetainUDTTest.NotAllKeysExpiredFlushRescheduled`
Reviewed By: ajkr
Differential Revision: D51671079
Pulled By: jowlyzhang
fbshipit-source-id: da0640e07e35c69c08988772ed611ec9e67f2e92
Summary:
Add the option to have a 3-tier block cache (uncompressed RAM, compressed RAM, and local flash) in db_bench, as well as specifying secondary cache admission policy.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12104
Reviewed By: ajkr
Differential Revision: D51629092
Pulled By: anand1976
fbshipit-source-id: 6a208f853bc85d3d8b437d91cb1b0142d9a99e53
Summary:
We now support re-enabling the compressed portion of the `TieredCache` after dynamically disabling it. Add it to db_stress for testing purposes.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12102
Reviewed By: akankshamahajan15
Differential Revision: D51594259
Pulled By: anand1976
fbshipit-source-id: ea544e30a5ebd6290fc9ed46a241f09634764d2a
Summary:
I must have chosen trimming before frame 8 based on assertion failures, but that trims too many frame for a general segfault. So this changes to start printing at frame 4, as in this example where I've seeded a null deref:
```
Received signal 11 (Segmentation fault)
Invoking LLDB for stack trace...
Process 873208 stopped
* thread #1, name = 'db_stress', stop reason = signal SIGSTOP
frame #0: 0x00007fb1fe8f1033 libc.so.6`__GI___wait4(pid=873478, stat_loc=0x00007fb1fb114030, options=0, usage=0x0000000000000000) at wait4.c:30:10
thread #2, name = 'rocksdb:low', stop reason = signal SIGSTOP
frame #0: 0x00007fb1fe8972a1 libc.so.6`__GI___futex_abstimed_wait_cancelable64 at futex-internal.c:57:12
Executable module set to "/data/users/peterd/rocksdb/db_stress".
Architecture set to: x86_64-unknown-linux-gnu.
True
frame #4: 0x00007fb1fe844540 libc.so.6`__restore_rt at libc_sigaction.c:13
frame #5: 0x0000000000608514 db_stress`rocksdb::StressTest::InitDb(rocksdb::SharedState*) at db_stress_test_base.cc:345:18
frame #6: 0x0000000000585d62 db_stress`rocksdb::RunStressTestImpl(rocksdb::SharedState*) at db_stress_driver.cc:84:17
frame #7: 0x000000000058dd69 db_stress`rocksdb::RunStressTest(shared=0x00006120000001c0) at db_stress_driver.cc:266:34
frame #8: 0x0000000000453b34 db_stress`rocksdb::db_stress_tool(int, char**) at db_stress_tool.cc:370:20
...
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12101
Test Plan: manual (see above)
Reviewed By: ajkr
Differential Revision: D51593217
Pulled By: pdillinger
fbshipit-source-id: 4a71eb8e516edbc32e682f9537bc77d073a7b4ed
Summary:
It's been relatively easy to break our stack trace printer:
* If another thread reaches a signal condition such as a related SEGV or assertion failure while one is trying to print a stack trace from the signal handler, it seems to end the process abruptly without a stack trace.
* If the process exits normally in one thread (such as main finishing) while another is trying to print a stack trace from the signal handler, it seems the process will often end normally without a stack trace.
This change attempts to fix these issues, with
* Keep the custom signal handler installed as long as possible, so that other threads will most likely re-enter our custom handler. (We only switch back to default for triggering core dump or whatever after stack trace.)
* Use atomics and sleeps to implement a crude recursive mutex for ensuring all threads hitting the custom signal handler wait on the first that is trying to print a stack trace, while recursive signals in the same thread can still be handled cleanly.
* Use an atexit handler to hook into normal exit to (a) wait on a pending printing of stack trace when detectable and applicable, and (b) detect and warn when printing a stack trace might be interrupted by a process exit in progress. (I don't know how to pause that *after* our atexit handler has been called; the best I know how to do is warn, "In a race with process already exiting...".)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12086
Test Plan:
manual, including with TSAN. I added this code to the end of a unit test file:
```
for (size_t i = 0; i < 3; ++i) {
std::thread t([]() { assert(false); });
t.detach();
}
```
Followed by either `sleep(100)` or `usleep(100)` or usual process exit. And for recursive signal testing, inject `abort()` at various places in the handler.
Reviewed By: cbi42
Differential Revision: D51531882
Pulled By: pdillinger
fbshipit-source-id: 3473b863a43e61b722dfb7a2ed12a8120949b09c
Summary:
Fixes https://github.com/facebook/rocksdb/issues/7930.
When there is a timestamp associated with stored records, get from row cache will return the timestamp provided in query instead of the timestamp associated with the stored record.
## Cause of error:
Currently a row_handle is fetched using row_cache_key(contains a timestamp provided by user query) and the row_handle itself does not persist timestamp associated with the object. Hence the [GetContext::SaveValue()
](6e3429b8a6/table/get_context.cc (L257)) function will fetch the timestamp in row_cache_key and may return the incorrect timestamp value.
## Proposed Solution
If current cf enables ts, append a timestamp associated with stored records after the value in replay_log (equivalently the value of row cache entry).
When read, `replayGetContextLog()` will update parsed_key with the correct timestamp.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11952
Reviewed By: ajkr
Differential Revision: D51501176
Pulled By: jowlyzhang
fbshipit-source-id: 808fc943a8ae95de56ae0e82ec59a2573a031f28
Summary:
These bugs surfaced while I was trying to add the stress test for the feature:
Bug 1) On the index building path: the optimization to use user key instead of internal key as separator needed a bit tweak for when user defined timestamps can be removed. Because even though the user key look different now and eligible to be used as separator, when their user-defined timestamps are removed, they could be equal and that invariant no longer stands.
Bug 2) On the index reading path: one path that builds the second level index iterator for `PartitionedIndexReader` are not passing the corresponding `user_defined_timestamps_persisted` flag. As a result, the default `true` value be used leading to no minimum timestamps padded when they should be.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12062
Test Plan:
For bug 1): added separate unit test `BlockBasedTableReaderTest::Get` to exercise the `Get` API. It's a different code path from `MultiGet` so worth having its own test. Also in order to cover the bug, the test is modified to generate key values with the same user provided key, different timestamps and different sequence numbers. The test reads back different versions of the same user provided key. `MultiGet` takes one `ReadOptions` with one read timestamp so we cannot test retrieving different versions of the same key easily.
For bug 2): simply added options `BlockBasedTableOptions.metadata_cache_options.partition_pinning = PinningTier::kAll` to exercise all the index iterator creating paths.
Reviewed By: ltamasi
Differential Revision: D51508280
Pulled By: jowlyzhang
fbshipit-source-id: 8b174d3d70373c0599266ac1f467f2bd4d7ea6e5
Summary:
The option "write_buffer_size" has changed from 4MB for 64MB by default, and the compact_files_example will not work as expected, as the test data written is only about 50MB and will not trigger compaction.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12084
Reviewed By: cbi42
Differential Revision: D51499959
Pulled By: ajkr
fbshipit-source-id: 4f4b25ebc4b6bb568501adc8e97813edcddceea8
Summary:
Add some asserts in the `CacheWithSecondaryAdapter` destructor to help debug a crash test failure.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12082
Reviewed By: cbi42
Differential Revision: D51486041
Pulled By: anand1976
fbshipit-source-id: 76537beed31ba27ab9ac8b4ce6deb775629e3be5
Summary:
`WithinPenultimateLevelOutputRange()` is updated in https://github.com/facebook/rocksdb/issues/12063 to check internal key range. However, op_type of a key can change during compaction, e.g. MERGE -> PUT, which makes a key larger and becomes out of penultimate output range. This has caused stress test failures with error message "Unsafe to store Seq later than snapshot in the last level if per_key_placement is enabled". So update `WithinPenultimateLevelOutputRange()` to only check user key and sequence number.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12081
Test Plan:
* This repro can produce the corruption within a few runs. Ran it a few times after the fix and did not see Corruption failure.
```
python3 ./tools/db_crashtest.py whitebox --test_tiered_storage --random_kill_odd=888887 --use_merge=1 --writepercent=100 --readpercent=0 --prefixpercent=0 --delpercent=0 --delrangepercent=0 --iterpercent=0 --write_buffer_size=419430 --column_families=1 --read_fault_one_in=0 --write_fault_one_in=0
```
Reviewed By: ajkr
Differential Revision: D51481202
Pulled By: cbi42
fbshipit-source-id: cad6b65099733e03071b496e752bbdb09cf4db82
Summary:
The test failure in https://github.com/facebook/rocksdb/issues/11909 shows that we may compact keys outside of internal key range of penultimate level input files from last level to penultimate level, which can potentially cause overlapping files in the penultimate level. This PR updates the `Compaction::WithinPenultimateLevelOutputRange()` to check internal key range instead of user key.
Other fixes:
* skip range del sentinels when deciding output level for tiered compaction
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12063
Test Plan:
- existing unit tests
- apply the fix to https://github.com/facebook/rocksdb/issues/11905 and run `./tiered_compaction_test --gtest_filter="*RangeDelsCauseFileEndpointsToOverlap*"`
Reviewed By: ajkr
Differential Revision: D51288985
Pulled By: cbi42
fbshipit-source-id: 70085db5f5c3b15300bcbc39057d57b83fd9902a
Summary:
I want to use the `WriteBufferManager` in my rust project, which requires exposing it through the c api, just like `Cache` is.
Hopefully the changes are fairly straightfoward!
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11710
Reviewed By: cbi42
Differential Revision: D51166518
Pulled By: ajkr
fbshipit-source-id: cd266ff1e4a7ab145d05385cd125a8390f51f3fc
Summary:
Fixes https://github.com/facebook/rocksdb/issues/11000.
That issue pointed out that RocksDB was slow to delete archived WALs in case time-based and size-based expiration were enabled, and the time-based threshold (`WAL_ttl_seconds`) was small. This PR prevents the delay by taking into account `WAL_ttl_seconds` when deciding the frequency to process archived WALs for deletion.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12069
Reviewed By: pdillinger
Differential Revision: D51262589
Pulled By: ajkr
fbshipit-source-id: e65431a06ee96f4c599ba84a27d1aedebecbb003
Summary:
`CacheWithSecondaryAdapter` can distribute placeholder reservations across the primary and secondary caches. The current implementation of the accounting is quite complicated in order to avoid using a mutex. This may cause the accounting to be slightly off after changes to the cache capacity and ratio, resulting in assertion failures. There's also a bug in the unlikely event that the total reservation exceeds the cache capacity. Furthermore, the current implementation is difficult to reason about.
This PR simplifies it by doing the accounting while holding a mutex. The reservations are processed in 1MB chunks in order to avoid taking a lock too frequently. As a side effect, this also removes the restriction of not allowing to increase the compressed secondary cache capacity after decreasing it to 0.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12059
Test Plan: Existing unit tests, and a new test for capacity increase from 0
Reviewed By: pdillinger
Differential Revision: D51278686
Pulled By: anand1976
fbshipit-source-id: 7e1ad2c50694772997072dd59cab35c93c12ba4f
Summary:
At the moment RocksDBJava uses the default CIrcleCI JVM on Windows builds. This can and has changed in the past and can cause some incompatibilities.
This PR addresses the problem of explicitly installing and using Liberica JDK 8 as Java 8 Is the primary target for RocksdbJava.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12068
Reviewed By: cbi42
Differential Revision: D51307233
Pulled By: ajkr
fbshipit-source-id: 9cb4e173d8a9ac42e5f9fda1daf012302942fdbc
Summary:
Fixes https://github.com/facebook/rocksdb/issues/11909. The test passes after the change in https://github.com/facebook/rocksdb/issues/11917 to start mock clock from a non-zero time.
The reason for test failing is a bit complicated:
- The Put here e4ad4a0ef1/db/compaction/tiered_compaction_test.cc (L2045) happens before mock clock advances beyond 0.
- This causes oldest_key_time_ to be 0 for memtable.
- oldest_ancester_time of the first L0 file becomes 0
- L0 -> L5/6 compaction output files sets `oldest_ancestoer_time` to the current time due to these lines: 509947ce2c/db/compaction/compaction_job.cc (L1898C34-L1904).
- This causes some small sequence number to be mapped to current time: 509947ce2c/db/compaction/compaction_job.cc (L301)
- Keys in L6 is being moved up to L5 due to the unexpected seqno_to_time mapping
- When compacting keys from last level to the penultimate level, we only check keys to be within user key range of penultimate level input files. If we compact the following file 3 with file 1 and output keys to L5, we can get the reported inconsistency bug.
```
L5: file 1 [K5@20, K10@kMaxSeqno], file 2 [K10@30, K14@34)
L6: file 3 [K6@5, K10@20]
```
https://github.com/facebook/rocksdb/issues/12063 will add fixes to check internal key range when compacting keys from last level up to the penultimate level.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12064
Test Plan: the unit test passes
Reviewed By: ajkr
Differential Revision: D51281149
Pulled By: cbi42
fbshipit-source-id: 00b7f026c453454d9f3af5b2de441383a96f0c62
Summary:
- Add missing null check for ColumnFamilyHandle in `GetEntity()`
- `FailIfCfHasTs()` now returns `Status::InvalidArgument()` if `column_family` is null. `MultiGetEntity()` can rely on this for cfh null check.
- Added `DeleteRange` API using Default Column Family to be consistent with other major APIs (This was also causing Java Test failure after the `FailIfCfHasTs()` change)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12057
Test Plan:
- Updated `DBWideBasicTest::GetEntityAsPinnableAttributeGroups` to include null CF case
- Updated `DBWideBasicTest::MultiCFMultiGetEntityAsPinnableAttributeGroups` to include null CF case
Reviewed By: jowlyzhang
Differential Revision: D51167445
Pulled By: jaykorean
fbshipit-source-id: 1c1e44fd7b7df4d2dc3bb2d7d251da85bad7d664
Summary:
When delay didn't happen, histogram WRITE_STALL is still recorded, and ticker STALL_MICROS is not recorded.
This is a bug, neither WRITE_STALL or STALL_MICROS should not be recorded when delay did not happen.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12067
Reviewed By: cbi42
Differential Revision: D51263133
Pulled By: ajkr
fbshipit-source-id: bd82d8328fe088d613991966e83854afdabc6a25
Summary:
When I call `DBWithTTLImpl::Resume()`, it returns `Status::NotSupported`. Did `StackableDB` miss this API ?
Thanks !
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12060
Reviewed By: jaykorean
Differential Revision: D51202742
Pulled By: ajkr
fbshipit-source-id: 5e01a54a42efd81fd57b3c992b9af8bc45c59c9c
Summary:
Part of the procedures to handle manifest IO error is to disable file deletion in case some files in limbo state get deleted prematurely. This is not ideal because: 1) not all the VersionEdits whose commit encounter such an error contain updates for files, disabling file deletion sometimes are not necessary. 2) `EnableFileDeletion` has a force mode that could make other threads accidentally disrupt this procedure in recovery. 3) Disabling file deletion as a whole is also not as efficient as more precisely tracking impacted files from being prematurely deleted. This PR replaces this mechanism with tracking such files and quarantine them from being deleted in `ErrorHandler`.
These are the types of files being actively tracked in quarantine in this PR:
1) new table files and blob files from a background job
2) old manifest file whose immediately following new manifest file's CURRENT file creation gets into unclear state. Current handling is not sufficient to make sure the old manifest file is kept in case it's needed.
Note that WAL logs are not part of the quarantine because `min_log_number_to_keep` is a safe mechanism and it's only updated after successful manifest commits so it can prevent this premature deletion issue from happening.
We track these files' file numbers because they share the same file number space.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12030
Test Plan: Modified existing unit tests
Reviewed By: ajkr
Differential Revision: D51036774
Pulled By: jowlyzhang
fbshipit-source-id: 84ef26271fbbc888ef70da5c40fe843bd7038716
Summary:
Followed mrambacher's first suggestion in https://github.com/facebook/rocksdb/pull/12044#issuecomment-1800706148.
This change allows serializing a `TtlMergeOperator` that wraps an unregistered `MergeOperator`. Such a `TtlMergeOperator` cannot be loaded (validation will fail in `TtlMergeOperator::ValidateOptions()`), but that is OK for us currently.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12056
Reviewed By: hx235
Differential Revision: D51125097
Pulled By: ajkr
fbshipit-source-id: 8ed3705e8d36ab473673b9198eea6db64397ed15
Summary:
Disabling file deletion can be critical for operations like making a backup, recovery from manifest IO error (for now). Ideally as long as there is one caller requesting file deletion disabled, it should be kept disabled until all callers agree to re-enable it. So this PR removes the default forcing behavior for the `EnableFileDeletion` API, and users need to explicitly pass the argument if they insisted on doing so knowing the consequence of what can be potentially disrupted.
This PR removes the API's default argument value so it will cause breakage for all users that are relying on the default value, regardless of whether the forcing behavior is critical for them. When fixing this breakage, it's good to check if the forcing behavior is indeed needed and potential disruption is OK.
This PR also makes unit test that do not need force behavior to do a regular enable file deletion.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12001
Reviewed By: ajkr
Differential Revision: D51214683
Pulled By: jowlyzhang
fbshipit-source-id: ca7b1ebf15c09eed00f954da2f75c00d2c6a97e4
Summary:
This PR adds a missing set function for rocksdb_options in the C-API:
rocksdb_options_set_cf_paths(). Without this function, users cannot
specify different paths for different column families as it will fall back
to db_paths.
As a bonus, this PR also includes rocksdb_sst_file_metadata_get_directory()
to the C api -- a missing public function that will also make the test easier to write.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11151
Test Plan: Augment existing c_test to verify the specified cf_path.
Reviewed By: hx235
Differential Revision: D51201888
Pulled By: ajkr
fbshipit-source-id: 62a96451f26fab60ada2005ede3eea8e9b431f30
Summary:
#### Problem
While the RocksDB C API does have the RateLimiter API, it does not
expose the auto_tuned option.
#### Summary of Change
This PR exposes auto_tuned RateLimiter option in RocksDB C API.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12058
Test Plan: Augment the C API existing test to cover the new API.
Reviewed By: cbi42
Differential Revision: D51201933
Pulled By: ajkr
fbshipit-source-id: 5bc595a9cf9f88f50fee797b729ba96f09ed8266
Summary:
As titled. This PR contains the API and stubbed implementation for piping write time.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12043
Reviewed By: pdillinger
Differential Revision: D51076575
Pulled By: jowlyzhang
fbshipit-source-id: 3b341263498351b9ccaff27cf35d5aeb5bdf0cf1
Summary:
The CreateEnvTest.CreateEncryptedFileSystem unit test is to verify the creation functionality of EncryptedFileSystem, but now it just support the builtin CTREncryptionProvider class.
This patch make it flexible to use environment variable `TEST_FS_URI`, it is useful to test customer encryption plugins.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12025
Reviewed By: anand1976
Differential Revision: D50799656
Pulled By: ajkr
fbshipit-source-id: dbcacfefbf07de9c7803f7707b34c5193bec17bf
Summary:
- Our database is corrupted, causing some sequences of wal record to be invalid (but the `record_checksum` looks fine).
- When we RecoverLogFiles in WALRecoveryMode::kPointInTimeRecovery, `assert(seq <= kMaxSequenceNumber)` will be failed.
- When it is found that sequence is illegal, can we drop the file to recover as much data as possible ? Thx !
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11985
Reviewed By: anand1976
Differential Revision: D50698039
Pulled By: ajkr
fbshipit-source-id: 1e42113b58823088d7c0c3a92af5b3efbb5f5296
Summary:
`${PROJECT_NAME}` isn't guaranteed to match a target name when an artefact suffix is specified.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12055
Reviewed By: anand1976
Differential Revision: D51125532
Pulled By: ajkr
fbshipit-source-id: cd1f4a5b11eb517c379e3ee3f78592f7e606a034
Summary:
**Context/Summary:**
It's intuitive for users to assume `TablePropertiesCollector::Finish()` is called only once by RocksDB internal by the word "finish".
However, this is currently not true as RocksDB also calls this function in `BlockBased/PlainTableBuilder::GetTableProperties()` to populate user collected properties on demand.
This PR avoids that by moving that populating to where we first call `Finish()` (i.e, `NotifyCollectTableCollectorsOnFinish`)
Bonus: clarified in the API that `GetReadableProperties()` will be called after `Finish()` and added UT to ensure that.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12053
Test Plan:
- Modified test `DBPropertiesTest.GetUserDefinedTableProperties` to ensure `Finish()` only called once.
- Existing test particularly `db_properties_test, table_properties_collector_test` verify the functionality `NotifyCollectTableCollectorsOnFinish` and `GetReadableProperties()` are not broken by this change.
Reviewed By: ajkr
Differential Revision: D51095434
Pulled By: hx235
fbshipit-source-id: 1c6275258f9b99dedad313ee8427119126817973
Summary:
See new atomic.h file comments for motivation.
I have updated HyperClockCache to use the new atomic wrapper, fixing a few cases where an implicit conversion was accidentally used and therefore mixing std::memory_order_seq_cst where release/acquire ordering (or relaxed) was intended. There probably wasn't a real bug because I think all the cases happened to be in single-threaded contexts like constructors/destructors or statistical ops like `GetCapacity()` that don't need any particular ordering constraints.
Recommended follow-up:
* Replace other uses of std::atomic to help keep them safe from bugs.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12051
Test Plan:
Did some local correctness stress testing with cache_bench. Also triggered 15 runs of fbcode_blackbox_crash_test and saw no related failures (just 3 failures in ~CacheWithSecondaryAdapter(), already known)
No performance difference seen before & after running simultaneously:
```
(while ./cache_bench -cache_type=fixed_hyper_clock_cache -populate_cache=0 -cache_size=3000000000 -ops_per_thread=500000 -threads=12 -histograms=0 2>&1 | grep parallel; do :; done) | awk '{ s += $3; c++; print "Avg time: " (s/c);}'
```
... for both fixed_hcc and auto_hcc.
Reviewed By: jowlyzhang
Differential Revision: D51090518
Pulled By: pdillinger
fbshipit-source-id: eeb324facb3185584603f9ea0c4de6f32919a2d7
Summary:
Env::NewRandomRWFile() will not create the file if it doesn't exist, as the test saying https://github.com/facebook/rocksdb/blob/main/env/env_test.cc#L2208.
This patch correct the comments of Env::NewRandomRWFile(), it may mislead the developers who use rocksdb Env() as an utility.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/11820
Reviewed By: ajkr
Differential Revision: D50176707
Pulled By: jowlyzhang
fbshipit-source-id: a6ee469f549360de8d551a4fe8517b4450df7b15
Summary:
There was some unncessary logic (e.g. a dead assignment to home_shift) left over from earlier revision of the code.
Also, rename confusing ChainRewriteLock::new_head_ / GetNewHead() to saved_head_ / GetSavedHead().
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12052
Test Plan: existing tests
Reviewed By: jowlyzhang
Differential Revision: D51091499
Pulled By: pdillinger
fbshipit-source-id: 4b191b60a2b16085681e59d49c4d97e802869db8