rocksdb

mirror of https://github.com/facebook/rocksdb.git synced 2024-11-28 15:33:54 +00:00

Author	SHA1	Message	Date
Changyu Bi	e2ef349f56	Deflake unit test `DBCompactionTest.CompactionLimiter` (#12596 ) Summary: The test has been flaky for a long time. A recent [failure](https://github.com/facebook/rocksdb/actions/runs/8820808355/job/24215219590?pr=12578) shows that there is still a flush running when the assertion fails. I think this is because `WaitForFlushMemTable()` may return before a flush schedules the next compaction. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12596 Test Plan: I could not repro the failure locally: `gtest-parallel --repeat=8000 --workers=100 ./db_compaction_test --gtest_filter="CompactionLimiter"` Reviewed By: ajkr Differential Revision: D56715874 Pulled By: cbi42 fbshipit-source-id: f5f64eb30fff7e115c19beedad2dc22afa06258d	2024-05-02 17:10:06 -07:00
Jaepil Jeong	2cd4346df6	Fix compile error in Clang (#12588 ) Summary: This PR fixes the following compile errors with Clang: ``` .../rocksdb/env/fs_on_demand.cc:184:5: error: no member named 'for_each' in namespace 'std'; did you mean 'std::ranges::for_each'? 184 \| std::for_each(rchildren.begin(), rchildren.end(), [&](std::string& name) { \| ^~~~~~~~~~~~~ \| std::ranges::for_each /opt/homebrew/opt/llvm@17/bin/../include/c++/v1/__algorithm/ranges_for_each.h:68:23: note: 'std::ranges::for_each' declared here 68 \| inline constexpr auto for_each = __for_each::__fn{}; \| ^ .../rocksdb/env/fs_on_demand.cc:188:10: error: no member named 'sort' in namespace 'std' 188 \| std::sort(result->begin(), result->end()); \| ~~~~~^ .../rocksdb/env/fs_on_demand.cc:189:10: error: no member named 'sort' in namespace 'std' 189 \| std::sort(rchildren.begin(), rchildren.end()); \| ~~~~~^ .../rocksdb/env/fs_on_demand.cc:193:10: error: no member named 'set_union' in namespace 'std' 193 \| std::set_union(result->begin(), result->end(), rchildren.begin(), \| ~~~~~^ .../rocksdb/env/fs_on_demand.cc:221:5: error: no member named 'for_each' in namespace 'std'; did you mean 'std::ranges::for_each'? 221 \| std::for_each( \| ^~~~~~~~~~~~~ \| std::ranges::for_each /opt/homebrew/opt/llvm@17/bin/../include/c++/v1/__algorithm/ranges_for_each.h:68:23: note: 'std::ranges::for_each' declared here 68 \| inline constexpr auto for_each = __for_each::__fn{}; \| ^ .../rocksdb/env/fs_on_demand.cc:226:10: error: no member named 'sort' in namespace 'std' 226 \| std::sort(result->begin(), result->end(), file_attr_sorter); \| ~~~~~^ .../rocksdb/env/fs_on_demand.cc:227:10: error: no member named 'sort' in namespace 'std' 227 \| std::sort(rchildren.begin(), rchildren.end(), file_attr_sorter); \| ~~~~~^ .../rocksdb/env/fs_on_demand.cc:231:10: error: no member named 'set_union' in namespace 'std' 231 \| std::set_union(rchildren.begin(), rchildren.end(), result->begin(), \| ~~~~~^ 8 errors generated. 
``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12588 Reviewed By: jaykorean Differential Revision: D56656222 Pulled By: ajkr fbshipit-source-id: 7e94b6250fc9edfe597a61b7622f09d6b6cd9cbd	2024-05-02 16:54:21 -07:00
anand76	6cc7ad15b6	Implement secondary cache admission policy to allow all evicted blocks (#12599 ) Summary: Add a secondary cache admission policy to admit all blocks evicted from the block cache. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12599 Reviewed By: pdillinger Differential Revision: D56891760 Pulled By: anand1976 fbshipit-source-id: 193c98c055aa3477f4e3a78e5d3daef27a5eacf4	2024-05-02 11:23:35 -07:00
anand76	6349da612b	Update HISTORY.md and version to 9.3.0 (#12601 ) Summary: Update HISTORY.md for 9.2 and version to 9.3. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12601 Reviewed By: jaykorean, jowlyzhang Differential Revision: D56845901 Pulled By: anand1976 fbshipit-source-id: 0d1137a6568e4712be2f8b705f4f7b438217dbed	2024-05-01 16:33:04 -07:00
Yu Zhang	241253053a	Fix delete obsolete files on recovery not rate limited (#12590 ) Summary: This PR fixes the issue that deletion of obsolete files during DB::Open is not rate limited. The root cause is that slow deletion is disabled if the trash/db size ratio exceeds the configured `max_trash_db_ratio` `d610e14f93/include/rocksdb/sst_file_manager.h (L126)`; however, the current handling in DB::Open starts with tracking nothing but the obsolete files. This makes the ratio always look like it's 1. In order for the deletion rate limiting logic to work properly, we should only start deleting files after `SstFileManager` has finished tracking the whole DB, so the main fix is to move the two places that attempt to delete files to after the tracking is done: 1) the `DeleteScheduler::CleanupDirectory` call in `SanitizeOptions`, 2) the `DB::DeleteObsoleteFiles` call. There are some other aesthetic changes, like refactoring the collection of all the DB paths into a function and renaming `DBImpl::DeleteUnreferencedSstFiles` to `DBImpl::MaybeUpdateNextFileNumber`, as it doesn't actually delete the files. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12590 Test Plan: Added unit test and verified with manual testing Reviewed By: anand1976 Differential Revision: D56830519 Pulled By: jowlyzhang fbshipit-source-id: 8a38a21b1ea11c5371924f2b88663648f7a17885	2024-05-01 12:26:54 -07:00
Yu Zhang	8b3d9e6bfe	Add TimedPut to stress test (#12559 ) Summary: This also updates WriteBatch's protection info to include write time since there are several places in memtable that by default protects the whole value slice. This PR is stacked on https://github.com/facebook/rocksdb/issues/12543 Pull Request resolved: https://github.com/facebook/rocksdb/pull/12559 Reviewed By: pdillinger Differential Revision: D56308285 Pulled By: jowlyzhang fbshipit-source-id: 5524339fe0dd6c918dc940ca2f0657b5f2111c56	2024-04-30 15:40:35 -07:00
Hui Xiao	abd6751aba	Fix wrong padded bytes being used to generate file checksum (#12598 ) Summary: Context/Summary: https://github.com/facebook/rocksdb/pull/12542 introduced a bug where the wrong padded bytes are used to generate the file checksum if a flush happens during padding. This PR fixes it along with an existing instance of the same bug for `perform_data_verification_=true`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12598 Test Plan: - New UT that failed before this fix (`db->VerifyFileChecksums: ...Corruption: ...file checksum mismatch`) and passes after - Benchmark ``` TEST_TMPDIR=/dev/shm ./db_bench --benchmarks=fillseq[-X300] --num=100000 --block_align=1 --compression_type=none ``` Pre-PR: fillseq [AVG 300 runs] : 421334 (± 4126) ops/sec; 46.6 (± 0.5) MB/sec Post-PR: (no regression observed but a slight improvement) fillseq [AVG 300 runs] : 425768 (± 4309) ops/sec; 47.1 (± 0.5) MB/sec Reviewed By: ajkr, anand1976 Differential Revision: D56725688 Pulled By: hx235 fbshipit-source-id: c1a700a95def8c65c0a21e44f8c1966164925ad5	2024-04-30 15:38:53 -07:00
Yu Zhang	2c02a9b76f	Preserve TimedPut on penultimate level until it actually expires (#12543 ) Summary: To make sure `TimedPut` entries are placed on the proper tier before and when they become eligible for the cold tier, 1) flush and compaction need to keep the relevant seqno to time mapping for not just the sequence numbers contained in internal keys, but also the preferred sequence numbers of `TimedPut` entries. This PR also fixes some bugs in handling `TimedPut` during compaction: 1) dealing with an edge case when a `TimedPut` entry's internal key is the right bound for the penultimate level: the internal key after swapping in its preferred sequence number will fall outside of the penultimate range because the preferred sequence number is smaller than its original sequence number. The entry however is still safe to be placed on the penultimate level, so we keep track of the `TimedPut` entry's original sequence number for this check. The idea behind this is that as long as it's safe for the original key to be placed on the penultimate level, it's safe for the entry with the swapped-in preferred sequence number to be placed on the penultimate level too. This is because we only swap in the preferred sequence number when that entry is visible to the earliest snapshot and there are no other data points with the same user key in lower levels. On the other hand, as long as it's not safe for the original key to be placed on the penultimate level, we will not place the entry on the penultimate level after swapping in the preferred seqno either. 2) the assertion that the preferred seqno is always smaller than the original sequence number may fail if this logic is only exercised after the sequence number is zeroed out. We adjust the assertion to handle that case too. In this case, we don't swap in the preferred seqno but will adjust its type to `kTypeValue`. 3) there was special case handling for when a range deletion may end up incorrectly covering an entry if the preferred seqno is swapped in. 
But it missed the case where the original entry is already covered by the range deletion. The original handling would mistakenly output the entry instead of omitting it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12543 Test Plan: ./tiered_compaction_test --gtest_filter="PrecludeLastLevelTest.PreserveTimedPutOnPenultimateLevel" ./compaction_iterator_test --gtest_filter="TimedPut" Reviewed By: pdillinger Differential Revision: D56195096 Pulled By: jowlyzhang fbshipit-source-id: 37ebb09d2513abbd9e90cda0217e26874584b8f3	2024-04-30 11:16:02 -07:00
Peter Dillinger	45c105104b	Set optimize_filters_for_memory by default (#12377 ) Summary: This feature has been around for a couple of years and users haven't reported any problems with it. Not quite related: fixed a technical ODR violation in public header for info_log_level in case DEBUG build status changes. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12377 Test Plan: unit tests updated, already in crash test. Some unit tests are expecting specific behaviors of optimize_filters_for_memory=false and we now need to bake that in. Reviewed By: jowlyzhang Differential Revision: D54129517 Pulled By: pdillinger fbshipit-source-id: a64b614840eadd18b892624187b3e122bab6719c	2024-04-30 08:33:31 -07:00
Changyu Bi	5c1334f763	DeleteRange() return NotSupported if row_cache is configured (#12512 ) Summary: ...since this feature combination is not supported yet (https://github.com/facebook/rocksdb/issues/4122). Pull Request resolved: https://github.com/facebook/rocksdb/pull/12512 Test Plan: new unit test. Reviewed By: jaykorean, jowlyzhang Differential Revision: D55820323 Pulled By: cbi42 fbshipit-source-id: eeb5e97d15c9bdc388793a2fb8e52cfa47e34bcf	2024-04-29 16:33:13 -07:00
Andrew Kryczka	b2931a5c53	Fixed `MultiGet()` error handling to not skip blob dereference (#12597 ) Summary: See comment at top of the test case and release note. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12597 Reviewed By: jaykorean Differential Revision: D56718786 Pulled By: ajkr fbshipit-source-id: 8dce185bb0d24a358372fc2b553d181793fc335f	2024-04-29 14:18:42 -07:00
anand76	e36b0a2da4	Fix corruption bug when recycle_log_file_num changed from 0 (#12591 ) Summary: When `recycle_log_file_num` is changed from 0 to non-zero and the DB is reopened, any log files from the previous session that are still alive get reused. However, the WAL records in those files are not in the recyclable format. If one of those files is reused and is empty, a subsequent re-open, in `RecoverLogFiles`, can replay those records and insert stale data into the memtable. Another manifestation of this is an assertion failure `first_seqno_ == 0 \|\| s >= first_seqno_` in `rocksdb::MemTable::Add`. We could fix this by either 1) writing a special record when reusing a log file, or 2) implementing more rigorous checking in `RecoverLogFiles` to ensure we don't replay stale records, or 3) not reusing files created by a previous DB session. We choose option 3 as it's the simplest, and flipping `recycle_log_file_num` is expected to be a rare event. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12591 Test Plan: 1. Add a unit test to verify the bug and fix Reviewed By: jowlyzhang Differential Revision: D56655812 Pulled By: anand1976 fbshipit-source-id: aa3a26b4a5e892d39a54b5a0658233cbebebac87	2024-04-29 12:25:00 -07:00
Andrew Kryczka	d80e1d99bc	Add `ldb multi_get_entity` subcommand (#12593 ) Summary: Mixed code from `MultiGetCommand` and `GetEntityCommand` to introduce `MultiGetEntityCommand`. Some minor fixes for the related subcommands are included. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12593 Reviewed By: jaykorean Differential Revision: D56687147 Pulled By: ajkr fbshipit-source-id: 2ad7b7ba8e05e990b43f2d1eb4990f746ce5f1ea	2024-04-28 21:22:31 -07:00
Andrew Kryczka	2ec25a3e54	Prevent data block compression with `BlockBasedTableOptions::block_align` (#12592 ) Summary: Made `BlockBasedTableOptions::block_align` incompatible (i.e., APIs will return `Status::InvalidArgument`) with more ways of enabling compression: `CompactionOptions::compression`, `ColumnFamilyOptions::compression_per_level`, and `ColumnFamilyOptions::bottommost_compression`. Previously it was only incompatible with `ColumnFamilyOptions::compression`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12592 Reviewed By: hx235 Differential Revision: D56650862 Pulled By: ajkr fbshipit-source-id: f5201602c2ce436e6d8d30893caa6a161a61f141	2024-04-26 20:05:30 -07:00
Jay Huh	a82ba52756	Disable inplace_update_support in OptimisticTxnDB (#12589 ) Summary: Adding OptimisticTransactionDB like https://github.com/facebook/rocksdb/issues/12586 Pull Request resolved: https://github.com/facebook/rocksdb/pull/12589 Test Plan: ``` python3 tools/db_crashtest.py whitebox --optimistic_txn ``` ``` Running db_stress with pid=773197: ./db_stress ... --inplace_update_support=0 ... --use_optimistic_txn=1 ... ... ``` Reviewed By: ajkr Differential Revision: D56635338 Pulled By: jaykorean fbshipit-source-id: fc3ef13420a2d539c7651d3f5b7dd6c4c89c836d	2024-04-26 16:00:06 -07:00
Richard Barnes	8e1bd02279	Fix deprecated use of 0/NULL in internal_repo_rocksdb/repo/util/xxhash.h + 1 Summary: `nullptr` is typesafe. `0` and `NULL` are not. In the future, only `nullptr` will be allowed. This diff helps us embrace the future _now_ in service of enabling `-Wzero-as-null-pointer-constant`. Reviewed By: palmje Differential Revision: D56650257 fbshipit-source-id: ce628fbf12ea5846bb7103455ab859c5ed7e3598	2024-04-26 15:34:49 -07:00
Richard Barnes	3fa2ff3046	Fix deprecated use of 0/NULL in internal_repo_rocksdb/repo/include/rocksdb/utilities/env_mirror.h + 1 Summary: `nullptr` is typesafe. `0` and `NULL` are not. In the future, only `nullptr` will be allowed. This diff helps us embrace the future _now_ in service of enabling `-Wzero-as-null-pointer-constant`. Reviewed By: palmje Differential Revision: D56650296 fbshipit-source-id: ee3491d30e6c1fdefb3010c8ae1104b3f45e70f6	2024-04-26 15:33:38 -07:00
Andrew Kryczka	b4c520cadc	change default `CompactionOptions::compression` while deprecating it (#12587 ) Summary: I had a TODO to complete `CompactionOptions`'s compression API but never did it: `d610e14f93/db/compaction/compaction_picker.cc (L371-L373)` Without solving that TODO, the API remains incomplete and unsafe. Now, however, I don't think it's worthwhile to complete it. I think we should instead delete the API entirely. This PR deprecates it in preparation for deletion in a future major release. The `ColumnFamilyOptions` settings for compression should be good enough for `CompactFiles()` since they are apparently good enough for every other compaction, including `CompactRange()`. In the meantime, I also changed the default `CompressionType`. Having callers of `CompactFiles()` use Snappy compression by default does not make sense when the default could be to simply use the same compression type that is used for every other compaction. As a bonus, this change makes the default `CompressionType` consistent with the `CompressionOptions` that will be used. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12587 Reviewed By: hx235 Differential Revision: D56619273 Pulled By: ajkr fbshipit-source-id: 1477de49f14b06c72d6f0045616a8ce91d97e66e	2024-04-26 13:03:21 -07:00
Changyu Bi	d610e14f93	Disable `inplace_update_support` in transaction stress tests (#12586 ) Summary: `MultiOpsTxnsStressTest` relies on snapshot which is incompatible with `inplace_update_support`. TransactionDB uses snapshot too so we don't expect it to be used with `inplace_update_support` either. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12586 Test Plan: ``` python3 tools/db_crashtest.py whitebox --[test_multiops_txn\|txn] --txn_write_policy=1 ``` Reviewed By: hx235 Differential Revision: D56602769 Pulled By: cbi42 fbshipit-source-id: 8778541295f0af71e8ce912c8f872ab5cc607fc1	2024-04-25 17:06:30 -07:00
Andrew Kryczka	177ccd3904	Print more debug info in test when `SyncWAL()` fails (#12580 ) Summary: Example failure (cannot reproduce): ``` [==========] Running 1 test from 1 test case. [----------] Global test environment set-up. [----------] 1 test from DBWriteTestInstance/DBWriteTest [ RUN ] DBWriteTestInstance/DBWriteTest.ConcurrentlyDisabledWAL/0 db/db_write_test.cc:809: Failure dbfull()->SyncWAL() Not implemented: SyncWAL() is not supported for this implementation of WAL file db/db_write_test.cc:809: Failure dbfull()->SyncWAL() Not implemented: SyncWAL() is not supported for this implementation of WAL file db/db_write_test.cc:809: Failure dbfull()->SyncWAL() Not implemented: SyncWAL() is not supported for this implementation of WAL file db/db_write_test.cc:809: Failure dbfull()->SyncWAL() Not implemented: SyncWAL() is not supported for this implementation of WAL file db/db_write_test.cc:809: Failure dbfull()->SyncWAL() Not implemented: SyncWAL() is not supported for this implementation of WAL file db/db_write_test.cc:809: Failure dbfull()->SyncWAL() Not implemented: SyncWAL() is not supported for this implementation of WAL file db/db_write_test.cc:809: Failure dbfull()->SyncWAL() Not implemented: SyncWAL() is not supported for this implementation of WAL file db/db_write_test.cc:809: Failure dbfull()->SyncWAL() Not implemented: SyncWAL() is not supported for this implementation of WAL file db/db_write_test.cc:809: Failure dbfull()->SyncWAL() Not implemented: SyncWAL() is not supported for this implementation of WAL file db/db_write_test.cc:809: Failure dbfull()->SyncWAL() Not implemented: SyncWAL() is not supported for this implementation of WAL file [ FAILED ] DBWriteTestInstance/DBWriteTest.ConcurrentlyDisabledWAL/0, where GetParam() = 0 (49 ms) [----------] 1 test from DBWriteTestInstance/DBWriteTest (49 ms total) [----------] Global test environment tear-down [==========] 1 test from 1 test case ran. (49 ms total) [ PASSED ] 0 tests. 
[ FAILED ] 1 test, listed below: [ FAILED ] DBWriteTestInstance/DBWriteTest.ConcurrentlyDisabledWAL/0, where GetParam() = 0 ``` I have no idea why `SyncWAL()` would not be supported from what is presumably a `SpecialEnv` so added more debug info in case it fails again in CI. The last failure was https://github.com/facebook/rocksdb/actions/runs/8731304938/job/23956487511?fbclid=IwAR2jyXgVQtCezri3axV5MwMdI7D6VIudMk1xkiN_FL9-x2dkBv4IqIjjgB4 and it only happened once ever AFAIK. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12580 Reviewed By: hx235 Differential Revision: D56541996 Pulled By: ajkr fbshipit-source-id: 1eab17567db783c11054fa85dd8b8880eacd3a50	2024-04-25 14:34:11 -07:00
Hui Xiao	490d11a012	Clarify `inplace_update_support` with DeleteRange and reenable `inplace_update_support` in crash test (#12577 ) Summary: Context/Summary: Our crash test recently surfaced incompatibilities between DeleteRange and inplace_update_support. An incorrect read result will be returned after insertion into memtables that already contain delete range data. This PR clarifies this in the API and re-enables `inplace_update_support` in the crash test with sanitization. Ideally there should be a way to check the memtable for a delete range entry upon put under inplace_update_support = true. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12577 Test Plan: CI Reviewed By: ajkr Differential Revision: D56492556 Pulled By: hx235 fbshipit-source-id: 9e80e5c69dd708716619a266f41580959680c83b	2024-04-25 14:07:39 -07:00
Jay Huh	f16ba42116	Fix IteratorsConsistentView tests (#12582 ) Summary: Fixing the failure in IteratorsConsistentViewExplicitSnapshot as shown in https://github.com/facebook/rocksdb/actions/runs/8825927545/job/24230854140?pr=12581 The failure was due to the timing of the `flush()` for the later Column Family in the loop. If the flush for the later CFs installs the new super version before getting the SV for the iterator, the assertion succeeds, but if the order flips, the SV will be obsolete and the assertion can fail. This PR simplifies the test so that we do only one `flush()` and `SYNC_POINT` can guarantee the order of operations. For the ImplicitSnapshot test, it now just triggers a flush for the second CF after obtaining the SV for the first CF. For the ExplicitSnapshot test, it now triggers an atomic flush() for all CFs after obtaining the SV for the first CF. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12582 Test Plan: ``` ./db_iterator_test --gtest_filter="IteratorsConsistentView" ./multi_cf_iterator_test -- --gtest_filter="ConsistentView" ``` Reviewed By: ajkr, jowlyzhang Differential Revision: D56557234 Pulled By: jaykorean fbshipit-source-id: 7aa2f6d0e12a915b6e16cd240389bcfb5b4a5b62	2024-04-25 14:06:46 -07:00
Andrew Kryczka	f75f033d74	initialize member variables in `PerfContext`'s default constructor (#12581 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12581 Reviewed By: jaykorean Differential Revision: D56555535 Pulled By: ajkr fbshipit-source-id: 8bff376247736a8da69c79b20f6f334f47d896ca	2024-04-25 10:24:34 -07:00
Jay Huh	1fca175eec	MultiCFSnapshot for NewIterators() API (#12573 ) Summary: As mentioned in https://github.com/facebook/rocksdb/issues/12561 and https://github.com/facebook/rocksdb/issues/12566 , `NewIterators()` API has not been providing consistent view of the db across multiple column families. This PR addresses it by utilizing `MultiCFSnapshot()` function which has been used for `MultiGet()` APIs. To be able to obtain the thread-local super version with ref, `sv_exclusive_access` parameter has been added to `MultiCFSnapshot()` so that we could call `GetReferencedSuperVersion()` or `GetAndRefSuperVersion()` depending on the param and support `Refresh()` API for MultiCfIterators Pull Request resolved: https://github.com/facebook/rocksdb/pull/12573 Test Plan: Unit Tests Added ``` ./db_iterator_test --gtest_filter="IteratorsConsistentView" ``` ``` ./multi_cf_iterator_test -- --gtest_filter="ConsistentView" ``` Performance Check Setup ``` make -j64 release TEST_TMPDIR=/dev/shm/db_bench ./db_bench -benchmarks="filluniquerandom" -key_size=32 -value_size=512 -num=10000000 -compression_type=none ``` Run ``` TEST_TMPDIR=/dev/shm/db_bench ./db_bench -use_existing_db=1 -benchmarks="multireadrandom" -cache_size=10485760000 ``` Before the change ``` DB path: [/dev/shm/db_bench/dbbench] multireadrandom : 6.374 micros/op 156892 ops/sec 6.374 seconds 1000000 operations; (0 of 1000000 found) ``` After the change ``` DB path: [/dev/shm/db_bench/dbbench] multireadrandom : 6.265 micros/op 159627 ops/sec 6.265 seconds 1000000 operations; (0 of 1000000 found) ``` Reviewed By: jowlyzhang Differential Revision: D56444066 Pulled By: jaykorean fbshipit-source-id: 327ce73c072da30c221e18d4f3389f49115b8f99	2024-04-24 15:28:55 -07:00
Andrew Kryczka	6807da0b44	Fix `DisableManualCompaction()` hang (#12578 ) Summary: Prior to this PR the following sequence could happen: 1. `RunManualCompaction()` A schedules compaction to thread pool and waits 2. `RunManualCompaction()` B waits without scheduling anything due to conflict 3. `DisableManualCompaction()` bumps `manual_compaction_paused_` and wakes up both 4. `RunManualCompaction()` A (`scheduled && !unscheduled`) unschedules its compaction and marks itself done 5. `RunManualCompaction()` B (`!scheduled && !unscheduled`) schedules compaction to thread pool 6. `RunManualCompaction()` B (`scheduled && !unscheduled`) waits on its compaction 7. `RunManualCompaction()` B at some point wakes up and finishes, either by unscheduling or by compaction execution 8. `DisableManualCompaction()` returns as there are no more manual compactions running Between 6. and 7. the wait can be long while the compaction sits in the thread pool queue. That wait is unnecessary. This PR changes the behavior from step 5. onward: 5'. `RunManualCompaction()` B (`!scheduled && !unscheduled`) marks itself done 6'. `DisableManualCompaction()` returns as there are no more manual compactions running Pull Request resolved: https://github.com/facebook/rocksdb/pull/12578 Reviewed By: cbi42 Differential Revision: D56528144 Pulled By: ajkr fbshipit-source-id: 4da2467376d7d4ff435547aa74dd8f118db0c03b	2024-04-24 12:40:36 -07:00
Hui Xiao	d72e60397f	Enable block_align in crash test (#12560 ) Summary: Context/Summary: After https://github.com/facebook/rocksdb/pull/12542 there should be no blocker to re-enable block_align in crash test Pull Request resolved: https://github.com/facebook/rocksdb/pull/12560 Test Plan: CI Reviewed By: jowlyzhang Differential Revision: D56479173 Pulled By: hx235 fbshipit-source-id: 7c0bf327da0bd619deb89ab706e6ccd24e5b9543	2024-04-23 15:06:56 -07:00
Hui Xiao	9d37408f9a	Temporarily disable inplace_update_support in crash test (#12574 ) Summary: Context/Summary: Our recent crash test failures show inplace_update_support can cause the DB to return a value inconsistent with the expected state upon crash recovery if delete range was used in the previous run AND inplace_update_support=true is used in either the previous or the current verification run. Since it's a bit hard to keep track of whether the previous run has used delete range or not, I decided to temporarily disable inplace_update_support in the crash test to keep it stable before figuring out why these two features are incompatible and how to prevent such a combination in the crash test. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12574 Test Plan: Rehearsed many stress runs with `inplace_update_support=0` and they passed Reviewed By: jaykorean Differential Revision: D56454951 Pulled By: hx235 fbshipit-source-id: 57f2ae6308bad7ed4077ddb9e658380742afa293	2024-04-23 10:02:18 -07:00
Andrew Kryczka	3f3045a405	fix DeleteRange+memtable_insert_with_hint_prefix_extractor interaction (#12558 ) Summary: Previously `insert_hints_` was used for both point key table (`table_`) and range deletion table (`range_del_table_`). Hints include pointers to table data, so mixing hints for different tables together without tracking which hint corresponds to which table was problematic. We can just make the hints dedicated to the point key table only. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12558 Reviewed By: hx235 Differential Revision: D56279019 Pulled By: ajkr fbshipit-source-id: 00fe5ce72f9f11a1c1cba5f1977b908b2d518f29	2024-04-22 20:13:58 -07:00
Rob Anderson	c165394439	convert circleci arm jobs to github actions (#12569 ) Summary: This pull request converts the CircleCI jobs that run on ARM runners to GitHub actions jobs. With this change you can retire the [circleci config](https://github.com/facebook/rocksdb/blob/main/.circleci/config.yml) for this repo. This change assumes you have [ARM runners](https://github.blog/changelog/2023-10-30-accelerate-your-ci-cd-with-arm-based-hosted-runners-in-github-actions/) with the label `4-core-ubuntu-arm`. --- [Here is a workflow run in my fork showing these jobs passing](https://github.com/robandpdx-org/rocksdb/actions/runs/8760406181/job/24045304383). --- https://fburl.com/workplace/f6mz6tmw Pull Request resolved: https://github.com/facebook/rocksdb/pull/12569 Reviewed By: ltamasi Differential Revision: D56435439 Pulled By: ajkr fbshipit-source-id: a24d79f21baca01beda232746f90b2853f27a664	2024-04-22 15:01:50 -07:00
Hui Xiao	7d83b4e3e5	Fix file checksum mismatch due to padded bytes when block_align=true (#12542 ) Summary: Context/Summary: When `BlockBasedTableOptions::block_align=true`, we pad bytes to align blocks `d41e568b1c/table/block_based/block_based_table_builder.cc (L1415-L1421)`. Those bytes are not included in generating the file checksum upon file creation. But `VerifyFileChecksums()` includes those bytes in generating the file checksum to compare against the checksum generated upon file creation. Therefore a file checksum mismatch is returned in `VerifyFileChecksums()`. We decided to include those padded bytes in generating the checksum upon file creation. Bonus: also fix surrounding code to use actual padded bytes for verification - see https://github.com/facebook/rocksdb/pull/12542#discussion_r1571429163 Pull Request resolved: https://github.com/facebook/rocksdb/pull/12542 Test Plan: - New UT - Benchmark ``` TEST_TMPDIR=/dev/shm ./db_bench --benchmarks=fillseq[-X300] --num=100000 --block_align=1 --compression_type=none ``` Pre-PR: fillseq [AVG 300 runs] : 422857 (± 3942) ops/sec; 46.8 (± 0.4) MB/sec Post-PR: fillseq [AVG 300 runs] : 424707 (± 3799) ops/sec; 47.0 (± 0.4) MB/sec Reviewed By: ajkr Differential Revision: D56168447 Pulled By: hx235 fbshipit-source-id: 96209ef950d42943d336f11968ae3fcf9872fc2c	2024-04-22 14:07:34 -07:00
Levi Tamasi	bcfe4a0dcf	Make sure DBImplFollower::stop_requested_ is initialized (#12572 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12572 Reviewed By: jowlyzhang, anand1976 Differential Revision: D56426800 fbshipit-source-id: a31f86d8869148092325924db4e7fbfad28777a4	2024-04-22 12:02:28 -07:00
anand76	d8fb849b7e	Basic RocksDB follower implementation (#12540 ) Summary: A basic implementation of RocksDB follower mode, which opens a remote database (referred to as leader) on a distributed file system by tailing its MANIFEST. It leverages the secondary instance mode, but is different in some key ways - 1. It has its own directory with links to the leader's database 2. Periodically refreshes itself 3. (Future) Snapshot support 4. (Future) Garbage collection of obsolete links 5. (Long term) Memtable replication There are two main classes implementing this functionality - `DBImplFollower` and `OnDemandFileSystem`. The former is derived from `DBImplSecondary`. Similar to `DBImplSecondary`, it implements recovery and catch up through MANIFEST tailing using the `ReactiveVersionSet`, but does not consider logs. In a future PR, we will implement memtable replication, which will eliminate the need to catch up using logs. In addition, the recovery and catch-up tries to avoid directory listing as repeated metadata operations are expensive. The second main piece is the `OnDemandFileSystem`, which plugs in as an `Env` for the follower instance and creates the illusion of the follower directory as a clone of the leader directory. It creates links to SSTs on first reference. When the follower tails the MANIFEST and attempts to create a new `Version`, it calls `VerifyFileMetadata` to verify the size of the file, and optionally the unique ID of the file. During this process, links are created which prevent the underlying files from getting deallocated even if the leader deletes the files. TODOs: Deletion of obsolete links, snapshots, robust checking against misconfigurations, better observability etc. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12540 Reviewed By: jowlyzhang Differential Revision: D56315718 Pulled By: anand1976 fbshipit-source-id: d19e1aca43a6af4000cb8622a718031b69ebd97b	2024-04-19 19:13:31 -07:00
Hui Xiao	f0864d3eec	Temporarily disable reopen with unsync data loss (#12567 ) Summary: Context/Summary: See https://github.com/facebook/rocksdb/pull/12556 for the original problem. The [fix](https://github.com/facebook/rocksdb/pull/12556) encountered some design [discussion](https://github.com/facebook/rocksdb/pull/12556#discussion_r1572729453) that might take longer than I expected. Temporarily disable reopen with unsync data loss for now, just to stabilize our crash test, since the original problem is root-caused already. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12567 Test Plan: CI Reviewed By: ltamasi Differential Revision: D56365503 Pulled By: hx235 fbshipit-source-id: 0755e82617c065f42be4c8429e86fa289b250855	2024-04-19 15:23:52 -07:00
Jay Huh	ca3814aef9	Fix scan-build path (#12563 ) Summary: As title Pull Request resolved: https://github.com/facebook/rocksdb/pull/12563 Test Plan: ``` make -j64 release ``` Before the fix ``` $DEBUG_LEVEL is 0, $LIB_MODE is static Makefile:306: Warning: /mnt/gvfs/third-party2/llvm-fb/1f6edd1ff15c99c861afc8f3cd69054cd974dd64/15/platform010/72a2ff8/../../src/llvm/clang/tools/scan-build/bin/scan-build does not exist ... ``` After the fix ``` $DEBUG_LEVEL is 0, $LIB_MODE is static ... ``` Reviewed By: ajkr Differential Revision: D56318047 Pulled By: jaykorean fbshipit-source-id: 4a11ad8353fc94aa96676e57c67063d051de5fbc	2024-04-19 08:45:16 -07:00
Jay Huh	909ff2c208	MultiCFSnapshot Refactor - separate multiget key range info from CFD & superversion info (#12561 ) Summary: While implementing MultiCFIterators (CoalescingIterator and AttributeGroupIterator), we found that the existing `NewIterators()` API does not ensure a uniform view of the DB across all column families. The `NewIterators()` function is utilized to generate child iterators for the MultiCfIterators, and it's expected that all child iterators maintain a consistent view of the DB. For example, within the loop where the super version for each CF is being obtained, if a CF undergoes compaction after the super versions for previous CFs have already been retrieved, we lose the consistency in the view of the CFs for the iterators due to the API not under a db mutex. This preliminary refactoring of `MultiCFSnapshot` aims to address this issue in the `NewIterators()` API in the later PR. Currently, `MultiCFSnapshot` is used to achieve a consistent view across CFs in `MultiGet`. The `MultiGetColumnFamilyData` contains MultiGet-specific information that can be decoupled from the cfd and sv, allowing `MultiCFSnapshot` to be used in other places. 
Pull Request resolved: https://github.com/facebook/rocksdb/pull/12561 Test Plan: Existing Unit Tests for `MultiCFSnapshot()` ``` ./db_basic_test -- --gtest_filter="MultiGet" ``` Performance Test Setup ``` make -j64 release TEST_TMPDIR=/dev/shm/db_bench ./db_bench -benchmarks="filluniquerandom" -key_size=32 -value_size=512 -num=10000000 -compression_type=none ``` Run ``` TEST_TMPDIR=/dev/shm/db_bench ./db_bench -use_existing_db=1 -benchmarks="multireadrandom" -cache_size=10485760000 ``` Before the change ``` DB path: [/dev/shm/db_bench/dbbench] multireadrandom : 4.760 micros/op 210072 ops/sec 4.760 seconds 1000000 operations; (0 of 1000000 found) ``` After the change ``` DB path: [/dev/shm/db_bench/dbbench] multireadrandom : 4.593 micros/op 217727 ops/sec 4.593 seconds 1000000 operations; (0 of 1000000 found) ``` Reviewed By: anand1976 Differential Revision: D56309422 Pulled By: jaykorean fbshipit-source-id: 7a9164d12c810b6c2d2db062827fcc4a36cbc77b	2024-04-18 20:11:01 -07:00
anand76	97991960e9	Retry DB::Open upon a corruption detected while reading the MANIFEST (#12518 ) Summary: This PR is a counterpart of https://github.com/facebook/rocksdb/issues/12427 . On file systems that support storage level data checksum and reconstruction, retry opening the DB if a corruption is detected when reading the MANIFEST. This could be done in `log::Reader`, but it's a little complicated since the sequential file would have to be reopened in order to re-read the same data, and we may miss some subtle corruptions that don't result in checksum mismatch. The approach chosen here instead is to make the decision to retry in `DBImpl::Recover`, based on either an explicit corruption in the MANIFEST file, or missing SST files due to bad data in the MANIFEST. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12518 Reviewed By: ajkr Differential Revision: D55932155 Pulled By: anand1976 fbshipit-source-id: 51755a29b3eb14b9d8e98534adb2e7d54b12ced9	2024-04-18 17:36:33 -07:00
Levi Tamasi	ef38d99edc	Sanity check the keys parameter in MultiGetEntityFromBatchAndDB (#12564 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12564 Similarly to how `db`, `column_family`, and `results` are handled, bail out early from `WriteBatchWithIndex::MultiGetEntityFromBatchAndDB` if `keys` is `nullptr`. Note that these checks are best effort in the sense that with the current method signature, the callee has no way of reporting an error if `statuses` is `nullptr` or catching other types of invalid pointers (e.g. when `keys` and/or `results` is non-`nullptr` but do not point to a contiguous range of `num_keys` objects). We can improve this (and many similar RocksDB APIs) using `std::span` in a major release once we move to C++20. Reviewed By: jaykorean Differential Revision: D56318179 fbshipit-source-id: bc7a258eda82b5f6c839f212ab824130e773a4f0	2024-04-18 14:26:58 -07:00
Levi Tamasi	0df601ab07	Reset user-facing wide-column structures upon deserialization failures (#12562 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12562 The patch makes a small usability improvement by consistently resetting any user-facing wide-column structures (`DBIter::columns()`, `BaseDeltaIterator::columns()`, and any `PinnableWideColumns` objects) upon encountering any deserialization failures. Reviewed By: jaykorean Differential Revision: D56312764 fbshipit-source-id: 44efed0d1720cc06bf6facf928f73ce39a1bd2ca	2024-04-18 13:08:34 -07:00
Levi Tamasi	e82fe7c0b7	Fix the move semantics of PinnableWideColumns (#12557 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12557 Unlike for other sequence containers, the C++ standard allows moving an `std::string` to invalidate pointers/iterators/references. In practice, this happens with short strings which are stored "inline" in the `std::string` object (small string optimization). Since `PinnableSlice` uses `std::string` as its internal buffer, and `PinnableWideColumns` in turn is implemented in terms of `PinnableSlice`, this means that the default compiler-generated move operations can invalidate the column index stored in `PinnableWideColumns::columns_`. The PR fixes this by providing custom move constructor/move assignment implementations for `PinnableWideColumns` that recreate the `columns_` index upon move. Reviewed By: jaykorean Differential Revision: D56275054 fbshipit-source-id: e8648c003dbcf1c39ec122ad229780c28138e730	2024-04-17 18:56:23 -07:00
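The move-semantics issue described in the entry above can be illustrated with a minimal sketch. `IndexedBuffer` and all of its members are hypothetical stand-ins, not the actual `PinnableWideColumns` code; the sketch only assumes a type that owns a `std::string` buffer plus an index of views into it, and rebuilds that index after a move rather than copying views that may now point into the moved-from object.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Hypothetical type: owns a buffer and an index of views into that buffer.
class IndexedBuffer {
 public:
  IndexedBuffer(std::string data, std::size_t split)
      : buf_(std::move(data)), split_(std::min(split, buf_.size())) {
    RebuildIndex();
  }

  // Moving a short std::string may relocate its characters (small string
  // optimization), so the index is rebuilt against the new buffer instead
  // of being moved along with it.
  IndexedBuffer(IndexedBuffer&& other)
      : buf_(std::move(other.buf_)), split_(other.split_) {
    RebuildIndex();
    other.Reset();
  }

  IndexedBuffer& operator=(IndexedBuffer&& other) {
    if (this != &other) {
      buf_ = std::move(other.buf_);
      split_ = other.split_;
      RebuildIndex();
      other.Reset();
    }
    return *this;
  }

  IndexedBuffer(const IndexedBuffer&) = delete;
  IndexedBuffer& operator=(const IndexedBuffer&) = delete;

  const std::vector<std::string_view>& parts() const { return parts_; }

  // True if every indexed view starts inside this object's own buffer.
  bool IndexPointsIntoBuffer() const {
    const char* begin = buf_.data();
    const char* end = buf_.data() + buf_.size();
    std::less_equal<const char*> le;
    for (std::string_view part : parts_) {
      if (!le(begin, part.data()) || !le(part.data(), end)) return false;
    }
    return true;
  }

 private:
  void RebuildIndex() {
    parts_.clear();
    std::string_view all(buf_);
    parts_.push_back(all.substr(0, split_));
    parts_.push_back(all.substr(split_));
  }

  void Reset() {
    buf_.clear();
    split_ = 0;
    parts_.clear();
  }

  std::string buf_;
  std::size_t split_;
  std::vector<std::string_view> parts_;
};
```

With compiler-generated move operations, `parts_` would be moved as-is and could keep referring to the inline storage of the moved-from string; rebuilding the index sidesteps that regardless of whether the string is short or long.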
Jay Huh	4f584652ab	Add an option to wait for purge in WaitForCompact (#12520 ) Summary: Adding an option to wait for purge to complete in `WaitForCompact` API. Internally, RocksDB has a way to wait for purge to complete (e.g. TEST_WaitForPurge() in db_impl_debug.cc), but there's no public API available to gracefully wait for purge to complete. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12520 Test Plan: Unit Test Added - `WaitForCompactWithWaitForPurgeOptionTest` ``` ./deletefile_test -- --gtest_filter="WaitForCompactWithWaitForPurgeOptionTest" ``` Existing Tests ``` ./db_compaction_test -- --gtest_filter="WaitForCompactWithOption" ``` Reviewed By: ajkr Differential Revision: D55888283 Pulled By: jaykorean fbshipit-source-id: cfc6d6e8657deaefab8961890b36e390095c9f65	2024-04-17 17:33:27 -07:00
Andrew Kryczka	7027265417	Fix `max_successive_merges` counting CPU overhead regression (#12546 ) Summary: In https://github.com/facebook/rocksdb/issues/12365 we made `max_successive_merges` non-strict by default. Before https://github.com/facebook/rocksdb/issues/12365, `CountSuccessiveMergeEntries()`'s scan was implicitly limited to `max_successive_merges` entries for a given key, because after that the merge operator would be invoked and the merge chain would be collapsed. After https://github.com/facebook/rocksdb/issues/12365, the merge chain will not be collapsed no matter how long it is when the chain's operands are not all in memory. Since `CountSuccessiveMergeEntries()` scanned the whole merge chain, https://github.com/facebook/rocksdb/issues/12365 had a side effect that it would scan more memtable entries. This PR introduces a limit so it won't scan more entries than it could before. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12546 Reviewed By: jaykorean Differential Revision: D56193693 Pulled By: ajkr fbshipit-source-id: b070ba0703ef733e0ff230f89cd5cca5233b84da	2024-04-17 12:11:24 -07:00
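The idea of bounding the scan in the entry above can be sketched as follows. This is an illustration under assumptions, not the RocksDB implementation: `EntryType`, `CountLeadingMerges`, and the flat vector standing in for one key's memtable entries (newest first) are all hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical entry kinds for a single key, listed newest first.
enum class EntryType { kMerge, kValue, kDeletion };

// Counts consecutive merge entries at the head of the chain, but never
// examines more than `limit` of them, so the cost stays bounded even when
// the merge chain is arbitrarily long.
std::size_t CountLeadingMerges(const std::vector<EntryType>& newest_first,
                               std::size_t limit) {
  std::size_t count = 0;
  for (EntryType type : newest_first) {
    if (count >= limit || type != EntryType::kMerge) break;
    ++count;
  }
  return count;
}
```

A caller that only needs to know whether the chain has reached some threshold can pass that threshold as `limit` and compare the result against it.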
Jay Huh	02ea0d6367	Reserve vector in advance to avoid resizing in GetLiveFilesMetaData (#12554 ) Summary: As title Pull Request resolved: https://github.com/facebook/rocksdb/pull/12554 Test Plan: Existing CI Reviewed By: ajkr Differential Revision: D56252201 Pulled By: jaykorean fbshipit-source-id: 06211555a54ce5e6bf656b81109022494e6787ea	2024-04-17 11:01:06 -07:00
Hui Xiao	3bbacda9b1	Disallow inplace_update_support with allow_concurrent_memtable_write (#12550 ) Summary: Context/Summary: In-place memtable updates (inplace_update_support) are not compatible with concurrent writes (allow_concurrent_memtable_write), so we disallow this combination in the crash test. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12550 Test Plan: CI Reviewed By: jaykorean Differential Revision: D56204269 Pulled By: hx235 fbshipit-source-id: 06608f2591db5e37470a1da6afcdfd2701781c2d	2024-04-16 19:41:38 -07:00
Hui Xiao	24a35b6e57	Add more public APIs to crash/stress test (#12541 ) Summary: Context/Summary: This PR includes some public DB APIs not tested in crash/stress yet can be added in a straightforward way. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12541 Test Plan: - Locally run crash test heavily stressing on these new APIs - CI Reviewed By: jowlyzhang Differential Revision: D56164892 Pulled By: hx235 fbshipit-source-id: 8bb568c3e65aec39d642987033f1d76c52f69bd8	2024-04-16 15:43:26 -07:00
Levi Tamasi	87e164f39a	Add a couple of missing (Multi)GetEntity overloads to StackableDB (#12551 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12551 Reviewed By: jaykorean Differential Revision: D56206320 fbshipit-source-id: f5d25732d5a138d2460cb2e1820830701fd05c78	2024-04-16 14:30:22 -07:00
Jay Huh	b7319d8a10	MultiCfIterator - Tests for lower/upper bounds (#12548 ) Summary: Thanks to how we are using `DBIter` as child iterators in MultiCfIterators (both `CoalescingIterator` and `AttributeGroupIterator`), we got the lower/upper bound feature for free. This PR simply adds unit test coverage to ensure that the lower/upper bounds are working as expected in the MultiCfIterators. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12548 Test Plan: UnitTest Added ``` ./multi_cf_iterator_test ``` Reviewed By: ltamasi Differential Revision: D56197966 Pulled By: jaykorean fbshipit-source-id: fa51cc70705dbc5efd836ac006a7c6a49d05707a	2024-04-16 14:20:13 -07:00
Jay Huh	dfdc3b158e	Add offpeak feature to crash test (#12549 ) Summary: As title. Add offpeak feature in stress test. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12549 Test Plan: Ran stress test locally with the flag set ``` Running db_stress with pid=701060: ./db_stress ... --daily_offpeak_time_utc=04:00-08:00 ... --periodic_compaction_seconds=10 ... ... KILLED 701060 stdout: Choosing random keys with no overwrite Creating 6250000 locks 2024/04/16-11:38:19 Initializing db_stress RocksDB version : 9.2 Format version : 5 TransactionDB : false Stacked BlobDB : false Read only mode : false Atomic flush : false Manual WAL flush : true Column families : 1 Clear CFs one in : 0 Number of threads : 32 Ops per thread : 100000000 Time to live(sec) : unused Read percentage : 60% Prefix percentage : 0% Write percentage : 35% Delete percentage : 4% Delete range percentage : 1% No overwrite percentage : 1% Iterate percentage : 0% Custom ops percentage : 0% DB-write-buffer-size : 0 Write-buffer-size : 4194304 Iterations : 10 Max key : 25000000 Ratio #ops/#keys : 128.000000 Num times DB reopens : 0 Batches/snapshots : 0 Do update in place : 0 Num keys per lock : 4 Compression : LZ4 Bottommost Compression : DisableOption Checksum type : kxxHash File checksum impl : none Bloom bits / key : 18.000000 Max subcompactions : 4 Use MultiGet : false Use GetEntity : false Use MultiGetEntity : false Verification only : false Memtablerep : skip_list Test kill odd : 0 Periodic Compaction Secs : 10 Daily Offpeak UTC : 04:00-08:00 <<<<<<<<<<<<<<< Newly added Compaction TTL : 0 Compaction Pri : kMinOverlappingRatio Background Purge : 0 Write DB ID to manifest : 0 Max Write Batch Group Size: 16 Use dynamic level : 1 Read fault one in : 0 Write fault one in : 1000 Open metadata write fault one in: 8 Sync fault injection : 0 Best efforts recovery : 0 Fail if OPTIONS file error: 0 User timestamp size bytes : 0 Persist user defined timestamps : 1 WAL compression : zstd Try verify 
sst unique id : 1 ------------------------------------------------ ``` Reviewed By: hx235 Differential Revision: D56203102 Pulled By: jaykorean fbshipit-source-id: 11a9be7362b3b26940d74d41c8bf4ebac3f03a2d	2024-04-16 12:44:44 -07:00
Levi Tamasi	c0aef2a28e	Add MultiGetEntityFromBatchAndDB to WriteBatchWithIndex (#12539 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12539 As a follow-up to https://github.com/facebook/rocksdb/pull/12533, this PR extends `WriteBatchWithIndex` with a `MultiGetEntityFromBatchAndDB` API that enables users to perform batched wide-column point lookups with read-your-own-writes consistency. This API transparently combines data from the indexed write batch and the underlying database as needed and presents the results in the form of a wide-column entity. Reviewed By: jaykorean Differential Revision: D56153145 fbshipit-source-id: 537967051b7521bb41b04070ac1a78a1d8873c08	2024-04-16 08:58:04 -07:00
Jay Huh	d34712e0ac	MultiCfIterator - AttributeGroupIter Impl & CoalescingIter Optimization (#12534 ) Summary: Continuing from the previous MultiCfIterator Implementations - (https://github.com/facebook/rocksdb/issues/12422, https://github.com/facebook/rocksdb/issues/12480 #12465), this PR completes the `AttributeGroupIterator` by implementing `AttributeGroupIteratorImpl::AddToAttributeGroups()`. While implementing the `AttributeGroupIterator`, we had to make some changes in `MultiCfIteratorImpl` and found an opportunity to improve `Coalesce()` in `CoalescingIterator`. Lifting `UNDER CONSTRUCTION - DO NOT USE` comment by replacing it with `EXPERIMENTAL` Here are some implementation details: - `IteratorAttributeGroups` is introduced to avoid having to copy all `WideColumn` objects during iteration. - `PopulateIterator()` no longer advances non-top iterators that have the same key as the top iterator in the heap. - `AdvanceIterator()` needs to advance the non-top iterators when they have the same key as the top iterator in the heap. - Instead of populating one by one, `PopulateIterator()` now collects all items with the same key and calls `populate_func(items)` at once. - This allowed optimization in `Coalesce()` such that we no longer do K-1 rounds of 2-way merge, but do one K-way merge instead. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12534 Test Plan: Uncommented the assertions in `verifyAttributeGroupIterator()` ``` ./multi_cf_iterator_test ``` Reviewed By: ltamasi Differential Revision: D56089019 Pulled By: jaykorean fbshipit-source-id: 6b0b4247e221f69b40b147d41492008cc9b15054	2024-04-16 08:45:38 -07:00
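The "collect all items with the same key, then call the populate function once" approach from the entry above resembles a heap-based K-way merge, which can be sketched as below. This is a simplified illustration, not the `MultiCfIteratorImpl` code: `Entry`, `MergeByKey`, and the `on_key` callback are hypothetical names, and each source is assumed to be an in-memory vector already sorted by key.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

using Entry = std::pair<std::string, std::string>;  // (key, value)

// Visits keys in ascending order. For each distinct key, gathers the values
// from every source holding that key (ordered by source index) and invokes
// on_key once, instead of merging the sources pairwise.
void MergeByKey(
    const std::vector<std::vector<Entry>>& sources,
    const std::function<void(const std::string&,
                             const std::vector<std::string>&)>& on_key) {
  struct Cursor {
    std::size_t source;
    std::size_t pos;
  };
  // Orders the heap so the smallest key is on top; ties go to the
  // lower source index.
  auto greater = [&sources](const Cursor& a, const Cursor& b) {
    const std::string& ka = sources[a.source][a.pos].first;
    const std::string& kb = sources[b.source][b.pos].first;
    if (ka != kb) return ka > kb;
    return a.source > b.source;
  };
  std::priority_queue<Cursor, std::vector<Cursor>, decltype(greater)> heap(
      greater);
  for (std::size_t i = 0; i < sources.size(); ++i) {
    if (!sources[i].empty()) heap.push(Cursor{i, 0});
  }
  while (!heap.empty()) {
    // Copy the key: the heap top changes while the group is drained.
    const std::string key = sources[heap.top().source][heap.top().pos].first;
    std::vector<std::string> values;
    while (!heap.empty() &&
           sources[heap.top().source][heap.top().pos].first == key) {
      const Cursor c = heap.top();
      heap.pop();
      values.push_back(sources[c.source][c.pos].second);
      if (c.pos + 1 < sources[c.source].size()) {
        heap.push(Cursor{c.source, c.pos + 1});
      }
    }
    on_key(key, values);
  }
}
```

Each entry passes through the heap once, so K sources are combined in a single pass rather than in K-1 rounds of 2-way merging.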
Hui Xiao	d41e568b1c	Add inplace_update_support to crash/stress test (#12535 ) Summary: Context/Summary: `inplace_update_support=true` is not tested in crash/stress test. Since it's not compatible with snapshots like compaction_filter, we need to sanitize its value in presence of snapshots-related options. A minor refactoring is added to centralize such sanitization in db_crashtest.py - see `check_multiget_consistency` and `check_multiget_entity_consistency` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12535 Test Plan: CI Reviewed By: ajkr Differential Revision: D56102978 Pulled By: hx235 fbshipit-source-id: 2e2ab6685a65123b14a321b99f45f60bc6509c6b	2024-04-15 16:11:58 -07:00

12947 commits