rocksdb

Commit Graph

Author	SHA1	Message	Date
Alexander Kiel	6e7701d49b	Fix JavaDoc of setCompactionReadaheadSize (#12090 ) Summary: Recently in https://github.com/facebook/rocksdb/issues/11762 the default of `compaction_readahead_size` changed from 0 to 2 MB. Closes: https://github.com/facebook/rocksdb/issues/12088 Pull Request resolved: https://github.com/facebook/rocksdb/pull/12090 Reviewed By: jaykorean Differential Revision: D51531762 Pulled By: ajkr fbshipit-source-id: a0b7145a1dca95ee90ffa3553f6eeacce6424aee	2023-11-27 11:50:53 -08:00
Peter Dillinger	4dd2bb8f70	Fix stack trace trimming with LLDB (#12101 ) Summary: I must have chosen trimming before frame 8 based on assertion failures, but that trims too many frames for a general segfault. So this changes to start printing at frame 4, as in this example where I've seeded a null deref: ``` Received signal 11 (Segmentation fault) Invoking LLDB for stack trace... Process 873208 stopped * thread #1, name = 'db_stress', stop reason = signal SIGSTOP frame #0: 0x00007fb1fe8f1033 libc.so.6`__GI___wait4(pid=873478, stat_loc=0x00007fb1fb114030, options=0, usage=0x0000000000000000) at wait4.c:30:10 thread #2, name = 'rocksdb:low', stop reason = signal SIGSTOP frame #0: 0x00007fb1fe8972a1 libc.so.6`__GI___futex_abstimed_wait_cancelable64 at futex-internal.c:57:12 Executable module set to "/data/users/peterd/rocksdb/db_stress". Architecture set to: x86_64-unknown-linux-gnu. True frame #4: 0x00007fb1fe844540 libc.so.6`__restore_rt at libc_sigaction.c:13 frame #5: 0x0000000000608514 db_stress`rocksdb::StressTest::InitDb(rocksdb::SharedState) at db_stress_test_base.cc:345:18 frame #6: 0x0000000000585d62 db_stress`rocksdb::RunStressTestImpl(rocksdb::SharedState) at db_stress_driver.cc:84:17 frame #7: 0x000000000058dd69 db_stress`rocksdb::RunStressTest(shared=0x00006120000001c0) at db_stress_driver.cc:266:34 frame #8: 0x0000000000453b34 db_stress`rocksdb::db_stress_tool(int, char**) at db_stress_tool.cc:370:20 ... ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12101 Test Plan: manual (see above) Reviewed By: ajkr Differential Revision: D51593217 Pulled By: pdillinger fbshipit-source-id: 4a71eb8e516edbc32e682f9537bc77d073a7b4ed	2023-11-27 11:49:52 -08:00
Peter Dillinger	f6fd4b9dbd	Print stack traces more reliably with concurrency (#12086 ) Summary: It's been relatively easy to break our stack trace printer: * If another thread reaches a signal condition such as a related SEGV or assertion failure while one is trying to print a stack trace from the signal handler, it seems to end the process abruptly without a stack trace. * If the process exits normally in one thread (such as main finishing) while another is trying to print a stack trace from the signal handler, it seems the process will often end normally without a stack trace. This change attempts to fix these issues, with * Keep the custom signal handler installed as long as possible, so that other threads will most likely re-enter our custom handler. (We only switch back to default for triggering core dump or whatever after stack trace.) * Use atomics and sleeps to implement a crude recursive mutex for ensuring all threads hitting the custom signal handler wait on the first that is trying to print a stack trace, while recursive signals in the same thread can still be handled cleanly. * Use an atexit handler to hook into normal exit to (a) wait on a pending printing of stack trace when detectable and applicable, and (b) detect and warn when printing a stack trace might be interrupted by a process exit in progress. (I don't know how to pause that after our atexit handler has been called; the best I know how to do is warn, "In a race with process already exiting...".) Pull Request resolved: https://github.com/facebook/rocksdb/pull/12086 Test Plan: manual, including with TSAN. I added this code to the end of a unit test file: ``` for (size_t i = 0; i < 3; ++i) { std::thread t([]() { assert(false); }); t.detach(); } ``` Followed by either `sleep(100)` or `usleep(100)` or usual process exit. And for recursive signal testing, inject `abort()` at various places in the handler. Reviewed By: cbi42 Differential Revision: D51531882 Pulled By: pdillinger fbshipit-source-id: 3473b863a43e61b722dfb7a2ed12a8120949b09c	2023-11-22 11:55:10 -08:00
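
Note on f6fd4b9dbd: the "crude recursive mutex" built from atomics and sleeps could look roughly like the sketch below. This is an illustration only, not the code from the commit. The class name `CrudeRecursiveLock` and its methods are hypothetical, and the sketch ignores async-signal-safety, which a real signal handler has to take into account.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical illustration; not the RocksDB implementation.
// One thread owns the lock at a time. Other threads sleep and retry,
// while the owning thread may re-enter (e.g. on a recursive signal).
class CrudeRecursiveLock {
 public:
  // Blocks until the calling thread owns the lock.
  // Returns the recursion depth after acquiring (1 on first entry).
  int Acquire() {
    const std::thread::id self = std::this_thread::get_id();
    if (owner_.load(std::memory_order_acquire) == self) {
      return ++depth_;  // re-entry from the thread that already owns it
    }
    std::thread::id unowned{};
    while (!owner_.compare_exchange_weak(unowned, self,
                                         std::memory_order_acq_rel,
                                         std::memory_order_acquire)) {
      unowned = std::thread::id{};  // reset the expected value and retry
      std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    depth_ = 1;
    return depth_;
  }

  // Must be called by the owning thread, once per Acquire().
  void Release() {
    assert(owner_.load(std::memory_order_relaxed) ==
           std::this_thread::get_id());
    if (--depth_ == 0) {
      owner_.store(std::thread::id{}, std::memory_order_release);
    }
  }

 private:
  // A default-constructed thread::id means "no owner".
  std::atomic<std::thread::id> owner_{std::thread::id{}};
  int depth_ = 0;  // only touched by the owning thread
};
```

In this sketch, the first thread to reach a handler would call `Acquire()` and print. Later threads would spin in the sleep loop, and a nested call from the same thread returns immediately with a depth greater than 1.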
Peter Dillinger	a140b519b1	Convert all but one windows job to nightly (#12089 ) Summary: ... because they are expensive and rarely disagree with each other. Historical data indicates that the 2019 job is most sensitive to failure. https://fburl.com/scuba/opensource_ci_jobs/ntq3ue3p https://fburl.com/scuba/opensource_ci_jobs/0xo91j5f Pull Request resolved: https://github.com/facebook/rocksdb/pull/12089 Test Plan: CI Reviewed By: ajkr Differential Revision: D51530386 Pulled By: pdillinger fbshipit-source-id: 8b676d6e01096e359a0f465b59d81ac10f4f7969	2023-11-22 10:40:52 -08:00
cz2h	324453e579	Fix rowcache get returning incorrect timestamp (#11952 ) Summary: Fixes https://github.com/facebook/rocksdb/issues/7930. When there is a timestamp associated with stored records, get from row cache will return the timestamp provided in query instead of the timestamp associated with the stored record. ## Cause of error: Currently a row_handle is fetched using row_cache_key(contains a timestamp provided by user query) and the row_handle itself does not persist timestamp associated with the object. Hence the [GetContext::SaveValue() ](`6e3429b8a6/table/get_context.cc (L257)`) function will fetch the timestamp in row_cache_key and may return the incorrect timestamp value. ## Proposed Solution If current cf enables ts, append a timestamp associated with stored records after the value in replay_log (equivalently the value of row cache entry). When read, `replayGetContextLog()` will update parsed_key with the correct timestamp. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11952 Reviewed By: ajkr Differential Revision: D51501176 Pulled By: jowlyzhang fbshipit-source-id: 808fc943a8ae95de56ae0e82ec59a2573a031f28	2023-11-21 20:39:33 -08:00
Jay Huh	ddb7df10ef	Update HISTORY.md and version.h for 8.9.fb release (#12074 ) Summary: Creating cut for 8.9 release Pull Request resolved: https://github.com/facebook/rocksdb/pull/12074 Test Plan: CI Reviewed By: ajkr Differential Revision: D51435289 Pulled By: jaykorean fbshipit-source-id: 3918a8250032839e5b71f67f26c8ba01cbc17a41	2023-11-21 18:07:19 -08:00
Yu Zhang	84a54e1e28	Fix some bugs in index builder and reader for the UDT in memtable only feature (#12062 ) Summary: These bugs surfaced while I was trying to add the stress test for the feature: Bug 1) On the index building path: the optimization to use user key instead of internal key as separator needed a bit of tweaking for when user-defined timestamps can be removed. Even though the user keys look different now and are eligible to be used as separators, when their user-defined timestamps are removed they could be equal, and that invariant no longer holds. Bug 2) On the index reading path: one path that builds the second level index iterator for `PartitionedIndexReader` is not passing the corresponding `user_defined_timestamps_persisted` flag. As a result, the default `true` value is used, leading to no minimum timestamps being padded when they should be. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12062 Test Plan: For bug 1): added separate unit test `BlockBasedTableReaderTest::Get` to exercise the `Get` API. It's a different code path from `MultiGet` so worth having its own test. Also in order to cover the bug, the test is modified to generate key values with the same user provided key, different timestamps and different sequence numbers. The test reads back different versions of the same user provided key. `MultiGet` takes one `ReadOptions` with one read timestamp so we cannot test retrieving different versions of the same key easily. For bug 2): simply added options `BlockBasedTableOptions.metadata_cache_options.partition_pinning = PinningTier::kAll` to exercise all the index iterator creating paths. Reviewed By: ltamasi Differential Revision: D51508280 Pulled By: jowlyzhang fbshipit-source-id: 8b174d3d70373c0599266ac1f467f2bd4d7ea6e5	2023-11-21 14:05:02 -08:00
songqing	d3e015fe06	Fix compact_files_example (#12084 ) Summary: The option "write_buffer_size" has changed from 4MB to 64MB by default, and the compact_files_example will not work as expected, as the test data written is only about 50MB and will not trigger compaction. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12084 Reviewed By: cbi42 Differential Revision: D51499959 Pulled By: ajkr fbshipit-source-id: 4f4b25ebc4b6bb568501adc8e97813edcddceea8	2023-11-21 09:34:59 -08:00
Andrew Kryczka	04cbc77b90	Add missing license to source files (#12083 ) Summary: Fixes https://github.com/facebook/rocksdb/issues/12079. Fixed missing licenses in "\.h" and "\.cc" files Pull Request resolved: https://github.com/facebook/rocksdb/pull/12083 Reviewed By: cbi42 Differential Revision: D51489634 Pulled By: ajkr fbshipit-source-id: 764bfee257b9d6603fd7606a55664b7537e1898f	2023-11-21 08:36:30 -08:00
anand76	336a74db60	Add some asserts in ~CacheWithSecondaryAdapter (#12082 ) Summary: Add some asserts in the `CacheWithSecondaryAdapter` destructor to help debug a crash test failure. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12082 Reviewed By: cbi42 Differential Revision: D51486041 Pulled By: anand1976 fbshipit-source-id: 76537beed31ba27ab9ac8b4ce6deb775629e3be5	2023-11-20 17:48:17 -08:00
Changyu Bi	fb5c8c7ea3	Do not compare op_type in `WithinPenultimateLevelOutputRange()` (#12081 ) Summary: `WithinPenultimateLevelOutputRange()` is updated in https://github.com/facebook/rocksdb/issues/12063 to check internal key range. However, op_type of a key can change during compaction, e.g. MERGE -> PUT, which makes a key larger and becomes out of penultimate output range. This has caused stress test failures with error message "Unsafe to store Seq later than snapshot in the last level if per_key_placement is enabled". So update `WithinPenultimateLevelOutputRange()` to only check user key and sequence number. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12081 Test Plan: * This repro can produce the corruption within a few runs. Ran it a few times after the fix and did not see Corruption failure. ``` python3 ./tools/db_crashtest.py whitebox --test_tiered_storage --random_kill_odd=888887 --use_merge=1 --writepercent=100 --readpercent=0 --prefixpercent=0 --delpercent=0 --delrangepercent=0 --iterpercent=0 --write_buffer_size=419430 --column_families=1 --read_fault_one_in=0 --write_fault_one_in=0 ``` Reviewed By: ajkr Differential Revision: D51481202 Pulled By: cbi42 fbshipit-source-id: cad6b65099733e03071b496e752bbdb09cf4db82	2023-11-20 17:07:28 -08:00
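
Note on fb5c8c7ea3: the effect described above can be reproduced with a toy model of internal-key ordering (user key ascending, then sequence number and type descending). Everything below is a simplified stand-in rather than RocksDB code. `ToyInternalKey`, the two comparators and `WithinRange` are hypothetical names, and the numeric type values are an assumption chosen so that PUT sorts after MERGE at the same sequence number, as the commit message describes.

```cpp
#include <cstdint>
#include <string>

// Assumed type tags for this sketch only (PUT < MERGE numerically).
enum OpType : uint8_t { kPut = 1, kMerge = 2 };

struct ToyInternalKey {
  std::string user_key;
  uint64_t seqno;
  OpType type;
};

// Full comparison: user key ascending, then packed (seqno, type) descending.
int CompareInternal(const ToyInternalKey& a, const ToyInternalKey& b) {
  if (int c = a.user_key.compare(b.user_key); c != 0) return c < 0 ? -1 : 1;
  const uint64_t pa = (a.seqno << 8) | static_cast<uint64_t>(a.type);
  const uint64_t pb = (b.seqno << 8) | static_cast<uint64_t>(b.type);
  if (pa > pb) return -1;  // larger packed value sorts first
  if (pa < pb) return 1;
  return 0;
}

// Comparison that ignores op type: user key ascending, seqno descending.
int CompareUserKeyAndSeqno(const ToyInternalKey& a, const ToyInternalKey& b) {
  if (int c = a.user_key.compare(b.user_key); c != 0) return c < 0 ? -1 : 1;
  if (a.seqno > b.seqno) return -1;
  if (a.seqno < b.seqno) return 1;
  return 0;
}

// True if smallest <= key <= largest under the given comparator.
bool WithinRange(const ToyInternalKey& key, const ToyInternalKey& smallest,
                 const ToyInternalKey& largest,
                 int (*cmp)(const ToyInternalKey&, const ToyInternalKey&)) {
  return cmp(key, smallest) >= 0 && cmp(key, largest) <= 0;
}
```

With a range ending at `{"k10", 20, kMerge}`, the key `{"k10", 20, kPut}` falls outside the range under `CompareInternal` but inside it under `CompareUserKeyAndSeqno`. That is the situation a MERGE -> PUT conversion creates.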
Timo Riski	39d33475da	Fix build on FreeBSD (#11218 ) (#12078 ) Summary: Fixes https://github.com/facebook/rocksdb/issues/11218 Changes from https://github.com/facebook/rocksdb/issues/10881 broke FreeBSD builds with: env/io_posix.h:39:9: error: 'POSIX_MADV_NORMAL' macro redefined [-Werror,-Wmacro-redefined] This commit fixes FreeBSD builds by ignoring MADV defines. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12078 Reviewed By: cbi42 Differential Revision: D51452802 Pulled By: ajkr fbshipit-source-id: 0a1f5a90954e7d257a95794277a843ac77f3a709	2023-11-20 10:11:16 -08:00
Changyu Bi	b059c5680e	Add missing copyright header (#12076 ) Summary: Required for open source repo. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12076 Reviewed By: ajkr Differential Revision: D51449839 Pulled By: cbi42 fbshipit-source-id: 4a25a3422880db3f28a2834d966341935db32530	2023-11-19 09:50:59 -08:00
Benoît Mériaux	7780e98268	add write_buffer_manager setter into options and tests in c bindings, (#12007 ) Summary: following https://github.com/facebook/rocksdb/pull/11710 - add test on wbm c api - add a setter for WBM in `DBOptions` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12007 Reviewed By: cbi42 Differential Revision: D51430042 Pulled By: ajkr fbshipit-source-id: 608bc4d3ed35a84200459d0230b35be64b3475f7	2023-11-17 11:34:05 -08:00
Changyu Bi	4e58cc6437	Check internal key range when compacting from last level to penultimate level (#12063 ) Summary: The test failure in https://github.com/facebook/rocksdb/issues/11909 shows that we may compact keys outside of internal key range of penultimate level input files from last level to penultimate level, which can potentially cause overlapping files in the penultimate level. This PR updates the `Compaction::WithinPenultimateLevelOutputRange()` to check internal key range instead of user key. Other fixes: * skip range del sentinels when deciding output level for tiered compaction Pull Request resolved: https://github.com/facebook/rocksdb/pull/12063 Test Plan: - existing unit tests - apply the fix to https://github.com/facebook/rocksdb/issues/11905 and run `./tiered_compaction_test --gtest_filter="RangeDelsCauseFileEndpointsToOverlap"` Reviewed By: ajkr Differential Revision: D51288985 Pulled By: cbi42 fbshipit-source-id: 70085db5f5c3b15300bcbc39057d57b83fd9902a	2023-11-17 10:50:40 -08:00
Radek Hubner	2f9ea8193f	Add HyperClockCache Java API. (#12065 ) Summary: Fix https://github.com/facebook/rocksdb/issues/11510 Pull Request resolved: https://github.com/facebook/rocksdb/pull/12065 Reviewed By: ajkr Differential Revision: D51406695 Pulled By: cbi42 fbshipit-source-id: b9e32da5f9bcafb5365e4349f7295be90d5aa7ba	2023-11-16 15:46:31 -08:00
nccx	a9bd525b52	Add Qdrant to USERS.md (#12072 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12072 Reviewed By: cbi42 Differential Revision: D51398080 Pulled By: ajkr fbshipit-source-id: 1043f2b012bd744e9c53c638e1ba56a3e0392e11	2023-11-16 10:35:08 -08:00
Gus Wynn	6d10f8d690	add WriteBufferManager to c api (#11710 ) Summary: I want to use the `WriteBufferManager` in my rust project, which requires exposing it through the c api, just like `Cache` is. Hopefully the changes are fairly straightforward! Pull Request resolved: https://github.com/facebook/rocksdb/pull/11710 Reviewed By: cbi42 Differential Revision: D51166518 Pulled By: ajkr fbshipit-source-id: cd266ff1e4a7ab145d05385cd125a8390f51f3fc	2023-11-16 10:34:00 -08:00
Andrew Kryczka	9202db1867	Consider archived WALs for deletion more frequently (#12069 ) Summary: Fixes https://github.com/facebook/rocksdb/issues/11000. That issue pointed out that RocksDB was slow to delete archived WALs in case time-based and size-based expiration were enabled, and the time-based threshold (`WAL_ttl_seconds`) was small. This PR prevents the delay by taking into account `WAL_ttl_seconds` when deciding the frequency to process archived WALs for deletion. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12069 Reviewed By: pdillinger Differential Revision: D51262589 Pulled By: ajkr fbshipit-source-id: e65431a06ee96f4c599ba84a27d1aedebecbb003	2023-11-15 15:42:28 -08:00
anand76	2222caec9e	Make CacheWithSecondaryAdapter reservation accounting more robust (#12059 ) Summary: `CacheWithSecondaryAdapter` can distribute placeholder reservations across the primary and secondary caches. The current implementation of the accounting is quite complicated in order to avoid using a mutex. This may cause the accounting to be slightly off after changes to the cache capacity and ratio, resulting in assertion failures. There's also a bug in the unlikely event that the total reservation exceeds the cache capacity. Furthermore, the current implementation is difficult to reason about. This PR simplifies it by doing the accounting while holding a mutex. The reservations are processed in 1MB chunks in order to avoid taking a lock too frequently. As a side effect, this also removes the restriction of not allowing to increase the compressed secondary cache capacity after decreasing it to 0. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12059 Test Plan: Existing unit tests, and a new test for capacity increase from 0 Reviewed By: pdillinger Differential Revision: D51278686 Pulled By: anand1976 fbshipit-source-id: 7e1ad2c50694772997072dd59cab35c93c12ba4f	2023-11-14 16:25:52 -08:00
Radek Hubner	a660e074cd	Build RocksDBJava on Windows with Java8. (#12068 ) Summary: At the moment RocksDBJava uses the default CircleCI JVM on Windows builds. This can and has changed in the past and can cause some incompatibilities. This PR addresses the problem by explicitly installing and using Liberica JDK 8, as Java 8 is the primary target for RocksDBJava. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12068 Reviewed By: cbi42 Differential Revision: D51307233 Pulled By: ajkr fbshipit-source-id: 9cb4e173d8a9ac42e5f9fda1daf012302942fdbc	2023-11-14 14:39:31 -08:00
Yingchun Lai	37064d631b	Add encfs plugin link (#12070 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12070 Reviewed By: jaykorean Differential Revision: D51307148 Pulled By: ajkr fbshipit-source-id: d04335506becd5970802f87ab0573b6307479222	2023-11-14 07:33:21 -08:00
Dzmitry Ivaniuk	65d71ee371	Fix warnings when using API (#12066 ) Summary: Fixes https://github.com/facebook/rocksdb/issues/11457. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12066 Reviewed By: cbi42 Differential Revision: D51259966 Pulled By: ajkr fbshipit-source-id: a158b6f341b6b48233d917bfe4d00b639dbd8619	2023-11-13 20:03:44 -08:00
Changyu Bi	e7896f03ad	Enable unit test `PrecludeLastLevelTest.RangeDelsCauseFileEndpointsToOverlap` (#12064 ) Summary: Fixes https://github.com/facebook/rocksdb/issues/11909. The test passes after the change in https://github.com/facebook/rocksdb/issues/11917 to start mock clock from a non-zero time. The reason for the test failing is a bit complicated: - The Put here `e4ad4a0ef1/db/compaction/tiered_compaction_test.cc (L2045)` happens before mock clock advances beyond 0. - This causes oldest_key_time_ to be 0 for memtable. - oldest_ancester_time of the first L0 file becomes 0 - L0 -> L5/6 compaction output files set `oldest_ancester_time` to the current time due to these lines: `509947ce2c/db/compaction/compaction_job.cc (L1898C34-L1904)`. - This causes some small sequence number to be mapped to current time: `509947ce2c/db/compaction/compaction_job.cc (L301)` - Keys in L6 are being moved up to L5 due to the unexpected seqno_to_time mapping - When compacting keys from last level to the penultimate level, we only check keys to be within user key range of penultimate level input files. If we compact the following file 3 with file 1 and output keys to L5, we can get the reported inconsistency bug. ``` L5: file 1 [K5@20, K10@kMaxSeqno], file 2 [K10@30, K14@34) L6: file 3 [K6@5, K10@20] ``` https://github.com/facebook/rocksdb/issues/12063 will add fixes to check internal key range when compacting keys from last level up to the penultimate level. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12064 Test Plan: the unit test passes Reviewed By: ajkr Differential Revision: D51281149 Pulled By: cbi42 fbshipit-source-id: 00b7f026c453454d9f3af5b2de441383a96f0c62	2023-11-13 15:26:52 -08:00
Jay Huh	8b8f6c63ef	ColumnFamilyHandle Nullcheck in GetEntity and MultiGetEntity (#12057 ) Summary: - Add missing null check for ColumnFamilyHandle in `GetEntity()` - `FailIfCfHasTs()` now returns `Status::InvalidArgument()` if `column_family` is null. `MultiGetEntity()` can rely on this for cfh null check. - Added `DeleteRange` API using Default Column Family to be consistent with other major APIs (This was also causing Java Test failure after the `FailIfCfHasTs()` change) Pull Request resolved: https://github.com/facebook/rocksdb/pull/12057 Test Plan: - Updated `DBWideBasicTest::GetEntityAsPinnableAttributeGroups` to include null CF case - Updated `DBWideBasicTest::MultiCFMultiGetEntityAsPinnableAttributeGroups` to include null CF case Reviewed By: jowlyzhang Differential Revision: D51167445 Pulled By: jaykorean fbshipit-source-id: 1c1e44fd7b7df4d2dc3bb2d7d251da85bad7d664	2023-11-13 14:30:04 -08:00
leipeng	b3ffca0e29	DBImpl::DelayWrite: Remove bad WRITE_STALL histogram (#12067 ) Summary: When delay didn't happen, histogram WRITE_STALL is still recorded, and ticker STALL_MICROS is not recorded. This is a bug: neither WRITE_STALL nor STALL_MICROS should be recorded when delay did not happen. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12067 Reviewed By: cbi42 Differential Revision: D51263133 Pulled By: ajkr fbshipit-source-id: bd82d8328fe088d613991966e83854afdabc6a25	2023-11-13 12:48:44 -08:00
brodyhuang	9fb6851918	fix(StackableDB): Resume API (#12060 ) Summary: When I call `DBWithTTLImpl::Resume()`, it returns `Status::NotSupported`. Did `StackableDB` miss this API ? Thanks ! Pull Request resolved: https://github.com/facebook/rocksdb/pull/12060 Reviewed By: jaykorean Differential Revision: D51202742 Pulled By: ajkr fbshipit-source-id: 5e01a54a42efd81fd57b3c992b9af8bc45c59c9c	2023-11-13 12:09:58 -08:00
Yu Zhang	509947ce2c	Quarantine files in a limbo state after a manifest error (#12030 ) Summary: Part of the procedures to handle manifest IO error is to disable file deletion in case some files in limbo state get deleted prematurely. This is not ideal because: 1) not all the VersionEdits whose commit encounters such an error contain updates for files, so disabling file deletion is sometimes not necessary. 2) `EnableFileDeletion` has a force mode that could make other threads accidentally disrupt this procedure in recovery. 3) Disabling file deletion as a whole is also not as efficient as more precisely tracking impacted files to keep them from being prematurely deleted. This PR replaces this mechanism with tracking such files and quarantining them from being deleted in `ErrorHandler`. These are the types of files being actively tracked in quarantine in this PR: 1) new table files and blob files from a background job 2) old manifest file whose immediately following new manifest file's CURRENT file creation gets into unclear state. Current handling is not sufficient to make sure the old manifest file is kept in case it's needed. Note that WAL logs are not part of the quarantine because `min_log_number_to_keep` is a safe mechanism and it's only updated after successful manifest commits so it can prevent this premature deletion issue from happening. We track these files' file numbers because they share the same file number space. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12030 Test Plan: Modified existing unit tests Reviewed By: ajkr Differential Revision: D51036774 Pulled By: jowlyzhang fbshipit-source-id: 84ef26271fbbc888ef70da5c40fe843bd7038716	2023-11-11 08:11:11 -08:00
Andrew Kryczka	0ffc0c7db1	Allow `TtlMergeOperator` to wrap an unregistered `MergeOperator` (#12056 ) Summary: Followed mrambacher's first suggestion in https://github.com/facebook/rocksdb/pull/12044#issuecomment-1800706148. This change allows serializing a `TtlMergeOperator` that wraps an unregistered `MergeOperator`. Such a `TtlMergeOperator` cannot be loaded (validation will fail in `TtlMergeOperator::ValidateOptions()`), but that is OK for us currently. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12056 Reviewed By: hx235 Differential Revision: D51125097 Pulled By: ajkr fbshipit-source-id: 8ed3705e8d36ab473673b9198eea6db64397ed15	2023-11-10 16:57:17 -08:00
Yu Zhang	c6c683a0ca	Remove the default force behavior for `EnableFileDeletion` API (#12001 ) Summary: Disabling file deletion can be critical for operations like making a backup, recovery from manifest IO error (for now). Ideally as long as there is one caller requesting file deletion disabled, it should be kept disabled until all callers agree to re-enable it. So this PR removes the default forcing behavior for the `EnableFileDeletion` API, and users need to explicitly pass the argument if they insist on doing so, knowing the consequences of what can be potentially disrupted. This PR removes the API's default argument value so it will cause breakage for all users that are relying on the default value, regardless of whether the forcing behavior is critical for them. When fixing this breakage, it's good to check if the forcing behavior is indeed needed and potential disruption is OK. This PR also makes unit tests that do not need force behavior do a regular enable file deletion. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12001 Reviewed By: ajkr Differential Revision: D51214683 Pulled By: jowlyzhang fbshipit-source-id: ca7b1ebf15c09eed00f954da2f75c00d2c6a97e4	2023-11-10 14:35:54 -08:00
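
Note on c6c683a0ca: the "kept disabled until all callers agree" semantics can be pictured as a counter, with the force flag resetting it. The sketch below is a toy model under that assumption, not RocksDB's implementation. `FileDeletionGate` and its methods are hypothetical names. The `force` parameter deliberately has no default value, mirroring the API change described above.

```cpp
// Hypothetical toy model of counted disable/enable with an explicit force flag.
class FileDeletionGate {
 public:
  // Each caller that needs files kept around bumps the counter.
  void Disable() { ++disable_count_; }

  // Non-forced enable only undoes one Disable(); forced enable overrides
  // every outstanding Disable(). Returns true if deletions are enabled
  // after the call.
  bool Enable(bool force) {
    if (force) {
      disable_count_ = 0;
    } else if (disable_count_ > 0) {
      --disable_count_;
    }
    return disable_count_ == 0;
  }

  bool Enabled() const { return disable_count_ == 0; }

 private:
  int disable_count_ = 0;
};
```

If a backup job and an error-recovery path both call `Disable()`, a single `Enable(false)` leaves deletions off, while `Enable(true)` turns them back on regardless. That is why a forced enable can disrupt another caller.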
Yueh-Hsuan Chiang	5ef92b8ea4	Add rocksdb_options_set_cf_paths (#11151 ) Summary: This PR adds a missing set function for rocksdb_options in the C-API: rocksdb_options_set_cf_paths(). Without this function, users cannot specify different paths for different column families as it will fall back to db_paths. As a bonus, this PR also includes rocksdb_sst_file_metadata_get_directory() to the C api -- a missing public function that will also make the test easier to write. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11151 Test Plan: Augment existing c_test to verify the specified cf_path. Reviewed By: hx235 Differential Revision: D51201888 Pulled By: ajkr fbshipit-source-id: 62a96451f26fab60ada2005ede3eea8e9b431f30	2023-11-10 11:36:11 -08:00
Yueh-Hsuan Chiang	73d223c4e2	Add auto_tuned option to RateLimiter C API (#12058 ) Summary: #### Problem While the RocksDB C API does have the RateLimiter API, it does not expose the auto_tuned option. #### Summary of Change This PR exposes auto_tuned RateLimiter option in RocksDB C API. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12058 Test Plan: Augment the C API existing test to cover the new API. Reviewed By: cbi42 Differential Revision: D51201933 Pulled By: ajkr fbshipit-source-id: 5bc595a9cf9f88f50fee797b729ba96f09ed8266	2023-11-10 09:53:09 -08:00
Yu Zhang	dfaf4dc111	Stubs for piping write time (#12043 ) Summary: As titled. This PR contains the API and stubbed implementation for piping write time. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12043 Reviewed By: pdillinger Differential Revision: D51076575 Pulled By: jowlyzhang fbshipit-source-id: 3b341263498351b9ccaff27cf35d5aeb5bdf0cf1	2023-11-09 15:58:07 -08:00
Yingchun Lai	c4c62c2304	Support to use environment variable to test customer encryption plugins (#12025 ) Summary: The CreateEnvTest.CreateEncryptedFileSystem unit test verifies the creation functionality of EncryptedFileSystem, but currently it only supports the builtin CTREncryptionProvider class. This patch makes it flexible by using the environment variable `TEST_FS_URI`, which is useful for testing customer encryption plugins. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12025 Reviewed By: anand1976 Differential Revision: D50799656 Pulled By: ajkr fbshipit-source-id: dbcacfefbf07de9c7803f7707b34c5193bec17bf	2023-11-09 10:45:13 -08:00
brodyhuang	e90e9825b4	Drop wal record when sequence is illegal (#11985 ) Summary: - Our database is corrupted, causing the sequence numbers of some WAL records to be invalid (but the `record_checksum` looks fine). - When we RecoverLogFiles in WALRecoveryMode::kPointInTimeRecovery, `assert(seq <= kMaxSequenceNumber)` will fail. - When it is found that the sequence is illegal, can we drop the file to recover as much data as possible? Thx! Pull Request resolved: https://github.com/facebook/rocksdb/pull/11985 Reviewed By: anand1976 Differential Revision: D50698039 Pulled By: ajkr fbshipit-source-id: 1e42113b58823088d7c0c3a92af5b3efbb5f5296	2023-11-09 10:43:16 -08:00
Kasper Isager Dalsgarð	f9b7877cf3	Ensure `target_include_directories()` is called with correct target name (#12055 ) Summary: `${PROJECT_NAME}` isn't guaranteed to match a target name when an artefact suffix is specified. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12055 Reviewed By: anand1976 Differential Revision: D51125532 Pulled By: ajkr fbshipit-source-id: cd1f4a5b11eb517c379e3ee3f78592f7e606a034	2023-11-09 10:41:38 -08:00
Hui Xiao	f337533b6f	Ensure and clarify how RocksDB calls TablePropertiesCollector's functions (#12053 ) Summary: Context/Summary: It's intuitive for users to assume, from the word "finish", that `TablePropertiesCollector::Finish()` is called only once by RocksDB internally. However, this is currently not true as RocksDB also calls this function in `BlockBased/PlainTableBuilder::GetTableProperties()` to populate user collected properties on demand. This PR avoids that by moving that populating to where we first call `Finish()` (i.e, `NotifyCollectTableCollectorsOnFinish`) Bonus: clarified in the API that `GetReadableProperties()` will be called after `Finish()` and added UT to ensure that. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12053 Test Plan: - Modified test `DBPropertiesTest.GetUserDefinedTableProperties` to ensure `Finish()` is only called once. - Existing tests, particularly `db_properties_test, table_properties_collector_test`, verify that the functionality of `NotifyCollectTableCollectorsOnFinish` and `GetReadableProperties()` is not broken by this change. Reviewed By: ajkr Differential Revision: D51095434 Pulled By: hx235 fbshipit-source-id: 1c6275258f9b99dedad313ee8427119126817973	2023-11-08 14:00:36 -08:00
Peter Dillinger	65cde19f40	Safer wrapper for std::atomic, use in HCC (#12051 ) Summary: See new atomic.h file comments for motivation. I have updated HyperClockCache to use the new atomic wrapper, fixing a few cases where an implicit conversion was accidentally used and therefore mixing std::memory_order_seq_cst where release/acquire ordering (or relaxed) was intended. There probably wasn't a real bug because I think all the cases happened to be in single-threaded contexts like constructors/destructors or statistical ops like `GetCapacity()` that don't need any particular ordering constraints. Recommended follow-up: * Replace other uses of std::atomic to help keep them safe from bugs. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12051 Test Plan: Did some local correctness stress testing with cache_bench. Also triggered 15 runs of fbcode_blackbox_crash_test and saw no related failures (just 3 failures in ~CacheWithSecondaryAdapter(), already known) No performance difference seen before & after running simultaneously: ``` (while ./cache_bench -cache_type=fixed_hyper_clock_cache -populate_cache=0 -cache_size=3000000000 -ops_per_thread=500000 -threads=12 -histograms=0 2>&1 | grep parallel; do :; done) | awk '{ s += $3; c++; print "Avg time: " (s/c);}' ``` ... for both fixed_hcc and auto_hcc. Reviewed By: jowlyzhang Differential Revision: D51090518 Pulled By: pdillinger fbshipit-source-id: eeb324facb3185584603f9ea0c4de6f32919a2d7	2023-11-08 13:28:43 -08:00
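
Note on 65cde19f40: the hazard described above is that `std::atomic<T>` converts implicitly to and from `T`, and those conversions use `std::memory_order_seq_cst`. A wrapper can rule this out by exposing only operations whose name states the ordering. The sketch below shows that general idea. `ExplicitAtomic` and its method names are hypothetical and are not the interface of the new atomic.h.

```cpp
#include <atomic>
#include <type_traits>

// Hypothetical wrapper: no operator T(), no operator=(T), so every access
// must spell out its memory ordering through the method name.
template <typename T>
class ExplicitAtomic {
 public:
  explicit ExplicitAtomic(T initial) : v_(initial) {}
  ExplicitAtomic(const ExplicitAtomic&) = delete;
  ExplicitAtomic& operator=(const ExplicitAtomic&) = delete;

  T LoadRelaxed() const { return v_.load(std::memory_order_relaxed); }
  T LoadAcquire() const { return v_.load(std::memory_order_acquire); }
  void StoreRelaxed(T desired) { v_.store(desired, std::memory_order_relaxed); }
  void StoreRelease(T desired) { v_.store(desired, std::memory_order_release); }
  // Returns the value held before the addition.
  T FetchAddRelaxed(T delta) {
    return v_.fetch_add(delta, std::memory_order_relaxed);
  }

 private:
  std::atomic<T> v_;
};

// Unlike std::atomic<int>, the wrapper cannot be read through an implicit
// conversion, so an accidental seq_cst load does not compile.
static_assert(std::is_convertible<std::atomic<int>&, int>::value,
              "std::atomic converts implicitly");
static_assert(!std::is_convertible<ExplicitAtomic<int>&, int>::value,
              "the wrapper does not");
```

With such a wrapper, `int x = counter;` becomes a compile error and the author has to choose between, say, `counter.LoadRelaxed()` and `counter.LoadAcquire()`.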
Yingchun Lai	e406c26c4e	Update the API comments of NewRandomRWFile() (#11820 ) Summary: Env::NewRandomRWFile() will not create the file if it doesn't exist, as the test at https://github.com/facebook/rocksdb/blob/main/env/env_test.cc#L2208 shows. This patch corrects the comments of Env::NewRandomRWFile(), which may mislead developers who use rocksdb Env() as a utility. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11820 Reviewed By: ajkr Differential Revision: D50176707 Pulled By: jowlyzhang fbshipit-source-id: a6ee469f549360de8d551a4fe8517b4450df7b15	2023-11-08 12:28:00 -08:00
Peter Dillinger	9af25a392b	Clean up AutoHyperClockTable::PurgeImpl (#12052 ) Summary: There was some unnecessary logic (e.g. a dead assignment to home_shift) left over from an earlier revision of the code. Also, rename confusing ChainRewriteLock::new_head_ / GetNewHead() to saved_head_ / GetSavedHead(). Pull Request resolved: https://github.com/facebook/rocksdb/pull/12052 Test Plan: existing tests Reviewed By: jowlyzhang Differential Revision: D51091499 Pulled By: pdillinger fbshipit-source-id: 4b191b60a2b16085681e59d49c4d97e802869db8	2023-11-07 16:35:19 -08:00
Zaidoon Abd Al Hadi	58f2a29fb4	Expose Options::periodic_compaction_seconds through C API (#12019 ) Summary: fixes [11090](https://github.com/facebook/rocksdb/issues/11090) Pull Request resolved: https://github.com/facebook/rocksdb/pull/12019 Reviewed By: jaykorean Differential Revision: D51076427 Pulled By: cbi42 fbshipit-source-id: de353ff66c7f73aba70ab3379e20d8c40f50d873	2023-11-07 12:46:50 -08:00
Alan Paxton	c181667c4f	FIX new blog post (JNI performance) Locate images correctly (#12050 ) Summary: We set up the images / references to the images wrongly in https://github.com/facebook/rocksdb/pull/11818 Images should be in the docs/static/images/… directory with an absolute reference to /static/images/… Make it so. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12050 Reviewed By: pdillinger Differential Revision: D51079811 Pulled By: jaykorean fbshipit-source-id: 4c1ab80d313b70d0e60eec94086451d7b2814922	2023-11-07 11:58:58 -08:00
Guozhang Wu	c06309c832	Not to print unnecessary commands in Makefile (#11978 ) Summary: When I run `make check`, there is a command that should not be printed to screen, which is shown below. ```text ... ... Generating parallel test scripts for util_merge_operators_test Generating parallel test scripts for write_batch_with_index_test make[2]: Leaving directory '/home/z/rocksdb' make[1]: Leaving directory '/home/z/rocksdb' GEN check make[1]: Entering directory '/home/z/rocksdb' $DEBUG_LEVEL is 1, $LIB_MODE is shared Makefile:185: Warning: Compiling in debug mode. Don't use the resulting binary in production printf '%s\n' '' \ 'To monitor subtest <duration,pass/fail,name>,' \ ' run "make watch-log" in a separate window' ''; \ { \ printf './%s\n' db_bloom_filter_test deletefile_test env_test c_test; \ find t -name 'run-' -print; \ } \ \| perl -pe 's,(^.MySQLStyleTransactionTest.$\|^.SnapshotConcurrentAccessTest.$\|^.SeqAdvanceConcurrentTest.$\|^t/run-table_test-HarnessTest.Randomized$\|^t/run-db_test-.(?:FileCreationRandomFailure\|EncodeDecompressedBlockSizeTest)$\|^.RecoverFromCorruptedWALWithoutFlush$),100 $1,' \| sort -k1,1gr \| sed 's/^[.0-9] //' \ \| grep -E '.' \ \| grep -E -v '"^$"' \ \| build_tools/gnu_parallel -j100% --plain --joblog=LOG --eta --gnu \ --tmpdir=/dev/shm/rocksdb.6lop '{} >& t/log-{/} \|\| bash -c "cat t/log-{/}; exit $?"' ; \ parallel_retcode=$? ; \ awk '{ if ($7 != 0 \|\| $8 != 0) { if ($7 == "Exitval") { h = $0; } else { if (!f) print h; print; f = 1 } } } END { if(f) exit 1; }' < LOG ; \ awk_retcode=$?; \ if [ $parallel_retcode -ne 0 ] \|\| [ $awk_retcode -ne 0 ] ; then exit 1 ; fi To monitor subtest <duration,pass/fail,name>, run "make watch-log" in a separate window Computers / CPU cores / Max jobs to run 1:local / 16 / 16 ``` The `printf` command will make the output confusing. It would be better not to print it. 
Before Change ![image](https://github.com/facebook/rocksdb/assets/30565051/92cf681a-40b7-462e-ae5b-23eeacbb8f82) After Change ![image](https://github.com/facebook/rocksdb/assets/30565051/4a70b04b-e4ef-4bed-9ce0-d942ed9d132e) Test Plan Not applicable. This is a trivial change, only to add a `@` before a Makefile command, and it will not impact any workflows. Pull Request resolved: https://github.com/facebook/rocksdb/pull/11978 Reviewed By: jaykorean Differential Revision: D51076606 Pulled By: cbi42 fbshipit-source-id: dc079ab8f60a5a5b9d04a83888884657b2e442ff	2023-11-07 11:44:20 -08:00
Peter Dillinger	16ae3548a2	AutoHCC: Improve/fix allocation/detection of grow homes (#12047 ) Summary: This change simplifies some code and logic by introducing a new atomic field that tracks the next slot to grow into. It should offer slightly better performance during the growth phase (not measurable; see Test Plan below) and fix a suspected (but unconfirmed) bug like this: * Thread 1 is in non-trivial SplitForGrow() with grow_home=n. * Thread 2 reaches Grow() with grow_home=2n, and waits at the start of SplitForGrow() for the rewrite lock on n. By this point, the head at 2n is marked with the new shift amount but no chain is locked. * Thread 3 reaches Grow() with grow_home=4n, and waits before SplitForGrow() for the rewrite lock on n. By this point, the head at 4n is marked with the new shift amount but no chain is locked. * Thread 4 reaches Grow() with grow_home=8n and meets no resistance to proceeding through a SplitForGrow() on an empty chain, permanently missing out on any entries from chain n that should have ended up here. This is fixed by not updating the shift amount at the grow_home head until we have checked the preconditions that Grow()s feeding into this one have completed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12047 Test Plan: Some manual cache_bench stress runs, and about 20 triggered runs of fbcode_blackbox_crash_test No discernible performance difference on this benchmark, running before & after in parallel for a few minutes: ``` (while ./cache_bench -cache_type=auto_hyper_clock_cache -populate_cache=0 -cache_size=3000000000 -ops_per_thread=50000 -threads=12 -histograms=0 2>&1 | grep parallel; do :; done) | awk '{ s += $3; c++; print "Avg time: " (s/c);}' ``` Reviewed By: jowlyzhang Differential Revision: D51017007 Pulled By: pdillinger fbshipit-source-id: 5f6d6a6194fc966f94693f3205ed75c87cdad269	2023-11-07 10:40:39 -08:00
Jay Huh	2adef5367a	AttributeGroups - PutEntity Implementation (#11977 ) Summary: Write Path for AttributeGroup Support. The new `PutEntity()` API uses `WriteBatch` and atomically writes WideColumns entities in multiple Column Families. Combined the release note from PR https://github.com/facebook/rocksdb/issues/11925 Pull Request resolved: https://github.com/facebook/rocksdb/pull/11977 Test Plan: - `DBWideBasicTest::MultiCFMultiGetEntityAsPinnableAttributeGroups` updated - `WriteBatchTest::AttributeGroupTest` added - `WriteBatchTest::AttributeGroupSavePointTest` added Reviewed By: ltamasi Differential Revision: D50457122 Pulled By: jaykorean fbshipit-source-id: 4997b265e415588ce077933082dcd1ac3eeae2cd	2023-11-06 16:52:51 -08:00
Peter Dillinger	92dc5f3e67	AutoHCC: fix a bug with "blind" Insert (#12046 ) Summary: I have finally tracked down and fixed a bug affecting AutoHCC that was causing CI crash test assertion failures in AutoHCC when using secondary cache, but I was only able to reproduce locally a couple of times, after very long runs/repetitions. It turns out that the essential feature used by secondary cache to trigger the bug is Insert without keeping a handle, which is otherwise rarely used in RocksDB and not incorporated into cache_bench (also used for targeted correctness stress testing) until this change (new option `-blind_insert_percent`). The problem was in copying some logic from FixedHCC that makes the entry "sharable" but unreferenced once populated, if no reference is to be saved. The problem in AutoHCC is that we can only add the entry to a chain after it is in the sharable state, and must be removed from the chain while in the "under (de)construction" state and before it is back in the "empty" state. Also, it is possible for Lookup to find entries that are not connected to any chain, by design for efficiency, and for Release to erase_if_last_ref. Therefore, we could have * Thread 1 starts to Insert a cache entry without keeping ref, and pauses before adding to the chain. * Thread 2 finds it with Lookup optimizations, and then does Release with `erase_if_last_ref=true` causing it to trigger erasure on the entry. It successfully locks the home chain for the entry and purges any entries pending erasure. It is OK that this entry is not found on the chain, as another thread is allowed to remove it from the chain before we are able to (but after it is marked for (de)construction). And after the purge of the chain, the entry is marked empty. 
* Thread 1 resumes in adding the slot (presumed entry) to the home chain for what was being inserted, but that now violates invariants and sets up a race or double-chain-reference as another thread could insert a new entry in the slot and try to insert into a different chain. This is easily fixed by holding on to a reference until inserted onto the chain. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12046 Test Plan: As I don't have a reliable local reproducer, I triggered 20 runs of internal CI on fbcode_blackbox_crash_test that were previously failing in AutoHCC with about 1/3 probability, and they all passed. Also re-enabling AutoHCC in the crash test with this change. (Revert https://github.com/facebook/rocksdb/issues/12000) Reviewed By: jowlyzhang Differential Revision: D51016979 Pulled By: pdillinger fbshipit-source-id: 3840fb829d65b97c779d8aed62a4a4a433aeff2b	2023-11-06 16:06:01 -08:00
Jay Huh	0ecfc4fbb4	AttributeGroups - GetEntity Implementation (#11943 ) Summary: Implementation of `GetEntity()` API that returns wide-column entities as AttributeGroups from multiple column families for a single key. Regarding the definition of Attribute groups, please see the detailed example description in PR https://github.com/facebook/rocksdb/issues/11925 Pull Request resolved: https://github.com/facebook/rocksdb/pull/11943 Test Plan: - `DBWideBasicTest::GetEntityAsPinnableAttributeGroups` added. Will enable the new API in `db_stress` after merging. Reviewed By: ltamasi Differential Revision: D50195794 Pulled By: jaykorean fbshipit-source-id: 218d54841ac7e337de62e13b1233b0a99bd91af3	2023-11-06 15:04:41 -08:00
Jay Huh	2dab137182	Mark more files for periodic compaction during offpeak (#12031 ) Summary: - The struct previously named `OffpeakTimeInfo` has been renamed to `OffpeakTimeOption` to indicate that it's a user-configurable option. Additionally, a new struct, `OffpeakTimeInfo`, has been introduced, which includes two fields: `is_now_offpeak` and `seconds_till_next_offpeak_start`. This change prevents the need to parse the `daily_offpeak_time_utc` string twice. - It's worth noting that we may consider adding more fields to the `OffpeakTimeInfo` struct, such as `elapsed_seconds` and `total_seconds`, as needed for further optimization. - Within `VersionStorageInfo::ComputeFilesMarkedForPeriodicCompaction()`, we've adjusted the `allowed_time_limit` to include files that are expected to expire by the next offpeak start. - We might explore further optimizations, such as evenly distributing files to mark during offpeak hours, if the initial approach results in marking too many files simultaneously during the first scoring in offpeak hours. The primary objective of this PR is to prevent periodic compactions during non-offpeak hours when offpeak hours are configured. We'll start with this straightforward solution and assess whether it suffices for now. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12031 Test Plan: Unit Tests added - `DBCompactionTest::LevelPeriodicCompactionOffpeak` for Leveled - `DBTestUniversalCompaction2::PeriodicCompaction` for Universal Reviewed By: cbi42 Differential Revision: D50900292 Pulled By: jaykorean fbshipit-source-id: 267e7d3332d45a5d9881796786c8650fa0a3b43d	2023-11-06 11:43:59 -08:00
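The offpeak logic described in the commit above can be sketched as follows. Only the struct name `OffpeakTimeInfo`, its two fields, and the name `allowed_time_limit` come from the commit message. Everything else is an assumption made for illustration: the `"HH:MM-HH:MM"` string format, the half-open window, the helper names `ParseOffpeakWindow`, `GetOffpeakTimeInfo` and `AllowedTimeLimit`, and the choice to widen the limit only while the current time is inside the window.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Field names follow the commit message; the rest of this sketch is assumed.
struct OffpeakTimeInfo {
  bool is_now_offpeak = false;
  int64_t seconds_till_next_offpeak_start = 0;
};

constexpr int64_t kSecondsPerDay = 24 * 60 * 60;

// Parses an assumed "HH:MM-HH:MM" UTC window into seconds since midnight.
// Returns false on malformed input or out-of-range fields.
inline bool ParseOffpeakWindow(const std::string& s, int64_t* start_sec,
                               int64_t* end_sec) {
  int sh = 0, sm = 0, eh = 0, em = 0;
  char trailing = 0;
  // Exactly four fields must match; a fifth match means trailing garbage.
  if (std::sscanf(s.c_str(), "%2d:%2d-%2d:%2d%c", &sh, &sm, &eh, &em,
                  &trailing) != 4) {
    return false;
  }
  if (sh < 0 || sh > 23 || eh < 0 || eh > 23 || sm < 0 || sm > 59 ||
      em < 0 || em > 59) {
    return false;
  }
  *start_sec = sh * 3600 + sm * 60;
  *end_sec = eh * 3600 + em * 60;
  return true;
}

// Computes both fields once from an already-parsed window, so callers do
// not need to parse the option string again. The window is treated as
// half-open [start, end) and may wrap past midnight.
inline OffpeakTimeInfo GetOffpeakTimeInfo(int64_t start_sec, int64_t end_sec,
                                          int64_t now_unix_seconds) {
  assert(now_unix_seconds >= 0);
  OffpeakTimeInfo info;
  if (start_sec == end_sec) {
    return info;  // Empty window: treated here as "offpeak not configured".
  }
  const int64_t sod = now_unix_seconds % kSecondsPerDay;  // second of day
  if (start_sec < end_sec) {
    info.is_now_offpeak = start_sec <= sod && sod < end_sec;
  } else {
    info.is_now_offpeak = sod >= start_sec || sod < end_sec;
  }
  info.seconds_till_next_offpeak_start =
      (start_sec - sod + kSecondsPerDay) % kSecondsPerDay;
  return info;
}

// Files modified before the returned timestamp would be marked. During
// offpeak the limit is moved forward so that files expected to expire
// before the next offpeak start are picked up now (assumed policy).
inline int64_t AllowedTimeLimit(int64_t now_unix_seconds,
                                int64_t periodic_compaction_seconds,
                                const OffpeakTimeInfo& info) {
  int64_t limit = now_unix_seconds - periodic_compaction_seconds;
  if (info.is_now_offpeak) {
    limit += info.seconds_till_next_offpeak_start;
  }
  return limit;
}
```

With a window of 23:30 to 03:15 and a current time of 01:00 UTC, this sketch reports offpeak and 81000 seconds until the next window starts, so a file that would only become due later that day is already eligible during the current window.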
Peter Dillinger	a399bbc037	More fixes and enhancements for cache_bench (#12041 ) Summary: Mostly things for using cache_bench for stress/correctness testing. * Make secondary_cache_uri option work with HCC (forgot to update when secondary support was added for HCC) * Add -pinned_ratio option to keep more than just one entry per thread pinned. This can be important for testing eviction stress. * Add -vary_capacity_ratio for testing dynamically changing capacity. Also added some overrides to CacheWrapper to help with diagnostic output. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12041 Test Plan: manual, make check Reviewed By: jowlyzhang Differential Revision: D51013430 Pulled By: pdillinger fbshipit-source-id: 7914adc1218f0afacace05ccd77d3bfb91a878d0	2023-11-06 09:59:09 -08:00
Alan Paxton	6979e9dc6a	Create blog post from report on JNI performance work (#11818 ) Summary: We did some investigation into the performance of JNI for workloads emulating how data is carried between Java and C++ for RocksDB. The repo for our performance work lives at https://github.com/evolvedbinary/jni-benchmarks This is the report text from that work, extracted as a blog post, along with some supporting files (png, pdf of graphs). Pull Request resolved: https://github.com/facebook/rocksdb/pull/11818 Reviewed By: jaykorean Differential Revision: D50907467 Pulled By: pdillinger fbshipit-source-id: ec6a43c83bd9ad94a3d11cfd87031e613acf7659	2023-11-06 09:15:00 -08:00


12697 Commits All Branches Search
