rocksdb

mirror of https://github.com/facebook/rocksdb.git synced 2024-11-28 15:33:54 +00:00

Author	SHA1	Message	Date
Fan Zhang(DevX)	0e5ed2e0c8	add export_file to rockdb TARGETS generator and re-gen Summary: we are converting the implicit loads to explicit loads, then remove the hidden loads in fbcode macroes. details see https://fb.workplace.com/groups/devx.build.bffs/permalink/7481848805183560/ Reviewed By: JakobDegen Differential Revision: D57800976 fbshipit-source-id: a893aa2aa9237704ba9eb998cba210222c95dd2f	2024-05-25 17:10:12 -07:00
Levi Tamasi	bd801bd98c	Factor out the RYW transaction building logic into a helper (#12697 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12697 As groundwork for stress testing `Transaction::MultiGetEntity`, the patch factors out the logic for adding transactional writes for some of the keys in a `MultiGet` batch into a separate helper method called `MaybeAddKeyToTxnForRYW`. Reviewed By: jowlyzhang Differential Revision: D57791830 fbshipit-source-id: ef347ba6e6e82dfe5cedb4cf67dd6d1503901d89	2024-05-24 14:24:17 -07:00
Changyu Bi	fecb10c2fa	Improve universal compaction sorted-run trigger (#12477 ) Summary: Universal compaction currently uses `level0_file_num_compaction_trigger` for two purposes: 1. the trigger for checking if there is any compaction to do, and 2. the limit on the number of sorted runs. RocksDB will do compaction to keep the number of sorted runs no more than the value of this option. This can make the option inflexible. A value that is too small causes higher write amp: more compactions to reduce the number of sorted runs. A value that is too big delays potential compaction work and causes worse read performance. This PR introduce an option `CompactionOptionsUniversal::max_read_amp` for only the second purpose: to specify the hard limit on the number of sorted runs. For backward compatibility, `max_read_amp = -1` by default, which means to fallback to the current behavior. When `max_read_amp > 0`,`level0_file_num_compaction_trigger` will only serve as a trigger to find potential compaction. When `max_read_amp = 0`, RocksDB will auto-tune the limit on the number of sorted runs. The estimation is based on DB size, write_buffer_size and size_ratio, so it is adaptive to the size change of the DB. See more in `UniversalCompactionBuilder::PickCompaction()`. Alternatively, users now can configure `max_read_amp` to a very big value and keep `level0_file_num_compaction_trigger` small. This will allow `size_ratio` and `max_size_amplification_percent` to control the number of sorted runs. This essentially disables compactions with reason kUniversalSortedRunNum. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12477 Test Plan: * new unit test * existing unit test for default behavior * updated crash test with the new option * benchmark: * Create a DB that is roughly 24GB in the last level. When `max_read_amp = 0`, we estimate that the DB needs 9 levels to avoid excessive compactions to reduce the number of sorted runs. * We then run fillrandom to ingest another 24GB data to compare write amp. * case 1: small level0 trigger: `level0_file_num_compaction_trigger=5, max_read_amp=-1` * write-amp: 4.8 * case 2: auto-tune: `level0_file_num_compaction_trigger=5, max_read_amp=0` * write-amp: 3.6 * case 3: auto-tune with minimal trigger: `level0_file_num_compaction_trigger=1, max_read_amp=0` * write-amp: 3.8 * case 4: hard-code a good value for trigger: `level0_file_num_compaction_trigger=9` * write-amp: 2.8 ``` Case 1: Compaction Stats [default] Level Files Size Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop Rblob(GB) Wblob(GB) ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ L0 0/0 0.00 KB 1.0 0.0 0.0 0.0 22.6 22.6 0.0 1.0 0.0 163.2 141.94 111.10 108 1.314 0 0 0.0 0.0 L45 8/0 1.81 GB 0.0 39.6 11.1 28.5 39.3 10.8 0.0 3.5 209.0 207.3 194.25 191.29 43 4.517 348M 2498K 0.0 0.0 L46 13/0 3.12 GB 0.0 15.3 9.5 5.8 15.0 9.3 0.0 1.6 203.1 199.3 77.13 75.88 16 4.821 134M 2362K 0.0 0.0 L47 19/0 4.68 GB 0.0 15.4 10.5 4.9 14.7 9.8 0.0 1.4 204.0 194.9 77.38 76.15 8 9.673 135M 5920K 0.0 0.0 L48 38/0 9.42 GB 0.0 19.6 11.7 7.9 17.3 9.4 0.0 1.5 206.5 182.3 97.15 95.02 4 24.287 172M 20M 0.0 0.0 L49 91/0 22.70 GB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.00 0 0.000 0 0 0.0 0.0 Sum 169/0 41.74 GB 0.0 89.9 42.9 47.0 109.0 61.9 0.0 4.8 156.7 189.8 587.85 549.45 179 3.284 791M 31M 0.0 0.0 Case 2: Compaction Stats [default] Level Files Size Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop Rblob(GB) Wblob(GB) ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ L0 1/0 214.47 MB 1.2 0.0 0.0 0.0 22.6 22.6 0.0 1.0 0.0 164.5 140.81 109.98 108 1.304 0 0 0.0 0.0 L44 0/0 0.00 KB 0.0 1.3 1.3 0.0 1.2 1.2 0.0 1.0 206.1 204.9 6.24 5.98 3 2.081 11M 51K 0.0 0.0 L45 4/0 844.36 MB 0.0 7.1 5.4 1.7 7.0 5.4 0.0 1.3 194.6 192.9 37.41 36.00 13 2.878 62M 489K 0.0 0.0 L46 11/0 2.57 GB 0.0 14.6 9.8 4.8 14.3 9.5 0.0 1.5 193.7 189.8 77.09 73.54 17 4.535 128M 2411K 0.0 0.0 L47 24/0 5.81 GB 0.0 19.8 12.0 7.8 18.8 11.0 0.0 1.6 191.4 181.1 106.19 101.21 9 11.799 174M 9166K 0.0 0.0 L48 38/0 9.42 GB 0.0 19.6 11.8 7.9 17.3 9.4 0.0 1.5 197.3 173.6 101.97 97.23 4 25.491 172M 20M 0.0 0.0 L49 91/0 22.70 GB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.00 0 0.000 0 0 0.0 0.0 Sum 169/0 41.54 GB 0.0 62.4 40.3 22.1 81.3 59.2 0.0 3.6 136.1 177.2 469.71 423.94 154 3.050 549M 32M 0.0 0.0 Case 3: Compaction Stats [default] Level Files Size Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop Rblob(GB) Wblob(GB) ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ L0 0/0 0.00 KB 5.0 0.0 0.0 0.0 22.6 22.6 0.0 1.0 0.0 163.8 141.43 111.13 108 1.310 0 0 0.0 0.0 L44 0/0 0.00 KB 0.0 0.8 0.8 0.0 0.8 0.8 0.0 1.0 201.4 200.2 4.26 4.19 2 2.130 7360K 33K 0.0 0.0 L45 4/0 844.38 MB 0.0 6.3 5.0 1.2 6.2 5.0 0.0 1.2 202.0 200.3 31.81 31.50 12 2.651 55M 403K 0.0 0.0 L46 7/0 1.62 GB 0.0 13.3 8.8 4.6 13.1 8.6 0.0 1.5 198.9 195.7 68.72 67.89 17 4.042 117M 1696K 0.0 0.0 L47 24/0 5.81 GB 0.0 21.7 12.9 8.8 20.6 11.8 0.0 1.6 198.5 188.6 112.04 109.97 12 9.336 191M 9352K 0.0 0.0 L48 41/0 10.14 GB 0.0 24.8 13.0 11.8 21.9 10.1 0.0 1.7 198.6 175.6 127.88 125.36 6 21.313 218M 25M 0.0 0.0 L49 91/0 22.70 GB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.00 0 0.000 0 0 0.0 0.0 Sum 167/0 41.10 GB 0.0 67.0 40.5 26.4 85.4 58.9 0.0 3.8 141.1 179.8 486.13 450.04 157 3.096 589M 36M 0.0 0.0 Case 4: Compaction Stats [default] Level Files Size Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop Rblob(GB) Wblob(GB) ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ L0 0/0 0.00 KB 0.7 0.0 0.0 0.0 22.6 22.6 0.0 1.0 0.0 158.6 146.02 114.68 108 1.352 0 0 0.0 0.0 L42 0/0 0.00 KB 0.0 1.7 1.7 0.0 1.7 1.7 0.0 1.0 185.4 184.3 9.25 8.96 4 2.314 14M 67K 0.0 0.0 L43 0/0 0.00 KB 0.0 2.5 2.5 0.0 2.5 2.5 0.0 1.0 197.8 195.6 13.01 12.65 4 3.253 22M 202K 0.0 0.0 L44 4/0 844.40 MB 0.0 4.2 4.2 0.0 4.1 4.1 0.0 1.0 188.1 185.1 22.81 21.89 5 4.562 36M 503K 0.0 0.0 L45 13/0 3.12 GB 0.0 7.5 6.5 1.0 7.2 6.2 0.0 1.1 188.7 181.8 40.69 39.32 5 8.138 65M 2282K 0.0 0.0 L46 17/0 4.18 GB 0.0 8.3 7.1 1.2 7.9 6.6 0.0 1.1 192.2 181.8 44.23 43.06 4 11.058 73M 3846K 0.0 0.0 L47 22/0 5.34 GB 0.0 8.9 7.5 1.4 8.2 6.8 0.0 1.1 189.1 174.1 48.12 45.37 3 16.041 78M 6098K 0.0 0.0 L48 27/0 6.58 GB 0.0 9.2 7.6 1.6 8.2 6.6 0.0 1.1 195.2 172.9 48.52 47.11 2 24.262 81M 9217K 0.0 0.0 L49 91/0 22.70 GB 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.00 0.00 0 0.000 0 0 0.0 0.0 Sum 174/0 42.74 GB 0.0 42.3 37.0 5.3 62.4 57.1 0.0 2.8 116.3 171.3 372.66 333.04 135 2.760 372M 22M 0.0 0.0 setup: ./db_bench --benchmarks=fillseq,compactall,waitforcompaction --num=200000000 --compression_type=none --disable_wal=1 --compaction_style=1 --num_levels=50 --target_file_size_base=268435456 --max_compaction_bytes=6710886400 --level0_file_num_compaction_trigger=10 --write_buffer_size=268435456 --seed 1708494134896523 benchmark: ./db_bench --benchmarks=overwrite,waitforcompaction,stats --num=200000000 --compression_type=none --disable_wal=1 --compaction_style=1 --write_buffer_size=268435456 --level0_file_num_compaction_trigger=5 --target_file_size_base=268435456 --use_existing_db=1 --num_levels=50 --writes=200000000 --universal_max_read_amp=-1 --seed=1716488324800233 ``` Reviewed By: ajkr Differential Revision: D55370922 Pulled By: cbi42 fbshipit-source-id: 9be69979126b840d08e93e7059260e76a878bb2a	2024-05-24 10:10:31 -07:00
Yu Zhang	9a72cf1a61	Add timestamp support in dump_wal/dump/idump (#12690 ) Summary: As titled. For dumping wal files, since a mapping from column family id to the user comparator object is needed to print the timestamp in human readable format, option `[--db=<db_path>]` is added to `dump_wal` command to allow the user to choose to optionally open the DB as read only instance and dump the wal file with better timestamp formatting. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12690 Test Plan: Manually tested dump_wal: [dump a wal file specified with --walfile] ``` >> ./ldb --walfile=$TEST_DB/000004.log dump_wal --print_value >>1,1,28,13,PUT(0) : 0x666F6F0100000000000000 : 0x7631 (Column family id: [0] contained in WAL are not opened in DB. Applied default hex formatting for user key. Specify --db=<db_path> to open DB for better user key formatting if it contains timestamp.) ``` [dump with --db specified for better timestamp formatting] ``` >> ./ldb --walfile=$TEST_DB/000004.log dump_wal --db=$TEST_DB --print_value >> 1,1,28,13,PUT(0) : 0x666F6F\|timestamp:1 : 0x7631 ``` dump: [dump a file specified with --path] ``` >>./ldb --path=/tmp/rocksdbtest-501/column_family_test_75359_17910784957761284041/000004.log dump Sequence,Count,ByteSize,Physical Offset,Key(s) : value 1,1,28,13,PUT(0) : 0x666F6F0100000000000000 : 0x7631 (Column family id: [0] contained in WAL are not opened in DB. Applied default hex formatting for user key. Specify --db=<db_path> to open DB for better user key formatting if it contains timestamp.) ``` [dump db specified with --db] ``` >> ./ldb --db=/tmp/rocksdbtest-501/column_family_test_75359_17910784957761284041 dump >> foo\|timestamp:1 ==> v1 Keys in range: 1 ``` idump ``` ./ldb --db=$TEST_DB idump 'foo\|timestamp:1' seq:1, type:1 => v1 Internal keys in range: 1 ``` Reviewed By: ltamasi Differential Revision: D57755382 Pulled By: jowlyzhang fbshipit-source-id: a0a2ef80c92801cbf7bfccc64769c1191824362e	2024-05-23 20:26:57 -07:00
Levi Tamasi	f044b6a6ad	Fix a couple of issues in the stress test for Transaction::MultiGet (#12696 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12696 Two fixes: 1) `Random::Uniform(n)` returns an integer from the interval [0, n - 1], so `Uniform(2)` returns 0 or 1, which means is that we have apparently never covered transactions with deletions in the test. (To prevent similar issues, the patch cleans this write logic up a bit using an `enum class` for the type of write.) 2) The keys passed in to `TestMultiGet` can have duplicates. What this boils down to is that we have to keep track of the latest expected values for read-your-own-writes on a per-key basis. Reviewed By: jowlyzhang Differential Revision: D57750212 fbshipit-source-id: e8ab603252c32331f8db0dfb2affcca1e188c790	2024-05-23 16:47:39 -07:00
Andrew Kryczka	c72ee4531b	Fix recycled WAL detection when wal_compression is enabled (#12643 ) Summary: I think the point of the `if (end_of_buffer_offset_ - buffer_.size() == 0)` was to only set `recycled_` when the first record was read. However, the condition was false when reading the first record when the WAL began with a `kSetCompressionType` record because we had already dropped the `kSetCompressionType` record from `buffer_`. To fix this, I used `first_record_read_` instead. Also, it was pretty confusing to treat the WAL as non-recycled when a recyclable record first appeared in a non-first record. I changed it to return an error if that happens. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12643 Reviewed By: hx235 Differential Revision: D57238099 Pulled By: ajkr fbshipit-source-id: e20a2a0c9cf0c9510a7b6af463650a05d559239e	2024-05-22 15:34:37 -07:00
Levi Tamasi	db0960800a	Add Transaction::PutEntity to the stress tests (#12688 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12688 As a first step of covering the wide-column transaction APIs, the patch adds `PutEntity` to the optimistic and pessimistic transaction stress tests (for the latter, only when the WriteCommitted policy is utilized). Other APIs and the multi-operation transaction test will be covered by subsequent PRs. Reviewed By: jaykorean Differential Revision: D57675781 fbshipit-source-id: bfe062ec5f6ab48641cd99a70f239ce4aa39299c	2024-05-22 11:30:33 -07:00
Hui Xiao	733150f6aa	Flush WAL upon DB close (#12684 ) Summary: Context/Summary: https://github.com/facebook/rocksdb/pull/12556 `avoid_sync_during_shutdown=false` missed an edge case where `manual_wal_flush == true` so WAL sync will still miss unflushed WAL. This PR fixes it. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12684 Test Plan: modified UT to include this case `manual_wal_flush==true` Reviewed By: cbi42 Differential Revision: D57655861 Pulled By: hx235 fbshipit-source-id: c9f49fe260e8b38b3ea387558432dcd9a3dbec19	2024-05-22 11:08:16 -07:00
Levi Tamasi	014368f62c	Fix the names of function objects added in PR 12681 (#12689 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12689 These should be in `snake_case` (not `camelCase`) per our style guide. Reviewed By: jowlyzhang Differential Revision: D57676418 fbshipit-source-id: 82ad6a87d1540f0b29c2f864ca0128287fe95a9e	2024-05-22 11:06:52 -07:00
Richard Barnes	1827f3f983	Remove extra semi colon from internal_repo_rocksdb/repo/table/sst_file_reader.cc Summary: `-Wextra-semi` or `-Wextra-semi-stmt` If the code compiles, this is safe to land. Reviewed By: palmje Differential Revision: D57632757 fbshipit-source-id: 1dbad2a2e185381e225df8b9027033e06aeaf01b	2024-05-22 07:14:52 -07:00
Levi Tamasi	ad6f6e24c8	Fix txn_write_policy check in crash test script (#12683 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12683 With optimistic transactions, the stress test parameter `txn_write_policy` is not applicable and is thus not set. When the parameter is subsequently checked, Python's dictionary `get` method returns `None`, which is not equal to zero. The net result of this is that currently, `sync_fault_injection` and `manual_wal_flush_one_in` are always disabled in optimistic transaction mode (most likely unintentionally). Reviewed By: cbi42 Differential Revision: D57655339 fbshipit-source-id: 8b93a788f9b02307b6ea7b2129dc012271130334	2024-05-22 00:49:18 -07:00
Levi Tamasi	62600cb2d4	Fix rebuilding transactions containing PutEntity (#12681 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12681 When rebuilding transactions during recovery, `MemtableInserter::PutCFImpl` currently calls `WriteBatchInternal::Put` regardless of value type, which is incorrect for `PutEntity` entries, as well as `TimedPut`s and the blob indexes used by the old BlobDB implementation. The patch fixes the handling of `PutEntity` and returns `NotSupported` for `TimedPut`s and blob indices. Reviewed By: jaykorean, jowlyzhang Differential Revision: D57636355 fbshipit-source-id: 833de4e4aa0b42ff6638b72c4181f981d12d0f15	2024-05-21 17:22:20 -07:00
Davide Angelocola	cee32c5cce	use nullptr instead of NULL / 0 in rocksdbjni (#12575 ) Summary: While I was trying to understand issue https://github.com/facebook/rocksdb/issues/12503, I found this minor problem. Please have a look adamretter rhubner Pull Request resolved: https://github.com/facebook/rocksdb/pull/12575 Reviewed By: ajkr Differential Revision: D57596055 Pulled By: cbi42 fbshipit-source-id: ee0860bdfbee9364cd30c23957b72a04da6acd45	2024-05-21 12:56:07 -07:00
Peter Dillinger	d89ab23bec	Disallow memtable flush and sst ingest while WAL is locked (#12652 ) Summary: We recently noticed that some memtable flushed and file ingestions could proceed during LockWAL, in violation of its stated contract. (Note: we aren't 100% sure its actually needed by MySQL, but we want it to be in a clean state nonetheless.) Despite earlier skepticism that this could be done safely (https://github.com/facebook/rocksdb/issues/12666), I found a place to wait to wait for LockWAL to be cleared before allowing these operations to proceed: WaitForPendingWrites() Pull Request resolved: https://github.com/facebook/rocksdb/pull/12652 Test Plan: Added to unit tests. Extended how db_stress validates LockWAL and re-enabled combination of ingestion and LockWAL in crash test, in follow-up to https://github.com/facebook/rocksdb/issues/12642 Ran blackbox_crash_test for a long while with relevant features amplified. Suggested follow-up: fix FaultInjectionTestFS to report file sizes consistent with what the user has requested to be flushed. Reviewed By: jowlyzhang Differential Revision: D57622142 Pulled By: pdillinger fbshipit-source-id: aef265fce69465618974b4ec47f4636257c676ce	2024-05-21 10:17:34 -07:00
Hui Xiao	d7b938882e	Sync WAL during db Close() (#12556 ) Summary: Context/Summary: Below crash test found out we don't sync WAL upon DB close, which can lead to unsynced data loss. This PR syncs it. ``` ./db_stress --threads=1 --disable_auto_compactions=1 --WAL_size_limit_MB=0 --WAL_ttl_seconds=0 --acquire_snapshot_one_in=0 --adaptive_readahead=0 --adm_policy=1 --advise_random_on_open=1 --allow_concurrent_memtable_write=1 --allow_data_in_errors=True --allow_fallocate=0 --async_io=0 --auto_readahead_size=0 --avoid_flush_during_recovery=1 --avoid_flush_during_shutdown=0 --avoid_unnecessary_blocking_io=1 --backup_max_size=104857600 --backup_one_in=0 --batch_protection_bytes_per_key=0 --bgerror_resume_retry_interval=1000000 --block_align=0 --block_protection_bytes_per_key=2 --block_size=16384 --bloom_before_level=1 --bloom_bits=29.895303579352174 --bottommost_compression_type=disable --bottommost_file_compaction_delay=0 --bytes_per_sync=0 --cache_index_and_filter_blocks=0 --cache_index_and_filter_blocks_with_high_priority=1 --cache_size=33554432 --cache_type=lru_cache --charge_compression_dictionary_building_buffer=1 --charge_file_metadata=0 --charge_filter_construction=1 --charge_table_reader=1 --checkpoint_one_in=0 --checksum_type=kxxHash64 --clear_column_family_one_in=0 --column_families=1 --compact_files_one_in=0 --compact_range_one_in=0 --compaction_pri=0 --compaction_readahead_size=0 --compaction_style=0 --compaction_ttl=0 --compress_format_version=2 --compressed_secondary_cache_ratio=0 --compressed_secondary_cache_size=0 --compression_checksum=1 --compression_max_dict_buffer_bytes=0 --compression_max_dict_bytes=0 --compression_parallel_threads=4 --compression_type=zstd --compression_use_zstd_dict_trainer=1 --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --data_block_index_type=0 --db=/dev/shm/rocksdb_test/rocksdb_crashtest_whitebox --db_write_buffer_size=0 --default_temperature=kUnknown --default_write_temperature=kUnknown --delete_obsolete_files_period_micros=0 --delpercent=0 --delrangepercent=0 --destroy_db_initially=1 --detect_filter_construct_corruption=1 --disable_wal=0 --dump_malloc_stats=0 --enable_checksum_handoff=0 --enable_compaction_filter=0 --enable_custom_split_merge=0 --enable_do_not_compress_roles=1 --enable_index_compression=1 --enable_memtable_insert_with_hint_prefix_extractor=0 --enable_pipelined_write=0 --enable_sst_partitioner_factory=0 --enable_thread_tracking=1 --enable_write_thread_adaptive_yield=0 --expected_values_dir=/dev/shm/rocksdb_test/rocksdb_crashtest_expected --fail_if_options_file_error=0 --fifo_allow_compaction=1 --file_checksum_impl=none --fill_cache=0 --flush_one_in=1000 --format_version=5 --get_current_wal_file_one_in=0 --get_live_files_one_in=0 --get_property_one_in=0 --get_sorted_wal_files_one_in=0 --hard_pending_compaction_bytes_limit=274877906944 --high_pri_pool_ratio=0 --index_block_restart_interval=6 --index_shortening=0 --index_type=0 --ingest_external_file_one_in=0 --initial_auto_readahead_size=16384 --iterpercent=0 --key_len_percent_dist=1,30,69 --last_level_temperature=kUnknown --level_compaction_dynamic_level_bytes=1 --lock_wal_one_in=0 --log2_keys_per_lock=10 --log_file_time_to_roll=0 --log_readahead_size=16777216 --long_running_snapshots=0 --low_pri_pool_ratio=0 --lowest_used_cache_tier=0 --manifest_preallocation_size=5120 --manual_wal_flush_one_in=0 --mark_for_compaction_one_file_in=0 --max_auto_readahead_size=0 --max_background_compactions=1 --max_bytes_for_level_base=67108864 --max_key=2500000 --max_key_len=3 --max_log_file_size=0 --max_manifest_file_size=1073741824 --max_sequential_skip_in_iterations=8 --max_total_wal_size=0 --max_write_batch_group_size_bytes=64 --max_write_buffer_number=10 --max_write_buffer_size_to_maintain=0 --memtable_insert_hint_per_batch=0 --memtable_max_range_deletions=0 --memtable_prefix_bloom_size_ratio=0.5 --memtable_protection_bytes_per_key=1 --memtable_whole_key_filtering=1 --memtablerep=skip_list --metadata_charge_policy=0 --min_write_buffer_number_to_merge=1 --mmap_read=0 --mock_direct_io=True --nooverwritepercent=1 --num_file_reads_for_auto_readahead=0 --num_levels=1 --open_files=-1 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=0 --open_write_fault_one_in=0 --ops_per_thread=3 --optimize_filters_for_hits=1 --optimize_filters_for_memory=1 --optimize_multiget_for_io=0 --paranoid_file_checks=0 --partition_filters=0 --partition_pinning=1 --pause_background_one_in=0 --periodic_compaction_seconds=0 --prefix_size=1 --prefixpercent=0 --prepopulate_block_cache=0 --preserve_internal_time_seconds=3600 --progress_reports=0 --read_amp_bytes_per_bit=0 --read_fault_one_in=0 --readahead_size=16384 --readpercent=0 --recycle_log_file_num=0 --reopen=2 --report_bg_io_stats=1 --sample_for_compression=5 --secondary_cache_fault_one_in=0 --secondary_cache_uri= --skip_stats_update_on_db_open=1 --snapshot_hold_ops=0 --soft_pending_compaction_bytes_limit=68719476736 --sst_file_manager_bytes_per_sec=0 --sst_file_manager_bytes_per_truncate=0 --stats_dump_period_sec=10 --stats_history_buffer_size=1048576 --strict_bytes_per_sync=0 --subcompactions=3 --sync=0 --sync_fault_injection=1 --table_cache_numshardbits=6 --target_file_size_base=16777216 --target_file_size_multiplier=1 --test_batches_snapshots=0 --top_level_index_pinning=0 --unpartitioned_pinning=3 --use_adaptive_mutex=1 --use_adaptive_mutex_lru=0 --use_delta_encoding=1 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_full_merge_v1=0 --use_get_entity=0 --use_merge=0 --use_multi_get_entity=0 --use_multiget=1 --use_put_entity_one_in=0 --use_write_buffer_manager=0 --user_timestamp_size=0 --value_size_mult=32 --verification_only=0 --verify_checksum=1 --verify_checksum_one_in=1000 --verify_compression=0 --verify_db_one_in=100000 --verify_file_checksums_one_in=0 --verify_iterator_with_expected_state_one_in=5 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=0 --wal_compression=zstd --write_buffer_size=33554432 --write_dbid_to_manifest=0 --write_fault_one_in=0 --writepercent=100 Verification failed for column family 0 key 000000000000B9D1000000000000012B000000000000017D (4756691): value_from_db: , value_from_expected: 010000000504070609080B0A0D0C0F0E111013121514171619181B1A1D1C1F1E212023222524272629282B2A2D2C2F2E313033323534373639383B3A3D3C3F3E, msg: Iterator verification: Value not found: NotFound: Verification failed :( ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12556 Test Plan: - New UT - Same stress test command failed before this fix but pass after - CI Reviewed By: ajkr Differential Revision: D56267964 Pulled By: hx235 fbshipit-source-id: af1b7e8769c129f64ba1c7f1ff17102f1239b929	2024-05-20 17:33:43 -07:00
Levi Tamasi	ef1d4955ba	Fix the output of `ldb dump_wal` for PutEntity records (#12677 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12677 The patch contains two fixes related to printing `PutEntity` records with `ldb dump_wal`: 1) It adds the key to the printout (it was missing earlier). 2) It restores the formatting flags of the output stream after dumping the wide-column structure so that any `hex` flag that might have been set does not affect subsequent printing of e.g. sequence numbers. Reviewed By: jaykorean, jowlyzhang Differential Revision: D57591295 fbshipit-source-id: af4e3e219f0082ad39bbdfd26f8c5a57ebb898be	2024-05-20 17:04:14 -07:00
Levi Tamasi	b7520f4815	Support building ldb with buck (#12676 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12676 The patch extends the RocksDB buckifier script so it also creates a `buck` target for the `ldb` tool and updates the `TARGETS` file with the results of the new version of the script. Reviewed By: cbi42 Differential Revision: D57588789 fbshipit-source-id: 2ed58b405b3f216e802cf6bcbdbf9809e7386c8b	2024-05-20 16:08:43 -07:00
Changyu Bi	35985a988c	Fix value of `inplace_update_support` across stress test runs (#12675 ) Summary: the value of `inplace_update_support` option need to be fixed across runs of db_stress on the same DB (https://github.com/facebook/rocksdb/issues/12577). My recent fix (https://github.com/facebook/rocksdb/issues/12673) regressed this behavior. Also fix some existing places where this does not hold. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12675 Test Plan: monitor crash tests related to `inplace_update_support`. Reviewed By: hx235 Differential Revision: D57576375 Pulled By: cbi42 fbshipit-source-id: 75b1bd233f03e5657984f5d5234dbbb1ffc35c27	2024-05-20 13:23:34 -07:00
Levi Tamasi	c87f5cf91c	Add GetEntityForUpdate to optimistic and WriteCommitted pessimistic transactions (#12668 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12668 The patch adds a new `GetEntityForUpdate` API to optimistic and WriteCommitted pessimistic transactions, which provides transactional wide-column point lookup functionality with concurrency control. For WriteCommitted transactions, user-defined timestamps are also supported similarly to the `GetForUpdate` API. Reviewed By: jaykorean Differential Revision: D57458304 fbshipit-source-id: 7eadbac531ca5446353e494abbd0635d63f62d24	2024-05-20 10:43:05 -07:00
Hui Xiao	f910a0c025	Fix unreleased bug fix .md name (#12672 ) Summary: Context/Summary: as above Pull Request resolved: https://github.com/facebook/rocksdb/pull/12672 Test Plan: no code change Reviewed By: ajkr Differential Revision: D57505136 Pulled By: hx235 fbshipit-source-id: 0e216dc5974e9be10027b444eb6b4034f679dfd8	2024-05-20 09:41:11 -07:00
raffertyyu	4dd084f66d	fix gcc warning about dangling-reference in backup_engine_test (#12637 ) Summary: gcc 14.1 reports some warnings about dangling-reference occured in backup_engine_test. ```c++ /data/rocksdb/utilities/backup/backup_engine_test.cc: In member function 'virtual void rocksdb::{anonymous}::BackupEngineTest_ExcludeFiles_Test::TestBody()': /data/rocksdb/utilities/backup/backup_engine_test.cc:4411:64: error: possibly dangling reference to a temporary [-Werror=dangling-reference] 4411 \| std::make_pair(alt_backup_engine, backup_engine_.get())}) { \| ^ /data/rocksdb/utilities/backup/backup_engine_test.cc:4410:23: note: the temporary was destroyed at the end of the full expression 'std::make_pair<rocksdb::BackupEngine, rocksdb::BackupEngine&>(((rocksdb::{anonymous}::BackupEngineTest_ExcludeFiles_Test)this)->rocksdb::{anonymous}::BackupEngineTest_ExcludeFiles_Test::rocksdb::{anonymous}::BackupEngineTest.rocksdb::{anonymous}::BackupEngineTest::backup_engine_.std::unique_ptr<rocksdb::BackupEngine>::get(), alt_backup_engine)' 4410 \| {std::make_pair(backup_engine_.get(), alt_backup_engine), \| ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ /data/rocksdb/utilities/backup/backup_engine_test.cc:4411:64: error: possibly dangling reference to a temporary [-Werror=dangling-reference] 4411 \| std::make_pair(alt_backup_engine, backup_engine_.get())}) { \| ^ /data/rocksdb/utilities/backup/backup_engine_test.cc:4411:23: note: the temporary was destroyed at the end of the full expression 'std::make_pair<rocksdb::BackupEngine&, rocksdb::BackupEngine>(alt_backup_engine, ((rocksdb::{anonymous}::BackupEngineTest_ExcludeFiles_Test)this)->rocksdb::{anonymous}::BackupEngineTest_ExcludeFiles_Test::rocksdb::{anonymous}::BackupEngineTest.rocksdb::{anonymous}::BackupEngineTest::backup_engine_.std::unique_ptr<rocksdb::BackupEngine>::get())' 4411 \| std::make_pair(alt_backup_engine, backup_engine_.get())}) { \| ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ``` It seems to be related to this update in gcc: https://gcc.gnu.org/gcc-14/changes.html#:~:text=%2DWdangling%2Dreference%20false%20positives%20have%20been%20reduced.%20The%20warning%20does%20not%20warn%20about%20std%3A%3Aspan%2Dlike%20classes%3B%20there%20is%20also%20a%20new%20attribute%20gnu%3A%3Ano_dangling%20to%20suppress%20the%20warning.%20See%20the%20manual%20for%20more%20info. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12637 Reviewed By: cbi42 Differential Revision: D57263996 Pulled By: ajkr fbshipit-source-id: 1e416c38240d3d1adda787fc484c0392e28bb7f1	2024-05-18 18:01:19 -07:00
Changyu Bi	c4782bde41	Disable `inplace_update_support` in crash test with unsynced data loss (#12673 ) Summary: With unsynced data loss, we replay traces to recover expected state to DB's latest sequence number. With `inplace_update_support`, the largest sequence number of memtable may not reflect the latest update. This is because inplace updates in memtable do not update sequence number. So we disable `inplace_update_support` where traces need to be replayed. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12673 Reviewed By: ltamasi Differential Revision: D57512548 Pulled By: cbi42 fbshipit-source-id: 69278fe2e935874faf744d0ac4fd85263773c3ec	2024-05-18 16:48:41 -07:00
anand76	0ed93552f4	Implement obsolete file deletion (GC) in follower (#12657 ) Summary: This PR implements deletion of obsolete files in a follower RocksDB instance. The follower tails the leader's MANIFEST and creates links to newly added SST files. These links need to be deleted once those files become obsolete in order to reclaim space. There are three cases to be considered - 1. New files added and links created, but the Version could not be installed due to some missing files. Those links need to be preserved so a subsequent catch up attempt can succeed. We insert the next file number in the `VersionSet` to `pending_outputs_` to prevent their deletion. 2. Files deleted from the previous successfully installed `Version`. These are deleted as usual in `PurgeObsoleteFiles`. 3. New files added by a `VersionEdit` and deleted by a subsequent `VersionEdit`, both processed in the same catchup attempt. Links will be created for the new files when verifying a candidate `Version`. Those need to be deleted explicitly as they're never added to `VersionStorageInfo`, and thus not deleted by `PurgeObsoleteFiles`. Test plan - New unit tests in `db_follower_test`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12657 Reviewed By: jowlyzhang Differential Revision: D57462697 Pulled By: anand1976 fbshipit-source-id: 898f15570638dd4930f839ffd31c560f9cb73916	2024-05-17 19:13:33 -07:00
Changyu Bi	ffd7930312	Add more debug print to `DBTestWithParam.ThreadStatusSingleCompaction` (#12661 ) Summary: This test is flaky and a recent failure prints the following: ``` [ RUN ] DBTestWithParam/DBTestWithParam.ThreadStatusSingleCompaction/0 thread id: 1842811, thread status: thread id: 1842803, thread status: db/db_test.cc:4697: Failure Expected equality of these values: op_count Which is: 0 expected_count Which is: 1 [ FAILED ] DBTestWithParam/DBTestWithParam.ThreadStatusSingleCompaction/0, where GetParam() = (1, false) (307 ms) ``` Empty thread status implies that operation_type of the threads are all OP_UNKNOWN. From `3ed46e0668/monitoring/thread_status_updater.cc (L197)`, this can be due to thread_data->operation_type being OP_UNKNOWN or that thread_data->cf_key it not in `cf_info_map_`, potentially due to how cf_key_ is accessed with relaxed memory order. This PR adds some debug print to print the cf_name to check this. This PR also prints num_running_compaction and lsm state to check if a compaction is indeed running, and removes some not needed options and ensures that exactly 4 L0 files are created. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12661 Test Plan: - Cannot repro the failure locally: `gtest-parallel --repeat=10000 --workers=200 ./db_test --gtest_filter="ThreadStatusSingleCompaction"` - New failure message will look like: ``` [ RUN ] DBTestWithParam/DBTestWithParam.ThreadStatusSingleCompaction/0 op_count: 1, expected_count 2 thread id: 6104100864, thread status: , cf_name thread id: 6103527424, thread status: Compaction, cf_name default running compaction: 1 lsm state: 4 db/db_test.cc:4885: Failure Value of: match Actual: false Expected: true [ FAILED ] DBTestWithParam/DBTestWithParam.ThreadStatusSingleCompaction/0, where GetParam() = (1, false) (115 ms) ``` Reviewed By: hx235 Differential Revision: D57422755 Pulled By: cbi42 fbshipit-source-id: 635663f26052b20e485dfa06a7c0f1f318ac1099	2024-05-16 17:23:56 -07:00
Peter Dillinger	131c8ccfcd	Add EntryType for TimedPut (#12669 ) Summary: Represent internal kTypeValuePreferredSeqno in the public API as kEntryTimedPut (because it is created by TimedPut, until the entry can be safely converted to a regular value entry in compaction) Pull Request resolved: https://github.com/facebook/rocksdb/pull/12669 Test Plan: for follow-up work actually using it. But putting this in place in the public API gives us more flexibility in rolling out that follow-up work (e.g. as a user extension or patch if needed). Reviewed By: jowlyzhang Differential Revision: D57459637 Pulled By: pdillinger fbshipit-source-id: 160ccf7c4e524ee479558846b2a207d51b8b3d9c	2024-05-16 15:18:12 -07:00
Changyu Bi	2eb404de13	Print non-ok status for multi_ops_txns_stress test (#12660 ) Summary: Currently `assert(s.ok())` does not offer much information to debug. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12660 Test Plan: revert https://github.com/facebook/rocksdb/issues/12639 and run `python3 ./tools/db_crashtest.py blackbox --write_policy write_prepared --recycle_log_file_num=1 --threads=1 --test_multiops_txn` before this PR: ``` Exit Before Killing stdout: stderr: db_stress: db_stress_tool/multi_ops_txns_stress.cc:1529: void rocksdb::MultiOpsTxnsStressTest::PreloadDb(rocksdb::SharedState, int, uint32_t, uint32_t, uint32_t, uint32_t): Assertion `s.ok()' failed. ``` after this PR: ``` Exit Before Killing stdout: stderr: Verification failed: PreloadDB failed: Invalid argument: WriteOptions::disableWAL option is not supported if DBOptions::recycle_log_file_num > 0 db_stress: db_stress_tool/db_stress_test_base.cc:517: void rocksdb::StressTest::ProcessStatus(rocksdb::SharedState, std::string, rocksdb::Status, bool) const: Assertion `false' failed. Received signal 6 (Aborted) ``` Reviewed By: ajkr Differential Revision: D57410819 Pulled By: cbi42 fbshipit-source-id: a03c2202c3fd666eb2f58bae24e0c9e3e6ed4265	2024-05-15 21:14:41 -07:00
Andrew Kryczka	4eaf628120	Add `Iterator` property "rocksdb.iterator.is-value-pinned" (#12659 ) Summary: `ReadOptions::pin_data` already has the effect of pinning the `Slice` returned by `Iterator::value()` when the value is stored inline (e.g., `kTypeValue`). This PR adds a bit of visibility into that via a new `Iterator` property, "rocksdb.iterator.is-value-pinned", as well as some documentation and tests. See also: https://github.com/facebook/rocksdb/issues/12658 Pull Request resolved: https://github.com/facebook/rocksdb/pull/12659 Reviewed By: cbi42 Differential Revision: D57391200 Pulled By: ajkr fbshipit-source-id: 0caa8db27ca1aba86ee2addc3dfd6f0e003d32e2	2024-05-15 19:11:52 -07:00
Peter Dillinger	3ed46e0668	Handle early exit in DBErrorHandlingFSTests (#12655 ) Summary: To avoid use-after-free on custom env on ASSERT_WHATEVER failure. This is motivated by a rare crash seen in DBErrorHandlingFSTest.WALWriteError (VersionSet::GetObsoleteFiles in a SstFileManagerImpl::ClearError thread) and wanting to rule out this being related to that. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12655 Test Plan: manually seeing ASSERT_WHATEVER failures, especially under ASAN Reviewed By: cbi42 Differential Revision: D57358202 Pulled By: pdillinger fbshipit-source-id: 4da2a0d73a54380b257e5cc1ab6c666e26b83973	2024-05-14 16:44:32 -07:00
Jay Huh	b4c6956a59	Add MultiGetEntity AttributeGroup API to stress test (#12640 ) Summary: Continuing from https://github.com/facebook/rocksdb/pull/12605, adding AttributeGroup `MultiGetEntity` API to stress tests. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12640 Test Plan: AttributeGroup Tests NonBatchOps ``` python3 tools/db_crashtest.py blackbox --simple --max_key=25000000 --write_buffer_size=4194304 --use_attribute_group=1 --use_put_entity_one_in=1 --use_multi_get=1 ``` BatchOps ``` python3 tools/db_crashtest.py blackbox --test_batches_snapshots=1 --max_key=25000000 --write_buffer_size=4194304 --use_attribute_group=1 --use_put_entity_one_in=1 --use_multi_get=1 ``` CfConsistency Test ``` python3 tools/db_crashtest.py blackbox --cf_consistency --max_key=25000000 --write_buffer_size=4194304 --use_attribute_group=1 --use_put_entity_one_in=1 --use_multi_get=1 ``` Non-AttributeGroup Tests NonBatchOps ``` python3 tools/db_crashtest.py blackbox --simple --max_key=25000000 --write_buffer_size=4194304 --use_attribute_group=0 --use_put_entity_one_in=1 --use_multi_get=1 ``` BatchOps ``` python3 tools/db_crashtest.py blackbox --test_batches_snapshots=1 --max_key=25000000 --write_buffer_size=4194304 --use_attribute_group=0 --use_put_entity_one_in=1 --use_multi_get=1 ``` CfConsistency Test ``` python3 tools/db_crashtest.py blackbox --cf_consistency --max_key=25000000 --write_buffer_size=4194304 --use_attribute_group=0 --use_put_entity_one_in=1 --use_multi_get=1 ``` Reviewed By: ltamasi Differential Revision: D57233931 Pulled By: jaykorean fbshipit-source-id: 8cea771ac2e5749050bf5319360c6c5aa476d7d5	2024-05-14 16:33:44 -07:00
Andrii Lysenko	b9fc13db69	Add padding before timestamp size record if it doesn't fit into a WAL block. (#12614 ) Summary: If timestamp size record doesn't fit into a block, without padding `Writer::EmitPhysicalRecord` fails on assert (either `assert(block_offset_ + kHeaderSize + n <= kBlockSize);` or `assert(block_offset_ + kRecyclableHeaderSize + n <= kBlockSize)`, depending on whether recycling log files is enabled) in debug build. In release, current block grows beyond 32K, `block_offset_` gets reset on next `AddRecord` and all the subsequent blocks are no longer aligned by block size. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12614 Reviewed By: ltamasi Differential Revision: D57302140 Pulled By: jowlyzhang fbshipit-source-id: cacb5cefb7586885e52a8137ae23a95e1aefca2d	2024-05-14 15:54:02 -07:00
Yu Zhang	1567d50a27	Temporarily disable file ingestion if LockWAL is tested (#12642 ) Summary: As titled. A proper fix should probably be failing file ingestion if the DB is in a lock wal state as it promises to "Freezes the logical state of the DB". Pull Request resolved: https://github.com/facebook/rocksdb/pull/12642 Reviewed By: pdillinger Differential Revision: D57235869 Pulled By: jowlyzhang fbshipit-source-id: c70031463842220f865621eb6f53424df27d29e9	2024-05-14 09:27:48 -07:00
Yu Zhang	c110091d36	Support read timestamp in ldb (#12641 ) Summary: As titled. Also updated sst_dump to print out user-defined timestamp separately and in human readable format. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12641 Test Plan: manually tested: Example success run: ./ldb --db=$TEST_DB multi_get_entity 0x00000000000000950000000000000120 --key_hex --column_family=7 --read_timestamp=1115613683797942 --value_hex 0x00000000000000950000000000000120 ==> :0x0E0000000A0B080906070405020300011E1F1C1D1A1B181916171415121310112E2F2C2D2A2B282926272425222320213E3F3C3D3A3B383936373435323330314E4F4C4D4A4B484946474445424340415E5F5C5D5A5B58595657545552535051 Example failed run: Failed: GetEntity failed: Invalid argument: column family enables user-defined timestamp while --read_timestamp is not a valid uint64 value. sst_dump print out: '000000000000015D000000000000012B000000000000013B\|timestamp:1113554493256671' seq:2330405, type:1 => 010000000504070609080B0A0D0C0F0E111013121514171619181B1A1D1C1F1E212023222524272629282B2A2D2C2F2E313033323534373639383B3A3D3C3F3E Reviewed By: ltamasi Differential Revision: D57297006 Pulled By: jowlyzhang fbshipit-source-id: 8486d91468e4f6c0d42dca3c9629f1c45a92bf5a	2024-05-13 15:43:12 -07:00
Hui Xiao	20213d01a3	Fix crash in CompactFiles() of conflict range under `preclude_last_level_data_seconds > 0` (#12628 ) Summary: Context/Summary: Previously `CompactFiles()` used `RangeOverlapWithCompaction()` to check for conflict when sanitizing input files while later used `FilesRangeOverlapWithCompaction()` to assert for no conflict. The latter function checks for more conflict scenarios than the former does, particularly the ones arising from `preclude_last_level_data_seconds > 0` (i.e, compaction can output to second-to-the-last level). So we ran into assertion violation in `CompactFiles()` like below ``` Assertion `output_level == 0 \|\| !FilesRangeOverlapWithCompaction( input_files, output_level, Compaction::EvaluatePenultimateLevel(vstorage, ioptions_, start_level, output_level))' failed. ``` This PR make `CompactFiles()` used `FilesRangeOverlapWithCompaction()` and return Aborted status upon range conflict instead of crashing (during debug build) or proceed incorrectly (during non-debug build). To do so cleanly, I included a refactoring to make `FilesRangeOverlapWithCompaction()` part of `SanitizeAndConvertCompactionInputFiles()`, replacing `RangeOverlapWithCompaction()`. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12628 Test Plan: New UT crashed before the fix and return correct status after the fix. Reviewed By: cbi42 Differential Revision: D57123536 Pulled By: hx235 fbshipit-source-id: f963a2c9e7ba1a9927a67fcc87f0dce126d3a430	2024-05-13 13:12:06 -07:00
Peter Dillinger	7747abdc15	Disable PromoteL0 in crash test (#12651 ) Summary: Seeing way too many errors likely related to PromoteL0 from https://github.com/facebook/rocksdb/issues/12617, containing ``` Cannot delete table file #N from level 0 since it is on level X ``` Pull Request resolved: https://github.com/facebook/rocksdb/pull/12651 Test Plan: watch crash test results Reviewed By: hx235 Differential Revision: D57286208 Pulled By: pdillinger fbshipit-source-id: f7f0560cc0804ca297373c8d20ebc34986cc19d0	2024-05-13 12:42:01 -07:00
Peter Dillinger	b75438f986	Allow disableWAL+recycle with WritePreparedTxnDB internals (#12639 ) Summary: Follow-up from https://github.com/facebook/rocksdb/issues/12403 The crash test was periodically failing with the "disableWAL option is not supported if recycle_log_file_num > 0" failure, despite not setting the disableWAL from the user side. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12639 Test Plan: db_stress reproducer now passes. Added WAL recycling to txn DB unit tests, which is generally more difficult for correctness. Many tests now cover this change and pass. Reviewed By: anand1976 Differential Revision: D57227617 Pulled By: pdillinger fbshipit-source-id: db9abefeb505bce624b45bc64009694d2a5baed9	2024-05-10 17:56:40 -07:00
Yu Zhang	7d9642d876	Add logging for read timestamp during VerifyDB (#12638 ) Summary: As titled. To help debug some verification failures. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12638 Test Plan: manually tested Reviewed By: ajkr Differential Revision: D57219549 Pulled By: jowlyzhang fbshipit-source-id: 59c05ac85fb1c24449e7394ea04172c855d86420	2024-05-10 12:34:53 -07:00
Levi Tamasi	b92d874c8b	Support MultiGetEntity in optimistic and WriteCommitted pessimistic transactions (#12634 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12634 The patch implements support for the `MultiGetEntity` API in optimistic transactions and pessimistic transactions with the WriteCommitted policy. Similarly to the other wide-column transaction APIs, the implementation leverages the `WriteBatchWithIndex` layer. Reviewed By: jaykorean Differential Revision: D57177638 fbshipit-source-id: 2d9f9f287fc97e7c126830b48d21457c7c35db3f	2024-05-09 16:49:38 -07:00
Jay Huh	1f2715d1d2	AttributeGroup APIs in stress test - PutEntity and GetEntity (#12605 ) Summary: Adding AttributeGroup APIs in stress test. This contains the following changes only. More PRs to follow. - Introduce `use_attribute_group` flag - AttributeGroup `PutEntity()` and `GetEntity()` are now used per `use_attribute_group` flag in BatchOps, NonBatchOps and CfConsistency tests In the next PRs I plan to add - AttributeGroup `MultiGetEntity()` in Stress Test - AttributeGroupIterator in Stress Test (along with CoalescingIterator) Pull Request resolved: https://github.com/facebook/rocksdb/pull/12605 Test Plan: NonBatchOps ``` python3 tools/db_crashtest.py blackbox --simple --max_key=25000000 --write_buffer_size=4194304 --use_attribute_group=1 --use_put_entity_one_in=1 ``` BatchOps ``` python3 tools/db_crashtest.py blackbox --test_batches_snapshots=1 --max_key=25000000 --write_buffer_size=4194304 --use_attribute_group=1 --use_put_entity_one_in=1 ``` CfConsistency Test ``` python3 tools/db_crashtest.py blackbox --cf_consistency --max_key=25000000 --write_buffer_size=4194304 --use_attribute_group=1 --use_put_entity_one_in=1 ``` Reviewed By: ltamasi Differential Revision: D56916768 Pulled By: jaykorean fbshipit-source-id: 8555d9e0d05927740a10e4e8301e44beec59a6f5	2024-05-09 16:40:22 -07:00
Hui Xiao	9bddac0dcf	Add more public APIs to crash test (#12617 ) Summary: Context/Summary: As titled. Bonus: found that PromoteL0 called with other concurrent PromoteL0 will return non-okay error so clarify the API. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12617 Test Plan: CI Reviewed By: ajkr Differential Revision: D56954428 Pulled By: hx235 fbshipit-source-id: 0e056153c515003fd241ffec59b0d8a27529db4c	2024-05-09 15:37:38 -07:00
Levi Tamasi	97e70906fa	Improve the sanity checks in (Multi)GetEntity and friends (#12630 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12630 The patch cleans up, improves, and brings into sync (to the extent possible without API signature changes) the sanity checks around the `GetEntity` / `MultiGetEntity` family of APIs, including the read-your-own-writes (`WriteBatchWithIndex`) and transaction layers. The checks are centralized in two main sets of entry points, namely in `DB(Impl)` and the "main" `GetEntityFromBatchAndDB` / `MultiGetEntityFromBatchAndDB` overloads in `WriteBatchWithIndex`. This eliminates the need to duplicate the checks in the transaction classes. Reviewed By: jaykorean Differential Revision: D57125741 fbshipit-source-id: 4dd059ef644a9b173fbba767538943397e4cc6cd	2024-05-09 12:25:19 -07:00
Wei Liu	1a3357648f	Error log update to db_impl_compaction_flush.cc (#12608 ) Summary: Make the error looks better. Some inconsistency here and [here](`e46ab9d4f0/db/db_impl/db_impl_compaction_flush.cc (L2701-L2702)`) Pull Request resolved: https://github.com/facebook/rocksdb/pull/12608 Reviewed By: ajkr Differential Revision: D57134933 Pulled By: cbi42 fbshipit-source-id: 2f19f077f388d196652a4e3afd2526f18bf75b2d	2024-05-09 11:36:24 -07:00
Yu Zhang	9dc171e3bb	Fix issue that cause false alarm corruption report (#12626 ) Summary: The state of `saved_seq_for_penul_check_` is not correctly maintained with the current flow. It's supposed to store the original sequence number for a `kTypeValuePreferredSeqno` entry for use in the `DecideOutputLevel` function. However, it's not always properly cleared. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12626 Test Plan: Added unit test that would fail before the fix ./tiered_compaction_test --gtest_filter="InterleavedTimedPutAndPut" Reviewed By: pdillinger Differential Revision: D57123469 Pulled By: jowlyzhang fbshipit-source-id: 8d73214b3b6dc152daf19b6bd6ee9063581dc277	2024-05-08 15:51:38 -07:00
Peter Dillinger	eeda54fe63	Fix db_crashtest.py for prefixpercent and enable_compaction_filter (#12627 ) Summary: After https://github.com/facebook/rocksdb/issues/12624 seeing db_stress failures due to db_crashtest.py calling it with --prefixpercent=5 --enable_compaction_filter=1 Pull Request resolved: https://github.com/facebook/rocksdb/pull/12627 Test Plan: watch crash test Reviewed By: ajkr Differential Revision: D57121592 Pulled By: pdillinger fbshipit-source-id: 55727355a7662e67efcd22d7e353153e78e24f59	2024-05-08 13:16:44 -07:00
Andrew Kryczka	933ac0e05c	Fix locking for `ColumnFamilyOptions::inplace_update_support` (#12624 ) Summary: In `SaveValue()`, the read lock needs to be obtained before `VerifyEntryChecksum()` because the KV checksum verification reads the entire value metadata+data, which is all mutable when `ColumnFamilyOptions::inplace_update_support == true`. In `MemTable::Update()`, the write lock needs to be obtained before mutating the value metadata (changing the value size) because it can be read concurrently. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12624 Test Plan: ``` $ make COMPILE_WITH_TSAN=1 -j56 db_stress ... $ python3 tools/db_crashtest.py blackbox --simple --max_key=10 --inplace_update_support=1 --interval=10 --allow_concurrent_memtable_write=0 ``` Reviewed By: cbi42 Differential Revision: D57034571 Pulled By: ajkr fbshipit-source-id: 3dddf881ad87923143acdf6bfec12ce47bb13a48	2024-05-08 08:30:12 -07:00
Hans Holmberg	b8400c9faf	Make linux file write life time hinting work (#12595 ) Summary: The life time hint fcntl takes a 64-bit unsigned int, so make sure to pass a uint64_t when doing the syscall. See: https://man7.org/linux/man-pages/man2/fcntl.2.html https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c75b1d9421f80f4143e389d2d50ddfc8a28c8c35 This is one of those "How did this ever work?", as Env::WriteLifeTimeHint hint is definitely not the same as an 64-bit unsigned int. What's surprising is that SetWriteLifeTimeHint does pass a valid hint from time to time. Thanks, Hans Pull Request resolved: https://github.com/facebook/rocksdb/pull/12595 Reviewed By: cbi42 Differential Revision: D56901280 Pulled By: ajkr fbshipit-source-id: f276348863cbc29a537bed9450b16b0cc513ea78	2024-05-07 17:54:50 -07:00
Changyu Bi	5bf2c00a35	Clarify manual compaction and file ingestion behavior with FIFO compaction (#12618 ) Summary: For manual compaction, FIFO compaction will always skip key range overlapping checking with SST files. If CompactRange() is called with CompactionRangeOptions::change_level=true, a CF with FIFO compaction will now return Status::NotSupported. For file ingestion, we will always ingest into L0. Previously, it's possible to ingest files into non-L0 levels with FIFO compaction. These changes also help to fix [this](`a178d15baf/db/db_impl/db_impl_compaction_flush.cc (L1269)`) assertion failure in crash tests. Pull Request resolved: https://github.com/facebook/rocksdb/pull/12618 Test Plan: added unit tests to verify the new behavior. Reviewed By: hx235 Differential Revision: D56962401 Pulled By: cbi42 fbshipit-source-id: 19812a1509650b4162b379ca5bee02f2e9d9569d	2024-05-07 12:00:15 -07:00
Levi Tamasi	83d051a8d9	Add release note for GetEntity transaction support (#12625 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12625 Reviewed By: jaykorean Differential Revision: D57059775 fbshipit-source-id: 80b3ddb51d538c6c21b69cd589f4ee8dd13596c9	2024-05-07 11:38:04 -07:00
Levi Tamasi	eaa3226ef7	Add support for GetEntity in optimistic and WriteCommitted pessimistic transactions (#12623 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12623 The PR adds support for the `GetEntity` API to optimistic and WriteCommitted pessimistic transactions. `MultiGetEntity` support and the `ForUpdate` variants of these read APIs will be implemented in subsequent PRs. Reviewed By: jaykorean Differential Revision: D57030879 fbshipit-source-id: 1f0aed6418782975fe537b6b3d437fad31fcbd43	2024-05-07 10:20:26 -07:00
Zaidoon Abd Al Hadi	36ab251c07	Expose block based metadata cache options via C API (#12611 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12611 Reviewed By: jaykorean Differential Revision: D56961823 Pulled By: ajkr fbshipit-source-id: aa062cdb49a0bb2c1148a81d4c882a4733c7790e	2024-05-06 16:49:11 -07:00
Levi Tamasi	45c290660a	Add PutEntity support for optimistic and WritePrepared pessimistic transactions (#12606 ) Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12606 The patch extends optimistic transactions and WriteCommitted pessimistic transactions with support for the `PutEntity` API. Similarly to the other APIs, `PutEntity` is available via both the `Transaction` and `TransactionDB` interfaces, where using the latter executes the write in a single-operation transaction as usual. Support for read APIs and other write policies (WritePrepared, WriteUnprepared) will be added in separate PRs. Reviewed By: jaykorean Differential Revision: D56911242 fbshipit-source-id: 57cf8bb6c6b1b40ba4a8a652831c13a617644289	2024-05-06 14:41:00 -07:00

1 2 3 4 5 ...

12706 commits