Commit graph

12054 commits

Author SHA1 Message Date
hackingthekernel 291300ece8 add c-api for allowing FIFO compaction (#11156)
Summary:
Addressing issue https://github.com/facebook/rocksdb/issues/11079

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11156

Reviewed By: pdillinger

Differential Revision: D42869964

Pulled By: ajkr

fbshipit-source-id: 58214901d4e072c568d4c5cf0944a0b1c60de897
2023-03-16 16:57:03 -07:00
Peter Dillinger ccaa3225b0 Simplify tracking entries already in SecondaryCache (#11299)
Summary:
In preparation for factoring secondary cache support out of individual Cache implementations, we can get rid of the "in secondary cache" flag on entries through a workable hack: when an entry is promoted from secondary, it is inserted in primary using a helper that lacks secondary cache support, thus preventing re-insertion into secondary cache through existing logic.

This adds to the complexity of building CacheItemHelpers, because you always have to be able to get to an equivalent helper without secondary cache support, but that complexity is reasonably isolated within RocksDB typed_cache.h and test code.

gcc-7 seems to have problems with constexpr constructor referencing `this` so removed constexpr support on CacheItemHelper.

Also refactored some related test code to share common code / functionality.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11299

Test Plan: existing tests

Reviewed By: anand1976

Differential Revision: D44101453

Pulled By: pdillinger

fbshipit-source-id: 7a59d0a3938ee40159c90c3e65d7004f6a272345
2023-03-15 17:51:44 -07:00
nccx 664dabda8f Add Microsoft Bing as a user (#11270)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/11270

Reviewed By: pdillinger

Differential Revision: D43811584

Pulled By: ajkr

fbshipit-source-id: f27e55395644a469840785685646456f6b1452fc
2023-03-15 15:29:28 -07:00
Hui Xiao bab5f9a6f2 Add new stat rocksdb.table.open.prefetch.tail.read.bytes, rocksdb.table.open.prefetch.tail.{miss|hit} (#11265)
Summary:
**Context/Summary:**
We are adding new stats to measure behavior of prefetched tail size and look up into this buffer

The stat collection is done in FilePrefetchBuffer but only for prefetched tail buffer during table open for now using FilePrefetchBuffer enum. It's cleaner than the alternative of implementing in upper-level call places of FilePrefetchBuffer for table open. It also has the benefit of extensible to other types of FilePrefetchBuffer if needed. See db bench for perf regression concern.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11265

Test Plan:
**- Piggyback on existing test**
**- rocksdb.table.open.prefetch.tail.miss is harder to UT so I manually set prefetch tail read bytes to be small and run db bench.**
```
./db_bench -db=/tmp/testdb -statistics=true -benchmarks="fillseq" -key_size=32 -value_size=512 -num=5000 -write_buffer_size=655 -target_file_size_base=655 -disable_auto_compactions=false -compression_type=none -bloom_bits=3  -use_direct_reads=true
```
```
rocksdb.table.open.prefetch.tail.read.bytes P50 : 4096.000000 P95 : 4096.000000 P99 : 4096.000000 P100 : 4096.000000 COUNT : 225 SUM : 921600
rocksdb.table.open.prefetch.tail.miss COUNT : 91
rocksdb.table.open.prefetch.tail.hit COUNT : 1034
```
**- No perf regression observed in db_bench**

SETUP command: create same db with ~900 files for pre-change/post-change.
```
./db_bench -db=/tmp/testdb -benchmarks="fillseq" -key_size=32 -value_size=512 -num=500000 -write_buffer_size=655360  -disable_auto_compactions=true -target_file_size_base=16777216 -compression_type=none
```
TEST command 60 runs or til convergence: as suggested by anand1976 and akankshamahajan15, vary `seek_nexts` and `async_io` in testing.
```
./db_bench -use_existing_db=true -db=/tmp/testdb -statistics=false -cache_size=0 -cache_index_and_filter_blocks=false -benchmarks=seekrandom[-X60] -num=50000 -seek_nexts={10, 500, 1000} -async_io={0|1} -use_direct_reads=true
```
async io = 0, direct io read = true

  | seek_nexts = 10, 30 runs | seek_nexts = 500, 12 runs | seek_nexts = 1000, 6 runs
-- | -- | -- | --
pre-post change | 4776 (± 28) ops/sec;   24.8 (± 0.1) MB/sec | 288 (± 1) ops/sec;   74.8 (± 0.4) MB/sec | 145 (± 4) ops/sec;   75.6 (± 2.2) MB/sec
post-change | 4790 (± 32) ops/sec;   24.9 (± 0.2) MB/sec | 288 (± 3) ops/sec;   74.7 (± 0.8) MB/sec | 143 (± 3) ops/sec;   74.5 (± 1.6) MB/sec

async io = 1, direct io read = true
  | seek_nexts = 10, 54 runs | seek_nexts = 500, 6 runs | seek_nexts = 1000, 4 runs
-- | -- | -- | --
pre-post change | 3350 (± 36) ops/sec;   17.4 (± 0.2) MB/sec | 264 (± 0) ops/sec;   68.7 (± 0.2) MB/sec | 138 (± 1) ops/sec;   71.8 (± 1.0) MB/sec
post-change | 3358 (± 27) ops/sec;   17.4 (± 0.1) MB/sec  | 263 (± 2) ops/sec;   68.3 (± 0.8) MB/sec | 139 (± 1) ops/sec;   72.6 (± 0.6) MB/sec

Reviewed By: ajkr

Differential Revision: D43781467

Pulled By: hx235

fbshipit-source-id: a706a18472a8edb2b952bac3af40eec803537f2a
2023-03-15 14:02:43 -07:00
Peter Dillinger 601efe3cf2 Misc cleanup of block cache code (#11291)
Summary:
... ahead of a larger change.
* Rename confusingly named `is_in_sec_cache` to `kept_in_sec_cache`
* Unify naming of "standalone" block cache entries (was "detached" in clock_cache)
* Remove some unused definitions in clock_cache.h (leftover from a previous revision)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11291

Test Plan: usual tests and CI, no behavior changes

Reviewed By: anand1976

Differential Revision: D43984642

Pulled By: pdillinger

fbshipit-source-id: b8bf0c5b90a932a88bcbdb413b2f256834aedf97
2023-03-15 12:08:17 -07:00
Hui Xiao 11cb6af6e5 Fix bug of prematurely excluded CF in atomic flush contains unflushed data that should've been included in the atomic flush (#11148)
Summary:
**Context:**
Atomic flush should guarantee recoverability of all data of seqno up to the max seqno of the flush. It achieves this by ensuring all such data are flushed by the time this atomic flush finishes through `SelectColumnFamiliesForAtomicFlush()`. However, our crash test exposed the following case where an excluded CF from an atomic flush contains unflushed data of seqno less than the max seqno of that atomic flush and loses its data with `WriteOptions::DisableWAL=true` in face of a crash right after the atomic flush finishes .
```
./db_stress --preserve_unverified_changes=1 --reopen=0 --acquire_snapshot_one_in=0 --adaptive_readahead=1 --allow_data_in_errors=True --async_io=1 --atomic_flush=1 --avoid_flush_during_recovery=0 --avoid_unnecessary_blocking_io=0 --backup_max_size=104857600 --backup_one_in=0 --batch_protection_bytes_per_key=0 --block_size=16384 --bloom_bits=15 --bottommost_compression_type=none --bytes_per_sync=262144 --cache_index_and_filter_blocks=0 --cache_size=8388608 --cache_type=lru_cache --charge_compression_dictionary_building_buffer=0 --charge_file_metadata=1 --charge_filter_construction=0 --charge_table_reader=0 --checkpoint_one_in=0 --checksum_type=kXXH3 --clear_column_family_one_in=0 --compact_files_one_in=0 --compact_range_one_in=0 --compaction_pri=1 --compaction_ttl=100 --compression_max_dict_buffer_bytes=134217727 --compression_max_dict_bytes=16384 --compression_parallel_threads=1 --compression_type=lz4hc --compression_use_zstd_dict_trainer=0 --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --data_block_index_type=0 --db=$db --db_write_buffer_size=1048576 --delpercent=4 --delrangepercent=1 --destroy_db_initially=0 --detect_filter_construct_corruption=0 --disable_wal=1 --enable_compaction_filter=0 --enable_pipelined_write=0 --expected_values_dir=$exp --fail_if_options_file_error=0 --fifo_allow_compaction=0 --file_checksum_impl=none --flush_one_in=0 --format_version=5 --get_current_wal_file_one_in=0 --get_live_files_one_in=100 --get_property_one_in=0 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=2 --index_type=0 --ingest_external_file_one_in=0 --initial_auto_readahead_size=524288 --iterpercent=10 --key_len_percent_dist=1,30,69 --level_compaction_dynamic_level_bytes=True --long_running_snapshots=1 --manual_wal_flush_one_in=100 --mark_for_compaction_one_file_in=0 --max_auto_readahead_size=0 --max_background_compactions=20 --max_bytes_for_level_base=10485760 --max_key=10000 --max_key_len=3 --max_manifest_file_size=1073741824 --max_write_batch_group_size_bytes=64 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=0 --memtable_prefix_bloom_size_ratio=0.01 --memtable_protection_bytes_per_key=4 --memtable_whole_key_filtering=0 --memtablerep=skip_list --min_write_buffer_number_to_merge=2 --mmap_read=1 --mock_direct_io=False --nooverwritepercent=1 --num_file_reads_for_auto_readahead=0 --open_files=-1 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=0 --open_write_fault_one_in=0 --ops_per_thread=100000000 --optimize_filters_for_memory=1 --paranoid_file_checks=1 --partition_filters=0 --partition_pinning=3 --pause_background_one_in=0 --periodic_compaction_seconds=100 --prefix_size=8 --prefixpercent=5 --prepopulate_block_cache=0 --preserve_internal_time_seconds=3600 --progress_reports=0 --read_fault_one_in=32 --readahead_size=16384 --readpercent=50 --recycle_log_file_num=0 --ribbon_starting_level=6 --secondary_cache_fault_one_in=0 --set_options_one_in=10000 --snapshot_hold_ops=100000 --sst_file_manager_bytes_per_sec=104857600 --sst_file_manager_bytes_per_truncate=1048576 --stats_dump_period_sec=10 --subcompactions=1 --sync=0 --sync_fault_injection=0 --target_file_size_base=524288 --target_file_size_multiplier=2 --test_batches_snapshots=0 --top_level_index_pinning=0 --unpartitioned_pinning=1 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_full_merge_v1=0 --use_merge=0 --use_multiget=1 --use_put_entity_one_in=0 --user_timestamp_size=0 --value_size_mult=32 --verify_checksum=1 --verify_checksum_one_in=0 --verify_db_one_in=1000 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=524288 --wal_compression=none --write_buffer_size=524288 --write_dbid_to_manifest=1 --write_fault_one_in=0 --writepercent=30 &
    pid=$!
    sleep 0.2
    sleep 10
    kill $pid
    sleep 0.2
./db_stress --ops_per_thread=1 --preserve_unverified_changes=1 --reopen=0 --acquire_snapshot_one_in=0 --adaptive_readahead=1 --allow_data_in_errors=True --async_io=1 --atomic_flush=1 --avoid_flush_during_recovery=0 --avoid_unnecessary_blocking_io=0 --backup_max_size=104857600 --backup_one_in=0 --batch_protection_bytes_per_key=0 --block_size=16384 --bloom_bits=15 --bottommost_compression_type=none --bytes_per_sync=262144 --cache_index_and_filter_blocks=0 --cache_size=8388608 --cache_type=lru_cache --charge_compression_dictionary_building_buffer=0 --charge_file_metadata=1 --charge_filter_construction=0 --charge_table_reader=0 --checkpoint_one_in=0 --checksum_type=kXXH3 --clear_column_family_one_in=0 --compact_files_one_in=0 --compact_range_one_in=0 --compaction_pri=1 --compaction_ttl=100 --compression_max_dict_buffer_bytes=134217727 --compression_max_dict_bytes=16384 --compression_parallel_threads=1 --compression_type=lz4hc --compression_use_zstd_dict_trainer=0 --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --data_block_index_type=0 --db=$db --db_write_buffer_size=1048576 --delpercent=4 --delrangepercent=1 --destroy_db_initially=0 --detect_filter_construct_corruption=0 --disable_wal=1 --enable_compaction_filter=0 --enable_pipelined_write=0 --expected_values_dir=$exp --fail_if_options_file_error=0 --fifo_allow_compaction=0 --file_checksum_impl=none --flush_one_in=0 --format_version=5 --get_current_wal_file_one_in=0 --get_live_files_one_in=100 --get_property_one_in=0 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=2 --index_type=0 --ingest_external_file_one_in=0 --initial_auto_readahead_size=524288 --iterpercent=10 --key_len_percent_dist=1,30,69 --level_compaction_dynamic_level_bytes=True --long_running_snapshots=1 --manual_wal_flush_one_in=100 --mark_for_compaction_one_file_in=0 --max_auto_readahead_size=0 --max_background_compactions=20 --max_bytes_for_level_base=10485760 --max_key=10000 --max_key_len=3 --max_manifest_file_size=1073741824 --max_write_batch_group_size_bytes=64 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=0 --memtable_prefix_bloom_size_ratio=0.01 --memtable_protection_bytes_per_key=4 --memtable_whole_key_filtering=0 --memtablerep=skip_list --min_write_buffer_number_to_merge=2 --mmap_read=1 --mock_direct_io=False --nooverwritepercent=1 --num_file_reads_for_auto_readahead=0 --open_files=-1 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=0 --open_write_fault_one_in=0 --ops_per_thread=100000000 --optimize_filters_for_memory=1 --paranoid_file_checks=1 --partition_filters=0 --partition_pinning=3 --pause_background_one_in=0 --periodic_compaction_seconds=100 --prefix_size=8 --prefixpercent=5 --prepopulate_block_cache=0 --preserve_internal_time_seconds=3600 --progress_reports=0 --read_fault_one_in=32 --readahead_size=16384 --readpercent=50 --recycle_log_file_num=0 --ribbon_starting_level=6 --secondary_cache_fault_one_in=0 --set_options_one_in=10000 --snapshot_hold_ops=100000 --sst_file_manager_bytes_per_sec=104857600 --sst_file_manager_bytes_per_truncate=1048576 --stats_dump_period_sec=10 --subcompactions=1 --sync=0 --sync_fault_injection=0 --target_file_size_base=524288 --target_file_size_multiplier=2 --test_batches_snapshots=0 --top_level_index_pinning=0 --unpartitioned_pinning=1 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=0 --use_full_merge_v1=0 --use_merge=0 --use_multiget=1 --use_put_entity_one_in=0 --user_timestamp_size=0 --value_size_mult=32 --verify_checksum=1 --verify_checksum_one_in=0 --verify_db_one_in=1000 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=524288 --wal_compression=none --write_buffer_size=524288 --write_dbid_to_manifest=1 --write_fault_one_in=0 --writepercent=30 &
    pid=$!
    sleep 0.2
    sleep 40
    kill $pid
    sleep 0.2

Verification failed for column family 6 key 0000000000000239000000000000012B0000000000000138 (56622): value_from_db: , value_from_expected: 4A6331754E4F4C4D42434041464744455A5B58595E5F5C5D5253505156575455, msg: Value not found: NotFound:
Crash-recovery verification failed :(
No writes or ops?
Verification failed :(
```

The bug is due to the following:
- When atomic flush is used, an empty CF is legally [excluded](https://github.com/facebook/rocksdb/blob/7.10.fb/db/db_filesnapshot.cc#L39) in `SelectColumnFamiliesForAtomicFlush` as the first step of `DBImpl::FlushForGetLiveFiles` before [passing](https://github.com/facebook/rocksdb/blob/7.10.fb/db/db_filesnapshot.cc#L42) the included CFDs to `AtomicFlushMemTables`.
- But [later](https://github.com/facebook/rocksdb/blob/7.10.fb/db/db_impl/db_impl_compaction_flush.cc#L2133) in `AtomicFlushMemTables`, `WaitUntilFlushWouldNotStallWrites` will [release the db mutex](https://github.com/facebook/rocksdb/blob/7.10.fb/db/db_impl/db_impl_compaction_flush.cc#L2403), during which data@seqno N can be inserted into the excluded CF and data@seqno M can be inserted into one of the included CFs, where M > N.
- However, data@seqno N in an already-excluded CF is thus excluded from this atomic flush while we seqno N is less than seqno M.

**Summary:**
- Replace `SelectColumnFamiliesForAtomicFlush()`-before-`AtomicFlushMemTables()` with `SelectColumnFamiliesForAtomicFlush()`-after-wait-within-`AtomicFlushMemTables()` so we ensure no write affecting the recoverability of this atomic job (i.e, change to max seqno of this atomic flush or insertion of data with less seqno than the max seqno of the atomic flush to excluded CF) can happen after calling `SelectColumnFamiliesForAtomicFlush()`.
- For above, refactored and clarified comments on `SelectColumnFamiliesForAtomicFlush()` and `AtomicFlushMemTables()` for clearer semantics of passed-in CFDs to atomic-flush

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11148

Test Plan:
- New unit test failed before the fix and passes after
- Make check
- Rehearsal stress test

Reviewed By: ajkr

Differential Revision: D42799871

Pulled By: hx235

fbshipit-source-id: 13636b63e9c25c5895857afc36ea580d57f6d644
2023-03-14 16:53:20 -07:00
Peter Dillinger 2a23bee963 Use CacheWrapper in more places (#11295)
Summary:
... to simplify code and make it less prone to needless updates on refactoring.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11295

Test Plan: existing tests (no functional changes intended)

Reviewed By: hx235

Differential Revision: D44040260

Pulled By: pdillinger

fbshipit-source-id: 1b6badb5c8ca673db0903bfaba3cfbc986f386be
2023-03-13 20:41:55 -07:00
Levi Tamasi 49881921cd Rename a recently added PerfContext counter (#11294)
Summary:
The patch renames the counter added in https://github.com/facebook/rocksdb/issues/11284 for better consistency with the existing naming scheme.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11294

Test Plan: `make check`

Reviewed By: jowlyzhang

Differential Revision: D44035964

Pulled By: ltamasi

fbshipit-source-id: 8b1a2a03ee728148365367e0ecc1fcf462f62191
2023-03-13 18:43:27 -07:00
Peter Dillinger 648e972f30 Document DB::Resume(), fix LockWALInEffect test (#11290)
Summary:
In rare cases seeing failures like this

```
[ RUN      ] DBWriteTestInstance/DBWriteTest.LockWALInEffect/2
db/db_write_test.cc:653: Failure
Put("key3", "value")
Corruption: Not active
```

in a test with no explicit threading. This is likely because of the unpredictability of background auto-resume. I didn't really know this feature, in part because DB::Resume() was undocumented. So I believe I have fixed the test and documented the API function.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11290

Test Plan: 1000s of stress runs of the test with gtest-parallel

Reviewed By: anand1976

Differential Revision: D43984583

Pulled By: pdillinger

fbshipit-source-id: d30dec120b4864e193751b2e33ff16834d313db3
2023-03-13 14:19:59 -07:00
Changyu Bi 9aa3b6f9ae Support range deletion tombstones in CreateColumnFamilyWithImport (#11252)
Summary:
CreateColumnFamilyWithImport() did not support range tombstones for two reasons:
1. it uses point keys of a input file to determine its boundary (smallest and largest internal key), which means range tombstones outside of the point key range will be effectively dropped.
2. it does not handle files with no point keys.

Also included a fix in external_sst_file_ingestion_job.cc where the blocks read in `GetIngestedFileInfo()` can be added to block cache now (issue fixed in https://github.com/facebook/rocksdb/pull/6429).

This PR adds support for exporting and importing column family with range tombstones. The main change is to add smallest internal key and largest internal key to `SstFileMetaData` that will be part of the output of `ExportColumnFamily()`. Then during `CreateColumnFamilyWithImport(...,const ExportImportFilesMetaData& metadata,...)`, file boundaries can be set from `metadata` directly. This is needed since when file boundaries are extended by range tombstones, sometimes they cannot be deduced from a file's content alone.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11252

Test Plan:
- added unit tests that fails before this change

Closes https://github.com/facebook/rocksdb/issues/11245

Reviewed By: ajkr

Differential Revision: D43577443

Pulled By: cbi42

fbshipit-source-id: 6bff78e583cc50c44854994dea0a8dd519398f2f
2023-03-13 11:06:59 -07:00
Alan Paxton fbd603d04a Reverse wrong order of parameter names for Java WriteBatchWithIndex#iteratorWithBase (#11280)
Summary:
Fix for https://github.com/facebook/rocksdb/issues/11008

`Java_org_rocksdb_WriteBatchWithIndex_iteratorWithBase` takes parameters `(… jlong jwbwi_handle, jlong jcf_handle,
    jlong jbase_iterator_handle, jlong jread_opts_handle)` while `WriteBatchWithIndex.java` declares `private native long iteratorWithBase(final long handle, final long baseIteratorHandle,
      final long cfHandle, final long readOptionsHandle)`.

Luckily the only call to `iteratorWithBase` passes the parameters in the correct order for the implementation `(… cfHandle, baseIteratorHandle …)` This type checks because the types are the same (long words).

The code is currently used correctly, it is just extremely misleading. Swap the names of the 2 parameters in the Java method so that the correct usage is clear.

There already exist test methods which call the API correctly and only succeed because of that. These continue to work.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11280

Reviewed By: cbi42

Differential Revision: D43874798

Pulled By: ajkr

fbshipit-source-id: b59bc930bf579f4e0804f0effd4fb17f4225d60c
2023-03-10 12:26:09 -08:00
Jaepil Jeong 969d4e1dd2 Fix compile errors in Clang due to unused variables depending on the build configuration (#11234)
Summary:
This PR fixes compilation errors in Clang due to unused variables like the below:
```
[109/329] Building CXX object CMakeFiles/rocksdb.dir/db/version_edit_handler.cc.o
FAILED: CMakeFiles/rocksdb.dir/db/version_edit_handler.cc.o
ccache /opt/homebrew/opt/llvm/bin/clang++ -DGFLAGS=1 -DGFLAGS_IS_A_DLL=0 -DHAVE_FULLFSYNC -DJEMALLOC_NO_DEMANGLE -DLZ4 -DOS_MACOSX -DROCKSDB_JEMALLOC -DROCKSDB_LIB_IO_POSIX -DROCKSDB_NO_DYNAMIC_EXTENSION -DROCKSDB_PLATFORM_POSIX -DSNAPPY -DTBB -DZLIB -DZSTD -I/Users/jaepil/work/deepsearch/deps/cpp/rocksdb -I/Users/jaepil/work/deepsearch/deps/cpp/rocksdb/include -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk -I/Users/jaepil/app/include -I/opt/homebrew/include -I/opt/homebrew/opt/llvm/include -I/opt/homebrew/opt/llvm/include/c++/v1 -W -Wextra -Wall -pthread -Wsign-compare -Wshadow -Wno-unused-parameter -Wno-unused-variable -Woverloaded-virtual -Wnon-virtual-dtor -Wno-missing-field-initializers -Wno-strict-aliasing -Wno-invalid-offsetof -fno-omit-frame-pointer -momit-leaf-frame-pointer -march=armv8-a+crc+crypto -Wno-unused-function -Werror -O2 -g -DNDEBUG -arch arm64 -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.1.sdk -std=gnu++20 -MD -MT CMakeFiles/rocksdb.dir/db/version_edit_handler.cc.o -MF CMakeFiles/rocksdb.dir/db/version_edit_handler.cc.o.d -o CMakeFiles/rocksdb.dir/db/version_edit_handler.cc.o -c /Users/jaepil/work/deepsearch/deps/cpp/rocksdb/db/version_edit_handler.cc
/Users/jaepil/work/deepsearch/deps/cpp/rocksdb/db/version_edit_handler.cc:30:10: error: variable 'recovered_edits' set but not used [-Werror,-Wunused-but-set-variable]
  size_t recovered_edits = 0;
         ^
1 error generated.
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11234

Reviewed By: cbi42

Differential Revision: D43458604

Pulled By: ajkr

fbshipit-source-id: d8c50e1a108887b037a120cd9f19374ddaeee817
2023-03-09 16:42:57 -08:00
zhangliangkai1992 7a07afe82e DBWithTTLImpl::IsStale overflow when ttl is 15 years (#11279)
Summary:
Fix DBWIthTTLImpl::IsStale overflow

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11279

Reviewed By: cbi42

Differential Revision: D43875039

Pulled By: ajkr

fbshipit-source-id: 3e5feb8c4c4480bf1421b0763ade3d2e459ec028
2023-03-09 13:11:25 -08:00
Alan Paxton daeec505a4 Add instructions for installing googlebenchmark (#11282)
Summary:
Per the discussion in https://groups.google.com/g/rocksdb/c/JqhlvSs6ZEs/m/bnXZ7Q--AAAJ
It seems non-obvious that googlebenchmark must be installed manually before microbenchmarks can be run. I have added more detail to the installation instructions to make it clearer.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11282

Reviewed By: cbi42

Differential Revision: D43874724

Pulled By: ajkr

fbshipit-source-id: f64a4ac4914cb057955d1ca965885f8822ca7764
2023-03-09 13:11:00 -08:00
akankshamahajan 1de697628e Fix hang in async_io benchmarks in regression script (#11285)
Summary:
Fix hang in async_io benchmarks in regression script. I changed the order of benchmarks and that somehow fixed the issue of hang.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11285

Test Plan: Ran it manually

Reviewed By: pdillinger

Differential Revision: D43937431

Pulled By: akankshamahajan15

fbshipit-source-id: 7c43075d3be6b8f41d08e845664012768b769661
2023-03-09 09:16:20 -08:00
Levi Tamasi 1d52438504 Add a PerfContext counter for merge operands applied in point lookups (#11284)
Summary:
The existing PerfContext counter `internal_merge_count` only tracks the
Merge operands applied during range scans. The patch adds a new counter
called `internal_merge_count_point_lookups` to track the same metric
for point lookups (`Get` / `MultiGet` / `GetEntity` / `MultiGetEntity`), and
also fixes a couple of cases in the iterator where the existing counter wasn't
updated.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11284

Test Plan: `make check`

Reviewed By: jowlyzhang

Differential Revision: D43926082

Pulled By: ltamasi

fbshipit-source-id: 321566d8b4cf0a3b6c9b73b7a5c984fb9bb492e9
2023-03-08 18:22:11 -08:00
akankshamahajan 6c65bf1743 Decrease duration time for internally debugging the regression_script (#11283)
Summary:
Internally, the benchmark is going on hang state whereas when run on same host manually, it passes. Decrease the duration to 5s to figure out how much time it is taking to complete the benchmark.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11283

Test Plan: Ran manually internally

Reviewed By: hx235

Differential Revision: D43882260

Pulled By: akankshamahajan15

fbshipit-source-id: 9ea44164773d4df4fc05cd817b7e011426c4d428
2023-03-07 15:07:49 -08:00
Peter Dillinger e01073252b Tests verifying non-zero checksums of zero bytes (#11260)
Summary:
Adds unit tests verifying that a block payload and checksum of all zeros is not falsely considered valid data. The test exhaustively checks that for blocks up to some length (default 20K, more exhaustively 10M) of all zeros do not produce a block checksum of all zeros.

Also small refactoring of an existing checksum test to use parameterized test. (Suggest hiding whitespace changes for review.)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11260

Test Plan:
this is the test, manual run with
`ROCKSDB_THOROUGH_CHECKSUM_TEST=1` to verify up to 10M.

Reviewed By: hx235

Differential Revision: D43706192

Pulled By: pdillinger

fbshipit-source-id: 95e721c320ca928e7fa2400c2570fb359cc30b1f
2023-03-06 11:53:09 -08:00
akankshamahajan 13357de0c2 Add support for parameters setting related to async_io benchmarks (#11262)
Summary:
Provide support in benchmark regression to use different options to be used in async_io benchamark only - "$`MAX_READAHEAD_SIZE`", $`INITIAL_READAHEAD_SIZE`", "$`NUM_READS_FOR_READAHEAD_SIZE`".
If user wants to run set these parameters for all benchmarks then these parameters need to be set in OPTION file instead.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11262

Test Plan: Ran manually

Reviewed By: anand1976

Differential Revision: D43725567

Pulled By: akankshamahajan15

fbshipit-source-id: 28c3462dd785ffd646d44560fa9c92bc6a8066e5
2023-03-06 11:22:21 -08:00
Levi Tamasi a1a3b23346 Deflake/fix BlobSourceCacheReservationTest.IncreaseCacheReservationOnFullCache (#11273)
Summary:
`BlobSourceCacheReservationTest.IncreaseCacheReservationOnFullCache` is both flaky and also doesn't do what its name says. The patch changes this test so it actually tests increasing the cache reservation, hopefully also deflaking it in the process.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11273

Test Plan: `make check`

Reviewed By: akankshamahajan15

Differential Revision: D43800935

Pulled By: ltamasi

fbshipit-source-id: 5eb54130dfbe227285b0e14f2084aa4b89f0b107
2023-03-06 09:50:39 -08:00
Peter Dillinger 50e9b3f9c7 Default print stack traces with GDB on Linux (#11272)
Summary:
On Linux systems using full ASLR, including CircleCI, the old backtrace()+addr2line stack traces are pretty useless, as seen in some failures under ASSERT_STATUS_CHECKED=1 LIB_MODE=static. Use gdb by default for stack traces under Linux. More detail in code comments.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11272

Test Plan: manual testing locally and on CircleCI with ssh

Reviewed By: anand1976

Differential Revision: D43786211

Pulled By: pdillinger

fbshipit-source-id: f8c7c77f774b504fbdf7c786ff2430cbc8f5b939
2023-03-05 08:21:57 -08:00
Peter Dillinger e168c1b1a4 Use FaultInjectionTestFS in DBWriteTest.LockWALInEffect (#11271)
Summary:
Existing use of FaultInjectionTestEnv shows rare TSAN errors with parallel Sync and Flush. This appears to be fixed in FaultInjectionTestFS. (Sigh, code duplication and divergence.)

Example failure:
https://app.circleci.com/pipelines/github/facebook/rocksdb/24631/workflows/fc2a66f0-f21c-48d6-a944-3885bcff50a4/jobs/571928

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11271

Test Plan: wasn't able to reproduce locally but stress tested the updated test with gtest-parallel -r1000 and TSAN.

Reviewed By: ajkr

Differential Revision: D43779477

Pulled By: pdillinger

fbshipit-source-id: a019b0f1d4045a26a15ab08aab63828a398f6d3e
2023-03-05 08:21:16 -08:00
Igor Canadi ddde1e6af8 Avoid ColumnFamilyDescriptor copy (#10978)
Summary:
Hi. :) Noticed we are copying ColumnFamilyDescriptor here because my process crashed during copy constructor (cause unrelated)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10978

Reviewed By: cbi42

Differential Revision: D41473924

Pulled By: ajkr

fbshipit-source-id: 58a3473f2d7b24918f79d4b2726c20081c5e95b4
2023-03-03 20:55:31 -08:00
Changyu Bi d053926fa2 Improve documentation for MergingIterator (#11161)
Summary:
Add some comments to try to explain how/why MergingIterator works. Made some small refactoring, mostly in MergingIterator::SkipNextDeleted() and MergingIterator::SeekImpl().

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11161

Test Plan:
crash test with small key range:
```
python3 tools/db_crashtest.py blackbox --simple --max_key=100 --interval=6000 --write_buffer_size=262144 --target_file_size_base=256 --max_bytes_for_level_base=262144 --block_size=128 --value_size_mult=33 --subcompactions=10 --use_multiget=1 --delpercent=3 --delrangepercent=2 --verify_iterator_with_expected_state_one_in=2 --num_iterations=10
```

Reviewed By: ajkr

Differential Revision: D42860994

Pulled By: cbi42

fbshipit-source-id: 3f0c1c9c6481a7f468bf79d823998907a8116e9e
2023-03-03 12:17:30 -08:00
Levi Tamasi 95d67f3646 Fix/clarify/extend the API comments of CompactionFilter (#11261)
Summary:
The patch makes the following changes to the API comments:
* Some general comments about snapshots, thread safety, and user-defined timestamps are moved to a more prominent place at the top of the file.
* Detailed descriptions are added for each `ValueType` and `Decision`, fixing and extending some existing comments (e.g. that of `kRemove`, which suggested that key-values are simply removed from the output, while in reality base values are converted to tombstones) and adding detailed comments that were missing (e.g. `kPurge` and `kChangeWideColumnEntity`).
* Updated/extended the comments of `FilterV2/V3` and `FilterBlobByKey`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11261

Reviewed By: akankshamahajan15

Differential Revision: D43714314

Pulled By: ltamasi

fbshipit-source-id: 835f4b1bdac1ce0e291155186095211303260729
2023-03-03 09:53:13 -08:00
Yu Zhang 8dfcfd4e90 Fix backward iteration issue when user defined timestamp is enabled in BlobDB (#11258)
Summary:
During backward iteration, blob verification would fail because the user key (ts included) in `saved_key_` doesn't match the blob. This happens because during`FindValueForCurrentKey`, `saved_key_` is not updated when the user key(ts not included) is the same for all cases except when `timestamp_lb_` is specified. This breaks the blob verification logic when user defined timestamp is enabled and `timestamp_lb_` is not specified. Fix this by always updating `saved_key_` when a smaller user key (ts included) is seen.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11258

Test Plan:
`make check`
`./db_blob_basic_test --gtest_filter=DBBlobWithTimestampTest.IterateBlobs`

Run db_bench (built with DEBUG_LEVEL=0) to demonstrate that no overhead is introduced with:

`./db_bench -user_timestamp_size=8  -db=/dev/shm/rocksdb -disable_wal=1 -benchmarks=fillseq,seekrandom[-W1-X6] -reverse_iterator=1 -seek_nexts=5`

Baseline:

- seekrandom [AVG    6 runs] : 72188 (± 1481) ops/sec;   37.2 (± 0.8) MB/sec

With this PR:

- seekrandom [AVG    6 runs] : 74171 (± 1427) ops/sec;   38.2 (± 0.7) MB/sec

Reviewed By: ltamasi

Differential Revision: D43675642

Pulled By: jowlyzhang

fbshipit-source-id: 8022ae8522d1f66548821855e6eed63640c14e04
2023-03-01 13:28:54 -08:00
anand76 cf09917c18 Add filter/index/data secondary cache hits stats (#11246)
Summary:
Add more stats for better visibility into the usefulness of the secondary cache.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11246

Test Plan: Add a new unit test

Reviewed By: akankshamahajan15

Differential Revision: D43521364

Pulled By: anand1976

fbshipit-source-id: a92f04884e738a9bf40ad4047acaaaea343838a7
2023-02-28 10:36:56 -08:00
yihuang b7e73501d8 fix: add extern and ROCKSDB_LIBRARY_API to two c apis (#11217)
Summary:
add extern and `ROCKSDB_LIBRARY_API ` to `rocksdb_property_int` and `rocksdb_property_int_cf`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11217

Reviewed By: cbi42

Differential Revision: D43522968

Pulled By: ajkr

fbshipit-source-id: 4cd4e136f3890fc17e0a1f9e7ac4e517e4d79afa
2023-02-27 11:39:38 -08:00
Levi Tamasi 3c9eed688e Enable moving a string or PinnableSlice into PinnableWideColumns (#11248)
Summary:
This makes it possible to eliminate some copies in `GetEntity` / `MultiGetEntity`,
in particular when `Merge`s or blobs are involved.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11248

Test Plan: `make check`

Reviewed By: akankshamahajan15

Differential Revision: D43544215

Pulled By: ltamasi

fbshipit-source-id: bc4c8955a24bbd8bc4ab098e72133ead757f9707
2023-02-24 10:33:00 -08:00
Yu Zhang af7872ffd1 Fix a TestGet failure when user defined timestamp is enabled (#11249)
Summary:
Stressing small DB with small number of keys and user-defined timestamp enabled usually fails pretty quickly in TestGet.

Example command to reproduce the failure:

` tools/db_crashtest.py blackbox --enable_ts --simple --delrangepercent=0 --delpercent=5 --max_key=100 --interval=3 --write_buffer_size=262144 --target_file_size_base=262144 --max_bytes_for_level_base=262144 --subcompactions=1`

Example failure: `error : inconsistent values for key 0000000000000009000000000000000A7878: expected state has the key, Get() returns NotFound.`

Fixes this test failure by refreshing the read up to timestamp to the most up to date timestamp, a.k.a now, after a key is locked.  Without this, things could happen in this order and cause a test failure:

<table>
  <tr>
    <th>TestGet thread</th>
    <th> A writing thread</th>
  </tr>
  <tr>
    <td>read_opts.timestamp = GetNow()</td>
    <td></td>
  </tr>
  <tr>
    <td></td>
    <td>Lock key, do write</td>
  </tr>
  <tr>
    <td>Lock key, read(read_opts) return NotFound</td>
    <td></td>
  </tr>
</table>

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11249

Reviewed By: ltamasi

Differential Revision: D43551302

Pulled By: jowlyzhang

fbshipit-source-id: 26877ab379bdb97acd2682a2632bc29718427f38
2023-02-23 17:00:04 -08:00
Yu Zhang f007b8fdea Support iter_start_ts in integrated BlobDB (#11244)
Summary:
Fixed an issue during backward iteration when `iter_start_ts` is set in an integrated BlobDB.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11244

Test Plan:
```make check
./db_blob_basic_test --gtest_filter="DBBlobWithTimestampTest.IterateBlobs"
tools/db_crashtest.py --stress_cmd=./db_stress --cleanup_cmd='' --enable_ts whitebox --random_kill_odd 888887 --enable_blob_files=1```

Reviewed By: ltamasi

Differential Revision: D43506726

Pulled By: jowlyzhang

fbshipit-source-id: 2cdc19ebf8da909d8d43d621353905784949a9f0
2023-02-22 15:44:59 -08:00
Changyu Bi 229297d1b8 Refactor AddRangeDels() + consider range tombstone during compaction file cutting (#11113)
Summary:
A second attempt after https://github.com/facebook/rocksdb/issues/10802, with bug fixes and refactoring. This PR updates compaction logic to take range tombstones into account when determining whether to cut the current compaction output file (https://github.com/facebook/rocksdb/issues/4811). Before this change, only point keys were considered, and range tombstones could cause large compactions. For example, if the current compaction outputs is a range tombstone [a, b) and 2 point keys y, z, they would be added to the same file, and may overlap with too many files in the next level and cause a large compaction in the future. This PR also includes ajkr's effort to simplify the logic to add range tombstones to compaction output files in `AddRangeDels()` ([https://github.com/facebook/rocksdb/issues/11078](https://github.com/facebook/rocksdb/pull/11078#issuecomment-1386078861)).

The main change is for `CompactionIterator` to emit range tombstone start keys to be processed by `CompactionOutputs`. A new class `CompactionMergingIterator` is introduced to replace `MergingIterator` under `CompactionIterator` to enable emitting of range tombstone start keys. Further improvement after this PR include cutting compaction output at some grandparent boundary key (instead of the next output key) when cutting within a range tombstone to reduce overlap with grandparents.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11113

Test Plan:
* added unit test in db_range_del_test
* crash test with a small key range: `python3 tools/db_crashtest.py blackbox --simple --max_key=100 --interval=600 --write_buffer_size=262144 --target_file_size_base=256 --max_bytes_for_level_base=262144 --block_size=128 --value_size_mult=33 --subcompactions=10 --use_multiget=1 --delpercent=3 --delrangepercent=2 --verify_iterator_with_expected_state_one_in=2 --num_iterations=10`

Reviewed By: ajkr

Differential Revision: D42655709

Pulled By: cbi42

fbshipit-source-id: 8367e36ef5640e8f21c14a3855d4a8d6e360a34c
2023-02-22 12:28:18 -08:00
ywave 9fa9becf53 fix -Wrange-loop-analysis in Apple clang version 12.0.0 (clang-1200.0.32.29) (#11240)
Summary:
Fix complain
```
db/db_impl/db_impl_compaction_flush.cc:417:19: error: loop variable 'bg_flush_arg' of type 'const rocksdb::DBImpl::BGFlushArg' creates a copy from type
      'const rocksdb::DBImpl::BGFlushArg' [-Werror,-Wrange-loop-analysis]
  for (const auto bg_flush_arg : bg_flush_args) {
                  ^
db/db_impl/db_impl_compaction_flush.cc:417:8: note: use reference type 'const rocksdb::DBImpl::BGFlushArg &' to prevent copying
  for (const auto bg_flush_arg : bg_flush_args) {
       ^~~~~~~~~~~~~~~~~~~~~~~~~
                  &
db/db_impl/db_impl_compaction_flush.cc:2911:21: error: loop variable 'bg_flush_arg' of type 'const rocksdb::DBImpl::BGFlushArg' creates a copy from type
      'const rocksdb::DBImpl::BGFlushArg' [-Werror,-Wrange-loop-analysis]
    for (const auto bg_flush_arg : bg_flush_args) {
                    ^
db/db_impl/db_impl_compaction_flush.cc:2911:10: note: use reference type 'const rocksdb::DBImpl::BGFlushArg &' to prevent copying
    for (const auto bg_flush_arg : bg_flush_args) {
         ^~~~~~~~~~~~~~~~~~~~~~~~~
                    &
```
from

```sh
xxx@MacBook-Pro / % g++ -v
Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk/usr/include/c++/4.2.1
Apple clang version 12.0.0 (clang-1200.0.32.29)
Target: x86_64-apple-darwin21.6.0
Thread model: posix
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11240

Reviewed By: cbi42

Differential Revision: D43458729

Pulled By: ajkr

fbshipit-source-id: 26e110f83451509463a1bc308f737ccb693c9f45
2023-02-22 05:44:03 -08:00
Andrew Kryczka 286080456c Update HISTORY.md and version.h for 8.0 release (#11238)
Summary:
8.0.fb branch is cut so changes going forward will be part of 8.1. Updated version.h and HISTORY.md accordingly

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11238

Reviewed By: cbi42

Differential Revision: D43428345

Pulled By: ajkr

fbshipit-source-id: d344b6e504c81a85563ae9d3705b11c533b1cd43
2023-02-21 15:21:33 -08:00
anand76 476b01579c Revert enabling IO uring in db_stress (#11242)
Summary:
IO uring usage is causing crash test failures due to bad cqe data being returned in the uring. Revert the change to enable IO uring in db_stress, and also re-enable async_io in CircleCI so that code path can be tested. Added the -use_io_uring flag to db_stress that, when false, will wrap the default env in db_stress to emulate async IO.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11242

Reviewed By: akankshamahajan15

Differential Revision: D43470569

Pulled By: anand1976

fbshipit-source-id: 7c69ac3f53a79ade31d37313f815f1a4b6108b75
2023-02-21 12:53:55 -08:00
Changyu Bi 1b48ecc2c6 Fix an assertion failure in DBIter::SeekToLast() when user-defined timestamp is enabled (#11223)
Summary:
in DBIter::SeekToLast(), key() can be called when iter is invalid and fails the following assertion:
```
./db/db_iter.h:153: virtual rocksdb::Slice rocksdb::DBIter::key() const: Assertion `valid_' failed.
```
This happens when `iterate_upper_bound` and timestamp_lb_ are set. SeekForPrev(*iterate_upper_bound_) positions the iterator on the same user key as *iterate_upper_bound_. A subsequent PrevInternal() call makes the iterator invalid just be the call to key().

This PR fixes this issue by setting updating the seek key to have max sequence number AND max timestamp when the seek key has the same user key as *iterate_upper_bound_.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11223

Test Plan: - Added a unit test that would fail the above assertion before this fix.

Reviewed By: jowlyzhang

Differential Revision: D43283600

Pulled By: cbi42

fbshipit-source-id: 0dd3999845b722584679bbc95be2664b266005ba
2023-02-21 11:57:58 -08:00
leipeng ea85148b78 DBIter::FindNextUserEntryInternal: do not PrepareValue for Delete (#11211)
Summary:
`kTypeDeletion/kTypeDeletionWithTimestamp/kTypeSingleDeletion` does not need access iter value, so omit `PrepareValue`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11211

Reviewed By: ajkr

Differential Revision: D43253068

Pulled By: cbi42

fbshipit-source-id: 1945c7f8a90b6909128a0553b62d9fd1078b0a08
2023-02-21 11:26:30 -08:00
Changyu Bi ebfca2cf00 Fix comment for option periodic_compaction_seconds (#11227)
Summary:
the comment for option `periodic_compaction_seconds` only mentions support for Leveled and FIFO compaction, while the implementation supports all compaction styles after https://github.com/facebook/rocksdb/issues/5970. This PR updates comment to reflect this.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11227

Reviewed By: ajkr

Differential Revision: D43325046

Pulled By: cbi42

fbshipit-source-id: 2364dcb5a01cd098ad52c818fe10d621445e2188
2023-02-21 11:12:22 -08:00
HuangYi 83bc03a99a add c api to set option fail_if_not_bottommost_level (#11158)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/11158

Reviewed By: cbi42

Differential Revision: D42870647

Pulled By: ajkr

fbshipit-source-id: 1b71a1dd415c34c332cecf60c68ce37fe4393e2a
2023-02-21 10:52:09 -08:00
HuangYi cfe50f7e77 add c api for HyperClockCache (#11110)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/11110

Reviewed By: cbi42

Differential Revision: D42660941

Pulled By: ajkr

fbshipit-source-id: e977d9b76dfd5d8c62335f961c275f3b810503d7
2023-02-21 10:00:43 -08:00
Matt Jurik 142b18d00b C-API: Support multi-CF flush (#11112)
Summary:
This PR adds support to the c-api bindings for calling `Flush()` with multiple column families, which is useful for performing atomic flushes (assuming also that the db has been opened with `atomic_flush = true`).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11112

Reviewed By: cbi42

Differential Revision: D42666382

Pulled By: ajkr

fbshipit-source-id: 82f05bf32d28452d85c79ea42411c8fea961fd87
2023-02-21 09:10:03 -08:00
Andrew Kryczka fcd816d534 Add missing override keyword in env_win.h functions (#11232)
Summary:
I couldn't figure out why this causes failures in our 8.0 release to fbcode while this issue appears to not be new in 8.0. Anyways, we can add the missing `override` keywords to these functions as the compiler insists.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11232

Reviewed By: pdillinger

Differential Revision: D43420656

Pulled By: ajkr

fbshipit-source-id: da748eeef6ba38dd113dbe4b5143d7558daf38dd
2023-02-18 17:30:15 -08:00
Alan Paxton d47126875b Fix Java API ComparatorOptions use after delete error (#11176)
Summary:
The problem
-------------
ComparatorOptions is AutoCloseable.

AbstractComparator does not hold a reference to its ComparatorOptions, but the native C++ ComparatorJniCallback holds a reference to the ComparatorOptions’ native C++ options structure. This gets deleted when the ComparatorOptions is closed, either explicitly, or as part of try-with-resources.

Later, the deleted C++ options structure gets used by the callback and the comparator options are effectively random.

The original bug report https://github.com/facebook/rocksdb/issues/8715 was caused by a GC-initiated finalization closing the still-in-use ComparatorOptions. As of 7.0, finalization of RocksDB objects no longer closes them, which worked round the reported bug, but still left ComparatorOptions with a potentially broken lifetime.

In any case, we encourage API clients to use the try-with-resources model, and so we need it to work. And if they don't use it, they leak resources.

The solution
-------------
The solution implemented here is to make a copy of the native C++ options object into the ComparatorJniCallback, rather than a reference. Then the deletion of the native object held by ComparatorOptions is *correctly* deleted when its scope is closed in try/finally.

Testing
-------
We added a regression unit test based on the original test for the reported ticket.

This checkin closes https://github.com/facebook/rocksdb/issues/8715

We expect that there are more instances of "lifecycle" bugs in the Java API. They are a major source of support time/cost, and we note that they could be addressed as a whole using the model proposed/prototyped in https://github.com/facebook/rocksdb/pull/10736

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11176

Reviewed By: cbi42

Differential Revision: D43160885

Pulled By: pdillinger

fbshipit-source-id: 60b54215a02ad9abb17363319650328c00a9ad62
2023-02-17 13:03:41 -08:00
mrambacher b6640c3117 Remove FactoryFunc from LoadXXXObject (#11203)
Summary:
The primary purpose of the FactoryFunc was to support LITE mode where the ObjectRegistry was not available.  With the removal of LITE mode, the function was no longer required.

Note that the MergeOperator had some private classes defined in header files.  To gain access to their constructors (and name methods), the class definitions were moved into header files.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11203

Reviewed By: cbi42

Differential Revision: D43160255

Pulled By: pdillinger

fbshipit-source-id: f3a465fd5d1a7049b73ecf31e4b8c3762f6dae6c
2023-02-17 12:54:07 -08:00
Andrew Kryczka 25e1365227 Merge operator failed subcode (#11231)
Summary:
From HISTORY.md: Added a subcode of `Status::Corruption`, `Status::SubCode::kMergeOperatorFailed`, for users to identify corruption failures originating in the merge operator, as opposed to RocksDB's internally identified data corruptions.

This is a followup to https://github.com/facebook/rocksdb/issues/11092, where we gave users the ability to keep running a DB despite merge operator failing. Now that the DB keeps running despite such failures, they want to be able to distinguish such failures from real corruptions.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11231

Test Plan: updated unit test

Reviewed By: akankshamahajan15

Differential Revision: D43396607

Pulled By: ajkr

fbshipit-source-id: 17fbcc779ad724dafada8abd73efd38e1c5208b9
2023-02-17 10:58:46 -08:00
Andrew Kryczka 6aef1a05d6 Use CacheDependencies() at start of ApproximateKeyAnchors() (#11230)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/11230

Test Plan:
- setup command: `$ ./db_bench -benchmarks=fillrandom,compact -compression_type=none -num=1000000 -write_buffer_size=4194304 -target_file_size_base=4194304 -use_direct_io_for_flush_and_compaction=true -partition_index_and_filters=true -bloom_bits=10 -metadata_block_size=1024`
- measure small read count bucketed by size: `$ strace -fye pread64 ./db_bench.ctrl -use_existing_db=true -benchmarks=compact -compaction_readahead_size=4194304 -compression_type=none -num=1000000 -write_buffer_size=4194304 -target_file_size_base=4194304 -use_direct_io_for_flush_and_compaction=true -partition_index_and_filters=true -bloom_bits=10 -metadata_block_size=1024  -subcompactions=4 -cache_size=1048576000  2>&1 >/dev/null | awk '/= [0-9]+$/{print "[", int($NF / 1024), "KB,", int(1 + $NF / 1024), "KB)"}' | sort -n -k 2 | uniq -c | head -3`
- before:
```
   1119 [ 0 KB, 1 KB)
      1 [ 6 KB, 7 KB)
      2 [ 7 KB, 8 KB)
```
- after:
```
    242 [ 0 KB, 1 KB)
      1 [ 6 KB, 7 KB)
      2 [ 7 KB, 8 KB)
```

Reviewed By: pdillinger

Differential Revision: D43388507

Pulled By: ajkr

fbshipit-source-id: a02413c9f615b00784700646825a9870ee10f3a7
2023-02-17 09:03:37 -08:00
akankshamahajan 68e4581c67 Return NotSupported in scan if IOUring not supported and enable IOUring in db_stress for async_io testing (#11197)
Summary:
- Return NotSupported in scan if IOUring not supported if async_io is enabled
- Enable IOUring in db_stress for async_io testing
- Disable async_io in circleci crash testing as circleci doesn't support IOUring

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11197

Test Plan: CircleCI jobs

Reviewed By: anand1976

Differential Revision: D43096313

Pulled By: akankshamahajan15

fbshipit-source-id: c2c53a87636950c0243038b9f5bd0d91608e4fda
2023-02-16 18:33:06 -08:00
Peter Dillinger 64a1f7670f Customize CompressedSecondaryCache by block kind (#11204)
Summary:
Added `do_not_compress_roles` to `CompressedSecondaryCacheOptions` to disable compression on certain kinds of block. Filter blocks are now not compressed by CompressedSecondaryCache by default.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11204

Test Plan: unit test added

Reviewed By: anand1976

Differential Revision: D43147698

Pulled By: pdillinger

fbshipit-source-id: db496975ae975fa18f157f93fe131a16315ac875
2023-02-16 17:22:27 -08:00
Peter Dillinger 88056ea6cb Re-add memory_allocator.h include from cache.h (#11229)
Summary:
Enough users of NewJemallocNodumpAllocator() with cache.h to justify keeping it. (Reverting one little part of https://github.com/facebook/rocksdb/issues/11192)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11229

Test Plan: CI

Reviewed By: ajkr

Differential Revision: D43337140

Pulled By: pdillinger

fbshipit-source-id: 886b27b96b395619a4209f51b9b7787f4fe89e57
2023-02-16 08:07:45 -08:00
Levi Tamasi ab22e79824 Support using MultiGetEntity as verification method in stress tests (#11228)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/11228

Reviewed By: akankshamahajan15

Differential Revision: D43332120

Pulled By: ltamasi

fbshipit-source-id: 15f32cf335aecb7e654da24ecafc6e010dc65194
2023-02-15 17:08:25 -08:00