rocksdb/unreleased_history
Hui Xiao 58fc9d61b7 Trim readahead size based on prefix during prefix scan (#13040)
Summary:
**Context/Summary:**
During prefix scan, prefetched data blocks containing keys not in the same prefix as the `Seek()`'s key will be wasted when `ReadOptions::prefix_same_as_start = true` since they won't be returned to the user. This PR is to exclude those data blocks from being prefetched in a similar manner like trimming according to `ReadOptions::iterate_upper_bound`.

Bonus: refactoring to some existing prefetch test so they are easier to extend and read

Pull Request resolved: https://github.com/facebook/rocksdb/pull/13040

Test Plan:
- New UT, integration to existing UTs
- Benchmark to ensure no regression from CPU due to more trimming logic
```
// Build DB with one sorted run under the same prefix
./db_bench --benchmarks=fillrandom --prefix_size=3 --keys_per_prefix=5000000 --num=5000000 --db=/dev/shm/db_bench --disable_auto_compactions=1
```
```
// Augment the existing db bench to call `Seek()` instead of `SeekToFirst()` in `void ReadSequential(){..}` to trigger the logic in this PR

+++ b/tools/db_bench_tool.cc
@@ -5900,7 +5900,12 @@ class Benchmark {
     Iterator* iter = db->NewIterator(options);
     int64_t i = 0;
     int64_t bytes = 0;
-    for (iter->SeekToFirst(); i < reads_ && iter->Valid(); iter->Next()) {
+
+    iter->SeekToFirst();
+    assert(iter->status().ok() && iter->Valid());
+    auto prefix = prefix_extractor_->Transform(iter->key());
+
+    for (iter->Seek(prefix); i < reads_ && iter->Valid(); iter->Next()) {
       bytes += iter->key().size() + iter->value().size();
       thread->stats.FinishedOps(nullptr, db, 1, kRead);
       ++i;
:
```
```
// Compare prefix scan performance
./db_bench --benchmarks=readseq[-X20] --prefix_size=3  --prefix_same_as_start=1 --auto_readahead_size=1 --cache_size=1 --use_existing_db=1 --db=/dev/shm/db_bench --disable_auto_compactions=1

// Before PR
readseq [AVG    20 runs] : 2449011 (± 50238) ops/sec;  270.9 (± 5.6) MB/sec
readseq [MEDIAN 20 runs] : 2499167 ops/sec;  276.5 MB/sec

// After PR  (regress 0.4 %)
readseq [AVG    20 runs] : 2439098 (± 42931) ops/sec;  269.8 (± 4.7) MB/sec
readseq [MEDIAN 20 runs] : 2460859 ops/sec;  272.2 MB/sec

```

- Stress test: randomly set `prefix_same_as_start` in `TestPrefixScan()`. Run below for a while
```
python3 tools/db_crashtest.py --simple blackbox --prefix_size=5 --prefixpercent=65 --WAL_size_limit_MB=1 --WAL_ttl_seconds=0 --acquire_snapshot_one_in=10000 --adm_policy=1 --advise_random_on_open=1 --allow_data_in_errors=True --allow_fallocate=0 --async_io=0 --avoid_flush_during_recovery=1 --avoid_flush_during_shutdown=1 --avoid_unnecessary_blocking_io=0 --backup_max_size=104857600 --backup_one_in=1000 --batch_protection_bytes_per_key=8 --bgerror_resume_retry_interval=100 --block_align=1 --block_protection_bytes_per_key=4 --block_size=16384 --bloom_before_level=-1 --bloom_bits=3 --bottommost_compression_type=none --bottommost_file_compaction_delay=3600 --bytes_per_sync=262144 --cache_index_and_filter_blocks=0 --cache_index_and_filter_blocks_with_high_priority=0 --cache_size=33554432 --cache_type=lru_cache --charge_compression_dictionary_building_buffer=1 --charge_file_metadata=0 --charge_filter_construction=0 --charge_table_reader=0 --check_multiget_consistency=0 --check_multiget_entity_consistency=0 --checkpoint_one_in=10000 --checksum_type=kxxHash --clear_column_family_one_in=0 --compact_files_one_in=1000 --compact_range_one_in=1000 --compaction_pri=4 --compaction_readahead_size=0 --compaction_style=2 --compaction_ttl=0 --compress_format_version=2 --compressed_secondary_cache_size=16777216 --compression_checksum=0 --compression_max_dict_buffer_bytes=0 --compression_max_dict_bytes=0 --compression_parallel_threads=1 --compression_type=none --compression_use_zstd_dict_trainer=1 --compression_zstd_max_train_bytes=0 --continuous_verification_interval=0 --daily_offpeak_time_utc= --data_block_index_type=0  --db_write_buffer_size=8388608 --decouple_partitioned_filters=1 --default_temperature=kCold --default_write_temperature=kWarm --delete_obsolete_files_period_micros=21600000000 --delpercent=4 --delrangepercent=1 --destroy_db_initially=1 --detect_filter_construct_corruption=0 --disable_file_deletions_one_in=10000 --disable_manual_compaction_one_in=1000000 --disable_wal=0 --dump_malloc_stats=1 --enable_checksum_handoff=0 --enable_compaction_filter=0 --enable_custom_split_merge=1 --enable_do_not_compress_roles=1 --enable_index_compression=1 --enable_memtable_insert_with_hint_prefix_extractor=0 --enable_pipelined_write=1 --enable_sst_partitioner_factory=0 --enable_thread_tracking=1 --enable_write_thread_adaptive_yield=0 --error_recovery_with_no_fault_injection=0 --exclude_wal_from_write_fault_injection=1 --fail_if_options_file_error=0 --fifo_allow_compaction=1 --file_checksum_impl=big --fill_cache=0 --flush_one_in=1000000 --format_version=6 --get_all_column_family_metadata_one_in=10000 --get_current_wal_file_one_in=0 --get_live_files_apis_one_in=10000 --get_properties_of_all_tables_one_in=1000000 --get_property_one_in=1000000 --get_sorted_wal_files_one_in=0 --hard_pending_compaction_bytes_limit=274877906944 --high_pri_pool_ratio=0 --index_block_restart_interval=15 --index_shortening=2 --index_type=2 --ingest_external_file_one_in=0 --inplace_update_support=0 --iterpercent=10 --key_len_percent_dist=1,30,69 --key_may_exist_one_in=100000 --last_level_temperature=kUnknown --level_compaction_dynamic_level_bytes=0 --lock_wal_one_in=1000000 --log2_keys_per_lock=10 --log_file_time_to_roll=0 --log_readahead_size=0 --long_running_snapshots=0 --low_pri_pool_ratio=0.5 --lowest_used_cache_tier=2 --manifest_preallocation_size=5120 --manual_wal_flush_one_in=1000 --mark_for_compaction_one_file_in=10 --max_background_compactions=20 --max_bytes_for_level_base=10485760 --max_key=1000 --max_key_len=3 --max_log_file_size=1048576 --max_manifest_file_size=1073741824 --max_sequential_skip_in_iterations=8 --max_total_wal_size=0 --max_write_batch_group_size_bytes=16777216 --max_write_buffer_number=10 --max_write_buffer_size_to_maintain=4194304 --memtable_insert_hint_per_batch=1 --memtable_max_range_deletions=0 --memtable_prefix_bloom_size_ratio=0.01 --memtable_protection_bytes_per_key=2 --memtable_whole_key_filtering=1 --memtablerep=skip_list --metadata_charge_policy=1 --metadata_read_fault_one_in=32 --metadata_write_fault_one_in=0 --min_write_buffer_number_to_merge=2 --mmap_read=0 --mock_direct_io=True --nooverwritepercent=1 --open_files=-1 --open_metadata_read_fault_one_in=0 --open_metadata_write_fault_one_in=8 --open_read_fault_one_in=0 --open_write_fault_one_in=0 --ops_per_thread=40000 --optimize_filters_for_hits=0 --optimize_filters_for_memory=1 --optimize_multiget_for_io=1 --paranoid_file_checks=1 --paranoid_memory_checks=0 --partition_filters=0 --partition_pinning=3 --pause_background_one_in=10000 --periodic_compaction_seconds=0 --prepopulate_block_cache=1 --preserve_internal_time_seconds=0 --progress_reports=0 --promote_l0_one_in=0 --read_amp_bytes_per_bit=0 --read_fault_one_in=32  --readpercent=10 --recycle_log_file_num=0 --reopen=0 --report_bg_io_stats=0 --reset_stats_one_in=1000000 --sample_for_compression=5 --secondary_cache_fault_one_in=0 --secondary_cache_uri= --skip_stats_update_on_db_open=0 --snapshot_hold_ops=100000 --soft_pending_compaction_bytes_limit=1048576 --sqfc_name=foo --sqfc_version=2 --sst_file_manager_bytes_per_sec=104857600 --sst_file_manager_bytes_per_truncate=1048576 --stats_dump_period_sec=10 --stats_history_buffer_size=1048576 --strict_bytes_per_sync=1 --subcompactions=4 --sync=0 --sync_fault_injection=1 --table_cache_numshardbits=-1 --target_file_size_base=524288 --target_file_size_multiplier=2 --test_batches_snapshots=0 --top_level_index_pinning=0 --uncache_aggressiveness=1 --universal_max_read_amp=10 --unpartitioned_pinning=0 --use_adaptive_mutex=1 --use_adaptive_mutex_lru=1 --use_attribute_group=0 --use_delta_encoding=0 --use_direct_io_for_flush_and_compaction=0 --use_direct_reads=1 --use_full_merge_v1=1 --use_get_entity=0 --use_merge=1 --use_multi_cf_iterator=1 --use_multi_get_entity=0 --use_multiget=1 --use_put_entity_one_in=0 --use_sqfc_for_range_queries=1 --use_timed_put_one_in=0 --use_write_buffer_manager=1 --user_timestamp_size=0 --value_size_mult=32 --verification_only=0 --verify_checksum=1 --verify_checksum_one_in=1000 --verify_compression=0 --verify_db_one_in=100000 --verify_file_checksums_one_in=1000 --verify_iterator_with_expected_state_one_in=5 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=0 --wal_compression=none --write_buffer_size=1048576 --write_dbid_to_manifest=1 --write_fault_one_in=1000 --writepercent=10
```

Reviewed By: anand1976

Differential Revision: D64367065

Pulled By: hx235

fbshipit-source-id: 5750c05ccc835c3e9dc81c961b76deaf30bd23c2
2024-10-17 15:52:55 -07:00
..
behavior_changes Trim readahead size based on prefix during prefix scan (#13040) 2024-10-17 15:52:55 -07:00
bug_fixes Re-implement GetApproximateMemTableStats for skip lists (#13047) 2024-10-02 14:25:50 -07:00
new_features Add an ingestion option to not fill block cache (#13067) 2024-10-16 14:11:22 -07:00
performance_improvements Update history and version for 9.5.fb release (#12880) 2024-07-22 13:15:09 -07:00
public_api_changes Update main branch for 9.6 release (#12945) 2024-08-19 17:36:23 -07:00
add.sh
README.txt
release.sh

Adding release notes
--------------------

When adding release notes for the next release, add a file to one of these
directories:

unreleased_history/new_features
unreleased_history/behavior_changes
unreleased_history/public_api_changes
unreleased_history/bug_fixes

with a unique name that makes sense for your change, preferably using the .md
extension for syntax highlighting.

There is a script to help, as in

$ unreleased_history/add.sh unreleased_history/bug_fixes/crash_in_feature.md

or simply

$ unreleased_history/add.sh

will take you through some prompts.

The file should usually contain one line of markdown, and "* " is not
required, as it will automatically be inserted later if not included at the
start of the first line in the file. Extra newlines or missing trailing
newlines will also be corrected.

The only times release notes should be added directly to HISTORY are if
* A release is being amended or corrected after it is already "cut" but not
tagged, which should be rare.
* A single commit contains a noteworthy change and a patch release version bump


Ordering of entries
-------------------

Within each group, entries will be included using ls sort order, so important
entries could start their file name with a small three digit number like
100pretty_important.md.

The ordering of groups such as new_features vs. public_api_changes is
hard-coded in unreleased_history/release.sh


Updating HISTORY.md with release notes
--------------------------------------

The script unreleased_history/release.sh does this. Run the script before
updating version.h to the next development release, so that the script will pick
up the version being released. You might want to start with

$ DRY_RUN=1 unreleased_history/release.sh | less

to check for problems and preview the output. Then run

$ unreleased_history/release.sh

which will git rm some files and modify HISTORY.md. You still need to commit the
changes, or revert with the command reported in the output.


Why not update HISTORY.md directly?
-----------------------------------

First, it was common to hit unnecessary merge conflicts when adding entries to
HISTORY.md, which slowed development. Second, when a PR was opened before a
release cut and landed after the release cut, it was easy to add the HISTORY
entry to the wrong version's history. This new setup completely fixes both of
those issues, with perhaps slightly more initial work to create each entry.
There is also now an extra step in using `git blame` to map a release note
to its source code implementation, but that is a relatively rare operation.