mirror of
https://github.com/facebook/rocksdb.git
synced 2024-11-26 07:30:54 +00:00
3093d98c78
Summary: **Context:** [PR11406](https://github.com/facebook/rocksdb/pull/11406/) caused more frequent read during db open reading files with no `tail_size` in the manifest as part of the upgrade to 11406. This is due to that PR introduced - [smaller](https://github.com/facebook/rocksdb/pull/11406/files#diff-57ed8c49db2bdd4db7618646a177397674bbf25beacacecb104070071d30129fR833) prefetch tail buffer size compared to pre-11406 for small files (< 52 MB) when `tail_prefetch_stats` infers tail size to be 0 (usually happens when the stats does not have much historical data to infer early on) - more read (up to # of partitioned filter/index) when such small prefetch tail buffer does not contain all the partitioned filter/index needed in CacheDependencies() since the [fallback logic](https://github.com/facebook/rocksdb/pull/11406/files#diff-d98f1a83de24412ad7f3527725dae7e28851c7222622c3cdb832d3cdf24bbf9fR165-R179) that prefetches all partitions at once will be [skipped](url) when such a small prefetch tail buffer is passed in **Summary:** - Revert the fallback prefetch buffer size change to preserve existing behavior fully during upgrading in `BlockBasedTable::PrefetchTail()` - Use passed-in prefetch tail buffer in `CacheDependencies()` only if it has a smaller offset than the the offset of first partition filter/index, that is, at least as good as the existing prefetching behavior Pull Request resolved: https://github.com/facebook/rocksdb/pull/11516 Test Plan: - db bench Create db with small files prior to PR 11406 ``` ./db_bench -db=/tmp/testdb/ --partition_index_and_filters=1 --statistics=1 -benchmarks=fillseq -key_size=3200 -value_size=5 -num=1000000 -write_buffer_size=6550000 -disable_auto_compactions=true -compression_type=zstd` ``` Read db to see if post-pr has lower read qps (i.e, rocksdb.file.read.db.open.micros count) during db open. ``` ./db_bench -use_direct_reads=1 --file_opening_threads=1 --threads=1 --use_existing_db=1 --seed=1682546046158958 --partition_index_and_filters=1 --statistics=1 --db=/tmp/testdb/ --benchmarks=readrandom --key_size=3200 --value_size=5 --num=100 --disable_auto_compactions=true --compression_type=zstd ``` Pre-PR: ``` rocksdb.file.read.db.open.micros P50 : 3.399023 P95 : 5.924468 P99 : 12.408333 P100 : 29.000000 COUNT : 611 SUM : 2539 ``` Post-PR: ``` rocksdb.file.read.db.open.micros P50 : 593.736842 P95 : 861.605263 P99 : 1212.868421 P100 : 2663.000000 COUNT : 585 SUM : 345349 ``` _Note: To control the starting offset of the prefetch tail buffer easier, I manually override the following to eliminate the effect of alignment_ ``` class PosixRandomAccessFile : public FSRandomAccessFile { virtual size_t GetRequiredBufferAlignment() const override { - return logical_sector_size_; + return 1; } ``` - CI Reviewed By: pdillinger Differential Revision: D46472566 Pulled By: hx235 fbshipit-source-id: 2fe14ac8d489d15b0e08e6f8fe4f46d5f110978e |
||
---|---|---|
.. | ||
binary_search_index_reader.cc | ||
binary_search_index_reader.h | ||
block.cc | ||
block.h | ||
block_based_table_builder.cc | ||
block_based_table_builder.h | ||
block_based_table_factory.cc | ||
block_based_table_factory.h | ||
block_based_table_iterator.cc | ||
block_based_table_iterator.h | ||
block_based_table_reader.cc | ||
block_based_table_reader.h | ||
block_based_table_reader_impl.h | ||
block_based_table_reader_sync_and_async.h | ||
block_based_table_reader_test.cc | ||
block_builder.cc | ||
block_builder.h | ||
block_cache.cc | ||
block_cache.h | ||
block_prefetcher.cc | ||
block_prefetcher.h | ||
block_prefix_index.cc | ||
block_prefix_index.h | ||
block_test.cc | ||
block_type.h | ||
cachable_entry.h | ||
data_block_footer.cc | ||
data_block_footer.h | ||
data_block_hash_index.cc | ||
data_block_hash_index.h | ||
data_block_hash_index_test.cc | ||
filter_block.h | ||
filter_block_reader_common.cc | ||
filter_block_reader_common.h | ||
filter_policy.cc | ||
filter_policy_internal.h | ||
flush_block_policy.cc | ||
flush_block_policy_impl.h | ||
full_filter_block.cc | ||
full_filter_block.h | ||
full_filter_block_test.cc | ||
hash_index_reader.cc | ||
hash_index_reader.h | ||
index_builder.cc | ||
index_builder.h | ||
index_reader_common.cc | ||
index_reader_common.h | ||
mock_block_based_table.h | ||
parsed_full_filter_block.cc | ||
parsed_full_filter_block.h | ||
partitioned_filter_block.cc | ||
partitioned_filter_block.h | ||
partitioned_filter_block_test.cc | ||
partitioned_index_iterator.cc | ||
partitioned_index_iterator.h | ||
partitioned_index_reader.cc | ||
partitioned_index_reader.h | ||
reader_common.cc | ||
reader_common.h | ||
uncompression_dict_reader.cc | ||
uncompression_dict_reader.h |