rocksdb/table/block_based
Hui Xiao 3093d98c78 Fix higher read qps during db open caused by pr 11406 (#11516)
Summary:
**Context:**
[PR11406](https://github.com/facebook/rocksdb/pull/11406/) caused more frequent reads during db open when reading files with no `tail_size` in the manifest, as happens when upgrading to 11406. This is because that PR introduced:
- a [smaller](https://github.com/facebook/rocksdb/pull/11406/files#diff-57ed8c49db2bdd4db7618646a177397674bbf25beacacecb104070071d30129fR833) prefetch tail buffer size compared to pre-11406 for small files (< 52 MB) when `tail_prefetch_stats` infers the tail size to be 0 (this usually happens early on, when the stats do not yet have much historical data to infer from)
- more reads (up to the number of filter/index partitions) when such a small prefetch tail buffer does not contain all the filter/index partitions needed in `CacheDependencies()`, since the [fallback logic](https://github.com/facebook/rocksdb/pull/11406/files#diff-d98f1a83de24412ad7f3527725dae7e28851c7222622c3cdb832d3cdf24bbf9fR165-R179) that prefetches all partitions at once is [skipped](url) whenever such a small prefetch tail buffer is passed in

**Summary:**
- Revert the fallback prefetch buffer size change in `BlockBasedTable::PrefetchTail()` to fully preserve the existing behavior during upgrade
- Use the passed-in prefetch tail buffer in `CacheDependencies()` only if it has a smaller offset than the offset of the first filter/index partition, that is, only if it is at least as good as the existing prefetching behavior
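
The second point can be illustrated with a minimal sketch. This is not the code from the PR: `TailBufferSketch` and `CanReuseTailBuffer` are hypothetical names introduced only for illustration, and treating an equal offset as sufficient is an assumption. The idea is that the passed-in tail buffer is reused only when it starts at or before the first partition; otherwise the caller falls back to prefetching all partitions at once.

```cpp
#include <cstdint>

// Hypothetical stand-in for a prefetch buffer (not a RocksDB type).
// It records only whether the buffer is usable and where it starts.
struct TailBufferSketch {
  bool enabled;           // false if nothing was prefetched into the buffer
  uint64_t start_offset;  // file offset of the first buffered byte
};

// Returns true when the passed-in tail buffer may be reused, i.e. it starts
// at or before the first filter/index partition. Returning false means the
// caller should fall back to prefetching all partitions in one read.
// Assumption: an equal offset counts as covering the first partition.
bool CanReuseTailBuffer(const TailBufferSketch* tail,
                        uint64_t first_partition_offset) {
  if (tail == nullptr || !tail->enabled) {
    return false;
  }
  return tail->start_offset <= first_partition_offset;
}
```

For example, with the first partition at offset 4096, a buffer starting at offset 1000 would be reused, while a buffer starting at offset 5000 would trigger the fallback.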

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11516

Test Plan:
- db bench

Create db with small files prior to PR 11406
```
./db_bench -db=/tmp/testdb/ --partition_index_and_filters=1 --statistics=1 -benchmarks=fillseq -key_size=3200 -value_size=5 -num=1000000 -write_buffer_size=6550000 -disable_auto_compactions=true -compression_type=zstd
```
Read the db to see whether post-PR has a lower read qps (i.e., the `rocksdb.file.read.db.open.micros` count) during db open.
```
./db_bench -use_direct_reads=1 --file_opening_threads=1 --threads=1 --use_existing_db=1 --seed=1682546046158958 --partition_index_and_filters=1 --statistics=1 --db=/tmp/testdb/ --benchmarks=readrandom --key_size=3200 --value_size=5 --num=100 --disable_auto_compactions=true --compression_type=zstd
```
Pre-PR:
```
rocksdb.file.read.db.open.micros P50 : 3.399023 P95 : 5.924468 P99 : 12.408333 P100 : 29.000000 COUNT : 611 SUM : 2539
```

Post-PR:
```
rocksdb.file.read.db.open.micros P50 : 593.736842 P95 : 861.605263 P99 : 1212.868421 P100 : 2663.000000 COUNT : 585 SUM : 345349
```

_Note: To control the starting offset of the prefetch tail buffer more easily, I manually overrode the following to eliminate the effect of alignment._
```
class PosixRandomAccessFile : public FSRandomAccessFile {
virtual size_t GetRequiredBufferAlignment() const override {
-    return logical_sector_size_;
+    return 1;
  }
 ```

- CI

Reviewed By: pdillinger

Differential Revision: D46472566

Pulled By: hx235

fbshipit-source-id: 2fe14ac8d489d15b0e08e6f8fe4f46d5f110978e
2023-06-06 17:42:43 -07:00
..
binary_search_index_reader.cc Add support to strip / pad timestamp when creating / reading a block based table (#11495) 2023-06-01 11:10:03 -07:00
binary_search_index_reader.h Extend Get/MultiGet deadline support to table open (#6982) 2020-06-29 14:53:17 -07:00
block.cc Add support to strip / pad timestamp when creating / reading a block based table (#11495) 2023-06-01 11:10:03 -07:00
block.h Add support to strip / pad timestamp when writing / reading a block (#11472) 2023-05-25 15:41:32 -07:00
block_based_table_builder.cc Add support to strip / pad timestamp when creating / reading a block based table (#11495) 2023-06-01 11:10:03 -07:00
block_based_table_builder.h Record and use the tail size to prefetch table tail (#11406) 2023-05-08 13:14:28 -07:00
block_based_table_factory.cc Add support to strip / pad timestamp when creating / reading a block based table (#11495) 2023-06-01 11:10:03 -07:00
block_based_table_factory.h Record and use the tail size to prefetch table tail (#11406) 2023-05-08 13:14:28 -07:00
block_based_table_iterator.cc Much better stats for seeks and prefix filtering (#11460) 2023-05-19 15:25:49 -07:00
block_based_table_iterator.h Much better stats for seeks and prefix filtering (#11460) 2023-05-19 15:25:49 -07:00
block_based_table_reader.cc Fix higher read qps during db open caused by pr 11406 (#11516) 2023-06-06 17:42:43 -07:00
block_based_table_reader.h Add support to strip / pad timestamp when creating / reading a block based table (#11495) 2023-06-01 11:10:03 -07:00
block_based_table_reader_impl.h Group rocksdb.sst.read.micros stat by IOActivity flush and compaction (#11288) 2023-04-21 09:07:18 -07:00
block_based_table_reader_sync_and_async.h Much better stats for seeks and prefix filtering (#11460) 2023-05-19 15:25:49 -07:00
block_based_table_reader_test.cc Add support to strip / pad timestamp when creating / reading a block based table (#11495) 2023-06-01 11:10:03 -07:00
block_builder.cc Add support to strip / pad timestamp when writing / reading a block (#11472) 2023-05-25 15:41:32 -07:00
block_builder.h Add support to strip / pad timestamp when writing / reading a block (#11472) 2023-05-25 15:41:32 -07:00
block_cache.cc Block per key-value checksum (#11287) 2023-04-25 12:08:23 -07:00
block_cache.h Block per key-value checksum (#11287) 2023-04-25 12:08:23 -07:00
block_prefetcher.cc Fix stress test failure for async_io (#10660) 2022-09-12 14:48:06 -07:00
block_prefetcher.h Provide support for direct_reads with async_io (#10197) 2022-07-06 11:42:59 -07:00
block_prefix_index.cc Fix bug with kHashSearch and changing prefix_extractor with SetOptions (#10128) 2022-06-10 08:51:45 -07:00
block_prefix_index.h Fix bug with kHashSearch and changing prefix_extractor with SetOptions (#10128) 2022-06-10 08:51:45 -07:00
block_test.cc Add support to strip / pad timestamp when writing / reading a block (#11472) 2023-05-25 15:41:32 -07:00
block_type.h Remove deprecated block-based filter (#10184) 2022-06-16 15:51:33 -07:00
cachable_entry.h HyperClockCache support for SecondaryCache, with refactoring (#11301) 2023-03-17 20:23:49 -07:00
data_block_footer.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
data_block_footer.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
data_block_hash_index.cc Format files under table/ by clang-format (#10852) 2022-10-25 11:50:38 -07:00
data_block_hash_index.h Fix build with gcc 13 by including <cstdint> (#11118) 2023-01-25 14:30:32 -08:00
data_block_hash_index_test.cc Block per key-value checksum (#11287) 2023-04-25 12:08:23 -07:00
filter_block.h Record and use the tail size to prefetch table tail (#11406) 2023-05-08 13:14:28 -07:00
filter_block_reader_common.cc Use user-provided ReadOptions for metadata block reads more often (#11208) 2023-04-04 16:53:14 -07:00
filter_block_reader_common.h Use user-provided ReadOptions for metadata block reads more often (#11208) 2023-04-04 16:53:14 -07:00
filter_policy.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
filter_policy_internal.h Remove deprecated block-based filter (#10184) 2022-06-16 15:51:33 -07:00
flush_block_policy.cc Change internal headers with duplicate names (#11408) 2023-05-17 11:27:09 -07:00
flush_block_policy_impl.h Change internal headers with duplicate names (#11408) 2023-05-17 11:27:09 -07:00
full_filter_block.cc Use user-provided ReadOptions for metadata block reads more often (#11208) 2023-04-04 16:53:14 -07:00
full_filter_block.h Use user-provided ReadOptions for metadata block reads more often (#11208) 2023-04-04 16:53:14 -07:00
full_filter_block_test.cc Use user-provided ReadOptions for metadata block reads more often (#11208) 2023-04-04 16:53:14 -07:00
hash_index_reader.cc Add support to strip / pad timestamp when creating / reading a block based table (#11495) 2023-06-01 11:10:03 -07:00
hash_index_reader.h Extend Get/MultiGet deadline support to table open (#6982) 2020-06-29 14:53:17 -07:00
index_builder.cc Add support to strip / pad timestamp when creating / reading a block based table (#11495) 2023-06-01 11:10:03 -07:00
index_builder.h Add support to strip / pad timestamp when creating / reading a block based table (#11495) 2023-06-01 11:10:03 -07:00
index_reader_common.cc Use user-provided ReadOptions for metadata block reads more often (#11208) 2023-04-04 16:53:14 -07:00
index_reader_common.h Add support to strip / pad timestamp when creating / reading a block based table (#11495) 2023-06-01 11:10:03 -07:00
mock_block_based_table.h Remove deprecated block-based filter (#10184) 2022-06-16 15:51:33 -07:00
parsed_full_filter_block.cc Hide FilterBits{Builder,Reader} from public API (#9592) 2022-02-17 16:34:46 -08:00
parsed_full_filter_block.h Major Cache refactoring, CPU efficiency improvement (#10975) 2023-01-11 14:20:40 -08:00
partitioned_filter_block.cc Fix higher read qps during db open caused by pr 11406 (#11516) 2023-06-06 17:42:43 -07:00
partitioned_filter_block.h Add support to strip / pad timestamp when creating / reading a block based table (#11495) 2023-06-01 11:10:03 -07:00
partitioned_filter_block_test.cc Add support to strip / pad timestamp when creating / reading a block based table (#11495) 2023-06-01 11:10:03 -07:00
partitioned_index_iterator.cc Provide support for direct_reads with async_io (#10197) 2022-07-06 11:42:59 -07:00
partitioned_index_iterator.h Format files under table/ by clang-format (#10852) 2022-10-25 11:50:38 -07:00
partitioned_index_reader.cc Fix higher read qps during db open caused by pr 11406 (#11516) 2023-06-06 17:42:43 -07:00
partitioned_index_reader.h Record and use the tail size to prefetch table tail (#11406) 2023-05-08 13:14:28 -07:00
reader_common.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
reader_common.h Add block checksum mismatch ticker stat (#11438) 2023-05-12 18:16:11 -07:00
uncompression_dict_reader.cc Group rocksdb.sst.read.micros stat by IOActivity flush and compaction (#11288) 2023-04-21 09:07:18 -07:00
uncompression_dict_reader.h Group rocksdb.sst.read.micros stat by IOActivity flush and compaction (#11288) 2023-04-21 09:07:18 -07:00