mirror of https://github.com/facebook/rocksdb.git
c77b50a4fd
Summary: Add support for tuning `readahead_size` through block cache lookup for async_io.

**Design / Implementation**

**BlockBasedTableIterator.cc**

The `BlockCacheLookupForReadAheadSize` callback looks up the block cache and tries to shrink the start and end offsets passed to it. It looks in the block cache for the blocks between `start_offset` and `end_offset` and adds all their handles to a queue. It then iterates over the handles from the end to find the first missing block and updates the end offset to that block. It also iterates from the start to find the first missing block and updates the start offset to that block.

```
_read_curr_block_ argument: True if this call was due to a miss in the cache and the caller wants to read that block synchronously. False if the current call is to prefetch additional data in extra buffers (due to a ReadAsync call in FilePrefetchBuffer).
```

If there is no data to read in that callback (because of upper_bound, or because all blocks are in the cache), it sets the start and end offsets to be equal, and `FilePrefetchBuffer` interprets that as a read of length 0.

**FilePrefetchBuffer.cc**

FilePrefetchBuffer calls the callback `ReadAheadSizeTuning` and passes it the start and end offsets to get the updated start and end offsets to read, based on cache hits/misses.

1. For Read calls (when the offset passed to FilePrefetchBuffer is a cache miss and that data needs to be read), _read_curr_block_ is passed as true.
2. For ReadAsync calls, when the buffer is fully consumed and additional prefetching can start, the start offset passed is the initial end offset of the previous buffer (without any adjustment for cache hits/misses).

For example, suppose the following are the data blocks with their cache hit/miss status and start offsets, the Read API found a miss on DB1, and based on readahead_size (50) it passes an end offset of 50.

[DB1 - miss - 0] [DB2 - hit - 10] [DB3 - miss - 20] [DB4 - miss - 30] [DB5 - hit - 40] [DB6 - hit - 50] [DB7 - miss - 60] [DB8 - miss - 70] [DB9 - hit - 80] [DB10 - hit - 90]

- For the Read call, the updated start offset remains 0, but the end offset is trimmed back to DB4, as DB5 is in the cache.
- The Read call saves the initial end offset 50, as that was meant to be prefetched.
- For the next ReadAsync call, the start offset will be 50 (the previous buffer's initial end offset) and, based on readahead_size, the end offset will be 100.
- Because of cache hits, the callback will update the start offset to 60 and the end offset to 80, so that only 2 data blocks (DB7 and DB8) are read.
- For that ReadAsync call, the initial end offset will be set to 100, which will in turn be used by the next ReadAsync call as its start offset.
- `initial_end_offset_` in `BufferInfo` is used to save the initial end offset of that buffer.
- If, say, DB5 and DB6 overlap across 2 buffers (because of alignment), `prev_buf_end_offset` is passed to make sure already prefetched data is not prefetched again in the second buffer.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11936

Test Plan:
- Ran crash_test several times.
- New unit tests added.

Reviewed By: anand1976

Differential Revision: D50906217

Pulled By: akankshamahajan15

fbshipit-source-id: 0d75d3c98274e98aa34901b201b8fb05232139cf
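
The offset trimming described in the commit message can be illustrated with a small standalone sketch. This is not the RocksDB implementation: `Block`, `ExampleBlocks`, and `TuneReadRange` are hypothetical names introduced here, each block is assumed to be exactly 10 bytes long, and the cache state is a plain boolean rather than a real block cache lookup. It only reproduces the arithmetic of the DB1..DB10 example.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a data block: file offset, size, and cache state.
struct Block {
  uint64_t offset;
  uint64_t size;
  bool in_cache;
};

// The ten blocks from the commit message example (assumed size: 10 each).
std::vector<Block> ExampleBlocks() {
  const bool hit[10] = {false, true, false, false, true,
                        true,  false, false, true, true};
  std::vector<Block> blocks;
  for (uint64_t i = 0; i < 10; ++i) {
    blocks.push_back(Block{i * 10, 10, hit[i]});
  }
  return blocks;
}

// Shrinks [start, end) so that it begins at the first cache miss and ends
// just past the last cache miss. When read_curr_block is true, the block at
// `start` is always kept, because the caller needs to read it synchronously.
// Returns start == end when nothing needs to be read.
std::pair<uint64_t, uint64_t> TuneReadRange(const std::vector<Block>& blocks,
                                            uint64_t start, uint64_t end,
                                            bool read_curr_block) {
  // Collect the blocks whose start offset falls inside [start, end).
  std::vector<const Block*> in_range;
  for (const Block& b : blocks) {
    if (b.offset >= start && b.offset < end) {
      in_range.push_back(&b);
    }
  }

  std::size_t first = 0;
  std::size_t last = in_range.size();  // one past the last block to read

  // From the front: skip cached blocks, unless the current block must be read.
  if (!read_curr_block) {
    while (first < last && in_range[first]->in_cache) {
      ++first;
    }
  }

  // From the back: drop cached blocks, never dropping a forced current block.
  const std::size_t floor =
      (read_curr_block && !in_range.empty()) ? first + 1 : first;
  while (last > floor && in_range[last - 1]->in_cache) {
    --last;
  }

  if (first == last) {
    return {start, start};  // zero-length read
  }
  const Block* tail = in_range[last - 1];
  return {in_range[first]->offset, tail->offset + tail->size};
}
```

With the example blocks, `TuneReadRange(blocks, 0, 50, true)` yields the range 0 to 40 (DB1 through DB4), and `TuneReadRange(blocks, 50, 100, false)` yields 60 to 80 (DB7 and DB8). In the scheme the commit describes, the next ReadAsync call would still start from the untrimmed end offset 100, not from 80.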
||
---|---|---|
adaptive | ||
block_based | ||
cuckoo | ||
plain | ||
block_fetcher.cc | ||
block_fetcher.h | ||
block_fetcher_test.cc | ||
cleanable_test.cc | ||
compaction_merging_iterator.cc | ||
compaction_merging_iterator.h | ||
format.cc | ||
format.h | ||
get_context.cc | ||
get_context.h | ||
internal_iterator.h | ||
iter_heap.h | ||
iterator.cc | ||
iterator_wrapper.h | ||
merger_test.cc | ||
merging_iterator.cc | ||
merging_iterator.h | ||
meta_blocks.cc | ||
meta_blocks.h | ||
mock_table.cc | ||
mock_table.h | ||
multiget_context.h | ||
persistent_cache_helper.cc | ||
persistent_cache_helper.h | ||
persistent_cache_options.h | ||
scoped_arena_iterator.h | ||
sst_file_dumper.cc | ||
sst_file_dumper.h | ||
sst_file_reader.cc | ||
sst_file_reader_test.cc | ||
sst_file_writer.cc | ||
sst_file_writer_collectors.h | ||
table_builder.h | ||
table_factory.cc | ||
table_properties.cc | ||
table_properties_internal.h | ||
table_reader.h | ||
table_reader_bench.cc | ||
table_test.cc | ||
two_level_iterator.cc | ||
two_level_iterator.h | ||
unique_id.cc | ||
unique_id_impl.h |