rocksdb/java/rocksjni
Mike Kolupaev b4d7209428 Add an option to put first key of each sst block in the index (#5289)
Summary:
The first key is used to defer reading the data block until this file gets to the top of the merging iterator's heap. For short range scans, most files never make it to the top of the heap, so this change can substantially reduce read amplification for such workloads.

Consider the following workload. There are a few data streams (we'll be calling them "logs"), each stream consisting of a sequence of blobs (we'll be calling them "records"). Each record is identified by a log ID and a sequence number within the log. The RocksDB key is the concatenation of the log ID and the sequence number (big endian). Reads are mostly relatively short range scans, each within a single log. Writes are mostly sequential for each log, but writes to different logs are randomly interleaved. Compactions are disabled; instead, when we accumulate a few tens of sst files, we create a new column family and start writing to it.
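
The key layout described above can be sketched as follows. This is a minimal illustration, not code from the PR: the 8-byte width of both fields and the names AppendBigEndian64, DecodeBigEndian64 and MakeRecordKey are assumptions. It shows why the big-endian encoding matters: byte-wise comparison of the encoded keys matches the numeric order of (log ID, sequence number), so each log occupies one contiguous key range.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Appends a 64-bit value to dst, most significant byte first.
inline void AppendBigEndian64(std::string* dst, uint64_t v) {
  for (int shift = 56; shift >= 0; shift -= 8) {
    dst->push_back(static_cast<char>((v >> shift) & 0xff));
  }
}

// Reads back a 64-bit big-endian value starting at src[offset].
inline uint64_t DecodeBigEndian64(const std::string& src, size_t offset) {
  assert(offset + 8 <= src.size());
  uint64_t v = 0;
  for (size_t i = 0; i < 8; ++i) {
    v = (v << 8) | static_cast<unsigned char>(src[offset + i]);
  }
  return v;
}

// Hypothetical layout: 8-byte log ID followed by 8-byte sequence number.
// The field widths are an assumption; the commit message does not state them.
inline std::string MakeRecordKey(uint64_t log_id, uint64_t seq) {
  std::string key;
  key.reserve(16);
  AppendBigEndian64(&key, log_id);
  AppendBigEndian64(&key, seq);
  return key;
}
```

With this layout, a range scan over one log is a scan from MakeRecordKey(log_id, first_seq) up to, but not including, MakeRecordKey(log_id, last_seq + 1).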

So, a typical sst file consists of a few ranges of blocks, each range corresponding to one log ID (we use FlushBlockPolicy to cut blocks at log boundaries). A typical read would go like this. First, iterator Seek() reads one block from each sst file. Then a series of Next()s moves through one sst file (since writes to each log are mostly sequential) until the subiterator reaches the end of this log in this sst file; then Next() switches to the next sst file and reads sequentially from that, and so on. Often a range scan will only return records from a small number of blocks in a small number of sst files; in this case, the cost of the initial Seek() reading one block from each file may be bigger than the cost of reading the actually useful blocks.

Neither iterate_upper_bound nor bloom filters can prevent reading one block from each file in Seek(). But this PR can: if the index contains the first key of each block, we don't have to read the block until this block actually makes it to the top of the merging iterator's heap, so for short range scans we won't read any blocks from most of the sst files.
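
The effect can be illustrated with a toy model. The sketch below is not RocksDB code: ToyFile, LazyCursor and Scan are hypothetical names, blocks are plain vectors of strings, and a "block read" is just a counter increment. It only demonstrates the idea that a cursor can enter the merging heap with the first key taken from the index, and pay for the block read only when it reaches the top of the heap.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <vector>

// Toy sst file: each inner vector is one data block of sorted keys; blocks are
// sorted and non-overlapping. block.front() plays the role of the first key
// stored in the index, block.back() the role of the index separator.
using ToyFile = std::vector<std::vector<std::string>>;

// Cursor over one toy file that postpones "reading" a block when possible.
class LazyCursor {
 public:
  LazyCursor(const ToyFile* file, size_t* block_reads)
      : file_(file), block_reads_(block_reads) {
    for (const auto& b : *file_) assert(!b.empty());
  }

  void Seek(const std::string& target) {
    block_ = 0;
    while (Valid() && (*file_)[block_].back() < target) ++block_;
    pos_ = 0;
    loaded_ = false;
    if (Valid() && (*file_)[block_].front() < target) {
      // Target falls inside the block, so the block is needed right away.
      Load();
      const auto& b = (*file_)[block_];
      pos_ = static_cast<size_t>(
          std::lower_bound(b.begin(), b.end(), target) - b.begin());
    }
  }

  bool Valid() const { return block_ < file_->size(); }

  // While the block is unloaded, pos_ is 0 and this is the indexed first key.
  const std::string& key() const { return (*file_)[block_][pos_]; }

  // Simulates the deferred block read; counted at most once per block visit.
  void Load() {
    if (!loaded_) {
      ++*block_reads_;
      loaded_ = true;
    }
  }

  void Next() {
    Load();
    if (++pos_ == (*file_)[block_].size()) {
      ++block_;
      pos_ = 0;
      loaded_ = false;  // the next block is again known only by its first key
    }
  }

 private:
  const ToyFile* file_;
  size_t* block_reads_;
  size_t block_ = 0;
  size_t pos_ = 0;
  bool loaded_ = false;
};

// Returns all keys in [start, end) and reports how many blocks were "read".
std::vector<std::string> Scan(const std::vector<ToyFile>& files,
                              const std::string& start, const std::string& end,
                              size_t* block_reads) {
  *block_reads = 0;
  std::vector<LazyCursor> cursors;
  for (const auto& f : files) cursors.emplace_back(&f, block_reads);
  auto greater = [](const LazyCursor* a, const LazyCursor* b) {
    return a->key() > b->key();
  };
  std::priority_queue<LazyCursor*, std::vector<LazyCursor*>, decltype(greater)>
      heap(greater);
  for (auto& c : cursors) {
    c.Seek(start);
    if (c.Valid()) heap.push(&c);
  }
  std::vector<std::string> out;
  while (!heap.empty() && heap.top()->key() < end) {
    LazyCursor* c = heap.top();
    heap.pop();
    c->Load();  // only the cursor at the top of the heap pays for a read
    out.push_back(c->key());
    c->Next();
    if (c->Valid()) heap.push(c);
  }
  return out;
}
```

In this model, a scan that starts at or before a block's first key leaves that block unread unless its cursor reaches the top of the heap before the scan hits its end key. A scan that starts in the middle of a block still has to read that block during Seek().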

This PR does the deferred block loading inside the value() call. This is not ideal: there's no good way to report an IO error from inside value(). As discussed with siying offline, it would probably be better to change InternalIterator's interface to explicitly fetch the deferred value and get a status. I'll do it in a separate PR.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5289

Differential Revision: D15256423

Pulled By: al13n321

fbshipit-source-id: 750e4c39ce88e8d41662f701cf6275d9388ba46a
2019-06-24 20:54:04 -07:00
..
backupablejni.cc comment unused parameters to turn on -Wunused-parameter flag 2018-04-12 17:59:16 -07:00
backupenginejni.cc Support range deletion tombstones in IngestExternalFile SSTs (#3778) 2018-07-13 22:43:09 -07:00
cassandra_compactionfilterjni.cc comment unused parameters to turn on -Wunused-parameter flag 2018-04-12 17:59:16 -07:00
cassandra_value_operator.cc comment unused parameters to turn on -Wunused-parameter flag 2018-04-12 17:59:16 -07:00
checkpoint.cc comment unused parameters to turn on -Wunused-parameter flag 2018-04-12 17:59:16 -07:00
clock_cache.cc comment unused parameters to turn on -Wunused-parameter flag 2018-04-12 17:59:16 -07:00
columnfamilyhandle.cc comment unused parameters to turn on -Wunused-parameter flag 2018-04-12 17:59:16 -07:00
compact_range_options.cc Add CompactRangeOptions for Java (#4220) 2018-08-17 10:57:25 -07:00
compaction_filter.cc comment unused parameters to turn on -Wunused-parameter flag 2018-04-12 17:59:16 -07:00
compaction_filter_factory.cc Add missing functionality to RocksJava (#4833) 2019-02-22 14:46:46 -08:00
compaction_filter_factory_jnicallback.cc
compaction_filter_factory_jnicallback.h
compaction_job_info.cc Add missing functionality to RocksJava (#4833) 2019-02-22 14:46:46 -08:00
compaction_job_stats.cc Add missing functionality to RocksJava (#4833) 2019-02-22 14:46:46 -08:00
compaction_options.cc Add missing functionality to RocksJava (#4833) 2019-02-22 14:46:46 -08:00
compaction_options_fifo.cc Add missing functionality to RocksJava (#4833) 2019-02-22 14:46:46 -08:00
compaction_options_universal.cc Add missing functionality to RocksJava (#4833) 2019-02-22 14:46:46 -08:00
comparator.cc comment unused parameters to turn on -Wunused-parameter flag 2018-04-12 17:59:16 -07:00
comparatorjnicallback.cc
comparatorjnicallback.h
compression_options.cc Add missing functionality to RocksJava (#4833) 2019-02-22 14:46:46 -08:00
env.cc Add missing functionality to RocksJava (#4833) 2019-02-22 14:46:46 -08:00
env_options.cc Add missing functionality to RocksJava (#4833) 2019-02-22 14:46:46 -08:00
filter.cc comment unused parameters to turn on -Wunused-parameter flag 2018-04-12 17:59:16 -07:00
ingest_external_file_options.cc Add missing functionality to RocksJava (#4833) 2019-02-22 14:46:46 -08:00
iterator.cc comment unused parameters to turn on -Wunused-parameter flag 2018-04-12 17:59:16 -07:00
jnicallback.cc
jnicallback.h
loggerjnicallback.cc comment unused parameters to turn on -Wunused-parameter flag 2018-04-12 17:59:16 -07:00
loggerjnicallback.h
lru_cache.cc comment unused parameters to turn on -Wunused-parameter flag 2018-04-12 17:59:16 -07:00
memory_util.cc Add missing functionality to RocksJava (#4833) 2019-02-22 14:46:46 -08:00
memtablejni.cc Add missing functionality to RocksJava (#4833) 2019-02-22 14:46:46 -08:00
merge_operator.cc Add copyright headers per FB open-source checkup tool. (#5199) 2019-04-18 10:55:01 -07:00
native_comparator_wrapper_test.cc comment unused parameters to turn on -Wunused-parameter flag 2018-04-12 17:59:16 -07:00
optimistic_transaction_db.cc Add missing functionality to RocksJava (#4833) 2019-02-22 14:46:46 -08:00
optimistic_transaction_options.cc comment unused parameters to turn on -Wunused-parameter flag 2018-04-12 17:59:16 -07:00
options.cc Add missing functionality to RocksJava (#4833) 2019-02-22 14:46:46 -08:00
options_util.cc Add missing functionality to RocksJava (#4833) 2019-02-22 14:46:46 -08:00
persistent_cache.cc Add missing functionality to RocksJava (#4833) 2019-02-22 14:46:46 -08:00
portal.h Add an option to put first key of each sst block in the index (#5289) 2019-06-24 20:54:04 -07:00
ratelimiterjni.cc comment unused parameters to turn on -Wunused-parameter flag 2018-04-12 17:59:16 -07:00
remove_emptyvalue_compactionfilterjni.cc comment unused parameters to turn on -Wunused-parameter flag 2018-04-12 17:59:16 -07:00
restorejni.cc comment unused parameters to turn on -Wunused-parameter flag 2018-04-12 17:59:16 -07:00
rocks_callback_object.cc comment unused parameters to turn on -Wunused-parameter flag 2018-04-12 17:59:16 -07:00
rocksdb_exception_test.cc comment unused parameters to turn on -Wunused-parameter flag 2018-04-12 17:59:16 -07:00
rocksjni.cc Add missing functionality to RocksJava (#4833) 2019-02-22 14:46:46 -08:00
slice.cc comment unused parameters to turn on -Wunused-parameter flag 2018-04-12 17:59:16 -07:00
snapshot.cc comment unused parameters to turn on -Wunused-parameter flag 2018-04-12 17:59:16 -07:00
sst_file_manager.cc Add missing functionality to RocksJava (#4833) 2019-02-22 14:46:46 -08:00
sst_file_writerjni.cc comment unused parameters to turn on -Wunused-parameter flag 2018-04-12 17:59:16 -07:00
statistics.cc Make statistics's stats_level change thread-safe (#5030) 2019-03-01 10:42:09 -08:00
statisticsjni.cc Get CompactionJobInfo from CompactFiles 2018-12-13 14:21:24 -08:00
statisticsjni.h
table.cc JNI: Do not create 8M block cache for negative blockCacheSize values (#5465) 2019-06-24 11:37:04 -07:00
table_filter.cc Add missing functionality to RocksJava (#4833) 2019-02-22 14:46:46 -08:00
table_filter_jnicallback.cc Add missing functionality to RocksJava (#4833) 2019-02-22 14:46:46 -08:00
table_filter_jnicallback.h Add missing functionality to RocksJava (#4833) 2019-02-22 14:46:46 -08:00
thread_status.cc Add missing functionality to RocksJava (#4833) 2019-02-22 14:46:46 -08:00
trace_writer.cc Add missing functionality to RocksJava (#4833) 2019-02-22 14:46:46 -08:00
trace_writer_jnicallback.cc Add missing functionality to RocksJava (#4833) 2019-02-22 14:46:46 -08:00
trace_writer_jnicallback.h Add missing functionality to RocksJava (#4833) 2019-02-22 14:46:46 -08:00
transaction.cc Extend Transaction::GetForUpdate with do_validate (#4680) 2018-12-06 17:49:00 -08:00
transaction_db.cc Add missing functionality to RocksJava (#4833) 2019-02-22 14:46:46 -08:00
transaction_db_options.cc comment unused parameters to turn on -Wunused-parameter flag 2018-04-12 17:59:16 -07:00
transaction_log.cc comment unused parameters to turn on -Wunused-parameter flag 2018-04-12 17:59:16 -07:00
transaction_notifier.cc comment unused parameters to turn on -Wunused-parameter flag 2018-04-12 17:59:16 -07:00
transaction_notifier_jnicallback.cc
transaction_notifier_jnicallback.h
transaction_options.cc comment unused parameters to turn on -Wunused-parameter flag 2018-04-12 17:59:16 -07:00
ttl.cc Add missing functionality to RocksJava (#4833) 2019-02-22 14:46:46 -08:00
wal_filter.cc Add missing functionality to RocksJava (#4833) 2019-02-22 14:46:46 -08:00
wal_filter_jnicallback.cc Add missing functionality to RocksJava (#4833) 2019-02-22 14:46:46 -08:00
wal_filter_jnicallback.h Add missing functionality to RocksJava (#4833) 2019-02-22 14:46:46 -08:00
write_batch.cc Move some logging related files to logging/ (#5387) 2019-05-31 17:23:59 -07:00
write_batch_test.cc Move some memory related files from util/ to memory/ (#5382) 2019-05-30 17:44:09 -07:00
write_batch_with_index.cc Revert "BaseDeltaIterator: always check valid() before accessing key(… (#4744) 2018-12-03 23:38:27 -08:00
write_buffer_manager.cc Plumb WriteBufferManager through JNI (#4492) 2018-10-17 11:49:57 -07:00
writebatchhandlerjnicallback.cc Remove warnings caused by unused variables in jni (#4345) 2018-09-05 13:42:34 -07:00
writebatchhandlerjnicallback.h WriteUnPrepared: Add support for recovering WriteUnprepared transactions (#4078) 2018-07-06 17:59:13 -07:00