rocksdb/table/block_based
Changyu Bi 7a95938899 Improve FragmentTombstones() speed by lazily initializing `seq_set_` (#10848)
Summary:
FragmentedRangeTombstoneList has a member variable `seq_set_` that contains the sequence numbers of all range tombstones in a set. The set is constructed in `FragmentTombstones()` and is used only in `FragmentedRangeTombstoneList::ContainsRange()` which only happens during compaction. This PR moves the initialization of `seq_set_` to `FragmentedRangeTombstoneList::ContainsRange()`. This should speed up `FragmentTombstones()` when the range tombstone list is used for read/scan requests. Microbench shows the speed improvement to be ~45%.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10848

Test Plan:
- Existing tests and stress test: `python3 tools/db_crashtest.py whitebox --simple  --verify_iterator_with_expected_state_one_in=5`.
- Microbench: update `range_del_aggregator_bench` to benchmark speed of `FragmentTombstones()`:
```
./range_del_aggregator_bench --num_range_tombstones=1000 --tombstone_start_upper_bound=50000000 --num_runs=10000 --tombstone_width_mean=200 --should_deletes_per_run=100 --use_compaction_range_del_aggregator=true

Before this PR:
=========================
Fragment Tombstones:     270.286 us
AddTombstones:           1.28933 us
ShouldDelete (first):    0.525528 us
ShouldDelete (rest):     0.0797519 us

After this PR: time to fragment tombstones is pushed to AddTombstones() which only happen during compaction.
=========================
Fragment Tombstones:     149.879 us
AddTombstones:           102.131 us
ShouldDelete (first):    0.565871 us
ShouldDelete (rest):     0.0729444 us
```
- db_bench: this should improve speed for fragmenting range tombstones for mutable memtable:
```
./db_bench --benchmarks=readwhilewriting --writes_per_range_tombstone=100 --max_write_buffer_number=100 --min_write_buffer_number_to_merge=100 --writes=500000 --reads=250000 --disable_auto_compactions --max_num_range_tombstones=100000 --finish_after_writes --write_buffer_size=1073741824 --threads=25

Before this PR:
readwhilewriting :      18.301 micros/op 1310445 ops/sec 4.769 seconds 6250000 operations;   28.1 MB/s (41001 of 250000 found)
After this PR:
readwhilewriting :      16.943 micros/op 1439376 ops/sec 4.342 seconds 6250000 operations;   23.8 MB/s (28977 of 250000 found)
```

Reviewed By: ajkr

Differential Revision: D40646227

Pulled By: cbi42

fbshipit-source-id: ea471667edb258f67d01cfd828588e80a89e4083
2022-10-25 11:33:04 -07:00
..
binary_search_index_reader.cc Set Read rate limiter priority dynamically and pass it to FS (#9996) 2022-05-18 19:41:44 -07:00
binary_search_index_reader.h Extend Get/MultiGet deadline support to table open (#6982) 2020-06-29 14:53:17 -07:00
block.cc Add API for writing wide-column entities (#10242) 2022-06-25 15:30:47 -07:00
block.h Refactor to avoid confusing "raw block" (#10408) 2022-09-22 11:25:32 -07:00
block_based_table_builder.cc Refactor to avoid confusing "raw block" (#10408) 2022-09-22 11:25:32 -07:00
block_based_table_builder.h Refactor to avoid confusing "raw block" (#10408) 2022-09-22 11:25:32 -07:00
block_based_table_factory.cc Refactor ShardedCache for more sharing, static polymorphism (#10801) 2022-10-18 22:06:57 -07:00
block_based_table_factory.h Account memory of big memory users in BlockBasedTable in global memory limit (#9748) 2022-04-06 10:33:00 -07:00
block_based_table_iterator.cc Provide support for direct_reads with async_io (#10197) 2022-07-06 11:42:59 -07:00
block_based_table_iterator.h More testing w/prefix extractor, small refactor (#10122) 2022-06-16 16:41:25 -07:00
block_based_table_reader.cc Refactor ShardedCache for more sharing, static polymorphism (#10801) 2022-10-18 22:06:57 -07:00
block_based_table_reader.h Improve FragmentTombstones() speed by lazily initializing `seq_set_` (#10848) 2022-10-25 11:33:04 -07:00
block_based_table_reader_impl.h Updated NewDataBlockIterator to not fetch compression dict for non-da… (#10310) 2022-07-06 09:30:25 -07:00
block_based_table_reader_sync_and_async.h Refactor to avoid confusing "raw block" (#10408) 2022-09-22 11:25:32 -07:00
block_based_table_reader_test.cc Add support for wide-column point lookups (#10540) 2022-08-19 11:51:12 -07:00
block_builder.cc Improve data block construction performance (#9040) 2021-10-19 12:36:21 -07:00
block_builder.h Improve data block construction performance (#9040) 2021-10-19 12:36:21 -07:00
block_like_traits.h Revert "Avoid dynamic memory allocation on read path (#10453)" (#10541) 2022-08-19 11:02:54 -07:00
block_prefetcher.cc Fix stress test failure for async_io (#10660) 2022-09-12 14:48:06 -07:00
block_prefetcher.h Provide support for direct_reads with async_io (#10197) 2022-07-06 11:42:59 -07:00
block_prefix_index.cc Fix bug with kHashSearch and changing prefix_extractor with SetOptions (#10128) 2022-06-10 08:51:45 -07:00
block_prefix_index.h Fix bug with kHashSearch and changing prefix_extractor with SetOptions (#10128) 2022-06-10 08:51:45 -07:00
block_test.cc Print stack traces on frozen tests in CI (#10828) 2022-10-18 00:35:35 -07:00
block_type.h Remove deprecated block-based filter (#10184) 2022-06-16 15:51:33 -07:00
cachable_entry.h Refactor to avoid confusing "raw block" (#10408) 2022-09-22 11:25:32 -07:00
data_block_footer.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
data_block_footer.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
data_block_hash_index.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
data_block_hash_index.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
data_block_hash_index_test.cc Print stack traces on frozen tests in CI (#10828) 2022-10-18 00:35:35 -07:00
filter_block.h Pass rate_limiter_priority through filter block reader functions to FS (#10251) 2022-06-24 16:13:44 -07:00
filter_block_reader_common.cc Pass rate_limiter_priority through filter block reader functions to FS (#10251) 2022-06-24 16:13:44 -07:00
filter_block_reader_common.h Pass rate_limiter_priority through filter block reader functions to FS (#10251) 2022-06-24 16:13:44 -07:00
filter_policy.cc Have Cache use Status::MemoryLimit (#10262) 2022-07-06 14:41:46 -07:00
filter_policy_internal.h Remove deprecated block-based filter (#10184) 2022-06-16 15:51:33 -07:00
flush_block_policy.cc Restore Regex support for ObjectLibrary::Register, rename new APIs to allow old one to be deprecated in the future (#9362) 2022-01-11 06:33:48 -08:00
flush_block_policy.h Make FlushBlockPolicyFactory into a Customizable class (#8432) 2021-07-12 09:04:59 -07:00
full_filter_block.cc Pass rate_limiter_priority through filter block reader functions to FS (#10251) 2022-06-24 16:13:44 -07:00
full_filter_block.h Pass rate_limiter_priority through filter block reader functions to FS (#10251) 2022-06-24 16:13:44 -07:00
full_filter_block_test.cc Print stack traces on frozen tests in CI (#10828) 2022-10-18 00:35:35 -07:00
hash_index_reader.cc Fix bug with kHashSearch and changing prefix_extractor with SetOptions (#10128) 2022-06-10 08:51:45 -07:00
hash_index_reader.h Extend Get/MultiGet deadline support to table open (#6982) 2020-06-29 14:53:17 -07:00
index_builder.cc Make InternalKeyComparator not configurable (#10342) 2022-07-14 10:09:31 -07:00
index_builder.h Make InternalKeyComparator not configurable (#10342) 2022-07-14 10:09:31 -07:00
index_reader_common.cc Seek parallelization (#9994) 2022-05-20 16:09:33 -07:00
index_reader_common.h Set Read rate limiter priority dynamically and pass it to FS (#9996) 2022-05-18 19:41:44 -07:00
mock_block_based_table.h Remove deprecated block-based filter (#10184) 2022-06-16 15:51:33 -07:00
parsed_full_filter_block.cc Hide FilterBits{Builder,Reader} from public API (#9592) 2022-02-17 16:34:46 -08:00
parsed_full_filter_block.h Use new Insert and Lookup APIs in table reader to support secondary cache (#8315) 2021-05-21 18:29:12 -07:00
partitioned_filter_block.cc Add new option num_file_reads_for_auto_readahead in BlockBasedTableOptions (#10556) 2022-09-01 11:56:00 -07:00
partitioned_filter_block.h Update passing rate_limiter_priority for a PartitionedFilterBlockReader function to FS (#10438) 2022-07-29 11:32:54 -07:00
partitioned_filter_block_test.cc Print stack traces on frozen tests in CI (#10828) 2022-10-18 00:35:35 -07:00
partitioned_index_iterator.cc Provide support for direct_reads with async_io (#10197) 2022-07-06 11:42:59 -07:00
partitioned_index_iterator.h Make initial auto readahead_size configurable (#9836) 2022-04-15 17:28:09 -07:00
partitioned_index_reader.cc Add new option num_file_reads_for_auto_readahead in BlockBasedTableOptions (#10556) 2022-09-01 11:56:00 -07:00
partitioned_index_reader.h Meta-internal folly integration with F14FastMap (#9546) 2022-04-13 07:34:01 -07:00
reader_common.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
reader_common.h Bring the Configurable options together (#5753) 2020-09-14 17:01:01 -07:00
uncompression_dict_reader.cc Seek parallelization (#9994) 2022-05-20 16:09:33 -07:00
uncompression_dict_reader.h Fb 9718 verify checksums is ignored (#9767) 2022-03-29 11:54:54 -07:00