rocksdb/table/block_based
Yu Zhang 15053f3ab4 Logically strip timestamp during flush (#11557)
Summary:
Logically strip the user-defined timestamp when L0 files are created during flush when `AdvancedColumnFamilyOptions.persist_user_defined_timestamps` is false. Logically stripping timestamp here means replacing the original user-defined timestamp with a mininum timestamp, which for now is hard coded to be all zeros bytes.

While working on this, I caught a missing piece on the `BlockBuilder` level for this feature. The current quick path `std::min(buffer_size, last_key_size)` needs a bit tweaking to work for this feature. When user-defined timestamp is stripped during block building, on writing first entry or right after resetting, `buffer` is empty and `buffer_size` is zero as usual. However, in follow-up writes, depending on the size of the stripped user-defined timestamp, and the size of the value, what's in `buffer` can sometimes be smaller than `last_key_size`, leading `std::min(buffer_size, last_key_size)` to truncate the `last_key`. Previous test doesn't caught the bug because in those tests, the size of the stripped user-defined timestamps bytes is smaller than the length of the value. In order to avoid the conditional operation, this PR changed the original trivial `std::min` operation into an arithmetic operation. Since this is a change in a hot and performance critical path, I did the following benchmark to check no observable regression is introduced.
```TEST_TMPDIR=/dev/shm/rocksdb1 ./db_bench -benchmarks=fillseq -memtablerep=vector -allow_concurrent_memtable_write=false -num=50000000```
Compiled with DEBUG_LEVEL=0
Test vs. control runs simulaneous for better accuracy, units = ops/sec
                       PR  vs base:
Round 1: 350652 vs 349055
Round 2: 365733 vs 364308
Round 3: 355681 vs 354475

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11557

Test Plan:
New timestamp specific test added or existing tests augmented, both are parameterized with `UserDefinedTimestampTestMode`:
`UserDefinedTimestampTestMode::kNormal` -> UDT feature enabled, write / read with min timestamp
`UserDefinedTimestampTestMode::kStripUserDefinedTimestamps` -> UDT feature enabled, write / read with min timestamp, set Options.persist_user_defined_timestamps to false.

```
make all check
./db_wal_test --gtest_filter="*WithTimestamp*"
./flush_job_test --gtest_filter="*WithTimestamp*"
./repair_test --gtest_filter="*WithTimestamp*"
./block_based_table_reader_test
```

Reviewed By: pdillinger

Differential Revision: D47027664

Pulled By: jowlyzhang

fbshipit-source-id: e729193b6334dfc63aaa736d684d907a022571f5
2023-06-29 15:50:50 -07:00
..
binary_search_index_reader.cc Add support to strip / pad timestamp when creating / reading a block based table (#11495) 2023-06-01 11:10:03 -07:00
binary_search_index_reader.h Extend Get/MultiGet deadline support to table open (#6982) 2020-06-29 14:53:17 -07:00
block.cc Fix use after move in data block hash index (#11505) 2023-06-08 11:04:43 -07:00
block.h Add support to strip / pad timestamp when writing / reading a block (#11472) 2023-05-25 15:41:32 -07:00
block_based_table_builder.cc Record the persist_user_defined_timestamps flag in manifest (#11515) 2023-06-21 21:49:01 -07:00
block_based_table_builder.h Record and use the tail size to prefetch table tail (#11406) 2023-05-08 13:14:28 -07:00
block_based_table_factory.cc Add support to strip / pad timestamp when creating / reading a block based table (#11495) 2023-06-01 11:10:03 -07:00
block_based_table_factory.h Record and use the tail size to prefetch table tail (#11406) 2023-05-08 13:14:28 -07:00
block_based_table_iterator.cc Much better stats for seeks and prefix filtering (#11460) 2023-05-19 15:25:49 -07:00
block_based_table_iterator.h Much better stats for seeks and prefix filtering (#11460) 2023-05-19 15:25:49 -07:00
block_based_table_reader.cc Add UT to test BG read qps behavior during upgrade for pr11406 (#11522) 2023-06-16 13:04:30 -07:00
block_based_table_reader.h Add an interface to provide support for underlying FS to pass their own buffer during reads (#11324) 2023-06-23 11:48:49 -07:00
block_based_table_reader_impl.h Group rocksdb.sst.read.micros stat by IOActivity flush and compaction (#11288) 2023-04-21 09:07:18 -07:00
block_based_table_reader_sync_and_async.h Add an interface to provide support for underlying FS to pass their own buffer during reads (#11324) 2023-06-23 11:48:49 -07:00
block_based_table_reader_test.cc Add support to strip / pad timestamp when creating / reading a block based table (#11495) 2023-06-01 11:10:03 -07:00
block_builder.cc Logically strip timestamp during flush (#11557) 2023-06-29 15:50:50 -07:00
block_builder.h Logically strip timestamp during flush (#11557) 2023-06-29 15:50:50 -07:00
block_cache.cc Block per key-value checksum (#11287) 2023-04-25 12:08:23 -07:00
block_cache.h Block per key-value checksum (#11287) 2023-04-25 12:08:23 -07:00
block_prefetcher.cc Fix stress test failure for async_io (#10660) 2022-09-12 14:48:06 -07:00
block_prefetcher.h Provide support for direct_reads with async_io (#10197) 2022-07-06 11:42:59 -07:00
block_prefix_index.cc Fix bug with kHashSearch and changing prefix_extractor with SetOptions (#10128) 2022-06-10 08:51:45 -07:00
block_prefix_index.h Fix bug with kHashSearch and changing prefix_extractor with SetOptions (#10128) 2022-06-10 08:51:45 -07:00
block_test.cc Fix use after move in data block hash index (#11505) 2023-06-08 11:04:43 -07:00
block_type.h Remove deprecated block-based filter (#10184) 2022-06-16 15:51:33 -07:00
cachable_entry.h HyperClockCache support for SecondaryCache, with refactoring (#11301) 2023-03-17 20:23:49 -07:00
data_block_footer.cc Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
data_block_footer.h Replace namespace name "rocksdb" with ROCKSDB_NAMESPACE (#6433) 2020-02-20 12:09:57 -08:00
data_block_hash_index.cc Format files under table/ by clang-format (#10852) 2022-10-25 11:50:38 -07:00
data_block_hash_index.h Fix build with gcc 13 by including <cstdint> (#11118) 2023-01-25 14:30:32 -08:00
data_block_hash_index_test.cc Block per key-value checksum (#11287) 2023-04-25 12:08:23 -07:00
filter_block.h Record and use the tail size to prefetch table tail (#11406) 2023-05-08 13:14:28 -07:00
filter_block_reader_common.cc Use user-provided ReadOptions for metadata block reads more often (#11208) 2023-04-04 16:53:14 -07:00
filter_block_reader_common.h Use user-provided ReadOptions for metadata block reads more often (#11208) 2023-04-04 16:53:14 -07:00
filter_policy.cc Remove RocksDB LITE (#11147) 2023-01-27 13:14:19 -08:00
filter_policy_internal.h Remove deprecated block-based filter (#10184) 2022-06-16 15:51:33 -07:00
flush_block_policy.cc Change internal headers with duplicate names (#11408) 2023-05-17 11:27:09 -07:00
flush_block_policy_impl.h Change internal headers with duplicate names (#11408) 2023-05-17 11:27:09 -07:00
full_filter_block.cc Use user-provided ReadOptions for metadata block reads more often (#11208) 2023-04-04 16:53:14 -07:00
full_filter_block.h Use user-provided ReadOptions for metadata block reads more often (#11208) 2023-04-04 16:53:14 -07:00
full_filter_block_test.cc Use user-provided ReadOptions for metadata block reads more often (#11208) 2023-04-04 16:53:14 -07:00
hash_index_reader.cc Add support to strip / pad timestamp when creating / reading a block based table (#11495) 2023-06-01 11:10:03 -07:00
hash_index_reader.h Extend Get/MultiGet deadline support to table open (#6982) 2020-06-29 14:53:17 -07:00
index_builder.cc Add support to strip / pad timestamp when creating / reading a block based table (#11495) 2023-06-01 11:10:03 -07:00
index_builder.h Add support to strip / pad timestamp when creating / reading a block based table (#11495) 2023-06-01 11:10:03 -07:00
index_reader_common.cc Use user-provided ReadOptions for metadata block reads more often (#11208) 2023-04-04 16:53:14 -07:00
index_reader_common.h Add support to strip / pad timestamp when creating / reading a block based table (#11495) 2023-06-01 11:10:03 -07:00
mock_block_based_table.h Remove deprecated block-based filter (#10184) 2022-06-16 15:51:33 -07:00
parsed_full_filter_block.cc Hide FilterBits{Builder,Reader} from public API (#9592) 2022-02-17 16:34:46 -08:00
parsed_full_filter_block.h Major Cache refactoring, CPU efficiency improvement (#10975) 2023-01-11 14:20:40 -08:00
partitioned_filter_block.cc Fix higher read qps during db open caused by pr 11406 (#11516) 2023-06-06 17:42:43 -07:00
partitioned_filter_block.h Add support to strip / pad timestamp when creating / reading a block based table (#11495) 2023-06-01 11:10:03 -07:00
partitioned_filter_block_test.cc Add support to strip / pad timestamp when creating / reading a block based table (#11495) 2023-06-01 11:10:03 -07:00
partitioned_index_iterator.cc Provide support for direct_reads with async_io (#10197) 2022-07-06 11:42:59 -07:00
partitioned_index_iterator.h Format files under table/ by clang-format (#10852) 2022-10-25 11:50:38 -07:00
partitioned_index_reader.cc Fix higher read qps during db open caused by pr 11406 (#11516) 2023-06-06 17:42:43 -07:00
partitioned_index_reader.h Record and use the tail size to prefetch table tail (#11406) 2023-05-08 13:14:28 -07:00
reader_common.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
reader_common.h Add block checksum mismatch ticker stat (#11438) 2023-05-12 18:16:11 -07:00
uncompression_dict_reader.cc Group rocksdb.sst.read.micros stat by IOActivity flush and compaction (#11288) 2023-04-21 09:07:18 -07:00
uncompression_dict_reader.h Group rocksdb.sst.read.micros stat by IOActivity flush and compaction (#11288) 2023-04-21 09:07:18 -07:00