rocksdb/options
Andrew Ryan Chang af2a36d2c7 Record newest_key_time as a table property (#13083)
Summary:
This PR does two things:
1. Adds a new table property `newest_key_time`
2. Uses this property to improve TTL and temperature change compaction.

### Context

The current `creation_time` table property should really be named `oldest_ancestor_time`. For flush output files, this is the oldest key time in the file. For compaction output files, this is the minimum among all oldest key times in the input files.

The problem with using the oldest ancestor time for TTL compaction is that we may end up dropping files earlier than we should. What we really want is the newest (i.e. "youngest") key time. Right now we take a roundabout way to estimate this value -- we take the value of the _oldest_ key time for the _next_ (newer) SST file. This is also why the current code has checks for `index >= 1`.

Our new property `newest_key_time` is set to the file creation time during flushes, and the max over all input files for compactions.

There were some additional smaller changes that I had to make for testing purposes:
- Refactoring the mock table reader to support specifying my own table properties
- Refactoring out a test utility method `GetLevelFileMetadatas`  that would otherwise be copy/pasted in 3 places

Credit to cbi42 for the problem explanation and proposed solution

### Testing

- Added a dedicated unit test to my `newest_key_time` logic in isolation (i.e. are we populating the property on flush and compaction)
- Updated the existing unit tests (for TTL/temperate change compaction), which were comprehensive enough to break when I first made my code changes. I removed the test setup code which set the file metadata `oldest_ancestor_time`, so we know we are actually only using the new table property instead.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/13083

Reviewed By: cbi42

Differential Revision: D65298604

Pulled By: archang19

fbshipit-source-id: 898ef91b692ab33f5129a2a16b64ecadd4c32432
2024-11-01 10:08:35 -07:00
..
cf_options.cc Fix race to make BlockBasedTableOptions effectively mutable (#13082) 2024-10-25 10:24:54 -07:00
cf_options.h Refactor `table_factory` into MutableCFOptions (#13077) 2024-10-17 14:13:20 -07:00
configurable.cc Fix race to make BlockBasedTableOptions effectively mutable (#13082) 2024-10-25 10:24:54 -07:00
configurable_helper.h Fix race to make BlockBasedTableOptions effectively mutable (#13082) 2024-10-25 10:24:54 -07:00
configurable_test.cc Run internal cpp modernizer on RocksDB repo (#12398) 2024-03-04 10:08:32 -08:00
configurable_test.h internal_repo_rocksdb (-8794174668376270091) (#12114) 2023-12-01 11:10:30 -08:00
customizable.cc Prefer static_cast in place of most reinterpret_cast (#12308) 2024-02-07 10:44:11 -08:00
customizable_test.cc Make Cache a customizable class (#13024) 2024-09-20 12:13:19 -07:00
db_options.cc Steps toward deprecating implicit prefix seek, related fixes (#13026) 2024-09-20 15:54:19 -07:00
db_options.h Steps toward deprecating implicit prefix seek, related fixes (#13026) 2024-09-20 15:54:19 -07:00
offpeak_time_info.cc Mark more files for periodic compaction during offpeak (#12031) 2023-11-06 11:43:59 -08:00
offpeak_time_info.h Fix build on alpine 3.19 (#12345) 2024-02-12 11:24:56 -08:00
options.cc Improve universal compaction sorted-run trigger (#12477) 2024-05-24 10:10:31 -07:00
options_helper.cc Refactor `table_factory` into MutableCFOptions (#13077) 2024-10-17 14:13:20 -07:00
options_helper.h Bug fix and test BuildDBOptions (#13038) 2024-09-26 14:36:29 -07:00
options_parser.cc Fix and clarify ignore_unknown_options (#12989) 2024-09-04 11:42:04 -07:00
options_parser.h Group SST write in flush, compaction and db open with new stats (#11910) 2023-12-29 15:29:23 -08:00
options_settable_test.cc Record newest_key_time as a table property (#13083) 2024-11-01 10:08:35 -07:00
options_test.cc Fix race to make BlockBasedTableOptions effectively mutable (#13082) 2024-10-25 10:24:54 -07:00