rocksdb/db/compaction
Peter Dillinger 6de7081cf3 Always verify SST unique IDs on SST file open (#10532)
Summary:
Although we've been tracking SST unique IDs in the DB manifest
unconditionally, checking has been opt-in and with an extra pass at DB::Open
time. This changes the behavior of `verify_sst_unique_id_in_manifest` to
check unique ID against manifest every time an SST file is opened through
table cache (normal DB operations), replacing the explicit pass over files
at DB::Open time. This change also enables the option by default and
removes the "EXPERIMENTAL" designation.

One possible criticism is that the option no longer ensures the integrity
of a DB at Open time. This is far from an all-or-nothing issue. Verifying
the IDs of all SST files hardly ensures all the data in the DB is readable.
(VerifyChecksum is supposed to do that.) Also, with
max_open_files=-1 (default, extremely common), all SST files are
opened at DB::Open time anyway.

Implementation details:
* `VerifySstUniqueIdInManifest()` functions are the extra/explicit pass
that is now removed.
* Unit tests that manipulate/corrupt table properties have to opt out of
this check, because that corrupts the "actual" unique id. (And even for
testing we don't currently have a mechanism to set "no unique id"
in the in-memory file metadata for new files.)
* A lot of other unit test churn relates to (a) default checking on, and
(b) checking on SST open even without DB::Open (e.g. on flush)
* Use `FileMetaData` for more `TableCache` operations (in place of
`FileDescriptor`) so that we have access to the unique_id whenever
we might need to open an SST file. **There is the possibility of
performance impact because we can no longer use the more
localized `fd` part of an `FdWithKeyRange` but instead follow the
`file_metadata` pointer. However, this change (possible regression)
is only done for `GetMemoryUsageByTableReaders`.**
* Removed a completely unnecessary constructor overload of
`TableReaderOptions`

Possible follow-up:
* Verification only happens when opening through table cache. Are there
more places where this should happen?
* Improve error message when there is a file size mismatch vs. manifest
(FIXME added in the appropriate place).
* I'm not sure there's a justification for `FileDescriptor` to be distinct from
`FileMetaData`.
* I'm skeptical that `FdWithKeyRange` really still makes sense for
optimizing some data locality by duplicating some data in memory, but I
could be wrong.
* An unnecessary overload of NewTableReader was recently added, in
the public API nonetheless (though unusable there). It should be cleaned
up to put most things under `TableReaderOptions`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10532

Test Plan:
updated unit tests

Performance test showing no significant difference (just noise I think):
`./db_bench -benchmarks=readwhilewriting[-X10] -num=3000000 -disable_wal=1 -bloom_bits=8 -write_buffer_size=1000000 -target_file_size_base=1000000`
Before: readwhilewriting [AVG 10 runs] : 68702 (± 6932) ops/sec
After: readwhilewriting [AVG 10 runs] : 68239 (± 7198) ops/sec

Reviewed By: jay-zhuang

Differential Revision: D38765551

Pulled By: pdillinger

fbshipit-source-id: a827a708155f12344ab2a5c16e7701c7636da4c2
2022-09-07 22:52:42 -07:00
..
clipping_iterator.h Make InternalKeyComparator not configurable (#10342) 2022-07-14 10:09:31 -07:00
clipping_iterator_test.cc Cleanup multiple implementations of VectorIterator (#8901) 2021-10-06 07:48:31 -07:00
compaction.cc Provide support for subcompactions with user-defined timestamps (#10344) 2022-07-31 11:39:16 -07:00
compaction.h Change bottommost_temperture to last_level_temperture (#10471) 2022-08-08 14:36:34 -07:00
compaction_iteration_stats.h Support readahead during compaction for blob files (#9187) 2021-11-19 17:53:47 -08:00
compaction_iterator.cc Add memtable per key-value checksum (#10281) 2022-08-12 13:51:32 -07:00
compaction_iterator.h Include minimal contextual information in CompactionIterator (#10505) 2022-08-09 17:07:24 -07:00
compaction_iterator_test.cc Tiered Compaction: per key placement support (#9964) 2022-07-13 20:54:49 -07:00
compaction_job.cc Always verify SST unique IDs on SST file open (#10532) 2022-09-07 22:52:42 -07:00
compaction_job.h Support subcmpct using reserved resources for round-robin priority (#10341) 2022-07-24 11:12:44 -07:00
compaction_job_stats_test.cc Remove own ToString() (#9955) 2022-05-06 13:03:58 -07:00
compaction_job_test.cc Sync dir containing CURRENT after RenameFile on CURRENT as much as possible (#10573) 2022-08-29 17:35:21 -07:00
compaction_outputs.cc Add seqno to time mapping (#10338) 2022-07-14 21:49:34 -07:00
compaction_outputs.h Add seqno to time mapping (#10338) 2022-07-14 21:49:34 -07:00
compaction_picker.cc Support subcmpct using reserved resources for round-robin priority (#10341) 2022-07-24 11:12:44 -07:00
compaction_picker.h Support subcmpct using reserved resources for round-robin priority (#10341) 2022-07-24 11:12:44 -07:00
compaction_picker_fifo.cc Fix underflow in FIFOCompactionPicker (#10386) 2022-07-22 09:20:35 -07:00
compaction_picker_fifo.h Add OpenAndTrimHistory API to support trimming data with specified timestamp (#9410) 2022-03-11 16:13:23 -08:00
compaction_picker_level.cc Support subcmpct using reserved resources for round-robin priority (#10341) 2022-07-24 11:12:44 -07:00
compaction_picker_level.h Make ImmutableOptions struct that inherits from ImmutableCFOptions and ImmutableDBOptions (#8262) 2021-05-05 14:00:17 -07:00
compaction_picker_test.cc Improve universal compaction picker for tiered compaction (#10467) 2022-08-08 14:34:36 -07:00
compaction_picker_universal.cc Fix wrong compression type and options in universal compaction picker (#10515) 2022-08-23 14:58:02 -07:00
compaction_picker_universal.h Make ImmutableOptions struct that inherits from ImmutableCFOptions and ImmutableDBOptions (#8262) 2021-05-05 14:00:17 -07:00
compaction_service_job.cc Improve SubCompaction Partitioning (#10393) 2022-07-23 17:38:49 -07:00
compaction_service_test.cc Always verify SST unique IDs on SST file open (#10532) 2022-09-07 22:52:42 -07:00
compaction_state.cc Tiered Compaction: per key placement support (#9964) 2022-07-13 20:54:49 -07:00
compaction_state.h Tiered Compaction: per key placement support (#9964) 2022-07-13 20:54:49 -07:00
file_pri.h Try to start TTL earlier with kMinOverlappingRatio is used (#8749) 2021-11-01 14:36:31 -07:00
sst_partitioner.cc Restore Regex support for ObjectLibrary::Register, rename new APIs to allow old one to be deprecated in the future (#9362) 2022-01-11 06:33:48 -08:00
subcompaction_state.cc Tiered Compaction: per key placement support (#9964) 2022-07-13 20:54:49 -07:00
subcompaction_state.h Support subcmpct using reserved resources for round-robin priority (#10341) 2022-07-24 11:12:44 -07:00
tiered_compaction_test.cc Change bottommost_temperture to last_level_temperture (#10471) 2022-08-08 14:36:34 -07:00