mirror of https://github.com/facebook/rocksdb.git
9d8eb77c4d
Summary: Two relatively simple functional changes to incremental backup behavior, integrated with a minor refactoring to reduce code redundancy and improve error/log message. There are nuances to the impact of these changes, but I believe they are fundamentally good and generally safe. Those functional changes: * Incremental backups no longer read DB table files that are already saved to a shared part of the backup directory, unless `share_files_with_checksum` is used with `kLegacyCrc32cAndFileSize` naming (discouraged) where crc32c full file checksums are needed to determine file naming. * Justification: incremental backups should not need to read the whole DB, especially without rate limiting. (Although other BackupEngine reads are not rate limited either, other non-trivial reads are generally limited by a corresponding write, as in copying files.) Also, the fact that this is not already fixed was arguably a bug/oversight in the implementation of https://github.com/facebook/rocksdb/issues/7110. * When considering whether a table file is already backed up in a shared part of backup directory, BackupEngine would already query the sizes of source (DB) and pre-existing destination (backup) files. BackupEngine now uses these file sizes to detect corruption, as at least one of (a) old backup, (b) backup in progress, or (c) current DB is corrupt if there's a size mismatch. * Justification: a random related fix that also helps to cover a small hole in corruption checking uncovered by the other functional change: * For `share_table_files` without "checksum" (not recommended), the other change regresses in detecting fundamentally unsafe use of this option combination: when you might generate different versions of same SST file number. As demonstrated by `BackupableDBTest.FailOverwritingBackups,` this regression is greatly mitigated by the new file size checking. Nevertheless, almost no reason to use `share_files_with_checksum=false` should remain, and comments are updated appropriately. Also, this change renames internal function `CalculateChecksum` to `ReadFileAndComputeChecksum` to make the performance impact of this function clear in code reviews. It is not clear what 'same_path' is for in backupable_db.cc, and I suspect it cannot be true for a DB with unique file names (like DBImpl). Nevertheless, I've tried to keep its functionality intact when `true` to minimize risk for now, despite having no unit tests for which it is true. Select impact details (much more in unit tests): For `share_files_with_checksum`, I am confident there is no regression (vs. pre-6.12) in detecting DB or backup corruption at backup creation time, mostly because the old design did not leverage this extra checksum computation for detecting inconsistencies at backup creation time. (With computed checksums in names, a recently corrupted file just looked like a different file vs. what was already backed up.) Even in the hypothetical case of DB session id collision (~100 bits entropy collision), file size in name and/or our file size check add an extra layer of protection against false success in creating an accurate new backup. (Unit test included.) `DB::VerifyChecksum` and `BackupEngine::VerifyBackup` with checksum checking are still able to catch corruptions that `CreateNewBackup` does not. Note that when custom file checksum support is added to BackupEngine, that will essentially give the same power as `DB::VerifyChecksum` into `CreateNewBackup`. We could add options for `CreateNewBackup` to cover some of what would be caught by `VerifyBackup` with checksum checking. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7413 Test Plan: Two new unit tests included, both of which fail without these changes. Although we don't test the I/O improvement directly, we test it indirectly in DB corruption detection power that was inadvertently unlocked with new backup file naming PLUS computing current content checksums (now removed). (I don't think that case of DB corruption detection justifies reading the whole DB on incremental backup.) Reviewed By: zhichao-cao Differential Revision: D23818480 Pulled By: pdillinger fbshipit-source-id: 148aff16f001af5b9fd4b22f155311c2461f1bac |
||
---|---|---|
.. | ||
backupable | ||
blob_db | ||
cassandra | ||
checkpoint | ||
compaction_filters | ||
convenience | ||
leveldb_options | ||
memory | ||
merge_operators | ||
option_change_migration | ||
options | ||
persistent_cache | ||
simulator_cache | ||
table_properties_collectors | ||
trace | ||
transactions | ||
ttl | ||
write_batch_with_index | ||
debug.cc | ||
env_librados.cc | ||
env_librados.md | ||
env_librados_test.cc | ||
env_mirror.cc | ||
env_mirror_test.cc | ||
env_timed.cc | ||
env_timed_test.cc | ||
fault_injection_env.cc | ||
fault_injection_env.h | ||
fault_injection_fs.cc | ||
fault_injection_fs.h | ||
merge_operators.h | ||
object_registry.cc | ||
object_registry_test.cc | ||
util_merge_operators_test.cc |