Restore file size in backup table file names (and other cleanup) (#7400)

Summary:
Prior to 6.12, backup files using share_files_with_checksum had
the file size encoded in the file name, after the last '\_' and before
the last '.'. We considered this an implementation detail subject to
change, and indeed removed this information from the file name (with an
option to use old behavior) because it was considered
ineffective/inefficient for file name uniqueness. However, some
downstream RocksDB users were relying on this information since the file
size is not explicitly in the backup manifest file.

This primary purpose of this change is "retrofitting" the 6.12 release
(not yet a public release) to simultaneously support the benefits of the
new naming scheme (I/O performance and data correctness at scale) and
preserve the file size information, both as default behaviors. With this
change, we are essentially making the file size information encoded in
the file name an official, though obscure, extension of the backup meta
file format.

We preserve an option (kLegacyCrc32cAndFileSize) to use the original
"legacy" naming scheme, with its caveats, and make it easy to omit the
file size information (no kFlagIncludeFileSize), for more compact file
names. But note that changing the naming scheme used on an existing db
and backup directory can lead to transient space amplification, as some
files will be stored under two names in the shared_checksum directory.
Because some backups were saved using the original 6.12 naming scheme,
we offer two ways of dealing with those files: SST files generated by
older 6.12 versions can either use the default naming scheme in effect
when the SST files were generated (kFlagMatchInterimNaming, default, no
transient space amplification) or can use a new naming scheme (no
kFlagMatchInterimNaming, potential space amplification because some
already stored files getting a new name).

We don't have a natural way to detect which files were generated by
previous 6.12 versions, but this change hacks one in by changing DB
session ids to now use a more concise encoding, reducing file name
length, saving ~dozen bytes from SST files, and making them visually
distinct from DB ids so that they are less likely to be mixed up.

Two final auxiliary notes:
Recognizing that the backup file names have become a de facto part of
the backup meta schema, this change makes them easier to parse and
extend by putting a distinct marker, 's', before DB session ids embedded
in the name. When we extend this to allow custom checksums in the name,
they can get their own marker to ensure safe parsing. For backward
compatibility, file size does not get a marker but is assumed for
`_[0-9]+[.]`

Another change from initial 6.12 default behavior is never including
file custom checksum in the file name. Looking ahead to 6.13, we do not
want the default behavior to cause backup space amplification for
someone turning on file custom checksum checking in BackupEngine; we
want that to be an easy decision. When implemented, including file
custom checksums in backup file names will be a non-default option.

Actual file name patterns and priorities, as regexes:

    kLegacyCrc32cAndFileSize OR pre-6.12 SST file ->
      [0-9]+_[0-9]+_[0-9]+[.]sst
    kFlagMatchInterimNaming set (default) AND early 6.12 SST file ->
      [0-9]+_[0-9a-fA-F-]+[.]sst
    kUseDbSessionId AND NOT kFlagIncludeFileSize ->
      [0-9]+_s[0-9A-Z]{20}[.]sst
    kUseDbSessionId AND kFlagIncludeFileSize (default) ->
      [0-9]+_s[0-9A-Z]{20}_[0-9]+[.]sst

We might add opt-in options for more '\_' separated data in the name,
but embedded file size, if present, will always be after last '\_' and
before '.sst'.

This change was originally applied to version 6.12. (See https://github.com/facebook/rocksdb/issues/7390)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7400

Test Plan:
unit tests included. Sync point callbacks are used to mimic
previous version SST files.

Reviewed By: ajkr

Differential Revision: D23759587

Pulled By: pdillinger

fbshipit-source-id: f62d8af4e0978de0a34f26288cfbe66049b70025
This commit is contained in:
Peter Dillinger 2020-09-17 10:22:56 -07:00 committed by Facebook GitHub Bot
parent 7780a360eb
commit 93719fc953
8 changed files with 372 additions and 105 deletions

View File

@ -75,7 +75,7 @@
### New Features
* DB identity (`db_id`) and DB session identity (`db_session_id`) are added to table properties and stored in SST files. SST files generated from SstFileWriter and Repairer have DB identity “SST Writer” and “DB Repairer”, respectively. Their DB session IDs are generated in the same way as `DB::GetDbSessionId`. The session ID for SstFileWriter (resp., Repairer) resets every time `SstFileWriter::Open` (resp., `Repairer::Run`) is called.
* Added experimental option BlockBasedTableOptions::optimize_filters_for_memory for reducing allocated memory size of Bloom filters (~10% savings with Jemalloc) while preserving the same general accuracy. To have an effect, the option requires format_version=5 and malloc_usable_size. Enabling this option is forward and backward compatible with existing format_version=5.
* `BackupTableNameOption BackupableDBOptions::share_files_with_checksum_naming` is added, where `BackupTableNameOption` is an `enum` type with two enumerators `kChecksumAndFileSize` and `kOptionalChecksumAndDbSessionId`. By default, `BackupableDBOptions::share_files_with_checksum_naming` is set to `kOptionalChecksumAndDbSessionId`. In the default case, backup table filenames generated by this version of RocksDB are of the form either `<file_number>_<crc32c>_<db_session_id>.sst` or `<file_number>_<db_session_id>.sst` as opposed to `<file_number>_<crc32c>_<file_size>.sst`. Specifically, table filenames are of the form `<file_number>_<crc32c>_<db_session_id>.sst` if `DBOptions::file_checksum_gen_factory` is set to `GetFileChecksumGenCrc32cFactory()`. Futhermore, the checksum value `<crc32c>` appeared in the filenames is hexadecimal-encoded, instead of being decimal-encoded `uint32_t` value. If `DBOptions::file_checksum_gen_factory` is `nullptr`, the table filenames are of the form `<file_number>_<db_session_id>.sst`. The new default behavior fixes the backup file name collision problem, which might be possible at large scale, but the option `kChecksumAndFileSize` is added to allow use of old naming in case it is needed. Moreover, for table files generated prior to this version of RocksDB, using `kOptionalChecksumAndDbSessionId` will fall back on `kChecksumAndFileSize`. In these cases, the checksum value `<crc32c>` in the filenames `<file_number>_<crc32c>_<file_size>.sst` is decimal-encoded `uint32_t` value as before. This default behavior change is not an upgrade issue, because previous versions of RocksDB can read, restore, and delete backups using new names, and it's OK for a backup directory to use a mixture of table file naming schemes. Note that `share_files_with_checksum_naming` comes into effect only when both `share_files_with_checksum` and `share_table_files` are true.
* `BackupableDBOptions::share_files_with_checksum_naming` is added with new default behavior for naming backup files with `share_files_with_checksum`, to address performance and backup integrity issues. See API comments for details.
* Added auto resume function to automatically recover the DB from background Retryable IO Error. When retryable IOError happens during flush and WAL write, the error is mapped to Hard Error and DB will be in read mode. When retryable IO Error happens during compaction, the error will be mapped to Soft Error. DB is still in write/read mode. Autoresume function will create a thread for a DB to call DB->ResumeImpl() to try the recover for Retryable IO Error during flush and WAL write. Compaction will be rescheduled by itself if retryable IO Error happens. Auto resume may also cause other Retryable IO Error during the recovery, so the recovery will fail. Retry the auto resume may solve the issue, so we use max_bgerror_resume_count to decide how many resume cycles will be tried in total. If it is <=0, auto resume retryable IO Error is disabled. Default is INT_MAX, which will lead to a infinit auto resume. bgerror_resume_retry_interval decides the time interval between two auto resumes.
* Option `max_subcompactions` can be set dynamically using DB::SetDBOptions().
* Added experimental ColumnFamilyOptions::sst_partitioner_factory to define determine the partitioning of sst files. This helps compaction to split the files on interesting boundaries (key prefixes) to make propagation of sst files less write amplifying (covering the whole key space).
@ -83,7 +83,7 @@
### Performance Improvements
* Eliminate key copies for internal comparisons while accessing ingested block-based tables.
* Reduce key comparisons during random access in all block-based tables.
* BackupEngine avoids unnecessary repeated checksum computation for backing up a table file to the `shared_checksum` directory when using `kOptionalChecksumAndDbSessionId`, except on SST files generated before this version of RocksDB, which fall back on using `kChecksumAndFileSize`.
* BackupEngine avoids unnecessary repeated checksum computation for backing up a table file to the `shared_checksum` directory when using `share_files_with_checksum_naming = kUseDbSessionId` (new default), except on SST files generated before this version of RocksDB, which fall back on using `kLegacyCrc32cAndFileSize`.
## 6.11 (6/12/2020)
### Bug Fixes

View File

@ -8,6 +8,7 @@
// found in the LICENSE file. See the AUTHORS file for names of contributors.
#include <cstring>
#include <regex>
#include "db/db_test_util.h"
#include "port/stack_trace.h"
@ -62,6 +63,13 @@ TEST_F(DBBasicTest, UniqueSession) {
ASSERT_EQ(sid2, sid4);
// Expected compact format for session ids (see notes in implementation)
std::regex expected("[0-9A-Z]{20}");
const std::string match("match");
EXPECT_EQ(match, std::regex_replace(sid1, expected, match));
EXPECT_EQ(match, std::regex_replace(sid2, expected, match));
EXPECT_EQ(match, std::regex_replace(sid3, expected, match));
#ifndef ROCKSDB_LITE
Close();
ASSERT_OK(ReadOnlyReopen(options));

View File

@ -3694,13 +3694,29 @@ Status DBImpl::GetDbSessionId(std::string& session_id) const {
}
void DBImpl::SetDbSessionId() {
// GenerateUniqueId() generates an identifier
// that has a negligible probability of being duplicated
db_session_id_ = env_->GenerateUniqueId();
// Remove the extra '\n' at the end if there is one
if (!db_session_id_.empty() && db_session_id_.back() == '\n') {
db_session_id_.pop_back();
// GenerateUniqueId() generates an identifier that has a negligible
// probability of being duplicated, ~128 bits of entropy
std::string uuid = env_->GenerateUniqueId();
// Hash and reformat that down to a more compact format, 20 characters
// in base-36 ([0-9A-Z]), which is ~103 bits of entropy, which is enough
// to expect no collisions across a billion servers each opening DBs
// a million times (~2^50). Benefits vs. raw unique id:
// * Save ~ dozen bytes per SST file
// * Shorter shared backup file names (some platforms have low limits)
// * Visually distinct from DB id format
uint64_t a = NPHash64(uuid.data(), uuid.size(), 1234U);
uint64_t b = NPHash64(uuid.data(), uuid.size(), 5678U);
db_session_id_.resize(20);
static const char* const base36 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";
size_t i = 0;
for (; i < 10U; ++i, a /= 36U) {
db_session_id_[i] = base36[a % 36];
}
for (; i < 20U; ++i, b /= 36U) {
db_session_id_[i] = base36[b % 36];
}
TEST_SYNC_POINT_CALLBACK("DBImpl::SetDbSessionId", &db_session_id_);
}
// Default implementation -- returns not supported status

View File

@ -1236,11 +1236,22 @@ Status StressTest::TestBackupRestore(
backup_opts.share_files_with_checksum = true;
if (thread->rand.OneIn(2)) {
// old
backup_opts.share_files_with_checksum_naming = kChecksumAndFileSize;
backup_opts.share_files_with_checksum_naming =
BackupableDBOptions::kLegacyCrc32cAndFileSize;
} else {
// new
backup_opts.share_files_with_checksum_naming =
kOptionalChecksumAndDbSessionId;
BackupableDBOptions::kUseDbSessionId;
}
if (thread->rand.OneIn(2)) {
backup_opts.share_files_with_checksum_naming =
backup_opts.share_files_with_checksum_naming |
BackupableDBOptions::kFlagIncludeFileSize;
}
if (thread->rand.OneIn(2)) {
backup_opts.share_files_with_checksum_naming =
backup_opts.share_files_with_checksum_naming |
BackupableDBOptions::kFlagMatchInterimNaming;
}
}
}

View File

@ -27,24 +27,6 @@ namespace ROCKSDB_NAMESPACE {
// The default BackupEngine file checksum function name.
constexpr char kDefaultBackupFileChecksumFuncName[] = "crc32c";
// BackupTableNameOption describes possible naming schemes for backup
// table file names when the table files are stored in the shared_checksum
// directory (i.e., both share_table_files and share_files_with_checksum
// are true).
enum BackupTableNameOption : unsigned char {
// Backup SST filenames are <file_number>_<crc32c>_<file_size>.sst
// where <crc32c> is uint32_t.
kChecksumAndFileSize = 0,
// Backup SST filenames are <file_number>_<crc32c>_<db_session_id>.sst
// where <crc32c> is hexidecimally encoded.
// When DBOptions::file_checksum_gen_factory is not set to
// GetFileChecksumGenCrc32cFactory(), the filenames will be
// <file_number>_<db_session_id>.sst
// When there are no db session ids available in the table file, this
// option will use kChecksumAndFileSize as a fallback.
kOptionalChecksumAndDbSessionId = 1
};
struct BackupableDBOptions {
// Where to keep the backup files. Has to be different than dbname_
// Best to set this to dbname_ + "/backups"
@ -108,17 +90,11 @@ struct BackupableDBOptions {
// Default: nullptr
std::shared_ptr<RateLimiter> restore_rate_limiter{nullptr};
// Only used if share_table_files is set to true. If true, will consider that
// backups can come from different databases, hence an sst is not uniquely
// identifed by its name, but by the triple
// (file name, crc32c, db session id or file length)
//
// Note: If this option is set to true, we recommend setting
// share_files_with_checksum_naming to kOptionalChecksumAndDbSessionId, which
// is also our default option. Otherwise, there is a non-negligible chance of
// filename collision when sharing tables in shared_checksum among several
// DBs.
// *turn it on only if you know what you're doing*
// Only used if share_table_files is set to true. If true, will consider
// that backups can come from different databases, even differently mutated
// databases with the same DB ID. See share_files_with_checksum_naming and
// ShareFilesNaming for details on how table files names are made
// unique between databases.
//
// Default: false
bool share_files_with_checksum;
@ -144,24 +120,79 @@ struct BackupableDBOptions {
// Default: INT_MAX
int max_valid_backups_to_open;
// Naming option for share_files_with_checksum table files. This option
// can be set to kChecksumAndFileSize or kOptionalChecksumAndDbSessionId.
// kChecksumAndFileSize is susceptible to collision as file size is not a
// good source of entroy.
// kOptionalChecksumAndDbSessionId is immune to collision.
// ShareFilesNaming describes possible naming schemes for backup
// table file names when the table files are stored in the shared_checksum
// directory (i.e., both share_table_files and share_files_with_checksum
// are true).
enum ShareFilesNaming : int {
// Backup SST filenames are <file_number>_<crc32c>_<file_size>.sst
// where <crc32c> is an unsigned decimal integer. This is the
// original/legacy naming scheme for share_files_with_checksum,
// with two problems:
// * At massive scale, collisions on this triple with different file
// contents is plausible.
// * Determining the name to use requires computing the checksum,
// so generally requires reading the whole file even if the file
// is already backed up.
// ** ONLY RECOMMENDED FOR PRESERVING OLD BEHAVIOR **
kLegacyCrc32cAndFileSize = 1,
// Backup SST filenames are <file_number>_s<db_session_id>.sst. This
// pair of values should be very strongly unique for a given SST file
// and easily determined before computing a checksum. The 's' indicates
// the value is a DB session id, not a checksum.
//
// Exceptions:
// * For old SST files without a DB session id, kLegacyCrc32cAndFileSize
// will be used instead, matching the names assigned by RocksDB versions
// not supporting the newer naming scheme.
// * See also flags below.
kUseDbSessionId = 2,
kMaskNoNamingFlags = 0xffff,
// If not already part of the naming scheme, insert
// _<file_size>
// before .sst in the name. In case of user code actually parsing the
// last _<whatever> before the .sst as the file size, this preserves that
// feature of kLegacyCrc32cAndFileSize. In other words, this option makes
// official that unofficial feature of the backup metadata.
//
// We do not consider SST file sizes to have sufficient entropy to
// contribute significantly to naming uniqueness.
kFlagIncludeFileSize = 1 << 31,
// When encountering an SST file from a Facebook-internal early
// release of 6.12, use the default naming scheme in effect for
// when the SST file was generated (assuming full file checksum
// was not set to GetFileChecksumGenCrc32cFactory()). That naming is
// <file_number>_<db_session_id>.sst
// and ignores kFlagIncludeFileSize setting.
// NOTE: This flag is intended to be temporary and should be removed
// in a later release.
kFlagMatchInterimNaming = 1 << 30,
kMaskNamingFlags = ~kMaskNoNamingFlags,
};
// Naming option for share_files_with_checksum table files. See
// ShareFilesNaming for details.
//
// Modifying this option cannot introduce a downgrade compatibility issue
// because RocksDB can read, restore, and delete backups using different file
// names, and it's OK for a backup directory to use a mixture of table file
// naming schemes.
//
// Default: kOptionalChecksumAndDbSessionId
// However, modifying this option and saving more backups to the same
// directory can lead to the same file getting saved again to that
// directory, under the new shared name in addition to the old shared
// name.
//
// Default: kUseDbSessionId | kFlagIncludeFileSize | kFlagMatchInterimNaming
//
// Note: This option comes into effect only if both share_files_with_checksum
// and share_table_files are true. In the cases of old table files where no
// db_session_id is stored, we use the file_size to replace the empty
// db_session_id as a fallback.
BackupTableNameOption share_files_with_checksum_naming;
// and share_table_files are true.
ShareFilesNaming share_files_with_checksum_naming;
// Option for custom checksum functions.
// When this option is nullptr, BackupEngine will use its default crc32c as
@ -200,8 +231,9 @@ struct BackupableDBOptions {
uint64_t _restore_rate_limit = 0, int _max_background_operations = 1,
uint64_t _callback_trigger_interval_size = 4 * 1024 * 1024,
int _max_valid_backups_to_open = INT_MAX,
BackupTableNameOption _share_files_with_checksum_naming =
kOptionalChecksumAndDbSessionId,
ShareFilesNaming _share_files_with_checksum_naming =
static_cast<ShareFilesNaming>(kUseDbSessionId | kFlagIncludeFileSize |
kFlagMatchInterimNaming),
std::shared_ptr<FileChecksumGenFactory> _file_checksum_gen_factory =
nullptr)
: backup_dir(_backup_dir),
@ -220,16 +252,36 @@ struct BackupableDBOptions {
share_files_with_checksum_naming(_share_files_with_checksum_naming),
file_checksum_gen_factory(_file_checksum_gen_factory) {
assert(share_table_files || !share_files_with_checksum);
assert((share_files_with_checksum_naming & kMaskNoNamingFlags) != 0);
}
};
inline BackupableDBOptions::ShareFilesNaming operator&(
BackupableDBOptions::ShareFilesNaming lhs,
BackupableDBOptions::ShareFilesNaming rhs) {
int l = static_cast<int>(lhs);
int r = static_cast<int>(rhs);
assert(r == BackupableDBOptions::kMaskNoNamingFlags ||
(r & BackupableDBOptions::kMaskNoNamingFlags) == 0);
return static_cast<BackupableDBOptions::ShareFilesNaming>(l & r);
}
inline BackupableDBOptions::ShareFilesNaming operator|(
BackupableDBOptions::ShareFilesNaming lhs,
BackupableDBOptions::ShareFilesNaming rhs) {
int l = static_cast<int>(lhs);
int r = static_cast<int>(rhs);
assert((r & BackupableDBOptions::kMaskNoNamingFlags) == 0);
return static_cast<BackupableDBOptions::ShareFilesNaming>(l | r);
}
struct CreateBackupOptions {
// Flush will always trigger if 2PC is enabled.
// If write-ahead logs are disabled, set flush_before_backup=true to
// avoid losing unflushed key/value pairs from the memtable.
bool flush_before_backup = false;
// Callback for reporting progress.
// Callback for reporting progress, based on callback_trigger_interval_size.
std::function<void()> progress_callback = []() {};
// If false, background_thread_cpu_priority is ignored.

View File

@ -96,8 +96,12 @@ void PropertyBlockBuilder::AddTableProperty(const TableProperties& props) {
if (props.file_creation_time > 0) {
Add(TablePropertiesNames::kFileCreationTime, props.file_creation_time);
}
Add(TablePropertiesNames::kDbId, props.db_id);
Add(TablePropertiesNames::kDbSessionId, props.db_session_id);
if (!props.db_id.empty()) {
Add(TablePropertiesNames::kDbId, props.db_id);
}
if (!props.db_session_id.empty()) {
Add(TablePropertiesNames::kDbSessionId, props.db_session_id);
}
if (!props.filter_policy_name.empty()) {
Add(TablePropertiesNames::kFilterPolicy, props.filter_policy_name);

View File

@ -49,6 +49,8 @@
namespace ROCKSDB_NAMESPACE {
namespace {
using ShareFilesNaming = BackupableDBOptions::ShareFilesNaming;
inline uint32_t ChecksumHexToInt32(const std::string& checksum_hex) {
std::string checksum_str;
Slice(checksum_hex).DecodeHex(&checksum_str);
@ -167,9 +169,13 @@ class BackupEngineImpl : public BackupEngine {
Status Initialize();
// Obtain the naming option for backup table files
BackupTableNameOption GetTableNamingOption() const {
return options_.share_files_with_checksum_naming;
ShareFilesNaming GetNamingNoFlags() const {
return options_.share_files_with_checksum_naming &
BackupableDBOptions::kMaskNoNamingFlags;
}
ShareFilesNaming GetNamingFlags() const {
return options_.share_files_with_checksum_naming &
BackupableDBOptions::kMaskNamingFlags;
}
private:
@ -210,7 +216,7 @@ class BackupEngineImpl : public BackupEngine {
// currently
const std::string db_id;
// db_session_id appears in the backup SST filename if the table naming
// option is kOptionalChecksumAndDbSessionId
// option is kUseDbSessionId
const std::string db_session_id;
};
@ -344,9 +350,17 @@ class BackupEngineImpl : public BackupEngine {
return GetSharedChecksumDirRel() + "/" + (tmp ? "." : "") + file +
(tmp ? ".tmp" : "");
}
inline bool UseSessionId(const std::string& sid) const {
return GetTableNamingOption() == kOptionalChecksumAndDbSessionId &&
!sid.empty();
inline bool UseLegacyNaming(const std::string& sid) const {
return GetNamingNoFlags() ==
BackupableDBOptions::kLegacyCrc32cAndFileSize ||
sid.empty();
}
inline bool UseInterimNaming(const std::string& sid) const {
// The indicator of SST file from early internal 6.12 release
// is a '-' in the DB session id. DB session id was made more
// concise without '-' after that.
return (GetNamingFlags() & BackupableDBOptions::kFlagMatchInterimNaming) &&
sid.find('-') != std::string::npos;
}
inline std::string GetSharedFileWithChecksum(
const std::string& file, bool has_checksum,
@ -354,19 +368,22 @@ class BackupEngineImpl : public BackupEngine {
const std::string& db_session_id) const {
assert(file.size() == 0 || file[0] != '/');
std::string file_copy = file;
if (UseSessionId(db_session_id)) {
if (has_checksum) {
return file_copy.insert(file_copy.find_last_of('.'),
"_" + checksum_hex + "_" + db_session_id);
} else {
return file_copy.insert(file_copy.find_last_of('.'),
"_" + db_session_id);
}
if (UseLegacyNaming(db_session_id)) {
assert(has_checksum);
(void)has_checksum;
file_copy.insert(file_copy.find_last_of('.'),
"_" + ToString(ChecksumHexToInt32(checksum_hex)) + "_" +
ToString(file_size));
} else if (UseInterimNaming(db_session_id)) {
file_copy.insert(file_copy.find_last_of('.'), "_" + db_session_id);
} else {
return file_copy.insert(file_copy.find_last_of('.'),
"_" + ToString(ChecksumHexToInt32(checksum_hex)) +
"_" + ToString(file_size));
file_copy.insert(file_copy.find_last_of('.'), "_s" + db_session_id);
if (GetNamingFlags() & BackupableDBOptions::kFlagIncludeFileSize) {
file_copy.insert(file_copy.find_last_of('.'),
"_" + ToString(file_size));
}
}
return file_copy;
}
inline std::string GetFileFromChecksumFile(const std::string& file) const {
assert(file.size() == 0 || file[0] != '/');
@ -1851,7 +1868,7 @@ Status BackupEngineImpl::AddBackupFileWorkItem(
// Step 1: Prepare the relative path to destination
if (shared && shared_checksum) {
if (GetTableNamingOption() == kOptionalChecksumAndDbSessionId) {
if (GetNamingNoFlags() != BackupableDBOptions::kLegacyCrc32cAndFileSize) {
// Prepare db_session_id to add to the file name
// Ignore the returned status
// In the failed cases, db_id and db_session_id will be empty
@ -1875,7 +1892,7 @@ Status BackupEngineImpl::AddBackupFileWorkItem(
return Status::NotFound("File missing: " + src_dir + fname);
}
// dst_relative depends on the following conditions:
// 1) the naming scheme is kOptionalChecksumAndDbSessionId,
// 1) the naming scheme is kUseDbSessionId,
// 2) db_session_id is not empty,
// 3) checksum is available in the DB manifest.
// If 1,2,3) are satisfied, then dst_relative will be of the form:
@ -1888,7 +1905,7 @@ Status BackupEngineImpl::AddBackupFileWorkItem(
// Also, we display custom checksums in the name if possible.
dst_relative = GetSharedFileWithChecksum(
dst_relative, has_checksum,
checksum_func == nullptr || !UseSessionId(db_session_id)
checksum_func == nullptr || UseLegacyNaming(db_session_id)
? checksum_hex
: custom_checksum_hex,
size_bytes, db_session_id);
@ -1959,6 +1976,7 @@ Status BackupEngineImpl::AddBackupFileWorkItem(
if (!has_checksum || checksum_hex.empty()) {
// Either both checksum_hex and custom_checksum_hex need recalculating
// or only checksum_hex needs recalculating
// FIXME(peterd): extra I/O
s = CalculateChecksum(
src_dir + fname, db_env_, src_env_options, size_limit,
&checksum_hex, checksum_func,
@ -1968,7 +1986,7 @@ Status BackupEngineImpl::AddBackupFileWorkItem(
}
has_checksum = true;
}
if (UseSessionId(db_session_id)) {
if (!db_session_id.empty()) {
ROCKS_LOG_INFO(options_.info_log,
"%s already present, with checksum %s, size %" PRIu64
" and DB session identity %s",
@ -2005,6 +2023,7 @@ Status BackupEngineImpl::AddBackupFileWorkItem(
if (!has_checksum || checksum_hex.empty()) {
// Either both checksum_hex and custom_checksum_hex need recalculating
// or only checksum_hex needs recalculating
// FIXME(peterd): extra I/O
s = CalculateChecksum(
src_dir + fname, db_env_, src_env_options, size_limit,
&checksum_hex, checksum_func,

View File

@ -38,6 +38,15 @@
namespace ROCKSDB_NAMESPACE {
namespace {
using ShareFilesNaming = BackupableDBOptions::ShareFilesNaming;
const auto kLegacyCrc32cAndFileSize =
BackupableDBOptions::kLegacyCrc32cAndFileSize;
const auto kUseDbSessionId = BackupableDBOptions::kUseDbSessionId;
const auto kFlagIncludeFileSize = BackupableDBOptions::kFlagIncludeFileSize;
const auto kFlagMatchInterimNaming =
BackupableDBOptions::kFlagMatchInterimNaming;
const auto kNamingDefault =
kUseDbSessionId | kFlagIncludeFileSize | kFlagMatchInterimNaming;
class DummyFileChecksumGen : public FileChecksumGenerator {
public:
@ -892,6 +901,45 @@ class BackupableDBTest : public testing::Test {
return WriteStringToFile(test_db_env_.get(), file_contents, fname);
}
void AssertDirectoryFilesMatchRegex(const std::string& dir,
const std::regex& pattern,
int minimum_count) {
std::vector<FileAttributes> children;
ASSERT_OK(file_manager_->GetChildrenFileAttributes(dir, &children));
int found_count = 0;
for (const auto& child : children) {
if (child.name == "." || child.name == "..") {
continue;
}
const std::string match("match");
ASSERT_EQ(match, std::regex_replace(child.name, pattern, match));
++found_count;
}
ASSERT_GE(found_count, minimum_count);
}
void AssertDirectoryFilesSizeIndicators(const std::string& dir,
int minimum_count) {
std::vector<FileAttributes> children;
ASSERT_OK(file_manager_->GetChildrenFileAttributes(dir, &children));
int found_count = 0;
for (const auto& child : children) {
if (child.name == "." || child.name == "..") {
continue;
}
auto last_underscore = child.name.find_last_of('_');
auto last_dot = child.name.find_last_of('.');
ASSERT_NE(child.name, child.name.substr(0, last_underscore));
ASSERT_NE(child.name, child.name.substr(0, last_dot));
ASSERT_LT(last_underscore, last_dot);
std::string s = child.name.substr(last_underscore + 1,
last_dot - (last_underscore + 1));
ASSERT_EQ(s, ToString(child.size_bytes));
++found_count;
}
ASSERT_GE(found_count, minimum_count);
}
// files
std::string dbname_;
std::string backupdir_;
@ -1711,7 +1759,8 @@ TEST_P(BackupableDBTestWithParam, TableFileCorruptedBeforeBackup) {
TEST_F(BackupableDBTest, TableFileWithoutDbChecksumCorruptedDuringBackup) {
const int keys_iteration = 50000;
backupable_options_->share_files_with_checksum_naming = kChecksumAndFileSize;
backupable_options_->share_files_with_checksum_naming =
kLegacyCrc32cAndFileSize;
// When share_files_with_checksum is on, we calculate checksums of table
// files before and after copying. So we can test whether a corruption has
// happened during the file is copied to backup directory.
@ -1845,6 +1894,7 @@ TEST_F(BackupableDBTest, FlushCompactDuringBackupCheckpoint) {
CloseDBAndBackupEngine();
SyncPoint::GetInstance()->DisableProcessing();
SyncPoint::GetInstance()->ClearAllCallBacks();
/* FIXME(peterd): reinstate with option for checksum in file names
if (sopt == kShareWithChecksum) {
// Ensure we actually got DB manifest checksums by inspecting
// shared_checksum file names for hex checksum component
@ -1860,6 +1910,7 @@ TEST_F(BackupableDBTest, FlushCompactDuringBackupCheckpoint) {
EXPECT_EQ(match, std::regex_replace(child.name, expected, match));
}
}
*/
AssertBackupConsistency(0, 0, keys_iteration);
}
}
@ -2071,42 +2122,146 @@ TEST_F(BackupableDBTest, ShareTableFilesWithChecksumsTransition) {
}
}
// Verify backup and restore with share_files_with_checksum on and
// share_files_with_checksum_naming = kOptionalChecksumAndDbSessionId
// Verify backup and restore with various naming options, check names
TEST_F(BackupableDBTest, ShareTableFilesWithChecksumsNewNaming) {
// Use session id in the name of SST files
ASSERT_TRUE(backupable_options_->share_files_with_checksum_naming ==
kOptionalChecksumAndDbSessionId);
kNamingDefault);
const int keys_iteration = 5000;
int i = 0;
OpenDBAndBackupEngine(true, false, kShareWithChecksum);
FillDB(db_.get(), keys_iteration * i, keys_iteration * (i + 1));
ASSERT_OK(backup_engine_->CreateNewBackup(db_.get(), !!(i % 2)));
FillDB(db_.get(), 0, keys_iteration);
CloseDBAndBackupEngine();
AssertBackupConsistency(i + 1, 0, keys_iteration * (i + 1),
keys_iteration * (i + 2));
// Both checksum and session id in the name of SST files
options_.file_checksum_gen_factory = GetFileChecksumGenCrc32cFactory();
OpenDBAndBackupEngine(false, false, kShareWithChecksum);
FillDB(db_.get(), keys_iteration * i, keys_iteration * (i + 1));
ASSERT_OK(backup_engine_->CreateNewBackup(db_.get(), !!(i % 2)));
static const std::map<ShareFilesNaming, std::string> option_to_expected = {
{kLegacyCrc32cAndFileSize, "[0-9]+_[0-9]+_[0-9]+[.]sst"},
// kFlagIncludeFileSize redundant here
{kLegacyCrc32cAndFileSize | kFlagIncludeFileSize,
"[0-9]+_[0-9]+_[0-9]+[.]sst"},
{kUseDbSessionId, "[0-9]+_s[0-9A-Z]{20}[.]sst"},
{kUseDbSessionId | kFlagIncludeFileSize,
"[0-9]+_s[0-9A-Z]{20}_[0-9]+[.]sst"},
};
for (const auto& pair : option_to_expected) {
// kFlagMatchInterimNaming must not matter on new SST files
for (const auto option :
{pair.first, pair.first | kFlagMatchInterimNaming}) {
CloseAndReopenDB();
backupable_options_->share_files_with_checksum_naming = option;
OpenBackupEngine(true /*destroy_old_data*/);
ASSERT_OK(backup_engine_->CreateNewBackup(db_.get()));
CloseDBAndBackupEngine();
AssertBackupConsistency(1, 0, keys_iteration, keys_iteration * 2);
AssertDirectoryFilesMatchRegex(backupdir_ + "/shared_checksum",
std::regex(pair.second),
1 /* minimum_count */);
if (std::string::npos != pair.second.find("_[0-9]+[.]sst")) {
AssertDirectoryFilesSizeIndicators(backupdir_ + "/shared_checksum",
1 /* minimum_count */);
}
}
}
}
// Mimic SST file generated by early internal-only 6.12 release
// and test various naming options. This test can be removed when
// the kFlagMatchInterimNaming feature is removed.
TEST_F(BackupableDBTest, ShareTableFilesWithChecksumsInterimNaming) {
const int keys_iteration = 5000;
// Essentially, reinstate old implementaiton of generating a DB
// session id. This is how we distinguish "interim" SST files from
// newer ones: from the form of the db session id string.
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->SetCallBack(
"DBImpl::SetDbSessionId", [&](void* sid_void_star) {
std::string* sid = static_cast<std::string*>(sid_void_star);
*sid = test_db_env_->GenerateUniqueId();
if (!sid->empty() && sid->back() == '\n') {
sid->pop_back();
}
});
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->EnableProcessing();
OpenDBAndBackupEngine(true, false, kShareWithChecksum);
FillDB(db_.get(), 0, keys_iteration);
CloseDBAndBackupEngine();
AssertBackupConsistency(i + 1, 0, keys_iteration * (i + 1),
keys_iteration * (i + 2));
static const std::map<ShareFilesNaming, std::string> option_to_expected = {
{kLegacyCrc32cAndFileSize, "[0-9]+_[0-9]+_[0-9]+[.]sst"},
// kFlagMatchInterimNaming ignored here
{kLegacyCrc32cAndFileSize | kFlagMatchInterimNaming,
"[0-9]+_[0-9]+_[0-9]+[.]sst"},
{kUseDbSessionId, "[0-9]+_s[0-9a-fA-F-]+[.]sst"},
{kUseDbSessionId | kFlagIncludeFileSize,
"[0-9]+_s[0-9a-fA-F-]+_[0-9]+[.]sst"},
{kUseDbSessionId | kFlagMatchInterimNaming, "[0-9]+_[0-9a-fA-F-]+[.]sst"},
{kUseDbSessionId | kFlagIncludeFileSize | kFlagMatchInterimNaming,
"[0-9]+_[0-9a-fA-F-]+[.]sst"},
};
for (const auto& pair : option_to_expected) {
CloseAndReopenDB();
backupable_options_->share_files_with_checksum_naming = pair.first;
OpenBackupEngine(true /*destroy_old_data*/);
ASSERT_OK(backup_engine_->CreateNewBackup(db_.get()));
CloseDBAndBackupEngine();
AssertBackupConsistency(1, 0, keys_iteration, keys_iteration * 2);
AssertDirectoryFilesMatchRegex(backupdir_ + "/shared_checksum",
std::regex(pair.second),
1 /* minimum_count */);
}
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->DisableProcessing();
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->ClearAllCallBacks();
}
// Mimic SST file generated by pre-6.12 releases and verify that
// old names are always used regardless of naming option.
TEST_F(BackupableDBTest, ShareTableFilesWithChecksumsOldFileNaming) {
const int keys_iteration = 5000;
// Pre-6.12 release did not include db id and db session id properties.
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->SetCallBack(
"PropertyBlockBuilder::AddTableProperty:Start", [&](void* props_vs) {
auto props = static_cast<TableProperties*>(props_vs);
props->db_id = "";
props->db_session_id = "";
});
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->EnableProcessing();
OpenDBAndBackupEngine(true, false, kShareWithChecksum);
FillDB(db_.get(), 0, keys_iteration);
CloseDBAndBackupEngine();
// Old names should always be used on old files
const std::regex expected("[0-9]+_[0-9]+_[0-9]+[.]sst");
for (ShareFilesNaming option : {kNamingDefault, kUseDbSessionId}) {
CloseAndReopenDB();
backupable_options_->share_files_with_checksum_naming = option;
OpenBackupEngine(true /*destroy_old_data*/);
ASSERT_OK(backup_engine_->CreateNewBackup(db_.get()));
CloseDBAndBackupEngine();
AssertBackupConsistency(1, 0, keys_iteration, keys_iteration * 2);
AssertDirectoryFilesMatchRegex(backupdir_ + "/shared_checksum", expected,
1 /* minimum_count */);
}
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->DisableProcessing();
ROCKSDB_NAMESPACE::SyncPoint::GetInstance()->ClearAllCallBacks();
}
// Verify backup and restore with share_files_with_checksum off and then
// transition this option to on and share_files_with_checksum_naming to be
// kOptionalChecksumAndDbSessionId
// based on kUseDbSessionId
TEST_F(BackupableDBTest, ShareTableFilesWithChecksumsNewNamingTransition) {
const int keys_iteration = 5000;
// We may set share_files_with_checksum_naming to kChecksumAndFileSize
// We may set share_files_with_checksum_naming to kLegacyCrc32cAndFileSize
// here but even if we don't, it should have no effect when
// share_files_with_checksum is false
ASSERT_TRUE(backupable_options_->share_files_with_checksum_naming ==
kOptionalChecksumAndDbSessionId);
kNamingDefault);
// set share_files_with_checksum to false
OpenDBAndBackupEngine(true, false, kShareNoChecksum);
int j = 3;
@ -2124,7 +2279,7 @@ TEST_F(BackupableDBTest, ShareTableFilesWithChecksumsNewNamingTransition) {
// set share_files_with_checksum to true and do some more backups
// and use session id in the name of SST file backup
ASSERT_TRUE(backupable_options_->share_files_with_checksum_naming ==
kOptionalChecksumAndDbSessionId);
kNamingDefault);
OpenDBAndBackupEngine(false /* destroy_old_data */, false,
kShareWithChecksum);
FillDB(db_.get(), keys_iteration * j, keys_iteration * (j + 1));
@ -2144,9 +2299,9 @@ TEST_F(BackupableDBTest, ShareTableFilesWithChecksumsNewNamingTransition) {
// For an extra challenge, make sure that GarbageCollect / DeleteBackup
// is OK even if we open without share_table_files but with
// share_files_with_checksum_naming being kOptionalChecksumAndDbSessionId
// share_files_with_checksum_naming based on kUseDbSessionId
ASSERT_TRUE(backupable_options_->share_files_with_checksum_naming ==
kOptionalChecksumAndDbSessionId);
kNamingDefault);
OpenDBAndBackupEngine(false /* destroy_old_data */, false, kNoShare);
backup_engine_->DeleteBackup(1);
backup_engine_->GarbageCollect();
@ -2158,7 +2313,8 @@ TEST_F(BackupableDBTest, ShareTableFilesWithChecksumsNewNamingTransition) {
// Use checksum and file size for backup table file names and open without
// share_table_files
// Again, make sure that GarbageCollect / DeleteBackup is OK
backupable_options_->share_files_with_checksum_naming = kChecksumAndFileSize;
backupable_options_->share_files_with_checksum_naming =
kLegacyCrc32cAndFileSize;
OpenDBAndBackupEngine(false /* destroy_old_data */, false, kNoShare);
backup_engine_->DeleteBackup(2);
backup_engine_->GarbageCollect();
@ -2172,9 +2328,10 @@ TEST_F(BackupableDBTest, ShareTableFilesWithChecksumsNewNamingTransition) {
}
// Verify backup and restore with share_files_with_checksum on and transition
// from kChecksumAndFileSize to kOptionalChecksumAndDbSessionId
// from kLegacyCrc32cAndFileSize to kUseDbSessionId
TEST_F(BackupableDBTest, ShareTableFilesWithChecksumsNewNamingUpgrade) {
backupable_options_->share_files_with_checksum_naming = kChecksumAndFileSize;
backupable_options_->share_files_with_checksum_naming =
kLegacyCrc32cAndFileSize;
const int keys_iteration = 5000;
// set share_files_with_checksum to true
OpenDBAndBackupEngine(true, false, kShareWithChecksum);
@ -2190,8 +2347,7 @@ TEST_F(BackupableDBTest, ShareTableFilesWithChecksumsNewNamingUpgrade) {
keys_iteration * (j + 1));
}
backupable_options_->share_files_with_checksum_naming =
kOptionalChecksumAndDbSessionId;
backupable_options_->share_files_with_checksum_naming = kUseDbSessionId;
OpenDBAndBackupEngine(false /* destroy_old_data */, false,
kShareWithChecksum);
FillDB(db_.get(), keys_iteration * j, keys_iteration * (j + 1));
@ -2222,7 +2378,8 @@ TEST_F(BackupableDBTest, ShareTableFilesWithChecksumsNewNamingUpgrade) {
// Use checksum and file size for backup table file names and open without
// share_table_files
// Again, make sure that GarbageCollect / DeleteBackup is OK
backupable_options_->share_files_with_checksum_naming = kChecksumAndFileSize;
backupable_options_->share_files_with_checksum_naming =
kLegacyCrc32cAndFileSize;
OpenDBAndBackupEngine(false /* destroy_old_data */, false, kNoShare);
backup_engine_->DeleteBackup(2);
backup_engine_->GarbageCollect();