mirror of
https://github.com/facebook/rocksdb.git
synced 2024-11-26 07:30:54 +00:00
e484b81eee
Summary:
**Context:**
The crash test below revealed a bug: the directory containing the CURRENT file (`dir_contains_current_file` below) was not always synced after a new CURRENT file was created and moved into place with `RenameFile`. An un-synced directory containing the updated CURRENT may not survive a host crash (e.g., power loss), leaving CURRENT corrupted. The subsequent recovery would then start from a corrupted CURRENT, which we don't want.

The root cause is that a nullptr `FSDirectory* dir_contains_current_file` is sometimes passed down to `SetCurrentFile()`, in which case `dir_contains_current_file->FSDirectory::FsyncWithDirOptions()` is skipped (it would otherwise internally call `Env/FS::SyncDic()`).

```
./db_stress --acquire_snapshot_one_in=10000 --adaptive_readahead=1 --allow_data_in_errors=True --avoid_unnecessary_blocking_io=0 --backup_max_size=104857600 --backup_one_in=100000 --batch_protection_bytes_per_key=8 --block_size=16384 --bloom_bits=134.8015470676662 --bottommost_compression_type=disable --cache_size=8388608 --checkpoint_one_in=1000000 --checksum_type=kCRC32c --clear_column_family_one_in=0 --compact_files_one_in=1000000 --compact_range_one_in=1000000 --compaction_pri=2 --compaction_ttl=100 --compression_max_dict_buffer_bytes=511 --compression_max_dict_bytes=16384 --compression_type=zstd --compression_use_zstd_dict_trainer=1 --compression_zstd_max_train_bytes=65536 --continuous_verification_interval=0 --data_block_index_type=0 --db=$db --db_write_buffer_size=1048576 --delpercent=5 --delrangepercent=0 --destroy_db_initially=0 --disable_wal=0 --enable_compaction_filter=0 --enable_pipelined_write=1 --expected_values_dir=$exp --fail_if_options_file_error=1 --file_checksum_impl=none --flush_one_in=1000000 --get_current_wal_file_one_in=0 --get_live_files_one_in=1000000 --get_property_one_in=1000000 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=4 --ingest_external_file_one_in=0 --iterpercent=10 --key_len_percent_dist=1,30,69 --level_compaction_dynamic_level_bytes=True --mark_for_compaction_one_file_in=10 --max_background_compactions=20 --max_bytes_for_level_base=10485760 --max_key=10000 --max_key_len=3 --max_manifest_file_size=16384 --max_write_batch_group_size_bytes=64 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=0 --memtable_prefix_bloom_size_ratio=0.001 --memtable_protection_bytes_per_key=1 --memtable_whole_key_filtering=1 --mmap_read=1 --nooverwritepercent=1 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=0 --open_write_fault_one_in=0 --ops_per_thread=100000000 --optimize_filters_for_memory=1 --paranoid_file_checks=1 --partition_pinning=2 --pause_background_one_in=1000000 --periodic_compaction_seconds=0 --prefix_size=5 --prefixpercent=5 --prepopulate_block_cache=1 --progress_reports=0 --read_fault_one_in=1000 --readpercent=45 --recycle_log_file_num=0 --reopen=0 --ribbon_starting_level=999 --secondary_cache_fault_one_in=32 --secondary_cache_uri=compressed_secondary_cache://capacity=8388608 --set_options_one_in=10000 --snapshot_hold_ops=100000 --sst_file_manager_bytes_per_sec=0 --sst_file_manager_bytes_per_truncate=0 --subcompactions=3 --sync_fault_injection=1 --target_file_size_base=2097 --target_file_size_multiplier=2 --test_batches_snapshots=1 --top_level_index_pinning=1 --use_full_merge_v1=1 --use_merge=1 --value_size_mult=32 --verify_checksum=1 --verify_checksum_one_in=1000000 --verify_db_one_in=100000 --verify_sst_unique_id_in_manifest=1 --wal_bytes_per_sync=524288 --write_buffer_size=4194 --writepercent=35
```

```
stderr:
WARNING: prefix_size is non-zero but memtablerep != prefix_hash
db_stress: utilities/fault_injection_fs.cc:748: virtual rocksdb::IOStatus rocksdb::FaultInjectionTestFS::RenameFile(const std::string &, const std::string &, const rocksdb::IOOptions &, rocksdb::IODebugContext *): Assertion `tlist.find(tdn.second) == tlist.end()' failed.`
```

**Summary:**
The PR ensures that the non-test paths pass down a non-null directory containing CURRENT (which, by current RocksDB assumption, is just db_dir) by doing the following:
- Renamed `directory_to_fsync` to `dir_contains_current_file` in `SetCurrentFile()` to tighten the association between this directory and the CURRENT file.
- Changed the `SetCurrentFile()` API to require `dir_contains_current_file` to be passed in, instead of defaulting it to nullptr.
- Because `SetCurrentFile()`'s `dir_contains_current_file` is passed down from `VersionSet::LogAndApply()` and then `VersionSet::ProcessManifestWrites()` (i.e., think of this as a chain of 3 functions related to MANIFEST update), these 2 functions were also refactored to require `dir_contains_current_file`.
- Updated the non-test-path callers of these 3 functions to obtain and pass in a non-nullptr `dir_contains_current_file`, which, by current RocksDB assumption, is the `FSDirectory* db_dir`.
- The `db_impl` path obtains it from `DBImpl::directories_.getDbDir()`, while callers with no access to such `directories_` create the object on the fly with `FileSystem::NewDirectory(..)` and manage it with unique pointers to ensure a short lifetime.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10573

Test Plan:
- `make check`
- Passed the repro db_stress command
- For future improvement: since we currently don't assert that the directory containing CURRENT is non-nullptr (due to https://github.com/facebook/rocksdb/pull/10573#pullrequestreview-1087698899), there is still a chance that future developers mistakenly pass down a nullptr directory, skipping the directory sync and causing the bug again. Therefore a smarter test (e.g., as quoted from ajkr, "(make) unsynced data loss to be dropping files corresponding to unsynced directory entries") is still needed.

Reviewed By: ajkr

Differential Revision: D39005886

Pulled By: hx235

fbshipit-source-id: 336fb9090d0cfa6ca3dd580db86268007dde7f5a
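To make the durability argument in the commit message concrete, here is a minimal sketch of the write-temp-file, rename, then fsync-the-directory pattern. It uses plain POSIX calls rather than RocksDB's `FileSystem`/`FSDirectory` API, and the names `WriteAndSyncFile`, `SyncDirectory`, `PublishCurrentSketch` and the temp file name `CURRENT.sketch.tmp` are hypothetical, chosen only for illustration. It assumes a platform (such as Linux) where `fsync` on a directory file descriptor is permitted.

```cpp
// Illustrative POSIX-only sketch; this is NOT RocksDB's SetCurrentFile().
#include <fcntl.h>
#include <unistd.h>

#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: write `contents` to `path` and fsync the file data.
static bool WriteAndSyncFile(const std::string& path,
                             const std::string& contents) {
  int fd = ::open(path.c_str(), O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if (fd < 0) return false;
  size_t done = 0;
  while (done < contents.size()) {
    ssize_t n = ::write(fd, contents.data() + done, contents.size() - done);
    if (n <= 0) {
      ::close(fd);
      return false;
    }
    done += static_cast<size_t>(n);
  }
  bool ok = (::fsync(fd) == 0);
  return (::close(fd) == 0) && ok;
}

// Hypothetical helper: fsync the directory itself so that its entries,
// including a file that was just renamed into it, survive a crash.
static bool SyncDirectory(const std::string& dir) {
  int fd = ::open(dir.c_str(), O_RDONLY);
  if (fd < 0) return false;
  bool ok = (::fsync(fd) == 0);
  return (::close(fd) == 0) && ok;
}

// Hypothetical stand-in for the pattern described above: publish a new
// CURRENT that names `manifest_name`, then sync the containing directory.
// Skipping the last step corresponds to the nullptr-directory bug.
bool PublishCurrentSketch(const std::string& db_dir,
                          const std::string& manifest_name) {
  assert(!db_dir.empty());
  const std::string tmp = db_dir + "/CURRENT.sketch.tmp";  // assumed name
  const std::string current = db_dir + "/CURRENT";
  if (!WriteAndSyncFile(tmp, manifest_name + "\n")) return false;
  if (std::rename(tmp.c_str(), current.c_str()) != 0) {
    std::remove(tmp.c_str());
    return false;
  }
  return SyncDirectory(db_dir);
}
```

Calling `PublishCurrentSketch(".", "MANIFEST-000005")` would leave a `CURRENT` file in the working directory containing that manifest name. The point of the sketch is the ordering: the rename makes the new name visible, but only the directory fsync makes the rename itself durable.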
190 lines
7.7 KiB
C++
// Copyright (c) 2011-present, Facebook, Inc. All rights reserved.
// This source code is licensed under both the GPLv2 (found in the
// COPYING file in the root directory) and Apache 2.0 License
// (found in the LICENSE.Apache file in the root directory).
//
// Copyright (c) 2011 The LevelDB Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file. See the AUTHORS file for names of contributors.
//
// File names used by DB code

#pragma once
#include <stdint.h>
#include <unordered_map>
#include <string>
#include <vector>

#include "options/db_options.h"
#include "port/port.h"
#include "rocksdb/file_system.h"
#include "rocksdb/options.h"
#include "rocksdb/slice.h"
#include "rocksdb/status.h"
#include "rocksdb/transaction_log.h"

namespace ROCKSDB_NAMESPACE {

class Env;
class Directory;
class SystemClock;
class WritableFileWriter;

#ifdef OS_WIN
constexpr char kFilePathSeparator = '\\';
#else
constexpr char kFilePathSeparator = '/';
#endif

// Return the name of the log file with the specified number
// in the db named by "dbname". The result will be prefixed with
// "dbname".
extern std::string LogFileName(const std::string& dbname, uint64_t number);

extern std::string LogFileName(uint64_t number);

extern std::string BlobFileName(uint64_t number);

extern std::string BlobFileName(const std::string& bdirname, uint64_t number);

extern std::string BlobFileName(const std::string& dbname,
                                const std::string& blob_dir, uint64_t number);

extern std::string ArchivalDirectory(const std::string& dbname);

// Return the name of the archived log file with the specified number
// in the db named by "dbname". The result will be prefixed with "dbname".
extern std::string ArchivedLogFileName(const std::string& dbname,
                                       uint64_t num);

extern std::string MakeTableFileName(const std::string& name, uint64_t number);

extern std::string MakeTableFileName(uint64_t number);

// Return the name of sstable with LevelDB suffix
// created from RocksDB sstable suffixed name
extern std::string Rocks2LevelTableFileName(const std::string& fullname);

// the reverse function of MakeTableFileName
// TODO(yhchiang): could merge this function with ParseFileName()
extern uint64_t TableFileNameToNumber(const std::string& name);

// Return the name of the sstable with the specified number
// in the db named by "dbname". The result will be prefixed with
// "dbname".
extern std::string TableFileName(const std::vector<DbPath>& db_paths,
                                 uint64_t number, uint32_t path_id);

// Sufficient buffer size for FormatFileNumber.
const size_t kFormatFileNumberBufSize = 38;

extern void FormatFileNumber(uint64_t number, uint32_t path_id, char* out_buf,
                             size_t out_buf_size);

// Return the name of the descriptor file for the db named by
// "dbname" and the specified incarnation number. The result will be
// prefixed with "dbname".
extern std::string DescriptorFileName(const std::string& dbname,
                                      uint64_t number);

extern std::string DescriptorFileName(uint64_t number);

extern const std::string kCurrentFileName;  // = "CURRENT"

// Return the name of the current file. This file contains the name
// of the current manifest file. The result will be prefixed with
// "dbname".
extern std::string CurrentFileName(const std::string& dbname);

// Return the name of the lock file for the db named by
// "dbname". The result will be prefixed with "dbname".
extern std::string LockFileName(const std::string& dbname);

// Return the name of a temporary file owned by the db named "dbname".
// The result will be prefixed with "dbname".
extern std::string TempFileName(const std::string& dbname, uint64_t number);

// A helper structure for prefix of info log names.
struct InfoLogPrefix {
  char buf[260];
  Slice prefix;
  // Prefix with DB absolute path encoded
  explicit InfoLogPrefix(bool has_log_dir, const std::string& db_absolute_path);
  // Default Prefix
  explicit InfoLogPrefix();
};

// Return the name of the info log file for "dbname".
extern std::string InfoLogFileName(const std::string& dbname,
                                   const std::string& db_path = "",
                                   const std::string& log_dir = "");

// Return the name of the old info log file for "dbname".
extern std::string OldInfoLogFileName(const std::string& dbname, uint64_t ts,
                                      const std::string& db_path = "",
                                      const std::string& log_dir = "");

extern const std::string kOptionsFileNamePrefix;  // = "OPTIONS-"
extern const std::string kTempFileNameSuffix;     // = "dbtmp"

// Return an options file name given the "dbname" and file number.
// Format: OPTIONS-[number]
extern std::string OptionsFileName(const std::string& dbname,
                                   uint64_t file_num);
extern std::string OptionsFileName(uint64_t file_num);

// Return a temp options file name given the "dbname" and file number.
// Format: OPTIONS-[number].dbtmp
extern std::string TempOptionsFileName(const std::string& dbname,
                                       uint64_t file_num);

// Return the name to use for a metadatabase. The result will be prefixed with
// "dbname".
extern std::string MetaDatabaseName(const std::string& dbname,
                                    uint64_t number);

// Return the name of the Identity file which stores a unique number for the db
// that will get regenerated if the db loses all its data and is recreated fresh
// either from a backup-image or empty
extern std::string IdentityFileName(const std::string& dbname);

// If filename is a rocksdb file, store the type of the file in *type.
// The number encoded in the filename is stored in *number. If the
// filename was successfully parsed, returns true. Else return false.
// info_log_name_prefix is the path of info logs.
extern bool ParseFileName(const std::string& filename, uint64_t* number,
                          const Slice& info_log_name_prefix, FileType* type,
                          WalFileType* log_type = nullptr);
// Same as previous function, but skip info log files.
extern bool ParseFileName(const std::string& filename, uint64_t* number,
                          FileType* type, WalFileType* log_type = nullptr);

// Make the CURRENT file point to the descriptor file with the
// specified number. On success, and when dir_contains_current_file is not
// nullptr, the function will fsync the directory containing the CURRENT file.
extern IOStatus SetCurrentFile(FileSystem* fs, const std::string& dbname,
                               uint64_t descriptor_number,
                               FSDirectory* dir_contains_current_file);

// Make the IDENTITY file for the db
extern Status SetIdentityFile(Env* env, const std::string& dbname,
                              const std::string& db_id = {});

// Sync manifest file `file`.
extern IOStatus SyncManifest(const ImmutableDBOptions* db_options,
                             WritableFileWriter* file);

// Return list of file names of info logs in `file_names`.
// The list only contains file name. The parent directory name is stored
// in `parent_dir`.
// `db_log_dir` should be the one as in options.db_log_dir
extern Status GetInfoLogFiles(const std::shared_ptr<FileSystem>& fs,
                              const std::string& db_log_dir,
                              const std::string& dbname,
                              std::string* parent_dir,
                              std::vector<std::string>* file_names);

extern std::string NormalizePath(const std::string& path);
}  // namespace ROCKSDB_NAMESPACE