// Copyright (c) 2011-present, Facebook, Inc. All rights reserved.
// This source code is licensed under both the GPLv2 (found in the
// COPYING file in the root directory) and Apache 2.0 License
// (found in the LICENSE.Apache file in the root directory).
//
// Copyright (c) 2011 The LevelDB Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file. See the AUTHORS file for names of contributors.

#include "rocksdb/env.h"
Buffer info logs when picking compactions and write them out after releasing the mutex
Summary: While the background thread is picking compactions, it currently writes out multiple info logs, especially for universal compaction, which risks blocking on log I/O while the mutex is held. To remove this risk, all of those info logs are written to a buffer and flushed after the mutex is released.
Test Plan:
make all check
Check the log lines while running some tests that trigger compactions.
Reviewers: haobo, igor, dhruba
Reviewed By: dhruba
CC: i.am.jin.lei, dhruba, yhchiang, leveldb, nkg-
Differential Revision: https://reviews.facebook.net/D16515
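A minimal sketch of the buffer-then-flush pattern described above. All names here are made up for illustration; this is not RocksDB's actual log-buffer implementation.
```
#include <cstdio>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical sketch: accumulate info-log lines while the DB mutex is held
// and only perform the (potentially slow) log I/O after it is released.
struct InfoLogBuffer {
  std::vector<std::string> lines;
  void Add(std::string line) { lines.emplace_back(std::move(line)); }
};

void PickCompactionAndLog(std::mutex& db_mutex, InfoLogBuffer& buf) {
  {
    std::lock_guard<std::mutex> lock(db_mutex);
    // ... pick the compaction while holding the mutex ...
    buf.Add("[universal] picked compaction with 4 input files");
  }
  // Mutex released: flushing the buffered lines can no longer block writers.
  for (const auto& line : buf.lines) {
    std::fprintf(stderr, "%s\n", line.c_str());  // stand-in for the info log
  }
  buf.lines.clear();
}
```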
#include <thread>
Introduce a new storage specific Env API (#5761)
Summary:
The current Env API encompasses both storage/file operations and OS-related operations. Most of the APIs return a Status, which does not carry enough metadata about an error, such as whether it is retryable or not, the scope (i.e., fault domain) of the error, etc., that may be required in order to properly handle a storage error. The file APIs also do not provide enough control over the IO SLA, such as timeout, prioritization, hinting about placement and redundancy, etc.
This PR separates the file/storage APIs out of Env into a new FileSystem class. The APIs are updated to return an IOStatus with metadata about the error, as well as to take an IOOptions structure as input in order to allow more control over the IO.
The user can set both ```options.env``` and ```options.file_system``` to specify that RocksDB should use the former for OS related operations and the latter for storage operations. Internally, a ```CompositeEnvWrapper``` has been introduced that inherits from ```Env``` and redirects individual methods to either an ```Env``` implementation or the ```FileSystem``` as appropriate. When options are sanitized during ```DB::Open```, ```options.env``` is replaced with a newly allocated ```CompositeEnvWrapper``` instance if both env and file_system have been specified. This way, the rest of the RocksDB code can continue to function as before.
This PR also ports PosixEnv to the new API by splitting it in two - PosixEnv and PosixFileSystem. PosixEnv is defined as a subclass of CompositeEnvWrapper, and threading/time functions are overridden with Posix-specific implementations in order to avoid an extra level of indirection.
The ```CompositeEnvWrapper``` translates the ```IOStatus``` return code to a ```Status```, and sets the severity to ```kSoftError``` if the io_status is retryable. The error handling code in RocksDB can then recover the DB automatically.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/5761
Differential Revision: D18868376
Pulled By: anand1976
fbshipit-source-id: 39efe18a162ea746fabac6360ff529baba48486f
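A hedged sketch of how a custom storage backend can be plugged in under this Env/FileSystem split. The `MyFileSystem` class and the `UseCustomFileSystem` helper are hypothetical; `FileSystemWrapper` and the `CompositeEnvWrapper(Env*, FileSystem)` constructor are the pieces referenced in this file.
```
// Sketch only: a custom FileSystem layered on the default one. Only storage
// calls are intercepted; OS-level calls still go through the wrapped Env.
#include <memory>

#include "env/composite_env_wrapper.h"
#include "rocksdb/file_system.h"
#include "rocksdb/options.h"

using namespace ROCKSDB_NAMESPACE;

class MyFileSystem : public FileSystemWrapper {
 public:
  explicit MyFileSystem(const std::shared_ptr<FileSystem>& t)
      : FileSystemWrapper(t) {}
  const char* Name() const override { return "MyFileSystem"; }
  // Override storage calls (NewWritableFile, RenameFile, ...) here to add
  // timeouts, placement hints, richer IOStatus error metadata, etc.
};

void UseCustomFileSystem(Options* options) {
  auto fs = std::make_shared<MyFileSystem>(FileSystem::Default());
  // CompositeEnvWrapper forwards OS operations to the wrapped Env and
  // storage operations to the given FileSystem.
  static CompositeEnvWrapper env(Env::Default(), fs);
  options->env = &env;
}
```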
#include "env/composite_env_wrapper.h"
#include "env/emulated_clock.h"
#include "env/mock_env.h"
Experimental support for SST unique IDs (#8990)
Summary:
* New public header unique_id.h and function GetUniqueIdFromTableProperties,
which computes a universally unique identifier based on the table properties
of table files from recent RocksDB versions.
* Generation of DB session IDs is refactored so that they are
guaranteed unique within the lifetime of a process running RocksDB.
(SemiStructuredUniqueIdGen, new test included.) Along with file numbers,
this enables SST unique IDs to be guaranteed unique among SSTs generated
in a single process, and "better than random" between processes.
See https://github.com/pdillinger/unique_id
* In addition to the public API producing 'external' unique IDs, there is a function
for producing 'internal' unique IDs, with functions for converting between the
two. In short, the external ID is "safe" for things people might do with it, and
the internal ID enables more "power user" features for the future. Specifically,
the external ID goes through a hashing layer so that any subset of bits in the
external ID can be used as a hash of the full ID, while also preserving
uniqueness guarantees in the first 128 bits (bijective both on the first 128 bits
and on the full 192 bits).
Intended follow-up:
* Use the internal unique IDs in cache keys. (Avoid conflicts with https://github.com/facebook/rocksdb/issues/8912.) (The file offset can be XORed into
the third 64-bit value of the unique ID.)
* Publish the external unique IDs in FileStorageInfo (https://github.com/facebook/rocksdb/issues/8968)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8990
Test Plan:
Unit tests added, and checking of unique IDs in the stress test.
NOTE: in the stress test we do not generate nearly enough files to thoroughly
stress uniqueness, but the test trims off pieces of the ID to check for
uniqueness, so that we can infer (with some assumptions) stronger
properties in the aggregate.
Reviewed By: zhichao-cao, mrambacher
Differential Revision: D31582865
Pulled By: pdillinger
fbshipit-source-id: 1f620c4c86af9abe2a8d177b9ccf2ad2b9f48243
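A hedged usage sketch for the external unique-ID API described above. The exact signature of GetUniqueIdFromTableProperties is assumed here (table properties in, printable binary ID out via a Status); consult rocksdb/unique_id.h for the authoritative declaration.
```
// Assumed signature: Status GetUniqueIdFromTableProperties(
//     const TableProperties& props, std::string* out_id);
#include <iostream>

#include "rocksdb/db.h"
#include "rocksdb/unique_id.h"

using namespace ROCKSDB_NAMESPACE;

void PrintSstUniqueIds(DB* db) {
  TablePropertiesCollection props;
  Status s = db->GetPropertiesOfAllTables(&props);
  if (!s.ok()) {
    return;
  }
  for (const auto& entry : props) {
    std::string id;
    // entry.first is the SST file name, entry.second its TableProperties.
    if (GetUniqueIdFromTableProperties(*entry.second, &id).ok()) {
      std::cout << entry.first << " -> " << Slice(id).ToString(/*hex=*/true)
                << std::endl;
    }
  }
}
```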
#include "env/unique_id_gen.h"
#include "logging/env_logger.h"
#include "memory/arena.h"
#include "options/db_options.h"
#include "port/port.h"
#include "rocksdb/convenience.h"
#include "rocksdb/options.h"
#include "rocksdb/system_clock.h"
#include "rocksdb/utilities/customizable_util.h"
#include "rocksdb/utilities/object_registry.h"
#include "rocksdb/utilities/options_type.h"
#include "util/autovector.h"

namespace ROCKSDB_NAMESPACE {
namespace {
static int RegisterBuiltinEnvs(ObjectLibrary& library,
                               const std::string& /*arg*/) {
  library.AddFactory<Env>(MockEnv::kClassName(), [](const std::string& /*uri*/,
                                                    std::unique_ptr<Env>* guard,
                                                    std::string* /* errmsg */) {
    guard->reset(MockEnv::Create(Env::Default()));
    return guard->get();
  });
  library.AddFactory<Env>(
      CompositeEnvWrapper::kClassName(),
      [](const std::string& /*uri*/, std::unique_ptr<Env>* guard,
         std::string* /* errmsg */) {
        guard->reset(new CompositeEnvWrapper(Env::Default()));
        return guard->get();
      });
  size_t num_types;
  return static_cast<int>(library.GetFactoryCount(&num_types));
}

static void RegisterSystemEnvs() {
  static std::once_flag loaded;
  std::call_once(loaded, [&]() {
    RegisterBuiltinEnvs(*(ObjectLibrary::Default().get()), "");
  });
}
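The factories above are how the built-in Envs become loadable by name. As a hedged illustration, a plugin could register its own Env the same way; the `MyEnv` class and its "MyEnv" id below are hypothetical.
```
// Hypothetical plugin Env registered with the default ObjectLibrary so that
// Env::CreateFromString(config, "MyEnv", ...) can later find it.
#include "rocksdb/env.h"
#include "rocksdb/utilities/object_registry.h"

using namespace ROCKSDB_NAMESPACE;

class MyEnv : public EnvWrapper {
 public:
  MyEnv() : EnvWrapper(Env::Default()) {}
  static const char* kClassName() { return "MyEnv"; }
  const char* Name() const override { return kClassName(); }
};

static void RegisterMyEnv() {
  ObjectLibrary::Default()->AddFactory<Env>(
      MyEnv::kClassName(),
      [](const std::string& /*uri*/, std::unique_ptr<Env>* guard,
         std::string* /*errmsg*/) {
        guard->reset(new MyEnv());
        return guard->get();
      });
}
```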
2021-01-26 06:07:26 +00:00
|
|
|
class LegacySystemClock : public SystemClock {
|
|
|
|
private:
|
|
|
|
Env* env_;
|
|
|
|
|
|
|
|
public:
|
|
|
|
explicit LegacySystemClock(Env* env) : env_(env) {}
|
2022-01-05 00:44:54 +00:00
|
|
|
const char* Name() const override { return "LegacySystemClock"; }
|
2021-01-26 06:07:26 +00:00
|
|
|
|
|
|
|
// Returns the number of micro-seconds since some fixed point in time.
|
|
|
|
// It is often used as system time such as in GenericRateLimiter
|
|
|
|
// and other places so a port needs to return system time in order to work.
|
|
|
|
uint64_t NowMicros() override { return env_->NowMicros(); }
|
|
|
|
|
|
|
|
// Returns the number of nano-seconds since some fixed point in time. Only
|
|
|
|
// useful for computing deltas of time in one run.
|
|
|
|
// Default implementation simply relies on NowMicros.
|
|
|
|
// In platform-specific implementations, NowNanos() should return time points
|
|
|
|
// that are MONOTONIC.
|
|
|
|
uint64_t NowNanos() override { return env_->NowNanos(); }
|
|
|
|
|
|
|
|
uint64_t CPUMicros() override { return CPUNanos() / 1000; }
|
|
|
|
uint64_t CPUNanos() override { return env_->NowCPUNanos(); }
|
|
|
|
|
|
|
|
// Sleep/delay the thread for the prescribed number of micro-seconds.
|
|
|
|
void SleepForMicroseconds(int micros) override {
|
|
|
|
env_->SleepForMicroseconds(micros);
|
|
|
|
}
|
|
|
|
|
|
|
|
// Get the number of seconds since the Epoch, 1970-01-01 00:00:00 (UTC).
|
|
|
|
// Only overwrites *unix_time on success.
|
|
|
|
Status GetCurrentTime(int64_t* unix_time) override {
|
|
|
|
return env_->GetCurrentTime(unix_time);
|
|
|
|
}
|
|
|
|
// Converts seconds-since-Jan-01-1970 to a printable string
|
|
|
|
std::string TimeToString(uint64_t time) override {
|
|
|
|
return env_->TimeToString(time);
|
|
|
|
}
|
2022-01-05 00:44:54 +00:00
|
|
|
|
|
|
|
std::string SerializeOptions(const ConfigOptions& /*config_options*/,
|
|
|
|
const std::string& /*prefix*/) const override {
|
|
|
|
// We do not want the LegacySystemClock to appear in the serialized output.
|
|
|
|
// This clock is an internal class for those who do not implement one and
|
|
|
|
// would be part of the Env. As such, do not serialize it here.
|
|
|
|
return "";
|
|
|
|
}
|
2021-01-26 06:07:26 +00:00
|
|
|
};
|
|
|
|
|
2021-01-29 06:08:46 +00:00
|
|
|
class LegacySequentialFileWrapper : public FSSequentialFile {
|
|
|
|
public:
|
|
|
|
explicit LegacySequentialFileWrapper(
|
|
|
|
std::unique_ptr<SequentialFile>&& _target)
|
|
|
|
: target_(std::move(_target)) {}
|
|
|
|
|
|
|
|
IOStatus Read(size_t n, const IOOptions& /*options*/, Slice* result,
|
|
|
|
char* scratch, IODebugContext* /*dbg*/) override {
|
|
|
|
return status_to_io_status(target_->Read(n, result, scratch));
|
|
|
|
}
|
|
|
|
IOStatus Skip(uint64_t n) override {
|
|
|
|
return status_to_io_status(target_->Skip(n));
|
|
|
|
}
|
|
|
|
bool use_direct_io() const override { return target_->use_direct_io(); }
|
|
|
|
size_t GetRequiredBufferAlignment() const override {
|
|
|
|
return target_->GetRequiredBufferAlignment();
|
|
|
|
}
|
|
|
|
IOStatus InvalidateCache(size_t offset, size_t length) override {
|
|
|
|
return status_to_io_status(target_->InvalidateCache(offset, length));
|
|
|
|
}
|
|
|
|
IOStatus PositionedRead(uint64_t offset, size_t n,
|
|
|
|
const IOOptions& /*options*/, Slice* result,
|
|
|
|
char* scratch, IODebugContext* /*dbg*/) override {
|
|
|
|
return status_to_io_status(
|
|
|
|
target_->PositionedRead(offset, n, result, scratch));
|
|
|
|
}
|
|
|
|
|
|
|
|
private:
|
|
|
|
std::unique_ptr<SequentialFile> target_;
|
|
|
|
};
|
|
|
|
|
|
|
|
class LegacyRandomAccessFileWrapper : public FSRandomAccessFile {
|
|
|
|
public:
|
|
|
|
explicit LegacyRandomAccessFileWrapper(
|
|
|
|
std::unique_ptr<RandomAccessFile>&& target)
|
|
|
|
: target_(std::move(target)) {}
|
|
|
|
|
|
|
|
IOStatus Read(uint64_t offset, size_t n, const IOOptions& /*options*/,
|
|
|
|
Slice* result, char* scratch,
|
|
|
|
IODebugContext* /*dbg*/) const override {
|
|
|
|
return status_to_io_status(target_->Read(offset, n, result, scratch));
|
|
|
|
}
|
|
|
|
|
|
|
|
IOStatus MultiRead(FSReadRequest* fs_reqs, size_t num_reqs,
|
|
|
|
const IOOptions& /*options*/,
|
|
|
|
IODebugContext* /*dbg*/) override {
|
|
|
|
std::vector<ReadRequest> reqs;
|
|
|
|
Status status;
|
|
|
|
|
|
|
|
reqs.reserve(num_reqs);
|
|
|
|
for (size_t i = 0; i < num_reqs; ++i) {
|
|
|
|
ReadRequest req;
|
|
|
|
|
|
|
|
req.offset = fs_reqs[i].offset;
|
|
|
|
req.len = fs_reqs[i].len;
|
|
|
|
req.scratch = fs_reqs[i].scratch;
|
|
|
|
req.status = Status::OK();
|
2023-06-23 18:48:49 +00:00
|
|
|
reqs.emplace_back(std::move(req));
|
2021-01-29 06:08:46 +00:00
|
|
|
}
|
|
|
|
status = target_->MultiRead(reqs.data(), num_reqs);
|
|
|
|
for (size_t i = 0; i < num_reqs; ++i) {
|
|
|
|
fs_reqs[i].result = reqs[i].result;
|
|
|
|
fs_reqs[i].status = status_to_io_status(std::move(reqs[i].status));
|
|
|
|
}
|
|
|
|
return status_to_io_status(std::move(status));
|
|
|
|
}
|
|
|
|
|
|
|
|
IOStatus Prefetch(uint64_t offset, size_t n, const IOOptions& /*options*/,
|
|
|
|
IODebugContext* /*dbg*/) override {
|
|
|
|
return status_to_io_status(target_->Prefetch(offset, n));
|
|
|
|
}
|
|
|
|
size_t GetUniqueId(char* id, size_t max_size) const override {
|
|
|
|
return target_->GetUniqueId(id, max_size);
|
|
|
|
}
|
|
|
|
void Hint(AccessPattern pattern) override {
|
|
|
|
target_->Hint((RandomAccessFile::AccessPattern)pattern);
|
|
|
|
}
|
|
|
|
bool use_direct_io() const override { return target_->use_direct_io(); }
|
|
|
|
size_t GetRequiredBufferAlignment() const override {
|
|
|
|
return target_->GetRequiredBufferAlignment();
|
|
|
|
}
|
|
|
|
IOStatus InvalidateCache(size_t offset, size_t length) override {
|
|
|
|
return status_to_io_status(target_->InvalidateCache(offset, length));
|
|
|
|
}
|
|
|
|
|
|
|
|
private:
|
|
|
|
std::unique_ptr<RandomAccessFile> target_;
|
|
|
|
};
|
|
|
|
|
|
|
|
class LegacyRandomRWFileWrapper : public FSRandomRWFile {
|
|
|
|
public:
|
|
|
|
explicit LegacyRandomRWFileWrapper(std::unique_ptr<RandomRWFile>&& target)
|
|
|
|
: target_(std::move(target)) {}
|
|
|
|
|
|
|
|
bool use_direct_io() const override { return target_->use_direct_io(); }
|
|
|
|
size_t GetRequiredBufferAlignment() const override {
|
|
|
|
return target_->GetRequiredBufferAlignment();
|
|
|
|
}
|
|
|
|
IOStatus Write(uint64_t offset, const Slice& data,
|
|
|
|
const IOOptions& /*options*/,
|
|
|
|
IODebugContext* /*dbg*/) override {
|
|
|
|
return status_to_io_status(target_->Write(offset, data));
|
|
|
|
}
|
|
|
|
IOStatus Read(uint64_t offset, size_t n, const IOOptions& /*options*/,
|
|
|
|
Slice* result, char* scratch,
|
|
|
|
IODebugContext* /*dbg*/) const override {
|
|
|
|
return status_to_io_status(target_->Read(offset, n, result, scratch));
|
|
|
|
}
|
|
|
|
IOStatus Flush(const IOOptions& /*options*/,
|
|
|
|
IODebugContext* /*dbg*/) override {
|
|
|
|
return status_to_io_status(target_->Flush());
|
|
|
|
}
|
|
|
|
IOStatus Sync(const IOOptions& /*options*/,
|
|
|
|
IODebugContext* /*dbg*/) override {
|
|
|
|
return status_to_io_status(target_->Sync());
|
|
|
|
}
|
|
|
|
IOStatus Fsync(const IOOptions& /*options*/,
|
|
|
|
IODebugContext* /*dbg*/) override {
|
|
|
|
return status_to_io_status(target_->Fsync());
|
|
|
|
}
|
|
|
|
IOStatus Close(const IOOptions& /*options*/,
|
|
|
|
IODebugContext* /*dbg*/) override {
|
|
|
|
return status_to_io_status(target_->Close());
|
|
|
|
}
|
|
|
|
|
|
|
|
private:
|
|
|
|
std::unique_ptr<RandomRWFile> target_;
|
|
|
|
};
|
|
|
|
|
|
|
|
class LegacyWritableFileWrapper : public FSWritableFile {
|
|
|
|
public:
|
|
|
|
explicit LegacyWritableFileWrapper(std::unique_ptr<WritableFile>&& _target)
|
|
|
|
: target_(std::move(_target)) {}
|
|
|
|
|
|
|
|
IOStatus Append(const Slice& data, const IOOptions& /*options*/,
|
|
|
|
IODebugContext* /*dbg*/) override {
|
|
|
|
return status_to_io_status(target_->Append(data));
|
|
|
|
}
|
|
|
|
IOStatus Append(const Slice& data, const IOOptions& /*options*/,
|
|
|
|
const DataVerificationInfo& /*verification_info*/,
|
|
|
|
IODebugContext* /*dbg*/) override {
|
|
|
|
return status_to_io_status(target_->Append(data));
|
|
|
|
}
|
|
|
|
IOStatus PositionedAppend(const Slice& data, uint64_t offset,
|
|
|
|
const IOOptions& /*options*/,
|
|
|
|
IODebugContext* /*dbg*/) override {
|
|
|
|
return status_to_io_status(target_->PositionedAppend(data, offset));
|
|
|
|
}
|
|
|
|
IOStatus PositionedAppend(const Slice& data, uint64_t offset,
|
|
|
|
const IOOptions& /*options*/,
|
|
|
|
const DataVerificationInfo& /*verification_info*/,
|
|
|
|
IODebugContext* /*dbg*/) override {
|
|
|
|
return status_to_io_status(target_->PositionedAppend(data, offset));
|
|
|
|
}
|
|
|
|
IOStatus Truncate(uint64_t size, const IOOptions& /*options*/,
|
|
|
|
IODebugContext* /*dbg*/) override {
|
|
|
|
return status_to_io_status(target_->Truncate(size));
|
|
|
|
}
|
|
|
|
IOStatus Close(const IOOptions& /*options*/,
|
|
|
|
IODebugContext* /*dbg*/) override {
|
|
|
|
return status_to_io_status(target_->Close());
|
|
|
|
}
|
|
|
|
IOStatus Flush(const IOOptions& /*options*/,
|
|
|
|
IODebugContext* /*dbg*/) override {
|
|
|
|
return status_to_io_status(target_->Flush());
|
|
|
|
}
|
|
|
|
IOStatus Sync(const IOOptions& /*options*/,
|
|
|
|
IODebugContext* /*dbg*/) override {
|
|
|
|
return status_to_io_status(target_->Sync());
|
|
|
|
}
|
|
|
|
IOStatus Fsync(const IOOptions& /*options*/,
|
|
|
|
IODebugContext* /*dbg*/) override {
|
|
|
|
return status_to_io_status(target_->Fsync());
|
|
|
|
}
|
|
|
|
bool IsSyncThreadSafe() const override { return target_->IsSyncThreadSafe(); }
|
|
|
|
|
|
|
|
bool use_direct_io() const override { return target_->use_direct_io(); }
|
|
|
|
|
|
|
|
size_t GetRequiredBufferAlignment() const override {
|
|
|
|
return target_->GetRequiredBufferAlignment();
|
|
|
|
}
|
|
|
|
|
|
|
|
void SetWriteLifeTimeHint(Env::WriteLifeTimeHint hint) override {
|
|
|
|
target_->SetWriteLifeTimeHint(hint);
|
|
|
|
}
|
|
|
|
|
|
|
|
Env::WriteLifeTimeHint GetWriteLifeTimeHint() override {
|
|
|
|
return target_->GetWriteLifeTimeHint();
|
|
|
|
}
|
|
|
|
|
|
|
|
uint64_t GetFileSize(const IOOptions& /*options*/,
|
|
|
|
IODebugContext* /*dbg*/) override {
|
|
|
|
return target_->GetFileSize();
|
|
|
|
}
|
|
|
|
|
|
|
|
void SetPreallocationBlockSize(size_t size) override {
|
|
|
|
target_->SetPreallocationBlockSize(size);
|
|
|
|
}
|
|
|
|
|
|
|
|
void GetPreallocationStatus(size_t* block_size,
|
|
|
|
size_t* last_allocated_block) override {
|
|
|
|
target_->GetPreallocationStatus(block_size, last_allocated_block);
|
|
|
|
}
|
|
|
|
|
|
|
|
size_t GetUniqueId(char* id, size_t max_size) const override {
|
|
|
|
return target_->GetUniqueId(id, max_size);
|
|
|
|
}
|
|
|
|
|
|
|
|
IOStatus InvalidateCache(size_t offset, size_t length) override {
|
|
|
|
return status_to_io_status(target_->InvalidateCache(offset, length));
|
|
|
|
}
|
|
|
|
|
|
|
|
IOStatus RangeSync(uint64_t offset, uint64_t nbytes,
|
|
|
|
const IOOptions& /*options*/,
|
|
|
|
IODebugContext* /*dbg*/) override {
|
|
|
|
return status_to_io_status(target_->RangeSync(offset, nbytes));
|
|
|
|
}
|
|
|
|
|
|
|
|
void PrepareWrite(size_t offset, size_t len, const IOOptions& /*options*/,
|
|
|
|
IODebugContext* /*dbg*/) override {
|
|
|
|
target_->PrepareWrite(offset, len);
|
|
|
|
}
|
|
|
|
|
|
|
|
IOStatus Allocate(uint64_t offset, uint64_t len, const IOOptions& /*options*/,
|
|
|
|
IODebugContext* /*dbg*/) override {
|
|
|
|
return status_to_io_status(target_->Allocate(offset, len));
|
|
|
|
}
|
|
|
|
|
|
|
|
private:
|
|
|
|
std::unique_ptr<WritableFile> target_;
|
|
|
|
};
|
|
|
|
|
|
|
|
class LegacyDirectoryWrapper : public FSDirectory {
|
|
|
|
public:
|
|
|
|
explicit LegacyDirectoryWrapper(std::unique_ptr<Directory>&& target)
|
|
|
|
: target_(std::move(target)) {}
|
|
|
|
|
|
|
|
IOStatus Fsync(const IOOptions& /*options*/,
|
|
|
|
IODebugContext* /*dbg*/) override {
|
|
|
|
return status_to_io_status(target_->Fsync());
|
|
|
|
}
|
Explicitly closing all directory file descriptors (#10049)
Summary:
Currently, the DB directory file descriptor is left open until destruction (`DB::Close()` does not close the file descriptor). To verify this, comment out the lines between `db_ = nullptr` and `db_->Close()` (lines 512-515 in ldb_cmd.cc) to leak the `db_` object, build the `ldb` tool and run
```
strace --trace=open,openat,close ./ldb --db=$TEST_TMPDIR --ignore_unknown_options put K1 V1 --create_if_missing
```
There is one directory file descriptor that is not closed in the strace log.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/10049
Test Plan: Add a new unit test, DBBasicTest.DBCloseAllDirectoryFDs: open a database with a separate WAL directory and three different data directories, and verify that all directory file descriptors are closed after calling Close(). Close() is also called explicitly once a directory file descriptor is no longer needed, so the counts of directory opens and closes stay equal (see the FSDirectory sketch below, after the LegacyDirectoryWrapper class).
Reviewed By: ajkr, hx235
Differential Revision: D36722135
Pulled By: littlepig2013
fbshipit-source-id: 07bdc2abc417c6b30997b9bbef1f79aa757b21ff
2022-06-02 01:03:34 +00:00
|
|
|
IOStatus Close(const IOOptions& /*options*/,
|
|
|
|
IODebugContext* /*dbg*/) override {
|
|
|
|
return status_to_io_status(target_->Close());
|
|
|
|
}
|
2021-01-29 06:08:46 +00:00
|
|
|
size_t GetUniqueId(char* id, size_t max_size) const override {
|
|
|
|
return target_->GetUniqueId(id, max_size);
|
|
|
|
}
|
|
|
|
|
|
|
|
private:
|
|
|
|
std::unique_ptr<Directory> target_;
|
|
|
|
};
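Following up on the directory-descriptor change described in #10049 above, a minimal sketch of the intended FSDirectory usage pattern, built only from interfaces wrapped in this file; the helper function itself is illustrative.
```
// Illustrative only: fsync a DB directory after creating a file in it, then
// close the directory descriptor explicitly instead of leaking it until
// destruction.
#include "rocksdb/file_system.h"

using namespace ROCKSDB_NAMESPACE;

IOStatus SyncAndCloseDir(FileSystem* fs, const std::string& dirname) {
  std::unique_ptr<FSDirectory> dir;
  IOStatus io_s = fs->NewDirectory(dirname, IOOptions(), &dir, nullptr);
  if (!io_s.ok()) {
    return io_s;
  }
  io_s = dir->Fsync(IOOptions(), nullptr);
  if (io_s.ok()) {
    io_s = dir->Close(IOOptions(), nullptr);  // release the fd right away
  }
  return io_s;
}
```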
|
|
|
|
|
2021-01-06 18:48:24 +00:00
|
|
|
class LegacyFileSystemWrapper : public FileSystem {
|
|
|
|
public:
|
|
|
|
// Initialize an EnvWrapper that delegates all calls to *t
|
|
|
|
explicit LegacyFileSystemWrapper(Env* t) : target_(t) {}
|
2023-12-04 19:17:32 +00:00
|
|
|
~LegacyFileSystemWrapper() override = default;
|
2021-01-06 18:48:24 +00:00
|
|
|
|
2021-11-02 16:06:02 +00:00
|
|
|
static const char* kClassName() { return "LegacyFileSystem"; }
|
|
|
|
const char* Name() const override { return kClassName(); }
|
2021-01-06 18:48:24 +00:00
|
|
|
|
|
|
|
// Return the target to which this Env forwards all calls
|
|
|
|
Env* target() const { return target_; }
|
|
|
|
|
|
|
|
// The following text is boilerplate that forwards all methods to target()
|
|
|
|
IOStatus NewSequentialFile(const std::string& f, const FileOptions& file_opts,
|
|
|
|
std::unique_ptr<FSSequentialFile>* r,
|
|
|
|
IODebugContext* /*dbg*/) override {
|
|
|
|
std::unique_ptr<SequentialFile> file;
|
|
|
|
Status s = target_->NewSequentialFile(f, &file, file_opts);
|
|
|
|
if (s.ok()) {
|
|
|
|
r->reset(new LegacySequentialFileWrapper(std::move(file)));
|
|
|
|
}
|
|
|
|
return status_to_io_status(std::move(s));
|
|
|
|
}
|
|
|
|
IOStatus NewRandomAccessFile(const std::string& f,
|
|
|
|
const FileOptions& file_opts,
|
|
|
|
std::unique_ptr<FSRandomAccessFile>* r,
|
|
|
|
IODebugContext* /*dbg*/) override {
|
|
|
|
std::unique_ptr<RandomAccessFile> file;
|
|
|
|
Status s = target_->NewRandomAccessFile(f, &file, file_opts);
|
|
|
|
if (s.ok()) {
|
|
|
|
r->reset(new LegacyRandomAccessFileWrapper(std::move(file)));
|
|
|
|
}
|
|
|
|
return status_to_io_status(std::move(s));
|
|
|
|
}
|
|
|
|
IOStatus NewWritableFile(const std::string& f, const FileOptions& file_opts,
|
|
|
|
std::unique_ptr<FSWritableFile>* r,
|
|
|
|
IODebugContext* /*dbg*/) override {
|
|
|
|
std::unique_ptr<WritableFile> file;
|
|
|
|
Status s = target_->NewWritableFile(f, &file, file_opts);
|
|
|
|
if (s.ok()) {
|
|
|
|
r->reset(new LegacyWritableFileWrapper(std::move(file)));
|
|
|
|
}
|
|
|
|
return status_to_io_status(std::move(s));
|
|
|
|
}
|
|
|
|
IOStatus ReopenWritableFile(const std::string& fname,
|
|
|
|
const FileOptions& file_opts,
|
|
|
|
std::unique_ptr<FSWritableFile>* result,
|
|
|
|
IODebugContext* /*dbg*/) override {
|
|
|
|
std::unique_ptr<WritableFile> file;
|
|
|
|
Status s = target_->ReopenWritableFile(fname, &file, file_opts);
|
|
|
|
if (s.ok()) {
|
|
|
|
result->reset(new LegacyWritableFileWrapper(std::move(file)));
|
|
|
|
}
|
|
|
|
return status_to_io_status(std::move(s));
|
|
|
|
}
|
|
|
|
IOStatus ReuseWritableFile(const std::string& fname,
|
|
|
|
const std::string& old_fname,
|
|
|
|
const FileOptions& file_opts,
|
|
|
|
std::unique_ptr<FSWritableFile>* r,
|
|
|
|
IODebugContext* /*dbg*/) override {
|
|
|
|
std::unique_ptr<WritableFile> file;
|
|
|
|
Status s = target_->ReuseWritableFile(fname, old_fname, &file, file_opts);
|
|
|
|
if (s.ok()) {
|
|
|
|
r->reset(new LegacyWritableFileWrapper(std::move(file)));
|
|
|
|
}
|
|
|
|
return status_to_io_status(std::move(s));
|
|
|
|
}
|
|
|
|
IOStatus NewRandomRWFile(const std::string& fname,
|
|
|
|
const FileOptions& file_opts,
|
|
|
|
std::unique_ptr<FSRandomRWFile>* result,
|
|
|
|
IODebugContext* /*dbg*/) override {
|
|
|
|
std::unique_ptr<RandomRWFile> file;
|
|
|
|
Status s = target_->NewRandomRWFile(fname, &file, file_opts);
|
|
|
|
if (s.ok()) {
|
|
|
|
result->reset(new LegacyRandomRWFileWrapper(std::move(file)));
|
|
|
|
}
|
|
|
|
return status_to_io_status(std::move(s));
|
|
|
|
}
|
|
|
|
IOStatus NewMemoryMappedFileBuffer(
|
|
|
|
const std::string& fname,
|
|
|
|
std::unique_ptr<MemoryMappedFileBuffer>* result) override {
|
|
|
|
return status_to_io_status(
|
|
|
|
target_->NewMemoryMappedFileBuffer(fname, result));
|
|
|
|
}
|
|
|
|
IOStatus NewDirectory(const std::string& name, const IOOptions& /*io_opts*/,
|
|
|
|
std::unique_ptr<FSDirectory>* result,
|
|
|
|
IODebugContext* /*dbg*/) override {
|
|
|
|
std::unique_ptr<Directory> dir;
|
|
|
|
Status s = target_->NewDirectory(name, &dir);
|
|
|
|
if (s.ok()) {
|
|
|
|
result->reset(new LegacyDirectoryWrapper(std::move(dir)));
|
|
|
|
}
|
|
|
|
return status_to_io_status(std::move(s));
|
|
|
|
}
|
|
|
|
IOStatus FileExists(const std::string& f, const IOOptions& /*io_opts*/,
|
|
|
|
IODebugContext* /*dbg*/) override {
|
|
|
|
return status_to_io_status(target_->FileExists(f));
|
|
|
|
}
|
|
|
|
IOStatus GetChildren(const std::string& dir, const IOOptions& /*io_opts*/,
|
|
|
|
std::vector<std::string>* r,
|
|
|
|
IODebugContext* /*dbg*/) override {
|
|
|
|
return status_to_io_status(target_->GetChildren(dir, r));
|
|
|
|
}
|
|
|
|
IOStatus GetChildrenFileAttributes(const std::string& dir,
|
|
|
|
const IOOptions& /*options*/,
|
|
|
|
std::vector<FileAttributes>* result,
|
|
|
|
IODebugContext* /*dbg*/) override {
|
|
|
|
return status_to_io_status(target_->GetChildrenFileAttributes(dir, result));
|
|
|
|
}
|
|
|
|
IOStatus DeleteFile(const std::string& f, const IOOptions& /*options*/,
|
|
|
|
IODebugContext* /*dbg*/) override {
|
|
|
|
return status_to_io_status(target_->DeleteFile(f));
|
|
|
|
}
|
|
|
|
IOStatus Truncate(const std::string& fname, size_t size,
|
|
|
|
const IOOptions& /*options*/,
|
|
|
|
IODebugContext* /*dbg*/) override {
|
|
|
|
return status_to_io_status(target_->Truncate(fname, size));
|
|
|
|
}
|
|
|
|
IOStatus CreateDir(const std::string& d, const IOOptions& /*options*/,
|
|
|
|
IODebugContext* /*dbg*/) override {
|
|
|
|
return status_to_io_status(target_->CreateDir(d));
|
|
|
|
}
|
|
|
|
IOStatus CreateDirIfMissing(const std::string& d,
|
|
|
|
const IOOptions& /*options*/,
|
|
|
|
IODebugContext* /*dbg*/) override {
|
|
|
|
return status_to_io_status(target_->CreateDirIfMissing(d));
|
|
|
|
}
|
|
|
|
IOStatus DeleteDir(const std::string& d, const IOOptions& /*options*/,
|
|
|
|
IODebugContext* /*dbg*/) override {
|
|
|
|
return status_to_io_status(target_->DeleteDir(d));
|
|
|
|
}
|
|
|
|
IOStatus GetFileSize(const std::string& f, const IOOptions& /*options*/,
|
|
|
|
uint64_t* s, IODebugContext* /*dbg*/) override {
|
|
|
|
return status_to_io_status(target_->GetFileSize(f, s));
|
|
|
|
}
|
|
|
|
|
|
|
|
IOStatus GetFileModificationTime(const std::string& fname,
|
|
|
|
const IOOptions& /*options*/,
|
|
|
|
uint64_t* file_mtime,
|
|
|
|
IODebugContext* /*dbg*/) override {
|
|
|
|
return status_to_io_status(
|
|
|
|
target_->GetFileModificationTime(fname, file_mtime));
|
|
|
|
}
|
|
|
|
|
|
|
|
IOStatus GetAbsolutePath(const std::string& db_path,
|
|
|
|
const IOOptions& /*options*/,
|
|
|
|
std::string* output_path,
|
|
|
|
IODebugContext* /*dbg*/) override {
|
|
|
|
return status_to_io_status(target_->GetAbsolutePath(db_path, output_path));
|
|
|
|
}
|
|
|
|
|
|
|
|
IOStatus RenameFile(const std::string& s, const std::string& t,
|
|
|
|
const IOOptions& /*options*/,
|
|
|
|
IODebugContext* /*dbg*/) override {
|
|
|
|
return status_to_io_status(target_->RenameFile(s, t));
|
|
|
|
}
|
|
|
|
|
|
|
|
IOStatus LinkFile(const std::string& s, const std::string& t,
|
|
|
|
const IOOptions& /*options*/,
|
|
|
|
IODebugContext* /*dbg*/) override {
|
|
|
|
return status_to_io_status(target_->LinkFile(s, t));
|
|
|
|
}
|
|
|
|
|
|
|
|
IOStatus NumFileLinks(const std::string& fname, const IOOptions& /*options*/,
|
|
|
|
uint64_t* count, IODebugContext* /*dbg*/) override {
|
|
|
|
return status_to_io_status(target_->NumFileLinks(fname, count));
|
|
|
|
}
|
|
|
|
|
|
|
|
IOStatus AreFilesSame(const std::string& first, const std::string& second,
|
|
|
|
const IOOptions& /*options*/, bool* res,
|
|
|
|
IODebugContext* /*dbg*/) override {
|
|
|
|
return status_to_io_status(target_->AreFilesSame(first, second, res));
|
|
|
|
}
|
|
|
|
|
|
|
|
IOStatus LockFile(const std::string& f, const IOOptions& /*options*/,
|
|
|
|
FileLock** l, IODebugContext* /*dbg*/) override {
|
|
|
|
return status_to_io_status(target_->LockFile(f, l));
|
|
|
|
}
|
|
|
|
|
|
|
|
IOStatus UnlockFile(FileLock* l, const IOOptions& /*options*/,
|
|
|
|
IODebugContext* /*dbg*/) override {
|
|
|
|
return status_to_io_status(target_->UnlockFile(l));
|
|
|
|
}
|
|
|
|
|
|
|
|
IOStatus GetTestDirectory(const IOOptions& /*options*/, std::string* path,
|
|
|
|
IODebugContext* /*dbg*/) override {
|
|
|
|
return status_to_io_status(target_->GetTestDirectory(path));
|
|
|
|
}
|
|
|
|
IOStatus NewLogger(const std::string& fname, const IOOptions& /*options*/,
|
|
|
|
std::shared_ptr<Logger>* result,
|
|
|
|
IODebugContext* /*dbg*/) override {
|
|
|
|
return status_to_io_status(target_->NewLogger(fname, result));
|
|
|
|
}
|
|
|
|
|
|
|
|
void SanitizeFileOptions(FileOptions* opts) const override {
|
|
|
|
target_->SanitizeEnvOptions(opts);
|
|
|
|
}
|
|
|
|
|
|
|
|
FileOptions OptimizeForLogRead(
|
|
|
|
const FileOptions& file_options) const override {
|
|
|
|
return target_->OptimizeForLogRead(file_options);
|
|
|
|
}
|
|
|
|
FileOptions OptimizeForManifestRead(
|
|
|
|
const FileOptions& file_options) const override {
|
|
|
|
return target_->OptimizeForManifestRead(file_options);
|
|
|
|
}
|
|
|
|
FileOptions OptimizeForLogWrite(const FileOptions& file_options,
|
|
|
|
const DBOptions& db_options) const override {
|
|
|
|
return target_->OptimizeForLogWrite(file_options, db_options);
|
|
|
|
}
|
|
|
|
FileOptions OptimizeForManifestWrite(
|
|
|
|
const FileOptions& file_options) const override {
|
|
|
|
return target_->OptimizeForManifestWrite(file_options);
|
|
|
|
}
|
|
|
|
FileOptions OptimizeForCompactionTableWrite(
|
|
|
|
const FileOptions& file_options,
|
|
|
|
const ImmutableDBOptions& immutable_ops) const override {
|
|
|
|
return target_->OptimizeForCompactionTableWrite(file_options,
|
|
|
|
immutable_ops);
|
|
|
|
}
|
|
|
|
FileOptions OptimizeForCompactionTableRead(
|
|
|
|
const FileOptions& file_options,
|
|
|
|
const ImmutableDBOptions& db_options) const override {
|
|
|
|
return target_->OptimizeForCompactionTableRead(file_options, db_options);
|
|
|
|
}
|
2021-04-07 20:37:36 +00:00
|
|
|
FileOptions OptimizeForBlobFileRead(
|
|
|
|
const FileOptions& file_options,
|
|
|
|
const ImmutableDBOptions& db_options) const override {
|
|
|
|
return target_->OptimizeForBlobFileRead(file_options, db_options);
|
|
|
|
}
|
2021-01-06 18:48:24 +00:00
|
|
|
|
|
|
|
#ifdef GetFreeSpace
|
|
|
|
#undef GetFreeSpace
|
|
|
|
#endif
|
|
|
|
IOStatus GetFreeSpace(const std::string& path, const IOOptions& /*options*/,
|
|
|
|
uint64_t* diskfree, IODebugContext* /*dbg*/) override {
|
|
|
|
return status_to_io_status(target_->GetFreeSpace(path, diskfree));
|
|
|
|
}
|
|
|
|
IOStatus IsDirectory(const std::string& path, const IOOptions& /*options*/,
|
|
|
|
bool* is_dir, IODebugContext* /*dbg*/) override {
|
|
|
|
return status_to_io_status(target_->IsDirectory(path, is_dir));
|
|
|
|
}
|
|
|
|
|
2022-01-05 00:44:54 +00:00
|
|
|
std::string SerializeOptions(const ConfigOptions& /*config_options*/,
|
|
|
|
const std::string& /*prefix*/) const override {
|
|
|
|
// We do not want the LegacyFileSystem to appear in the serialized output.
|
|
|
|
// This file system is an internal class for those who do not implement one and
|
|
|
|
// would be part of the Env. As such, do not serialize it here.
|
|
|
|
return "";
|
|
|
|
}
|
2021-01-06 18:48:24 +00:00
|
|
|
private:
|
|
|
|
Env* target_;
|
|
|
|
};
|
|
|
|
} // end anonymous namespace
|
2011-03-18 22:37:00 +00:00
|
|
|
|
2020-03-24 04:50:42 +00:00
|
|
|
Env::Env() : thread_status_updater_(nullptr) {
|
|
|
|
file_system_ = std::make_shared<LegacyFileSystemWrapper>(this);
|
2021-01-26 06:07:26 +00:00
|
|
|
system_clock_ = std::make_shared<LegacySystemClock>(this);
|
2020-03-24 04:50:42 +00:00
|
|
|
}
|
|
|
|
|
2021-01-26 06:07:26 +00:00
|
|
|
Env::Env(const std::shared_ptr<FileSystem>& fs)
|
|
|
|
: thread_status_updater_(nullptr), file_system_(fs) {
|
|
|
|
system_clock_ = std::make_shared<LegacySystemClock>(this);
|
|
|
|
}
|
|
|
|
|
|
|
|
Env::Env(const std::shared_ptr<FileSystem>& fs,
|
|
|
|
const std::shared_ptr<SystemClock>& clock)
|
|
|
|
: thread_status_updater_(nullptr), file_system_(fs), system_clock_(clock) {}
|
2020-03-24 04:50:42 +00:00
|
|
|
|
2023-12-04 19:17:32 +00:00
|
|
|
Env::~Env() = default;
|
2011-03-18 22:37:00 +00:00
|
|
|
|
2019-07-09 21:48:07 +00:00
|
|
|
Status Env::NewLogger(const std::string& fname,
|
|
|
|
std::shared_ptr<Logger>* result) {
|
|
|
|
return NewEnvLogger(fname, this, result);
|
|
|
|
}
|
|
|
|
|
2021-06-15 10:42:52 +00:00
|
|
|
Status Env::CreateFromString(const ConfigOptions& config_options,
|
|
|
|
const std::string& value, Env** result) {
|
2022-01-05 00:44:54 +00:00
|
|
|
Env* base = Env::Default();
|
|
|
|
if (value.empty() || base->IsInstanceOf(value)) {
|
|
|
|
*result = base;
|
|
|
|
return Status::OK();
|
|
|
|
} else {
|
|
|
|
RegisterSystemEnvs();
|
|
|
|
Env* env = *result;
|
2023-02-17 20:54:07 +00:00
|
|
|
Status s = LoadStaticObject<Env>(config_options, value, &env);
|
2022-01-05 00:44:54 +00:00
|
|
|
if (s.ok()) {
|
|
|
|
*result = env;
|
|
|
|
}
|
|
|
|
return s;
|
2019-07-24 00:08:26 +00:00
|
|
|
}
|
|
|
|
}
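A hedged usage sketch for the static-object path above. The "MyEnv" id is hypothetical and would have to be registered with the ObjectLibrary first (e.g. as in the RegisterBuiltinEnvs sketch earlier).
```
#include "rocksdb/convenience.h"
#include "rocksdb/env.h"

using namespace ROCKSDB_NAMESPACE;

Status LoadEnvByName() {
  ConfigOptions config_options;
  Env* env = Env::Default();   // starting value; also the fallback result
  std::shared_ptr<Env> guard;  // owns the Env if a new one is constructed
  // "MyEnv" is a hypothetical registered Env id; an empty string simply
  // returns Env::Default().
  Status s = Env::CreateFromString(config_options, "MyEnv", &env, &guard);
  // On success, env points at the loaded Env; keep `guard` alive as long as
  // env is in use.
  return s;
}
```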
|
|
|
|
|
2021-06-15 10:42:52 +00:00
|
|
|
Status Env::CreateFromString(const ConfigOptions& config_options,
|
|
|
|
const std::string& value, Env** result,
|
|
|
|
std::shared_ptr<Env>* guard) {
|
2019-10-09 02:17:39 +00:00
|
|
|
assert(result);
|
|
|
|
assert(guard != nullptr);
|
2022-01-05 00:44:54 +00:00
|
|
|
std::unique_ptr<Env> uniq;
|
|
|
|
|
|
|
|
Env* env = *result;
|
|
|
|
std::string id;
|
|
|
|
std::unordered_map<std::string, std::string> opt_map;
|
|
|
|
|
|
|
|
Status status =
|
|
|
|
Customizable::GetOptionsMap(config_options, env, value, &id, &opt_map);
|
|
|
|
if (!status.ok()) { // GetOptionsMap failed
|
|
|
|
return status;
|
|
|
|
}
|
|
|
|
Env* base = Env::Default();
|
|
|
|
if (id.empty() || base->IsInstanceOf(id)) {
|
|
|
|
env = base;
|
|
|
|
status = Status::OK();
|
2019-10-09 02:17:39 +00:00
|
|
|
} else {
|
2022-01-05 00:44:54 +00:00
|
|
|
RegisterSystemEnvs();
|
2022-02-11 13:10:10 +00:00
|
|
|
// First, try to load the Env as a unique object.
|
|
|
|
status = config_options.registry->NewObject<Env>(id, &env, &uniq);
|
2022-01-05 00:44:54 +00:00
|
|
|
}
|
|
|
|
if (config_options.ignore_unsupported_options && status.IsNotSupported()) {
|
|
|
|
status = Status::OK();
|
|
|
|
} else if (status.ok()) {
|
|
|
|
status = Customizable::ConfigureNewObject(config_options, env, opt_map);
|
|
|
|
}
|
|
|
|
if (status.ok()) {
|
|
|
|
guard->reset(uniq.release());
|
|
|
|
*result = env;
|
|
|
|
}
|
|
|
|
return status;
|
2019-10-09 02:17:39 +00:00
|
|
|
}
|
|
|
|
|
2021-06-15 10:42:52 +00:00
|
|
|
Status Env::CreateFromUri(const ConfigOptions& config_options,
|
|
|
|
const std::string& env_uri, const std::string& fs_uri,
|
|
|
|
Env** result, std::shared_ptr<Env>* guard) {
|
|
|
|
*result = config_options.env;
|
|
|
|
if (env_uri.empty() && fs_uri.empty()) {
|
|
|
|
// Neither specified. Use the default
|
|
|
|
guard->reset();
|
|
|
|
return Status::OK();
|
|
|
|
} else if (!env_uri.empty() && !fs_uri.empty()) {
|
|
|
|
// Both specified. Cannot choose. Return Invalid
|
|
|
|
return Status::InvalidArgument("cannot specify both fs_uri and env_uri");
|
|
|
|
} else if (fs_uri.empty()) { // Only have an ENV URI. Create an Env from it
|
|
|
|
return CreateFromString(config_options, env_uri, result, guard);
|
|
|
|
} else {
|
|
|
|
std::shared_ptr<FileSystem> fs;
|
|
|
|
Status s = FileSystem::CreateFromString(config_options, fs_uri, &fs);
|
|
|
|
if (s.ok()) {
|
|
|
|
guard->reset(new CompositeEnvWrapper(*result, fs));
|
|
|
|
*result = guard->get();
|
|
|
|
}
|
|
|
|
return s;
|
|
|
|
}
|
|
|
|
}
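For completeness, a sketch of the URI-based entry point above, which picks between an Env URI and a FileSystem URI; the fs_uri value passed in is whatever FileSystem id has been registered, so it is an assumption of the caller.
```
#include "rocksdb/convenience.h"
#include "rocksdb/env.h"

using namespace ROCKSDB_NAMESPACE;

Status LoadEnvFromFsUri(const std::string& fs_uri, Env** env,
                        std::shared_ptr<Env>* env_guard) {
  ConfigOptions config_options;
  // Exactly one of env_uri / fs_uri may be non-empty; here only fs_uri is
  // set, so the FileSystem is loaded and wrapped in a CompositeEnvWrapper.
  return Env::CreateFromUri(config_options, /*env_uri=*/"", fs_uri, env,
                            env_guard);
}
```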
|
|
|
|
|
2018-04-19 00:25:37 +00:00
|
|
|
std::string Env::PriorityToString(Env::Priority priority) {
|
|
|
|
switch (priority) {
|
|
|
|
case Env::Priority::BOTTOM:
|
|
|
|
return "Bottom";
|
|
|
|
case Env::Priority::LOW:
|
|
|
|
return "Low";
|
|
|
|
case Env::Priority::HIGH:
|
|
|
|
return "High";
|
2019-03-20 00:24:09 +00:00
|
|
|
case Env::Priority::USER:
|
|
|
|
return "User";
|
2018-04-19 00:25:37 +00:00
|
|
|
case Env::Priority::TOTAL:
|
|
|
|
assert(false);
|
|
|
|
}
|
|
|
|
return "Invalid";
|
|
|
|
}
|
|
|
|
|
2015-06-11 21:18:02 +00:00
|
|
|
uint64_t Env::GetThreadID() const {
|
|
|
|
std::hash<std::thread::id> hasher;
|
|
|
|
return hasher(std::this_thread::get_id());
|
|
|
|
}
|
|
|
|
|
2015-10-08 02:11:09 +00:00
|
|
|
Status Env::ReuseWritableFile(const std::string& fname,
|
|
|
|
const std::string& old_fname,
|
2018-11-09 19:17:34 +00:00
|
|
|
std::unique_ptr<WritableFile>* result,
|
2015-10-08 02:11:09 +00:00
|
|
|
const EnvOptions& options) {
|
|
|
|
Status s = RenameFile(old_fname, fname);
|
|
|
|
if (!s.ok()) {
|
|
|
|
return s;
|
|
|
|
}
|
|
|
|
return NewWritableFile(fname, result, options);
|
|
|
|
}
|
|
|
|
|
2016-02-09 22:54:32 +00:00
|
|
|
Status Env::GetChildrenFileAttributes(const std::string& dir,
|
|
|
|
std::vector<FileAttributes>* result) {
|
|
|
|
assert(result != nullptr);
|
|
|
|
std::vector<std::string> child_fnames;
|
|
|
|
Status s = GetChildren(dir, &child_fnames);
|
|
|
|
if (!s.ok()) {
|
|
|
|
return s;
|
|
|
|
}
|
|
|
|
result->resize(child_fnames.size());
|
|
|
|
size_t result_size = 0;
|
|
|
|
for (size_t i = 0; i < child_fnames.size(); ++i) {
|
|
|
|
const std::string path = dir + "/" + child_fnames[i];
|
|
|
|
if (!(s = GetFileSize(path, &(*result)[result_size].size_bytes)).ok()) {
|
|
|
|
if (FileExists(path).IsNotFound()) {
|
|
|
|
// The file may have been deleted since we listed the directory
|
|
|
|
continue;
|
|
|
|
}
|
|
|
|
return s;
|
|
|
|
}
|
|
|
|
(*result)[result_size].name = std::move(child_fnames[i]);
|
|
|
|
result_size++;
|
|
|
|
}
|
|
|
|
result->resize(result_size);
|
|
|
|
return Status::OK();
|
|
|
|
}
|
|
|
|
|
2020-10-19 18:37:05 +00:00
|
|
|
Status Env::GetHostNameString(std::string* result) {
|
2021-11-24 19:18:07 +00:00
|
|
|
std::array<char, kMaxHostNameLen> hostname_buf{};
|
2020-10-19 18:37:05 +00:00
|
|
|
Status s = GetHostName(hostname_buf.data(), hostname_buf.size());
|
|
|
|
if (s.ok()) {
|
|
|
|
hostname_buf[hostname_buf.size() - 1] = '\0';
|
|
|
|
result->assign(hostname_buf.data());
|
|
|
|
}
|
|
|
|
return s;
|
|
|
|
}
|
|
|
|
|
Built-in support for generating unique IDs, bug fix (#8708)
Summary:
Env::GenerateUniqueId() works fine on Windows and on POSIX
where /proc/sys/kernel/random/uuid exists. Our other implementation is
flawed and easily produces collisions in a new multi-threaded test.
As we rely more heavily on DB session ID uniqueness, this becomes a
serious issue.
This change combines several individually suitable entropy sources
for reliable generation of random unique IDs, with the goal of uniqueness
and portability, not cryptographic strength nor maximum speed.
Specifically:
* Moves code for getting UUIDs from the OS into port::GenerateRfcUuid
rather than leaving it in Env implementation details. Callers are now told
whether the operation fails or succeeds.
* Adds an internal API GenerateRawUniqueId for generating high-quality
128-bit unique identifiers, by combining entropy from three "tracks":
  * Lots of info from the default Env, such as time, process id, and hostname.
  * std::random_device
  * port::GenerateRfcUuid (when working)
* Built-in implementations of Env::GenerateUniqueId() will now always
produce an RFC 4122 UUID string, either from a platform-specific API or
by converting the output of GenerateRawUniqueId.
DB session IDs now use GenerateRawUniqueId, while DB IDs (not as
critical) try to use port::GenerateRfcUuid but fall back on
GenerateRawUniqueId with conversion to an RFC 4122 UUID.
GenerateRawUniqueId is declared and defined under env/ rather than util/
or even port/ because of the Env dependency.
Likely follow-up: enhance GenerateRawUniqueId to be faster after the
first call and to guarantee uniqueness within the lifetime of a single
process (imparting the same property onto DB session IDs).
Pull Request resolved: https://github.com/facebook/rocksdb/pull/8708
Test Plan:
A new mini-stress test in env_test checks the various public
and internal APIs for uniqueness, including each track of
GenerateRawUniqueId individually. We can't hope to verify anywhere close
to 128 bits of entropy, but it can at least detect flaws as bad as the
old code. Serial execution of the new tests takes about 350 ms on
my machine.
Reviewed By: zhichao-cao, mrambacher
Differential Revision: D30563780
Pulled By: pdillinger
fbshipit-source-id: de4c9ff4b2f581cf784fcedb5f39f16e5185c364
2021-08-30 22:19:39 +00:00
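A small sketch of what callers can rely on from the function below: a 36-character, RFC 4122-style UUID string with dashes at fixed positions. The check is illustrative; the version nibble is only forced to '4' on the fallback path.
```
#include <cassert>
#include <string>

#include "rocksdb/env.h"

void CheckUniqueIdFormat() {
  std::string id = ROCKSDB_NAMESPACE::Env::Default()->GenerateUniqueId();
  // 8-4-4-4-12 hex digits, e.g. "123e4567-e89b-42d3-a456-426614174000".
  assert(id.size() == 36);
  assert(id[8] == '-' && id[13] == '-' && id[18] == '-' && id[23] == '-');
  // On the fallback path below, id[14] is forced to '4' (version 4) and the
  // variant bits to RFC 4122 variant 1.
}
```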
|
|
|
std::string Env::GenerateUniqueId() {
|
|
|
|
std::string result;
|
|
|
|
bool success = port::GenerateRfcUuid(&result);
|
|
|
|
if (!success) {
|
|
|
|
// Fall back on our own way of generating a unique ID and adapt it to
|
|
|
|
// RFC 4122 variant 1 version 4 (a random ID).
|
|
|
|
// https://en.wikipedia.org/wiki/Universally_unique_identifier
|
|
|
|
// We already tried GenerateRfcUuid so no need to try it again in
|
|
|
|
// GenerateRawUniqueId
|
|
|
|
constexpr bool exclude_port_uuid = true;
|
|
|
|
uint64_t upper, lower;
|
|
|
|
GenerateRawUniqueId(&upper, &lower, exclude_port_uuid);
|
|
|
|
|
|
|
|
// Set 4-bit version to 4
|
|
|
|
upper = (upper & (~uint64_t{0xf000})) | 0x4000;
|
|
|
|
// Set unary-encoded variant to 1 (0b10)
|
|
|
|
lower = (lower & (~(uint64_t{3} << 62))) | (uint64_t{2} << 62);
|
|
|
|
|
|
|
|
// Use 36 character format of RFC 4122
|
|
|
|
result.resize(36U);
|
2023-12-04 19:17:32 +00:00
|
|
|
char* buf = result.data();
PutBaseChars<16>(&buf, 8, upper >> 32, /*!uppercase*/ false);
|
|
|
|
*(buf++) = '-';
|
|
|
|
PutBaseChars<16>(&buf, 4, upper >> 16, /*!uppercase*/ false);
|
|
|
|
*(buf++) = '-';
|
|
|
|
PutBaseChars<16>(&buf, 4, upper, /*!uppercase*/ false);
|
|
|
|
*(buf++) = '-';
|
|
|
|
PutBaseChars<16>(&buf, 4, lower >> 48, /*!uppercase*/ false);
|
|
|
|
*(buf++) = '-';
|
|
|
|
PutBaseChars<16>(&buf, 12, lower, /*!uppercase*/ false);
|
|
|
|
assert(buf == &result[36]);
|
|
|
|
|
|
|
|
// Verify variant 1 version 4
|
|
|
|
assert(result[14] == '4');
|
|
|
|
assert(result[19] == '8' || result[19] == '9' || result[19] == 'a' ||
|
|
|
|
result[19] == 'b');
|
|
|
|
}
|
|
|
|
return result;
|
|
|
|
}
|
|
|
|
|
2023-12-04 19:17:32 +00:00
|
|
|
SequentialFile::~SequentialFile() = default;
|
2011-03-18 22:37:00 +00:00
|
|
|
|
2023-12-04 19:17:32 +00:00
|
|
|
RandomAccessFile::~RandomAccessFile() = default;
|
2011-03-18 22:37:00 +00:00
|
|
|
|
2023-12-04 19:17:32 +00:00
|
|
|
WritableFile::~WritableFile() = default;
|
2011-03-18 22:37:00 +00:00
|
|
|
|
2023-12-04 19:17:32 +00:00
|
|
|
MemoryMappedFileBuffer::~MemoryMappedFileBuffer() = default;
|
2018-04-30 19:23:45 +00:00
|
|
|
|
2023-12-04 19:17:32 +00:00
|
|
|
Logger::~Logger() = default;
|
2018-01-16 18:57:56 +00:00
|
|
|
|
|
|
|
Status Logger::Close() {
|
|
|
|
if (!closed_) {
|
|
|
|
closed_ = true;
|
|
|
|
return CloseImpl();
|
2018-02-23 21:50:02 +00:00
|
|
|
} else {
|
|
|
|
return Status::OK();
|
2018-01-16 18:57:56 +00:00
|
|
|
}
|
2011-07-21 02:40:18 +00:00
|
|
|
}
|
|
|
|
|
2018-02-23 21:50:02 +00:00
|
|
|
Status Logger::CloseImpl() { return Status::NotSupported(); }
|
2018-01-16 18:57:56 +00:00
|
|
|
|
2023-12-04 19:17:32 +00:00
|
|
|
FileLock::~FileLock() = default;
|
2011-03-18 22:37:00 +00:00
|
|
|
|
2022-10-25 00:54:14 +00:00
|
|
|
void LogFlush(Logger* info_log) {
|
2013-11-07 19:31:56 +00:00
|
|
|
if (info_log) {
|
|
|
|
info_log->Flush();
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2022-10-25 00:54:14 +00:00
|
|
|
static void Logv(Logger* info_log, const char* format, va_list ap) {
|
2014-10-30 20:36:18 +00:00
|
|
|
if (info_log && info_log->GetInfoLogLevel() <= InfoLogLevel::INFO_LEVEL) {
|
2014-04-10 22:27:42 +00:00
|
|
|
info_log->Logv(InfoLogLevel::INFO_LEVEL, format, ap);
|
2013-01-20 10:07:13 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-04-13 17:58:25 +00:00
|
|
|
void Log(Logger* info_log, const char* format, ...) {
|
|
|
|
va_list ap;
|
|
|
|
va_start(ap, format);
|
|
|
|
Logv(info_log, format, ap);
|
|
|
|
va_end(ap);
|
|
|
|
}
|
|
|
|
|
2022-10-25 00:54:14 +00:00
|
|
|
void Logger::Logv(const InfoLogLevel log_level, const char* format,
|
|
|
|
va_list ap) {
|
|
|
|
static const char* kInfoLogLevelNames[5] = {"DEBUG", "INFO", "WARN", "ERROR",
|
|
|
|
"FATAL"};
|
2015-07-16 19:10:16 +00:00
|
|
|
if (log_level < log_level_) {
|
|
|
|
return;
|
|
|
|
}
|
|
|
|
|
|
|
|
if (log_level == InfoLogLevel::INFO_LEVEL) {
|
|
|
|
// Doesn't print log level if it is INFO level.
|
|
|
|
// This is to avoid unexpected performance regression after we add
|
|
|
|
// the feature of log level. All the logs before we add the feature
|
|
|
|
// are INFO level. We don't want to add extra costs to those existing
|
|
|
|
// logging.
|
|
|
|
Logv(format, ap);
|
2019-02-16 00:56:58 +00:00
|
|
|
} else if (log_level == InfoLogLevel::HEADER_LEVEL) {
|
|
|
|
LogHeader(format, ap);
|
2015-07-16 19:10:16 +00:00
|
|
|
} else {
|
|
|
|
char new_format[500];
|
|
|
|
snprintf(new_format, sizeof(new_format) - 1, "[%s] %s",
|
2022-10-25 00:54:14 +00:00
|
|
|
kInfoLogLevelNames[log_level], format);
|
2015-07-16 19:10:16 +00:00
|
|
|
Logv(new_format, ap);
|
|
|
|
}
|
2020-09-29 23:04:52 +00:00
|
|
|
|
|
|
|
if (log_level >= InfoLogLevel::WARN_LEVEL &&
|
|
|
|
log_level != InfoLogLevel::HEADER_LEVEL) {
|
|
|
|
// Log messages with severity of warning or higher should be rare and are
|
|
|
|
// sometimes followed by an unclean crash. We want to be sure important
|
|
|
|
// messages are not lost in an application buffer when that happens.
|
|
|
|
Flush();
|
|
|
|
}
|
2015-07-16 19:10:16 +00:00
|
|
|
}
|
|
|
|
|
2022-10-25 00:54:14 +00:00
|
|
|
static void Logv(const InfoLogLevel log_level, Logger* info_log,
|
|
|
|
const char* format, va_list ap) {
|
2014-10-30 20:36:18 +00:00
|
|
|
if (info_log && info_log->GetInfoLogLevel() <= log_level) {
|
2015-07-03 00:14:39 +00:00
|
|
|
if (log_level == InfoLogLevel::HEADER_LEVEL) {
|
|
|
|
info_log->LogHeader(format, ap);
|
|
|
|
} else {
|
|
|
|
info_log->Logv(log_level, format, ap);
|
|
|
|
}
|
2014-02-26 22:41:28 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-04-13 17:58:25 +00:00
|
|
|
void Log(const InfoLogLevel log_level, Logger* info_log, const char* format,
|
|
|
|
...) {
|
|
|
|
va_list ap;
|
|
|
|
va_start(ap, format);
|
|
|
|
Logv(log_level, info_log, format, ap);
|
|
|
|
va_end(ap);
|
|
|
|
}
|
|
|
|
|
2022-10-25 00:54:14 +00:00
|
|
|
static void Headerv(Logger* info_log, const char* format, va_list ap) {
|
2015-02-02 17:47:24 +00:00
|
|
|
if (info_log) {
|
|
|
|
info_log->LogHeader(format, ap);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-04-13 17:58:25 +00:00
|
|
|
void Header(Logger* info_log, const char* format, ...) {
|
|
|
|
va_list ap;
|
|
|
|
va_start(ap, format);
|
|
|
|
Headerv(info_log, format, ap);
|
|
|
|
va_end(ap);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void Debugv(Logger* info_log, const char* format, va_list ap) {
|
2014-10-30 20:36:18 +00:00
|
|
|
if (info_log && info_log->GetInfoLogLevel() <= InfoLogLevel::DEBUG_LEVEL) {
|
2014-04-10 22:27:42 +00:00
|
|
|
info_log->Logv(InfoLogLevel::DEBUG_LEVEL, format, ap);
|
2014-02-26 22:41:28 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-04-13 17:58:25 +00:00
|
|
|
void Debug(Logger* info_log, const char* format, ...) {
|
|
|
|
va_list ap;
|
|
|
|
va_start(ap, format);
|
|
|
|
Debugv(info_log, format, ap);
|
|
|
|
va_end(ap);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void Infov(Logger* info_log, const char* format, va_list ap) {
|
2014-10-30 20:36:18 +00:00
|
|
|
if (info_log && info_log->GetInfoLogLevel() <= InfoLogLevel::INFO_LEVEL) {
|
2014-04-10 22:27:42 +00:00
|
|
|
info_log->Logv(InfoLogLevel::INFO_LEVEL, format, ap);
|
2014-02-26 22:41:28 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-04-13 17:58:25 +00:00
|
|
|
void Info(Logger* info_log, const char* format, ...) {
|
|
|
|
va_list ap;
|
|
|
|
va_start(ap, format);
|
|
|
|
Infov(info_log, format, ap);
|
|
|
|
va_end(ap);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void Warnv(Logger* info_log, const char* format, va_list ap) {
|
2014-10-30 20:36:18 +00:00
|
|
|
if (info_log && info_log->GetInfoLogLevel() <= InfoLogLevel::WARN_LEVEL) {
|
2014-04-10 22:27:42 +00:00
|
|
|
info_log->Logv(InfoLogLevel::WARN_LEVEL, format, ap);
|
2014-02-26 22:41:28 +00:00
|
|
|
}
|
|
|
|
}
|
2018-04-13 17:58:25 +00:00
|
|
|
|
|
|
|
void Warn(Logger* info_log, const char* format, ...) {
|
|
|
|
va_list ap;
|
|
|
|
va_start(ap, format);
|
|
|
|
Warnv(info_log, format, ap);
|
|
|
|
va_end(ap);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void Errorv(Logger* info_log, const char* format, va_list ap) {
|
2014-10-30 20:36:18 +00:00
|
|
|
if (info_log && info_log->GetInfoLogLevel() <= InfoLogLevel::ERROR_LEVEL) {
|
2014-04-10 22:27:42 +00:00
|
|
|
info_log->Logv(InfoLogLevel::ERROR_LEVEL, format, ap);
|
2014-02-26 22:41:28 +00:00
|
|
|
}
|
|
|
|
}
|
2018-04-13 17:58:25 +00:00
|
|
|
|
|
|
|
void Error(Logger* info_log, const char* format, ...) {
|
|
|
|
va_list ap;
|
|
|
|
va_start(ap, format);
|
|
|
|
Errorv(info_log, format, ap);
|
|
|
|
va_end(ap);
|
|
|
|
}
|
|
|
|
|
|
|
|
static void Fatalv(Logger* info_log, const char* format, va_list ap) {
|
2014-10-30 20:36:18 +00:00
|
|
|
if (info_log && info_log->GetInfoLogLevel() <= InfoLogLevel::FATAL_LEVEL) {
|
2014-04-10 22:27:42 +00:00
|
|
|
info_log->Logv(InfoLogLevel::FATAL_LEVEL, format, ap);
|
2014-02-26 22:41:28 +00:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
2018-04-13 17:58:25 +00:00
|
|
|
void Fatal(Logger* info_log, const char* format, ...) {
|
|
|
|
va_list ap;
|
|
|
|
va_start(ap, format);
|
|
|
|
Fatalv(info_log, format, ap);
|
|
|
|
va_end(ap);
|
|
|
|
}
|
|
|
|
|
2018-11-09 19:17:34 +00:00
|
|
|
void LogFlush(const std::shared_ptr<Logger>& info_log) {
|
2018-04-13 17:58:25 +00:00
|
|
|
LogFlush(info_log.get());
|
2013-11-07 19:31:56 +00:00
|
|
|
}
|
|
|
|
|
2018-11-09 19:17:34 +00:00
|
|
|
void Log(const InfoLogLevel log_level, const std::shared_ptr<Logger>& info_log,
|
2014-02-26 22:41:28 +00:00
|
|
|
const char* format, ...) {
|
2018-04-13 17:58:25 +00:00
|
|
|
va_list ap;
|
|
|
|
va_start(ap, format);
|
|
|
|
Logv(log_level, info_log.get(), format, ap);
|
|
|
|
va_end(ap);
|
2014-02-26 22:41:28 +00:00
|
|
|
}
|
|
|
|
|
2018-11-09 19:17:34 +00:00
|
|
|
void Header(const std::shared_ptr<Logger>& info_log, const char* format, ...) {
|
2018-04-13 17:58:25 +00:00
|
|
|
va_list ap;
|
|
|
|
va_start(ap, format);
|
|
|
|
Headerv(info_log.get(), format, ap);
|
|
|
|
va_end(ap);
|
2015-02-02 17:47:24 +00:00
|
|
|
}
|
|
|
|
|
2018-11-09 19:17:34 +00:00
|
|
|
void Debug(const std::shared_ptr<Logger>& info_log, const char* format, ...) {
|
2018-04-13 17:58:25 +00:00
|
|
|
va_list ap;
|
|
|
|
va_start(ap, format);
|
|
|
|
Debugv(info_log.get(), format, ap);
|
|
|
|
va_end(ap);
|
2014-02-26 22:41:28 +00:00
|
|
|
}
|
|
|
|
|
2018-11-09 19:17:34 +00:00
|
|
|
void Info(const std::shared_ptr<Logger>& info_log, const char* format, ...) {
|
2018-04-13 17:58:25 +00:00
|
|
|
va_list ap;
|
|
|
|
va_start(ap, format);
|
|
|
|
Infov(info_log.get(), format, ap);
|
|
|
|
va_end(ap);
|
2014-02-26 22:41:28 +00:00
|
|
|
}
|
|
|
|
|
2018-11-09 19:17:34 +00:00
|
|
|
void Warn(const std::shared_ptr<Logger>& info_log, const char* format, ...) {
|
2018-04-13 17:58:25 +00:00
|
|
|
va_list ap;
|
|
|
|
va_start(ap, format);
|
|
|
|
Warnv(info_log.get(), format, ap);
|
|
|
|
va_end(ap);
|
2014-02-26 22:41:28 +00:00
|
|
|
}
|
|
|
|
|
2018-11-09 19:17:34 +00:00
|
|
|
void Error(const std::shared_ptr<Logger>& info_log, const char* format, ...) {
|
2018-04-13 17:58:25 +00:00
|
|
|
va_list ap;
|
|
|
|
va_start(ap, format);
|
|
|
|
Errorv(info_log.get(), format, ap);
|
|
|
|
va_end(ap);
|
2014-02-26 22:41:28 +00:00
|
|
|
}
|
|
|
|
|
2018-11-09 19:17:34 +00:00
|
|
|
void Fatal(const std::shared_ptr<Logger>& info_log, const char* format, ...) {
|
2018-04-13 17:58:25 +00:00
|
|
|
va_list ap;
|
|
|
|
va_start(ap, format);
|
|
|
|
Fatalv(info_log.get(), format, ap);
|
|
|
|
va_end(ap);
|
2014-02-26 22:41:28 +00:00
|
|
|
}
|
|
|
|
|
2018-11-09 19:17:34 +00:00
|
|
|
void Log(const std::shared_ptr<Logger>& info_log, const char* format, ...) {
|
2018-04-13 17:58:25 +00:00
|
|
|
va_list ap;
|
|
|
|
va_start(ap, format);
|
|
|
|
Logv(info_log.get(), format, ap);
|
|
|
|
va_end(ap);
|
2011-03-18 22:37:00 +00:00
|
|
|
}
|
|
|
|
|
2014-04-10 04:17:14 +00:00
|
|
|
Status WriteStringToFile(Env* env, const Slice& data, const std::string& fname,
                         bool should_sync, const IOOptions* io_options) {
  const auto& fs = env->GetFileSystem();
  return WriteStringToFile(fs.get(), data, fname, should_sync,
                           io_options ? *io_options : IOOptions());
}
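
// Usage sketch for the Env-based helpers above and below (illustrative names
// and paths, not part of the library surface):
//   Status s = WriteStringToFile(env, Slice("payload"), "/tmp/example",
//                                /*should_sync=*/true, /*io_options=*/nullptr);
//   std::string contents;
//   if (s.ok()) {
//     s = ReadFileToString(env, "/tmp/example", &contents);
//   }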

Status ReadFileToString(Env* env, const std::string& fname, std::string* data) {
  const auto& fs = env->GetFileSystem();
  return ReadFileToString(fs.get(), fname, data);
}

namespace {  // anonymous namespace

void AssignEnvOptions(EnvOptions* env_options, const DBOptions& options) {
  env_options->use_mmap_reads = options.allow_mmap_reads;
  env_options->use_mmap_writes = options.allow_mmap_writes;
  env_options->use_direct_reads = options.use_direct_reads;
  env_options->set_fd_cloexec = options.is_fd_close_on_exec;
  env_options->bytes_per_sync = options.bytes_per_sync;
  env_options->compaction_readahead_size = options.compaction_readahead_size;
  env_options->random_access_max_buffer_size =
      options.random_access_max_buffer_size;
  env_options->rate_limiter = options.rate_limiter.get();
  env_options->writable_file_max_buffer_size =
      options.writable_file_max_buffer_size;
  env_options->allow_fallocate = options.allow_fallocate;
  env_options->strict_bytes_per_sync = options.strict_bytes_per_sync;
  options.env->SanitizeEnvOptions(env_options);
}

}  // namespace

EnvOptions Env::OptimizeForLogWrite(const EnvOptions& env_options,
                                    const DBOptions& db_options) const {
  EnvOptions optimized_env_options(env_options);
  optimized_env_options.bytes_per_sync = db_options.wal_bytes_per_sync;
  optimized_env_options.writable_file_max_buffer_size =
      db_options.writable_file_max_buffer_size;
  return optimized_env_options;
}

EnvOptions Env::OptimizeForManifestWrite(const EnvOptions& env_options) const {
  return env_options;
}

EnvOptions Env::OptimizeForLogRead(const EnvOptions& env_options) const {
  EnvOptions optimized_env_options(env_options);
  optimized_env_options.use_direct_reads = false;
  return optimized_env_options;
}

EnvOptions Env::OptimizeForManifestRead(const EnvOptions& env_options) const {
  EnvOptions optimized_env_options(env_options);
  optimized_env_options.use_direct_reads = false;
  return optimized_env_options;
}

EnvOptions Env::OptimizeForCompactionTableWrite(
    const EnvOptions& env_options, const ImmutableDBOptions& db_options) const {
  EnvOptions optimized_env_options(env_options);
  optimized_env_options.use_direct_writes =
      db_options.use_direct_io_for_flush_and_compaction;
  return optimized_env_options;
}

EnvOptions Env::OptimizeForCompactionTableRead(
    const EnvOptions& env_options, const ImmutableDBOptions& db_options) const {
  EnvOptions optimized_env_options(env_options);
  optimized_env_options.use_direct_reads = db_options.use_direct_reads;
  return optimized_env_options;
}

EnvOptions Env::OptimizeForBlobFileRead(
    const EnvOptions& env_options, const ImmutableDBOptions& db_options) const {
  EnvOptions optimized_env_options(env_options);
  optimized_env_options.use_direct_reads = db_options.use_direct_reads;
  return optimized_env_options;
}
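
// The Optimize* helpers above return a tweaked copy of the caller's
// EnvOptions. Sketch of typical use by a WAL writer (illustrative names):
//   EnvOptions wal_options = env->OptimizeForLogWrite(base_options, db_options);
// which applies wal_bytes_per_sync and the WAL-specific write buffer size.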

EnvOptions::EnvOptions(const DBOptions& options) {
  AssignEnvOptions(this, options);
}

EnvOptions::EnvOptions() {
  DBOptions options;
  AssignEnvOptions(this, options);
}
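
// Sketch of the DBOptions -> EnvOptions mapping performed by
// AssignEnvOptions() via these constructors (illustrative values):
//   DBOptions db_opts;
//   db_opts.allow_mmap_reads = true;
//   EnvOptions file_opts(db_opts);  // file_opts.use_mmap_reads is now true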

Status NewEnvLogger(const std::string& fname, Env* env,
                    std::shared_ptr<Logger>* result) {
  FileOptions options;
  // TODO: Tune the buffer size.
  options.writable_file_max_buffer_size = 1024 * 1024;
  std::unique_ptr<FSWritableFile> writable_file;
  const auto status = env->GetFileSystem()->NewWritableFile(
      fname, options, &writable_file, nullptr);
  if (!status.ok()) {
    return status;
  }

  *result = std::make_shared<EnvLogger>(std::move(writable_file), fname,
                                        options, env);
  return Status::OK();
}
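
// Usage sketch (path and message are illustrative):
//   std::shared_ptr<Logger> info_log;
//   Status s = NewEnvLogger("/tmp/rocksdb_info.log", Env::Default(), &info_log);
//   if (s.ok()) {
//     Info(info_log, "env logger initialized");
//   }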

const std::shared_ptr<FileSystem>& Env::GetFileSystem() const {
  return file_system_;
}

const std::shared_ptr<SystemClock>& Env::GetSystemClock() const {
  return system_clock_;
}

namespace {
static std::unordered_map<std::string, OptionTypeInfo> sc_wrapper_type_info = {
    {"target",
     OptionTypeInfo::AsCustomSharedPtr<SystemClock>(
         0, OptionVerificationType::kByName, OptionTypeFlags::kDontSerialize)},
};

}  // namespace
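
// SystemClockWrapper forwards time queries to its wrapped `target_` clock.
// Registering the "target" option above lets the wrapped clock be configured
// by name; it carries kDontSerialize because SerializeOptions() below emits
// the target explicitly.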
SystemClockWrapper::SystemClockWrapper(const std::shared_ptr<SystemClock>& t)
    : target_(t) {
  RegisterOptions("", &target_, &sc_wrapper_type_info);
}

Status SystemClockWrapper::PrepareOptions(const ConfigOptions& options) {
  if (target_ == nullptr) {
    target_ = SystemClock::Default();
  }
  return SystemClock::PrepareOptions(options);
}

std::string SystemClockWrapper::SerializeOptions(
    const ConfigOptions& config_options, const std::string& header) const {
  auto parent = SystemClock::SerializeOptions(config_options, "");
  if (config_options.IsShallow() || target_ == nullptr ||
      target_->IsInstanceOf(SystemClock::kDefaultName())) {
    return parent;
  } else {
    std::string result = header;
    if (!StartsWith(parent, OptionTypeInfo::kIdPropName())) {
      result.append(OptionTypeInfo::kIdPropName()).append("=");
    }
    result.append(parent);
    if (!EndsWith(result, config_options.delimiter)) {
      result.append(config_options.delimiter);
    }
    result.append("target=").append(target_->ToString(config_options));
    return result;
  }
}

static int RegisterBuiltinSystemClocks(ObjectLibrary& library,
                                       const std::string& /*arg*/) {
  library.AddFactory<SystemClock>(
      EmulatedSystemClock::kClassName(),
      [](const std::string& /*uri*/, std::unique_ptr<SystemClock>* guard,
         std::string* /* errmsg */) {
        guard->reset(new EmulatedSystemClock(SystemClock::Default()));
        return guard->get();
      });
  size_t num_types;
  return static_cast<int>(library.GetFactoryCount(&num_types));
}

Status SystemClock::CreateFromString(const ConfigOptions& config_options,
                                     const std::string& value,
                                     std::shared_ptr<SystemClock>* result) {
  auto clock = SystemClock::Default();
  if (clock->IsInstanceOf(value)) {
    *result = clock;
    return Status::OK();
  } else {
    static std::once_flag once;
    std::call_once(once, [&]() {
      RegisterBuiltinSystemClocks(*(ObjectLibrary::Default().get()), "");
    });
    return LoadSharedObject<SystemClock>(config_options, value, result);
  }
}
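
// Usage sketch: resolving a clock by name through the Customizable registry
// (the built-in emulated clock is registered above):
//   ConfigOptions config_options;
//   std::shared_ptr<SystemClock> clock;
//   Status s = SystemClock::CreateFromString(
//       config_options, EmulatedSystemClock::kClassName(), &clock);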

bool SystemClock::TimedWait(port::CondVar* cv,
                            std::chrono::microseconds deadline) {
  return cv->TimedWait(deadline.count());
}

}  // namespace ROCKSDB_NAMESPACE