Some API clarifications (#9080)

Summary:
* Clarify that RocksDB is not exception safe on many of our callback
and extension interfaces
* Clarify FSRandomAccessFile::MultiRead implementations must accept
non-sorted inputs (see https://github.com/facebook/rocksdb/issues/8953)
* Clarify ConcurrentTaskLimiter and SstFileManager are not (currently)
extensible interfaces
* Mark WriteBufferManager as `final`, so it is then clearly not a
callback interface, even though it smells like one
* Clarify TablePropertiesCollector Status returns are mostly ignored

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9080

Test Plan: comments only (except WriteBufferManager final)

Reviewed By: ajkr

Differential Revision: D31968782

Pulled By: pdillinger

fbshipit-source-id: 11b648ce3ce3c5e5bdc02d2eafc7ea4b864bd1d2
This commit is contained in:
Peter Dillinger 2021-11-02 20:29:07 -07:00 committed by Facebook GitHub Bot
parent f72c834eab
commit 82afa01815
24 changed files with 128 additions and 24 deletions

View File

@ -17,6 +17,8 @@
### Public API change ### Public API change
* Made FileSystem extend the Customizable class and added a CreateFromString method. Implementations need to be registered with the ObjectRegistry and to implement a Name() method in order to be created via this method. * Made FileSystem extend the Customizable class and added a CreateFromString method. Implementations need to be registered with the ObjectRegistry and to implement a Name() method in order to be created via this method.
* Clarified in API comments that RocksDB is not exception safe for callbacks and custom extensions. An exception propagating into RocksDB can lead to undefined behavior, including data loss, unreported corruption, deadlocks, and more.
* Marked `WriteBufferManager` as `final` because it is not intended for extension.
## 6.26.0 (2021-10-20) ## 6.26.0 (2021-10-20)
### Bug Fixes ### Bug Fixes

View File

@ -307,35 +307,39 @@ struct AdvancedColumnFamilyOptions {
// delta_value - Delta value to be merged with the existing_value. // delta_value - Delta value to be merged with the existing_value.
// Stored in transaction logs. // Stored in transaction logs.
// merged_value - Set when delta is applied on the previous value. // merged_value - Set when delta is applied on the previous value.
//
// Applicable only when inplace_update_support is true, // Applicable only when inplace_update_support is true,
// this callback function is called at the time of updating the memtable // this callback function is called at the time of updating the memtable
// as part of a Put operation, lets say Put(key, delta_value). It allows the // as part of a Put operation, lets say Put(key, delta_value). It allows the
// 'delta_value' specified as part of the Put operation to be merged with // 'delta_value' specified as part of the Put operation to be merged with
// an 'existing_value' of the key in the database. // an 'existing_value' of the key in the database.
//
// If the merged value is smaller in size that the 'existing_value', // If the merged value is smaller in size that the 'existing_value',
// then this function can update the 'existing_value' buffer inplace and // then this function can update the 'existing_value' buffer inplace and
// the corresponding 'existing_value'_size pointer, if it wishes to. // the corresponding 'existing_value'_size pointer, if it wishes to.
// The callback should return UpdateStatus::UPDATED_INPLACE. // The callback should return UpdateStatus::UPDATED_INPLACE.
// In this case. (In this case, the snapshot-semantics of the rocksdb // In this case. (In this case, the snapshot-semantics of the rocksdb
// Iterator is not atomic anymore). // Iterator is not atomic anymore).
//
// If the merged value is larger in size than the 'existing_value' or the // If the merged value is larger in size than the 'existing_value' or the
// application does not wish to modify the 'existing_value' buffer inplace, // application does not wish to modify the 'existing_value' buffer inplace,
// then the merged value should be returned via *merge_value. It is set by // then the merged value should be returned via *merge_value. It is set by
// merging the 'existing_value' and the Put 'delta_value'. The callback should // merging the 'existing_value' and the Put 'delta_value'. The callback should
// return UpdateStatus::UPDATED in this case. This merged value will be added // return UpdateStatus::UPDATED in this case. This merged value will be added
// to the memtable. // to the memtable.
//
// If merging fails or the application does not wish to take any action, // If merging fails or the application does not wish to take any action,
// then the callback should return UpdateStatus::UPDATE_FAILED. // then the callback should return UpdateStatus::UPDATE_FAILED.
//
// Please remember that the original call from the application is Put(key, // Please remember that the original call from the application is Put(key,
// delta_value). So the transaction log (if enabled) will still contain (key, // delta_value). So the transaction log (if enabled) will still contain (key,
// delta_value). The 'merged_value' is not stored in the transaction log. // delta_value). The 'merged_value' is not stored in the transaction log.
// Hence the inplace_callback function should be consistent across db reopens. // Hence the inplace_callback function should be consistent across db reopens.
//
// RocksDB callbacks are NOT exception-safe. A callback completing with an
// exception can lead to undefined behavior in RocksDB, including data loss,
// unreported corruption, deadlocks, and more.
//
// Default: nullptr // Default: nullptr
UpdateStatus (*inplace_callback)(char* existing_value, UpdateStatus (*inplace_callback)(char* existing_value,
uint32_t* existing_value_size, uint32_t* existing_value_size,

View File

@ -170,6 +170,10 @@ class Cache {
// The SizeCallback takes a void* pointer to the object and returns the size // The SizeCallback takes a void* pointer to the object and returns the size
// of the persistable data. It can be used by the secondary cache to allocate // of the persistable data. It can be used by the secondary cache to allocate
// memory if needed. // memory if needed.
//
// RocksDB callbacks are NOT exception-safe. A callback completing with an
// exception can lead to undefined behavior in RocksDB, including data loss,
// unreported corruption, deadlocks, and more.
using SizeCallback = size_t (*)(void* obj); using SizeCallback = size_t (*)(void* obj);
// The SaveToCallback takes a void* object pointer and saves the persistable // The SaveToCallback takes a void* object pointer and saves the persistable

View File

@ -24,7 +24,10 @@ class SliceTransform;
// CompactionFilter allows an application to modify/delete a key-value during // CompactionFilter allows an application to modify/delete a key-value during
// table file creation. // table file creation.
//
// Exceptions MUST NOT propagate out of overridden functions into RocksDB,
// because RocksDB is not exception-safe. This could cause undefined behavior
// including data loss, unreported corruption, deadlocks, and more.
class CompactionFilter : public Customizable { class CompactionFilter : public Customizable {
public: public:
enum ValueType { enum ValueType {
@ -219,6 +222,10 @@ class CompactionFilter : public Customizable {
// `CompactionFilter` according to `ShouldFilterTableFileCreation()`. This // `CompactionFilter` according to `ShouldFilterTableFileCreation()`. This
// allows the application to know about the different ongoing threads of work // allows the application to know about the different ongoing threads of work
// and makes it unnecessary for `CompactionFilter` to provide thread-safety. // and makes it unnecessary for `CompactionFilter` to provide thread-safety.
//
// Exceptions MUST NOT propagate out of overridden functions into RocksDB,
// because RocksDB is not exception-safe. This could cause undefined behavior
// including data loss, unreported corruption, deadlocks, and more.
class CompactionFilterFactory : public Customizable { class CompactionFilterFactory : public Customizable {
public: public:
virtual ~CompactionFilterFactory() {} virtual ~CompactionFilterFactory() {}

View File

@ -21,6 +21,10 @@ class Slice;
// used as keys in an sstable or a database. A Comparator implementation // used as keys in an sstable or a database. A Comparator implementation
// must be thread-safe since rocksdb may invoke its methods concurrently // must be thread-safe since rocksdb may invoke its methods concurrently
// from multiple threads. // from multiple threads.
//
// Exceptions MUST NOT propagate out of overridden functions into RocksDB,
// because RocksDB is not exception-safe. This could cause undefined behavior
// including data loss, unreported corruption, deadlocks, and more.
class Comparator : public Customizable { class Comparator : public Customizable {
public: public:
Comparator() : timestamp_size_(0) {} Comparator() : timestamp_size_(0) {}

View File

@ -17,6 +17,8 @@
namespace ROCKSDB_NAMESPACE { namespace ROCKSDB_NAMESPACE {
// This is NOT an extensible interface but a public interface for result of
// NewConcurrentTaskLimiter. Any derived classes must be RocksDB internal.
class ConcurrentTaskLimiter { class ConcurrentTaskLimiter {
public: public:
virtual ~ConcurrentTaskLimiter() {} virtual ~ConcurrentTaskLimiter() {}

View File

@ -145,6 +145,9 @@ struct EnvOptions {
RateLimiter* rate_limiter = nullptr; RateLimiter* rate_limiter = nullptr;
}; };
// Exceptions MUST NOT propagate out of overridden functions into RocksDB,
// because RocksDB is not exception-safe. This could cause undefined behavior
// including data loss, unreported corruption, deadlocks, and more.
class Env { class Env {
public: public:
struct FileAttributes { struct FileAttributes {
@ -1150,6 +1153,10 @@ enum InfoLogLevel : unsigned char {
}; };
// An interface for writing log messages. // An interface for writing log messages.
//
// Exceptions MUST NOT propagate out of overridden functions into RocksDB,
// because RocksDB is not exception-safe. This could cause undefined behavior
// including data loss, unreported corruption, deadlocks, and more.
class Logger { class Logger {
public: public:
size_t kDoNotSupportGetLogFileSize = (std::numeric_limits<size_t>::max)(); size_t kDoNotSupportGetLogFileSize = (std::numeric_limits<size_t>::max)();

View File

@ -62,6 +62,10 @@ class BlockAccessCipherStream {
}; };
// BlockCipher // BlockCipher
//
// Exceptions MUST NOT propagate out of overridden functions into RocksDB,
// because RocksDB is not exception-safe. This could cause undefined behavior
// including data loss, unreported corruption, deadlocks, and more.
class BlockCipher : public Customizable { class BlockCipher : public Customizable {
public: public:
virtual ~BlockCipher(){}; virtual ~BlockCipher(){};
@ -105,6 +109,10 @@ class BlockCipher : public Customizable {
// The encryption provider is used to create a cipher stream for a specific // The encryption provider is used to create a cipher stream for a specific
// file. The returned cipher stream will be used for actual // file. The returned cipher stream will be used for actual
// encryption/decryption actions. // encryption/decryption actions.
//
// Exceptions MUST NOT propagate out of overridden functions into RocksDB,
// because RocksDB is not exception-safe. This could cause undefined behavior
// including data loss, unreported corruption, deadlocks, and more.
class EncryptionProvider : public Customizable { class EncryptionProvider : public Customizable {
public: public:
virtual ~EncryptionProvider(){}; virtual ~EncryptionProvider(){};

View File

@ -43,6 +43,10 @@ struct FileChecksumGenContext {
// * Finalize is called at most once during the life of the object // * Finalize is called at most once during the life of the object
// * All calls to Update come before Finalize // * All calls to Update come before Finalize
// * All calls to GetChecksum come after Finalize // * All calls to GetChecksum come after Finalize
//
// Exceptions MUST NOT propagate out of overridden functions into RocksDB,
// because RocksDB is not exception-safe. This could cause undefined behavior
// including data loss, unreported corruption, deadlocks, and more.
class FileChecksumGenerator { class FileChecksumGenerator {
public: public:
virtual ~FileChecksumGenerator() {} virtual ~FileChecksumGenerator() {}
@ -64,6 +68,10 @@ class FileChecksumGenerator {
}; };
// Create the FileChecksumGenerator object for each SST file. // Create the FileChecksumGenerator object for each SST file.
//
// Exceptions MUST NOT propagate out of overridden functions into RocksDB,
// because RocksDB is not exception-safe. This could cause undefined behavior
// including data loss, unreported corruption, deadlocks, and more.
class FileChecksumGenFactory : public Customizable { class FileChecksumGenFactory : public Customizable {
public: public:
virtual ~FileChecksumGenFactory() {} virtual ~FileChecksumGenFactory() {}
@ -86,6 +94,10 @@ class FileChecksumGenFactory : public Customizable {
// the checksum information of all valid SST file of a DB instance. It can // the checksum information of all valid SST file of a DB instance. It can
// also be used to store the checksum information of a list of SST files to // also be used to store the checksum information of a list of SST files to
// be ingested. // be ingested.
//
// Exceptions MUST NOT propagate out of overridden functions into RocksDB,
// because RocksDB is not exception-safe. This could cause undefined behavior
// including data loss, unreported corruption, deadlocks, and more.
class FileChecksumList { class FileChecksumList {
public: public:
virtual ~FileChecksumList() {} virtual ~FileChecksumList() {}

View File

@ -208,7 +208,10 @@ struct IODebugContext {
// retryable. // retryable.
// NewCompositeEnv can be used to create an Env with a custom FileSystem for // NewCompositeEnv can be used to create an Env with a custom FileSystem for
// DBOptions::env. // DBOptions::env.
//
// Exceptions MUST NOT propagate out of overridden functions into RocksDB,
// because RocksDB is not exception-safe. This could cause undefined behavior
// including data loss, unreported corruption, deadlocks, and more.
class FileSystem : public Customizable { class FileSystem : public Customizable {
public: public:
FileSystem(); FileSystem();
@ -731,10 +734,11 @@ class FSRandomAccessFile {
// Read a bunch of blocks as described by reqs. The blocks can // Read a bunch of blocks as described by reqs. The blocks can
// optionally be read in parallel. This is a synchronous call, i.e it // optionally be read in parallel. This is a synchronous call, i.e it
// should return after all reads have completed. The reads will be // should return after all reads have completed. The reads will be
// non-overlapping. If the function return Status is not ok, status of // non-overlapping but can be in any order. If the function return Status
// individual requests will be ignored and return status will be assumed // is not ok, status of individual requests will be ignored and return
// for all read requests. The function return status is only meant for any // status will be assumed for all read requests. The function return status
// any errors that occur before even processing specific read requests // is only meant for errors that occur before processing individual read
// requests.
virtual IOStatus MultiRead(FSReadRequest* reqs, size_t num_reqs, virtual IOStatus MultiRead(FSReadRequest* reqs, size_t num_reqs,
const IOOptions& options, IODebugContext* dbg) { const IOOptions& options, IODebugContext* dbg) {
assert(reqs != nullptr); assert(reqs != nullptr);

View File

@ -18,7 +18,11 @@ struct ConfigOptions;
struct Options; struct Options;
// FlushBlockPolicy provides a configurable way to determine when to flush a // FlushBlockPolicy provides a configurable way to determine when to flush a
// block in the block based tables, // block in the block based tables.
//
// Exceptions MUST NOT propagate out of overridden functions into RocksDB,
// because RocksDB is not exception-safe. This could cause undefined behavior
// including data loss, unreported corruption, deadlocks, and more.
class FlushBlockPolicy { class FlushBlockPolicy {
public: public:
// Keep track of the key/value sequences and return the boolean value to // Keep track of the key/value sequences and return the boolean value to

View File

@ -476,6 +476,10 @@ struct ExternalFileIngestionInfo {
// the current thread holding any DB mutex. This is to prevent potential // the current thread holding any DB mutex. This is to prevent potential
// deadlock and performance issue when using EventListener callback // deadlock and performance issue when using EventListener callback
// in a complex way. // in a complex way.
//
// [Exceptions] Exceptions MUST NOT propagate out of overridden functions into
// RocksDB, because RocksDB is not exception-safe. This could cause undefined
// behavior including data loss, unreported corruption, deadlocks, and more.
class EventListener : public Customizable { class EventListener : public Customizable {
public: public:
static const char* Type() { return "EventListener"; } static const char* Type() { return "EventListener"; }

View File

@ -44,6 +44,9 @@ class Logger;
// //
// Refer to rocksdb-merge wiki for more details and example implementations. // Refer to rocksdb-merge wiki for more details and example implementations.
// //
// Exceptions MUST NOT propagate out of overridden functions into RocksDB,
// because RocksDB is not exception-safe. This could cause undefined behavior
// including data loss, unreported corruption, deadlocks, and more.
class MergeOperator : public Customizable { class MergeOperator : public Customizable {
public: public:
virtual ~MergeOperator() {} virtual ~MergeOperator() {}

View File

@ -396,6 +396,9 @@ struct CompactionServiceJobInfo {
priority(priority_) {} priority(priority_) {}
}; };
// Exceptions MUST NOT propagate out of overridden functions into RocksDB,
// because RocksDB is not exception-safe. This could cause undefined behavior
// including data loss, unreported corruption, deadlocks, and more.
class CompactionService : public Customizable { class CompactionService : public Customizable {
public: public:
static const char* Type() { return "CompactionService"; } static const char* Type() { return "CompactionService"; }

View File

@ -15,6 +15,9 @@
namespace ROCKSDB_NAMESPACE { namespace ROCKSDB_NAMESPACE {
// Exceptions MUST NOT propagate out of overridden functions into RocksDB,
// because RocksDB is not exception-safe. This could cause undefined behavior
// including data loss, unreported corruption, deadlocks, and more.
class RateLimiter { class RateLimiter {
public: public:
enum class OpType { enum class OpType {

View File

@ -43,6 +43,10 @@ class SecondaryCacheResultHandle {
// //
// Cache interface for caching blocks on a secondary tier (which can include // Cache interface for caching blocks on a secondary tier (which can include
// non-volatile media, or alternate forms of caching such as compressed data) // non-volatile media, or alternate forms of caching such as compressed data)
//
// Exceptions MUST NOT propagate out of overridden functions into RocksDB,
// because RocksDB is not exception-safe. This could cause undefined behavior
// including data loss, unreported corruption, deadlocks, and more.
class SecondaryCache : public Customizable { class SecondaryCache : public Customizable {
public: public:
virtual ~SecondaryCache() {} virtual ~SecondaryCache() {}

View File

@ -25,12 +25,14 @@ namespace ROCKSDB_NAMESPACE {
class Slice; class Slice;
struct ConfigOptions; struct ConfigOptions;
/* // A SliceTransform is a generic pluggable way of transforming one string
* A SliceTransform is a generic pluggable way of transforming one string // to another. Its primary use-case is in configuring rocksdb
* to another. Its primary use-case is in configuring rocksdb // to store prefix blooms by setting prefix_extractor in
* to store prefix blooms by setting prefix_extractor in // ColumnFamilyOptions.
* ColumnFamilyOptions. //
*/ // Exceptions MUST NOT propagate out of overridden functions into RocksDB,
// because RocksDB is not exception-safe. This could cause undefined behavior
// including data loss, unreported corruption, deadlocks, and more.
class SliceTransform : public Customizable { class SliceTransform : public Customizable {
public: public:
virtual ~SliceTransform(){}; virtual ~SliceTransform(){};

View File

@ -21,7 +21,8 @@ class Logger;
// SstFileManager is used to track SST and blob files in the DB and control // SstFileManager is used to track SST and blob files in the DB and control
// their deletion rate. All SstFileManager public functions are thread-safe. // their deletion rate. All SstFileManager public functions are thread-safe.
// SstFileManager is not extensible. // SstFileManager is NOT an extensible interface but a public interface for
// result of NewSstFileManager. Any derived classes must be RocksDB internal.
class SstFileManager { class SstFileManager {
public: public:
virtual ~SstFileManager() {} virtual ~SstFileManager() {}

View File

@ -78,6 +78,9 @@ class SstPartitioner {
}; };
}; };
// Exceptions MUST NOT propagate out of overridden functions into RocksDB,
// because RocksDB is not exception-safe. This could cause undefined behavior
// including data loss, unreported corruption, deadlocks, and more.
class SstPartitionerFactory : public Customizable { class SstPartitionerFactory : public Customizable {
public: public:
virtual ~SstPartitionerFactory() {} virtual ~SstPartitionerFactory() {}

View File

@ -573,6 +573,10 @@ enum StatsLevel : uint8_t {
// options.statistics->getTickerCount(NUMBER_BLOCK_COMPRESSED); // options.statistics->getTickerCount(NUMBER_BLOCK_COMPRESSED);
// HistogramData hist; // HistogramData hist;
// options.statistics->histogramData(FLUSH_TIME, &hist); // options.statistics->histogramData(FLUSH_TIME, &hist);
//
// Exceptions MUST NOT propagate out of overridden functions into RocksDB,
// because RocksDB is not exception-safe. This could cause undefined behavior
// including data loss, unreported corruption, deadlocks, and more.
class Statistics : public Customizable { class Statistics : public Customizable {
public: public:
virtual ~Statistics() {} virtual ~Statistics() {}

View File

@ -80,7 +80,14 @@ extern const std::string kRangeDelBlock;
// a collection of callback functions that will be invoked during table // a collection of callback functions that will be invoked during table
// building. It is constructed with TablePropertiesCollectorFactory. The methods // building. It is constructed with TablePropertiesCollectorFactory. The methods
// don't need to be thread-safe, as we will create exactly one // don't need to be thread-safe, as we will create exactly one
// TablePropertiesCollector object per table and then call it sequentially // TablePropertiesCollector object per table and then call it sequentially.
//
// Statuses from these callbacks are currently logged when not OK, but
// otherwise ignored by RocksDB.
//
// Exceptions MUST NOT propagate out of overridden functions into RocksDB,
// because RocksDB is not exception-safe. This could cause undefined behavior
// including data loss, unreported corruption, deadlocks, and more.
class TablePropertiesCollector { class TablePropertiesCollector {
public: public:
virtual ~TablePropertiesCollector() {} virtual ~TablePropertiesCollector() {}
@ -133,6 +140,10 @@ class TablePropertiesCollector {
// Constructs TablePropertiesCollector. Internals create a new // Constructs TablePropertiesCollector. Internals create a new
// TablePropertiesCollector for each new table // TablePropertiesCollector for each new table
//
// Exceptions MUST NOT propagate out of overridden functions into RocksDB,
// because RocksDB is not exception-safe. This could cause undefined behavior
// including data loss, unreported corruption, deadlocks, and more.
class TablePropertiesCollectorFactory : public Customizable { class TablePropertiesCollectorFactory : public Customizable {
public: public:
struct Context { struct Context {

View File

@ -262,6 +262,10 @@ struct CreateBackupOptions {
bool flush_before_backup = false; bool flush_before_backup = false;
// Callback for reporting progress, based on callback_trigger_interval_size. // Callback for reporting progress, based on callback_trigger_interval_size.
//
// RocksDB callbacks are NOT exception-safe. A callback completing with an
// exception can lead to undefined behavior in RocksDB, including data loss,
// unreported corruption, deadlocks, and more.
std::function<void()> progress_callback = []() {}; std::function<void()> progress_callback = []() {};
// If false, background_thread_cpu_priority is ignored. // If false, background_thread_cpu_priority is ignored.

View File

@ -19,6 +19,10 @@ struct ConfigOptions;
// WALFilter allows an application to inspect write-ahead-log (WAL) // WALFilter allows an application to inspect write-ahead-log (WAL)
// records or modify their processing on recovery. // records or modify their processing on recovery.
// Please see the details below. // Please see the details below.
//
// Exceptions MUST NOT propagate out of overridden functions into RocksDB,
// because RocksDB is not exception-safe. This could cause undefined behavior
// including data loss, unreported corruption, deadlocks, and more.
class WalFilter : public Customizable { class WalFilter : public Customizable {
public: public:
static const char* Type() { return "WalFilter"; } static const char* Type() { return "WalFilter"; }

View File

@ -23,8 +23,8 @@
namespace ROCKSDB_NAMESPACE { namespace ROCKSDB_NAMESPACE {
class CacheReservationManager; class CacheReservationManager;
// Interface to block and signal DB instances. // Interface to block and signal DB instances, intended for RocksDB
// Each DB instance contains ptr to StallInterface. // internal use only. Each DB instance contains ptr to StallInterface.
class StallInterface { class StallInterface {
public: public:
virtual ~StallInterface() {} virtual ~StallInterface() {}
@ -34,7 +34,7 @@ class StallInterface {
virtual void Signal() = 0; virtual void Signal() = 0;
}; };
class WriteBufferManager { class WriteBufferManager final {
public: public:
// Parameters: // Parameters:
// _buffer_size: _buffer_size = 0 indicates no limit. Memory won't be capped. // _buffer_size: _buffer_size = 0 indicates no limit. Memory won't be capped.