rocksdb/db/forward_iterator.h
Yu Zhang f2546b6623 Support returning write unix time in iterator property (#12428)
Summary:
This PR adds support to return data's approximate unix write time in the iterator property API. The general implementation is:
1) If the entry comes from a SST file, the sequence number to time mapping recorded in that file's table properties will be used to deduce the entry's write time from its sequence number. If no such recording is available, `std::numeric_limits<uint64_t>::max()` is returned to indicate the write time is unknown except if the entry's sequence number is zero, in which case, 0 is returned. This also means that even if `preclude_last_level_data_seconds` and `preserve_internal_time_seconds` can be toggled off between DB reopens, as long as the SST file's table property has the mapping available, the entry's write time can be deduced and returned.

2) If the entry comes from memtable, we will use the DB's sequence number to write time mapping to do similar things. A copy of the DB's seqno to write time mapping is kept in SuperVersion to allow iterators to have lock free access. This also means a new `SuperVersion` is installed each time DB's seqno to time mapping updates, which is originally proposed by Peter in  https://github.com/facebook/rocksdb/issues/11928 . Similarly, if the feature is not enabled, `std::numeric_limits<uint64_t>::max()` is returned to indicate the write time is unknown.

Needed follow up:
1) The write time for `kTypeValuePreferredSeqno` should be special cased, where it's already specified by the user, so we can directly return it.

2) Flush job can be updated to use DB's seqno to time mapping copy in the SuperVersion.

3) Handle the case when `TimedPut` is called with a write time that is `std::numeric_limits<uint64_t>::max()`. We can make it a regular `Put`.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12428

Test Plan: Added unit test

Reviewed By: pdillinger

Differential Revision: D54967067

Pulled By: jowlyzhang

fbshipit-source-id: c795b1b7ec142e09e53f2ed3461cf719833cb37a
2024-03-15 15:37:37 -07:00

167 lines
5.7 KiB
C++

// Copyright (c) 2011-present, Facebook, Inc. All rights reserved.
// This source code is licensed under both the GPLv2 (found in the
// COPYING file in the root directory) and Apache 2.0 License
// (found in the LICENSE.Apache file in the root directory).
#pragma once
#include "rocksdb/comparator.h"
#include <queue>
#include <string>
#include <vector>
#include "memory/arena.h"
#include "rocksdb/db.h"
#include "rocksdb/iterator.h"
#include "rocksdb/options.h"
#include "table/internal_iterator.h"
namespace ROCKSDB_NAMESPACE {
class DBImpl;
class Env;
struct SuperVersion;
class ColumnFamilyData;
class ForwardLevelIterator;
class VersionStorageInfo;
struct FileMetaData;
class MinIterComparator {
public:
explicit MinIterComparator(const CompareInterface* comparator)
: comparator_(comparator) {}
bool operator()(InternalIterator* a, InternalIterator* b) {
return comparator_->Compare(a->key(), b->key()) > 0;
}
private:
const CompareInterface* comparator_;
};
using MinIterHeap =
std::priority_queue<InternalIterator*, std::vector<InternalIterator*>,
MinIterComparator>;
/**
* ForwardIterator is a special type of iterator that only supports Seek()
* and Next(). It is expected to perform better than TailingIterator by
* removing the encapsulation and making all information accessible within
* the iterator. At the current implementation, snapshot is taken at the
* time Seek() is called. The Next() followed do not see new values after.
*/
class ForwardIterator : public InternalIterator {
public:
ForwardIterator(DBImpl* db, const ReadOptions& read_options,
ColumnFamilyData* cfd, SuperVersion* current_sv = nullptr,
bool allow_unprepared_value = false);
virtual ~ForwardIterator();
void SeekForPrev(const Slice& /*target*/) override {
status_ = Status::NotSupported("ForwardIterator::SeekForPrev()");
valid_ = false;
}
void SeekToLast() override {
status_ = Status::NotSupported("ForwardIterator::SeekToLast()");
valid_ = false;
}
void Prev() override {
status_ = Status::NotSupported("ForwardIterator::Prev");
valid_ = false;
}
bool Valid() const override;
void SeekToFirst() override;
void Seek(const Slice& target) override;
void Next() override;
Slice key() const override;
Slice value() const override;
uint64_t write_unix_time() const override;
Status status() const override;
bool PrepareValue() override;
Status GetProperty(std::string prop_name, std::string* prop) override;
void SetPinnedItersMgr(PinnedIteratorsManager* pinned_iters_mgr) override;
bool IsKeyPinned() const override;
bool IsValuePinned() const override;
bool TEST_CheckDeletedIters(int* deleted_iters, int* num_iters);
private:
void Cleanup(bool release_sv);
// Unreference and, if needed, clean up the current SuperVersion. This is
// either done immediately or deferred until this iterator is unpinned by
// PinnedIteratorsManager.
void SVCleanup();
static void SVCleanup(DBImpl* db, SuperVersion* sv,
bool background_purge_on_iterator_cleanup);
static void DeferredSVCleanup(void* arg);
void RebuildIterators(bool refresh_sv);
void RenewIterators();
void BuildLevelIterators(const VersionStorageInfo* vstorage,
SuperVersion* sv);
void ResetIncompleteIterators();
void SeekInternal(const Slice& internal_key, bool seek_to_first,
bool seek_after_async_io);
void UpdateCurrent();
bool NeedToSeekImmutable(const Slice& internal_key);
void DeleteCurrentIter();
uint32_t FindFileInRange(const std::vector<FileMetaData*>& files,
const Slice& internal_key, uint32_t left,
uint32_t right);
bool IsOverUpperBound(const Slice& internal_key) const;
// Set PinnedIteratorsManager for all children Iterators, this function should
// be called whenever we update children Iterators or pinned_iters_mgr_.
void UpdateChildrenPinnedItersMgr();
// A helper function that will release iter in the proper manner, or pass it
// to pinned_iters_mgr_ to release it later if pinning is enabled.
void DeleteIterator(InternalIterator* iter, bool is_arena = false);
DBImpl* const db_;
ReadOptions read_options_;
ColumnFamilyData* const cfd_;
const SliceTransform* const prefix_extractor_;
const Comparator* user_comparator_;
const bool allow_unprepared_value_;
MinIterHeap immutable_min_heap_;
SuperVersion* sv_;
InternalIterator* mutable_iter_;
std::vector<InternalIterator*> imm_iters_;
std::vector<InternalIterator*> l0_iters_;
std::vector<ForwardLevelIterator*> level_iters_;
InternalIterator* current_;
bool valid_;
// Internal iterator status; set only by one of the unsupported methods.
Status status_;
// Status of immutable iterators, maintained here to avoid iterating over
// all of them in status().
Status immutable_status_;
// Indicates that at least one of the immutable iterators pointed to a key
// larger than iterate_upper_bound and was therefore destroyed. Seek() may
// need to rebuild such iterators.
bool has_iter_trimmed_for_upper_bound_;
// Is current key larger than iterate_upper_bound? If so, makes Valid()
// return false.
bool current_over_upper_bound_;
// Left endpoint of the range of keys that immutable iterators currently
// cover. When Seek() is called with a key that's within that range, immutable
// iterators don't need to be moved; see NeedToSeekImmutable(). This key is
// included in the range after a Seek(), but excluded when advancing the
// iterator using Next().
IterKey prev_key_;
bool is_prev_set_;
bool is_prev_inclusive_;
PinnedIteratorsManager* pinned_iters_mgr_;
Arena arena_;
};
} // namespace ROCKSDB_NAMESPACE