Group SST write in flush, compaction and db open with new stats (#11910)

Summary:
## Context/Summary
Similar to https://github.com/facebook/rocksdb/pull/11288 and https://github.com/facebook/rocksdb/pull/11444, categorizing SST/blob file writes by IO activity gives more insight into each activity.

For that, this PR does the following:
- Tag different write IOs by passing down and converting WriteOptions to IOOptions
- Add new SST_WRITE_MICROS histogram in WritableFileWriter::Append() and break it down into FILE_WRITE_{FLUSH|COMPACTION|DB_OPEN}_MICROS
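
The tagging and breakdown described in the two bullets above can be sketched roughly as follows. This is a minimal illustration rather than the PR's implementation: `IOActivity`, `SketchStats`, `ToIOOptions` and `SketchFileWriter` are hypothetical stand-ins, and the `WriteOptions`/`IOOptions` structs here only mimic the one field that matters for the idea.

```cpp
#include <array>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-ins for the types named in the summary; the real
// RocksDB definitions carry many more fields.
enum class IOActivity { kFlush, kCompaction, kDBOpen, kUnknown };
struct WriteOptions { IOActivity io_activity = IOActivity::kUnknown; };
struct IOOptions { IOActivity io_activity = IOActivity::kUnknown; };

enum Histogram : std::size_t {
  SST_WRITE_MICROS,
  FILE_WRITE_FLUSH_MICROS,
  FILE_WRITE_COMPACTION_MICROS,
  FILE_WRITE_DB_OPEN_MICROS,
  kNumHistograms
};

// Toy statistics object: one sample count and one sum per histogram.
struct SketchStats {
  std::array<uint64_t, kNumHistograms> count{};
  std::array<uint64_t, kNumHistograms> sum_micros{};
  void Record(Histogram h, uint64_t micros) {
    ++count[h];
    sum_micros[h] += micros;
  }
};

// Carries the activity tag from the user-facing options to the per-IO options.
IOOptions ToIOOptions(const WriteOptions& write_options) {
  IOOptions io_options;
  io_options.io_activity = write_options.io_activity;
  return io_options;
}

class SketchFileWriter {
 public:
  explicit SketchFileWriter(SketchStats* stats) : stats_(stats) {}

  // Every Append() is timed once into the overall histogram and, when the
  // activity is known, once more into the matching per-activity histogram.
  void Append(const IOOptions& opts, const std::string& data) {
    const auto start = std::chrono::steady_clock::now();
    contents_ += data;  // stands in for the real file write
    const auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(
        std::chrono::steady_clock::now() - start);
    const uint64_t micros = static_cast<uint64_t>(elapsed.count());

    stats_->Record(SST_WRITE_MICROS, micros);
    switch (opts.io_activity) {
      case IOActivity::kFlush:
        stats_->Record(FILE_WRITE_FLUSH_MICROS, micros);
        break;
      case IOActivity::kCompaction:
        stats_->Record(FILE_WRITE_COMPACTION_MICROS, micros);
        break;
      case IOActivity::kDBOpen:
        stats_->Record(FILE_WRITE_DB_OPEN_MICROS, micros);
        break;
      case IOActivity::kUnknown:
        break;  // untagged writes only show up in the overall histogram
    }
  }

  const std::string& contents() const { return contents_; }

 private:
  SketchStats* stats_;
  std::string contents_;
};
```

With this shape, the per-activity counts add up to the overall count whenever every write is tagged, which matches the db bench output in the test plan (for example 615 + 23 = 638 in the compaction run).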

Some related refactoring to make the implementation cleaner:
- Blob stats
   - Replace high-level write measurement with low-level WritableFileWriter::Append() measurement for BLOB_DB_BLOB_FILE_WRITE_MICROS, so that FILE_WRITE_{FLUSH|COMPACTION|DB_OPEN}_MICROS also cover blob files. As a consequence, this changes the behavior of BLOB_DB_BLOB_FILE_WRITE_MICROS; see HISTORY and the db bench test plan below for more info.
   - Fix bugs where BLOB_DB_BLOB_FILE_SYNCED/BLOB_DB_BLOB_FILE_BYTES_WRITTEN counted files that failed to sync and bytes that failed to write.
- Refactor WriteOptions constructor for easier construction with io_activity and rate_limiter_priority
- Refactor DBImpl::~DBImpl()/BlobDBImpl::Close() to bypass thread op verification
- Build table
   - TableBuilderOptions now includes Read/WriteOptions, so BuildTable() does not need to take these two variables
   - Replace the io_priority passed into BuildTable() with TableBuilderOptions::WriteOptions::rate_limiter_priority. Similar for BlobFileBuilder. This parameter is used for dynamically changing file io priority for flush, see https://github.com/facebook/rocksdb/pull/9988?fbclid=IwAR1DtKel6c-bRJAdesGo0jsbztRtciByNlvokbxkV6h_L-AE9MACzqRTT5s for more
   - Update ThreadStatus::FLUSH_BYTES_WRITTEN to use io_activity to track flush IO in flush job and db open instead of io_priority
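
The blob stats fix in the list above (only count a sync or written bytes when the operation succeeded) follows a simple pattern, sketched below. All names here (`SketchStatus`, `SketchTickers`, `SketchFile`, and the free functions `WriteHeader` and `SyncFile`) are hypothetical and only illustrate the guard; the actual change lives in BlobLogWriter.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical minimal status type; only success/failure matters here.
class SketchStatus {
 public:
  explicit SketchStatus(bool ok) : ok_(ok) {}
  bool ok() const { return ok_; }

 private:
  bool ok_;
};

// Hypothetical counters standing in for BLOB_DB_BLOB_FILE_BYTES_WRITTEN and
// BLOB_DB_BLOB_FILE_SYNCED.
struct SketchTickers {
  uint64_t bytes_written = 0;
  uint64_t files_synced = 0;
};

// Fake file that either always succeeds or always fails.
class SketchFile {
 public:
  explicit SketchFile(bool fail) : fail_(fail) {}
  SketchStatus Append(const std::string& data) {
    if (fail_) {
      return SketchStatus(false);
    }
    contents_ += data;
    return SketchStatus(true);
  }
  SketchStatus Sync() { return SketchStatus(!fail_); }

 private:
  bool fail_;
  std::string contents_;
};

// Bytes are counted only after the append is known to have succeeded.
SketchStatus WriteHeader(SketchFile& file, SketchTickers& tickers,
                         const std::string& header) {
  SketchStatus s = file.Append(header);
  if (s.ok()) {
    tickers.bytes_written += header.size();
  }
  return s;
}

// The synced counter is bumped only after the sync is known to have succeeded.
SketchStatus SyncFile(SketchFile& file, SketchTickers& tickers) {
  SketchStatus s = file.Sync();
  if (s.ok()) {
    ++tickers.files_synced;
  }
  return s;
}
```

Before the fix, the counters were updated unconditionally, so a failed write or sync would still have been reflected in the statistics.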

## Test
### db bench

Flush
```
./db_bench --statistics=1 --benchmarks=fillseq --num=100000 --write_buffer_size=100

rocksdb.sst.write.micros P50 : 1.830863 P95 : 4.094720 P99 : 6.578947 P100 : 26.000000 COUNT : 7875 SUM : 20377
rocksdb.file.write.flush.micros P50 : 1.830863 P95 : 4.094720 P99 : 6.578947 P100 : 26.000000 COUNT : 7875 SUM : 20377
rocksdb.file.write.compaction.micros P50 : 0.000000 P95 : 0.000000 P99 : 0.000000 P100 : 0.000000 COUNT : 0 SUM : 0
rocksdb.file.write.db.open.micros P50 : 0.000000 P95 : 0.000000 P99 : 0.000000 P100 : 0.000000 COUNT : 0 SUM : 0
```

Compaction, db open
```
Setup: ./db_bench --statistics=1 --benchmarks=fillseq --num=10000 --disable_auto_compactions=1 -write_buffer_size=100 --db=../db_bench

Run:./db_bench --statistics=1 --benchmarks=compact  --db=../db_bench --use_existing_db=1

rocksdb.sst.write.micros P50 : 2.675325 P95 : 9.578788 P99 : 18.780000 P100 : 314.000000 COUNT : 638 SUM : 3279
rocksdb.file.write.flush.micros P50 : 0.000000 P95 : 0.000000 P99 : 0.000000 P100 : 0.000000 COUNT : 0 SUM : 0
rocksdb.file.write.compaction.micros P50 : 2.757353 P95 : 9.610687 P99 : 19.316667 P100 : 314.000000 COUNT : 615 SUM : 3213
rocksdb.file.write.db.open.micros P50 : 2.055556 P95 : 3.925000 P99 : 9.000000 P100 : 9.000000 COUNT : 23 SUM : 66
```

blob stats - just to make sure they aren't broken by this PR
```
Integrated Blob DB

Setup: ./db_bench --enable_blob_files=1 --statistics=1 --benchmarks=fillseq --num=10000 --disable_auto_compactions=1 -write_buffer_size=100 --db=../db_bench

Run:./db_bench --enable_blob_files=1 --statistics=1 --benchmarks=compact  --db=../db_bench --use_existing_db=1

pre-PR:
rocksdb.blobdb.blob.file.write.micros P50 : 7.298246 P95 : 9.771930 P99 : 9.991813 P100 : 16.000000 COUNT : 235 SUM : 1600
rocksdb.blobdb.blob.file.synced COUNT : 1
rocksdb.blobdb.blob.file.bytes.written COUNT : 34842

post-PR:
rocksdb.blobdb.blob.file.write.micros P50 : 2.000000 P95 : 2.829360 P99 : 2.993779 P100 : 9.000000 COUNT : 707 SUM : 1614
- COUNT is higher and values are smaller because it now includes header and footer writes
- COUNT is about 3X higher because each Append() counts as one post-PR, while pre-PR 3 Append()s counted as one. See https://github.com/facebook/rocksdb/pull/11910/files#diff-32b811c0a1c000768cfb2532052b44dc0b3bf82253f3eab078e15ff201a0dabfL157-L164

rocksdb.blobdb.blob.file.synced COUNT : 1 (stays the same)
rocksdb.blobdb.blob.file.bytes.written COUNT : 34842 (stays the same)
```

```
Stacked Blob DB

Run: ./db_bench --use_blob_db=1 --statistics=1 --benchmarks=fillseq --num=10000 --disable_auto_compactions=1 -write_buffer_size=100 --db=../db_bench

pre-PR:
rocksdb.blobdb.blob.file.write.micros P50 : 12.808042 P95 : 19.674497 P99 : 28.539683 P100 : 51.000000 COUNT : 10000 SUM : 140876
rocksdb.blobdb.blob.file.synced COUNT : 8
rocksdb.blobdb.blob.file.bytes.written COUNT : 1043445

post-PR:
rocksdb.blobdb.blob.file.write.micros P50 : 1.657370 P95 : 2.952175 P99 : 3.877519 P100 : 24.000000 COUNT : 30001 SUM : 67924
- COUNT is higher and values are smaller because it now includes header and footer writes
- COUNT is about 3X higher because each Append() counts as one post-PR, while pre-PR 3 Append()s counted as one. See https://github.com/facebook/rocksdb/pull/11910/files#diff-32b811c0a1c000768cfb2532052b44dc0b3bf82253f3eab078e15ff201a0dabfL157-L164

rocksdb.blobdb.blob.file.synced COUNT : 8 (stays the same)
rocksdb.blobdb.blob.file.bytes.written COUNT : 1043445 (stays the same)
```
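
The post-PR COUNT values in the two blob runs above can be cross-checked against the pre-PR record counts with a little arithmetic. The helper below is a hypothetical sketch: it assumes 3 Append() calls per record, as the notes state, plus one Append() per header or footer written. The split into header and footer appends is an inference from the numbers, not something the output reports directly.

```cpp
#include <cassert>
#include <cstdint>

// Expected number of Append() samples for a blob file workload, assuming
// each record costs `appends_per_record` calls and each header/footer
// written costs one call. All parameters are illustrative assumptions.
constexpr uint64_t ExpectedAppendCount(uint64_t records,
                                       uint64_t appends_per_record,
                                       uint64_t header_appends,
                                       uint64_t footer_appends) {
  return records * appends_per_record + header_appends + footer_appends;
}
```

Under these assumptions, the integrated run gives 235 * 3 + 1 + 1 = 707 and the stacked run gives 10000 * 3 + 1 = 30001. The stacked figure is consistent with one header and no footer having been written when the statistics were dumped, though that reading is a guess.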

###  Rehearsal CI stress test
Trigger 3 full runs of all our CI stress tests

###  Performance

Flush
```
TEST_TMPDIR=/dev/shm ./db_basic_bench_pre_pr --benchmark_filter=ManualFlush/key_num:524288/per_key_size:256 --benchmark_repetitions=1000
-- default: 1 thread is used to run benchmark; enable_statistics = true

Pre-pr: avg 507515519.3 ns
497686074,499444327,500862543,501389862,502994471,503744435,504142123,504224056,505724198,506610393,506837742,506955122,507695561,507929036,508307733,508312691,508999120,509963561,510142147,510698091,510743096,510769317,510957074,511053311,511371367,511409911,511432960,511642385,511691964,511730908,

Post-pr: avg 511971266.5 ns, regressed 0.88%
502744835,506502498,507735420,507929724,508313335,509548582,509994942,510107257,510715603,511046955,511352639,511458478,512117521,512317380,512766303,512972652,513059586,513804934,513808980,514059409,514187369,514389494,514447762,514616464,514622882,514641763,514666265,514716377,514990179,515502408,
```

Compaction
```
TEST_TMPDIR=/dev/shm ./db_basic_bench_{pre|post}_pr --benchmark_filter=ManualCompaction/comp_style:0/max_data:134217728/per_key_size:256/enable_statistics:1  --benchmark_repetitions=1000
-- default: 1 thread is used to run benchmark

Pre-pr: avg 495346098.30 ns
492118301,493203526,494201411,494336607,495269217,495404950,496402598,497012157,497358370,498153846

Post-pr: avg 504528077.20 ns, regressed 1.85%. "ManualCompaction" includes flush, so the isolated regression for compaction should be around 1.85-0.88 = 0.97%
502465338,502485945,502541789,502909283,503438601,504143885,506113087,506629423,507160414,507393007
```

Put with WAL (in case passing WriteOptions slows down this path even without collecting SST write stats)
```
TEST_TMPDIR=/dev/shm ./db_basic_bench_pre_pr --benchmark_filter=DBPut/comp_style:0/max_data:107374182400/per_key_size:256/enable_statistics:1/wal:1  --benchmark_repetitions=1000
-- default: 1 thread is used to run benchmark

Pre-pr: avg 3848.10 ns
3814,3838,3839,3848,3854,3854,3854,3860,3860,3860

Post-pr: avg 3874.20 ns, regressed 0.68%
3863,3867,3871,3874,3875,3877,3877,3877,3880,3881
```
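
The regression percentages quoted in the performance sections can be reproduced from the listed samples. The snippet below is a small sketch with hypothetical helper names (`Mean`, `RegressionPercent`); it takes the regression to be the relative change of the post-PR average over the pre-PR average, which is consistent with the reported figures.

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Arithmetic mean of a non-empty sample list.
double Mean(const std::vector<double>& samples) {
  assert(!samples.empty());
  return std::accumulate(samples.begin(), samples.end(), 0.0) /
         static_cast<double>(samples.size());
}

// Relative slowdown of `post` versus `pre`, in percent. Positive means the
// post-PR run took longer.
double RegressionPercent(double pre, double post) {
  return (post - pre) / pre * 100.0;
}
```

For the Put benchmark, the ten pre-PR samples average 3848.10 ns and the ten post-PR samples average 3874.20 ns, so the regression is (3874.20 - 3848.10) / 3848.10, about 0.68%. The same formula applied to the reported compaction and flush averages gives about 1.85% and 0.88%.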

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11910

Reviewed By: ajkr

Differential Revision: D49788060

Pulled By: hx235

fbshipit-source-id: 79e73699cda5be3b66461687e5147c2484fc5eff
Hui Xiao 2023-12-29 15:29:23 -08:00 committed by Facebook GitHub Bot
parent a036525809
commit 06e593376c
123 changed files with 1821 additions and 1047 deletions

View File

@ -34,9 +34,9 @@ BlobFileBuilder::BlobFileBuilder(
VersionSet* versions, FileSystem* fs, VersionSet* versions, FileSystem* fs,
const ImmutableOptions* immutable_options, const ImmutableOptions* immutable_options,
const MutableCFOptions* mutable_cf_options, const FileOptions* file_options, const MutableCFOptions* mutable_cf_options, const FileOptions* file_options,
std::string db_id, std::string db_session_id, int job_id, const WriteOptions* write_options, std::string db_id,
uint32_t column_family_id, const std::string& column_family_name, std::string db_session_id, int job_id, uint32_t column_family_id,
Env::IOPriority io_priority, Env::WriteLifeTimeHint write_hint, const std::string& column_family_name, Env::WriteLifeTimeHint write_hint,
const std::shared_ptr<IOTracer>& io_tracer, const std::shared_ptr<IOTracer>& io_tracer,
BlobFileCompletionCallback* blob_callback, BlobFileCompletionCallback* blob_callback,
BlobFileCreationReason creation_reason, BlobFileCreationReason creation_reason,
@ -44,18 +44,18 @@ BlobFileBuilder::BlobFileBuilder(
std::vector<BlobFileAddition>* blob_file_additions) std::vector<BlobFileAddition>* blob_file_additions)
: BlobFileBuilder([versions]() { return versions->NewFileNumber(); }, fs, : BlobFileBuilder([versions]() { return versions->NewFileNumber(); }, fs,
immutable_options, mutable_cf_options, file_options, immutable_options, mutable_cf_options, file_options,
db_id, db_session_id, job_id, column_family_id, write_options, db_id, db_session_id, job_id,
column_family_name, io_priority, write_hint, io_tracer, column_family_id, column_family_name, write_hint,
blob_callback, creation_reason, blob_file_paths, io_tracer, blob_callback, creation_reason,
blob_file_additions) {} blob_file_paths, blob_file_additions) {}
BlobFileBuilder::BlobFileBuilder( BlobFileBuilder::BlobFileBuilder(
std::function<uint64_t()> file_number_generator, FileSystem* fs, std::function<uint64_t()> file_number_generator, FileSystem* fs,
const ImmutableOptions* immutable_options, const ImmutableOptions* immutable_options,
const MutableCFOptions* mutable_cf_options, const FileOptions* file_options, const MutableCFOptions* mutable_cf_options, const FileOptions* file_options,
std::string db_id, std::string db_session_id, int job_id, const WriteOptions* write_options, std::string db_id,
uint32_t column_family_id, const std::string& column_family_name, std::string db_session_id, int job_id, uint32_t column_family_id,
Env::IOPriority io_priority, Env::WriteLifeTimeHint write_hint, const std::string& column_family_name, Env::WriteLifeTimeHint write_hint,
const std::shared_ptr<IOTracer>& io_tracer, const std::shared_ptr<IOTracer>& io_tracer,
BlobFileCompletionCallback* blob_callback, BlobFileCompletionCallback* blob_callback,
BlobFileCreationReason creation_reason, BlobFileCreationReason creation_reason,
@ -69,12 +69,12 @@ BlobFileBuilder::BlobFileBuilder(
blob_compression_type_(mutable_cf_options->blob_compression_type), blob_compression_type_(mutable_cf_options->blob_compression_type),
prepopulate_blob_cache_(mutable_cf_options->prepopulate_blob_cache), prepopulate_blob_cache_(mutable_cf_options->prepopulate_blob_cache),
file_options_(file_options), file_options_(file_options),
write_options_(write_options),
db_id_(std::move(db_id)), db_id_(std::move(db_id)),
db_session_id_(std::move(db_session_id)), db_session_id_(std::move(db_session_id)),
job_id_(job_id), job_id_(job_id),
column_family_id_(column_family_id), column_family_id_(column_family_id),
column_family_name_(column_family_name), column_family_name_(column_family_name),
io_priority_(io_priority),
write_hint_(write_hint), write_hint_(write_hint),
io_tracer_(io_tracer), io_tracer_(io_tracer),
blob_callback_(blob_callback), blob_callback_(blob_callback),
@ -87,6 +87,7 @@ BlobFileBuilder::BlobFileBuilder(
assert(fs_); assert(fs_);
assert(immutable_options_); assert(immutable_options_);
assert(file_options_); assert(file_options_);
assert(write_options_);
assert(blob_file_paths_); assert(blob_file_paths_);
assert(blob_file_paths_->empty()); assert(blob_file_paths_->empty());
assert(blob_file_additions_); assert(blob_file_additions_);
@ -207,14 +208,14 @@ Status BlobFileBuilder::OpenBlobFileIfNeeded() {
blob_file_paths_->emplace_back(std::move(blob_file_path)); blob_file_paths_->emplace_back(std::move(blob_file_path));
assert(file); assert(file);
file->SetIOPriority(io_priority_); file->SetIOPriority(write_options_->rate_limiter_priority);
file->SetWriteLifeTimeHint(write_hint_); file->SetWriteLifeTimeHint(write_hint_);
FileTypeSet tmp_set = immutable_options_->checksum_handoff_file_types; FileTypeSet tmp_set = immutable_options_->checksum_handoff_file_types;
Statistics* const statistics = immutable_options_->stats; Statistics* const statistics = immutable_options_->stats;
std::unique_ptr<WritableFileWriter> file_writer(new WritableFileWriter( std::unique_ptr<WritableFileWriter> file_writer(new WritableFileWriter(
std::move(file), blob_file_paths_->back(), *file_options_, std::move(file), blob_file_paths_->back(), *file_options_,
immutable_options_->clock, io_tracer_, statistics, immutable_options_->clock, io_tracer_, statistics,
immutable_options_->listeners, Histograms::BLOB_DB_BLOB_FILE_WRITE_MICROS, immutable_options_->listeners,
immutable_options_->file_checksum_gen_factory.get(), immutable_options_->file_checksum_gen_factory.get(),
tmp_set.Contains(FileType::kBlobFile), false)); tmp_set.Contains(FileType::kBlobFile), false));
@ -231,7 +232,7 @@ Status BlobFileBuilder::OpenBlobFileIfNeeded() {
expiration_range); expiration_range);
{ {
Status s = blob_log_writer->WriteHeader(header); Status s = blob_log_writer->WriteHeader(*write_options_, header);
TEST_SYNC_POINT_CALLBACK( TEST_SYNC_POINT_CALLBACK(
"BlobFileBuilder::OpenBlobFileIfNeeded:WriteHeader", &s); "BlobFileBuilder::OpenBlobFileIfNeeded:WriteHeader", &s);
@ -296,7 +297,8 @@ Status BlobFileBuilder::WriteBlobToFile(const Slice& key, const Slice& blob,
uint64_t key_offset = 0; uint64_t key_offset = 0;
Status s = writer_->AddRecord(key, blob, &key_offset, blob_offset); Status s =
writer_->AddRecord(*write_options_, key, blob, &key_offset, blob_offset);
TEST_SYNC_POINT_CALLBACK("BlobFileBuilder::WriteBlobToFile:AddRecord", &s); TEST_SYNC_POINT_CALLBACK("BlobFileBuilder::WriteBlobToFile:AddRecord", &s);
@ -321,7 +323,8 @@ Status BlobFileBuilder::CloseBlobFile() {
std::string checksum_method; std::string checksum_method;
std::string checksum_value; std::string checksum_value;
Status s = writer_->AppendFooter(footer, &checksum_method, &checksum_value); Status s = writer_->AppendFooter(*write_options_, footer, &checksum_method,
&checksum_value);
TEST_SYNC_POINT_CALLBACK("BlobFileBuilder::WriteBlobToFile:AppendFooter", &s); TEST_SYNC_POINT_CALLBACK("BlobFileBuilder::WriteBlobToFile:AppendFooter", &s);

View File

@ -13,6 +13,7 @@
#include "rocksdb/advanced_options.h" #include "rocksdb/advanced_options.h"
#include "rocksdb/compression_type.h" #include "rocksdb/compression_type.h"
#include "rocksdb/env.h" #include "rocksdb/env.h"
#include "rocksdb/options.h"
#include "rocksdb/rocksdb_namespace.h" #include "rocksdb/rocksdb_namespace.h"
#include "rocksdb/types.h" #include "rocksdb/types.h"
@ -36,11 +37,11 @@ class BlobFileBuilder {
BlobFileBuilder(VersionSet* versions, FileSystem* fs, BlobFileBuilder(VersionSet* versions, FileSystem* fs,
const ImmutableOptions* immutable_options, const ImmutableOptions* immutable_options,
const MutableCFOptions* mutable_cf_options, const MutableCFOptions* mutable_cf_options,
const FileOptions* file_options, std::string db_id, const FileOptions* file_options,
const WriteOptions* write_options, std::string db_id,
std::string db_session_id, int job_id, std::string db_session_id, int job_id,
uint32_t column_family_id, uint32_t column_family_id,
const std::string& column_family_name, const std::string& column_family_name,
Env::IOPriority io_priority,
Env::WriteLifeTimeHint write_hint, Env::WriteLifeTimeHint write_hint,
const std::shared_ptr<IOTracer>& io_tracer, const std::shared_ptr<IOTracer>& io_tracer,
BlobFileCompletionCallback* blob_callback, BlobFileCompletionCallback* blob_callback,
@ -51,11 +52,11 @@ class BlobFileBuilder {
BlobFileBuilder(std::function<uint64_t()> file_number_generator, BlobFileBuilder(std::function<uint64_t()> file_number_generator,
FileSystem* fs, const ImmutableOptions* immutable_options, FileSystem* fs, const ImmutableOptions* immutable_options,
const MutableCFOptions* mutable_cf_options, const MutableCFOptions* mutable_cf_options,
const FileOptions* file_options, std::string db_id, const FileOptions* file_options,
const WriteOptions* write_options, std::string db_id,
std::string db_session_id, int job_id, std::string db_session_id, int job_id,
uint32_t column_family_id, uint32_t column_family_id,
const std::string& column_family_name, const std::string& column_family_name,
Env::IOPriority io_priority,
Env::WriteLifeTimeHint write_hint, Env::WriteLifeTimeHint write_hint,
const std::shared_ptr<IOTracer>& io_tracer, const std::shared_ptr<IOTracer>& io_tracer,
BlobFileCompletionCallback* blob_callback, BlobFileCompletionCallback* blob_callback,
@ -92,12 +93,12 @@ class BlobFileBuilder {
CompressionType blob_compression_type_; CompressionType blob_compression_type_;
PrepopulateBlobCache prepopulate_blob_cache_; PrepopulateBlobCache prepopulate_blob_cache_;
const FileOptions* file_options_; const FileOptions* file_options_;
const WriteOptions* write_options_;
const std::string db_id_; const std::string db_id_;
const std::string db_session_id_; const std::string db_session_id_;
int job_id_; int job_id_;
uint32_t column_family_id_; uint32_t column_family_id_;
std::string column_family_name_; std::string column_family_name_;
Env::IOPriority io_priority_;
Env::WriteLifeTimeHint write_hint_; Env::WriteLifeTimeHint write_hint_;
std::shared_ptr<IOTracer> io_tracer_; std::shared_ptr<IOTracer> io_tracer_;
BlobFileCompletionCallback* blob_callback_; BlobFileCompletionCallback* blob_callback_;

View File

@ -43,6 +43,7 @@ class BlobFileBuilderTest : public testing::Test {
mock_env_.reset(MockEnv::Create(Env::Default())); mock_env_.reset(MockEnv::Create(Env::Default()));
fs_ = mock_env_->GetFileSystem().get(); fs_ = mock_env_->GetFileSystem().get();
clock_ = mock_env_->GetSystemClock().get(); clock_ = mock_env_->GetSystemClock().get();
write_options_.rate_limiter_priority = Env::IO_HIGH;
} }
void VerifyBlobFile(uint64_t blob_file_number, void VerifyBlobFile(uint64_t blob_file_number,
@ -113,6 +114,7 @@ class BlobFileBuilderTest : public testing::Test {
FileSystem* fs_; FileSystem* fs_;
SystemClock* clock_; SystemClock* clock_;
FileOptions file_options_; FileOptions file_options_;
WriteOptions write_options_;
}; };
TEST_F(BlobFileBuilderTest, BuildAndCheckOneFile) { TEST_F(BlobFileBuilderTest, BuildAndCheckOneFile) {
@ -136,7 +138,6 @@ TEST_F(BlobFileBuilderTest, BuildAndCheckOneFile) {
constexpr int job_id = 1; constexpr int job_id = 1;
constexpr uint32_t column_family_id = 123; constexpr uint32_t column_family_id = 123;
constexpr char column_family_name[] = "foobar"; constexpr char column_family_name[] = "foobar";
constexpr Env::IOPriority io_priority = Env::IO_HIGH;
constexpr Env::WriteLifeTimeHint write_hint = Env::WLTH_MEDIUM; constexpr Env::WriteLifeTimeHint write_hint = Env::WLTH_MEDIUM;
std::vector<std::string> blob_file_paths; std::vector<std::string> blob_file_paths;
@ -144,8 +145,8 @@ TEST_F(BlobFileBuilderTest, BuildAndCheckOneFile) {
BlobFileBuilder builder( BlobFileBuilder builder(
TestFileNumberGenerator(), fs_, &immutable_options, &mutable_cf_options, TestFileNumberGenerator(), fs_, &immutable_options, &mutable_cf_options,
&file_options_, "" /*db_id*/, "" /*db_session_id*/, job_id, &file_options_, &write_options_, "" /*db_id*/, "" /*db_session_id*/,
column_family_id, column_family_name, io_priority, write_hint, job_id, column_family_id, column_family_name, write_hint,
nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/, nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/,
BlobFileCreationReason::kFlush, &blob_file_paths, &blob_file_additions); BlobFileCreationReason::kFlush, &blob_file_paths, &blob_file_additions);
@ -221,7 +222,6 @@ TEST_F(BlobFileBuilderTest, BuildAndCheckMultipleFiles) {
constexpr int job_id = 1; constexpr int job_id = 1;
constexpr uint32_t column_family_id = 123; constexpr uint32_t column_family_id = 123;
constexpr char column_family_name[] = "foobar"; constexpr char column_family_name[] = "foobar";
constexpr Env::IOPriority io_priority = Env::IO_HIGH;
constexpr Env::WriteLifeTimeHint write_hint = Env::WLTH_MEDIUM; constexpr Env::WriteLifeTimeHint write_hint = Env::WLTH_MEDIUM;
std::vector<std::string> blob_file_paths; std::vector<std::string> blob_file_paths;
@ -229,8 +229,8 @@ TEST_F(BlobFileBuilderTest, BuildAndCheckMultipleFiles) {
BlobFileBuilder builder( BlobFileBuilder builder(
TestFileNumberGenerator(), fs_, &immutable_options, &mutable_cf_options, TestFileNumberGenerator(), fs_, &immutable_options, &mutable_cf_options,
&file_options_, "" /*db_id*/, "" /*db_session_id*/, job_id, &file_options_, &write_options_, "" /*db_id*/, "" /*db_session_id*/,
column_family_id, column_family_name, io_priority, write_hint, job_id, column_family_id, column_family_name, write_hint,
nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/, nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/,
BlobFileCreationReason::kFlush, &blob_file_paths, &blob_file_additions); BlobFileCreationReason::kFlush, &blob_file_paths, &blob_file_additions);
@ -309,7 +309,6 @@ TEST_F(BlobFileBuilderTest, InlinedValues) {
constexpr int job_id = 1; constexpr int job_id = 1;
constexpr uint32_t column_family_id = 123; constexpr uint32_t column_family_id = 123;
constexpr char column_family_name[] = "foobar"; constexpr char column_family_name[] = "foobar";
constexpr Env::IOPriority io_priority = Env::IO_HIGH;
constexpr Env::WriteLifeTimeHint write_hint = Env::WLTH_MEDIUM; constexpr Env::WriteLifeTimeHint write_hint = Env::WLTH_MEDIUM;
std::vector<std::string> blob_file_paths; std::vector<std::string> blob_file_paths;
@ -317,8 +316,8 @@ TEST_F(BlobFileBuilderTest, InlinedValues) {
BlobFileBuilder builder( BlobFileBuilder builder(
TestFileNumberGenerator(), fs_, &immutable_options, &mutable_cf_options, TestFileNumberGenerator(), fs_, &immutable_options, &mutable_cf_options,
&file_options_, "" /*db_id*/, "" /*db_session_id*/, job_id, &file_options_, &write_options_, "" /*db_id*/, "" /*db_session_id*/,
column_family_id, column_family_name, io_priority, write_hint, job_id, column_family_id, column_family_name, write_hint,
nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/, nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/,
BlobFileCreationReason::kFlush, &blob_file_paths, &blob_file_additions); BlobFileCreationReason::kFlush, &blob_file_paths, &blob_file_additions);
@ -364,7 +363,6 @@ TEST_F(BlobFileBuilderTest, Compression) {
constexpr int job_id = 1; constexpr int job_id = 1;
constexpr uint32_t column_family_id = 123; constexpr uint32_t column_family_id = 123;
constexpr char column_family_name[] = "foobar"; constexpr char column_family_name[] = "foobar";
constexpr Env::IOPriority io_priority = Env::IO_HIGH;
constexpr Env::WriteLifeTimeHint write_hint = Env::WLTH_MEDIUM; constexpr Env::WriteLifeTimeHint write_hint = Env::WLTH_MEDIUM;
std::vector<std::string> blob_file_paths; std::vector<std::string> blob_file_paths;
@ -372,8 +370,8 @@ TEST_F(BlobFileBuilderTest, Compression) {
BlobFileBuilder builder( BlobFileBuilder builder(
TestFileNumberGenerator(), fs_, &immutable_options, &mutable_cf_options, TestFileNumberGenerator(), fs_, &immutable_options, &mutable_cf_options,
&file_options_, "" /*db_id*/, "" /*db_session_id*/, job_id, &file_options_, &write_options_, "" /*db_id*/, "" /*db_session_id*/,
column_family_id, column_family_name, io_priority, write_hint, job_id, column_family_id, column_family_name, write_hint,
nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/, nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/,
BlobFileCreationReason::kFlush, &blob_file_paths, &blob_file_additions); BlobFileCreationReason::kFlush, &blob_file_paths, &blob_file_additions);
@ -448,7 +446,6 @@ TEST_F(BlobFileBuilderTest, CompressionError) {
constexpr int job_id = 1; constexpr int job_id = 1;
constexpr uint32_t column_family_id = 123; constexpr uint32_t column_family_id = 123;
constexpr char column_family_name[] = "foobar"; constexpr char column_family_name[] = "foobar";
constexpr Env::IOPriority io_priority = Env::IO_HIGH;
constexpr Env::WriteLifeTimeHint write_hint = Env::WLTH_MEDIUM; constexpr Env::WriteLifeTimeHint write_hint = Env::WLTH_MEDIUM;
std::vector<std::string> blob_file_paths; std::vector<std::string> blob_file_paths;
@ -456,8 +453,8 @@ TEST_F(BlobFileBuilderTest, CompressionError) {
BlobFileBuilder builder( BlobFileBuilder builder(
TestFileNumberGenerator(), fs_, &immutable_options, &mutable_cf_options, TestFileNumberGenerator(), fs_, &immutable_options, &mutable_cf_options,
&file_options_, "" /*db_id*/, "" /*db_session_id*/, job_id, &file_options_, &write_options_, "" /*db_id*/, "" /*db_session_id*/,
column_family_id, column_family_name, io_priority, write_hint, job_id, column_family_id, column_family_name, write_hint,
nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/, nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/,
BlobFileCreationReason::kFlush, &blob_file_paths, &blob_file_additions); BlobFileCreationReason::kFlush, &blob_file_paths, &blob_file_additions);
@ -528,7 +525,6 @@ TEST_F(BlobFileBuilderTest, Checksum) {
constexpr int job_id = 1; constexpr int job_id = 1;
constexpr uint32_t column_family_id = 123; constexpr uint32_t column_family_id = 123;
constexpr char column_family_name[] = "foobar"; constexpr char column_family_name[] = "foobar";
constexpr Env::IOPriority io_priority = Env::IO_HIGH;
constexpr Env::WriteLifeTimeHint write_hint = Env::WLTH_MEDIUM; constexpr Env::WriteLifeTimeHint write_hint = Env::WLTH_MEDIUM;
std::vector<std::string> blob_file_paths; std::vector<std::string> blob_file_paths;
@ -536,8 +532,8 @@ TEST_F(BlobFileBuilderTest, Checksum) {
BlobFileBuilder builder( BlobFileBuilder builder(
TestFileNumberGenerator(), fs_, &immutable_options, &mutable_cf_options, TestFileNumberGenerator(), fs_, &immutable_options, &mutable_cf_options,
&file_options_, "" /*db_id*/, "" /*db_session_id*/, job_id, &file_options_, &write_options_, "" /*db_id*/, "" /*db_session_id*/,
column_family_id, column_family_name, io_priority, write_hint, job_id, column_family_id, column_family_name, write_hint,
nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/, nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/,
BlobFileCreationReason::kFlush, &blob_file_paths, &blob_file_additions); BlobFileCreationReason::kFlush, &blob_file_paths, &blob_file_additions);
@ -589,11 +585,13 @@ class BlobFileBuilderIOErrorTest
BlobFileBuilderIOErrorTest() : sync_point_(GetParam()) { BlobFileBuilderIOErrorTest() : sync_point_(GetParam()) {
mock_env_.reset(MockEnv::Create(Env::Default())); mock_env_.reset(MockEnv::Create(Env::Default()));
fs_ = mock_env_->GetFileSystem().get(); fs_ = mock_env_->GetFileSystem().get();
write_options_.rate_limiter_priority = Env::IO_HIGH;
} }
std::unique_ptr<Env> mock_env_; std::unique_ptr<Env> mock_env_;
FileSystem* fs_; FileSystem* fs_;
FileOptions file_options_; FileOptions file_options_;
WriteOptions write_options_;
std::string sync_point_; std::string sync_point_;
}; };
@ -626,7 +624,6 @@ TEST_P(BlobFileBuilderIOErrorTest, IOError) {
constexpr int job_id = 1; constexpr int job_id = 1;
constexpr uint32_t column_family_id = 123; constexpr uint32_t column_family_id = 123;
constexpr char column_family_name[] = "foobar"; constexpr char column_family_name[] = "foobar";
constexpr Env::IOPriority io_priority = Env::IO_HIGH;
constexpr Env::WriteLifeTimeHint write_hint = Env::WLTH_MEDIUM; constexpr Env::WriteLifeTimeHint write_hint = Env::WLTH_MEDIUM;
std::vector<std::string> blob_file_paths; std::vector<std::string> blob_file_paths;
@ -634,8 +631,8 @@ TEST_P(BlobFileBuilderIOErrorTest, IOError) {
BlobFileBuilder builder( BlobFileBuilder builder(
TestFileNumberGenerator(), fs_, &immutable_options, &mutable_cf_options, TestFileNumberGenerator(), fs_, &immutable_options, &mutable_cf_options,
&file_options_, "" /*db_id*/, "" /*db_session_id*/, job_id, &file_options_, &write_options_, "" /*db_id*/, "" /*db_session_id*/,
column_family_id, column_family_name, io_priority, write_hint, job_id, column_family_id, column_family_name, write_hint,
nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/, nullptr /*IOTracer*/, nullptr /*BlobFileCompletionCallback*/,
BlobFileCreationReason::kFlush, &blob_file_paths, &blob_file_additions); BlobFileCreationReason::kFlush, &blob_file_paths, &blob_file_additions);

View File

@ -57,7 +57,7 @@ void WriteBlobFile(uint32_t column_family_id,
BlobLogHeader header(column_family_id, kNoCompression, has_ttl, BlobLogHeader header(column_family_id, kNoCompression, has_ttl,
expiration_range); expiration_range);
ASSERT_OK(blob_log_writer.WriteHeader(header)); ASSERT_OK(blob_log_writer.WriteHeader(WriteOptions(), header));
constexpr char key[] = "key"; constexpr char key[] = "key";
constexpr char blob[] = "blob"; constexpr char blob[] = "blob";
@ -67,7 +67,8 @@ void WriteBlobFile(uint32_t column_family_id,
uint64_t key_offset = 0; uint64_t key_offset = 0;
uint64_t blob_offset = 0; uint64_t blob_offset = 0;
ASSERT_OK(blob_log_writer.AddRecord(key, blob, &key_offset, &blob_offset)); ASSERT_OK(blob_log_writer.AddRecord(WriteOptions(), key, blob, &key_offset,
&blob_offset));
BlobLogFooter footer; BlobLogFooter footer;
footer.blob_count = 1; footer.blob_count = 1;
@ -76,8 +77,8 @@ void WriteBlobFile(uint32_t column_family_id,
std::string checksum_method; std::string checksum_method;
std::string checksum_value; std::string checksum_value;
ASSERT_OK( ASSERT_OK(blob_log_writer.AppendFooter(WriteOptions(), footer,
blob_log_writer.AppendFooter(footer, &checksum_method, &checksum_value)); &checksum_method, &checksum_value));
} }
} // anonymous namespace } // anonymous namespace

View File

@ -63,7 +63,7 @@ void WriteBlobFile(const ImmutableOptions& immutable_options,
BlobLogHeader header(column_family_id, compression, has_ttl, BlobLogHeader header(column_family_id, compression, has_ttl,
expiration_range_header); expiration_range_header);
ASSERT_OK(blob_log_writer.WriteHeader(header)); ASSERT_OK(blob_log_writer.WriteHeader(WriteOptions(), header));
std::vector<std::string> compressed_blobs(num); std::vector<std::string> compressed_blobs(num);
std::vector<Slice> blobs_to_write(num); std::vector<Slice> blobs_to_write(num);
@ -91,7 +91,8 @@ void WriteBlobFile(const ImmutableOptions& immutable_options,
for (size_t i = 0; i < num; ++i) { for (size_t i = 0; i < num; ++i) {
uint64_t key_offset = 0; uint64_t key_offset = 0;
ASSERT_OK(blob_log_writer.AddRecord(keys[i], blobs_to_write[i], &key_offset, ASSERT_OK(blob_log_writer.AddRecord(WriteOptions(), keys[i],
blobs_to_write[i], &key_offset,
&blob_offsets[i])); &blob_offsets[i]));
} }
@ -101,8 +102,8 @@ void WriteBlobFile(const ImmutableOptions& immutable_options,
std::string checksum_method; std::string checksum_method;
std::string checksum_value; std::string checksum_value;
ASSERT_OK( ASSERT_OK(blob_log_writer.AppendFooter(WriteOptions(), footer,
blob_log_writer.AppendFooter(footer, &checksum_method, &checksum_value)); &checksum_method, &checksum_value));
} }
// Creates a test blob file with a single blob in it. Note: this method // Creates a test blob file with a single blob in it. Note: this method
@ -473,7 +474,7 @@ TEST_F(BlobFileReaderTest, Malformed) {
     BlobLogHeader header(column_family_id, kNoCompression, has_ttl,
                          expiration_range);
-    ASSERT_OK(blob_log_writer.WriteHeader(header));
+    ASSERT_OK(blob_log_writer.WriteHeader(WriteOptions(), header));
   }
   constexpr HistogramImpl* blob_file_read_hist = nullptr;

@ -33,35 +33,49 @@ BlobLogWriter::BlobLogWriter(std::unique_ptr<WritableFileWriter>&& dest,
 BlobLogWriter::~BlobLogWriter() = default;
-Status BlobLogWriter::Sync() {
+Status BlobLogWriter::Sync(const WriteOptions& write_options) {
   TEST_SYNC_POINT("BlobLogWriter::Sync");
   StopWatch sync_sw(clock_, statistics_, BLOB_DB_BLOB_FILE_SYNC_MICROS);
-  Status s = dest_->Sync(use_fsync_);
+  IOOptions opts;
+  Status s = WritableFileWriter::PrepareIOOptions(write_options, opts);
+  if (s.ok()) {
+    s = dest_->Sync(opts, use_fsync_);
+  }
+  if (s.ok()) {
     RecordTick(statistics_, BLOB_DB_BLOB_FILE_SYNCED);
+  }
   return s;
 }
-Status BlobLogWriter::WriteHeader(BlobLogHeader& header) {
+Status BlobLogWriter::WriteHeader(const WriteOptions& write_options,
+                                  BlobLogHeader& header) {
   assert(block_offset_ == 0);
   assert(last_elem_type_ == kEtNone);
   std::string str;
   header.EncodeTo(&str);
-  Status s = dest_->Append(Slice(str));
+  IOOptions opts;
+  Status s = WritableFileWriter::PrepareIOOptions(write_options, opts);
+  if (s.ok()) {
+    s = dest_->Append(opts, Slice(str));
+  }
   if (s.ok()) {
     block_offset_ += str.size();
     if (do_flush_) {
-      s = dest_->Flush();
+      s = dest_->Flush(opts);
     }
   }
   last_elem_type_ = kEtFileHdr;
+  if (s.ok()) {
     RecordTick(statistics_, BLOB_DB_BLOB_FILE_BYTES_WRITTEN,
                BlobLogHeader::kSize);
+  }
   return s;
 }
-Status BlobLogWriter::AppendFooter(BlobLogFooter& footer,
+Status BlobLogWriter::AppendFooter(const WriteOptions& write_options,
+                                   BlobLogFooter& footer,
                                    std::string* checksum_method,
                                    std::string* checksum_value) {
   assert(block_offset_ != 0);
@ -75,14 +89,17 @@ Status BlobLogWriter::AppendFooter(BlobLogFooter& footer,
     s.PermitUncheckedError();
     return Status::IOError("Seen Error. Skip closing.");
   } else {
-    s = dest_->Append(Slice(str));
+    IOOptions opts;
+    s = WritableFileWriter::PrepareIOOptions(write_options, opts);
+    if (s.ok()) {
+      s = dest_->Append(opts, Slice(str));
+    }
     if (s.ok()) {
       block_offset_ += str.size();
-      s = Sync();
+      s = Sync(write_options);
       if (s.ok()) {
-        s = dest_->Close();
+        s = dest_->Close(opts);
         if (s.ok()) {
           assert(!!checksum_method == !!checksum_value);
@ -111,12 +128,15 @@ Status BlobLogWriter::AppendFooter(BlobLogFooter& footer,
   }
   last_elem_type_ = kEtFileFooter;
+  if (s.ok()) {
     RecordTick(statistics_, BLOB_DB_BLOB_FILE_BYTES_WRITTEN,
                BlobLogFooter::kSize);
+  }
   return s;
 }
-Status BlobLogWriter::AddRecord(const Slice& key, const Slice& val,
+Status BlobLogWriter::AddRecord(const WriteOptions& write_options,
+                                const Slice& key, const Slice& val,
                                 uint64_t expiration, uint64_t* key_offset,
                                 uint64_t* blob_offset) {
   assert(block_offset_ != 0);
@ -125,11 +145,13 @@ Status BlobLogWriter::AddRecord(const Slice& key, const Slice& val,
   std::string buf;
   ConstructBlobHeader(&buf, key, val, expiration);
-  Status s = EmitPhysicalRecord(buf, key, val, key_offset, blob_offset);
+  Status s =
+      EmitPhysicalRecord(write_options, buf, key, val, key_offset, blob_offset);
   return s;
 }
-Status BlobLogWriter::AddRecord(const Slice& key, const Slice& val,
+Status BlobLogWriter::AddRecord(const WriteOptions& write_options,
+                                const Slice& key, const Slice& val,
                                 uint64_t* key_offset, uint64_t* blob_offset) {
   assert(block_offset_ != 0);
   assert(last_elem_type_ == kEtFileHdr || last_elem_type_ == kEtRecord);
@ -137,7 +159,8 @@ Status BlobLogWriter::AddRecord(const Slice& key, const Slice& val,
   std::string buf;
   ConstructBlobHeader(&buf, key, val, 0);
-  Status s = EmitPhysicalRecord(buf, key, val, key_offset, blob_offset);
+  Status s =
+      EmitPhysicalRecord(write_options, buf, key, val, key_offset, blob_offset);
   return s;
 }
@ -150,28 +173,34 @@ void BlobLogWriter::ConstructBlobHeader(std::string* buf, const Slice& key,
   record.EncodeHeaderTo(buf);
 }
-Status BlobLogWriter::EmitPhysicalRecord(const std::string& headerbuf,
+Status BlobLogWriter::EmitPhysicalRecord(const WriteOptions& write_options,
+                                         const std::string& headerbuf,
                                          const Slice& key, const Slice& val,
                                          uint64_t* key_offset,
                                          uint64_t* blob_offset) {
-  StopWatch write_sw(clock_, statistics_, BLOB_DB_BLOB_FILE_WRITE_MICROS);
-  Status s = dest_->Append(Slice(headerbuf));
+  IOOptions opts;
+  Status s = WritableFileWriter::PrepareIOOptions(write_options, opts);
   if (s.ok()) {
-    s = dest_->Append(key);
+    s = dest_->Append(opts, Slice(headerbuf));
   }
   if (s.ok()) {
-    s = dest_->Append(val);
+    s = dest_->Append(opts, key);
+  }
+  if (s.ok()) {
+    s = dest_->Append(opts, val);
   }
   if (do_flush_ && s.ok()) {
-    s = dest_->Flush();
+    s = dest_->Flush(opts);
   }
   *key_offset = block_offset_ + BlobLogRecord::kHeaderSize;
   *blob_offset = *key_offset + key.size();
   block_offset_ = *blob_offset + val.size();
   last_elem_type_ = kEtRecord;
+  if (s.ok()) {
     RecordTick(statistics_, BLOB_DB_BLOB_FILE_BYTES_WRITTEN,
                BlobLogRecord::kHeaderSize + key.size() + val.size());
+  }
   return s;
 }
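
The `if (s.ok())` guards added around `RecordTick` in this file mean the blob counters only move when the write or sync actually succeeded. A minimal standalone sketch of that pattern is below. `FakeStatus`, `FakeStats`, `FakeFile` and `AppendAndCount` are hypothetical stand-ins invented for illustration and are not RocksDB types.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical status type: only carries success or failure.
struct FakeStatus {
  explicit FakeStatus(bool ok) : ok_flag(ok) {}
  bool ok() const { return ok_flag; }
  bool ok_flag;
};

// Hypothetical statistics sink with a single byte counter.
struct FakeStats {
  uint64_t bytes_written = 0;
};

// Hypothetical file that can be told to reject every append.
struct FakeFile {
  bool fail_appends = false;
  std::string contents;

  FakeStatus Append(const std::string& data) {
    if (fail_appends) {
      return FakeStatus(false);
    }
    contents += data;
    return FakeStatus(true);
  }
};

// Appends `data` and bumps the counter only if the append succeeded,
// so failed writes never inflate the statistic.
FakeStatus AppendAndCount(FakeFile& file, FakeStats& stats,
                          const std::string& data) {
  FakeStatus s = file.Append(data);
  if (s.ok()) {
    stats.bytes_written += data.size();
  }
  return s;
}
```

Calling `AppendAndCount` on a file with `fail_appends = true` returns a non-OK status and leaves `bytes_written` unchanged, which is the behavior the guarded `RecordTick` calls aim for.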

@ -43,20 +43,24 @@ class BlobLogWriter {
   static void ConstructBlobHeader(std::string* buf, const Slice& key,
                                   const Slice& val, uint64_t expiration);
-  Status AddRecord(const Slice& key, const Slice& val, uint64_t* key_offset,
-                   uint64_t* blob_offset);
-  Status AddRecord(const Slice& key, const Slice& val, uint64_t expiration,
-                   uint64_t* key_offset, uint64_t* blob_offset);
-  Status EmitPhysicalRecord(const std::string& headerbuf, const Slice& key,
-                            const Slice& val, uint64_t* key_offset,
-                            uint64_t* blob_offset);
-  Status AppendFooter(BlobLogFooter& footer, std::string* checksum_method,
-                      std::string* checksum_value);
-  Status WriteHeader(BlobLogHeader& header);
+  Status AddRecord(const WriteOptions& write_options, const Slice& key,
+                   const Slice& val, uint64_t* key_offset,
+                   uint64_t* blob_offset);
+  Status AddRecord(const WriteOptions& write_options, const Slice& key,
+                   const Slice& val, uint64_t expiration, uint64_t* key_offset,
+                   uint64_t* blob_offset);
+  Status EmitPhysicalRecord(const WriteOptions& write_options,
+                            const std::string& headerbuf, const Slice& key,
+                            const Slice& val, uint64_t* key_offset,
+                            uint64_t* blob_offset);
+  Status AppendFooter(const WriteOptions& write_options, BlobLogFooter& footer,
+                      std::string* checksum_method,
+                      std::string* checksum_value);
+  Status WriteHeader(const WriteOptions& write_options, BlobLogHeader& header);
   WritableFileWriter* file() { return dest_.get(); }
@ -64,7 +68,7 @@ class BlobLogWriter {
   uint64_t get_log_number() const { return log_number_; }
-  Status Sync();
+  Status Sync(const WriteOptions& write_options);
  private:
   std::unique_ptr<WritableFileWriter> dest_;

@ -65,7 +65,7 @@ void WriteBlobFile(const ImmutableOptions& immutable_options,
   BlobLogHeader header(column_family_id, compression, has_ttl,
                        expiration_range_header);
-  ASSERT_OK(blob_log_writer.WriteHeader(header));
+  ASSERT_OK(blob_log_writer.WriteHeader(WriteOptions(), header));
   std::vector<std::string> compressed_blobs(num);
   std::vector<Slice> blobs_to_write(num);
@ -93,7 +93,8 @@ void WriteBlobFile(const ImmutableOptions& immutable_options,
   for (size_t i = 0; i < num; ++i) {
     uint64_t key_offset = 0;
-    ASSERT_OK(blob_log_writer.AddRecord(keys[i], blobs_to_write[i], &key_offset,
-                                        &blob_offsets[i]));
+    ASSERT_OK(blob_log_writer.AddRecord(WriteOptions(), keys[i],
+                                        blobs_to_write[i], &key_offset,
+                                        &blob_offsets[i]));
   }
@ -103,8 +104,8 @@ void WriteBlobFile(const ImmutableOptions& immutable_options,
   std::string checksum_method;
   std::string checksum_value;
-  ASSERT_OK(
-      blob_log_writer.AppendFooter(footer, &checksum_method, &checksum_value));
+  ASSERT_OK(blob_log_writer.AppendFooter(WriteOptions(), footer,
+                                         &checksum_method, &checksum_value));
 }
 }  // anonymous namespace

@ -32,6 +32,7 @@
 #include "options/options_helper.h"
 #include "rocksdb/db.h"
 #include "rocksdb/env.h"
+#include "rocksdb/file_system.h"
 #include "rocksdb/iterator.h"
 #include "rocksdb/options.h"
 #include "rocksdb/table.h"
@ -57,8 +58,8 @@ TableBuilder* NewTableBuilder(const TableBuilderOptions& tboptions,
 Status BuildTable(
     const std::string& dbname, VersionSet* versions,
     const ImmutableDBOptions& db_options, const TableBuilderOptions& tboptions,
-    const FileOptions& file_options, const ReadOptions& read_options,
-    TableCache* table_cache, InternalIterator* iter,
+    const FileOptions& file_options, TableCache* table_cache,
+    InternalIterator* iter,
     std::vector<std::unique_ptr<FragmentedRangeTombstoneIterator>>
         range_del_iters,
     FileMetaData* meta, std::vector<BlobFileAddition>* blob_file_additions,
@ -69,9 +70,8 @@ Status BuildTable(
     IOStatus* io_status, const std::shared_ptr<IOTracer>& io_tracer,
     BlobFileCreationReason blob_creation_reason,
     const SeqnoToTimeMapping& seqno_to_time_mapping, EventLogger* event_logger,
-    int job_id, const Env::IOPriority io_priority,
-    TableProperties* table_properties, Env::WriteLifeTimeHint write_hint,
-    const std::string* full_history_ts_low,
+    int job_id, TableProperties* table_properties,
+    Env::WriteLifeTimeHint write_hint, const std::string* full_history_ts_low,
     BlobFileCompletionCallback* blob_callback, Version* version,
     uint64_t* num_input_entries, uint64_t* memtable_payload_bytes,
     uint64_t* memtable_garbage_bytes) {
@ -164,11 +164,11 @@ Status BuildTable(
       table_file_created = true;
       FileTypeSet tmp_set = ioptions.checksum_handoff_file_types;
-      file->SetIOPriority(io_priority);
+      file->SetIOPriority(tboptions.write_options.rate_limiter_priority);
       file->SetWriteLifeTimeHint(write_hint);
       file_writer.reset(new WritableFileWriter(
           std::move(file), fname, file_options, ioptions.clock, io_tracer,
-          ioptions.stats, ioptions.listeners,
+          ioptions.stats, Histograms::SST_WRITE_MICROS, ioptions.listeners,
           ioptions.file_checksum_gen_factory.get(),
           tmp_set.Contains(FileType::kTableFile), false));
@ -188,10 +188,11 @@ Status BuildTable(
                                blob_file_additions)
             ? new BlobFileBuilder(
                   versions, fs, &ioptions, &mutable_cf_options, &file_options,
-                  tboptions.db_id, tboptions.db_session_id, job_id,
-                  tboptions.column_family_id, tboptions.column_family_name,
-                  io_priority, write_hint, io_tracer, blob_callback,
-                  blob_creation_reason, &blob_file_paths, blob_file_additions)
+                  &(tboptions.write_options), tboptions.db_id,
+                  tboptions.db_session_id, job_id, tboptions.column_family_id,
+                  tboptions.column_family_name, write_hint, io_tracer,
+                  blob_callback, blob_creation_reason, &blob_file_paths,
+                  blob_file_additions)
             : nullptr);
     const std::atomic<bool> kManualCompactionCanceledFalse{false};
@ -244,7 +245,11 @@ Status BuildTable(
       }
       // TODO(noetzli): Update stats after flush, too.
-      if (io_priority == Env::IO_HIGH &&
+      // TODO(hx235): Replace `rate_limiter_priority` with `io_activity` for
+      // flush IO in repair when we have an `Env::IOActivity` enum for it
+      if ((tboptions.write_options.io_activity == Env::IOActivity::kFlush ||
+           tboptions.write_options.io_activity == Env::IOActivity::kDBOpen ||
+           tboptions.write_options.rate_limiter_priority == Env::IO_HIGH) &&
           IOSTATS(bytes_written) >= kReportFlushIOStatsEvery) {
         ThreadStatusUtil::SetThreadOperationProperty(
             ThreadStatus::FLUSH_BYTES_WRITTEN, IOSTATS(bytes_written));
@ -275,7 +280,7 @@ Status BuildTable(
           SizeApproximationOptions approx_opts;
           approx_opts.files_size_error_margin = 0.1;
           meta->compensated_range_deletion_size += versions->ApproximateSize(
-              approx_opts, read_options, version, kv.first.Encode(),
+              approx_opts, tboptions.read_options, version, kv.first.Encode(),
               tombstone_end.Encode(), 0 /* start_level */, -1 /* end_level */,
               TableReaderCaller::kFlush);
         }
@ -346,13 +351,16 @@ Status BuildTable(
     // Finish and check for file errors
     TEST_SYNC_POINT("BuildTable:BeforeSyncTable");
-    if (s.ok() && !empty) {
+    IOOptions opts;
+    *io_status =
+        WritableFileWriter::PrepareIOOptions(tboptions.write_options, opts);
+    if (s.ok() && io_status->ok() && !empty) {
       StopWatch sw(ioptions.clock, ioptions.stats, TABLE_SYNC_MICROS);
-      *io_status = file_writer->Sync(ioptions.use_fsync);
+      *io_status = file_writer->Sync(opts, ioptions.use_fsync);
     }
     TEST_SYNC_POINT("BuildTable:BeforeCloseTableFile");
     if (s.ok() && io_status->ok() && !empty) {
-      *io_status = file_writer->Close();
+      *io_status = file_writer->Close(opts);
     }
     if (s.ok() && io_status->ok() && !empty) {
       // Add the checksum information to file metadata.
@ -396,9 +404,9 @@ Status BuildTable(
       // No matter whether use_direct_io_for_flush_and_compaction is true,
       // the goal is to cache it here for further user reads.
       std::unique_ptr<InternalIterator> it(table_cache->NewIterator(
-          read_options, file_options, tboptions.internal_comparator, *meta,
-          nullptr /* range_del_agg */, mutable_cf_options.prefix_extractor,
-          nullptr,
+          tboptions.read_options, file_options, tboptions.internal_comparator,
+          *meta, nullptr /* range_del_agg */,
+          mutable_cf_options.prefix_extractor, nullptr,
           (internal_stats == nullptr) ? nullptr
                                       : internal_stats->GetFileReadHist(0),
           TableReaderCaller::kFlush, /*arena=*/nullptr,
@ -436,9 +444,14 @@ Status BuildTable(
     constexpr IODebugContext* dbg = nullptr;
     if (table_file_created) {
-      Status ignored = fs->DeleteFile(fname, IOOptions(), dbg);
+      IOOptions opts;
+      Status prepare =
+          WritableFileWriter::PrepareIOOptions(tboptions.write_options, opts);
+      if (prepare.ok()) {
+        Status ignored = fs->DeleteFile(fname, opts, dbg);
         ignored.PermitUncheckedError();
       }
+    }
     assert(blob_file_additions || blob_file_paths.empty());
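
The widened condition in `BuildTable()` reports flush bytes when the write is tagged as flush or DB open, and still accepts the old high-priority signal as a fallback. A minimal sketch of that predicate in isolation follows. The enums, `SketchWriteOptions` and `ShouldReportFlushBytes` are hypothetical names that only mirror the fields used in the hunk; the enum values are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical enums named after the fields in the hunk; not RocksDB types.
enum class IOActivity { kUnknown, kFlush, kCompaction, kDBOpen };
enum class IOPriority { kLow, kHigh, kTotal };

// Hypothetical options carrier with just the two fields the check reads.
struct SketchWriteOptions {
  IOPriority rate_limiter_priority;
  IOActivity io_activity;
};

// Returns true when the write counts as flush-like IO (tagged as flush,
// tagged as DB open, or running at high priority) and enough bytes have
// accumulated to be worth reporting.
bool ShouldReportFlushBytes(const SketchWriteOptions& wo,
                            uint64_t bytes_written, uint64_t threshold) {
  const bool flush_like = wo.io_activity == IOActivity::kFlush ||
                          wo.io_activity == IOActivity::kDBOpen ||
                          wo.rate_limiter_priority == IOPriority::kHigh;
  return flush_like && bytes_written >= threshold;
}
```

Under these assumptions, a compaction write at low priority never reports, while an untagged write at high priority still does, which matches the fallback the TODO in the hunk describes for repair.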

@ -53,8 +53,8 @@ TableBuilder* NewTableBuilder(const TableBuilderOptions& tboptions,
 extern Status BuildTable(
     const std::string& dbname, VersionSet* versions,
     const ImmutableDBOptions& db_options, const TableBuilderOptions& tboptions,
-    const FileOptions& file_options, const ReadOptions& read_options,
-    TableCache* table_cache, InternalIterator* iter,
+    const FileOptions& file_options, TableCache* table_cache,
+    InternalIterator* iter,
     std::vector<std::unique_ptr<FragmentedRangeTombstoneIterator>>
         range_del_iters,
     FileMetaData* meta, std::vector<BlobFileAddition>* blob_file_additions,
@ -66,7 +66,6 @@ extern Status BuildTable(
     BlobFileCreationReason blob_creation_reason,
     const SeqnoToTimeMapping& seqno_to_time_mapping,
     EventLogger* event_logger = nullptr, int job_id = 0,
-    const Env::IOPriority io_priority = Env::IO_HIGH,
     TableProperties* table_properties = nullptr,
     Env::WriteLifeTimeHint write_hint = Env::WLTH_NOT_SET,
     const std::string* full_history_ts_low = nullptr,

@ -1168,7 +1168,7 @@ Status ColumnFamilyData::RangesOverlapWithMemtables(
   *overlap = false;
   // Create an InternalIterator over all unflushed memtables
   Arena arena;
-  // TODO: plumb Env::IOActivity
+  // TODO: plumb Env::IOActivity, Env::IOPriority
   ReadOptions read_opts;
   read_opts.total_order_seek = true;
   MergeIteratorBuilder merge_iter_builder(&internal_comparator_, &arena);

@ -1130,6 +1130,9 @@ void CompactionJob::ProcessKeyValueCompaction(SubcompactionState* sub_compact) {
   // (b) CompactionFilter::Decision::kRemoveAndSkipUntil.
   read_options.total_order_seek = true;
+  const WriteOptions write_options(Env::IOPriority::IO_LOW,
+                                   Env::IOActivity::kCompaction);
   // Remove the timestamps from boundaries because boundaries created in
   // GenSubcompactionBoundaries doesn't strip away the timestamp.
   size_t ts_sz = cfd->user_comparator()->timestamp_size();
@ -1264,8 +1267,8 @@ void CompactionJob::ProcessKeyValueCompaction(SubcompactionState* sub_compact) {
           ? new BlobFileBuilder(
                 versions_, fs_.get(),
                 sub_compact->compaction->immutable_options(),
-                mutable_cf_options, &file_options_, db_id_, db_session_id_,
-                job_id_, cfd->GetID(), cfd->GetName(), Env::IOPriority::IO_LOW,
+                mutable_cf_options, &file_options_, &write_options, db_id_,
+                db_session_id_, job_id_, cfd->GetID(), cfd->GetName(),
                 write_hint_, io_tracer_, blob_callback_,
                 BlobFileCreationReason::kCompaction, &blob_file_paths,
                 sub_compact->Current().GetBlobFileAdditionsPtr())
@ -1710,6 +1713,8 @@ Status CompactionJob::InstallCompactionResults(
   db_mutex_->AssertHeld();
   const ReadOptions read_options(Env::IOActivity::kCompaction);
+  const WriteOptions write_options(Env::IOActivity::kCompaction);
   auto* compaction = compact_->compaction;
   assert(compaction);
@ -1792,8 +1797,9 @@ Status CompactionJob::InstallCompactionResults(
   };
   return versions_->LogAndApply(
-      compaction->column_family_data(), mutable_cf_options, read_options, edit,
-      db_mutex_, db_directory_, /*new_descriptor_log=*/false,
+      compaction->column_family_data(), mutable_cf_options, read_options,
+      write_options, edit, db_mutex_, db_directory_,
+      /*new_descriptor_log=*/false,
       /*column_family_options=*/nullptr, manifest_wcb);
 }
@ -1943,13 +1949,17 @@ Status CompactionJob::OpenCompactionOutputFile(SubcompactionState* sub_compact,
       sub_compact->compaction->immutable_options()->listeners;
   outputs.AssignFileWriter(new WritableFileWriter(
       std::move(writable_file), fname, fo_copy, db_options_.clock, io_tracer_,
-      db_options_.stats, listeners, db_options_.file_checksum_gen_factory.get(),
+      db_options_.stats, Histograms::SST_WRITE_MICROS, listeners,
+      db_options_.file_checksum_gen_factory.get(),
       tmp_set.Contains(FileType::kTableFile), false));
   // TODO(hx235): pass in the correct `oldest_key_time` instead of `0`
+  const ReadOptions read_options(Env::IOActivity::kCompaction);
+  const WriteOptions write_options(Env::IOActivity::kCompaction);
   TableBuilderOptions tboptions(
       *cfd->ioptions(), *(sub_compact->compaction->mutable_cf_options()),
-      cfd->internal_comparator(), cfd->int_tbl_prop_collector_factories(),
+      read_options, write_options, cfd->internal_comparator(),
+      cfd->int_tbl_prop_collector_factories(),
       sub_compact->compaction->output_compression(),
       sub_compact->compaction->output_compression_opts(), cfd->GetID(),
       cfd->GetName(), sub_compact->compaction->output_level(),
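
The hunks in this file construct `WriteOptions` in two ways: from an activity alone, and from a priority plus an activity. A rough sketch of a type that supports both call shapes is below. `SketchWriteOptions` and its enums are hypothetical, and the `kTotal` default priority is an assumption made for the sketch, not something this diff shows.

```cpp
#include <cassert>

// Hypothetical enums; names echo the diff, values are illustrative.
enum class IOPriority { kLow, kHigh, kTotal };
enum class IOActivity { kUnknown, kFlush, kCompaction };

// Hypothetical options type offering the two construction shapes seen in
// the hunks above.
struct SketchWriteOptions {
  IOPriority rate_limiter_priority = IOPriority::kTotal;  // assumed default
  IOActivity io_activity = IOActivity::kUnknown;

  SketchWriteOptions() = default;

  // Tag the write with an activity and keep the default priority.
  explicit SketchWriteOptions(IOActivity activity) : io_activity(activity) {}

  // Tag the write with both a rate limiter priority and an activity.
  SketchWriteOptions(IOPriority priority, IOActivity activity)
      : rate_limiter_priority(priority), io_activity(activity) {}
};
```

Making the single-argument constructor `explicit` keeps a bare activity value from silently converting into an options object at a call site.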

@ -295,9 +295,12 @@ class CompactionJobTestBase : public testing::Test {
     Status s = WritableFileWriter::Create(fs_, table_name, FileOptions(),
                                           &file_writer, nullptr);
     ASSERT_OK(s);
+    const ReadOptions read_options;
+    const WriteOptions write_options;
     std::unique_ptr<TableBuilder> table_builder(
         cf_options_.table_factory->NewTableBuilder(
             TableBuilderOptions(*cfd_->ioptions(), mutable_cf_options_,
+                                read_options, write_options,
                                 cfd_->internal_comparator(),
                                 cfd_->int_tbl_prop_collector_factories(),
                                 CompressionType::kNoCompression,
@ -394,7 +397,7 @@ class CompactionJobTestBase : public testing::Test {
     mutex_.Lock();
     EXPECT_OK(versions_->LogAndApply(
         versions_->GetColumnFamilySet()->GetDefault(), mutable_cf_options_,
-        read_options_, &edit, &mutex_, nullptr));
+        read_options_, write_options_, &edit, &mutex_, nullptr));
     mutex_.Unlock();
   }
@ -549,7 +552,7 @@ class CompactionJobTestBase : public testing::Test {
         /*db_id=*/"", /*db_session_id=*/"", /*daily_offpeak_time_utc=*/"",
         /*error_handler=*/nullptr, /*read_only=*/false));
     compaction_job_stats_.Reset();
-    ASSERT_OK(SetIdentityFile(env_, dbname_));
+    ASSERT_OK(SetIdentityFile(WriteOptions(), env_, dbname_));
     VersionEdit new_db;
     new_db.SetLogNumber(0);
@ -568,11 +571,11 @@ class CompactionJobTestBase : public testing::Test {
       log::Writer log(std::move(file_writer), 0, false);
       std::string record;
       new_db.EncodeTo(&record);
-      s = log.AddRecord(record);
+      s = log.AddRecord(WriteOptions(), record);
     }
     ASSERT_OK(s);
     // Make "CURRENT" file that points to the new manifest file.
-    s = SetCurrentFile(fs_.get(), dbname_, 1, nullptr);
+    s = SetCurrentFile(WriteOptions(), fs_.get(), dbname_, 1, nullptr);
     ASSERT_OK(s);
@ -736,6 +739,7 @@ class CompactionJobTestBase : public testing::Test {
   MutableCFOptions mutable_cf_options_;
   MutableDBOptions mutable_db_options_;
   const ReadOptions read_options_;
+  const WriteOptions write_options_;
   std::shared_ptr<Cache> table_cache_;
   WriteController write_controller_;
   WriteBufferManager write_buffer_manager_;

@ -62,12 +62,15 @@ IOStatus CompactionOutputs::WriterSyncClose(const Status& input_status,
                                            Statistics* statistics,
                                            bool use_fsync) {
   IOStatus io_s;
-  if (input_status.ok()) {
+  IOOptions opts;
+  io_s = WritableFileWriter::PrepareIOOptions(
+      WriteOptions(Env::IOActivity::kCompaction), opts);
+  if (input_status.ok() && io_s.ok()) {
     StopWatch sw(clock, statistics, COMPACTION_OUTFILE_SYNC_MICROS);
-    io_s = file_writer_->Sync(use_fsync);
+    io_s = file_writer_->Sync(opts, use_fsync);
   }
   if (input_status.ok() && io_s.ok()) {
-    io_s = file_writer_->Close();
+    io_s = file_writer_->Close(opts);
   }
   if (input_status.ok() && io_s.ok()) {

@ -34,7 +34,7 @@ Status DeleteFilesInRanges(DB* db, ColumnFamilyHandle* column_family,
 Status VerifySstFileChecksum(const Options& options,
                              const EnvOptions& env_options,
                              const std::string& file_path) {
-  // TODO: plumb Env::IOActivity
+  // TODO: plumb Env::IOActivity, Env::IOPriority
   const ReadOptions read_options;
   return VerifySstFileChecksum(options, env_options, read_options, file_path);
 }

@ -3126,7 +3126,8 @@ TEST_F(DBBasicTest, LastSstFileNotInManifest) {
   // Manually add a sst file.
   constexpr uint64_t kSstFileNumber = 100;
   const std::string kSstFile = MakeTableFileName(dbname_, kSstFileNumber);
-  ASSERT_OK(WriteStringToFile(env_, /* data = */ "bad sst file content",
+  ASSERT_OK(WriteStringToFile(env_,
+                              /* data = */ "bad sst file content",
                               /* fname = */ kSstFile,
                               /* should_sync = */ true));
   ASSERT_OK(env_->FileExists(kSstFile));

@ -333,8 +333,10 @@ Status DBImpl::Resume() {
 Status DBImpl::ResumeImpl(DBRecoverContext context) {
   mutex_.AssertHeld();
-  // TODO: plumb Env::IOActivity
+  // TODO: plumb Env::IOActivity, Env::IOPriority
   const ReadOptions read_options;
+  const WriteOptions write_options;
   WaitForBackgroundWork();
   Status s;
@ -373,8 +375,8 @@ Status DBImpl::ResumeImpl(DBRecoverContext context) {
       assert(cfh);
       ColumnFamilyData* cfd = cfh->cfd();
       const MutableCFOptions& cf_opts = *cfd->GetLatestMutableCFOptions();
-      s = versions_->LogAndApply(cfd, cf_opts, read_options, &edit, &mutex_,
-                                 directories_.GetDbDir());
+      s = versions_->LogAndApply(cfd, cf_opts, read_options, write_options,
+                                 &edit, &mutex_, directories_.GetDbDir());
       if (!s.ok()) {
         io_s = versions_->io_status();
         if (!io_s.ok()) {
@ -716,14 +718,15 @@ Status DBImpl::CloseHelper() {
 Status DBImpl::CloseImpl() { return CloseHelper(); }
 DBImpl::~DBImpl() {
+  ThreadStatus::OperationType cur_op_type =
+      ThreadStatusUtil::GetThreadOperation();
+  ThreadStatusUtil::SetThreadOperation(ThreadStatus::OperationType::OP_UNKNOWN);
   // TODO: remove this.
   init_logger_creation_s_.PermitUncheckedError();
   InstrumentedMutexLock closing_lock_guard(&closing_mutex_);
-  if (closed_) {
-    return;
-  }
+  if (!closed_) {
     closed_ = true;
     {
@ -733,6 +736,8 @@ DBImpl::~DBImpl() {
     closing_status_ = CloseImpl();
     closing_status_.PermitUncheckedError();
+  }
+  ThreadStatusUtil::SetThreadOperation(cur_op_type);
 }
 void DBImpl::MaybeIgnoreError(Status* s) const {
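
The destructor change saves the current thread operation, sets it to unknown for the duration of the close, and restores it at the end, which is why the early `return` had to become an `if (!closed_)` block. The same save and restore can also be expressed as a scope guard. The sketch below is only an illustration of that alternative and is not what the PR does; `OpType`, `CurrentOp` and `ScopedOpReset` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical operation tags; not the real ThreadStatus::OperationType.
enum class OpType { kUnknown, kFlush, kCompaction };

// Hypothetical per-thread slot standing in for the thread status registry.
inline OpType& CurrentOp() {
  thread_local OpType op = OpType::kUnknown;
  return op;
}

// Saves the current operation, resets it to kUnknown for the lifetime of
// the guard, and restores the saved value on every exit path.
class ScopedOpReset {
 public:
  ScopedOpReset() : saved_(CurrentOp()) { CurrentOp() = OpType::kUnknown; }
  ~ScopedOpReset() { CurrentOp() = saved_; }

  ScopedOpReset(const ScopedOpReset&) = delete;
  ScopedOpReset& operator=(const ScopedOpReset&) = delete;

 private:
  OpType saved_;
};
```

With a guard like this an early `return` would still restore the saved value, at the cost of one more small helper type.
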
@ -807,7 +812,9 @@ Status DBImpl::StartPeriodicTaskScheduler() {
   return s;
 }
-Status DBImpl::RegisterRecordSeqnoTimeWorker(bool is_new_db) {
+Status DBImpl::RegisterRecordSeqnoTimeWorker(const ReadOptions& read_options,
+                                             const WriteOptions& write_options,
+                                             bool is_new_db) {
   options_mutex_.AssertHeld();
   uint64_t min_preserve_seconds = std::numeric_limits<uint64_t>::max();
@@ -890,7 +897,8 @@ Status DBImpl::RegisterRecordSeqnoTimeWorker(bool is_new_db) {
 VersionEdit edit;
 edit.SetLastSequence(kMax);
 s = versions_->LogAndApplyToDefaultColumnFamily(
-{}, &edit, &mutex_, directories_.GetDbDir());
+read_options, write_options, &edit, &mutex_,
+directories_.GetDbDir());
 if (!s.ok() && versions_->io_status().IsIOError()) {
 s = error_handler_.SetBGError(versions_->io_status(),
 BackgroundErrorReason::kManifestWrite);
@@ -1000,6 +1008,7 @@ void DBImpl::PersistStats() {
 stats_slice_initialized_ = true;
 std::swap(stats_slice_, stats_map);
 if (s.ok()) {
+// TODO: plumb Env::IOActivity, Env::IOPriority
 WriteOptions wo;
 wo.low_pri = true;
 wo.no_slowdown = true;
@@ -1214,8 +1223,10 @@ FSDirectory* DBImpl::GetDataDir(ColumnFamilyData* cfd, size_t path_id) const {
 Status DBImpl::SetOptions(
 ColumnFamilyHandle* column_family,
 const std::unordered_map<std::string, std::string>& options_map) {
-// TODO: plumb Env::IOActivity
+// TODO: plumb Env::IOActivity, Env::IOPriority
 const ReadOptions read_options;
+const WriteOptions write_options;
 auto* cfd =
 static_cast_with_check<ColumnFamilyHandleImpl>(column_family)->cfd();
 if (options_map.empty()) {
@@ -1238,14 +1249,15 @@ Status DBImpl::SetOptions(
 new_options = *cfd->GetLatestMutableCFOptions();
 // Append new version to recompute compaction score.
 VersionEdit dummy_edit;
-s = versions_->LogAndApply(cfd, new_options, read_options, &dummy_edit,
-&mutex_, directories_.GetDbDir());
+s = versions_->LogAndApply(cfd, new_options, read_options, write_options,
+&dummy_edit, &mutex_, directories_.GetDbDir());
 // Trigger possible flush/compactions. This has to be before we persist
 // options to file, otherwise there will be a deadlock with writer
 // thread.
 InstallSuperVersionAndScheduleWork(cfd, &sv_context, new_options);
-persist_options_status = WriteOptionsFile(true /*db_mutex_already_held*/);
+persist_options_status =
+WriteOptionsFile(write_options, true /*db_mutex_already_held*/);
 bg_cv_.SignalAll();
 }
 }
@@ -1424,7 +1436,8 @@ Status DBImpl::SetDBOptions(
 }
 write_thread_.ExitUnbatched(&w);
 }
-persist_options_status = WriteOptionsFile(true /*db_mutex_already_held*/);
+persist_options_status =
+WriteOptionsFile(WriteOptions(), true /*db_mutex_already_held*/);
 } else {
 // To get here, we must have had invalid options and will not attempt to
 // persist the options, which means the status is "OK/Uninitialized.
@@ -1476,14 +1489,14 @@ int DBImpl::FindMinimumEmptyLevelFitting(
 return minimum_level;
 }
-Status DBImpl::FlushWAL(bool sync) {
+Status DBImpl::FlushWAL(const WriteOptions& write_options, bool sync) {
 if (manual_wal_flush_) {
 IOStatus io_s;
 {
 // We need to lock log_write_mutex_ since logs_ might change concurrently
 InstrumentedMutexLock wl(&log_write_mutex_);
 log::Writer* cur_log_writer = logs_.back().writer;
-io_s = cur_log_writer->WriteBuffer();
+io_s = cur_log_writer->WriteBuffer(write_options);
 }
 if (!io_s.ok()) {
 ROCKS_LOG_ERROR(immutable_db_options_.info_log, "WAL flush error %s",
@@ -1556,13 +1569,24 @@ Status DBImpl::SyncWAL() {
 RecordTick(stats_, WAL_FILE_SYNCED);
 Status status;
 IOStatus io_s;
+// TODO: plumb Env::IOActivity, Env::IOPriority
+const ReadOptions read_options;
+const WriteOptions write_options;
+IOOptions opts;
+io_s = WritableFileWriter::PrepareIOOptions(write_options, opts);
+if (!io_s.ok()) {
+status = io_s;
+}
+if (io_s.ok()) {
 for (log::Writer* log : logs_to_sync) {
-io_s = log->file()->SyncWithoutFlush(immutable_db_options_.use_fsync);
+io_s =
+log->file()->SyncWithoutFlush(opts, immutable_db_options_.use_fsync);
 if (!io_s.ok()) {
 status = io_s;
 break;
 }
 }
+}
 if (!io_s.ok()) {
 ROCKS_LOG_ERROR(immutable_db_options_.info_log, "WAL Sync error %s",
 io_s.ToString().c_str());
@@ -1589,9 +1613,7 @@ Status DBImpl::SyncWAL() {
 }
 if (status.ok() && synced_wals.IsWalAddition()) {
 InstrumentedMutexLock l(&mutex_);
-// TODO: plumb Env::IOActivity
-const ReadOptions read_options;
-status = ApplyWALToManifest(read_options, &synced_wals);
+status = ApplyWALToManifest(read_options, write_options, &synced_wals);
 }
 TEST_SYNC_POINT("DBImpl::SyncWAL:BeforeMarkLogsSynced:2");
@@ -1600,12 +1622,14 @@ Status DBImpl::SyncWAL() {
 }
 Status DBImpl::ApplyWALToManifest(const ReadOptions& read_options,
+const WriteOptions& write_options,
 VersionEdit* synced_wals) {
 // not empty, write to MANIFEST.
 mutex_.AssertHeld();
 Status status = versions_->LogAndApplyToDefaultColumnFamily(
-read_options, synced_wals, &mutex_, directories_.GetDbDir());
+read_options, write_options, synced_wals, &mutex_,
+directories_.GetDbDir());
 if (!status.ok() && versions_->io_status().IsIOError()) {
 status = error_handler_.SetBGError(versions_->io_status(),
 BackgroundErrorReason::kManifestWrite);
@@ -3486,6 +3510,7 @@ void DBImpl::MultiGetEntity(const ReadOptions& _read_options, size_t num_keys,
 }
 Status DBImpl::WrapUpCreateColumnFamilies(
+const ReadOptions& read_options, const WriteOptions& write_options,
 const std::vector<const ColumnFamilyOptions*>& cf_options) {
 // NOTE: this function is skipped for create_missing_column_families and
 // DB::Open, so new functionality here might need to go into Open also.
@@ -3498,26 +3523,32 @@ Status DBImpl::WrapUpCreateColumnFamilies(
 }
 }
 // Attempt both follow-up actions even if one fails
-Status s = WriteOptionsFile(false /*db_mutex_already_held*/);
+Status s = WriteOptionsFile(write_options, false /*db_mutex_already_held*/);
 if (register_worker) {
-s.UpdateIfOk(RegisterRecordSeqnoTimeWorker(/*from_db_open=*/false));
+s.UpdateIfOk(RegisterRecordSeqnoTimeWorker(read_options, write_options,
+/* is_new_db */ false));
 }
 return s;
 }
-Status DBImpl::CreateColumnFamily(const ColumnFamilyOptions& cf_options,
+Status DBImpl::CreateColumnFamily(const ReadOptions& read_options,
+const WriteOptions& write_options,
+const ColumnFamilyOptions& cf_options,
 const std::string& column_family,
 ColumnFamilyHandle** handle) {
 assert(handle != nullptr);
 InstrumentedMutexLock ol(&options_mutex_);
-Status s = CreateColumnFamilyImpl(cf_options, column_family, handle);
+Status s = CreateColumnFamilyImpl(read_options, write_options, cf_options,
+column_family, handle);
 if (s.ok()) {
-s.UpdateIfOk(WrapUpCreateColumnFamilies({&cf_options}));
+s.UpdateIfOk(
+WrapUpCreateColumnFamilies(read_options, write_options, {&cf_options}));
 }
 return s;
 }
 Status DBImpl::CreateColumnFamilies(
+const ReadOptions& read_options, const WriteOptions& write_options,
 const ColumnFamilyOptions& cf_options,
 const std::vector<std::string>& column_family_names,
 std::vector<ColumnFamilyHandle*>* handles) {
@@ -3529,7 +3560,8 @@ Status DBImpl::CreateColumnFamilies(
 bool success_once = false;
 for (size_t i = 0; i < num_cf; i++) {
 ColumnFamilyHandle* handle;
-s = CreateColumnFamilyImpl(cf_options, column_family_names[i], &handle);
+s = CreateColumnFamilyImpl(read_options, write_options, cf_options,
+column_family_names[i], &handle);
 if (!s.ok()) {
 break;
 }
@@ -3537,12 +3569,14 @@ Status DBImpl::CreateColumnFamilies(
 success_once = true;
 }
 if (success_once) {
-s.UpdateIfOk(WrapUpCreateColumnFamilies({&cf_options}));
+s.UpdateIfOk(
+WrapUpCreateColumnFamilies(read_options, write_options, {&cf_options}));
 }
 return s;
 }
 Status DBImpl::CreateColumnFamilies(
+const ReadOptions& read_options, const WriteOptions& write_options,
 const std::vector<ColumnFamilyDescriptor>& column_families,
 std::vector<ColumnFamilyHandle*>* handles) {
 assert(handles != nullptr);
@@ -3555,7 +3589,8 @@ Status DBImpl::CreateColumnFamilies(
 cf_opts.reserve(num_cf);
 for (size_t i = 0; i < num_cf; i++) {
 ColumnFamilyHandle* handle;
-s = CreateColumnFamilyImpl(column_families[i].options,
+s = CreateColumnFamilyImpl(read_options, write_options,
+column_families[i].options,
 column_families[i].name, &handle);
 if (!s.ok()) {
 break;
@@ -3565,17 +3600,18 @@ Status DBImpl::CreateColumnFamilies(
 cf_opts.push_back(&column_families[i].options);
 }
 if (success_once) {
-s.UpdateIfOk(WrapUpCreateColumnFamilies(cf_opts));
+s.UpdateIfOk(
+WrapUpCreateColumnFamilies(read_options, write_options, cf_opts));
 }
 return s;
 }
-Status DBImpl::CreateColumnFamilyImpl(const ColumnFamilyOptions& cf_options,
+Status DBImpl::CreateColumnFamilyImpl(const ReadOptions& read_options,
+const WriteOptions& write_options,
+const ColumnFamilyOptions& cf_options,
 const std::string& column_family_name,
 ColumnFamilyHandle** handle) {
 options_mutex_.AssertHeld();
-// TODO: plumb Env::IOActivity
-const ReadOptions read_options;
 Status s;
 *handle = nullptr;
@@ -3619,7 +3655,7 @@ Status DBImpl::CreateColumnFamilyImpl(const ColumnFamilyOptions& cf_options,
 // LogAndApply will both write the creation in MANIFEST and create
 // ColumnFamilyData object
 s = versions_->LogAndApply(nullptr, MutableCFOptions(cf_options),
-read_options, &edit, &mutex_,
+read_options, write_options, &edit, &mutex_,
 directories_.GetDbDir(), false, &cf_options);
 write_thread_.ExitUnbatched(&w);
 }
@@ -3668,7 +3704,8 @@ Status DBImpl::DropColumnFamily(ColumnFamilyHandle* column_family) {
 InstrumentedMutexLock ol(&options_mutex_);
 Status s = DropColumnFamilyImpl(column_family);
 if (s.ok()) {
-s = WriteOptionsFile(false /*db_mutex_already_held*/);
+// TODO: plumb Env::IOActivity, Env::IOPriority
+s = WriteOptionsFile(WriteOptions(), false /*db_mutex_already_held*/);
 }
 return s;
 }
@@ -3686,8 +3723,9 @@ Status DBImpl::DropColumnFamilies(
 success_once = true;
 }
 if (success_once) {
+// TODO: plumb Env::IOActivity, Env::IOPriority
 Status persist_options_status =
-WriteOptionsFile(false /*db_mutex_already_held*/);
+WriteOptionsFile(WriteOptions(), false /*db_mutex_already_held*/);
 if (s.ok() && !persist_options_status.ok()) {
 s = persist_options_status;
 }
@@ -3696,8 +3734,10 @@ Status DBImpl::DropColumnFamilies(
 }
 Status DBImpl::DropColumnFamilyImpl(ColumnFamilyHandle* column_family) {
-// TODO: plumb Env::IOActivity
+// TODO: plumb Env::IOActivity, Env::IOPriority
 const ReadOptions read_options;
+const WriteOptions write_options;
 auto cfh = static_cast_with_check<ColumnFamilyHandleImpl>(column_family);
 auto cfd = cfh->cfd();
 if (cfd->GetID() == 0) {
@@ -3721,7 +3761,7 @@ Status DBImpl::DropColumnFamilyImpl(ColumnFamilyHandle* column_family) {
 WriteThread::Writer w;
 write_thread_.EnterUnbatched(&w, &mutex_);
 s = versions_->LogAndApply(cfd, *cfd->GetLatestMutableCFOptions(),
-read_options, &edit, &mutex_,
+read_options, write_options, &edit, &mutex_,
 directories_.GetDbDir());
 write_thread_.ExitUnbatched(&w);
 }
@@ -3748,7 +3788,8 @@ Status DBImpl::DropColumnFamilyImpl(ColumnFamilyHandle* column_family) {
 if (cfd->ioptions()->preserve_internal_time_seconds > 0 ||
 cfd->ioptions()->preclude_last_level_data_seconds > 0) {
-s = RegisterRecordSeqnoTimeWorker(/*from_db_open=*/false);
+s = RegisterRecordSeqnoTimeWorker(read_options, write_options,
+/* is_new_db */ false);
 }
 if (s.ok()) {
@@ -3779,7 +3820,7 @@ bool DBImpl::KeyMayExist(const ReadOptions& read_options,
 // falsify later if key-may-exist but can't fetch value
 *value_found = true;
 }
-// TODO: plumb Env::IOActivity
+// TODO: plumb Env::IOActivity, Env::IOPriority
 ReadOptions roptions = read_options;
 roptions.read_tier = kBlockCacheTier; // read from block cache only
 PinnableSlice pinnable_val;
@@ -4298,7 +4339,7 @@ Status DBImpl::GetPropertiesOfAllTables(ColumnFamilyHandle* column_family,
 version->Ref();
 mutex_.Unlock();
-// TODO: plumb Env::IOActivity
+// TODO: plumb Env::IOActivity, Env::IOPriority
 const ReadOptions read_options;
 auto s = version->GetPropertiesOfAllTables(read_options, props);
@@ -4322,7 +4363,7 @@ Status DBImpl::GetPropertiesOfTablesInRange(ColumnFamilyHandle* column_family,
 version->Ref();
 mutex_.Unlock();
-// TODO: plumb Env::IOActivity
+// TODO: plumb Env::IOActivity, Env::IOPriority
 const ReadOptions read_options;
 auto s = version->GetPropertiesOfTablesInRange(read_options, range, n, props);
@@ -4664,7 +4705,7 @@ Status DBImpl::GetApproximateSizes(const SizeApproximationOptions& options,
 SuperVersion* sv = GetAndRefSuperVersion(cfd);
 v = sv->current;
-// TODO: plumb Env::IOActivity
+// TODO: plumb Env::IOActivity, Env::IOPriority
 const ReadOptions read_options;
 for (int i = 0; i < n; i++) {
 // Add timestamp if needed
@@ -4728,8 +4769,10 @@ Status DBImpl::GetUpdatesSince(
 }
 Status DBImpl::DeleteFile(std::string name) {
-// TODO: plumb Env::IOActivity
+// TODO: plumb Env::IOActivity, Env::IOPriority
 const ReadOptions read_options;
+const WriteOptions write_options;
 uint64_t number;
 FileType type;
 WalFileType log_type;
@@ -4809,7 +4852,7 @@ Status DBImpl::DeleteFile(std::string name) {
 edit.SetColumnFamily(cfd->GetID());
 edit.DeleteFile(level, number);
 status = versions_->LogAndApply(cfd, *cfd->GetLatestMutableCFOptions(),
-read_options, &edit, &mutex_,
+read_options, write_options, &edit, &mutex_,
 directories_.GetDbDir());
 if (status.ok()) {
 InstallSuperVersionAndScheduleWork(cfd,
@@ -4832,8 +4875,10 @@ Status DBImpl::DeleteFile(std::string name) {
 Status DBImpl::DeleteFilesInRanges(ColumnFamilyHandle* column_family,
 const RangePtr* ranges, size_t n,
 bool include_end) {
-// TODO: plumb Env::IOActivity
+// TODO: plumb Env::IOActivity, Env::IOPriority
 const ReadOptions read_options;
+const WriteOptions write_options;
 Status status = Status::OK();
 auto cfh = static_cast_with_check<ColumnFamilyHandleImpl>(column_family);
 ColumnFamilyData* cfd = cfh->cfd();
@@ -4901,7 +4946,7 @@ Status DBImpl::DeleteFilesInRanges(ColumnFamilyHandle* column_family,
 }
 input_version->Ref();
 status = versions_->LogAndApply(cfd, *cfd->GetLatestMutableCFOptions(),
-read_options, &edit, &mutex_,
+read_options, write_options, &edit, &mutex_,
 directories_.GetDbDir());
 if (status.ok()) {
 InstallSuperVersionAndScheduleWork(cfd,
@@ -5315,7 +5360,8 @@ Status DestroyDB(const std::string& dbname, const Options& options,
 return result;
 }
-Status DBImpl::WriteOptionsFile(bool db_mutex_already_held) {
+Status DBImpl::WriteOptionsFile(const WriteOptions& write_options,
+bool db_mutex_already_held) {
 options_mutex_.AssertHeld();
 if (db_mutex_already_held) {
@@ -5349,8 +5395,8 @@ Status DBImpl::WriteOptionsFile(bool db_mutex_already_held) {
 std::string file_name =
 TempOptionsFileName(GetName(), versions_->NewFileNumber());
-Status s = PersistRocksDBOptions(db_options, cf_names, cf_opts, file_name,
-fs_.get());
+Status s = PersistRocksDBOptions(write_options, db_options, cf_names, cf_opts,
+file_name, fs_.get());
 if (s.ok()) {
 s = RenameTempFileToOptionsFile(file_name);
@@ -5543,7 +5589,7 @@ Status DBImpl::GetLatestSequenceForKey(
 MergeContext merge_context;
 SequenceNumber max_covering_tombstone_seq = 0;
-// TODO: plumb Env::IOActivity
+// TODO: plumb Env::IOActivity, Env::IOPriority
 ReadOptions read_options;
 SequenceNumber current_seq = versions_->LastSequence();
@@ -5699,8 +5745,10 @@ Status DBImpl::IngestExternalFile(
 Status DBImpl::IngestExternalFiles(
 const std::vector<IngestExternalFileArg>& args) {
-// TODO: plumb Env::IOActivity
+// TODO: plumb Env::IOActivity, Env::IOPriority
 const ReadOptions read_options;
+const WriteOptions write_options;
 if (args.empty()) {
 return Status::InvalidArgument("ingestion arg list is empty");
 }
@@ -5918,9 +5966,10 @@ Status DBImpl::IngestExternalFiles(
 }
 assert(0 == num_entries);
 }
-status = versions_->LogAndApply(cfds_to_commit, mutable_cf_options_list,
-read_options, edit_lists, &mutex_,
-directories_.GetDbDir());
+status = versions_->LogAndApply(
+cfds_to_commit, mutable_cf_options_list, read_options, write_options,
+edit_lists, &mutex_, directories_.GetDbDir());
 // It is safe to update VersionSet last seqno here after LogAndApply since
 // LogAndApply persists last sequence number from VersionEdits,
 // which are from file's largest seqno and not from VersionSet.
@@ -6022,8 +6071,10 @@ Status DBImpl::CreateColumnFamilyWithImport(
 ColumnFamilyHandle** handle) {
 assert(handle != nullptr);
 assert(*handle == nullptr);
-// TODO: plumb Env::IOActivity
+// TODO: plumb Env::IOActivity, Env::IOPriority
 const ReadOptions read_options;
+const WriteOptions write_options;
 std::string cf_comparator_name = options.comparator->Name();
 size_t total_file_num = 0;
@@ -6039,7 +6090,8 @@ Status DBImpl::CreateColumnFamilyWithImport(
 }
 // Create column family.
-auto status = CreateColumnFamily(options, column_family_name, handle);
+auto status = CreateColumnFamily(read_options, write_options, options,
+column_family_name, handle);
 if (!status.ok()) {
 return status;
 }
@@ -6075,8 +6127,8 @@ Status DBImpl::CreateColumnFamilyWithImport(
 next_file_number = versions_->FetchAddFileNumber(total_file_num);
 auto cf_options = cfd->GetLatestMutableCFOptions();
 status =
-versions_->LogAndApply(cfd, *cf_options, read_options, &dummy_edit,
-&mutex_, directories_.GetDbDir());
+versions_->LogAndApply(cfd, *cf_options, read_options, write_options,
+&dummy_edit, &mutex_, directories_.GetDbDir());
 if (status.ok()) {
 InstallSuperVersionAndScheduleWork(cfd, &dummy_sv_ctx, *cf_options);
 }
@@ -6113,8 +6165,8 @@ Status DBImpl::CreateColumnFamilyWithImport(
 if (status.ok()) {
 auto cf_options = cfd->GetLatestMutableCFOptions();
 status = versions_->LogAndApply(cfd, *cf_options, read_options,
-import_job.edit(), &mutex_,
-directories_.GetDbDir());
+write_options, import_job.edit(),
+&mutex_, directories_.GetDbDir());
 if (status.ok()) {
 InstallSuperVersionAndScheduleWork(cfd, &sv_context, *cf_options);
 }
@@ -6198,6 +6250,7 @@ Status DBImpl::ClipColumnFamily(ColumnFamilyHandle* column_family,
 empty_after_delete = true;
 } else {
 const Comparator* const ucmp = column_family->GetComparator();
+// TODO: plumb Env::IOActivity, Env::IOPriority
 WriteOptions wo;
 // Delete [smallest_user_key, clip_begin_key)
 if (ucmp->Compare(smallest_user_key, begin_key) < 0) {
@@ -6518,8 +6571,10 @@ Status DBImpl::ReserveFileNumbersBeforeIngestion(
 ColumnFamilyData* cfd, uint64_t num,
 std::unique_ptr<std::list<uint64_t>::iterator>& pending_output_elem,
 uint64_t* next_file_number) {
-// TODO: plumb Env::IOActivity
+// TODO: plumb Env::IOActivity, Env::IOPriority
 const ReadOptions read_options;
+const WriteOptions write_options;
 Status s;
 SuperVersionContext dummy_sv_ctx(true /* create_superversion */);
 assert(nullptr != next_file_number);
@@ -6537,8 +6592,8 @@ Status DBImpl::ReserveFileNumbersBeforeIngestion(
 // reuse the file number that has already assigned to the internal file,
 // and this will overwrite the external file. To protect the external
 // file, we have to make sure the file number will never being reused.
-s = versions_->LogAndApply(cfd, *cf_options, read_options, &dummy_edit,
-&mutex_, directories_.GetDbDir());
+s = versions_->LogAndApply(cfd, *cf_options, read_options, write_options,
+&dummy_edit, &mutex_, directories_.GetDbDir());
 if (s.ok()) {
 InstallSuperVersionAndScheduleWork(cfd, &dummy_sv_ctx, *cf_options);
 }
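
A note on the pattern running through this diff: entry points that have no caller-supplied options yet keep their old signature and pass a default-constructed `WriteOptions` down (marked with `TODO: plumb Env::IOActivity, Env::IOPriority`), while internal functions gain an explicit `write_options` parameter. The following is a minimal, self-contained sketch of that shape, not RocksDB code: `IoActivity`, `WriteOpts`, `ToyDb`, `FlushWal`, and `RunDemo` are hypothetical stand-ins invented for illustration, and the "write" is just an in-memory record of the tag it received.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT RocksDB types.
enum class IoActivity { kUnknown, kFlush, kCompaction, kDbOpen };

struct WriteOpts {
  IoActivity io_activity = IoActivity::kUnknown;
};

class ToyDb {
 public:
  // Legacy entry point: keeps the old signature and forwards default
  // options, so existing callers compile unchanged but stay untagged.
  bool FlushWal(bool sync) { return FlushWal(WriteOpts(), sync); }

  // New entry point: the caller's options travel down to the write.
  bool FlushWal(const WriteOpts& write_opts, bool sync) {
    return WriteBuffer(write_opts, sync);
  }

  // Activity tag seen by each simulated write, in call order.
  const std::vector<IoActivity>& tagged_writes() const {
    return tagged_writes_;
  }

 private:
  // Lowest layer: records the tag instead of doing real IO.
  bool WriteBuffer(const WriteOpts& write_opts, bool /*sync*/) {
    tagged_writes_.push_back(write_opts.io_activity);
    return true;
  }

  std::vector<IoActivity> tagged_writes_;
};

// Small driver: one legacy call and one explicitly tagged call.
inline std::vector<IoActivity> RunDemo() {
  ToyDb db;
  bool ok = db.FlushWal(/*sync=*/false);  // goes through the legacy overload
  assert(ok);
  WriteOpts flush_opts;
  flush_opts.io_activity = IoActivity::kFlush;
  ok = db.FlushWal(flush_opts, /*sync=*/false);  // tag reaches the write
  assert(ok);
  (void)ok;  // silence unused-variable warnings when NDEBUG is defined
  return db.tagged_writes();
}
```

In this sketch the legacy call is recorded as `kUnknown` and the tagged call as `kFlush`, which is the distinction the per-activity histograms would rely on once real callers supply an activity.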

@@ -321,14 +321,41 @@ class DBImpl : public DB {
 virtual Status CreateColumnFamily(const ColumnFamilyOptions& cf_options,
 const std::string& column_family,
-ColumnFamilyHandle** handle) override;
+ColumnFamilyHandle** handle) override {
+// TODO: plumb Env::IOActivity, Env::IOPriority
+return CreateColumnFamily(ReadOptions(), WriteOptions(), cf_options,
+column_family, handle);
+}
+virtual Status CreateColumnFamily(const ReadOptions& read_options,
+const WriteOptions& write_options,
+const ColumnFamilyOptions& cf_options,
+const std::string& column_family,
+ColumnFamilyHandle** handle);
 virtual Status CreateColumnFamilies(
 const ColumnFamilyOptions& cf_options,
 const std::vector<std::string>& column_family_names,
-std::vector<ColumnFamilyHandle*>* handles) override;
+std::vector<ColumnFamilyHandle*>* handles) override {
+// TODO: plumb Env::IOActivity, Env::IOPriority
+return CreateColumnFamilies(ReadOptions(), WriteOptions(), cf_options,
+column_family_names, handles);
+}
+virtual Status CreateColumnFamilies(
+const ReadOptions& read_options, const WriteOptions& write_options,
+const ColumnFamilyOptions& cf_options,
+const std::vector<std::string>& column_family_names,
+std::vector<ColumnFamilyHandle*>* handles);
 virtual Status CreateColumnFamilies(
 const std::vector<ColumnFamilyDescriptor>& column_families,
-std::vector<ColumnFamilyHandle*>* handles) override;
+std::vector<ColumnFamilyHandle*>* handles) override {
+// TODO: plumb Env::IOActivity, Env::IOPriority
+return CreateColumnFamilies(ReadOptions(), WriteOptions(), column_families,
+handles);
+}
+virtual Status CreateColumnFamilies(
+const ReadOptions& read_options, const WriteOptions& write_options,
+const std::vector<ColumnFamilyDescriptor>& column_families,
+std::vector<ColumnFamilyHandle*>* handles);
 virtual Status DropColumnFamily(ColumnFamilyHandle* column_family) override;
 virtual Status DropColumnFamilies(
 const std::vector<ColumnFamilyHandle*>& column_families) override;
@@ -440,7 +467,12 @@ class DBImpl : public DB {
 virtual Status Flush(
 const FlushOptions& options,
 const std::vector<ColumnFamilyHandle*>& column_families) override;
-virtual Status FlushWAL(bool sync) override;
+virtual Status FlushWAL(bool sync) override {
+// TODO: plumb Env::IOActivity, Env::IOPriority
+return FlushWAL(WriteOptions(), sync);
+}
+virtual Status FlushWAL(const WriteOptions& write_options, bool sync);
 bool WALBufferIsEmpty();
 virtual Status SyncWAL() override;
 virtual Status LockWAL() override;
@@ -1406,7 +1438,8 @@ class DBImpl : public DB {
 // Persist options to options file. Must be holding options_mutex_.
 // Will lock DB mutex if !db_mutex_already_held.
-Status WriteOptionsFile(bool db_mutex_already_held);
+Status WriteOptionsFile(const WriteOptions& write_options,
+bool db_mutex_already_held);
 Status CompactRangeInternal(const CompactRangeOptions& options,
 ColumnFamilyHandle* column_family,
@@ -1532,7 +1565,8 @@ class DBImpl : public DB {
 virtual bool OwnTablesAndLogs() const { return true; }
 // Setup DB identity file, and write DB ID to manifest if necessary.
-Status SetupDBId(bool read_only, RecoveryContext* recovery_ctx);
+Status SetupDBId(const WriteOptions& write_options, bool read_only,
+RecoveryContext* recovery_ctx);
 // Assign db_id_ and write DB ID to manifest if necessary.
 void SetDBId(std::string&& id, bool read_only, RecoveryContext* recovery_ctx);
@@ -1659,7 +1693,8 @@ class DBImpl : public DB {
 return w;
 }
 Status ClearWriter() {
-Status s = writer->WriteBuffer();
+// TODO: plumb Env::IOActivity, Env::IOPriority
+Status s = writer->WriteBuffer(WriteOptions());
 delete writer;
 writer = nullptr;
 return s;
@ -1835,12 +1870,15 @@ class DBImpl : public DB {
const Status CreateArchivalDirectory(); const Status CreateArchivalDirectory();
// Create a column family, without some of the follow-up work yet // Create a column family, without some of the follow-up work yet
Status CreateColumnFamilyImpl(const ColumnFamilyOptions& cf_options, Status CreateColumnFamilyImpl(const ReadOptions& read_options,
const WriteOptions& write_options,
const ColumnFamilyOptions& cf_options,
const std::string& cf_name, const std::string& cf_name,
ColumnFamilyHandle** handle); ColumnFamilyHandle** handle);
// Follow-up work to user creating a column family or (families) // Follow-up work to user creating a column family or (families)
Status WrapUpCreateColumnFamilies( Status WrapUpCreateColumnFamilies(
const ReadOptions& read_options, const WriteOptions& write_options,
const std::vector<const ColumnFamilyOptions*>& cf_options); const std::vector<const ColumnFamilyOptions*>& cf_options);
Status DropColumnFamilyImpl(ColumnFamilyHandle* column_family); Status DropColumnFamilyImpl(ColumnFamilyHandle* column_family);
@ -1872,7 +1910,8 @@ class DBImpl : public DB {
void ReleaseFileNumberFromPendingOutputs( void ReleaseFileNumberFromPendingOutputs(
std::unique_ptr<std::list<uint64_t>::iterator>& v); std::unique_ptr<std::list<uint64_t>::iterator>& v);
IOStatus SyncClosedLogs(JobContext* job_context, VersionEdit* synced_wals, IOStatus SyncClosedLogs(const WriteOptions& write_options,
JobContext* job_context, VersionEdit* synced_wals,
bool error_recovery_in_prog); bool error_recovery_in_prog);
// Flush the in-memory write buffer to storage. Switches to a new // Flush the in-memory write buffer to storage. Switches to a new
@ -2058,12 +2097,10 @@ class DBImpl : public DB {
WriteBatch* tmp_batch, WriteBatch** merged_batch, WriteBatch* tmp_batch, WriteBatch** merged_batch,
size_t* write_with_wal, WriteBatch** to_be_cached_state); size_t* write_with_wal, WriteBatch** to_be_cached_state);
// rate_limiter_priority is used to charge `DBOptions::rate_limiter` IOStatus WriteToWAL(const WriteBatch& merged_batch,
// for automatic WAL flush (`Options::manual_wal_flush` == false) const WriteOptions& write_options,
// associated with this WriteToWAL log::Writer* log_writer, uint64_t* log_used,
IOStatus WriteToWAL(const WriteBatch& merged_batch, log::Writer* log_writer, uint64_t* log_size,
uint64_t* log_used, uint64_t* log_size,
Env::IOPriority rate_limiter_priority,
LogFileNumberSize& log_file_number_size); LogFileNumberSize& log_file_number_size);
IOStatus WriteToWAL(const WriteThread::WriteGroup& write_group, IOStatus WriteToWAL(const WriteThread::WriteGroup& write_group,
@ -2175,7 +2212,9 @@ class DBImpl : public DB {
// Cancel scheduled periodic tasks // Cancel scheduled periodic tasks
Status CancelPeriodicTaskScheduler(); Status CancelPeriodicTaskScheduler();
Status RegisterRecordSeqnoTimeWorker(bool is_new_db); Status RegisterRecordSeqnoTimeWorker(const ReadOptions& read_options,
const WriteOptions& write_options,
bool is_new_db);
void PrintStatistics(); void PrintStatistics();
@ -2203,7 +2242,9 @@ class DBImpl : public DB {
// helper function to call after some of the logs_ were synced // helper function to call after some of the logs_ were synced
void MarkLogsSynced(uint64_t up_to, bool synced_dir, VersionEdit* edit); void MarkLogsSynced(uint64_t up_to, bool synced_dir, VersionEdit* edit);
Status ApplyWALToManifest(const ReadOptions& read_options, VersionEdit* edit); Status ApplyWALToManifest(const ReadOptions& read_options,
const WriteOptions& write_options,
VersionEdit* edit);
// WALs with log number up to up_to are not synced successfully. // WALs with log number up to up_to are not synced successfully.
void MarkLogsNotSynced(uint64_t up_to); void MarkLogsNotSynced(uint64_t up_to);
@ -2275,8 +2316,9 @@ class DBImpl : public DB {
size_t GetWalPreallocateBlockSize(uint64_t write_buffer_size) const; size_t GetWalPreallocateBlockSize(uint64_t write_buffer_size) const;
Env::WriteLifeTimeHint CalculateWALWriteHint() { return Env::WLTH_SHORT; } Env::WriteLifeTimeHint CalculateWALWriteHint() { return Env::WLTH_SHORT; }
IOStatus CreateWAL(uint64_t log_file_num, uint64_t recycle_log_number, IOStatus CreateWAL(const WriteOptions& write_options, uint64_t log_file_num,
size_t preallocate_block_size, log::Writer** new_log); uint64_t recycle_log_number, size_t preallocate_block_size,
log::Writer** new_log);
// Validate self-consistency of DB options // Validate self-consistency of DB options
static Status ValidateOptions(const DBOptions& db_options); static Status ValidateOptions(const DBOptions& db_options);
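
Reviewer sketch (not part of the patch): the header changes above keep the legacy `FlushWAL(bool)` entry point and forward it to a new overload that takes write options, so internal callers can tag the IO while external callers stay source compatible. The snippet below illustrates that forwarding pattern with stand-in types. Every `Sketch*` name, the enum values, and the defaults are hypothetical and only loosely modeled on `WriteOptions`, `Env::IOActivity`, and `Env::IOPriority`.

```cpp
#include <cassert>

// Hypothetical stand-ins; these are NOT the RocksDB types.
enum class SketchIOActivity { kFlush, kCompaction, kDBOpen, kUnknown };
enum class SketchIOPriority { kLow, kHigh, kTotal };

struct SketchWriteOptions {
  SketchIOPriority rate_limiter_priority;
  SketchIOActivity io_activity;

  // Untagged default, used where the activity has not been plumbed yet.
  SketchWriteOptions()
      : rate_limiter_priority(SketchIOPriority::kTotal),
        io_activity(SketchIOActivity::kUnknown) {}

  // Convenience constructor: tag the activity only.
  explicit SketchWriteOptions(SketchIOActivity activity)
      : rate_limiter_priority(SketchIOPriority::kTotal),
        io_activity(activity) {}

  // Convenience constructor: set both priority and activity.
  SketchWriteOptions(SketchIOPriority priority, SketchIOActivity activity)
      : rate_limiter_priority(priority), io_activity(activity) {}
};

class SketchDB {
 public:
  virtual ~SketchDB() = default;

  // Legacy entry point: forwards with untagged options.
  virtual bool FlushWAL(bool sync) {
    return FlushWAL(SketchWriteOptions(), sync);
  }

  // New overload: the caller decides how the IO is tagged.
  virtual bool FlushWAL(const SketchWriteOptions& write_options, bool sync) {
    last_activity_ = write_options.io_activity;
    last_sync_ = sync;
    return true;
  }

  SketchIOActivity last_activity() const { return last_activity_; }
  bool last_sync() const { return last_sync_; }

 private:
  SketchIOActivity last_activity_ = SketchIOActivity::kUnknown;
  bool last_sync_ = false;
};
```

In this sketch, a call such as `db.FlushWAL(SketchWriteOptions(SketchIOActivity::kDBOpen), false)` records the DB-open tag, while `db.FlushWAL(false)` falls back to the untagged default, which mirrors the `TODO: plumb Env::IOActivity, Env::IOPriority` comments in the diff.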


@ -19,6 +19,10 @@
#include "monitoring/perf_context_imp.h" #include "monitoring/perf_context_imp.h"
#include "monitoring/thread_status_updater.h" #include "monitoring/thread_status_updater.h"
#include "monitoring/thread_status_util.h" #include "monitoring/thread_status_util.h"
#include "rocksdb/file_system.h"
#include "rocksdb/io_status.h"
#include "rocksdb/options.h"
#include "rocksdb/table.h"
#include "test_util/sync_point.h" #include "test_util/sync_point.h"
#include "util/cast_util.h" #include "util/cast_util.h"
#include "util/coding.h" #include "util/coding.h"
@ -112,7 +116,8 @@ bool DBImpl::ShouldRescheduleFlushRequestToRetainUDT(
return true; return true;
} }
IOStatus DBImpl::SyncClosedLogs(JobContext* job_context, IOStatus DBImpl::SyncClosedLogs(const WriteOptions& write_options,
JobContext* job_context,
VersionEdit* synced_wals, VersionEdit* synced_wals,
bool error_recovery_in_prog) { bool error_recovery_in_prog) {
TEST_SYNC_POINT("DBImpl::SyncClosedLogs:Start"); TEST_SYNC_POINT("DBImpl::SyncClosedLogs:Start");
@ -143,7 +148,13 @@ IOStatus DBImpl::SyncClosedLogs(JobContext* job_context,
if (error_recovery_in_prog) { if (error_recovery_in_prog) {
log->file()->reset_seen_error(); log->file()->reset_seen_error();
} }
io_s = log->file()->Sync(immutable_db_options_.use_fsync);
IOOptions io_options;
io_s = WritableFileWriter::PrepareIOOptions(write_options, io_options);
if (!io_s.ok()) {
break;
}
io_s = log->file()->Sync(io_options, immutable_db_options_.use_fsync);
if (!io_s.ok()) { if (!io_s.ok()) {
break; break;
} }
@ -152,17 +163,22 @@ IOStatus DBImpl::SyncClosedLogs(JobContext* job_context,
if (error_recovery_in_prog) { if (error_recovery_in_prog) {
log->file()->reset_seen_error(); log->file()->reset_seen_error();
} }
io_s = log->Close(); // TODO: plumb Env::IOActivity, Env::IOPriority
io_s = log->Close(WriteOptions());
if (!io_s.ok()) { if (!io_s.ok()) {
break; break;
} }
} }
} }
if (io_s.ok()) {
IOOptions io_options;
io_s = WritableFileWriter::PrepareIOOptions(write_options, io_options);
if (io_s.ok()) { if (io_s.ok()) {
io_s = directories_.GetWalDir()->FsyncWithDirOptions( io_s = directories_.GetWalDir()->FsyncWithDirOptions(
IOOptions(), nullptr, io_options, nullptr,
DirFsyncOptions(DirFsyncOptions::FsyncReason::kNewFileSynced)); DirFsyncOptions(DirFsyncOptions::FsyncReason::kNewFileSynced));
} }
}
TEST_SYNC_POINT_CALLBACK("DBImpl::SyncClosedLogs:BeforeReLock", TEST_SYNC_POINT_CALLBACK("DBImpl::SyncClosedLogs:BeforeReLock",
/*arg=*/nullptr); /*arg=*/nullptr);
@ -199,6 +215,8 @@ Status DBImpl::FlushMemTableToOutputFile(
assert(cfd->imm()->IsFlushPending()); assert(cfd->imm()->IsFlushPending());
assert(versions_); assert(versions_);
assert(versions_->GetColumnFamilySet()); assert(versions_->GetColumnFamilySet());
const ReadOptions read_options(Env::IOActivity::kFlush);
const WriteOptions write_options(Env::IOActivity::kFlush);
// If there are more than one column families, we need to make sure that // If there are more than one column families, we need to make sure that
// all the log files except the most recent one are synced. Otherwise if // all the log files except the most recent one are synced. Otherwise if
// the host crashes after flushing and before WAL is persistent, the // the host crashes after flushing and before WAL is persistent, the
@ -265,13 +283,12 @@ Status DBImpl::FlushMemTableToOutputFile(
VersionEdit synced_wals; VersionEdit synced_wals;
bool error_recovery_in_prog = error_handler_.IsRecoveryInProgress(); bool error_recovery_in_prog = error_handler_.IsRecoveryInProgress();
mutex_.Unlock(); mutex_.Unlock();
log_io_s = log_io_s = SyncClosedLogs(write_options, job_context, &synced_wals,
SyncClosedLogs(job_context, &synced_wals, error_recovery_in_prog); error_recovery_in_prog);
mutex_.Lock(); mutex_.Lock();
if (log_io_s.ok() && synced_wals.IsWalAddition()) { if (log_io_s.ok() && synced_wals.IsWalAddition()) {
const ReadOptions read_options(Env::IOActivity::kFlush); log_io_s = status_to_io_status(
log_io_s = ApplyWALToManifest(read_options, write_options, &synced_wals));
status_to_io_status(ApplyWALToManifest(read_options, &synced_wals));
TEST_SYNC_POINT_CALLBACK("DBImpl::FlushMemTableToOutputFile:CommitWal:1", TEST_SYNC_POINT_CALLBACK("DBImpl::FlushMemTableToOutputFile:CommitWal:1",
nullptr); nullptr);
} }
@ -465,6 +482,8 @@ Status DBImpl::AtomicFlushMemTablesToOutputFiles(
const autovector<BGFlushArg>& bg_flush_args, bool* made_progress, const autovector<BGFlushArg>& bg_flush_args, bool* made_progress,
JobContext* job_context, LogBuffer* log_buffer, Env::Priority thread_pri) { JobContext* job_context, LogBuffer* log_buffer, Env::Priority thread_pri) {
mutex_.AssertHeld(); mutex_.AssertHeld();
const ReadOptions read_options(Env::IOActivity::kFlush);
const WriteOptions write_options(Env::IOActivity::kFlush);
autovector<ColumnFamilyData*> cfds; autovector<ColumnFamilyData*> cfds;
for (const auto& arg : bg_flush_args) { for (const auto& arg : bg_flush_args) {
@ -552,13 +571,12 @@ Status DBImpl::AtomicFlushMemTablesToOutputFiles(
VersionEdit synced_wals; VersionEdit synced_wals;
bool error_recovery_in_prog = error_handler_.IsRecoveryInProgress(); bool error_recovery_in_prog = error_handler_.IsRecoveryInProgress();
mutex_.Unlock(); mutex_.Unlock();
log_io_s = log_io_s = SyncClosedLogs(write_options, job_context, &synced_wals,
SyncClosedLogs(job_context, &synced_wals, error_recovery_in_prog); error_recovery_in_prog);
mutex_.Lock(); mutex_.Lock();
if (log_io_s.ok() && synced_wals.IsWalAddition()) { if (log_io_s.ok() && synced_wals.IsWalAddition()) {
const ReadOptions read_options(Env::IOActivity::kFlush); log_io_s = status_to_io_status(
log_io_s = ApplyWALToManifest(read_options, write_options, &synced_wals));
status_to_io_status(ApplyWALToManifest(read_options, &synced_wals));
} }
if (!log_io_s.ok() && !log_io_s.IsShutdownInProgress() && if (!log_io_s.ok() && !log_io_s.IsShutdownInProgress() &&
@ -653,9 +671,14 @@ Status DBImpl::AtomicFlushMemTablesToOutputFiles(
// Sync on all distinct output directories. // Sync on all distinct output directories.
for (auto dir : distinct_output_dirs) { for (auto dir : distinct_output_dirs) {
if (dir != nullptr) { if (dir != nullptr) {
Status error_status = dir->FsyncWithDirOptions( IOOptions io_options;
IOOptions(), nullptr, Status error_status =
WritableFileWriter::PrepareIOOptions(write_options, io_options);
if (error_status.ok()) {
error_status = dir->FsyncWithDirOptions(
io_options, nullptr,
DirFsyncOptions(DirFsyncOptions::FsyncReason::kNewFileSynced)); DirFsyncOptions(DirFsyncOptions::FsyncReason::kNewFileSynced));
}
if (!error_status.ok()) { if (!error_status.ok()) {
s = error_status; s = error_status;
break; break;
@ -1049,8 +1072,10 @@ Status DBImpl::IncreaseFullHistoryTsLowImpl(ColumnFamilyData* cfd,
edit.SetColumnFamily(cfd->GetID()); edit.SetColumnFamily(cfd->GetID());
edit.SetFullHistoryTsLow(ts_low); edit.SetFullHistoryTsLow(ts_low);
// TODO: plumb Env::IOActivity // TODO: plumb Env::IOActivity, Env::IOPriority
const ReadOptions read_options; const ReadOptions read_options;
const WriteOptions write_options;
TEST_SYNC_POINT_CALLBACK("DBImpl::IncreaseFullHistoryTsLowImpl:BeforeEdit", TEST_SYNC_POINT_CALLBACK("DBImpl::IncreaseFullHistoryTsLowImpl:BeforeEdit",
&edit); &edit);
@ -1064,7 +1089,7 @@ Status DBImpl::IncreaseFullHistoryTsLowImpl(ColumnFamilyData* cfd,
} }
Status s = versions_->LogAndApply(cfd, *cfd->GetLatestMutableCFOptions(), Status s = versions_->LogAndApply(cfd, *cfd->GetLatestMutableCFOptions(),
read_options, &edit, &mutex_, read_options, write_options, &edit, &mutex_,
directories_.GetDbDir()); directories_.GetDbDir());
if (!s.ok()) { if (!s.ok()) {
return s; return s;
@ -1754,6 +1779,7 @@ Status DBImpl::ReFitLevel(ColumnFamilyData* cfd, int level, int target_level) {
} }
const ReadOptions read_options(Env::IOActivity::kCompaction); const ReadOptions read_options(Env::IOActivity::kCompaction);
const WriteOptions write_options(Env::IOActivity::kCompaction);
SuperVersionContext sv_context(/* create_superversion */ true); SuperVersionContext sv_context(/* create_superversion */ true);
@ -1870,8 +1896,8 @@ Status DBImpl::ReFitLevel(ColumnFamilyData* cfd, int level, int target_level) {
"[%s] Apply version edit:\n%s", cfd->GetName().c_str(), "[%s] Apply version edit:\n%s", cfd->GetName().c_str(),
edit.DebugString().data()); edit.DebugString().data());
Status status = Status status = versions_->LogAndApply(cfd, mutable_cf_options,
versions_->LogAndApply(cfd, mutable_cf_options, read_options, &edit, read_options, write_options, &edit,
&mutex_, directories_.GetDbDir()); &mutex_, directories_.GetDbDir());
cfd->compaction_picker()->UnregisterCompaction(c.get()); cfd->compaction_picker()->UnregisterCompaction(c.get());
@ -3480,6 +3506,7 @@ Status DBImpl::BackgroundCompaction(bool* made_progress,
TEST_SYNC_POINT("DBImpl::BackgroundCompaction:Start"); TEST_SYNC_POINT("DBImpl::BackgroundCompaction:Start");
const ReadOptions read_options(Env::IOActivity::kCompaction); const ReadOptions read_options(Env::IOActivity::kCompaction);
const WriteOptions write_options(Env::IOActivity::kCompaction);
bool is_manual = (manual_compaction != nullptr); bool is_manual = (manual_compaction != nullptr);
std::unique_ptr<Compaction> c; std::unique_ptr<Compaction> c;
@ -3692,7 +3719,7 @@ Status DBImpl::BackgroundCompaction(bool* made_progress,
} }
status = versions_->LogAndApply( status = versions_->LogAndApply(
c->column_family_data(), *c->mutable_cf_options(), read_options, c->column_family_data(), *c->mutable_cf_options(), read_options,
c->edit(), &mutex_, directories_.GetDbDir(), write_options, c->edit(), &mutex_, directories_.GetDbDir(),
/*new_descriptor_log=*/false, /*column_family_options=*/nullptr, /*new_descriptor_log=*/false, /*column_family_options=*/nullptr,
[&c, &compaction_released](const Status& s) { [&c, &compaction_released](const Status& s) {
c->ReleaseCompactionFiles(s); c->ReleaseCompactionFiles(s);
@ -3766,7 +3793,7 @@ Status DBImpl::BackgroundCompaction(bool* made_progress,
} }
status = versions_->LogAndApply( status = versions_->LogAndApply(
c->column_family_data(), *c->mutable_cf_options(), read_options, c->column_family_data(), *c->mutable_cf_options(), read_options,
c->edit(), &mutex_, directories_.GetDbDir(), write_options, c->edit(), &mutex_, directories_.GetDbDir(),
/*new_descriptor_log=*/false, /*column_family_options=*/nullptr, /*new_descriptor_log=*/false, /*column_family_options=*/nullptr,
[&c, &compaction_released](const Status& s) { [&c, &compaction_released](const Status& s) {
c->ReleaseCompactionFiles(s); c->ReleaseCompactionFiles(s);
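
Reviewer sketch (not part of the patch): the `SyncClosedLogs` hunks in this file repeat one pattern. They derive IO options from the caller's write options, stop on the first failure, and only then issue the sync with the tagged options. The snippet below models that control flow with stand-in types. All `Demo*` names are hypothetical, and `DemoPrepareIOOptions` only imitates the role that `WritableFileWriter::PrepareIOOptions` plays in the diff; it is not the real function.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins; these are NOT the RocksDB types.
enum class DemoActivity { kFlush, kCompaction, kDBOpen, kUnknown };

struct DemoWriteOptions {
  DemoActivity io_activity = DemoActivity::kUnknown;
};

struct DemoIOOptions {
  DemoActivity io_activity = DemoActivity::kUnknown;
};

class DemoStatus {
 public:
  DemoStatus() : ok_(true) {}
  explicit DemoStatus(std::string msg) : ok_(false), msg_(std::move(msg)) {}
  bool ok() const { return ok_; }
  const std::string& message() const { return msg_; }

 private:
  bool ok_;
  std::string msg_;
};

// Copies the activity tag from the write options into the IO options.
DemoStatus DemoPrepareIOOptions(const DemoWriteOptions& write_options,
                                DemoIOOptions& io_options) {
  io_options.io_activity = write_options.io_activity;
  return DemoStatus();
}

struct DemoLogFile {
  bool fail_on_sync = false;  // test hook to simulate an IO error
  bool synced = false;
  DemoActivity last_activity = DemoActivity::kUnknown;

  DemoStatus Sync(const DemoIOOptions& io_options) {
    last_activity = io_options.io_activity;
    if (fail_on_sync) {
      return DemoStatus("sync failed");
    }
    synced = true;
    return DemoStatus();
  }
};

// Syncs each log in order and stops at the first error, like the loop above.
DemoStatus DemoSyncLogs(const DemoWriteOptions& write_options,
                        std::vector<DemoLogFile>& logs) {
  DemoStatus s;
  for (auto& log : logs) {
    DemoIOOptions io_options;
    s = DemoPrepareIOOptions(write_options, io_options);
    if (!s.ok()) {
      break;
    }
    s = log.Sync(io_options);
    if (!s.ok()) {
      break;
    }
  }
  return s;
}
```

With three logs where the second is set to fail, `DemoSyncLogs` returns a non-OK status, the first log is synced with the caller's activity tag, and the third log is never touched.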


@ -61,8 +61,10 @@ Status DBImpl::PromoteL0(ColumnFamilyHandle* column_family, int target_level) {
"PromoteL0 FAILED. Invalid target level %d\n", target_level); "PromoteL0 FAILED. Invalid target level %d\n", target_level);
return Status::InvalidArgument("Invalid target level"); return Status::InvalidArgument("Invalid target level");
} }
// TODO: plumb Env::IOActivity // TODO: plumb Env::IOActivity, Env::IOPriority
const ReadOptions read_options; const ReadOptions read_options;
const WriteOptions write_options;
Status status; Status status;
VersionEdit edit; VersionEdit edit;
JobContext job_context(next_job_id_.fetch_add(1), true); JobContext job_context(next_job_id_.fetch_add(1), true);
@ -143,7 +145,7 @@ Status DBImpl::PromoteL0(ColumnFamilyHandle* column_family, int target_level) {
} }
status = versions_->LogAndApply(cfd, *cfd->GetLatestMutableCFOptions(), status = versions_->LogAndApply(cfd, *cfd->GetLatestMutableCFOptions(),
read_options, &edit, &mutex_, read_options, write_options, &edit, &mutex_,
directories_.GetDbDir()); directories_.GetDbDir());
if (status.ok()) { if (status.ok()) {
InstallSuperVersionAndScheduleWork(cfd, InstallSuperVersionAndScheduleWork(cfd,


@ -18,6 +18,7 @@
#include "file/sst_file_manager_impl.h" #include "file/sst_file_manager_impl.h"
#include "logging/logging.h" #include "logging/logging.h"
#include "port/port.h" #include "port/port.h"
#include "rocksdb/options.h"
#include "util/autovector.h" #include "util/autovector.h"
#include "util/defer.h" #include "util/defer.h"
@ -510,7 +511,8 @@ void DBImpl::PurgeObsoleteFiles(JobContext& state, bool schedule_only) {
// Close WALs before trying to delete them. // Close WALs before trying to delete them.
for (const auto w : state.logs_to_free) { for (const auto w : state.logs_to_free) {
// TODO: maybe check the return value of Close. // TODO: maybe check the return value of Close.
auto s = w->Close(); // TODO: plumb Env::IOActivity, Env::IOPriority
auto s = w->Close(WriteOptions());
s.PermitUncheckedError(); s.PermitUncheckedError();
} }
@ -925,7 +927,8 @@ void DBImpl::SetDBId(std::string&& id, bool read_only,
} }
} }
Status DBImpl::SetupDBId(bool read_only, RecoveryContext* recovery_ctx) { Status DBImpl::SetupDBId(const WriteOptions& write_options, bool read_only,
RecoveryContext* recovery_ctx) {
Status s; Status s;
// Check for the IDENTITY file and create it if not there or // Check for the IDENTITY file and create it if not there or
// broken or not matching manifest // broken or not matching manifest
@ -958,7 +961,7 @@ Status DBImpl::SetupDBId(bool read_only, RecoveryContext* recovery_ctx) {
} }
// Persist it to IDENTITY file if allowed // Persist it to IDENTITY file if allowed
if (!read_only) { if (!read_only) {
s = SetIdentityFile(env_, dbname_, db_id_); s = SetIdentityFile(write_options, env_, dbname_, db_id_);
} }
return s; return s;
} }


@ -21,6 +21,7 @@
#include "monitoring/persistent_stats_history.h" #include "monitoring/persistent_stats_history.h"
#include "monitoring/thread_status_util.h" #include "monitoring/thread_status_util.h"
#include "options/options_helper.h" #include "options/options_helper.h"
#include "rocksdb/options.h"
#include "rocksdb/table.h" #include "rocksdb/table.h"
#include "rocksdb/wal_filter.h" #include "rocksdb/wal_filter.h"
#include "test_util/sync_point.h" #include "test_util/sync_point.h"
@ -309,7 +310,8 @@ Status DBImpl::ValidateOptions(const DBOptions& db_options) {
Status DBImpl::NewDB(std::vector<std::string>* new_filenames) { Status DBImpl::NewDB(std::vector<std::string>* new_filenames) {
VersionEdit new_db; VersionEdit new_db;
Status s = SetIdentityFile(env_, dbname_); const WriteOptions write_options(Env::IOActivity::kDBOpen);
Status s = SetIdentityFile(write_options, env_, dbname_);
if (!s.ok()) { if (!s.ok()) {
return s; return s;
} }
@ -339,20 +341,23 @@ Status DBImpl::NewDB(std::vector<std::string>* new_filenames) {
immutable_db_options_.manifest_preallocation_size); immutable_db_options_.manifest_preallocation_size);
std::unique_ptr<WritableFileWriter> file_writer(new WritableFileWriter( std::unique_ptr<WritableFileWriter> file_writer(new WritableFileWriter(
std::move(file), manifest, file_options, immutable_db_options_.clock, std::move(file), manifest, file_options, immutable_db_options_.clock,
io_tracer_, nullptr /* stats */, immutable_db_options_.listeners, io_tracer_, nullptr /* stats */,
nullptr, tmp_set.Contains(FileType::kDescriptorFile), Histograms::HISTOGRAM_ENUM_MAX /* hist_type */,
immutable_db_options_.listeners, nullptr,
tmp_set.Contains(FileType::kDescriptorFile),
tmp_set.Contains(FileType::kDescriptorFile))); tmp_set.Contains(FileType::kDescriptorFile)));
log::Writer log(std::move(file_writer), 0, false); log::Writer log(std::move(file_writer), 0, false);
std::string record; std::string record;
new_db.EncodeTo(&record); new_db.EncodeTo(&record);
s = log.AddRecord(record); s = log.AddRecord(write_options, record);
if (s.ok()) { if (s.ok()) {
s = SyncManifest(&immutable_db_options_, log.file()); s = SyncManifest(&immutable_db_options_, write_options, log.file());
} }
} }
if (s.ok()) { if (s.ok()) {
// Make "CURRENT" file that points to the new manifest file. // Make "CURRENT" file that points to the new manifest file.
s = SetCurrentFile(fs_.get(), dbname_, 1, directories_.GetDbDir()); s = SetCurrentFile(write_options, fs_.get(), dbname_, 1,
directories_.GetDbDir());
if (new_filenames) { if (new_filenames) {
new_filenames->emplace_back( new_filenames->emplace_back(
manifest.substr(manifest.find_last_of("/\\") + 1)); manifest.substr(manifest.find_last_of("/\\") + 1));
@ -418,6 +423,7 @@ Status DBImpl::Recover(
uint64_t* recovered_seq, RecoveryContext* recovery_ctx) { uint64_t* recovered_seq, RecoveryContext* recovery_ctx) {
mutex_.AssertHeld(); mutex_.AssertHeld();
const WriteOptions write_options(Env::IOActivity::kDBOpen);
bool tmp_is_new_db = false; bool tmp_is_new_db = false;
bool& is_new_db = recovery_ctx ? recovery_ctx->is_new_db_ : tmp_is_new_db; bool& is_new_db = recovery_ctx ? recovery_ctx->is_new_db_ : tmp_is_new_db;
assert(db_lock_ == nullptr); assert(db_lock_ == nullptr);
@ -642,7 +648,7 @@ Status DBImpl::Recover(
} }
} }
} }
s = SetupDBId(read_only, recovery_ctx); s = SetupDBId(write_options, read_only, recovery_ctx);
ROCKS_LOG_INFO(immutable_db_options_.info_log, "DB ID: %s\n", db_id_.c_str()); ROCKS_LOG_INFO(immutable_db_options_.info_log, "DB ID: %s\n", db_id_.c_str());
if (s.ok() && !read_only) { if (s.ok() && !read_only) {
s = DeleteUnreferencedSstFiles(recovery_ctx); s = DeleteUnreferencedSstFiles(recovery_ctx);
@ -872,8 +878,9 @@ Status DBImpl::PersistentStatsProcessFormatVersion() {
if (s.ok()) { if (s.ok()) {
ColumnFamilyOptions cfo; ColumnFamilyOptions cfo;
OptimizeForPersistentStats(&cfo); OptimizeForPersistentStats(&cfo);
s = CreateColumnFamilyImpl(cfo, kPersistentStatsColumnFamilyName, s = CreateColumnFamilyImpl(ReadOptions(Env::IOActivity::kDBOpen),
&handle); WriteOptions(Env::IOActivity::kDBOpen), cfo,
kPersistentStatsColumnFamilyName, &handle);
} }
if (s.ok()) { if (s.ok()) {
persist_stats_cf_handle_ = static_cast<ColumnFamilyHandleImpl*>(handle); persist_stats_cf_handle_ = static_cast<ColumnFamilyHandleImpl*>(handle);
@ -895,6 +902,7 @@ Status DBImpl::PersistentStatsProcessFormatVersion() {
std::to_string(kStatsCFCompatibleFormatVersion)); std::to_string(kStatsCFCompatibleFormatVersion));
} }
if (s.ok()) { if (s.ok()) {
// TODO: plumb Env::IOActivity, Env::IOPriority
WriteOptions wo; WriteOptions wo;
wo.low_pri = true; wo.low_pri = true;
wo.no_slowdown = true; wo.no_slowdown = true;
@ -926,7 +934,9 @@ Status DBImpl::InitPersistStatsColumnFamily() {
ColumnFamilyHandle* handle = nullptr; ColumnFamilyHandle* handle = nullptr;
ColumnFamilyOptions cfo; ColumnFamilyOptions cfo;
OptimizeForPersistentStats(&cfo); OptimizeForPersistentStats(&cfo);
s = CreateColumnFamilyImpl(cfo, kPersistentStatsColumnFamilyName, &handle); s = CreateColumnFamilyImpl(ReadOptions(Env::IOActivity::kDBOpen),
WriteOptions(Env::IOActivity::kDBOpen), cfo,
kPersistentStatsColumnFamilyName, &handle);
persist_stats_cf_handle_ = static_cast<ColumnFamilyHandleImpl*>(handle); persist_stats_cf_handle_ = static_cast<ColumnFamilyHandleImpl*>(handle);
mutex_.Lock(); mutex_.Lock();
} }
@ -937,9 +947,12 @@ Status DBImpl::LogAndApplyForRecovery(const RecoveryContext& recovery_ctx) {
mutex_.AssertHeld(); mutex_.AssertHeld();
assert(versions_->descriptor_log_ == nullptr); assert(versions_->descriptor_log_ == nullptr);
const ReadOptions read_options(Env::IOActivity::kDBOpen); const ReadOptions read_options(Env::IOActivity::kDBOpen);
Status s = versions_->LogAndApply( const WriteOptions write_options(Env::IOActivity::kDBOpen);
recovery_ctx.cfds_, recovery_ctx.mutable_cf_opts_, read_options,
recovery_ctx.edit_lists_, &mutex_, directories_.GetDbDir()); Status s = versions_->LogAndApply(recovery_ctx.cfds_,
recovery_ctx.mutable_cf_opts_, read_options,
write_options, recovery_ctx.edit_lists_,
&mutex_, directories_.GetDbDir());
if (s.ok() && !(recovery_ctx.files_to_delete_.empty())) { if (s.ok() && !(recovery_ctx.files_to_delete_.empty())) {
mutex_.Unlock(); mutex_.Unlock();
for (const auto& stale_sst_file : recovery_ctx.files_to_delete_) { for (const auto& stale_sst_file : recovery_ctx.files_to_delete_) {
@ -1665,9 +1678,11 @@ Status DBImpl::WriteLevel0TableForRecovery(int job_id, ColumnFamilyData* cfd,
} }
IOStatus io_s; IOStatus io_s;
const ReadOptions read_option(Env::IOActivity::kDBOpen);
const WriteOptions write_option(Env::IO_HIGH, Env::IOActivity::kDBOpen);
TableBuilderOptions tboptions( TableBuilderOptions tboptions(
*cfd->ioptions(), mutable_cf_options, cfd->internal_comparator(), *cfd->ioptions(), mutable_cf_options, read_option, write_option,
cfd->int_tbl_prop_collector_factories(), cfd->internal_comparator(), cfd->int_tbl_prop_collector_factories(),
GetCompressionFlush(*cfd->ioptions(), mutable_cf_options), GetCompressionFlush(*cfd->ioptions(), mutable_cf_options),
mutable_cf_options.compression_opts, cfd->GetID(), cfd->GetName(), mutable_cf_options.compression_opts, cfd->GetID(), cfd->GetName(),
0 /* level */, false /* is_bottommost */, 0 /* level */, false /* is_bottommost */,
@ -1677,16 +1692,15 @@ Status DBImpl::WriteLevel0TableForRecovery(int job_id, ColumnFamilyData* cfd,
SeqnoToTimeMapping empty_seqno_to_time_mapping; SeqnoToTimeMapping empty_seqno_to_time_mapping;
Version* version = cfd->current(); Version* version = cfd->current();
version->Ref(); version->Ref();
const ReadOptions read_option(Env::IOActivity::kDBOpen);
uint64_t num_input_entries = 0; uint64_t num_input_entries = 0;
s = BuildTable( s = BuildTable(
dbname_, versions_.get(), immutable_db_options_, tboptions, dbname_, versions_.get(), immutable_db_options_, tboptions,
file_options_for_compaction_, read_option, cfd->table_cache(), file_options_for_compaction_, cfd->table_cache(), iter.get(),
iter.get(), std::move(range_del_iters), &meta, &blob_file_additions, std::move(range_del_iters), &meta, &blob_file_additions,
snapshot_seqs, earliest_write_conflict_snapshot, kMaxSequenceNumber, snapshot_seqs, earliest_write_conflict_snapshot, kMaxSequenceNumber,
snapshot_checker, paranoid_file_checks, cfd->internal_stats(), &io_s, snapshot_checker, paranoid_file_checks, cfd->internal_stats(), &io_s,
io_tracer_, BlobFileCreationReason::kRecovery, io_tracer_, BlobFileCreationReason::kRecovery,
empty_seqno_to_time_mapping, &event_logger_, job_id, Env::IO_HIGH, empty_seqno_to_time_mapping, &event_logger_, job_id,
nullptr /* table_properties */, write_hint, nullptr /* table_properties */, write_hint,
nullptr /*full_history_ts_low*/, &blob_callback_, version, nullptr /*full_history_ts_low*/, &blob_callback_, version,
&num_input_entries); &num_input_entries);
@ -1888,7 +1902,8 @@ Status DB::OpenAndTrimHistory(
return s; return s;
} }
IOStatus DBImpl::CreateWAL(uint64_t log_file_num, uint64_t recycle_log_number, IOStatus DBImpl::CreateWAL(const WriteOptions& write_options,
uint64_t log_file_num, uint64_t recycle_log_number,
size_t preallocate_block_size, size_t preallocate_block_size,
log::Writer** new_log) { log::Writer** new_log) {
IOStatus io_s; IOStatus io_s;
@ -1922,14 +1937,15 @@ IOStatus DBImpl::CreateWAL(uint64_t log_file_num, uint64_t recycle_log_number,
FileTypeSet tmp_set = immutable_db_options_.checksum_handoff_file_types; FileTypeSet tmp_set = immutable_db_options_.checksum_handoff_file_types;
std::unique_ptr<WritableFileWriter> file_writer(new WritableFileWriter( std::unique_ptr<WritableFileWriter> file_writer(new WritableFileWriter(
std::move(lfile), log_fname, opt_file_options, std::move(lfile), log_fname, opt_file_options,
immutable_db_options_.clock, io_tracer_, nullptr /* stats */, listeners, immutable_db_options_.clock, io_tracer_, nullptr /* stats */,
nullptr, tmp_set.Contains(FileType::kWalFile), Histograms::HISTOGRAM_ENUM_MAX /* hist_type */, listeners, nullptr,
tmp_set.Contains(FileType::kWalFile),
tmp_set.Contains(FileType::kWalFile))); tmp_set.Contains(FileType::kWalFile)));
*new_log = new log::Writer(std::move(file_writer), log_file_num, *new_log = new log::Writer(std::move(file_writer), log_file_num,
immutable_db_options_.recycle_log_file_num > 0, immutable_db_options_.recycle_log_file_num > 0,
immutable_db_options_.manual_wal_flush, immutable_db_options_.manual_wal_flush,
immutable_db_options_.wal_compression); immutable_db_options_.wal_compression);
io_s = (*new_log)->AddCompressionTypeRecord(); io_s = (*new_log)->AddCompressionTypeRecord(write_options);
} }
return io_s; return io_s;
} }
@ -1938,6 +1954,9 @@ Status DBImpl::Open(const DBOptions& db_options, const std::string& dbname,
const std::vector<ColumnFamilyDescriptor>& column_families, const std::vector<ColumnFamilyDescriptor>& column_families,
std::vector<ColumnFamilyHandle*>* handles, DB** dbptr, std::vector<ColumnFamilyHandle*>* handles, DB** dbptr,
const bool seq_per_batch, const bool batch_per_txn) { const bool seq_per_batch, const bool batch_per_txn) {
const WriteOptions write_options(Env::IOActivity::kDBOpen);
const ReadOptions read_options(Env::IOActivity::kDBOpen);
Status s = ValidateOptionsByTable(db_options, column_families); Status s = ValidateOptionsByTable(db_options, column_families);
if (!s.ok()) { if (!s.ok()) {
return s; return s;
@ -2014,7 +2033,7 @@ Status DBImpl::Open(const DBOptions& db_options, const std::string& dbname,
log::Writer* new_log = nullptr; log::Writer* new_log = nullptr;
const size_t preallocate_block_size = const size_t preallocate_block_size =
impl->GetWalPreallocateBlockSize(max_write_buffer_size); impl->GetWalPreallocateBlockSize(max_write_buffer_size);
s = impl->CreateWAL(new_log_number, 0 /*recycle_log_number*/, s = impl->CreateWAL(write_options, new_log_number, 0 /*recycle_log_number*/,
preallocate_block_size, &new_log); preallocate_block_size, &new_log);
if (s.ok()) { if (s.ok()) {
InstrumentedMutexLock wl(&impl->log_write_mutex_); InstrumentedMutexLock wl(&impl->log_write_mutex_);
@ -2039,21 +2058,25 @@ Status DBImpl::Open(const DBOptions& db_options, const std::string& dbname,
   if (recovered_seq != kMaxSequenceNumber) {
     WriteBatch empty_batch;
     WriteBatchInternal::SetSequence(&empty_batch, recovered_seq);
-    WriteOptions write_options;
     uint64_t log_used, log_size;
     log::Writer* log_writer = impl->logs_.back().writer;
     LogFileNumberSize& log_file_number_size = impl->alive_log_files_.back();
     assert(log_writer->get_log_number() == log_file_number_size.number);
     impl->mutex_.AssertHeld();
-    s = impl->WriteToWAL(empty_batch, log_writer, &log_used, &log_size,
-                         Env::IO_TOTAL, log_file_number_size);
+    s = impl->WriteToWAL(empty_batch, write_options, log_writer, &log_used,
+                         &log_size, log_file_number_size);
     if (s.ok()) {
       // Need to fsync, otherwise it might get lost after a power reset.
-      s = impl->FlushWAL(false);
+      s = impl->FlushWAL(write_options, false);
       TEST_SYNC_POINT_CALLBACK("DBImpl::Open::BeforeSyncWAL", /*arg=*/&s);
+      IOOptions opts;
       if (s.ok()) {
-        s = log_writer->file()->Sync(impl->immutable_db_options_.use_fsync);
+        s = WritableFileWriter::PrepareIOOptions(write_options, opts);
+      }
+      if (s.ok()) {
+        s = log_writer->file()->Sync(opts,
+                                     impl->immutable_db_options_.use_fsync);
       }
     }
   }
@ -2084,7 +2107,8 @@ Status DBImpl::Open(const DBOptions& db_options, const std::string& dbname,
impl->mutex_.Unlock(); impl->mutex_.Unlock();
// NOTE: the work normally done in WrapUpCreateColumnFamilies will // NOTE: the work normally done in WrapUpCreateColumnFamilies will
// be done separately below. // be done separately below.
s = impl->CreateColumnFamilyImpl(cf.options, cf.name, &handle); s = impl->CreateColumnFamilyImpl(read_options, write_options,
cf.options, cf.name, &handle);
impl->mutex_.Lock(); impl->mutex_.Lock();
if (s.ok()) { if (s.ok()) {
handles->push_back(handle); handles->push_back(handle);
@ -2136,7 +2160,7 @@ Status DBImpl::Open(const DBOptions& db_options, const std::string& dbname,
// Persist RocksDB Options before scheduling the compaction. // Persist RocksDB Options before scheduling the compaction.
// The WriteOptionsFile() will release and lock the mutex internally. // The WriteOptionsFile() will release and lock the mutex internally.
persist_options_status = persist_options_status =
impl->WriteOptionsFile(true /*db_mutex_already_held*/); impl->WriteOptionsFile(write_options, true /*db_mutex_already_held*/);
*dbptr = impl; *dbptr = impl;
impl->opened_successfully_ = true; impl->opened_successfully_ = true;
impl->DeleteObsoleteFiles(); impl->DeleteObsoleteFiles();
@ -2236,12 +2260,17 @@ Status DBImpl::Open(const DBOptions& db_options, const std::string& dbname,
impl); impl);
LogFlush(impl->immutable_db_options_.info_log); LogFlush(impl->immutable_db_options_.info_log);
if (!impl->WALBufferIsEmpty()) { if (!impl->WALBufferIsEmpty()) {
s = impl->FlushWAL(false); s = impl->FlushWAL(write_options, false);
if (s.ok()) { if (s.ok()) {
// Sync is needed otherwise WAL buffered data might get lost after a // Sync is needed otherwise WAL buffered data might get lost after a
// power reset. // power reset.
log::Writer* log_writer = impl->logs_.back().writer; log::Writer* log_writer = impl->logs_.back().writer;
s = log_writer->file()->Sync(impl->immutable_db_options_.use_fsync); IOOptions opts;
s = WritableFileWriter::PrepareIOOptions(write_options, opts);
if (s.ok()) {
s = log_writer->file()->Sync(opts,
impl->immutable_db_options_.use_fsync);
}
} }
} }
if (s.ok() && !persist_options_status.ok()) { if (s.ok() && !persist_options_status.ok()) {
@ -2258,7 +2287,8 @@ Status DBImpl::Open(const DBOptions& db_options, const std::string& dbname,
s = impl->StartPeriodicTaskScheduler(); s = impl->StartPeriodicTaskScheduler();
} }
if (s.ok()) { if (s.ok()) {
s = impl->RegisterRecordSeqnoTimeWorker(recovery_ctx.is_new_db_); s = impl->RegisterRecordSeqnoTimeWorker(read_options, write_options,
recovery_ctx.is_new_db_);
} }
impl->options_mutex_.Unlock(); impl->options_mutex_.Unlock();
if (!s.ok()) { if (!s.ok()) {
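
The call sites in the hunks above share one shape: convert the `WriteOptions` into `IOOptions`, and sync only if the conversion succeeded. A minimal, self-contained sketch of that shape follows. Every type and function in it (`WriteOptionsSketch`, `IOOptionsSketch`, `StatusSketch`, `FileSketch`, `PrepareIOOptionsSketch`, `SyncTagged`) is a hypothetical stand-in, not the RocksDB definition, and the field-copying body of the conversion is an assumption based on the PR summary ("converting WriteOptions to IOOptions").

```cpp
// Hypothetical, simplified stand-ins; these are NOT the RocksDB definitions.
enum class IOActivity { kUnknown, kFlush, kCompaction, kDBOpen };
enum class IOPriority { kLow, kHigh, kTotal };

struct WriteOptionsSketch {
  IOPriority rate_limiter_priority = IOPriority::kTotal;
  IOActivity io_activity = IOActivity::kUnknown;
};

struct IOOptionsSketch {
  IOPriority rate_limiter_priority = IOPriority::kTotal;
  IOActivity io_activity = IOActivity::kUnknown;
};

struct StatusSketch {
  bool ok_ = true;
  bool ok() const { return ok_; }
};

// Assumed behavior: copy the tagging fields from the write options.
StatusSketch PrepareIOOptionsSketch(const WriteOptionsSketch& wo,
                                    IOOptionsSketch& opts) {
  opts.rate_limiter_priority = wo.rate_limiter_priority;
  opts.io_activity = wo.io_activity;
  return StatusSketch{};
}

// Stand-in file that records the activity its last Sync was tagged with.
struct FileSketch {
  IOActivity last_sync_activity = IOActivity::kUnknown;
  StatusSketch Sync(const IOOptionsSketch& opts, bool /*use_fsync*/) {
    last_sync_activity = opts.io_activity;
    return StatusSketch{};
  }
};

// Same shape as the call sites: convert first, sync only if that succeeded.
StatusSketch SyncTagged(FileSketch& file, const WriteOptionsSketch& wo,
                        bool use_fsync) {
  IOOptionsSketch opts;
  StatusSketch s = PrepareIOOptionsSketch(wo, opts);
  if (s.ok()) {
    s = file.Sync(opts, use_fsync);
  }
  return s;
}
```

With this shape, whatever activity the caller sets on the write options (for example a DB-open tag) is what the file layer sees at sync time, which is what lets a stat be broken down per activity.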

@ -620,9 +620,9 @@ Status DBImpl::WriteImpl(const WriteOptions& write_options,
log_write_mutex_.Unlock(); log_write_mutex_.Unlock();
if (status.ok() && synced_wals.IsWalAddition()) { if (status.ok() && synced_wals.IsWalAddition()) {
InstrumentedMutexLock l(&mutex_); InstrumentedMutexLock l(&mutex_);
// TODO: plumb Env::IOActivity // TODO: plumb Env::IOActivity, Env::IOPriority
const ReadOptions read_options; const ReadOptions read_options;
status = ApplyWALToManifest(read_options, &synced_wals); status = ApplyWALToManifest(read_options, write_options, &synced_wals);
} }
// Requesting sync with two_write_queues_ is expected to be very rare. We // Requesting sync with two_write_queues_ is expected to be very rare. We
@ -783,9 +783,9 @@ Status DBImpl::PipelinedWriteImpl(const WriteOptions& write_options,
} }
if (w.status.ok() && synced_wals.IsWalAddition()) { if (w.status.ok() && synced_wals.IsWalAddition()) {
InstrumentedMutexLock l(&mutex_); InstrumentedMutexLock l(&mutex_);
// TODO: plumb Env::IOActivity // TODO: plumb Env::IOActivity, Env::IOPriority
const ReadOptions read_options; const ReadOptions read_options;
w.status = ApplyWALToManifest(read_options, &synced_wals); w.status = ApplyWALToManifest(read_options, write_options, &synced_wals);
} }
write_thread_.ExitAsBatchGroupLeader(wal_write_group, w.status); write_thread_.ExitAsBatchGroupLeader(wal_write_group, w.status);
} }
@ -1318,9 +1318,9 @@ Status DBImpl::MergeBatch(const WriteThread::WriteGroup& write_group,
 // When two_write_queues_ is disabled, this function is called from the only
 // write thread. Otherwise this must be called holding log_write_mutex_.
 IOStatus DBImpl::WriteToWAL(const WriteBatch& merged_batch,
+                            const WriteOptions& write_options,
                             log::Writer* log_writer, uint64_t* log_used,
                             uint64_t* log_size,
-                            Env::IOPriority rate_limiter_priority,
                             LogFileNumberSize& log_file_number_size) {
   assert(log_size != nullptr);
@ -1343,12 +1343,11 @@ IOStatus DBImpl::WriteToWAL(const WriteBatch& merged_batch,
     log_write_mutex_.Lock();
   }
   IOStatus io_s = log_writer->MaybeAddUserDefinedTimestampSizeRecord(
-      versions_->GetColumnFamiliesTimestampSizeForRecord(),
-      rate_limiter_priority);
+      write_options, versions_->GetColumnFamiliesTimestampSizeForRecord());
   if (!io_s.ok()) {
     return io_s;
   }
-  io_s = log_writer->AddRecord(log_entry, rate_limiter_priority);
+  io_s = log_writer->AddRecord(write_options, log_entry);
   if (UNLIKELY(needs_locking)) {
     log_write_mutex_.Unlock();
@ -1391,9 +1390,13 @@ IOStatus DBImpl::WriteToWAL(const WriteThread::WriteGroup& write_group,
   WriteBatchInternal::SetSequence(merged_batch, sequence);
   uint64_t log_size;
-  io_s = WriteToWAL(*merged_batch, log_writer, log_used, &log_size,
-                    write_group.leader->rate_limiter_priority,
-                    log_file_number_size);
+  // TODO: plumb Env::IOActivity, Env::IOPriority
+  WriteOptions write_options;
+  write_options.rate_limiter_priority =
+      write_group.leader->rate_limiter_priority;
+  io_s = WriteToWAL(*merged_batch, write_options, log_writer, log_used,
+                    &log_size, log_file_number_size);
   if (to_be_cached_state) {
     cached_recoverable_state_ = *to_be_cached_state;
     cached_recoverable_state_empty_ = false;
@ -1420,11 +1423,18 @@ IOStatus DBImpl::WriteToWAL(const WriteThread::WriteGroup& write_group,
     log_write_mutex_.Lock();
   }
+  if (io_s.ok()) {
   for (auto& log : logs_) {
-    io_s = log.writer->file()->Sync(immutable_db_options_.use_fsync);
+    IOOptions opts;
+    io_s = WritableFileWriter::PrepareIOOptions(write_options, opts);
     if (!io_s.ok()) {
       break;
     }
+    io_s = log.writer->file()->Sync(opts, immutable_db_options_.use_fsync);
+    if (!io_s.ok()) {
+      break;
+    }
+  }
   }
   if (UNLIKELY(needs_locking)) {
@ -1496,9 +1506,13 @@ IOStatus DBImpl::ConcurrentWriteToWAL(
   assert(log_writer->get_log_number() == log_file_number_size.number);
   uint64_t log_size;
-  io_s = WriteToWAL(*merged_batch, log_writer, log_used, &log_size,
-                    write_group.leader->rate_limiter_priority,
-                    log_file_number_size);
+  // TODO: plumb Env::IOActivity, Env::IOPriority
+  WriteOptions write_options;
+  write_options.rate_limiter_priority =
+      write_group.leader->rate_limiter_priority;
+  io_s = WriteToWAL(*merged_batch, write_options, log_writer, log_used,
+                    &log_size, log_file_number_size);
   if (to_be_cached_state) {
     cached_recoverable_state_ = *to_be_cached_state;
     cached_recoverable_state_empty_ = false;
@ -2117,8 +2131,10 @@ void DBImpl::NotifyOnMemTableSealed(ColumnFamilyData* /*cfd*/,
// two_write_queues_ is true (This is to simplify the reasoning.) // two_write_queues_ is true (This is to simplify the reasoning.)
Status DBImpl::SwitchMemtable(ColumnFamilyData* cfd, WriteContext* context) { Status DBImpl::SwitchMemtable(ColumnFamilyData* cfd, WriteContext* context) {
mutex_.AssertHeld(); mutex_.AssertHeld();
// TODO: plumb Env::IOActivity // TODO: plumb Env::IOActivity, Env::IOPriority
const ReadOptions read_options; const ReadOptions read_options;
const WriteOptions write_options;
log::Writer* new_log = nullptr; log::Writer* new_log = nullptr;
MemTable* new_mem = nullptr; MemTable* new_mem = nullptr;
IOStatus io_s; IOStatus io_s;
@ -2165,8 +2181,8 @@ Status DBImpl::SwitchMemtable(ColumnFamilyData* cfd, WriteContext* context) {
if (creating_new_log) { if (creating_new_log) {
// TODO: Write buffer size passed in should be max of all CF's instead // TODO: Write buffer size passed in should be max of all CF's instead
// of mutable_cf_options.write_buffer_size. // of mutable_cf_options.write_buffer_size.
io_s = CreateWAL(new_log_number, recycle_log_number, preallocate_block_size, io_s = CreateWAL(write_options, new_log_number, recycle_log_number,
&new_log); preallocate_block_size, &new_log);
if (s.ok()) { if (s.ok()) {
s = io_s; s = io_s;
} }
@ -2203,7 +2219,7 @@ Status DBImpl::SwitchMemtable(ColumnFamilyData* cfd, WriteContext* context) {
// In recovery path, we force another try of writing WAL buffer. // In recovery path, we force another try of writing WAL buffer.
cur_log_writer->file()->reset_seen_error(); cur_log_writer->file()->reset_seen_error();
} }
io_s = cur_log_writer->WriteBuffer(); io_s = cur_log_writer->WriteBuffer(write_options);
if (s.ok()) { if (s.ok()) {
s = io_s; s = io_s;
} }
@ -2271,7 +2287,8 @@ Status DBImpl::SwitchMemtable(ColumnFamilyData* cfd, WriteContext* context) {
VersionEdit wal_deletion; VersionEdit wal_deletion;
wal_deletion.DeleteWalsBefore(min_wal_number_to_keep); wal_deletion.DeleteWalsBefore(min_wal_number_to_keep);
s = versions_->LogAndApplyToDefaultColumnFamily( s = versions_->LogAndApplyToDefaultColumnFamily(
read_options, &wal_deletion, &mutex_, directories_.GetDbDir()); read_options, write_options, &wal_deletion, &mutex_,
directories_.GetDbDir());
if (!s.ok() && versions_->io_status().IsIOError()) { if (!s.ok() && versions_->io_status().IsIOError()) {
s = error_handler_.SetBGError(versions_->io_status(), s = error_handler_.SetBGError(versions_->io_status(),
BackgroundErrorReason::kManifestWrite); BackgroundErrorReason::kManifestWrite);

@ -201,6 +201,7 @@ bool DBIter::SetBlobValueIfNeeded(const Slice& user_key,
// TODO: consider moving ReadOptions from ArenaWrappedDBIter to DBIter to // TODO: consider moving ReadOptions from ArenaWrappedDBIter to DBIter to
// avoid having to copy options back and forth. // avoid having to copy options back and forth.
// TODO: plumb Env::IOActivity, Env::IOPriority
ReadOptions read_options; ReadOptions read_options;
read_options.read_tier = read_tier_; read_options.read_tier = read_tier_;
read_options.fill_cache = fill_cache_; read_options.fill_cache = fill_cache_;

@ -126,6 +126,10 @@ class DBIter final : public Iterator {
void operator=(const DBIter&) = delete; void operator=(const DBIter&) = delete;
~DBIter() override { ~DBIter() override {
ThreadStatus::OperationType cur_op_type =
ThreadStatusUtil::GetThreadOperation();
ThreadStatusUtil::SetThreadOperation(
ThreadStatus::OperationType::OP_UNKNOWN);
// Release pinned data if any // Release pinned data if any
if (pinned_iters_mgr_.PinningEnabled()) { if (pinned_iters_mgr_.PinningEnabled()) {
pinned_iters_mgr_.ReleasePinnedData(); pinned_iters_mgr_.ReleasePinnedData();
@ -134,6 +138,7 @@ class DBIter final : public Iterator {
ResetInternalKeysSkippedCounter(); ResetInternalKeysSkippedCounter();
local_stats_.BumpGlobalStatistics(statistics_); local_stats_.BumpGlobalStatistics(statistics_);
iter_.DeleteIter(arena_mode_); iter_.DeleteIter(arena_mode_);
ThreadStatusUtil::SetThreadOperation(cur_op_type);
} }
void SetIter(InternalIterator* iter) { void SetIter(InternalIterator* iter) {
assert(iter_.iter() == nullptr); assert(iter_.iter() == nullptr);
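
The destructor change above saves the current thread operation, sets it to `OP_UNKNOWN` for the duration of cleanup, and restores it at the end. The same save/reset/restore pattern could also be written as a scope guard. The sketch below is self-contained and uses hypothetical stand-ins (`OpType`, `g_thread_op`, `ThreadOpGuard`, `OpSeenDuringCleanup`) rather than RocksDB's `ThreadStatusUtil`; it illustrates the pattern and is not what the PR does.

```cpp
// Hypothetical stand-in for ThreadStatus::OperationType.
enum class OpType { kUnknown, kFlush, kCompaction };

// Hypothetical stand-in for the per-thread operation that
// ThreadStatusUtil tracks.
thread_local OpType g_thread_op = OpType::kUnknown;

// Saves the current thread operation, switches it to kUnknown for the
// lifetime of the guard, and restores the saved value on destruction.
class ThreadOpGuard {
 public:
  ThreadOpGuard() : saved_(g_thread_op) { g_thread_op = OpType::kUnknown; }
  ~ThreadOpGuard() { g_thread_op = saved_; }
  ThreadOpGuard(const ThreadOpGuard&) = delete;
  ThreadOpGuard& operator=(const ThreadOpGuard&) = delete;

 private:
  OpType saved_;
};

// Returns the operation that cleanup code would observe while the guard
// is active.
OpType OpSeenDuringCleanup() {
  ThreadOpGuard guard;
  return g_thread_op;
}
```

A guard restores the previous operation on every exit path, including early returns, whereas the explicit get/set pair in the diff relies on the final `SetThreadOperation(cur_op_type)` call being reached.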

@ -957,15 +957,18 @@ TEST_F(DBSSTTest, OpenDBWithExistingTrashAndObsoleteSstFile) {
   // Add some trash files to the db directory so the DB can clean them up
   ASSERT_OK(env_->CreateDirIfMissing(dbname_));
-  ASSERT_OK(WriteStringToFile(env_, "abc", dbname_ + "/" + "001.sst.trash"));
-  ASSERT_OK(WriteStringToFile(env_, "abc", dbname_ + "/" + "002.sst.trash"));
-  ASSERT_OK(WriteStringToFile(env_, "abc", dbname_ + "/" + "003.sst.trash"));
+  ASSERT_OK(
+      WriteStringToFile(env_, "abc", dbname_ + "/" + "001.sst.trash", false));
+  ASSERT_OK(
+      WriteStringToFile(env_, "abc", dbname_ + "/" + "002.sst.trash", false));
+  ASSERT_OK(
+      WriteStringToFile(env_, "abc", dbname_ + "/" + "003.sst.trash", false));
   // Manually add an obsolete sst file. Obsolete SST files are discovered and
   // deleted upon recovery.
   constexpr uint64_t kSstFileNumber = 100;
   const std::string kObsoleteSstFile =
       MakeTableFileName(dbname_, kSstFileNumber);
-  ASSERT_OK(WriteStringToFile(env_, "abc", kObsoleteSstFile));
+  ASSERT_OK(WriteStringToFile(env_, "abc", kObsoleteSstFile, false));
   // Reopen the DB and verify that it deletes existing trash files and obsolete
   // SST files with rate limiting.

@ -5691,7 +5691,7 @@ TEST_F(DBTest2, CrashInRecoveryMultipleCF) {
ASSERT_OK(ReadFileToString(env_, fname, &file_content)); ASSERT_OK(ReadFileToString(env_, fname, &file_content));
file_content[400] = 'h'; file_content[400] = 'h';
file_content[401] = 'a'; file_content[401] = 'a';
ASSERT_OK(WriteStringToFile(env_, file_content, fname)); ASSERT_OK(WriteStringToFile(env_, file_content, fname, false));
break; break;
} }
} }

@ -1561,7 +1561,7 @@ class RecoveryTestHelper {
new log::Writer(std::move(file_writer), current_log_number, new log::Writer(std::move(file_writer), current_log_number,
db_options.recycle_log_file_num > 0, false, db_options.recycle_log_file_num > 0, false,
db_options.wal_compression); db_options.wal_compression);
ASSERT_OK(log_writer->AddCompressionTypeRecord()); ASSERT_OK(log_writer->AddCompressionTypeRecord(WriteOptions()));
current_log_writer.reset(log_writer); current_log_writer.reset(log_writer);
WriteBatch batch; WriteBatch batch;
@ -1574,7 +1574,7 @@ class RecoveryTestHelper {
ASSERT_OK(batch.Put(key, value)); ASSERT_OK(batch.Put(key, value));
WriteBatchInternal::SetSequence(&batch, seq); WriteBatchInternal::SetSequence(&batch, seq);
ASSERT_OK(current_log_writer->AddRecord( ASSERT_OK(current_log_writer->AddRecord(
WriteBatchInternal::Contents(&batch))); WriteOptions(), WriteBatchInternal::Contents(&batch)));
versions->SetLastAllocatedSequence(seq); versions->SetLastAllocatedSequence(seq);
versions->SetLastPublishedSequence(seq); versions->SetLastPublishedSequence(seq);
versions->SetLastSequence(seq); versions->SetLastSequence(seq);

@ -38,8 +38,9 @@ Status UpdateManifestForFilesState(
const DBOptions& db_opts, const std::string& db_name, const DBOptions& db_opts, const std::string& db_name,
const std::vector<ColumnFamilyDescriptor>& column_families, const std::vector<ColumnFamilyDescriptor>& column_families,
const UpdateManifestForFilesStateOptions& opts) { const UpdateManifestForFilesStateOptions& opts) {
// TODO: plumb Env::IOActivity // TODO: plumb Env::IOActivity, Env::IOPriority
const ReadOptions read_options; const ReadOptions read_options;
const WriteOptions write_options;
OfflineManifestWriter w(db_opts, db_name); OfflineManifestWriter w(db_opts, db_name);
Status s = w.Recover(column_families); Status s = w.Recover(column_families);
@ -117,7 +118,8 @@ Status UpdateManifestForFilesState(
std::unique_ptr<FSDirectory> db_dir; std::unique_ptr<FSDirectory> db_dir;
s = fs->NewDirectory(db_name, IOOptions(), &db_dir, nullptr); s = fs->NewDirectory(db_name, IOOptions(), &db_dir, nullptr);
if (s.ok()) { if (s.ok()) {
s = w.LogAndApply(read_options, cfd, &edit, db_dir.get()); s = w.LogAndApply(read_options, write_options, cfd, &edit,
db_dir.get());
} }
if (s.ok()) { if (s.ok()) {
++cfs_updated; ++cfs_updated;

@ -710,7 +710,7 @@ Status ExternalSstFileIngestionJob::GetIngestedFileInfo(
// If customized readahead size is needed, we can pass a user option // If customized readahead size is needed, we can pass a user option
// all the way to here. Right now we just rely on the default readahead // all the way to here. Right now we just rely on the default readahead
// to keep things simple. // to keep things simple.
// TODO: plumb Env::IOActivity // TODO: plumb Env::IOActivity, Env::IOPriority
ReadOptions ro; ReadOptions ro;
ro.readahead_size = ingestion_options_.verify_checksums_readahead_size; ro.readahead_size = ingestion_options_.verify_checksums_readahead_size;
status = table_reader->VerifyChecksum( status = table_reader->VerifyChecksum(
@ -764,7 +764,7 @@ Status ExternalSstFileIngestionJob::GetIngestedFileInfo(
file_to_ingest->num_range_deletions = props->num_range_deletions; file_to_ingest->num_range_deletions = props->num_range_deletions;
ParsedInternalKey key; ParsedInternalKey key;
// TODO: plumb Env::IOActivity // TODO: plumb Env::IOActivity, Env::IOPriority
ReadOptions ro; ReadOptions ro;
std::unique_ptr<InternalIterator> iter(table_reader->NewIterator( std::unique_ptr<InternalIterator> iter(table_reader->NewIterator(
ro, sv->mutable_cf_options.prefix_extractor.get(), /*arena=*/nullptr, ro, sv->mutable_cf_options.prefix_extractor.get(), /*arena=*/nullptr,
@ -902,7 +902,7 @@ Status ExternalSstFileIngestionJob::AssignLevelAndSeqnoForIngestedFile(
bool overlap_with_db = false; bool overlap_with_db = false;
Arena arena; Arena arena;
// TODO: plumb Env::IOActivity // TODO: plumb Env::IOActivity, Env::IOPriority
ReadOptions ro; ReadOptions ro;
ro.total_order_seek = true; ro.total_order_seek = true;
int target_level = 0; int target_level = 0;

@ -572,7 +572,7 @@ TEST_P(FaultInjectionTest, NoDuplicateTrailingEntries) {
edit.SetColumnFamily(0); edit.SetColumnFamily(0);
std::string buf; std::string buf;
assert(edit.EncodeTo(&buf)); assert(edit.EncodeTo(&buf));
const Status s = log_writer->AddRecord(buf); const Status s = log_writer->AddRecord(WriteOptions(), buf);
ASSERT_NOK(s); ASSERT_NOK(s);
} }

@ -409,7 +409,7 @@ Status FlushJob::MemPurge() {
// Create two iterators, one for the memtable data (contains // Create two iterators, one for the memtable data (contains
// info from puts + deletes), and one for the memtable // info from puts + deletes), and one for the memtable
// Range Tombstones (from DeleteRanges). // Range Tombstones (from DeleteRanges).
// TODO: plumb Env::IOActivity // TODO: plumb Env::IOActivity, Env::IOPriority
ReadOptions ro; ReadOptions ro;
ro.total_order_seek = true; ro.total_order_seek = true;
Arena arena; Arena arena;
@ -701,8 +701,8 @@ bool FlushJob::MemPurgeDecider(double threshold) {
   // Cochran formula for determining sample size.
   // 95% confidence interval, 7% precision.
   // n0 = (1.96*1.96)*0.25/(0.07*0.07) = 196.0
-  // TODO: plumb Env::IOActivity
   double n0 = 196.0;
+  // TODO: plumb Env::IOActivity, Env::IOPriority
   ReadOptions ro;
   ro.total_order_seek = true;
@ -961,28 +961,29 @@ Status FlushJob::WriteLevel0Table() {
   const std::string* const full_history_ts_low =
       (full_history_ts_low_.empty()) ? nullptr : &full_history_ts_low_;
+  const ReadOptions read_options(Env::IOActivity::kFlush);
+  const WriteOptions write_options(io_priority, Env::IOActivity::kFlush);
   TableBuilderOptions tboptions(
-      *cfd_->ioptions(), mutable_cf_options_, cfd_->internal_comparator(),
-      cfd_->int_tbl_prop_collector_factories(), output_compression_,
-      mutable_cf_options_.compression_opts, cfd_->GetID(), cfd_->GetName(),
-      0 /* level */, false /* is_bottommost */,
-      TableFileCreationReason::kFlush, oldest_key_time, current_time,
-      db_id_, db_session_id_, 0 /* target_file_size */,
-      meta_.fd.GetNumber());
+      *cfd_->ioptions(), mutable_cf_options_, read_options, write_options,
+      cfd_->internal_comparator(), cfd_->int_tbl_prop_collector_factories(),
+      output_compression_, mutable_cf_options_.compression_opts,
+      cfd_->GetID(), cfd_->GetName(), 0 /* level */,
+      false /* is_bottommost */, TableFileCreationReason::kFlush,
+      oldest_key_time, current_time, db_id_, db_session_id_,
+      0 /* target_file_size */, meta_.fd.GetNumber());
   const SequenceNumber job_snapshot_seq =
       job_context_->GetJobSnapshotSequence();
-  const ReadOptions read_options(Env::IOActivity::kFlush);
-  s = BuildTable(dbname_, versions_, db_options_, tboptions, file_options_,
-      read_options, cfd_->table_cache(), iter.get(),
-      std::move(range_del_iters), &meta_, &blob_file_additions,
-      existing_snapshots_, earliest_write_conflict_snapshot_,
-      job_snapshot_seq, snapshot_checker_,
-      mutable_cf_options_.paranoid_file_checks,
+  s = BuildTable(
+      dbname_, versions_, db_options_, tboptions, file_options_,
+      cfd_->table_cache(), iter.get(), std::move(range_del_iters), &meta_,
+      &blob_file_additions, existing_snapshots_,
+      earliest_write_conflict_snapshot_, job_snapshot_seq,
+      snapshot_checker_, mutable_cf_options_.paranoid_file_checks,
       cfd_->internal_stats(), &io_s, io_tracer_,
-      BlobFileCreationReason::kFlush, seqno_to_time_mapping_,
-      event_logger_, job_context_->job_id, io_priority,
-      &table_properties_, write_hint, full_history_ts_low,
-      blob_callback_, base_, &num_input_entries,
+      BlobFileCreationReason::kFlush, seqno_to_time_mapping_, event_logger_,
+      job_context_->job_id, &table_properties_, write_hint,
+      full_history_ts_low, blob_callback_, base_, &num_input_entries,
       &memtable_payload_bytes, &memtable_garbage_bytes);
   TEST_SYNC_POINT_CALLBACK("FlushJob::WriteLevel0Table:s", &s);
   // TODO: Cleanup io_status in BuildTable and table builders
@ -1177,8 +1178,9 @@ Status FlushJob::MaybeIncreaseFullHistoryTsLowToAboveCutoffUDT() {
VersionEdit edit; VersionEdit edit;
edit.SetColumnFamily(cfd_->GetID()); edit.SetColumnFamily(cfd_->GetID());
edit.SetFullHistoryTsLow(new_full_history_ts_low); edit.SetFullHistoryTsLow(new_full_history_ts_low);
// TODO: plumb Env::IOActivity, Env::IOPriority
return versions_->LogAndApply(cfd_, *cfd_->GetLatestMutableCFOptions(), return versions_->LogAndApply(cfd_, *cfd_->GetLatestMutableCFOptions(),
ReadOptions(), &edit, db_mutex_, ReadOptions(), WriteOptions(), &edit, db_mutex_,
output_file_directory_); output_file_directory_);
} }
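
For reference, the sample size quoted in the `FlushJob::MemPurgeDecider` comment above can be checked by hand. The sketch below assumes the standard form of Cochran's formula, with $z = 1.96$ for 95% confidence, $e = 0.07$ for the precision, and $p = 0.5$ (the value for which $p(1-p) = 0.25$, matching the comment); the symbol names are not from the source.

```latex
n_0 = \frac{z^2 \, p(1-p)}{e^2}
    = \frac{1.96^2 \times 0.25}{0.07^2}
    = \frac{0.9604}{0.0049}
    = 196
```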

@ -55,7 +55,7 @@ class FlushJobTestBase : public testing::Test {
} }
void NewDB() { void NewDB() {
ASSERT_OK(SetIdentityFile(env_, dbname_)); ASSERT_OK(SetIdentityFile(WriteOptions(), env_, dbname_));
VersionEdit new_db; VersionEdit new_db;
new_db.SetLogNumber(0); new_db.SetLogNumber(0);
@ -89,19 +89,19 @@ class FlushJobTestBase : public testing::Test {
log::Writer log(std::move(file_writer), 0, false); log::Writer log(std::move(file_writer), 0, false);
std::string record; std::string record;
new_db.EncodeTo(&record); new_db.EncodeTo(&record);
s = log.AddRecord(record); s = log.AddRecord(WriteOptions(), record);
ASSERT_OK(s); ASSERT_OK(s);
for (const auto& e : new_cfs) { for (const auto& e : new_cfs) {
record.clear(); record.clear();
e.EncodeTo(&record); e.EncodeTo(&record);
s = log.AddRecord(record); s = log.AddRecord(WriteOptions(), record);
ASSERT_OK(s); ASSERT_OK(s);
} }
} }
ASSERT_OK(s); ASSERT_OK(s);
// Make "CURRENT" file that points to the new manifest file. // Make "CURRENT" file that points to the new manifest file.
s = SetCurrentFile(fs_.get(), dbname_, 1, nullptr); s = SetCurrentFile(WriteOptions(), fs_.get(), dbname_, 1, nullptr);
ASSERT_OK(s); ASSERT_OK(s);
} }

@ -355,7 +355,7 @@ Status ImportColumnFamilyJob::GetIngestedFileInfo(
// in file_meta. // in file_meta.
if (file_meta.smallest.empty()) { if (file_meta.smallest.empty()) {
assert(file_meta.largest.empty()); assert(file_meta.largest.empty());
// TODO: plumb Env::IOActivity // TODO: plumb Env::IOActivity, Env::IOPriority
ReadOptions ro; ReadOptions ro;
std::unique_ptr<InternalIterator> iter(table_reader->NewIterator( std::unique_ptr<InternalIterator> iter(table_reader->NewIterator(
ro, sv->mutable_cf_options.prefix_extractor.get(), /*arena=*/nullptr, ro, sv->mutable_cf_options.prefix_extractor.get(), /*arena=*/nullptr,

@ -1155,7 +1155,7 @@ bool InternalStats::HandleSsTables(std::string* value, Slice /*suffix*/) {
bool InternalStats::HandleAggregatedTableProperties(std::string* value, bool InternalStats::HandleAggregatedTableProperties(std::string* value,
Slice /*suffix*/) { Slice /*suffix*/) {
std::shared_ptr<const TableProperties> tp; std::shared_ptr<const TableProperties> tp;
// TODO: plumb Env::IOActivity // TODO: plumb Env::IOActivity, Env::IOPriority
const ReadOptions read_options; const ReadOptions read_options;
auto s = cfd_->current()->GetAggregatedTableProperties(read_options, &tp); auto s = cfd_->current()->GetAggregatedTableProperties(read_options, &tp);
if (!s.ok()) { if (!s.ok()) {
@ -1177,7 +1177,7 @@ static std::map<std::string, std::string> MapUint64ValuesToString(
bool InternalStats::HandleAggregatedTablePropertiesMap( bool InternalStats::HandleAggregatedTablePropertiesMap(
std::map<std::string, std::string>* values, Slice /*suffix*/) { std::map<std::string, std::string>* values, Slice /*suffix*/) {
std::shared_ptr<const TableProperties> tp; std::shared_ptr<const TableProperties> tp;
// TODO: plumb Env::IOActivity // TODO: plumb Env::IOActivity, Env::IOPriority
const ReadOptions read_options; const ReadOptions read_options;
auto s = cfd_->current()->GetAggregatedTableProperties(read_options, &tp); auto s = cfd_->current()->GetAggregatedTableProperties(read_options, &tp);
if (!s.ok()) { if (!s.ok()) {
@ -1195,7 +1195,7 @@ bool InternalStats::HandleAggregatedTablePropertiesAtLevel(std::string* values,
return false; return false;
} }
std::shared_ptr<const TableProperties> tp; std::shared_ptr<const TableProperties> tp;
// TODO: plumb Env::IOActivity // TODO: plumb Env::IOActivity, Env::IOPriority
const ReadOptions read_options; const ReadOptions read_options;
auto s = cfd_->current()->GetAggregatedTableProperties( auto s = cfd_->current()->GetAggregatedTableProperties(
read_options, &tp, static_cast<int>(level)); read_options, &tp, static_cast<int>(level));
@ -1214,7 +1214,7 @@ bool InternalStats::HandleAggregatedTablePropertiesAtLevelMap(
return false; return false;
} }
std::shared_ptr<const TableProperties> tp; std::shared_ptr<const TableProperties> tp;
// TODO: plumb Env::IOActivity // TODO: plumb Env::IOActivity, Env::IOPriority
const ReadOptions read_options; const ReadOptions read_options;
auto s = cfd_->current()->GetAggregatedTableProperties( auto s = cfd_->current()->GetAggregatedTableProperties(
read_options, &tp, static_cast<int>(level)); read_options, &tp, static_cast<int>(level));
@ -1418,7 +1418,7 @@ bool InternalStats::HandleEstimatePendingCompactionBytes(uint64_t* value,
bool InternalStats::HandleEstimateTableReadersMem(uint64_t* value, bool InternalStats::HandleEstimateTableReadersMem(uint64_t* value,
DBImpl* /*db*/, DBImpl* /*db*/,
Version* version) { Version* version) {
// TODO: plumb Env::IOActivity // TODO: plumb Env::IOActivity, Env::IOPriority
const ReadOptions read_options; const ReadOptions read_options;
*value = (version == nullptr) *value = (version == nullptr)
? 0 ? 0
@ -1473,7 +1473,7 @@ bool InternalStats::HandleEstimateOldestKeyTime(uint64_t* value, DBImpl* /*db*/,
->compaction_options_fifo.allow_compaction) { ->compaction_options_fifo.allow_compaction) {
return false; return false;
} }
// TODO: plumb Env::IOActivity // TODO: plumb Env::IOActivity, Env::IOPriority
const ReadOptions read_options; const ReadOptions read_options;
TablePropertiesCollection collection; TablePropertiesCollection collection;
auto s = cfd_->current()->GetPropertiesOfAllTables(read_options, &collection); auto s = cfd_->current()->GetPropertiesOfAllTables(read_options, &collection);

@ -185,9 +185,10 @@ class LogTest
void Write(const std::string& msg, void Write(const std::string& msg,
const UnorderedMap<uint32_t, size_t>* cf_to_ts_sz = nullptr) { const UnorderedMap<uint32_t, size_t>* cf_to_ts_sz = nullptr) {
if (cf_to_ts_sz != nullptr && !cf_to_ts_sz->empty()) { if (cf_to_ts_sz != nullptr && !cf_to_ts_sz->empty()) {
ASSERT_OK(writer_->MaybeAddUserDefinedTimestampSizeRecord(*cf_to_ts_sz)); ASSERT_OK(writer_->MaybeAddUserDefinedTimestampSizeRecord(WriteOptions(),
*cf_to_ts_sz));
} }
ASSERT_OK(writer_->AddRecord(Slice(msg))); ASSERT_OK(writer_->AddRecord(WriteOptions(), Slice(msg)));
} }
size_t WrittenBytes() const { return dest_contents().size(); } size_t WrittenBytes() const { return dest_contents().size(); }
@ -732,8 +733,8 @@ TEST_P(LogTest, Recycle) {
std::unique_ptr<WritableFileWriter> dest_holder(new WritableFileWriter( std::unique_ptr<WritableFileWriter> dest_holder(new WritableFileWriter(
std::move(sink), "" /* don't care */, FileOptions())); std::move(sink), "" /* don't care */, FileOptions()));
Writer recycle_writer(std::move(dest_holder), 123, true); Writer recycle_writer(std::move(dest_holder), 123, true);
ASSERT_OK(recycle_writer.AddRecord(Slice("foooo"))); ASSERT_OK(recycle_writer.AddRecord(WriteOptions(), Slice("foooo")));
ASSERT_OK(recycle_writer.AddRecord(Slice("bar"))); ASSERT_OK(recycle_writer.AddRecord(WriteOptions(), Slice("bar")));
ASSERT_GE(get_reader_contents()->size(), log::kBlockSize * 2); ASSERT_GE(get_reader_contents()->size(), log::kBlockSize * 2);
ASSERT_EQ("foooo", Read()); ASSERT_EQ("foooo", Read());
ASSERT_EQ("bar", Read()); ASSERT_EQ("bar", Read());
@ -764,9 +765,10 @@ TEST_P(LogTest, RecycleWithTimestampSize) {
UnorderedMap<uint32_t, size_t> ts_sz_two = { UnorderedMap<uint32_t, size_t> ts_sz_two = {
{2, sizeof(uint64_t)}, {2, sizeof(uint64_t)},
}; };
ASSERT_OK(recycle_writer.MaybeAddUserDefinedTimestampSizeRecord(ts_sz_two)); ASSERT_OK(recycle_writer.MaybeAddUserDefinedTimestampSizeRecord(
ASSERT_OK(recycle_writer.AddRecord(Slice("foooo"))); WriteOptions(), ts_sz_two));
ASSERT_OK(recycle_writer.AddRecord(Slice("bar"))); ASSERT_OK(recycle_writer.AddRecord(WriteOptions(), Slice("foooo")));
ASSERT_OK(recycle_writer.AddRecord(WriteOptions(), Slice("bar")));
ASSERT_GE(get_reader_contents()->size(), log::kBlockSize * 2); ASSERT_GE(get_reader_contents()->size(), log::kBlockSize * 2);
CheckRecordAndTimestampSize("foooo", ts_sz_two); CheckRecordAndTimestampSize("foooo", ts_sz_two);
CheckRecordAndTimestampSize("bar", ts_sz_two); CheckRecordAndTimestampSize("bar", ts_sz_two);
@ -853,12 +855,12 @@ class RetriableLogTest : public ::testing::TestWithParam<int> {
std::string contents() { return sink_->contents_; } std::string contents() { return sink_->contents_; }
void Encode(const std::string& msg) { void Encode(const std::string& msg) {
ASSERT_OK(log_writer_->AddRecord(Slice(msg))); ASSERT_OK(log_writer_->AddRecord(WriteOptions(), Slice(msg)));
} }
void Write(const Slice& data) { void Write(const Slice& data) {
ASSERT_OK(writer_->Append(data)); ASSERT_OK(writer_->Append(IOOptions(), data));
ASSERT_OK(writer_->Sync(true)); ASSERT_OK(writer_->Sync(IOOptions(), true));
} }
bool TryRead(std::string* result) { bool TryRead(std::string* result) {
@ -991,7 +993,9 @@ INSTANTIATE_TEST_CASE_P(bool, RetriableLogTest, ::testing::Values(0, 2));
class CompressionLogTest : public LogTest { class CompressionLogTest : public LogTest {
public: public:
Status SetupTestEnv() { return writer_->AddCompressionTypeRecord(); } Status SetupTestEnv() {
return writer_->AddCompressionTypeRecord(WriteOptions());
}
}; };
TEST_P(CompressionLogTest, Empty) { TEST_P(CompressionLogTest, Empty) {
@ -1109,7 +1113,7 @@ TEST_P(CompressionLogTest, AlignedFragmentation) {
// beginning of the block. // beginning of the block.
while ((WrittenBytes() & (kBlockSize - 1)) >= kHeaderSize) { while ((WrittenBytes() & (kBlockSize - 1)) >= kHeaderSize) {
char entry = 'a'; char entry = 'a';
ASSERT_OK(writer_->AddRecord(Slice(&entry, 1))); ASSERT_OK(writer_->AddRecord(WriteOptions(), Slice(&entry, 1)));
num_filler_records++; num_filler_records++;
} }
const std::vector<std::string> wal_entries = { const std::vector<std::string> wal_entries = {


@ -38,32 +38,43 @@ Writer::Writer(std::unique_ptr<WritableFileWriter>&& dest, uint64_t log_number,
} }
Writer::~Writer() { Writer::~Writer() {
ThreadStatus::OperationType cur_op_type =
ThreadStatusUtil::GetThreadOperation();
ThreadStatusUtil::SetThreadOperation(ThreadStatus::OperationType::OP_UNKNOWN);
if (dest_) { if (dest_) {
WriteBuffer().PermitUncheckedError(); WriteBuffer(WriteOptions()).PermitUncheckedError();
} }
if (compress_) { if (compress_) {
delete compress_; delete compress_;
} }
ThreadStatusUtil::SetThreadOperation(cur_op_type);
} }
IOStatus Writer::WriteBuffer() { IOStatus Writer::WriteBuffer(const WriteOptions& write_options) {
if (dest_->seen_error()) { if (dest_->seen_error()) {
return IOStatus::IOError("Seen error. Skip writing buffer."); return IOStatus::IOError("Seen error. Skip writing buffer.");
} }
return dest_->Flush(); IOOptions opts;
IOStatus s = WritableFileWriter::PrepareIOOptions(write_options, opts);
if (!s.ok()) {
return s;
}
return dest_->Flush(opts);
} }
IOStatus Writer::Close() { IOStatus Writer::Close(const WriteOptions& write_options) {
IOStatus s; IOStatus s;
if (dest_) { IOOptions opts;
s = dest_->Close(); s = WritableFileWriter::PrepareIOOptions(write_options, opts);
if (s.ok() && dest_) {
s = dest_->Close(opts);
dest_.reset(); dest_.reset();
} }
return s; return s;
} }
IOStatus Writer::AddRecord(const Slice& slice, IOStatus Writer::AddRecord(const WriteOptions& write_options,
Env::IOPriority rate_limiter_priority) { const Slice& slice) {
const char* ptr = slice.data(); const char* ptr = slice.data();
size_t left = slice.size(); size_t left = slice.size();
@ -83,6 +94,9 @@ IOStatus Writer::AddRecord(const Slice& slice,
} }
IOStatus s; IOStatus s;
IOOptions opts;
s = WritableFileWriter::PrepareIOOptions(write_options, opts);
if (s.ok()) {
do { do {
const int64_t leftover = kBlockSize - block_offset_; const int64_t leftover = kBlockSize - block_offset_;
assert(leftover >= 0); assert(leftover >= 0);
@ -92,9 +106,10 @@ IOStatus Writer::AddRecord(const Slice& slice,
// Fill the trailer (literal below relies on kHeaderSize and // Fill the trailer (literal below relies on kHeaderSize and
// kRecyclableHeaderSize being <= 11) // kRecyclableHeaderSize being <= 11)
assert(header_size <= 11); assert(header_size <= 11);
s = dest_->Append(Slice("\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00", s = dest_->Append(opts,
Slice("\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00",
static_cast<size_t>(leftover)), static_cast<size_t>(leftover)),
0 /* crc32c_checksum */, rate_limiter_priority); 0 /* crc32c_checksum */);
if (!s.ok()) { if (!s.ok()) {
break; break;
} }
@ -112,8 +127,8 @@ IOStatus Writer::AddRecord(const Slice& slice,
// previous generated compressed chunk is written out as one or more // previous generated compressed chunk is written out as one or more
// physical records (left=0). // physical records (left=0).
if (compress_ && (compress_start || left == 0)) { if (compress_ && (compress_start || left == 0)) {
compress_remaining = compress_->Compress(slice.data(), slice.size(), compress_remaining = compress_->Compress(
compressed_buffer_.get(), &left); slice.data(), slice.size(), compressed_buffer_.get(), &left);
if (compress_remaining < 0) { if (compress_remaining < 0) {
// Set failure status // Set failure status
@ -144,22 +159,22 @@ IOStatus Writer::AddRecord(const Slice& slice,
type = recycle_log_files_ ? kRecyclableMiddleType : kMiddleType; type = recycle_log_files_ ? kRecyclableMiddleType : kMiddleType;
} }
s = EmitPhysicalRecord(type, ptr, fragment_length, rate_limiter_priority); s = EmitPhysicalRecord(write_options, type, ptr, fragment_length);
ptr += fragment_length; ptr += fragment_length;
left -= fragment_length; left -= fragment_length;
begin = false; begin = false;
} while (s.ok() && (left > 0 || compress_remaining > 0)); } while (s.ok() && (left > 0 || compress_remaining > 0));
}
if (s.ok()) { if (s.ok()) {
if (!manual_flush_) { if (!manual_flush_) {
s = dest_->Flush(rate_limiter_priority); s = dest_->Flush(opts);
} }
} }
return s; return s;
} }
IOStatus Writer::AddCompressionTypeRecord() { IOStatus Writer::AddCompressionTypeRecord(const WriteOptions& write_options) {
// Should be the first record // Should be the first record
assert(block_offset_ == 0); assert(block_offset_ == 0);
@ -171,11 +186,15 @@ IOStatus Writer::AddCompressionTypeRecord() {
CompressionTypeRecord record(compression_type_); CompressionTypeRecord record(compression_type_);
std::string encode; std::string encode;
record.EncodeTo(&encode); record.EncodeTo(&encode);
IOStatus s = IOStatus s = EmitPhysicalRecord(write_options, kSetCompressionType,
EmitPhysicalRecord(kSetCompressionType, encode.data(), encode.size()); encode.data(), encode.size());
if (s.ok()) { if (s.ok()) {
if (!manual_flush_) { if (!manual_flush_) {
s = dest_->Flush(); IOOptions io_opts;
s = WritableFileWriter::PrepareIOOptions(write_options, io_opts);
if (s.ok()) {
s = dest_->Flush(io_opts);
}
} }
// Initialize fields required for compression // Initialize fields required for compression
const size_t max_output_buffer_len = const size_t max_output_buffer_len =
@ -197,8 +216,8 @@ IOStatus Writer::AddCompressionTypeRecord() {
} }
IOStatus Writer::MaybeAddUserDefinedTimestampSizeRecord( IOStatus Writer::MaybeAddUserDefinedTimestampSizeRecord(
const UnorderedMap<uint32_t, size_t>& cf_to_ts_sz, const WriteOptions& write_options,
Env::IOPriority rate_limiter_priority) { const UnorderedMap<uint32_t, size_t>& cf_to_ts_sz) {
std::vector<std::pair<uint32_t, size_t>> ts_sz_to_record; std::vector<std::pair<uint32_t, size_t>> ts_sz_to_record;
for (const auto& [cf_id, ts_sz] : cf_to_ts_sz) { for (const auto& [cf_id, ts_sz] : cf_to_ts_sz) {
if (recorded_cf_to_ts_sz_.count(cf_id) != 0) { if (recorded_cf_to_ts_sz_.count(cf_id) != 0) {
@ -219,14 +238,14 @@ IOStatus Writer::MaybeAddUserDefinedTimestampSizeRecord(
record.EncodeTo(&encoded); record.EncodeTo(&encoded);
RecordType type = recycle_log_files_ ? kRecyclableUserDefinedTimestampSizeType RecordType type = recycle_log_files_ ? kRecyclableUserDefinedTimestampSizeType
: kUserDefinedTimestampSizeType; : kUserDefinedTimestampSizeType;
return EmitPhysicalRecord(type, encoded.data(), encoded.size(), return EmitPhysicalRecord(write_options, type, encoded.data(),
rate_limiter_priority); encoded.size());
} }
bool Writer::BufferIsEmpty() { return dest_->BufferIsEmpty(); } bool Writer::BufferIsEmpty() { return dest_->BufferIsEmpty(); }
IOStatus Writer::EmitPhysicalRecord(RecordType t, const char* ptr, size_t n, IOStatus Writer::EmitPhysicalRecord(const WriteOptions& write_options,
Env::IOPriority rate_limiter_priority) { RecordType t, const char* ptr, size_t n) {
assert(n <= 0xffff); // Must fit in two bytes assert(n <= 0xffff); // Must fit in two bytes
size_t header_size; size_t header_size;
@ -266,10 +285,13 @@ IOStatus Writer::EmitPhysicalRecord(RecordType t, const char* ptr, size_t n,
EncodeFixed32(buf, crc); EncodeFixed32(buf, crc);
// Write the header and the payload // Write the header and the payload
IOStatus s = dest_->Append(Slice(buf, header_size), 0 /* crc32c_checksum */, IOOptions opts;
rate_limiter_priority); IOStatus s = WritableFileWriter::PrepareIOOptions(write_options, opts);
if (s.ok()) { if (s.ok()) {
s = dest_->Append(Slice(ptr, n), payload_crc, rate_limiter_priority); s = dest_->Append(opts, Slice(buf, header_size), 0 /* crc32c_checksum */);
}
if (s.ok()) {
s = dest_->Append(opts, Slice(ptr, n), payload_crc);
} }
block_offset_ += header_size + n; block_offset_ += header_size + n;
return s; return s;
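The hunks above all follow one pattern: each `Writer` method now takes a `WriteOptions`, converts it to `IOOptions` via `WritableFileWriter::PrepareIOOptions`, and hands the result to `Append`/`Flush`/`Close`, bailing out if any step fails. A minimal sketch of that shape follows. Every type here is a hypothetical stand-in rather than the RocksDB definition, and the body of `PrepareIOOptions` is an assumption (the real function is not shown in this diff and may do more).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins; these are NOT the RocksDB definitions.
enum class IOActivity { kUnknown, kFlush, kCompaction, kDBOpen };
enum class IOPriority { kLow, kHigh, kTotal };

struct WriteOptions {
  IOActivity io_activity = IOActivity::kUnknown;
  IOPriority rate_limiter_priority = IOPriority::kTotal;
};

struct IOOptions {
  IOActivity io_activity = IOActivity::kUnknown;
  IOPriority rate_limiter_priority = IOPriority::kTotal;
};

struct IOStatus {
  bool ok_ = true;
  bool ok() const { return ok_; }
};

// Assumed behavior: copy the tagging fields from the user-level options
// into the file-level options.
IOStatus PrepareIOOptions(const WriteOptions& wo, IOOptions& opts) {
  opts.io_activity = wo.io_activity;
  opts.rate_limiter_priority = wo.rate_limiter_priority;
  return IOStatus{};
}

// Toy sink that records the activity tag seen on each append.
struct FileWriter {
  std::vector<IOActivity> seen;
  std::string contents;
  IOStatus Append(const IOOptions& opts, const std::string& data) {
    seen.push_back(opts.io_activity);
    contents += data;
    return IOStatus{};
  }
};

// Same shape as EmitPhysicalRecord in the diff: convert first, then
// append header and payload only while every earlier step succeeded.
IOStatus EmitRecord(const WriteOptions& wo, FileWriter& dest,
                    const std::string& header, const std::string& payload) {
  assert(payload.size() <= 0xffff);  // Must fit in two bytes
  IOOptions opts;
  IOStatus s = PrepareIOOptions(wo, opts);
  if (s.ok()) {
    s = dest.Append(opts, header);
  }
  if (s.ok()) {
    s = dest.Append(opts, payload);
  }
  return s;
}
```

Calling `EmitRecord` with `io_activity` set to `IOActivity::kFlush` tags both appends as flush IO in this sketch. Carrying a tag like that down to the file writer is presumably what lets the new per-activity histograms attribute each write.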


@ -86,9 +86,8 @@ class Writer {
~Writer(); ~Writer();
IOStatus AddRecord(const Slice& slice, IOStatus AddRecord(const WriteOptions& write_options, const Slice& slice);
Env::IOPriority rate_limiter_priority = Env::IO_TOTAL); IOStatus AddCompressionTypeRecord(const WriteOptions& write_options);
IOStatus AddCompressionTypeRecord();
// If there are column families in `cf_to_ts_sz` not included in // If there are column families in `cf_to_ts_sz` not included in
// `recorded_cf_to_ts_sz_` and its user-defined timestamp size is non-zero, // `recorded_cf_to_ts_sz_` and its user-defined timestamp size is non-zero,
@ -96,17 +95,17 @@ class Writer {
// kRecyclableUserDefinedTimestampSizeType for these column families. // kRecyclableUserDefinedTimestampSizeType for these column families.
// This timestamp size record applies to all subsequent records. // This timestamp size record applies to all subsequent records.
IOStatus MaybeAddUserDefinedTimestampSizeRecord( IOStatus MaybeAddUserDefinedTimestampSizeRecord(
const UnorderedMap<uint32_t, size_t>& cf_to_ts_sz, const WriteOptions& write_options,
Env::IOPriority rate_limiter_priority = Env::IO_TOTAL); const UnorderedMap<uint32_t, size_t>& cf_to_ts_sz);
WritableFileWriter* file() { return dest_.get(); } WritableFileWriter* file() { return dest_.get(); }
const WritableFileWriter* file() const { return dest_.get(); } const WritableFileWriter* file() const { return dest_.get(); }
uint64_t get_log_number() const { return log_number_; } uint64_t get_log_number() const { return log_number_; }
IOStatus WriteBuffer(); IOStatus WriteBuffer(const WriteOptions& write_options);
IOStatus Close(); IOStatus Close(const WriteOptions& write_options);
bool BufferIsEmpty(); bool BufferIsEmpty();
@ -121,9 +120,8 @@ class Writer {
// record type stored in the header. // record type stored in the header.
uint32_t type_crc_[kMaxRecordType + 1]; uint32_t type_crc_[kMaxRecordType + 1];
IOStatus EmitPhysicalRecord( IOStatus EmitPhysicalRecord(const WriteOptions& write_options,
RecordType type, const char* ptr, size_t length, RecordType type, const char* ptr, size_t length);
Env::IOPriority rate_limiter_priority = Env::IO_TOTAL);
// If true, it does not flush after each write. Instead it relies on the upper // If true, it does not flush after each write. Instead it relies on the upper
// layer to manually do the flush by calling ::WriteBuffer() // layer to manually do the flush by calling ::WriteBuffer()
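In the implementation diff, `Writer::~Writer()` saves the current thread operation, sets it to `OP_UNKNOWN` while it flushes with a default `WriteOptions()`, and restores it afterwards. The commit does this by hand with `GetThreadOperation()`/`SetThreadOperation()`. The sketch below shows the same save/reset/restore sequence as an RAII guard instead. This is an alternative formulation, not what the commit does, and `ScopedThreadOperation` plus the free functions are hypothetical stand-ins for `ThreadStatusUtil`.

```cpp
#include <cassert>

// Hypothetical stand-in for ThreadStatus::OperationType.
enum class OperationType { OP_UNKNOWN, OP_FLUSH, OP_COMPACTION };

// Hypothetical stand-in for the per-thread state behind ThreadStatusUtil.
thread_local OperationType g_thread_op = OperationType::OP_UNKNOWN;

OperationType GetThreadOperation() { return g_thread_op; }
void SetThreadOperation(OperationType op) { g_thread_op = op; }

// Saves the current operation, switches to a temporary one, and restores
// the saved value when the guard goes out of scope.
class ScopedThreadOperation {
 public:
  explicit ScopedThreadOperation(OperationType temporary)
      : saved_(GetThreadOperation()) {
    SetThreadOperation(temporary);
  }
  ~ScopedThreadOperation() { SetThreadOperation(saved_); }

  ScopedThreadOperation(const ScopedThreadOperation&) = delete;
  ScopedThreadOperation& operator=(const ScopedThreadOperation&) = delete;

 private:
  OperationType saved_;
};

// Returns the operation observed while the guard is active, standing in
// for the cleanup work a destructor would do inside the guarded scope.
OperationType OperationDuringCleanup() {
  ScopedThreadOperation guard(OperationType::OP_UNKNOWN);
  return GetThreadOperation();
}
```

A guard restores the previous value on every exit path, including early returns. The hand-written version in the diff is equivalent for a destructor with a single exit.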


@ -597,7 +597,7 @@ void MemTable::ConstructFragmentedRangeTombstones() {
assert(!IsFragmentedRangeTombstonesConstructed(false)); assert(!IsFragmentedRangeTombstonesConstructed(false));
// There should be no concurrent Construction // There should be no concurrent Construction
if (!is_range_del_table_empty_.load(std::memory_order_relaxed)) { if (!is_range_del_table_empty_.load(std::memory_order_relaxed)) {
// TODO: plumb Env::IOActivity // TODO: plumb Env::IOActivity, Env::IOPriority
auto* unfragmented_iter = auto* unfragmented_iter =
new MemTableIterator(*this, ReadOptions(), nullptr /* arena */, new MemTableIterator(*this, ReadOptions(), nullptr /* arena */,
true /* use_range_del_table */); true /* use_range_del_table */);


@ -502,6 +502,7 @@ Status MemTableList::TryInstallMemtableFlushResults(
mu->AssertHeld(); mu->AssertHeld();
const ReadOptions read_options(Env::IOActivity::kFlush); const ReadOptions read_options(Env::IOActivity::kFlush);
const WriteOptions write_options(Env::IOActivity::kFlush);
// Flush was successful // Flush was successful
// Record the status on the memtable object. Either this call or a call by a // Record the status on the memtable object. Either this call or a call by a
@ -614,10 +615,10 @@ Status MemTableList::TryInstallMemtableFlushResults(
}; };
if (write_edits) { if (write_edits) {
// this can release and reacquire the mutex. // this can release and reacquire the mutex.
s = vset->LogAndApply(cfd, mutable_cf_options, read_options, edit_list, s = vset->LogAndApply(
mu, db_directory, /*new_descriptor_log=*/false, cfd, mutable_cf_options, read_options, write_options, edit_list, mu,
/*column_family_options=*/nullptr, db_directory, /*new_descriptor_log=*/false,
manifest_write_cb); /*column_family_options=*/nullptr, manifest_write_cb);
} else { } else {
// If write_edit is false (e.g: successful mempurge), // If write_edit is false (e.g: successful mempurge),
// then remove old memtables, wake up manifest write queue threads, // then remove old memtables, wake up manifest write queue threads,
@ -835,6 +836,7 @@ Status InstallMemtableAtomicFlushResults(
mu->AssertHeld(); mu->AssertHeld();
const ReadOptions read_options(Env::IOActivity::kFlush); const ReadOptions read_options(Env::IOActivity::kFlush);
const WriteOptions write_options(Env::IOActivity::kFlush);
size_t num = mems_list.size(); size_t num = mems_list.size();
assert(cfds.size() == num); assert(cfds.size() == num);
@ -913,8 +915,8 @@ Status InstallMemtableAtomicFlushResults(
} }
// this can release and reacquire the mutex. // this can release and reacquire the mutex.
s = vset->LogAndApply(cfds, mutable_cf_options_list, read_options, edit_lists, s = vset->LogAndApply(cfds, mutable_cf_options_list, read_options,
mu, db_directory); write_options, edit_lists, mu, db_directory);
for (size_t k = 0; k != cfds.size(); ++k) { for (size_t k = 0; k != cfds.size(); ++k) {
auto* imm = (imm_lists == nullptr) ? cfds[k]->imm() : imm_lists->at(k); auto* imm = (imm_lists == nullptr) ? cfds[k]->imm() : imm_lists->at(k);


@ -146,8 +146,10 @@ class Repairer {
// Adds a column family to the VersionSet with cf_options_ and updates // Adds a column family to the VersionSet with cf_options_ and updates
// manifest. // manifest.
Status AddColumnFamily(const std::string& cf_name, uint32_t cf_id) { Status AddColumnFamily(const std::string& cf_name, uint32_t cf_id) {
// TODO: plumb Env::IOActivity; // TODO: plumb Env::IOActivity, Env::IOPriority;
const ReadOptions read_options; const ReadOptions read_options;
const WriteOptions write_options;
const auto* cf_opts = GetColumnFamilyOptions(cf_name); const auto* cf_opts = GetColumnFamilyOptions(cf_name);
if (cf_opts == nullptr) { if (cf_opts == nullptr) {
return Status::Corruption("Encountered unknown column family with name=" + return Status::Corruption("Encountered unknown column family with name=" +
@ -170,9 +172,9 @@ class Repairer {
Status status = env_->GetFileSystem()->NewDirectory(dbname_, IOOptions(), Status status = env_->GetFileSystem()->NewDirectory(dbname_, IOOptions(),
&db_dir, nullptr); &db_dir, nullptr);
if (status.ok()) { if (status.ok()) {
status = vset_.LogAndApply(cfd, mut_cf_opts, read_options, &edit, &mutex_, status = vset_.LogAndApply(cfd, mut_cf_opts, read_options, write_options,
db_dir.get(), false /* new_descriptor_log */, &edit, &mutex_, db_dir.get(),
cf_opts); false /* new_descriptor_log */, cf_opts);
} }
mutex_.Unlock(); mutex_.Unlock();
return status; return status;
@ -362,9 +364,6 @@ class Repairer {
} }
}; };
// TODO: plumb Env::IOActivity
const ReadOptions read_options;
// Open the log file // Open the log file
std::string logname = LogFileName(wal_dir, log); std::string logname = LogFileName(wal_dir, log);
const auto& fs = env_->GetFileSystem(); const auto& fs = env_->GetFileSystem();
@ -440,7 +439,7 @@ class Repairer {
FileMetaData meta; FileMetaData meta;
meta.fd = FileDescriptor(next_file_number_++, 0, 0); meta.fd = FileDescriptor(next_file_number_++, 0, 0);
// TODO: plumb Env::IOActivity // TODO: plumb Env::IOActivity, Env::IOPriority
ReadOptions ro; ReadOptions ro;
ro.total_order_seek = true; ro.total_order_seek = true;
Arena arena; Arena arena;
@ -463,26 +462,29 @@ class Repairer {
IOStatus io_s; IOStatus io_s;
CompressionOptions default_compression; CompressionOptions default_compression;
// TODO: plumb Env::IOActivity, Env::IOPriority
const ReadOptions read_options;
const WriteOptions write_option(Env::IO_HIGH);
TableBuilderOptions tboptions( TableBuilderOptions tboptions(
*cfd->ioptions(), *cfd->GetLatestMutableCFOptions(), *cfd->ioptions(), *cfd->GetLatestMutableCFOptions(), read_options,
cfd->internal_comparator(), cfd->int_tbl_prop_collector_factories(), write_option, cfd->internal_comparator(),
kNoCompression, default_compression, cfd->GetID(), cfd->GetName(), cfd->int_tbl_prop_collector_factories(), kNoCompression,
-1 /* level */, false /* is_bottommost */, default_compression, cfd->GetID(), cfd->GetName(), -1 /* level */,
TableFileCreationReason::kRecovery, 0 /* oldest_key_time */, false /* is_bottommost */, TableFileCreationReason::kRecovery,
0 /* file_creation_time */, "DB Repairer" /* db_id */, db_session_id_, 0 /* oldest_key_time */, 0 /* file_creation_time */,
0 /*target_file_size*/, meta.fd.GetNumber()); "DB Repairer" /* db_id */, db_session_id_, 0 /*target_file_size*/,
meta.fd.GetNumber());
SeqnoToTimeMapping empty_seqno_to_time_mapping; SeqnoToTimeMapping empty_seqno_to_time_mapping;
status = BuildTable( status = BuildTable(
dbname_, /* versions */ nullptr, immutable_db_options_, tboptions, dbname_, /* versions */ nullptr, immutable_db_options_, tboptions,
file_options_, read_options, table_cache_.get(), iter.get(), file_options_, table_cache_.get(), iter.get(),
std::move(range_del_iters), &meta, nullptr /* blob_file_additions */, std::move(range_del_iters), &meta, nullptr /* blob_file_additions */,
{}, kMaxSequenceNumber, kMaxSequenceNumber, snapshot_checker, {}, kMaxSequenceNumber, kMaxSequenceNumber, snapshot_checker,
false /* paranoid_file_checks*/, nullptr /* internal_stats */, &io_s, false /* paranoid_file_checks*/, nullptr /* internal_stats */, &io_s,
nullptr /*IOTracer*/, BlobFileCreationReason::kRecovery, nullptr /*IOTracer*/, BlobFileCreationReason::kRecovery,
empty_seqno_to_time_mapping, nullptr /* event_logger */, empty_seqno_to_time_mapping, nullptr /* event_logger */,
0 /* job_id */, Env::IO_HIGH, nullptr /* table_properties */, 0 /* job_id */, nullptr /* table_properties */, write_hint);
write_hint);
ROCKS_LOG_INFO(db_options_.info_log, ROCKS_LOG_INFO(db_options_.info_log,
"Log #%" PRIu64 ": %d ops saved to Table #%" PRIu64 " %s", "Log #%" PRIu64 ": %d ops saved to Table #%" PRIu64 " %s",
log, counter, meta.fd.GetNumber(), log, counter, meta.fd.GetNumber(),
@ -529,7 +531,7 @@ class Repairer {
file_size); file_size);
std::shared_ptr<const TableProperties> props; std::shared_ptr<const TableProperties> props;
if (status.ok()) { if (status.ok()) {
// TODO: plumb Env::IOActivity // TODO: plumb Env::IOActivity, Env::IOPriority
const ReadOptions read_options; const ReadOptions read_options;
status = table_cache_->GetTableProperties( status = table_cache_->GetTableProperties(
file_options_, read_options, icmp_, t->meta, &props, file_options_, read_options, icmp_, t->meta, &props,
@ -592,7 +594,7 @@ class Repairer {
} }
} }
if (status.ok()) { if (status.ok()) {
// TODO: plumb Env::IOActivity // TODO: plumb Env::IOActivity, Env::IOPriority
ReadOptions ropts; ReadOptions ropts;
ropts.total_order_seek = true; ropts.total_order_seek = true;
InternalIterator* iter = table_cache_->NewIterator( InternalIterator* iter = table_cache_->NewIterator(
@ -641,7 +643,7 @@ class Repairer {
// an SST file is a full sorted run. This probably needs the extra logic // an SST file is a full sorted run. This probably needs the extra logic
// from compaction_job.cc around call to UpdateBoundariesForRange (to // from compaction_job.cc around call to UpdateBoundariesForRange (to
// handle range tombstones extending beyond range of other entries). // handle range tombstones extending beyond range of other entries).
// TODO: plumb Env::IOActivity // TODO: plumb Env::IOActivity, Env::IOPriority
ReadOptions ropts; ReadOptions ropts;
std::unique_ptr<FragmentedRangeTombstoneIterator> r_iter; std::unique_ptr<FragmentedRangeTombstoneIterator> r_iter;
status = table_cache_->GetRangeTombstoneIterator( status = table_cache_->GetRangeTombstoneIterator(
@ -666,8 +668,10 @@ class Repairer {
} }
Status AddTables() { Status AddTables() {
// TODO: plumb Env::IOActivity; // TODO: plumb Env::IOActivity, Env::IOPriority;
const ReadOptions read_options; const ReadOptions read_options;
const WriteOptions write_options;
std::unordered_map<uint32_t, std::vector<const TableInfo*>> cf_id_to_tables; std::unordered_map<uint32_t, std::vector<const TableInfo*>> cf_id_to_tables;
SequenceNumber max_sequence = 0; SequenceNumber max_sequence = 0;
for (size_t i = 0; i < tables_.size(); i++) { for (size_t i = 0; i < tables_.size(); i++) {
@ -755,8 +759,8 @@ class Repairer {
nullptr); nullptr);
if (s.ok()) { if (s.ok()) {
s = vset_.LogAndApply(cfd, *cfd->GetLatestMutableCFOptions(), s = vset_.LogAndApply(cfd, *cfd->GetLatestMutableCFOptions(),
read_options, &edit, &mutex_, db_dir.get(), read_options, write_options, &edit, &mutex_,
false /* new_descriptor_log */); db_dir.get(), false /* new_descriptor_log */);
} }
mutex_.Unlock(); mutex_.Unlock();
} }
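The call sites in this change construct `WriteOptions` in three ways: `WriteOptions()` where nothing is plumbed yet, `WriteOptions(Env::IOActivity::kFlush)` to tag flush IO, and `WriteOptions(Env::IO_HIGH)` in the repairer, where the priority moved out of the `BuildTable()` argument list. A minimal sketch of how such overloads can coexist without ambiguity is below. The `Env` and `WriteOptions` types are simplified stand-ins, and the exact constructor signatures and enumerator lists are assumptions rather than copies of the RocksDB headers.

```cpp
#include <cassert>

// Simplified stand-in for rocksdb::Env; enumerator lists are illustrative.
struct Env {
  enum IOPriority { IO_LOW, IO_MID, IO_HIGH, IO_USER, IO_TOTAL };
  enum class IOActivity { kFlush, kCompaction, kDBOpen, kUnknown };
};

// Simplified stand-in for rocksdb::WriteOptions with only the two
// fields this commit threads through the write path.
struct WriteOptions {
  Env::IOPriority rate_limiter_priority = Env::IO_TOTAL;
  Env::IOActivity io_activity = Env::IOActivity::kUnknown;

  WriteOptions() = default;

  // Tag the activity and keep the default priority.
  explicit WriteOptions(Env::IOActivity activity) : io_activity(activity) {}

  // Set the priority, optionally tagging the activity as well.
  explicit WriteOptions(Env::IOPriority priority,
                        Env::IOActivity activity = Env::IOActivity::kUnknown)
      : rate_limiter_priority(priority), io_activity(activity) {}
};
```

Because `IOActivity` is a scoped enum in this sketch, it never converts implicitly to `IOPriority` or back, so `WriteOptions(Env::IO_HIGH)` and `WriteOptions(Env::IOActivity::kFlush)` each match exactly one constructor. Marking both `explicit` also keeps a bare enum value from silently becoming a `WriteOptions` at a call site.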


@ -52,10 +52,13 @@ void MakeBuilder(
std::unique_ptr<FSWritableFile> wf(new test::StringSink); std::unique_ptr<FSWritableFile> wf(new test::StringSink);
writable->reset( writable->reset(
new WritableFileWriter(std::move(wf), "" /* don't care */, EnvOptions())); new WritableFileWriter(std::move(wf), "" /* don't care */, EnvOptions()));
const ReadOptions read_options;
const WriteOptions write_options;
TableBuilderOptions tboptions( TableBuilderOptions tboptions(
ioptions, moptions, internal_comparator, int_tbl_prop_collector_factories, ioptions, moptions, read_options, write_options, internal_comparator,
options.compression, options.compression_opts, kTestColumnFamilyId, int_tbl_prop_collector_factories, options.compression,
kTestColumnFamilyName, kTestLevel); options.compression_opts, kTestColumnFamilyId, kTestColumnFamilyName,
kTestLevel);
builder->reset(NewTableBuilder(tboptions, writable->get())); builder->reset(NewTableBuilder(tboptions, writable->get()));
} }
} // namespace } // namespace
@ -280,7 +283,7 @@ void TestCustomizedTablePropertiesCollector(
builder->Add(ikey.Encode(), kv.second); builder->Add(ikey.Encode(), kv.second);
} }
ASSERT_OK(builder->Finish()); ASSERT_OK(builder->Finish());
ASSERT_OK(writer->Flush()); ASSERT_OK(writer->Flush(IOOptions()));
// -- Step 2: Read properties // -- Step 2: Read properties
test::StringSink* fwf = test::StringSink* fwf =
@ -419,7 +422,7 @@ void TestInternalKeyPropertiesCollector(
} }
ASSERT_OK(builder->Finish()); ASSERT_OK(builder->Finish());
ASSERT_OK(writable->Flush()); ASSERT_OK(writable->Flush(IOOptions()));
test::StringSink* fwf = test::StringSink* fwf =
static_cast<test::StringSink*>(writable->writable_file()); static_cast<test::StringSink*>(writable->writable_file());


@ -1623,7 +1623,7 @@ Status Version::TablesRangeTombstoneSummary(int max_entries_to_print,
std::stringstream ss; std::stringstream ss;
// TODO: plumb Env::IOActivity // TODO: plumb Env::IOActivity, Env::IOPriority
const ReadOptions read_options; const ReadOptions read_options;
for (int level = 0; level < storage_info_.num_levels_; level++) { for (int level = 0; level < storage_info_.num_levels_; level++) {
for (const auto& file_meta : storage_info_.files_[level]) { for (const auto& file_meta : storage_info_.files_[level]) {
@ -5113,7 +5113,7 @@ Status VersionSet::Close(FSDirectory* db_dir, InstrumentedMutex* mu) {
std::string manifest_file_name = std::string manifest_file_name =
DescriptorFileName(dbname_, manifest_file_number_); DescriptorFileName(dbname_, manifest_file_number_);
uint64_t size = 0; uint64_t size = 0;
IOStatus io_s = descriptor_log_->Close(); IOStatus io_s = descriptor_log_->Close(WriteOptions());
descriptor_log_.reset(); descriptor_log_.reset();
TEST_SYNC_POINT("VersionSet::Close:AfterClose"); TEST_SYNC_POINT("VersionSet::Close:AfterClose");
if (io_s.ok()) { if (io_s.ok()) {
@ -5146,7 +5146,8 @@ Status VersionSet::Close(FSDirectory* db_dir, InstrumentedMutex* mu) {
VersionEdit edit; VersionEdit edit;
assert(cfd); assert(cfd);
const MutableCFOptions& cf_opts = *cfd->GetLatestMutableCFOptions(); const MutableCFOptions& cf_opts = *cfd->GetLatestMutableCFOptions();
s = LogAndApply(cfd, cf_opts, ReadOptions(), &edit, mu, db_dir); s = LogAndApply(cfd, cf_opts, ReadOptions(), WriteOptions(), &edit, mu,
db_dir);
} }
closed_ = true; closed_ = true;
@ -5230,8 +5231,8 @@ void VersionSet::AppendVersion(ColumnFamilyData* column_family_data,
Status VersionSet::ProcessManifestWrites( Status VersionSet::ProcessManifestWrites(
std::deque<ManifestWriter>& writers, InstrumentedMutex* mu, std::deque<ManifestWriter>& writers, InstrumentedMutex* mu,
FSDirectory* dir_contains_current_file, bool new_descriptor_log, FSDirectory* dir_contains_current_file, bool new_descriptor_log,
const ColumnFamilyOptions* new_cf_options, const ColumnFamilyOptions* new_cf_options, const ReadOptions& read_options,
const ReadOptions& read_options) { const WriteOptions& write_options) {
mu->AssertHeld(); mu->AssertHeld();
assert(!writers.empty()); assert(!writers.empty());
ManifestWriter& first_writer = writers.front(); ManifestWriter& first_writer = writers.front();
@ -5505,13 +5506,15 @@ Status VersionSet::ProcessManifestWrites(
FileTypeSet tmp_set = db_options_->checksum_handoff_file_types; FileTypeSet tmp_set = db_options_->checksum_handoff_file_types;
std::unique_ptr<WritableFileWriter> file_writer(new WritableFileWriter( std::unique_ptr<WritableFileWriter> file_writer(new WritableFileWriter(
std::move(descriptor_file), descriptor_fname, opt_file_opts, clock_, std::move(descriptor_file), descriptor_fname, opt_file_opts, clock_,
io_tracer_, nullptr, db_options_->listeners, nullptr, io_tracer_, nullptr, Histograms::HISTOGRAM_ENUM_MAX /* hist_type */,
db_options_->listeners, nullptr,
tmp_set.Contains(FileType::kDescriptorFile), tmp_set.Contains(FileType::kDescriptorFile),
tmp_set.Contains(FileType::kDescriptorFile))); tmp_set.Contains(FileType::kDescriptorFile)));
descriptor_log_.reset( descriptor_log_.reset(
new log::Writer(std::move(file_writer), 0, false)); new log::Writer(std::move(file_writer), 0, false));
s = WriteCurrentStateToManifest(curr_state, wal_additions, s = WriteCurrentStateToManifest(write_options, curr_state,
descriptor_log_.get(), io_s); wal_additions, descriptor_log_.get(),
io_s);
} else { } else {
manifest_io_status = io_s; manifest_io_status = io_s;
s = io_s; s = io_s;
@ -5555,7 +5558,7 @@ Status VersionSet::ProcessManifestWrites(
} }
++idx; ++idx;
#endif /* !NDEBUG */ #endif /* !NDEBUG */
io_s = descriptor_log_->AddRecord(record); io_s = descriptor_log_->AddRecord(write_options, record);
if (!io_s.ok()) { if (!io_s.ok()) {
s = io_s; s = io_s;
manifest_io_status = io_s; manifest_io_status = io_s;
@ -5564,7 +5567,8 @@ Status VersionSet::ProcessManifestWrites(
} }
if (s.ok()) { if (s.ok()) {
io_s = SyncManifest(db_options_, descriptor_log_->file()); io_s =
SyncManifest(db_options_, write_options, descriptor_log_->file());
manifest_io_status = io_s; manifest_io_status = io_s;
TEST_SYNC_POINT_CALLBACK( TEST_SYNC_POINT_CALLBACK(
"VersionSet::ProcessManifestWrites:AfterSyncManifest", &io_s); "VersionSet::ProcessManifestWrites:AfterSyncManifest", &io_s);
@ -5582,7 +5586,8 @@ Status VersionSet::ProcessManifestWrites(
assert(manifest_io_status.ok()); assert(manifest_io_status.ok());
} }
if (s.ok() && new_descriptor_log) { if (s.ok() && new_descriptor_log) {
io_s = SetCurrentFile(fs_.get(), dbname_, pending_manifest_file_number_, io_s = SetCurrentFile(write_options, fs_.get(), dbname_,
pending_manifest_file_number_,
dir_contains_current_file); dir_contains_current_file);
if (!io_s.ok()) { if (!io_s.ok()) {
s = io_s; s = io_s;
@ -5822,7 +5827,7 @@ void VersionSet::WakeUpWaitingManifestWriters() {
Status VersionSet::LogAndApply( Status VersionSet::LogAndApply(
const autovector<ColumnFamilyData*>& column_family_datas, const autovector<ColumnFamilyData*>& column_family_datas,
const autovector<const MutableCFOptions*>& mutable_cf_options_list, const autovector<const MutableCFOptions*>& mutable_cf_options_list,
const ReadOptions& read_options, const ReadOptions& read_options, const WriteOptions& write_options,
const autovector<autovector<VersionEdit*>>& edit_lists, const autovector<autovector<VersionEdit*>>& edit_lists,
InstrumentedMutex* mu, FSDirectory* dir_contains_current_file, InstrumentedMutex* mu, FSDirectory* dir_contains_current_file,
bool new_descriptor_log, const ColumnFamilyOptions* new_cf_options, bool new_descriptor_log, const ColumnFamilyOptions* new_cf_options,
@ -5900,8 +5905,8 @@ Status VersionSet::LogAndApply(
return Status::ColumnFamilyDropped(); return Status::ColumnFamilyDropped();
} }
return ProcessManifestWrites(writers, mu, dir_contains_current_file, return ProcessManifestWrites(writers, mu, dir_contains_current_file,
new_descriptor_log, new_cf_options, new_descriptor_log, new_cf_options, read_options,
read_options); write_options);
} }
void VersionSet::LogAndApplyCFHelper(VersionEdit* edit, void VersionSet::LogAndApplyCFHelper(VersionEdit* edit,
@ -6238,7 +6243,7 @@ Status VersionSet::ListColumnFamilies(std::vector<std::string>* column_families,
Status VersionSet::ListColumnFamiliesFromManifest( Status VersionSet::ListColumnFamiliesFromManifest(
const std::string& manifest_path, FileSystem* fs, const std::string& manifest_path, FileSystem* fs,
std::vector<std::string>* column_families) { std::vector<std::string>* column_families) {
// TODO: plumb Env::IOActivity // TODO: plumb Env::IOActivity, Env::IOPriority
const ReadOptions read_options; const ReadOptions read_options;
std::unique_ptr<SequentialFileReader> file_reader; std::unique_ptr<SequentialFileReader> file_reader;
Status s; Status s;
@ -6282,8 +6287,9 @@ Status VersionSet::ReduceNumberOfLevels(const std::string& dbname,
"Number of levels needs to be bigger than 1"); "Number of levels needs to be bigger than 1");
} }
// TODO: plumb Env::IOActivity // TODO: plumb Env::IOActivity, Env::IOPriority
const ReadOptions read_options; const ReadOptions read_options;
const WriteOptions write_options;
ImmutableDBOptions db_options(*options); ImmutableDBOptions db_options(*options);
ColumnFamilyOptions cf_options(*options); ColumnFamilyOptions cf_options(*options);
@ -6373,8 +6379,8 @@ Status VersionSet::ReduceNumberOfLevels(const std::string& dbname,
InstrumentedMutex dummy_mutex; InstrumentedMutex dummy_mutex;
InstrumentedMutexLock l(&dummy_mutex); InstrumentedMutexLock l(&dummy_mutex);
return versions.LogAndApply(versions.GetColumnFamilySet()->GetDefault(), return versions.LogAndApply(versions.GetColumnFamilySet()->GetDefault(),
mutable_cf_options, read_options, &ve, mutable_cf_options, read_options, write_options,
&dummy_mutex, nullptr, true); &ve, &dummy_mutex, nullptr, true);
} }
// Get the checksum information including the checksum and checksum function // Get the checksum information including the checksum and checksum function
@ -6448,7 +6454,7 @@ Status VersionSet::DumpManifest(
Options& options, std::string& dscname, bool verbose, bool hex, bool json, Options& options, std::string& dscname, bool verbose, bool hex, bool json,
const std::vector<ColumnFamilyDescriptor>& cf_descs) { const std::vector<ColumnFamilyDescriptor>& cf_descs) {
assert(options.env); assert(options.env);
// TODO: plumb Env::IOActivity // TODO: plumb Env::IOActivity, Env::IOPriority
const ReadOptions read_options; const ReadOptions read_options;
std::vector<std::string> column_families; std::vector<std::string> column_families;
@ -6515,6 +6521,7 @@ void VersionSet::MarkMinLogNumberToKeep(uint64_t number) {
} }
Status VersionSet::WriteCurrentStateToManifest( Status VersionSet::WriteCurrentStateToManifest(
const WriteOptions& write_options,
const std::unordered_map<uint32_t, MutableCFState>& curr_state, const std::unordered_map<uint32_t, MutableCFState>& curr_state,
const VersionEdit& wal_additions, log::Writer* log, IOStatus& io_s) { const VersionEdit& wal_additions, log::Writer* log, IOStatus& io_s) {
// TODO: Break up into multiple records to reduce memory usage on recovery? // TODO: Break up into multiple records to reduce memory usage on recovery?
@ -6535,7 +6542,7 @@ Status VersionSet::WriteCurrentStateToManifest(
return Status::Corruption("Unable to Encode VersionEdit:" + return Status::Corruption("Unable to Encode VersionEdit:" +
edit_for_db_id.DebugString(true)); edit_for_db_id.DebugString(true));
} }
io_s = log->AddRecord(db_id_record); io_s = log->AddRecord(write_options, db_id_record);
if (!io_s.ok()) { if (!io_s.ok()) {
return io_s; return io_s;
} }
@ -6550,7 +6557,7 @@ Status VersionSet::WriteCurrentStateToManifest(
return Status::Corruption("Unable to Encode VersionEdit: " + return Status::Corruption("Unable to Encode VersionEdit: " +
wal_additions.DebugString(true)); wal_additions.DebugString(true));
} }
io_s = log->AddRecord(record); io_s = log->AddRecord(write_options, record);
if (!io_s.ok()) { if (!io_s.ok()) {
return io_s; return io_s;
} }
@ -6567,7 +6574,7 @@ Status VersionSet::WriteCurrentStateToManifest(
return Status::Corruption("Unable to Encode VersionEdit: " + return Status::Corruption("Unable to Encode VersionEdit: " +
wal_deletions.DebugString(true)); wal_deletions.DebugString(true));
} }
io_s = log->AddRecord(wal_deletions_record); io_s = log->AddRecord(write_options, wal_deletions_record);
if (!io_s.ok()) { if (!io_s.ok()) {
return io_s; return io_s;
} }
@ -6597,7 +6604,7 @@ Status VersionSet::WriteCurrentStateToManifest(
return Status::Corruption("Unable to Encode VersionEdit:" + return Status::Corruption("Unable to Encode VersionEdit:" +
edit.DebugString(true)); edit.DebugString(true));
} }
io_s = log->AddRecord(record); io_s = log->AddRecord(write_options, record);
if (!io_s.ok()) { if (!io_s.ok()) {
return io_s; return io_s;
} }
@ -6679,7 +6686,7 @@ Status VersionSet::WriteCurrentStateToManifest(
return Status::Corruption("Unable to Encode VersionEdit:" + return Status::Corruption("Unable to Encode VersionEdit:" +
edit.DebugString(true)); edit.DebugString(true));
} }
io_s = log->AddRecord(record); io_s = log->AddRecord(write_options, record);
if (!io_s.ok()) { if (!io_s.ok()) {
return io_s; return io_s;
} }

@ -1170,14 +1170,15 @@ class VersionSet {
virtual Status Close(FSDirectory* db_dir, InstrumentedMutex* mu); virtual Status Close(FSDirectory* db_dir, InstrumentedMutex* mu);
Status LogAndApplyToDefaultColumnFamily( Status LogAndApplyToDefaultColumnFamily(
const ReadOptions& read_options, VersionEdit* edit, InstrumentedMutex* mu, const ReadOptions& read_options, const WriteOptions& write_options,
VersionEdit* edit, InstrumentedMutex* mu,
FSDirectory* dir_contains_current_file, bool new_descriptor_log = false, FSDirectory* dir_contains_current_file, bool new_descriptor_log = false,
const ColumnFamilyOptions* column_family_options = nullptr) { const ColumnFamilyOptions* column_family_options = nullptr) {
ColumnFamilyData* default_cf = GetColumnFamilySet()->GetDefault(); ColumnFamilyData* default_cf = GetColumnFamilySet()->GetDefault();
const MutableCFOptions* cf_options = const MutableCFOptions* cf_options =
default_cf->GetLatestMutableCFOptions(); default_cf->GetLatestMutableCFOptions();
return LogAndApply(default_cf, *cf_options, read_options, edit, mu, return LogAndApply(default_cf, *cf_options, read_options, write_options,
dir_contains_current_file, new_descriptor_log, edit, mu, dir_contains_current_file, new_descriptor_log,
column_family_options); column_family_options);
} }
@ -1190,7 +1191,8 @@ class VersionSet {
Status LogAndApply( Status LogAndApply(
ColumnFamilyData* column_family_data, ColumnFamilyData* column_family_data,
const MutableCFOptions& mutable_cf_options, const MutableCFOptions& mutable_cf_options,
const ReadOptions& read_options, VersionEdit* edit, InstrumentedMutex* mu, const ReadOptions& read_options, const WriteOptions& write_options,
VersionEdit* edit, InstrumentedMutex* mu,
FSDirectory* dir_contains_current_file, bool new_descriptor_log = false, FSDirectory* dir_contains_current_file, bool new_descriptor_log = false,
const ColumnFamilyOptions* column_family_options = nullptr, const ColumnFamilyOptions* column_family_options = nullptr,
const std::function<void(const Status&)>& manifest_wcb = {}) { const std::function<void(const Status&)>& manifest_wcb = {}) {
@ -1202,16 +1204,17 @@ class VersionSet {
autovector<VersionEdit*> edit_list; autovector<VersionEdit*> edit_list;
edit_list.emplace_back(edit); edit_list.emplace_back(edit);
edit_lists.emplace_back(edit_list); edit_lists.emplace_back(edit_list);
return LogAndApply(cfds, mutable_cf_options_list, read_options, edit_lists, return LogAndApply(cfds, mutable_cf_options_list, read_options,
mu, dir_contains_current_file, new_descriptor_log, write_options, edit_lists, mu, dir_contains_current_file,
column_family_options, {manifest_wcb}); new_descriptor_log, column_family_options,
{manifest_wcb});
} }
// The batch version. If edit_list.size() > 1, caller must ensure that // The batch version. If edit_list.size() > 1, caller must ensure that
// no edit in the list column family add or drop // no edit in the list column family add or drop
Status LogAndApply( Status LogAndApply(
ColumnFamilyData* column_family_data, ColumnFamilyData* column_family_data,
const MutableCFOptions& mutable_cf_options, const MutableCFOptions& mutable_cf_options,
const ReadOptions& read_options, const ReadOptions& read_options, const WriteOptions& write_options,
const autovector<VersionEdit*>& edit_list, InstrumentedMutex* mu, const autovector<VersionEdit*>& edit_list, InstrumentedMutex* mu,
FSDirectory* dir_contains_current_file, bool new_descriptor_log = false, FSDirectory* dir_contains_current_file, bool new_descriptor_log = false,
const ColumnFamilyOptions* column_family_options = nullptr, const ColumnFamilyOptions* column_family_options = nullptr,
@ -1222,9 +1225,10 @@ class VersionSet {
mutable_cf_options_list.emplace_back(&mutable_cf_options); mutable_cf_options_list.emplace_back(&mutable_cf_options);
autovector<autovector<VersionEdit*>> edit_lists; autovector<autovector<VersionEdit*>> edit_lists;
edit_lists.emplace_back(edit_list); edit_lists.emplace_back(edit_list);
return LogAndApply(cfds, mutable_cf_options_list, read_options, edit_lists, return LogAndApply(cfds, mutable_cf_options_list, read_options,
mu, dir_contains_current_file, new_descriptor_log, write_options, edit_lists, mu, dir_contains_current_file,
column_family_options, {manifest_wcb}); new_descriptor_log, column_family_options,
{manifest_wcb});
} }
// The across-multi-cf batch version. If edit_lists contain more than // The across-multi-cf batch version. If edit_lists contain more than
@ -1233,7 +1237,7 @@ class VersionSet {
virtual Status LogAndApply( virtual Status LogAndApply(
const autovector<ColumnFamilyData*>& cfds, const autovector<ColumnFamilyData*>& cfds,
const autovector<const MutableCFOptions*>& mutable_cf_options_list, const autovector<const MutableCFOptions*>& mutable_cf_options_list,
const ReadOptions& read_options, const ReadOptions& read_options, const WriteOptions& write_options,
const autovector<autovector<VersionEdit*>>& edit_lists, const autovector<autovector<VersionEdit*>>& edit_lists,
InstrumentedMutex* mu, FSDirectory* dir_contains_current_file, InstrumentedMutex* mu, FSDirectory* dir_contains_current_file,
bool new_descriptor_log = false, bool new_descriptor_log = false,
@ -1547,6 +1551,7 @@ class VersionSet {
new Version(cfd, this, file_options_, mutable_cf_options, io_tracer_); new Version(cfd, this, file_options_, mutable_cf_options, io_tracer_);
constexpr bool update_stats = false; constexpr bool update_stats = false;
// TODO: plumb Env::IOActivity, Env::IOPriority
const ReadOptions read_options; const ReadOptions read_options;
version->PrepareAppend(mutable_cf_options, read_options, update_stats); version->PrepareAppend(mutable_cf_options, read_options, update_stats);
AppendVersion(cfd, version); AppendVersion(cfd, version);
@ -1595,6 +1600,7 @@ class VersionSet {
// Save current contents to *log // Save current contents to *log
Status WriteCurrentStateToManifest( Status WriteCurrentStateToManifest(
const WriteOptions& write_options,
const std::unordered_map<uint32_t, MutableCFState>& curr_state, const std::unordered_map<uint32_t, MutableCFState>& curr_state,
const VersionEdit& wal_additions, log::Writer* log, IOStatus& io_s); const VersionEdit& wal_additions, log::Writer* log, IOStatus& io_s);
@ -1688,7 +1694,8 @@ class VersionSet {
FSDirectory* dir_contains_current_file, FSDirectory* dir_contains_current_file,
bool new_descriptor_log, bool new_descriptor_log,
const ColumnFamilyOptions* new_cf_options, const ColumnFamilyOptions* new_cf_options,
const ReadOptions& read_options); const ReadOptions& read_options,
const WriteOptions& write_options);
void LogAndApplyCFHelper(VersionEdit* edit, void LogAndApplyCFHelper(VersionEdit* edit,
SequenceNumber* max_last_sequence); SequenceNumber* max_last_sequence);
@ -1747,7 +1754,7 @@ class ReactiveVersionSet : public VersionSet {
private: private:
std::unique_ptr<ManifestTailer> manifest_tailer_; std::unique_ptr<ManifestTailer> manifest_tailer_;
// TODO: plumb Env::IOActivity // TODO: plumb Env::IOActivity, Env::IOPriority
const ReadOptions read_options_; const ReadOptions read_options_;
using VersionSet::LogAndApply; using VersionSet::LogAndApply;
using VersionSet::Recover; using VersionSet::Recover;
@ -1756,6 +1763,7 @@ class ReactiveVersionSet : public VersionSet {
const autovector<ColumnFamilyData*>& /*cfds*/, const autovector<ColumnFamilyData*>& /*cfds*/,
const autovector<const MutableCFOptions*>& /*mutable_cf_options_list*/, const autovector<const MutableCFOptions*>& /*mutable_cf_options_list*/,
const ReadOptions& /* read_options */, const ReadOptions& /* read_options */,
const WriteOptions& /* write_options */,
const autovector<autovector<VersionEdit*>>& /*edit_lists*/, const autovector<autovector<VersionEdit*>>& /*edit_lists*/,
InstrumentedMutex* /*mu*/, FSDirectory* /*dir_contains_current_file*/, InstrumentedMutex* /*mu*/, FSDirectory* /*dir_contains_current_file*/,
bool /*new_descriptor_log*/, const ColumnFamilyOptions* /*new_cf_option*/, bool /*new_descriptor_log*/, const ColumnFamilyOptions* /*new_cf_option*/,

@ -1322,11 +1322,11 @@ class VersionSetTestBase {
log_writer->reset(new log::Writer(std::move(file_writer), 0, false)); log_writer->reset(new log::Writer(std::move(file_writer), 0, false));
std::string record; std::string record;
new_db.EncodeTo(&record); new_db.EncodeTo(&record);
s = (*log_writer)->AddRecord(record); s = (*log_writer)->AddRecord(WriteOptions(), record);
for (const auto& e : new_cfs) { for (const auto& e : new_cfs) {
record.clear(); record.clear();
e.EncodeTo(&record); e.EncodeTo(&record);
s = (*log_writer)->AddRecord(record); s = (*log_writer)->AddRecord(WriteOptions(), record);
ASSERT_OK(s); ASSERT_OK(s);
} }
} }
@ -1342,11 +1342,11 @@ class VersionSetTestBase {
void NewDB() { void NewDB() {
SequenceNumber last_seqno; SequenceNumber last_seqno;
std::unique_ptr<log::Writer> log_writer; std::unique_ptr<log::Writer> log_writer;
ASSERT_OK(SetIdentityFile(env_, dbname_)); ASSERT_OK(SetIdentityFile(WriteOptions(), env_, dbname_));
PrepareManifest(&column_families_, &last_seqno, &log_writer); PrepareManifest(&column_families_, &last_seqno, &log_writer);
log_writer.reset(); log_writer.reset();
// Make "CURRENT" file point to the new manifest file. // Make "CURRENT" file point to the new manifest file.
Status s = SetCurrentFile(fs_.get(), dbname_, 1, nullptr); Status s = SetCurrentFile(WriteOptions(), fs_.get(), dbname_, 1, nullptr);
ASSERT_OK(s); ASSERT_OK(s);
EXPECT_OK(versions_->Recover(column_families_, false)); EXPECT_OK(versions_->Recover(column_families_, false));
@ -1392,7 +1392,7 @@ class VersionSetTestBase {
mutex_.Lock(); mutex_.Lock();
Status s = versions_->LogAndApply( Status s = versions_->LogAndApply(
versions_->GetColumnFamilySet()->GetDefault(), mutable_cf_options_, versions_->GetColumnFamilySet()->GetDefault(), mutable_cf_options_,
read_options_, &edit, &mutex_, nullptr); read_options_, write_options_, &edit, &mutex_, nullptr);
mutex_.Unlock(); mutex_.Unlock();
return s; return s;
} }
@ -1406,7 +1406,7 @@ class VersionSetTestBase {
mutex_.Lock(); mutex_.Lock();
Status s = versions_->LogAndApply( Status s = versions_->LogAndApply(
versions_->GetColumnFamilySet()->GetDefault(), mutable_cf_options_, versions_->GetColumnFamilySet()->GetDefault(), mutable_cf_options_,
read_options_, vedits, &mutex_, nullptr); read_options_, write_options_, vedits, &mutex_, nullptr);
mutex_.Unlock(); mutex_.Unlock();
return s; return s;
} }
@ -1418,7 +1418,8 @@ class VersionSetTestBase {
VersionEdit dummy; VersionEdit dummy;
ASSERT_OK(versions_->LogAndApply( ASSERT_OK(versions_->LogAndApply(
versions_->GetColumnFamilySet()->GetDefault(), mutable_cf_options_, versions_->GetColumnFamilySet()->GetDefault(), mutable_cf_options_,
read_options_, &dummy, &mutex_, db_directory, new_descriptor_log)); read_options_, write_options_, &dummy, &mutex_, db_directory,
new_descriptor_log));
mutex_.Unlock(); mutex_.Unlock();
} }
@ -1436,7 +1437,7 @@ class VersionSetTestBase {
mutex_.Lock(); mutex_.Lock();
s = versions_->LogAndApply(/*column_family_data=*/nullptr, s = versions_->LogAndApply(/*column_family_data=*/nullptr,
MutableCFOptions(cf_options), read_options_, MutableCFOptions(cf_options), read_options_,
&new_cf, &mutex_, write_options_, &new_cf, &mutex_,
/*db_directory=*/nullptr, /*db_directory=*/nullptr,
/*new_descriptor_log=*/false, &cf_options); /*new_descriptor_log=*/false, &cf_options);
mutex_.Unlock(); mutex_.Unlock();
@ -1459,6 +1460,8 @@ class VersionSetTestBase {
ImmutableOptions immutable_options_; ImmutableOptions immutable_options_;
MutableCFOptions mutable_cf_options_; MutableCFOptions mutable_cf_options_;
const ReadOptions read_options_; const ReadOptions read_options_;
const WriteOptions write_options_;
std::shared_ptr<Cache> table_cache_; std::shared_ptr<Cache> table_cache_;
WriteController write_controller_; WriteController write_controller_;
WriteBufferManager write_buffer_manager_; WriteBufferManager write_buffer_manager_;
@ -1483,6 +1486,7 @@ TEST_F(VersionSetTest, SameColumnFamilyGroupCommit) {
NewDB(); NewDB();
const int kGroupSize = 5; const int kGroupSize = 5;
const ReadOptions read_options; const ReadOptions read_options;
const WriteOptions write_options;
autovector<VersionEdit> edits; autovector<VersionEdit> edits;
for (int i = 0; i != kGroupSize; ++i) { for (int i = 0; i != kGroupSize; ++i) {
@ -1510,8 +1514,9 @@ TEST_F(VersionSetTest, SameColumnFamilyGroupCommit) {
}); });
SyncPoint::GetInstance()->EnableProcessing(); SyncPoint::GetInstance()->EnableProcessing();
mutex_.Lock(); mutex_.Lock();
Status s = versions_->LogAndApply(cfds, all_mutable_cf_options, read_options, Status s =
edit_lists, &mutex_, nullptr); versions_->LogAndApply(cfds, all_mutable_cf_options, read_options,
write_options, edit_lists, &mutex_, nullptr);
mutex_.Unlock(); mutex_.Unlock();
EXPECT_OK(s); EXPECT_OK(s);
EXPECT_EQ(kGroupSize - 1, count); EXPECT_EQ(kGroupSize - 1, count);
@ -1713,7 +1718,7 @@ TEST_F(VersionSetTest, ObsoleteBlobFile) {
mutex_.Lock(); mutex_.Lock();
Status s = versions_->LogAndApply( Status s = versions_->LogAndApply(
versions_->GetColumnFamilySet()->GetDefault(), mutable_cf_options_, versions_->GetColumnFamilySet()->GetDefault(), mutable_cf_options_,
read_options_, &edit, &mutex_, nullptr); read_options_, write_options_, &edit, &mutex_, nullptr);
mutex_.Unlock(); mutex_.Unlock();
ASSERT_OK(s); ASSERT_OK(s);
@ -2454,7 +2459,8 @@ class VersionSetWithTimestampTest : public VersionSetTest {
Status s; Status s;
mutex_.Lock(); mutex_.Lock();
s = versions_->LogAndApply(cfd_, *(cfd_->GetLatestMutableCFOptions()), s = versions_->LogAndApply(cfd_, *(cfd_->GetLatestMutableCFOptions()),
read_options_, edits_, &mutex_, nullptr); read_options_, write_options_, edits_, &mutex_,
nullptr);
mutex_.Unlock(); mutex_.Unlock();
ASSERT_OK(s); ASSERT_OK(s);
VerifyFullHistoryTsLow(*std::max_element(ts_lbs.begin(), ts_lbs.end())); VerifyFullHistoryTsLow(*std::max_element(ts_lbs.begin(), ts_lbs.end()));
@ -2514,7 +2520,7 @@ class VersionSetAtomicGroupTest : public VersionSetTestBase,
edits_[i].MarkAtomicGroup(--remaining); edits_[i].MarkAtomicGroup(--remaining);
edits_[i].SetLastSequence(last_seqno_++); edits_[i].SetLastSequence(last_seqno_++);
} }
ASSERT_OK(SetCurrentFile(fs_.get(), dbname_, 1, nullptr)); ASSERT_OK(SetCurrentFile(WriteOptions(), fs_.get(), dbname_, 1, nullptr));
} }
void SetupIncompleteTrailingAtomicGroup(int atomic_group_size) { void SetupIncompleteTrailingAtomicGroup(int atomic_group_size) {
@ -2526,7 +2532,7 @@ class VersionSetAtomicGroupTest : public VersionSetTestBase,
edits_[i].MarkAtomicGroup(--remaining); edits_[i].MarkAtomicGroup(--remaining);
edits_[i].SetLastSequence(last_seqno_++); edits_[i].SetLastSequence(last_seqno_++);
} }
ASSERT_OK(SetCurrentFile(fs_.get(), dbname_, 1, nullptr)); ASSERT_OK(SetCurrentFile(WriteOptions(), fs_.get(), dbname_, 1, nullptr));
} }
void SetupCorruptedAtomicGroup(int atomic_group_size) { void SetupCorruptedAtomicGroup(int atomic_group_size) {
@ -2540,7 +2546,7 @@ class VersionSetAtomicGroupTest : public VersionSetTestBase,
} }
edits_[i].SetLastSequence(last_seqno_++); edits_[i].SetLastSequence(last_seqno_++);
} }
ASSERT_OK(SetCurrentFile(fs_.get(), dbname_, 1, nullptr)); ASSERT_OK(SetCurrentFile(WriteOptions(), fs_.get(), dbname_, 1, nullptr));
} }
void SetupIncorrectAtomicGroup(int atomic_group_size) { void SetupIncorrectAtomicGroup(int atomic_group_size) {
@ -2556,7 +2562,7 @@ class VersionSetAtomicGroupTest : public VersionSetTestBase,
} }
edits_[i].SetLastSequence(last_seqno_++); edits_[i].SetLastSequence(last_seqno_++);
} }
ASSERT_OK(SetCurrentFile(fs_.get(), dbname_, 1, nullptr)); ASSERT_OK(SetCurrentFile(WriteOptions(), fs_.get(), dbname_, 1, nullptr));
} }
void SetupTestSyncPoints() { void SetupTestSyncPoints() {
@ -2602,7 +2608,7 @@ class VersionSetAtomicGroupTest : public VersionSetTestBase,
for (int i = 0; i < num_edits; i++) { for (int i = 0; i < num_edits; i++) {
std::string record; std::string record;
edits_[i].EncodeTo(&record); edits_[i].EncodeTo(&record);
ASSERT_OK(log_writer_->AddRecord(record)); ASSERT_OK(log_writer_->AddRecord(WriteOptions(), record));
} }
} }
@ -2724,7 +2730,7 @@ TEST_F(VersionSetAtomicGroupTest,
// edits. // edits.
std::string last_record; std::string last_record;
edits_[kAtomicGroupSize - 1].EncodeTo(&last_record); edits_[kAtomicGroupSize - 1].EncodeTo(&last_record);
EXPECT_OK(log_writer_->AddRecord(last_record)); EXPECT_OK(log_writer_->AddRecord(WriteOptions(), last_record));
InstrumentedMutex mu; InstrumentedMutex mu;
std::unordered_set<ColumnFamilyData*> cfds_changed; std::unordered_set<ColumnFamilyData*> cfds_changed;
mu.Lock(); mu.Lock();
@ -2896,12 +2902,13 @@ class VersionSetTestDropOneCF : public VersionSetTestBase,
// last column family in an atomic group. // last column family in an atomic group.
TEST_P(VersionSetTestDropOneCF, HandleDroppedColumnFamilyInAtomicGroup) { TEST_P(VersionSetTestDropOneCF, HandleDroppedColumnFamilyInAtomicGroup) {
const ReadOptions read_options; const ReadOptions read_options;
const WriteOptions write_options;
std::vector<ColumnFamilyDescriptor> column_families; std::vector<ColumnFamilyDescriptor> column_families;
SequenceNumber last_seqno; SequenceNumber last_seqno;
std::unique_ptr<log::Writer> log_writer; std::unique_ptr<log::Writer> log_writer;
PrepareManifest(&column_families, &last_seqno, &log_writer); PrepareManifest(&column_families, &last_seqno, &log_writer);
Status s = SetCurrentFile(fs_.get(), dbname_, 1, nullptr); Status s = SetCurrentFile(WriteOptions(), fs_.get(), dbname_, 1, nullptr);
ASSERT_OK(s); ASSERT_OK(s);
EXPECT_OK(versions_->Recover(column_families, false /* read_only */)); EXPECT_OK(versions_->Recover(column_families, false /* read_only */));
@ -2924,9 +2931,9 @@ TEST_P(VersionSetTestDropOneCF, HandleDroppedColumnFamilyInAtomicGroup) {
cfd_to_drop->Ref(); cfd_to_drop->Ref();
drop_cf_edit.SetColumnFamily(cfd_to_drop->GetID()); drop_cf_edit.SetColumnFamily(cfd_to_drop->GetID());
mutex_.Lock(); mutex_.Lock();
s = versions_->LogAndApply(cfd_to_drop, s = versions_->LogAndApply(
*cfd_to_drop->GetLatestMutableCFOptions(), cfd_to_drop, *cfd_to_drop->GetLatestMutableCFOptions(), read_options,
read_options, &drop_cf_edit, &mutex_, nullptr); write_options, &drop_cf_edit, &mutex_, nullptr);
mutex_.Unlock(); mutex_.Unlock();
ASSERT_OK(s); ASSERT_OK(s);
@ -2976,7 +2983,7 @@ TEST_P(VersionSetTestDropOneCF, HandleDroppedColumnFamilyInAtomicGroup) {
SyncPoint::GetInstance()->EnableProcessing(); SyncPoint::GetInstance()->EnableProcessing();
mutex_.Lock(); mutex_.Lock();
s = versions_->LogAndApply(cfds, mutable_cf_options_list, read_options, s = versions_->LogAndApply(cfds, mutable_cf_options_list, read_options,
edit_lists, &mutex_, nullptr); write_options, edit_lists, &mutex_, nullptr);
mutex_.Unlock(); mutex_.Unlock();
ASSERT_OK(s); ASSERT_OK(s);
ASSERT_EQ(1, called); ASSERT_EQ(1, called);
@ -3010,7 +3017,7 @@ class EmptyDefaultCfNewManifest : public VersionSetTestBase,
log_writer->reset(new log::Writer(std::move(file_writer), 0, true)); log_writer->reset(new log::Writer(std::move(file_writer), 0, true));
std::string record; std::string record;
ASSERT_TRUE(new_db.EncodeTo(&record)); ASSERT_TRUE(new_db.EncodeTo(&record));
s = (*log_writer)->AddRecord(record); s = (*log_writer)->AddRecord(WriteOptions(), record);
ASSERT_OK(s); ASSERT_OK(s);
// Create new column family // Create new column family
VersionEdit new_cf; VersionEdit new_cf;
@ -3020,7 +3027,7 @@ class EmptyDefaultCfNewManifest : public VersionSetTestBase,
new_cf.SetNextFile(2); new_cf.SetNextFile(2);
record.clear(); record.clear();
ASSERT_TRUE(new_cf.EncodeTo(&record)); ASSERT_TRUE(new_cf.EncodeTo(&record));
s = (*log_writer)->AddRecord(record); s = (*log_writer)->AddRecord(WriteOptions(), record);
ASSERT_OK(s); ASSERT_OK(s);
} }
@ -3034,8 +3041,8 @@ class EmptyDefaultCfNewManifest : public VersionSetTestBase,
TEST_F(EmptyDefaultCfNewManifest, Recover) { TEST_F(EmptyDefaultCfNewManifest, Recover) {
PrepareManifest(nullptr, nullptr, &log_writer_); PrepareManifest(nullptr, nullptr, &log_writer_);
log_writer_.reset(); log_writer_.reset();
Status s = Status s = SetCurrentFile(WriteOptions(), fs_.get(), dbname_, 1,
SetCurrentFile(fs_.get(), dbname_, 1, /*directory_to_fsync=*/nullptr); /* dir_contains_current_file */ nullptr);
ASSERT_OK(s); ASSERT_OK(s);
std::string manifest_path; std::string manifest_path;
VerifyManifest(&manifest_path); VerifyManifest(&manifest_path);
@ -3066,7 +3073,7 @@ class VersionSetTestEmptyDb
assert(nullptr != log_writer); assert(nullptr != log_writer);
VersionEdit new_db; VersionEdit new_db;
if (db_options_.write_dbid_to_manifest) { if (db_options_.write_dbid_to_manifest) {
ASSERT_OK(SetIdentityFile(env_, dbname_)); ASSERT_OK(SetIdentityFile(WriteOptions(), env_, dbname_));
DBOptions tmp_db_options; DBOptions tmp_db_options;
tmp_db_options.env = env_; tmp_db_options.env = env_;
std::unique_ptr<DBImpl> impl(new DBImpl(tmp_db_options, dbname_)); std::unique_ptr<DBImpl> impl(new DBImpl(tmp_db_options, dbname_));
@ -3085,7 +3092,7 @@ class VersionSetTestEmptyDb
log_writer->reset(new log::Writer(std::move(file_writer), 0, false)); log_writer->reset(new log::Writer(std::move(file_writer), 0, false));
std::string record; std::string record;
new_db.EncodeTo(&record); new_db.EncodeTo(&record);
s = (*log_writer)->AddRecord(record); s = (*log_writer)->AddRecord(WriteOptions(), record);
ASSERT_OK(s); ASSERT_OK(s);
} }
} }
@ -3099,8 +3106,8 @@ TEST_P(VersionSetTestEmptyDb, OpenFromIncompleteManifest0) {
db_options_.write_dbid_to_manifest = std::get<0>(GetParam()); db_options_.write_dbid_to_manifest = std::get<0>(GetParam());
PrepareManifest(nullptr, nullptr, &log_writer_); PrepareManifest(nullptr, nullptr, &log_writer_);
log_writer_.reset(); log_writer_.reset();
Status s = Status s = SetCurrentFile(WriteOptions(), fs_.get(), dbname_, 1,
SetCurrentFile(fs_.get(), dbname_, 1, /*directory_to_fsync=*/nullptr); /* dir_contains_current_file */ nullptr);
ASSERT_OK(s); ASSERT_OK(s);
std::string manifest_path; std::string manifest_path;
@ -3140,11 +3147,12 @@ TEST_P(VersionSetTestEmptyDb, OpenFromIncompleteManifest1) {
{ {
std::string record; std::string record;
new_cf1.EncodeTo(&record); new_cf1.EncodeTo(&record);
s = log_writer_->AddRecord(record); s = log_writer_->AddRecord(WriteOptions(), record);
ASSERT_OK(s); ASSERT_OK(s);
} }
log_writer_.reset(); log_writer_.reset();
s = SetCurrentFile(fs_.get(), dbname_, 1, /*directory_to_fsync=*/nullptr); s = SetCurrentFile(WriteOptions(), fs_.get(), dbname_, 1,
/* dir_contains_current_file */ nullptr);
ASSERT_OK(s); ASSERT_OK(s);
std::string manifest_path; std::string manifest_path;
@ -3187,11 +3195,12 @@ TEST_P(VersionSetTestEmptyDb, OpenFromInCompleteManifest2) {
new_cf.SetColumnFamily(cf_id++); new_cf.SetColumnFamily(cf_id++);
std::string record; std::string record;
ASSERT_TRUE(new_cf.EncodeTo(&record)); ASSERT_TRUE(new_cf.EncodeTo(&record));
s = log_writer_->AddRecord(record); s = log_writer_->AddRecord(WriteOptions(), record);
ASSERT_OK(s); ASSERT_OK(s);
} }
log_writer_.reset(); log_writer_.reset();
s = SetCurrentFile(fs_.get(), dbname_, 1, /*directory_to_fsync=*/nullptr); s = SetCurrentFile(WriteOptions(), fs_.get(), dbname_, 1,
/* dir_contains_current_file */ nullptr);
ASSERT_OK(s); ASSERT_OK(s);
std::string manifest_path; std::string manifest_path;
@ -3234,7 +3243,7 @@ TEST_P(VersionSetTestEmptyDb, OpenManifestWithUnknownCF) {
new_cf.SetColumnFamily(cf_id++); new_cf.SetColumnFamily(cf_id++);
std::string record; std::string record;
ASSERT_TRUE(new_cf.EncodeTo(&record)); ASSERT_TRUE(new_cf.EncodeTo(&record));
s = log_writer_->AddRecord(record); s = log_writer_->AddRecord(WriteOptions(), record);
ASSERT_OK(s); ASSERT_OK(s);
} }
{ {
@ -3245,11 +3254,12 @@ TEST_P(VersionSetTestEmptyDb, OpenManifestWithUnknownCF) {
tmp_edit.SetLastSequence(0); tmp_edit.SetLastSequence(0);
std::string record; std::string record;
ASSERT_TRUE(tmp_edit.EncodeTo(&record)); ASSERT_TRUE(tmp_edit.EncodeTo(&record));
s = log_writer_->AddRecord(record); s = log_writer_->AddRecord(WriteOptions(), record);
ASSERT_OK(s); ASSERT_OK(s);
} }
log_writer_.reset(); log_writer_.reset();
s = SetCurrentFile(fs_.get(), dbname_, 1, /*directory_to_fsync=*/nullptr); s = SetCurrentFile(WriteOptions(), fs_.get(), dbname_, 1,
/* dir_contains_current_file */ nullptr);
ASSERT_OK(s); ASSERT_OK(s);
std::string manifest_path; std::string manifest_path;
@ -3292,7 +3302,7 @@ TEST_P(VersionSetTestEmptyDb, OpenCompleteManifest) {
new_cf.SetColumnFamily(cf_id++); new_cf.SetColumnFamily(cf_id++);
std::string record; std::string record;
ASSERT_TRUE(new_cf.EncodeTo(&record)); ASSERT_TRUE(new_cf.EncodeTo(&record));
s = log_writer_->AddRecord(record); s = log_writer_->AddRecord(WriteOptions(), record);
ASSERT_OK(s); ASSERT_OK(s);
} }
{ {
@ -3302,11 +3312,12 @@ TEST_P(VersionSetTestEmptyDb, OpenCompleteManifest) {
tmp_edit.SetLastSequence(0); tmp_edit.SetLastSequence(0);
std::string record; std::string record;
ASSERT_TRUE(tmp_edit.EncodeTo(&record)); ASSERT_TRUE(tmp_edit.EncodeTo(&record));
s = log_writer_->AddRecord(record); s = log_writer_->AddRecord(WriteOptions(), record);
ASSERT_OK(s); ASSERT_OK(s);
} }
log_writer_.reset(); log_writer_.reset();
s = SetCurrentFile(fs_.get(), dbname_, 1, /*directory_to_fsync=*/nullptr); s = SetCurrentFile(WriteOptions(), fs_.get(), dbname_, 1,
/* dir_contains_current_file */ nullptr);
ASSERT_OK(s); ASSERT_OK(s);
std::string manifest_path; std::string manifest_path;
@ -3407,7 +3418,7 @@ class VersionSetTestMissingFiles : public VersionSetTestBase,
{ {
std::string record; std::string record;
ASSERT_TRUE(new_db.EncodeTo(&record)); ASSERT_TRUE(new_db.EncodeTo(&record));
s = (*log_writer)->AddRecord(record); s = (*log_writer)->AddRecord(WriteOptions(), record);
ASSERT_OK(s); ASSERT_OK(s);
} }
const std::vector<std::string> cf_names = { const std::vector<std::string> cf_names = {
@ -3425,7 +3436,7 @@ class VersionSetTestMissingFiles : public VersionSetTestBase,
new_cf.SetColumnFamily(cf_id); new_cf.SetColumnFamily(cf_id);
std::string record; std::string record;
ASSERT_TRUE(new_cf.EncodeTo(&record)); ASSERT_TRUE(new_cf.EncodeTo(&record));
s = (*log_writer)->AddRecord(record); s = (*log_writer)->AddRecord(WriteOptions(), record);
ASSERT_OK(s); ASSERT_OK(s);
VersionEdit cf_files; VersionEdit cf_files;
@ -3433,7 +3444,7 @@ class VersionSetTestMissingFiles : public VersionSetTestBase,
cf_files.SetLogNumber(0); cf_files.SetLogNumber(0);
record.clear(); record.clear();
ASSERT_TRUE(cf_files.EncodeTo(&record)); ASSERT_TRUE(cf_files.EncodeTo(&record));
s = (*log_writer)->AddRecord(record); s = (*log_writer)->AddRecord(WriteOptions(), record);
ASSERT_OK(s); ASSERT_OK(s);
++cf_id; ++cf_id;
} }
@ -3444,7 +3455,7 @@ class VersionSetTestMissingFiles : public VersionSetTestBase,
edit.SetLastSequence(seq); edit.SetLastSequence(seq);
std::string record; std::string record;
ASSERT_TRUE(edit.EncodeTo(&record)); ASSERT_TRUE(edit.EncodeTo(&record));
s = (*log_writer)->AddRecord(record); s = (*log_writer)->AddRecord(WriteOptions(), record);
ASSERT_OK(s); ASSERT_OK(s);
} }
*last_seqno = seq + 1; *last_seqno = seq + 1;
@ -3485,9 +3496,12 @@ class VersionSetTestMissingFiles : public VersionSetTestBase,
std::move(file), fname, FileOptions(), env_->GetSystemClock().get())); std::move(file), fname, FileOptions(), env_->GetSystemClock().get()));
IntTblPropCollectorFactories int_tbl_prop_collector_factories; IntTblPropCollectorFactories int_tbl_prop_collector_factories;
const ReadOptions read_options;
const WriteOptions write_options;
std::unique_ptr<TableBuilder> builder(table_factory_->NewTableBuilder( std::unique_ptr<TableBuilder> builder(table_factory_->NewTableBuilder(
TableBuilderOptions( TableBuilderOptions(
immutable_options_, mutable_cf_options_, *internal_comparator_, immutable_options_, mutable_cf_options_, read_options,
write_options, *internal_comparator_,
&int_tbl_prop_collector_factories, kNoCompression, &int_tbl_prop_collector_factories, kNoCompression,
CompressionOptions(), CompressionOptions(),
TablePropertiesCollectorFactory::Context::kUnknownColumnFamily, TablePropertiesCollectorFactory::Context::kUnknownColumnFamily,
@ -3496,7 +3510,7 @@ class VersionSetTestMissingFiles : public VersionSetTestBase,
InternalKey ikey(info.key, 0, ValueType::kTypeValue); InternalKey ikey(info.key, 0, ValueType::kTypeValue);
builder->Add(ikey.Encode(), "value"); builder->Add(ikey.Encode(), "value");
ASSERT_OK(builder->Finish()); ASSERT_OK(builder->Finish());
ASSERT_OK(fwriter->Flush()); ASSERT_OK(fwriter->Flush(IOOptions()));
uint64_t file_size = 0; uint64_t file_size = 0;
s = fs_->GetFileSize(fname, IOOptions(), &file_size, nullptr); s = fs_->GetFileSize(fname, IOOptions(), &file_size, nullptr);
ASSERT_OK(s); ASSERT_OK(s);
@ -3528,7 +3542,7 @@ class VersionSetTestMissingFiles : public VersionSetTestBase,
assert(log_writer_.get() != nullptr); assert(log_writer_.get() != nullptr);
std::string record; std::string record;
ASSERT_TRUE(edit.EncodeTo(&record, 0 /* ts_sz */)); ASSERT_TRUE(edit.EncodeTo(&record, 0 /* ts_sz */));
Status s = log_writer_->AddRecord(record); Status s = log_writer_->AddRecord(WriteOptions(), record);
ASSERT_OK(s); ASSERT_OK(s);
} }
@ -3573,7 +3587,7 @@ TEST_F(VersionSetTestMissingFiles, ManifestFarBehindSst) {
WriteFileAdditionAndDeletionToManifest( WriteFileAdditionAndDeletionToManifest(
/*cf=*/0, std::vector<std::pair<int, FileMetaData>>(), deleted_files); /*cf=*/0, std::vector<std::pair<int, FileMetaData>>(), deleted_files);
log_writer_.reset(); log_writer_.reset();
Status s = SetCurrentFile(fs_.get(), dbname_, 1, nullptr); Status s = SetCurrentFile(WriteOptions(), fs_.get(), dbname_, 1, nullptr);
ASSERT_OK(s); ASSERT_OK(s);
std::string manifest_path; std::string manifest_path;
VerifyManifest(&manifest_path); VerifyManifest(&manifest_path);
@ -3631,7 +3645,7 @@ TEST_F(VersionSetTestMissingFiles, ManifestAheadofSst) {
WriteFileAdditionAndDeletionToManifest( WriteFileAdditionAndDeletionToManifest(
/*cf=*/0, added_files, std::vector<std::pair<int, uint64_t>>()); /*cf=*/0, added_files, std::vector<std::pair<int, uint64_t>>());
log_writer_.reset(); log_writer_.reset();
Status s = SetCurrentFile(fs_.get(), dbname_, 1, nullptr); Status s = SetCurrentFile(WriteOptions(), fs_.get(), dbname_, 1, nullptr);
ASSERT_OK(s); ASSERT_OK(s);
std::string manifest_path; std::string manifest_path;
VerifyManifest(&manifest_path); VerifyManifest(&manifest_path);
@ -3685,7 +3699,7 @@ TEST_F(VersionSetTestMissingFiles, NoFileMissing) {
WriteFileAdditionAndDeletionToManifest( WriteFileAdditionAndDeletionToManifest(
/*cf=*/0, std::vector<std::pair<int, FileMetaData>>(), deleted_files); /*cf=*/0, std::vector<std::pair<int, FileMetaData>>(), deleted_files);
log_writer_.reset(); log_writer_.reset();
Status s = SetCurrentFile(fs_.get(), dbname_, 1, nullptr); Status s = SetCurrentFile(WriteOptions(), fs_.get(), dbname_, 1, nullptr);
ASSERT_OK(s); ASSERT_OK(s);
std::string manifest_path; std::string manifest_path;
VerifyManifest(&manifest_path); VerifyManifest(&manifest_path);

@ -36,15 +36,17 @@ class OfflineManifestWriter {
/*no_error_if_files_missing*/ true); /*no_error_if_files_missing*/ true);
} }
Status LogAndApply(const ReadOptions& read_options, ColumnFamilyData* cfd, Status LogAndApply(const ReadOptions& read_options,
const WriteOptions& write_options, ColumnFamilyData* cfd,
VersionEdit* edit, VersionEdit* edit,
FSDirectory* dir_contains_current_file) { FSDirectory* dir_contains_current_file) {
// Use `mutex` to imitate a locked DB mutex when calling `LogAndApply()`. // Use `mutex` to imitate a locked DB mutex when calling `LogAndApply()`.
InstrumentedMutex mutex; InstrumentedMutex mutex;
mutex.Lock(); mutex.Lock();
Status s = versions_.LogAndApply( Status s = versions_.LogAndApply(cfd, *cfd->GetLatestMutableCFOptions(),
cfd, *cfd->GetLatestMutableCFOptions(), read_options, edit, &mutex, read_options, write_options, edit, &mutex,
dir_contains_current_file, false /* new_descriptor_log */); dir_contains_current_file,
false /* new_descriptor_log */);
mutex.Unlock(); mutex.Unlock();
return s; return s;
} }

@ -73,8 +73,8 @@ class WalManagerTest : public testing::Test {
WriteBatch batch; WriteBatch batch;
ASSERT_OK(batch.Put(key, value)); ASSERT_OK(batch.Put(key, value));
WriteBatchInternal::SetSequence(&batch, seq); WriteBatchInternal::SetSequence(&batch, seq);
ASSERT_OK( ASSERT_OK(current_log_writer_->AddRecord(
current_log_writer_->AddRecord(WriteBatchInternal::Contents(&batch))); WriteOptions(), WriteBatchInternal::Contents(&batch)));
versions_->SetLastAllocatedSequence(seq); versions_->SetLastAllocatedSequence(seq);
versions_->SetLastPublishedSequence(seq); versions_->SetLastPublishedSequence(seq);
versions_->SetLastSequence(seq); versions_->SetLastSequence(seq);
@ -146,7 +146,8 @@ TEST_F(WalManagerTest, ReadFirstRecordCache) {
WriteBatch batch; WriteBatch batch;
ASSERT_OK(batch.Put("foo", "bar")); ASSERT_OK(batch.Put("foo", "bar"));
WriteBatchInternal::SetSequence(&batch, 10); WriteBatchInternal::SetSequence(&batch, 10);
ASSERT_OK(writer.AddRecord(WriteBatchInternal::Contents(&batch))); ASSERT_OK(
writer.AddRecord(WriteOptions(), WriteBatchInternal::Contents(&batch)));
// TODO(icanadi) move SpecialEnv outside of db_test, so we can reuse it here. // TODO(icanadi) move SpecialEnv outside of db_test, so we can reuse it here.
// Waiting for lei to finish with db_test // Waiting for lei to finish with db_test

@ -2064,7 +2064,7 @@ class MemTableInserter : public WriteBatch::Handler {
// key not found in memtable. Do sst get, update, add // key not found in memtable. Do sst get, update, add
SnapshotImpl read_from_snapshot; SnapshotImpl read_from_snapshot;
read_from_snapshot.number_ = sequence_; read_from_snapshot.number_ = sequence_;
// TODO: plumb Env::IOActivity // TODO: plumb Env::IOActivity, Env::IOPriority
ReadOptions ropts; ReadOptions ropts;
// it's going to be overwritten for sure, so no point caching data block // it's going to be overwritten for sure, so no point caching data block
// containing the old version // containing the old version
@ -2511,7 +2511,7 @@ class MemTableInserter : public WriteBatch::Handler {
SnapshotImpl read_from_snapshot; SnapshotImpl read_from_snapshot;
read_from_snapshot.number_ = sequence_; read_from_snapshot.number_ = sequence_;
// TODO: plumb Env::IOActivity // TODO: plumb Env::IOActivity, Env::IOPriority
ReadOptions read_options; ReadOptions read_options;
read_options.snapshot = &read_from_snapshot; read_options.snapshot = &read_from_snapshot;

@ -166,6 +166,8 @@ class WriteThread {
PreReleaseCallback* _pre_release_callback = nullptr, PreReleaseCallback* _pre_release_callback = nullptr,
PostMemTableCallback* _post_memtable_callback = nullptr) PostMemTableCallback* _post_memtable_callback = nullptr)
: batch(_batch), : batch(_batch),
// TODO: store a copy of WriteOptions instead of its separate data
// members
sync(write_options.sync), sync(write_options.sync),
no_slowdown(write_options.no_slowdown), no_slowdown(write_options.no_slowdown),
disable_wal(write_options.disableWAL), disable_wal(write_options.disableWAL),

@ -76,6 +76,161 @@ class DbStressRandomAccessFileWrapper : public FSRandomAccessFileOwnerWrapper {
} }
}; };
class DbStressWritableFileWrapper : public FSWritableFileOwnerWrapper {
public:
explicit DbStressWritableFileWrapper(std::unique_ptr<FSWritableFile>&& target)
: FSWritableFileOwnerWrapper(std::move(target)) {}
IOStatus Append(const Slice& data, const IOOptions& options,
IODebugContext* dbg) override {
#ifndef NDEBUG
const ThreadStatus::OperationType thread_op =
ThreadStatusUtil::GetThreadOperation();
Env::IOActivity io_activity =
ThreadStatusUtil::TEST_GetExpectedIOActivity(thread_op);
assert(io_activity == Env::IOActivity::kUnknown ||
io_activity == options.io_activity);
#endif
return target()->Append(data, options, dbg);
}
IOStatus Append(const Slice& data, const IOOptions& options,
const DataVerificationInfo& verification_info,
IODebugContext* dbg) override {
#ifndef NDEBUG
const ThreadStatus::OperationType thread_op =
ThreadStatusUtil::GetThreadOperation();
Env::IOActivity io_activity =
ThreadStatusUtil::TEST_GetExpectedIOActivity(thread_op);
assert(io_activity == Env::IOActivity::kUnknown ||
io_activity == options.io_activity);
#endif
return target()->Append(data, options, verification_info, dbg);
}
IOStatus PositionedAppend(const Slice& data, uint64_t offset,
const IOOptions& options,
IODebugContext* dbg) override {
#ifndef NDEBUG
const ThreadStatus::OperationType thread_op =
ThreadStatusUtil::GetThreadOperation();
Env::IOActivity io_activity =
ThreadStatusUtil::TEST_GetExpectedIOActivity(thread_op);
assert(io_activity == Env::IOActivity::kUnknown ||
io_activity == options.io_activity);
#endif
return target()->PositionedAppend(data, offset, options, dbg);
}
IOStatus PositionedAppend(const Slice& data, uint64_t offset,
const IOOptions& options,
const DataVerificationInfo& verification_info,
IODebugContext* dbg) override {
#ifndef NDEBUG
const ThreadStatus::OperationType thread_op =
ThreadStatusUtil::GetThreadOperation();
Env::IOActivity io_activity =
ThreadStatusUtil::TEST_GetExpectedIOActivity(thread_op);
assert(io_activity == Env::IOActivity::kUnknown ||
io_activity == options.io_activity);
#endif
return target()->PositionedAppend(data, offset, options, verification_info,
dbg);
}
virtual IOStatus Truncate(uint64_t size, const IOOptions& options,
IODebugContext* dbg) override {
#ifndef NDEBUG
const ThreadStatus::OperationType thread_op =
ThreadStatusUtil::GetThreadOperation();
Env::IOActivity io_activity =
ThreadStatusUtil::TEST_GetExpectedIOActivity(thread_op);
assert(io_activity == Env::IOActivity::kUnknown ||
io_activity == options.io_activity);
#endif
return target()->Truncate(size, options, dbg);
}
virtual IOStatus Close(const IOOptions& options,
IODebugContext* dbg) override {
#ifndef NDEBUG
const ThreadStatus::OperationType thread_op =
ThreadStatusUtil::GetThreadOperation();
Env::IOActivity io_activity =
ThreadStatusUtil::TEST_GetExpectedIOActivity(thread_op);
assert(io_activity == Env::IOActivity::kUnknown ||
io_activity == options.io_activity);
#endif
return target()->Close(options, dbg);
}
virtual IOStatus Flush(const IOOptions& options,
IODebugContext* dbg) override {
#ifndef NDEBUG
const ThreadStatus::OperationType thread_op =
ThreadStatusUtil::GetThreadOperation();
Env::IOActivity io_activity =
ThreadStatusUtil::TEST_GetExpectedIOActivity(thread_op);
assert(io_activity == Env::IOActivity::kUnknown ||
io_activity == options.io_activity);
#endif
return target()->Flush(options, dbg);
}
virtual IOStatus Sync(const IOOptions& options,
IODebugContext* dbg) override {
#ifndef NDEBUG
const ThreadStatus::OperationType thread_op =
ThreadStatusUtil::GetThreadOperation();
Env::IOActivity io_activity =
ThreadStatusUtil::TEST_GetExpectedIOActivity(thread_op);
assert(io_activity == Env::IOActivity::kUnknown ||
io_activity == options.io_activity);
#endif
return target()->Sync(options, dbg);
}
virtual IOStatus Fsync(const IOOptions& options,
IODebugContext* dbg) override {
#ifndef NDEBUG
const ThreadStatus::OperationType thread_op =
ThreadStatusUtil::GetThreadOperation();
Env::IOActivity io_activity =
ThreadStatusUtil::TEST_GetExpectedIOActivity(thread_op);
assert(io_activity == Env::IOActivity::kUnknown ||
io_activity == options.io_activity);
#endif
return target()->Fsync(options, dbg);
}
#ifdef ROCKSDB_FALLOCATE_PRESENT
virtual IOStatus Allocate(uint64_t offset, uint64_t len,
const IOOptions& options,
IODebugContext* dbg) override {
#ifndef NDEBUG
const ThreadStatus::OperationType thread_op =
ThreadStatusUtil::GetThreadOperation();
Env::IOActivity io_activity =
ThreadStatusUtil::TEST_GetExpectedIOActivity(thread_op);
assert(io_activity == Env::IOActivity::kUnknown ||
io_activity == options.io_activity);
#endif
return target()->Allocate(offset, len, options, dbg);
}
#endif
virtual IOStatus RangeSync(uint64_t offset, uint64_t nbytes,
const IOOptions& options,
IODebugContext* dbg) override {
#ifndef NDEBUG
const ThreadStatus::OperationType thread_op =
ThreadStatusUtil::GetThreadOperation();
Env::IOActivity io_activity =
ThreadStatusUtil::TEST_GetExpectedIOActivity(thread_op);
assert(io_activity == Env::IOActivity::kUnknown ||
io_activity == options.io_activity);
#endif
return target()->RangeSync(offset, nbytes, options, dbg);
}
};
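Every override in `DbStressWritableFileWrapper` repeats the same debug-only check: the I/O activity expected for the current thread operation must either be `kUnknown` or equal the activity tagged on the incoming `IOOptions`. The sketch below isolates that invariant using simplified stand-in types. The `sketch` namespace and the helpers `IOActivityMatches` and `VerifyIOActivity` are hypothetical names for illustration only; they are not part of RocksDB or of this change.

```cpp
#include <cassert>

namespace sketch {

// Stand-in for Env::IOActivity (assumed subset of values).
enum class IOActivity { kFlush, kCompaction, kDBOpen, kUnknown };

// Stand-in for IOOptions, reduced to the one field the check reads.
struct IOOptions {
  IOActivity io_activity = IOActivity::kUnknown;
};

// True when the tag on the I/O is acceptable: either the thread operation
// implies no particular activity, or the two activities agree.
inline bool IOActivityMatches(IOActivity expected, IOActivity actual) {
  return expected == IOActivity::kUnknown || expected == actual;
}

// Hypothetical helper that each override could call instead of repeating
// the block; like the original, it only checks in debug builds.
inline void VerifyIOActivity(IOActivity expected, const IOOptions& options) {
#ifndef NDEBUG
  assert(IOActivityMatches(expected, options.io_activity));
#else
  (void)expected;
  (void)options;
#endif
}

}  // namespace sketch
```

Under these assumptions, a flush thread writing with `io_activity == kFlush` passes, an untagged thread passes with any tag, and a flush thread writing with a `kCompaction` tag trips the assertion.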
class DbStressFSWrapper : public FileSystemWrapper { class DbStressFSWrapper : public FileSystemWrapper {
public: public:
explicit DbStressFSWrapper(const std::shared_ptr<FileSystem>& t) explicit DbStressFSWrapper(const std::shared_ptr<FileSystem>& t)
@ -95,6 +250,17 @@ class DbStressFSWrapper : public FileSystemWrapper {
return s; return s;
} }
IOStatus NewWritableFile(const std::string& f, const FileOptions& file_opts,
std::unique_ptr<FSWritableFile>* r,
IODebugContext* dbg) override {
std::unique_ptr<FSWritableFile> file;
IOStatus s = target()->NewWritableFile(f, file_opts, &file, dbg);
if (s.ok()) {
r->reset(new DbStressWritableFileWrapper(std::move(file)));
}
return s;
}
IOStatus DeleteFile(const std::string& f, const IOOptions& opts, IOStatus DeleteFile(const std::string& f, const IOOptions& opts,
IODebugContext* dbg) override { IODebugContext* dbg) override {
// We determine whether it is a manifest file by searching a string, // We determine whether it is a manifest file by searching a string,

@ -130,8 +130,13 @@ UniqueIdVerifier::UniqueIdVerifier(const std::string& db_name, Env* env)
} }
UniqueIdVerifier::~UniqueIdVerifier() { UniqueIdVerifier::~UniqueIdVerifier() {
IOStatus s = data_file_writer_->Close(); ThreadStatus::OperationType cur_op_type =
ThreadStatusUtil::GetThreadOperation();
ThreadStatusUtil::SetThreadOperation(ThreadStatus::OperationType::OP_UNKNOWN);
IOStatus s;
s = data_file_writer_->Close(IOOptions());
assert(s.ok()); assert(s.ok());
ThreadStatusUtil::SetThreadOperation(cur_op_type);
} }
void UniqueIdVerifier::VerifyNoWrite(const std::string& id) { void UniqueIdVerifier::VerifyNoWrite(const std::string& id) {
@ -153,13 +158,14 @@ void UniqueIdVerifier::Verify(const std::string& id) {
if (id_set_.size() >= 4294967) { if (id_set_.size() >= 4294967) {
return; return;
} }
IOStatus s = data_file_writer_->Append(Slice(id)); IOOptions opts;
IOStatus s = data_file_writer_->Append(opts, Slice(id));
if (!s.ok()) { if (!s.ok()) {
fprintf(stderr, "Error writing to unique id file: %s\n", fprintf(stderr, "Error writing to unique id file: %s\n",
s.ToString().c_str()); s.ToString().c_str());
assert(false); assert(false);
} }
s = data_file_writer_->Flush(); s = data_file_writer_->Flush(opts);
if (!s.ok()) { if (!s.ok()) {
fprintf(stderr, "Error flushing unique id file: %s\n", fprintf(stderr, "Error flushing unique id file: %s\n",
s.ToString().c_str()); s.ToString().c_str());

@ -373,10 +373,15 @@ Status MultiOpsTxnsStressTest::TestGet(
ThreadState* thread, const ReadOptions& read_opts, ThreadState* thread, const ReadOptions& read_opts,
const std::vector<int>& /*rand_column_families*/, const std::vector<int>& /*rand_column_families*/,
const std::vector<int64_t>& /*rand_keys*/) { const std::vector<int64_t>& /*rand_keys*/) {
ThreadStatus::OperationType cur_op_type =
ThreadStatusUtil::GetThreadOperation();
ThreadStatusUtil::SetThreadOperation(ThreadStatus::OperationType::OP_UNKNOWN);
uint32_t a = 0; uint32_t a = 0;
uint32_t pos = 0; uint32_t pos = 0;
std::tie(a, pos) = ChooseExistingA(thread); std::tie(a, pos) = ChooseExistingA(thread);
return PointLookupTxn(thread, read_opts, a); Status s = PointLookupTxn(thread, read_opts, a);
ThreadStatusUtil::SetThreadOperation(cur_op_type);
return s;
} }
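The hunks in this file all follow one pattern: save the current thread operation, set it to `OP_UNKNOWN` for the duration of a call, then restore it. A scope guard is one way to express that pattern so the restore cannot be skipped on an early return. The sketch below uses stand-in types only; `ScopedThreadOperation` and `CurrentThreadOperation` are hypothetical names, not RocksDB APIs, and the change itself uses explicit `GetThreadOperation()`/`SetThreadOperation()` calls.

```cpp
#include <cassert>

namespace sketch {

// Stand-in for ThreadStatus::OperationType (assumed subset of values).
enum class OperationType { OP_UNKNOWN, OP_COMPACTION, OP_FLUSH, OP_DBOPEN };

// Stand-in for the per-thread operation tracked by ThreadStatusUtil.
inline OperationType& CurrentThreadOperation() {
  thread_local OperationType op = OperationType::OP_UNKNOWN;
  return op;
}

// Saves the current operation, installs a temporary one, and restores the
// saved value when the guard goes out of scope.
class ScopedThreadOperation {
 public:
  explicit ScopedThreadOperation(OperationType temporary)
      : saved_(CurrentThreadOperation()) {
    CurrentThreadOperation() = temporary;
  }
  ~ScopedThreadOperation() { CurrentThreadOperation() = saved_; }

  ScopedThreadOperation(const ScopedThreadOperation&) = delete;
  ScopedThreadOperation& operator=(const ScopedThreadOperation&) = delete;

 private:
  const OperationType saved_;
};

// Usage: the body runs as OP_UNKNOWN, and the previous value comes back
// once the guard is destroyed.
inline void ExampleUsage() {
  CurrentThreadOperation() = OperationType::OP_FLUSH;
  {
    ScopedThreadOperation guard(OperationType::OP_UNKNOWN);
    assert(CurrentThreadOperation() == OperationType::OP_UNKNOWN);
  }
  assert(CurrentThreadOperation() == OperationType::OP_FLUSH);
}

}  // namespace sketch
```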
// Not used. // Not used.
@ -416,10 +421,15 @@ Status MultiOpsTxnsStressTest::TestIterate(
ThreadState* thread, const ReadOptions& read_opts, ThreadState* thread, const ReadOptions& read_opts,
const std::vector<int>& /*rand_column_families*/, const std::vector<int>& /*rand_column_families*/,
const std::vector<int64_t>& /*rand_keys*/) { const std::vector<int64_t>& /*rand_keys*/) {
ThreadStatus::OperationType cur_op_type =
ThreadStatusUtil::GetThreadOperation();
ThreadStatusUtil::SetThreadOperation(ThreadStatus::OperationType::OP_UNKNOWN);
uint32_t c = 0; uint32_t c = 0;
uint32_t pos = 0; uint32_t pos = 0;
std::tie(c, pos) = ChooseExistingC(thread); std::tie(c, pos) = ChooseExistingC(thread);
return RangeScanTxn(thread, read_opts, c); Status s = RangeScanTxn(thread, read_opts, c);
ThreadStatusUtil::SetThreadOperation(cur_op_type);
return s;
} }
// Not intended for use. // Not intended for use.
@ -1221,7 +1231,11 @@ void MultiOpsTxnsStressTest::VerifyPkSkFast(const ReadOptions& read_options,
assert(db_ == db); assert(db_ == db);
assert(db_ != nullptr); assert(db_ != nullptr);
ThreadStatus::OperationType cur_op_type =
ThreadStatusUtil::GetThreadOperation();
ThreadStatusUtil::SetThreadOperation(ThreadStatus::OperationType::OP_UNKNOWN);
const Snapshot* const snapshot = db_->GetSnapshot(); const Snapshot* const snapshot = db_->GetSnapshot();
ThreadStatusUtil::SetThreadOperation(cur_op_type);
assert(snapshot); assert(snapshot);
ManagedSnapshot snapshot_guard(db_, snapshot); ManagedSnapshot snapshot_guard(db_, snapshot);

env/env.cc

@ -1051,9 +1051,10 @@ void Log(const std::shared_ptr<Logger>& info_log, const char* format, ...) {
} }
Status WriteStringToFile(Env* env, const Slice& data, const std::string& fname, Status WriteStringToFile(Env* env, const Slice& data, const std::string& fname,
bool should_sync) { bool should_sync, const IOOptions* io_options) {
const auto& fs = env->GetFileSystem(); const auto& fs = env->GetFileSystem();
return WriteStringToFile(fs.get(), data, fname, should_sync); return WriteStringToFile(fs.get(), data, fname, should_sync,
io_options ? *io_options : IOOptions());
} }
Status ReadFileToString(Env* env, const std::string& fname, std::string* data) { Status ReadFileToString(Env* env, const std::string& fname, std::string* data) {

env/env_test.cc

@ -2610,7 +2610,7 @@ TEST_F(EnvTest, IsDirectory) {
FileOptions(), FileOptions(),
SystemClock::Default().get())); SystemClock::Default().get()));
constexpr char buf[] = "test"; constexpr char buf[] = "test";
s = fwriter->Append(buf); s = fwriter->Append(IOOptions(), buf);
ASSERT_OK(s); ASSERT_OK(s);
} }
ASSERT_OK(Env::Default()->IsDirectory(test_file_path, &is_dir)); ASSERT_OK(Env::Default()->IsDirectory(test_file_path, &is_dir));

env/file_system.cc

@ -180,19 +180,20 @@ FileOptions FileSystem::OptimizeForBlobFileRead(
} }
IOStatus WriteStringToFile(FileSystem* fs, const Slice& data, IOStatus WriteStringToFile(FileSystem* fs, const Slice& data,
const std::string& fname, bool should_sync) { const std::string& fname, bool should_sync,
const IOOptions& io_options) {
std::unique_ptr<FSWritableFile> file; std::unique_ptr<FSWritableFile> file;
EnvOptions soptions; EnvOptions soptions;
IOStatus s = fs->NewWritableFile(fname, soptions, &file, nullptr); IOStatus s = fs->NewWritableFile(fname, soptions, &file, nullptr);
if (!s.ok()) { if (!s.ok()) {
return s; return s;
} }
s = file->Append(data, IOOptions(), nullptr); s = file->Append(data, io_options, nullptr);
if (s.ok() && should_sync) { if (s.ok() && should_sync) {
s = file->Sync(IOOptions(), nullptr); s = file->Sync(io_options, nullptr);
} }
if (!s.ok()) { if (!s.ok()) {
fs->DeleteFile(fname, IOOptions(), nullptr); fs->DeleteFile(fname, io_options, nullptr);
} }
return s; return s;
} }

@ -26,6 +26,7 @@ IOStatus CopyFile(FileSystem* fs, const std::string& source,
FileOptions soptions; FileOptions soptions;
IOStatus io_s; IOStatus io_s;
std::unique_ptr<SequentialFileReader> src_reader; std::unique_ptr<SequentialFileReader> src_reader;
const IOOptions opts;
{ {
soptions.temperature = temperature; soptions.temperature = temperature;
@ -37,7 +38,7 @@ IOStatus CopyFile(FileSystem* fs, const std::string& source,
if (size == 0) { if (size == 0) {
// default argument means copy everything // default argument means copy everything
io_s = fs->GetFileSize(source, IOOptions(), &size, nullptr); io_s = fs->GetFileSize(source, opts, &size, nullptr);
if (!io_s.ok()) { if (!io_s.ok()) {
return io_s; return io_s;
} }
@ -60,13 +61,14 @@ IOStatus CopyFile(FileSystem* fs, const std::string& source,
if (slice.size() == 0) { if (slice.size() == 0) {
return IOStatus::Corruption("file too small"); return IOStatus::Corruption("file too small");
} }
io_s = dest_writer->Append(slice);
io_s = dest_writer->Append(opts, slice);
if (!io_s.ok()) { if (!io_s.ok()) {
return io_s; return io_s;
} }
size -= slice.size(); size -= slice.size();
} }
return dest_writer->Sync(use_fsync); return dest_writer->Sync(opts, use_fsync);
} }
IOStatus CopyFile(FileSystem* fs, const std::string& source, IOStatus CopyFile(FileSystem* fs, const std::string& source,
@ -85,6 +87,7 @@ IOStatus CopyFile(FileSystem* fs, const std::string& source,
return io_s; return io_s;
} }
// TODO: pass in Histograms if the destination file is sst or blob
dest_writer.reset( dest_writer.reset(
new WritableFileWriter(std::move(destfile), destination, options)); new WritableFileWriter(std::move(destfile), destination, options));
} }
@ -99,19 +102,21 @@ IOStatus CreateFile(FileSystem* fs, const std::string& destination,
const EnvOptions soptions; const EnvOptions soptions;
IOStatus io_s; IOStatus io_s;
std::unique_ptr<WritableFileWriter> dest_writer; std::unique_ptr<WritableFileWriter> dest_writer;
const IOOptions opts;
std::unique_ptr<FSWritableFile> destfile; std::unique_ptr<FSWritableFile> destfile;
io_s = fs->NewWritableFile(destination, soptions, &destfile, nullptr); io_s = fs->NewWritableFile(destination, soptions, &destfile, nullptr);
if (!io_s.ok()) { if (!io_s.ok()) {
return io_s; return io_s;
} }
// TODO: pass in Histograms if the destination file is sst or blob
dest_writer.reset( dest_writer.reset(
new WritableFileWriter(std::move(destfile), destination, soptions)); new WritableFileWriter(std::move(destfile), destination, soptions));
io_s = dest_writer->Append(Slice(contents)); io_s = dest_writer->Append(opts, Slice(contents));
if (!io_s.ok()) { if (!io_s.ok()) {
return io_s; return io_s;
} }
return dest_writer->Sync(use_fsync); return dest_writer->Sync(opts, use_fsync);
} }
Status DeleteDBFile(const ImmutableDBOptions* db_options, Status DeleteDBFile(const ImmutableDBOptions* db_options,

@ -87,6 +87,14 @@ inline IOStatus PrepareIOFromReadOptions(const ReadOptions& ro,
return IOStatus::OK(); return IOStatus::OK();
} }
inline IOStatus PrepareIOFromWriteOptions(const WriteOptions& wo,
IOOptions& opts) {
opts.rate_limiter_priority = wo.rate_limiter_priority;
opts.io_activity = wo.io_activity;
return IOStatus::OK();
}
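`PrepareIOFromWriteOptions()` is the conversion step the summary describes: it copies the two tagging fields, `rate_limiter_priority` and `io_activity`, from a `WriteOptions` into the `IOOptions` handed to the file layer. A minimal self-contained sketch of that step follows. The structs and enums are simplified stand-ins in a hypothetical `sketch` namespace (the real types carry many more fields), and the stand-in returns `bool` where the real helper returns `IOStatus`.

```cpp
#include <cassert>

namespace sketch {

// Stand-ins for Env::IOPriority and Env::IOActivity (assumed subsets).
enum class IOPriority { IO_LOW, IO_MID, IO_HIGH, IO_USER, IO_TOTAL };
enum class IOActivity { kFlush, kCompaction, kDBOpen, kUnknown };

// Stand-in for WriteOptions, reduced to the fields the helper reads.
struct WriteOptions {
  IOPriority rate_limiter_priority = IOPriority::IO_TOTAL;
  IOActivity io_activity = IOActivity::kUnknown;
};

// Stand-in for IOOptions, reduced to the fields the helper writes.
struct IOOptions {
  IOPriority rate_limiter_priority = IOPriority::IO_TOTAL;
  IOActivity io_activity = IOActivity::kUnknown;
};

// Same shape as the helper in the hunk: copy both tagging fields across.
inline bool PrepareIOFromWriteOptions(const WriteOptions& wo,
                                      IOOptions& opts) {
  opts.rate_limiter_priority = wo.rate_limiter_priority;
  opts.io_activity = wo.io_activity;
  return true;
}

// Usage: a caller tags its write as flush I/O and converts once before
// issuing file operations.
inline IOOptions MakeFlushIOOptions() {
  WriteOptions wo;
  wo.io_activity = IOActivity::kFlush;
  wo.rate_limiter_priority = IOPriority::IO_HIGH;
  IOOptions opts;
  const bool ok = PrepareIOFromWriteOptions(wo, opts);
  assert(ok);
  (void)ok;
  return opts;
}

}  // namespace sketch
```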
// Test method to delete the input directory and all of its contents. // Test method to delete the input directory and all of its contents.
// This method is destructive and is meant for use only in tests!!! // This method is destructive and is meant for use only in tests!!!
Status DestroyDir(Env* env, const std::string& dir); Status DestroyDir(Env* env, const std::string& dir);

@ -13,8 +13,10 @@
#include <cstdio> #include <cstdio>
#include <vector> #include <vector>
#include "file/file_util.h"
#include "file/writable_file_writer.h" #include "file/writable_file_writer.h"
#include "rocksdb/env.h" #include "rocksdb/env.h"
#include "rocksdb/file_system.h"
#include "test_util/sync_point.h" #include "test_util/sync_point.h"
#include "util/stop_watch.h" #include "util/stop_watch.h"
#include "util/string_util.h" #include "util/string_util.h"
@ -384,8 +386,8 @@ bool ParseFileName(const std::string& fname, uint64_t* number,
return true; return true;
} }
IOStatus SetCurrentFile(FileSystem* fs, const std::string& dbname, IOStatus SetCurrentFile(const WriteOptions& write_options, FileSystem* fs,
uint64_t descriptor_number, const std::string& dbname, uint64_t descriptor_number,
FSDirectory* dir_contains_current_file) { FSDirectory* dir_contains_current_file) {
// Remove leading "dbname/" and add newline to manifest file name // Remove leading "dbname/" and add newline to manifest file name
std::string manifest = DescriptorFileName(dbname, descriptor_number); std::string manifest = DescriptorFileName(dbname, descriptor_number);
@ -393,21 +395,25 @@ IOStatus SetCurrentFile(FileSystem* fs, const std::string& dbname,
assert(contents.starts_with(dbname + "/")); assert(contents.starts_with(dbname + "/"));
contents.remove_prefix(dbname.size() + 1); contents.remove_prefix(dbname.size() + 1);
std::string tmp = TempFileName(dbname, descriptor_number); std::string tmp = TempFileName(dbname, descriptor_number);
IOStatus s = WriteStringToFile(fs, contents.ToString() + "\n", tmp, true); IOOptions opts;
IOStatus s = PrepareIOFromWriteOptions(write_options, opts);
if (s.ok()) {
s = WriteStringToFile(fs, contents.ToString() + "\n", tmp, true, opts);
}
TEST_SYNC_POINT_CALLBACK("SetCurrentFile:BeforeRename", &s); TEST_SYNC_POINT_CALLBACK("SetCurrentFile:BeforeRename", &s);
if (s.ok()) { if (s.ok()) {
TEST_KILL_RANDOM_WITH_WEIGHT("SetCurrentFile:0", REDUCE_ODDS2); TEST_KILL_RANDOM_WITH_WEIGHT("SetCurrentFile:0", REDUCE_ODDS2);
s = fs->RenameFile(tmp, CurrentFileName(dbname), IOOptions(), nullptr); s = fs->RenameFile(tmp, CurrentFileName(dbname), opts, nullptr);
TEST_KILL_RANDOM_WITH_WEIGHT("SetCurrentFile:1", REDUCE_ODDS2); TEST_KILL_RANDOM_WITH_WEIGHT("SetCurrentFile:1", REDUCE_ODDS2);
TEST_SYNC_POINT_CALLBACK("SetCurrentFile:AfterRename", &s); TEST_SYNC_POINT_CALLBACK("SetCurrentFile:AfterRename", &s);
} }
if (s.ok()) { if (s.ok()) {
if (dir_contains_current_file != nullptr) { if (dir_contains_current_file != nullptr) {
s = dir_contains_current_file->FsyncWithDirOptions( s = dir_contains_current_file->FsyncWithDirOptions(
IOOptions(), nullptr, DirFsyncOptions(CurrentFileName(dbname))); opts, nullptr, DirFsyncOptions(CurrentFileName(dbname)));
} }
} else { } else {
fs->DeleteFile(tmp, IOOptions(), nullptr) fs->DeleteFile(tmp, opts, nullptr)
.PermitUncheckedError(); // NOTE: PermitUncheckedError is acceptable .PermitUncheckedError(); // NOTE: PermitUncheckedError is acceptable
// here as we are already handling an error // here as we are already handling an error
// case, and this is just a best-attempt // case, and this is just a best-attempt
@ -416,8 +422,8 @@ IOStatus SetCurrentFile(FileSystem* fs, const std::string& dbname,
return s; return s;
} }
Status SetIdentityFile(Env* env, const std::string& dbname, Status SetIdentityFile(const WriteOptions& write_options, Env* env,
const std::string& db_id) { const std::string& dbname, const std::string& db_id) {
std::string id; std::string id;
if (db_id.empty()) { if (db_id.empty()) {
id = env->GenerateUniqueId(); id = env->GenerateUniqueId();
@ -428,17 +434,21 @@ Status SetIdentityFile(Env* env, const std::string& dbname,
// Reserve the filename dbname/000000.dbtmp for the temporary identity file // Reserve the filename dbname/000000.dbtmp for the temporary identity file
std::string tmp = TempFileName(dbname, 0); std::string tmp = TempFileName(dbname, 0);
std::string identify_file_name = IdentityFileName(dbname); std::string identify_file_name = IdentityFileName(dbname);
Status s = WriteStringToFile(env, id, tmp, true); Status s;
IOOptions opts;
s = PrepareIOFromWriteOptions(write_options, opts);
if (s.ok()) {
s = WriteStringToFile(env, id, tmp, true, &opts);
}
if (s.ok()) { if (s.ok()) {
s = env->RenameFile(tmp, identify_file_name); s = env->RenameFile(tmp, identify_file_name);
} }
std::unique_ptr<FSDirectory> dir_obj; std::unique_ptr<FSDirectory> dir_obj;
if (s.ok()) { if (s.ok()) {
s = env->GetFileSystem()->NewDirectory(dbname, IOOptions(), &dir_obj, s = env->GetFileSystem()->NewDirectory(dbname, opts, &dir_obj, nullptr);
nullptr);
} }
if (s.ok()) { if (s.ok()) {
s = dir_obj->FsyncWithDirOptions(IOOptions(), nullptr, s = dir_obj->FsyncWithDirOptions(opts, nullptr,
DirFsyncOptions(identify_file_name)); DirFsyncOptions(identify_file_name));
} }
@ -446,7 +456,7 @@ Status SetIdentityFile(Env* env, const std::string& dbname,
// if it is not implemented. Detailed explanations can be found in // if it is not implemented. Detailed explanations can be found in
// db/db_impl/db_impl.h // db/db_impl/db_impl.h
if (s.ok()) { if (s.ok()) {
Status temp_s = dir_obj->Close(IOOptions(), nullptr); Status temp_s = dir_obj->Close(opts, nullptr);
if (!temp_s.ok()) { if (!temp_s.ok()) {
if (temp_s.IsNotSupported()) { if (temp_s.IsNotSupported()) {
temp_s.PermitUncheckedError(); temp_s.PermitUncheckedError();
@ -462,10 +472,16 @@ Status SetIdentityFile(Env* env, const std::string& dbname,
} }
IOStatus SyncManifest(const ImmutableDBOptions* db_options, IOStatus SyncManifest(const ImmutableDBOptions* db_options,
const WriteOptions& write_options,
WritableFileWriter* file) { WritableFileWriter* file) {
TEST_KILL_RANDOM_WITH_WEIGHT("SyncManifest:0", REDUCE_ODDS2); TEST_KILL_RANDOM_WITH_WEIGHT("SyncManifest:0", REDUCE_ODDS2);
StopWatch sw(db_options->clock, db_options->stats, MANIFEST_FILE_SYNC_MICROS); StopWatch sw(db_options->clock, db_options->stats, MANIFEST_FILE_SYNC_MICROS);
return file->Sync(db_options->use_fsync); IOOptions io_options;
IOStatus s = PrepareIOFromWriteOptions(write_options, io_options);
if (!s.ok()) {
return s;
}
return file->Sync(io_options, db_options->use_fsync);
} }
Status GetInfoLogFiles(const std::shared_ptr<FileSystem>& fs, Status GetInfoLogFiles(const std::shared_ptr<FileSystem>& fs,

@ -162,16 +162,19 @@ extern bool ParseFileName(const std::string& filename, uint64_t* number,
// specified number. On its success and when dir_contains_current_file is not // specified number. On its success and when dir_contains_current_file is not
// nullptr, the function will fsync the directory containing the CURRENT file // nullptr, the function will fsync the directory containing the CURRENT file
// when // when
extern IOStatus SetCurrentFile(FileSystem* fs, const std::string& dbname, extern IOStatus SetCurrentFile(const WriteOptions& write_options,
FileSystem* fs, const std::string& dbname,
uint64_t descriptor_number, uint64_t descriptor_number,
FSDirectory* dir_contains_current_file); FSDirectory* dir_contains_current_file);
// Make the IDENTITY file for the db // Make the IDENTITY file for the db
extern Status SetIdentityFile(Env* env, const std::string& dbname, extern Status SetIdentityFile(const WriteOptions& write_options, Env* env,
const std::string& dbname,
const std::string& db_id = {}); const std::string& db_id = {});
// Sync manifest file `file`. // Sync manifest file `file`.
extern IOStatus SyncManifest(const ImmutableDBOptions* db_options, extern IOStatus SyncManifest(const ImmutableDBOptions* db_options,
const WriteOptions& write_options,
WritableFileWriter* file); WritableFileWriter* file);
// Return list of file names of info logs in `file_names`. // Return list of file names of info logs in `file_names`.

@ -13,6 +13,7 @@
#include <mutex> #include <mutex>
#include "db/version_edit.h" #include "db/version_edit.h"
#include "file/file_util.h"
#include "monitoring/histogram.h" #include "monitoring/histogram.h"
#include "monitoring/iostats_context_imp.h" #include "monitoring/iostats_context_imp.h"
#include "port/port.h" #include "port/port.h"
@ -24,6 +25,24 @@
#include "util/rate_limiter_impl.h" #include "util/rate_limiter_impl.h"
namespace ROCKSDB_NAMESPACE { namespace ROCKSDB_NAMESPACE {
inline Histograms GetFileWriteHistograms(Histograms file_writer_hist,
Env::IOActivity io_activity) {
if (file_writer_hist == Histograms::SST_WRITE_MICROS ||
file_writer_hist == Histograms::BLOB_DB_BLOB_FILE_WRITE_MICROS) {
switch (io_activity) {
case Env::IOActivity::kFlush:
return Histograms::FILE_WRITE_FLUSH_MICROS;
case Env::IOActivity::kCompaction:
return Histograms::FILE_WRITE_COMPACTION_MICROS;
case Env::IOActivity::kDBOpen:
return Histograms::FILE_WRITE_DB_OPEN_MICROS;
default:
break;
}
}
return Histograms::HISTOGRAM_ENUM_MAX;
}
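`GetFileWriteHistograms()` picks the per-activity histogram that `Append()` records alongside the writer's own histogram. Only SST and blob file writers get a breakdown, and only for flush, compaction and DB open; every other combination returns `HISTOGRAM_ENUM_MAX`, which presumably tells `StopWatch` that there is no second histogram to update. The sketch below reproduces that mapping with stand-in enums in a hypothetical `sketch` namespace so the cases can be checked in isolation.

```cpp
#include <cassert>

namespace sketch {

// Stand-in for Env::IOActivity (assumed subset of values).
enum class IOActivity { kFlush, kCompaction, kDBOpen, kUnknown };

// Stand-in for the Histograms enum, reduced to the entries used here.
enum class Histograms {
  SST_WRITE_MICROS,
  BLOB_DB_BLOB_FILE_WRITE_MICROS,
  FILE_WRITE_FLUSH_MICROS,
  FILE_WRITE_COMPACTION_MICROS,
  FILE_WRITE_DB_OPEN_MICROS,
  MANIFEST_FILE_SYNC_MICROS,
  HISTOGRAM_ENUM_MAX
};

// Same mapping as the function in the hunk.
inline Histograms GetFileWriteHistograms(Histograms file_writer_hist,
                                         IOActivity io_activity) {
  if (file_writer_hist == Histograms::SST_WRITE_MICROS ||
      file_writer_hist == Histograms::BLOB_DB_BLOB_FILE_WRITE_MICROS) {
    switch (io_activity) {
      case IOActivity::kFlush:
        return Histograms::FILE_WRITE_FLUSH_MICROS;
      case IOActivity::kCompaction:
        return Histograms::FILE_WRITE_COMPACTION_MICROS;
      case IOActivity::kDBOpen:
        return Histograms::FILE_WRITE_DB_OPEN_MICROS;
      default:
        break;
    }
  }
  return Histograms::HISTOGRAM_ENUM_MAX;
}

// Usage: an SST write during flush is broken down; a manifest writer is not.
inline void ExampleUsage() {
  assert(GetFileWriteHistograms(Histograms::SST_WRITE_MICROS,
                                IOActivity::kFlush) ==
         Histograms::FILE_WRITE_FLUSH_MICROS);
  assert(GetFileWriteHistograms(Histograms::MANIFEST_FILE_SYNC_MICROS,
                                IOActivity::kFlush) ==
         Histograms::HISTOGRAM_ENUM_MAX);
}

}  // namespace sketch
```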
IOStatus WritableFileWriter::Create(const std::shared_ptr<FileSystem>& fs, IOStatus WritableFileWriter::Create(const std::shared_ptr<FileSystem>& fs,
const std::string& fname, const std::string& fname,
const FileOptions& file_opts, const FileOptions& file_opts,
@ -42,12 +61,16 @@ IOStatus WritableFileWriter::Create(const std::shared_ptr<FileSystem>& fs,
return io_s; return io_s;
} }
IOStatus WritableFileWriter::Append(const Slice& data, uint32_t crc32c_checksum, IOStatus WritableFileWriter::Append(const IOOptions& opts, const Slice& data,
Env::IOPriority op_rate_limiter_priority) { uint32_t crc32c_checksum) {
if (seen_error()) { if (seen_error()) {
return AssertFalseAndGetStatusForPrevError(); return AssertFalseAndGetStatusForPrevError();
} }
StopWatch sw(clock_, stats_, hist_type_,
GetFileWriteHistograms(hist_type_, opts.io_activity));
const IOOptions io_options = FinalizeIOOptions(opts);
const char* src = data.data(); const char* src = data.data();
size_t left = data.size(); size_t left = data.size();
IOStatus s; IOStatus s;
@ -59,10 +82,6 @@ IOStatus WritableFileWriter::Append(const Slice& data, uint32_t crc32c_checksum,
UpdateFileChecksum(data); UpdateFileChecksum(data);
{ {
IOOptions io_options;
io_options.rate_limiter_priority =
WritableFileWriter::DecideRateLimiterPriority(
writable_file_->GetIOPriority(), op_rate_limiter_priority);
IOSTATS_TIMER_GUARD(prepare_write_nanos); IOSTATS_TIMER_GUARD(prepare_write_nanos);
TEST_SYNC_POINT("WritableFileWriter::Append:BeforePrepareWrite"); TEST_SYNC_POINT("WritableFileWriter::Append:BeforePrepareWrite");
writable_file_->PrepareWrite(static_cast<size_t>(GetFileSize()), left, writable_file_->PrepareWrite(static_cast<size_t>(GetFileSize()), left,
@ -88,7 +107,7 @@ IOStatus WritableFileWriter::Append(const Slice& data, uint32_t crc32c_checksum,
// Flush only when buffered I/O // Flush only when buffered I/O
if (!use_direct_io() && (buf_.Capacity() - buf_.CurrentSize()) < left) { if (!use_direct_io() && (buf_.Capacity() - buf_.CurrentSize()) < left) {
if (buf_.CurrentSize() > 0) { if (buf_.CurrentSize() > 0) {
s = Flush(op_rate_limiter_priority); s = Flush(io_options);
if (!s.ok()) { if (!s.ok()) {
set_seen_error(); set_seen_error();
return s; return s;
@ -119,7 +138,7 @@ IOStatus WritableFileWriter::Append(const Slice& data, uint32_t crc32c_checksum,
src += appended; src += appended;
if (left > 0) { if (left > 0) {
s = Flush(op_rate_limiter_priority); s = Flush(io_options);
if (!s.ok()) { if (!s.ok()) {
break; break;
} }
@ -129,7 +148,7 @@ IOStatus WritableFileWriter::Append(const Slice& data, uint32_t crc32c_checksum,
} else { } else {
assert(buf_.CurrentSize() == 0); assert(buf_.CurrentSize() == 0);
buffered_data_crc32c_checksum_ = crc32c_checksum; buffered_data_crc32c_checksum_ = crc32c_checksum;
s = WriteBufferedWithChecksum(src, left, op_rate_limiter_priority); s = WriteBufferedWithChecksum(io_options, src, left);
} }
} else { } else {
// In this case, either we do not need to do the data verification or // In this case, either we do not need to do the data verification or
@ -149,7 +168,7 @@ IOStatus WritableFileWriter::Append(const Slice& data, uint32_t crc32c_checksum,
src += appended; src += appended;
if (left > 0) { if (left > 0) {
s = Flush(op_rate_limiter_priority); s = Flush(io_options);
if (!s.ok()) { if (!s.ok()) {
break; break;
} }
@ -160,9 +179,9 @@ IOStatus WritableFileWriter::Append(const Slice& data, uint32_t crc32c_checksum,
assert(buf_.CurrentSize() == 0); assert(buf_.CurrentSize() == 0);
if (perform_data_verification_ && buffered_data_with_checksum_) { if (perform_data_verification_ && buffered_data_with_checksum_) {
buffered_data_crc32c_checksum_ = crc32c::Value(src, left); buffered_data_crc32c_checksum_ = crc32c::Value(src, left);
s = WriteBufferedWithChecksum(src, left, op_rate_limiter_priority); s = WriteBufferedWithChecksum(io_options, src, left);
} else { } else {
s = WriteBuffered(src, left, op_rate_limiter_priority); s = WriteBuffered(io_options, src, left);
} }
} }
} }
@ -177,11 +196,12 @@ IOStatus WritableFileWriter::Append(const Slice& data, uint32_t crc32c_checksum,
return s; return s;
} }
IOStatus WritableFileWriter::Pad(const size_t pad_bytes, IOStatus WritableFileWriter::Pad(const IOOptions& opts,
Env::IOPriority op_rate_limiter_priority) { const size_t pad_bytes) {
if (seen_error()) { if (seen_error()) {
return AssertFalseAndGetStatusForPrevError(); return AssertFalseAndGetStatusForPrevError();
} }
const IOOptions io_options = FinalizeIOOptions(opts);
assert(pad_bytes < kDefaultPageSize); assert(pad_bytes < kDefaultPageSize);
size_t left = pad_bytes; size_t left = pad_bytes;
size_t cap = buf_.Capacity() - buf_.CurrentSize(); size_t cap = buf_.Capacity() - buf_.CurrentSize();
@ -195,7 +215,7 @@ IOStatus WritableFileWriter::Pad(const size_t pad_bytes,
buf_.PadWith(append_bytes, 0); buf_.PadWith(append_bytes, 0);
left -= append_bytes; left -= append_bytes;
if (left > 0) { if (left > 0) {
IOStatus s = Flush(op_rate_limiter_priority); IOStatus s = Flush(io_options);
if (!s.ok()) { if (!s.ok()) {
set_seen_error(); set_seen_error();
return s; return s;
@ -214,11 +234,12 @@ IOStatus WritableFileWriter::Pad(const size_t pad_bytes,
return IOStatus::OK(); return IOStatus::OK();
} }
IOStatus WritableFileWriter::Close() { IOStatus WritableFileWriter::Close(const IOOptions& opts) {
IOOptions io_options = FinalizeIOOptions(opts);
if (seen_error()) { if (seen_error()) {
IOStatus interim; IOStatus interim;
if (writable_file_.get() != nullptr) { if (writable_file_.get() != nullptr) {
interim = writable_file_->Close(IOOptions(), nullptr); interim = writable_file_->Close(io_options, nullptr);
writable_file_.reset(); writable_file_.reset();
} }
if (interim.ok()) { if (interim.ok()) {
@ -240,11 +261,9 @@ IOStatus WritableFileWriter::Close() {
} }
IOStatus s; IOStatus s;
s = Flush(); // flush cache to OS s = Flush(io_options); // flush cache to OS
IOStatus interim; IOStatus interim;
IOOptions io_options;
io_options.rate_limiter_priority = writable_file_->GetIOPriority();
// In direct I/O mode we write whole pages so // In direct I/O mode we write whole pages so
// we need to let the file know where data ends. // we need to let the file know where data ends.
if (use_direct_io()) { if (use_direct_io()) {
@ -322,11 +341,13 @@ IOStatus WritableFileWriter::Close() {
// write out the cached data to the OS cache or storage if direct I/O // write out the cached data to the OS cache or storage if direct I/O
// enabled // enabled
IOStatus WritableFileWriter::Flush(Env::IOPriority op_rate_limiter_priority) { IOStatus WritableFileWriter::Flush(const IOOptions& opts) {
if (seen_error()) { if (seen_error()) {
return AssertFalseAndGetStatusForPrevError(); return AssertFalseAndGetStatusForPrevError();
} }
const IOOptions io_options = FinalizeIOOptions(opts);
IOStatus s; IOStatus s;
TEST_KILL_RANDOM_WITH_WEIGHT("WritableFileWriter::Flush:0", REDUCE_ODDS2); TEST_KILL_RANDOM_WITH_WEIGHT("WritableFileWriter::Flush:0", REDUCE_ODDS2);
@ -334,18 +355,17 @@ IOStatus WritableFileWriter::Flush(Env::IOPriority op_rate_limiter_priority) {
if (use_direct_io()) { if (use_direct_io()) {
if (pending_sync_) { if (pending_sync_) {
if (perform_data_verification_ && buffered_data_with_checksum_) { if (perform_data_verification_ && buffered_data_with_checksum_) {
s = WriteDirectWithChecksum(op_rate_limiter_priority); s = WriteDirectWithChecksum(io_options);
} else { } else {
s = WriteDirect(op_rate_limiter_priority); s = WriteDirect(io_options);
} }
} }
} else { } else {
if (perform_data_verification_ && buffered_data_with_checksum_) { if (perform_data_verification_ && buffered_data_with_checksum_) {
s = WriteBufferedWithChecksum(buf_.BufferStart(), buf_.CurrentSize(), s = WriteBufferedWithChecksum(io_options, buf_.BufferStart(),
op_rate_limiter_priority); buf_.CurrentSize());
} else { } else {
s = WriteBuffered(buf_.BufferStart(), buf_.CurrentSize(), s = WriteBuffered(io_options, buf_.BufferStart(), buf_.CurrentSize());
op_rate_limiter_priority);
} }
} }
if (!s.ok()) { if (!s.ok()) {
@ -359,10 +379,6 @@ IOStatus WritableFileWriter::Flush(Env::IOPriority op_rate_limiter_priority) {
if (ShouldNotifyListeners()) { if (ShouldNotifyListeners()) {
start_ts = FileOperationInfo::StartNow(); start_ts = FileOperationInfo::StartNow();
} }
IOOptions io_options;
io_options.rate_limiter_priority =
WritableFileWriter::DecideRateLimiterPriority(
writable_file_->GetIOPriority(), op_rate_limiter_priority);
s = writable_file_->Flush(io_options, nullptr); s = writable_file_->Flush(io_options, nullptr);
if (ShouldNotifyListeners()) { if (ShouldNotifyListeners()) {
auto finish_ts = std::chrono::steady_clock::now(); auto finish_ts = std::chrono::steady_clock::now();
@ -400,7 +416,8 @@ IOStatus WritableFileWriter::Flush(Env::IOPriority op_rate_limiter_priority) {
assert(offset_sync_to >= last_sync_size_); assert(offset_sync_to >= last_sync_size_);
if (offset_sync_to > 0 && if (offset_sync_to > 0 &&
offset_sync_to - last_sync_size_ >= bytes_per_sync_) { offset_sync_to - last_sync_size_ >= bytes_per_sync_) {
s = RangeSync(last_sync_size_, offset_sync_to - last_sync_size_); s = RangeSync(io_options, last_sync_size_,
offset_sync_to - last_sync_size_);
if (!s.ok()) { if (!s.ok()) {
set_seen_error(); set_seen_error();
} }
@ -429,19 +446,25 @@ const char* WritableFileWriter::GetFileChecksumFuncName() const {
} }
} }
IOStatus WritableFileWriter::Sync(bool use_fsync) { IOStatus WritableFileWriter::PrepareIOOptions(const WriteOptions& wo,
IOOptions& opts) {
return PrepareIOFromWriteOptions(wo, opts);
}
IOStatus WritableFileWriter::Sync(const IOOptions& opts, bool use_fsync) {
if (seen_error()) { if (seen_error()) {
return AssertFalseAndGetStatusForPrevError(); return AssertFalseAndGetStatusForPrevError();
} }
IOStatus s = Flush(); IOOptions io_options = FinalizeIOOptions(opts);
IOStatus s = Flush(io_options);
if (!s.ok()) { if (!s.ok()) {
set_seen_error(); set_seen_error();
return s; return s;
} }
TEST_KILL_RANDOM("WritableFileWriter::Sync:0"); TEST_KILL_RANDOM("WritableFileWriter::Sync:0");
if (!use_direct_io() && pending_sync_) { if (!use_direct_io() && pending_sync_) {
s = SyncInternal(use_fsync); s = SyncInternal(io_options, use_fsync);
if (!s.ok()) { if (!s.ok()) {
set_seen_error(); set_seen_error();
return s; return s;
@ -452,17 +475,19 @@ IOStatus WritableFileWriter::Sync(bool use_fsync) {
return IOStatus::OK(); return IOStatus::OK();
} }
IOStatus WritableFileWriter::SyncWithoutFlush(bool use_fsync) { IOStatus WritableFileWriter::SyncWithoutFlush(const IOOptions& opts,
bool use_fsync) {
if (seen_error()) { if (seen_error()) {
return AssertFalseAndGetStatusForPrevError(); return AssertFalseAndGetStatusForPrevError();
} }
IOOptions io_options = FinalizeIOOptions(opts);
if (!writable_file_->IsSyncThreadSafe()) { if (!writable_file_->IsSyncThreadSafe()) {
return IOStatus::NotSupported( return IOStatus::NotSupported(
"Can't WritableFileWriter::SyncWithoutFlush() because " "Can't WritableFileWriter::SyncWithoutFlush() because "
"WritableFile::IsSyncThreadSafe() is false"); "WritableFile::IsSyncThreadSafe() is false");
} }
TEST_SYNC_POINT("WritableFileWriter::SyncWithoutFlush:1"); TEST_SYNC_POINT("WritableFileWriter::SyncWithoutFlush:1");
IOStatus s = SyncInternal(use_fsync); IOStatus s = SyncInternal(io_options, use_fsync);
TEST_SYNC_POINT("WritableFileWriter::SyncWithoutFlush:2"); TEST_SYNC_POINT("WritableFileWriter::SyncWithoutFlush:2");
if (!s.ok()) { if (!s.ok()) {
#ifndef NDEBUG #ifndef NDEBUG
@ -473,7 +498,8 @@ IOStatus WritableFileWriter::SyncWithoutFlush(bool use_fsync) {
return s; return s;
} }
IOStatus WritableFileWriter::SyncInternal(bool use_fsync) { IOStatus WritableFileWriter::SyncInternal(const IOOptions& opts,
bool use_fsync) {
// Caller is supposed to check seen_error_ // Caller is supposed to check seen_error_
IOStatus s; IOStatus s;
IOSTATS_TIMER_GUARD(fsync_nanos); IOSTATS_TIMER_GUARD(fsync_nanos);
@ -487,12 +513,10 @@ IOStatus WritableFileWriter::SyncInternal(bool use_fsync) {
start_ts = FileOperationInfo::StartNow(); start_ts = FileOperationInfo::StartNow();
} }
IOOptions io_options;
io_options.rate_limiter_priority = writable_file_->GetIOPriority();
if (use_fsync) { if (use_fsync) {
s = writable_file_->Fsync(io_options, nullptr); s = writable_file_->Fsync(opts, nullptr);
} else { } else {
s = writable_file_->Sync(io_options, nullptr); s = writable_file_->Sync(opts, nullptr);
} }
if (ShouldNotifyListeners()) { if (ShouldNotifyListeners()) {
auto finish_ts = std::chrono::steady_clock::now(); auto finish_ts = std::chrono::steady_clock::now();
@ -511,7 +535,8 @@ IOStatus WritableFileWriter::SyncInternal(bool use_fsync) {
return s; return s;
} }
IOStatus WritableFileWriter::RangeSync(uint64_t offset, uint64_t nbytes) { IOStatus WritableFileWriter::RangeSync(const IOOptions& opts, uint64_t offset,
uint64_t nbytes) {
if (seen_error()) { if (seen_error()) {
return AssertFalseAndGetStatusForPrevError(); return AssertFalseAndGetStatusForPrevError();
} }
@ -522,9 +547,7 @@ IOStatus WritableFileWriter::RangeSync(uint64_t offset, uint64_t nbytes) {
if (ShouldNotifyListeners()) { if (ShouldNotifyListeners()) {
start_ts = FileOperationInfo::StartNow(); start_ts = FileOperationInfo::StartNow();
} }
IOOptions io_options; IOStatus s = writable_file_->RangeSync(offset, nbytes, opts, nullptr);
io_options.rate_limiter_priority = writable_file_->GetIOPriority();
IOStatus s = writable_file_->RangeSync(offset, nbytes, io_options, nullptr);
if (!s.ok()) { if (!s.ok()) {
set_seen_error(); set_seen_error();
} }
@ -541,8 +564,8 @@ IOStatus WritableFileWriter::RangeSync(uint64_t offset, uint64_t nbytes) {
// This method writes to disk the specified data and makes use of the rate // This method writes to disk the specified data and makes use of the rate
// limiter if available // limiter if available
IOStatus WritableFileWriter::WriteBuffered( IOStatus WritableFileWriter::WriteBuffered(const IOOptions& opts,
const char* data, size_t size, Env::IOPriority op_rate_limiter_priority) { const char* data, size_t size) {
if (seen_error()) { if (seen_error()) {
return AssertFalseAndGetStatusForPrevError(); return AssertFalseAndGetStatusForPrevError();
} }
@ -553,11 +576,7 @@ IOStatus WritableFileWriter::WriteBuffered(
size_t left = size; size_t left = size;
DataVerificationInfo v_info; DataVerificationInfo v_info;
char checksum_buf[sizeof(uint32_t)]; char checksum_buf[sizeof(uint32_t)];
Env::IOPriority rate_limiter_priority_used = Env::IOPriority rate_limiter_priority_used = opts.rate_limiter_priority;
WritableFileWriter::DecideRateLimiterPriority(
writable_file_->GetIOPriority(), op_rate_limiter_priority);
IOOptions io_options;
io_options.rate_limiter_priority = rate_limiter_priority_used;
while (left > 0) { while (left > 0) {
size_t allowed = left; size_t allowed = left;
@ -573,7 +592,7 @@ IOStatus WritableFileWriter::WriteBuffered(
TEST_SYNC_POINT("WritableFileWriter::Flush:BeforeAppend"); TEST_SYNC_POINT("WritableFileWriter::Flush:BeforeAppend");
FileOperationInfo::StartTimePoint start_ts; FileOperationInfo::StartTimePoint start_ts;
uint64_t old_size = writable_file_->GetFileSize(io_options, nullptr); uint64_t old_size = writable_file_->GetFileSize(opts, nullptr);
if (ShouldNotifyListeners()) { if (ShouldNotifyListeners()) {
start_ts = FileOperationInfo::StartNow(); start_ts = FileOperationInfo::StartNow();
old_size = next_write_offset_; old_size = next_write_offset_;
@ -585,10 +604,10 @@ IOStatus WritableFileWriter::WriteBuffered(
if (perform_data_verification_) { if (perform_data_verification_) {
Crc32cHandoffChecksumCalculation(src, allowed, checksum_buf); Crc32cHandoffChecksumCalculation(src, allowed, checksum_buf);
v_info.checksum = Slice(checksum_buf, sizeof(uint32_t)); v_info.checksum = Slice(checksum_buf, sizeof(uint32_t));
s = writable_file_->Append(Slice(src, allowed), io_options, v_info, s = writable_file_->Append(Slice(src, allowed), opts, v_info,
nullptr); nullptr);
} else { } else {
s = writable_file_->Append(Slice(src, allowed), io_options, nullptr); s = writable_file_->Append(Slice(src, allowed), opts, nullptr);
} }
if (!s.ok()) { if (!s.ok()) {
// If writable_file_->Append() failed, then the data may or may not // If writable_file_->Append() failed, then the data may or may not
@ -635,8 +654,9 @@ IOStatus WritableFileWriter::WriteBuffered(
return s; return s;
} }
IOStatus WritableFileWriter::WriteBufferedWithChecksum( IOStatus WritableFileWriter::WriteBufferedWithChecksum(const IOOptions& opts,
const char* data, size_t size, Env::IOPriority op_rate_limiter_priority) { const char* data,
size_t size) {
if (seen_error()) { if (seen_error()) {
return AssertFalseAndGetStatusForPrevError(); return AssertFalseAndGetStatusForPrevError();
} }
@ -648,11 +668,7 @@ IOStatus WritableFileWriter::WriteBufferedWithChecksum(
size_t left = size; size_t left = size;
DataVerificationInfo v_info; DataVerificationInfo v_info;
char checksum_buf[sizeof(uint32_t)]; char checksum_buf[sizeof(uint32_t)];
Env::IOPriority rate_limiter_priority_used = Env::IOPriority rate_limiter_priority_used = opts.rate_limiter_priority;
WritableFileWriter::DecideRateLimiterPriority(
writable_file_->GetIOPriority(), op_rate_limiter_priority);
IOOptions io_options;
io_options.rate_limiter_priority = rate_limiter_priority_used;
// Check how much is allowed. Here, we loop until the rate limiter allows to // Check how much is allowed. Here, we loop until the rate limiter allows to
// write the entire buffer. // write the entire buffer.
// TODO: need to be improved since it sort of defeats the purpose of the rate // TODO: need to be improved since it sort of defeats the purpose of the rate
@ -673,7 +689,7 @@ IOStatus WritableFileWriter::WriteBufferedWithChecksum(
TEST_SYNC_POINT("WritableFileWriter::Flush:BeforeAppend"); TEST_SYNC_POINT("WritableFileWriter::Flush:BeforeAppend");
FileOperationInfo::StartTimePoint start_ts; FileOperationInfo::StartTimePoint start_ts;
uint64_t old_size = writable_file_->GetFileSize(io_options, nullptr); uint64_t old_size = writable_file_->GetFileSize(opts, nullptr);
if (ShouldNotifyListeners()) { if (ShouldNotifyListeners()) {
start_ts = FileOperationInfo::StartNow(); start_ts = FileOperationInfo::StartNow();
old_size = next_write_offset_; old_size = next_write_offset_;
@ -685,7 +701,7 @@ IOStatus WritableFileWriter::WriteBufferedWithChecksum(
EncodeFixed32(checksum_buf, buffered_data_crc32c_checksum_); EncodeFixed32(checksum_buf, buffered_data_crc32c_checksum_);
v_info.checksum = Slice(checksum_buf, sizeof(uint32_t)); v_info.checksum = Slice(checksum_buf, sizeof(uint32_t));
s = writable_file_->Append(Slice(src, left), io_options, v_info, nullptr); s = writable_file_->Append(Slice(src, left), opts, v_info, nullptr);
SetPerfLevel(prev_perf_level); SetPerfLevel(prev_perf_level);
} }
if (ShouldNotifyListeners()) { if (ShouldNotifyListeners()) {
@ -755,8 +771,7 @@ void WritableFileWriter::Crc32cHandoffChecksumCalculation(const char* data,
// whole number of pages to be written again on the next flush because we can // whole number of pages to be written again on the next flush because we can
// only write on aligned // only write on aligned
// offsets. // offsets.
IOStatus WritableFileWriter::WriteDirect( IOStatus WritableFileWriter::WriteDirect(const IOOptions& opts) {
Env::IOPriority op_rate_limiter_priority) {
if (seen_error()) { if (seen_error()) {
assert(false); assert(false);
@ -785,11 +800,7 @@ IOStatus WritableFileWriter::WriteDirect(
size_t left = buf_.CurrentSize(); size_t left = buf_.CurrentSize();
DataVerificationInfo v_info; DataVerificationInfo v_info;
char checksum_buf[sizeof(uint32_t)]; char checksum_buf[sizeof(uint32_t)];
Env::IOPriority rate_limiter_priority_used = Env::IOPriority rate_limiter_priority_used = opts.rate_limiter_priority;
WritableFileWriter::DecideRateLimiterPriority(
writable_file_->GetIOPriority(), op_rate_limiter_priority);
IOOptions io_options;
io_options.rate_limiter_priority = rate_limiter_priority_used;
while (left > 0) { while (left > 0) {
// Check how much is allowed // Check how much is allowed
@ -813,10 +824,10 @@ IOStatus WritableFileWriter::WriteDirect(
Crc32cHandoffChecksumCalculation(src, size, checksum_buf); Crc32cHandoffChecksumCalculation(src, size, checksum_buf);
v_info.checksum = Slice(checksum_buf, sizeof(uint32_t)); v_info.checksum = Slice(checksum_buf, sizeof(uint32_t));
s = writable_file_->PositionedAppend(Slice(src, size), write_offset, s = writable_file_->PositionedAppend(Slice(src, size), write_offset,
io_options, v_info, nullptr); opts, v_info, nullptr);
} else { } else {
s = writable_file_->PositionedAppend(Slice(src, size), write_offset, s = writable_file_->PositionedAppend(Slice(src, size), write_offset,
io_options, nullptr); opts, nullptr);
} }
if (ShouldNotifyListeners()) { if (ShouldNotifyListeners()) {
@ -859,8 +870,7 @@ IOStatus WritableFileWriter::WriteDirect(
return s; return s;
} }
IOStatus WritableFileWriter::WriteDirectWithChecksum( IOStatus WritableFileWriter::WriteDirectWithChecksum(const IOOptions& opts) {
Env::IOPriority op_rate_limiter_priority) {
if (seen_error()) { if (seen_error()) {
return AssertFalseAndGetStatusForPrevError(); return AssertFalseAndGetStatusForPrevError();
} }
@ -895,11 +905,7 @@ IOStatus WritableFileWriter::WriteDirectWithChecksum(
DataVerificationInfo v_info; DataVerificationInfo v_info;
char checksum_buf[sizeof(uint32_t)]; char checksum_buf[sizeof(uint32_t)];
Env::IOPriority rate_limiter_priority_used = Env::IOPriority rate_limiter_priority_used = opts.rate_limiter_priority;
WritableFileWriter::DecideRateLimiterPriority(
writable_file_->GetIOPriority(), op_rate_limiter_priority);
IOOptions io_options;
io_options.rate_limiter_priority = rate_limiter_priority_used;
// Check how much is allowed. Here, we loop until the rate limiter allows to // Check how much is allowed. Here, we loop until the rate limiter allows to
// write the entire buffer. // write the entire buffer.
// TODO: need to be improved since it sort of defeats the purpose of the rate // TODO: need to be improved since it sort of defeats the purpose of the rate
@ -925,8 +931,8 @@ IOStatus WritableFileWriter::WriteDirectWithChecksum(
// direct writes must be positional // direct writes must be positional
EncodeFixed32(checksum_buf, buffered_data_crc32c_checksum_); EncodeFixed32(checksum_buf, buffered_data_crc32c_checksum_);
v_info.checksum = Slice(checksum_buf, sizeof(uint32_t)); v_info.checksum = Slice(checksum_buf, sizeof(uint32_t));
s = writable_file_->PositionedAppend(Slice(src, left), write_offset, s = writable_file_->PositionedAppend(Slice(src, left), write_offset, opts,
io_options, v_info, nullptr); v_info, nullptr);
if (ShouldNotifyListeners()) { if (ShouldNotifyListeners()) {
auto finish_ts = std::chrono::steady_clock::now(); auto finish_ts = std::chrono::steady_clock::now();
@ -986,4 +992,14 @@ Env::IOPriority WritableFileWriter::DecideRateLimiterPriority(
} }
} }
IOOptions WritableFileWriter::FinalizeIOOptions(const IOOptions& opts) const {
Env::IOPriority op_rate_limiter_priority = opts.rate_limiter_priority;
IOOptions io_options(opts);
if (writable_file_.get() != nullptr) {
io_options.rate_limiter_priority =
WritableFileWriter::DecideRateLimiterPriority(
writable_file_->GetIOPriority(), op_rate_limiter_priority);
}
return io_options;
}
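Review note: `FinalizeIOOptions()` copies the caller's options and rewrites only `rate_limiter_priority`, so other fields such as `io_activity` pass through unchanged. The sketch below illustrates that behaviour with stand-in types. The body of `DecideRateLimiterPriority()` is not shown in this diff, so the merge rule used here (the per-operation priority wins unless it is `IO_TOTAL`, in which case the file's priority is used) is an assumption. `IOOptionsSketch`, `MergePriority` and `FinalizeSketch` are hypothetical names.

```cpp
#include <cassert>

// Stand-in for Env::IOPriority; IO_TOTAL plays the role of "not set".
enum IOPriority { IO_LOW, IO_HIGH, IO_TOTAL };

// Stand-in for IOOptions with just the two fields relevant here.
struct IOOptionsSketch {
  IOPriority rate_limiter_priority = IO_TOTAL;
  int io_activity = 0;  // placeholder for Env::IOActivity
};

// Assumed merge rule (not taken from this diff): prefer the per-operation
// priority when set, otherwise fall back to the file-level priority.
inline IOPriority MergePriority(IOPriority file_pri, IOPriority op_pri) {
  return op_pri != IO_TOTAL ? op_pri : file_pri;
}

// Hypothetical mirror of FinalizeIOOptions(): copy everything, then
// resolve the priority only if an underlying file exists.
inline IOOptionsSketch FinalizeSketch(const IOOptionsSketch& opts,
                                      bool has_file, IOPriority file_pri) {
  IOOptionsSketch out(opts);
  if (has_file) {
    out.rate_limiter_priority =
        MergePriority(file_pri, opts.rate_limiter_priority);
  }
  return out;
}
```

Resolving the priority once at each public entry point is what lets the private helpers (`WriteBuffered`, `WriteDirect`, `SyncInternal`, `RangeSync`) take the options as-is, as their header comments require.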
} // namespace ROCKSDB_NAMESPACE } // namespace ROCKSDB_NAMESPACE


@ -13,6 +13,7 @@
#include "db/version_edit.h" #include "db/version_edit.h"
#include "env/file_system_tracer.h" #include "env/file_system_tracer.h"
#include "monitoring/thread_status_util.h"
#include "port/port.h" #include "port/port.h"
#include "rocksdb/file_checksum.h" #include "rocksdb/file_checksum.h"
#include "rocksdb/file_system.h" #include "rocksdb/file_system.h"
@ -159,6 +160,7 @@ class WritableFileWriter {
uint64_t bytes_per_sync_; uint64_t bytes_per_sync_;
RateLimiter* rate_limiter_; RateLimiter* rate_limiter_;
Statistics* stats_; Statistics* stats_;
Histograms hist_type_;
std::vector<std::shared_ptr<EventListener>> listeners_; std::vector<std::shared_ptr<EventListener>> listeners_;
std::unique_ptr<FileChecksumGenerator> checksum_generator_; std::unique_ptr<FileChecksumGenerator> checksum_generator_;
bool checksum_finalized_; bool checksum_finalized_;
@ -173,6 +175,7 @@ class WritableFileWriter {
const FileOptions& options, SystemClock* clock = nullptr, const FileOptions& options, SystemClock* clock = nullptr,
const std::shared_ptr<IOTracer>& io_tracer = nullptr, const std::shared_ptr<IOTracer>& io_tracer = nullptr,
Statistics* stats = nullptr, Statistics* stats = nullptr,
Histograms hist_type = Histograms::HISTOGRAM_ENUM_MAX,
const std::vector<std::shared_ptr<EventListener>>& listeners = {}, const std::vector<std::shared_ptr<EventListener>>& listeners = {},
FileChecksumGenFactory* file_checksum_gen_factory = nullptr, FileChecksumGenFactory* file_checksum_gen_factory = nullptr,
bool perform_data_verification = false, bool perform_data_verification = false,
@ -191,6 +194,7 @@ class WritableFileWriter {
bytes_per_sync_(options.bytes_per_sync), bytes_per_sync_(options.bytes_per_sync),
rate_limiter_(options.rate_limiter), rate_limiter_(options.rate_limiter),
stats_(stats), stats_(stats),
hist_type_(hist_type),
listeners_(), listeners_(),
checksum_generator_(nullptr), checksum_generator_(nullptr),
checksum_finalized_(false), checksum_finalized_(false),
@ -222,35 +226,42 @@ class WritableFileWriter {
const std::string& fname, const FileOptions& file_opts, const std::string& fname, const FileOptions& file_opts,
std::unique_ptr<WritableFileWriter>* writer, std::unique_ptr<WritableFileWriter>* writer,
IODebugContext* dbg); IODebugContext* dbg);
static IOStatus PrepareIOOptions(const WriteOptions& wo, IOOptions& opts);
WritableFileWriter(const WritableFileWriter&) = delete; WritableFileWriter(const WritableFileWriter&) = delete;
WritableFileWriter& operator=(const WritableFileWriter&) = delete; WritableFileWriter& operator=(const WritableFileWriter&) = delete;
~WritableFileWriter() { ~WritableFileWriter() {
auto s = Close(); ThreadStatus::OperationType cur_op_type =
ThreadStatusUtil::GetThreadOperation();
ThreadStatusUtil::SetThreadOperation(
ThreadStatus::OperationType::OP_UNKNOWN);
auto s = Close(IOOptions());
s.PermitUncheckedError(); s.PermitUncheckedError();
ThreadStatusUtil::SetThreadOperation(cur_op_type);
} }
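Review note: the destructor saves the thread's current operation, sets it to `OP_UNKNOWN` around `Close(IOOptions())`, and restores it afterwards, presumably so that a close issued with default options is not checked against whatever operation the thread is running. The same save/clear/restore pattern can be written as a scope guard. This is only a sketch: `CurrentThreadOp()` is a hypothetical stand-in for the per-thread state behind `ThreadStatusUtil`, and `ScopedThreadOpReset` is not part of the commit.

```cpp
#include <cassert>

// Simplified stand-in for ThreadStatus::OperationType.
enum class OperationType { OP_UNKNOWN, OP_FLUSH, OP_COMPACTION };

// Hypothetical stand-in for the thread-local operation tracked by
// ThreadStatusUtil::GetThreadOperation()/SetThreadOperation().
inline OperationType& CurrentThreadOp() {
  thread_local OperationType op = OperationType::OP_UNKNOWN;
  return op;
}

// Scope guard: remember the current operation, reset it to OP_UNKNOWN for
// the lifetime of the guard, and restore it on scope exit.
class ScopedThreadOpReset {
 public:
  ScopedThreadOpReset() : saved_(CurrentThreadOp()) {
    CurrentThreadOp() = OperationType::OP_UNKNOWN;
  }
  ~ScopedThreadOpReset() { CurrentThreadOp() = saved_; }

  ScopedThreadOpReset(const ScopedThreadOpReset&) = delete;
  ScopedThreadOpReset& operator=(const ScopedThreadOpReset&) = delete;

 private:
  OperationType saved_;
};
```

A guard restores the saved value on every exit path, including early returns, whereas the explicit form in the destructor relies on reaching the final `SetThreadOperation(cur_op_type)` call.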
std::string file_name() const { return file_name_; } std::string file_name() const { return file_name_; }
// When this Append API is called, if the crc32c_checksum is not provided, we // When this Append API is called, if the crc32c_checksum is not provided, we
// will calculate the checksum internally. // will calculate the checksum internally.
IOStatus Append(const Slice& data, uint32_t crc32c_checksum = 0, IOStatus Append(const IOOptions& opts, const Slice& data,
Env::IOPriority op_rate_limiter_priority = Env::IO_TOTAL); uint32_t crc32c_checksum = 0);
IOStatus Pad(const size_t pad_bytes, IOStatus Pad(const IOOptions& opts, const size_t pad_bytes);
Env::IOPriority op_rate_limiter_priority = Env::IO_TOTAL);
IOStatus Flush(Env::IOPriority op_rate_limiter_priority = Env::IO_TOTAL); IOStatus Flush(const IOOptions& opts);
IOStatus Close(); IOStatus Close(const IOOptions& opts);
IOStatus Sync(bool use_fsync); IOStatus Sync(const IOOptions& opts, bool use_fsync);
// Sync only the data that was already Flush()ed. Safe to call concurrently // Sync only the data that was already Flush()ed. Safe to call concurrently
// with Append() and Flush(). If !writable_file_->IsSyncThreadSafe(), // with Append() and Flush(). If !writable_file_->IsSyncThreadSafe(),
// returns NotSupported status. // returns NotSupported status.
IOStatus SyncWithoutFlush(bool use_fsync); IOStatus SyncWithoutFlush(const IOOptions& opts, bool use_fsync);
uint64_t GetFileSize() const { uint64_t GetFileSize() const {
return filesize_.load(std::memory_order_acquire); return filesize_.load(std::memory_order_acquire);
@ -307,14 +318,20 @@ class WritableFileWriter {
// Used when os buffering is OFF and we are writing // Used when os buffering is OFF and we are writing
// DMA such as in Direct I/O mode // DMA such as in Direct I/O mode
IOStatus WriteDirect(Env::IOPriority op_rate_limiter_priority); // `opts` should've been called with `FinalizeIOOptions()` before passing in
IOStatus WriteDirectWithChecksum(Env::IOPriority op_rate_limiter_priority); IOStatus WriteDirect(const IOOptions& opts);
// `opts` should've been called with `FinalizeIOOptions()` before passing in
IOStatus WriteDirectWithChecksum(const IOOptions& opts);
// Normal write. // Normal write.
IOStatus WriteBuffered(const char* data, size_t size, // `opts` should've been called with `FinalizeIOOptions()` before passing in
Env::IOPriority op_rate_limiter_priority); IOStatus WriteBuffered(const IOOptions& opts, const char* data, size_t size);
IOStatus WriteBufferedWithChecksum(const char* data, size_t size, // `opts` should've been called with `FinalizeIOOptions()` before passing in
Env::IOPriority op_rate_limiter_priority); IOStatus WriteBufferedWithChecksum(const IOOptions& opts, const char* data,
IOStatus RangeSync(uint64_t offset, uint64_t nbytes); size_t size);
IOStatus SyncInternal(bool use_fsync); // `opts` should've been called with `FinalizeIOOptions()` before passing in
IOStatus RangeSync(const IOOptions& opts, uint64_t offset, uint64_t nbytes);
// `opts` should've been called with `FinalizeIOOptions()` before passing in
IOStatus SyncInternal(const IOOptions& opts, bool use_fsync);
IOOptions FinalizeIOOptions(const IOOptions& opts) const;
}; };
} // namespace ROCKSDB_NAMESPACE } // namespace ROCKSDB_NAMESPACE


@ -67,6 +67,7 @@ struct ThreadStatus;
class FileSystem; class FileSystem;
class SystemClock; class SystemClock;
struct ConfigOptions; struct ConfigOptions;
struct IOOptions;
const size_t kDefaultPageSize = 4 * 1024; const size_t kDefaultPageSize = 4 * 1024;
@ -1352,7 +1353,8 @@ extern void Fatal(Logger* info_log, const char* format, ...)
// A utility routine: write "data" to the named file. // A utility routine: write "data" to the named file.
extern Status WriteStringToFile(Env* env, const Slice& data, extern Status WriteStringToFile(Env* env, const Slice& data,
const std::string& fname, const std::string& fname,
bool should_sync = false); bool should_sync = false,
const IOOptions* io_options = nullptr);
// A utility routine: read contents of named file into *data // A utility routine: read contents of named file into *data
extern Status ReadFileToString(Env* env, const std::string& fname, extern Status ReadFileToString(Env* env, const std::string& fname,


@ -1918,7 +1918,8 @@ class FSDirectoryWrapper : public FSDirectory {
// A utility routine: write "data" to the named file. // A utility routine: write "data" to the named file.
extern IOStatus WriteStringToFile(FileSystem* fs, const Slice& data, extern IOStatus WriteStringToFile(FileSystem* fs, const Slice& data,
const std::string& fname, const std::string& fname,
bool should_sync = false); bool should_sync = false,
const IOOptions& io_options = IOOptions());
// A utility routine: read contents of named file into *data // A utility routine: read contents of named file into *data
extern IOStatus ReadFileToString(FileSystem* fs, const std::string& fname, extern IOStatus ReadFileToString(FileSystem* fs, const std::string& fname,


@@ -1781,7 +1781,7 @@ struct WriteOptions {
   // system call followed by "fdatasync()".
   //
   // Default: false
-  bool sync;
+  bool sync = false;
   // If true, writes will not first go to the write ahead log,
   // and the write may get lost after a crash. The backup engine
@@ -1789,18 +1789,18 @@ struct WriteOptions {
   // you disable write-ahead logs, you must create backups with
   // flush_before_backup=true to avoid losing unflushed memtable data.
   // Default: false
-  bool disableWAL;
+  bool disableWAL = false;
   // If true and if user is trying to write to column families that don't exist
   // (they were dropped), ignore the write (don't return an error). If there
   // are multiple writes in a WriteBatch, other writes will succeed.
   // Default: false
-  bool ignore_missing_column_families;
+  bool ignore_missing_column_families = false;
   // If true and we need to wait or sleep for the write request, fails
   // immediately with Status::Incomplete().
   // Default: false
-  bool no_slowdown;
+  bool no_slowdown = false;
   // If true, this write request is of lower priority if compaction is
   // behind. In this case, no_slowdown = true, the request will be canceled
@@ -1809,7 +1809,7 @@ struct WriteOptions {
   // it introduces minimum impacts to high priority writes.
   //
   // Default: false
-  bool low_pri;
+  bool low_pri = false;
   // If true, this writebatch will maintain the last insert positions of each
   // memtable as hints in concurrent write. It can improve write performance
@@ -1818,7 +1818,7 @@ struct WriteOptions {
   // option will be ignored.
   //
   // Default: false
-  bool memtable_insert_hint_per_batch;
+  bool memtable_insert_hint_per_batch = false;
   // For writes associated with this option, charge the internal rate
   // limiter (see `DBOptions::rate_limiter`) at the specified priority. The
@@ -1833,24 +1833,25 @@ struct WriteOptions {
   // due to implementation constraints.
   //
   // Default: `Env::IO_TOTAL`
-  Env::IOPriority rate_limiter_priority;
+  Env::IOPriority rate_limiter_priority = Env::IO_TOTAL;
   // `protection_bytes_per_key` is the number of bytes used to store
   // protection information for each key entry. Currently supported values are
   // zero (disabled) and eight.
   //
   // Default: zero (disabled).
-  size_t protection_bytes_per_key;
+  size_t protection_bytes_per_key = 0;
-  WriteOptions()
-      : sync(false),
-        disableWAL(false),
-        ignore_missing_column_families(false),
-        no_slowdown(false),
-        low_pri(false),
-        memtable_insert_hint_per_batch(false),
-        rate_limiter_priority(Env::IO_TOTAL),
-        protection_bytes_per_key(0) {}
+  // For RocksDB internal use only
+  //
+  // Default: Env::IOActivity::kUnknown.
+  Env::IOActivity io_activity = Env::IOActivity::kUnknown;
+
+  WriteOptions() {}
+  explicit WriteOptions(Env::IOActivity _io_activity);
+  explicit WriteOptions(
+      Env::IOPriority _rate_limiter_priority,
+      Env::IOActivity _io_activity = Env::IOActivity::kUnknown);
 };
 // Options that control flush operations
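Reviewer note: the hunk above moves the defaults from the constructor's initializer list to default member initializers, so the two new explicit constructors only need to name the fields they override. A minimal standalone sketch of that pattern follows. The type and enum names here (`WriteOptionsSketch`, `IOPriority`, `IOActivity` and their values) are stand-ins invented for illustration and are not the RocksDB definitions.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in enums (hypothetical); the real ones are nested in RocksDB's Env.
enum class IOPriority { kLow, kHigh, kTotal };
enum class IOActivity { kFlush, kCompaction, kDBOpen, kUnknown };

// Sketch of a struct whose defaults live next to the fields.
struct WriteOptionsSketch {
  bool sync = false;
  bool disable_wal = false;
  IOPriority rate_limiter_priority = IOPriority::kTotal;
  std::size_t protection_bytes_per_key = 0;
  IOActivity io_activity = IOActivity::kUnknown;

  WriteOptionsSketch() {}

  // Only io_activity is overridden; every other field keeps its default.
  explicit WriteOptionsSketch(IOActivity activity) : io_activity(activity) {}

  // Overrides the priority and, optionally, the activity.
  explicit WriteOptionsSketch(IOPriority priority,
                              IOActivity activity = IOActivity::kUnknown)
      : rate_limiter_priority(priority), io_activity(activity) {}
};
```

Because the stand-in enums are scoped, `WriteOptionsSketch(IOActivity::kFlush)` and `WriteOptionsSketch(IOPriority::kHigh)` resolve to different constructors without ambiguity.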

@@ -34,6 +34,7 @@ class SstFileReader {
   // Verifies whether there is corruption in this table.
   Status VerifyChecksum(const ReadOptions& /*read_options*/);
+  // TODO: plumb Env::IOActivity, Env::IOPriority
   Status VerifyChecksum() { return VerifyChecksum(ReadOptions()); }
  private:
@@ -42,4 +43,3 @@ class SstFileReader {
 };
 }  // namespace ROCKSDB_NAMESPACE

@@ -589,6 +589,14 @@ enum Histograms : uint32_t {
   FILE_READ_VERIFY_DB_CHECKSUM_MICROS,
   FILE_READ_VERIFY_FILE_CHECKSUMS_MICROS,
+  // Time spent in writing SST files
+  SST_WRITE_MICROS,
+  // Time spent in writing SST table (currently only block-based table) or blob
+  // file for flush, compaction or db open
+  FILE_WRITE_FLUSH_MICROS,
+  FILE_WRITE_COMPACTION_MICROS,
+  FILE_WRITE_DB_OPEN_MICROS,
   // The number of subcompactions actually scheduled during a compaction
   NUM_SUBCOMPACTIONS_SCHEDULED,
   // Value size distribution in each operation
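Reviewer note: the PR summary says the write time is broken down into FILE_WRITE_{FLUSH|COMPACTION|DB_OPEN}_MICROS according to the tagged IO activity. The code that makes that choice inside WritableFileWriter is not part of this excerpt, so the following is only a guess at the shape of such a mapping. All names (`IOActivity`, `Histogram`, `FileWriteHistogramFor`) are hypothetical stand-ins.

```cpp
#include <cassert>
#include <optional>

// Stand-in enums (hypothetical), modeled loosely on the names in the diff.
enum class IOActivity { kFlush, kCompaction, kDBOpen, kUnknown };
enum class Histogram {
  kFileWriteFlushMicros,
  kFileWriteCompactionMicros,
  kFileWriteDbOpenMicros,
};

// Returns the per-activity histogram to record into, or nullopt when the
// write is untagged and no per-activity histogram applies.
std::optional<Histogram> FileWriteHistogramFor(IOActivity activity) {
  switch (activity) {
    case IOActivity::kFlush:
      return Histogram::kFileWriteFlushMicros;
    case IOActivity::kCompaction:
      return Histogram::kFileWriteCompactionMicros;
    case IOActivity::kDBOpen:
      return Histogram::kFileWriteDbOpenMicros;
    case IOActivity::kUnknown:
      return std::nullopt;
  }
  return std::nullopt;
}
```

Returning an empty optional for the untagged case keeps the caller from recording into the wrong bucket. Whether the real code handles untagged writes this way is an assumption.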

@@ -5716,10 +5716,17 @@ class HistogramTypeJni {
       case ROCKSDB_NAMESPACE::Histograms::
           FILE_READ_VERIFY_FILE_CHECKSUMS_MICROS:
         return 0x41;
+      case ROCKSDB_NAMESPACE::Histograms::SST_WRITE_MICROS:
+        return 0x42;
+      case ROCKSDB_NAMESPACE::Histograms::FILE_WRITE_FLUSH_MICROS:
+        return 0x43;
+      case ROCKSDB_NAMESPACE::Histograms::FILE_WRITE_COMPACTION_MICROS:
+        return 0x44;
+      case ROCKSDB_NAMESPACE::Histograms::FILE_WRITE_DB_OPEN_MICROS:
+        return 0x45;
       case ROCKSDB_NAMESPACE::Histograms::HISTOGRAM_ENUM_MAX:
         // 0x1F for backwards compatibility on current minor version.
         return 0x1F;
       default:
         // undefined/default
         return 0x0;
@@ -5853,6 +5860,14 @@ class HistogramTypeJni {
       case 0x41:
         return ROCKSDB_NAMESPACE::Histograms::
             FILE_READ_VERIFY_FILE_CHECKSUMS_MICROS;
+      case 0x42:
+        return ROCKSDB_NAMESPACE::Histograms::SST_WRITE_MICROS;
+      case 0x43:
+        return ROCKSDB_NAMESPACE::Histograms::FILE_WRITE_FLUSH_MICROS;
+      case 0x44:
+        return ROCKSDB_NAMESPACE::Histograms::FILE_WRITE_COMPACTION_MICROS;
+      case 0x45:
+        return ROCKSDB_NAMESPACE::Histograms::FILE_WRITE_DB_OPEN_MICROS;
       case 0x1F:
         // 0x1F for backwards compatibility on current minor version.
         return ROCKSDB_NAMESPACE::Histograms::HISTOGRAM_ENUM_MAX;
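Reviewer note: the two switch statements above must stay inverse to each other for the new entries, or the Java side would read a different histogram than the one C++ recorded. A small standalone sketch of that round-trip property follows, using only the byte values visible in the hunks (0x41 to 0x45 and 0x1F). The enum `Hist` and both function names are hypothetical stand-ins, not the JNI helpers themselves.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Stand-in for a handful of Histograms entries (hypothetical subset).
enum class Hist : uint32_t {
  kFileReadVerifyFileChecksumsMicros,
  kSstWriteMicros,
  kFileWriteFlushMicros,
  kFileWriteCompactionMicros,
  kFileWriteDbOpenMicros,
  kEnumMax,
};

// C++ enum value to the byte used on the Java side.
uint8_t ToJavaByte(Hist h) {
  switch (h) {
    case Hist::kFileReadVerifyFileChecksumsMicros: return 0x41;
    case Hist::kSstWriteMicros:                    return 0x42;
    case Hist::kFileWriteFlushMicros:              return 0x43;
    case Hist::kFileWriteCompactionMicros:         return 0x44;
    case Hist::kFileWriteDbOpenMicros:             return 0x45;
    case Hist::kEnumMax:                           return 0x1F;
  }
  return 0x0;  // undefined/default, as in the first switch above
}

// Java byte back to the C++ enum value; empty for bytes this sketch
// does not know (the real fallback is outside the excerpt).
std::optional<Hist> FromJavaByte(uint8_t b) {
  switch (b) {
    case 0x41: return Hist::kFileReadVerifyFileChecksumsMicros;
    case 0x42: return Hist::kSstWriteMicros;
    case 0x43: return Hist::kFileWriteFlushMicros;
    case 0x44: return Hist::kFileWriteCompactionMicros;
    case 0x45: return Hist::kFileWriteDbOpenMicros;
    case 0x1F: return Hist::kEnumMax;
    default:   return std::nullopt;
  }
}
```

A unit test that loops over every enum value and checks `FromJavaByte(ToJavaByte(h)) == h` would catch a byte that was added in one direction only.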

@@ -185,6 +185,14 @@ public enum HistogramType {
   FILE_READ_VERIFY_FILE_CHECKSUMS_MICROS((byte) 0x41),
+  SST_WRITE_MICROS((byte) 0x42),
+  FILE_WRITE_FLUSH_MICROS((byte) 0x43),
+  FILE_WRITE_COMPACTION_MICROS((byte) 0x44),
+  FILE_WRITE_DB_OPEN_MICROS((byte) 0x45),
   // 0x1F for backwards compatibility on current minor version.
   HISTOGRAM_ENUM_MAX((byte) 0x1F);

@@ -75,7 +75,7 @@ class EnvLogger : public Logger {
     mutex_.AssertHeld();
     if (flush_pending_) {
       flush_pending_ = false;
-      file_.Flush().PermitUncheckedError();
+      file_.Flush(IOOptions()).PermitUncheckedError();
       file_.reset_seen_error();
     }
     last_flush_micros_ = clock_->NowMicros();
@@ -93,7 +93,7 @@ class EnvLogger : public Logger {
   Status CloseHelper() {
     FileOpGuard guard(*this);
-    const auto close_status = file_.Close();
+    const auto close_status = file_.Close(IOOptions());
     if (close_status.ok()) {
       return close_status;
@@ -162,7 +162,7 @@ class EnvLogger : public Logger {
     {
       FileOpGuard guard(*this);
       // We will ignore any error returned by Append().
-      file_.Append(Slice(base, p - base)).PermitUncheckedError();
+      file_.Append(IOOptions(), Slice(base, p - base)).PermitUncheckedError();
       file_.reset_seen_error();
       flush_pending_ = true;
       const uint64_t now_micros = clock_->NowMicros();

@@ -41,6 +41,8 @@ Status DecodePersistentStatsVersionNumber(DBImpl* db, StatsVersionKeyType type,
   } else if (type == StatsVersionKeyType::kCompatibleVersion) {
     key = kCompatibleVersionKeyString;
   }
+  // TODO: plumb Env::IOActivity, Env::IOPriority
   ReadOptions options;
   options.verify_checksums = true;
   std::string result;
@@ -122,6 +124,7 @@ void PersistentStatsHistoryIterator::AdvanceIteratorByTime(uint64_t start_time,
                                                            uint64_t end_time) {
   // try to find next entry in stats_history_ map
   if (db_impl_ != nullptr) {
+    // TODO: plumb Env::IOActivity, Env::IOPriority
     ReadOptions ro;
     Iterator* iter =
         db_impl_->NewIterator(ro, db_impl_->PersistentStatsColumnFamily());

@@ -303,6 +303,10 @@ const std::vector<std::pair<Histograms, std::string>> HistogramsNameMap = {
      "rocksdb.file.read.verify.db.checksum.micros"},
     {FILE_READ_VERIFY_FILE_CHECKSUMS_MICROS,
      "rocksdb.file.read.verify.file.checksums.micros"},
+    {SST_WRITE_MICROS, "rocksdb.sst.write.micros"},
+    {FILE_WRITE_FLUSH_MICROS, "rocksdb.file.write.flush.micros"},
+    {FILE_WRITE_COMPACTION_MICROS, "rocksdb.file.write.compaction.micros"},
+    {FILE_WRITE_DB_OPEN_MICROS, "rocksdb.file.write.db.open.micros"},
     {NUM_SUBCOMPACTIONS_SCHEDULED, "rocksdb.num.subcompactions.scheduled"},
     {BYTES_PER_READ, "rocksdb.bytes.per.read"},
     {BYTES_PER_WRITE, "rocksdb.bytes.per.write"},

@@ -703,4 +703,11 @@ ReadOptions::ReadOptions(bool _verify_checksums, bool _fill_cache)
 ReadOptions::ReadOptions(Env::IOActivity _io_activity)
     : io_activity(_io_activity) {}
+WriteOptions::WriteOptions(Env::IOActivity _io_activity)
+    : io_activity(_io_activity) {}
+WriteOptions::WriteOptions(Env::IOPriority _rate_limiter_priority,
+                           Env::IOActivity _io_activity)
+    : rate_limiter_priority(_rate_limiter_priority),
+      io_activity(_io_activity) {}
 }  // namespace ROCKSDB_NAMESPACE

@@ -35,7 +35,8 @@ static const std::string option_file_header =
     "#\n"
     "\n";
-Status PersistRocksDBOptions(const DBOptions& db_opt,
+Status PersistRocksDBOptions(const WriteOptions& write_options,
+                             const DBOptions& db_opt,
                              const std::vector<std::string>& cf_names,
                              const std::vector<ColumnFamilyOptions>& cf_opts,
                              const std::string& file_name, FileSystem* fs) {
@@ -48,11 +49,12 @@ Status PersistRocksDBOptions(const DBOptions& db_opt,
   if (db_opt.log_readahead_size > 0) {
     config_options.file_readahead_size = db_opt.log_readahead_size;
   }
-  return PersistRocksDBOptions(config_options, db_opt, cf_names, cf_opts,
-                               file_name, fs);
+  return PersistRocksDBOptions(write_options, config_options, db_opt, cf_names,
+                               cf_opts, file_name, fs);
 }
-Status PersistRocksDBOptions(const ConfigOptions& config_options_in,
+Status PersistRocksDBOptions(const WriteOptions& write_options,
+                             const ConfigOptions& config_options_in,
                              const DBOptions& db_opt,
                              const std::vector<std::string>& cf_names,
                              const std::vector<ColumnFamilyOptions>& cf_opts,
@@ -79,62 +81,70 @@ Status PersistRocksDBOptions(const ConfigOptions& config_options_in,
   std::string options_file_content;
-  s = writable->Append(
-      option_file_header + "[" + opt_section_titles[kOptionSectionVersion] +
-      "]\n"
-      " rocksdb_version=" +
-      std::to_string(ROCKSDB_MAJOR) + "." + std::to_string(ROCKSDB_MINOR) +
-      "." + std::to_string(ROCKSDB_PATCH) + "\n");
+  IOOptions opts;
+  s = WritableFileWriter::PrepareIOOptions(write_options, opts);
+  if (s.ok()) {
+    s = writable->Append(opts, option_file_header + "[" +
+                                   opt_section_titles[kOptionSectionVersion] +
+                                   "]\n"
+                                   " rocksdb_version=" +
+                                   std::to_string(ROCKSDB_MAJOR) + "." +
+                                   std::to_string(ROCKSDB_MINOR) + "." +
+                                   std::to_string(ROCKSDB_PATCH) + "\n");
+  }
   if (s.ok()) {
     s = writable->Append(
+        opts,
         " options_file_version=" + std::to_string(ROCKSDB_OPTION_FILE_MAJOR) +
         "." + std::to_string(ROCKSDB_OPTION_FILE_MINOR) + "\n");
   }
   if (s.ok()) {
-    s = writable->Append("\n[" + opt_section_titles[kOptionSectionDBOptions] +
-                         "]\n ");
+    s = writable->Append(
+        opts, "\n[" + opt_section_titles[kOptionSectionDBOptions] + "]\n ");
   }
   if (s.ok()) {
     s = GetStringFromDBOptions(config_options, db_opt, &options_file_content);
   }
   if (s.ok()) {
-    s = writable->Append(options_file_content + "\n");
+    s = writable->Append(opts, options_file_content + "\n");
   }
   for (size_t i = 0; s.ok() && i < cf_opts.size(); ++i) {
     // CFOptions section
-    s = writable->Append("\n[" + opt_section_titles[kOptionSectionCFOptions] +
-                         " \"" + EscapeOptionString(cf_names[i]) + "\"]\n ");
+    s = writable->Append(
+        opts, "\n[" + opt_section_titles[kOptionSectionCFOptions] + " \"" +
+                  EscapeOptionString(cf_names[i]) + "\"]\n ");
     if (s.ok()) {
       s = GetStringFromColumnFamilyOptions(config_options, cf_opts[i],
                                            &options_file_content);
     }
     if (s.ok()) {
-      s = writable->Append(options_file_content + "\n");
+      s = writable->Append(opts, options_file_content + "\n");
     }
     // TableOptions section
     auto* tf = cf_opts[i].table_factory.get();
     if (tf != nullptr) {
       if (s.ok()) {
         s = writable->Append(
-            "[" + opt_section_titles[kOptionSectionTableOptions] + tf->Name() +
-            " \"" + EscapeOptionString(cf_names[i]) + "\"]\n ");
+            opts, "[" + opt_section_titles[kOptionSectionTableOptions] +
+                      tf->Name() + " \"" + EscapeOptionString(cf_names[i]) +
+                      "\"]\n ");
       }
       if (s.ok()) {
         options_file_content.clear();
         s = tf->GetOptionString(config_options, &options_file_content);
       }
       if (s.ok()) {
-        s = writable->Append(options_file_content + "\n");
+        s = writable->Append(opts, options_file_content + "\n");
       }
     }
   }
   if (s.ok()) {
-    s = writable->Sync(true /* use_fsync */);
+    s = writable->Sync(opts, true /* use_fsync */);
   }
   if (s.ok()) {
-    s = writable->Close();
+    s = writable->Close(opts);
   }
   TEST_SYNC_POINT("PersistRocksDBOptions:written");
   if (s.ok()) {
@@ -733,4 +743,3 @@ Status RocksDBOptionsParser::VerifyTableFactory(
   return Status::OK();
 }
 }  // namespace ROCKSDB_NAMESPACE
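Reviewer note: the rewritten PersistRocksDBOptions() body converts the WriteOptions to an IOOptions once, then reuses that one `opts` object for every Append, Sync and Close, with each step guarded by `s.ok()` so the first failure stops further writes. The sketch below reproduces only that control flow with stand-in types. `StatusSketch`, `IOOptionsSketch`, `WriterSketch`, `PrepareSketch` and `WriteSections` are hypothetical names and not RocksDB classes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Minimal stand-ins (hypothetical) for Status, IOOptions and the writer.
struct StatusSketch {
  explicit StatusSketch(bool ok = true) : ok_(ok) {}
  bool ok() const { return ok_; }
  bool ok_;
};

struct IOOptionsSketch {
  int activity_tag = 0;  // stands in for the tagged IO activity
};

struct WriterSketch {
  std::string contents;
  int fail_on_call = -1;  // index of the Append call that should fail
  int calls = 0;

  StatusSketch Append(const IOOptionsSketch& /*opts*/,
                      const std::string& data) {
    if (calls++ == fail_on_call) {
      return StatusSketch(false);
    }
    contents += data;
    return StatusSketch();
  }
};

// Stand-in for the one-time conversion step; always succeeds here.
StatusSketch PrepareSketch(int activity_tag, IOOptionsSketch& opts) {
  opts.activity_tag = activity_tag;
  return StatusSketch();
}

// Prepares the options once, then appends each section until one fails.
StatusSketch WriteSections(WriterSketch& writer, int activity_tag,
                           const std::vector<std::string>& sections) {
  IOOptionsSketch opts;
  StatusSketch s = PrepareSketch(activity_tag, opts);
  for (std::size_t i = 0; s.ok() && i < sections.size(); ++i) {
    s = writer.Append(opts, sections[i]);
  }
  return s;
}
```

Preparing the options once keeps every write in the function tagged the same way, which is what lets a per-activity histogram attribute the whole file to a single activity.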

@@ -32,11 +32,13 @@ enum OptionSection : char {
 static const std::string opt_section_titles[] = {
     "Version", "DBOptions", "CFOptions", "TableOptions/", "Unknown"};
-Status PersistRocksDBOptions(const DBOptions& db_opt,
+Status PersistRocksDBOptions(const WriteOptions& write_options,
+                             const DBOptions& db_opt,
                              const std::vector<std::string>& cf_names,
                              const std::vector<ColumnFamilyOptions>& cf_opts,
                              const std::string& file_name, FileSystem* fs);
-Status PersistRocksDBOptions(const ConfigOptions& config_options,
+Status PersistRocksDBOptions(const WriteOptions& write_options,
+                             const ConfigOptions& config_options,
                              const DBOptions& db_opt,
                              const std::vector<std::string>& cf_names,
                              const std::vector<ColumnFamilyOptions>& cf_opts,

@@ -3672,8 +3672,8 @@ TEST_F(OptionsParserTest, Readahead) {
   std::vector<std::string> cf_names = {"default", one_mb_string};
   const std::string kOptionsFileName = "test-persisted-options.ini";
-  ASSERT_OK(PersistRocksDBOptions(base_db_opt, cf_names, base_cf_opts,
-                                  kOptionsFileName, fs_.get()));
+  ASSERT_OK(PersistRocksDBOptions(WriteOptions(), base_db_opt, cf_names,
+                                  base_cf_opts, kOptionsFileName, fs_.get()));
   uint64_t file_size = 0;
   ASSERT_OK(
@@ -3747,8 +3747,8 @@ TEST_F(OptionsParserTest, DumpAndParse) {
   const std::string kOptionsFileName = "test-persisted-options.ini";
   // Use default for escaped(true), unknown(false) and check (exact)
   ConfigOptions config_options;
-  ASSERT_OK(PersistRocksDBOptions(base_db_opt, cf_names, base_cf_opts,
-                                  kOptionsFileName, fs_.get()));
+  ASSERT_OK(PersistRocksDBOptions(WriteOptions(), base_db_opt, cf_names,
+                                  base_cf_opts, kOptionsFileName, fs_.get()));
   RocksDBOptionsParser parser;
   ASSERT_OK(parser.Parse(config_options, kOptionsFileName, fs_.get()));
@@ -3808,9 +3808,9 @@ TEST_F(OptionsParserTest, DifferentDefault) {
   ColumnFamilyOptions cf_univ_opts;
   cf_univ_opts.OptimizeUniversalStyleCompaction();
-  ASSERT_OK(PersistRocksDBOptions(DBOptions(), {"default", "universal"},
-                                  {cf_level_opts, cf_univ_opts},
-                                  kOptionsFileName, fs_.get()));
+  ASSERT_OK(PersistRocksDBOptions(
+      WriteOptions(), DBOptions(), {"default", "universal"},
+      {cf_level_opts, cf_univ_opts}, kOptionsFileName, fs_.get()));
   RocksDBOptionsParser parser;
   ASSERT_OK(parser.Parse(kOptionsFileName, fs_.get(), false,
@@ -3953,8 +3953,8 @@ class OptionsSanityCheckTest : public OptionsParserTest,
     if (!s.ok()) {
       return s;
     }
-    return PersistRocksDBOptions(db_opts, {"default"}, {cf_opts},
-                                 kOptionsFileName, fs_.get());
+    return PersistRocksDBOptions(WriteOptions(), db_opts, {"default"},
+                                 {cf_opts}, kOptionsFileName, fs_.get());
   }
   Status PersistCFOptions(const ColumnFamilyOptions& cf_opts) {

@@ -264,6 +264,7 @@ struct BlockBasedTableBuilder::Rep {
   // BEGIN from MutableCFOptions
   std::shared_ptr<const SliceTransform> prefix_extractor;
   // END from MutableCFOptions
+  const WriteOptions write_options;
   const BlockBasedTableOptions table_options;
   const InternalKeyComparator& internal_comparator;
   // Size in bytes for the user-defined timestamps.
@@ -439,6 +440,7 @@ struct BlockBasedTableBuilder::Rep {
       WritableFileWriter* f)
       : ioptions(tbo.ioptions),
         prefix_extractor(tbo.moptions.prefix_extractor),
+        write_options(tbo.write_options),
         table_options(table_opt),
         internal_comparator(tbo.internal_comparator),
         ts_sz(tbo.internal_comparator.user_comparator()->timestamp_size()),
@@ -1317,6 +1319,13 @@ void BlockBasedTableBuilder::WriteMaybeCompressedBlock(
   // checksum: uint32
   Rep* r = rep_;
   bool is_data_block = block_type == BlockType::kData;
+  IOOptions io_options;
+  IOStatus io_s =
+      WritableFileWriter::PrepareIOOptions(r->write_options, io_options);
+  if (!io_s.ok()) {
+    r->SetIOStatus(io_s);
+    return;
+  }
   // Old, misleading name of this function: WriteRawBlock
   StopWatch sw(r->ioptions.clock, r->ioptions.stats, WRITE_RAW_BLOCK_MICROS);
   const uint64_t offset = r->get_offset();
@@ -1330,7 +1339,7 @@ void BlockBasedTableBuilder::WriteMaybeCompressedBlock(
   }
   {
-    IOStatus io_s = r->file->Append(block_contents);
+    io_s = r->file->Append(io_options, block_contents);
     if (!io_s.ok()) {
       r->SetIOStatus(io_s);
       return;
@@ -1357,7 +1366,7 @@ void BlockBasedTableBuilder::WriteMaybeCompressedBlock(
       "BlockBasedTableBuilder::WriteMaybeCompressedBlock:TamperWithChecksum",
       trailer.data());
   {
-    IOStatus io_s = r->file->Append(Slice(trailer.data(), trailer.size()));
+    io_s = r->file->Append(io_options, Slice(trailer.data(), trailer.size()));
     if (!io_s.ok()) {
       r->SetIOStatus(io_s);
       return;
@@ -1394,7 +1403,8 @@ void BlockBasedTableBuilder::WriteMaybeCompressedBlock(
         (r->alignment -
          ((block_contents.size() + kBlockTrailerSize) & (r->alignment - 1))) &
         (r->alignment - 1);
-    IOStatus io_s = r->file->Pad(pad_bytes);
+
+    io_s = r->file->Pad(io_options, pad_bytes);
     if (io_s.ok()) {
       r->set_offset(r->get_offset() + pad_bytes);
     } else {
@@ -1800,7 +1810,14 @@ void BlockBasedTableBuilder::WriteFooter(BlockHandle& metaindex_block_handle,
     r->SetStatus(s);
     return;
   }
-  IOStatus ios = r->file->Append(footer.GetSlice());
+  IOOptions io_options;
+  IOStatus ios =
+      WritableFileWriter::PrepareIOOptions(r->write_options, io_options);
+  if (!ios.ok()) {
+    r->SetIOStatus(ios);
+    return;
+  }
+  ios = r->file->Append(io_options, footer.GetSlice());
   if (ios.ok()) {
     r->set_offset(r->get_offset() + footer.GetSlice().size());
   } else {
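Reviewer note: the `pad_bytes` expression in the context lines of the -1394 hunk is easy to misread, so here is a worked standalone version of the same arithmetic. It assumes the alignment is a power of two (the mask trick needs that) and uses a trailer size of 5 bytes as an assumed stand-in for `kBlockTrailerSize`. The function and constant names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Assumed stand-in for kBlockTrailerSize; the value is an assumption here.
constexpr std::size_t kTrailerSizeAssumed = 5;

// Bytes of padding needed after a block plus its trailer so that the next
// block starts on an `alignment` boundary. `alignment` must be a power of two.
std::size_t PadBytes(std::size_t block_size, std::size_t alignment) {
  const std::size_t mask = alignment - 1;
  // (block_size + trailer) & mask is the remainder modulo alignment;
  // the outer "& mask" turns a full-alignment pad into zero.
  return (alignment - ((block_size + kTrailerSizeAssumed) & mask)) & mask;
}
```

For example, with a 4096-byte alignment a 4000-byte block occupies 4005 bytes with its trailer, so 91 bytes of padding are needed, while a 4091-byte block lands exactly on the boundary and needs none.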

@@ -2922,7 +2922,7 @@ Status BlockBasedTable::DumpTable(WritableFile* out_file) {
          "--------------------------------------\n";
   std::unique_ptr<Block> metaindex;
   std::unique_ptr<InternalIterator> metaindex_iter;
-  // TODO: plumb Env::IOActivity
+  // TODO: plumb Env::IOActivity, Env::IOPriority
   const ReadOptions ro;
   Status s = ReadMetaIndexBlock(ro, nullptr /* prefetch_buffer */, &metaindex,
                                 &metaindex_iter);
@@ -3027,7 +3027,7 @@ Status BlockBasedTable::DumpTable(WritableFile* out_file) {
 Status BlockBasedTable::DumpIndexBlock(std::ostream& out_stream) {
   out_stream << "Index Details:\n"
                 "--------------------------------------\n";
-  // TODO: plumb Env::IOActivity
+  // TODO: plumb Env::IOActivity, Env::IOPriority
   const ReadOptions read_options;
   std::unique_ptr<InternalIteratorBase<IndexValue>> blockhandles_iter(
       NewIndexIterator(read_options, /*need_upper_bound_check=*/false,
@@ -3078,7 +3078,7 @@ Status BlockBasedTable::DumpIndexBlock(std::ostream& out_stream) {
 }
 Status BlockBasedTable::DumpDataBlocks(std::ostream& out_stream) {
-  // TODO: plumb Env::IOActivity
+  // TODO: plumb Env::IOActivity, Env::IOPriority
   const ReadOptions read_options;
   std::unique_ptr<InternalIteratorBase<IndexValue>> blockhandles_iter(
       NewIndexIterator(read_options, /*need_upper_bound_check=*/false,

@@ -19,6 +19,7 @@
 #include "rocksdb/compression_type.h"
 #include "rocksdb/db.h"
 #include "rocksdb/file_system.h"
+#include "rocksdb/options.h"
 #include "table/block_based/block_based_table_builder.h"
 #include "table/block_based/block_based_table_factory.h"
 #include "table/block_based/partitioned_index_iterator.h"
@@ -133,11 +134,13 @@ class BlockBasedTableReaderBaseTest : public testing::Test {
     compression_opts.max_dict_bytes = compression_dict_bytes;
     compression_opts.max_dict_buffer_bytes = compression_dict_bytes;
     IntTblPropCollectorFactories factories;
+    const ReadOptions read_options;
+    const WriteOptions write_options;
     std::unique_ptr<TableBuilder> table_builder(
         options_.table_factory->NewTableBuilder(
-            TableBuilderOptions(ioptions, moptions, comparator, &factories,
-                                compression_type, compression_opts,
-                                0 /* column_family_id */,
+            TableBuilderOptions(ioptions, moptions, read_options, write_options,
+                                comparator, &factories, compression_type,
+                                compression_opts, 0 /* column_family_id */,
                                 kDefaultColumnFamilyName, -1 /* level */),
             writer.get()));

@@ -553,9 +553,11 @@ void TestBoundary(InternalKey& ik1, std::string& v1, InternalKey& ik2,
   std::unique_ptr<TableBuilder> builder;
   IntTblPropCollectorFactories int_tbl_prop_collector_factories;
   std::string column_family_name;
+  const ReadOptions read_options;
+  const WriteOptions write_options;
   builder.reset(ioptions.table_factory->NewTableBuilder(
       TableBuilderOptions(
-          ioptions, moptions, internal_comparator,
+          ioptions, moptions, read_options, write_options, internal_comparator,
           &int_tbl_prop_collector_factories, options.compression,
           CompressionOptions(),
           TablePropertiesCollectorFactory::Context::kUnknownColumnFamily,
@@ -567,7 +569,7 @@ void TestBoundary(InternalKey& ik1, std::string& v1, InternalKey& ik2,
   EXPECT_TRUE(builder->status().ok());
   Status s = builder->Finish();
-  ASSERT_OK(file_writer->Flush());
+  ASSERT_OK(file_writer->Flush(IOOptions()));
   EXPECT_TRUE(s.ok()) << s.ToString();
   EXPECT_EQ(sink->contents().size(), builder->FileSize());

@@ -77,11 +77,13 @@ class BlockFetcherTest : public testing::Test {
     ColumnFamilyOptions cf_options(options_);
     MutableCFOptions moptions(cf_options);
     IntTblPropCollectorFactories factories;
+    const ReadOptions read_options;
+    const WriteOptions write_options;
     std::unique_ptr<TableBuilder> table_builder(table_factory_.NewTableBuilder(
-        TableBuilderOptions(ioptions, moptions, comparator, &factories,
-                            compression_type, CompressionOptions(),
-                            0 /* column_family_id */, kDefaultColumnFamilyName,
-                            -1 /* level */),
+        TableBuilderOptions(ioptions, moptions, read_options, write_options,
+                            comparator, &factories, compression_type,
+                            CompressionOptions(), 0 /* column_family_id */,
+                            kDefaultColumnFamilyName, -1 /* level */),
         writer.get()));
     // Build table.

@@ -318,15 +318,16 @@ Status CuckooTableBuilder::Finish() {
   unused_bucket.resize(static_cast<size_t>(bucket_size), 'a');
   // Write the table.
   uint32_t num_added = 0;
+  const IOOptions opts;
   for (auto& bucket : buckets) {
     if (bucket.vector_idx == kMaxVectorIdx) {
-      io_status_ = file_->Append(Slice(unused_bucket));
+      io_status_ = file_->Append(opts, Slice(unused_bucket));
     } else {
       ++num_added;
-      io_status_ = file_->Append(GetKey(bucket.vector_idx));
+      io_status_ = file_->Append(opts, GetKey(bucket.vector_idx));
       if (io_status_.ok()) {
         if (value_size_ > 0) {
-          io_status_ = file_->Append(GetValue(bucket.vector_idx));
+          io_status_ = file_->Append(opts, GetValue(bucket.vector_idx));
         }
       }
     }
@@ -382,7 +383,7 @@ Status CuckooTableBuilder::Finish() {
   BlockHandle property_block_handle;
   property_block_handle.set_offset(offset);
   property_block_handle.set_size(property_block.size());
-  io_status_ = file_->Append(property_block);
+  io_status_ = file_->Append(opts, property_block);
   offset += property_block.size();
   if (!io_status_.ok()) {
     status_ = io_status_;
@@ -395,7 +396,7 @@ Status CuckooTableBuilder::Finish() {
   BlockHandle meta_index_block_handle;
   meta_index_block_handle.set_offset(offset);
   meta_index_block_handle.set_size(meta_index_block.size());
-  io_status_ = file_->Append(meta_index_block);
+  io_status_ = file_->Append(opts, meta_index_block);
   if (!io_status_.ok()) {
     status_ = io_status_;
     return status_;
@@ -408,7 +409,7 @@ Status CuckooTableBuilder::Finish() {
     status_ = s;
     return status_;
   }
-  io_status_ = file_->Append(footer.GetSlice());
+  io_status_ = file_->Append(opts, footer.GetSlice());
   status_ = io_status_;
   return status_;
 }

@@ -182,7 +182,7 @@ TEST_F(CuckooBuilderTest, SuccessWithEmptyFile) {
   ASSERT_OK(builder.status());
   ASSERT_EQ(0UL, builder.FileSize());
   ASSERT_OK(builder.Finish());
-  ASSERT_OK(file_writer->Close());
+  ASSERT_OK(file_writer->Close(IOOptions()));
   CheckFileContents({}, {}, {}, "", 2, 2, false);
 }
@@ -229,7 +229,7 @@ TEST_F(CuckooBuilderTest, WriteSuccessNoCollisionFullKey) {
   size_t bucket_size = keys[0].size() + values[0].size();
   ASSERT_EQ(expected_table_size * bucket_size - 1, builder.FileSize());
   ASSERT_OK(builder.Finish());
-  ASSERT_OK(file_writer->Close());
+  ASSERT_OK(file_writer->Close(IOOptions()));
   ASSERT_LE(expected_table_size * bucket_size, builder.FileSize());
   std::string expected_unused_bucket = GetInternalKey("key00", true);
@@ -277,7 +277,7 @@ TEST_F(CuckooBuilderTest, WriteSuccessWithCollisionFullKey) {
   size_t bucket_size = keys[0].size() + values[0].size();
   ASSERT_EQ(expected_table_size * bucket_size - 1, builder.FileSize());
   ASSERT_OK(builder.Finish());
-  ASSERT_OK(file_writer->Close());
+  ASSERT_OK(file_writer->Close(IOOptions()));
   ASSERT_LE(expected_table_size * bucket_size, builder.FileSize());
   std::string expected_unused_bucket = GetInternalKey("key00", true);
@@ -325,7 +325,7 @@ TEST_F(CuckooBuilderTest, WriteSuccessWithCollisionAndCuckooBlock) {
   size_t bucket_size = keys[0].size() + values[0].size();
   ASSERT_EQ(expected_table_size * bucket_size - 1, builder.FileSize());
   ASSERT_OK(builder.Finish());
-  ASSERT_OK(file_writer->Close());
+  ASSERT_OK(file_writer->Close(IOOptions()));
   ASSERT_LE(expected_table_size * bucket_size, builder.FileSize());
   std::string expected_unused_bucket = GetInternalKey("key00", true);
@ -374,7 +374,7 @@ TEST_F(CuckooBuilderTest, WithCollisionPathFullKey) {
size_t bucket_size = keys[0].size() + values[0].size(); size_t bucket_size = keys[0].size() + values[0].size();
ASSERT_EQ(expected_table_size * bucket_size - 1, builder.FileSize()); ASSERT_EQ(expected_table_size * bucket_size - 1, builder.FileSize());
ASSERT_OK(builder.Finish()); ASSERT_OK(builder.Finish());
ASSERT_OK(file_writer->Close()); ASSERT_OK(file_writer->Close(IOOptions()));
ASSERT_LE(expected_table_size * bucket_size, builder.FileSize()); ASSERT_LE(expected_table_size * bucket_size, builder.FileSize());
std::string expected_unused_bucket = GetInternalKey("key00", true); std::string expected_unused_bucket = GetInternalKey("key00", true);
@ -420,7 +420,7 @@ TEST_F(CuckooBuilderTest, WithCollisionPathFullKeyAndCuckooBlock) {
size_t bucket_size = keys[0].size() + values[0].size(); size_t bucket_size = keys[0].size() + values[0].size();
ASSERT_EQ(expected_table_size * bucket_size - 1, builder.FileSize()); ASSERT_EQ(expected_table_size * bucket_size - 1, builder.FileSize());
ASSERT_OK(builder.Finish()); ASSERT_OK(builder.Finish());
ASSERT_OK(file_writer->Close()); ASSERT_OK(file_writer->Close(IOOptions()));
ASSERT_LE(expected_table_size * bucket_size, builder.FileSize()); ASSERT_LE(expected_table_size * bucket_size, builder.FileSize());
std::string expected_unused_bucket = GetInternalKey("key00", true); std::string expected_unused_bucket = GetInternalKey("key00", true);
@ -463,7 +463,7 @@ TEST_F(CuckooBuilderTest, WriteSuccessNoCollisionUserKey) {
size_t bucket_size = user_keys[0].size() + values[0].size(); size_t bucket_size = user_keys[0].size() + values[0].size();
ASSERT_EQ(expected_table_size * bucket_size - 1, builder.FileSize()); ASSERT_EQ(expected_table_size * bucket_size - 1, builder.FileSize());
ASSERT_OK(builder.Finish()); ASSERT_OK(builder.Finish());
ASSERT_OK(file_writer->Close()); ASSERT_OK(file_writer->Close(IOOptions()));
ASSERT_LE(expected_table_size * bucket_size, builder.FileSize()); ASSERT_LE(expected_table_size * bucket_size, builder.FileSize());
std::string expected_unused_bucket = "key00"; std::string expected_unused_bucket = "key00";
@ -507,7 +507,7 @@ TEST_F(CuckooBuilderTest, WriteSuccessWithCollisionUserKey) {
size_t bucket_size = user_keys[0].size() + values[0].size(); size_t bucket_size = user_keys[0].size() + values[0].size();
ASSERT_EQ(expected_table_size * bucket_size - 1, builder.FileSize()); ASSERT_EQ(expected_table_size * bucket_size - 1, builder.FileSize());
ASSERT_OK(builder.Finish()); ASSERT_OK(builder.Finish());
ASSERT_OK(file_writer->Close()); ASSERT_OK(file_writer->Close(IOOptions()));
ASSERT_LE(expected_table_size * bucket_size, builder.FileSize()); ASSERT_LE(expected_table_size * bucket_size, builder.FileSize());
std::string expected_unused_bucket = "key00"; std::string expected_unused_bucket = "key00";
@ -550,7 +550,7 @@ TEST_F(CuckooBuilderTest, WithCollisionPathUserKey) {
size_t bucket_size = user_keys[0].size() + values[0].size(); size_t bucket_size = user_keys[0].size() + values[0].size();
ASSERT_EQ(expected_table_size * bucket_size - 1, builder.FileSize()); ASSERT_EQ(expected_table_size * bucket_size - 1, builder.FileSize());
ASSERT_OK(builder.Finish()); ASSERT_OK(builder.Finish());
ASSERT_OK(file_writer->Close()); ASSERT_OK(file_writer->Close(IOOptions()));
ASSERT_LE(expected_table_size * bucket_size, builder.FileSize()); ASSERT_LE(expected_table_size * bucket_size, builder.FileSize());
std::string expected_unused_bucket = "key00"; std::string expected_unused_bucket = "key00";
@ -589,7 +589,7 @@ TEST_F(CuckooBuilderTest, FailWhenCollisionPathTooLong) {
ASSERT_OK(builder.status()); ASSERT_OK(builder.status());
} }
ASSERT_TRUE(builder.Finish().IsNotSupported()); ASSERT_TRUE(builder.Finish().IsNotSupported());
ASSERT_OK(file_writer->Close()); ASSERT_OK(file_writer->Close(IOOptions()));
} }
TEST_F(CuckooBuilderTest, FailWhenSameKeyInserted) { TEST_F(CuckooBuilderTest, FailWhenSameKeyInserted) {
@ -619,7 +619,7 @@ TEST_F(CuckooBuilderTest, FailWhenSameKeyInserted) {
ASSERT_OK(builder.status()); ASSERT_OK(builder.status());
ASSERT_TRUE(builder.Finish().IsNotSupported()); ASSERT_TRUE(builder.Finish().IsNotSupported());
ASSERT_OK(file_writer->Close()); ASSERT_OK(file_writer->Close(IOOptions()));
} }
} // namespace ROCKSDB_NAMESPACE } // namespace ROCKSDB_NAMESPACE

@ -59,7 +59,7 @@ CuckooTableReader::CuckooTableReader(
} }
{ {
std::unique_ptr<TableProperties> props; std::unique_ptr<TableProperties> props;
// TODO: plumb Env::IOActivity // TODO: plumb Env::IOActivity, Env::IOPriority
const ReadOptions read_options; const ReadOptions read_options;
status_ = status_ =
ReadTableProperties(file_.get(), file_size, kCuckooTableMagicNumber, ReadTableProperties(file_.get(), file_size, kCuckooTableMagicNumber,

@ -104,7 +104,7 @@ class CuckooReaderTest : public testing::Test {
ASSERT_OK(builder.Finish()); ASSERT_OK(builder.Finish());
ASSERT_EQ(num_items, builder.NumEntries()); ASSERT_EQ(num_items, builder.NumEntries());
file_size = builder.FileSize(); file_size = builder.FileSize();
ASSERT_OK(file_writer->Close()); ASSERT_OK(file_writer->Close(IOOptions()));
// Check reader now. // Check reader now.
std::unique_ptr<RandomAccessFileReader> file_reader; std::unique_ptr<RandomAccessFileReader> file_reader;
@ -431,7 +431,7 @@ void WriteFile(const std::vector<std::string>& keys, const uint64_t num,
} }
ASSERT_OK(builder.Finish()); ASSERT_OK(builder.Finish());
ASSERT_EQ(num, builder.NumEntries()); ASSERT_EQ(num, builder.NumEntries());
ASSERT_OK(file_writer->Close()); ASSERT_OK(file_writer->Close(IOOptions()));
uint64_t file_size; uint64_t file_size;
ASSERT_OK( ASSERT_OK(
@ -571,4 +571,3 @@ int main(int argc, char** argv) {
} }
#endif // GFLAGS. #endif // GFLAGS.

@ -298,7 +298,7 @@ Status MockTableFactory::GetAndWriteNextID(WritableFileWriter* file,
*next_id = next_id_.fetch_add(1); *next_id = next_id_.fetch_add(1);
char buf[4]; char buf[4];
EncodeFixed32(buf, *next_id); EncodeFixed32(buf, *next_id);
return file->Append(Slice(buf, 4)); return file->Append(IOOptions(), Slice(buf, 4));
} }
Status MockTableFactory::GetIDFromFile(RandomAccessFileReader* file, Status MockTableFactory::GetIDFromFile(RandomAccessFileReader* file,

@ -39,7 +39,7 @@ IOStatus WriteBlock(const Slice& block_contents, WritableFileWriter* file,
uint64_t* offset, BlockHandle* block_handle) { uint64_t* offset, BlockHandle* block_handle) {
block_handle->set_offset(*offset); block_handle->set_offset(*offset);
block_handle->set_size(block_contents.size()); block_handle->set_size(block_contents.size());
IOStatus io_s = file->Append(block_contents); IOStatus io_s = file->Append(IOOptions(), block_contents);
if (io_s.ok()) { if (io_s.ok()) {
*offset += block_contents.size(); *offset += block_contents.size();
@ -138,6 +138,7 @@ void PlainTableBuilder::Add(const Slice& key, const Slice& value) {
// temp buffer for metadata bytes between key and value. // temp buffer for metadata bytes between key and value.
char meta_bytes_buf[6]; char meta_bytes_buf[6];
size_t meta_bytes_buf_size = 0; size_t meta_bytes_buf_size = 0;
const IOOptions opts;
ParsedInternalKey internal_key; ParsedInternalKey internal_key;
if (!ParseInternalKey(key, &internal_key, false /* log_err_key */) if (!ParseInternalKey(key, &internal_key, false /* log_err_key */)
@ -178,12 +179,13 @@ void PlainTableBuilder::Add(const Slice& key, const Slice& value) {
EncodeVarint32(meta_bytes_buf + meta_bytes_buf_size, value_size); EncodeVarint32(meta_bytes_buf + meta_bytes_buf_size, value_size);
assert(end_ptr <= meta_bytes_buf + sizeof(meta_bytes_buf)); assert(end_ptr <= meta_bytes_buf + sizeof(meta_bytes_buf));
meta_bytes_buf_size = end_ptr - meta_bytes_buf; meta_bytes_buf_size = end_ptr - meta_bytes_buf;
io_status_ = file_->Append(Slice(meta_bytes_buf, meta_bytes_buf_size)); io_status_ =
file_->Append(opts, Slice(meta_bytes_buf, meta_bytes_buf_size));
} }
// Write value // Write value
if (io_status_.ok()) { if (io_status_.ok()) {
io_status_ = file_->Append(value); io_status_ = file_->Append(opts, value);
offset_ += value_size + meta_bytes_buf_size; offset_ += value_size + meta_bytes_buf_size;
} }
@ -306,7 +308,7 @@ Status PlainTableBuilder::Finish() {
status_ = s; status_ = s;
return status_; return status_;
} }
io_status_ = file_->Append(footer.GetSlice()); io_status_ = file_->Append(IOOptions(), footer.GetSlice());
if (io_status_.ok()) { if (io_status_.ok()) {
offset_ += footer.GetSlice().size(); offset_ += footer.GetSlice().size();
} }

@ -94,6 +94,8 @@ IOStatus PlainTableKeyEncoder::AppendKey(const Slice& key,
Slice key_to_write = key; // Portion of internal key to write out. Slice key_to_write = key; // Portion of internal key to write out.
uint32_t user_key_size = static_cast<uint32_t>(key.size() - 8); uint32_t user_key_size = static_cast<uint32_t>(key.size() - 8);
const IOOptions opts;
if (encoding_type_ == kPlain) { if (encoding_type_ == kPlain) {
if (fixed_user_key_len_ == kPlainTableVariableLength) { if (fixed_user_key_len_ == kPlainTableVariableLength) {
// Write key length // Write key length
@ -101,7 +103,7 @@ IOStatus PlainTableKeyEncoder::AppendKey(const Slice& key,
char* ptr = EncodeVarint32(key_size_buf, user_key_size); char* ptr = EncodeVarint32(key_size_buf, user_key_size);
assert(ptr <= key_size_buf + sizeof(key_size_buf)); assert(ptr <= key_size_buf + sizeof(key_size_buf));
auto len = ptr - key_size_buf; auto len = ptr - key_size_buf;
IOStatus io_s = file->Append(Slice(key_size_buf, len)); IOStatus io_s = file->Append(opts, Slice(key_size_buf, len));
if (!io_s.ok()) { if (!io_s.ok()) {
return io_s; return io_s;
} }
@ -119,7 +121,7 @@ IOStatus PlainTableKeyEncoder::AppendKey(const Slice& key,
key_count_for_prefix_ = 1; key_count_for_prefix_ = 1;
pre_prefix_.SetUserKey(prefix); pre_prefix_.SetUserKey(prefix);
size_bytes_pos += EncodeSize(kFullKey, user_key_size, size_bytes); size_bytes_pos += EncodeSize(kFullKey, user_key_size, size_bytes);
IOStatus io_s = file->Append(Slice(size_bytes, size_bytes_pos)); IOStatus io_s = file->Append(opts, Slice(size_bytes, size_bytes_pos));
if (!io_s.ok()) { if (!io_s.ok()) {
return io_s; return io_s;
} }
@ -137,7 +139,7 @@ IOStatus PlainTableKeyEncoder::AppendKey(const Slice& key,
static_cast<uint32_t>(pre_prefix_.GetUserKey().size()); static_cast<uint32_t>(pre_prefix_.GetUserKey().size());
size_bytes_pos += EncodeSize(kKeySuffix, user_key_size - prefix_len, size_bytes_pos += EncodeSize(kKeySuffix, user_key_size - prefix_len,
size_bytes + size_bytes_pos); size_bytes + size_bytes_pos);
IOStatus io_s = file->Append(Slice(size_bytes, size_bytes_pos)); IOStatus io_s = file->Append(opts, Slice(size_bytes, size_bytes_pos));
if (!io_s.ok()) { if (!io_s.ok()) {
return io_s; return io_s;
} }
@ -152,7 +154,7 @@ IOStatus PlainTableKeyEncoder::AppendKey(const Slice& key,
// in this buffer to safe one file append call, which takes 1 byte. // in this buffer to safe one file append call, which takes 1 byte.
if (parsed_key.sequence == 0 && parsed_key.type == kTypeValue) { if (parsed_key.sequence == 0 && parsed_key.type == kTypeValue) {
IOStatus io_s = IOStatus io_s =
file->Append(Slice(key_to_write.data(), key_to_write.size() - 8)); file->Append(opts, Slice(key_to_write.data(), key_to_write.size() - 8));
if (!io_s.ok()) { if (!io_s.ok()) {
return io_s; return io_s;
} }
@ -160,7 +162,7 @@ IOStatus PlainTableKeyEncoder::AppendKey(const Slice& key,
meta_bytes_buf[*meta_bytes_buf_size] = PlainTableFactory::kValueTypeSeqId0; meta_bytes_buf[*meta_bytes_buf_size] = PlainTableFactory::kValueTypeSeqId0;
*meta_bytes_buf_size += 1; *meta_bytes_buf_size += 1;
} else { } else {
IOStatus io_s = file->Append(key_to_write); IOStatus io_s = file->Append(opts, key_to_write);
if (!io_s.ok()) { if (!io_s.ok()) {
return io_s; return io_s;
} }

@ -126,7 +126,7 @@ Status PlainTableReader::Open(
} }
std::unique_ptr<TableProperties> props; std::unique_ptr<TableProperties> props;
// TODO: plumb Env::IOActivity // TODO: plumb Env::IOActivity, Env::IOPriority
const ReadOptions read_options; const ReadOptions read_options;
auto s = ReadTableProperties(file.get(), file_size, kPlainTableMagicNumber, auto s = ReadTableProperties(file.get(), file_size, kPlainTableMagicNumber,
ioptions, read_options, &props); ioptions, read_options, &props);
@ -300,7 +300,7 @@ Status PlainTableReader::PopulateIndex(TableProperties* props,
BlockContents index_block_contents; BlockContents index_block_contents;
// TODO: plumb Env::IOActivity // TODO: plumb Env::IOActivity, Env::IOPriority
const ReadOptions read_options; const ReadOptions read_options;
Status s = Status s =
ReadMetaBlock(file_info_.file.get(), nullptr /* prefetch_buffer */, ReadMetaBlock(file_info_.file.get(), nullptr /* prefetch_buffer */,

@ -58,6 +58,7 @@ SstFileDumper::SstFileDumper(const Options& options,
options_(options), options_(options),
ioptions_(options_), ioptions_(options_),
moptions_(ColumnFamilyOptions(options_)), moptions_(ColumnFamilyOptions(options_)),
// TODO: plumb Env::IOActivity, Env::IOPriority
read_options_(verify_checksum, false), read_options_(verify_checksum, false),
internal_comparator_(BytewiseComparator()) { internal_comparator_(BytewiseComparator()) {
read_options_.readahead_size = readahead_size; read_options_.readahead_size = readahead_size;
@ -303,14 +304,18 @@ Status SstFileDumper::ShowCompressionSize(
const ImmutableOptions imoptions(opts); const ImmutableOptions imoptions(opts);
const ColumnFamilyOptions cfo(opts); const ColumnFamilyOptions cfo(opts);
const MutableCFOptions moptions(cfo); const MutableCFOptions moptions(cfo);
// TODO: plumb Env::IOActivity, Env::IOPriority
const ReadOptions read_options;
const WriteOptions write_options;
ROCKSDB_NAMESPACE::InternalKeyComparator ikc(opts.comparator); ROCKSDB_NAMESPACE::InternalKeyComparator ikc(opts.comparator);
IntTblPropCollectorFactories block_based_table_factories; IntTblPropCollectorFactories block_based_table_factories;
std::string column_family_name; std::string column_family_name;
int unknown_level = -1; int unknown_level = -1;
TableBuilderOptions tb_opts( TableBuilderOptions tb_opts(
imoptions, moptions, ikc, &block_based_table_factories, compress_type, imoptions, moptions, read_options, write_options, ikc,
compress_opt, &block_based_table_factories, compress_type, compress_opt,
TablePropertiesCollectorFactory::Context::kUnknownColumnFamily, TablePropertiesCollectorFactory::Context::kUnknownColumnFamily,
column_family_name, unknown_level); column_family_name, unknown_level);
uint64_t num_data_blocks = 0; uint64_t num_data_blocks = 0;
@ -375,10 +380,8 @@ Status SstFileDumper::ReadTableProperties(uint64_t table_magic_number,
RandomAccessFileReader* file, RandomAccessFileReader* file,
uint64_t file_size, uint64_t file_size,
FilePrefetchBuffer* prefetch_buffer) { FilePrefetchBuffer* prefetch_buffer) {
// TODO: plumb Env::IOActivity
const ReadOptions read_options;
Status s = ROCKSDB_NAMESPACE::ReadTableProperties( Status s = ROCKSDB_NAMESPACE::ReadTableProperties(
file, file_size, table_magic_number, ioptions_, read_options, file, file_size, table_magic_number, ioptions_, read_options_,
&table_properties_, &table_properties_,
/* memory_allocator= */ nullptr, prefetch_buffer); /* memory_allocator= */ nullptr, prefetch_buffer);
if (!s.ok()) { if (!s.ok()) {

@ -41,7 +41,11 @@ struct SstFileWriter::Rep {
cfh(_cfh), cfh(_cfh),
invalidate_page_cache(_invalidate_page_cache), invalidate_page_cache(_invalidate_page_cache),
skip_filters(_skip_filters), skip_filters(_skip_filters),
db_session_id(_db_session_id) {} db_session_id(_db_session_id) {
// TODO (hx235): pass in `WriteOptions` instead of `rate_limiter_priority`
// during construction
write_options.rate_limiter_priority = io_priority;
}
std::unique_ptr<WritableFileWriter> file_writer; std::unique_ptr<WritableFileWriter> file_writer;
std::unique_ptr<TableBuilder> builder; std::unique_ptr<TableBuilder> builder;
@ -49,6 +53,7 @@ struct SstFileWriter::Rep {
ImmutableOptions ioptions; ImmutableOptions ioptions;
MutableCFOptions mutable_cf_options; MutableCFOptions mutable_cf_options;
Env::IOPriority io_priority; Env::IOPriority io_priority;
WriteOptions write_options;
InternalKeyComparator internal_comparator; InternalKeyComparator internal_comparator;
ExternalSstFileInfo file_info; ExternalSstFileInfo file_info;
InternalKey ikey; InternalKey ikey;
@ -343,13 +348,15 @@ Status SstFileWriter::Open(const std::string& file_path) {
// TODO: it would be better to set oldest_key_time to be used for getting the // TODO: it would be better to set oldest_key_time to be used for getting the
// approximate time of ingested keys. // approximate time of ingested keys.
// TODO: plumb Env::IOActivity, Env::IOPriority
TableBuilderOptions table_builder_options( TableBuilderOptions table_builder_options(
r->ioptions, r->mutable_cf_options, r->internal_comparator, r->ioptions, r->mutable_cf_options, ReadOptions(), r->write_options,
&int_tbl_prop_collector_factories, compression_type, compression_opts, r->internal_comparator, &int_tbl_prop_collector_factories,
cf_id, r->column_family_name, unknown_level, false /* is_bottommost */, compression_type, compression_opts, cf_id, r->column_family_name,
TableFileCreationReason::kMisc, 0 /* oldest_key_time */, unknown_level, false /* is_bottommost */, TableFileCreationReason::kMisc,
0 /* file_creation_time */, "SST Writer" /* db_id */, r->db_session_id, 0 /* oldest_key_time */, 0 /* file_creation_time */,
0 /* target_file_size */, r->next_file_number); "SST Writer" /* db_id */, r->db_session_id, 0 /* target_file_size */,
r->next_file_number);
// External SST files used to each get a unique session id. Now for // External SST files used to each get a unique session id. Now for
// slightly better uniqueness probability in constructing cache keys, we // slightly better uniqueness probability in constructing cache keys, we
// assign fake file numbers to each file (into table properties) and keep // assign fake file numbers to each file (into table properties) and keep
@ -361,8 +368,8 @@ Status SstFileWriter::Open(const std::string& file_path) {
FileTypeSet tmp_set = r->ioptions.checksum_handoff_file_types; FileTypeSet tmp_set = r->ioptions.checksum_handoff_file_types;
r->file_writer.reset(new WritableFileWriter( r->file_writer.reset(new WritableFileWriter(
std::move(sst_file), file_path, r->env_options, r->ioptions.clock, std::move(sst_file), file_path, r->env_options, r->ioptions.clock,
nullptr /* io_tracer */, nullptr /* stats */, r->ioptions.listeners, nullptr /* io_tracer */, r->ioptions.stats, Histograms::SST_WRITE_MICROS,
r->ioptions.file_checksum_gen_factory.get(), r->ioptions.listeners, r->ioptions.file_checksum_gen_factory.get(),
tmp_set.Contains(FileType::kTableFile), false)); tmp_set.Contains(FileType::kTableFile), false));
// TODO(tec) : If table_factory is using compressed block cache, we will // TODO(tec) : If table_factory is using compressed block cache, we will
@ -430,11 +437,13 @@ Status SstFileWriter::Finish(ExternalSstFileInfo* file_info) {
Status s = r->builder->Finish(); Status s = r->builder->Finish();
r->file_info.file_size = r->builder->FileSize(); r->file_info.file_size = r->builder->FileSize();
IOOptions opts;
s = WritableFileWriter::PrepareIOOptions(r->write_options, opts);
if (s.ok()) { if (s.ok()) {
s = r->file_writer->Sync(r->ioptions.use_fsync); s = r->file_writer->Sync(opts, r->ioptions.use_fsync);
r->InvalidatePageCache(true /* closing */).PermitUncheckedError(); r->InvalidatePageCache(true /* closing */).PermitUncheckedError();
if (s.ok()) { if (s.ok()) {
s = r->file_writer->Close(); s = r->file_writer->Close(opts);
} }
} }
if (s.ok()) { if (s.ok()) {
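
Note on the `SstFileWriter::Finish()` hunk above: the new code converts the stored `WriteOptions` into an `IOOptions` once and then passes that same object to both `Sync()` and `Close()`. The following is only a minimal sketch of that pattern, assuming simplified stand-in types. Every `Sketch*` name, `ToIOOptions`, and `FinishSketch` are hypothetical and are not the RocksDB definitions or signatures (the real `WritableFileWriter::PrepareIOOptions` returns a status and may do more than copy two fields).

```cpp
#include <cstdint>

// Hypothetical stand-ins for illustration only; NOT the RocksDB types.
enum class SketchIOActivity : uint8_t { kFlush, kCompaction, kDBOpen, kUnknown };
enum class SketchIOPriority : uint8_t { kLow, kHigh, kTotal };

struct SketchWriteOptions {
  SketchIOPriority rate_limiter_priority = SketchIOPriority::kTotal;
  SketchIOActivity io_activity = SketchIOActivity::kUnknown;
};

struct SketchIOOptions {
  SketchIOPriority rate_limiter_priority = SketchIOPriority::kTotal;
  SketchIOActivity io_activity = SketchIOActivity::kUnknown;
};

// Copies the tagging fields once so that every later file call can reuse them.
SketchIOOptions ToIOOptions(const SketchWriteOptions& write_options) {
  SketchIOOptions opts;
  opts.rate_limiter_priority = write_options.rate_limiter_priority;
  opts.io_activity = write_options.io_activity;
  return opts;
}

// Stand-in file writer that records the tag it was last called with.
struct SketchFileWriter {
  SketchIOActivity last_activity = SketchIOActivity::kUnknown;
  int calls = 0;

  bool Sync(const SketchIOOptions& opts) {
    last_activity = opts.io_activity;
    ++calls;
    return true;
  }
  bool Close(const SketchIOOptions& opts) {
    last_activity = opts.io_activity;
    ++calls;
    return true;
  }
};

// Same shape as the hunk: convert once, then Sync and Close with the same tag.
bool FinishSketch(SketchFileWriter& writer,
                  const SketchWriteOptions& write_options) {
  const SketchIOOptions opts = ToIOOptions(write_options);
  return writer.Sync(opts) && writer.Close(opts);
}
```

Converting once keeps the sync and the close attributed to the same activity and priority, which is what lets a per-activity histogram group them together.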

@ -102,6 +102,7 @@ struct TableReaderOptions {
struct TableBuilderOptions { struct TableBuilderOptions {
TableBuilderOptions( TableBuilderOptions(
const ImmutableOptions& _ioptions, const MutableCFOptions& _moptions, const ImmutableOptions& _ioptions, const MutableCFOptions& _moptions,
const ReadOptions& _read_options, const WriteOptions& _write_options,
const InternalKeyComparator& _internal_comparator, const InternalKeyComparator& _internal_comparator,
const IntTblPropCollectorFactories* _int_tbl_prop_collector_factories, const IntTblPropCollectorFactories* _int_tbl_prop_collector_factories,
CompressionType _compression_type, CompressionType _compression_type,
@ -115,6 +116,8 @@ struct TableBuilderOptions {
const uint64_t _target_file_size = 0, const uint64_t _cur_file_num = 0) const uint64_t _target_file_size = 0, const uint64_t _cur_file_num = 0)
: ioptions(_ioptions), : ioptions(_ioptions),
moptions(_moptions), moptions(_moptions),
read_options(_read_options),
write_options(_write_options),
internal_comparator(_internal_comparator), internal_comparator(_internal_comparator),
int_tbl_prop_collector_factories(_int_tbl_prop_collector_factories), int_tbl_prop_collector_factories(_int_tbl_prop_collector_factories),
compression_type(_compression_type), compression_type(_compression_type),
@ -133,6 +136,8 @@ struct TableBuilderOptions {
const ImmutableOptions& ioptions; const ImmutableOptions& ioptions;
const MutableCFOptions& moptions; const MutableCFOptions& moptions;
const ReadOptions& read_options;
const WriteOptions& write_options;
const InternalKeyComparator& internal_comparator; const InternalKeyComparator& internal_comparator;
const IntTblPropCollectorFactories* int_tbl_prop_collector_factories; const IntTblPropCollectorFactories* int_tbl_prop_collector_factories;
const CompressionType compression_type; const CompressionType compression_type;
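
Note on the `TableBuilderOptions` hunks above: the two new members are declared as `const ReadOptions&` and `const WriteOptions&`, so the struct refers to the caller's objects instead of copying them. The test call sites in this diff declare `read_options` and `write_options` as named locals before constructing the options. A minimal sketch of that usage follows, assuming simplified stand-in types. All `Demo*` names and both helper functions are hypothetical and are not RocksDB code.

```cpp
// Hypothetical stand-ins for illustration only; NOT the RocksDB structs.
struct DemoReadOptions {
  bool verify_checksums = true;
};
struct DemoWriteOptions {
  int rate_limiter_priority = 0;
};

// Holds references, not copies, like the members added in the hunk above.
struct DemoBuilderOptions {
  DemoBuilderOptions(const DemoReadOptions& ro, const DemoWriteOptions& wo)
      : read_options(ro), write_options(wo) {}

  const DemoReadOptions& read_options;
  const DemoWriteOptions& write_options;
};

// A consumer reads the priority through the stored reference.
int PriorityUsedByBuilder(const DemoBuilderOptions& tbo) {
  return tbo.write_options.rate_limiter_priority;
}

// Named locals stay alive for as long as `tbo` is used in this scope.
int BuildWithNamedLocals(int priority) {
  DemoReadOptions read_options;
  DemoWriteOptions write_options;
  write_options.rate_limiter_priority = priority;
  DemoBuilderOptions tbo(read_options, write_options);
  return PriorityUsedByBuilder(tbo);
}
```

Because the members are references, the options objects need to outlive whatever reads through the struct. Declaring them as locals in the same scope as the builder is the simplest way to arrange that in a sketch like this one.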

@ -98,11 +98,13 @@ void TableReaderBenchmark(Options& opts, EnvOptions& env_options,
IntTblPropCollectorFactories int_tbl_prop_collector_factories; IntTblPropCollectorFactories int_tbl_prop_collector_factories;
int unknown_level = -1; int unknown_level = -1;
const WriteOptions write_options;
tb = opts.table_factory->NewTableBuilder( tb = opts.table_factory->NewTableBuilder(
TableBuilderOptions( TableBuilderOptions(ioptions, moptions, read_options, write_options,
ioptions, moptions, ikc, &int_tbl_prop_collector_factories, ikc, &int_tbl_prop_collector_factories,
CompressionType::kNoCompression, CompressionOptions(), CompressionType::kNoCompression,
0 /* column_family_id */, kDefaultColumnFamilyName, unknown_level), CompressionOptions(), 0 /* column_family_id */,
kDefaultColumnFamilyName, unknown_level),
file_writer.get()); file_writer.get());
} else { } else {
s = DB::Open(opts, dbname, &db); s = DB::Open(opts, dbname, &db);
@ -122,7 +124,7 @@ void TableReaderBenchmark(Options& opts, EnvOptions& env_options,
} }
if (!through_db) { if (!through_db) {
tb->Finish(); tb->Finish();
file_writer->Close(); file_writer->Close(IOOptions());
} else { } else {
db->Flush(FlushOptions()); db->Flush(FlushOptions());
} }

@ -383,8 +383,11 @@ class TableConstructor : public Constructor {
} }
std::string column_family_name; std::string column_family_name;
const ReadOptions read_options;
const WriteOptions write_options;
builder.reset(ioptions.table_factory->NewTableBuilder( builder.reset(ioptions.table_factory->NewTableBuilder(
TableBuilderOptions(ioptions, moptions, internal_comparator, TableBuilderOptions(ioptions, moptions, read_options, write_options,
internal_comparator,
&int_tbl_prop_collector_factories, &int_tbl_prop_collector_factories,
options.compression, options.compression_opts, options.compression, options.compression_opts,
kUnknownColumnFamily, column_family_name, level_), kUnknownColumnFamily, column_family_name, level_),
@ -402,7 +405,7 @@ class TableConstructor : public Constructor {
EXPECT_OK(builder->status()); EXPECT_OK(builder->status());
} }
Status s = builder->Finish(); Status s = builder->Finish();
EXPECT_OK(file_writer_->Flush()); EXPECT_OK(file_writer_->Flush(IOOptions()));
EXPECT_TRUE(s.ok()) << s.ToString(); EXPECT_TRUE(s.ok()) << s.ToString();
EXPECT_EQ(TEST_GetSink()->contents().size(), builder->FileSize()); EXPECT_EQ(TEST_GetSink()->contents().size(), builder->FileSize());
@ -1309,7 +1312,7 @@ class FileChecksumTestHelper {
EXPECT_TRUE(table_builder_->status().ok()); EXPECT_TRUE(table_builder_->status().ok());
} }
Status s = table_builder_->Finish(); Status s = table_builder_->Finish();
EXPECT_OK(file_writer_->Flush()); EXPECT_OK(file_writer_->Flush(IOOptions()));
EXPECT_OK(s); EXPECT_OK(s);
EXPECT_EQ(sink_->contents().size(), table_builder_->FileSize()); EXPECT_EQ(sink_->contents().size(), table_builder_->FileSize());
@ -1317,7 +1320,7 @@ class FileChecksumTestHelper {
} }
std::string GetFileChecksum() { std::string GetFileChecksum() {
EXPECT_OK(file_writer_->Close()); EXPECT_OK(file_writer_->Close(IOOptions()));
return table_builder_->GetFileChecksum(); return table_builder_->GetFileChecksum();
} }
@ -4466,9 +4469,11 @@ TEST_P(BlockBasedTableTest, NoFileChecksum) {
FileChecksumTestHelper f(true); FileChecksumTestHelper f(true);
f.CreateWritableFile(); f.CreateWritableFile();
std::unique_ptr<TableBuilder> builder; std::unique_ptr<TableBuilder> builder;
const ReadOptions read_options;
const WriteOptions write_options;
builder.reset(ioptions.table_factory->NewTableBuilder( builder.reset(ioptions.table_factory->NewTableBuilder(
TableBuilderOptions(ioptions, moptions, *comparator, TableBuilderOptions(ioptions, moptions, read_options, write_options,
&int_tbl_prop_collector_factories, *comparator, &int_tbl_prop_collector_factories,
options.compression, options.compression_opts, options.compression, options.compression_opts,
kUnknownColumnFamily, column_family_name, level), kUnknownColumnFamily, column_family_name, level),
f.GetFileWriter())); f.GetFileWriter()));
@ -4502,9 +4507,11 @@ TEST_P(BlockBasedTableTest, Crc32cFileChecksum) {
f.CreateWritableFile(); f.CreateWritableFile();
f.SetFileChecksumGenerator(checksum_crc32c_gen1.release()); f.SetFileChecksumGenerator(checksum_crc32c_gen1.release());
std::unique_ptr<TableBuilder> builder; std::unique_ptr<TableBuilder> builder;
const ReadOptions read_options;
const WriteOptions write_options;
builder.reset(ioptions.table_factory->NewTableBuilder( builder.reset(ioptions.table_factory->NewTableBuilder(
TableBuilderOptions(ioptions, moptions, *comparator, TableBuilderOptions(ioptions, moptions, read_options, write_options,
&int_tbl_prop_collector_factories, *comparator, &int_tbl_prop_collector_factories,
options.compression, options.compression_opts, options.compression, options.compression_opts,
kUnknownColumnFamily, column_family_name, level), kUnknownColumnFamily, column_family_name, level),
f.GetFileWriter())); f.GetFileWriter()));
@ -4548,8 +4555,10 @@ TEST_F(PlainTableTest, BasicPlainTableProperties) {
IntTblPropCollectorFactories int_tbl_prop_collector_factories; IntTblPropCollectorFactories int_tbl_prop_collector_factories;
std::string column_family_name; std::string column_family_name;
int unknown_level = -1; int unknown_level = -1;
const ReadOptions read_options;
const WriteOptions write_options;
std::unique_ptr<TableBuilder> builder(factory.NewTableBuilder( std::unique_ptr<TableBuilder> builder(factory.NewTableBuilder(
TableBuilderOptions(ioptions, moptions, ikc, TableBuilderOptions(ioptions, moptions, read_options, write_options, ikc,
&int_tbl_prop_collector_factories, kNoCompression, &int_tbl_prop_collector_factories, kNoCompression,
CompressionOptions(), kUnknownColumnFamily, CompressionOptions(), kUnknownColumnFamily,
column_family_name, unknown_level), column_family_name, unknown_level),
@ -4562,7 +4571,7 @@ TEST_F(PlainTableTest, BasicPlainTableProperties) {
builder->Add(key, value); builder->Add(key, value);
} }
ASSERT_OK(builder->Finish()); ASSERT_OK(builder->Finish());
ASSERT_OK(file_writer->Flush()); ASSERT_OK(file_writer->Flush(IOOptions()));
test::StringSink* ss = test::StringSink* ss =
static_cast<test::StringSink*>(file_writer->writable_file()); static_cast<test::StringSink*>(file_writer->writable_file());
@ -4572,7 +4581,6 @@ TEST_F(PlainTableTest, BasicPlainTableProperties) {
new RandomAccessFileReader(std::move(source), "test")); new RandomAccessFileReader(std::move(source), "test"));
std::unique_ptr<TableProperties> props; std::unique_ptr<TableProperties> props;
const ReadOptions read_options;
auto s = ReadTableProperties(file_reader.get(), ss->contents().size(), auto s = ReadTableProperties(file_reader.get(), ss->contents().size(),
kPlainTableMagicNumber, ioptions, read_options, kPlainTableMagicNumber, ioptions, read_options,
&props); &props);
@ -4602,9 +4610,10 @@ TEST_F(PlainTableTest, NoFileChecksum) {
int unknown_level = -1; int unknown_level = -1;
FileChecksumTestHelper f(true); FileChecksumTestHelper f(true);
f.CreateWritableFile(); f.CreateWritableFile();
const ReadOptions read_options;
const WriteOptions write_options;
std::unique_ptr<TableBuilder> builder(factory.NewTableBuilder( std::unique_ptr<TableBuilder> builder(factory.NewTableBuilder(
TableBuilderOptions(ioptions, moptions, ikc, TableBuilderOptions(ioptions, moptions, read_options, write_options, ikc,
&int_tbl_prop_collector_factories, kNoCompression, &int_tbl_prop_collector_factories, kNoCompression,
CompressionOptions(), kUnknownColumnFamily, CompressionOptions(), kUnknownColumnFamily,
column_family_name, unknown_level), column_family_name, unknown_level),
@ -4642,9 +4651,10 @@ TEST_F(PlainTableTest, Crc32cFileChecksum) {
FileChecksumTestHelper f(true); FileChecksumTestHelper f(true);
f.CreateWritableFile(); f.CreateWritableFile();
f.SetFileChecksumGenerator(checksum_crc32c_gen1.release()); f.SetFileChecksumGenerator(checksum_crc32c_gen1.release());
const ReadOptions read_options;
const WriteOptions write_options;
std::unique_ptr<TableBuilder> builder(factory.NewTableBuilder( std::unique_ptr<TableBuilder> builder(factory.NewTableBuilder(
TableBuilderOptions(ioptions, moptions, ikc, TableBuilderOptions(ioptions, moptions, read_options, write_options, ikc,
&int_tbl_prop_collector_factories, kNoCompression, &int_tbl_prop_collector_factories, kNoCompression,
CompressionOptions(), kUnknownColumnFamily, CompressionOptions(), kUnknownColumnFamily,
column_family_name, unknown_level), column_family_name, unknown_level),
@ -5252,8 +5262,10 @@ TEST_P(BlockBasedTableTest, DISABLED_TableWithGlobalSeqno) {
new SstFileWriterPropertiesCollectorFactory(2 /* version */, new SstFileWriterPropertiesCollectorFactory(2 /* version */,
0 /* global_seqno*/)); 0 /* global_seqno*/));
std::string column_family_name; std::string column_family_name;
const ReadOptions read_options;
const WriteOptions write_options;
std::unique_ptr<TableBuilder> builder(options.table_factory->NewTableBuilder( std::unique_ptr<TableBuilder> builder(options.table_factory->NewTableBuilder(
TableBuilderOptions(ioptions, moptions, ikc, TableBuilderOptions(ioptions, moptions, read_options, write_options, ikc,
&int_tbl_prop_collector_factories, kNoCompression, &int_tbl_prop_collector_factories, kNoCompression,
CompressionOptions(), kUnknownColumnFamily, CompressionOptions(), kUnknownColumnFamily,
column_family_name, -1), column_family_name, -1),
@ -5267,7 +5279,7 @@ TEST_P(BlockBasedTableTest, DISABLED_TableWithGlobalSeqno) {
builder->Add(ik.Encode(), value); builder->Add(ik.Encode(), value);
} }
ASSERT_OK(builder->Finish()); ASSERT_OK(builder->Finish());
ASSERT_OK(file_writer->Flush()); ASSERT_OK(file_writer->Flush(IOOptions()));
test::RandomRWStringSink ss_rw(sink); test::RandomRWStringSink ss_rw(sink);
uint32_t version; uint32_t version;
@ -5282,7 +5294,6 @@ TEST_P(BlockBasedTableTest, DISABLED_TableWithGlobalSeqno) {
new RandomAccessFileReader(std::move(source), "")); new RandomAccessFileReader(std::move(source), ""));
std::unique_ptr<TableProperties> props; std::unique_ptr<TableProperties> props;
const ReadOptions read_options;
ASSERT_OK(ReadTableProperties(file_reader.get(), ss_rw.contents().size(), ASSERT_OK(ReadTableProperties(file_reader.get(), ss_rw.contents().size(),
kBlockBasedTableMagicNumber, ioptions, kBlockBasedTableMagicNumber, ioptions,
read_options, &props)); read_options, &props));
@ -5306,7 +5317,6 @@ TEST_P(BlockBasedTableTest, DISABLED_TableWithGlobalSeqno) {
// Helper function to get the contents of the table InternalIterator // Helper function to get the contents of the table InternalIterator
std::unique_ptr<TableReader> table_reader; std::unique_ptr<TableReader> table_reader;
const ReadOptions read_options;
std::function<InternalIterator*()> GetTableInternalIter = [&]() { std::function<InternalIterator*()> GetTableInternalIter = [&]() {
std::unique_ptr<FSRandomAccessFile> source( std::unique_ptr<FSRandomAccessFile> source(
new test::StringSource(ss_rw.contents(), 73342, true)); new test::StringSource(ss_rw.contents(), 73342, true));
@ -5434,8 +5444,10 @@ TEST_P(BlockBasedTableTest, BlockAlignTest) {
InternalKeyComparator ikc(options.comparator); InternalKeyComparator ikc(options.comparator);
IntTblPropCollectorFactories int_tbl_prop_collector_factories; IntTblPropCollectorFactories int_tbl_prop_collector_factories;
std::string column_family_name; std::string column_family_name;
const ReadOptions read_options;
const WriteOptions write_options;
std::unique_ptr<TableBuilder> builder(options.table_factory->NewTableBuilder( std::unique_ptr<TableBuilder> builder(options.table_factory->NewTableBuilder(
TableBuilderOptions(ioptions, moptions, ikc, TableBuilderOptions(ioptions, moptions, read_options, write_options, ikc,
&int_tbl_prop_collector_factories, kNoCompression, &int_tbl_prop_collector_factories, kNoCompression,
CompressionOptions(), kUnknownColumnFamily, CompressionOptions(), kUnknownColumnFamily,
column_family_name, -1), column_family_name, -1),
@ -5451,7 +5463,7 @@ TEST_P(BlockBasedTableTest, BlockAlignTest) {
builder->Add(ik.Encode(), value); builder->Add(ik.Encode(), value);
} }
ASSERT_OK(builder->Finish()); ASSERT_OK(builder->Finish());
ASSERT_OK(file_writer->Flush()); ASSERT_OK(file_writer->Flush(IOOptions()));
std::unique_ptr<FSRandomAccessFile> source( std::unique_ptr<FSRandomAccessFile> source(
new test::StringSource(sink->contents(), 73342, false)); new test::StringSource(sink->contents(), 73342, false));
@ -5460,7 +5472,6 @@ TEST_P(BlockBasedTableTest, BlockAlignTest) {
// Helper function to get version, global_seqno, global_seqno_offset // Helper function to get version, global_seqno, global_seqno_offset
std::function<void()> VerifyBlockAlignment = [&]() { std::function<void()> VerifyBlockAlignment = [&]() {
std::unique_ptr<TableProperties> props; std::unique_ptr<TableProperties> props;
const ReadOptions read_options;
ASSERT_OK(ReadTableProperties(file_reader.get(), sink->contents().size(), ASSERT_OK(ReadTableProperties(file_reader.get(), sink->contents().size(),
kBlockBasedTableMagicNumber, ioptions, kBlockBasedTableMagicNumber, ioptions,
read_options, &props)); read_options, &props));
@ -5488,7 +5499,6 @@ TEST_P(BlockBasedTableTest, BlockAlignTest) {
0 /* block_protection_bytes_per_key */), 0 /* block_protection_bytes_per_key */),
std::move(file_reader), sink->contents().size(), &table_reader)); std::move(file_reader), sink->contents().size(), &table_reader));
ReadOptions read_options;
std::unique_ptr<InternalIterator> db_iter(table_reader->NewIterator( std::unique_ptr<InternalIterator> db_iter(table_reader->NewIterator(
read_options, moptions2.prefix_extractor.get(), /*arena=*/nullptr, read_options, moptions2.prefix_extractor.get(), /*arena=*/nullptr,
/*skip_filters=*/false, TableReaderCaller::kUncategorized)); /*skip_filters=*/false, TableReaderCaller::kUncategorized));
@ -5526,9 +5536,10 @@ TEST_P(BlockBasedTableTest, PropertiesBlockRestartPointTest) {
InternalKeyComparator ikc(options.comparator); InternalKeyComparator ikc(options.comparator);
IntTblPropCollectorFactories int_tbl_prop_collector_factories; IntTblPropCollectorFactories int_tbl_prop_collector_factories;
std::string column_family_name; std::string column_family_name;
const ReadOptions read_options;
const WriteOptions write_options;
std::unique_ptr<TableBuilder> builder(options.table_factory->NewTableBuilder( std::unique_ptr<TableBuilder> builder(options.table_factory->NewTableBuilder(
TableBuilderOptions(ioptions, moptions, ikc, TableBuilderOptions(ioptions, moptions, read_options, write_options, ikc,
&int_tbl_prop_collector_factories, kNoCompression, &int_tbl_prop_collector_factories, kNoCompression,
CompressionOptions(), kUnknownColumnFamily, CompressionOptions(), kUnknownColumnFamily,
column_family_name, -1), column_family_name, -1),
@ -5544,7 +5555,7 @@ TEST_P(BlockBasedTableTest, PropertiesBlockRestartPointTest) {
builder->Add(ik.Encode(), value); builder->Add(ik.Encode(), value);
} }
ASSERT_OK(builder->Finish()); ASSERT_OK(builder->Finish());
ASSERT_OK(file_writer->Flush()); ASSERT_OK(file_writer->Flush(IOOptions()));
std::unique_ptr<FSRandomAccessFile> source( std::unique_ptr<FSRandomAccessFile> source(
new test::StringSource(sink->contents(), 73342, true)); new test::StringSource(sink->contents(), 73342, true));
@ -5556,20 +5567,19 @@ TEST_P(BlockBasedTableTest, PropertiesBlockRestartPointTest) {
uint64_t file_size = sink->contents().size(); uint64_t file_size = sink->contents().size();
Footer footer; Footer footer;
IOOptions opts; ASSERT_OK(ReadFooterFromFile(IOOptions(), file, *FileSystem::Default(),
ASSERT_OK(ReadFooterFromFile(opts, file, *FileSystem::Default(),
nullptr /* prefetch_buffer */, file_size, nullptr /* prefetch_buffer */, file_size,
&footer, kBlockBasedTableMagicNumber)); &footer, kBlockBasedTableMagicNumber));
auto BlockFetchHelper = [&](const BlockHandle& handle, BlockType block_type, auto BlockFetchHelper = [&](const BlockHandle& handle, BlockType block_type,
BlockContents* contents) { BlockContents* contents) {
ReadOptions read_options; ReadOptions read_options_for_helper;
read_options.verify_checksums = false; read_options_for_helper.verify_checksums = false;
PersistentCacheOptions cache_options; PersistentCacheOptions cache_options;
BlockFetcher block_fetcher( BlockFetcher block_fetcher(
file, nullptr /* prefetch_buffer */, footer, read_options, handle, file, nullptr /* prefetch_buffer */, footer, read_options_for_helper,
contents, ioptions, false /* decompress */, handle, contents, ioptions, false /* decompress */,
false /*maybe_compressed*/, block_type, false /*maybe_compressed*/, block_type,
UncompressionDict::GetEmptyDict(), cache_options); UncompressionDict::GetEmptyDict(), cache_options);
@ -6117,12 +6127,15 @@ TEST_F(ChargeCompressionDictionaryBuildingBufferTest, Basic) {
InternalKeyComparator ikc(options.comparator); InternalKeyComparator ikc(options.comparator);
IntTblPropCollectorFactories int_tbl_prop_collector_factories; IntTblPropCollectorFactories int_tbl_prop_collector_factories;
const ReadOptions read_options;
const WriteOptions write_options;
std::unique_ptr<TableBuilder> builder( std::unique_ptr<TableBuilder> builder(
options.table_factory->NewTableBuilder( options.table_factory->NewTableBuilder(
TableBuilderOptions( TableBuilderOptions(ioptions, moptions, read_options, write_options,
ioptions, moptions, ikc, &int_tbl_prop_collector_factories, ikc, &int_tbl_prop_collector_factories,
kSnappyCompression, options.compression_opts, kSnappyCompression, options.compression_opts,
kUnknownColumnFamily, "test_cf", -1 /* level */), kUnknownColumnFamily, "test_cf",
-1 /* level */),
file_writer.get())); file_writer.get()));
std::string key1 = "key1"; std::string key1 = "key1";
@ -6193,8 +6206,10 @@ TEST_F(ChargeCompressionDictionaryBuildingBufferTest,
InternalKeyComparator ikc(options.comparator); InternalKeyComparator ikc(options.comparator);
IntTblPropCollectorFactories int_tbl_prop_collector_factories; IntTblPropCollectorFactories int_tbl_prop_collector_factories;
const ReadOptions read_options;
const WriteOptions write_options;
std::unique_ptr<TableBuilder> builder(options.table_factory->NewTableBuilder( std::unique_ptr<TableBuilder> builder(options.table_factory->NewTableBuilder(
TableBuilderOptions(ioptions, moptions, ikc, TableBuilderOptions(ioptions, moptions, read_options, write_options, ikc,
&int_tbl_prop_collector_factories, kSnappyCompression, &int_tbl_prop_collector_factories, kSnappyCompression,
options.compression_opts, kUnknownColumnFamily, options.compression_opts, kUnknownColumnFamily,
"test_cf", -1 /* level */), "test_cf", -1 /* level */),
@ -6278,8 +6293,10 @@ TEST_F(ChargeCompressionDictionaryBuildingBufferTest, BasicWithCacheFull) {
InternalKeyComparator ikc(options.comparator); InternalKeyComparator ikc(options.comparator);
IntTblPropCollectorFactories int_tbl_prop_collector_factories; IntTblPropCollectorFactories int_tbl_prop_collector_factories;
const ReadOptions read_options;
const WriteOptions write_options;
std::unique_ptr<TableBuilder> builder(options.table_factory->NewTableBuilder( std::unique_ptr<TableBuilder> builder(options.table_factory->NewTableBuilder(
TableBuilderOptions(ioptions, moptions, ikc, TableBuilderOptions(ioptions, moptions, read_options, write_options, ikc,
&int_tbl_prop_collector_factories, kSnappyCompression, &int_tbl_prop_collector_factories, kSnappyCompression,
options.compression_opts, kUnknownColumnFamily, options.compression_opts, kUnknownColumnFamily,
"test_cf", -1 /* level */), "test_cf", -1 /* level */),

@ -463,15 +463,16 @@ bool IsPrefetchSupported(const std::shared_ptr<FileSystem>& fs,
Random rnd(301); Random rnd(301);
std::string test_string = rnd.RandomString(4096); std::string test_string = rnd.RandomString(4096);
Slice data(test_string); Slice data(test_string);
Status s = WriteStringToFile(fs.get(), data, tmp, true); IOOptions opts;
Status s = WriteStringToFile(fs.get(), data, tmp, true, opts);
if (s.ok()) { if (s.ok()) {
std::unique_ptr<FSRandomAccessFile> file; std::unique_ptr<FSRandomAccessFile> file;
auto io_s = fs->NewRandomAccessFile(tmp, FileOptions(), &file, nullptr); auto io_s = fs->NewRandomAccessFile(tmp, FileOptions(), &file, nullptr);
if (io_s.ok()) { if (io_s.ok()) {
supported = !(file->Prefetch(0, data.size(), IOOptions(), nullptr) supported =
.IsNotSupported()); !(file->Prefetch(0, data.size(), opts, nullptr).IsNotSupported());
} }
s = fs->DeleteFile(tmp, IOOptions(), nullptr); s = fs->DeleteFile(tmp, opts, nullptr);
} }
return s.ok() && supported; return s.ok() && supported;
} }
@ -521,7 +522,7 @@ Status CorruptFile(Env* env, const std::string& fname, int offset,
for (int i = 0; i < bytes_to_corrupt; i++) { for (int i = 0; i < bytes_to_corrupt; i++) {
contents[i + offset] ^= 0x80; contents[i + offset] ^= 0x80;
} }
s = WriteStringToFile(env, contents, fname); s = WriteStringToFile(env, contents, fname, false /* should_sync */);
} }
if (s.ok() && verify_checksum) { if (s.ok() && verify_checksum) {
Options options; Options options;
@ -544,7 +545,7 @@ Status TruncateFile(Env* env, const std::string& fname, uint64_t new_length) {
s = ReadFileToString(env, fname, &contents); s = ReadFileToString(env, fname, &contents);
if (s.ok()) { if (s.ok()) {
contents.resize(static_cast<size_t>(new_length), 'b'); contents.resize(static_cast<size_t>(new_length), 'b');
s = WriteStringToFile(env, contents, fname); s = WriteStringToFile(env, contents, fname, false /* should_sync */);
} }
return s; return s;
} }

@ -130,7 +130,7 @@ namespace {} // namespace
TEST_F(DBBenchTest, OptionsFile) { TEST_F(DBBenchTest, OptionsFile) {
const std::string kOptionsFileName = test_path_ + "/OPTIONS_test"; const std::string kOptionsFileName = test_path_ + "/OPTIONS_test";
Options opt = GetDefaultOptions(); Options opt = GetDefaultOptions();
ASSERT_OK(PersistRocksDBOptions(DBOptions(opt), {"default"}, ASSERT_OK(PersistRocksDBOptions(WriteOptions(), DBOptions(opt), {"default"},
{ColumnFamilyOptions(opt)}, kOptionsFileName, {ColumnFamilyOptions(opt)}, kOptionsFileName,
opt.env->GetFileSystem().get())); opt.env->GetFileSystem().get()));
@ -149,7 +149,7 @@ TEST_F(DBBenchTest, OptionsFileUniversal) {
Options opt = GetDefaultOptions(kCompactionStyleUniversal, 1); Options opt = GetDefaultOptions(kCompactionStyleUniversal, 1);
ASSERT_OK(PersistRocksDBOptions(DBOptions(opt), {"default"}, ASSERT_OK(PersistRocksDBOptions(WriteOptions(), DBOptions(opt), {"default"},
{ColumnFamilyOptions(opt)}, kOptionsFileName, {ColumnFamilyOptions(opt)}, kOptionsFileName,
opt.env->GetFileSystem().get())); opt.env->GetFileSystem().get()));
@ -166,7 +166,7 @@ TEST_F(DBBenchTest, OptionsFileMultiLevelUniversal) {
Options opt = GetDefaultOptions(kCompactionStyleUniversal, 12); Options opt = GetDefaultOptions(kCompactionStyleUniversal, 12);
ASSERT_OK(PersistRocksDBOptions(DBOptions(opt), {"default"}, ASSERT_OK(PersistRocksDBOptions(WriteOptions(), DBOptions(opt), {"default"},
{ColumnFamilyOptions(opt)}, kOptionsFileName, {ColumnFamilyOptions(opt)}, kOptionsFileName,
opt.env->GetFileSystem().get())); opt.env->GetFileSystem().get()));

@ -4376,8 +4376,10 @@ UnsafeRemoveSstFileCommand::UnsafeRemoveSstFileCommand(
} }
void UnsafeRemoveSstFileCommand::DoCommand() { void UnsafeRemoveSstFileCommand::DoCommand() {
// TODO: plumb Env::IOActivity // TODO: plumb Env::IOActivity, Env::IOPriority
const ReadOptions read_options; const ReadOptions read_options;
const WriteOptions write_options;
PrepareOptions(); PrepareOptions();
OfflineManifestWriter w(options_, db_path_); OfflineManifestWriter w(options_, db_path_);
@ -4402,7 +4404,7 @@ void UnsafeRemoveSstFileCommand::DoCommand() {
s = options_.env->GetFileSystem()->NewDirectory(db_path_, IOOptions(), s = options_.env->GetFileSystem()->NewDirectory(db_path_, IOOptions(),
&db_dir, nullptr); &db_dir, nullptr);
if (s.ok()) { if (s.ok()) {
s = w.LogAndApply(read_options, cfd, &edit, db_dir.get()); s = w.LogAndApply(read_options, write_options, cfd, &edit, db_dir.get());
} }
} }

@ -86,7 +86,9 @@ SimulatedHybridFileSystem::~SimulatedHybridFileSystem() {
metadata += f; metadata += f;
metadata += "\n"; metadata += "\n";
} }
IOStatus s = WriteStringToFile(target(), metadata, metadata_file_name_, true); IOOptions opts;
IOStatus s =
WriteStringToFile(target(), metadata, metadata_file_name_, true, opts);
if (!s.ok()) { if (!s.ok()) {
fprintf(stderr, "Error writing to file %s: %s", metadata_file_name_.c_str(), fprintf(stderr, "Error writing to file %s: %s", metadata_file_name_.c_str(),
s.ToString().c_str()); s.ToString().c_str());
@ -240,4 +242,3 @@ IOStatus SimulatedWritableFile::Sync(const IOOptions& options,
return target()->Sync(options, dbg); return target()->Sync(options, dbg);
} }
} // namespace ROCKSDB_NAMESPACE } // namespace ROCKSDB_NAMESPACE

@ -123,10 +123,12 @@ class SSTDumpToolTest : public testing::Test {
std::string column_family_name; std::string column_family_name;
int unknown_level = -1; int unknown_level = -1;
const WriteOptions write_options;
tb.reset(opts.table_factory->NewTableBuilder( tb.reset(opts.table_factory->NewTableBuilder(
TableBuilderOptions( TableBuilderOptions(
imoptions, moptions, ikc, &int_tbl_prop_collector_factories, imoptions, moptions, read_options, write_options, ikc,
CompressionType::kNoCompression, CompressionOptions(), &int_tbl_prop_collector_factories, CompressionType::kNoCompression,
CompressionOptions(),
TablePropertiesCollectorFactory::Context::kUnknownColumnFamily, TablePropertiesCollectorFactory::Context::kUnknownColumnFamily,
column_family_name, unknown_level), column_family_name, unknown_level),
file_writer.get())); file_writer.get()));
@ -160,7 +162,7 @@ class SSTDumpToolTest : public testing::Test {
} }
} }
ASSERT_OK(tb->Finish()); ASSERT_OK(tb->Finish());
ASSERT_OK(file_writer->Close()); ASSERT_OK(file_writer->Close(IOOptions()));
} }
protected: protected:
@ -417,9 +419,9 @@ TEST_F(SSTDumpToolTest, ValidSSTPath) {
std::string sst_file = MakeFilePath("rocksdb_sst_test.sst"); std::string sst_file = MakeFilePath("rocksdb_sst_test.sst");
createSST(opts, sst_file); createSST(opts, sst_file);
std::string text_file = MakeFilePath("text_file"); std::string text_file = MakeFilePath("text_file");
ASSERT_OK(WriteStringToFile(opts.env, "Hello World!", text_file)); ASSERT_OK(WriteStringToFile(opts.env, "Hello World!", text_file, false));
std::string fake_sst = MakeFilePath("fake_sst.sst"); std::string fake_sst = MakeFilePath("fake_sst.sst");
ASSERT_OK(WriteStringToFile(opts.env, "Not an SST file!", fake_sst)); ASSERT_OK(WriteStringToFile(opts.env, "Not an SST file!", fake_sst, false));
for (const auto& command_arg : {"--command=verify", "--command=identify"}) { for (const auto& command_arg : {"--command=verify", "--command=identify"}) {
snprintf(usage[1], kOptLength, "%s", command_arg); snprintf(usage[1], kOptLength, "%s", command_arg);

@ -0,0 +1 @@
`rocksdb.blobdb.blob.file.write.micros` now also measures the time spent writing the blob file header and footer. As a result, its COUNT may be higher and its values may be smaller than before. For stacked BlobDB, it no longer measures the time spent explicitly flushing the blob file.
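
To make the COUNT and value shift concrete, here is a minimal, self-contained sketch. It does not use RocksDB's `Statistics` API; `ToyHistogram`, `BlobFileWriteTimings`, `MeasureRecordsOnly` and `MeasureEveryAppend` are hypothetical names, and the microsecond figures are made up. It assumes header and footer writes are short compared with record writes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Toy stand-in for a histogram (hypothetical, NOT RocksDB's Statistics API).
struct ToyHistogram {
  std::vector<uint64_t> samples_micros;

  void Record(uint64_t micros) { samples_micros.push_back(micros); }
  std::size_t Count() const { return samples_micros.size(); }
  uint64_t Sum() const {
    return std::accumulate(samples_micros.begin(), samples_micros.end(),
                           uint64_t{0});
  }
  double Average() const {
    assert(!samples_micros.empty());
    return static_cast<double>(Sum()) / static_cast<double>(Count());
  }
};

// Made-up write durations (microseconds) for a single blob file.
struct BlobFileWriteTimings {
  uint64_t header_micros;
  std::vector<uint64_t> record_micros;
  uint64_t footer_micros;
};

// Models the previous behavior: only blob record writes are sampled.
ToyHistogram MeasureRecordsOnly(const BlobFileWriteTimings& t) {
  ToyHistogram h;
  for (uint64_t m : t.record_micros) {
    h.Record(m);
  }
  return h;
}

// Models the new behavior: every write is sampled, including the
// header and the footer.
ToyHistogram MeasureEveryAppend(const BlobFileWriteTimings& t) {
  ToyHistogram h;
  h.Record(t.header_micros);
  for (uint64_t m : t.record_micros) {
    h.Record(m);
  }
  h.Record(t.footer_micros);
  return h;
}
```

With a 2-microsecond header, records taking 10, 12 and 14 microseconds, and a 2-microsecond footer, the records-only histogram holds 3 samples while the every-append histogram holds 5, and the average sample is lower in the second case because the two extra samples are small.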

@ -0,0 +1 @@
Fix bugs where `rocksdb.blobdb.blob.file.synced` counted blob files that failed to sync and `rocksdb.blobdb.blob.file.bytes.written` counted blob bytes that failed to be written.
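
The general pattern behind this kind of fix can be sketched as follows: update a counter only after the underlying operation reports success. This is an illustration under stated assumptions, not the code from this change; `ToyBlobStats`, `ToyFile`, `AppendAndCount` and `SyncAndCount` are hypothetical names, and a plain `bool` stands in for a status object.

```cpp
#include <cassert>  // for callers that check the counters
#include <cstdint>
#include <string>

// Toy counters (hypothetical stand-ins for the two tickers named above).
struct ToyBlobStats {
  uint64_t bytes_written = 0;
  uint64_t files_synced = 0;
};

// Hypothetical file whose operations can be forced to fail.
struct ToyFile {
  bool fail_append = false;
  bool fail_sync = false;
  std::string contents;

  bool Append(const std::string& data) {
    if (fail_append) {
      return false;
    }
    contents += data;
    return true;
  }

  bool Sync() const { return !fail_sync; }
};

// Counts bytes only when the append actually succeeded.
bool AppendAndCount(ToyFile& file, const std::string& data,
                    ToyBlobStats& stats) {
  if (!file.Append(data)) {
    return false;  // failed write: leave bytes_written untouched
  }
  stats.bytes_written += data.size();
  return true;
}

// Counts a synced file only when the sync actually succeeded.
bool SyncAndCount(const ToyFile& file, ToyBlobStats& stats) {
  if (!file.Sync()) {
    return false;  // failed sync: leave files_synced untouched
  }
  ++stats.files_synced;
  return true;
}
```

Checking the result before touching the counter keeps the statistics aligned with what actually reached the file, which is the property the note above describes.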

Some files were not shown because too many files have changed in this diff.