Commit graph

81 commits

Author SHA1 Message Date
Peter Dillinger c96d9a0fbb Allow TablePropertiesCollectorFactory to return null collector (#12129)
Summary:
As part of building another feature, I wanted this:
* Custom implementations of `TablePropertiesCollectorFactory` may now return a `nullptr` collector to decline processing a file, reducing callback overheads in such cases.
* Polished, clarified some related API comments.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12129

Test Plan: unit test added

Reviewed By: ltamasi

Differential Revision: D51966667

Pulled By: pdillinger

fbshipit-source-id: 2991c08fe6ce3a8c9f14c68f1495f5a17bca2770
2023-12-11 12:02:56 -08:00
Andrew Kryczka 06dc32ef25 internal_repo_rocksdb (435146444452818992) (#12115)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/12115

Reviewed By: jowlyzhang

Differential Revision: D51745742

Pulled By: ajkr

fbshipit-source-id: 67000d07783b413924798dd9c1751da27e119d53
2023-12-01 11:15:17 -08:00
Hui Xiao f337533b6f Ensure and clarify how RocksDB calls TablePropertiesCollector's functions (#12053)
Summary:
**Context/Summary:**
It's intuitive for users to assume `TablePropertiesCollector::Finish()` is called only once by RocksDB internal by the word "finish".

However, this is currently not true as RocksDB also calls this function in `BlockBased/PlainTableBuilder::GetTableProperties()` to populate user collected properties on demand.

This PR avoids that by moving that populating to where we first call `Finish()` (i.e, `NotifyCollectTableCollectorsOnFinish`)

Bonus: clarified in the API that `GetReadableProperties()` will be called after `Finish()` and added UT to ensure that.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/12053

Test Plan:
- Modified test `DBPropertiesTest.GetUserDefinedTableProperties` to ensure `Finish()` only called once.
- Existing test particularly `db_properties_test, table_properties_collector_test` verify the functionality  `NotifyCollectTableCollectorsOnFinish` and `GetReadableProperties()` are not broken by this change.

Reviewed By: ajkr

Differential Revision: D51095434

Pulled By: hx235

fbshipit-source-id: 1c6275258f9b99dedad313ee8427119126817973
2023-11-08 14:00:36 -08:00
Hui Xiao 9a034801ce Group rocksdb.sst.read.micros stat by different user read IOActivity + misc (#11444)
Summary:
**Context/Summary:**
- Similar to https://github.com/facebook/rocksdb/pull/11288 but for user read such as `Get(), MultiGet(), DBIterator::XXX(), Verify(File)Checksum()`.
   - For this, I refactored some user-facing `MultiGet` calls in `TransactionBase` and various types of `DB` so that it does not call a user-facing `Get()` but `GetImpl()` for passing the `ReadOptions::io_activity` check (see PR conversation)
   - New user read stats breakdown are guarded by `kExceptDetailedTimers` since measurement shows they have 4-5% regression to the upstream/main.

- Misc
   - More refactoring: with https://github.com/facebook/rocksdb/pull/11288, we complete passing `ReadOptions/IOOptions` to FS level. So we can now replace the previously [added](https://github.com/facebook/rocksdb/pull/9424) `rate_limiter_priority` parameter in `RandomAccessFileReader`'s `Read/MultiRead/Prefetch()` with `IOOptions::rate_limiter_priority`
   - Also, `ReadAsync()` call time is measured in `SST_READ_MICRO` now

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11444

Test Plan:
- CI fake db crash/stress test
- Microbenchmarking

**Build** `make clean && ROCKSDB_NO_FBCODE=1 DEBUG_LEVEL=0 make -jN db_basic_bench`
- google benchmark version: 604f6fd3f4
- db_basic_bench_base: upstream
- db_basic_bench_pr: db_basic_bench_base + this PR
- asyncread_db_basic_bench_base: upstream + [db basic bench patch for IteratorNext](https://github.com/facebook/rocksdb/compare/main...hx235:rocksdb:micro_bench_async_read)
- asyncread_db_basic_bench_pr: asyncread_db_basic_bench_base + this PR

**Test**

Get
```
TEST_TMPDIR=/dev/shm ./db_basic_bench_{null_stat|base|pr} --benchmark_filter=DBGet/comp_style:0/max_data:134217728/per_key_size:256/enable_statistics:1/negative_query:0/enable_filter:0/mmap:1/threads:1 --benchmark_repetitions=1000
```

Result
```
Coming soon
```

AsyncRead
```
TEST_TMPDIR=/dev/shm ./asyncread_db_basic_bench_{base|pr} --benchmark_filter=IteratorNext/comp_style:0/max_data:134217728/per_key_size:256/enable_statistics:1/async_io:1/include_detailed_timers:0 --benchmark_repetitions=1000 > syncread_db_basic_bench_{base|pr}.out
```

Result
```
Base:
1956,1956,1968,1977,1979,1986,1988,1988,1988,1990,1991,1991,1993,1993,1993,1993,1994,1996,1997,1997,1997,1998,1999,2001,2001,2002,2004,2007,2007,2008,

PR (2.3% regression, due to measuring `SST_READ_MICRO` that wasn't measured before):
1993,2014,2016,2022,2024,2027,2027,2028,2028,2030,2031,2031,2032,2032,2038,2039,2042,2044,2044,2047,2047,2047,2048,2049,2050,2052,2052,2052,2053,2053,
```

Reviewed By: ajkr

Differential Revision: D45918925

Pulled By: hx235

fbshipit-source-id: 58a54560d9ebeb3a59b6d807639692614dad058a
2023-08-08 17:26:50 -07:00
Peter Dillinger 7a1b0207e6 format_version=6 and context-aware block checksums (#9058)
Summary:
## Context checksum
All RocksDB checksums currently use 32 bits of checking
power, which should be 1 in 4 billion false negative (FN) probability (failing to
detect corruption). This is true for random corruptions, and in some cases
small corruptions are guaranteed to be detected. But some possible
corruptions, such as in storage metadata rather than storage payload data,
would have a much higher FN rate. For example:
* Data larger than one SST block is replaced by data from elsewhere in
the same or another SST file. Especially with block_align=true, the
probability of exact block size match is probably around 1 in 100, making
the FN probability around that same. Without `block_align=true` the
probability of same block start location is probably around 1 in 10,000,
for FN probability around 1 in a million.

To solve this problem in new format_version=6, we add "context awareness"
to block checksum checks. The stored and expected checksum value is
modified based on the block's position in the file and which file it is in. The
modifications are cleverly chosen so that, for example
* blocks within about 4GB of each other are guaranteed to use different context
* blocks that are offset by exactly some multiple of 4GiB are guaranteed to use
different context
* files generated by the same process are guaranteed to use different context
for the same offsets, until wrap-around after 2^32 - 1 files

Thus, with format_version=6, if a valid SST block and checksum is misplaced,
its checksum FN probability should be essentially ideal, 1 in 4B.

## Footer checksum
This change also adds checksum protection to the SST footer (with
format_version=6), for the first time without relying on whole file checksum.
To prevent a corruption of the format_version in the footer (e.g. 6 -> 5) to
defeat the footer checksum, we change much of the footer data format
including an "extended magic number" in format_version 6 that would be
interpreted as empty index and metaindex block handles in older footer
versions. We also change the encoding of handles to free up space for
other new data in footer.

## More detail: making space in footer
In order to keep footer the same size in format_version=6 (avoid change to IO
patterns), we have to free up some space for new data. We do this two ways:
* Metaindex block handle is encoded down to 4 bytes (from 10) by assuming
it immediately precedes the footer, and by assuming it is < 4GB.
* Index block handle is moved into metaindex. (I don't know why it was
in footer to begin with.)

## Performance
In case of small performance penalty, I've made a "pay as you go" optimization
to compensate: replace `MutableCFOptions` in BlockBasedTableBuilder::Rep
with the only field used in that structure after construction: `prefix_extractor`.
This makes the PR an overall performance improvement (results below).

Nevertheless I'm seeing essentially no difference going from fv=5 to fv=6,
even including that improvement for both. That's based on extreme case table
write performance testing, many files with many blocks. This is relatively
checksum intensive (small blocks) and salt generation intensive (small files).

```
(for I in `seq 1 100`; do TEST_TMPDIR=/dev/shm/dbbench2 ./db_bench -benchmarks=fillseq -memtablerep=vector -disable_wal=1 -allow_concurrent_memtable_write=false -num=3000000 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0 -write_buffer_size=100000 -compression_type=none -block_size=1000; done) 2>&1 | grep micros/op | tee out
awk '{ tot += $5; n += 1; } END { print int(1.0 * tot / n) }' < out
```

Each value below is ops/s averaged over 100 runs, run simultaneously with competing
configuration for load fairness

Before -> after (both fv=5): 483530 -> 483673 (negligible)
Re-run 1: 480733 -> 485427 (1.0% faster)
Re-run 2: 483821 -> 484541 (0.1% faster)
Before (fv=5) -> after (fv=6): 482006 -> 485100 (0.6% faster)
Re-run 1: 482212 -> 485075 (0.6% faster)
Re-run 2: 483590 -> 484073 (0.1% faster)
After fv=5 -> after fv=6: 483878 -> 485542 (0.3% faster)
Re-run 1: 485331 -> 483385 (0.4% slower)
Re-run 2: 485283 -> 483435 (0.4% slower)
Re-run 3: 483647 -> 486109 (0.5% faster)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9058

Test Plan:
unit tests included (table_test, db_properties_test, salt in env_test). General DB tests
and crash test updated to test new format_version.

Also temporarily updated the default format version to 6 and saw some test failures. Almost all
were due to an inadvertent additional read in VerifyChecksum to verify the index block checksum,
though it's arguably a bug that VerifyChecksum does not appear to (re-)verify the index block
checksum, just assuming it was verified in opening the index reader (probably *usually* true but
probably not always true). Some other concerns about VerifyChecksum are left in FIXME
comments. The only remaining test failure on change of default (in block_fetcher_test) now
has a comment about how to upgrade the test.

The format compatibility test does not need updating because we have not updated the default
format_version.

Reviewed By: ajkr, mrambacher

Differential Revision: D33100915

Pulled By: pdillinger

fbshipit-source-id: 8679e3e572fa580181a737fd6d113ed53c5422ee
2023-07-30 16:40:01 -07:00
zhangyuxiang.ax 1567108fc1 Add missing table properties in plaintable GetTableProperties() (#11267)
Summary:
Plaintable will miss properties.
It should have some behavior like blockbasedtable.
Here is a unit test for reproduce this bug.

```
#include <gflags/gflags.h>
#include "rocksdb/db.h"
#include "rocksdb/options.h"
#include "rocksdb/table.h"
#include "rocksdb/slice_transform.h"
#include <iostream>
#include <thread>
#include <csignal>
const std::string kKey = "key";

DEFINE_bool(use_plaintable, true, "use plain table");
DEFINE_string(db_path, "/dev/shm/test_zyx_path", "db_path");

rocksdb::DB* db = nullptr;

class NoopTransform : public rocksdb::SliceTransform {
public:
    explicit NoopTransform() {
    }

    virtual const char* Name() const override {
        return "rocksdb.Noop";
    }

    virtual rocksdb::Slice Transform(const rocksdb::Slice& src) const override {
        return src;
    }

    virtual bool InDomain(const rocksdb::Slice& src) const override {
        return true;
    }

    virtual bool InRange(const rocksdb::Slice& dst) const override {
        return true;
    }

    virtual bool SameResultWhenAppended(const rocksdb::Slice& prefix) const override {
        return false;
    }
};

class TestPropertiesCollector : public ::rocksdb::TablePropertiesCollector {
public:
    explicit TestPropertiesCollector() {
    }

private:
    ::rocksdb::Status AddUserKey(const ::rocksdb::Slice& key, const ::rocksdb::Slice& value, ::rocksdb::EntryType type,
                                 ::rocksdb::SequenceNumber seq, uint64_t file_size) override {
        count++;
        return ::rocksdb::Status::OK();
    }

    ::rocksdb::Status Finish(::rocksdb::UserCollectedProperties* properties) override {
        properties->insert({kKey, std::to_string(count)});
        return ::rocksdb::Status::OK();
    }

    ::rocksdb::UserCollectedProperties GetReadableProperties() const override {
        ::rocksdb::UserCollectedProperties properties;
        properties.insert({kKey, std::to_string(count)});
        return properties;
    }

    const char* Name() const override {
        return "TestPropertiesCollector";
    }
    int count = 0;
};

class TestTablePropertiesCollectorFactory : public ::rocksdb::TablePropertiesCollectorFactory {
public:
    explicit TestTablePropertiesCollectorFactory() {
    }

private:
    ::rocksdb::TablePropertiesCollector* CreateTablePropertiesCollector(
            ::rocksdb::TablePropertiesCollectorFactory::Context context) override {
        return new TestPropertiesCollector();
    }

    const char* Name() const override {
        return "test.TablePropertiesCollectorFactory";
    }
};

class TestFlushListener : rocksdb::EventListener {
public:
    const char* Name() const override {
        return "TestFlushListener";
    }
    void OnFlushCompleted(rocksdb::DB* /*db*/, const rocksdb::FlushJobInfo& flush_job_info) override {
        if (flush_job_info.table_properties.user_collected_properties.find(kKey) ==
            flush_job_info.table_properties.user_collected_properties.end()) {
            std::cerr << "OnFlushCompleted: properties not found" << std::endl;
            return;
        }
        std::cerr << "OnFlushCompleted: properties found "
                  << flush_job_info.table_properties.user_collected_properties.at(kKey) << std::endl;
    }
    explicit TestFlushListener() {
    }
};

int main(int argc, char* argv[]) {
    gflags::ParseCommandLineFlags(&argc, &argv, true);
    rocksdb::DBOptions rocksdb_options;
    std::shared_ptr<rocksdb::EventListener> flush_offset;
    rocksdb_options.create_if_missing = true;
    rocksdb_options.create_missing_column_families = true;
    std::shared_ptr<::rocksdb::TablePropertiesCollectorFactory> properties_collector(
            new TestTablePropertiesCollectorFactory());
    rocksdb::ColumnFamilyOptions cfoptions;
    cfoptions.table_properties_collector_factories.emplace_back(properties_collector);
    std::shared_ptr<rocksdb::EventListener> test_cleaner;
    test_cleaner.reset((rocksdb::EventListener*)new TestFlushListener());
    rocksdb_options.listeners.emplace_back(test_cleaner);

    std::vector<rocksdb::ColumnFamilyDescriptor> cf_desc_;
    cf_desc_.emplace_back(rocksdb::kDefaultColumnFamilyName, cfoptions);
    std::vector<rocksdb::ColumnFamilyHandle*> cfhs;
    cfoptions.prefix_extractor.reset(new NoopTransform());
    if (FLAGS_use_plaintable) {
        cfoptions.table_factory.reset(rocksdb::NewPlainTableFactory());
        std::cerr << "use plaintable" << std::endl;
    } else {
        cfoptions.table_factory.reset(rocksdb::NewBlockBasedTableFactory());
        std::cerr << "use blockbasedtable" << std::endl;
    }

    auto s = rocksdb::DB::Open(rocksdb_options, FLAGS_db_path, cf_desc_, &cfhs, &db);
    if (s.ok()) {
        rocksdb::WriteOptions wops;
        wops.disableWAL = true;
        for (int i = 0; i < 1000000; i++) {
            auto status = db->Put(wops, std::to_string(i), std::string(1024, '3'));
            if (!status.ok()) {
                std::cerr << "write fail " << status.getState() << std::endl;
            }
        }
    } else {
        std::cerr << "open rocksdb failed" << s.getState() << std::endl;
    }
    std::this_thread::sleep_for(std::chrono::seconds(1000));
    delete db;
}
```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11267

Reviewed By: jowlyzhang

Differential Revision: D47689943

Pulled By: hx235

fbshipit-source-id: 585589cc48f8b26c7dd2323fc7ac4a0c3d4df6bb
2023-07-21 17:55:25 -07:00
Hui Xiao 151242ce46 Group rocksdb.sst.read.micros stat by IOActivity flush and compaction (#11288)
Summary:
**Context:**
The existing stat rocksdb.sst.read.micros does not reflect each of compaction and flush cases but aggregate them, which is not so helpful for us to understand IO read behavior of each of them.

**Summary**
- Update `StopWatch` and `RandomAccessFileReader` to record `rocksdb.sst.read.micros` and `rocksdb.file.{flush/compaction}.read.micros`
   - Fixed the default histogram in `RandomAccessFileReader`
- New field `ReadOptions/IOOptions::io_activity`; Pass `ReadOptions` through paths under db open, flush and compaction to where we can prepare `IOOptions` and pass it to `RandomAccessFileReader`
- Use `thread_status_util` for assertion in `DbStressFSWrapper` for continuous testing on we are passing correct `io_activity` under db open, flush and compaction

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11288

Test Plan:
- **Stress test**
- **Db bench 1: rocksdb.sst.read.micros COUNT ≈ sum of rocksdb.file.read.flush.micros's and rocksdb.file.read.compaction.micros's.**  (without blob)
     - May not be exactly the same due to `HistogramStat::Add` only guarantees atomic not accuracy across threads.
```
./db_bench -db=/dev/shm/testdb/ -statistics=true -benchmarks="fillseq" -key_size=32 -value_size=512 -num=50000 -write_buffer_size=655 -target_file_size_base=655 -disable_auto_compactions=false -compression_type=none -bloom_bits=3 (-use_plain_table=1 -prefix_size=10)
```
```
// BlockBasedTable
rocksdb.sst.read.micros P50 : 2.009374 P95 : 4.968548 P99 : 8.110362 P100 : 43.000000 COUNT : 40456 SUM : 114805
rocksdb.file.read.flush.micros P50 : 1.871841 P95 : 3.872407 P99 : 5.540541 P100 : 43.000000 COUNT : 2250 SUM : 6116
rocksdb.file.read.compaction.micros P50 : 2.023109 P95 : 5.029149 P99 : 8.196910 P100 : 26.000000 COUNT : 38206 SUM : 108689

// PlainTable
Does not apply
```
- **Db bench 2: performance**

**Read**

SETUP: db with 900 files
```
./db_bench -db=/dev/shm/testdb/ -benchmarks="fillseq" -key_size=32 -value_size=512 -num=50000 -write_buffer_size=655  -disable_auto_compactions=true -target_file_size_base=655 -compression_type=none
```run till convergence
```
./db_bench -seed=1678564177044286 -use_existing_db=true -db=/dev/shm/testdb -benchmarks=readrandom[-X60] -statistics=true -num=1000000 -disable_auto_compactions=true -compression_type=none -bloom_bits=3
```
Pre-change
`readrandom [AVG 60 runs] : 21568 (± 248) ops/sec`
Post-change (no regression, -0.3%)
`readrandom [AVG 60 runs] : 21486 (± 236) ops/sec`

**Compaction/Flush**run till convergence
```
./db_bench -db=/dev/shm/testdb2/ -seed=1678564177044286 -benchmarks="fillseq[-X60]" -key_size=32 -value_size=512 -num=50000 -write_buffer_size=655  -disable_auto_compactions=false -target_file_size_base=655 -compression_type=none

rocksdb.sst.read.micros  COUNT : 33820
rocksdb.sst.read.flush.micros COUNT : 1800
rocksdb.sst.read.compaction.micros COUNT : 32020
```
Pre-change
`fillseq [AVG 46 runs] : 1391 (± 214) ops/sec;    0.7 (± 0.1) MB/sec`

Post-change (no regression, ~-0.4%)
`fillseq [AVG 46 runs] : 1385 (± 216) ops/sec;    0.7 (± 0.1) MB/sec`

Reviewed By: ajkr

Differential Revision: D44007011

Pulled By: hx235

fbshipit-source-id: a54c89e4846dfc9a135389edf3f3eedfea257132
2023-04-21 09:07:18 -07:00
Hui Xiao 6650ca244e Remove a couple deprecated convenience.h APIs (#11120)
Summary:
**Context/Summary:**
As instructed by convenience.h comments, a few deprecated APIs are removed.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11120

Test Plan:
- make check & CI
- eyeball check on test semantics.

Reviewed By: pdillinger

Differential Revision: D42937507

Pulled By: hx235

fbshipit-source-id: a9e4709387da01b1d0e9148c2e210f02e9746ee1
2023-02-07 14:11:53 -08:00
sdong 4720ba4391 Remove RocksDB LITE (#11147)
Summary:
We haven't been actively mantaining RocksDB LITE recently and the size must have been gone up significantly. We are removing the support.

Most of changes were done through following comments:

unifdef -m -UROCKSDB_LITE `git grep -l ROCKSDB_LITE | egrep '[.](cc|h)'`

by Peter Dillinger. Others changes were manually applied to build scripts, CircleCI manifests, ROCKSDB_LITE is used in an expression and file db_stress_test_base.cc.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11147

Test Plan: See CI

Reviewed By: pdillinger

Differential Revision: D42796341

fbshipit-source-id: 4920e15fc2060c2cd2221330a6d0e5e65d4b7fe2
2023-01-27 13:14:19 -08:00
anand76 727bad78b8 Format files under table/ by clang-format (#10852)
Summary:
Run clang-format on files under the `table` directory.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10852

Reviewed By: ajkr

Differential Revision: D40650732

Pulled By: anand1976

fbshipit-source-id: 2023a958e37fd6274040c5181130284600c9e0ef
2022-10-25 11:50:38 -07:00
anand76 fb9a025892 Fix platform 10 build with folly (#10708)
Summary:
Change the library order in PLATFORM_LDFLAGS to enable fbcode platform 10 build with folly. This PR also has a few fixes for platform 10 compiler errors.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10708

Test Plan:
ROCKSDB_FBCODE_BUILD_WITH_PLATFORM010=1 USE_COROUTINES=1 make -j64 check
ROCKSDB_FBCODE_BUILD_WITH_PLATFORM010=1 USE_FOLLY=1 make -j64 check

Reviewed By: ajkr

Differential Revision: D39666590

Pulled By: anand1976

fbshipit-source-id: 256a1127ef561399cd6299a6a392ca29bd68ca44
2022-09-21 14:43:44 -07:00
Jay Zhuang a3acf2ef87 Add seqno to time mapping (#10338)
Summary:
Which will be used for tiered storage to preclude hot data from
compacting to the cold tier (the last level).
Internally, adding seqno to time mapping. A periodic_task is scheduled
to record the current_seqno -> current_time in certain cadence. When
memtable flush, the mapping informaiton is stored in sstable property.
During compaction, the mapping information are merged and get the
approximate time of sequence number, which is used to determine if a key
is recently inserted or not and preclude it from the last level if it's
recently inserted (within the `preclude_last_level_data_seconds`).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10338

Test Plan: CI

Reviewed By: siying

Differential Revision: D37810187

Pulled By: jay-zhuang

fbshipit-source-id: 6953be7a18a99de8b1cb3b162d712f79c2b4899f
2022-07-14 21:49:34 -07:00
mrambacher 204a42ca97 Added GetFactoryCount/Names/Types to ObjectRegistry (#9358)
Summary:
These methods allow for more thorough testing of the ObjectRegistry and Customizable infrastructure in a simpler manner.  With this change, the Customizable tests can now check what factories are registered and attempt to create each of them in a systematic fashion.

With this change, I think all of the factories registered with the ObjectRegistry/CreateFromString are now tested via the customizable_test classes.

Note that there were a few other minor changes.  There was a "posix://*" register with the ObjectRegistry which was missed during the PatternEntry conversion -- these changes found that.  The nickname and default names for the FileSystem classes was also inverted.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9358

Reviewed By: pdillinger

Differential Revision: D33433542

Pulled By: mrambacher

fbshipit-source-id: 9a32da74e6620745b4eeffb2712be70eeeadfa7e
2022-05-16 09:44:43 -07:00
sdong 736a7b5433 Remove own ToString() (#9955)
Summary:
ToString() is created as some platform doesn't support std::to_string(). However, we've already used std::to_string() by mistake for 16 months (in db/db_info_dumper.cc). This commit just remove ToString().

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9955

Test Plan: Watch CI tests

Reviewed By: riversand963

Differential Revision: D36176799

fbshipit-source-id: bdb6dcd0e3a3ab96a1ac810f5d0188f684064471
2022-05-06 13:03:58 -07:00
Peter Dillinger efd035164b Meta-internal folly integration with F14FastMap (#9546)
Summary:
Especially after updating to C++17, I don't see a compelling case for
*requiring* any folly components in RocksDB. I was able to purge the existing
hard dependencies, and it can be quite difficult to strip out non-trivial components
from folly for use in RocksDB. (The prospect of doing that on F14 has changed
my mind on the best approach here.)

But this change creates an optional integration where we can plug in
components from folly at compile time, starting here with F14FastMap to replace
std::unordered_map when possible (probably no public APIs for example). I have
replaced the biggest CPU users of std::unordered_map with compile-time
pluggable UnorderedMap which will use F14FastMap when USE_FOLLY is set.
USE_FOLLY is always set in the Meta-internal buck build, and a simulation of
that is in the Makefile for public CI testing. A full folly build is not needed, but
checking out the full folly repo is much simpler for getting the dependency,
and anything else we might want to optionally integrate in the future.

Some picky details:
* I don't think the distributed mutex stuff is actually used, so it was easy to remove.
* I implemented an alternative to `folly::constexpr_log2` (which is much easier
in C++17 than C++11) so that I could pull out the hard dependencies on
`ConstexprMath.h`
* I had to add noexcept move constructors/operators to some types to make
F14's complainUnlessNothrowMoveAndDestroy check happy, and I added a
macro to make that easier in some common cases.
* Updated Meta-internal buck build to use folly F14Map (always)

No updates to HISTORY.md nor INSTALL.md as this is not (yet?) considered a
production integration for open source users.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9546

Test Plan:
CircleCI tests updated so that a couple of them use folly.

Most internal unit & stress/crash tests updated to use Meta-internal latest folly.
(Note: they should probably use buck but they currently use Makefile.)

Example performance improvement: when filter partitions are pinned in cache,
they are tracked by PartitionedFilterBlockReader::filter_map_ and we can build
a test that exercises that heavily. Build DB with

```
TEST_TMPDIR=/dev/shm/rocksdb ./db_bench -benchmarks=fillrandom -num=10000000 -disable_wal=1 -write_buffer_size=30000000 -bloom_bits=16 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0 -partition_index_and_filters
```

and test with (simultaneous runs with & without folly, ~20 times each to see
convergence)

```
TEST_TMPDIR=/dev/shm/rocksdb ./db_bench_folly -readonly -use_existing_db -benchmarks=readrandom -num=10000000 -bloom_bits=16 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0 -partition_index_and_filters -duration=40 -pin_l0_filter_and_index_blocks_in_cache
```

Average ops/s no folly: 26229.2
Average ops/s with folly: 26853.3 (+2.4%)

Reviewed By: ajkr

Differential Revision: D34181736

Pulled By: pdillinger

fbshipit-source-id: ffa6ad5104c2880321d8a1aa7187e00ab0d02e94
2022-04-13 07:34:01 -07:00
ehds@qq.com 60422f1676 Replace GetUserKey with ExtractUserKey (#9664)
Summary:
Replace `GetUserKey` with `ExtractUserKey`

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9664

Reviewed By: jay-zhuang

Differential Revision: D34673571

Pulled By: ajkr

fbshipit-source-id: 5acf7d0d1a45efce0ccbe505991acf76c9cdc461
2022-03-15 10:02:33 -07:00
Andrew Kryczka babe56ddba Add rate limiter priority to ReadOptions (#9424)
Summary:
Users can set the priority for file reads associated with their operation by setting `ReadOptions::rate_limiter_priority` to something other than `Env::IO_TOTAL`. Rate limiting `VerifyChecksum()` and `VerifyFileChecksums()` is the motivation for this PR, so it also includes benchmarks and minor bug fixes to get that working.

`RandomAccessFileReader::Read()` already had support for rate limiting compaction reads. I changed that rate limiting to be non-specific to compaction, but rather performed according to the passed in `Env::IOPriority`. Now the compaction read rate limiting is supported by setting `rate_limiter_priority = Env::IO_LOW` on its `ReadOptions`.

There is no default value for the new `Env::IOPriority` parameter to `RandomAccessFileReader::Read()`. That means this PR goes through all callers (in some cases multiple layers up the call stack) to find a `ReadOptions` to provide the priority. There are TODOs for cases I believe it would be good to let user control the priority some day (e.g., file footer reads), and no TODO in cases I believe it doesn't matter (e.g., trace file reads).

The API doc only lists the missing cases where a file read associated with a provided `ReadOptions` cannot be rate limited. For cases like file ingestion checksum calculation, there is no API to provide `ReadOptions` or `Env::IOPriority`, so I didn't count that as missing.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9424

Test Plan:
- new unit tests
- new benchmarks on ~50MB database with 1MB/s read rate limit and 100ms refill interval; verified with strace reads are chunked (at 0.1MB per chunk) and spaced roughly 100ms apart.
  - setup command: `./db_bench -benchmarks=fillrandom,compact -db=/tmp/testdb -target_file_size_base=1048576 -disable_auto_compactions=true -file_checksum=true`
  - benchmarks command: `strace -ttfe pread64 ./db_bench -benchmarks=verifychecksum,verifyfilechecksums -use_existing_db=true -db=/tmp/testdb -rate_limiter_bytes_per_sec=1048576 -rate_limit_bg_reads=1 -rate_limit_user_ops=true -file_checksum=true`
- crash test using IO_USER priority on non-validation reads with https://github.com/facebook/rocksdb/issues/9567 reverted: `python3 tools/db_crashtest.py blackbox --max_key=1000000 --write_buffer_size=524288 --target_file_size_base=524288 --level_compaction_dynamic_level_bytes=true --duration=3600 --rate_limit_bg_reads=true --rate_limit_user_ops=true --rate_limiter_bytes_per_sec=10485760 --interval=10`

Reviewed By: hx235

Differential Revision: D33747386

Pulled By: ajkr

fbshipit-source-id: a2d985e97912fba8c54763798e04f006ccc56e0c
2022-02-16 23:18:14 -08:00
Peter Dillinger fc9d4071f0 Fast path for detecting unchanged prefix_extractor (#9407)
Summary:
Fixes a major performance regression in 6.26, where
extra CPU is spent in SliceTransform::AsString when reads involve
a prefix_extractor (Get, MultiGet, Seek). Common case performance
is now better than 6.25.

This change creates a "fast path" for verifying that the current prefix
extractor is unchanged and compatible with what was used to
generate a table file. This fast path detects the common case by
pointer comparison on the current prefix_extractor and a "known
good" prefix extractor (if applicable) that is saved at the time the
table reader is opened. The "known good" prefix extractor is saved
as another shared_ptr copy (in an existing field, however) to ensure
the pointer is not recycled.

When the prefix_extractor has changed to a different instance but
same compatible configuration (rare, odd), performance is still a
regression compared to 6.25, but this is likely acceptable because
of the oddity of such a case. The performance of incompatible
prefix_extractor is essentially unchanged.

Also fixed a minor case (ForwardIterator) where a prefix_extractor
could be used via a raw pointer after being freed as a shared_ptr,
if replaced via SetOptions.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9407

Test Plan:
## Performance
Populate DB with `TEST_TMPDIR=/dev/shm/rocksdb ./db_bench -benchmarks=fillrandom -num=10000000 -disable_wal=1 -write_buffer_size=10000000 -bloom_bits=16 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0 -prefix_size=12`

Running head-to-head comparisons simultaneously with `TEST_TMPDIR=/dev/shm/rocksdb ./db_bench -use_existing_db -readonly -benchmarks=seekrandom -num=10000000 -duration=20 -disable_wal=1 -bloom_bits=16 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=10000 -fifo_compaction_allow_compaction=0 -prefix_size=12`

Below each is compared by ops/sec vs. baseline which is version 6.25 (multiple baseline runs because of variable machine load)

v6.26: 4833 vs. 6698 (<- major regression!)
v6.27: 4737 vs. 6397 (still)
New: 6704 vs. 6461 (better than baseline in common case)
Disabled fastpath: 4843 vs. 6389 (e.g. if prefix extractor instance changes but is still compatible)
Changed prefix size (no usable filter) in new: 787 vs. 5927
Changed prefix size (no usable filter) in new & baseline: 773 vs. 784

Reviewed By: mrambacher

Differential Revision: D33677812

Pulled By: pdillinger

fbshipit-source-id: 571d9711c461fb97f957378a061b7e7dbc4d6a76
2022-01-21 11:37:46 -08:00
mrambacher 1973fcba11 Restore Regex support for ObjectLibrary::Register, rename new APIs to allow old one to be deprecated in the future (#9362)
Summary:
In order to support old-style regex function registration, restored the original "Register<T>(string, Factory)" method using regular expressions.  The PatternEntry methods were left in place but renamed to AddFactory.  The goal is to allow for the deprecation of the original regex Registry method in an upcoming release.

Added modes to the PatternEntry kMatchZeroOrMore and kMatchAtLeastOne to match * or +, respectively (kMatchAtLeastOne was the original behavior).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9362

Reviewed By: pdillinger

Differential Revision: D33432562

Pulled By: mrambacher

fbshipit-source-id: ed88ab3f9a2ad0d525c7bd1692873f9bb3209d02
2022-01-11 06:33:48 -08:00
mrambacher 1c39b7952b Remove/Reduce use of Regex in ObjectRegistry/Library (#9264)
Summary:
Added new ObjectLibrary::Entry classes to replace/reduce the use of Regex.  For simple factories that only do name matching, there are "StringEntry" and "AltStringEntry" classes.  For classes that use some semblance of regular expressions, there is a PatternEntry class that can match a name and prefixes.  There is also a class for Customizable::IndividualId format matches.

Added tests for the new derivative classes and got all unit tests to pass.

Resolves https://github.com/facebook/rocksdb/issues/9225.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9264

Reviewed By: pdillinger

Differential Revision: D33062001

Pulled By: mrambacher

fbshipit-source-id: c2d2143bd2d38bdf522705c8280c35381b135c03
2021-12-29 07:56:23 -08:00
Peter Dillinger e92a0ed040 Optimize & clean up footer code (#9280)
Summary:
Again, ahead of planned changes in https://github.com/facebook/rocksdb/issues/9058. This change improves
performance (vs. pre-https://github.com/facebook/rocksdb/issues/9240 baseline) by separating a FooterBuilder from
Footer, where FooterBuilder includes (inline owns) the serialized data
so that it can be stack allocated.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9280

Test Plan:
existing tests + performance testing below

Extreme case performance testing as in https://github.com/facebook/rocksdb/issues/9240 with

    TEST_TMPDIR=/dev/shm/ ./db_bench -benchmarks=fillseq -memtablerep=vector -allow_concurrent_memtable_write=false -num=30000000

(Each is ops/s averaged over 50 runs, run simultaneously with competing
configuration for load fairness)
Pre-https://github.com/facebook/rocksdb/issues/9240 baseline (f577458): 436389
With https://github.com/facebook/rocksdb/issues/9240 (653c392): 417946 (-4.2% vs. baseline)
This change: 443762 (+1.7% vs. baseline)

Reviewed By: ajkr

Differential Revision: D33077220

Pulled By: pdillinger

fbshipit-source-id: 7eaa6499589aac1693414a758e8c799216c5016c
2021-12-13 17:43:07 -08:00
Peter Dillinger 653c392e47 More refactoring ahead of footer & meta changes (#9240)
Summary:
I'm working on a new format_version=6 to support context
checksum (https://github.com/facebook/rocksdb/issues/9058) and this includes much of the refactoring and test
updates to support that change.

Test coverage data and manual inspection agree on dead code in
block_based_table_reader.cc (removed).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9240

Test Plan:
tests enhanced to cover more cases etc.

Extreme case performance testing indicates small % regression in fillseq (w/ compaction), though CPU profile etc. doesn't suggest any explanation. There is enhanced correctness checking in Footer::DecodeFrom, but this should be negligible.

TEST_TMPDIR=/dev/shm/ ./db_bench -benchmarks=fillseq -memtablerep=vector -allow_concurrent_memtable_write=false -num=30000000 -checksum_type=1 --disable_wal={false,true}

(Each is ops/s averaged over 50 runs, run simultaneously with competing configuration for load fairness)
Before w/ wal: 454512
After w/ wal: 444820 (-2.1%)
Before w/o wal: 1004560
After w/o wal: 998897 (-0.6%)

Since this doesn't modify WAL code, one would expect real effects to be larger in w/o wal case.

This regression will be corrected in a follow-up PR.

Reviewed By: ajkr

Differential Revision: D32813769

Pulled By: pdillinger

fbshipit-source-id: 444a244eabf3825cd329b7d1b150cddce320862f
2021-12-10 08:13:26 -08:00
Peter Dillinger 230660be73 Improve / clean up meta block code & integrity (#9163)
Summary:
* Checksums are now checked on meta blocks unless specifically
suppressed or not applicable (e.g. plain table). (Was other way around.)
This means a number of cases that were not checking checksums now are,
including direct read TableProperties in Version::GetTableProperties
(fixed in meta_blocks ReadTableProperties), reading any block from
PersistentCache (fixed in BlockFetcher), read TableProperties in
SstFileDumper (ldb/sst_dump/BackupEngine) before table reader open,
maybe more.
* For that to work, I moved the global_seqno+TableProperties checksum
logic to the shared table/ code, because that is used by many utilies
such as SstFileDumper.
* Also for that to work, we have to know when we're dealing with a block
that has a checksum (trailer), so added that capability to Footer based
on magic number, and from there BlockFetcher.
* Knowledge of trailer presence has also fixed a problem where other
table formats were reading blocks including bytes for a non-existant
trailer--and awkwardly kind-of not using them, e.g. no shared code
checking checksums. (BlockFetcher compression type was populated
incorrectly.) Now we only read what is needed.
* Minimized code duplication and differing/incompatible/awkward
abstractions in meta_blocks.{cc,h} (e.g. SeekTo in metaindex block
without parsing block handle)
* Moved some meta block handling code from table_properties*.*
* Moved some code specific to block-based table from shared table/ code
to BlockBasedTable class. The checksum stuff means we can't completely
separate it, but things that don't need to be in shared table/ code
should not be.
* Use unique_ptr rather than raw ptr in more places. (Note: you can
std::move from unique_ptr to shared_ptr.)

Without enhancements to GetPropertiesOfAllTablesTest (see below),
net reduction of roughly 100 lines of code.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9163

Test Plan:
existing tests and
* Enhanced DBTablePropertiesTest.GetPropertiesOfAllTablesTest to verify that
checksums are now checked on direct read of table properties by TableCache
(new test would fail before this change)
* Also enhanced DBTablePropertiesTest.GetPropertiesOfAllTablesTest to test
putting table properties under old meta name
* Also generally enhanced that same test to actually test what it was
supposed to be testing already, by kicking things out of table cache when
we don't want them there.

Reviewed By: ajkr, mrambacher

Differential Revision: D32514757

Pulled By: pdillinger

fbshipit-source-id: 507964b9311d186ae8d1131182290cbd97a99fa9
2021-11-18 11:43:44 -08:00
mrambacher 13ae16c315 Cleanup includes in dbformat.h (#8930)
Summary:
This header file was including everything and the kitchen sink when it did not need to.  This resulted in many places including this header when they needed other pieces instead.

Cleaned up this header to only include what was needed and fixed up the remaining code to include what was now missing.

Hopefully, this sort of code hygiene cleanup will speed up the builds...

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8930

Reviewed By: pdillinger

Differential Revision: D31142788

Pulled By: mrambacher

fbshipit-source-id: 6b45de3f300750c79f751f6227dece9cfd44085d
2021-09-29 04:04:40 -07:00
Hui Xiao d6bd1a0291 Support "level_at_creation" in TablePropertiesCollectorFactory::Context (#8919)
Summary:
Context:
Exposing the level of the sst file (i.e, table) where it is created in `TablePropertiesCollectorFactory::Context` allows users of `TablePropertiesCollectorFactory` to customize some implementation details of `TablePropertiesCollectorFactory` and `TablePropertiesCollector` based on the level of creation. For example, `TablePropertiesCollector::NeedCompact()` can return different values based on level of creation.
- Declared an extra field `level_at_creation` in `TablePropertiesCollectorFactory::Context`
- Allowed `level_at_creation` to be passed in as an argument in `IntTblPropCollectorFactory::CreateIntTblPropCollector()` and `UserKeyTablePropertiesCollectorFactory::CreateIntTblPropCollector()`, the latter of which is an internal wrapper of user's passed-in `TablePropertiesCollectorFactory::CreateTablePropertiesCollector()` used in table-building process
- Called `IntTblPropCollectorFactory::CreateIntTblPropCollector()` with `level_at_creation` passed into both `BlockBasedTableBuilder` and `PlainTableBuilder`
  -  `PlainTableBuilder` previously did not capture `level_at_creation` from `TableBuilderOptions` in `PlainTableFactory`. In order for it to call the method with this parameter, this PR also made `PlainTableBuilder` capture `level_at_creation` as a required parameter
- Called `IntTblPropCollectorFactory::CreateIntTblPropCollector()` with `level_at_creation` its overridden functions in its derived classes, including `RegularKeysStartWithAFactory::CreateIntTblPropCollector()` in `table_properties_collector_test.cc`, `SstFileWriterPropertiesCollectorFactory::CreateIntTblPropCollector()` in `sst_file_writer_collectors.h`

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8919

Test Plan:
- Passed the added assertion for `context.level_at_creation`
- Passed existing tests
- Run `Make` to make sure adding a required parameter to `PlainTableBuilder`'s constructor does not break anything

Reviewed By: anand1976

Differential Revision: D30951729

Pulled By: hx235

fbshipit-source-id: c4a0173b0d9344a4cf47e1b987d759c1c73cb474
2021-09-28 12:35:24 -07:00
mrambacher e0f697d2bd Make SliceTransform into a Customizable class (#8641)
Summary:
Made SliceTransform into a Customizable class.

Would be nice to write a test that stored and used a custom transform  in an SST table.

There are a set of tests (DBBlockFliterTest.PrefixExtractor*, SamePrefixTest.InDomainTest, PrefixTest.PrefixAndWholeKeyTest that run the same with or without a SliceTransform/PrefixFilter.  Is this expected?

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8641

Reviewed By: zhichao-cao

Differential Revision: D31142793

Pulled By: mrambacher

fbshipit-source-id: bb08672fccbfdc263dcae21f25a62307e1facda1
2021-09-27 07:43:47 -07:00
mrambacher beed86473a Make MemTableRepFactory into a Customizable class (#8419)
Summary:
This PR does the following:
-> Makes the MemTableRepFactory into a Customizable class and creatable/configurable via CreateFromString
-> Makes the existing implementations compatible with configurations
-> Moves the "SpecialRepFactory" test class into testutil, accessible via the ObjectRegistry or a NewSpecial API

New tests were added to validate the functionality and all existing tests pass.  db_bench and memtablerep_bench were hand-tested to verify the functionality in those tools.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8419

Reviewed By: zhichao-cao

Differential Revision: D29558961

Pulled By: mrambacher

fbshipit-source-id: 81b7229636e4e649a0c914e73ac7b0f8454c931c
2021-09-08 07:46:44 -07:00
Peter Dillinger c9cd5d25a8 Remove some unneeded code (#8736)
Summary:
* FullKey and ParseFullKey appear to serve no purpose in the public API
(or anything else) so removed. Only use in one test updated.
* NumberToString serves no purpose vs. ToString so removed, numerous
calls updated
* Remove unnecessary forward declarations in metadata.h by re-arranging
class definitions.
* Remove some unneeded semicolons

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8736

Test Plan: existing tests

Reviewed By: mrambacher

Differential Revision: D30700039

Pulled By: pdillinger

fbshipit-source-id: 1e436a576f511a6ed8b4d97af7cc8216bc729af2
2021-09-01 14:28:58 -07:00
Peter Dillinger 04db764831 Embed original file number in SST table properties (#8686)
Summary:
I very recently realized that with https://github.com/facebook/rocksdb/issues/8669 we cannot later add
file numbers to external SST files (so that more can share db session
ids for better uniqueness properties), because of forward compatibility.
We would have a version of RocksDB that assumes session IDs are unique
on external SST files and therefore can't really break that invariant in
future files.

This change adds a table property for "orig_file_number" which is
populated by normal SST files and also external SST files generated by
SstFileWriter. SstFileWriter now keeps a db_session_id for life of the
object and increments its own file numbers for embedding in table
properties. (They are arguably "fake" file numbers because these numbers
and not embedded in the file name.)

While updating block_based_table_builder, I removed several unnecessary
fields from Rep, because following the pattern would have created
another unnecessary field.

This change also updates block_based_table_reader to use this new
property when available, which means that for newer SST files, we can
determine the stable/original <db_session_id,file_number> unique
identifier using just the file contents, not the file name. (It's a bit
complicated; detailed comments in block_based_table_reader.)

Also added DB host id to properties listing by sst_dump, which could be
useful in debugging.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8686

Test Plan: majorly overhauled StableCacheKeys test for this change

Reviewed By: zhichao-cao

Differential Revision: D30457742

Pulled By: pdillinger

fbshipit-source-id: 2e5ae7dddeb94fb9d8eac8a928486aed8b8cd445
2021-08-20 20:40:48 -07:00
Levi Tamasi 1ae026c400 Partially revert the "apply subrange of table property collectors" change (#8465)
Summary:
We ended up using a different approach for tracking the amount of
garbage in blob files (see e.g. https://github.com/facebook/rocksdb/pull/8450),
so the ability to apply only a range of table property collectors is
now unnecessary. The patch reverts this part of
https://github.com/facebook/rocksdb/pull/8298 while keeping the cleanup done
in that PR.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8465

Test Plan: `make check`

Reviewed By: jay-zhuang

Differential Revision: D29399921

Pulled By: ltamasi

fbshipit-source-id: af64816c357d0829b9d7ba8ca1477038138f6f0a
2021-07-06 10:14:32 -07:00
Levi Tamasi d83542ca83 Make it possible to apply only a subrange of table property collectors (#8298)
Summary:
This patch does two things:
1) Introduces some aliases in order to eliminate/prevent long-winded type names
w/r/t the internal table property collectors (see e.g.
`std::vector<std::unique_ptr<IntTblPropCollectorFactory>>`).
2) Makes it possible to apply only a subrange of table property collectors during
table building by turning `TableBuilderOptions::int_tbl_prop_collector_factories`
from a pointer to a `vector` into a range (i.e. a pair of iterators).

Rationale: I plan to introduce a BlobDB related table property collector, which
should only be applied during table creation if blob storage is enabled at the moment
(which can be changed dynamically). This change will make it possible to include/
exclude the BlobDB related collector as needed without having to introduce
a second `vector` of collectors in `ColumnFamilyData` with pretty much the same
contents.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8298

Test Plan: `make check`

Reviewed By: jay-zhuang

Differential Revision: D28430910

Pulled By: ltamasi

fbshipit-source-id: a81d28f2c59495865300f43deb2257d2e6977c8e
2021-05-17 18:28:39 -07:00
mrambacher 8948dc8524 Make ImmutableOptions struct that inherits from ImmutableCFOptions and ImmutableDBOptions (#8262)
Summary:
The ImmutableCFOptions contained a bunch of fields that belonged to the ImmutableDBOptions.  This change cleans that up by introducing an ImmutableOptions struct.  Following the pattern of Options struct, this class inherits from the DB and CFOption structs (of the Immutable form).

Only one structural change (the ImmutableCFOptions::fs was changed to a shared_ptr from a raw one) is in this PR.  All of the other changes involve moving the member variables from the ImmutableCFOptions into the ImmutableOptions and changing member variables or function parameters as required for compilation purposes.

Follow-on PRs may do a further clean-up of the code, such as renaming variables (such as "ImmutableOptions cf_options") and potentially eliminating un-needed function parameters (there is no longer a need to pass both an ImmutableDBOptions and an ImmutableOptions to a function).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8262

Reviewed By: pdillinger

Differential Revision: D28226540

Pulled By: mrambacher

fbshipit-source-id: 18ae71eadc879dedbe38b1eb8e6f9ff5c7147dbf
2021-05-05 14:00:17 -07:00
Peter Dillinger 85becd94c1 Refactor: use TableBuilderOptions to reduce parameter lists (#8240)
Summary:
Greatly reduced the not-quite-copy-paste giant parameter lists
of rocksdb::NewTableBuilder, rocksdb::BuildTable,
BlockBasedTableBuilder::Rep ctor, and BlockBasedTableBuilder ctor.

Moved weird separate parameter `uint32_t column_family_id` of
TableFactory::NewTableBuilder into TableBuilderOptions.

Re-ordered parameters to TableBuilderOptions ctor, so that `uint64_t
target_file_size` is not randomly placed between uint64_t timestamps
(was easy to mix up).

Replaced a couple of fields of BlockBasedTableBuilder::Rep with a
FilterBuildingContext. The motivation for this change is making it
easier to pass along more data into new fields in FilterBuildingContext
(follow-up PR).

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8240

Test Plan: ASAN make check

Reviewed By: mrambacher

Differential Revision: D28075891

Pulled By: pdillinger

fbshipit-source-id: fddb3dbb8260a0e8bdcbb51b877ebabf9a690d4f
2021-04-29 07:00:50 -07:00
mrambacher 0ca6d6297f Rename variables in ImmutableCFOptions to avoid conflicts with ImmutableDBOptions (#8227)
Summary:
Renaming ImmutableCFOptions::info_log and statistics to logger and stats.  This is stage 2 in creating an ImmutableOptions class.  It is necessary because the names match those in ImmutableOptions and have different types.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8227

Reviewed By: jay-zhuang

Differential Revision: D28000967

Pulled By: mrambacher

fbshipit-source-id: 3bf2aa04e8f1e8724d825b7deacf41080c14420b
2021-04-26 12:43:45 -07:00
mrambacher 6bab3a34e9 Move RegisterOptions into the Configurable API (#8223)
Summary:
As previously coded, a Configurable extension would need access to code not in the public API.  This change moves RegisterOptions into the Configurable class and therefore available to public extensions.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/8223

Reviewed By: anand1976

Differential Revision: D27960188

Pulled By: mrambacher

fbshipit-source-id: ac88b19397183df633902def5b5701b9b65fbf40
2021-04-26 03:13:24 -07:00
mrambacher c442f6809f Create a Customizable class to load classes and configurations (#6590)
Summary:
The Customizable class is an extension of the Configurable class and allows instances to be created by a name/ID.  Classes that extend customizable can define their Type (e.g. "TableFactory", "Cache") and  a method to instantiate them (TableFactory::CreateFromString).  Customizable objects can be registered with the ObjectRegistry and created dynamically.

Future PRs will make more types of objects extend Customizable.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/6590

Reviewed By: cheng-chang

Differential Revision: D24841553

Pulled By: zhichao-cao

fbshipit-source-id: d0c2132bd932e971cbfe2c908ca2e5db30c5e155
2020-11-11 15:10:41 -08:00
Ramkumar Vadivelu 9a690a74e1 In ParseInternalKey(), include corrupt key info in Status (#7515)
Summary:
Fixes Issue https://github.com/facebook/rocksdb/issues/7497

When allow_data_in_errors db_options is set, log error key details in `ParseInternalKey()`

Have fixed most of the calls. Have few TODOs still pending - because have to make more deeper changes to pass in the allow_data_in_errors flag. Will do those in a separate PR later.

Tests:
- make check
- some of the existing tests that exercise the "internal key too small" condition are: dbformat_test, cuckoo_table_builder_test
- some of the existing tests that exercise the corrupted key path are: corruption_test, merge_helper_test, compaction_iterator_test

Example of new status returns:
- Key too small - `Corrupted Key: Internal Key too small. Size=5`
- Corrupt key with allow_data_in_errors option set to false: `Corrupted Key: '<redacted>' seq:3, type:3`
- Corrupt key with allow_data_in_errors option set to true: `Corrupted Key: '61' seq:3, type:3`

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7515

Reviewed By: ajkr

Differential Revision: D24240264

Pulled By: ramvadiv

fbshipit-source-id: bc48f5d4475ac19d7713e16df37505b31aac42e7
2020-10-28 10:12:58 -07:00
mrambacher 1eda625eab Revert Statuses returned from pre-Configurable options functions (#7563)
Summary:
Further refinement of the earlier PR.  Now the Status is NotFound with a subcode of PathNotFound. Also the existing functions for options parsing/loading are reverted to return InvalidArgument no matter in which way the user-provided arguments are deemed invalid.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7563

Reviewed By: zhichao-cao

Differential Revision: D24422491

Pulled By: ajkr

fbshipit-source-id: ba6b237cd0584d3f925c5ba0d349aeb8c250af67
2020-10-20 11:53:28 -07:00
anand76 00751e4292 Add a host location property to TableProperties (#7479)
Summary:
This PR adds support for writing a location identifier of the DB host to SST files as a table property. By default, the hostname is used, but can be overridden by the user. There have been some recent corruptions in files written by ```SstFileWriter``` before checksumming, so this property can be used to trace it back to the writing host and checking the host for hardware isues.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7479

Test Plan: Add new unit tests

Reviewed By: pdillinger

Differential Revision: D24340671

Pulled By: anand1976

fbshipit-source-id: 2038949fd8d160c0633ccb4f9da77740f19fa2a2
2020-10-19 11:38:48 -07:00
Zhichao Cao 16bff5370d Add plain_table_db_test to ASSERT_STATUS_CHECKED list (#7482)
Summary:
Add plain_table_db_test to ASSERT_STATUS_CHECKED list

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7482

Test Plan: ASSERT_STATUS_CHECKED=1 make -j48 plain_table_db_test

Reviewed By: riversand963

Differential Revision: D24034987

Pulled By: zhichao-cao

fbshipit-source-id: e61c937d55ded0947cc8936937362dafed572a60
2020-10-13 12:00:09 -07:00
Ramkumar Vadivelu e04a50923d Change ParseInternalKey() to return Status instead of bool (#7457)
Summary:
Fixes https://github.com/facebook/rocksdb/issues/7430

Change ParseInternalKey() to return Status instead of bool.

db_bench (seekrandom) based before/after results with value size of 100 bytes and 16 bytes can be found at (tests ran on an udb server):
https://www.dropbox.com/s/47bwamdy5ozngph/PIK_ret_Status_results.xlsx?dl=0

![db_bench_results](https://user-images.githubusercontent.com/62277872/94642825-2a21a800-029a-11eb-88f2-124136c83fd3.png)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7457

Reviewed By: ajkr

Differential Revision: D24002433

Pulled By: ramvadiv

fbshipit-source-id: ac253ecf577a29044c47c3fe254a01e71404c44c
2020-09-30 19:16:47 -07:00
sdong d08a9005b7 Make db_basic_test pass assert status checked (#7452)
Summary:
Add db_basic_test status check list. Some of the warnings are suppressed. It is possible that some of them are due to real bugs.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/7452

Test Plan: See CI tests pass.

Reviewed By: zhichao-cao

Differential Revision: D23979764

fbshipit-source-id: 6151570c2a9b931b0fbb3fe939a94b2bd1583cbe
2020-09-29 09:49:04 -07:00
yaphet 323a834d1d Fix typo (#7353)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/7353

Reviewed By: ajkr

Differential Revision: D23585525

Pulled By: zhichao-cao

fbshipit-source-id: 3f72d9663ec207b9dfe6b287d455a13d3c86879f
2020-09-19 18:11:16 -07:00
mrambacher 7d472accdc Bring the Configurable options together (#5753)
Summary:
This PR merges the functionality of making the ColumnFamilyOptions, TableFactory, and DBOptions into Configurable into a single PR, resolving any merge conflicts

Pull Request resolved: https://github.com/facebook/rocksdb/pull/5753

Reviewed By: ajkr

Differential Revision: D23385030

Pulled By: zhichao-cao

fbshipit-source-id: 8b977a7731556230b9b8c5a081b98e49ee4f160a
2020-09-14 17:01:01 -07:00
rockeet b649d8cb97 Fixed Factory construct just for calling .Name() (#7080)
Summary: Pull Request resolved: https://github.com/facebook/rocksdb/pull/7080

Reviewed By: riversand963

Differential Revision: D22412352

Pulled By: ajkr

fbshipit-source-id: 1d7f4c1621040a0130245139b52c3f4d3deac865
2020-07-08 11:54:00 -07:00
Anand Ananthabhotla 9a5886bd8c Extend Get/MultiGet deadline support to table open (#6982)
Summary:
Current implementation of the ```read_options.deadline``` option only checks the deadline for random file reads during point lookups. This PR extends the checks to file opens, prefetches and preloads as part of table open.

The main changes are in the ```BlockBasedTable```, partitioned index and filter readers, and ```TableCache``` to take ReadOptions as an additional parameter. In ```BlockBasedTable::Open```, in order to retain existing behavior w.r.t checksum verification and block cache usage, we filter out most of the options in ```ReadOptions``` except ```deadline```. However, having the ```ReadOptions``` gives us more flexibility to honor other options like verify_checksums, fill_cache etc. in the future.

Additional changes in callsites due to function signature changes in ```NewTableReader()``` and ```FilePrefetchBuffer```.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6982

Test Plan: Add new unit tests in db_basic_test

Reviewed By: riversand963

Differential Revision: D22219515

Pulled By: anand1976

fbshipit-source-id: 8a3b92f4a889808013838603aa3ca35229cd501b
2020-06-29 14:53:17 -07:00
sdong f9817201af Add unity build to CircleCI (#7026)
Summary:
We are still keeping unity build working. So it's a good idea to add to a pre-commit CI.
A latest GCC docker image just to get a little bit more coverage. Fix three small issues to make it pass.
Also make unity_test to run db_basic_test rather than db_test to cut the test time. There is no point to run expensive tests here. It was set to run db_test before db_basic_test was separated out.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7026

Test Plan: watch tests to pass.

Reviewed By: zhichao-cao

Differential Revision: D22223197

fbshipit-source-id: baa3b6cbb623bf359829b63ce35715c75bcb0ed4
2020-06-26 11:14:08 -07:00
Zitan Chen 94d04529de Store DB identity and DB session ID in SST files (#6983)
Summary:
`db_id` and `db_session_id` are now part of the table properties for all formats and stored in SST files. This adds about 99 bytes to each new SST file.

The `TablePropertiesNames` for these two identifiers are `rocksdb.creating.db.identity` and `rocksdb.creating.session.identity`.

In addition, SST files generated from SstFileWriter and Repairer have DB identity “SST Writer” and “DB Repairer”, respectively. Their DB session IDs are generated in the same way as `DB::GetDbSessionId`.

A table property test is added.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6983

Test Plan: make check and some manual tests.

Reviewed By: zhichao-cao

Differential Revision: D22048826

Pulled By: gg814

fbshipit-source-id: afdf8c11424a6f509b5c0b06dafad584a80103c9
2020-06-17 10:57:40 -07:00
Yanqin Jin 3020df9df5 Remove unnecessary inclusion of version_edit.h in env (#6952)
Summary:
In db_options.c, we should avoid including header files in the `db` directory to avoid introducing unnecessary dependency. The reason why `version_edit.h` has been included in `db_options.cc` is because we need two constants, `kUnknownChecksum` and `kUnknownChecksumFuncName`. We can put these two constants as `constexpr` in the public header `file_checksum.h`.

Test plan (devserver):
make check
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6952

Reviewed By: zhichao-cao

Differential Revision: D21925341

Pulled By: riversand963

fbshipit-source-id: 2902f3b74c97f0cf16c58ad24c095c787c3a40e2
2020-06-07 21:56:55 -07:00
mrambacher 38be686160 Add Struct Type to OptionsTypeInfo (#6425)
Summary:
Added code for generically handing structs to OptionTypeInfo.  A struct is a collection of variables handled by their own map of OptionTypeInfos.  Examples of structs include Compaction and Cache options.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/6425

Reviewed By: siying

Differential Revision: D21668789

Pulled By: zhichao-cao

fbshipit-source-id: 064b110de39dadf82361ed4663f7ac1a535b0b07
2020-05-21 10:58:39 -07:00