rocksdb/util/bloom_test.cc

649 lines
18 KiB
C++
Raw Normal View History

// Copyright (c) 2011-present, Facebook, Inc. All rights reserved.
// This source code is licensed under both the GPLv2 (found in the
// COPYING file in the root directory) and Apache 2.0 License
// (found in the LICENSE.Apache file in the root directory).
//
// Copyright (c) 2012 The LevelDB Authors. All rights reserved.
// Use of this source code is governed by a BSD-style license that can be
// found in the LICENSE file. See the AUTHORS file for names of contributors.
2014-05-09 15:34:18 +00:00
#ifndef GFLAGS
#include <cstdio>
int main() {
fprintf(stderr, "Please install gflags to run this test... Skipping...\n");
return 0;
2014-05-09 15:34:18 +00:00
}
#else
#include <array>
Implement full filter for block based table. Summary: 1. Make filter_block.h a base class. Derive block_based_filter_block and full_filter_block. The previous one is the traditional filter block. The full_filter_block is newly added. It would generate a filter block that contain all the keys in SST file. 2. When querying a key, table would first check if full_filter is available. If not, it would go to the exact data block and check using block_based filter. 3. User could choose to use full_filter or tradional(block_based_filter). They would be stored in SST file with different meta index name. "filter.filter_policy" or "full_filter.filter_policy". Then, Table reader is able to know the fllter block type. 4. Some optimizations have been done for full_filter_block, thus it requires a different interface compared to the original one in filter_policy.h. 5. Actual implementation of filter bits coding/decoding is placed in util/bloom_impl.cc Benchmark: base commit 1d23b5c470844c1208301311f0889eca750431c0 Command: db_bench --db=/dev/shm/rocksdb --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --write_buffer_size=134217728 --max_write_buffer_number=2 --target_file_size_base=33554432 --max_bytes_for_level_base=1073741824 --verify_checksum=false --max_background_compactions=4 --use_plain_table=0 --memtablerep=prefix_hash --open_files=-1 --mmap_read=1 --mmap_write=0 --bloom_bits=10 --bloom_locality=1 --memtable_bloom_bits=500000 --compression_type=lz4 --num=393216000 --use_hash_search=1 --block_size=1024 --block_restart_interval=16 --use_existing_db=1 --threads=1 --benchmarks=readrandom —disable_auto_compactions=1 Read QPS increase for about 30% from 2230002 to 2991411. Test Plan: make all check valgrind db_test db_stress --use_block_based_filter = 0 ./auto_sanity_test.sh Reviewers: igor, yhchiang, ljin, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D20979
2014-09-08 17:37:05 +00:00
#include <vector>
#include "logging/logging.h"
#include "memory/arena.h"
#include "rocksdb/filter_policy.h"
#include "table/block_based/filter_policy_internal.h"
#include "test_util/testharness.h"
#include "test_util/testutil.h"
#include "util/gflags_compat.h"
#include "util/hash.h"
using GFLAGS_NAMESPACE::ParseCommandLineFlags;
2014-05-09 15:34:18 +00:00
DEFINE_int32(bits_per_key, 10, "");
namespace rocksdb {
static const int kVerbose = 1;
static Slice Key(int i, char* buffer) {
std::string s;
PutFixed32(&s, static_cast<uint32_t>(i));
memcpy(buffer, s.c_str(), sizeof(i));
return Slice(buffer, sizeof(i));
}
Implement full filter for block based table. Summary: 1. Make filter_block.h a base class. Derive block_based_filter_block and full_filter_block. The previous one is the traditional filter block. The full_filter_block is newly added. It would generate a filter block that contain all the keys in SST file. 2. When querying a key, table would first check if full_filter is available. If not, it would go to the exact data block and check using block_based filter. 3. User could choose to use full_filter or tradional(block_based_filter). They would be stored in SST file with different meta index name. "filter.filter_policy" or "full_filter.filter_policy". Then, Table reader is able to know the fllter block type. 4. Some optimizations have been done for full_filter_block, thus it requires a different interface compared to the original one in filter_policy.h. 5. Actual implementation of filter bits coding/decoding is placed in util/bloom_impl.cc Benchmark: base commit 1d23b5c470844c1208301311f0889eca750431c0 Command: db_bench --db=/dev/shm/rocksdb --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --write_buffer_size=134217728 --max_write_buffer_number=2 --target_file_size_base=33554432 --max_bytes_for_level_base=1073741824 --verify_checksum=false --max_background_compactions=4 --use_plain_table=0 --memtablerep=prefix_hash --open_files=-1 --mmap_read=1 --mmap_write=0 --bloom_bits=10 --bloom_locality=1 --memtable_bloom_bits=500000 --compression_type=lz4 --num=393216000 --use_hash_search=1 --block_size=1024 --block_restart_interval=16 --use_existing_db=1 --threads=1 --benchmarks=readrandom —disable_auto_compactions=1 Read QPS increase for about 30% from 2230002 to 2991411. Test Plan: make all check valgrind db_test db_stress --use_block_based_filter = 0 ./auto_sanity_test.sh Reviewers: igor, yhchiang, ljin, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D20979
2014-09-08 17:37:05 +00:00
static int NextLength(int length) {
if (length < 10) {
length += 1;
} else if (length < 100) {
length += 10;
} else if (length < 1000) {
length += 100;
} else {
length += 1000;
}
return length;
}
rocksdb: switch to gtest Summary: Our existing test notation is very similar to what is used in gtest. It makes it easy to adopt what is different. In this diff I modify existing [[ https://code.google.com/p/googletest/wiki/Primer#Test_Fixtures:_Using_the_Same_Data_Configuration_for_Multiple_Te | test fixture ]] classes to inherit from `testing::Test`. Also for unit tests that use fixture class, `TEST` is replaced with `TEST_F` as required in gtest. There are several custom `main` functions in our existing tests. To make this transition easier, I modify all `main` functions to fallow gtest notation. But eventually we can remove them and use implementation of `main` that gtest provides. ```lang=bash % cat ~/transform #!/bin/sh files=$(git ls-files '*test\.cc') for file in $files do if grep -q "rocksdb::test::RunAllTests()" $file then if grep -Eq '^class \w+Test {' $file then perl -pi -e 's/^(class \w+Test) {/${1}: public testing::Test {/g' $file perl -pi -e 's/^(TEST)/${1}_F/g' $file fi perl -pi -e 's/(int main.*\{)/${1}::testing::InitGoogleTest(&argc, argv);/g' $file perl -pi -e 's/rocksdb::test::RunAllTests/RUN_ALL_TESTS/g' $file fi done % sh ~/transform % make format ``` Second iteration of this diff contains only scripted changes. Third iteration contains manual changes to fix last errors and make it compilable. Test Plan: Build and notice no errors. ```lang=bash % USE_CLANG=1 make check -j55 ``` Tests are still testing. Reviewers: meyering, sdong, rven, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35157
2015-03-17 21:08:00 +00:00
class BloomTest : public testing::Test {
private:
std::unique_ptr<const FilterPolicy> policy_;
std::string filter_;
std::vector<std::string> keys_;
public:
Implement full filter for block based table. Summary: 1. Make filter_block.h a base class. Derive block_based_filter_block and full_filter_block. The previous one is the traditional filter block. The full_filter_block is newly added. It would generate a filter block that contain all the keys in SST file. 2. When querying a key, table would first check if full_filter is available. If not, it would go to the exact data block and check using block_based filter. 3. User could choose to use full_filter or tradional(block_based_filter). They would be stored in SST file with different meta index name. "filter.filter_policy" or "full_filter.filter_policy". Then, Table reader is able to know the fllter block type. 4. Some optimizations have been done for full_filter_block, thus it requires a different interface compared to the original one in filter_policy.h. 5. Actual implementation of filter bits coding/decoding is placed in util/bloom_impl.cc Benchmark: base commit 1d23b5c470844c1208301311f0889eca750431c0 Command: db_bench --db=/dev/shm/rocksdb --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --write_buffer_size=134217728 --max_write_buffer_number=2 --target_file_size_base=33554432 --max_bytes_for_level_base=1073741824 --verify_checksum=false --max_background_compactions=4 --use_plain_table=0 --memtablerep=prefix_hash --open_files=-1 --mmap_read=1 --mmap_write=0 --bloom_bits=10 --bloom_locality=1 --memtable_bloom_bits=500000 --compression_type=lz4 --num=393216000 --use_hash_search=1 --block_size=1024 --block_restart_interval=16 --use_existing_db=1 --threads=1 --benchmarks=readrandom —disable_auto_compactions=1 Read QPS increase for about 30% from 2230002 to 2991411. Test Plan: make all check valgrind db_test db_stress --use_block_based_filter = 0 ./auto_sanity_test.sh Reviewers: igor, yhchiang, ljin, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D20979
2014-09-08 17:37:05 +00:00
BloomTest() : policy_(
NewBloomFilterPolicy(FLAGS_bits_per_key)) {}
void Reset() {
keys_.clear();
filter_.clear();
}
void ResetPolicy(const FilterPolicy* policy = nullptr) {
if (policy == nullptr) {
policy_.reset(NewBloomFilterPolicy(FLAGS_bits_per_key));
} else {
policy_.reset(policy);
}
Reset();
}
void Add(const Slice& s) {
keys_.push_back(s.ToString());
}
void Build() {
std::vector<Slice> key_slices;
for (size_t i = 0; i < keys_.size(); i++) {
key_slices.push_back(Slice(keys_[i]));
}
filter_.clear();
policy_->CreateFilter(&key_slices[0], static_cast<int>(key_slices.size()),
&filter_);
keys_.clear();
if (kVerbose >= 2) DumpFilter();
}
size_t FilterSize() const {
return filter_.size();
}
Slice FilterData() const { return Slice(filter_); }
void DumpFilter() {
fprintf(stderr, "F(");
for (size_t i = 0; i+1 < filter_.size(); i++) {
const unsigned int c = static_cast<unsigned int>(filter_[i]);
for (int j = 0; j < 8; j++) {
fprintf(stderr, "%c", (c & (1 <<j)) ? '1' : '.');
}
}
fprintf(stderr, ")\n");
}
bool Matches(const Slice& s) {
if (!keys_.empty()) {
Build();
}
return policy_->KeyMayMatch(s, filter_);
}
double FalsePositiveRate() {
char buffer[sizeof(int)];
int result = 0;
for (int i = 0; i < 10000; i++) {
if (Matches(Key(i + 1000000000, buffer))) {
result++;
}
}
return result / 10000.0;
}
};
rocksdb: switch to gtest Summary: Our existing test notation is very similar to what is used in gtest. It makes it easy to adopt what is different. In this diff I modify existing [[ https://code.google.com/p/googletest/wiki/Primer#Test_Fixtures:_Using_the_Same_Data_Configuration_for_Multiple_Te | test fixture ]] classes to inherit from `testing::Test`. Also for unit tests that use fixture class, `TEST` is replaced with `TEST_F` as required in gtest. There are several custom `main` functions in our existing tests. To make this transition easier, I modify all `main` functions to fallow gtest notation. But eventually we can remove them and use implementation of `main` that gtest provides. ```lang=bash % cat ~/transform #!/bin/sh files=$(git ls-files '*test\.cc') for file in $files do if grep -q "rocksdb::test::RunAllTests()" $file then if grep -Eq '^class \w+Test {' $file then perl -pi -e 's/^(class \w+Test) {/${1}: public testing::Test {/g' $file perl -pi -e 's/^(TEST)/${1}_F/g' $file fi perl -pi -e 's/(int main.*\{)/${1}::testing::InitGoogleTest(&argc, argv);/g' $file perl -pi -e 's/rocksdb::test::RunAllTests/RUN_ALL_TESTS/g' $file fi done % sh ~/transform % make format ``` Second iteration of this diff contains only scripted changes. Third iteration contains manual changes to fix last errors and make it compilable. Test Plan: Build and notice no errors. ```lang=bash % USE_CLANG=1 make check -j55 ``` Tests are still testing. Reviewers: meyering, sdong, rven, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35157
2015-03-17 21:08:00 +00:00
TEST_F(BloomTest, EmptyFilter) {
ASSERT_TRUE(! Matches("hello"));
ASSERT_TRUE(! Matches("world"));
}
rocksdb: switch to gtest Summary: Our existing test notation is very similar to what is used in gtest. It makes it easy to adopt what is different. In this diff I modify existing [[ https://code.google.com/p/googletest/wiki/Primer#Test_Fixtures:_Using_the_Same_Data_Configuration_for_Multiple_Te | test fixture ]] classes to inherit from `testing::Test`. Also for unit tests that use fixture class, `TEST` is replaced with `TEST_F` as required in gtest. There are several custom `main` functions in our existing tests. To make this transition easier, I modify all `main` functions to fallow gtest notation. But eventually we can remove them and use implementation of `main` that gtest provides. ```lang=bash % cat ~/transform #!/bin/sh files=$(git ls-files '*test\.cc') for file in $files do if grep -q "rocksdb::test::RunAllTests()" $file then if grep -Eq '^class \w+Test {' $file then perl -pi -e 's/^(class \w+Test) {/${1}: public testing::Test {/g' $file perl -pi -e 's/^(TEST)/${1}_F/g' $file fi perl -pi -e 's/(int main.*\{)/${1}::testing::InitGoogleTest(&argc, argv);/g' $file perl -pi -e 's/rocksdb::test::RunAllTests/RUN_ALL_TESTS/g' $file fi done % sh ~/transform % make format ``` Second iteration of this diff contains only scripted changes. Third iteration contains manual changes to fix last errors and make it compilable. Test Plan: Build and notice no errors. ```lang=bash % USE_CLANG=1 make check -j55 ``` Tests are still testing. Reviewers: meyering, sdong, rven, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35157
2015-03-17 21:08:00 +00:00
TEST_F(BloomTest, Small) {
Add("hello");
Add("world");
ASSERT_TRUE(Matches("hello"));
ASSERT_TRUE(Matches("world"));
ASSERT_TRUE(! Matches("x"));
ASSERT_TRUE(! Matches("foo"));
}
rocksdb: switch to gtest Summary: Our existing test notation is very similar to what is used in gtest. It makes it easy to adopt what is different. In this diff I modify existing [[ https://code.google.com/p/googletest/wiki/Primer#Test_Fixtures:_Using_the_Same_Data_Configuration_for_Multiple_Te | test fixture ]] classes to inherit from `testing::Test`. Also for unit tests that use fixture class, `TEST` is replaced with `TEST_F` as required in gtest. There are several custom `main` functions in our existing tests. To make this transition easier, I modify all `main` functions to fallow gtest notation. But eventually we can remove them and use implementation of `main` that gtest provides. ```lang=bash % cat ~/transform #!/bin/sh files=$(git ls-files '*test\.cc') for file in $files do if grep -q "rocksdb::test::RunAllTests()" $file then if grep -Eq '^class \w+Test {' $file then perl -pi -e 's/^(class \w+Test) {/${1}: public testing::Test {/g' $file perl -pi -e 's/^(TEST)/${1}_F/g' $file fi perl -pi -e 's/(int main.*\{)/${1}::testing::InitGoogleTest(&argc, argv);/g' $file perl -pi -e 's/rocksdb::test::RunAllTests/RUN_ALL_TESTS/g' $file fi done % sh ~/transform % make format ``` Second iteration of this diff contains only scripted changes. Third iteration contains manual changes to fix last errors and make it compilable. Test Plan: Build and notice no errors. ```lang=bash % USE_CLANG=1 make check -j55 ``` Tests are still testing. Reviewers: meyering, sdong, rven, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35157
2015-03-17 21:08:00 +00:00
TEST_F(BloomTest, VaryingLengths) {
char buffer[sizeof(int)];
// Count number of filters that significantly exceed the false positive rate
int mediocre_filters = 0;
int good_filters = 0;
for (int length = 1; length <= 10000; length = NextLength(length)) {
Reset();
for (int i = 0; i < length; i++) {
Add(Key(i, buffer));
}
Build();
ASSERT_LE(FilterSize(), (size_t)((length * 10 / 8) + 40)) << length;
// All added keys must match
for (int i = 0; i < length; i++) {
ASSERT_TRUE(Matches(Key(i, buffer)))
<< "Length " << length << "; key " << i;
}
// Check false positive rate
double rate = FalsePositiveRate();
if (kVerbose >= 1) {
fprintf(stderr, "False positives: %5.2f%% @ length = %6d ; bytes = %6d\n",
rate*100.0, length, static_cast<int>(FilterSize()));
}
ASSERT_LE(rate, 0.02); // Must not be over 2%
if (rate > 0.0125) mediocre_filters++; // Allowed, but not too often
else good_filters++;
}
if (kVerbose >= 1) {
fprintf(stderr, "Filters: %d good, %d mediocre\n",
good_filters, mediocre_filters);
}
ASSERT_LE(mediocre_filters, good_filters/5);
}
// Ensure the implementation doesn't accidentally change in an
// incompatible way
TEST_F(BloomTest, Schema) {
char buffer[sizeof(int)];
ResetPolicy(NewBloomFilterPolicy(8)); // num_probes = 5
for (int key = 0; key < 87; key++) {
Add(Key(key, buffer));
}
Build();
ASSERT_EQ(BloomHash(FilterData()), 3589896109U);
ResetPolicy(NewBloomFilterPolicy(9)); // num_probes = 6
for (int key = 0; key < 87; key++) {
Add(Key(key, buffer));
}
Build();
ASSERT_EQ(BloomHash(FilterData()), 969445585);
ResetPolicy(NewBloomFilterPolicy(11)); // num_probes = 7
for (int key = 0; key < 87; key++) {
Add(Key(key, buffer));
}
Build();
ASSERT_EQ(BloomHash(FilterData()), 1694458207);
ResetPolicy(NewBloomFilterPolicy(10)); // num_probes = 6
for (int key = 0; key < 87; key++) {
Add(Key(key, buffer));
}
Build();
ASSERT_EQ(BloomHash(FilterData()), 2373646410U);
ResetPolicy(NewBloomFilterPolicy(10));
for (int key = 1; key < 87; key++) {
Add(Key(key, buffer));
}
Build();
ASSERT_EQ(BloomHash(FilterData()), 1908442116);
ResetPolicy(NewBloomFilterPolicy(10));
for (int key = 1; key < 88; key++) {
Add(Key(key, buffer));
}
Build();
ASSERT_EQ(BloomHash(FilterData()), 3057004015U);
ResetPolicy();
}
// Different bits-per-byte
rocksdb: switch to gtest Summary: Our existing test notation is very similar to what is used in gtest. It makes it easy to adopt what is different. In this diff I modify existing [[ https://code.google.com/p/googletest/wiki/Primer#Test_Fixtures:_Using_the_Same_Data_Configuration_for_Multiple_Te | test fixture ]] classes to inherit from `testing::Test`. Also for unit tests that use fixture class, `TEST` is replaced with `TEST_F` as required in gtest. There are several custom `main` functions in our existing tests. To make this transition easier, I modify all `main` functions to fallow gtest notation. But eventually we can remove them and use implementation of `main` that gtest provides. ```lang=bash % cat ~/transform #!/bin/sh files=$(git ls-files '*test\.cc') for file in $files do if grep -q "rocksdb::test::RunAllTests()" $file then if grep -Eq '^class \w+Test {' $file then perl -pi -e 's/^(class \w+Test) {/${1}: public testing::Test {/g' $file perl -pi -e 's/^(TEST)/${1}_F/g' $file fi perl -pi -e 's/(int main.*\{)/${1}::testing::InitGoogleTest(&argc, argv);/g' $file perl -pi -e 's/rocksdb::test::RunAllTests/RUN_ALL_TESTS/g' $file fi done % sh ~/transform % make format ``` Second iteration of this diff contains only scripted changes. Third iteration contains manual changes to fix last errors and make it compilable. Test Plan: Build and notice no errors. ```lang=bash % USE_CLANG=1 make check -j55 ``` Tests are still testing. Reviewers: meyering, sdong, rven, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35157
2015-03-17 21:08:00 +00:00
class FullBloomTest : public testing::Test {
Implement full filter for block based table. Summary: 1. Make filter_block.h a base class. Derive block_based_filter_block and full_filter_block. The previous one is the traditional filter block. The full_filter_block is newly added. It would generate a filter block that contain all the keys in SST file. 2. When querying a key, table would first check if full_filter is available. If not, it would go to the exact data block and check using block_based filter. 3. User could choose to use full_filter or tradional(block_based_filter). They would be stored in SST file with different meta index name. "filter.filter_policy" or "full_filter.filter_policy". Then, Table reader is able to know the fllter block type. 4. Some optimizations have been done for full_filter_block, thus it requires a different interface compared to the original one in filter_policy.h. 5. Actual implementation of filter bits coding/decoding is placed in util/bloom_impl.cc Benchmark: base commit 1d23b5c470844c1208301311f0889eca750431c0 Command: db_bench --db=/dev/shm/rocksdb --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --write_buffer_size=134217728 --max_write_buffer_number=2 --target_file_size_base=33554432 --max_bytes_for_level_base=1073741824 --verify_checksum=false --max_background_compactions=4 --use_plain_table=0 --memtablerep=prefix_hash --open_files=-1 --mmap_read=1 --mmap_write=0 --bloom_bits=10 --bloom_locality=1 --memtable_bloom_bits=500000 --compression_type=lz4 --num=393216000 --use_hash_search=1 --block_size=1024 --block_restart_interval=16 --use_existing_db=1 --threads=1 --benchmarks=readrandom —disable_auto_compactions=1 Read QPS increase for about 30% from 2230002 to 2991411. Test Plan: make all check valgrind db_test db_stress --use_block_based_filter = 0 ./auto_sanity_test.sh Reviewers: igor, yhchiang, ljin, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D20979
2014-09-08 17:37:05 +00:00
private:
std::unique_ptr<const FilterPolicy> policy_;
Implement full filter for block based table. Summary: 1. Make filter_block.h a base class. Derive block_based_filter_block and full_filter_block. The previous one is the traditional filter block. The full_filter_block is newly added. It would generate a filter block that contain all the keys in SST file. 2. When querying a key, table would first check if full_filter is available. If not, it would go to the exact data block and check using block_based filter. 3. User could choose to use full_filter or tradional(block_based_filter). They would be stored in SST file with different meta index name. "filter.filter_policy" or "full_filter.filter_policy". Then, Table reader is able to know the fllter block type. 4. Some optimizations have been done for full_filter_block, thus it requires a different interface compared to the original one in filter_policy.h. 5. Actual implementation of filter bits coding/decoding is placed in util/bloom_impl.cc Benchmark: base commit 1d23b5c470844c1208301311f0889eca750431c0 Command: db_bench --db=/dev/shm/rocksdb --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --write_buffer_size=134217728 --max_write_buffer_number=2 --target_file_size_base=33554432 --max_bytes_for_level_base=1073741824 --verify_checksum=false --max_background_compactions=4 --use_plain_table=0 --memtablerep=prefix_hash --open_files=-1 --mmap_read=1 --mmap_write=0 --bloom_bits=10 --bloom_locality=1 --memtable_bloom_bits=500000 --compression_type=lz4 --num=393216000 --use_hash_search=1 --block_size=1024 --block_restart_interval=16 --use_existing_db=1 --threads=1 --benchmarks=readrandom —disable_auto_compactions=1 Read QPS increase for about 30% from 2230002 to 2991411. Test Plan: make all check valgrind db_test db_stress --use_block_based_filter = 0 ./auto_sanity_test.sh Reviewers: igor, yhchiang, ljin, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D20979
2014-09-08 17:37:05 +00:00
std::unique_ptr<FilterBitsBuilder> bits_builder_;
std::unique_ptr<FilterBitsReader> bits_reader_;
std::unique_ptr<const char[]> buf_;
size_t filter_size_;
public:
FullBloomTest() :
policy_(NewBloomFilterPolicy(FLAGS_bits_per_key, false)),
filter_size_(0) {
Reset();
}
FullFilterBitsBuilder* GetFullFilterBitsBuilder() {
return dynamic_cast<FullFilterBitsBuilder*>(bits_builder_.get());
}
Implement full filter for block based table. Summary: 1. Make filter_block.h a base class. Derive block_based_filter_block and full_filter_block. The previous one is the traditional filter block. The full_filter_block is newly added. It would generate a filter block that contain all the keys in SST file. 2. When querying a key, table would first check if full_filter is available. If not, it would go to the exact data block and check using block_based filter. 3. User could choose to use full_filter or tradional(block_based_filter). They would be stored in SST file with different meta index name. "filter.filter_policy" or "full_filter.filter_policy". Then, Table reader is able to know the fllter block type. 4. Some optimizations have been done for full_filter_block, thus it requires a different interface compared to the original one in filter_policy.h. 5. Actual implementation of filter bits coding/decoding is placed in util/bloom_impl.cc Benchmark: base commit 1d23b5c470844c1208301311f0889eca750431c0 Command: db_bench --db=/dev/shm/rocksdb --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --write_buffer_size=134217728 --max_write_buffer_number=2 --target_file_size_base=33554432 --max_bytes_for_level_base=1073741824 --verify_checksum=false --max_background_compactions=4 --use_plain_table=0 --memtablerep=prefix_hash --open_files=-1 --mmap_read=1 --mmap_write=0 --bloom_bits=10 --bloom_locality=1 --memtable_bloom_bits=500000 --compression_type=lz4 --num=393216000 --use_hash_search=1 --block_size=1024 --block_restart_interval=16 --use_existing_db=1 --threads=1 --benchmarks=readrandom —disable_auto_compactions=1 Read QPS increase for about 30% from 2230002 to 2991411. Test Plan: make all check valgrind db_test db_stress --use_block_based_filter = 0 ./auto_sanity_test.sh Reviewers: igor, yhchiang, ljin, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D20979
2014-09-08 17:37:05 +00:00
void Reset() {
bits_builder_.reset(policy_->GetFilterBitsBuilder());
bits_reader_.reset(nullptr);
buf_.reset(nullptr);
filter_size_ = 0;
}
void ResetPolicy(const FilterPolicy* policy = nullptr) {
if (policy == nullptr) {
policy_.reset(NewBloomFilterPolicy(FLAGS_bits_per_key, false));
} else {
policy_.reset(policy);
}
Reset();
}
Implement full filter for block based table. Summary: 1. Make filter_block.h a base class. Derive block_based_filter_block and full_filter_block. The previous one is the traditional filter block. The full_filter_block is newly added. It would generate a filter block that contain all the keys in SST file. 2. When querying a key, table would first check if full_filter is available. If not, it would go to the exact data block and check using block_based filter. 3. User could choose to use full_filter or tradional(block_based_filter). They would be stored in SST file with different meta index name. "filter.filter_policy" or "full_filter.filter_policy". Then, Table reader is able to know the fllter block type. 4. Some optimizations have been done for full_filter_block, thus it requires a different interface compared to the original one in filter_policy.h. 5. Actual implementation of filter bits coding/decoding is placed in util/bloom_impl.cc Benchmark: base commit 1d23b5c470844c1208301311f0889eca750431c0 Command: db_bench --db=/dev/shm/rocksdb --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --write_buffer_size=134217728 --max_write_buffer_number=2 --target_file_size_base=33554432 --max_bytes_for_level_base=1073741824 --verify_checksum=false --max_background_compactions=4 --use_plain_table=0 --memtablerep=prefix_hash --open_files=-1 --mmap_read=1 --mmap_write=0 --bloom_bits=10 --bloom_locality=1 --memtable_bloom_bits=500000 --compression_type=lz4 --num=393216000 --use_hash_search=1 --block_size=1024 --block_restart_interval=16 --use_existing_db=1 --threads=1 --benchmarks=readrandom —disable_auto_compactions=1 Read QPS increase for about 30% from 2230002 to 2991411. Test Plan: make all check valgrind db_test db_stress --use_block_based_filter = 0 ./auto_sanity_test.sh Reviewers: igor, yhchiang, ljin, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D20979
2014-09-08 17:37:05 +00:00
void Add(const Slice& s) {
bits_builder_->AddKey(s);
}
void OpenRaw(const Slice& s) {
bits_reader_.reset(policy_->GetFilterBitsReader(s));
}
Implement full filter for block based table. Summary: 1. Make filter_block.h a base class. Derive block_based_filter_block and full_filter_block. The previous one is the traditional filter block. The full_filter_block is newly added. It would generate a filter block that contain all the keys in SST file. 2. When querying a key, table would first check if full_filter is available. If not, it would go to the exact data block and check using block_based filter. 3. User could choose to use full_filter or tradional(block_based_filter). They would be stored in SST file with different meta index name. "filter.filter_policy" or "full_filter.filter_policy". Then, Table reader is able to know the fllter block type. 4. Some optimizations have been done for full_filter_block, thus it requires a different interface compared to the original one in filter_policy.h. 5. Actual implementation of filter bits coding/decoding is placed in util/bloom_impl.cc Benchmark: base commit 1d23b5c470844c1208301311f0889eca750431c0 Command: db_bench --db=/dev/shm/rocksdb --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --write_buffer_size=134217728 --max_write_buffer_number=2 --target_file_size_base=33554432 --max_bytes_for_level_base=1073741824 --verify_checksum=false --max_background_compactions=4 --use_plain_table=0 --memtablerep=prefix_hash --open_files=-1 --mmap_read=1 --mmap_write=0 --bloom_bits=10 --bloom_locality=1 --memtable_bloom_bits=500000 --compression_type=lz4 --num=393216000 --use_hash_search=1 --block_size=1024 --block_restart_interval=16 --use_existing_db=1 --threads=1 --benchmarks=readrandom —disable_auto_compactions=1 Read QPS increase for about 30% from 2230002 to 2991411. Test Plan: make all check valgrind db_test db_stress --use_block_based_filter = 0 ./auto_sanity_test.sh Reviewers: igor, yhchiang, ljin, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D20979
2014-09-08 17:37:05 +00:00
void Build() {
Slice filter = bits_builder_->Finish(&buf_);
bits_reader_.reset(policy_->GetFilterBitsReader(filter));
filter_size_ = filter.size();
}
size_t FilterSize() const {
return filter_size_;
}
Slice FilterData() { return Slice(buf_.get(), filter_size_); }
Implement full filter for block based table. Summary: 1. Make filter_block.h a base class. Derive block_based_filter_block and full_filter_block. The previous one is the traditional filter block. The full_filter_block is newly added. It would generate a filter block that contain all the keys in SST file. 2. When querying a key, table would first check if full_filter is available. If not, it would go to the exact data block and check using block_based filter. 3. User could choose to use full_filter or tradional(block_based_filter). They would be stored in SST file with different meta index name. "filter.filter_policy" or "full_filter.filter_policy". Then, Table reader is able to know the fllter block type. 4. Some optimizations have been done for full_filter_block, thus it requires a different interface compared to the original one in filter_policy.h. 5. Actual implementation of filter bits coding/decoding is placed in util/bloom_impl.cc Benchmark: base commit 1d23b5c470844c1208301311f0889eca750431c0 Command: db_bench --db=/dev/shm/rocksdb --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --write_buffer_size=134217728 --max_write_buffer_number=2 --target_file_size_base=33554432 --max_bytes_for_level_base=1073741824 --verify_checksum=false --max_background_compactions=4 --use_plain_table=0 --memtablerep=prefix_hash --open_files=-1 --mmap_read=1 --mmap_write=0 --bloom_bits=10 --bloom_locality=1 --memtable_bloom_bits=500000 --compression_type=lz4 --num=393216000 --use_hash_search=1 --block_size=1024 --block_restart_interval=16 --use_existing_db=1 --threads=1 --benchmarks=readrandom —disable_auto_compactions=1 Read QPS increase for about 30% from 2230002 to 2991411. Test Plan: make all check valgrind db_test db_stress --use_block_based_filter = 0 ./auto_sanity_test.sh Reviewers: igor, yhchiang, ljin, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D20979
2014-09-08 17:37:05 +00:00
bool Matches(const Slice& s) {
if (bits_reader_ == nullptr) {
Build();
}
return bits_reader_->MayMatch(s);
}
uint64_t PackedMatches() {
char buffer[sizeof(int)];
uint64_t result = 0;
for (int i = 0; i < 64; i++) {
if (Matches(Key(i + 12345, buffer))) {
result |= uint64_t{1} << i;
}
}
return result;
}
Implement full filter for block based table. Summary: 1. Make filter_block.h a base class. Derive block_based_filter_block and full_filter_block. The previous one is the traditional filter block. The full_filter_block is newly added. It would generate a filter block that contain all the keys in SST file. 2. When querying a key, table would first check if full_filter is available. If not, it would go to the exact data block and check using block_based filter. 3. User could choose to use full_filter or tradional(block_based_filter). They would be stored in SST file with different meta index name. "filter.filter_policy" or "full_filter.filter_policy". Then, Table reader is able to know the fllter block type. 4. Some optimizations have been done for full_filter_block, thus it requires a different interface compared to the original one in filter_policy.h. 5. Actual implementation of filter bits coding/decoding is placed in util/bloom_impl.cc Benchmark: base commit 1d23b5c470844c1208301311f0889eca750431c0 Command: db_bench --db=/dev/shm/rocksdb --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --write_buffer_size=134217728 --max_write_buffer_number=2 --target_file_size_base=33554432 --max_bytes_for_level_base=1073741824 --verify_checksum=false --max_background_compactions=4 --use_plain_table=0 --memtablerep=prefix_hash --open_files=-1 --mmap_read=1 --mmap_write=0 --bloom_bits=10 --bloom_locality=1 --memtable_bloom_bits=500000 --compression_type=lz4 --num=393216000 --use_hash_search=1 --block_size=1024 --block_restart_interval=16 --use_existing_db=1 --threads=1 --benchmarks=readrandom —disable_auto_compactions=1 Read QPS increase for about 30% from 2230002 to 2991411. Test Plan: make all check valgrind db_test db_stress --use_block_based_filter = 0 ./auto_sanity_test.sh Reviewers: igor, yhchiang, ljin, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D20979
2014-09-08 17:37:05 +00:00
double FalsePositiveRate() {
char buffer[sizeof(int)];
int result = 0;
for (int i = 0; i < 10000; i++) {
if (Matches(Key(i + 1000000000, buffer))) {
result++;
}
}
return result / 10000.0;
}
};
TEST_F(FullBloomTest, FilterSize) {
uint32_t dont_care1, dont_care2;
auto full_bits_builder = GetFullFilterBitsBuilder();
ASSERT_TRUE(full_bits_builder != nullptr);
for (int n = 1; n < 100; n++) {
auto space = full_bits_builder->CalculateSpace(n, &dont_care1, &dont_care2);
auto n2 = full_bits_builder->CalculateNumEntry(space);
ASSERT_GE(n2, n);
auto space2 =
full_bits_builder->CalculateSpace(n2, &dont_care1, &dont_care2);
ASSERT_EQ(space, space2);
}
}
rocksdb: switch to gtest Summary: Our existing test notation is very similar to what is used in gtest. It makes it easy to adopt what is different. In this diff I modify existing [[ https://code.google.com/p/googletest/wiki/Primer#Test_Fixtures:_Using_the_Same_Data_Configuration_for_Multiple_Te | test fixture ]] classes to inherit from `testing::Test`. Also for unit tests that use fixture class, `TEST` is replaced with `TEST_F` as required in gtest. There are several custom `main` functions in our existing tests. To make this transition easier, I modify all `main` functions to fallow gtest notation. But eventually we can remove them and use implementation of `main` that gtest provides. ```lang=bash % cat ~/transform #!/bin/sh files=$(git ls-files '*test\.cc') for file in $files do if grep -q "rocksdb::test::RunAllTests()" $file then if grep -Eq '^class \w+Test {' $file then perl -pi -e 's/^(class \w+Test) {/${1}: public testing::Test {/g' $file perl -pi -e 's/^(TEST)/${1}_F/g' $file fi perl -pi -e 's/(int main.*\{)/${1}::testing::InitGoogleTest(&argc, argv);/g' $file perl -pi -e 's/rocksdb::test::RunAllTests/RUN_ALL_TESTS/g' $file fi done % sh ~/transform % make format ``` Second iteration of this diff contains only scripted changes. Third iteration contains manual changes to fix last errors and make it compilable. Test Plan: Build and notice no errors. ```lang=bash % USE_CLANG=1 make check -j55 ``` Tests are still testing. Reviewers: meyering, sdong, rven, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35157
2015-03-17 21:08:00 +00:00
TEST_F(FullBloomTest, FullEmptyFilter) {
Implement full filter for block based table. Summary: 1. Make filter_block.h a base class. Derive block_based_filter_block and full_filter_block. The previous one is the traditional filter block. The full_filter_block is newly added. It would generate a filter block that contain all the keys in SST file. 2. When querying a key, table would first check if full_filter is available. If not, it would go to the exact data block and check using block_based filter. 3. User could choose to use full_filter or tradional(block_based_filter). They would be stored in SST file with different meta index name. "filter.filter_policy" or "full_filter.filter_policy". Then, Table reader is able to know the fllter block type. 4. Some optimizations have been done for full_filter_block, thus it requires a different interface compared to the original one in filter_policy.h. 5. Actual implementation of filter bits coding/decoding is placed in util/bloom_impl.cc Benchmark: base commit 1d23b5c470844c1208301311f0889eca750431c0 Command: db_bench --db=/dev/shm/rocksdb --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --write_buffer_size=134217728 --max_write_buffer_number=2 --target_file_size_base=33554432 --max_bytes_for_level_base=1073741824 --verify_checksum=false --max_background_compactions=4 --use_plain_table=0 --memtablerep=prefix_hash --open_files=-1 --mmap_read=1 --mmap_write=0 --bloom_bits=10 --bloom_locality=1 --memtable_bloom_bits=500000 --compression_type=lz4 --num=393216000 --use_hash_search=1 --block_size=1024 --block_restart_interval=16 --use_existing_db=1 --threads=1 --benchmarks=readrandom —disable_auto_compactions=1 Read QPS increase for about 30% from 2230002 to 2991411. Test Plan: make all check valgrind db_test db_stress --use_block_based_filter = 0 ./auto_sanity_test.sh Reviewers: igor, yhchiang, ljin, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D20979
2014-09-08 17:37:05 +00:00
// Empty filter is not match, at this level
ASSERT_TRUE(!Matches("hello"));
ASSERT_TRUE(!Matches("world"));
}
rocksdb: switch to gtest Summary: Our existing test notation is very similar to what is used in gtest. It makes it easy to adopt what is different. In this diff I modify existing [[ https://code.google.com/p/googletest/wiki/Primer#Test_Fixtures:_Using_the_Same_Data_Configuration_for_Multiple_Te | test fixture ]] classes to inherit from `testing::Test`. Also for unit tests that use fixture class, `TEST` is replaced with `TEST_F` as required in gtest. There are several custom `main` functions in our existing tests. To make this transition easier, I modify all `main` functions to fallow gtest notation. But eventually we can remove them and use implementation of `main` that gtest provides. ```lang=bash % cat ~/transform #!/bin/sh files=$(git ls-files '*test\.cc') for file in $files do if grep -q "rocksdb::test::RunAllTests()" $file then if grep -Eq '^class \w+Test {' $file then perl -pi -e 's/^(class \w+Test) {/${1}: public testing::Test {/g' $file perl -pi -e 's/^(TEST)/${1}_F/g' $file fi perl -pi -e 's/(int main.*\{)/${1}::testing::InitGoogleTest(&argc, argv);/g' $file perl -pi -e 's/rocksdb::test::RunAllTests/RUN_ALL_TESTS/g' $file fi done % sh ~/transform % make format ``` Second iteration of this diff contains only scripted changes. Third iteration contains manual changes to fix last errors and make it compilable. Test Plan: Build and notice no errors. ```lang=bash % USE_CLANG=1 make check -j55 ``` Tests are still testing. Reviewers: meyering, sdong, rven, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35157
2015-03-17 21:08:00 +00:00
TEST_F(FullBloomTest, FullSmall) {
Implement full filter for block based table. Summary: 1. Make filter_block.h a base class. Derive block_based_filter_block and full_filter_block. The previous one is the traditional filter block. The full_filter_block is newly added. It would generate a filter block that contain all the keys in SST file. 2. When querying a key, table would first check if full_filter is available. If not, it would go to the exact data block and check using block_based filter. 3. User could choose to use full_filter or tradional(block_based_filter). They would be stored in SST file with different meta index name. "filter.filter_policy" or "full_filter.filter_policy". Then, Table reader is able to know the fllter block type. 4. Some optimizations have been done for full_filter_block, thus it requires a different interface compared to the original one in filter_policy.h. 5. Actual implementation of filter bits coding/decoding is placed in util/bloom_impl.cc Benchmark: base commit 1d23b5c470844c1208301311f0889eca750431c0 Command: db_bench --db=/dev/shm/rocksdb --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --write_buffer_size=134217728 --max_write_buffer_number=2 --target_file_size_base=33554432 --max_bytes_for_level_base=1073741824 --verify_checksum=false --max_background_compactions=4 --use_plain_table=0 --memtablerep=prefix_hash --open_files=-1 --mmap_read=1 --mmap_write=0 --bloom_bits=10 --bloom_locality=1 --memtable_bloom_bits=500000 --compression_type=lz4 --num=393216000 --use_hash_search=1 --block_size=1024 --block_restart_interval=16 --use_existing_db=1 --threads=1 --benchmarks=readrandom —disable_auto_compactions=1 Read QPS increase for about 30% from 2230002 to 2991411. Test Plan: make all check valgrind db_test db_stress --use_block_based_filter = 0 ./auto_sanity_test.sh Reviewers: igor, yhchiang, ljin, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D20979
2014-09-08 17:37:05 +00:00
Add("hello");
Add("world");
ASSERT_TRUE(Matches("hello"));
ASSERT_TRUE(Matches("world"));
ASSERT_TRUE(!Matches("x"));
ASSERT_TRUE(!Matches("foo"));
}
rocksdb: switch to gtest Summary: Our existing test notation is very similar to what is used in gtest. It makes it easy to adopt what is different. In this diff I modify existing [[ https://code.google.com/p/googletest/wiki/Primer#Test_Fixtures:_Using_the_Same_Data_Configuration_for_Multiple_Te | test fixture ]] classes to inherit from `testing::Test`. Also for unit tests that use fixture class, `TEST` is replaced with `TEST_F` as required in gtest. There are several custom `main` functions in our existing tests. To make this transition easier, I modify all `main` functions to fallow gtest notation. But eventually we can remove them and use implementation of `main` that gtest provides. ```lang=bash % cat ~/transform #!/bin/sh files=$(git ls-files '*test\.cc') for file in $files do if grep -q "rocksdb::test::RunAllTests()" $file then if grep -Eq '^class \w+Test {' $file then perl -pi -e 's/^(class \w+Test) {/${1}: public testing::Test {/g' $file perl -pi -e 's/^(TEST)/${1}_F/g' $file fi perl -pi -e 's/(int main.*\{)/${1}::testing::InitGoogleTest(&argc, argv);/g' $file perl -pi -e 's/rocksdb::test::RunAllTests/RUN_ALL_TESTS/g' $file fi done % sh ~/transform % make format ``` Second iteration of this diff contains only scripted changes. Third iteration contains manual changes to fix last errors and make it compilable. Test Plan: Build and notice no errors. ```lang=bash % USE_CLANG=1 make check -j55 ``` Tests are still testing. Reviewers: meyering, sdong, rven, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35157
2015-03-17 21:08:00 +00:00
TEST_F(FullBloomTest, FullVaryingLengths) {
Implement full filter for block based table. Summary: 1. Make filter_block.h a base class. Derive block_based_filter_block and full_filter_block. The previous one is the traditional filter block. The full_filter_block is newly added. It would generate a filter block that contain all the keys in SST file. 2. When querying a key, table would first check if full_filter is available. If not, it would go to the exact data block and check using block_based filter. 3. User could choose to use full_filter or tradional(block_based_filter). They would be stored in SST file with different meta index name. "filter.filter_policy" or "full_filter.filter_policy". Then, Table reader is able to know the fllter block type. 4. Some optimizations have been done for full_filter_block, thus it requires a different interface compared to the original one in filter_policy.h. 5. Actual implementation of filter bits coding/decoding is placed in util/bloom_impl.cc Benchmark: base commit 1d23b5c470844c1208301311f0889eca750431c0 Command: db_bench --db=/dev/shm/rocksdb --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --write_buffer_size=134217728 --max_write_buffer_number=2 --target_file_size_base=33554432 --max_bytes_for_level_base=1073741824 --verify_checksum=false --max_background_compactions=4 --use_plain_table=0 --memtablerep=prefix_hash --open_files=-1 --mmap_read=1 --mmap_write=0 --bloom_bits=10 --bloom_locality=1 --memtable_bloom_bits=500000 --compression_type=lz4 --num=393216000 --use_hash_search=1 --block_size=1024 --block_restart_interval=16 --use_existing_db=1 --threads=1 --benchmarks=readrandom —disable_auto_compactions=1 Read QPS increase for about 30% from 2230002 to 2991411. Test Plan: make all check valgrind db_test db_stress --use_block_based_filter = 0 ./auto_sanity_test.sh Reviewers: igor, yhchiang, ljin, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D20979
2014-09-08 17:37:05 +00:00
char buffer[sizeof(int)];
// Count number of filters that significantly exceed the false positive rate
int mediocre_filters = 0;
int good_filters = 0;
for (int length = 1; length <= 10000; length = NextLength(length)) {
Reset();
for (int i = 0; i < length; i++) {
Add(Key(i, buffer));
}
Build();
ASSERT_LE(FilterSize(),
(size_t)((length * 10 / 8) + CACHE_LINE_SIZE * 2 + 5));
Implement full filter for block based table. Summary: 1. Make filter_block.h a base class. Derive block_based_filter_block and full_filter_block. The previous one is the traditional filter block. The full_filter_block is newly added. It would generate a filter block that contain all the keys in SST file. 2. When querying a key, table would first check if full_filter is available. If not, it would go to the exact data block and check using block_based filter. 3. User could choose to use full_filter or tradional(block_based_filter). They would be stored in SST file with different meta index name. "filter.filter_policy" or "full_filter.filter_policy". Then, Table reader is able to know the fllter block type. 4. Some optimizations have been done for full_filter_block, thus it requires a different interface compared to the original one in filter_policy.h. 5. Actual implementation of filter bits coding/decoding is placed in util/bloom_impl.cc Benchmark: base commit 1d23b5c470844c1208301311f0889eca750431c0 Command: db_bench --db=/dev/shm/rocksdb --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --write_buffer_size=134217728 --max_write_buffer_number=2 --target_file_size_base=33554432 --max_bytes_for_level_base=1073741824 --verify_checksum=false --max_background_compactions=4 --use_plain_table=0 --memtablerep=prefix_hash --open_files=-1 --mmap_read=1 --mmap_write=0 --bloom_bits=10 --bloom_locality=1 --memtable_bloom_bits=500000 --compression_type=lz4 --num=393216000 --use_hash_search=1 --block_size=1024 --block_restart_interval=16 --use_existing_db=1 --threads=1 --benchmarks=readrandom —disable_auto_compactions=1 Read QPS increase for about 30% from 2230002 to 2991411. Test Plan: make all check valgrind db_test db_stress --use_block_based_filter = 0 ./auto_sanity_test.sh Reviewers: igor, yhchiang, ljin, sdong Reviewed By: sdong Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D20979
2014-09-08 17:37:05 +00:00
// All added keys must match
for (int i = 0; i < length; i++) {
ASSERT_TRUE(Matches(Key(i, buffer)))
<< "Length " << length << "; key " << i;
}
// Check false positive rate
double rate = FalsePositiveRate();
if (kVerbose >= 1) {
fprintf(stderr, "False positives: %5.2f%% @ length = %6d ; bytes = %6d\n",
rate*100.0, length, static_cast<int>(FilterSize()));
}
ASSERT_LE(rate, 0.02); // Must not be over 2%
if (rate > 0.0125)
mediocre_filters++; // Allowed, but not too often
else
good_filters++;
}
if (kVerbose >= 1) {
fprintf(stderr, "Filters: %d good, %d mediocre\n",
good_filters, mediocre_filters);
}
ASSERT_LE(mediocre_filters, good_filters/5);
}
namespace {
inline uint32_t SelectByCacheLineSize(uint32_t for64, uint32_t for128,
uint32_t for256) {
(void)for64;
(void)for128;
(void)for256;
#if CACHE_LINE_SIZE == 64
return for64;
#elif CACHE_LINE_SIZE == 128
return for128;
#elif CACHE_LINE_SIZE == 256
return for256;
#else
#error "CACHE_LINE_SIZE unknown or unrecognized"
#endif
}
} // namespace
// Ensure the implementation doesn't accidentally change in an
// incompatible way
TEST_F(FullBloomTest, Schema) {
char buffer[sizeof(int)];
// Use enough keys so that changing bits / key by 1 is guaranteed to
// change number of allocated cache lines. So keys > max cache line bits.
ResetPolicy(NewBloomFilterPolicy(8)); // num_probes = 5
for (int key = 0; key < 2087; key++) {
Add(Key(key, buffer));
}
Build();
ASSERT_EQ(BloomHash(FilterData()),
SelectByCacheLineSize(1302145999, 2811644657U, 756553699));
ResetPolicy(NewBloomFilterPolicy(9)); // num_probes = 6
for (int key = 0; key < 2087; key++) {
Add(Key(key, buffer));
}
Build();
ASSERT_EQ(BloomHash(FilterData()),
SelectByCacheLineSize(2092755149, 661139132, 1182970461));
ResetPolicy(NewBloomFilterPolicy(11)); // num_probes = 7
for (int key = 0; key < 2087; key++) {
Add(Key(key, buffer));
}
Build();
ASSERT_EQ(BloomHash(FilterData()),
SelectByCacheLineSize(3755609649U, 1812694762, 1449142939));
ResetPolicy(NewBloomFilterPolicy(10)); // num_probes = 6
for (int key = 0; key < 2087; key++) {
Add(Key(key, buffer));
}
Build();
ASSERT_EQ(BloomHash(FilterData()),
SelectByCacheLineSize(1478976371, 2910591341U, 1182970461));
ResetPolicy(NewBloomFilterPolicy(10));
for (int key = 1; key < 2087; key++) {
Add(Key(key, buffer));
}
Build();
ASSERT_EQ(BloomHash(FilterData()),
SelectByCacheLineSize(4205696321U, 1132081253U, 2385981855U));
ResetPolicy(NewBloomFilterPolicy(10));
for (int key = 1; key < 2088; key++) {
Add(Key(key, buffer));
}
Build();
ASSERT_EQ(BloomHash(FilterData()),
SelectByCacheLineSize(2885052954U, 769447944, 4175124908U));
ResetPolicy();
}
// A helper class for testing custom or corrupt filter bits as read by
// FullFilterBitsReader.
struct RawFilterTester {
// Buffer, from which we always return a tail Slice, so the
// last five bytes are always the metadata bytes.
std::array<char, 3000> data_;
// Points five bytes from the end
char* metadata_ptr_;
RawFilterTester() : metadata_ptr_(&*(data_.end() - 5)) {}
Slice ResetNoFill(uint32_t len_without_metadata, uint32_t num_lines,
uint32_t num_probes) {
metadata_ptr_[0] = static_cast<char>(num_probes);
EncodeFixed32(metadata_ptr_ + 1, num_lines);
uint32_t len = len_without_metadata + /*metadata*/ 5;
assert(len <= data_.size());
return Slice(metadata_ptr_ - len_without_metadata, len);
}
Slice Reset(uint32_t len_without_metadata, uint32_t num_lines,
uint32_t num_probes, bool fill_ones) {
data_.fill(fill_ones ? 0xff : 0);
return ResetNoFill(len_without_metadata, num_lines, num_probes);
}
Slice ResetWeirdFill(uint32_t len_without_metadata, uint32_t num_lines,
uint32_t num_probes) {
for (uint32_t i = 0; i < data_.size(); ++i) {
data_[i] = static_cast<char>(0x7b7b >> (i % 7));
}
return ResetNoFill(len_without_metadata, num_lines, num_probes);
}
};
TEST_F(FullBloomTest, RawSchema) {
RawFilterTester cft;
// Two probes, about 3/4 bits set: ~50% "FP" rate
// One 256-byte cache line.
OpenRaw(cft.ResetWeirdFill(256, 1, 2));
ASSERT_EQ(uint64_t{11384799501900898790U}, PackedMatches());
// Two 128-byte cache lines.
OpenRaw(cft.ResetWeirdFill(256, 2, 2));
ASSERT_EQ(uint64_t{10157853359773492589U}, PackedMatches());
// Four 64-byte cache lines.
OpenRaw(cft.ResetWeirdFill(256, 4, 2));
ASSERT_EQ(uint64_t{7123594913907464682U}, PackedMatches());
}
TEST_F(FullBloomTest, CorruptFilters) {
RawFilterTester cft;
for (bool fill : {false, true}) {
// Good filter bits - returns same as fill
OpenRaw(cft.Reset(CACHE_LINE_SIZE, 1, 6, fill));
ASSERT_EQ(fill, Matches("hello"));
ASSERT_EQ(fill, Matches("world"));
// Good filter bits - returns same as fill
OpenRaw(cft.Reset(CACHE_LINE_SIZE * 3, 3, 6, fill));
ASSERT_EQ(fill, Matches("hello"));
ASSERT_EQ(fill, Matches("world"));
// Good filter bits - returns same as fill
// 256 is unusual but legal cache line size
OpenRaw(cft.Reset(256 * 3, 3, 6, fill));
ASSERT_EQ(fill, Matches("hello"));
ASSERT_EQ(fill, Matches("world"));
// Good filter bits - returns same as fill
// 30 should be max num_probes
OpenRaw(cft.Reset(CACHE_LINE_SIZE, 1, 30, fill));
ASSERT_EQ(fill, Matches("hello"));
ASSERT_EQ(fill, Matches("world"));
// Good filter bits - returns same as fill
// 1 should be min num_probes
OpenRaw(cft.Reset(CACHE_LINE_SIZE, 1, 1, fill));
ASSERT_EQ(fill, Matches("hello"));
ASSERT_EQ(fill, Matches("world"));
// Type 1 trivial filter bits - returns true as if FP by zero probes
OpenRaw(cft.Reset(CACHE_LINE_SIZE, 1, 0, fill));
ASSERT_TRUE(Matches("hello"));
ASSERT_TRUE(Matches("world"));
// Type 2 trivial filter bits - returns false as if built from zero keys
OpenRaw(cft.Reset(0, 0, 6, fill));
ASSERT_FALSE(Matches("hello"));
ASSERT_FALSE(Matches("world"));
// Type 2 trivial filter bits - returns false as if built from zero keys
OpenRaw(cft.Reset(0, 37, 6, fill));
ASSERT_FALSE(Matches("hello"));
ASSERT_FALSE(Matches("world"));
// Type 2 trivial filter bits - returns false as 0 size trumps 0 probes
OpenRaw(cft.Reset(0, 0, 0, fill));
ASSERT_FALSE(Matches("hello"));
ASSERT_FALSE(Matches("world"));
// Bad filter bits - returns true for safety
// No solution to 0 * x == CACHE_LINE_SIZE
OpenRaw(cft.Reset(CACHE_LINE_SIZE, 0, 6, fill));
ASSERT_TRUE(Matches("hello"));
ASSERT_TRUE(Matches("world"));
// Bad filter bits - returns true for safety
// Can't have 3 * x == 4 for integer x
OpenRaw(cft.Reset(4, 3, 6, fill));
ASSERT_TRUE(Matches("hello"));
ASSERT_TRUE(Matches("world"));
// Bad filter bits - returns true for safety
// 97 bytes is not a power of two, so not a legal cache line size
OpenRaw(cft.Reset(97 * 3, 3, 6, fill));
ASSERT_TRUE(Matches("hello"));
ASSERT_TRUE(Matches("world"));
Refactor / clean up / optimize FullFilterBitsReader (#5941) Summary: FullFilterBitsReader, after creating in BloomFilterPolicy, was responsible for decoding metadata bits. This meant that FullFilterBitsReader::MayMatch had some metadata checks in order to implement "always true" or "always false" functionality in the case of inconsistent or trivial metadata. This made for ugly mixing-of-concerns code and probably had some runtime cost. It also didn't really support plugging in alternative filter implementations with extensions to the existing metadata schema. BloomFilterPolicy::GetFilterBitsReader is now (exclusively) responsible for decoding filter metadata bits and constructing appropriate instances deriving from FilterBitsReader. "Always false" and "always true" derived classes allow FullFilterBitsReader not to be concerned with handling of trivial or inconsistent metadata. This also makes for easy expansion to alternative filter implementations in new, alternative derived classes. This change makes calls to FilterBitsReader::MayMatch *necessarily* virtual because there's now more than one built-in implementation. Compared with the previous implementation's extra 'if' checks in MayMatch, there's no consistent performance difference, measured by (an older revision of) filter_bench (differences here seem to be within noise): Inside queries... - Dry run (407) ns/op: 35.9996 + Dry run (407) ns/op: 35.2034 - Single filter ns/op: 47.5483 + Single filter ns/op: 47.4034 - Batched, prepared ns/op: 43.1559 + Batched, prepared ns/op: 42.2923 ... - Random filter ns/op: 150.697 + Random filter ns/op: 149.403 ---------------------------- Outside queries... - Dry run (980) ns/op: 34.6114 + Dry run (980) ns/op: 34.0405 - Single filter ns/op: 56.8326 + Single filter ns/op: 55.8414 - Batched, prepared ns/op: 48.2346 + Batched, prepared ns/op: 47.5667 - Random filter ns/op: 155.377 + Random filter ns/op: 153.942 Average FP rate %: 1.1386 Also, the FullFilterBitsReader ctor was responsible for a surprising amount of CPU in production, due in part to inefficient determination of the CACHE_LINE_SIZE used to construct the filter being read. The overwhelming common case (same as my CACHE_LINE_SIZE) is now substantially optimized, as shown with filter_bench with -new_reader_every=1 (old option - see below) (repeatable result): Inside queries... - Dry run (453) ns/op: 118.799 + Dry run (453) ns/op: 105.869 - Single filter ns/op: 82.5831 + Single filter ns/op: 74.2509 ... - Random filter ns/op: 224.936 + Random filter ns/op: 194.833 ---------------------------- Outside queries... - Dry run (aa1) ns/op: 118.503 + Dry run (aa1) ns/op: 104.925 - Single filter ns/op: 90.3023 + Single filter ns/op: 83.425 ... - Random filter ns/op: 220.455 + Random filter ns/op: 175.7 Average FP rate %: 1.13886 However PR#5936 has/will reclaim most of this cost. After that PR, the optimization of this code path is likely negligible, but nonetheless it's clear we aren't making performance any worse. Also fixed inadequate check of consistency between filter data size and num_lines. (Unit test updated.) Pull Request resolved: https://github.com/facebook/rocksdb/pull/5941 Test Plan: previously added unit tests FullBloomTest.CorruptFilters and FullBloomTest.RawSchema Differential Revision: D18018353 Pulled By: pdillinger fbshipit-source-id: 8e04c2b4a7d93223f49a237fd52ef2483929ed9c
2019-10-18 21:49:26 +00:00
// Bad filter bits - returns true for safety
// 65 bytes is not a power of two, so not a legal cache line size
OpenRaw(cft.Reset(65 * 3, 3, 6, fill));
Refactor / clean up / optimize FullFilterBitsReader (#5941) Summary: FullFilterBitsReader, after creating in BloomFilterPolicy, was responsible for decoding metadata bits. This meant that FullFilterBitsReader::MayMatch had some metadata checks in order to implement "always true" or "always false" functionality in the case of inconsistent or trivial metadata. This made for ugly mixing-of-concerns code and probably had some runtime cost. It also didn't really support plugging in alternative filter implementations with extensions to the existing metadata schema. BloomFilterPolicy::GetFilterBitsReader is now (exclusively) responsible for decoding filter metadata bits and constructing appropriate instances deriving from FilterBitsReader. "Always false" and "always true" derived classes allow FullFilterBitsReader not to be concerned with handling of trivial or inconsistent metadata. This also makes for easy expansion to alternative filter implementations in new, alternative derived classes. This change makes calls to FilterBitsReader::MayMatch *necessarily* virtual because there's now more than one built-in implementation. Compared with the previous implementation's extra 'if' checks in MayMatch, there's no consistent performance difference, measured by (an older revision of) filter_bench (differences here seem to be within noise): Inside queries... - Dry run (407) ns/op: 35.9996 + Dry run (407) ns/op: 35.2034 - Single filter ns/op: 47.5483 + Single filter ns/op: 47.4034 - Batched, prepared ns/op: 43.1559 + Batched, prepared ns/op: 42.2923 ... - Random filter ns/op: 150.697 + Random filter ns/op: 149.403 ---------------------------- Outside queries... - Dry run (980) ns/op: 34.6114 + Dry run (980) ns/op: 34.0405 - Single filter ns/op: 56.8326 + Single filter ns/op: 55.8414 - Batched, prepared ns/op: 48.2346 + Batched, prepared ns/op: 47.5667 - Random filter ns/op: 155.377 + Random filter ns/op: 153.942 Average FP rate %: 1.1386 Also, the FullFilterBitsReader ctor was responsible for a surprising amount of CPU in production, due in part to inefficient determination of the CACHE_LINE_SIZE used to construct the filter being read. The overwhelming common case (same as my CACHE_LINE_SIZE) is now substantially optimized, as shown with filter_bench with -new_reader_every=1 (old option - see below) (repeatable result): Inside queries... - Dry run (453) ns/op: 118.799 + Dry run (453) ns/op: 105.869 - Single filter ns/op: 82.5831 + Single filter ns/op: 74.2509 ... - Random filter ns/op: 224.936 + Random filter ns/op: 194.833 ---------------------------- Outside queries... - Dry run (aa1) ns/op: 118.503 + Dry run (aa1) ns/op: 104.925 - Single filter ns/op: 90.3023 + Single filter ns/op: 83.425 ... - Random filter ns/op: 220.455 + Random filter ns/op: 175.7 Average FP rate %: 1.13886 However PR#5936 has/will reclaim most of this cost. After that PR, the optimization of this code path is likely negligible, but nonetheless it's clear we aren't making performance any worse. Also fixed inadequate check of consistency between filter data size and num_lines. (Unit test updated.) Pull Request resolved: https://github.com/facebook/rocksdb/pull/5941 Test Plan: previously added unit tests FullBloomTest.CorruptFilters and FullBloomTest.RawSchema Differential Revision: D18018353 Pulled By: pdillinger fbshipit-source-id: 8e04c2b4a7d93223f49a237fd52ef2483929ed9c
2019-10-18 21:49:26 +00:00
ASSERT_TRUE(Matches("hello"));
ASSERT_TRUE(Matches("world"));
// Bad filter bits - returns false as if built from zero keys
// < 5 bytes overall means missing even metadata
OpenRaw(cft.Reset(-1, 3, 6, fill));
ASSERT_FALSE(Matches("hello"));
ASSERT_FALSE(Matches("world"));
OpenRaw(cft.Reset(-5, 3, 6, fill));
ASSERT_FALSE(Matches("hello"));
ASSERT_FALSE(Matches("world"));
// Dubious filter bits - returns same as fill (for now)
// 31 is not a useful num_probes, nor generated by RocksDB unless directly
// using filter bits API without BloomFilterPolicy.
OpenRaw(cft.Reset(CACHE_LINE_SIZE, 1, 31, fill));
ASSERT_EQ(fill, Matches("hello"));
ASSERT_EQ(fill, Matches("world"));
// Dubious filter bits - returns same as fill (for now)
// Similar, with 127, largest positive char
OpenRaw(cft.Reset(CACHE_LINE_SIZE, 1, 127, fill));
ASSERT_EQ(fill, Matches("hello"));
ASSERT_EQ(fill, Matches("world"));
// Dubious filter bits - returns true (for now)
// num_probes set to 128 / -128, lowest negative char
// NB: Bug in implementation interprets this as negative and has same
// effect as zero probes, but effectively reserves negative char values
// for future use.
OpenRaw(cft.Reset(CACHE_LINE_SIZE, 1, 128, fill));
ASSERT_TRUE(Matches("hello"));
ASSERT_TRUE(Matches("world"));
// Dubious filter bits - returns true (for now)
// Similar, with 255 / -1
OpenRaw(cft.Reset(CACHE_LINE_SIZE, 1, 255, fill));
ASSERT_TRUE(Matches("hello"));
ASSERT_TRUE(Matches("world"));
}
}
} // namespace rocksdb
int main(int argc, char** argv) {
rocksdb: switch to gtest Summary: Our existing test notation is very similar to what is used in gtest. It makes it easy to adopt what is different. In this diff I modify existing [[ https://code.google.com/p/googletest/wiki/Primer#Test_Fixtures:_Using_the_Same_Data_Configuration_for_Multiple_Te | test fixture ]] classes to inherit from `testing::Test`. Also for unit tests that use fixture class, `TEST` is replaced with `TEST_F` as required in gtest. There are several custom `main` functions in our existing tests. To make this transition easier, I modify all `main` functions to fallow gtest notation. But eventually we can remove them and use implementation of `main` that gtest provides. ```lang=bash % cat ~/transform #!/bin/sh files=$(git ls-files '*test\.cc') for file in $files do if grep -q "rocksdb::test::RunAllTests()" $file then if grep -Eq '^class \w+Test {' $file then perl -pi -e 's/^(class \w+Test) {/${1}: public testing::Test {/g' $file perl -pi -e 's/^(TEST)/${1}_F/g' $file fi perl -pi -e 's/(int main.*\{)/${1}::testing::InitGoogleTest(&argc, argv);/g' $file perl -pi -e 's/rocksdb::test::RunAllTests/RUN_ALL_TESTS/g' $file fi done % sh ~/transform % make format ``` Second iteration of this diff contains only scripted changes. Third iteration contains manual changes to fix last errors and make it compilable. Test Plan: Build and notice no errors. ```lang=bash % USE_CLANG=1 make check -j55 ``` Tests are still testing. Reviewers: meyering, sdong, rven, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35157
2015-03-17 21:08:00 +00:00
::testing::InitGoogleTest(&argc, argv);
2014-05-09 15:34:18 +00:00
ParseCommandLineFlags(&argc, &argv, true);
rocksdb: switch to gtest Summary: Our existing test notation is very similar to what is used in gtest. It makes it easy to adopt what is different. In this diff I modify existing [[ https://code.google.com/p/googletest/wiki/Primer#Test_Fixtures:_Using_the_Same_Data_Configuration_for_Multiple_Te | test fixture ]] classes to inherit from `testing::Test`. Also for unit tests that use fixture class, `TEST` is replaced with `TEST_F` as required in gtest. There are several custom `main` functions in our existing tests. To make this transition easier, I modify all `main` functions to fallow gtest notation. But eventually we can remove them and use implementation of `main` that gtest provides. ```lang=bash % cat ~/transform #!/bin/sh files=$(git ls-files '*test\.cc') for file in $files do if grep -q "rocksdb::test::RunAllTests()" $file then if grep -Eq '^class \w+Test {' $file then perl -pi -e 's/^(class \w+Test) {/${1}: public testing::Test {/g' $file perl -pi -e 's/^(TEST)/${1}_F/g' $file fi perl -pi -e 's/(int main.*\{)/${1}::testing::InitGoogleTest(&argc, argv);/g' $file perl -pi -e 's/rocksdb::test::RunAllTests/RUN_ALL_TESTS/g' $file fi done % sh ~/transform % make format ``` Second iteration of this diff contains only scripted changes. Third iteration contains manual changes to fix last errors and make it compilable. Test Plan: Build and notice no errors. ```lang=bash % USE_CLANG=1 make check -j55 ``` Tests are still testing. Reviewers: meyering, sdong, rven, igor Reviewed By: igor Subscribers: dhruba, leveldb Differential Revision: https://reviews.facebook.net/D35157
2015-03-17 21:08:00 +00:00
return RUN_ALL_TESTS();
}
2014-05-09 15:34:18 +00:00
#endif // GFLAGS