Re-implement GetApproximateMemTableStats for skip lists (#13047)
Summary: GetApproximateMemTableStats() could return some bad results with the standard skip list memtable. See this new db_bench test showing the dismal distribution of results when the actual number of entries in range is 1000:

```
$ ./db_bench --benchmarks=filluniquerandom,approximatememtablestats,readrandom --value_size=1 --num=1000000 --batch_size=1000
...
filluniquerandom :       1.391 micros/op 718915 ops/sec 1.391 seconds 1000000 operations;   11.7 MB/s
approximatememtablestats :       3.711 micros/op 269492 ops/sec 3.711 seconds 1000000 operations;
Reported entry count stats (expected 1000):
Count: 1000000 Average: 2344.1611  StdDev: 26587.27
Min: 0  Median: 965.8555  Max: 835273
Percentiles: P50: 965.86 P75: 1610.77 P99: 12618.01 P99.9: 74991.58 P99.99: 830970.97
------------------------------------------------------
[       0,       1 ]   131344  13.134%  13.134% ###
(       1,       2 ]      115   0.011%  13.146%
(       2,       3 ]      106   0.011%  13.157%
(       3,       4 ]      190   0.019%  13.176%
(       4,       6 ]      214   0.021%  13.197%
(       6,      10 ]      522   0.052%  13.249%
(      10,      15 ]      748   0.075%  13.324%
(      15,      22 ]     1002   0.100%  13.424%
(      22,      34 ]     1948   0.195%  13.619%
(      34,      51 ]     3067   0.307%  13.926%
(      51,      76 ]     4213   0.421%  14.347%
(      76,     110 ]     5721   0.572%  14.919%
(     110,     170 ]    11375   1.137%  16.056%
(     170,     250 ]    17928   1.793%  17.849%
(     250,     380 ]    36597   3.660%  21.509% #
(     380,     580 ]    77882   7.788%  29.297% ##
(     580,     870 ]   160193  16.019%  45.317% ###
(     870,    1300 ]   210098  21.010%  66.326% ####
(    1300,    1900 ]   167461  16.746%  83.072% ###
(    1900,    2900 ]    78678   7.868%  90.940% ##
(    2900,    4400 ]    47743   4.774%  95.715% #
(    4400,    6600 ]    17650   1.765%  97.480%
(    6600,    9900 ]    11895   1.190%  98.669%
(    9900,   14000 ]     4993   0.499%  99.168%
(   14000,   22000 ]     2384   0.238%  99.407%
(   22000,   33000 ]     1966   0.197%  99.603%
(   50000,   75000 ]     2968   0.297%  99.900%
(  570000,  860000 ]      999   0.100% 100.000%
readrandom   :       1.967 micros/op 508487 ops/sec 1.967 seconds 1000000 operations;    8.2 MB/s (1000000 of 1000000 found)
```

Perhaps the only good thing to say about the old implementation was that it was fast, though apparently not that fast. I've implemented a much more robust and reasonably fast new version of the function. It's still logarithmic but with some larger constant factors. The standard deviation from true count is around 20% or less, and roughly the CPU cost of two memtable point look-ups. See code comments for detail.

```
$ ./db_bench --benchmarks=filluniquerandom,approximatememtablestats,readrandom --value_size=1 --num=1000000 --batch_size=1000
...
filluniquerandom :       1.478 micros/op 676434 ops/sec 1.478 seconds 1000000 operations;   11.0 MB/s
approximatememtablestats :       2.694 micros/op 371157 ops/sec 2.694 seconds 1000000 operations;
Reported entry count stats (expected 1000):
Count: 1000000 Average: 1073.5158  StdDev: 197.80
Min: 608  Median: 1079.9506  Max: 2176
Percentiles: P50: 1079.95 P75: 1223.69 P99: 1852.36 P99.9: 1898.70 P99.99: 2176.00
------------------------------------------------------
(     580,     870 ]   134848  13.485%  13.485% ###
(     870,    1300 ]   747868  74.787%  88.272% ###############
(    1300,    1900 ]   116536  11.654%  99.925% ##
(    1900,    2900 ]      748   0.075% 100.000%
readrandom   :       1.997 micros/op 500654 ops/sec 1.997 seconds 1000000 operations;    8.1 MB/s (1000000 of 1000000 found)
```

We can already see that the distribution of results is dramatically better and wonderfully normal-looking, with relative standard deviation around 20%. The function is also FASTER, at least with these parameters.
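For intuition on those error figures: the new code comments (see the InlineSkipList changes below) model the number of entries observed at a skip list level as approximately Poisson-distributed, so with x observed entries the relative standard deviation is sqrt(x)/x. The following standalone sketch, not part of this commit, just tabulates the implied error bounds from that formula:

```cpp
// Standalone sketch (not from the commit): Poisson-based relative error
// bounds for a level that observed x entries in range, per the reasoning
// in the new ApproximateNumEntries() comments. The k-sigma relative error
// is approximately k * sqrt(x) / x.
#include <cmath>
#include <cstdio>

int main() {
  for (int x : {10, 40, 160, 1000}) {
    double rel = std::sqrt(static_cast<double>(x)) / x;
    std::printf(
        "x=%4d  ~P75 (1 sigma): %4.0f%%  ~P95 (2 sigma): %4.0f%%  "
        "~P99+ (3 sigma): %4.0f%%\n",
        x, 100 * rel, 200 * rel, 300 * rel);
  }
  return 0;
}
```

For x = 40 this reproduces the figures quoted in the code comments: roughly 16%, 32%, and 47% (~50%).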
Let's look at how this behavior generalizes, first with a *much* larger range:

```
$ ./db_bench --benchmarks=filluniquerandom,approximatememtablestats,readrandom --value_size=1 --num=1000000 --batch_size=30000
filluniquerandom :       1.390 micros/op 719654 ops/sec 1.376 seconds 990000 operations;   11.7 MB/s
approximatememtablestats :       1.129 micros/op 885649 ops/sec 1.129 seconds 1000000 operations;
Reported entry count stats (expected 30000):
Count: 1000000 Average: 31098.8795  StdDev: 3601.47
Min: 21504  Median: 29333.9303  Max: 43008
Percentiles: P50: 29333.93 P75: 33018.00 P99: 43008.00 P99.9: 43008.00 P99.99: 43008.00
------------------------------------------------------
(   14000,   22000 ]      408   0.041%   0.041%
(   22000,   33000 ]   749327  74.933%  74.974% ###############
(   33000,   50000 ]   250265  25.027% 100.000% #####
readrandom   :       1.894 micros/op 528083 ops/sec 1.894 seconds 1000000 operations;    8.5 MB/s (989989 of 1000000 found)
```

This is *even faster* and relatively *more accurate*, with relative standard deviation closer to 10%. Code comments explain why.

Now let's look at smaller ranges. Implementation quirks or conveniences (see the sketch after the last benchmark output below for how these arise):
* When the actual number in range is >= 40, the minimum return value is 40.
* When the actual is <= 10, it is guaranteed to return that actual number.

```
$ ./db_bench --benchmarks=filluniquerandom,approximatememtablestats,readrandom --value_size=1 --num=1000000 --batch_size=75
...
filluniquerandom :       1.417 micros/op 705668 ops/sec 1.417 seconds 999975 operations;   11.4 MB/s
approximatememtablestats :       3.342 micros/op 299197 ops/sec 3.342 seconds 1000000 operations;
Reported entry count stats (expected 75):
Count: 1000000 Average: 75.1210  StdDev: 15.02
Min: 40  Median: 71.9395  Max: 256
Percentiles: P50: 71.94 P75: 89.69 P99: 119.12 P99.9: 166.68 P99.99: 229.78
------------------------------------------------------
(      34,      51 ]    38867   3.887%   3.887% #
(      51,      76 ]   550554  55.055%  58.942% ###########
(      76,     110 ]   398854  39.885%  98.828% ########
(     110,     170 ]    11353   1.135%  99.963%
(     170,     250 ]      364   0.036%  99.999%
(     250,     380 ]        8   0.001% 100.000%
readrandom   :       1.861 micros/op 537224 ops/sec 1.861 seconds 1000000 operations;    8.7 MB/s (999974 of 1000000 found)

$ ./db_bench --benchmarks=filluniquerandom,approximatememtablestats,readrandom --value_size=1 --num=1000000 --batch_size=25
...
filluniquerandom :       1.501 micros/op 666283 ops/sec 1.501 seconds 1000000 operations;   10.8 MB/s
approximatememtablestats :       5.118 micros/op 195401 ops/sec 5.118 seconds 1000000 operations;
Reported entry count stats (expected 25):
Count: 1000000 Average: 26.2392  StdDev: 4.58
Min: 25  Median: 28.4590  Max: 72
Percentiles: P50: 28.46 P75: 31.69 P99: 49.27 P99.9: 67.95 P99.99: 72.00
------------------------------------------------------
(      22,      34 ]   928936  92.894%  92.894% ###################
(      34,      51 ]    67960   6.796%  99.690% #
(      51,      76 ]     3104   0.310% 100.000%
readrandom   :       1.892 micros/op 528595 ops/sec 1.892 seconds 1000000 operations;    8.6 MB/s (1000000 of 1000000 found)

$ ./db_bench --benchmarks=filluniquerandom,approximatememtablestats,readrandom --value_size=1 --num=1000000 --batch_size=10
...
filluniquerandom :       1.642 micros/op 608916 ops/sec 1.642 seconds 1000000 operations;    9.9 MB/s
approximatememtablestats :       3.042 micros/op 328721 ops/sec 3.042 seconds 1000000 operations;
Reported entry count stats (expected 10):
Count: 1000000 Average: 10.0000  StdDev: 0.00
Min: 10  Median: 10.0000  Max: 10
Percentiles: P50: 10.00 P75: 10.00 P99: 10.00 P99.9: 10.00 P99.99: 10.00
------------------------------------------------------
(       6,      10 ]  1000000 100.000% 100.000% ####################
readrandom   :       1.805 micros/op 554126 ops/sec 1.805 seconds 1000000 operations;    9.0 MB/s (1000000 of 1000000 found)
```

Remarkably consistent.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/13047

Test Plan: new db_bench test for both performance and accuracy (see above); added to crash test; unit test updated.

Reviewed By: cbi42

Differential Revision: D63722003

Pulled By: pdillinger

fbshipit-source-id: cfc8613c085e87c17ecec22d82601aac2a5a1b26
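Those quirks fall directly out of the new per-level sampling threshold, `sufficient_samples = level * kBranching + 10`: at level 0 the threshold is 10, so a running count below 10 triggers an exact count at level 0, while a count of at least 10 is scaled up by the branching factor. A minimal sketch of that final decision, not from the commit and assuming RocksDB's default skip-list branching factor of 4:

```cpp
// Sketch (not from the commit) of the level-0 decision in the new
// ApproximateNumEntries(): sufficient_samples(0) = 0 * kBranching + 10.
// A level-1 count below 10 forces an exact count at level 0 (exact results
// for small ranges); a level-1 count >= 10 is multiplied by kBranching,
// so 10 * 4 = 40 is the smallest possible scaled estimate.
#include <cstdint>
#include <cstdio>

uint64_t FinishEstimate(uint64_t level1_count, uint64_t exact_level0_count) {
  const uint64_t kBranching = 4;  // assumed default branching factor
  if (level1_count >= 10) {
    return level1_count * kBranching;  // scaled estimate, always >= 40
  }
  return exact_level0_count;  // exact answer
}

int main() {
  // 9 entries seen at level 1 -> fall through to the exact level-0 count.
  std::printf("%llu\n", static_cast<unsigned long long>(FinishEstimate(9, 9)));
  // 10 entries seen at level 1 -> scaled estimate of 40.
  std::printf("%llu\n", static_cast<unsigned long long>(FinishEstimate(10, 37)));
  return 0;
}
```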
Parent: 389e66bef5
Commit: dd23e84cad
```diff
@@ -1826,21 +1826,30 @@ TEST_F(DBTest, GetApproximateMemTableStats) {
   uint64_t count;
   uint64_t size;
 
+  // Because Random::GetTLSInstance() seed is reset in DBTestBase,
+  // this test is deterministic.
+
   std::string start = Key(50);
   std::string end = Key(60);
   Range r(start, end);
   db_->GetApproximateMemTableStats(r, &count, &size);
-  ASSERT_GT(count, 0);
-  ASSERT_LE(count, N);
-  ASSERT_GT(size, 6000);
-  ASSERT_LT(size, 204800);
+  // When actual count is <= 10, it returns that as the minimum
+  EXPECT_EQ(count, 10);
+  EXPECT_EQ(size, 10440);
+
+  start = Key(20);
+  end = Key(100);
+  r = Range(start, end);
+  db_->GetApproximateMemTableStats(r, &count, &size);
+  EXPECT_EQ(count, 72);
+  EXPECT_EQ(size, 75168);
 
   start = Key(500);
   end = Key(600);
   r = Range(start, end);
   db_->GetApproximateMemTableStats(r, &count, &size);
-  ASSERT_EQ(count, 0);
-  ASSERT_EQ(size, 0);
+  EXPECT_EQ(count, 0);
+  EXPECT_EQ(size, 0);
 
   ASSERT_OK(Flush());
 
```
```diff
@@ -1848,8 +1857,8 @@ TEST_F(DBTest, GetApproximateMemTableStats) {
   end = Key(60);
   r = Range(start, end);
   db_->GetApproximateMemTableStats(r, &count, &size);
-  ASSERT_EQ(count, 0);
-  ASSERT_EQ(size, 0);
+  EXPECT_EQ(count, 0);
+  EXPECT_EQ(size, 0);
 
   for (int i = 0; i < N; i++) {
     ASSERT_OK(Put(Key(1000 + i), rnd.RandomString(1024)));
```
```diff
@@ -1857,10 +1866,11 @@ TEST_F(DBTest, GetApproximateMemTableStats) {
 
   start = Key(100);
   end = Key(1020);
+  // Actually 20 keys in the range ^^
   r = Range(start, end);
   db_->GetApproximateMemTableStats(r, &count, &size);
-  ASSERT_GT(count, 20);
-  ASSERT_GT(size, 6000);
+  EXPECT_EQ(count, 20);
+  EXPECT_EQ(size, 20880);
 }
 
 TEST_F(DBTest, ApproximateSizes) {
```
```diff
@@ -1031,8 +1031,9 @@ DEFINE_int32(continuous_verification_interval, 1000,
              "disables continuous verification.");
 
 DEFINE_int32(approximate_size_one_in, 64,
-             "If non-zero, DB::GetApproximateSizes() will be called against"
-             " random key ranges.");
+             "If non-zero, DB::GetApproximateSizes() and "
+             "DB::GetApproximateMemTableStats() will be called against "
+             "random key ranges.");
 
 DEFINE_int32(read_fault_one_in, 1000,
              "On non-zero, enables fault injection on read");
```
```diff
@@ -2427,22 +2427,31 @@ Status StressTest::TestApproximateSize(
   std::string key1_str = Key(key1);
   std::string key2_str = Key(key2);
   Range range{Slice(key1_str), Slice(key2_str)};
-  SizeApproximationOptions sao;
-  sao.include_memtables = thread->rand.OneIn(2);
-  if (sao.include_memtables) {
-    sao.include_files = thread->rand.OneIn(2);
-  }
-  if (thread->rand.OneIn(2)) {
-    if (thread->rand.OneIn(2)) {
-      sao.files_size_error_margin = 0.0;
-    } else {
-      sao.files_size_error_margin =
-          static_cast<double>(thread->rand.Uniform(3));
+  if (thread->rand.OneIn(3)) {
+    // Call GetApproximateMemTableStats instead
+    uint64_t count, size;
+    db_->GetApproximateMemTableStats(column_families_[rand_column_families[0]],
+                                     range, &count, &size);
+    return Status::OK();
+  } else {
+    // Call GetApproximateSizes
+    SizeApproximationOptions sao;
+    sao.include_memtables = thread->rand.OneIn(2);
+    if (sao.include_memtables) {
+      sao.include_files = thread->rand.OneIn(2);
+    }
+    if (thread->rand.OneIn(2)) {
+      if (thread->rand.OneIn(2)) {
+        sao.files_size_error_margin = 0.0;
+      } else {
+        sao.files_size_error_margin =
+            static_cast<double>(thread->rand.Uniform(3));
+      }
     }
+    uint64_t result;
+    return db_->GetApproximateSizes(
+        sao, column_families_[rand_column_families[0]], &range, 1, &result);
   }
-  uint64_t result;
-  return db_->GetApproximateSizes(
-      sao, column_families_[rand_column_families[0]], &range, 1, &result);
 }
 
 Status StressTest::TestCheckpoint(ThreadState* thread,
```
```diff
@@ -141,8 +141,9 @@ class InlineSkipList {
   // Returns true iff an entry that compares equal to key is in the list.
   bool Contains(const char* key) const;
 
-  // Return estimated number of entries smaller than `key`.
-  uint64_t EstimateCount(const char* key) const;
+  // Return estimated number of entries from `start_ikey` to `end_ikey`.
+  uint64_t ApproximateNumEntries(const Slice& start_ikey,
+                                 const Slice& end_ikey) const;
 
   // Validate correctness of the skip-list.
   void TEST_Validate() const;
```
```diff
@@ -673,31 +674,88 @@ InlineSkipList<Comparator>::FindRandomEntry() const {
 }
 
 template <class Comparator>
-uint64_t InlineSkipList<Comparator>::EstimateCount(const char* key) const {
-  uint64_t count = 0;
-
-  Node* x = head_;
-  int level = GetMaxHeight() - 1;
-  const DecodedKey key_decoded = compare_.decode_key(key);
-  while (true) {
-    assert(x == head_ || compare_(x->Key(), key_decoded) < 0);
-    Node* next = x->Next(level);
-    if (next != nullptr) {
-      PREFETCH(next->Next(level), 0, 1);
-    }
-    if (next == nullptr || compare_(next->Key(), key_decoded) >= 0) {
-      if (level == 0) {
-        return count;
-      } else {
-        // Switch to next list
-        count *= kBranching_;
-        level--;
-      }
-    } else {
-      x = next;
-      count++;
-    }
-  }
-}
+uint64_t InlineSkipList<Comparator>::ApproximateNumEntries(
+    const Slice& start_ikey, const Slice& end_ikey) const {
+  // The number of entries at a given level for the given range, in terms of
+  // the actual number of entries in that range (level 0), follows a binomial
+  // distribution, which is very well approximated by the Poisson distribution.
+  // That has stddev sqrt(x) where x is the expected number of entries (mean)
+  // at this level, and the best predictor of x is the number of observed
+  // entries (at this level). To predict the number of entries on level 0 we use
+  // x * kBranching ^ level. From the standard deviation, the P99+ relative
+  // error is roughly 3 * sqrt(x) / x. Thus, a reasonable approach would be to
+  // find the smallest level with at least some moderate constant number entries
+  // in range. E.g. with at least ~40 entries, we expect P99+ relative error
+  // (approximation accuracy) of ~ 50% = 3 * sqrt(40) / 40; P95 error of
+  // ~30%; P75 error of < 20%.
+  //
+  // However, there are two issues with this approach, and an observation:
+  // * Pointer chasing on the larger (bottom) levels is much slower because of
+  // cache hierarchy effects, so when the result is smaller, getting the result
+  // will be substantially slower, despite traversing a similar number of
+  // entries. (We could be clever about pipelining our pointer chasing but
+  // that's complicated.)
+  // * The larger (bottom) levels also have lower variance because there's a
+  // chance (or certainty) that we reach level 0 and return the exact answer.
+  // * For applications in query planning, we can also tolerate more variance on
+  // small results because the impact of misestimating is likely smaller.
+  //
+  // These factors point us to an approach in which we have a higher minimum
+  // threshold number of samples for higher levels and lower for lower levels
+  // (see sufficient_samples below). This seems to yield roughly consistent
+  // relative error (stddev around 20%, less for large results) and roughly
+  // consistent query time around the time of two memtable point queries.
+  //
+  // Engineering observation: it is tempting to think that taking into account
+  // what we already found in how many entries occur on higher levels, not just
+  // the first iterated level with a sufficient number of samples, would yield
+  // a more accurate estimate. But that doesn't work because of the particular
+  // correlations and independences of the data: each level higher is just an
+  // independently probabilistic filtering of the level below it. That
+  // filtering from level l to l+1 has no more information about levels
+  // 0 .. l-1 than we can get from level l. The structure of RandomHeight() is
+  // a clue to these correlations and independences.
+
+  Node* lb = head_;
+  Node* ub = nullptr;
+  uint64_t count = 0;
+  for (int level = GetMaxHeight() - 1; level >= 0; level--) {
+    auto sufficient_samples = static_cast<uint64_t>(level) * kBranching_ + 10U;
+    if (count >= sufficient_samples) {
+      // No more counting; apply powers of kBranching and avoid floating point
+      count *= kBranching_;
+      continue;
+    }
+    count = 0;
+    Node* next;
+    // Get a more precise lower bound (for start key)
+    for (;;) {
+      next = lb->Next(level);
+      if (next == ub) {
+        break;
+      }
+      assert(next != nullptr);
+      if (compare_(next->Key(), start_ikey) >= 0) {
+        break;
+      }
+      lb = next;
+    }
+    // Count entries on this level until upper bound (for end key)
+    for (;;) {
+      if (next == ub) {
+        break;
+      }
+      assert(next != nullptr);
+      if (compare_(next->Key(), end_ikey) >= 0) {
+        // Save refined upper bound to potentially save key comparison
+        ub = next;
+        break;
+      }
+      count++;
+      next = next->Next(level);
+    }
+  }
+  return count;
+}
 
 template <class Comparator>
```
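For reference, the public entry point that ultimately uses this estimator is `DB::GetApproximateMemTableStats()`. A minimal usage sketch, not part of the diff; the database path and key layout are illustrative:

```cpp
// Minimal usage sketch of DB::GetApproximateMemTableStats(), which is backed
// by the new skip list estimator for unflushed (memtable) data.
#include <cstdint>
#include <iostream>
#include <string>
#include "rocksdb/db.h"

int main() {
  rocksdb::DB* db = nullptr;
  rocksdb::Options options;
  options.create_if_missing = true;
  rocksdb::Status s =
      rocksdb::DB::Open(options, "/tmp/memtable_stats_demo", &db);
  if (!s.ok()) {
    return 1;
  }
  // Write some entries; they stay in the memtable until a flush.
  for (int i = 0; i < 1000; i++) {
    db->Put(rocksdb::WriteOptions(), "key" + std::to_string(i), "value");
  }
  rocksdb::Range r("key0", "key5");  // left-inclusive key range
  uint64_t count = 0;
  uint64_t size = 0;
  // Estimated entry count and byte size for the range, memtables only.
  db->GetApproximateMemTableStats(r, &count, &size);
  std::cout << "approx entries: " << count << ", approx bytes: " << size
            << "\n";
  delete db;
  return 0;
}
```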
```diff
@@ -64,8 +64,9 @@ class SkipList {
   // Returns true iff an entry that compares equal to key is in the list.
   bool Contains(const Key& key) const;
 
-  // Return estimated number of entries smaller than `key`.
-  uint64_t EstimateCount(const Key& key) const;
+  // Return estimated number of entries from `start_ikey` to `end_ikey`.
+  uint64_t ApproximateNumEntries(const Slice& start_ikey,
+                                 const Slice& end_ikey) const;
 
   // Iteration over the contents of a skip list
   class Iterator {
```
```diff
@@ -383,27 +384,49 @@ typename SkipList<Key, Comparator>::Node* SkipList<Key, Comparator>::FindLast()
 }
 
 template <typename Key, class Comparator>
-uint64_t SkipList<Key, Comparator>::EstimateCount(const Key& key) const {
-  uint64_t count = 0;
-
-  Node* x = head_;
-  int level = GetMaxHeight() - 1;
-  while (true) {
-    assert(x == head_ || compare_(x->key, key) < 0);
-    Node* next = x->Next(level);
-    if (next == nullptr || compare_(next->key, key) >= 0) {
-      if (level == 0) {
-        return count;
-      } else {
-        // Switch to next list
-        count *= kBranching_;
-        level--;
-      }
-    } else {
-      x = next;
-      count++;
-    }
-  }
-}
+uint64_t SkipList<Key, Comparator>::ApproximateNumEntries(
+    const Slice& start_ikey, const Slice& end_ikey) const {
+  // See InlineSkipList<Comparator>::ApproximateNumEntries() (copy-paste)
+  Node* lb = head_;
+  Node* ub = nullptr;
+  uint64_t count = 0;
+  for (int level = GetMaxHeight() - 1; level >= 0; level--) {
+    auto sufficient_samples = static_cast<uint64_t>(level) * kBranching_ + 10U;
+    if (count >= sufficient_samples) {
+      // No more counting; apply powers of kBranching and avoid floating point
+      count *= kBranching_;
+      continue;
+    }
+    count = 0;
+    Node* next;
+    // Get a more precise lower bound (for start key)
+    for (;;) {
+      next = lb->Next(level);
+      if (next == ub) {
+        break;
+      }
+      assert(next != nullptr);
+      if (compare_(next->Key(), start_ikey) >= 0) {
+        break;
+      }
+      lb = next;
+    }
+    // Count entries on this level until upper bound (for end key)
+    for (;;) {
+      if (next == ub) {
+        break;
+      }
+      assert(next != nullptr);
+      if (compare_(next->Key(), end_ikey) >= 0) {
+        // Save refined upper bound to potentially save key comparison
+        ub = next;
+        break;
+      }
+      count++;
+      next = next->Next(level);
+    }
+  }
+  return count;
+}
 
 template <typename Key, class Comparator>
```
```diff
@@ -108,11 +108,7 @@ class SkipListRep : public MemTableRep {
 
   uint64_t ApproximateNumEntries(const Slice& start_ikey,
                                  const Slice& end_ikey) override {
-    std::string tmp;
-    uint64_t start_count =
-        skip_list_.EstimateCount(EncodeKey(&tmp, start_ikey));
-    uint64_t end_count = skip_list_.EstimateCount(EncodeKey(&tmp, end_ikey));
-    return (end_count >= start_count) ? (end_count - start_count) : 0;
+    return skip_list_.ApproximateNumEntries(start_ikey, end_ikey);
   }
 
   void UniqueRandomSample(const uint64_t num_entries,
```
```diff
@@ -153,10 +153,11 @@ DEFINE_string(
     "randomtransaction,"
     "randomreplacekeys,"
     "timeseries,"
-    "getmergeoperands,",
+    "getmergeoperands,"
     "readrandomoperands,"
     "backup,"
-    "restore"
+    "restore,"
+    "approximatememtablestats",
 
     "Comma-separated list of operations to run in the specified"
     " order. Available benchmarks:\n"
```
```diff
@@ -243,9 +244,14 @@ DEFINE_string(
     "operation includes a rare but possible retry in case it got "
     "`Status::Incomplete()`. This happens upon encountering more keys than "
    "have ever been seen by the thread (or eight initially)\n"
-    "\tbackup -- Create a backup of the current DB and verify that a new backup is corrected. "
+    "\tbackup -- Create a backup of the current DB and verify that a new "
+    "backup is corrected. "
     "Rate limit can be specified through --backup_rate_limit\n"
-    "\trestore -- Restore the DB from the latest backup available, rate limit can be specified through --restore_rate_limit\n");
+    "\trestore -- Restore the DB from the latest backup available, rate limit "
+    "can be specified through --restore_rate_limit\n"
+    "\tapproximatememtablestats -- Tests accuracy of "
+    "GetApproximateMemTableStats, ideally\n"
+    "after fillrandom, where actual answer is batch_size");
 
 DEFINE_int64(num, 1000000, "Number of key/values to place in database");
 
```
```diff
@@ -3621,6 +3627,8 @@ class Benchmark {
       fprintf(stderr, "entries_per_batch = %" PRIi64 "\n",
               entries_per_batch_);
       method = &Benchmark::ApproximateSizeRandom;
+    } else if (name == "approximatememtablestats") {
+      method = &Benchmark::ApproximateMemtableStats;
     } else if (name == "mixgraph") {
       method = &Benchmark::MixGraph;
     } else if (name == "readmissing") {
```
```diff
@@ -6298,6 +6306,35 @@ class Benchmark {
     thread->stats.AddMessage(msg);
   }
 
+  void ApproximateMemtableStats(ThreadState* thread) {
+    const size_t batch_size = entries_per_batch_;
+    std::unique_ptr<const char[]> skey_guard;
+    Slice skey = AllocateKey(&skey_guard);
+    std::unique_ptr<const char[]> ekey_guard;
+    Slice ekey = AllocateKey(&ekey_guard);
+    Duration duration(FLAGS_duration, reads_);
+    if (FLAGS_num < static_cast<int64_t>(batch_size)) {
+      std::terminate();
+    }
+    uint64_t range = static_cast<uint64_t>(FLAGS_num) - batch_size;
+    auto count_hist = std::make_shared<HistogramImpl>();
+    while (!duration.Done(1)) {
+      DB* db = SelectDB(thread);
+      uint64_t start_key = thread->rand.Uniform(range);
+      GenerateKeyFromInt(start_key, FLAGS_num, &skey);
+      uint64_t end_key = start_key + batch_size;
+      GenerateKeyFromInt(end_key, FLAGS_num, &ekey);
+      uint64_t count = UINT64_MAX;
+      uint64_t size = UINT64_MAX;
+      db->GetApproximateMemTableStats({skey, ekey}, &count, &size);
+      count_hist->Add(count);
+      thread->stats.FinishedOps(nullptr, db, 1, kOthers);
+    }
+    thread->stats.AddMessage("\nReported entry count stats (expected " +
+                             std::to_string(batch_size) + "):");
+    thread->stats.AddMessage("\n" + count_hist->ToString());
+  }
+
   // Calls ApproximateSize over random key ranges.
   void ApproximateSizeRandom(ThreadState* thread) {
     int64_t size_sum = 0;
```
```diff
@@ -0,0 +1 @@
+* `GetApproximateMemTableStats()` could return disastrously bad estimates 5-25% of the time. The function has been re-engineered to return much better estimates with similar CPU cost.
```