From 275cd80cdb2de8e53c7ab805d74394309372005e Mon Sep 17 00:00:00 2001
From: Gang Liao
Date: Fri, 12 Aug 2022 17:59:06 -0700
Subject: [PATCH] Add a blob-specific cache priority (#10461)

Summary:
RocksDB's `Cache` abstraction currently supports two priority levels for items: high (used for frequently accessed/highly valuable SST metablocks like index/filter blocks) and low (used for SST data blocks). Blobs are typically lower-value targets for caching than data blocks, since 1) with BlobDB, data blocks containing blob references conceptually form an index structure which has to be consulted before we can read the blob value, and 2) cached blobs represent only a single key-value, while cached data blocks generally contain multiple KVs. Since we would like to make it possible to use the same backing cache for the block cache and the blob cache, it would make sense to add a new, lower-than-low cache priority level (bottom level) for blobs so data blocks are prioritized over them.

This task is a part of https://github.com/facebook/rocksdb/issues/10156

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10461

Reviewed By: siying

Differential Revision: D38672823

Pulled By: ltamasi

fbshipit-source-id: 90cf7362036563d79891f47be2cc24b827482743
---
 HISTORY.md | 3 +-
 cache/cache.cc | 4 +
 cache/cache_bench_tool.cc | 4 +-
 cache/clock_cache.cc | 6 +-
 cache/compressed_secondary_cache.cc | 20 +-
 cache/compressed_secondary_cache.h | 2 +-
 cache/compressed_secondary_cache_test.cc | 46 ++-
 cache/lru_cache.cc | 103 ++++-
 cache/lru_cache.h | 56 ++-
 cache/lru_cache_test.cc | 365 ++++++++++++++----
 db/blob/blob_file_builder.cc | 2 +-
 db/blob/blob_source.cc | 4 +-
 db/blob/blob_source_test.cc | 10 +
 db/db_block_cache_test.cc | 4 +-
 db/db_test2.cc | 8 +-
 include/rocksdb/cache.h | 37 +-
 java/rocksjni/lru_cache.cc | 8 +-
 java/src/main/java/org/rocksdb/LRUCache.java | 44 ++-
 .../test/java/org/rocksdb/LRUCacheTest.java | 7 +-
 memory/memory_allocator_test.cc | 2 +-
.../block_based/block_based_table_factory.cc | 1 + tools/benchmark.sh | 2 +- tools/db_bench_tool.cc | 21 +- 23 files changed, 593 insertions(+), 166 deletions(-) diff --git a/HISTORY.md b/HISTORY.md index a2e5361329..b6047a600a 100644 --- a/HISTORY.md +++ b/HISTORY.md @@ -1,13 +1,14 @@ # Rocksdb Change Log ## Unreleased ### New Features - * Added `prepopulate_blob_cache` to ColumnFamilyOptions. If enabled, prepopulate warm/hot blobs which are already in memory into blob cache at the time of flush. On a flush, the blob that is in memory (in memtables) get flushed to the device. If using Direct IO, additional IO is incurred to read this blob back into memory again, which is avoided by enabling this option. This further helps if the workload exhibits high temporal locality, where most of the reads go to recently written data. This also helps in case of the remote file system since it involves network traffic and higher latencies. +* Added `prepopulate_blob_cache` to ColumnFamilyOptions. If enabled, prepopulate warm/hot blobs which are already in memory into blob cache at the time of flush. On a flush, the blob that is in memory (in memtables) get flushed to the device. If using Direct IO, additional IO is incurred to read this blob back into memory again, which is avoided by enabling this option. This further helps if the workload exhibits high temporal locality, where most of the reads go to recently written data. This also helps in case of the remote file system since it involves network traffic and higher latencies. * Support using secondary cache with the blob cache. When creating a blob cache, the user can set a secondary blob cache by configuring `secondary_cache` in LRUCacheOptions. * Charge memory usage of blob cache when the backing cache of the blob cache and the block cache are different. 
If an operation reserving memory for blob cache exceeds the avaible space left in the block cache at some point (i.e, causing a cache full under `LRUCacheOptions::strict_capacity_limit` = true), creation will fail with `Status::MemoryLimit()`. To opt in this feature, enable charging `CacheEntryRole::kBlobCache` in `BlockBasedTableOptions::cache_usage_options`. * Improve subcompaction range partition so that it is likely to be more even. More evenly distribution of subcompaction will improve compaction throughput for some workloads. All input files' index blocks to sample some anchor key points from which we pick positions to partition the input range. This would introduce some CPU overhead in compaction preparation phase, if subcompaction is enabled, but it should be a small fraction of the CPU usage of the whole compaction process. This also brings a behavier change: subcompaction number is much more likely to maxed out than before. * Add CompactionPri::kRoundRobin, a compaction picking mode that cycles through all the files with a compact cursor in a round-robin manner. This feature is available since 7.5. * Provide support for subcompactions for user_defined_timestamp. * Added an option `memtable_protection_bytes_per_key` that turns on memtable per key-value checksum protection. Each memtable entry will be suffixed by a checksum that is computed during writes, and verified in reads/compaction. Detected corruption will be logged and with corruption status returned to user. +* Added a blob-specific cache priority level - bottom level. Blobs are typically lower-value targets for caching than data blocks, since 1) with BlobDB, data blocks containing blob references conceptually form an index structure which has to be consulted before we can read the blob value, and 2) cached blobs represent only a single key-value, while cached data blocks generally contain multiple KVs. 
The user can specify the new option `low_pri_pool_ratio` in `LRUCacheOptions` to configure the ratio of capacity reserved for low priority cache entries (the remaining capacity is then the space reserved for the bottom level), or pass the new argument `low_pri_pool_ratio` to `NewLRUCache()` to achieve the same effect. ### Public API changes * Removed Customizable support for RateLimiter and removed its CreateFromString() and Type() functions. diff --git a/cache/cache.cc b/cache/cache.cc index 2904d5ecef..769cf82460 100644 --- a/cache/cache.cc +++ b/cache/cache.cc @@ -33,6 +33,10 @@ static std::unordered_map {offsetof(struct LRUCacheOptions, high_pri_pool_ratio), OptionType::kDouble, OptionVerificationType::kNormal, OptionTypeFlags::kMutable}}, + {"low_pri_pool_ratio", + {offsetof(struct LRUCacheOptions, low_pri_pool_ratio), + OptionType::kDouble, OptionVerificationType::kNormal, + OptionTypeFlags::kMutable}}, }; static std::unordered_map diff --git a/cache/cache_bench_tool.cc b/cache/cache_bench_tool.cc index ccdb90e49a..663bff953d 100644 --- a/cache/cache_bench_tool.cc +++ b/cache/cache_bench_tool.cc @@ -304,7 +304,9 @@ class CacheBench { FLAGS_cache_size, FLAGS_value_bytes, FLAGS_num_shard_bits, false /*strict_capacity_limit*/, kDefaultCacheMetadataChargePolicy); } else if (FLAGS_cache_type == "lru_cache") { - LRUCacheOptions opts(FLAGS_cache_size, FLAGS_num_shard_bits, false, 0.5); + LRUCacheOptions opts(FLAGS_cache_size, FLAGS_num_shard_bits, + false /* strict_capacity_limit */, + 0.5 /* high_pri_pool_ratio */); #ifndef ROCKSDB_LITE if (!FLAGS_secondary_cache_uri.empty()) { Status s = SecondaryCache::CreateFromString( diff --git a/cache/clock_cache.cc b/cache/clock_cache.cc index bb1d06254a..edc63ae4ed 100644 --- a/cache/clock_cache.cc +++ b/cache/clock_cache.cc @@ -697,8 +697,10 @@ void ClockCache::DisownData() { std::shared_ptr NewClockCache( size_t capacity, int num_shard_bits, bool strict_capacity_limit, CacheMetadataChargePolicy
metadata_charge_policy) { - return NewLRUCache(capacity, num_shard_bits, strict_capacity_limit, 0.5, - nullptr, kDefaultToAdaptiveMutex, metadata_charge_policy); + return NewLRUCache(capacity, num_shard_bits, strict_capacity_limit, + /* high_pri_pool_ratio */ 0.5, nullptr, + kDefaultToAdaptiveMutex, metadata_charge_policy, + /* low_pri_pool_ratio */ 0.0); } std::shared_ptr ExperimentalNewClockCache( diff --git a/cache/compressed_secondary_cache.cc b/cache/compressed_secondary_cache.cc index 04640cdc45..a77ceff190 100644 --- a/cache/compressed_secondary_cache.cc +++ b/cache/compressed_secondary_cache.cc @@ -17,17 +17,18 @@ namespace ROCKSDB_NAMESPACE { CompressedSecondaryCache::CompressedSecondaryCache( size_t capacity, int num_shard_bits, bool strict_capacity_limit, - double high_pri_pool_ratio, + double high_pri_pool_ratio, double low_pri_pool_ratio, std::shared_ptr memory_allocator, bool use_adaptive_mutex, CacheMetadataChargePolicy metadata_charge_policy, CompressionType compression_type, uint32_t compress_format_version) : cache_options_(capacity, num_shard_bits, strict_capacity_limit, high_pri_pool_ratio, memory_allocator, use_adaptive_mutex, metadata_charge_policy, compression_type, - compress_format_version) { - cache_ = NewLRUCache(capacity, num_shard_bits, strict_capacity_limit, - high_pri_pool_ratio, memory_allocator, - use_adaptive_mutex, metadata_charge_policy); + compress_format_version, low_pri_pool_ratio) { + cache_ = + NewLRUCache(capacity, num_shard_bits, strict_capacity_limit, + high_pri_pool_ratio, memory_allocator, use_adaptive_mutex, + metadata_charge_policy, low_pri_pool_ratio); } CompressedSecondaryCache::~CompressedSecondaryCache() { cache_.reset(); } @@ -225,11 +226,12 @@ std::shared_ptr NewCompressedSecondaryCache( double high_pri_pool_ratio, std::shared_ptr memory_allocator, bool use_adaptive_mutex, CacheMetadataChargePolicy metadata_charge_policy, - CompressionType compression_type, uint32_t compress_format_version) { + CompressionType 
compression_type, uint32_t compress_format_version, + double low_pri_pool_ratio) { return std::make_shared( capacity, num_shard_bits, strict_capacity_limit, high_pri_pool_ratio, - memory_allocator, use_adaptive_mutex, metadata_charge_policy, - compression_type, compress_format_version); + low_pri_pool_ratio, memory_allocator, use_adaptive_mutex, + metadata_charge_policy, compression_type, compress_format_version); } std::shared_ptr NewCompressedSecondaryCache( @@ -240,7 +242,7 @@ std::shared_ptr NewCompressedSecondaryCache( opts.capacity, opts.num_shard_bits, opts.strict_capacity_limit, opts.high_pri_pool_ratio, opts.memory_allocator, opts.use_adaptive_mutex, opts.metadata_charge_policy, opts.compression_type, - opts.compress_format_version); + opts.compress_format_version, opts.low_pri_pool_ratio); } } // namespace ROCKSDB_NAMESPACE diff --git a/cache/compressed_secondary_cache.h b/cache/compressed_secondary_cache.h index e5ca55a336..bc194ee248 100644 --- a/cache/compressed_secondary_cache.h +++ b/cache/compressed_secondary_cache.h @@ -56,7 +56,7 @@ class CompressedSecondaryCache : public SecondaryCache { public: CompressedSecondaryCache( size_t capacity, int num_shard_bits, bool strict_capacity_limit, - double high_pri_pool_ratio, + double high_pri_pool_ratio, double low_pri_pool_ratio, std::shared_ptr memory_allocator = nullptr, bool use_adaptive_mutex = kDefaultToAdaptiveMutex, CacheMetadataChargePolicy metadata_charge_policy = diff --git a/cache/compressed_secondary_cache_test.cc b/cache/compressed_secondary_cache_test.cc index c335a1bf2b..4f1d02afa4 100644 --- a/cache/compressed_secondary_cache_test.cc +++ b/cache/compressed_secondary_cache_test.cc @@ -240,9 +240,11 @@ class CompressedSecondaryCacheTest : public testing::Test { secondary_cache_opts.num_shard_bits = 0; std::shared_ptr secondary_cache = NewCompressedSecondaryCache(secondary_cache_opts); - LRUCacheOptions lru_cache_opts(1300, 0, /*_strict_capacity_limit=*/false, - 0.5, nullptr, 
kDefaultToAdaptiveMutex, - kDefaultCacheMetadataChargePolicy); + LRUCacheOptions lru_cache_opts( + 1300 /* capacity */, 0 /* num_shard_bits */, + false /* strict_capacity_limit */, 0.5 /* high_pri_pool_ratio */, + nullptr /* memory_allocator */, kDefaultToAdaptiveMutex, + kDefaultCacheMetadataChargePolicy); lru_cache_opts.secondary_cache = secondary_cache; std::shared_ptr cache = NewLRUCache(lru_cache_opts); std::shared_ptr stats = CreateDBStatistics(); @@ -324,9 +326,11 @@ class CompressedSecondaryCacheTest : public testing::Test { std::shared_ptr secondary_cache = NewCompressedSecondaryCache(secondary_cache_opts); - LRUCacheOptions opts(1024, 0, /*_strict_capacity_limit=*/false, 0.5, - nullptr, kDefaultToAdaptiveMutex, - kDefaultCacheMetadataChargePolicy); + LRUCacheOptions opts( + 1024 /* capacity */, 0 /* num_shard_bits */, + false /* strict_capacity_limit */, 0.5 /* high_pri_pool_ratio */, + nullptr /* memory_allocator */, kDefaultToAdaptiveMutex, + kDefaultCacheMetadataChargePolicy); opts.secondary_cache = secondary_cache; std::shared_ptr cache = NewLRUCache(opts); @@ -371,9 +375,11 @@ class CompressedSecondaryCacheTest : public testing::Test { std::shared_ptr secondary_cache = NewCompressedSecondaryCache(secondary_cache_opts); - LRUCacheOptions opts(1200, 0, /*_strict_capacity_limit=*/false, 0.5, - nullptr, kDefaultToAdaptiveMutex, - kDefaultCacheMetadataChargePolicy); + LRUCacheOptions opts( + 1200 /* capacity */, 0 /* num_shard_bits */, + false /* strict_capacity_limit */, 0.5 /* high_pri_pool_ratio */, + nullptr /* memory_allocator */, kDefaultToAdaptiveMutex, + kDefaultCacheMetadataChargePolicy); opts.secondary_cache = secondary_cache; std::shared_ptr cache = NewLRUCache(opts); @@ -430,9 +436,11 @@ class CompressedSecondaryCacheTest : public testing::Test { std::shared_ptr secondary_cache = NewCompressedSecondaryCache(secondary_cache_opts); - LRUCacheOptions opts(1200, 0, /*_strict_capacity_limit=*/false, 0.5, - nullptr, kDefaultToAdaptiveMutex, - 
kDefaultCacheMetadataChargePolicy); + LRUCacheOptions opts( + 1200 /* capacity */, 0 /* num_shard_bits */, + false /* strict_capacity_limit */, 0.5 /* high_pri_pool_ratio */, + nullptr /* memory_allocator */, kDefaultToAdaptiveMutex, + kDefaultCacheMetadataChargePolicy); opts.secondary_cache = secondary_cache; std::shared_ptr cache = NewLRUCache(opts); @@ -488,9 +496,11 @@ class CompressedSecondaryCacheTest : public testing::Test { std::shared_ptr secondary_cache = NewCompressedSecondaryCache(secondary_cache_opts); - LRUCacheOptions opts(1200, 0, /*_strict_capacity_limit=*/true, 0.5, nullptr, - kDefaultToAdaptiveMutex, - kDefaultCacheMetadataChargePolicy); + LRUCacheOptions opts( + 1200 /* capacity */, 0 /* num_shard_bits */, + true /* strict_capacity_limit */, 0.5 /* high_pri_pool_ratio */, + nullptr /* memory_allocator */, kDefaultToAdaptiveMutex, + kDefaultCacheMetadataChargePolicy); opts.secondary_cache = secondary_cache; std::shared_ptr cache = NewLRUCache(opts); @@ -548,7 +558,7 @@ class CompressedSecondaryCacheTest : public testing::Test { using CacheValueChunk = CompressedSecondaryCache::CacheValueChunk; std::unique_ptr sec_cache = - std::make_unique(1000, 0, true, 0.5, + std::make_unique(1000, 0, true, 0.5, 0.0, allocator); Random rnd(301); // 10000 = 8169 + 1769 + 62 , so there should be 3 chunks after split. 
@@ -600,7 +610,7 @@ class CompressedSecondaryCacheTest : public testing::Test { std::string str = str1 + str2 + str3; std::unique_ptr sec_cache = - std::make_unique(1000, 0, true, 0.5); + std::make_unique(1000, 0, true, 0.5, 0.0); size_t charge{0}; CacheAllocationPtr value = sec_cache->MergeChunksIntoValue(chunks_head, charge); @@ -626,7 +636,7 @@ class CompressedSecondaryCacheTest : public testing::Test { using CacheValueChunk = CompressedSecondaryCache::CacheValueChunk; std::unique_ptr sec_cache = - std::make_unique(1000, 0, true, 0.5, + std::make_unique(1000, 0, true, 0.5, 0.0, allocator); Random rnd(301); // 10000 = 8169 + 1769 + 62 , so there should be 3 chunks after split. diff --git a/cache/lru_cache.cc b/cache/lru_cache.cc index 0d7f946c4f..1434e18ba4 100644 --- a/cache/lru_cache.cc +++ b/cache/lru_cache.cc @@ -111,14 +111,17 @@ void LRUHandleTable::Resize() { LRUCacheShard::LRUCacheShard( size_t capacity, bool strict_capacity_limit, double high_pri_pool_ratio, - bool use_adaptive_mutex, CacheMetadataChargePolicy metadata_charge_policy, - int max_upper_hash_bits, + double low_pri_pool_ratio, bool use_adaptive_mutex, + CacheMetadataChargePolicy metadata_charge_policy, int max_upper_hash_bits, const std::shared_ptr& secondary_cache) : capacity_(0), high_pri_pool_usage_(0), + low_pri_pool_usage_(0), strict_capacity_limit_(strict_capacity_limit), high_pri_pool_ratio_(high_pri_pool_ratio), high_pri_pool_capacity_(0), + low_pri_pool_ratio_(low_pri_pool_ratio), + low_pri_pool_capacity_(0), table_(max_upper_hash_bits), usage_(0), lru_usage_(0), @@ -129,6 +132,7 @@ LRUCacheShard::LRUCacheShard( lru_.next = &lru_; lru_.prev = &lru_; lru_low_pri_ = &lru_; + lru_bottom_pri_ = &lru_; SetCapacity(capacity); } @@ -192,10 +196,12 @@ void LRUCacheShard::ApplyToSomeEntries( index_begin, index_end); } -void LRUCacheShard::TEST_GetLRUList(LRUHandle** lru, LRUHandle** lru_low_pri) { +void LRUCacheShard::TEST_GetLRUList(LRUHandle** lru, LRUHandle** lru_low_pri, + LRUHandle** 
lru_bottom_pri) { DMutexLock l(mutex_); *lru = &lru_; *lru_low_pri = lru_low_pri_; + *lru_bottom_pri = lru_bottom_pri_; } size_t LRUCacheShard::TEST_GetLRUSize() { @@ -214,20 +220,32 @@ double LRUCacheShard::GetHighPriPoolRatio() { return high_pri_pool_ratio_; } +double LRUCacheShard::GetLowPriPoolRatio() { + DMutexLock l(mutex_); + return low_pri_pool_ratio_; +} + void LRUCacheShard::LRU_Remove(LRUHandle* e) { assert(e->next != nullptr); assert(e->prev != nullptr); if (lru_low_pri_ == e) { lru_low_pri_ = e->prev; } + if (lru_bottom_pri_ == e) { + lru_bottom_pri_ = e->prev; + } e->next->prev = e->prev; e->prev->next = e->next; e->prev = e->next = nullptr; assert(lru_usage_ >= e->total_charge); lru_usage_ -= e->total_charge; + assert(!e->InHighPriPool() || !e->InLowPriPool()); if (e->InHighPriPool()) { assert(high_pri_pool_usage_ >= e->total_charge); high_pri_pool_usage_ -= e->total_charge; + } else if (e->InLowPriPool()) { + assert(low_pri_pool_usage_ >= e->total_charge); + low_pri_pool_usage_ -= e->total_charge; } } @@ -241,17 +259,34 @@ void LRUCacheShard::LRU_Insert(LRUHandle* e) { e->prev->next = e; e->next->prev = e; e->SetInHighPriPool(true); + e->SetInLowPriPool(false); high_pri_pool_usage_ += e->total_charge; MaintainPoolSize(); - } else { - // Insert "e" to the head of low-pri pool. Note that when - // high_pri_pool_ratio is 0, head of low-pri pool is also head of LRU list. + } else if (low_pri_pool_ratio_ > 0 && + (e->IsHighPri() || e->IsLowPri() || e->HasHit())) { + // Insert "e" to the head of low-pri pool. e->next = lru_low_pri_->next; e->prev = lru_low_pri_; e->prev->next = e; e->next->prev = e; e->SetInHighPriPool(false); + e->SetInLowPriPool(true); + low_pri_pool_usage_ += e->total_charge; + MaintainPoolSize(); lru_low_pri_ = e; + } else { + // Insert "e" to the head of bottom-pri pool. 
+ e->next = lru_bottom_pri_->next; + e->prev = lru_bottom_pri_; + e->prev->next = e; + e->next->prev = e; + e->SetInHighPriPool(false); + e->SetInLowPriPool(false); + // if the low-pri pool is empty, lru_low_pri_ also needs to be updated. + if (lru_bottom_pri_ == lru_low_pri_) { + lru_low_pri_ = e; + } + lru_bottom_pri_ = e; } lru_usage_ += e->total_charge; } @@ -262,8 +297,20 @@ void LRUCacheShard::MaintainPoolSize() { lru_low_pri_ = lru_low_pri_->next; assert(lru_low_pri_ != &lru_); lru_low_pri_->SetInHighPriPool(false); + lru_low_pri_->SetInLowPriPool(true); assert(high_pri_pool_usage_ >= lru_low_pri_->total_charge); high_pri_pool_usage_ -= lru_low_pri_->total_charge; + low_pri_pool_usage_ += lru_low_pri_->total_charge; + } + + while (low_pri_pool_usage_ > low_pri_pool_capacity_) { + // Overflow last entry in low-pri pool to bottom-pri pool. + lru_bottom_pri_ = lru_bottom_pri_->next; + assert(lru_bottom_pri_ != &lru_); + lru_bottom_pri_->SetInHighPriPool(false); + lru_bottom_pri_->SetInLowPriPool(false); + assert(low_pri_pool_usage_ >= lru_bottom_pri_->total_charge); + low_pri_pool_usage_ -= lru_bottom_pri_->total_charge; } } @@ -288,6 +335,7 @@ void LRUCacheShard::SetCapacity(size_t capacity) { DMutexLock l(mutex_); capacity_ = capacity; high_pri_pool_capacity_ = capacity_ * high_pri_pool_ratio_; + low_pri_pool_capacity_ = capacity_ * low_pri_pool_ratio_; EvictFromLRU(0, &last_reference_list); } @@ -503,6 +551,13 @@ void LRUCacheShard::SetHighPriorityPoolRatio(double high_pri_pool_ratio) { MaintainPoolSize(); } +void LRUCacheShard::SetLowPriorityPoolRatio(double low_pri_pool_ratio) { + DMutexLock l(mutex_); + low_pri_pool_ratio_ = low_pri_pool_ratio; + low_pri_pool_capacity_ = capacity_ * low_pri_pool_ratio_; + MaintainPoolSize(); +} + bool LRUCacheShard::Release(Cache::Handle* handle, bool erase_if_last_ref) { if (handle == nullptr) { return false; @@ -634,12 +689,15 @@ std::string LRUCacheShard::GetPrintableOptions() const { DMutexLock l(mutex_); 
snprintf(buffer, kBufferSize, " high_pri_pool_ratio: %.3lf\n", high_pri_pool_ratio_); + snprintf(buffer + strlen(buffer), kBufferSize - strlen(buffer), + " low_pri_pool_ratio: %.3lf\n", low_pri_pool_ratio_); } return std::string(buffer); } LRUCache::LRUCache(size_t capacity, int num_shard_bits, bool strict_capacity_limit, double high_pri_pool_ratio, + double low_pri_pool_ratio, std::shared_ptr allocator, bool use_adaptive_mutex, CacheMetadataChargePolicy metadata_charge_policy, @@ -653,7 +711,7 @@ LRUCache::LRUCache(size_t capacity, int num_shard_bits, for (int i = 0; i < num_shards_; i++) { new (&shards_[i]) LRUCacheShard( per_shard, strict_capacity_limit, high_pri_pool_ratio, - use_adaptive_mutex, metadata_charge_policy, + low_pri_pool_ratio, use_adaptive_mutex, metadata_charge_policy, /* max_upper_hash_bits */ 32 - num_shard_bits, secondary_cache); } secondary_cache_ = secondary_cache; @@ -775,7 +833,8 @@ std::shared_ptr NewLRUCache( double high_pri_pool_ratio, std::shared_ptr memory_allocator, bool use_adaptive_mutex, CacheMetadataChargePolicy metadata_charge_policy, - const std::shared_ptr& secondary_cache) { + const std::shared_ptr& secondary_cache, + double low_pri_pool_ratio) { if (num_shard_bits >= 20) { return nullptr; // The cache cannot be sharded into too many fine pieces. 
} @@ -783,30 +842,40 @@ std::shared_ptr NewLRUCache( // Invalid high_pri_pool_ratio return nullptr; } + if (low_pri_pool_ratio < 0.0 || low_pri_pool_ratio > 1.0) { + // Invalid low_pri_pool_ratio + return nullptr; + } + if (low_pri_pool_ratio + high_pri_pool_ratio > 1.0) { + // Invalid high_pri_pool_ratio and low_pri_pool_ratio combination + return nullptr; + } if (num_shard_bits < 0) { num_shard_bits = GetDefaultCacheShardBits(capacity); } return std::make_shared( capacity, num_shard_bits, strict_capacity_limit, high_pri_pool_ratio, - std::move(memory_allocator), use_adaptive_mutex, metadata_charge_policy, - secondary_cache); + low_pri_pool_ratio, std::move(memory_allocator), use_adaptive_mutex, + metadata_charge_policy, secondary_cache); } std::shared_ptr NewLRUCache(const LRUCacheOptions& cache_opts) { - return NewLRUCache( - cache_opts.capacity, cache_opts.num_shard_bits, - cache_opts.strict_capacity_limit, cache_opts.high_pri_pool_ratio, - cache_opts.memory_allocator, cache_opts.use_adaptive_mutex, - cache_opts.metadata_charge_policy, cache_opts.secondary_cache); + return NewLRUCache(cache_opts.capacity, cache_opts.num_shard_bits, + cache_opts.strict_capacity_limit, + cache_opts.high_pri_pool_ratio, + cache_opts.memory_allocator, cache_opts.use_adaptive_mutex, + cache_opts.metadata_charge_policy, + cache_opts.secondary_cache, cache_opts.low_pri_pool_ratio); } std::shared_ptr NewLRUCache( size_t capacity, int num_shard_bits, bool strict_capacity_limit, double high_pri_pool_ratio, std::shared_ptr memory_allocator, bool use_adaptive_mutex, - CacheMetadataChargePolicy metadata_charge_policy) { + CacheMetadataChargePolicy metadata_charge_policy, + double low_pri_pool_ratio) { return NewLRUCache(capacity, num_shard_bits, strict_capacity_limit, high_pri_pool_ratio, memory_allocator, use_adaptive_mutex, - metadata_charge_policy, nullptr); + metadata_charge_policy, nullptr, low_pri_pool_ratio); } } // namespace ROCKSDB_NAMESPACE diff --git a/cache/lru_cache.h
b/cache/lru_cache.h index 67cb97e870..bdb6c44ab2 100644 --- a/cache/lru_cache.h +++ b/cache/lru_cache.h @@ -74,7 +74,7 @@ struct LRUHandle { // The number of external refs to this entry. The cache itself is not counted. uint32_t refs; - enum Flags : uint8_t { + enum Flags : uint16_t { // Whether this entry is referenced by the hash table. IN_CACHE = (1 << 0), // Whether this entry is high priority entry. @@ -89,9 +89,13 @@ struct LRUHandle { IS_PENDING = (1 << 5), // Whether this handle is still in a lower tier IS_IN_SECONDARY_CACHE = (1 << 6), + // Whether this entry is low priority entry. + IS_LOW_PRI = (1 << 7), + // Whether this entry is in low-pri pool. + IN_LOW_PRI_POOL = (1 << 8), }; - uint8_t flags; + uint16_t flags; #ifdef __SANITIZE_THREAD__ // TSAN can report a false data race on flags, where one thread is writing @@ -122,6 +126,8 @@ struct LRUHandle { bool InCache() const { return flags & IN_CACHE; } bool IsHighPri() const { return flags & IS_HIGH_PRI; } bool InHighPriPool() const { return flags & IN_HIGH_PRI_POOL; } + bool IsLowPri() const { return flags & IS_LOW_PRI; } + bool InLowPriPool() const { return flags & IN_LOW_PRI_POOL; } bool HasHit() const { return flags & HAS_HIT; } bool IsSecondaryCacheCompatible() const { #ifdef __SANITIZE_THREAD__ @@ -144,8 +150,13 @@ struct LRUHandle { void SetPriority(Cache::Priority priority) { if (priority == Cache::Priority::HIGH) { flags |= IS_HIGH_PRI; + flags &= ~IS_LOW_PRI; + } else if (priority == Cache::Priority::LOW) { + flags &= ~IS_HIGH_PRI; + flags |= IS_LOW_PRI; } else { flags &= ~IS_HIGH_PRI; + flags &= ~IS_LOW_PRI; } } @@ -157,6 +168,14 @@ struct LRUHandle { } } + void SetInLowPriPool(bool in_low_pri_pool) { + if (in_low_pri_pool) { + flags |= IN_LOW_PRI_POOL; + } else { + flags &= ~IN_LOW_PRI_POOL; + } + } + void SetHit() { flags |= HAS_HIT; } void SetSecondaryCacheCompatible(bool compat) { @@ -298,7 +317,8 @@ class LRUHandleTable { class ALIGN_AS(CACHE_LINE_SIZE) LRUCacheShard final : public 
CacheShard { public: LRUCacheShard(size_t capacity, bool strict_capacity_limit, - double high_pri_pool_ratio, bool use_adaptive_mutex, + double high_pri_pool_ratio, double low_pri_pool_ratio, + bool use_adaptive_mutex, CacheMetadataChargePolicy metadata_charge_policy, int max_upper_hash_bits, const std::shared_ptr& secondary_cache); @@ -315,6 +335,9 @@ class ALIGN_AS(CACHE_LINE_SIZE) LRUCacheShard final : public CacheShard { // Set percentage of capacity reserved for high-pri cache entries. void SetHighPriorityPoolRatio(double high_pri_pool_ratio); + // Set percentage of capacity reserved for low-pri cache entries. + void SetLowPriorityPoolRatio(double low_pri_pool_ratio); + // Like Cache methods, but with an extra "hash" parameter. virtual Status Insert(const Slice& key, uint32_t hash, void* value, size_t charge, Cache::DeleterFn deleter, @@ -366,15 +389,19 @@ class ALIGN_AS(CACHE_LINE_SIZE) LRUCacheShard final : public CacheShard { virtual std::string GetPrintableOptions() const override; - void TEST_GetLRUList(LRUHandle** lru, LRUHandle** lru_low_pri); + void TEST_GetLRUList(LRUHandle** lru, LRUHandle** lru_low_pri, + LRUHandle** lru_bottom_pri); - // Retrieves number of elements in LRU, for unit test purpose only. - // Not threadsafe. + // Retrieves number of elements in LRU, for unit test purpose only. + // Not threadsafe. size_t TEST_GetLRUSize(); - // Retrieves high pri pool ratio + // Retrieves high pri pool ratio double GetHighPriPoolRatio(); + // Retrieves low pri pool ratio + double GetLowPriPoolRatio(); + private: friend class LRUCache; // Insert an item into the hash table and, if handle is null, insert into @@ -414,6 +441,9 @@ class ALIGN_AS(CACHE_LINE_SIZE) LRUCacheShard final : public CacheShard { // Memory size for entries in high-pri pool. size_t high_pri_pool_usage_; + // Memory size for entries in low-pri pool. + size_t low_pri_pool_usage_; + // Whether to reject insertion if cache reaches its full capacity. 
bool strict_capacity_limit_; @@ -424,6 +454,13 @@ class ALIGN_AS(CACHE_LINE_SIZE) LRUCacheShard final : public CacheShard { // Remember the value to avoid recomputing each time. double high_pri_pool_capacity_; + // Ratio of capacity reserved for low priority cache entries. + double low_pri_pool_ratio_; + + // Low-pri pool size, equals to capacity * low_pri_pool_ratio. + // Remember the value to avoid recomputing each time. + double low_pri_pool_capacity_; + // Dummy head of LRU list. // lru.prev is newest entry, lru.next is oldest entry. // LRU contains items which can be evicted, ie reference only by cache @@ -432,6 +469,9 @@ class ALIGN_AS(CACHE_LINE_SIZE) LRUCacheShard final : public CacheShard { // Pointer to head of low-pri pool in LRU list. LRUHandle* lru_low_pri_; + // Pointer to head of bottom-pri pool in LRU list. + LRUHandle* lru_bottom_pri_; + // ------------^^^^^^^^^^^^^----------- // Not frequently modified data members // ------------------------------------ @@ -466,7 +506,7 @@ class LRUCache : public ShardedCache { public: LRUCache(size_t capacity, int num_shard_bits, bool strict_capacity_limit, - double high_pri_pool_ratio, + double high_pri_pool_ratio, double low_pri_pool_ratio, std::shared_ptr memory_allocator = nullptr, bool use_adaptive_mutex = kDefaultToAdaptiveMutex, CacheMetadataChargePolicy metadata_charge_policy = diff --git a/cache/lru_cache_test.cc b/cache/lru_cache_test.cc index 9c00b21504..cb472538d5 100644 --- a/cache/lru_cache_test.cc +++ b/cache/lru_cache_test.cc @@ -41,13 +41,14 @@ class LRUCacheTest : public testing::Test { } void NewCache(size_t capacity, double high_pri_pool_ratio = 0.0, + double low_pri_pool_ratio = 1.0, bool use_adaptive_mutex = kDefaultToAdaptiveMutex) { DeleteCache(); cache_ = reinterpret_cast( port::cacheline_aligned_alloc(sizeof(LRUCacheShard))); new (cache_) LRUCacheShard( capacity, false /*strict_capcity_limit*/, high_pri_pool_ratio, - use_adaptive_mutex, kDontChargeCacheMetadata, + low_pri_pool_ratio, 
use_adaptive_mutex, kDontChargeCacheMetadata, 24 /*max_upper_hash_bits*/, nullptr /*secondary_cache*/); } @@ -76,32 +77,66 @@ class LRUCacheTest : public testing::Test { void Erase(const std::string& key) { cache_->Erase(key, 0 /*hash*/); } void ValidateLRUList(std::vector keys, - size_t num_high_pri_pool_keys = 0) { + size_t num_high_pri_pool_keys = 0, + size_t num_low_pri_pool_keys = 0, + size_t num_bottom_pri_pool_keys = 0) { LRUHandle* lru; LRUHandle* lru_low_pri; - cache_->TEST_GetLRUList(&lru, &lru_low_pri); + LRUHandle* lru_bottom_pri; + cache_->TEST_GetLRUList(&lru, &lru_low_pri, &lru_bottom_pri); + LRUHandle* iter = lru; + + bool in_low_pri_pool = false; bool in_high_pri_pool = false; + size_t high_pri_pool_keys = 0; + size_t low_pri_pool_keys = 0; + size_t bottom_pri_pool_keys = 0; + + if (iter == lru_bottom_pri) { + in_low_pri_pool = true; + in_high_pri_pool = false; + } if (iter == lru_low_pri) { + in_low_pri_pool = false; in_high_pri_pool = true; } + for (const auto& key : keys) { iter = iter->next; ASSERT_NE(lru, iter); ASSERT_EQ(key, iter->key().ToString()); ASSERT_EQ(in_high_pri_pool, iter->InHighPriPool()); + ASSERT_EQ(in_low_pri_pool, iter->InLowPriPool()); if (in_high_pri_pool) { + ASSERT_FALSE(iter->InLowPriPool()); high_pri_pool_keys++; + } else if (in_low_pri_pool) { + ASSERT_FALSE(iter->InHighPriPool()); + low_pri_pool_keys++; + } else { + bottom_pri_pool_keys++; + } + if (iter == lru_bottom_pri) { + ASSERT_FALSE(in_low_pri_pool); + ASSERT_FALSE(in_high_pri_pool); + in_low_pri_pool = true; + in_high_pri_pool = false; } if (iter == lru_low_pri) { + ASSERT_TRUE(in_low_pri_pool); ASSERT_FALSE(in_high_pri_pool); + in_low_pri_pool = false; in_high_pri_pool = true; } } ASSERT_EQ(lru, iter->next); + ASSERT_FALSE(in_low_pri_pool); ASSERT_TRUE(in_high_pri_pool); ASSERT_EQ(num_high_pri_pool_keys, high_pri_pool_keys); + ASSERT_EQ(num_low_pri_pool_keys, low_pri_pool_keys); + ASSERT_EQ(num_bottom_pri_pool_keys, bottom_pri_pool_keys); } private: @@ -113,98 
+148,219 @@ TEST_F(LRUCacheTest, BasicLRU) { for (char ch = 'a'; ch <= 'e'; ch++) { Insert(ch); } - ValidateLRUList({"a", "b", "c", "d", "e"}); + ValidateLRUList({"a", "b", "c", "d", "e"}, 0, 5); for (char ch = 'x'; ch <= 'z'; ch++) { Insert(ch); } - ValidateLRUList({"d", "e", "x", "y", "z"}); + ValidateLRUList({"d", "e", "x", "y", "z"}, 0, 5); ASSERT_FALSE(Lookup("b")); - ValidateLRUList({"d", "e", "x", "y", "z"}); + ValidateLRUList({"d", "e", "x", "y", "z"}, 0, 5); ASSERT_TRUE(Lookup("e")); - ValidateLRUList({"d", "x", "y", "z", "e"}); + ValidateLRUList({"d", "x", "y", "z", "e"}, 0, 5); ASSERT_TRUE(Lookup("z")); - ValidateLRUList({"d", "x", "y", "e", "z"}); + ValidateLRUList({"d", "x", "y", "e", "z"}, 0, 5); Erase("x"); - ValidateLRUList({"d", "y", "e", "z"}); + ValidateLRUList({"d", "y", "e", "z"}, 0, 4); ASSERT_TRUE(Lookup("d")); - ValidateLRUList({"y", "e", "z", "d"}); + ValidateLRUList({"y", "e", "z", "d"}, 0, 4); Insert("u"); - ValidateLRUList({"y", "e", "z", "d", "u"}); + ValidateLRUList({"y", "e", "z", "d", "u"}, 0, 5); Insert("v"); - ValidateLRUList({"e", "z", "d", "u", "v"}); + ValidateLRUList({"e", "z", "d", "u", "v"}, 0, 5); } -TEST_F(LRUCacheTest, MidpointInsertion) { - // Allocate 2 cache entries to high-pri pool. - NewCache(5, 0.45); +TEST_F(LRUCacheTest, LowPriorityMidpointInsertion) { + // Allocate 2 cache entries to high-pri pool and 3 to low-pri pool. + NewCache(5, /* high_pri_pool_ratio */ 0.40, /* low_pri_pool_ratio */ 0.60); Insert("a", Cache::Priority::LOW); Insert("b", Cache::Priority::LOW); Insert("c", Cache::Priority::LOW); Insert("x", Cache::Priority::HIGH); Insert("y", Cache::Priority::HIGH); - ValidateLRUList({"a", "b", "c", "x", "y"}, 2); + ValidateLRUList({"a", "b", "c", "x", "y"}, 2, 3); // Low-pri entries inserted to the tail of low-pri list (the midpoint). // After lookup, it will move to the tail of the full list. 
Insert("d", Cache::Priority::LOW); - ValidateLRUList({"b", "c", "d", "x", "y"}, 2); + ValidateLRUList({"b", "c", "d", "x", "y"}, 2, 3); ASSERT_TRUE(Lookup("d")); - ValidateLRUList({"b", "c", "x", "y", "d"}, 2); + ValidateLRUList({"b", "c", "x", "y", "d"}, 2, 3); // High-pri entries will be inserted to the tail of full list. Insert("z", Cache::Priority::HIGH); - ValidateLRUList({"c", "x", "y", "d", "z"}, 2); + ValidateLRUList({"c", "x", "y", "d", "z"}, 2, 3); +} + +TEST_F(LRUCacheTest, BottomPriorityMidpointInsertion) { + // Allocate 2 cache entries to high-pri pool and 2 to low-pri pool. + NewCache(6, /* high_pri_pool_ratio */ 0.35, /* low_pri_pool_ratio */ 0.35); + + Insert("a", Cache::Priority::BOTTOM); + Insert("b", Cache::Priority::BOTTOM); + Insert("i", Cache::Priority::LOW); + Insert("j", Cache::Priority::LOW); + Insert("x", Cache::Priority::HIGH); + Insert("y", Cache::Priority::HIGH); + ValidateLRUList({"a", "b", "i", "j", "x", "y"}, 2, 2, 2); + + // Low-pri entries will be inserted to the tail of low-pri list (the + // midpoint). After lookup, 'k' will move to the tail of the full list, and + // 'x' will spill over to the low-pri pool. + Insert("k", Cache::Priority::LOW); + ValidateLRUList({"b", "i", "j", "k", "x", "y"}, 2, 2, 2); + ASSERT_TRUE(Lookup("k")); + ValidateLRUList({"b", "i", "j", "x", "y", "k"}, 2, 2, 2); + + // High-pri entries will be inserted to the tail of full list. Although y was + // inserted with high priority, it got spilled over to the low-pri pool. As + // a result, j also got spilled over to the bottom-pri pool. + Insert("z", Cache::Priority::HIGH); + ValidateLRUList({"i", "j", "x", "y", "k", "z"}, 2, 2, 2); + Erase("x"); + ValidateLRUList({"i", "j", "y", "k", "z"}, 2, 1, 2); + Erase("y"); + ValidateLRUList({"i", "j", "k", "z"}, 2, 0, 2); + + // Bottom-pri entries will be inserted to the tail of bottom-pri list. 
+ Insert("c", Cache::Priority::BOTTOM); + ValidateLRUList({"i", "j", "c", "k", "z"}, 2, 0, 3); + Insert("d", Cache::Priority::BOTTOM); + ValidateLRUList({"i", "j", "c", "d", "k", "z"}, 2, 0, 4); + Insert("e", Cache::Priority::BOTTOM); + ValidateLRUList({"j", "c", "d", "e", "k", "z"}, 2, 0, 4); + + // Low-pri entries will be inserted to the tail of low-pri list (the + // midpoint). + Insert("l", Cache::Priority::LOW); + ValidateLRUList({"c", "d", "e", "l", "k", "z"}, 2, 1, 3); + Insert("m", Cache::Priority::LOW); + ValidateLRUList({"d", "e", "l", "m", "k", "z"}, 2, 2, 2); + + Erase("k"); + ValidateLRUList({"d", "e", "l", "m", "z"}, 1, 2, 2); + Erase("z"); + ValidateLRUList({"d", "e", "l", "m"}, 0, 2, 2); + + // Bottom-pri entries will be inserted to the tail of bottom-pri list. + Insert("f", Cache::Priority::BOTTOM); + ValidateLRUList({"d", "e", "f", "l", "m"}, 0, 2, 3); + Insert("g", Cache::Priority::BOTTOM); + ValidateLRUList({"d", "e", "f", "g", "l", "m"}, 0, 2, 4); + + // High-pri entries will be inserted to the tail of full list. + Insert("o", Cache::Priority::HIGH); + ValidateLRUList({"e", "f", "g", "l", "m", "o"}, 1, 2, 3); + Insert("p", Cache::Priority::HIGH); + ValidateLRUList({"f", "g", "l", "m", "o", "p"}, 2, 2, 2); } TEST_F(LRUCacheTest, EntriesWithPriority) { - // Allocate 2 cache entries to high-pri pool. - NewCache(5, 0.45); + // Allocate 2 cache entries to high-pri pool and 2 to low-pri pool. + NewCache(6, /* high_pri_pool_ratio */ 0.35, /* low_pri_pool_ratio */ 0.35); Insert("a", Cache::Priority::LOW); Insert("b", Cache::Priority::LOW); + ValidateLRUList({"a", "b"}, 0, 2, 0); + // Low-pri entries can overflow to bottom-pri pool. 
Insert("c", Cache::Priority::LOW); - ValidateLRUList({"a", "b", "c"}, 0); + ValidateLRUList({"a", "b", "c"}, 0, 2, 1); - // Low-pri entries can take high-pri pool capacity if available + // Bottom-pri entries can take high-pri pool capacity if available + Insert("t", Cache::Priority::LOW); Insert("u", Cache::Priority::LOW); + ValidateLRUList({"a", "b", "c", "t", "u"}, 0, 2, 3); Insert("v", Cache::Priority::LOW); - ValidateLRUList({"a", "b", "c", "u", "v"}, 0); + ValidateLRUList({"a", "b", "c", "t", "u", "v"}, 0, 2, 4); + Insert("w", Cache::Priority::LOW); + ValidateLRUList({"b", "c", "t", "u", "v", "w"}, 0, 2, 4); Insert("X", Cache::Priority::HIGH); Insert("Y", Cache::Priority::HIGH); - ValidateLRUList({"c", "u", "v", "X", "Y"}, 2); + ValidateLRUList({"t", "u", "v", "w", "X", "Y"}, 2, 2, 2); - // High-pri entries can overflow to low-pri pool. + // After lookup, the high-pri entry 'X' got spilled over to the low-pri pool. + // The low-pri entry 'v' got spilled over to the bottom-pri pool. Insert("Z", Cache::Priority::HIGH); - ValidateLRUList({"u", "v", "X", "Y", "Z"}, 2); + ValidateLRUList({"u", "v", "w", "X", "Y", "Z"}, 2, 2, 2); // Low-pri entries will be inserted to head of low-pri pool. Insert("a", Cache::Priority::LOW); - ValidateLRUList({"v", "X", "a", "Y", "Z"}, 2); + ValidateLRUList({"v", "w", "X", "a", "Y", "Z"}, 2, 2, 2); - // Low-pri entries will be inserted to head of high-pri pool after lookup. + // After lookup, the high-pri entry 'Y' got spilled over to the low-pri pool. + // The low-pri entry 'X' got spilled over to the bottom-pri pool. ASSERT_TRUE(Lookup("v")); - ValidateLRUList({"X", "a", "Y", "Z", "v"}, 2); + ValidateLRUList({"w", "X", "a", "Y", "Z", "v"}, 2, 2, 2); - // High-pri entries will be inserted to the head of the list after lookup. + // After lookup, the high-pri entry 'Z' got spilled over to the low-pri pool. + // The low-pri entry 'a' got spilled over to the bottom-pri pool. 
ASSERT_TRUE(Lookup("X")); - ValidateLRUList({"a", "Y", "Z", "v", "X"}, 2); + ValidateLRUList({"w", "a", "Y", "Z", "v", "X"}, 2, 2, 2); + + // After lookup, the low pri entry 'Z' got promoted back to high-pri pool. The + // high-pri entry 'v' got spilled over to the low-pri pool. ASSERT_TRUE(Lookup("Z")); - ValidateLRUList({"a", "Y", "v", "X", "Z"}, 2); + ValidateLRUList({"w", "a", "Y", "v", "X", "Z"}, 2, 2, 2); Erase("Y"); - ValidateLRUList({"a", "v", "X", "Z"}, 2); + ValidateLRUList({"w", "a", "v", "X", "Z"}, 2, 1, 2); Erase("X"); - ValidateLRUList({"a", "v", "Z"}, 1); + ValidateLRUList({"w", "a", "v", "Z"}, 1, 1, 2); + Insert("d", Cache::Priority::LOW); Insert("e", Cache::Priority::LOW); - ValidateLRUList({"a", "v", "d", "e", "Z"}, 1); + ValidateLRUList({"w", "a", "v", "d", "e", "Z"}, 1, 2, 3); + Insert("f", Cache::Priority::LOW); Insert("g", Cache::Priority::LOW); - ValidateLRUList({"d", "e", "f", "g", "Z"}, 1); + ValidateLRUList({"v", "d", "e", "f", "g", "Z"}, 1, 2, 3); ASSERT_TRUE(Lookup("d")); - ValidateLRUList({"e", "f", "g", "Z", "d"}, 2); + ValidateLRUList({"v", "e", "f", "g", "Z", "d"}, 2, 2, 2); + + // Erase some entries. + Erase("e"); + Erase("f"); + Erase("Z"); + ValidateLRUList({"v", "g", "d"}, 1, 1, 1); + + // Bottom-pri entries can take low- and high-pri pool capacity if available + Insert("o", Cache::Priority::BOTTOM); + ValidateLRUList({"v", "o", "g", "d"}, 1, 1, 2); + Insert("p", Cache::Priority::BOTTOM); + ValidateLRUList({"v", "o", "p", "g", "d"}, 1, 1, 3); + Insert("q", Cache::Priority::BOTTOM); + ValidateLRUList({"v", "o", "p", "q", "g", "d"}, 1, 1, 4); + + // High-pri entries can overflow to low-pri pool, and bottom-pri entries will + // be evicted. 
+ Insert("x", Cache::Priority::HIGH); + ValidateLRUList({"o", "p", "q", "g", "d", "x"}, 2, 1, 3); + Insert("y", Cache::Priority::HIGH); + ValidateLRUList({"p", "q", "g", "d", "x", "y"}, 2, 2, 2); + Insert("z", Cache::Priority::HIGH); + ValidateLRUList({"q", "g", "d", "x", "y", "z"}, 2, 2, 2); + + // 'g' is bottom-pri before this lookup, it will be inserted to head of + // high-pri pool after lookup. + ASSERT_TRUE(Lookup("g")); + ValidateLRUList({"q", "d", "x", "y", "z", "g"}, 2, 2, 2); + + // High-pri entries will be inserted to head of high-pri pool after lookup. + ASSERT_TRUE(Lookup("z")); + ValidateLRUList({"q", "d", "x", "y", "g", "z"}, 2, 2, 2); + + // Bottom-pri entries will be inserted to head of high-pri pool after lookup. + ASSERT_TRUE(Lookup("d")); + ValidateLRUList({"q", "x", "y", "g", "z", "d"}, 2, 2, 2); + + // Bottom-pri entries will be inserted to the tail of bottom-pri list. + Insert("m", Cache::Priority::BOTTOM); + ValidateLRUList({"x", "m", "y", "g", "z", "d"}, 2, 2, 2); + + // Bottom-pri entries will be inserted to head of high-pri pool after lookup. + ASSERT_TRUE(Lookup("m")); + ValidateLRUList({"x", "y", "g", "z", "d", "m"}, 2, 2, 2); } // TODO: FastLRUCache and ClockCache use the same tests. 
We can probably remove @@ -547,8 +703,9 @@ class TestSecondaryCache : public SecondaryCache { explicit TestSecondaryCache(size_t capacity) : num_inserts_(0), num_lookups_(0), inject_failure_(false) { - cache_ = NewLRUCache(capacity, 0, false, 0.5, nullptr, - kDefaultToAdaptiveMutex, kDontChargeCacheMetadata); + cache_ = + NewLRUCache(capacity, 0, false, 0.5 /* high_pri_pool_ratio */, nullptr, + kDefaultToAdaptiveMutex, kDontChargeCacheMetadata); } ~TestSecondaryCache() override { cache_.reset(); } @@ -785,7 +942,10 @@ Cache::CacheItemHelper LRUCacheSecondaryCacheTest::helper_fail_( LRUCacheSecondaryCacheTest::DeletionCallback); TEST_F(LRUCacheSecondaryCacheTest, BasicTest) { - LRUCacheOptions opts(1024, 0, false, 0.5, nullptr, kDefaultToAdaptiveMutex, + LRUCacheOptions opts(1024 /* capacity */, 0 /* num_shard_bits */, + false /* strict_capacity_limit */, + 0.5 /* high_pri_pool_ratio */, + nullptr /* memory_allocator */, kDefaultToAdaptiveMutex, kDontChargeCacheMetadata); std::shared_ptr secondary_cache = std::make_shared(2048); @@ -831,7 +991,10 @@ TEST_F(LRUCacheSecondaryCacheTest, BasicTest) { } TEST_F(LRUCacheSecondaryCacheTest, BasicFailTest) { - LRUCacheOptions opts(1024, 0, false, 0.5, nullptr, kDefaultToAdaptiveMutex, + LRUCacheOptions opts(1024 /* capacity */, 0 /* num_shard_bits */, + false /* strict_capacity_limit */, + 0.5 /* high_pri_pool_ratio */, + nullptr /* memory_allocator */, kDefaultToAdaptiveMutex, kDontChargeCacheMetadata); std::shared_ptr secondary_cache = std::make_shared(2048); @@ -862,7 +1025,10 @@ TEST_F(LRUCacheSecondaryCacheTest, BasicFailTest) { } TEST_F(LRUCacheSecondaryCacheTest, SaveFailTest) { - LRUCacheOptions opts(1024, 0, false, 0.5, nullptr, kDefaultToAdaptiveMutex, + LRUCacheOptions opts(1024 /* capacity */, 0 /* num_shard_bits */, + false /* strict_capacity_limit */, + 0.5 /* high_pri_pool_ratio */, + nullptr /* memory_allocator */, kDefaultToAdaptiveMutex, kDontChargeCacheMetadata); std::shared_ptr secondary_cache = 
std::make_shared(2048); @@ -909,7 +1075,10 @@ TEST_F(LRUCacheSecondaryCacheTest, SaveFailTest) { } TEST_F(LRUCacheSecondaryCacheTest, CreateFailTest) { - LRUCacheOptions opts(1024, 0, false, 0.5, nullptr, kDefaultToAdaptiveMutex, + LRUCacheOptions opts(1024 /* capacity */, 0 /* num_shard_bits */, + false /* strict_capacity_limit */, + 0.5 /* high_pri_pool_ratio */, + nullptr /* memory_allocator */, kDefaultToAdaptiveMutex, kDontChargeCacheMetadata); std::shared_ptr secondary_cache = std::make_shared(2048); @@ -952,8 +1121,11 @@ TEST_F(LRUCacheSecondaryCacheTest, CreateFailTest) { } TEST_F(LRUCacheSecondaryCacheTest, FullCapacityTest) { - LRUCacheOptions opts(1024, 0, /*_strict_capacity_limit=*/true, 0.5, nullptr, - kDefaultToAdaptiveMutex, kDontChargeCacheMetadata); + LRUCacheOptions opts(1024 /* capacity */, 0 /* num_shard_bits */, + true /* strict_capacity_limit */, + 0.5 /* high_pri_pool_ratio */, + nullptr /* memory_allocator */, kDefaultToAdaptiveMutex, + kDontChargeCacheMetadata); std::shared_ptr secondary_cache = std::make_shared(2048); opts.secondary_cache = secondary_cache; @@ -1003,8 +1175,11 @@ TEST_F(LRUCacheSecondaryCacheTest, FullCapacityTest) { // if we try to insert block_1 to the block cache, it will always fails. Only // block_2 will be successfully inserted into the block cache. 
TEST_F(DBSecondaryCacheTest, TestSecondaryCacheCorrectness1) { - LRUCacheOptions opts(4 * 1024, 0, false, 0.5, nullptr, - kDefaultToAdaptiveMutex, kDontChargeCacheMetadata); + LRUCacheOptions opts(4 * 1024 /* capacity */, 0 /* num_shard_bits */, + false /* strict_capacity_limit */, + 0.5 /* high_pri_pool_ratio */, + nullptr /* memory_allocator */, kDefaultToAdaptiveMutex, + kDontChargeCacheMetadata); std::shared_ptr secondary_cache( new TestSecondaryCache(2048 * 1024)); opts.secondary_cache = secondary_cache; @@ -1097,7 +1272,10 @@ TEST_F(DBSecondaryCacheTest, TestSecondaryCacheCorrectness1) { // insert and cache block_1 in the block cache (this is the different place // from TestSecondaryCacheCorrectness1) TEST_F(DBSecondaryCacheTest, TestSecondaryCacheCorrectness2) { - LRUCacheOptions opts(6100, 0, false, 0.5, nullptr, kDefaultToAdaptiveMutex, + LRUCacheOptions opts(6100 /* capacity */, 0 /* num_shard_bits */, + false /* strict_capacity_limit */, + 0.5 /* high_pri_pool_ratio */, + nullptr /* memory_allocator */, kDefaultToAdaptiveMutex, kDontChargeCacheMetadata); std::shared_ptr secondary_cache( new TestSecondaryCache(2048 * 1024)); @@ -1187,8 +1365,11 @@ TEST_F(DBSecondaryCacheTest, TestSecondaryCacheCorrectness2) { // cache all the blocks in the block cache and there is not secondary cache // insertion. 2 lookup is needed for the blocks. 
TEST_F(DBSecondaryCacheTest, NoSecondaryCacheInsertion) { - LRUCacheOptions opts(1024 * 1024, 0, false, 0.5, nullptr, - kDefaultToAdaptiveMutex, kDontChargeCacheMetadata); + LRUCacheOptions opts(1024 * 1024 /* capacity */, 0 /* num_shard_bits */, + false /* strict_capacity_limit */, + 0.5 /* high_pri_pool_ratio */, + nullptr /* memory_allocator */, kDefaultToAdaptiveMutex, + kDontChargeCacheMetadata); std::shared_ptr secondary_cache( new TestSecondaryCache(2048 * 1024)); opts.secondary_cache = secondary_cache; @@ -1238,8 +1419,11 @@ TEST_F(DBSecondaryCacheTest, NoSecondaryCacheInsertion) { } TEST_F(DBSecondaryCacheTest, SecondaryCacheIntensiveTesting) { - LRUCacheOptions opts(8 * 1024, 0, false, 0.5, nullptr, - kDefaultToAdaptiveMutex, kDontChargeCacheMetadata); + LRUCacheOptions opts(8 * 1024 /* capacity */, 0 /* num_shard_bits */, + false /* strict_capacity_limit */, + 0.5 /* high_pri_pool_ratio */, + nullptr /* memory_allocator */, kDefaultToAdaptiveMutex, + kDontChargeCacheMetadata); std::shared_ptr secondary_cache( new TestSecondaryCache(2048 * 1024)); opts.secondary_cache = secondary_cache; @@ -1284,8 +1468,11 @@ TEST_F(DBSecondaryCacheTest, SecondaryCacheIntensiveTesting) { // if we try to insert block_1 to the block cache, it will always fails. Only // block_2 will be successfully inserted into the block cache. 
TEST_F(DBSecondaryCacheTest, SecondaryCacheFailureTest) { - LRUCacheOptions opts(4 * 1024, 0, false, 0.5, nullptr, - kDefaultToAdaptiveMutex, kDontChargeCacheMetadata); + LRUCacheOptions opts(4 * 1024 /* capacity */, 0 /* num_shard_bits */, + false /* strict_capacity_limit */, + 0.5 /* high_pri_pool_ratio */, + nullptr /* memory_allocator */, kDefaultToAdaptiveMutex, + kDontChargeCacheMetadata); std::shared_ptr secondary_cache( new TestSecondaryCache(2048 * 1024)); opts.secondary_cache = secondary_cache; @@ -1373,7 +1560,10 @@ TEST_F(DBSecondaryCacheTest, SecondaryCacheFailureTest) { } TEST_F(LRUCacheSecondaryCacheTest, BasicWaitAllTest) { - LRUCacheOptions opts(1024, 2, false, 0.5, nullptr, kDefaultToAdaptiveMutex, + LRUCacheOptions opts(1024 /* capacity */, 2 /* num_shard_bits */, + false /* strict_capacity_limit */, + 0.5 /* high_pri_pool_ratio */, + nullptr /* memory_allocator */, kDefaultToAdaptiveMutex, kDontChargeCacheMetadata); std::shared_ptr secondary_cache = std::make_shared(32 * 1024); @@ -1433,7 +1623,10 @@ TEST_F(LRUCacheSecondaryCacheTest, BasicWaitAllTest) { // a sync point callback in TestSecondaryCache::Lookup. We then control the // lookup result by setting the ResultMap. 
TEST_F(DBSecondaryCacheTest, TestSecondaryCacheMultiGet) { - LRUCacheOptions opts(1 << 20, 0, false, 0.5, nullptr, kDefaultToAdaptiveMutex, + LRUCacheOptions opts(1 << 20 /* capacity */, 0 /* num_shard_bits */, + false /* strict_capacity_limit */, + 0.5 /* high_pri_pool_ratio */, + nullptr /* memory_allocator */, kDefaultToAdaptiveMutex, kDontChargeCacheMetadata); std::shared_ptr secondary_cache( new TestSecondaryCache(2048 * 1024)); @@ -1516,15 +1709,16 @@ class LRUCacheWithStat : public LRUCache { public: LRUCacheWithStat( size_t _capacity, int _num_shard_bits, bool _strict_capacity_limit, - double _high_pri_pool_ratio, + double _high_pri_pool_ratio, double _low_pri_pool_ratio, std::shared_ptr _memory_allocator = nullptr, bool _use_adaptive_mutex = kDefaultToAdaptiveMutex, CacheMetadataChargePolicy _metadata_charge_policy = kDontChargeCacheMetadata, const std::shared_ptr& _secondary_cache = nullptr) : LRUCache(_capacity, _num_shard_bits, _strict_capacity_limit, - _high_pri_pool_ratio, _memory_allocator, _use_adaptive_mutex, - _metadata_charge_policy, _secondary_cache) { + _high_pri_pool_ratio, _low_pri_pool_ratio, _memory_allocator, + _use_adaptive_mutex, _metadata_charge_policy, + _secondary_cache) { insert_count_ = 0; lookup_count_ = 0; } @@ -1567,13 +1761,17 @@ class LRUCacheWithStat : public LRUCache { #ifndef ROCKSDB_LITE TEST_F(DBSecondaryCacheTest, LRUCacheDumpLoadBasic) { - LRUCacheOptions cache_opts(1024 * 1024, 0, false, 0.5, nullptr, + LRUCacheOptions cache_opts(1024 * 1024 /* capacity */, 0 /* num_shard_bits */, + false /* strict_capacity_limit */, + 0.5 /* high_pri_pool_ratio */, + nullptr /* memory_allocator */, kDefaultToAdaptiveMutex, kDontChargeCacheMetadata); LRUCacheWithStat* tmp_cache = new LRUCacheWithStat( cache_opts.capacity, cache_opts.num_shard_bits, cache_opts.strict_capacity_limit, cache_opts.high_pri_pool_ratio, - cache_opts.memory_allocator, cache_opts.use_adaptive_mutex, - cache_opts.metadata_charge_policy, 
cache_opts.secondary_cache); + cache_opts.low_pri_pool_ratio, cache_opts.memory_allocator, + cache_opts.use_adaptive_mutex, cache_opts.metadata_charge_policy, + cache_opts.secondary_cache); std::shared_ptr cache(tmp_cache); BlockBasedTableOptions table_options; table_options.block_cache = cache; @@ -1644,8 +1842,9 @@ TEST_F(DBSecondaryCacheTest, LRUCacheDumpLoadBasic) { tmp_cache = new LRUCacheWithStat( cache_opts.capacity, cache_opts.num_shard_bits, cache_opts.strict_capacity_limit, cache_opts.high_pri_pool_ratio, - cache_opts.memory_allocator, cache_opts.use_adaptive_mutex, - cache_opts.metadata_charge_policy, cache_opts.secondary_cache); + cache_opts.low_pri_pool_ratio, cache_opts.memory_allocator, + cache_opts.use_adaptive_mutex, cache_opts.metadata_charge_policy, + cache_opts.secondary_cache); std::shared_ptr cache_new(tmp_cache); table_options.block_cache = cache_new; table_options.block_size = 4 * 1024; @@ -1702,13 +1901,17 @@ TEST_F(DBSecondaryCacheTest, LRUCacheDumpLoadBasic) { } TEST_F(DBSecondaryCacheTest, LRUCacheDumpLoadWithFilter) { - LRUCacheOptions cache_opts(1024 * 1024, 0, false, 0.5, nullptr, + LRUCacheOptions cache_opts(1024 * 1024 /* capacity */, 0 /* num_shard_bits */, + false /* strict_capacity_limit */, + 0.5 /* high_pri_pool_ratio */, + nullptr /* memory_allocator */, kDefaultToAdaptiveMutex, kDontChargeCacheMetadata); LRUCacheWithStat* tmp_cache = new LRUCacheWithStat( cache_opts.capacity, cache_opts.num_shard_bits, cache_opts.strict_capacity_limit, cache_opts.high_pri_pool_ratio, - cache_opts.memory_allocator, cache_opts.use_adaptive_mutex, - cache_opts.metadata_charge_policy, cache_opts.secondary_cache); + cache_opts.low_pri_pool_ratio, cache_opts.memory_allocator, + cache_opts.use_adaptive_mutex, cache_opts.metadata_charge_policy, + cache_opts.secondary_cache); std::shared_ptr cache(tmp_cache); BlockBasedTableOptions table_options; table_options.block_cache = cache; @@ -1806,8 +2009,9 @@ TEST_F(DBSecondaryCacheTest, 
LRUCacheDumpLoadWithFilter) { tmp_cache = new LRUCacheWithStat( cache_opts.capacity, cache_opts.num_shard_bits, cache_opts.strict_capacity_limit, cache_opts.high_pri_pool_ratio, - cache_opts.memory_allocator, cache_opts.use_adaptive_mutex, - cache_opts.metadata_charge_policy, cache_opts.secondary_cache); + cache_opts.low_pri_pool_ratio, cache_opts.memory_allocator, + cache_opts.use_adaptive_mutex, cache_opts.metadata_charge_policy, + cache_opts.secondary_cache); std::shared_ptr cache_new(tmp_cache); table_options.block_cache = cache_new; table_options.block_size = 4 * 1024; @@ -1873,8 +2077,11 @@ TEST_F(DBSecondaryCacheTest, LRUCacheDumpLoadWithFilter) { // Test the option not to use the secondary cache in a certain DB. TEST_F(DBSecondaryCacheTest, TestSecondaryCacheOptionBasic) { - LRUCacheOptions opts(4 * 1024, 0, false, 0.5, nullptr, - kDefaultToAdaptiveMutex, kDontChargeCacheMetadata); + LRUCacheOptions opts(4 * 1024 /* capacity */, 0 /* num_shard_bits */, + false /* strict_capacity_limit */, + 0.5 /* high_pri_pool_ratio */, + nullptr /* memory_allocator */, kDefaultToAdaptiveMutex, + kDontChargeCacheMetadata); std::shared_ptr secondary_cache( new TestSecondaryCache(2048 * 1024)); opts.secondary_cache = secondary_cache; @@ -1965,8 +2172,11 @@ TEST_F(DBSecondaryCacheTest, TestSecondaryCacheOptionBasic) { // with new options, which set the lowest_used_cache_tier to // kNonVolatileBlockTier. So secondary cache will be used. 
TEST_F(DBSecondaryCacheTest, TestSecondaryCacheOptionChange) { - LRUCacheOptions opts(4 * 1024, 0, false, 0.5, nullptr, - kDefaultToAdaptiveMutex, kDontChargeCacheMetadata); + LRUCacheOptions opts(4 * 1024 /* capacity */, 0 /* num_shard_bits */, + false /* strict_capacity_limit */, + 0.5 /* high_pri_pool_ratio */, + nullptr /* memory_allocator */, kDefaultToAdaptiveMutex, + kDontChargeCacheMetadata); std::shared_ptr secondary_cache( new TestSecondaryCache(2048 * 1024)); opts.secondary_cache = secondary_cache; @@ -2057,8 +2267,11 @@ TEST_F(DBSecondaryCacheTest, TestSecondaryCacheOptionChange) { // Two DB test. We create 2 DBs sharing the same block cache and secondary // cache. We diable the secondary cache option for DB2. TEST_F(DBSecondaryCacheTest, TestSecondaryCacheOptionTwoDB) { - LRUCacheOptions opts(4 * 1024, 0, false, 0.5, nullptr, - kDefaultToAdaptiveMutex, kDontChargeCacheMetadata); + LRUCacheOptions opts(4 * 1024 /* capacity */, 0 /* num_shard_bits */, + false /* strict_capacity_limit */, + 0.5 /* high_pri_pool_ratio */, + nullptr /* memory_allocator */, kDefaultToAdaptiveMutex, + kDontChargeCacheMetadata); std::shared_ptr secondary_cache( new TestSecondaryCache(2048 * 1024)); opts.secondary_cache = secondary_cache; diff --git a/db/blob/blob_file_builder.cc b/db/blob/blob_file_builder.cc index 214c2a49be..c8e99a8381 100644 --- a/db/blob/blob_file_builder.cc +++ b/db/blob/blob_file_builder.cc @@ -404,7 +404,7 @@ Status BlobFileBuilder::PutBlobIntoCacheIfNeeded(const Slice& blob, const CacheKey cache_key = base_cache_key.WithOffset(blob_offset); const Slice key = cache_key.AsSlice(); - const Cache::Priority priority = Cache::Priority::LOW; + const Cache::Priority priority = Cache::Priority::BOTTOM; // Objects to be put into the cache have to be heap-allocated and // self-contained, i.e. own their contents. 
The Cache has to be able to diff --git a/db/blob/blob_source.cc b/db/blob/blob_source.cc index 2ddf12feb6..945f5481dc 100644 --- a/db/blob/blob_source.cc +++ b/db/blob/blob_source.cc @@ -70,7 +70,7 @@ Status BlobSource::PutBlobIntoCache(const Slice& cache_key, assert(blob_cache_); Status s; - const Cache::Priority priority = Cache::Priority::LOW; + const Cache::Priority priority = Cache::Priority::BOTTOM; // Objects to be put into the cache have to be heap-allocated and // self-contained, i.e. own their contents. The Cache has to be able to take @@ -108,7 +108,7 @@ Cache::Handle* BlobSource::GetEntryFromCache(const Slice& key) const { return Status::OK(); }; cache_handle = blob_cache_->Lookup(key, GetCacheItemHelper(), create_cb, - Cache::Priority::LOW, + Cache::Priority::BOTTOM, true /* wait_for_cache */, statistics_); } else { cache_handle = blob_cache_->Lookup(key, statistics_); diff --git a/db/blob/blob_source_test.cc b/db/blob/blob_source_test.cc index 3676e9d3a3..d3c8339e82 100644 --- a/db/blob/blob_source_test.cc +++ b/db/blob/blob_source_test.cc @@ -121,6 +121,8 @@ class BlobSourceTest : public DBTestBase { co.capacity = 8 << 20; co.num_shard_bits = 2; co.metadata_charge_policy = kDontChargeCacheMetadata; + co.high_pri_pool_ratio = 0.2; + co.low_pri_pool_ratio = 0.2; options_.blob_cache = NewLRUCache(co); options_.lowest_used_cache_tier = CacheTier::kVolatileTier; @@ -1042,6 +1044,8 @@ class BlobSecondaryCacheTest : public DBTestBase { lru_cache_ops_.num_shard_bits = 0; lru_cache_ops_.strict_capacity_limit = true; lru_cache_ops_.metadata_charge_policy = kDontChargeCacheMetadata; + lru_cache_ops_.high_pri_pool_ratio = 0.2; + lru_cache_ops_.low_pri_pool_ratio = 0.2; secondary_cache_opts_.capacity = 8 << 20; // 8 MB secondary_cache_opts_.num_shard_bits = 0; @@ -1275,7 +1279,13 @@ class BlobSourceCacheReservationTest : public DBTestBase { co.capacity = kCacheCapacity; co.num_shard_bits = kNumShardBits; co.metadata_charge_policy = kDontChargeCacheMetadata; + + 
co.high_pri_pool_ratio = 0.0; + co.low_pri_pool_ratio = 0.0; std::shared_ptr blob_cache = NewLRUCache(co); + + co.high_pri_pool_ratio = 0.5; + co.low_pri_pool_ratio = 0.5; std::shared_ptr block_cache = NewLRUCache(co); options_.blob_cache = blob_cache; diff --git a/db/db_block_cache_test.cc b/db/db_block_cache_test.cc index 04e0dfdc04..5e549ab6a5 100644 --- a/db/db_block_cache_test.cc +++ b/db/db_block_cache_test.cc @@ -819,8 +819,8 @@ class MockCache : public LRUCache { MockCache() : LRUCache((size_t)1 << 25 /*capacity*/, 0 /*num_shard_bits*/, - false /*strict_capacity_limit*/, 0.0 /*high_pri_pool_ratio*/) { - } + false /*strict_capacity_limit*/, 0.0 /*high_pri_pool_ratio*/, + 0.0 /*low_pri_pool_ratio*/) {} using ShardedCache::Insert; diff --git a/db/db_test2.cc b/db/db_test2.cc index 4aa0aa7822..cc50e65ebd 100644 --- a/db/db_test2.cc +++ b/db/db_test2.cc @@ -651,8 +651,12 @@ TEST_F(DBTest2, SharedWriteBufferLimitAcrossDB) { TEST_F(DBTest2, TestWriteBufferNoLimitWithCache) { Options options = CurrentOptions(); options.arena_block_size = 4096; - std::shared_ptr cache = - NewLRUCache(LRUCacheOptions(10000000, 1, false, 0.0)); + std::shared_ptr cache = NewLRUCache(LRUCacheOptions( + 10000000 /* capacity */, 1 /* num_shard_bits */, + false /* strict_capacity_limit */, 0.0 /* high_pri_pool_ratio */, + nullptr /* memory_allocator */, kDefaultToAdaptiveMutex, + kDontChargeCacheMetadata)); + options.write_buffer_size = 50000; // this is never hit // Use a write buffer total size so that the soft limit is about // 105000. diff --git a/include/rocksdb/cache.h b/include/rocksdb/cache.h index 3010882596..d38fc37c47 100644 --- a/include/rocksdb/cache.h +++ b/include/rocksdb/cache.h @@ -72,6 +72,17 @@ struct LRUCacheOptions { // BlockBasedTableOptions::cache_index_and_filter_blocks_with_high_priority. double high_pri_pool_ratio = 0.5; + // Percentage of cache reserved for low priority entries. 
+ // If greater than zero, the LRU list will be split into a high-pri list, a + // low-pri list and a bottom-pri list. High-pri entries will be inserted to + // the tail of high-pri list, while low-pri entries will be first inserted to + // the low-pri list (the midpoint) and bottom-pri entries will be first + // inserted to the bottom-pri list. + // + // + // See also high_pri_pool_ratio. + double low_pri_pool_ratio = 0.0; + // If non-nullptr will use this allocator instead of system allocator when // allocating memory for cache blocks. Call this method before you start using // the cache! @@ -99,11 +110,13 @@ struct LRUCacheOptions { std::shared_ptr<MemoryAllocator> _memory_allocator = nullptr, bool _use_adaptive_mutex = kDefaultToAdaptiveMutex, CacheMetadataChargePolicy _metadata_charge_policy = - kDefaultCacheMetadataChargePolicy) + kDefaultCacheMetadataChargePolicy, + double _low_pri_pool_ratio = 0.0) : capacity(_capacity), num_shard_bits(_num_shard_bits), strict_capacity_limit(_strict_capacity_limit), high_pri_pool_ratio(_high_pri_pool_ratio), + low_pri_pool_ratio(_low_pri_pool_ratio), memory_allocator(std::move(_memory_allocator)), use_adaptive_mutex(_use_adaptive_mutex), metadata_charge_policy(_metadata_charge_policy) {} @@ -123,7 +136,8 @@ extern std::shared_ptr<Cache> NewLRUCache( std::shared_ptr<MemoryAllocator> memory_allocator = nullptr, bool use_adaptive_mutex = kDefaultToAdaptiveMutex, CacheMetadataChargePolicy metadata_charge_policy = - kDefaultCacheMetadataChargePolicy); + kDefaultCacheMetadataChargePolicy, + double low_pri_pool_ratio = 0.0); extern std::shared_ptr<Cache> NewLRUCache(const LRUCacheOptions& cache_opts); @@ -151,10 +165,11 @@ struct CompressedSecondaryCacheOptions : LRUCacheOptions { CacheMetadataChargePolicy _metadata_charge_policy = kDefaultCacheMetadataChargePolicy, CompressionType _compression_type = CompressionType::kLZ4Compression, - uint32_t _compress_format_version = 2) + uint32_t _compress_format_version = 2, double _low_pri_pool_ratio = 0.0) : LRUCacheOptions(_capacity, 
_num_shard_bits, _strict_capacity_limit, _high_pri_pool_ratio, std::move(_memory_allocator), - _use_adaptive_mutex, _metadata_charge_policy), + _use_adaptive_mutex, _metadata_charge_policy, + _low_pri_pool_ratio), compression_type(_compression_type), compress_format_version(_compress_format_version) {} }; @@ -169,7 +184,7 @@ extern std::shared_ptr<SecondaryCache> NewCompressedSecondaryCache( CacheMetadataChargePolicy metadata_charge_policy = kDefaultCacheMetadataChargePolicy, CompressionType compression_type = CompressionType::kLZ4Compression, - uint32_t compress_format_version = 2); + uint32_t compress_format_version = 2, double low_pri_pool_ratio = 0.0); extern std::shared_ptr<SecondaryCache> NewCompressedSecondaryCache( const CompressedSecondaryCacheOptions& opts); @@ -196,7 +211,17 @@ class Cache { public: // Depending on implementation, cache entries with high priority could be less // likely to get evicted than low priority entries. - enum class Priority { HIGH, LOW }; + // + // The BOTTOM priority is mainly used for blob caching. Blobs are typically + // lower-value targets for caching than data blocks, since 1) with BlobDB, + // data blocks containing blob references conceptually form an index structure + // which has to be consulted before we can read the blob value, and 2) cached + // blobs represent only a single key-value, while cached data blocks generally + // contain multiple KVs. Since we would like to make it possible to use the + // same backing cache for the block cache and the blob cache, it would make + // sense to add a new, bottom cache priority level for blobs so data blocks + // are prioritized over them. + enum class Priority { HIGH, LOW, BOTTOM }; // A set of callbacks to allow objects in the primary block cache to be // be persisted in a secondary cache. 
The purpose of the secondary cache diff --git a/java/rocksjni/lru_cache.cc b/java/rocksjni/lru_cache.cc index 7d03f43b18..58a52609fc 100644 --- a/java/rocksjni/lru_cache.cc +++ b/java/rocksjni/lru_cache.cc @@ -22,12 +22,16 @@ jlong Java_org_rocksdb_LRUCache_newLRUCache(JNIEnv* /*env*/, jclass /*jcls*/, jlong jcapacity, jint jnum_shard_bits, jboolean jstrict_capacity_limit, - jdouble jhigh_pri_pool_ratio) { + jdouble jhigh_pri_pool_ratio, + jdouble jlow_pri_pool_ratio) { auto* sptr_lru_cache = new std::shared_ptr<ROCKSDB_NAMESPACE::Cache>( ROCKSDB_NAMESPACE::NewLRUCache( static_cast<size_t>(jcapacity), static_cast<int>(jnum_shard_bits), static_cast<bool>(jstrict_capacity_limit), - static_cast<double>(jhigh_pri_pool_ratio))); + static_cast<double>(jhigh_pri_pool_ratio), + nullptr /* memory_allocator */, rocksdb::kDefaultToAdaptiveMutex, + rocksdb::kDontChargeCacheMetadata, + static_cast<double>(jlow_pri_pool_ratio))); return GET_CPLUSPLUS_POINTER(sptr_lru_cache); } diff --git a/java/src/main/java/org/rocksdb/LRUCache.java b/java/src/main/java/org/rocksdb/LRUCache.java index 5e5bdeea27..db90b17c5b 100644 --- a/java/src/main/java/org/rocksdb/LRUCache.java +++ b/java/src/main/java/org/rocksdb/LRUCache.java @@ -16,7 +16,7 @@ public class LRUCache extends Cache { * @param capacity The fixed size capacity of the cache */ public LRUCache(final long capacity) { - this(capacity, -1, false, 0.0); + this(capacity, -1, false, 0.0, 0.0); } /** @@ -31,7 +31,7 @@ public class LRUCache extends Cache { * by hash of the key */ public LRUCache(final long capacity, final int numShardBits) { - super(newLRUCache(capacity, numShardBits, false,0.0)); + super(newLRUCache(capacity, numShardBits, false, 0.0, 0.0)); } /** @@ -49,7 +49,7 @@ public class LRUCache extends Cache { */ public LRUCache(final long capacity, final int numShardBits, final boolean strictCapacityLimit) { - super(newLRUCache(capacity, numShardBits, strictCapacityLimit,0.0)); + super(newLRUCache(capacity, numShardBits, strictCapacityLimit, 0.0, 0.0)); } /** @@ -69,14 +69,38 @@ public class
LRUCache extends Cache { * @param highPriPoolRatio percentage of the cache reserves for high priority * entries */ - public LRUCache(final long capacity, final int numShardBits, - final boolean strictCapacityLimit, final double highPriPoolRatio) { - super(newLRUCache(capacity, numShardBits, strictCapacityLimit, - highPriPoolRatio)); + public LRUCache(final long capacity, final int numShardBits, final boolean strictCapacityLimit, + final double highPriPoolRatio) { + super(newLRUCache(capacity, numShardBits, strictCapacityLimit, highPriPoolRatio, 0.0)); } - private native static long newLRUCache(final long capacity, - final int numShardBits, final boolean strictCapacityLimit, - final double highPriPoolRatio); + /** + * Create a new cache with a fixed size capacity. The cache is sharded + * to 2^numShardBits shards, by hash of the key. The total capacity + * is divided and evenly assigned to each shard. If strictCapacityLimit + * is set, insert to the cache will fail when cache is full. User can also + * set percentage of the cache reserves for high priority entries and low + * priority entries via highPriPoolRatio and lowPriPoolRatio. + * numShardBits = -1 means it is automatically determined: every shard + * will be at least 512KB and number of shard bits will not exceed 6. 
+ * + * @param capacity The fixed size capacity of the cache + * @param numShardBits The cache is sharded to 2^numShardBits shards, + * by hash of the key + * @param strictCapacityLimit insert to the cache will fail when cache is full + * @param highPriPoolRatio ratio of the cache reserved for high priority + * entries + * @param lowPriPoolRatio ratio of the cache reserved for low priority + * entries + */ + public LRUCache(final long capacity, final int numShardBits, final boolean strictCapacityLimit, + final double highPriPoolRatio, final double lowPriPoolRatio) { + super(newLRUCache( + capacity, numShardBits, strictCapacityLimit, highPriPoolRatio, lowPriPoolRatio)); + } + + private native static long newLRUCache(final long capacity, final int numShardBits, + final boolean strictCapacityLimit, final double highPriPoolRatio, + final double lowPriPoolRatio); @Override protected final native void disposeInternal(final long handle); } diff --git a/java/src/test/java/org/rocksdb/LRUCacheTest.java b/java/src/test/java/org/rocksdb/LRUCacheTest.java index 275cb560a1..4d194e7121 100644 --- a/java/src/test/java/org/rocksdb/LRUCacheTest.java +++ b/java/src/test/java/org/rocksdb/LRUCacheTest.java @@ -20,9 +20,10 @@ public class LRUCacheTest { final long capacity = 80000000; final int numShardBits = 16; final boolean strictCapacityLimit = true; - final double highPriPoolRatio = 0.05; - try(final Cache lruCache = new LRUCache(capacity, - numShardBits, strictCapacityLimit, highPriPoolRatio)) { + final double highPriPoolRatio = 0.5; + final double lowPriPoolRatio = 0.5; + try (final Cache lruCache = new LRUCache( + capacity, numShardBits, strictCapacityLimit, highPriPoolRatio, lowPriPoolRatio)) { //no op assertThat(lruCache.getUsage()).isGreaterThanOrEqualTo(0); assertThat(lruCache.getPinnedUsage()).isGreaterThanOrEqualTo(0); diff --git a/memory/memory_allocator_test.cc b/memory/memory_allocator_test.cc index 1e96c44ee9..104241f5c6 100644 ---
a/memory/memory_allocator_test.cc +++ b/memory/memory_allocator_test.cc @@ -83,7 +83,7 @@ TEST_P(MemoryAllocatorTest, DatabaseBlockCache) { options.create_if_missing = true; BlockBasedTableOptions table_options; - auto cache = NewLRUCache(1024 * 1024, 6, false, false, allocator_); + auto cache = NewLRUCache(1024 * 1024, 6, false, 0.0, allocator_); table_options.block_cache = cache; options.table_factory.reset(NewBlockBasedTableFactory(table_options)); DB* db = nullptr; diff --git a/table/block_based/block_based_table_factory.cc b/table/block_based/block_based_table_factory.cc index aa936ea831..0192605afd 100644 --- a/table/block_based/block_based_table_factory.cc +++ b/table/block_based/block_based_table_factory.cc @@ -454,6 +454,7 @@ void BlockBasedTableFactory::InitializeOptions() { // It makes little sense to pay overhead for mid-point insertion while the // block size is only 8MB. co.high_pri_pool_ratio = 0.0; + co.low_pri_pool_ratio = 0.0; table_options_.block_cache = NewLRUCache(co); } if (table_options_.block_size_deviation < 0 || diff --git a/tools/benchmark.sh b/tools/benchmark.sh index 1773f9d6ef..5a2d358899 100755 --- a/tools/benchmark.sh +++ b/tools/benchmark.sh @@ -192,7 +192,7 @@ if [[ $cache_index_and_filter -eq 0 ]]; then elif [[ $cache_index_and_filter -eq 1 ]]; then cache_meta_flags="\ --cache_index_and_filter_blocks=$cache_index_and_filter \ - --cache_high_pri_pool_ratio=0.5" + --cache_high_pri_pool_ratio=0.5 --cache_low_pri_pool_ratio=0" else echo CACHE_INDEX_AND_FILTER_BLOCKS was $CACHE_INDEX_AND_FILTER_BLOCKS but must be 0 or 1 exit $EXIT_INVALID_ARGS diff --git a/tools/db_bench_tool.cc b/tools/db_bench_tool.cc index 568a2c73a7..f0b3ed46c7 100644 --- a/tools/db_bench_tool.cc +++ b/tools/db_bench_tool.cc @@ -570,6 +570,9 @@ DEFINE_double(cache_high_pri_pool_ratio, 0.0, "If > 0.0, we also enable " "cache_index_and_filter_blocks_with_high_priority."); +DEFINE_double(cache_low_pri_pool_ratio, 0.0, + "Ratio of block cache reserve for low pri 
blocks."); + DEFINE_string(cache_type, "lru_cache", "Type of block cache."); DEFINE_bool(use_compressed_secondary_cache, false, @@ -589,6 +592,9 @@ DEFINE_double(compressed_secondary_cache_high_pri_pool_ratio, 0.0, "If > 0.0, we also enable " "cache_index_and_filter_blocks_with_high_priority."); +DEFINE_double(compressed_secondary_cache_low_pri_pool_ratio, 0.0, + "Ratio of block cache reserve for low pri blocks."); + DEFINE_string(compressed_secondary_cache_compression_type, "lz4", "The compression algorithm to use for large " "values stored in CompressedSecondaryCache."); @@ -3022,11 +3028,12 @@ class Benchmark { #ifdef MEMKIND FLAGS_use_cache_memkind_kmem_allocator ? std::make_shared<MemkindKmemAllocator>() - : nullptr + : nullptr, #else - nullptr + nullptr, #endif - ); + kDefaultToAdaptiveMutex, kDefaultCacheMetadataChargePolicy, + FLAGS_cache_low_pri_pool_ratio); if (FLAGS_use_cache_memkind_kmem_allocator) { #ifndef MEMKIND fprintf(stderr, "Memkind library is not linked with the binary."); @@ -3055,6 +3062,8 @@ class Benchmark { FLAGS_compressed_secondary_cache_numshardbits; secondary_cache_opts.high_pri_pool_ratio = FLAGS_compressed_secondary_cache_high_pri_pool_ratio; + secondary_cache_opts.low_pri_pool_ratio = + FLAGS_compressed_secondary_cache_low_pri_pool_ratio; secondary_cache_opts.compression_type = FLAGS_compressed_secondary_cache_compression_type_e; secondary_cache_opts.compress_format_version = @@ -4296,6 +4305,12 @@ class Benchmark { block_based_options.cache_index_and_filter_blocks_with_high_priority = true; } + if (FLAGS_cache_high_pri_pool_ratio + FLAGS_cache_low_pri_pool_ratio > + 1.0) { + fprintf(stderr, + "Sum of high_pri_pool_ratio and low_pri_pool_ratio " + "cannot exceed 1.0.\n"); + } block_based_options.block_cache = cache_; block_based_options.cache_usage_options.options_overrides.insert( {CacheEntryRole::kCompressionDictionaryBuildingBuffer,