Commit Graph

12 Commits

Author SHA1 Message Date
Peter Dillinger e466173d5c Print stack traces on frozen tests in CI (#10828)
Summary:
Instead of existing calls to ps from gnu_parallel, call a new wrapper that does ps, looks for unit test like processes, and uses pstack or gdb to print thread stack traces. Also, using `ps -wwf` instead of `ps -wf` ensures output is not cut off.

For security, CircleCI runs with security restrictions on ptrace (/proc/sys/kernel/yama/ptrace_scope = 1), and this change adds a work-around to `InstallStackTraceHandler()` (only used by testing tools) to allow any process from the same user to debug it. (I've also touched >100 files to ensure all the unit tests call this function.)

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10828

Test Plan: local manual + temporary infinite loop in a unit test to observe in CircleCI

Reviewed By: hx235

Differential Revision: D40447634

Pulled By: pdillinger

fbshipit-source-id: 718a4c4a5b54fa0f9af2d01a446162b45e5e84e1
2022-10-18 00:35:35 -07:00
gitbw95 47b57a3731 add SetCapacity and GetCapacity for secondary cache (#10712)
Summary:
To support tuning secondary cache dynamically, add `SetCapacity()` and `GetCapacity()` for CompressedSecondaryCache.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10712

Test Plan: Unit Tests

Reviewed By: anand1976

Differential Revision: D39685212

Pulled By: gitbw95

fbshipit-source-id: 19573c67237011927320207732b5de083cb87240
2022-09-29 19:15:04 -07:00
gitbw95 2cc5b39560 Add enable_split_merge option for CompressedSecondaryCache (#10690)
Summary:
`enable_custom_split_merge` is added for enabling the custom split and merge feature, which split the compressed value into chunks so that they may better fit jemalloc bins.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10690

Test Plan:
Unit Tests
Stress Tests

Reviewed By: anand1976

Differential Revision: D39567604

Pulled By: gitbw95

fbshipit-source-id: f6d1d46200f365220055f793514601dcb0edc4b7
2022-09-16 15:41:49 -07:00
gitbw95 0148c4934d Add PerfContext counters for CompressedSecondaryCache (#10650)
Summary:
Add PerfContext counters for CompressedSecondaryCache.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10650

Test Plan: Unit Tests.

Reviewed By: anand1976

Differential Revision: D39354712

Pulled By: gitbw95

fbshipit-source-id: 1b90d3df99d08ddecd351edfd48d1e3723fdbc15
2022-09-08 16:35:57 -07:00
Bo Wang d490bfcdb6 Avoid recompressing cold block in CompressedSecondaryCache (#10527)
Summary:
**Summary:**
When a block is firstly `Lookup` from the secondary cache, we just insert a dummy block in the primary cache (charging the actual size of the block) and don’t erase the block from the secondary cache. A standalone handle is returned from `Lookup`. Only if the block is hit again, we erase it from the secondary cache and add it into the primary cache.

When a block is firstly evicted from the primary cache to the secondary cache, we just insert a dummy block (size 0) in the secondary cache. When the block is evicted again, it is treated as a hot block and is inserted into the secondary cache.

**Implementation Details**
Add a new state of LRUHandle: The handle is never inserted into the LRUCache (both hash table and LRU list) and it doesn't experience the above three states. The entry can be freed when refs becomes 0.  (refs >= 1 && in_cache == false && IS_STANDALONE == true)

The behaviors of  `LRUCacheShard::Lookup()` are updated if the secondary_cache is CompressedSecondaryCache:
1. If a handle is found in primary cache:
  1.1. If the handle's value is not nullptr, it is returned immediately.
  1.2. If the handle's value is nullptr, this means the handle is a dummy one. For a dummy handle, if it was retrieved from secondary cache, it may still exist in secondary cache.
    - 1.2.1. If no valid handle can be `Lookup` from secondary cache, return nullptr.
    - 1.2.2. If the handle from secondary cache is valid, erase it from the secondary cache and add it into the primary cache.
2. If a handle is not found in primary cache:
  2.1. If no valid handle can be `Lookup` from secondary cache, return nullptr.
  2.2.  If the handle from secondary cache is valid, insert a dummy block in the primary cache (charging the actual size of the block)  and return a standalone handle.

The behaviors of `LRUCacheShard::Promote()` are updated as follows:
1. If `e->sec_handle` has value, one of the following steps can happen:
  1.1. Insert a dummy handle and return a standalone handle to caller when `secondary_cache_` is `CompressedSecondaryCache` and e is a standalone handle.
  1.2. Insert the item into the primary cache and return the handle to caller.
  1.3. Exception handling.
3. If `e->sec_handle` has no value, mark the item as not in cache and charge the cache as its only metadata that'll shortly be released.

The behavior of  `CompressedSecondaryCache::Insert()` is updated:
1. If a block is evicted from the primary cache for the first time, a dummy item is inserted.
4. If a dummy item is found for a block, the block is inserted into the secondary cache.

The behavior of  `CompressedSecondaryCache:::Lookup()` is updated:
1. If a handle is not found or it is a dummy item, a nullptr is returned.
2. If `erase_handle` is true, the handle is erased.

The behaviors of  `LRUCacheShard::Release()` are adjusted for the standalone handles.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10527

Test Plan:
1. stress tests.
5. unit tests.
6. CPU profiling for db_bench.

Reviewed By: siying

Differential Revision: D38747613

Pulled By: gitbw95

fbshipit-source-id: 74a1eba7e1957c9affb2bd2ae3e0194584fa6eca
2022-09-07 19:00:27 -07:00
Gang Liao 275cd80cdb Add a blob-specific cache priority (#10461)
Summary:
RocksDB's `Cache` abstraction currently supports two priority levels for items: high (used for frequently accessed/highly valuable SST metablocks like index/filter blocks) and low (used for SST data blocks). Blobs are typically lower-value targets for caching than data blocks, since 1) with BlobDB, data blocks containing blob references conceptually form an index structure which has to be consulted before we can read the blob value, and 2) cached blobs represent only a single key-value, while cached data blocks generally contain multiple KVs. Since we would like to make it possible to use the same backing cache for the block cache and the blob cache, it would make sense to add a new, lower-than-low cache priority level (bottom level) for blobs so data blocks are prioritized over them.

This task is a part of https://github.com/facebook/rocksdb/issues/10156

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10461

Reviewed By: siying

Differential Revision: D38672823

Pulled By: ltamasi

fbshipit-source-id: 90cf7362036563d79891f47be2cc24b827482743
2022-08-12 17:59:06 -07:00
gitbw95 f060b47ee8 Fix the segdefault bug in CompressedSecondaryCache and its tests (#10507)
Summary:
This fix is to replace `AllocateBlock()` with `new`. Once I figure out why `AllocateBlock()` might cause the segfault, I will update the implementation.

Fix the bug that causes ./compressed_secondary_cache_test output following test failures:

```
Note: Google Test filter = CompressedSecondaryCacheTest.MergeChunksIntoValueTest
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from CompressedSecondaryCacheTest
[ RUN      ] CompressedSecondaryCacheTest.MergeChunksIntoValueTest
[       OK ] CompressedSecondaryCacheTest.MergeChunksIntoValueTest (1 ms)
[----------] 1 test from CompressedSecondaryCacheTest (1 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (9 ms total)
[  PASSED  ] 1 test.
t/run-compressed_secondary_cache_test-CompressedSecondaryCacheTest.MergeChunksIntoValueTest: line 4: 1091086 Segmentation fault      (core dumped) TEST_TMPDIR=$d ./compressed_secondary_cache_test --gtest_filter=CompressedSecondaryCacheTest.MergeChunksIntoValueTest
Note: Google Test filter = CompressedSecondaryCacheTest.BasicTestWithMemoryAllocatorAndCompression
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from CompressedSecondaryCacheTest
[ RUN      ] CompressedSecondaryCacheTest.BasicTestWithMemoryAllocatorAndCompression
[       OK ] CompressedSecondaryCacheTest.BasicTestWithMemoryAllocatorAndCompression (1 ms)
[----------] 1 test from CompressedSecondaryCacheTest (1 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (2 ms total)
[  PASSED  ] 1 test.
t/run-compressed_secondary_cache_test-CompressedSecondaryCacheTest.BasicTestWithMemoryAllocatorAndCompression: line 4: 1090883 Segmentation fault      (core dumped) TEST_TMPDIR=$d ./compressed_secondary_cache_test --gtest_filter=CompressedSecondaryCacheTest.BasicTestWithMemoryAllocatorAndCompression

```

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10507

Test Plan:
Test 1:
```
$make -j 24
$./compressed_secondary_cache_test
```
Test 2:
```
$COMPILE_WITH_ASAN=1  make -j 24
$./compressed_secondary_cache_test
```
Test 3:
```
$COMPILE_WITH_TSAN=1 make -j 24
$./compressed_secondary_cache_test
```

Reviewed By: anand1976

Differential Revision: D38529885

Pulled By: gitbw95

fbshipit-source-id: d903fa3fadbd4d29f9528728c63a4f61c4396890
2022-08-09 15:34:50 -07:00
Bo Wang 87b82f28a1 Split cache to minimize internal fragmentation (#10287)
Summary:
### **Summary:**
To minimize the internal fragmentation caused by the variable size of the compressed blocks, the original block is split according to the jemalloc bin size in `Insert()` and then merged back in `Lookup()`.  Based on the analysis of the results of the following tests, from the overall internal fragmentation perspective, this PR does mitigate the internal fragmentation issue.

_Do more myshadow tests with the latest commit. I finished several myshadow AB Testing and the results are promising. For the config of 4GB primary cache and 3GB secondary cache, Jemalloc resident stats shows consistently ~0.15GB memory saving; the allocated and active stats show similar memory savings. The CPU usage is almost the same before and after this PR._

To evaluate the issue of memory fragmentations and the benefits of this PR, I conducted two sets of local tests as follows.

**T1**
Keys:       16 bytes each (+ 0 bytes user-defined timestamp)
Values:     100 bytes each (50 bytes after compression)
Entries:    90000000
RawSize:    9956.4 MB (estimated)
FileSize:   5664.8 MB (estimated)

| Test Name | Primary Cache Size (MB) | Compressed Secondary Cache Size (MB) |
| - | - | - |
| T1_3 | 4000 | 4000 |
| T1_4 | 2000 | 3000 |

Populate the DB:
./db_bench --benchmarks=fillrandom --num=90000000 -db=/mem_fragmentation/db_bench_1
Overwrite it to a stable state:
./db_bench --benchmarks=overwrite --num=90000000 -use_existing_db -db=/mem_fragmentation/db_bench_1

Run read tests with differnt cache setting:
T1_3:
MALLOC_CONF="prof:true,prof_stats:true" ../rocksdb/db_bench --benchmarks=seekrandom  --threads=16 --num=90000000 -use_existing_db --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=4000000000 -compressed_secondary_cache_size=4000000000 -use_compressed_secondary_cache -db=/mem_fragmentation/db_bench_1 --print_malloc_stats=true > ~/temp/mem_frag/20220710/jemalloc_stats_json_T1_3_20220710 -duration=1800 &

T1_4:
MALLOC_CONF="prof:true,prof_stats:true" ../rocksdb/db_bench --benchmarks=seekrandom  --threads=16 --num=90000000 -use_existing_db --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=2000000000 -compressed_secondary_cache_size=3000000000 -use_compressed_secondary_cache -db=/mem_fragmentation/db_bench_1 --print_malloc_stats=true > ~/temp/mem_frag/20220710/jemalloc_stats_json_T1_4_20220710 -duration=1800 &

For T1_3 and T1_4, I also conducted the tests before and after this PR. The following table show the important jemalloc stats.

| Test Name | T1_3 | T1_3 after mem defrag | T1_4 | T1_4 after mem defrag |
| - | - | - | - | - |
| allocated (MB)  | 8728 | 8076 | 5518 | 5043 |
| available (MB)  | 8753 | 8092 | 5536 | 5051 |
| external fragmentation rate  | 0.003 | 0.002 | 0.003 | 0.0016 |
| resident (MB)  | 8956 | 8365 | 5655 | 5235 |

**T2**
Keys:       32 bytes each (+ 0 bytes user-defined timestamp)
Values:     256 bytes each (128 bytes after compression)
Entries:    40000000
RawSize:    10986.3 MB (estimated)
FileSize:   6103.5 MB (estimated)

| Test Name | Primary Cache Size (MB) | Compressed Secondary Cache Size (MB) |
| - | - | - |
| T2_3 | 4000 | 4000 |
| T2_4 | 2000 | 3000 |

Create DB (10GB):
./db_bench -benchmarks=fillrandom -use_direct_reads=true -num=40000000 -key_size=32 -value_size=256 -db=/mem_fragmentation/db_bench_2
Overwrite it to a stable state:
./db_bench --benchmarks=overwrite --num=40000000 -use_existing_db -key_size=32 -value_size=256 -db=/mem_fragmentation/db_bench_2

Run read tests with differnt cache setting:
T2_3:
MALLOC_CONF="prof:true,prof_stats:true" ./db_bench  --benchmarks="mixgraph" -use_direct_io_for_flush_and_compaction=true -use_direct_reads=true -cache_size=4000000000 -compressed_secondary_cache_size=4000000000 -use_compressed_secondary_cache -keyrange_dist_a=14.18 -keyrange_dist_b=-2.917 -keyrange_dist_c=0.0164 -keyrange_dist_d=-0.08082 -keyrange_num=30 -value_k=0.2615 -value_sigma=25.45 -iter_k=2.517 -iter_sigma=14.236 -mix_get_ratio=0.85 -mix_put_ratio=0.14 -mix_seek_ratio=0.01 -sine_mix_rate_interval_milliseconds=5000 -sine_a=1000 -sine_b=0.000073 -sine_d=400000 -reads=80000000 -num=40000000 -key_size=32 -value_size=256 -use_existing_db=true -db=/mem_fragmentation/db_bench_2 --print_malloc_stats=true > ~/temp/mem_frag/jemalloc_stats_T2_3 -duration=1800  &

T2_4:
MALLOC_CONF="prof:true,prof_stats:true" ./db_bench  --benchmarks="mixgraph" -use_direct_io_for_flush_and_compaction=true -use_direct_reads=true -cache_size=2000000000 -compressed_secondary_cache_size=3000000000 -use_compressed_secondary_cache -keyrange_dist_a=14.18 -keyrange_dist_b=-2.917 -keyrange_dist_c=0.0164 -keyrange_dist_d=-0.08082 -keyrange_num=30 -value_k=0.2615 -value_sigma=25.45 -iter_k=2.517 -iter_sigma=14.236 -mix_get_ratio=0.85 -mix_put_ratio=0.14 -mix_seek_ratio=0.01 -sine_mix_rate_interval_milliseconds=5000 -sine_a=1000 -sine_b=0.000073 -sine_d=400000 -reads=80000000 -num=40000000 -key_size=32 -value_size=256 -use_existing_db=true -db=/mem_fragmentation/db_bench_2 --print_malloc_stats=true > ~/temp/mem_frag/jemalloc_stats_T2_4 -duration=1800  &

For T2_3 and T2_4, I also conducted the tests before and after this PR. The following table show the important jemalloc stats.

| Test Name |  T2_3 | T2_3 after mem defrag | T2_4 | T2_4 after mem defrag |
| -  | - | - | - | - |
| allocated (MB)  | 8425 | 8093 | 5426 | 5149 |
| available (MB)  | 8489 | 8138 | 5435 | 5158 |
| external fragmentation rate  | 0.008 | 0.0055 | 0.0017 | 0.0017 |
| resident (MB)  | 8676 | 8392 | 5541 | 5321 |

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10287

Test Plan: Unit tests.

Reviewed By: anand1976

Differential Revision: D37743362

Pulled By: gitbw95

fbshipit-source-id: 0010c5af08addeacc5ebbc4ffe5be882fb1d38ad
2022-08-02 15:28:11 -07:00
Peter Dillinger 65036e4217 Revert "Add a blob-specific cache priority (#10309)" (#10434)
Summary:
This reverts commit 8d178090be
because of a clear performance regression seen in internal dashboard
https://fburl.com/unidash/tpz75iee

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10434

Reviewed By: ltamasi

Differential Revision: D38256373

Pulled By: pdillinger

fbshipit-source-id: 134aa00f50dd7b1bbe037c227884a351342ec44b
2022-07-29 07:18:15 -07:00
Gang Liao 8d178090be Add a blob-specific cache priority (#10309)
Summary:
RocksDB's `Cache` abstraction currently supports two priority levels for items: high (used for frequently accessed/highly valuable SST metablocks like index/filter blocks) and low (used for SST data blocks). Blobs are typically lower-value targets for caching than data blocks, since 1) with BlobDB, data blocks containing blob references conceptually form an index structure which has to be consulted before we can read the blob value, and 2) cached blobs represent only a single key-value, while cached data blocks generally contain multiple KVs. Since we would like to make it possible to use the same backing cache for the block cache and the blob cache, it would make sense to add a new, lower-than-low cache priority level (bottom level) for blobs so data blocks are prioritized over them.

This task is a part of https://github.com/facebook/rocksdb/issues/10156

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10309

Reviewed By: ltamasi

Differential Revision: D38211655

Pulled By: gangliao

fbshipit-source-id: 65ef33337db4d85277cc6f9782d67c421ad71dd5
2022-07-27 19:09:24 -07:00
gitbw95 f4052d13b7 Enable SecondaryCache::CreateFromString to create sec cache based on the uri for CompressedSecondaryCache (#10132)
Summary:
Update SecondaryCache::CreateFromString and enable it to create sec cache based on the uri for CompressedSecondaryCache.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/10132

Test Plan: Add unit tests.

Reviewed By: anand1976

Differential Revision: D36996997

Pulled By: gitbw95

fbshipit-source-id: 882ad563cff6d38b306a53426ad7e47273f34edc
2022-06-10 12:23:10 -07:00
gitbw95 f241d082b6 Prevent double caching in the compressed secondary cache (#9747)
Summary:
###  **Summary:**
When both LRU Cache and CompressedSecondaryCache are configured together, there possibly are some data blocks double cached.

**Changes include:**
1. Update IS_PROMOTED to IS_IN_SECONDARY_CACHE to prevent confusions.
2. This PR updates SecondaryCacheResultHandle and use IsErasedFromSecondaryCache to determine whether the handle is erased in the secondary cache. Then, the caller can determine whether to SetIsInSecondaryCache().
3. Rename LRUSecondaryCache to CompressedSecondaryCache.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/9747

Test Plan:
**Test Scripts:**
1. Populate a DB. The on disk footprint is 482 MB. The data is set to be 50% compressible, so the total decompressed size is expected to be 964 MB.
./db_bench --benchmarks=fillrandom --num=10000000 -db=/db_bench_1

2. overwrite it to a stable state:
./db_bench --benchmarks=overwrite,stats --num=10000000 -use_existing_db -duration=10 --benchmark_write_rate_limit=2000000 -db=/db_bench_1

4. Run read tests with diffeernt cache setting:

T1:
./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=520000000  --statistics -db=/db_bench_1

T2:
./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=320000000 -compressed_secondary_cache_size=400000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1

T3:
./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=520000000 -compressed_secondary_cache_size=400000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1

T4:
./db_bench --benchmarks=seekrandom,stats --threads=16 --num=10000000 -use_existing_db -duration=120 --benchmark_write_rate_limit=52000000 -use_direct_reads --cache_size=20000000 -compressed_secondary_cache_size=500000000 --statistics -use_compressed_secondary_cache -db=/db_bench_1

**Before this PR**
| Cache Size | Compressed Secondary Cache Size | Cache Hit Rate |
|------------|-------------------------------------|----------------|
|520 MB | 0 MB | 85.5% |
|320 MB | 400 MB | 96.2% |
|520 MB | 400 MB | 98.3% |
|20 MB | 500 MB | 98.8% |

**Before this PR**
| Cache Size | Compressed Secondary Cache Size | Cache Hit Rate |
|------------|-------------------------------------|----------------|
|520 MB | 0 MB | 85.5% |
|320 MB | 400 MB | 99.9% |
|520 MB | 400 MB | 99.9% |
|20 MB | 500 MB | 99.2% |

Reviewed By: anand1976

Differential Revision: D35117499

Pulled By: gitbw95

fbshipit-source-id: ea2657749fc13efebe91a8a1b56bc61d6a224a12
2022-04-11 13:28:33 -07:00
Renamed from cache/lru_secondary_cache_test.cc (Browse further)