rocksdb/cache
Peter Dillinger dc576af0fd AutoHCC - fix a rare loop condition in Lookup (#11948)
Summary:
Saw this in stress test:
```
db_stress: cache/clock_cache.cc:3152:[...] Assertion `i < 0x2000' failed.
```

The problem is related to Lookups on a chain currently involved in a Grow operation. To avoid Lookup waiting on Grow, Lookup is able to walk a chain whose first part is already migrated and tail is not yet migrated, so is mixed with entries with a different destination home (according to `home_shift`) than what we're looking for. This is fine until we save one of these entries as a safe point in the chain to backtrack to (`read_ref_on_chain`) in case of concurrent modification and end up backtracking to it. In that case, we can get stuck on the wrong destination chain and keep trying to backtrack to an entry that is supposed to be on the correct chain but is not (anymore).

For some reason I haven't quite worked out, I believe it's usually able to recover after some 1000+ looop iterations, so reproducibility depends on the threshold at which we consider a Lookup loop to be too many iterations for a plausibly valid Lookup.

Detecting and working around this case is relatively simple. We can (and must) keep going on the chain but ensure we don't save it as a safe entry to backtrack to.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/11948

Test Plan:
The problem could be reproduced in a few minutes with this (debug build):
```
$ while ./cache_bench -cache_type=auto_hyper_clock_cache -histograms=0 -cache_size=80000000 -threads=32 -populate_cache=0 -ops_per_thread=10000 -degenerate_hash_bits=6 -num_shard_bits=0; do :; done
```

At least with a lower threshold on suspiciously high number of iterations. I've lowered the thresholds quite a bit and no longer able to reproduce a failure.

Reviewed By: jowlyzhang

Differential Revision: D50236574

Pulled By: pdillinger

fbshipit-source-id: 2cb54a4e02bb51d5933eea41fcd489ab9d34aa96
2023-10-13 09:52:33 -07:00
..
cache.cc Support compressed and local flash secondary cache stacking (#11812) 2023-09-21 20:30:53 -07:00
cache_bench.cc Add (& fix) some simple source code checks (#8821) 2021-09-07 21:19:27 -07:00
cache_bench_tool.cc Support compressed and local flash secondary cache stacking (#11812) 2023-09-21 20:30:53 -07:00
cache_entry_roles.cc Major Cache refactoring, CPU efficiency improvement (#10975) 2023-01-11 14:20:40 -08:00
cache_entry_roles.h Major Cache refactoring, CPU efficiency improvement (#10975) 2023-01-11 14:20:40 -08:00
cache_entry_stats.h Simplify tracking entries already in SecondaryCache (#11299) 2023-03-15 17:51:44 -07:00
cache_helpers.cc Support compressed and local flash secondary cache stacking (#11812) 2023-09-21 20:30:53 -07:00
cache_helpers.h Put Cache and CacheWrapper in new public header (#11192) 2023-02-09 12:12:02 -08:00
cache_key.cc Put Cache and CacheWrapper in new public header (#11192) 2023-02-09 12:12:02 -08:00
cache_key.h Derive cache keys from SST unique IDs (#10394) 2022-08-12 13:49:49 -07:00
cache_reservation_manager.cc Simplify tracking entries already in SecondaryCache (#11299) 2023-03-15 17:51:44 -07:00
cache_reservation_manager.h Fix updating the capacity of a tiered cache (#11873) 2023-09-22 18:07:46 -07:00
cache_reservation_manager_test.cc Print stack traces on frozen tests in CI (#10828) 2022-10-18 00:35:35 -07:00
cache_test.cc Automatic table sizing for HyperClockCache (AutoHCC) (#11738) 2023-09-01 15:44:38 -07:00
charged_cache.cc Support compressed and local flash secondary cache stacking (#11812) 2023-09-21 20:30:53 -07:00
charged_cache.h Support compressed and local flash secondary cache stacking (#11812) 2023-09-21 20:30:53 -07:00
clock_cache.cc AutoHCC - fix a rare loop condition in Lookup (#11948) 2023-10-13 09:52:33 -07:00
clock_cache.h Fix major performance bug in AutoHCC growth phase (#11871) 2023-09-22 13:47:31 -07:00
compressed_secondary_cache.cc Fix TSAN crash test false positive (#11941) 2023-10-11 13:28:10 -07:00
compressed_secondary_cache.h Support compressed and local flash secondary cache stacking (#11812) 2023-09-21 20:30:53 -07:00
compressed_secondary_cache_test.cc Fix runtime error in UpdateTieredCache due to integer underflow (#11949) 2023-10-12 15:09:40 -07:00
lru_cache.cc Implement a allow cache hits admission policy for the compressed secondary cache (#11713) 2023-08-18 11:19:48 -07:00
lru_cache.h Add hash_seed to Caches (#11391) 2023-05-09 22:24:26 -07:00
lru_cache_test.cc Support compressed and local flash secondary cache stacking (#11812) 2023-09-21 20:30:53 -07:00
secondary_cache.cc Support compressed and local flash secondary cache stacking (#11812) 2023-09-21 20:30:53 -07:00
secondary_cache_adapter.cc Fix runtime error in UpdateTieredCache due to integer underflow (#11949) 2023-10-12 15:09:40 -07:00
secondary_cache_adapter.h Fix runtime error in UpdateTieredCache due to integer underflow (#11949) 2023-10-12 15:09:40 -07:00
sharded_cache.cc Add some more bit operations to internal APIs (#11660) 2023-08-02 11:30:10 -07:00
sharded_cache.h Support compressed and local flash secondary cache stacking (#11812) 2023-09-21 20:30:53 -07:00
tiered_secondary_cache.cc Support compressed and local flash secondary cache stacking (#11812) 2023-09-21 20:30:53 -07:00
tiered_secondary_cache.h Support compressed and local flash secondary cache stacking (#11812) 2023-09-21 20:30:53 -07:00
tiered_secondary_cache_test.cc Don't call InsertSaved on compressed only secondary cache (#11889) 2023-09-27 12:08:08 -07:00
typed_cache.h Support compressed and local flash secondary cache stacking (#11812) 2023-09-21 20:30:53 -07:00