Commit Graph

105 Commits

Author SHA1 Message Date
Daniel Nephin ab068325da agent/cache: move typeEntry lookup to the edge
This change moves all the typeEntry lookups to the first step in the exported methods,
and makes unexporter internals accept the typeEntry struct.

This change is primarily intended to make it easier to extract the container of caches
from the Cache type.

It may incidentally reduce locking in fetch, but that was not a goal.
2020-04-03 16:01:56 -04:00
Pierre Souchay 984583d980
tests: more tolerance to latency for unstable test `TestCacheNotifyPolling()`. (#7574) 2020-04-03 10:29:38 +02:00
R.B. Boyer 0e152672a1
avoid 'panic: Log in goroutine after TestCacheGet_refreshAge has completed' (#7276) 2020-02-12 10:01:51 -06:00
Anthony Scalisi 4b92c2deee fix spelling errors (#7135) 2020-01-27 07:00:33 -06:00
R.B. Boyer 55fdae203f
agent: cache notifications work after error if the underlying RPC returns index=1 (#6547)
Fixes #6521

Ensure that initial failures to fetch an agent cache entry using the
notify API where the underlying RPC returns a synthetic index of 1
correctly recovers when those RPCs resume working.

The bug in the Cache.notifyBlockingQuery used to incorrectly "fix" the
index for the next query from 0 to 1 for all queries, when it should
have not done so for queries that errored.

Also fixed some things that made debugging difficult:

- config entry read/list endpoints send back QueryMeta headers
- xds event loops don't swallow the cache notification errors
2019-09-26 10:42:17 -05:00
Christian Muehlhaeuser 2602f6907e Simplified code in various places (#6176)
All these changes should have no side-effects or change behavior:

- Use bytes.Buffer's String() instead of a conversion
- Use time.Since and time.Until where fitting
- Drop unnecessary returns and assignment
2019-07-20 09:37:19 -04:00
Freddy a295d9e5db
Flaky test overhaul (#6100) 2019-07-12 09:52:26 -06:00
Hans Hasselberg 73c4e9f07c
tls: auto_encrypt enables automatic RPC cert provisioning for consul clients (#5597) 2019-06-27 22:22:07 +02:00
Matt Keeler 07f2854683 Fixes race condition in Agent Cache (#5796)
* Fix race condition during a cache get

Check the entry we pulled out of the cache while holding the lock had Fetching set.
If it did then we should use the existing Waiter instead of calling fetch. The reason
this is better than just calling fetch is that fetch re-gets the entry out of the
entries map and the previous fetch may have finished. Therefore this prevents
erroneously starting a new fetch because we just missed the last update.

* Fix race condition fully

The first commit still allowed for the following scenario:

• No entry existing when checked in getWithIndex while holding the read lock
• Then by time we had reached fetch it had been created and finished.

* always use ok when returning

* comment mentioning the reading from entries.

* use cacheHit consistently
2019-05-07 11:15:49 +01:00
Kyle Havlovitz a113d8ca1f Test an index=0 value in cache.Notify 2019-04-25 02:11:07 -07:00
Kyle Havlovitz 1fc96c770b Make central service config opt-in and rework the initial registration 2019-04-24 06:11:08 -07:00
R.B. Boyer 91e78e00c7
fix typos reported by golangci-lint:misspell (#5434) 2019-03-06 11:13:28 -06:00
Paul Banks 1c4dfbcd2e
connect: tame thundering herd of CSRs on CA rotation (#5228)
* Support rate limiting and concurrency limiting CSR requests on servers; handle CA rotations gracefully with jitter and backoff-on-rate-limit in client

* Add CSR rate limiting docs

* Fix config naming and add tests for new CA configs
2019-01-22 17:19:36 +00:00
Matt Keeler 8e54856c46
Implement prepared query upstreams watching for envoy (#5224)
Fixes #4969 

This implements non-blocking request polling at the cache layer which is currently only used for prepared queries. Additionally this enables the proxycfg manager to poll prepared queries for use in envoy proxy upstreams.
2019-01-18 12:44:04 -05:00
Paul Banks c4fa66b4c9
connect: agent leaf cert caching improvements (#5091)
* Add State storage and LastResult argument into Cache so that cache.Types can safely store additional data that is eventually expired.

* New Leaf cache type working and basic tests passing. TODO: more extensive testing for the Root change jitter across blocking requests, test concurrent fetches for different leaves interact nicely with rootsWatcher.

* Add multi-client and delayed rotation tests.

* Typos and cleanup error handling in roots watch

* Add comment about how the FetchResult can be used and change ca leaf state to use a non-pointer state.

* Plumb test override of root CA jitter through TestAgent so that tests are deterministic again!

* Fix failing config test
2019-01-10 12:46:11 +00:00
Paul Banks 1bf3a37597
agent: Don't leave old errors around in cache (#5094)
* Fixes #4480. Don't leave old errors around in cache that can be hit in specific circumstances.

* Move error reset to cover extreme edge case of nil Value, nil err Fetch
2019-01-08 10:06:38 +00:00
Paul Banks bf741cbc04
Quick fix for cache age flakiness in CI 2018-10-11 13:12:19 +01:00
Paul Banks 0523efa2fe merge feedback: fix typos; actually use deliverLatest added previously but not plumbed in 2018-10-10 16:55:34 +01:00
Paul Banks a640cc6bc7 Add cache.Notify to abstract watching for cache updates for types that support blocking semantics. (#4695) 2018-10-10 16:55:34 +01:00
Paul Banks 5b0d4db6bc Support Agent Caching for Service Discovery Results (#4541)
* Add cache types for catalog/services and health/services and basic test that caching works

* Support non-blocking cache types with Cache-Control semantics.

* Update API docs to include caching info for every endpoint.

* Comment updates per PR feedback.

* Add note on caching to the 10,000 foot view on the architecture page to make the new data path more clear.

* Document prepared query staleness quirk and force all background requests to AllowStale so we can spread service discovery load across servers.
2018-10-10 16:55:34 +01:00
Paul Banks 77e0577ff6
Add a Close method to cache that stops background goroutines. (#4746)
In a real agent the `cache` instance is alive until the agent shuts down so this is not a real leak in production, however in out test suite, every testAgent that is started and stops leaks goroutines that never get cleaned up which accumulate consuming CPU and memory through subsequent test in the `agent` package which doesn't help our test flakiness.

This adds a Close method that doesn't invalidate or clean up the cache, and still allows concurrent blocking queries to run (for up to 10 mins which might still affect tests). But at least it doesn't maintain them forever with background refresh and an expiry watcher routine.

It would be nice to cancel any outstanding blocking requests as well when we close but that requires much more invasive surgery right into our RPC protocol since we don't have a way to cancel requests currently.

Unscientifically this seems to make tests pass a bit quicker and more reliably locally but I can't really be sure of that!
2018-10-04 11:27:11 +01:00
Paul Banks 217137b775
Fixes #4421: General solution to stop blocking queries with index 0 (#4437)
* Fix theoretical cache collision bug if/when we use more cache types with same result type

* Generalized fix for blocking query handling when state store methods return zero index

* Refactor test retry to only affect CI

* Undo make file merge

* Add hint to error message returned to end-user requests if Connect is not enabled when they try to request cert

* Explicit error for Roots endpoint if connect is disabled

* Fix tests that were asserting old behaviour
2018-07-25 20:26:27 +01:00
Paul Banks 81bd1b43a3 Fix hot loop in cache for RPC returning zero index. 2018-06-25 12:25:37 -07:00
Paul Banks 3d51c2aeac Get agent cache tests passing without global hit count (which is racy).
Few other fixes in here just to get a clean run locally - they are all also fixed in other PRs but shouldn't conflict.

This should be robust to timing between goroutines now.
2018-06-25 12:25:37 -07:00
Mitchell Hashimoto 29af8fb5ab agent/cache: always schedule the refresh 2018-06-25 12:25:14 -07:00
Mitchell Hashimoto a3ef9c2308 agent/cache: always fetch with minimum index of 1 at least 2018-06-25 12:25:12 -07:00
Mitchell Hashimoto bb98686ec8 agent/cache: update comment from PR review to clarify 2018-06-25 12:24:08 -07:00
Mitchell Hashimoto 55b3d5d6f4 agent/cache: update comments 2018-06-25 12:24:07 -07:00
Mitchell Hashimoto ea0270e6aa agent/cache: correct test name 2018-06-25 12:24:07 -07:00
Mitchell Hashimoto b5201276bc agent/cache: change behavior to return error rather than retry
The cache behavior should not be to mask errors and retry. Instead, it
should aim to return errors as quickly as possible. We do that here.
2018-06-25 12:24:07 -07:00
Mitchell Hashimoto 778e318a52 agent/cache: perform backoffs on error retries on blocking queries 2018-06-25 12:24:06 -07:00
Paul Banks 5abf47472d
Verify trust domain on /authorize calls 2018-06-14 09:42:16 -07:00
Mitchell Hashimoto 4bb745a2d4
agent/cache: change uint8 to uint 2018-06-14 09:42:15 -07:00
Mitchell Hashimoto 6cf2e1ef1a
agent/cache: string through attempt rather than storing on the entry 2018-06-14 09:42:15 -07:00
Mitchell Hashimoto c42510e1ec
agent/cache: implement refresh backoff 2018-06-14 09:42:14 -07:00
Mitchell Hashimoto dcb2671d10
agent/cache: address PR feedback, lots of typos 2018-06-14 09:42:03 -07:00
Mitchell Hashimoto 07d878a157
agent/cache: address feedback, clarify comments 2018-06-14 09:42:03 -07:00
Mitchell Hashimoto ad3928b6bd
agent/cache: don't every block on NotifyCh 2018-06-14 09:42:03 -07:00
Mitchell Hashimoto 3f80a9f330
agent/cache: unit tests for ExpiryHeap, found a bug! 2018-06-14 09:42:03 -07:00
Mitchell Hashimoto 1c31e34e5b
agent/cache: send the total entries count on eviction to go-metrics 2018-06-14 09:42:03 -07:00
Mitchell Hashimoto ec559d77bd
agent/cache: make edge case with prev/next idx == 0 handled better 2018-06-14 09:42:03 -07:00
Mitchell Hashimoto b319d06276
agent/cache: rework how expiry data is stored to be more efficient 2018-06-14 09:42:03 -07:00
Mitchell Hashimoto 449bbd817d
agent/cache: initial TTL work 2018-06-14 09:42:02 -07:00
Mitchell Hashimoto 3c6acbda5d
agent/cache: send the RefreshTimeout into the backend fetch 2018-06-14 09:42:02 -07:00
Mitchell Hashimoto 257fc34e51
agent/cache: on error, return from Get immediately, don't block forever 2018-06-14 09:42:02 -07:00
Mitchell Hashimoto e9d58ca219
agent/cache: lots of comment/doc updates 2018-06-14 09:42:02 -07:00
Mitchell Hashimoto 109bb946e9
agent/cache: return the error as part of Get 2018-06-14 09:42:01 -07:00
Mitchell Hashimoto 6ecc2da7ff
agent/cache: integrate go-metrics so the cache is debuggable 2018-06-14 09:42:01 -07:00
Mitchell Hashimoto 4509589427
agent/cache: support timeouts for cache reads and empty fetch results 2018-06-14 09:42:01 -07:00
Mitchell Hashimoto 9e44a319d3
agent: check cache hit count to verify CA root caching, background update 2018-06-14 09:42:00 -07:00
Mitchell Hashimoto 286217cbd8
agent/cache: partition by DC/ACL token 2018-06-14 09:42:00 -07:00
Mitchell Hashimoto 72c82a9b29
agent/cache: Reorganize some files, RequestInfo struct, prepare for partitioning 2018-06-14 09:42:00 -07:00
Mitchell Hashimoto ecc789ddb5
agent/cache: ConnectCA roots caching type 2018-06-14 09:42:00 -07:00
Mitchell Hashimoto c69df79e0c
agent/cache: blank cache key means to always fetch 2018-06-14 09:42:00 -07:00
Mitchell Hashimoto 8584e9262e
agent/cache: initial kind-of working cache 2018-06-14 09:42:00 -07:00