* Plumb xDS server and proxyxfg into the agent startup
* Add `consul connect envoy` command to allow running Envoy as a connect sidecar.
* Add test for help tabs; typos and style fixups from review
- A new endpoint `/v1/agent/service/:service_id` which is a generic way to look up the service for a single instance. The primary value here is that it:
- **supports hash-based blocking** and so;
- **replaces `/agent/connect/proxy/:proxy_id`** as the mechanism the built-in proxy uses to read its config.
- It's not proxy specific and so works for any service.
- It has a temporary shim to call through to the existing endpoint to preserve current managed proxy config defaulting behaviour until that is removed entirely (tested).
- The built-in proxy now uses the new endpoint exclusively for it's config
- The built-in proxy now has a `-sidecar-for` flag that allows the service ID of the _target_ service to be specified, on the condition that there is exactly one "sidecar" proxy (that is one that has `Proxy.DestinationServiceID` set) for the service registered.
- Several fixes for edge cases for SidecarService
- A fix for `Alias` checks - when running locally they didn't update their state until some external thing updated the target. If the target service has no checks registered as below, then the alias never made it past critical.
* Added new Config for SidecarService in ServiceDefinitions.
* WIP: all the code needed for SidecarService is written... none of it is tested other than config :). Need API updates too.
* Test coverage for the new sidecarServiceFromNodeService method.
* Test API registratrion with SidecarService
* Recursive Key Translation 🤦
* Add tests for nested sidecar defintion arrays to ensure they are translated correctly
* Use dedicated internal state rather than Service Meta for tracking sidecars for deregistration.
Add tests for deregistration.
* API struct for agent register. No other endpoint should be affected yet.
* Additional test cases to cover updates to API registrations
* Rename agent/proxy package to reflect that it is limited to managed proxy processes
Rationale: we have several other components of the agent that relate to Connect proxies for example the ProxyConfigManager component needed for Envoy work. Those things are pretty separate from the focus of this package so far which is only concerned with managing external proxy processes so it's nota good fit to put code for that in here, yet there is a naming clash if we have other packages related to proxy functionality that are not in the `agent/proxy` package.
Happy to bikeshed the name. I started by calling it `managedproxy` but `managedproxy.Manager` is especially unpleasant. `proxyprocess` seems good in that it's more specific about purpose but less clearly connected with the concept of "managed proxies". The names in use are cleaner though e.g. `proxyprocess.Manager`.
This rename was completed automatically using golang.org/x/tools/cmd/gomvpkg.
Depends on #4541
* Fix missed windows tagged files
* Add cache types for catalog/services and health/services and basic test that caching works
* Support non-blocking cache types with Cache-Control semantics.
* Update API docs to include caching info for every endpoint.
* Comment updates per PR feedback.
* Add note on caching to the 10,000 foot view on the architecture page to make the new data path more clear.
* Document prepared query staleness quirk and force all background requests to AllowStale so we can spread service discovery load across servers.
In a real agent the `cache` instance is alive until the agent shuts down so this is not a real leak in production, however in out test suite, every testAgent that is started and stops leaks goroutines that never get cleaned up which accumulate consuming CPU and memory through subsequent test in the `agent` package which doesn't help our test flakiness.
This adds a Close method that doesn't invalidate or clean up the cache, and still allows concurrent blocking queries to run (for up to 10 mins which might still affect tests). But at least it doesn't maintain them forever with background refresh and an expiry watcher routine.
It would be nice to cancel any outstanding blocking requests as well when we close but that requires much more invasive surgery right into our RPC protocol since we don't have a way to cancel requests currently.
Unscientifically this seems to make tests pass a bit quicker and more reliably locally but I can't really be sure of that!
If you provide an invalid HTTP configuration consul will still start again instead of failing. But if you do so the build-in proxy won't be able to start which you might need for connect.
Fixes: #4578
Prior to this fix if there was an error binding to ports for the DNS servers the error would be swallowed by the gated log writer and never output. This fix propagates the DNS server errors back to the shell with a multierror.
Fixes#4515
This just slightly refactors the logic to only attempt to set the serf wan reconnect timeout when the rest of the serf wan settings are configured - thus avoiding a segfault.
Also change how loadProxies works. Now it will load all persisted proxies into a map, then when loading config file proxies will look up the previous proxy token in that map.
- Dev mode assumed no persistence of services although proxy state is persisted which caused proxies to be killed on startup as their services were no longer registered. Fixed.
- Didn't snapshot the ProxyID which meant that proxies were adopted OK from snapshot but failed to restart if they died since there was no proxyID in the ENV on restart
- Dev mode with no persistence just kills all proxies on shutdown since it can't recover them later
- Naming things
This turns out to have a lot more subtelty than we accounted for. The test suite is especially prone to races now we can only poll the child and many extra levels of indirectoin are needed to correctly run daemon process without it becoming a Zombie.
I ran this test suite in a loop with parallel enabled to verify for races (-race doesn't find any as they are logical inter-process ones not actual data races). I made it through ~50 runs before hitting an error due to timing which is much better than before. I want to go back and see if we can do better though. Just getting this up.