open-consul

Commit Graph

Author	SHA1	Message	Date
R.B. Boyer	baf886c6f3	proxycfg: introduce explicit UpstreamID in lieu of bare string (#12125 ) The gist here is that now we use a value-type struct proxycfg.UpstreamID as the map key in ConfigSnapshot maps where we used to use "upstream id-ish" strings. These are internal only and used just for bidirectional trips through the agent cache keyspace (like the discovery chain target struct). For the few places where the upstream id needs to be projected into xDS, that's what (proxycfg.UpstreamID).EnvoyID() is for. This lets us ALWAYS inject the partition and namespace into these things without making stuff like the golden testdata diverge.	2022-01-20 10:12:04 -06:00
freddygv	be85ae11ca	additional test fixes	2021-12-13 18:56:44 -07:00
Daniel Upton	caa5b5a5a6	xds: prefer fed state gateway definitions if they're fresher (#11522 ) Fixes an issue described in #10132, where if two DCs are WAN federated over mesh gateways, and the gateway in the non-primary DC is terminated and receives a new IP address (as is commonly the case when running them on ephemeral compute instances) the primary DC is unable to re-establish its connection until the agent running on its own gateway is restarted. This was happening because we always preferred gateways discovered by the `Internal.ServiceDump` RPC (which would fail because there's no way to dial the remote DC) over those discovered in the federation state, which is replicated as long as the primary DC's gateway is reachable.	2021-11-09 16:45:36 +00:00
freddygv	51c888a41a	Update locality check in xds	2021-11-01 13:58:53 -06:00
Evan Culver	b3c92f22b1	connect: Remove support for Envoy 1.16 (#11354 )	2021-10-27 18:51:35 -07:00
R.B. Boyer	97e57aedfb	connect: update supported envoy versions to 1.18.2, 1.17.2, 1.16.3, and 1.15.4 (#10101 ) The only thing that needed fixing up pertained to this section of the 1.18.x release notes: > grpc_stats: the default value for stats_for_all_methods is switched from true to false, in order to avoid possible memory exhaustion due to an untrusted downstream sending a large number of unique method names. The previous default value was deprecated in version 1.14.0. This only changes the behavior when the value is not set. The previous behavior can be used by setting the value to true. This behavior change by be overridden by setting runtime feature envoy.deprecated_features.grpc_stats_filter_enable_stats_for_all_methods_by_default. For now to maintain status-quo I'm explicitly setting `stats_for_all_methods=true` in all versions to avoid relying upon the default. Additionally the naming of the emitted metrics for these gRPC requests changed slightly so the integration test assertions for `case-grpc` needed adjusting.	2021-04-29 15:22:03 -05:00
R.B. Boyer	91bee6246f	Support Incremental xDS mode (#9855 ) This adds support for the Incremental xDS protocol when using xDS v3. This is best reviewed commit-by-commit and will not be squashed when merged. Union of all commit messages follows to give an overarching summary: xds: exclusively support incremental xDS when using xDS v3 Attempts to use SoTW via v3 will fail, much like attempts to use incremental via v2 will fail. Work around a strange older envoy behavior involving empty CDS responses over incremental xDS. xds: various cleanups and refactors that don't strictly concern the addition of incremental xDS support Dissolve the connectionInfo struct in favor of per-connection ResourceGenerators instead. Do a better job of ensuring the xds code uses a well configured logger that accurately describes the connected client. xds: pull out checkStreamACLs method in advance of a later commit xds: rewrite SoTW xDS protocol tests to use protobufs rather than hand-rolled json strings In the test we very lightly reuse some of the more boring protobuf construction helper code that is also technically under test. The important thing of the protocol tests is testing the protocol. The actual inputs and outputs are largely already handled by the xds golden output tests now so these protocol tests don't have to do double-duty. This also updates the SoTW protocol test to exclusively use xDS v2 which is the only variant of SoTW that will be supported in Consul 1.10. xds: default xds.Server.AuthCheckFrequency at use-time instead of construction-time	2021-04-29 13:54:05 -05:00
freddygv	5b59780431	Update xds for transparent proxy	2021-03-17 13:40:49 -06:00
R.B. Boyer	503041f216	xds: default to speaking xDS v3, but allow for v2 to be spoken upon request (#9658 ) - Also add support for envoy 1.17.0	2021-02-26 16:23:15 -06:00
R.B. Boyer	4336d522c1	test: omit envoy golden test files that differ from the latest version (#9807 ) Since we currently do no version switching this removes 75% of the PR noise. To generate all *.golden files were removed and then I ran: go test ./agent/xds -update	2021-02-24 14:04:31 -06:00
R.B. Boyer	cdc5e99184	xds: remove deprecated usages of xDS (#9602 ) Note that this does NOT upgrade to xDS v3. That will come in a future PR. Additionally: - Ignored staticcheck warnings about how github.com/golang/protobuf is deprecated. - Shuffled some agent/xds imports in advance of a later xDS v3 upgrade. - Remove support for envoy 1.13.x but don't add in 1.17.x yet. We have to wait until the xDS v3 support is added in a follow-up PR. Fixes #8425	2021-02-22 15:00:15 -06:00
Daniel Nephin	de226f26e4	xds: Pass in logger small cleanup in tests	2021-01-07 18:13:48 -05:00
Daniel Nephin	ef0999547a	testing: skip slow tests with -short Add a skip condition to all tests slower than 100ms. This change was made using `gotestsum tool slowest` with data from the last 3 CI runs of master. See https://github.com/gotestyourself/gotestsum#finding-and-skipping-slow-tests With this change: ``` $ time go test -count=1 -short ./agent ok github.com/hashicorp/consul/agent 0.743s real 0m4.791s $ time go test -count=1 -short ./agent/consul ok github.com/hashicorp/consul/agent/consul 4.229s real 0m8.769s ```	2020-12-07 13:42:55 -05:00
R.B. Boyer	8ea4c482b3	xds: add support for envoy 1.15.0 and drop support for 1.11.x (#8424 ) Related changes: - hard-fail the xDS connection attempt if the envoy version is known to be too old to be supported - remove the RouterMatchSafeRegex proxy feature since all supported envoy versions have it - stop using --max-obj-name-len (due to: envoyproxy/envoy#11740)	2020-07-31 15:52:49 -05:00
R.B. Boyer	6e3d07c995	xds: version sniff envoy and switch regular expressions from 'regex' to 'safe_regex' on newer envoy versions (#8222 ) - cut down on extra node metadata transmission - split the golden file generation to compare all envoy version	2020-07-09 17:04:51 -05:00
R.B. Boyer	ba83b52b32	connect: upgrade github.com/envoyproxy/go-control-plane to v0.9.5 (#8165 )	2020-06-23 15:19:56 -05:00
Daniel Nephin	89d95561df	Enable gofmt simplify Code changes done automatically with 'gofmt -s -w'	2020-06-16 13:21:11 -04:00
freddygv	1e7e716742	Move compound service names to use ServiceName type	2020-06-12 13:47:43 -06:00
Raphaël Rondeau	b799471e29	connect: fix endpoints clusterName when using cluster escape hatch (#7319 ) ```changelog * fix(connect): fix endpoints clusterName when using cluster escape hatch ```	2020-05-26 10:57:22 +02:00
Kyle Havlovitz	e4268c8b7f	Support multiple listeners referencing the same service in gateway definitions	2020-05-06 15:06:13 -05:00
freddygv	929491c979	Add subset support	2020-04-27 11:08:40 -06:00
freddygv	2e35a9bb18	Add xds cluster/listener/endpoint management	2020-04-27 11:08:40 -06:00
Chris Piraino	af5cc8fd92	Add all the xds ingress tests This commit copies many of the connect-proxy xds testcases and reuses for ingress gateways. This allows us to more easily see changes to the envoy configuration when make updates to ingress gateways.	2020-04-24 09:31:32 -05:00
Kyle Havlovitz	6a5eba63ab	Ingress Gateways for TCP services (#7509 ) * Implements a simple, tcp ingress gateway workflow This adds a new type of gateway for allowing Ingress traffic into Connect from external services. Co-authored-by: Chris Piraino <cpiraino@hashicorp.com>	2020-04-16 14:00:48 -07:00
R.B. Boyer	a7fb26f50f	wan federation via mesh gateways (#6884 ) This is like a Möbius strip of code due to the fact that low-level components (serf/memberlist) are connected to high-level components (the catalog and mesh-gateways) in a twisty maze of references which make it hard to dive into. With that in mind here's a high level summary of what you'll find in the patch: There are several distinct chunks of code that are affected: * new flags and config options for the server * retry join WAN is slightly different * retry join code is shared to discover primary mesh gateways from secondary datacenters * because retry join logic runs in the agent and the results of that operation for primary mesh gateways are needed in the server there are some methods like `RefreshPrimaryGatewayFallbackAddresses` that must occur at multiple layers of abstraction just to pass the data down to the right layer. * new cache type `FederationStateListMeshGatewaysName` for use in `proxycfg/xds` layers * the function signature for RPC dialing picked up a new required field (the node name of the destination) * several new RPCs for manipulating a FederationState object: `FederationState:{Apply,Get,List,ListMeshGateways}` * 3 read-only internal APIs for debugging use to invoke those RPCs from curl * raft and fsm changes to persist these FederationStates * replication for FederationStates as they are canonically stored in the Primary and replicated to the Secondaries. * a special derivative of anti-entropy that runs in secondaries to snapshot their local mesh gateway `CheckServiceNodes` and sync them into their upstream FederationState in the primary (this works in conjunction with the replication to distribute addresses for all mesh gateways in all DCs to all other DCs) * a "gateway locator" convenience object to make use of this data to choose the addresses of gateways to use for any given RPC or gossip operation to a remote DC. This gets data from the "retry join" logic in the agent and also directly calls into the FSM. * RPC (`:8300`) on the server sniffs the first byte of a new connection to determine if it's actually doing native TLS. If so it checks the ALPN header for protocol determination (just like how the existing system uses the type-byte marker). * 2 new kinds of protocols are exclusively decoded via this native TLS mechanism: one for ferrying "packet" operations (udp-like) from the gossip layer and one for "stream" operations (tcp-like). The packet operations re-use sockets (using length-prefixing) to cut down on TLS re-negotiation overhead. * the server instances specially wrap the `memberlist.NetTransport` when running with gateway federation enabled (in a `wanfed.Transport`). The general gist is that if it tries to dial a node in the SAME datacenter (deduced by looking at the suffix of the node name) there is no change. If dialing a DIFFERENT datacenter it is wrapped up in a TLS+ALPN blob and sent through some mesh gateways to eventually end up in a server's :8300 port. * a new flag when launching a mesh gateway via `consul connect envoy` to indicate that the servers are to be exposed. This sets a special service meta when registering the gateway into the catalog. * `proxycfg/xds` notice this metadata blob to activate additional watches for the FederationState objects as well as the location of all of the consul servers in that datacenter. * `xds:` if the extra metadata is in place additional clusters are defined in a DC to bulk sink all traffic to another DC's gateways. For the current datacenter we listen on a wildcard name (`server.<dc>.consul`) that load balances all servers as well as one mini-cluster per node (`<node>.server.<dc>.consul`) * the `consul tls cert create` command got a new flag (`-node`) to help create an additional SAN in certs that can be used with this flavor of federation.	2020-03-09 15:59:02 -05:00
Matt Keeler	154eafe140	xDS Mesh Gateway Resolver Subset Fixes (#7294 ) * xDS Mesh Gateway Resolver Subset Fixes The first fix was that clusters were being generated for every service resolver subset regardless of there being any service instances of the associated service in that dc. The previous logic didn’t care at all but now it will omit generating those clusters unless we also have service instances that should be proxied. The second fix was to respect the DefaultSubset of a service resolver so that mesh-gateways would configure the endpoints of the unnamed subset cluster to only those endpoints matched by the default subsets filters. * Refactor the gateway endpoint generation to be a little easier to read	2020-02-19 11:57:55 -05:00
Chris Piraino	3dd0b59793	Allow users to configure either unstructured or JSON logging (#7130 ) * hclog Allow users to choose between unstructured and JSON logging	2020-01-28 17:50:41 -06:00
Matt Keeler	485a0a65ea	Updates to Config Entries and Connect for Namespaces (#7116 )	2020-01-24 10:04:58 -05:00
R.B. Boyer	b091647090	agent: allow mesh gateways to initialize even if there are no connect services registered yet (#6576 ) Fixes #6543 Also improved some of the proxycfg tests to cover snapshot validity better.	2019-10-17 16:46:49 -05:00
R.B. Boyer	d6456fddeb	connect: introduce ExternalSNI field on service-defaults (#6324 ) Compiling this will set an optional SNI field on each DiscoveryTarget. When set this value should be used for TLS connections to the instances of the target. If not set the default should be used. Setting ExternalSNI will disable mesh gateway use for that target. It also disables several service-resolver features that do not make sense for an external service.	2019-08-19 12:19:44 -05:00
R.B. Boyer	64fc002e03	connect: fix failover through a mesh gateway to a remote datacenter (#6259 ) Failover is pushed entirely down to the data plane by creating envoy clusters and putting each successive destination in a different load assignment priority band. For example this shows that normally requests go to 1.2.3.4:8080 but when that fails they go to 6.7.8.9:8080: - name: foo load_assignment: cluster_name: foo policy: overprovisioning_factor: 100000 endpoints: - priority: 0 lb_endpoints: - endpoint: address: socket_address: address: 1.2.3.4 port_value: 8080 - priority: 1 lb_endpoints: - endpoint: address: socket_address: address: 6.7.8.9 port_value: 8080 Mesh gateways route requests based solely on the SNI header tacked onto the TLS layer. Envoy currently only lets you configure the outbound SNI header at the cluster layer. If you try to failover through a mesh gateway you ideally would configure the SNI value per endpoint, but that's not possible in envoy today. This PR introduces a simpler way around the problem for now: 1. We identify any target of failover that will use mesh gateway mode local or remote and then further isolate any resolver node in the compiled discovery chain that has a failover destination set to one of those targets. 2. For each of these resolvers we will perform a small measurement of comparative healths of the endpoints that come back from the health API for the set of primary target and serial failover targets. We walk the list of targets in order and if any endpoint is healthy we return that target, otherwise we move on to the next target. 3. The CDS and EDS endpoints both perform the measurements in (2) for the affected resolver nodes. 4. For CDS this measurement selects which TLS SNI field to use for the cluster (note the cluster is always going to be named for the primary target) 5. For EDS this measurement selects which set of endpoints will populate the cluster. Priority tiered failover is ignored. One of the big downsides to this approach to failover is that the failover detection and correction is going to be controlled by consul rather than deferring that entirely to the data plane as with the prior version. This also means that we are bound to only failover using official health signals and cannot make use of data plane signals like outlier detection to affect failover. In this specific scenario the lack of data plane signals is ok because the effectiveness is already muted by the fact that the ultimate destination endpoints will have their data plane signals scrambled when they pass through the mesh gateway wrapper anyway so we're not losing much. Another related fix is that we now use the endpoint health from the underlying service, not the health of the gateway (regardless of failover mode).	2019-08-05 13:30:35 -05:00
R.B. Boyer	0165e93517	connect: expose an API endpoint to compile the discovery chain (#6248 ) In addition to exposing compilation over the API cleaned up the structures that would be exchanged to be cleaner and easier to support and understand. Also removed ability to configure the envoy OverprovisioningFactor.	2019-08-02 15:34:54 -05:00
R.B. Boyer	782c647bf4	connect: simplify the compiled discovery chain data structures (#6242 ) This should make them better for sending over RPC or the API. Instead of a chain implemented explicitly like a linked list (nodes holding pointers to other nodes) instead switch to a flat map of named nodes with nodes linking other other nodes by name. The shipped structure is just a map and a string to indicate which key to start from. Other changes: * inline the compiler option InferDefaults as true * introduce compiled target config to avoid needing to send back additional maps of Resolvers; future target-specific compiled state can go here * move compiled MeshGateway out of the Resolver and into the TargetConfig where it makes more sense.	2019-08-01 22:44:05 -05:00
R.B. Boyer	4666599e18	connect: reconcile how upstream configuration works with discovery chains (#6225 ) * connect: reconcile how upstream configuration works with discovery chains The following upstream config fields for connect sidecars sanely integrate into discovery chain resolution: - Destination Namespace/Datacenter: Compilation occurs locally but using different default values for namespaces and datacenters. The xDS clusters that are created are named as they normally would be. - Mesh Gateway Mode (single upstream): If set this value overrides any value computed for any resolver for the entire discovery chain. The xDS clusters that are created may be named differently (see below). - Mesh Gateway Mode (whole sidecar): If set this value overrides any value computed for any resolver for the entire discovery chain. If this is specifically overridden for a single upstream this value is ignored in that case. The xDS clusters that are created may be named differently (see below). - Protocol (in opaque config): If set this value overrides the value computed when evaluating the entire discovery chain. If the normal chain would be TCP or if this override is set to TCP then the result is that we explicitly disable L7 Routing and Splitting. The xDS clusters that are created may be named differently (see below). - Connect Timeout (in opaque config): If set this value overrides the value for any resolver in the entire discovery chain. The xDS clusters that are created may be named differently (see below). If any of the above overrides affect the actual result of compiling the discovery chain (i.e. "tcp" becomes "grpc" instead of being a no-op override to "tcp") then the relevant parameters are hashed and provided to the xDS layer as a prefix for use in naming the Clusters. This is to ensure that if one Upstream discovery chain has no overrides and tangentially needs a cluster named "api.default.XXX", and another Upstream does have overrides for "api.default.XXX" that they won't cross-pollinate against the operator's wishes. Fixes #6159	2019-08-01 22:03:34 -05:00
R.B. Boyer	2bfad66efa	connect: rework how the service resolver subset OnlyPassing flag works (#6173 ) The main change is that we no longer filter service instances by health, preferring instead to render all results down into EDS endpoints in envoy and merely label the endpoints as HEALTHY or UNHEALTHY. When OnlyPassing is set to true we will force consul checks in a 'warning' state to render as UNHEALTHY in envoy. Fixes #6171	2019-07-23 20:20:24 -05:00
R.B. Boyer	72a8195839	implement some missing service-router features and add more xDS testing (#6065 ) - also implement OnlyPassing filters for non-gateway clusters	2019-07-12 14:16:21 -05:00
Matt Keeler	0a0775b9a6	Fix a bunch of xds flaky tests The clusters/endpoints test were still relying on deterministic ordering of clusters/endpoints which cannot be relied upon due to golang purposefully not providing any guarantee about consistent interation ordering of maps. Also fixed a small bug in the connect proxy cluster generation that was causing the clusters slice to be double the size it needed to with the first half being all nil pointers.	2019-07-02 15:53:06 -04:00
Matt Keeler	e916f2d954	Implement mesh gateway management of service subsets Fixup some error handling	2019-07-02 10:29:37 -04:00
R.B. Boyer	bccbb2b4ae	activate most discovery chain features in xDS for envoy (#6024 )	2019-07-01 22:10:51 -05:00
Matt Keeler	39bb0e3e77	Implement Mesh Gateways This includes both ingress and egress functionality.	2019-07-01 16:28:30 -04:00
Paul Banks	737be347eb	Upgrade xDS (go-control-plane) API to support Envoy 1.10. (#5872 ) * Upgrade xDS (go-control-plane) API to support Envoy 1.10. This includes backwards compatibility shim to work around the ext_authz package rename in 1.10. It also adds integration test support in CI for 1.10.0. * Fix go vet complaints * go mod vendor * Update Envoy version info in docs * Update website/source/docs/connect/proxies/envoy.md	2019-06-07 07:10:43 -05:00
Paul Banks	cf5528734c	Connect: Fix Envoy getting stuck during load (#5499 ) * Connect: Fix Envoy getting stuck during load Also in this PR: - Enabled outlier detection on upstreams which will mark instances unhealthy after 5 failures (using Envoy's defaults) - Enable weighted load balancing where DNS weights are configured * Fix empty load assignments in the right place * Fix import names from review * Move millisecond parse to a helper function	2019-03-22 19:37:14 +00:00

42 Commits