open-consul

Commit Graph

Author	SHA1	Message	Date
freddygv	0fb96afe31	Avoid potential deadlock using non-blocking send Deadlock scenario: 1. Due to scheduling, the state runner sends one snapshot into snapCh and then attempts to send a second. The first send succeeds because the channel is buffered, but the second blocks. 2. Separately, Manager.Watch is called by the xDS server after getting a discovery request from Envoy. This function acquires the manager lock and then blocks on receiving the CurrentSnapshot from the state runner. 3. Separately, there is a Manager goroutine that reads the snapshots from the channel in step 1. These reads are done to notify proxy watchers, but they require holding the manager lock. This goroutine goes to acquire that lock, but can't because it is held by step 2. Now, the goroutine from step 3 is waiting on the one from step 2 to release the lock. The goroutine from step 2 won't release the lock until the goroutine in step 1 advances. But the goroutine in step 1 is waiting for the one in step 3. Deadlock. By making this send non-blocking step 1 above can proceed. The coalesce timer will be reset and a new valid snapshot will be delivered after it elapses or when one is requested by xDS.	2021-02-02 11:31:14 -07:00
freddygv	43efb4809c	Merge master	2020-09-14 16:17:43 -06:00
freddygv	66e5c5989a	Fix type assertion	2020-09-14 16:12:21 -06:00
R.B. Boyer	f2b8bf109c	xds: use envoy's rbac filter to handle intentions entirely within envoy (#8569 )	2020-08-27 12:20:58 -05:00
Daniel Nephin	1ef8279ac9	Merge pull request #8034 from hashicorp/dnephin/add-linter-staticcheck-4 ci: enable SA4006 staticcheck check and add ineffassign	2020-06-17 12:16:02 -04:00
Daniel Nephin	89d95561df	Enable gofmt simplify Code changes done automatically with 'gofmt -s -w'	2020-06-16 13:21:11 -04:00
Daniel Nephin	5f24171f13	ci: enable SA4006 staticcheck check And fix the 'value not used' issues. Many of these are not bugs, but a few are tests not checking errors, and one appears to be a missed error in non-test code.	2020-06-16 13:10:11 -04:00
freddygv	1e7e716742	Move compound service names to use ServiceName type	2020-06-12 13:47:43 -06:00
Freddy	66e2def461	Only pass one hostname via EDS and prefer healthy ones (#8084 ) Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com> Currently when passing hostname clusters to Envoy, we set each service instance registered with Consul as an LbEndpoint for the cluster. However, Envoy can only handle one per cluster: [2020-06-04 18:32:34.094][1][warning][config] [source/common/config/grpc_subscription_impl.cc:87] gRPC config for type.googleapis.com/envoy.api.v2.Cluster rejected: Error adding/updating cluster(s) dc2.internal.ddd90499-9b47-91c5-4616-c0cbf0fc358a.consul: LOGICAL_DNS clusters must have a single locality_lb_endpoint and a single lb_endpoint, server.dc2.consul: LOGICAL_DNS clusters must have a single locality_lb_endpoint and a single lb_endpoint Envoy is currently handling this gracefully by only picking one of the endpoints. However, we should avoid passing multiple to avoid these warning logs. This PR: * Ensures we only pass one endpoint, which is tied to one service instance. * We prefer sending an endpoint which is marked as Healthy by Consul. * If no endpoints are healthy we emit a warning and skip the cluster. * If multiple unique hostnames are spread across service instances we emit a warning and let the user know which will be resolved.	2020-06-12 13:46:17 -06:00
Freddy	f759a48726	Enable gateways to resolve hostnames to IPv4 addresses (#7999 ) The DNS resolution will be handled by Envoy and defaults to LOGICAL_DNS. This discovery type can be overridden on a per-gateway basis with the envoy_dns_discovery_type Gateway Option. If a service contains an instance with a hostname as an address we set the Envoy cluster to use DNS as the discovery type rather than EDS. Since both mesh gateways and terminating gateways route to clusters using SNI, whenever there is a mix of hostnames and IP addresses associated with a service we use the hostname + CDS rather than the IPs + EDS. Note that we detect hostnames by attempting to parse the service instance's address as an IP. If it is not a valid IP we assume it is a hostname.	2020-06-03 15:28:45 -06:00
Daniel Nephin	ea6c2b2adc	ci: Add staticcheck and fix most errors Three of the checks are temporarily disabled to limit the size of the diff, and allow us to enable all the other checks in CI. In a follow up we can fix the issues reported by the other checks one at a time, and enable them.	2020-05-28 11:59:58 -04:00
Kyle Havlovitz	5aefdea1a8	Standardize support for Tagged and BindAddresses in Ingress Gateways (#7924 ) * Standardize support for Tagged and BindAddresses in Ingress Gateways This updates the TaggedAddresses and BindAddresses behavior for Ingress to match Mesh/Terminating gateways. The `consul connect envoy` command now also allows passing an address without a port for tagged/bind addresses. * Update command/connect/envoy/envoy.go Co-authored-by: Freddy <freddygv@users.noreply.github.com> * PR comments * Check to see if address is an actual IP address * Update agent/xds/listeners.go Co-authored-by: Freddy <freddygv@users.noreply.github.com> * fix whitespace Co-authored-by: Chris Piraino <cpiraino@hashicorp.com> Co-authored-by: Freddy <freddygv@users.noreply.github.com>	2020-05-21 09:08:12 -05:00
Chris Piraino	3931015a90	Remove development log line	2020-05-08 20:24:18 -07:00
Chris Piraino	a500262a77	Compute all valid DNSSANs for ingress gateways For DNSSANs we take into account the following and compute the appropriate wildcard values: - source datacenter - namespaces - alt domains	2020-05-08 20:23:17 -07:00
Chris Piraino	964e55e45e	Cleanup proxycfg for TLS - Use correct enterprise metadata for finding config entry - nil out cancel functions on config snapshot copy - Look at HostsSet when checking validity	2020-05-07 10:22:57 -05:00
Kyle Havlovitz	bd6bb3bf2d	Add TLS option and DNS SAN support to ingress config xds: Only set TLS context for ingress listener when requested	2020-05-06 15:12:02 -05:00
Chris Piraino	210dda5682	Allow Hosts field to be set on an ingress config entry - Validate that this cannot be set on a 'tcp' listener nor on a wildcard service. - Add Hosts field to api and test in consul config write CLI - xds: Configure envoy with user-provided hosts from ingress gateways	2020-05-06 15:06:13 -05:00
Kyle Havlovitz	e4268c8b7f	Support multiple listeners referencing the same service in gateway definitions	2020-05-06 15:06:13 -05:00
Kyle Havlovitz	b21cd112e5	Allow ingress gateways to route traffic based on Host header This commit adds the necessary changes to allow an ingress gateway to route traffic from a single defined port to multiple different upstream services in the Consul mesh. To do this, we now require all HTTP requests coming into the ingress gateway to specify a Host header that matches "<service-name>.*" in order to correctly route traffic to the correct service. - Differentiate multiple listener's route names by port - Adds a case in xds for allowing default discovery chains to create a route configuration when on an ingress gateway. This allows default services to easily use host header routing - ingress-gateways have a single route config for each listener that utilizes domain matching to route to different services.	2020-05-06 15:06:13 -05:00
Freddy	f5c1e5268b	TLS Origination for Terminating Gateways (#7671 )	2020-04-27 16:25:37 -06:00
freddygv	e30d64289d	PR comments	2020-04-27 11:08:41 -06:00
freddygv	6ecb3b7a42	Skip filter chain creation if no client cert	2020-04-27 11:08:41 -06:00
freddygv	b2b5942f4b	Un-nest switch in gateway update handler	2020-04-27 11:08:40 -06:00
freddygv	929491c979	Add subset support	2020-04-27 11:08:40 -06:00
freddygv	c80f89b92f	Add proxycfg state management for terminating-gateways	2020-04-27 11:07:06 -06:00
Kyle Havlovitz	6a5eba63ab	Ingress Gateways for TCP services (#7509 ) * Implements a simple, tcp ingress gateway workflow This adds a new type of gateway for allowing Ingress traffic into Connect from external services. Co-authored-by: Chris Piraino <cpiraino@hashicorp.com>	2020-04-16 14:00:48 -07:00
Chris Piraino	d7a870fd32	Fix flapping of mesh gateway connect-service watches (#7575 )	2020-04-02 10:12:13 -05:00
R.B. Boyer	a7fb26f50f	wan federation via mesh gateways (#6884 ) This is like a Möbius strip of code due to the fact that low-level components (serf/memberlist) are connected to high-level components (the catalog and mesh-gateways) in a twisty maze of references which make it hard to dive into. With that in mind here's a high level summary of what you'll find in the patch: There are several distinct chunks of code that are affected: * new flags and config options for the server * retry join WAN is slightly different * retry join code is shared to discover primary mesh gateways from secondary datacenters * because retry join logic runs in the agent and the results of that operation for primary mesh gateways are needed in the server there are some methods like `RefreshPrimaryGatewayFallbackAddresses` that must occur at multiple layers of abstraction just to pass the data down to the right layer. * new cache type `FederationStateListMeshGatewaysName` for use in `proxycfg/xds` layers * the function signature for RPC dialing picked up a new required field (the node name of the destination) * several new RPCs for manipulating a FederationState object: `FederationState:{Apply,Get,List,ListMeshGateways}` * 3 read-only internal APIs for debugging use to invoke those RPCs from curl * raft and fsm changes to persist these FederationStates * replication for FederationStates as they are canonically stored in the Primary and replicated to the Secondaries. * a special derivative of anti-entropy that runs in secondaries to snapshot their local mesh gateway `CheckServiceNodes` and sync them into their upstream FederationState in the primary (this works in conjunction with the replication to distribute addresses for all mesh gateways in all DCs to all other DCs) * a "gateway locator" convenience object to make use of this data to choose the addresses of gateways to use for any given RPC or gossip operation to a remote DC. This gets data from the "retry join" logic in the agent and also directly calls into the FSM. * RPC (`:8300`) on the server sniffs the first byte of a new connection to determine if it's actually doing native TLS. If so it checks the ALPN header for protocol determination (just like how the existing system uses the type-byte marker). * 2 new kinds of protocols are exclusively decoded via this native TLS mechanism: one for ferrying "packet" operations (udp-like) from the gossip layer and one for "stream" operations (tcp-like). The packet operations re-use sockets (using length-prefixing) to cut down on TLS re-negotiation overhead. * the server instances specially wrap the `memberlist.NetTransport` when running with gateway federation enabled (in a `wanfed.Transport`). The general gist is that if it tries to dial a node in the SAME datacenter (deduced by looking at the suffix of the node name) there is no change. If dialing a DIFFERENT datacenter it is wrapped up in a TLS+ALPN blob and sent through some mesh gateways to eventually end up in a server's :8300 port. * a new flag when launching a mesh gateway via `consul connect envoy` to indicate that the servers are to be exposed. This sets a special service meta when registering the gateway into the catalog. * `proxycfg/xds` notice this metadata blob to activate additional watches for the FederationState objects as well as the location of all of the consul servers in that datacenter. * `xds:` if the extra metadata is in place additional clusters are defined in a DC to bulk sink all traffic to another DC's gateways. For the current datacenter we listen on a wildcard name (`server.<dc>.consul`) that load balances all servers as well as one mini-cluster per node (`<node>.server.<dc>.consul`) * the `consul tls cert create` command got a new flag (`-node`) to help create an additional SAN in certs that can be used with this flavor of federation.	2020-03-09 15:59:02 -05:00
Lars Lehtonen	83a3136b5a	agent/proxycfg: fix dropped error in state.initWatchesMeshGateway() (#7267 )	2020-02-18 14:41:01 +01:00
Matt Keeler	2524a028ea	OSS Changes for various config entry namespacing bugs (#7226 )	2020-02-06 10:52:25 -05:00
Matt Keeler	111cb51fc8	Testing updates to support namespaced testing of the agent/xds… (#7185 ) * Various testing updates to support namespaced testing of the agent/xds package * agent/proxycfg package updates to support better namespace testing	2020-02-03 09:26:47 -05:00
Chris Piraino	3dd0b59793	Allow users to configure either unstructured or JSON logging (#7130 ) * hclog Allow users to choose between unstructured and JSON logging	2020-01-28 17:50:41 -06:00
Matt Keeler	485a0a65ea	Updates to Config Entries and Connect for Namespaces (#7116 )	2020-01-24 10:04:58 -05:00
Matt Keeler	0b4bd016a9	Move where the service-resolver watch is done so that it happen… (#7025 ) Before we were issuing 1 watch for every service in the services listing which would have caused the agent to process many more identical events simultaneously.	2020-01-10 10:30:13 -05:00
R.B. Boyer	b091647090	agent: allow mesh gateways to initialize even if there are no connect services registered yet (#6576 ) Fixes #6543 Also improved some of the proxycfg tests to cover snapshot validity better.	2019-10-17 16:46:49 -05:00
R.B. Boyer	55fdae203f	agent: cache notifications work after error if the underlying RPC returns index=1 (#6547 ) Fixes #6521 Ensure that initial failures to fetch an agent cache entry using the notify API where the underlying RPC returns a synthetic index of 1 correctly recovers when those RPCs resume working. The bug in the Cache.notifyBlockingQuery used to incorrectly "fix" the index for the next query from 0 to 1 for all queries, when it should have not done so for queries that errored. Also fixed some things that made debugging difficult: - config entry read/list endpoints send back QueryMeta headers - xds event loops don't swallow the cache notification errors	2019-09-26 10:42:17 -05:00
Freddy	5eace88ce2	Expose HTTP-based paths through Connect proxy (#6446 ) Fixes: #5396 This PR adds a proxy configuration stanza called expose. These flags register listeners in Connect sidecar proxies to allow requests to specific HTTP paths from outside of the node. This allows services to protect themselves by only listening on the loopback interface, while still accepting traffic from non Connect-enabled services. Under expose there is a boolean checks flag that would automatically expose all registered HTTP and gRPC check paths. This stanza also accepts a paths list to expose individual paths. The primary use case for this functionality would be to expose paths for third parties like Prometheus or the kubelet. Listeners for requests to exposed paths are be configured dynamically at run time. Any time a proxy, or check can be registered, a listener can also be created. In this initial implementation requests to these paths are not authenticated/encrypted.	2019-09-25 20:55:52 -06:00
R.B. Boyer	64fc002e03	connect: fix failover through a mesh gateway to a remote datacenter (#6259 ) Failover is pushed entirely down to the data plane by creating envoy clusters and putting each successive destination in a different load assignment priority band. For example this shows that normally requests go to 1.2.3.4:8080 but when that fails they go to 6.7.8.9:8080: - name: foo load_assignment: cluster_name: foo policy: overprovisioning_factor: 100000 endpoints: - priority: 0 lb_endpoints: - endpoint: address: socket_address: address: 1.2.3.4 port_value: 8080 - priority: 1 lb_endpoints: - endpoint: address: socket_address: address: 6.7.8.9 port_value: 8080 Mesh gateways route requests based solely on the SNI header tacked onto the TLS layer. Envoy currently only lets you configure the outbound SNI header at the cluster layer. If you try to failover through a mesh gateway you ideally would configure the SNI value per endpoint, but that's not possible in envoy today. This PR introduces a simpler way around the problem for now: 1. We identify any target of failover that will use mesh gateway mode local or remote and then further isolate any resolver node in the compiled discovery chain that has a failover destination set to one of those targets. 2. For each of these resolvers we will perform a small measurement of comparative healths of the endpoints that come back from the health API for the set of primary target and serial failover targets. We walk the list of targets in order and if any endpoint is healthy we return that target, otherwise we move on to the next target. 3. The CDS and EDS endpoints both perform the measurements in (2) for the affected resolver nodes. 4. For CDS this measurement selects which TLS SNI field to use for the cluster (note the cluster is always going to be named for the primary target) 5. For EDS this measurement selects which set of endpoints will populate the cluster. Priority tiered failover is ignored. One of the big downsides to this approach to failover is that the failover detection and correction is going to be controlled by consul rather than deferring that entirely to the data plane as with the prior version. This also means that we are bound to only failover using official health signals and cannot make use of data plane signals like outlier detection to affect failover. In this specific scenario the lack of data plane signals is ok because the effectiveness is already muted by the fact that the ultimate destination endpoints will have their data plane signals scrambled when they pass through the mesh gateway wrapper anyway so we're not losing much. Another related fix is that we now use the endpoint health from the underlying service, not the health of the gateway (regardless of failover mode).	2019-08-05 13:30:35 -05:00
R.B. Boyer	0165e93517	connect: expose an API endpoint to compile the discovery chain (#6248 ) In addition to exposing compilation over the API cleaned up the structures that would be exchanged to be cleaner and easier to support and understand. Also removed ability to configure the envoy OverprovisioningFactor.	2019-08-02 15:34:54 -05:00
R.B. Boyer	782c647bf4	connect: simplify the compiled discovery chain data structures (#6242 ) This should make them better for sending over RPC or the API. Instead of a chain implemented explicitly like a linked list (nodes holding pointers to other nodes) instead switch to a flat map of named nodes with nodes linking other other nodes by name. The shipped structure is just a map and a string to indicate which key to start from. Other changes: * inline the compiler option InferDefaults as true * introduce compiled target config to avoid needing to send back additional maps of Resolvers; future target-specific compiled state can go here * move compiled MeshGateway out of the Resolver and into the TargetConfig where it makes more sense.	2019-08-01 22:44:05 -05:00
R.B. Boyer	4666599e18	connect: reconcile how upstream configuration works with discovery chains (#6225 ) * connect: reconcile how upstream configuration works with discovery chains The following upstream config fields for connect sidecars sanely integrate into discovery chain resolution: - Destination Namespace/Datacenter: Compilation occurs locally but using different default values for namespaces and datacenters. The xDS clusters that are created are named as they normally would be. - Mesh Gateway Mode (single upstream): If set this value overrides any value computed for any resolver for the entire discovery chain. The xDS clusters that are created may be named differently (see below). - Mesh Gateway Mode (whole sidecar): If set this value overrides any value computed for any resolver for the entire discovery chain. If this is specifically overridden for a single upstream this value is ignored in that case. The xDS clusters that are created may be named differently (see below). - Protocol (in opaque config): If set this value overrides the value computed when evaluating the entire discovery chain. If the normal chain would be TCP or if this override is set to TCP then the result is that we explicitly disable L7 Routing and Splitting. The xDS clusters that are created may be named differently (see below). - Connect Timeout (in opaque config): If set this value overrides the value for any resolver in the entire discovery chain. The xDS clusters that are created may be named differently (see below). If any of the above overrides affect the actual result of compiling the discovery chain (i.e. "tcp" becomes "grpc" instead of being a no-op override to "tcp") then the relevant parameters are hashed and provided to the xDS layer as a prefix for use in naming the Clusters. This is to ensure that if one Upstream discovery chain has no overrides and tangentially needs a cluster named "api.default.XXX", and another Upstream does have overrides for "api.default.XXX" that they won't cross-pollinate against the operator's wishes. Fixes #6159	2019-08-01 22:03:34 -05:00
R.B. Boyer	2bfad66efa	connect: rework how the service resolver subset OnlyPassing flag works (#6173 ) The main change is that we no longer filter service instances by health, preferring instead to render all results down into EDS endpoints in envoy and merely label the endpoints as HEALTHY or UNHEALTHY. When OnlyPassing is set to true we will force consul checks in a 'warning' state to render as UNHEALTHY in envoy. Fixes #6171	2019-07-23 20:20:24 -05:00
Matt Keeler	3914ec5c62	Various Gateway Fixes (#6093 ) * Ensure the mesh gateway configuration comes back in the api within each upstream * Add a test for the MeshGatewayConfig in the ToAPI functions * Ensure we don’t use gateways for dc local connections * Update the svc kind index for deletions * Replace the proxycfg.state cache with an interface for testing Also start implementing proxycfg state testing. * Update the state tests to verify some gateway watches for upstream-targets of a discovery chain.	2019-07-12 17:19:37 -04:00
R.B. Boyer	72a8195839	implement some missing service-router features and add more xDS testing (#6065 ) - also implement OnlyPassing filters for non-gateway clusters	2019-07-12 14:16:21 -05:00
Jack Pearkes	2b1761bab3	Make cluster names SNI always (#6081 ) * Make cluster names SNI always * Update some tests * Ensure we check for prepared query types * Use sni for route cluster names * Proper mesh gateway mode defaulting when the discovery chain is used * Ignore service splits from PatchSliceOfMaps * Update some xds golden files for proper test output * Allow for grpc/http listeners/cluster configs with the disco chain * Update stats expectation	2019-07-08 12:48:48 +01:00
Matt Keeler	35a839952b	Fix Internal.ServiceDump blocking (#6076 ) maxIndexWatchTxn was only watching the IndexEntry of the max index of all the entries. It needed to watch all of them regardless of which was the max. Also plumbed the query source through in the proxy config to help better track requests.	2019-07-04 16:17:49 +01:00
Matt Keeler	e916f2d954	Implement mesh gateway management of service subsets Fixup some error handling	2019-07-02 10:29:37 -04:00
R.B. Boyer	bccbb2b4ae	activate most discovery chain features in xDS for envoy (#6024 )	2019-07-01 22:10:51 -05:00
Matt Keeler	39bb0e3e77	Implement Mesh Gateways This includes both ingress and egress functionality.	2019-07-01 16:28:30 -04:00
Matt Keeler	b13dc6289b	Prepare for having different service kinds that are all generic… (#6013 ) Default to internal error when service kind is unknown	2019-06-24 15:05:36 -04:00
Paul Banks	d6c0557e86	Connect: allow configuring Envoy for L7 Observability (#5558 ) * Add support for HTTP proxy listeners * Add customizable bootstrap configuration options * Debug logging for xDS AuthZ * Add Envoy Integration test suite with basic test coverage * Add envoy command tests to cover new cases * Add tracing integration test * Add gRPC support WIP * Merged changes from master Docker. get CI integration to work with same Dockerfile now * Make docker build optional for integration * Enable integration tests again! * http2 and grpc integration tests and fixes * Fix up command config tests * Store all container logs as artifacts in circle on fail * Add retries to outer part of stats measurements as we keep missing them in CI * Only dump logs on failing cases * Fix typos from code review * Review tidying and make tests pass again * Add debug logs to exec test. * Fix legit test failure caused by upstream rename in envoy config * Attempt to reduce cases of bad TLS handshake in CI integration tests * bring up the right service * Add prometheus integration test * Add test for denied AuthZ both HTTP and TCP * Try ANSI term for Circle	2019-04-29 17:27:57 +01:00
R.B. Boyer	91e78e00c7	fix typos reported by golangci-lint:misspell (#5434 )	2019-03-06 11:13:28 -06:00
Matt Keeler	8e54856c46	Implement prepared query upstreams watching for envoy (#5224 ) Fixes #4969 This implements non-blocking request polling at the cache layer which is currently only used for prepared queries. Additionally this enables the proxycfg manager to poll prepared queries for use in envoy proxy upstreams.	2019-01-18 12:44:04 -05:00
Paul Banks	10af44006a	Proxy Config Manager (#4729 ) * Proxy Config Manager This component watches for local state changes on the agent and ensures that each service registered locally with Kind == connect-proxy has it's state being actively populated in the cache. This serves two purposes: 1. For the built-in proxy, it ensures that the state needed to accept connections is available in RAM shortly after registration and likely before the proxy actually starts accepting traffic. 2. For (future - next PR) xDS server and other possible future proxies that require _push_ based config discovery, this provides a mechanism to subscribe and be notified about updates to a proxy instance's config including upstream service discovery results. * Address review comments * Better comments; Better delivery of latest snapshot for slow watchers; Embed Config * Comment typos * Add upstream Stringer for funsies	2018-10-10 16:55:34 +01:00

1 2 3

104 Commits