Commit Graph

116 Commits

Author SHA1 Message Date
freddygv eeccba945d Replace TransparentProxy bool with ProxyMode
This PR replaces the original boolean used to configure transparent
proxy mode. It was replaced with a string mode that can be set to:

- "": Empty string is the default for when the setting should be
defaulted from other configuration like config entries.
- "direct": Direct mode is how applications originally opted into the
mesh. Proxy listeners need to be dialed directly.
- "transparent": Transparent mode enables configuring Envoy as a
transparent proxy. Traffic must be captured and redirected to the
inbound and outbound listeners.

This PR also adds a struct for transparent proxy specific configuration.
Initially this is not stored as a pointer. Will revisit that decision
before GA.
2021-04-12 09:35:14 -06:00
freddygv 0d0205e0dc PR comments 2021-04-08 11:16:03 -06:00
freddygv ddc6c9b7ca Ensure mesh gateway mode override is set for upstreams for intentions 2021-04-07 09:32:48 -06:00
freddygv 619dc5ede4 Finish resolving upstream defaults in proxycfg 2021-04-07 09:32:48 -06:00
R.B. Boyer 82245585c6
connect: add toggle to globally disable wildcard outbound network access when transparent proxy is enabled (#9973)
This adds a new config entry kind "cluster" with a single special name "cluster" where this can be controlled.
2021-04-06 13:19:59 -05:00
freddygv b56bd690aa Fixup enterprise tests from tproxy changes 2021-03-17 23:05:00 -06:00
freddygv 291d7562d1 Cancel watch on all errors 2021-03-17 21:44:14 -06:00
freddygv 6c43195e2a Merge master and fix upstream config protocol defaulting 2021-03-17 21:13:40 -06:00
freddygv 3c7e5c3308 PR comments 2021-03-17 16:18:56 -06:00
freddygv 3c97e5a777 Update proxycfg for transparent proxy 2021-03-17 13:40:39 -06:00
Daniel Nephin 2a53b8293a proxycfg: use rpcclient/health.Client instead of passing around cache name
This should allow us to swap out the implementation with something other
than `agent/cache` without making further code changes.
2021-03-12 11:46:04 -05:00
Daniel Nephin 410b1261c2 proxycfg: Use streaming in connect state 2021-03-12 11:35:42 -05:00
Freddy 5a50b26767
Avoid potential proxycfg/xDS deadlock using non-blocking send 2021-02-08 16:14:06 -07:00
freddygv a417f88e44 Update comments on avoiding proxycfg deadlock 2021-02-08 09:45:45 -07:00
R.B. Boyer 77424e179a
xds: prevent LDS flaps in mesh gateways due to unstable datacenter lists (#9651)
Also fix a similar issue in Terminating Gateways that was masked by an overzealous test.
2021-02-08 10:19:57 -06:00
freddygv 0a8f2f2105 Retry send after timer fires, in case no updates occur 2021-02-05 18:00:59 -07:00
freddygv 57c29aba5d Update proxycfg logging, labels were already attached 2021-02-05 15:14:49 -07:00
freddygv a0be7dcc1d Add trace logs to proxycfg state runner and xds srv 2021-02-02 12:26:38 -07:00
freddygv 0fb96afe31 Avoid potential deadlock using non-blocking send
Deadlock scenario:
    1. Due to scheduling, the state runner sends one snapshot into
    snapCh and then attempts to send a second. The first send succeeds
    because the channel is buffered, but the second blocks.
    2. Separately, Manager.Watch is called by the xDS server after
    getting a discovery request from Envoy. This function acquires the
    manager lock and then blocks on receiving the CurrentSnapshot from
    the state runner.
    3. Separately, there is a Manager goroutine that reads the snapshots
    from the channel in step 1. These reads are done to notify proxy
    watchers, but they require holding the manager lock. This goroutine
    goes to acquire that lock, but can't because it is held by step 2.

Now, the goroutine from step 3 is waiting on the one from step 2 to
release the lock. The goroutine from step 2 won't release the lock until
the goroutine in step 1 advances. But the goroutine in step 1 is waiting
for the one in step 3. Deadlock.

By making this send non-blocking step 1 above can proceed. The coalesce
timer will be reset and a new valid snapshot will be delivered after it
elapses or when one is requested by xDS.
2021-02-02 11:31:14 -07:00
Daniel Nephin ef0999547a testing: skip slow tests with -short
Add a skip condition to all tests slower than 100ms.

This change was made using `gotestsum tool slowest` with data from the
last 3 CI runs of master.
See https://github.com/gotestyourself/gotestsum#finding-and-skipping-slow-tests

With this change:

```
$ time go test -count=1 -short ./agent
ok      github.com/hashicorp/consul/agent       0.743s

real    0m4.791s

$ time go test -count=1 -short ./agent/consul
ok      github.com/hashicorp/consul/agent/consul        4.229s

real    0m8.769s
```
2020-12-07 13:42:55 -05:00
freddygv e0db834148 Fix text type assertion 2020-09-14 16:28:40 -06:00
freddygv 43efb4809c Merge master 2020-09-14 16:17:43 -06:00
freddygv 66e5c5989a Fix type assertion 2020-09-14 16:12:21 -06:00
freddygv 60cb306524 Add session flag to cookie config 2020-09-11 18:34:03 -06:00
freddygv 5871b667a5 Revert EnvoyConfig nesting 2020-09-11 09:21:43 -06:00
freddygv 0c50b8e769 Add explicit protocol overrides in tgw xds test cases 2020-09-03 08:57:48 -06:00
freddygv daad3b9210 Remove LB infix and move injection to xds 2020-09-02 15:13:50 -06:00
freddygv d7bda050e0 Restructure structs and other PR comments 2020-09-02 09:10:50 -06:00
freddygv 194d34b09d Pass LB config to Envoy via xDS 2020-08-28 14:27:40 -06:00
R.B. Boyer f2b8bf109c
xds: use envoy's rbac filter to handle intentions entirely within envoy (#8569) 2020-08-27 12:20:58 -05:00
Matt Keeler 2ec4e46eb2
Default Cache rate limiting options in New
Also get rid of the TestCache helper which was where these defaults were happening previously.
2020-07-28 12:34:35 -04:00
Pierre Souchay 947d8eb039
Added ratelimit to handle throtling cache (#8226)
This implements a solution for #7863

It does:

    Add a new config cache.entry_fetch_rate to limit the number of calls/s for a given cache entry, default value = rate.Inf
    Add cache.entry_fetch_max_burst size of rate limit (default value = 2)

The new configuration now supports the following syntax for instance to allow 1 query every 3s:

    command line HCL: -hcl 'cache = { entry_fetch_rate = 0.333}'
    in JSON

{
  "cache": {
    "entry_fetch_rate": 0.333
  }
}
2020-07-27 23:11:11 +02:00
Matt Keeler 6d94900cd7
Disable background cache refresh for Connect Leaf Certs
The rationale behind removing them is that all of our own code (xDS, builtin connect proxy) use the cache notification mechanism. This ensures that the blocking fetch behind the scenes is always executing. Therefore the only way you might go to get a certificate and have to wait is when 1) the request has never been made for that cert before or 2) you are using the v1/agent/connect/ca/leaf API for retrieving the cert yourself.

In the first case, the refresh change doesn’t alter the behavior. In the second case, it can be mitigated by using blocking queries with that API which just like normal cache notification mechanism will cause the blocking fetch to be initiated and to get leaf certs as soon as needed.

If you are not using blocking queries, or Envoy/xDS, or the builtin connect proxy but are retrieving the certs yourself then the HTTP endpoint might take a little longer to respond.

This also renames the RefreshTimeout field on the register options to QueryTimeout to more accurately reflect that it is used for any type that supports blocking queries.
2020-07-21 12:19:25 -04:00
Daniel Nephin 07c1081d39 Fix a bunch of unparam lint issues 2020-06-24 13:00:14 -04:00
Freddy 7e7c783c8f
Always return a gateway cluster (#8158) 2020-06-19 13:31:39 -06:00
Daniel Nephin 1ef8279ac9
Merge pull request #8034 from hashicorp/dnephin/add-linter-staticcheck-4
ci: enable SA4006 staticcheck check and add ineffassign
2020-06-17 12:16:02 -04:00
Daniel Nephin 89d95561df Enable gofmt simplify
Code changes done automatically with 'gofmt -s -w'
2020-06-16 13:21:11 -04:00
Daniel Nephin 5f24171f13 ci: enable SA4006 staticcheck check
And fix the 'value not used' issues.

Many of these are not bugs, but a few are tests not checking errors, and
one appears to be a missed error in non-test code.
2020-06-16 13:10:11 -04:00
freddygv 1e7e716742 Move compound service names to use ServiceName type 2020-06-12 13:47:43 -06:00
Freddy 66e2def461
Only pass one hostname via EDS and prefer healthy ones (#8084)
Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>

Currently when passing hostname clusters to Envoy, we set each service instance registered with Consul as an LbEndpoint for the cluster.

However, Envoy can only handle one per cluster:
[2020-06-04 18:32:34.094][1][warning][config] [source/common/config/grpc_subscription_impl.cc:87] gRPC config for type.googleapis.com/envoy.api.v2.Cluster rejected: Error adding/updating cluster(s) dc2.internal.ddd90499-9b47-91c5-4616-c0cbf0fc358a.consul: LOGICAL_DNS clusters must have a single locality_lb_endpoint and a single lb_endpoint, server.dc2.consul: LOGICAL_DNS clusters must have a single locality_lb_endpoint and a single lb_endpoint

Envoy is currently handling this gracefully by only picking one of the endpoints. However, we should avoid passing multiple to avoid these warning logs.

This PR:

* Ensures we only pass one endpoint, which is tied to one service instance.
* We prefer sending an endpoint which is marked as Healthy by Consul.
* If no endpoints are healthy we emit a warning and skip the cluster.
* If multiple unique hostnames are spread across service instances we emit a warning and let the user know which will be resolved.
2020-06-12 13:46:17 -06:00
Freddy f759a48726
Enable gateways to resolve hostnames to IPv4 addresses (#7999)
The DNS resolution will be handled by Envoy and defaults to LOGICAL_DNS. This discovery type can be overridden on a per-gateway basis with the envoy_dns_discovery_type Gateway Option.

If a service contains an instance with a hostname as an address we set the Envoy cluster to use DNS as the discovery type rather than EDS. Since both mesh gateways and terminating gateways route to clusters using SNI, whenever there is a mix of hostnames and IP addresses associated with a service we use the hostname + CDS rather than the IPs + EDS.

Note that we detect hostnames by attempting to parse the service instance's address as an IP. If it is not a valid IP we assume it is a hostname.
2020-06-03 15:28:45 -06:00
Daniel Nephin ea6c2b2adc ci: Add staticcheck and fix most errors
Three of the checks are temporarily disabled to limit the size of the
diff, and allow us to enable all the other checks in CI.

In a follow up we can fix the issues reported by the other checks one
at a time, and enable them.
2020-05-28 11:59:58 -04:00
Kyle Havlovitz 5aefdea1a8
Standardize support for Tagged and BindAddresses in Ingress Gateways (#7924)
* Standardize support for Tagged and BindAddresses in Ingress Gateways

This updates the TaggedAddresses and BindAddresses behavior for Ingress
to match Mesh/Terminating gateways. The `consul connect envoy` command
now also allows passing an address without a port for tagged/bind
addresses.

* Update command/connect/envoy/envoy.go

Co-authored-by: Freddy <freddygv@users.noreply.github.com>

* PR comments

* Check to see if address is an actual IP address

* Update agent/xds/listeners.go

Co-authored-by: Freddy <freddygv@users.noreply.github.com>

* fix whitespace

Co-authored-by: Chris Piraino <cpiraino@hashicorp.com>
Co-authored-by: Freddy <freddygv@users.noreply.github.com>
2020-05-21 09:08:12 -05:00
Chris Piraino 0995906b24 Add service id context to the proxycfg logger
This is especially useful when multiple proxies are all querying the
same Consul agent.
2020-05-18 09:08:05 -05:00
Kyle Havlovitz 28b4819882
Merge pull request #7759 from hashicorp/ingress/tls-hosts
Add TLS option for Ingress Gateway listeners
2020-05-11 09:18:43 -07:00
Chris Piraino 3931015a90 Remove development log line 2020-05-08 20:24:18 -07:00
Chris Piraino a500262a77 Compute all valid DNSSANs for ingress gateways
For DNSSANs we take into account the following and compute the
appropriate wildcard values:
- source datacenter
- namespaces
- alt domains
2020-05-08 20:23:17 -07:00
Freddy a37d7a42c9
Fix up enterprise compatibility for gateways (#7813) 2020-05-08 09:44:34 -06:00
Chris Piraino 964e55e45e Cleanup proxycfg for TLS
- Use correct enterprise metadata for finding config entry
- nil out cancel functions on config snapshot copy
- Look at HostsSet when checking validity
2020-05-07 10:22:57 -05:00
Freddy a749f46316
Remove timeout and call to Fatal from goroutine (#7797) 2020-05-06 14:33:17 -06:00