- `tls.incoming`: applies to the inbound mTLS targeting the public
listener on `connect-proxy` and `terminating-gateway` envoy instances
- `tls.outgoing`: applies to the outbound mTLS dialing upstreams from
`connect-proxy` and `ingress-gateway` envoy instances
Fixes#11966
Transparent proxies typically cannot dial upstreams in remote
datacenters. However, if their upstream configures a redirect to a
remote DC then the upstream targets will be in another datacenter.
In that sort of case we should use the WAN address for the passthrough.
Due to timing, a transparent proxy could have two upstreams to dial
directly with the same address.
For example:
- The orders service can dial upstreams shipping and payment directly.
- An instance of shipping at address 10.0.0.1 is deregistered.
- Payments is scaled up and scheduled to have address 10.0.0.1.
- The orders service receives the event for the new payments instance
before seeing the deregistration for the shipping instance. At this
point two upstreams have the same passthrough address and Envoy will
reject the listener configuration.
To disambiguate this commit considers the Raft index when storing
passthrough addresses. In the example above, 10.0.0.1 would only be
associated with the newer payments service instance.
Transparent proxies can set up filter chains that allow direct
connections to upstream service instances. Services that can be dialed
directly are stored in the PassthroughUpstreams map of the proxycfg
snapshot.
Previously these addresses were not being cleaned up based on new
service health data. The list of addresses associated with an upstream
service would only ever grow.
As services scale up and down, eventually they will have instances
assigned to an IP that was previously assigned to a different service.
When IP addresses are duplicated across filter chain match rules the
listener config will be rejected by Envoy.
This commit updates the proxycfg snapshot management so that passthrough
addresses can get cleaned up when no longer associated with a given
upstream.
There is still the possibility of a race condition here where due to
timing an address is shared between multiple passthrough upstreams.
That concern is mitigated by #12195, but will be further addressed
in a follow-up.
set -euo pipefail
unset CDPATH
cd "$(dirname "$0")"
for f in $(git grep '\brequire := require\.New(' | cut -d':' -f1 | sort -u); do
echo "=== require: $f ==="
sed -i '/require := require.New(t)/d' $f
# require.XXX(blah) but not require.XXX(tblah) or require.XXX(rblah)
sed -i 's/\brequire\.\([a-zA-Z0-9_]*\)(\([^tr]\)/require.\1(t,\2/g' $f
# require.XXX(tblah) but not require.XXX(t, blah)
sed -i 's/\brequire\.\([a-zA-Z0-9_]*\)(\(t[^,]\)/require.\1(t,\2/g' $f
# require.XXX(rblah) but not require.XXX(r, blah)
sed -i 's/\brequire\.\([a-zA-Z0-9_]*\)(\(r[^,]\)/require.\1(t,\2/g' $f
gofmt -s -w $f
done
for f in $(git grep '\bassert := assert\.New(' | cut -d':' -f1 | sort -u); do
echo "=== assert: $f ==="
sed -i '/assert := assert.New(t)/d' $f
# assert.XXX(blah) but not assert.XXX(tblah) or assert.XXX(rblah)
sed -i 's/\bassert\.\([a-zA-Z0-9_]*\)(\([^tr]\)/assert.\1(t,\2/g' $f
# assert.XXX(tblah) but not assert.XXX(t, blah)
sed -i 's/\bassert\.\([a-zA-Z0-9_]*\)(\(t[^,]\)/assert.\1(t,\2/g' $f
# assert.XXX(rblah) but not assert.XXX(r, blah)
sed -i 's/\bassert\.\([a-zA-Z0-9_]*\)(\(r[^,]\)/assert.\1(t,\2/g' $f
gofmt -s -w $f
done
The gist here is that now we use a value-type struct proxycfg.UpstreamID
as the map key in ConfigSnapshot maps where we used to use "upstream
id-ish" strings. These are internal only and used just for bidirectional
trips through the agent cache keyspace (like the discovery chain target
struct).
For the few places where the upstream id needs to be projected into xDS,
that's what (proxycfg.UpstreamID).EnvoyID() is for. This lets us ALWAYS
inject the partition and namespace into these things without making
stuff like the golden testdata diverge.
The duo of `makeUpstreamFilterChainForDiscoveryChain` and `makeListenerForDiscoveryChain` were really hard to reason about, and led to concealing a bug in their branching logic. There were several issues here:
- They tried to accomplish too much: determining filter name, cluster name, and whether RDS should be used.
- They embedded logic to handle significantly different kinds of upstream listeners (passthrough, prepared query, typical services, and catch-all)
- They needed to coalesce different data sources (Upstream and CompiledDiscoveryChain)
Rather than handling all of those tasks inside of these functions, this PR pulls out the RDS/clusterName/filterName logic.
This refactor also fixed a bug with the handling of [UpstreamDefaults](https://www.consul.io/docs/connect/config-entries/service-defaults#defaults). These defaults get stored as UpstreamConfig in the proxy snapshot with a DestinationName of "*", since they apply to all upstreams. However, this wildcard destination name must not be used when creating the name of the associated upstream cluster. The coalescing logic in the original functions here was in some situations creating clusters with a `*.` prefix, which is not a valid destination.
This will behave the way we handle SNI and SPIFFE IDs, where the default
partition is excluded.
Excluding the default ensures that don't attempt to compare default.dc2
to dc2 in OSS.
This commit updates mesh gateway watches for cross-partitions
communication.
* Mesh gateways are keyed by partition and datacenter.
* Mesh gateways will now watch gateways in partitions that export
services to their partition.
* Mesh gateways in non-default partitions will not have cross-datacenter
watches. They are not involved in traditional WAN federation.
Previously the datacenter of the gateway was the key identifier, now it
is the datacenter and partition.
When dialing services in other partitions or datacenters we now watch
the appropriate partition.
This commit extracts all the kind-specific logic into handler types, and
keeps the generic parts on the state struct. This change should make it
easier to add new kinds, and see the implementation of each kind more
clearly.
context.Context should never be stored on a struct (as it says in the godoc) because it is easy to
to end up with the wrong context when it is stored.
Also see https://blog.golang.org/context-and-structs
This change is also in preparation for splitting state into kind-specific handlers so that the
implementation of each kind is grouped together.
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
Previously we would associate the address of a discovery chain target
with the discovery chain's filter chain. This was broken for a few reasons:
- If the upstream is a virtual service, the client proxy has no way of
dialing it because virtual services are not targets of their discovery
chains. The targets are distinct services. This is addressed by watching
the endpoints of all upstream services, not just their discovery chain
targets.
- If multiple discovery chains resolve to the same target, that would
lead to multiple filter chains attempting to match on the target's
virtual IP. This is addressed by only matching on the upstream's virtual
IP.
NOTE: this implementation requires an intention to the redirecting
virtual service and not just to the final destination. This is how
we can know that the virtual service is an upstream to watch.
A later PR will look into traversing discovery chains when computing
upstreams so that intentions are only required to the discovery chain
targets.
No config entry needs a Kind field. It is only used to determine the Go type to
target. As we introduce new config entries (like this one) we can remove the kind field
and have the GetKind method return the single supported value.
In this case (similar to proxy-defaults) the Name field is also unnecessary. We always
use the same value. So we can omit the name field entirely.
This config entry is being renamed primarily because in k8s the name
cluster could be confusing given that the config entry applies across
federated datacenters.
Additionally, this config entry will only apply to Consul as a service
mesh, so the more generic "cluster" name is not needed.
This PR replaces the original boolean used to configure transparent
proxy mode. It was replaced with a string mode that can be set to:
- "": Empty string is the default for when the setting should be
defaulted from other configuration like config entries.
- "direct": Direct mode is how applications originally opted into the
mesh. Proxy listeners need to be dialed directly.
- "transparent": Transparent mode enables configuring Envoy as a
transparent proxy. Traffic must be captured and redirected to the
inbound and outbound listeners.
This PR also adds a struct for transparent proxy specific configuration.
Initially this is not stored as a pointer. Will revisit that decision
before GA.