Co-authored-by: Ashvitha Sridharan <ashvitha.sridharan@hashicorp.com>
Co-authored-by: Freddy <freddygv@users.noreply.github.com>
Add a new envoy flag: "envoy_hcp_metrics_bind_socket_dir", a directory
where a unix socket will be created with the name
`<namespace>_<proxy_id>.sock` to forward Envoy metrics.
If set, this will configure:
- In bootstrap configuration a local stats_sink and static cluster.
These will forward metrics to a loopback listener sent over xDS.
- A dynamic listener listening at the socket path that the previously
defined static cluster is sending metrics to.
- A dynamic cluster that will forward traffic received at this listener
to the hcp-metrics-collector service.
Reasons for having a static cluster pointing at a dynamic listener:
- We want to secure the metrics stream using TLS, but the stats sink can
only be defined in bootstrap config. With dynamic listeners/clusters
we can use the proxy's leaf certificate issued by the Connect CA,
which isn't available at bootstrap time.
- We want to intelligently route to the HCP collector. Configuring its
addreess at bootstrap time limits our flexibility routing-wise. More
on this below.
Reasons for defining the collector as an upstream in `proxycfg`:
- The HCP collector will be deployed as a mesh service.
- Certificate management is taken care of, as mentioned above.
- Service discovery and routing logic is automatically taken care of,
meaning that no code changes are required in the xds package.
- Custom routing rules can be added for the collector using discovery
chain config entries. Initially the collector is expected to be
deployed to each admin partition, but in the future could be deployed
centrally in the default partition. These config entries could even be
managed by HCP itself.
* Fix issue where terminating gateway service resolvers weren't properly cleaned up
* Add integration test for cleaning up resolvers
* Add changelog entry
* Use state test and drop integration test
Protobuf Refactoring for Multi-Module Cleanliness
This commit includes the following:
Moves all packages that were within proto/ to proto/private
Rewrites imports to account for the packages being moved
Adds in buf.work.yaml to enable buf workspaces
Names the proto-public buf module so that we can override the Go package imports within proto/buf.yaml
Bumps the buf version dependency to 1.14.0 (I was trying out the version to see if it would get around an issue - it didn't but it also doesn't break things and it seemed best to keep up with the toolchain changes)
Why:
In the future we will need to consume other protobuf dependencies such as the Google HTTP annotations for openapi generation or grpc-gateway usage.
There were some recent changes to have our own ratelimiting annotations.
The two combined were not working when I was trying to use them together (attempting to rebase another branch)
Buf workspaces should be the solution to the problem
Buf workspaces means that each module will have generated Go code that embeds proto file names relative to the proto dir and not the top level repo root.
This resulted in proto file name conflicts in the Go global protobuf type registry.
The solution to that was to add in a private/ directory into the path within the proto/ directory.
That then required rewriting all the imports.
Is this safe?
AFAICT yes
The gRPC wire protocol doesn't seem to care about the proto file names (although the Go grpc code does tack on the proto file name as Metadata in the ServiceDesc)
Other than imports, there were no changes to any generated code as a result of this.
Fix local mesh gateway with peering discovery chains.
Prior to this patch, discovery chains with peers would not
properly honor the mesh gateway mode for two reasons.
1. An incorrect target upstream ID was used to lookup the
mesh gateway mode. To fix this, the parent upstream uid is
now used instead of the discovery-chain-target-uid to find
the intended mesh gateway mode.
2. The watch for local mesh gateways was never initialized
for discovery chains. To fix this, the discovery chains are
now scanned, and a local GW watch is spawned if: the mesh
gateway mode is local and the target is a peering connection.
* Fix mesh gateway proxy-defaults not affecting upstreams.
* Clarify distinction with upstream settings
Top-level mesh gateway mode in proxy-defaults and service-defaults gets
merged into NodeService.Proxy.MeshGateway, and only gets merged with
the mode attached to an an upstream in proxycfg/xds.
* Fix mgw mode usage for peered upstreams
There were a couple issues with how mgw mode was being handled for
peered upstreams.
For starters, mesh gateway mode from proxy-defaults
and the top-level of service-defaults gets stored in
NodeService.Proxy.MeshGateway, but the upstream watch for peered data
was only considering the mesh gateway config attached in
NodeService.Proxy.Upstreams[i]. This means that applying a mesh gateway
mode via global proxy-defaults or service-defaults on the downstream
would not have an effect.
Separately, transparent proxy watches for peered upstreams didn't
consider mesh gateway mode at all.
This commit addresses the first issue by ensuring that we overlay the
upstream config for peered upstreams as we do for non-peered. The second
issue is addressed by re-using setupWatchesForPeeredUpstream when
handling transparent proxy updates.
Note that for transparent proxies we do not yet support mesh gateway
mode per upstream, so the NodeService.Proxy.MeshGateway mode is used.
* Fix upstream mesh gateway mode handling in xds
This commit ensures that when determining the mesh gateway mode for
peered upstreams we consider the NodeService.Proxy.MeshGateway config as
a baseline.
In absense of this change, setting a mesh gateway mode via
proxy-defaults or the top-level of service-defaults will not have an
effect for peered upstreams.
* Merge service/proxy defaults in cfg resolver
Previously the mesh gateway mode for connect proxies would be
merged at three points:
1. On servers, in ComputeResolvedServiceConfig.
2. On clients, in MergeServiceConfig.
3. On clients, in proxycfg/xds.
The first merge returns a ServiceConfigResponse where there is a
top-level MeshGateway config from proxy/service-defaults, along with
per-upstream config.
The second merge combines per-upstream config specified at the service
instance with per-upstream config specified centrally.
The third merge combines the NodeService.Proxy.MeshGateway
config containing proxy/service-defaults data with the per-upstream
mode. This third merge is easy to miss, which led to peered upstreams
not considering the mesh gateway mode from proxy-defaults.
This commit removes the third merge, and ensures that all mesh gateway
config is available at the upstream. This way proxycfg/xds do not need
to do additional overlays.
* Ensure that proxy-defaults is considered in wc
Upstream defaults become a synthetic Upstream definition under a
wildcard key "*". Now that proxycfg/xds expect Upstream definitions to
have the final MeshGateway values, this commit ensures that values from
proxy-defaults/service-defaults are the default for this synthetic
upstream.
* Add changelog.
Co-authored-by: freddygv <freddy@hashicorp.com>
Adds a user-configurable rate limiter to proxycfg snapshot delivery,
with a default limit of 250 updates per second.
This addresses a problem observed in our load testing of Consul
Dataplane where updating a "global" resource such as a wildcard
intention or the proxy-defaults config entry could starve the Raft or
Memberlist goroutines of CPU time, causing general cluster instability.
Routing peering control plane traffic through mesh gateways can be
enabled or disabled at runtime with the mesh config entry.
This commit updates proxycfg to add or cancel watches for local servers
depending on this central config.
Note that WAN federation over mesh gateways is determined by a service
metadata flag, and any updates to the gateway service registration will
force the creation of a new snapshot. If enabled, WAN-fed over mesh
gateways will trigger a local server watch on initialize().
Because of this we will only add/remove server watches if WAN federation
over mesh gateways is disabled.
This is the OSS portion of enterprise PR 2489.
This PR introduces a server-local implementation of the
proxycfg.InternalServiceDump interface that sources data from a blocking query
against the server's state store.
For simplicity, it only implements the subset of the Internal.ServiceDump RPC
handler actually used by proxycfg - as such the result type has been changed
to IndexedCheckServiceNodes to avoid confusion.
This is the OSS portion of enterprise PR 2141.
This commit provides a server-local implementation of the `proxycfg.Intentions`
interface that sources data from streaming events.
It adds events for the `service-intentions` config entry type, and then consumes
event streams (via materialized views) for the service's explicit intentions and
any applicable wildcard intentions, merging them into a single list of intentions.
An alternative approach I considered was to consume _all_ intention events (via
`SubjectWildcard`) and filter out the irrelevant ones. This would admittedly
remove some complexity in the `agent/proxycfg-glue` package but at the expense
of considerable overhead from waking potentially many thousands of connect
proxies every time any intention is updated.
This is only configured in xDS when a service with an L7 protocol is
exported.
They also load any relevant trust bundles for the peered services to
eventually use for L7 SPIFFE validation during mTLS termination.
Mesh gateways will now enable tcp connections with SNI names including peering information so that those connections may be proxied.
Note: this does not change the callers to use these mesh gateways.
Envoy's SPIFFE certificate validation extension allows for us to
validate against different root certificates depending on the trust
domain of the dialing proxy.
If there are any trust bundles from peers in the config snapshot then we
use the SPIFFE validator as the validation context, rather than the
usual TrustedCA.
The injected validation config includes the local root certificates as
well.
For mTLS to work between two proxies in peered clusters with different root CAs,
proxies need to configure their outbound listener to use different root certificates
for validation.
Up until peering was introduced proxies would only ever use one set of root certificates
to validate all mesh traffic, both inbound and outbound. Now an upstream proxy
may have a leaf certificate signed by a CA that's different from the dialing proxy's.
This PR makes changes to proxycfg and xds so that the upstream TLS validation
uses different root certificates depending on which cluster is being dialed.
This is the OSS portion of enterprise PRs 1904, 1905, 1906, 1907, 1949,
and 1971.
It replaces the proxycfg manager's direct dependency on the agent cache
with interfaces that will be implemented differently when serving xDS
sessions from a Consul server.
OSS port of enterprise PR 1822
Includes the necessary changes to the `proxycfg` and `xds` packages to enable
Consul servers to configure arbitrary proxies using catalog data.
Broadly, `proxycfg.Manager` now has public methods for registering,
deregistering, and listing registered proxies — the existing local agent
state-sync behavior has been moved into a separate component that makes use of
these methods.
When an xDS session is started for a proxy service in the catalog, a goroutine
will be spawned to watch the service in the server's state store and
re-register it with the `proxycfg.Manager` whenever it is updated (and clean
it up when the client goes away).
OSS portion of enterprise PR 1857.
This removes (most) references to the `cache.UpdateEvent` type in the
`proxycfg` package.
As we're going to be direct usage of the agent cache with interfaces that
can be satisfied by alternative server-local datasources, it doesn't make
sense to depend on this type everywhere anymore (particularly on the
`state.ch` channel).
We also plan to extract `proxycfg` out of Consul into a shared library in
the future, which would require removing this dependency.
Aside from a fairly rote find-and-replace, the main change is that the
`cache.Cache` and `health.Client` types now accept a callback function
parameter, rather than a `chan<- cache.UpdateEvents`. This allows us to
do the type conversion without running another goroutine.
- `tls.incoming`: applies to the inbound mTLS targeting the public
listener on `connect-proxy` and `terminating-gateway` envoy instances
- `tls.outgoing`: applies to the outbound mTLS dialing upstreams from
`connect-proxy` and `ingress-gateway` envoy instances
Fixes#11966
Transparent proxies typically cannot dial upstreams in remote
datacenters. However, if their upstream configures a redirect to a
remote DC then the upstream targets will be in another datacenter.
In that sort of case we should use the WAN address for the passthrough.
Due to timing, a transparent proxy could have two upstreams to dial
directly with the same address.
For example:
- The orders service can dial upstreams shipping and payment directly.
- An instance of shipping at address 10.0.0.1 is deregistered.
- Payments is scaled up and scheduled to have address 10.0.0.1.
- The orders service receives the event for the new payments instance
before seeing the deregistration for the shipping instance. At this
point two upstreams have the same passthrough address and Envoy will
reject the listener configuration.
To disambiguate this commit considers the Raft index when storing
passthrough addresses. In the example above, 10.0.0.1 would only be
associated with the newer payments service instance.
Transparent proxies can set up filter chains that allow direct
connections to upstream service instances. Services that can be dialed
directly are stored in the PassthroughUpstreams map of the proxycfg
snapshot.
Previously these addresses were not being cleaned up based on new
service health data. The list of addresses associated with an upstream
service would only ever grow.
As services scale up and down, eventually they will have instances
assigned to an IP that was previously assigned to a different service.
When IP addresses are duplicated across filter chain match rules the
listener config will be rejected by Envoy.
This commit updates the proxycfg snapshot management so that passthrough
addresses can get cleaned up when no longer associated with a given
upstream.
There is still the possibility of a race condition here where due to
timing an address is shared between multiple passthrough upstreams.
That concern is mitigated by #12195, but will be further addressed
in a follow-up.
set -euo pipefail
unset CDPATH
cd "$(dirname "$0")"
for f in $(git grep '\brequire := require\.New(' | cut -d':' -f1 | sort -u); do
echo "=== require: $f ==="
sed -i '/require := require.New(t)/d' $f
# require.XXX(blah) but not require.XXX(tblah) or require.XXX(rblah)
sed -i 's/\brequire\.\([a-zA-Z0-9_]*\)(\([^tr]\)/require.\1(t,\2/g' $f
# require.XXX(tblah) but not require.XXX(t, blah)
sed -i 's/\brequire\.\([a-zA-Z0-9_]*\)(\(t[^,]\)/require.\1(t,\2/g' $f
# require.XXX(rblah) but not require.XXX(r, blah)
sed -i 's/\brequire\.\([a-zA-Z0-9_]*\)(\(r[^,]\)/require.\1(t,\2/g' $f
gofmt -s -w $f
done
for f in $(git grep '\bassert := assert\.New(' | cut -d':' -f1 | sort -u); do
echo "=== assert: $f ==="
sed -i '/assert := assert.New(t)/d' $f
# assert.XXX(blah) but not assert.XXX(tblah) or assert.XXX(rblah)
sed -i 's/\bassert\.\([a-zA-Z0-9_]*\)(\([^tr]\)/assert.\1(t,\2/g' $f
# assert.XXX(tblah) but not assert.XXX(t, blah)
sed -i 's/\bassert\.\([a-zA-Z0-9_]*\)(\(t[^,]\)/assert.\1(t,\2/g' $f
# assert.XXX(rblah) but not assert.XXX(r, blah)
sed -i 's/\bassert\.\([a-zA-Z0-9_]*\)(\(r[^,]\)/assert.\1(t,\2/g' $f
gofmt -s -w $f
done
The gist here is that now we use a value-type struct proxycfg.UpstreamID
as the map key in ConfigSnapshot maps where we used to use "upstream
id-ish" strings. These are internal only and used just for bidirectional
trips through the agent cache keyspace (like the discovery chain target
struct).
For the few places where the upstream id needs to be projected into xDS,
that's what (proxycfg.UpstreamID).EnvoyID() is for. This lets us ALWAYS
inject the partition and namespace into these things without making
stuff like the golden testdata diverge.
The duo of `makeUpstreamFilterChainForDiscoveryChain` and `makeListenerForDiscoveryChain` were really hard to reason about, and led to concealing a bug in their branching logic. There were several issues here:
- They tried to accomplish too much: determining filter name, cluster name, and whether RDS should be used.
- They embedded logic to handle significantly different kinds of upstream listeners (passthrough, prepared query, typical services, and catch-all)
- They needed to coalesce different data sources (Upstream and CompiledDiscoveryChain)
Rather than handling all of those tasks inside of these functions, this PR pulls out the RDS/clusterName/filterName logic.
This refactor also fixed a bug with the handling of [UpstreamDefaults](https://www.consul.io/docs/connect/config-entries/service-defaults#defaults). These defaults get stored as UpstreamConfig in the proxy snapshot with a DestinationName of "*", since they apply to all upstreams. However, this wildcard destination name must not be used when creating the name of the associated upstream cluster. The coalescing logic in the original functions here was in some situations creating clusters with a `*.` prefix, which is not a valid destination.
This will behave the way we handle SNI and SPIFFE IDs, where the default
partition is excluded.
Excluding the default ensures that don't attempt to compare default.dc2
to dc2 in OSS.
This commit updates mesh gateway watches for cross-partitions
communication.
* Mesh gateways are keyed by partition and datacenter.
* Mesh gateways will now watch gateways in partitions that export
services to their partition.
* Mesh gateways in non-default partitions will not have cross-datacenter
watches. They are not involved in traditional WAN federation.