* remove legacy tokens
* remove lingering legacy token references from docs
* update language and naming for token secrets and accessor IDs
* updates all tokenID references to clarify accessorID
* remove token type references and lookup tokens by accessorID index
* remove unnecessary constants
* replace additional tokenID param names
* Add warning info for deprecated -id parameter
Co-authored-by: Paul Glass <pglass@hashicorp.com>
* Update field comment
Co-authored-by: Paul Glass <pglass@hashicorp.com>
---------
Co-authored-by: Paul Glass <pglass@hashicorp.com>
Ensure nothing in the troubleshoot go module depends on consul's top level module. This is so we can import troubleshoot into consul-k8s and not import all of consul.
* turns troubleshoot into a go module [authored by @curtbushko]
* gets the envoy protos into the troubleshoot module [authored by @curtbushko]
* adds a new go module `envoyextensions` which has xdscommon and extensioncommon folders that both the xds package and the troubleshoot package can import
* adds testing and linting for the new go modules
* moves the unit tests in `troubleshoot/validateupstream` that depend on proxycfg/xds into the xds package, with a comment describing why those tests cannot be in the troubleshoot package
* fixes all the imports everywhere as a result of these changes
Co-authored-by: Curt Bushko <cbushko@gmail.com>
* Add Peer field to service-defaults upstream overrides.
* add api changes, compat mode for service default overrides
* Fixes based on testing
---------
Co-authored-by: DanStough <dan.stough@hashicorp.com>
* Add Tproxy support to Envoy Extensions (this is needed for service to service validation)
* Add validation for Envoy configuration for an upstream service
* Use both /config_dump and /cluster to validate Envoy configuration
This is because of a bug in Envoy where the EndpointsConfigDump does not
include a cluster_name, making it impossible to match an endpoint to
verify it exists.
This removes endpoints support for builtin extensions since only the
validate plugin was using it, and it is no longer used. It also removes
test cases for endpoint validation. Endpoints validation now only occurs
in the top level test from config_dump and clusters json files.
Co-authored-by: Eric <eric@haberkorn.co>
* updated builtin extension to parse region directly from ARN
- added a unit test
- added some comments/light refactoring
* updated golden files with proper ARNs
- ARNs need to be right format now that they are being processed
* updated tests and integration tests
- removed 'region' from all EnvoyExtension arguments
- added properly formatted ARN which includes the same region found in the removed "Region" field: 'us-east-1'
Fix configuration merging for implicit tproxy upstreams.
Change the merging logic so that the wildcard upstream has correct proxy-defaults
and service-defaults values combined into it. It did not previously merge all fields,
and the wildcard upstream did not exist unless service-defaults existed (it ignored
proxy-defaults, essentially).
Change the way we fetch upstream configuration in the xDS layer so that it falls back
to the wildcard when no matching upstream is found. This is what allows implicit peer
upstreams to have the correct "merged" config.
Change proxycfg to always watch local mesh gateway endpoints whenever a peer upstream
is found. This simplifies the logic so that we do not have to inspect the "merged"
configuration on peer upstreams to extract the mesh gateway mode.
Previously, we'd begin a session with the xDS concurrency limiter
regardless of whether the proxy was registered in the catalog or in
the server's local agent state.
This caused problems for users who run `consul connect envoy` directly
against a server rather than a client agent, as the server's locally
registered proxies wouldn't be included in the limiter's capacity.
Now, the `ConfigSource` is responsible for beginning the session and we
only do so for services in the catalog.
Fixes: https://github.com/hashicorp/consul/issues/15753
* Protobuf Modernization
Remove direct usage of golang/protobuf in favor of google.golang.org/protobuf
Marshallers (protobuf and json) needed some changes to account for different APIs.
Moved to using the google.golang.org/protobuf/types/known/* for the well known types including replacing some custom Struct manipulation with whats available in the structpb well known type package.
This also updates our devtools script to install protoc-gen-go from the right location so that files it generates conform to the correct interfaces.
* Fix go-mod-tidy make target to work on all modules
This gets the extensions information for the local service into the snapshot and ExtensionConfigurations for a proxy. It grabs the extensions from config entries and puts them in structs.NodeService.Proxy field, which already is copied into the config snapshot.
Also:
* add EnvoyExtensions to api.AgentService so that it matches structs.NodeService
* extensions: refactor PluginConfiguration into a more generic type
ExtensionConfiguration
Also:
* adds endpoints configuration to lambda golden tests
* uses string constant for builtin/aws/lambda
Co-authored-by: Eric <eric@haberkorn.co>
* add functions for returning the max and min Envoy major versions
- added an UnsupportedEnvoyVersions list
- removed an unused error from TestDetermineSupportedProxyFeaturesFromString
- modified minSupportedVersion to use the function for getting the Min Envoy major version. Using just the major version without the patch is equivalent to using `.0`
* added a function for executing the envoy --version command
- added a new exec.go file to not be locked to unix system
* added envoy version check when using consul connect envoy
* added changelog entry
* added docs change
* feat(ingress-gateway): support outlier detection of upstream service for ingress gateway
* changelog
Co-authored-by: Eric Haberkorn <erichaberkorn@gmail.com>
* integ-test: fix flaky test - case-cfg-splitter-peering-ingress-gateways
* add retry peering to all peering cases
Co-authored-by: Dan Stough <dan.stough@hashicorp.com>
Fix local mesh gateway with peering discovery chains.
Prior to this patch, discovery chains with peers would not
properly honor the mesh gateway mode for two reasons.
1. An incorrect target upstream ID was used to lookup the
mesh gateway mode. To fix this, the parent upstream uid is
now used instead of the discovery-chain-target-uid to find
the intended mesh gateway mode.
2. The watch for local mesh gateways was never initialized
for discovery chains. To fix this, the discovery chains are
now scanned, and a local GW watch is spawned if: the mesh
gateway mode is local and the target is a peering connection.
* update go version to 1.18 for api and sdk, go mod tidy
* removes ioutil usage everywhere which was deprecated in go1.16 in favour of io and os packages. Also introduces a lint rule which forbids use of ioutil going forward.
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
* Fix mesh gateway proxy-defaults not affecting upstreams.
* Clarify distinction with upstream settings
Top-level mesh gateway mode in proxy-defaults and service-defaults gets
merged into NodeService.Proxy.MeshGateway, and only gets merged with
the mode attached to an an upstream in proxycfg/xds.
* Fix mgw mode usage for peered upstreams
There were a couple issues with how mgw mode was being handled for
peered upstreams.
For starters, mesh gateway mode from proxy-defaults
and the top-level of service-defaults gets stored in
NodeService.Proxy.MeshGateway, but the upstream watch for peered data
was only considering the mesh gateway config attached in
NodeService.Proxy.Upstreams[i]. This means that applying a mesh gateway
mode via global proxy-defaults or service-defaults on the downstream
would not have an effect.
Separately, transparent proxy watches for peered upstreams didn't
consider mesh gateway mode at all.
This commit addresses the first issue by ensuring that we overlay the
upstream config for peered upstreams as we do for non-peered. The second
issue is addressed by re-using setupWatchesForPeeredUpstream when
handling transparent proxy updates.
Note that for transparent proxies we do not yet support mesh gateway
mode per upstream, so the NodeService.Proxy.MeshGateway mode is used.
* Fix upstream mesh gateway mode handling in xds
This commit ensures that when determining the mesh gateway mode for
peered upstreams we consider the NodeService.Proxy.MeshGateway config as
a baseline.
In absense of this change, setting a mesh gateway mode via
proxy-defaults or the top-level of service-defaults will not have an
effect for peered upstreams.
* Merge service/proxy defaults in cfg resolver
Previously the mesh gateway mode for connect proxies would be
merged at three points:
1. On servers, in ComputeResolvedServiceConfig.
2. On clients, in MergeServiceConfig.
3. On clients, in proxycfg/xds.
The first merge returns a ServiceConfigResponse where there is a
top-level MeshGateway config from proxy/service-defaults, along with
per-upstream config.
The second merge combines per-upstream config specified at the service
instance with per-upstream config specified centrally.
The third merge combines the NodeService.Proxy.MeshGateway
config containing proxy/service-defaults data with the per-upstream
mode. This third merge is easy to miss, which led to peered upstreams
not considering the mesh gateway mode from proxy-defaults.
This commit removes the third merge, and ensures that all mesh gateway
config is available at the upstream. This way proxycfg/xds do not need
to do additional overlays.
* Ensure that proxy-defaults is considered in wc
Upstream defaults become a synthetic Upstream definition under a
wildcard key "*". Now that proxycfg/xds expect Upstream definitions to
have the final MeshGateway values, this commit ensures that values from
proxy-defaults/service-defaults are the default for this synthetic
upstream.
* Add changelog.
Co-authored-by: freddygv <freddy@hashicorp.com>
* Regenerate golden files.
* Backport from ENT: "Avoid race"
Original commit: 5006c8c858b0e332be95271ef9ba35122453315b
Original author: freddygv
* Backport from ENT: "chore: fix flake peerstream test"
Original commit: b74097e7135eca48cc289798c5739f9ef72c0cc8
Original author: DanStough
* ingress-gateways: don't log error when registering gateway
Previously, when an ingress gateway was registered without a
corresponding ingress gateway config entry, an error was logged
because the watch on the config entry returned a nil result.
This is expected so don't log an error.
* Configure Envoy alpn_protocols based on service protocol
* define alpnProtocols in a more standard way
* http2 protocol should be h2 only
* formatting
* add test for getAlpnProtocol()
* create changelog entry
* change scope is connect-proxy
* ignore errors on ParseProxyConfig; fixes linter
* add tests for grpc and http2 public listeners
* remove newlines from PR
* Add alpn_protocol configuration for ingress gateway
* Guard against nil tlsContext
* add ingress gateway w/ TLS tests for gRPC and HTTP2
* getAlpnProtocols: add TCP protocol test
* add tests for ingress gateway with grpc/http2 and per-listener TLS config
* add tests for ingress gateway with grpc/http2 and per-listener TLS config
* add Gateway level TLS config with mixed protocol listeners to validate ALPN
* update changelog to include ingress-gateway
* add http/1.1 to http2 ALPN
* go fmt
* fix test on custom-trace-listener
This commit adds the xDS resources needed for INBOUND traffic from peer
clusters:
- 1 filter chain for all inbound peering requests.
- 1 cluster for all inbound peering requests.
- 1 endpoint per voting server with the gRPC TLS port configured.
There is one filter chain and cluster because unlike with WAN
federation, peer clusters will not attempt to dial individual servers.
Peer clusters will only dial the local mesh gateway addresses.
* feat(ingress gateway: support configuring limits in ingress-gateway config entry
- a new Defaults field with max_connections, max_pending_connections, max_requests
is added to ingress gateway config entry
- new field max_connections, max_pending_connections, max_requests in
individual services to overwrite the value in Default
- added unit test and integration test
- updated doc
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
Co-authored-by: Dan Stough <dan.stough@hashicorp.com>
Routing peering control plane traffic through mesh gateways can be
enabled or disabled at runtime with the mesh config entry.
This commit updates proxycfg to add or cancel watches for local servers
depending on this central config.
Note that WAN federation over mesh gateways is determined by a service
metadata flag, and any updates to the gateway service registration will
force the creation of a new snapshot. If enabled, WAN-fed over mesh
gateways will trigger a local server watch on initialize().
Because of this we will only add/remove server watches if WAN federation
over mesh gateways is disabled.
Prior to #13244, connect proxies and gateways could only be configured by an
xDS session served by the local client agent.
In an upcoming release, it will be possible to deploy a Consul service mesh
without client agents. In this model, xDS sessions will be handled by the
servers themselves, which necessitates load-balancing to prevent a single
server from receiving a disproportionate amount of load and becoming
overwhelmed.
This introduces a simple form of load-balancing where Consul will attempt to
achieve an even spread of load (xDS sessions) between all healthy servers.
It does so by implementing a concurrent session limiter (limiter.SessionLimiter)
and adjusting the limit according to autopilot state and proxy service
registrations in the catalog.
If a server is already over capacity (i.e. the session limit is lowered),
Consul will begin draining sessions to rebalance the load. This will result
in the client receiving a `RESOURCE_EXHAUSTED` status code. It is the client's
responsibility to observe this response and reconnect to a different server.
Users of the gRPC client connection brokered by the
consul-server-connection-manager library will get this for free.
The rate at which Consul will drain sessions to rebalance load is scaled
dynamically based on the number of proxies in the catalog.
* draft commit
* add changelog, update test
* remove extra param
* fix test
* update type to account for nil value
* add test for custom passive health check
* update comments and tests
* update description in docs
* fix missing commas
This is the OSS portion of enterprise PR 2339.
It improves our handling of "irrecoverable" errors in proxycfg data sources.
The canonical example of this is what happens when the ACL token presented by
Envoy is deleted/revoked. Previously, the stream would get "stuck" until the
xDS server re-checked the token (after 5 minutes) and terminated the stream.
Materializers would also sit burning resources retrying something that could
never succeed.
Now, it is possible for data sources to mark errors as "terminal" which causes
the xDS stream to be closed immediately. Similarly, the submatview.Store will
evict materializers when it observes they have encountered such an error.
* add golden files
* add support to http in tgateway egress destination
* fix slice sorting to include both address and port when using server_names
* fix listener loop for http destination
* fix routes to generate a route per port and a virtualhost per port-address combination
* sort virtual hosts list to have a stable order
* extract redundant serviceNode
Now that peered upstreams can generate envoy resources (#13758), we need a way to disambiguate local from peered resources in our metrics. The key difference is that datacenter and partition will be replaced with peer, since in the context of peered resources partition is ambiguous (could refer to the partition in a remote cluster or one that exists locally). The partition and datacenter of the proxy will always be that of the source service.
Regexes were updated to make emitting datacenter and partition labels mutually exclusive with peer labels.
Listener filter names were updated to better match the existing regex.
Cluster names assigned to peered upstreams were updated to be synthesized from local peer name (it previously used the externally provided primary SNI, which contained the peer name from the other side of the peering). Integration tests were updated to assert for the new peer labels.
Peered upstreams has a separate loop in xds from discovery chain upstreams. This PR adds similar but slightly modified code to add filters for peered upstream listeners, clusters, and endpoints in the case of transparent proxy.
Previously, public referred to gRPC services that are both exposed on
the dedicated gRPC port and have their definitions in the proto-public
directory (so were considered usable by 3rd parties). Whereas private
referred to services on the multiplexed server port that are only usable
by agents and other servers.
Now, we're splitting these definitions, such that external/internal
refers to the port and public/private refers to whether they can be used
by 3rd parties.
This is necessary because the peering replication API needs to be
exposed on the dedicated port, but is not (yet) suitable for use by 3rd
parties.
Because peerings are pairwise, between two tuples of (datacenter,
partition) having any exported reference via a discovery chain that
crosses out of the peered datacenter or partition will ultimately not be
able to work for various reasons. The biggest one is that there is no
way in the ultimate destination to configure an intention that can allow
an external SpiffeID to access a service.
This PR ensures that a user simply cannot do this, so they won't run
into weird situations like this.
When the protocol is http-like, and an intention has a peered source
then the normal RBAC mTLS SAN field check is replaces with a joint combo
of:
mTLS SAN field must be the service's local mesh gateway leaf cert
AND
the first XFCC header (from the MGW) must have a URI field that matches the original intention source
Also:
- Update the regex program limit to be much higher than the teeny
defaults, since the RBAC regex constructions are more complicated now.
- Fix a few stray panics in xds generation.
This is only configured in xDS when a service with an L7 protocol is
exported.
They also load any relevant trust bundles for the peered services to
eventually use for L7 SPIFFE validation during mTLS termination.
When converting from Consul intentions to xds RBAC rules, services imported from other peers must encode additional data like partition (from the remote cluster) and trust domain.
This PR updates the PeeringTrustBundle to hold the sending side's local partition as ExportedPartition. It also updates RBAC code to encode SpiffeIDs of imported services with the ExportedPartition and TrustDomain.
Mesh gateways can use hostnames in their tagged addresses (#7999). This is useful
if you were to expose a mesh gateway using a cloud networking load balancer appliance
that gives you a DNS name but no reliable static IPs.
Envoy cannot accept hostnames via EDS and those must be configured using CDS.
There was already logic when configuring gateways in other locations in the code, but
given the illusions in play for peering the downstream of a peered service wasn't aware
that it should be doing that.
Also:
- ensuring that we always try to use wan-like addresses to cross peer boundaries.
Mesh gateways will now enable tcp connections with SNI names including peering information so that those connections may be proxied.
Note: this does not change the callers to use these mesh gateways.
This is the OSS portion of enterprise PR 1994
Rather than directly interrogating the agent-local state for HTTP
checks using the `HTTPCheckFetcher` interface, we now rely on the
config snapshot containing the checks.
This reduces the number of changes required to support server xDS
sessions.
It's not clear why the fetching approach was introduced in
931d167ebb2300839b218d08871f22323c60175d.
Envoy's SPIFFE certificate validation extension allows for us to
validate against different root certificates depending on the trust
domain of the dialing proxy.
If there are any trust bundles from peers in the config snapshot then we
use the SPIFFE validator as the validation context, rather than the
usual TrustedCA.
The injected validation config includes the local root certificates as
well.
For mTLS to work between two proxies in peered clusters with different root CAs,
proxies need to configure their outbound listener to use different root certificates
for validation.
Up until peering was introduced proxies would only ever use one set of root certificates
to validate all mesh traffic, both inbound and outbound. Now an upstream proxy
may have a leaf certificate signed by a CA that's different from the dialing proxy's.
This PR makes changes to proxycfg and xds so that the upstream TLS validation
uses different root certificates depending on which cluster is being dialed.