* Fix mesh gateway proxy-defaults not affecting upstreams.
* Clarify distinction with upstream settings
Top-level mesh gateway mode in proxy-defaults and service-defaults gets
merged into NodeService.Proxy.MeshGateway, and only gets merged with
the mode attached to an an upstream in proxycfg/xds.
* Fix mgw mode usage for peered upstreams
There were a couple issues with how mgw mode was being handled for
peered upstreams.
For starters, mesh gateway mode from proxy-defaults
and the top-level of service-defaults gets stored in
NodeService.Proxy.MeshGateway, but the upstream watch for peered data
was only considering the mesh gateway config attached in
NodeService.Proxy.Upstreams[i]. This means that applying a mesh gateway
mode via global proxy-defaults or service-defaults on the downstream
would not have an effect.
Separately, transparent proxy watches for peered upstreams didn't
consider mesh gateway mode at all.
This commit addresses the first issue by ensuring that we overlay the
upstream config for peered upstreams as we do for non-peered. The second
issue is addressed by re-using setupWatchesForPeeredUpstream when
handling transparent proxy updates.
Note that for transparent proxies we do not yet support mesh gateway
mode per upstream, so the NodeService.Proxy.MeshGateway mode is used.
* Fix upstream mesh gateway mode handling in xds
This commit ensures that when determining the mesh gateway mode for
peered upstreams we consider the NodeService.Proxy.MeshGateway config as
a baseline.
In absense of this change, setting a mesh gateway mode via
proxy-defaults or the top-level of service-defaults will not have an
effect for peered upstreams.
* Merge service/proxy defaults in cfg resolver
Previously the mesh gateway mode for connect proxies would be
merged at three points:
1. On servers, in ComputeResolvedServiceConfig.
2. On clients, in MergeServiceConfig.
3. On clients, in proxycfg/xds.
The first merge returns a ServiceConfigResponse where there is a
top-level MeshGateway config from proxy/service-defaults, along with
per-upstream config.
The second merge combines per-upstream config specified at the service
instance with per-upstream config specified centrally.
The third merge combines the NodeService.Proxy.MeshGateway
config containing proxy/service-defaults data with the per-upstream
mode. This third merge is easy to miss, which led to peered upstreams
not considering the mesh gateway mode from proxy-defaults.
This commit removes the third merge, and ensures that all mesh gateway
config is available at the upstream. This way proxycfg/xds do not need
to do additional overlays.
* Ensure that proxy-defaults is considered in wc
Upstream defaults become a synthetic Upstream definition under a
wildcard key "*". Now that proxycfg/xds expect Upstream definitions to
have the final MeshGateway values, this commit ensures that values from
proxy-defaults/service-defaults are the default for this synthetic
upstream.
* Add changelog.
Co-authored-by: freddygv <freddy@hashicorp.com>
Previously, the MergeNodeServiceWithCentralConfig method accepted a
ServiceSpecificRequest argument, of which only the Datacenter and
QueryOptions fields were used.
Digging a little deeper, it turns out these fields were only passed
down to the ComputeResolvedServiceConfig method (through the
ServiceConfigRequest struct) which didn't actually use them.
As such, not all call-sites passed a valid ServiceSpecificRequest
so it's safer to remove the argument altogether to prevent future
changes from depending on it.
Re-add ServerExternalAddresses parameter in GenerateToken endpoint
This reverts commit 5e156772f6a7fba5324eb6804ae4e93c091229a6
and adds extra functionality to support newer peering behaviors.
* Backport agent tests.
Original commit: 0710b2d12fb51a29cedd1119b5fb086e5c71f632
Original commit: aaedb3c28bfe247266f21013d500147d8decb7cd (partial)
* Backport test fix and reduce flaky failures.
* Backport test from ENT: "Fix missing test fields"
Original Author: Sarah Pratt
Original Commit: a5c88bef7a969ea5d06ed898d142ab081ba65c69
* Update with proper linting.
* Regenerate golden files.
* Backport from ENT: "Avoid race"
Original commit: 5006c8c858b0e332be95271ef9ba35122453315b
Original author: freddygv
* Backport from ENT: "chore: fix flake peerstream test"
Original commit: b74097e7135eca48cc289798c5739f9ef72c0cc8
Original author: DanStough
Allow for some message duplication in subscription events during assertions.
I'm pretty sure the subscriptions machinery allows for messages to occasionally
be duplicated instead of dropping them, as a once-and-only-once queue is a pipe
dream and you have to pick one of the other two options.
* ingress-gateways: don't log error when registering gateway
Previously, when an ingress gateway was registered without a
corresponding ingress gateway config entry, an error was logged
because the watch on the config entry returned a nil result.
This is expected so don't log an error.
* autoencrypt: helpful error for clients with wrong dc
If clients have set a different datacenter than the servers they're
connecting with for autoencrypt, give a helpful error message.
This continues the work done in #14908 where a crude solution to prevent a
goroutine leak was implemented. The former code would launch a perpetual
goroutine family every iteration (+1 +1) and the fixed code simply caused a
new goroutine family to first cancel the prior one to prevent the
leak (-1 +1 == 0).
This PR refactors this code completely to:
- make it more understandable
- remove the recursion-via-goroutine strangeness
- prevent unnecessary RPC fetches when the prior one has errored.
The core issue arose from a conflation of the entry.Fetching field to mean:
- there is an RPC (blocking query) in flight right now
- there is a goroutine running to manage the RPC fetch retry loop
The problem is that the goroutine-leak-avoidance check would treat
Fetching like (2), but within the body of a goroutine it would flip that
boolean back to false before the retry sleep. This would cause a new
chain of goroutines to launch which #14908 would correct crudely.
The refactored code uses a plain for-loop and changes the semantics
to track state for "is there a goroutine associated with this cache entry"
instead of the former.
We use a uint64 unique identity per goroutine instead of a boolean so
that any orphaned goroutines can tell when they've been replaced when
the expiry loop deletes a cache entry while the goroutine is still running
and is later replaced.
To support Destinations on the service-defaults (for tproxy with terminating gateway), we need to now also make servers watch service-defaults config entries.
This commit updates the establish endpoint to bubble up a 403 status
code to callers when the establishment secret from the token is invalid.
This is a signal that a new peering token must be generated.
* peering: skip register duplicate node and check from the peer
* Prebuilt the nodes map and checks map to avoid repeated for loop
* use key type to struct: node id, service id, and check id
Fix an issue where rpc_hold_timeout was being used as the timeout for non-blocking queries. Users should be able to tune read timeouts without fiddling with rpc_hold_timeout. A new configuration `rpc_read_timeout` is created.
Refactor some implementation from the original PR 11500 to remove the misleading linkage between RPCInfo's timeout (used to retry in case of certain modes of failures) and the client RPC timeouts.
There is a bug in the error handling code for the Agent cache subsystem discovered:
1. NotifyCallback calls notifyBlockingQuery which calls getWithIndex in
a loop (which backs off on-error up to 1 minute)
2. getWithIndex calls fetch if there’s no valid entry in the cache
3. fetch starts a goroutine which calls Fetch on the cache-type, waits
for a while (again with backoff up to 1 minute for errors) and then
calls fetch to trigger a refresh
The end result being that every 1 minute notifyBlockingQuery spawns an
ancestry of goroutines that essentially lives forever.
This PR ensures that the goroutine started by `fetch` cancels any prior
goroutine spawned by the same line for the same key.
In isolated testing where a cache type was tweaked to indefinitely
error, this patch prevented goroutine counts from skyrocketing.
In practice this was masked by #14956 and was only uncovered fixing the
other bug.
go test ./agent -run TestAgentConnectCALeafCert_goodNotLocal
would fail when only #14956 was fixed.
Adds a user-configurable rate limiter to proxycfg snapshot delivery,
with a default limit of 250 updates per second.
This addresses a problem observed in our load testing of Consul
Dataplane where updating a "global" resource such as a wildcard
intention or the proxy-defaults config entry could starve the Raft or
Memberlist goroutines of CPU time, causing general cluster instability.
Replaces the reflection-based implementation of proxycfg's
ConfigSnapshot.Clone with code generated by deep-copy.
While load testing server-based xDS (for consul-dataplane) we discovered
this method is extremely expensive. The ConfigSnapshot struct, directly
or indirectly, contains a copy of many of the structs in the agent/structs
package, which creates a large graph for copystructure.Copy to traverse
at runtime, on every proxy reconfiguration.
When peering through mesh gateways we expect outbound dials to peer
servers to flow through the local mesh gateway addresses.
Now when establishing a peering we get a list of dial addresses as a
ring buffer that includes local mesh gateway addresses if the local DC
is configured to peer through mesh gateways. The ring buffer includes
the mesh gateway addresses first, but also includes the remote server
addresses as a fallback.
This fallback is present because it's possible that direct egress from
the servers may be allowed. If not allowed then the leader will cycle
back to a mesh gateway address through the ring.
When attempting to dial the remote servers we retry up to a fixed
timeout. If using mesh gateways we also have an initial wait in
order to allow for the mesh gateways to configure themselves.
Note that if we encounter a permission denied error we do not retry
since that error indicates that the secret in the peering token is
invalid.
memdb's `WatchCh` method creates a goroutine that will publish to the
returned channel when the watchset is triggered or the given context
is canceled. Although this is called out in its godoc comment, it's
not obvious that this method creates a goroutine who's lifecycle you
need to manage.
In the xDS capacity controller, we were calling `WatchCh` on each
iteration of the control loop, meaning the number of goroutines would
grow on each autopilot event until there was catalog churn.
In the catalog config source, we were calling `WatchCh` with the
background context, meaning that the goroutine would keep running after
the sync loop had terminated.
Adds another datasource for proxycfg.HTTPChecks, for use on server agents. Typically these checks are performed by local client agents and there is no equivalent of this in agentless (where servers configure consul-dataplane proxies).
Hence, the data source is mostly a no-op on servers but in the case where the service is present within the local state, it delegates to the cache data source.
This commit adds a monotonically increasing nonce to include in peering
replication response messages. Every ack/nack from the peer handling a
response will include this nonce, allowing to correlate the ack/nack
with a specific resource.
At the moment nothing is done with the nonce when it is received. In the
future we may want to add functionality such as retries on NACKs,
depending on the class of error.
* Move stats.go from grpc-internal to grpc-middleware
* Update grpc server metrics with server type label
* Add stats test to grpc-external
* Remove global metrics instance from grpc server tests
Preivously when alias check was removed it would not be stopped nor
cleaned up from the associated aliasChecks map.
This means that any time an alias check was deregistered we would
leak a goroutine for CheckAlias.run() because the stopCh would never
be closed.
This issue mostly affects service mesh deployments on platforms where
the client agent is mostly static but proxy services come and go
regularly, since by default sidecars are registered with an alias check.
* Configure Envoy alpn_protocols based on service protocol
* define alpnProtocols in a more standard way
* http2 protocol should be h2 only
* formatting
* add test for getAlpnProtocol()
* create changelog entry
* change scope is connect-proxy
* ignore errors on ParseProxyConfig; fixes linter
* add tests for grpc and http2 public listeners
* remove newlines from PR
* Add alpn_protocol configuration for ingress gateway
* Guard against nil tlsContext
* add ingress gateway w/ TLS tests for gRPC and HTTP2
* getAlpnProtocols: add TCP protocol test
* add tests for ingress gateway with grpc/http2 and per-listener TLS config
* add tests for ingress gateway with grpc/http2 and per-listener TLS config
* add Gateway level TLS config with mixed protocol listeners to validate ALPN
* update changelog to include ingress-gateway
* add http/1.1 to http2 ALPN
* go fmt
* fix test on custom-trace-listener
A previous commit introduced an internally-managed server certificate
to use for peering-related purposes.
Now the peering token has been updated to match that behavior:
- The server name matches the structure of the server cert
- The CA PEMs correspond to the Connect CA
Note that if Conect is disabled, and by extension the Connect CA, we
fall back to the previous behavior of returning the manually configured
certs and local server SNI.
Several tests were updated to use the gRPC TLS port since they enable
Connect by default. This means that the peering token will embed the
Connect CA, and the dialer will expect a TLS listener.
* updating to serf v0.10.1 and memberlist v0.5.0 to get memberlist size metrics and memberlist broadcast queue depth metric
* update changelog
* update changelog
* correcting changelog
* adding "QueueCheckInterval" for memberlist to test
* updating integration test containers to grab latest api
This commit adds the xDS resources needed for INBOUND traffic from peer
clusters:
- 1 filter chain for all inbound peering requests.
- 1 cluster for all inbound peering requests.
- 1 endpoint per voting server with the gRPC TLS port configured.
There is one filter chain and cluster because unlike with WAN
federation, peer clusters will not attempt to dial individual servers.
Peer clusters will only dial the local mesh gateway addresses.
This commit adds handling so that the replication stream considers
whether the user intends to peer through mesh gateways.
The subscription will return server or mesh gateway addresses depending
on the mesh configuration setting. These watches can be updated at
runtime by modifying the mesh config entry.
* feat(ingress gateway: support configuring limits in ingress-gateway config entry
- a new Defaults field with max_connections, max_pending_connections, max_requests
is added to ingress gateway config entry
- new field max_connections, max_pending_connections, max_requests in
individual services to overwrite the value in Default
- added unit test and integration test
- updated doc
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
Co-authored-by: Dan Stough <dan.stough@hashicorp.com>
Routing peering control plane traffic through mesh gateways can be
enabled or disabled at runtime with the mesh config entry.
This commit updates proxycfg to add or cancel watches for local servers
depending on this central config.
Note that WAN federation over mesh gateways is determined by a service
metadata flag, and any updates to the gateway service registration will
force the creation of a new snapshot. If enabled, WAN-fed over mesh
gateways will trigger a local server watch on initialize().
Because of this we will only add/remove server watches if WAN federation
over mesh gateways is disabled.
Preivously the TLS configurator would default to presenting auto TLS
certificates as client certificates.
Server agents should not have this behavior and should instead present
the manually configured certs. The autoTLS certs for servers are
exclusively used for peering and should not be used as the default for
outbound communication.
This commit introduces a new ACL token used for internal server
management purposes.
It has a few key properties:
- It has unlimited permissions.
- It is persisted through Raft as System Metadata rather than in the
ACL tokens table. This is to avoid users seeing or modifying it.
- It is re-generated on leadership establishment.
* Typos
* Test failing
* Convert values <1ms to decimal
* Fix test
* Update docs and test error msg
* Applied suggested changes to test case
* Changelog file and suggested changes
* Update .changelog/12905.txt
Co-authored-by: Chris S. Kim <kisunji92@gmail.com>
* suggested change - start duration with microseconds instead of nanoseconds
* fix error
* suggested change - floats
Co-authored-by: alex <8968914+acpana@users.noreply.github.com>
Co-authored-by: Chris S. Kim <kisunji92@gmail.com>
* Config-entry: Support proxy config in service-defaults
* Update website/content/docs/connect/config-entries/service-defaults.mdx
Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
Prior to #13244, connect proxies and gateways could only be configured by an
xDS session served by the local client agent.
In an upcoming release, it will be possible to deploy a Consul service mesh
without client agents. In this model, xDS sessions will be handled by the
servers themselves, which necessitates load-balancing to prevent a single
server from receiving a disproportionate amount of load and becoming
overwhelmed.
This introduces a simple form of load-balancing where Consul will attempt to
achieve an even spread of load (xDS sessions) between all healthy servers.
It does so by implementing a concurrent session limiter (limiter.SessionLimiter)
and adjusting the limit according to autopilot state and proxy service
registrations in the catalog.
If a server is already over capacity (i.e. the session limit is lowered),
Consul will begin draining sessions to rebalance the load. This will result
in the client receiving a `RESOURCE_EXHAUSTED` status code. It is the client's
responsibility to observe this response and reconnect to a different server.
Users of the gRPC client connection brokered by the
consul-server-connection-manager library will get this for free.
The rate at which Consul will drain sessions to rebalance load is scaled
dynamically based on the number of proxies in the catalog.
http.Transport keeps a pool of connections and should be reused when possible. We instantiate a new http.DefaultTransport for every metrics request, making large numbers of concurrent requests inefficiently spin up new connections instead of reusing open ones.
Co-authored-by: Eric Haberkorn <erichaberkorn@gmail.com>
By adding a SpiffeID for server agents, servers can now request a leaf
certificate from the Connect CA.
This new Spiffe ID has a key property: servers are identified by their
datacenter name and trust domain. All servers that share these
attributes will share a ServerURI.
The aim is to use these certificates to verify the server name of ANY
server in a Consul datacenter.
This is the OSS portion of enterprise PR 2489.
This PR introduces a server-local implementation of the
proxycfg.InternalServiceDump interface that sources data from a blocking query
against the server's state store.
For simplicity, it only implements the subset of the Internal.ServiceDump RPC
handler actually used by proxycfg - as such the result type has been changed
to IndexedCheckServiceNodes to avoid confusion.
This is the OSS portion of enterprise PR 2460.
Introduces a server-local implementation of the proxycfg.ResolvedServiceConfig
interface that sources data from a blocking query against the server's state
store.
It moves the service config resolution logic into the agent/configentry package
so that it can be used in both the RPC handler and data source.
I've also done a little re-arranging and adding comments to call out data
sources for which there is to be no server-local equivalent.
When a sidecar proxy is registered, a check is automatically added.
Previously, the address this check used was the underlying service's
address instead of the proxy's address, even though the check is testing
if the proxy is up.
This worked in most cases because the proxy ran on the same IP as the
underlying service but it's not guaranteed and so the proper default
address should be the proxy's address.
* draft commit
* add changelog, update test
* remove extra param
* fix test
* update type to account for nil value
* add test for custom passive health check
* update comments and tests
* update description in docs
* fix missing commas
* validate args before deleting proxy defaults
* add changelog
* validate name when normalizing proxy defaults
* add test for proxyConfigEntry
* add comments
To ease the transition for users, the original gRPC
port can still operate in a deprecated mode as either
plain-text or TLS mode. This behavior should be removed
in a future release whenever we no longer support this.
The resulting behavior from this commit is:
`ports.grpc > 0 && ports.grpc_tls > 0` spawns both plain-text and tls ports.
`ports.grpc > 0 && grpc.tls == undefined` spawns a single plain-text port.
`ports.grpc > 0 && grpc.tls != undefined` spawns a single tls port (backwards compat mode).
Peerings are terminated when a peer decides to delete the peering from
their end. Deleting a peering sends a termination message to the peer
and triggers them to mark the peering as terminated but does NOT delete
the peering itself. This is to prevent peerings from disappearing from
both sides just because one side deleted them.
Previously the Delete endpoint was skipping the deletion if the peering
was not marked as active. However, terminated peerings are also
inactive.
This PR makes some updates so that peerings marked as terminated can be
deleted by users.
We need to watch for changes to peerings and update the server addresses which get served by the ring buffer.
Also, if there is an active connection for a peer, we are getting up-to-date server addresses from the replication stream and can safely ignore the token's addresses which may be stale.
Contains 2 changes to the GetEnvoyBootstrapParams response to support
consul-dataplane.
Exposing node_name and node_id:
consul-dataplane will support providing either the node_id or node_name in its
configuration. Unfortunately, supporting both in the xDS meta adds a fair amount
of complexity (partly because most tables are currently indexed on node_name)
so for now we're going to return them both from the bootstrap params endpoint,
allowing consul-dataplane to exchange a node_id for a node_name (which it will
supply in the xDS meta).
Properly setting service for gateways:
To avoid the need to special case gateways in consul-dataplane, service will now
either be the destination service name for connect proxies, or the gateway
service name. This means it can be used as-is in Envoy configuration (i.e. as a
cluster name or in metric tags).
This is the OSS portion of enterprise PR 2339.
It improves our handling of "irrecoverable" errors in proxycfg data sources.
The canonical example of this is what happens when the ACL token presented by
Envoy is deleted/revoked. Previously, the stream would get "stuck" until the
xDS server re-checked the token (after 5 minutes) and terminated the stream.
Materializers would also sit burning resources retrying something that could
never succeed.
Now, it is possible for data sources to mark errors as "terminal" which causes
the xDS stream to be closed immediately. Similarly, the submatview.Store will
evict materializers when it observes they have encountered such an error.
newMockSnapshotHandler has an assertion on t.Cleanup which gets called before the event publisher is cancelled. This commit reorders the context.WithCancel so it properly gets cancelled before the assertion is made.
Consul 1.13.0 changed ServiceVirtualIP to use PeeredServiceName instead of ServiceName which was a breaking change for those using service mesh and wanted to restore their snapshot after upgrading to 1.13.0.
This commit handles existing data with older ServiceName and converts it during restore so that there are no issues when restoring from older snapshots.
1. Create a bexpr filter for performing the filtering
2. Change the state store functions to return the raw (not aggregated)
list of ServiceNodes.
3. Move the aggregate service tags by name logic out of the state store
functions into a new function called from the RPC endpoint
4. Perform the filtering in the endpoint before aggregation.
If startListeners successfully created listeners for some of its input addresses but eventually failed, the function would return an error and existing listeners would not be cleaned up.
Previously establishment and pending secrets were only checked at the
RPC layer. However, given that these are Check-and-Set transactions we
should ensure that the given secrets are still valid when persisting a
secret exchange or promotion.
Otherwise it would be possible for concurrent requests to overwrite each
other.
Previously there was a field indicating the operation that triggered a
secrets write. Now there is a message for each operation and it contains
the secret ID being persisted.
Previously the updates to the peering secrets UUID table relied on
inferring what action triggered the update based on a reconciliation
against the existing secrets.
Instead we now explicitly require the operation to be given so that the
inference isn't necessary. This makes the UUID table logic easier to
reason about and fixes some related bugs.
There is also an update so that the peering secrets get handled on
snapshots/restores.
Dialers do not keep track of peering secret UUIDs, so they should not
attempt to clean up data from that table when their peering is deleted.
We also now keep peer server addresses when marking peerings for
deletion. Peer server addresses are used by the ShouldDial() helper
when determining whether the peering is for a dialer or an acceptor.
We need to keep this data so that peering secrets can be cleaned up
accordingly.
Fixes a bug where a service getting deleted from the catalog would cause
the ConfigSource to spin in a hot loop attempting to look up the service.
This is because we were returning a nil WatchSet which would always
unblock the select.
Kudos to @freddygv for discovering this!
* Avoid logging StreamSecretID
* Wrap additional errors in stream handler
* Fix flakiness in leader test and rename servers for clarity. There was
a race condition where the peering was being deleted in the test
before the stream was active. Now the test waits for the stream to be
connected on both sides before deleting the associated peering.
* Run flaky test serially
* defaulting to false because peering will be released as beta
* Ignore peering disabled error in bundles cachetype
Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
Co-authored-by: freddygv <freddy@hashicorp.com>
Co-authored-by: Matt Keeler <mjkeeler7@gmail.com>
* add golden files
* add support to http in tgateway egress destination
* fix slice sorting to include both address and port when using server_names
* fix listener loop for http destination
* fix routes to generate a route per port and a virtualhost per port-address combination
* sort virtual hosts list to have a stable order
* extract redundant serviceNode
When we receive a FailedPrecondition error, retry that more quickly
because we expect it will resolve shortly. This is particularly
important in the context of Consul servers behind a load balancer
because when establishing a connection we have to retry until we
randomly land on a leader node.
The default retry backoff goes from 2s, 4s, 8s, etc. which can result in
very long delays quite quickly. Instead, this backoff retries in 8ms
five times, then goes exponentially from there: 16ms, 32ms, ... up to a
max of 8152ms.
There were 16 combinations of tests but 4 of them were duplicates since the default key type and bits were "ec" and 256. That entry was commented out to reduce the subtest count to 12.
testrpc.WaitForLeader was failing on arm64 environments; the cause is unknown but it might be due to the environment being flooded with parallel tests making RPC calls. The RPC polling+retry was replaced with a simpler check for leadership based on raft.
- when register service using catalog endpoint, the key of service
name actually should be "service". Add this information to the
error message will help user to quickly fix in the request.