Peerings are terminated when a peer decides to delete the peering from
their end. Deleting a peering sends a termination message to the peer
and triggers them to mark the peering as terminated but does NOT delete
the peering itself. This is to prevent peerings from disappearing from
both sides just because one side deleted them.
Previously the Delete endpoint was skipping the deletion if the peering
was not marked as active. However, terminated peerings are also
inactive.
This PR makes some updates so that peerings marked as terminated can be
deleted by users.
We need to watch for changes to peerings and update the server addresses which get served by the ring buffer.
Also, if there is an active connection for a peer, we are getting up-to-date server addresses from the replication stream and can safely ignore the token's addresses which may be stale.
Contains 2 changes to the GetEnvoyBootstrapParams response to support
consul-dataplane.
Exposing node_name and node_id:
consul-dataplane will support providing either the node_id or node_name in its
configuration. Unfortunately, supporting both in the xDS meta adds a fair amount
of complexity (partly because most tables are currently indexed on node_name)
so for now we're going to return them both from the bootstrap params endpoint,
allowing consul-dataplane to exchange a node_id for a node_name (which it will
supply in the xDS meta).
Properly setting service for gateways:
To avoid the need to special case gateways in consul-dataplane, service will now
either be the destination service name for connect proxies, or the gateway
service name. This means it can be used as-is in Envoy configuration (i.e. as a
cluster name or in metric tags).
This is the OSS portion of enterprise PR 2339.
It improves our handling of "irrecoverable" errors in proxycfg data sources.
The canonical example of this is what happens when the ACL token presented by
Envoy is deleted/revoked. Previously, the stream would get "stuck" until the
xDS server re-checked the token (after 5 minutes) and terminated the stream.
Materializers would also sit burning resources retrying something that could
never succeed.
Now, it is possible for data sources to mark errors as "terminal" which causes
the xDS stream to be closed immediately. Similarly, the submatview.Store will
evict materializers when it observes they have encountered such an error.
Consul 1.13.0 changed ServiceVirtualIP to use PeeredServiceName instead of ServiceName which was a breaking change for those using service mesh and wanted to restore their snapshot after upgrading to 1.13.0.
This commit handles existing data with older ServiceName and converts it during restore so that there are no issues when restoring from older snapshots.
1. Create a bexpr filter for performing the filtering
2. Change the state store functions to return the raw (not aggregated)
list of ServiceNodes.
3. Move the aggregate service tags by name logic out of the state store
functions into a new function called from the RPC endpoint
4. Perform the filtering in the endpoint before aggregation.
If startListeners successfully created listeners for some of its input addresses but eventually failed, the function would return an error and existing listeners would not be cleaned up.
Previously establishment and pending secrets were only checked at the
RPC layer. However, given that these are Check-and-Set transactions we
should ensure that the given secrets are still valid when persisting a
secret exchange or promotion.
Otherwise it would be possible for concurrent requests to overwrite each
other.
Previously there was a field indicating the operation that triggered a
secrets write. Now there is a message for each operation and it contains
the secret ID being persisted.
Previously the updates to the peering secrets UUID table relied on
inferring what action triggered the update based on a reconciliation
against the existing secrets.
Instead we now explicitly require the operation to be given so that the
inference isn't necessary. This makes the UUID table logic easier to
reason about and fixes some related bugs.
There is also an update so that the peering secrets get handled on
snapshots/restores.