This is only configured in xDS when a service with an L7 protocol is
exported.
They also load any relevant trust bundles for the peered services to
eventually use for L7 SPIFFE validation during mTLS termination.
1. Fix a bug where the peering leader routine would not track all active
peerings in the "stored" reconciliation map. This could lead to
tearing down streams where the token was generated, since the
ConnectedStreams() method used for reconciliation returns all streams
and not just the ones initiated by this leader routine.
2. Fix a race where stream contexts were being canceled before
termination messages were being processed by a peer.
Previously the leader routine would tear down streams by canceling
their context right after the termination message was sent. This
context cancelation could be propagated to the server side faster
than the termination message. Now there is a change where the
dialing peer uses CloseSend() to signal when no more messages will
be sent. Eventually the server peer will read an EOF after receiving
and processing the preceding termination message.
Using CloseSend() is actually not enough to address the issue
mentioned, since it doesn't wait for the server peer to finish
processing messages. Because of this now the dialing peer also reads
from the stream until an error signals that there are no more
messages. Receiving an EOF from our peer indicates that they
processed the termination message and have no additional work to do.
Given that the stream is being closed, all the messages received by
Recv are discarded. We only check for errors to avoid importing new
data.
When deleting a peering we do not want to delete the peering and all
imported data in a single operation, since deleting a large amount of
data at once could overload Consul.
Instead we defer deletion of peerings so that:
1. When a peering deletion request is received via gRPC the peering is
marked for deletion by setting the DeletedAt field.
2. A leader routine will monitor for peerings that are marked for
deletion and kick off a throttled deletion of all imported resources
before deleting the peering itself.
This commit mostly addresses point #1 by modifying the peering service
to mark peerings for deletion. Another key change is to add a
PeeringListDeleted state store function which can return all peerings
marked for deletion. This function is what will be watched by the
deferred deletion leader routine.
When converting from Consul intentions to xds RBAC rules, services imported from other peers must encode additional data like partition (from the remote cluster) and trust domain.
This PR updates the PeeringTrustBundle to hold the sending side's local partition as ExportedPartition. It also updates RBAC code to encode SpiffeIDs of imported services with the ExportedPartition and TrustDomain.
There are a handful of changes in this commit:
* When querying trust bundles for a service we need to be able to
specify the namespace of the service.
* The endpoint needs to track the index because the cache watches use
it.
* Extracted bulk of the endpoint's logic to a state store function
so that index tracking could be tested more easily.
* Removed check for service existence, deferring that sort of work to ACL authz
* Added the cache type
For mTLS to work between two proxies in peered clusters with different root CAs,
proxies need to configure their outbound listener to use different root certificates
for validation.
Up until peering was introduced proxies would only ever use one set of root certificates
to validate all mesh traffic, both inbound and outbound. Now an upstream proxy
may have a leaf certificate signed by a CA that's different from the dialing proxy's.
This PR makes changes to proxycfg and xds so that the upstream TLS validation
uses different root certificates depending on which cluster is being dialed.
I noticed that the JSON api endpoints for peerings json encodes protobufs directly, rather than converting them into their `api` package equivalents before marshal/unmarshaling them.
I updated this and used `mog` to do the annoying part in the middle.
Other changes:
- the status enum was converted into the friendlier string form of the enum for readability with tools like `curl`
- some of the `api` library functions were slightly modified to match other similar endpoints in UX (cc: @ndhanushkodi )
- peeringRead returns `nil` if not found
- partitions are NOT inferred from the agent's partition (matching 1.11-style logic)
* Install `buf` instead of `protoc`
* Created `buf.yaml` and `buf.gen.yaml` files in the two proto directories to control how `buf` generates/lints proto code.
* Invoke `buf` instead of `protoc`
* Added a `proto-format` make target.
* Committed the reformatted proto files.
* Added a `proto-lint` make target.
* Integrated proto linting with CI
* Fixed tons of proto linter warnings.
* Got rid of deprecated builtin protoc-gen-go grpc plugin usage. Moved to direct usage of protoc-gen-go-grpc.
* Unified all proto directories / go packages around using pb prefixes but ensuring all proto packages do not have the prefix.