Commit Graph

133 Commits

Author SHA1 Message Date
Daniel Upton 497df1ca3b proxycfg: server-local config entry data sources
This is the OSS portion of enterprise PR 2056.

This commit provides server-local implementations of the proxycfg.ConfigEntry
and proxycfg.ConfigEntryList interfaces, that source data from streaming events.

It makes use of the LocalMaterializer type introduced for peering replication,
adding the necessary support for authorization.

It also adds support for "wildcard" subscriptions (within a topic) to the event
publisher, as this is needed to fetch service-resolvers for all services when
configuring mesh gateways.

Currently, events will be emitted for just the ingress-gateway, service-resolver,
and mesh config entry types, as these are the only entries required by proxycfg
— the events will be emitted on topics named IngressGateway, ServiceResolver,
and MeshConfig topics respectively.

Though these events will only be consumed "locally" for now, they can also be
consumed via the gRPC endpoint (confirmed using grpcurl) so using them from
client agents should be a case of swapping the LocalMaterializer for an
RPCMaterializer.
2022-07-04 10:48:36 +01:00
alex 90577810cc
peering: add imported/exported counts to peering (#13644)
Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
2022-06-29 14:07:30 -07:00
alex a8ae8de20e
peering: reconcile/ hint active state for list (#13619)
Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>
2022-06-29 09:43:50 -07:00
R.B. Boyer 2dba16be52
peering: replicate all SpiffeID values necessary for the importing side to do SAN validation (#13612)
When traversing an exported peered service, the discovery chain
evaluation at the other side may re-route the request to a variety of
endpoints. Furthermore we intend to terminate mTLS at the mesh gateway
for arriving peered traffic that is http-like (L7), so the caller needs
to know the mesh gateway's SpiffeID in that case as well.

The following new SpiffeID values will be shipped back in the peerstream
replication:

- tcp: all possible SpiffeIDs resulting from the service-resolver
        component of the exported discovery chain

- http-like: the SpiffeID of the mesh gateway
2022-06-27 14:37:18 -05:00
R.B. Boyer e7a7232a6b
state: peering ID assignment cannot happen inside of the state store (#13525)
Move peering ID assignment outisde of the FSM, so that the ID is written
to the raft log and the same ID is used by all voters, and after
restarts.
2022-06-21 13:04:08 -05:00
R.B. Boyer 93611819e2
xds: mesh gateways now have their own leaf certificate when involved in a peering (#13460)
This is only configured in xDS when a service with an L7 protocol is
exported.

They also load any relevant trust bundles for the peered services to
eventually use for L7 SPIFFE validation during mTLS termination.
2022-06-15 14:36:18 -05:00
freddygv a288d0c388 Avoid deleting peerings marked as terminated.
When our peer deletes the peering it is locally marked as terminated.
This termination should kick off deleting all imported data, but should
not delete the peering object itself.

Keeping peerings marked as terminated acts as a signal that the action
took place.
2022-06-14 15:37:09 -06:00
freddygv a5283e4361 Add leader routine to clean up peerings
Once a peering is marked for deletion a new leader routine will now
clean up all imported resources and then the peering itself.

A lot of the logic was grabbed from the namespace/partitions deferred
deletions but with a handful of simplifications:
- The rate limiting is not configurable.

- Deleting imported nodes/services/checks is done by deleting nodes with
  the Txn API. The services and checks are deleted as a side-effect.

- There is no "round rate limiter" like with namespaces and partitions.
  This is because peerings are purely local, and deleting a peering in
  the datacenter does not depend on deleting data from other DCs like
  with WAN-federated namespaces. All rate limiting is handled by the
  Raft rate limiter.
2022-06-14 15:36:50 -06:00
freddygv dbcbf3978f Fixup stream tear-down steps.
1. Fix a bug where the peering leader routine would not track all active
   peerings in the "stored" reconciliation map. This could lead to
   tearing down streams where the token was generated, since the
   ConnectedStreams() method used for reconciliation returns all streams
   and not just the ones initiated by this leader routine.

2. Fix a race where stream contexts were being canceled before
   termination messages were being processed by a peer.

   Previously the leader routine would tear down streams by canceling
   their context right after the termination message was sent. This
   context cancelation could be propagated to the server side faster
   than the termination message. Now there is a change where the
   dialing peer uses CloseSend() to signal when no more messages will
   be sent. Eventually the server peer will read an EOF after receiving
   and processing the preceding termination message.

   Using CloseSend() is actually not enough to address the issue
   mentioned, since it doesn't wait for the server peer to finish
   processing messages. Because of this now the dialing peer also reads
   from the stream until an error signals that there are no more
   messages. Receiving an EOF from our peer indicates that they
   processed the termination message and have no additional work to do.

   Given that the stream is being closed, all the messages received by
   Recv are discarded. We only check for errors to avoid importing new
   data.
2022-06-13 12:10:42 -06:00
freddygv 6d368b5eed Update peering state and RPC for deferred deletion
When deleting a peering we do not want to delete the peering and all
imported data in a single operation, since deleting a large amount of
data at once could overload Consul.

Instead we defer deletion of peerings so that:

1. When a peering deletion request is received via gRPC the peering is
   marked for deletion by setting the DeletedAt field.

2. A leader routine will monitor for peerings that are marked for
   deletion and kick off a throttled deletion of all imported resources
   before deleting the peering itself.

This commit mostly addresses point #1 by modifying the peering service
to mark peerings for deletion. Another key change is to add a
PeeringListDeleted state store function which can return all peerings
marked for deletion. This function is what will be watched by the
deferred deletion leader routine.
2022-06-13 12:10:32 -06:00
Freddy 9eeb9e4ee3
Clean up imported nodes/services/checks as needed (#13367)
Previously, imported data would never be deleted. As
nodes/services/checks were registered and deregistered, resources
deleted from the exporting cluster would accumulate in the imported
cluster.

This commit makes updates to replication so that whenever an update is
received for a service name we reconcile what was present in the catalog
against what was received.

This handleUpdateService method can handle both updates and deletions.
2022-06-13 11:52:28 -06:00
Chris S. Kim 4cb251497f
Update RBAC to handle imported services (#13404)
When converting from Consul intentions to xds RBAC rules, services imported from other peers must encode additional data like partition (from the remote cluster) and trust domain.

This PR updates the PeeringTrustBundle to hold the sending side's local partition as ExportedPartition. It also updates RBAC code to encode SpiffeIDs of imported services with the ExportedPartition and TrustDomain.
2022-06-10 17:15:22 -04:00
R.B. Boyer 33b497e7c9
peering: rename initiate to establish in the context of the APIs (#13419) 2022-06-10 11:10:46 -05:00
R.B. Boyer d81d8468db
peering: mesh gateways are required for cross-peer service mesh communication (#13410)
Require use of mesh gateways in order for service mesh data plane
traffic to flow between peers.

This also adds plumbing for envoy integration tests involving peers, and
one starter peering test.
2022-06-09 11:05:18 -05:00
R.B. Boyer c1f20d17ee
peering: allow protobuf requests to populate the default partition or namespace (#13398) 2022-06-08 11:55:18 -05:00
R.B. Boyer 0681f3571d
peering: allow mesh gateways to proxy L4 peered traffic (#13339)
Mesh gateways will now enable tcp connections with SNI names including peering information so that those connections may be proxied.

Note: this does not change the callers to use these mesh gateways.
2022-06-06 14:20:41 -05:00
alex ff2ad3ba0c
peering: send leader addr (#13342)
Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>
2022-06-06 10:00:38 -07:00
R.B. Boyer 4c781d1e15
peering: update how cross-peer upstreams and represented in proxycfg and rendered in xds (#13362)
This removes unnecessary, vestigal remnants of discovery chains.
2022-06-03 16:42:50 -05:00
freddygv ad6dbe081a Add agent cache-type for TrustBundleListByService
There are a handful of changes in this commit:
* When querying trust bundles for a service we need to be able to
  specify the namespace of the service.
* The endpoint needs to track the index because the cache watches use
  it.
* Extracted bulk of the endpoint's logic to a state store function
  so that index tracking could be tested more easily.
* Removed check for service existence, deferring that sort of work to ACL authz
* Added the cache type
2022-06-01 17:05:10 -06:00
freddygv 073c9e3a91 Update assumptions around exported-service config
Given that the exported-services config entry can use wildcards, the
precedence for wildcards is handled as with intentions. The most exact
match is the match that applies for any given service. We do not take
the union of all that apply.

Another update that was made was to reflect that only one
exported-services config entry applies to any given service in a
partition. This is a pre-existing constraint that gets enforced by
the Normalize() method on that config entry type.
2022-06-01 17:03:51 -06:00
freddygv 5cd5108075 Return SPIFFE ID for connect proxies in PeerMeta
Proxies dialing exporting services need to know the SPIFFE ID of
services dialed so that the upstream's SANs can be validated.

This commit attaches the SPIFFE ID to all connect proxies exported over
the peering stream so that they are available to importing clusters.

The data in the SPIFFE ID cannot be re-constructed in peer clusters
because the partition of exported services is overwritten on imports.
2022-05-31 09:55:37 -06:00
Freddy a75af9d94a
[OSS] Add grpc endpoint to fetch a specific trust bundle (#13292)
Co-authored-by: R.B. Boyer <rb@hashicorp.com>
2022-05-31 09:54:40 -06:00
alex 2d8664d384
monitor leadership in peering service (#13257)
Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>

Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: Freddy <freddygv@users.noreply.github.com>
2022-05-26 17:55:16 -07:00
Chris S. Kim d73a9522cb
Add support for streaming CA roots to peers (#13260)
Sender watches for changes to CA roots and sends
them through the replication stream. Receiver saves
CA roots to tablePeeringTrustBundle
2022-05-26 15:24:09 -04:00
R.B. Boyer bc10055edc
peering: replicate expected SNI, SPIFFE, and service protocol to peers (#13218)
The importing peer will need to know what SNI and SPIFFE name
corresponds to each exported service. Additionally it will need to know
at a high level the protocol in use (L4/L7) to generate the appropriate
connection pool and local metrics.

For replicated connect synthetic entities we edit the `Connect{}` part
of a `NodeService` to have a new section:

    {
      "PeerMeta": {
        "SNI": [
          "web.default.default.owt.external.183150d5-1033-3672-c426-c29205a576b8.consul"
        ],
        "SpiffeID": [
          "spiffe://183150d5-1033-3672-c426-c29205a576b8.consul/ns/default/dc/dc1/svc/web"
        ],
        "Protocol": "tcp"
      }
    }

This data is then replicated and saved as-is at the importing side. Both
SNI and SpiffeID are slices for now until I can be sure we don't need
them for how mesh gateways will ultimately work.
2022-05-25 12:37:44 -05:00
R.B. Boyer 69191fc0da
peering: disable requirement for mesh gateways initially (#13213) 2022-05-25 10:13:23 -05:00
alex 451dc50f4f
peering: expose IsLeader, hung up on dialer if follower (#13164)
Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
2022-05-23 11:30:58 -07:00
R.B. Boyer 63a9175bd6
peering: accept replication stream of discovery chain information at the importing side (#13151) 2022-05-19 16:37:52 -05:00
R.B. Boyer 91691eca87 peering: replicate discovery chains information to importing peers
Treat each exported service as a "discovery chain" and replicate one
synthetic CheckServiceNode for each chain and remote mesh gateway.

The health will be a flattened generated check of the checks for that
mesh gateway node.
2022-05-19 14:21:44 -05:00
R.B. Boyer bf05e8c1f1 prefactor some functions out of the monolithic file 2022-05-19 14:21:29 -05:00
Freddy 6c868b6c0e
Patches to peering initiation for POC demo (#13076)
Co-authored-by: R.B. Boyer <rb@hashicorp.com>
2022-05-13 13:01:00 -06:00
Freddy 160acdf876
Actually block when syncing subscriptions (#13066)
By changing to use WatchCtx we will actually block for changes to the peering list. WatchCh creates a goroutine to collect errors from WatchCtx and returns immediately.

The existing behavior wouldn't result in a tight loop because of the rate limiting in the surrounding function, but it would still lead to more work than is necessary.
2022-05-12 17:36:14 -06:00
Evan Culver 535e811020
peering: add TrustBundleListByService endpoint (#13048) 2022-05-12 15:58:22 -07:00
Freddy 8894365c5a
[OSS] Add upsert handling for receiving CheckServiceNode (#13061) 2022-05-12 15:04:44 -06:00
R.B. Boyer b932d0dabc
test: ensure this package uses freeport for port allocation (#13036) 2022-05-11 14:20:50 -05:00
R.B. Boyer c855df87ec
remove remaining shim runStep functions (#13015)
Wraps up the refactor from #13013
2022-05-10 16:24:45 -05:00
R.B. Boyer 9ad10318cd
add general runstep test helper instead of copying it all over the place (#13013) 2022-05-10 15:25:51 -05:00
FFMMM 76a6647700
expose meta tags for peering (#12964) 2022-05-09 13:47:37 -07:00
R.B. Boyer 809344a6f5
peering: initial sync (#12842)
- Add endpoints related to peering: read, list, generate token, initiate peering
- Update node/service/check table indexing to account for peers
- Foundational changes for pushing service updates to a peer
- Plumb peer name through Health.ServiceNodes path

see: ENT-1765, ENT-1280, ENT-1283, ENT-1283, ENT-1756, ENT-1739, ENT-1750, ENT-1679,
     ENT-1709, ENT-1704, ENT-1690, ENT-1689, ENT-1702, ENT-1701, ENT-1683, ENT-1663,
     ENT-1650, ENT-1678, ENT-1628, ENT-1658, ENT-1640, ENT-1637, ENT-1597, ENT-1634,
     ENT-1613, ENT-1616, ENT-1617, ENT-1591, ENT-1588, ENT-1596, ENT-1572, ENT-1555

Co-authored-by: R.B. Boyer <rb@hashicorp.com>
Co-authored-by: freddygv <freddy@hashicorp.com>
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: Evan Culver <eculver@hashicorp.com>
Co-authored-by: Nitya Dhanushkodi <nitya@hashicorp.com>
2022-04-21 17:34:40 -05:00
FFMMM cf7e6484aa
add more labels to RequestRecorder (#12727)
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>
2022-04-12 10:50:25 -07:00
FFMMM 0f68bf879a
[rpc/middleware][consul] plumb intercept off, add server level happy test (#12692) 2022-04-06 14:33:05 -07:00
FFMMM 6bdde40d5e
lower log to trace (#12708) 2022-04-06 11:37:08 -07:00
FFMMM 8b184197b3
polish rpc.service.call metric behavior (#12624) 2022-03-31 10:49:37 -07:00
FFMMM 560f8cbc89
fix bad oss sync, use gauges not counters (#12611) 2022-03-24 14:41:30 -07:00
FFMMM 76d8798590
factor out recording func, add unit tests (#12585)
Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>
2022-03-22 09:31:54 -07:00
Dan Upton fb441e323a
Restructure gRPC server setup (#12586)
OSS sync of enterprise changes at 0b44395e
2022-03-22 12:40:24 +00:00
FFMMM 08f2838b78
pre register new rpc metric, rename metric (#12582) 2022-03-21 17:26:32 -07:00
FFMMM 3c08843847
[sync oss] add net/rpc interceptor implementation (#12573)
* sync ent changes from 866dcb0667

Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>

* update oss go.mod

Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>
2022-03-17 16:02:26 -07:00
Dan Upton ebdda4848f
streaming: split event buffer by key (#12080) 2022-01-28 12:27:00 +00:00
Giulio Micheloni 10cdc0a5c8
Merge branch 'main' into serve-panic-recovery 2021-11-06 16:12:06 +01:00
Daniel Nephin db29ad346b acl: remove id and revision from Policy constructors
The fields were removed in a previous commit.

Also remove an unused constructor for PolicyMerger
2021-11-05 15:45:08 -04:00
Daniel Nephin 88c6aeea34 acl: remove legacy arg to store.ACLTokenSet
And remove the tests for legacy=true
2021-10-25 17:25:14 -04:00
Giulio Micheloni 10814d934e Merge branch 'main' of https://github.com/hashicorp/consul into hashicorp-main 2021-10-16 16:59:32 +01:00
R.B. Boyer ba13416b57
grpc: strip local ACL tokens from RPCs during forwarding if crossing datacenters (#11099)
Fixes #11086
2021-09-22 13:14:26 -05:00
Giulio Micheloni 10b03c3f4e
Merge branch 'main' into serve-panic-recovery 2021-08-22 20:31:11 +02:00
Giulio Micheloni 465e9fecda grpc, xds: recovery middleware to return and log error in case of panic
1) xds and grpc servers:
   1.1) to use recovery middleware with callback that prints stack trace to log
   1.2) callback turn the panic into a core.Internal error
2) added unit test for grpc server
2021-08-22 19:06:26 +01:00
R.B. Boyer 61f1c01b83
agent: ensure that most agent behavior correctly respects partition configuration (#10880) 2021-08-19 15:09:42 -05:00
R.B. Boyer e50e13d2ab
state: partition nodes and coordinates in the state store (#10859)
Additionally:

- partitioned the catalog indexes appropriately for partitioning
- removed a stray reference to a non-existent index named "node.checks"
2021-08-17 13:29:39 -05:00
Daniel Nephin b6d9d0d9f7 acl: remove many instances of authz == nil 2021-07-30 13:58:35 -04:00
R.B. Boyer 254557a1f6
sync changes to oss files made in enterprise (#10670) 2021-07-22 13:58:08 -05:00
R.B. Boyer 62ac98b564
agent/structs: add a bunch more EnterpriseMeta helper functions to help with partitioning (#10669) 2021-07-22 13:20:45 -05:00
Daniel Nephin 94820e67a8 structs: remove EnterpriseMeta.GetNamespace
I added this recently without realizing that the method already existed and was named
NamespaceOrEmpty. Replace all calls to GetNamespace with NamespaceOrEmpty or NamespaceOrDefault
as appropriate.
2021-03-09 15:17:26 -05:00
Daniel Nephin 88bbde56da agent: add a test for streaming in the service health endpoint
Co-authored-by: Paul Banks <banks@banksco.de>
2021-02-25 14:08:10 -05:00
Daniel Nephin c40d063a0e structs: rename EnterpriseMeta constructor
To match the Go convention.
2021-02-16 14:45:43 -05:00
Daniel Nephin ef0999547a testing: skip slow tests with -short
Add a skip condition to all tests slower than 100ms.

This change was made using `gotestsum tool slowest` with data from the
last 3 CI runs of master.
See https://github.com/gotestyourself/gotestsum#finding-and-skipping-slow-tests

With this change:

```
$ time go test -count=1 -short ./agent
ok      github.com/hashicorp/consul/agent       0.743s

real    0m4.791s

$ time go test -count=1 -short ./agent/consul
ok      github.com/hashicorp/consul/agent/consul        4.229s

real    0m8.769s
```
2020-12-07 13:42:55 -05:00
Daniel Nephin e4a78c977d stream: document that Payload must be immutable
If they are sent to EventPublisher.Publish.

Also document that PayloadEvents is expected to come from a subscription and that it is
not immutable.
2020-11-06 13:00:33 -05:00
Daniel Nephin d4cd2fa6a8 stream: Add HasReadPermission to Payload
Required now that filter is a method on PayloadEvents instead of Event
2020-11-05 19:17:18 -05:00
Daniel Nephin 621f1db766
Merge pull request #9073 from hashicorp/dnephin/backport-streaming-namespaces
streaming: backport namespace changes
2020-11-05 14:19:10 -05:00
Daniel Nephin cd220e5d6c
Merge pull request #9061 from hashicorp/dnephin/event-fields
stream: support filtering by namespace
2020-11-05 14:18:35 -05:00
Daniel Nephin 8a017c4f43 structs: add a namespace test for CheckServiceNode.CanRead 2020-10-30 15:07:04 -04:00
Daniel Nephin 8da30fcb9a subscribe: set the request namespace 2020-10-30 14:34:04 -04:00
Daniel Nephin 61ce0964a4 stream: remove Event.Key
Makes Payload a type with FilterByKey so that Payloads can implement
filtering by key. With this approach we don't need to expose a Namespace
field on Event, and we don't need to invest micro formats or require a
bunch of code to be aware of exactly how the key field is encoded.
2020-10-28 16:48:04 -04:00
Daniel Nephin c106d94742 proto: remove Event.Key field
The field is never used, and the value is available from the payload.
2020-10-28 16:33:00 -04:00
Daniel Nephin ab43236f86 proto: remove Event.Namespace field
All events are part of a single Topic, so we don't need this field.
2020-10-28 16:33:00 -04:00
Daniel Nephin 44da869ed4 stream: Use a no-op event publisher if streaming is disabled 2020-10-28 13:54:19 -04:00
Daniel Nephin fb8b68a6ec stream: close the subscription on Unsubscribe 2020-10-22 13:39:27 -04:00
Daniel Nephin f06fd96d3a subscribe: add test cases for newEventFromStreamEvent 2020-10-08 18:48:17 -04:00
Daniel Nephin ea95908f63 subscribe: Add steps to rpc/subscribe tests
To make them easier to follow
2020-10-08 15:38:01 -04:00
Daniel Nephin e0236b5a9f
Merge pull request #8818 from hashicorp/streaming/add-subscribe-service-batch-events
stream: handle batch events as a special case of Event
2020-10-07 21:25:32 -04:00
Daniel Nephin eb6f2a8d72 structs: add CheckServiceNode.CanRead
And use it from the subscribe endpoint.
2020-10-07 18:15:13 -04:00
Daniel Nephin ad29cf4f94 stream: Return a single event from a subscription.Next
Handle batch events as a single event
2020-10-06 13:18:20 -04:00
Daniel Nephin 3183b9ebb3 subscribe: update to use NewSnapshotToFollow event 2020-10-06 12:49:35 -04:00
Daniel Nephin fa115c6249 Move agent/subscribe -> agent/rpc/subscribe 2020-10-06 12:49:35 -04:00