Commit Graph

1989 Commits

Author SHA1 Message Date
Eric Haberkorn 5fd1e6daea
Add exported services event to cluster peering replication. (#14797) 2022-09-29 15:37:19 -04:00
malizz 5c470b28dd
Support Stale Queries for Trust Bundle Lookups (#14724)
* initial commit

* add tags, add conversations

* add test for query options utility functions

* update previous tests

* fix test

* don't error out on empty context

* add changelog

* update decode config
2022-09-28 09:56:59 -07:00
Nick Ethier 5e4b3ef5d4
add HCP integration component (#14723)
* add HCP integration

* lint: use non-deprecated logging interface
2022-09-26 14:58:15 -04:00
Chris S. Kim 7ec8a0667a Add new internal endpoint to list exported services to a peer 2022-09-23 09:43:56 -04:00
freddygv 0c3853a2d0 Add server certificate manager
This certificate manager will request a leaf certificate for server
agents and then keep them up to date.
2022-09-16 17:57:10 -06:00
freddygv ef99b30cb8 Generate ACL token for server management
This commit introduces a new ACL token used for internal server
management purposes.

It has a few key properties:
- It has unlimited permissions.
- It is persisted through Raft as System Metadata rather than in the
ACL tokens table. This is to avoid users seeing or modifying it.
- It is re-generated on leadership establishment.
2022-09-16 17:54:34 -06:00
Kyle Havlovitz 40da079f18
Merge pull request #14598 from hashicorp/root-removal-fix
connect/ca: Don't discard old roots on primaryInitialize
2022-09-15 14:36:01 -07:00
Kyle Havlovitz fe10009a12 connect/ca: don't discard old roots on primaryInitialize 2022-09-15 12:59:09 -07:00
DanStough b37a2ba889 feat(peering): validate server name conflicts on establish 2022-09-14 11:37:30 -04:00
Derek Menteer 5d1487e167
Add CSR check for number of URIs. (#14579)
Add CSR check for number of URIs.
2022-09-13 14:21:47 -05:00
Derek Menteer cfcd9f2a2c Add input validation for auto-config JWT authorization checks. 2022-09-13 11:16:36 -05:00
skpratt cf6c1d9388
add non-double-prefixed metrics (#14193) 2022-09-09 12:13:43 -05:00
Dan Upton 9fe6c33c0d
xDS Load Balancing (#14397)
Prior to #13244, connect proxies and gateways could only be configured by an
xDS session served by the local client agent.

In an upcoming release, it will be possible to deploy a Consul service mesh
without client agents. In this model, xDS sessions will be handled by the
servers themselves, which necessitates load-balancing to prevent a single
server from receiving a disproportionate amount of load and becoming
overwhelmed.

This introduces a simple form of load-balancing where Consul will attempt to
achieve an even spread of load (xDS sessions) between all healthy servers.
It does so by implementing a concurrent session limiter (limiter.SessionLimiter)
and adjusting the limit according to autopilot state and proxy service
registrations in the catalog.

If a server is already over capacity (i.e. the session limit is lowered),
Consul will begin draining sessions to rebalance the load. This will result
in the client receiving a `RESOURCE_EXHAUSTED` status code. It is the client's
responsibility to observe this response and reconnect to a different server.

Users of the gRPC client connection brokered by the
consul-server-connection-manager library will get this for free.

The rate at which Consul will drain sessions to rebalance load is scaled
dynamically based on the number of proxies in the catalog.
2022-09-09 15:02:01 +01:00
Derek Menteer 8efe862b76 Merge branch 'main' of github.com:hashicorp/consul into derekm/split-grpc-ports 2022-09-08 14:53:08 -05:00
Derek Menteer 6aaf1c6035 Various cleanups. 2022-09-08 10:51:50 -05:00
Chris S. Kim 9b5c5c5062
Merge pull request #14285 from hashicorp/NET-638-push-server-address-updates-to-the-peer
peering: Subscribe to server address changes and push updates to peers
2022-09-07 09:30:45 -04:00
Freddy a7f38384ae
Add SpiffeID for Consul server agents (#14485)
Co-authored-by: Eric Haberkorn <erichaberkorn@gmail.com>

By adding a SpiffeID for server agents, servers can now request a leaf
certificate from the Connect CA.

This new Spiffe ID has a key property: servers are identified by their
datacenter name and trust domain. All servers that share these
attributes will share a ServerURI.

The aim is to use these certificates to verify the server name of ANY
server in a Consul datacenter.
2022-09-06 17:58:13 -06:00
Daniel Upton 8cd6c9f95e proxycfg-glue: server-local implementation of ResolvedServiceConfig
This is the OSS portion of enterprise PR 2460.

Introduces a server-local implementation of the proxycfg.ResolvedServiceConfig
interface that sources data from a blocking query against the server's state
store.

It moves the service config resolution logic into the agent/configentry package
so that it can be used in both the RPC handler and data source.

I've also done a little re-arranging and adding comments to call out data
sources for which there is to be no server-local equivalent.
2022-09-06 23:27:25 +01:00
Derek Menteer b50bc443f3 Merge branch 'main' of github.com:hashicorp/consul into derekm/split-grpc-ports 2022-09-06 10:51:04 -05:00
Derek Menteer d771725a14 Add kv txn get-not-exists operation. 2022-09-06 10:28:59 -05:00
Chris S. Kim 9ad8bf67a5 Add testcase for parsing grpc_port 2022-09-06 10:17:44 -04:00
Kyle Havlovitz a484a759c8
Merge pull request #14429 from hashicorp/ca-prune-intermediates
Prune old expired intermediate certs when appending a new one
2022-09-02 15:34:33 -07:00
Derek Menteer cb478b0e61 Address PR comments. 2022-09-01 16:54:24 -05:00
Kyle Havlovitz 90fa16c8b5 Prune intermediates before appending new one 2022-09-01 14:24:30 -07:00
malizz ef5f697121
Add additional parameters to envoy passive health check config (#14238)
* draft commit

* add changelog, update test

* remove extra param

* fix test

* update type to account for nil value

* add test for custom passive health check

* update comments and tests

* update description in docs

* fix missing commas
2022-09-01 09:59:11 -07:00
Chris S. Kim e70ba97e45 Add Internal.ServiceDump support for querying by PeerName 2022-09-01 10:32:59 -04:00
Derek Menteer ab9d421ba2 Change serf-tag references to field references. 2022-08-31 16:38:42 -05:00
Kyle Havlovitz c5370d52e9 Prune old expired intermediate certs when appending a new one 2022-08-31 11:41:58 -07:00
Chris S. Kim 9c157e40a3 Merge branch 'main' into NET-638-push-server-address-updates-to-the-peer
# Conflicts:
#	agent/grpc-external/services/peerstream/stream_test.go
2022-08-30 11:09:25 -04:00
Freddy f27a9effca
Merge pull request #13496 from maxb/fix-kv_entries-metric 2022-08-29 15:35:11 -06:00
Freddy 69d99aa8c0
Merge pull request #14364 from hashicorp/peering/term-delete 2022-08-29 15:33:18 -06:00
Max Bowsher 3aefc4123f Merge branch 'main' into fix-kv_entries-metric 2022-08-29 22:22:10 +01:00
Chris S. Kim 7b267f5c01
Merge pull request #14371 from hashicorp/kisunji/peering-metrics-update
Adjust metrics reporting for peering tracker
2022-08-29 17:16:19 -04:00
Chris S. Kim e4a154c88e Add heartbeat timeout grace period when accounting for peering health 2022-08-29 16:32:26 -04:00
Derek Menteer b641dcf03d Expose `grpc_tls` via serf for cluster peering. 2022-08-29 13:43:49 -05:00
Derek Menteer 4a01d75cf8 Add separate grpc_tls port.
To ease the transition for users, the original gRPC
port can still operate in a deprecated mode as either
plain-text or TLS mode. This behavior should be removed
in a future release whenever we no longer support this.

The resulting behavior from this commit is:
  `ports.grpc > 0 && ports.grpc_tls > 0` spawns both plain-text and tls ports.
  `ports.grpc > 0 && grpc.tls == undefined` spawns a single plain-text port.
  `ports.grpc > 0 && grpc.tls != undefined` spawns a single tls port (backwards compat mode).
2022-08-29 13:43:43 -05:00
freddygv f790d84c04 Add validation to prevent switching dialing mode
This prevents unexpected changes to the output of ShouldDial, which
should never change unless a peering is deleted and recreated.
2022-08-29 12:31:13 -06:00
Eric Haberkorn 2a370d456b
Update the structs and discovery chain for service resolver redirects to cluster peers. (#14366) 2022-08-29 09:51:32 -04:00
Chris S. Kim b1025f2dd9 Adjust metrics reporting for peering tracker 2022-08-26 17:34:17 -04:00
freddygv 19f25fc3a5 Allow terminated peerings to be deleted
Peerings are terminated when a peer decides to delete the peering from
their end. Deleting a peering sends a termination message to the peer
and triggers them to mark the peering as terminated but does NOT delete
the peering itself. This is to prevent peerings from disappearing from
both sides just because one side deleted them.

Previously the Delete endpoint was skipping the deletion if the peering
was not marked as active. However, terminated peerings are also
inactive.

This PR makes some updates so that peerings marked as terminated can be
deleted by users.
2022-08-26 10:52:47 -06:00
Chris S. Kim 516a6daefa Merge branch 'main' into catalog-service-list-filter 2022-08-26 11:16:06 -04:00
Chris S. Kim a2c857df40 Fix tests for enterprise 2022-08-26 11:14:02 -04:00
Chris S. Kim a5e9ea6d96 Merge branch 'main' into NET-638-push-server-address-updates-to-the-peer
# Conflicts:
#	agent/grpc-external/services/peerstream/stream_test.go
2022-08-26 10:43:56 -04:00
Chris S. Kim a8090268d4
Replace ring buffer with async version (#14314)
We need to watch for changes to peerings and update the server addresses which get served by the ring buffer.

Also, if there is an active connection for a peer, we are getting up-to-date server addresses from the replication stream and can safely ignore the token's addresses which may be stale.
2022-08-26 10:27:13 -04:00
alex f64af3be24
peering: add peer health metric (#14004)
Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>
2022-08-25 16:32:59 -07:00
Chris S. Kim 2e75833133 Exit loop when context is cancelled 2022-08-25 11:48:25 -04:00
skpratt c039028401
no-op: refactor usagemetrics tests for clarity and DRY cases (#14313) 2022-08-24 12:00:09 -05:00
Dan Upton 20c87d235f
dataplane: update envoy bootstrap params for consul-dataplane (#14017)
Contains 2 changes to the GetEnvoyBootstrapParams response to support
consul-dataplane.

Exposing node_name and node_id:

consul-dataplane will support providing either the node_id or node_name in its
configuration. Unfortunately, supporting both in the xDS meta adds a fair amount
of complexity (partly because most tables are currently indexed on node_name)
so for now we're going to return them both from the bootstrap params endpoint,
allowing consul-dataplane to exchange a node_id for a node_name (which it will
supply in the xDS meta).

Properly setting service for gateways:

To avoid the need to special case gateways in consul-dataplane, service will now
either be the destination service name for connect proxies, or the gateway
service name. This means it can be used as-is in Envoy configuration (i.e. as a
cluster name or in metric tags).
2022-08-24 12:03:15 +01:00
Chris S. Kim 1e7a3b8d8d PR feedback to specify Node name in test mock 2022-08-23 11:51:04 -04:00
Eric Haberkorn 3d45306e1b
Cluster peering failover disco chain changes (#14296) 2022-08-23 09:13:43 -04:00
Chris S. Kim 0ae3462e61 Add missing mock assertions 2022-08-22 13:55:01 -04:00
cskh e30d6bfc40
Fix: add missing ent meta for test (#14289) 2022-08-22 13:51:04 -04:00
Chris S. Kim 9f96f98ab6 Expose external gRPC port in autopilot
The grpc_port was added to a NodeService's meta in ea58f235f5da416224ba615405269661ba1f4d8d
2022-08-22 10:07:00 -04:00
cskh a87d8f48be
fix: missing MaxInboundConnections field in service-defaults config entry (#14072)
* fix:  missing max_inbound_connections field in merge config
2022-08-19 14:11:21 -04:00
cskh 7f66dfc780
Fix: upgrade pkg imdario/merg to prevent merge config panic (#14237)
* upgrade imdario/merg to prevent merge config panic

* test: service definition takes precedence over service-defaults in merged results
2022-08-17 21:14:04 -04:00
James Hartig a5a200e0e9 Use the maximum jitter when calculating the timeout
The timeout should include the maximum possible
jitter since the server will randomly add to it's
timeout a jitter. If the server's timeout is less
than the client's timeout then the client will
return an i/o deadline reached error.

Before:
```
time curl 'http://localhost:8500/v1/catalog/service/service?dc=other-dc&stale=&wait=600s&index=15820644'
rpc error making call: i/o deadline reached
real    10m11.469s
user    0m0.018s
sys     0m0.023s
```

After:
```
time curl 'http://localhost:8500/v1/catalog/service/service?dc=other-dc&stale=&wait=600s&index=15820644'
[...]
real    10m35.835s
user    0m0.021s
sys     0m0.021s
```
2022-08-17 10:24:09 -04:00
cskh c20d016f62
fix: missing segment and partition (#14194) 2022-08-12 15:21:39 -04:00
cskh e7b5baa3cc
feat(telemetry): add labels to serf and memberlist metrics (#14161)
* feat(telemetry): add labels to serf and memberlist metrics
* changelog
* doc update

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
2022-08-11 22:09:56 -04:00
Chris S. Kim 182399255b
Handle breaking change for ServiceVirtualIP restore (#14149)
Consul 1.13.0 changed ServiceVirtualIP to use PeeredServiceName instead of ServiceName which was a breaking change for those using service mesh and wanted to restore their snapshot after upgrading to 1.13.0.

This commit handles existing data with older ServiceName and converts it during restore so that there are no issues when restoring from older snapshots.
2022-08-11 14:47:10 -04:00
Chris S. Kim 55945a8231 Add test to verify forwarding 2022-08-11 11:16:02 -04:00
Chris S. Kim fbbb54fdc2 Register peerStreamServer internally to enable RPC forwarding 2022-08-11 11:16:02 -04:00
Chris S. Kim 534096a6ac Handle wrapped errors in isFailedPreconditionErr 2022-08-11 11:16:02 -04:00
Daniel Kimsey 4243e1e05f Add support for filtering the 'List Services' API
1. Create a bexpr filter for performing the filtering
2. Change the state store functions to return the raw (not aggregated)
   list of ServiceNodes.
3. Move the aggregate service tags by name logic out of the state store
   functions into a new function called from the RPC endpoint
4. Perform the filtering in the endpoint before aggregation.
2022-08-10 16:52:32 -05:00
Kyle Havlovitz 57afbb58ac
Merge pull request #13958 from hashicorp/gateway-wildcard-fix
Fix wildcard picking up services it shouldn't for ingress/terminating gateways
2022-08-08 12:54:40 -07:00
Kyle Havlovitz 2a0ab31ca4 Add some extra handling for destination deletes 2022-08-08 11:38:13 -07:00
freddygv 1e48b4f665 Update snapshot test 2022-08-08 09:17:15 -06:00
freddygv 65bcd3d84f Re-validate existing secrets at state store
Previously establishment and pending secrets were only checked at the
RPC layer. However, given that these are Check-and-Set transactions we
should ensure that the given secrets are still valid when persisting a
secret exchange or promotion.

Otherwise it would be possible for concurrent requests to overwrite each
other.
2022-08-08 09:06:07 -06:00
freddygv 67aa7ed15c Test fixes 2022-08-08 08:31:47 -06:00
freddygv 01b0cbcbd7 Use proto message for each secrets write op
Previously there was a field indicating the operation that triggered a
secrets write. Now there is a message for each operation and it contains
the secret ID being persisted.
2022-08-08 01:41:00 -06:00
Kyle Havlovitz 3f435f31ac Update ingress/terminating wildcard logic and handle destinations 2022-08-05 07:56:10 -07:00
freddygv 3a623f2e9d Inherit active secret when exchanging 2022-08-03 17:32:53 -05:00
freddygv b089472a12 Pass explicit signal with op for secrets write
Previously the updates to the peering secrets UUID table relied on
inferring what action triggered the update based on a reconciliation
against the existing secrets.

Instead we now explicitly require the operation to be given so that the
inference isn't necessary. This makes the UUID table logic easier to
reason about and fixes some related bugs.

There is also an update so that the peering secrets get handled on
snapshots/restores.
2022-08-03 17:25:12 -05:00
freddygv 544b3603e9 Avoid deleting peering secret UUIDs at dialers
Dialers do not keep track of peering secret UUIDs, so they should not
attempt to clean up data from that table when their peering is deleted.

We also now keep peer server addresses when marking peerings for
deletion. Peer server addresses are used by the ShouldDial() helper
when determining whether the peering is for a dialer or an acceptor.
We need to keep this data so that peering secrets can be cleaned up
accordingly.
2022-08-03 16:34:57 -05:00
Kyle Havlovitz fce49a1ec0 Fix wildcard picking up services it shouldn't for ingress/terminating gateways 2022-08-02 09:41:31 -07:00
Freddy 56144cf5f7
Various peering fixes (#13979)
* Avoid logging StreamSecretID
* Wrap additional errors in stream handler
* Fix flakiness in leader test and rename servers for clarity. There was
  a race condition where the peering was being deleted in the test
  before the stream was active. Now the test waits for the stream to be
  connected on both sides before deleting the associated peering.
* Run flaky test serially
2022-08-01 15:06:18 -06:00
Luke Kysow e9960dfdf3
peering: default to false (#13963)
* defaulting to false because peering will be released as beta
* Ignore peering disabled error in bundles cachetype

Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
Co-authored-by: freddygv <freddy@hashicorp.com>
Co-authored-by: Matt Keeler <mjkeeler7@gmail.com>
2022-08-01 15:22:36 -04:00
Freddy a54903b0f4
Merge branch 'main' into fix-kv_entries-metric 2022-08-01 13:19:27 -06:00
Matt Keeler 795e5830c6
Implement/Utilize secrets for Peering Replication Stream (#13977) 2022-08-01 10:33:18 -04:00
alex 0f6354685b
block PeerName register requests (#13887)
Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>
2022-07-29 14:36:22 -07:00
Luke Kysow 17594a123e
peering: retry establishing connection more quickly on certain errors (#13938)
When we receive a FailedPrecondition error, retry that more quickly
because we expect it will resolve shortly. This is particularly
important in the context of Consul servers behind a load balancer
because when establishing a connection we have to retry until we
randomly land on a leader node.

The default retry backoff goes from 2s, 4s, 8s, etc. which can result in
very long delays quite quickly. Instead, this backoff retries in 8ms
five times, then goes exponentially from there: 16ms, 32ms, ... up to a
max of 8152ms.
2022-07-29 13:04:32 -07:00
acpana 70e052f35f
sync more acl enforcement
sync w ent at 32756f7

Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>
2022-07-28 12:01:52 -07:00
Ashwin Venkatesh e4aaf467eb
Add peer counts to emitted metrics. (#13930) 2022-07-27 18:34:04 -04:00
Luke Kysow a2290791b2
Merge pull request #13924 from hashicorp/lkysow/util-metric-peering
peering: don't track imported services/nodes in usage
2022-07-27 14:49:55 -07:00
Chris S. Kim 213e985d17 Reduce arm64 flakes for TestConnectCA_ConfigurationSet_ChangeKeyConfig_Primary
There were 16 combinations of tests but 4 of them were duplicates since the default key type and bits were "ec" and 256. That entry was commented out to reduce the subtest count to 12.

testrpc.WaitForLeader was failing on arm64 environments; the cause is unknown but it might be due to the environment being flooded with parallel tests making RPC calls. The RPC polling+retry was replaced with a simpler check for leadership based on raft.
2022-07-27 13:54:34 -04:00
Chris S. Kim c80ab10527 Retry checks for virtual IP metadata 2022-07-27 13:54:34 -04:00
Chris S. Kim 146dd93775 Sort slice of ServiceNames deterministically 2022-07-27 13:54:34 -04:00
Luke Kysow 92c1f30359 peering: don't track imported services/nodes in usage
Services/nodes that are imported from other peers are stored in
state. We don't want to count those as part of our own cluster's usage.
2022-07-27 09:08:51 -07:00
cskh f7858a1bda
chore: clarify the error message: service.service must not be empty (#13907)
- when register service using catalog endpoint, the key of service
  name actually should be "service". Add this information to the
  error message will help user to quickly fix in the request.
2022-07-27 10:16:46 -04:00
Chris S. Kim 1f8ae56951
Preserve PeeringState on upsert (#13666)
Fixes a bug where if the generate token is called twice, the second call upserts the zero-value (undefined) of PeeringState.
2022-07-25 14:37:56 -04:00
DanStough f690d299c9 feat: convert destination address to slice 2022-07-25 12:31:58 -04:00
freddygv 5bbc0cc615 Add ACL enforcement to peering endpoints 2022-07-25 09:34:29 -06:00
Luke Kysow d21f793b74
peering: add config to enable/disable peering (#13867)
* peering: add config to enable/disable peering

Add config:

```
peering {
  enabled = true
}
```

Defaults to true. When disabled:
1. All peering RPC endpoints will return an error
2. Leader won't start its peering establishment goroutines
3. Leader won't start its peering deletion goroutines
2022-07-22 15:20:21 -07:00
alex 7bd55578cc
peering: emit exported services count metric (#13811)
Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>
2022-07-22 12:05:08 -07:00
Eric Haberkorn e044343105
Add Cluster Peering Failover Support to Prepared Queries (#13835)
Add peering failover support to prepared queries
2022-07-22 09:14:43 -04:00
acpana b847f656a8
Rename peering internal to ~
sync ENT to 5679392c81

Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>
2022-07-21 10:51:05 -07:00
Daniel Upton e3bff8fb39 proxycfg-glue: server-local implementation of `PeeredUpstreams`
This is the OSS portion of enterprise PR 2352.

It adds a server-local implementation of the proxycfg.PeeredUpstreams interface
based on a blocking query against the server's state store.

It also fixes an omission in the Virtual IP freeing logic where we were never
updating the max index (and therefore blocking queries against
VirtualIPsForAllImportedServices would not return on service deletion).
2022-07-21 13:51:59 +01:00
Paul Glass a9f17c0f99
Extract AWS auth implementation out of Consul (#13760) 2022-07-19 16:26:44 -05:00
alex 64b3705a31
peering: refactor reconcile, cleanup (#13795)
Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>
2022-07-19 11:43:29 -07:00
alex 4ff097c4cf
peering: track exported services (#13784)
Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>
2022-07-18 10:20:04 -07:00
Luke Kysow 3968f21339
Add docs for peerStreamServer vs peeringServer. (#13781) 2022-07-15 12:23:05 -07:00