Commit Graph

1951 Commits

Author SHA1 Message Date
Dan Upton 36a3d00f0d
bug: fix goroutine leaks caused by incorrect usage of `WatchCh` (#14916)
memdb's `WatchCh` method creates a goroutine that will publish to the
returned channel when the watchset is triggered or the given context
is canceled. Although this is called out in its godoc comment, it's
not obvious that this method creates a goroutine who's lifecycle you
need to manage.

In the xDS capacity controller, we were calling `WatchCh` on each
iteration of the control loop, meaning the number of goroutines would
grow on each autopilot event until there was catalog churn.

In the catalog config source, we were calling `WatchCh` with the
background context, meaning that the goroutine would keep running after
the sync loop had terminated.
2022-10-13 12:04:27 +01:00
Paul Glass 8cf430140a
gRPC server metrics (#14922)
* Move stats.go from grpc-internal to grpc-middleware
* Update grpc server metrics with server type label
* Add stats test to grpc-external
* Remove global metrics instance from grpc server tests
2022-10-11 17:00:32 -05:00
cskh 45278cb69e
fix(peering): add missing grpc_tls_port for server address reconciliation (#14944) 2022-10-11 10:56:29 -04:00
Chris S. Kim 9d4fb0445a Include stream-related information in peering endpoints 2022-10-10 13:20:14 -06:00
Paul Glass a3fccf5e5b
Merge central config for GetEnvoyBootstrapParams (#14869)
This fixes GetEnvoyBootstrapParams to merge in proxy-defaults and service-defaults.

Co-authored-by: Dan Upton <daniel@floppy.co>
2022-10-10 12:40:27 -05:00
freddygv ae9b3eb662 Fixup test 2022-10-07 09:34:16 -06:00
freddygv 6ef8d329d2 Require Connect and TLS to generate peering tokens
By requiring Connect and a gRPC TLS listener we can automatically
configure TLS for all peering control-plane traffic.
2022-10-07 09:06:29 -06:00
freddygv a21e5799f7 Use internal server certificate for peering TLS
A previous commit introduced an internally-managed server certificate
to use for peering-related purposes.

Now the peering token has been updated to match that behavior:
- The server name matches the structure of the server cert
- The CA PEMs correspond to the Connect CA

Note that if Conect is disabled, and by extension the Connect CA, we
fall back to the previous behavior of returning the manually configured
certs and local server SNI.

Several tests were updated to use the gRPC TLS port since they enable
Connect by default. This means that the peering token will embed the
Connect CA, and the dialer will expect a TLS listener.
2022-10-07 09:05:32 -06:00
John Murret 08203ace4a
Upgrade serf to v0.10.1 and memberlist to v0.5.0 to get memberlist size metrics and broadcast queue depth metric (#14873)
* updating to serf v0.10.1 and memberlist v0.5.0 to get memberlist size metrics and memberlist broadcast queue depth metric

* update changelog

* update changelog

* correcting changelog

* adding "QueueCheckInterval" for memberlist to test

* updating integration test containers to grab latest api
2022-10-04 17:51:37 -06:00
Eric Haberkorn 2178e38204
Rename `PeerName` to `Peer` on prepared queries and exported services (#14854) 2022-10-04 14:46:15 -04:00
freddygv 2c5caec97c Share mgw addrs in peering stream if needed
This commit adds handling so that the replication stream considers
whether the user intends to peer through mesh gateways.

The subscription will return server or mesh gateway addresses depending
on the mesh configuration setting. These watches can be updated at
runtime by modifying the mesh config entry.
2022-10-03 11:42:20 -06:00
freddygv 17463472b7 Return mesh gateway addrs if peering through mgw 2022-10-03 11:35:10 -06:00
Eric Haberkorn 5fd1e6daea
Add exported services event to cluster peering replication. (#14797) 2022-09-29 15:37:19 -04:00
malizz 5c470b28dd
Support Stale Queries for Trust Bundle Lookups (#14724)
* initial commit

* add tags, add conversations

* add test for query options utility functions

* update previous tests

* fix test

* don't error out on empty context

* add changelog

* update decode config
2022-09-28 09:56:59 -07:00
Nick Ethier 5e4b3ef5d4
add HCP integration component (#14723)
* add HCP integration

* lint: use non-deprecated logging interface
2022-09-26 14:58:15 -04:00
Chris S. Kim 7ec8a0667a Add new internal endpoint to list exported services to a peer 2022-09-23 09:43:56 -04:00
freddygv 0c3853a2d0 Add server certificate manager
This certificate manager will request a leaf certificate for server
agents and then keep them up to date.
2022-09-16 17:57:10 -06:00
freddygv ef99b30cb8 Generate ACL token for server management
This commit introduces a new ACL token used for internal server
management purposes.

It has a few key properties:
- It has unlimited permissions.
- It is persisted through Raft as System Metadata rather than in the
ACL tokens table. This is to avoid users seeing or modifying it.
- It is re-generated on leadership establishment.
2022-09-16 17:54:34 -06:00
Kyle Havlovitz 40da079f18
Merge pull request #14598 from hashicorp/root-removal-fix
connect/ca: Don't discard old roots on primaryInitialize
2022-09-15 14:36:01 -07:00
Kyle Havlovitz fe10009a12 connect/ca: don't discard old roots on primaryInitialize 2022-09-15 12:59:09 -07:00
DanStough b37a2ba889 feat(peering): validate server name conflicts on establish 2022-09-14 11:37:30 -04:00
Derek Menteer 5d1487e167
Add CSR check for number of URIs. (#14579)
Add CSR check for number of URIs.
2022-09-13 14:21:47 -05:00
Derek Menteer cfcd9f2a2c Add input validation for auto-config JWT authorization checks. 2022-09-13 11:16:36 -05:00
skpratt cf6c1d9388
add non-double-prefixed metrics (#14193) 2022-09-09 12:13:43 -05:00
Dan Upton 9fe6c33c0d
xDS Load Balancing (#14397)
Prior to #13244, connect proxies and gateways could only be configured by an
xDS session served by the local client agent.

In an upcoming release, it will be possible to deploy a Consul service mesh
without client agents. In this model, xDS sessions will be handled by the
servers themselves, which necessitates load-balancing to prevent a single
server from receiving a disproportionate amount of load and becoming
overwhelmed.

This introduces a simple form of load-balancing where Consul will attempt to
achieve an even spread of load (xDS sessions) between all healthy servers.
It does so by implementing a concurrent session limiter (limiter.SessionLimiter)
and adjusting the limit according to autopilot state and proxy service
registrations in the catalog.

If a server is already over capacity (i.e. the session limit is lowered),
Consul will begin draining sessions to rebalance the load. This will result
in the client receiving a `RESOURCE_EXHAUSTED` status code. It is the client's
responsibility to observe this response and reconnect to a different server.

Users of the gRPC client connection brokered by the
consul-server-connection-manager library will get this for free.

The rate at which Consul will drain sessions to rebalance load is scaled
dynamically based on the number of proxies in the catalog.
2022-09-09 15:02:01 +01:00
Derek Menteer 8efe862b76 Merge branch 'main' of github.com:hashicorp/consul into derekm/split-grpc-ports 2022-09-08 14:53:08 -05:00
Derek Menteer 6aaf1c6035 Various cleanups. 2022-09-08 10:51:50 -05:00
Chris S. Kim 9b5c5c5062
Merge pull request #14285 from hashicorp/NET-638-push-server-address-updates-to-the-peer
peering: Subscribe to server address changes and push updates to peers
2022-09-07 09:30:45 -04:00
Freddy a7f38384ae
Add SpiffeID for Consul server agents (#14485)
Co-authored-by: Eric Haberkorn <erichaberkorn@gmail.com>

By adding a SpiffeID for server agents, servers can now request a leaf
certificate from the Connect CA.

This new Spiffe ID has a key property: servers are identified by their
datacenter name and trust domain. All servers that share these
attributes will share a ServerURI.

The aim is to use these certificates to verify the server name of ANY
server in a Consul datacenter.
2022-09-06 17:58:13 -06:00
Daniel Upton 8cd6c9f95e proxycfg-glue: server-local implementation of ResolvedServiceConfig
This is the OSS portion of enterprise PR 2460.

Introduces a server-local implementation of the proxycfg.ResolvedServiceConfig
interface that sources data from a blocking query against the server's state
store.

It moves the service config resolution logic into the agent/configentry package
so that it can be used in both the RPC handler and data source.

I've also done a little re-arranging and adding comments to call out data
sources for which there is to be no server-local equivalent.
2022-09-06 23:27:25 +01:00
Derek Menteer b50bc443f3 Merge branch 'main' of github.com:hashicorp/consul into derekm/split-grpc-ports 2022-09-06 10:51:04 -05:00
Derek Menteer d771725a14 Add kv txn get-not-exists operation. 2022-09-06 10:28:59 -05:00
Chris S. Kim 9ad8bf67a5 Add testcase for parsing grpc_port 2022-09-06 10:17:44 -04:00
Kyle Havlovitz a484a759c8
Merge pull request #14429 from hashicorp/ca-prune-intermediates
Prune old expired intermediate certs when appending a new one
2022-09-02 15:34:33 -07:00
Derek Menteer cb478b0e61 Address PR comments. 2022-09-01 16:54:24 -05:00
Kyle Havlovitz 90fa16c8b5 Prune intermediates before appending new one 2022-09-01 14:24:30 -07:00
malizz ef5f697121
Add additional parameters to envoy passive health check config (#14238)
* draft commit

* add changelog, update test

* remove extra param

* fix test

* update type to account for nil value

* add test for custom passive health check

* update comments and tests

* update description in docs

* fix missing commas
2022-09-01 09:59:11 -07:00
Chris S. Kim e70ba97e45 Add Internal.ServiceDump support for querying by PeerName 2022-09-01 10:32:59 -04:00
Derek Menteer ab9d421ba2 Change serf-tag references to field references. 2022-08-31 16:38:42 -05:00
Kyle Havlovitz c5370d52e9 Prune old expired intermediate certs when appending a new one 2022-08-31 11:41:58 -07:00
Chris S. Kim 9c157e40a3 Merge branch 'main' into NET-638-push-server-address-updates-to-the-peer
# Conflicts:
#	agent/grpc-external/services/peerstream/stream_test.go
2022-08-30 11:09:25 -04:00
Freddy f27a9effca
Merge pull request #13496 from maxb/fix-kv_entries-metric 2022-08-29 15:35:11 -06:00
Freddy 69d99aa8c0
Merge pull request #14364 from hashicorp/peering/term-delete 2022-08-29 15:33:18 -06:00
Max Bowsher 3aefc4123f Merge branch 'main' into fix-kv_entries-metric 2022-08-29 22:22:10 +01:00
Chris S. Kim 7b267f5c01
Merge pull request #14371 from hashicorp/kisunji/peering-metrics-update
Adjust metrics reporting for peering tracker
2022-08-29 17:16:19 -04:00
Chris S. Kim e4a154c88e Add heartbeat timeout grace period when accounting for peering health 2022-08-29 16:32:26 -04:00
Derek Menteer b641dcf03d Expose `grpc_tls` via serf for cluster peering. 2022-08-29 13:43:49 -05:00
Derek Menteer 4a01d75cf8 Add separate grpc_tls port.
To ease the transition for users, the original gRPC
port can still operate in a deprecated mode as either
plain-text or TLS mode. This behavior should be removed
in a future release whenever we no longer support this.

The resulting behavior from this commit is:
  `ports.grpc > 0 && ports.grpc_tls > 0` spawns both plain-text and tls ports.
  `ports.grpc > 0 && grpc.tls == undefined` spawns a single plain-text port.
  `ports.grpc > 0 && grpc.tls != undefined` spawns a single tls port (backwards compat mode).
2022-08-29 13:43:43 -05:00
freddygv f790d84c04 Add validation to prevent switching dialing mode
This prevents unexpected changes to the output of ShouldDial, which
should never change unless a peering is deleted and recreated.
2022-08-29 12:31:13 -06:00
Eric Haberkorn 2a370d456b
Update the structs and discovery chain for service resolver redirects to cluster peers. (#14366) 2022-08-29 09:51:32 -04:00