Commit Graph

1989 Commits

Author SHA1 Message Date
Kyle Havlovitz f5c5d2f5c6
auto-config: relax node name validation for JWT authorization (#15370)
* auto-config: relax node name validation for JWT authorization

This changes the JWT authorization logic to allow all non-whitespace,
non-quote characters when validating node names. Consul had previously
allowed these characters in node names, until this validation was added
to fix a security vulnerability with whitespace/quotes being passed to
the `bexpr` library. This unintentionally broke node names with
characters like `.` which aren't related to this vulnerability.

* Update website/content/docs/agent/config/cli-flags.mdx

Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>

Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>
2022-11-14 18:24:40 -06:00
Dhia Ayachi 219a3c5bd3
Leadership transfer cmd (#14132)
* add leadership transfer command

* add RPC call test (flaky)

* add missing import

* add changelog

* add command registration

* Apply suggestions from code review

Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>

* add the possibility of providing an id to raft leadership transfer. Add few tests.

* delete old file from cherry pick

* rename changelog filename to PR #

* rename changelog and fix import

* fix failing test

* check for OperatorWrite

Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>

* rename from leader-transfer to transfer-leader

* remove version check and add test for operator read

* move struct to operator.go

* first pass

* add code for leader transfer in the grpc backend and tests

* wire the http endpoint to the new grpc endpoint

* remove the RPC endpoint

* remove non needed struct

* fix naming

* add mog glue to API

* fix comment

* remove dead code

* fix linter error

* change package name for proto file

* remove error wrapping

* fix failing test

* add command registration

* add grpc service mock tests

* fix receiver to be pointer

* use defined values

Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>

* reuse MockAclAuthorizer

* add documentation

* remove usage of external.TokenFromContext

* fix failing tests

* fix proto generation

* Apply suggestions from code review

Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com>

* Apply suggestions from code review

* add more context in doc for the reason

* Apply suggestions from docs code review

Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>

* regenerate proto

* fix linter errors

Co-authored-by: github-team-consul-core <github-team-consul-core@hashicorp.com>
Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com>
Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
2022-11-14 15:35:12 -05:00
Freddy e96c0e1dad
Fixup authz for data imported from peers (#15347)
There are a few changes that needed to be made to to handle authorizing
reads for imported data:

- If the data was imported from a peer we should not attempt to read the
  data using the traditional authz rules. This is because the name of
  services/nodes in a peer cluster are not equivalent to those of the
  importing cluster.

- If the data was imported from a peer we need to check whether the
  token corresponds to a service, meaning that it has service:write
  permissions, or to a local read only token that can read all
  nodes/services in a namespace.

This required changes at the policyAuthorizer level, since that is the
only view available to OSS Consul, and at the enterprise
partition/namespace level.
2022-11-14 11:36:27 -07:00
Dan Stough ee56e06f22
[OSS] fix: wait and try longer to peer through mesh gw (#15328) 2022-11-10 13:54:00 -05:00
Kyle Schochenmaier 2b1e5f69e2
removes ioutil usage everywhere which was deprecated in go1.16 (#15297)
* update go version to 1.18 for api and sdk, go mod tidy
* removes ioutil usage everywhere which was deprecated in go1.16 in favour of io and os packages. Also introduces a lint rule which forbids use of ioutil going forward.
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
2022-11-10 10:26:01 -06:00
malizz 8d2ed1999d
update ACLs for cluster peering (#15317)
* update ACLs for cluster peering

* add changelog

* Update .changelog/15317.txt

Co-authored-by: Eric Haberkorn <erichaberkorn@gmail.com>

Co-authored-by: Eric Haberkorn <erichaberkorn@gmail.com>
2022-11-09 13:02:58 -08:00
Derek Menteer 9e76d274ec
Fix mesh gateway configuration with proxy-defaults (#15186)
* Fix mesh gateway proxy-defaults not affecting upstreams.

* Clarify distinction with upstream settings

Top-level mesh gateway mode in proxy-defaults and service-defaults gets
merged into NodeService.Proxy.MeshGateway, and only gets merged with
the mode attached to an an upstream in proxycfg/xds.

* Fix mgw mode usage for peered upstreams

There were a couple issues with how mgw mode was being handled for
peered upstreams.

For starters, mesh gateway mode from proxy-defaults
and the top-level of service-defaults gets stored in
NodeService.Proxy.MeshGateway, but the upstream watch for peered data
was only considering the mesh gateway config attached in
NodeService.Proxy.Upstreams[i]. This means that applying a mesh gateway
mode via global proxy-defaults or service-defaults on the downstream
would not have an effect.

Separately, transparent proxy watches for peered upstreams didn't
consider mesh gateway mode at all.

This commit addresses the first issue by ensuring that we overlay the
upstream config for peered upstreams as we do for non-peered. The second
issue is addressed by re-using setupWatchesForPeeredUpstream when
handling transparent proxy updates.

Note that for transparent proxies we do not yet support mesh gateway
mode per upstream, so the NodeService.Proxy.MeshGateway mode is used.

* Fix upstream mesh gateway mode handling in xds

This commit ensures that when determining the mesh gateway mode for
peered upstreams we consider the NodeService.Proxy.MeshGateway config as
a baseline.

In absense of this change, setting a mesh gateway mode via
proxy-defaults or the top-level of service-defaults will not have an
effect for peered upstreams.

* Merge service/proxy defaults in cfg resolver

Previously the mesh gateway mode for connect proxies would be
merged at three points:

1. On servers, in ComputeResolvedServiceConfig.
2. On clients, in MergeServiceConfig.
3. On clients, in proxycfg/xds.

The first merge returns a ServiceConfigResponse where there is a
top-level MeshGateway config from proxy/service-defaults, along with
per-upstream config.

The second merge combines per-upstream config specified at the service
instance with per-upstream config specified centrally.

The third merge combines the NodeService.Proxy.MeshGateway
config containing proxy/service-defaults data with the per-upstream
mode. This third merge is easy to miss, which led to peered upstreams
not considering the mesh gateway mode from proxy-defaults.

This commit removes the third merge, and ensures that all mesh gateway
config is available at the upstream. This way proxycfg/xds do not need
to do additional overlays.

* Ensure that proxy-defaults is considered in wc

Upstream defaults become a synthetic Upstream definition under a
wildcard key "*". Now that proxycfg/xds expect Upstream definitions to
have the final MeshGateway values, this commit ensures that values from
proxy-defaults/service-defaults are the default for this synthetic
upstream.

* Add changelog.

Co-authored-by: freddygv <freddy@hashicorp.com>
2022-11-09 10:14:29 -06:00
Dan Upton acfdbb23a9
chore: remove unused argument from MergeNodeServiceWithCentralConfig (#15024)
Previously, the MergeNodeServiceWithCentralConfig method accepted a
ServiceSpecificRequest argument, of which only the Datacenter and
QueryOptions fields were used.

Digging a little deeper, it turns out these fields were only passed
down to the ComputeResolvedServiceConfig method (through the
ServiceConfigRequest struct) which didn't actually use them.

As such, not all call-sites passed a valid ServiceSpecificRequest
so it's safer to remove the argument altogether to prevent future
changes from depending on it.
2022-11-09 14:54:57 +00:00
Derek Menteer a8eb047ee6
Bring back parameter ServerExternalAddresses in GenerateToken endpoint (#15267)
Re-add ServerExternalAddresses parameter in GenerateToken endpoint

This reverts commit 5e156772f6a7fba5324eb6804ae4e93c091229a6
and adds extra functionality to support newer peering behaviors.
2022-11-08 14:55:18 -06:00
Chris S. Kim dbe3dc96f3
Update hcp-scada-provider to fix diamond dependency problem with go-msgpack (#15185) 2022-11-07 11:34:30 -05:00
Dan Stough 3eb3cf3b0d
fix: persist peering CA updates to dialing clusters (#15243)
fix: persist peering CA updates to dialing clusters
2022-11-04 12:53:20 -04:00
Derek Menteer cad89029dd Decrease retry time for failed peering connections. 2022-10-31 14:30:27 -05:00
R.B. Boyer 879584a773
test: fix flaky TestSubscribeBackend_IntegrationWithServer_DeliversAllMessages test (#15195)
Allow for some message duplication in subscription events during assertions.

I'm pretty sure the subscriptions machinery allows for messages to occasionally
be duplicated instead of dropping them, as a once-and-only-once queue is a pipe
dream and you have to pick one of the other two options.
2022-10-31 12:10:43 -05:00
Derek Menteer 58f15db4c4 Allow peering endpoints to bypass verify_incoming. 2022-10-31 09:56:30 -05:00
Eric Haberkorn 57fb729547
Fix peering metrics bug (#15178)
This bug was caused by the peering health metric being set to NaN.
2022-10-28 10:51:12 -04:00
Luke Kysow 6b1ec05470
autoencrypt: helpful error for clients with wrong dc (#14832)
* autoencrypt: helpful error for clients with wrong dc

If clients have set a different datacenter than the servers they're
connecting with for autoencrypt, give a helpful error message.
2022-10-25 10:13:41 -07:00
Chris S. Kim ae1646706f Regenerate files according to 1.19.2 formatter 2022-10-24 16:12:08 -04:00
Iryna Shustava a3a6743e0a
proxycfg: watch service-defaults config entries (#15025)
To support Destinations on the service-defaults (for tproxy with terminating gateway), we need to now also make servers watch service-defaults config entries.
2022-10-24 12:50:28 -06:00
Chris S. Kim 06f583a7c2 Move oss-only test to its own file 2022-10-24 14:17:43 -04:00
R.B. Boyer 87432a8dd4
chore: update golangci-lint to v1.50.1 (#15022) 2022-10-24 11:48:02 -05:00
Venu Yanamandra 3dd12a2960
Update error message when restoring ENT snapshot in OSS (#15066) 2022-10-24 11:40:26 -04:00
Chris S. Kim 569c3bce88 Update expected encoding in test
go-memdb was updated in v1.3.3 to make integers in indexes sortable, which changed how integers were encoded.
2022-10-20 14:32:42 -04:00
freddygv f3548167fc Use plain TaggedAddressWAN 2022-10-19 16:32:44 -06:00
freddygv 1b589ba964 Add unit test 2022-10-19 16:26:15 -06:00
cskh c0dc93e5b8 fix: wan address isn't used by peering token 2022-10-19 16:33:25 -04:00
cskh e18434bcb1
peering: skip registering duplicate node and check from the peer (#14994)
* peering: skip register duplicate node and check from the peer

* Prebuilt the nodes map and checks map to avoid repeated for loop

* use key type to struct: node id, service id, and check id
2022-10-18 16:19:24 -04:00
Chris S. Kim e4c20ec190
Refactor client RPC timeouts (#14965)
Fix an issue where rpc_hold_timeout was being used as the timeout for non-blocking queries. Users should be able to tune read timeouts without fiddling with rpc_hold_timeout. A new configuration `rpc_read_timeout` is created.

Refactor some implementation from the original PR 11500 to remove the misleading linkage between RPCInfo's timeout (used to retry in case of certain modes of failures) and the client RPC timeouts.
2022-10-18 15:05:09 -04:00
Derek Menteer 25d3d244f0 Fix issue with incorrect method signature on test. 2022-10-14 11:04:57 -05:00
Freddy bbf6b17e44
Merge pull request #14981 from hashicorp/peering/dial-through-gateways 2022-10-14 09:44:56 -06:00
Derek Menteer 6c355134e8 Add tests for peering state snapshots / restores. 2022-10-14 09:48:04 -05:00
Derek Menteer 27bbdced8d Add test for ExportedServicesForAllPeersByName 2022-10-14 09:48:04 -05:00
freddygv 452dc2867c Lint 2022-10-13 15:55:55 -06:00
freddygv 37a765f8df Update leader routine to maybe use gateways 2022-10-13 14:58:00 -06:00
freddygv 239f0e3084 Update peering establishment to maybe use gateways
When peering through mesh gateways we expect outbound dials to peer
servers to flow through the local mesh gateway addresses.

Now when establishing a peering we get a list of dial addresses as a
ring buffer that includes local mesh gateway addresses if the local DC
is configured to peer through mesh gateways. The ring buffer includes
the mesh gateway addresses first, but also includes the remote server
addresses as a fallback.

This fallback is present because it's possible that direct egress from
the servers may be allowed. If not allowed then the leader will cycle
back to a mesh gateway address through the ring.

When attempting to dial the remote servers we retry up to a fixed
timeout. If using mesh gateways we also have an initial wait in
order to allow for the mesh gateways to configure themselves.

Note that if we encounter a permission denied error we do not retry
since that error indicates that the secret in the peering token is
invalid.
2022-10-13 14:57:55 -06:00
malizz 27d0181806
increase protobuf size limit for cluster peering (#14976) 2022-10-13 13:46:51 -07:00
Derek Menteer d47c9b446c Prevent consul peer-exports by discovery chain. 2022-10-13 12:45:09 -05:00
Derek Menteer ee49db9a2f Prevent the "consul" service from being exported. 2022-10-13 12:45:09 -05:00
Derek Menteer bfa4adbfce Add remote peer partition and datacenter info. 2022-10-13 10:37:41 -05:00
Dan Upton 36a3d00f0d
bug: fix goroutine leaks caused by incorrect usage of `WatchCh` (#14916)
memdb's `WatchCh` method creates a goroutine that will publish to the
returned channel when the watchset is triggered or the given context
is canceled. Although this is called out in its godoc comment, it's
not obvious that this method creates a goroutine who's lifecycle you
need to manage.

In the xDS capacity controller, we were calling `WatchCh` on each
iteration of the control loop, meaning the number of goroutines would
grow on each autopilot event until there was catalog churn.

In the catalog config source, we were calling `WatchCh` with the
background context, meaning that the goroutine would keep running after
the sync loop had terminated.
2022-10-13 12:04:27 +01:00
Paul Glass 8cf430140a
gRPC server metrics (#14922)
* Move stats.go from grpc-internal to grpc-middleware
* Update grpc server metrics with server type label
* Add stats test to grpc-external
* Remove global metrics instance from grpc server tests
2022-10-11 17:00:32 -05:00
cskh 45278cb69e
fix(peering): add missing grpc_tls_port for server address reconciliation (#14944) 2022-10-11 10:56:29 -04:00
Chris S. Kim 9d4fb0445a Include stream-related information in peering endpoints 2022-10-10 13:20:14 -06:00
Paul Glass a3fccf5e5b
Merge central config for GetEnvoyBootstrapParams (#14869)
This fixes GetEnvoyBootstrapParams to merge in proxy-defaults and service-defaults.

Co-authored-by: Dan Upton <daniel@floppy.co>
2022-10-10 12:40:27 -05:00
freddygv ae9b3eb662 Fixup test 2022-10-07 09:34:16 -06:00
freddygv 6ef8d329d2 Require Connect and TLS to generate peering tokens
By requiring Connect and a gRPC TLS listener we can automatically
configure TLS for all peering control-plane traffic.
2022-10-07 09:06:29 -06:00
freddygv a21e5799f7 Use internal server certificate for peering TLS
A previous commit introduced an internally-managed server certificate
to use for peering-related purposes.

Now the peering token has been updated to match that behavior:
- The server name matches the structure of the server cert
- The CA PEMs correspond to the Connect CA

Note that if Conect is disabled, and by extension the Connect CA, we
fall back to the previous behavior of returning the manually configured
certs and local server SNI.

Several tests were updated to use the gRPC TLS port since they enable
Connect by default. This means that the peering token will embed the
Connect CA, and the dialer will expect a TLS listener.
2022-10-07 09:05:32 -06:00
John Murret 08203ace4a
Upgrade serf to v0.10.1 and memberlist to v0.5.0 to get memberlist size metrics and broadcast queue depth metric (#14873)
* updating to serf v0.10.1 and memberlist v0.5.0 to get memberlist size metrics and memberlist broadcast queue depth metric

* update changelog

* update changelog

* correcting changelog

* adding "QueueCheckInterval" for memberlist to test

* updating integration test containers to grab latest api
2022-10-04 17:51:37 -06:00
Eric Haberkorn 2178e38204
Rename `PeerName` to `Peer` on prepared queries and exported services (#14854) 2022-10-04 14:46:15 -04:00
freddygv 2c5caec97c Share mgw addrs in peering stream if needed
This commit adds handling so that the replication stream considers
whether the user intends to peer through mesh gateways.

The subscription will return server or mesh gateway addresses depending
on the mesh configuration setting. These watches can be updated at
runtime by modifying the mesh config entry.
2022-10-03 11:42:20 -06:00
freddygv 17463472b7 Return mesh gateway addrs if peering through mgw 2022-10-03 11:35:10 -06:00