Commit Graph

4982 Commits

Author SHA1 Message Date
Eric Haberkorn 5dd131fee8
Remove the `connect.enable_serverless_plugin` agent configuration option (#15710) 2022-12-08 14:46:42 -05:00
Dhia Ayachi b459d58e8d
add multilimiter and tests (#15467)
* add multilimiter and tests

* exporting LimitedEntity

* go mod tidy

* Apply suggestions from code review

Co-authored-by: John Murret <john.murret@hashicorp.com>

* add config update and rename config params

* add doc string and split config

* Apply suggestions from code review

Co-authored-by: Dan Upton <daniel@floppy.co>

* use timer to avoid go routine leak and change the interface

* add comments to tests

* fix failing test

* add prefix with config edge, refactor tests

* Apply suggestions from code review

Co-authored-by: Dan Upton <daniel@floppy.co>

* refactor to apply configs for limiters under a prefix

* add fuzz tests and fix bugs found. Refactor reconcile loop to have a simpler logic

* make KeyType an exported type

* split the config and limiter trees to fix race conditions in config update

* rename variables

* fix race in test and remove dead code

* fix reconcile loop to not create a timer on each loop

* add extra benchmark tests and fix tests

* fix benchmark test to pass value to func

* use a separate go routine to write limiters (#15643)

* use a separate go routine to write limiters

* Add updating limiter when another limiter is created

* fix waiter to be a ticker, so we commit more than once.

* fix tests and add tests for coverage

* unexport members and add tests

* make UpdateConfig thread safe and multi call to Run safe

* replace swith with if

* fix review comments

* replace time.sleep with retries

* fix flaky test and remove unnecessary init

* fix test races

* remove unnecessary negative test case

* remove fixed todo

Co-authored-by: John Murret <john.murret@hashicorp.com>
Co-authored-by: Dan Upton <daniel@floppy.co>
2022-12-08 14:42:07 -05:00
cskh df06ab4181
Flakiness test: case-cfg-splitter-peering-ingress-gateways (#15707)
* integ-test: fix flaky test - case-cfg-splitter-peering-ingress-gateways

* add retry peering to all peering cases

Co-authored-by: Dan Stough <dan.stough@hashicorp.com>
2022-12-07 20:19:34 -05:00
Derek Menteer f17a4f07c5
Fix local mesh gateway with peering discovery chains. (#15690)
Fix local mesh gateway with peering discovery chains.

Prior to this patch, discovery chains with peers would not
properly honor the mesh gateway mode for two reasons.

1. An incorrect target upstream ID was used to lookup the
mesh gateway mode. To fix this, the parent upstream uid is
now used instead of the discovery-chain-target-uid to find
the intended mesh gateway mode.

2. The watch for local mesh gateways was never initialized
for discovery chains. To fix this, the discovery chains are
now scanned, and a local GW watch is spawned if: the mesh
gateway mode is local and the target is a peering connection.
2022-12-07 13:07:42 -06:00
R.B. Boyer ec0857075e
connect: use -dev-no-store-token for test vaults to reduce source of flakes (#15691)
It turns out that by default the dev mode vault server will attempt to interact with the
filesystem to store the provided root token. If multiple vault instances are running
they'll all awkwardly share the filesystem and if timing results in one server stopping
while another one is starting then the starting one will error with:

    Error initializing Dev mode: rename /home/circleci/.vault-token.tmp /home/circleci/.vault-token: no such file or directory

This change uses `-dev-no-store-token` to bypass that source of flakes. Also the
stdout/stderr from the vault process is included if the test fails.

The introduction of more `t.Parallel` use in https://github.com/hashicorp/consul/pull/15669
increased the likelihood of this failure, but any of the tests with multiple vaults in use
(or running multiple package tests in parallel that all use vault) were eventually going
to flake on this.
2022-12-06 13:15:13 -06:00
R.B. Boyer ba6b24babf
connect: ensure all vault connect CA tests use limited privilege tokens (#15669)
All of the current integration tests where Vault is the Connect CA now use non-root tokens for the test. This helps us detect privilege changes in the vault model so we can keep our guides up to date.

One larger change was that the RenewIntermediate function got refactored slightly so it could be used from a test, rather than the large duplicated function we were testing in a test which seemed error prone.
2022-12-06 10:06:36 -06:00
R.B. Boyer a88d1239e3
Detect Vault 1.11+ import in secondary datacenters and update default issuer (#15661)
The fix outlined and merged in #15253 fixed the issue as it occurs in the primary
DC. There is a similar issue that arises when vault is used as the Connect CA in a
secondary datacenter that is fixed by this PR.

Additionally: this PR adds support to run the existing suite of vault related integration
tests against the last 4 versions of vault (1.9, 1.10, 1.11, 1.12)
2022-12-05 15:39:21 -06:00
Chris S. Kim 5d06668248
Add warn log when all ACL policies are filtered out (#15632) 2022-12-05 11:26:10 -05:00
cskh 426c2b72d2
integ-test: test consul upgrade from the snapshot of a running cluster (#15595)
* integ-test: test consul upgrade from the snapshot of a running cluster

* use Target version as default


Co-authored-by: Dan Stough <dan.stough@hashicorp.com>
2022-12-01 10:39:09 -05:00
R.B. Boyer a8411976a8
peering: better represent non-passing states during peer check flattening (#15615)
During peer stream replication we flatten checks from the source cluster and build one thin overall check to hide the irrelevant details from the consuming cluster. This flattening logic did correctly flip to non-passing if there were any non-passing checks, but WHICH status it got during that was random (warn/error).

Also it didn't represent "maintenance" operations. There is an api package call AggregatedStatus which more correctly flattened check statuses.

This PR replicated the more complete logic into the peer stream package.
2022-11-30 11:29:21 -06:00
Freddy 7641d10184
Remove log line about server mgmt token init (#15610)
* Remove log line about server mgmt token init

Currently the server management token is only being bootstrapped in the
primary datacenter. That means that servers on the secondary datacenter
will never have this token available, and would log this line any time a
token is resolved.

Bootstrapping the token in secondary datacenters will be done in a
follow-up.

* Add changelog entry
2022-11-29 17:56:03 -05:00
James Oulman 71f7f2e3dc
Add support for configuring Envoys route idle_timeout (#14340)
* Add idleTimeout

Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
Co-authored-by: Dhia Ayachi <dhia@hashicorp.com>
2022-11-29 17:43:15 -05:00
Derek Menteer 79bef1982f
Add peering `.service` and `.node` DNS lookups. (#15596)
Add peering `.service` and `.node` DNS lookups.
2022-11-29 12:23:18 -06:00
cskh 92e71318c1
fix(peering): increase the gRPC limit to 8MB (#15503)
* fix(peering): increase the gRPC limit to 50MB

* changelog

* update gRPC limit to 8MB
2022-11-28 17:48:43 -05:00
Chris S. Kim efffcd56d0
Fix Vault managed intermediate PKI bug (#15525) 2022-11-28 16:17:58 -05:00
Chris S. Kim 4ad4cb1183
Use backport-compatible assertion (#15546)
* Use backport-compatible assertion

* Add workaround for broken apt-get
2022-11-24 11:44:20 -05:00
Chris S. Kim d146a3d542
Use rpcHoldTimeout to calculate blocking timeout (#15541)
Adds buffer to clients so that servers have time to respond to blocking queries.
2022-11-24 10:13:02 -05:00
Jared Kirschner b97acfb107
Support RFC 2782 for prepared query DNS lookups (#14465)
Format:
	_<query id or name>._tcp.query[.<datacenter>].<domain>
2022-11-20 17:21:24 -05:00
Alexander Scheel 8ef3fe3812
Detect Vault 1.11+ import, update default issuer (#15253)
Consul used to rely on implicit issuer selection when calling Vault endpoints to issue new CSRs. Vault 1.11+ changed that behavior, which caused Consul to check the wrong (previous) issuer when renewing its Intermediate CA. This patch allows Consul to explicitly set a default issuer when it detects that the response from Vault is 1.11+.

Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
2022-11-17 16:29:49 -05:00
cskh 248aef38cc
fix: clarifying error message when acquiring a lock in remote dc (#15394)
* fix: clarifying error message when acquiring a lock in remote dc

* Update website/content/commands/lock.mdx

Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>
2022-11-16 15:27:37 -05:00
Kyle Havlovitz f5c5d2f5c6
auto-config: relax node name validation for JWT authorization (#15370)
* auto-config: relax node name validation for JWT authorization

This changes the JWT authorization logic to allow all non-whitespace,
non-quote characters when validating node names. Consul had previously
allowed these characters in node names, until this validation was added
to fix a security vulnerability with whitespace/quotes being passed to
the `bexpr` library. This unintentionally broke node names with
characters like `.` which aren't related to this vulnerability.

* Update website/content/docs/agent/config/cli-flags.mdx

Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>

Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>
2022-11-14 18:24:40 -06:00
Dhia Ayachi 219a3c5bd3
Leadership transfer cmd (#14132)
* add leadership transfer command

* add RPC call test (flaky)

* add missing import

* add changelog

* add command registration

* Apply suggestions from code review

Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>

* add the possibility of providing an id to raft leadership transfer. Add few tests.

* delete old file from cherry pick

* rename changelog filename to PR #

* rename changelog and fix import

* fix failing test

* check for OperatorWrite

Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>

* rename from leader-transfer to transfer-leader

* remove version check and add test for operator read

* move struct to operator.go

* first pass

* add code for leader transfer in the grpc backend and tests

* wire the http endpoint to the new grpc endpoint

* remove the RPC endpoint

* remove non needed struct

* fix naming

* add mog glue to API

* fix comment

* remove dead code

* fix linter error

* change package name for proto file

* remove error wrapping

* fix failing test

* add command registration

* add grpc service mock tests

* fix receiver to be pointer

* use defined values

Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>

* reuse MockAclAuthorizer

* add documentation

* remove usage of external.TokenFromContext

* fix failing tests

* fix proto generation

* Apply suggestions from code review

Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com>

* Apply suggestions from code review

* add more context in doc for the reason

* Apply suggestions from docs code review

Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>

* regenerate proto

* fix linter errors

Co-authored-by: github-team-consul-core <github-team-consul-core@hashicorp.com>
Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com>
Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
2022-11-14 15:35:12 -05:00
Freddy 0cc3fac6c4
Ensure that NodeDump imported nodes are filtered (#15356) 2022-11-14 12:35:20 -07:00
Freddy e96c0e1dad
Fixup authz for data imported from peers (#15347)
There are a few changes that needed to be made to to handle authorizing
reads for imported data:

- If the data was imported from a peer we should not attempt to read the
  data using the traditional authz rules. This is because the name of
  services/nodes in a peer cluster are not equivalent to those of the
  importing cluster.

- If the data was imported from a peer we need to check whether the
  token corresponds to a service, meaning that it has service:write
  permissions, or to a local read only token that can read all
  nodes/services in a namespace.

This required changes at the policyAuthorizer level, since that is the
only view available to OSS Consul, and at the enterprise
partition/namespace level.
2022-11-14 11:36:27 -07:00
Kyle Havlovitz 7be442ee63
connect: strip port from DNS SANs for ingress gateway leaf cert (#15320)
* connect: strip port from DNS SANs for ingress gateway leaf cert

* connect: format DNS SANs in CreateCSR

* connect: Test wildcard case when formatting SANs
2022-11-14 10:27:03 -08:00
Derek Menteer 0c07a36408
Prevent serving TLS via ports.grpc (#15339)
Prevent serving TLS via ports.grpc

We remove the ability to run the ports.grpc in TLS mode to avoid
confusion and to simplify configuration. This breaking change
ensures that any user currently using ports.grpc in an encrypted
mode will receive an error message indicating that ports.grpc_tls
must be explicitly used.

The suggested action for these users is to simply swap their ports.grpc
to ports.grpc_tls in the configuration file. If both ports are defined,
or if the user has not configured TLS for grpc, then the error message
will not be printed.
2022-11-11 14:29:22 -06:00
Dan Stough ee56e06f22
[OSS] fix: wait and try longer to peer through mesh gw (#15328) 2022-11-10 13:54:00 -05:00
Kyle Schochenmaier 2b1e5f69e2
removes ioutil usage everywhere which was deprecated in go1.16 (#15297)
* update go version to 1.18 for api and sdk, go mod tidy
* removes ioutil usage everywhere which was deprecated in go1.16 in favour of io and os packages. Also introduces a lint rule which forbids use of ioutil going forward.
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
2022-11-10 10:26:01 -06:00
malizz 8d2ed1999d
update ACLs for cluster peering (#15317)
* update ACLs for cluster peering

* add changelog

* Update .changelog/15317.txt

Co-authored-by: Eric Haberkorn <erichaberkorn@gmail.com>

Co-authored-by: Eric Haberkorn <erichaberkorn@gmail.com>
2022-11-09 13:02:58 -08:00
malizz b823d79fcf
update config defaults, add docs (#15302)
* update config defaults, add docs

* update grpc tls port for non-default values

* add changelog

* Update website/content/docs/upgrading/upgrade-specific.mdx

Co-authored-by: Derek Menteer <105233703+hashi-derek@users.noreply.github.com>

* Update website/content/docs/agent/config/config-files.mdx

Co-authored-by: Derek Menteer <105233703+hashi-derek@users.noreply.github.com>

* update logic for setting grpc tls port value

* move default config to default.go, update changelog

* update docs

* Fix config tests.

* Fix linter error.

* Fix ConnectCA tests.

* Cleanup markdown on upgrade notes.

Co-authored-by: Derek Menteer <105233703+hashi-derek@users.noreply.github.com>
Co-authored-by: Derek Menteer <derek.menteer@hashicorp.com>
2022-11-09 09:29:55 -08:00
Eric Haberkorn 69914f59f7
Log Warnings When Peering With Mesh Gateway Mode None (#15304)
warn when mesh gateway mode is set to none for peering
2022-11-09 11:48:58 -05:00
Derek Menteer 9e76d274ec
Fix mesh gateway configuration with proxy-defaults (#15186)
* Fix mesh gateway proxy-defaults not affecting upstreams.

* Clarify distinction with upstream settings

Top-level mesh gateway mode in proxy-defaults and service-defaults gets
merged into NodeService.Proxy.MeshGateway, and only gets merged with
the mode attached to an an upstream in proxycfg/xds.

* Fix mgw mode usage for peered upstreams

There were a couple issues with how mgw mode was being handled for
peered upstreams.

For starters, mesh gateway mode from proxy-defaults
and the top-level of service-defaults gets stored in
NodeService.Proxy.MeshGateway, but the upstream watch for peered data
was only considering the mesh gateway config attached in
NodeService.Proxy.Upstreams[i]. This means that applying a mesh gateway
mode via global proxy-defaults or service-defaults on the downstream
would not have an effect.

Separately, transparent proxy watches for peered upstreams didn't
consider mesh gateway mode at all.

This commit addresses the first issue by ensuring that we overlay the
upstream config for peered upstreams as we do for non-peered. The second
issue is addressed by re-using setupWatchesForPeeredUpstream when
handling transparent proxy updates.

Note that for transparent proxies we do not yet support mesh gateway
mode per upstream, so the NodeService.Proxy.MeshGateway mode is used.

* Fix upstream mesh gateway mode handling in xds

This commit ensures that when determining the mesh gateway mode for
peered upstreams we consider the NodeService.Proxy.MeshGateway config as
a baseline.

In absense of this change, setting a mesh gateway mode via
proxy-defaults or the top-level of service-defaults will not have an
effect for peered upstreams.

* Merge service/proxy defaults in cfg resolver

Previously the mesh gateway mode for connect proxies would be
merged at three points:

1. On servers, in ComputeResolvedServiceConfig.
2. On clients, in MergeServiceConfig.
3. On clients, in proxycfg/xds.

The first merge returns a ServiceConfigResponse where there is a
top-level MeshGateway config from proxy/service-defaults, along with
per-upstream config.

The second merge combines per-upstream config specified at the service
instance with per-upstream config specified centrally.

The third merge combines the NodeService.Proxy.MeshGateway
config containing proxy/service-defaults data with the per-upstream
mode. This third merge is easy to miss, which led to peered upstreams
not considering the mesh gateway mode from proxy-defaults.

This commit removes the third merge, and ensures that all mesh gateway
config is available at the upstream. This way proxycfg/xds do not need
to do additional overlays.

* Ensure that proxy-defaults is considered in wc

Upstream defaults become a synthetic Upstream definition under a
wildcard key "*". Now that proxycfg/xds expect Upstream definitions to
have the final MeshGateway values, this commit ensures that values from
proxy-defaults/service-defaults are the default for this synthetic
upstream.

* Add changelog.

Co-authored-by: freddygv <freddy@hashicorp.com>
2022-11-09 10:14:29 -06:00
Dan Upton acfdbb23a9
chore: remove unused argument from MergeNodeServiceWithCentralConfig (#15024)
Previously, the MergeNodeServiceWithCentralConfig method accepted a
ServiceSpecificRequest argument, of which only the Datacenter and
QueryOptions fields were used.

Digging a little deeper, it turns out these fields were only passed
down to the ComputeResolvedServiceConfig method (through the
ServiceConfigRequest struct) which didn't actually use them.

As such, not all call-sites passed a valid ServiceSpecificRequest
so it's safer to remove the argument altogether to prevent future
changes from depending on it.
2022-11-09 14:54:57 +00:00
Derek Menteer a8eb047ee6
Bring back parameter ServerExternalAddresses in GenerateToken endpoint (#15267)
Re-add ServerExternalAddresses parameter in GenerateToken endpoint

This reverts commit 5e156772f6a7fba5324eb6804ae4e93c091229a6
and adds extra functionality to support newer peering behaviors.
2022-11-08 14:55:18 -06:00
cskh 3d2d7a77cb
fix(mesh-gateway): remove deregistered service from mesh gateway (#15272)
* fix(mesh-gateway): remove deregistered service from mesh gateway

* changelog

Co-authored-by: Derek Menteer <105233703+hashi-derek@users.noreply.github.com>
Co-authored-by: Evan Culver <eculver@users.noreply.github.com>
2022-11-07 20:30:15 -05:00
Freddy eee0fb1035
Avoid blocking child type updates on parent ack (#15083) 2022-11-07 18:10:42 -07:00
Derek Menteer 4672d8bd3c
Backport test fix from ent. (#15279) 2022-11-07 12:17:46 -06:00
Chris S. Kim dbe3dc96f3
Update hcp-scada-provider to fix diamond dependency problem with go-msgpack (#15185) 2022-11-07 11:34:30 -05:00
Eric Haberkorn d6b614110a
Fix a bug in mesh gateway proxycfg where ACL tokens aren't passed. (#15273) 2022-11-07 10:00:11 -05:00
Dan Stough 3eb3cf3b0d
fix: persist peering CA updates to dialing clusters (#15243)
fix: persist peering CA updates to dialing clusters
2022-11-04 12:53:20 -04:00
Derek Menteer 7bcded133e
Backport tests from ent. (#15260)
* Backport agent tests.

Original commit: 0710b2d12fb51a29cedd1119b5fb086e5c71f632
Original commit: aaedb3c28bfe247266f21013d500147d8decb7cd (partial)

* Backport test fix and reduce flaky failures.
2022-11-04 10:19:24 -05:00
Derek Menteer 9245a44e68
Backport test from ENT: "Fix missing test fields" (#15258)
* Backport test from ENT: "Fix missing test fields"

Original Author: Sarah Pratt
Original Commit: a5c88bef7a969ea5d06ed898d142ab081ba65c69

* Update with proper linting.
2022-11-04 09:29:16 -05:00
Derek Menteer 261ba1e65d
Backport various fixes from ENT. (#15254)
* Regenerate golden files.

* Backport from ENT: "Avoid race"

Original commit: 5006c8c858b0e332be95271ef9ba35122453315b
Original author: freddygv

* Backport from ENT: "chore: fix flake peerstream test"

Original commit: b74097e7135eca48cc289798c5739f9ef72c0cc8
Original author: DanStough
2022-11-03 16:34:57 -05:00
malizz 24ddeac74b
convert stream status time fields to pointers (#15252) 2022-11-03 11:51:22 -07:00
sarahalsmiller befefe42ee
Added check for empty peeringsni in restrictPeeringEndpoints (#15239)
Add check for empty peeringSNI in restrictPeeringEndpoints

Co-authored-by: Derek Menteer <derek.menteer@hashicorp.com>
2022-11-02 17:20:52 -05:00
Derek Menteer f704e72f3e
Prevent peering acceptor from subscribing to addr updates. (#15214) 2022-11-02 07:55:41 -05:00
Dan Stough 19ec59c930
test: refactor testcontainers and add peering integ tests (#15084) 2022-11-01 15:03:23 -04:00
Derek Menteer cad89029dd Decrease retry time for failed peering connections. 2022-10-31 14:30:27 -05:00
R.B. Boyer 879584a773
test: fix flaky TestSubscribeBackend_IntegrationWithServer_DeliversAllMessages test (#15195)
Allow for some message duplication in subscription events during assertions.

I'm pretty sure the subscriptions machinery allows for messages to occasionally
be duplicated instead of dropping them, as a once-and-only-once queue is a pipe
dream and you have to pick one of the other two options.
2022-10-31 12:10:43 -05:00
Evan Culver 548cf6f7a4
connect: Add Envoy 1.24 to integration tests, remove Envoy 1.20 (#15093) 2022-10-31 10:50:45 -05:00
Derek Menteer 58f15db4c4 Allow peering endpoints to bypass verify_incoming. 2022-10-31 09:56:30 -05:00
Derek Menteer 065e538de3 Add tests. 2022-10-31 08:45:00 -05:00
Derek Menteer 59a385bc9a Fix peered service protocols using proxy-defaults. 2022-10-31 08:45:00 -05:00
Eric Haberkorn 57fb729547
Fix peering metrics bug (#15178)
This bug was caused by the peering health metric being set to NaN.
2022-10-28 10:51:12 -04:00
Chris S. Kim a0ac76ecf5
Allow consul debug on non-ACL consul servers (#15155) 2022-10-27 09:25:18 -04:00
cskh 57380ea752
fix(peering): nil pointer in calling handleUpdateService (#15160)
* fix(peering): nil pointer in calling handleUpdateService

* changelog
2022-10-26 11:50:34 -04:00
Eric Haberkorn 74baaf910c
fix bug that resulted in generating Envoy configs that use CDS with an EDS configuration (#15140) 2022-10-25 14:49:57 -04:00
Luke Kysow 4956b81333
ingress-gateways: don't log error when registering gateway (#15001)
* ingress-gateways: don't log error when registering gateway

Previously, when an ingress gateway was registered without a
corresponding ingress gateway config entry, an error was logged
because the watch on the config entry returned a nil result.
This is expected so don't log an error.
2022-10-25 10:55:44 -07:00
Luke Kysow 6b1ec05470
autoencrypt: helpful error for clients with wrong dc (#14832)
* autoencrypt: helpful error for clients with wrong dc

If clients have set a different datacenter than the servers they're
connecting with for autoencrypt, give a helpful error message.
2022-10-25 10:13:41 -07:00
R.B. Boyer a01936442c
cache: refactor agent cache fetching to prevent unnecessary fetches on error (#14956)
This continues the work done in #14908 where a crude solution to prevent a
goroutine leak was implemented. The former code would launch a perpetual
goroutine family every iteration (+1 +1) and the fixed code simply caused a
new goroutine family to first cancel the prior one to prevent the
leak (-1 +1 == 0).

This PR refactors this code completely to:

- make it more understandable
- remove the recursion-via-goroutine strangeness
- prevent unnecessary RPC fetches when the prior one has errored.

The core issue arose from a conflation of the entry.Fetching field to mean:

- there is an RPC (blocking query) in flight right now
- there is a goroutine running to manage the RPC fetch retry loop

The problem is that the goroutine-leak-avoidance check would treat
Fetching like (2), but within the body of a goroutine it would flip that
boolean back to false before the retry sleep. This would cause a new
chain of goroutines to launch which #14908 would correct crudely.

The refactored code uses a plain for-loop and changes the semantics
to track state for "is there a goroutine associated with this cache entry"
instead of the former.

We use a uint64 unique identity per goroutine instead of a boolean so
that any orphaned goroutines can tell when they've been replaced when
the expiry loop deletes a cache entry while the goroutine is still running
and is later replaced.
2022-10-25 10:27:26 -05:00
R.B. Boyer bcbe7b225f
test: ensure that all dependencies in a test agent use the test logger (#14996) 2022-10-24 17:02:38 -05:00
Chris S. Kim 5e901bfa01 Remove invalid 1xx HTTP codes
These tests started failing in go1.19, presumably due to
support for valid 1xx responses being added.

https://github.com/golang/go/issues/56346
2022-10-24 16:12:08 -04:00
Chris S. Kim ae1646706f Regenerate files according to 1.19.2 formatter 2022-10-24 16:12:08 -04:00
cskh a5acb987fa
fix(peering): replicating wan address (#15108)
* fix(peering): replicating wan address

* add changelog

* unit test
2022-10-24 15:44:57 -04:00
Iryna Shustava a3a6743e0a
proxycfg: watch service-defaults config entries (#15025)
To support Destinations on the service-defaults (for tproxy with terminating gateway), we need to now also make servers watch service-defaults config entries.
2022-10-24 12:50:28 -06:00
Chris S. Kim 06f583a7c2 Move oss-only test to its own file 2022-10-24 14:17:43 -04:00
R.B. Boyer bf05547080
test: fix flaky TestHealthServiceNodes_NodeMetaFilter by waiting until the streaming subsystem has a valid grpc connection (#15019)
Also potentially unflakes TestHealthIngressServiceNodes for similar
reasons.
2022-10-24 13:09:53 -05:00
R.B. Boyer 87432a8dd4
chore: update golangci-lint to v1.50.1 (#15022) 2022-10-24 11:48:02 -05:00
Venu Yanamandra 3dd12a2960
Update error message when restoring ENT snapshot in OSS (#15066) 2022-10-24 11:40:26 -04:00
freddygv 483720a443 Return forbidden on permission denied
This commit updates the establish endpoint to bubble up a 403 status
code to callers when the establishment secret from the token is invalid.
This is a signal that a new peering token must be generated.
2022-10-20 17:11:49 -06:00
Chris S. Kim 569c3bce88 Update expected encoding in test
go-memdb was updated in v1.3.3 to make integers in indexes sortable, which changed how integers were encoded.
2022-10-20 14:32:42 -04:00
freddygv f3548167fc Use plain TaggedAddressWAN 2022-10-19 16:32:44 -06:00
freddygv 1b589ba964 Add unit test 2022-10-19 16:26:15 -06:00
cskh c0dc93e5b8 fix: wan address isn't used by peering token 2022-10-19 16:33:25 -04:00
Nitya Dhanushkodi 598670e376
Remove ability to specify external addresses in GenerateToken endpoint (#14930)
* Reverts "update generate token endpoint to take external addresses (#13844)"

This reverts commit f47319b7c6b6e7c7dd720a5af927ad2d33fa536d.
2022-10-19 09:31:36 -07:00
Kyle Havlovitz 3c13cf0994
Merge pull request #15035 from hashicorp/vault-ttl-update-warn
Warn instead of returning error when missing intermediate mount tune permissions
2022-10-18 15:41:52 -07:00
cskh e18434bcb1
peering: skip registering duplicate node and check from the peer (#14994)
* peering: skip register duplicate node and check from the peer

* Prebuilt the nodes map and checks map to avoid repeated for loop

* use key type to struct: node id, service id, and check id
2022-10-18 16:19:24 -04:00
Chris S. Kim e4c20ec190
Refactor client RPC timeouts (#14965)
Fix an issue where rpc_hold_timeout was being used as the timeout for non-blocking queries. Users should be able to tune read timeouts without fiddling with rpc_hold_timeout. A new configuration `rpc_read_timeout` is created.

Refactor some implementation from the original PR 11500 to remove the misleading linkage between RPCInfo's timeout (used to retry in case of certain modes of failures) and the client RPC timeouts.
2022-10-18 15:05:09 -04:00
Kyle Havlovitz 0a968e53b5 Warn instead of returning an error when intermediate mount tune permission is missing 2022-10-18 12:01:25 -07:00
R.B. Boyer 0712e1a456
test: possibly fix flake in TestIntentionGetExact (#15021)
Restructure test setup to be similar to TestAgent_ServerCertificate
and see if that's enough to avoid flaking after join.
2022-10-18 10:51:20 -05:00
R.B. Boyer 9f41cc4a25
cache: prevent goroutine leak in agent cache (#14908)
There is a bug in the error handling code for the Agent cache subsystem discovered:

1. NotifyCallback calls notifyBlockingQuery which calls getWithIndex in
   a loop (which backs off on-error up to 1 minute)

2. getWithIndex calls fetch if there’s no valid entry in the cache

3. fetch starts a goroutine which calls Fetch on the cache-type, waits
   for a while (again with backoff up to 1 minute for errors) and then
   calls fetch to trigger a refresh

The end result being that every 1 minute notifyBlockingQuery spawns an
ancestry of goroutines that essentially lives forever.

This PR ensures that the goroutine started by `fetch` cancels any prior
goroutine spawned by the same line for the same key.

In isolated testing where a cache type was tweaked to indefinitely
error, this patch prevented goroutine counts from skyrocketing.
2022-10-17 14:38:10 -05:00
R.B. Boyer ca916eec32
ca: fix a masked bug in leaf cert generation that would not be notified of root cert rotation after the first one (#15005)
In practice this was masked by #14956 and was only uncovered fixing the
other bug.

  go test ./agent -run TestAgentConnectCALeafCert_goodNotLocal

would fail when only #14956 was fixed.
2022-10-17 13:24:27 -05:00
Chris S. Kim 58c041eb6e
Merge pull request #13388 from deblasis/feature/health-checks_windows_service
Feature: Health checks windows service
2022-10-17 09:26:19 -04:00
Dan Upton 90129919a8
proxycfg: fix goroutine leak when service is re-registered (#14988)
Fixes a bug where we'd leak a goroutine in state.run when the given
context was canceled while there was a pending update.
2022-10-17 11:31:10 +01:00
Kyle Havlovitz 096ca5e4b0 Extend tcp keepalive settings to work for terminating gateways as well 2022-10-14 17:05:46 -07:00
Kyle Havlovitz f8e745315f Update docs and add tcp_keepalive_probes setting 2022-10-14 17:05:46 -07:00
Kyle Havlovitz 526d49c6ff Add TCP keepalive settings to proxy config for mesh gateways 2022-10-14 17:05:46 -07:00
Derek Menteer 25d3d244f0 Fix issue with incorrect method signature on test. 2022-10-14 11:04:57 -05:00
Freddy bbf6b17e44
Merge pull request #14981 from hashicorp/peering/dial-through-gateways 2022-10-14 09:44:56 -06:00
Dan Upton 3b9297f95a
proxycfg: rate-limit delivery of config snapshots (#14960)
Adds a user-configurable rate limiter to proxycfg snapshot delivery,
with a default limit of 250 updates per second.

This addresses a problem observed in our load testing of Consul
Dataplane where updating a "global" resource such as a wildcard
intention or the proxy-defaults config entry could starve the Raft or
Memberlist goroutines of CPU time, causing general cluster instability.
2022-10-14 15:52:00 +01:00
Derek Menteer 6c355134e8 Add tests for peering state snapshots / restores. 2022-10-14 09:48:04 -05:00
Derek Menteer 27bbdced8d Add test for ExportedServicesForAllPeersByName 2022-10-14 09:48:04 -05:00
Dan Upton 0a0534a094
perf: remove expensive reflection from xDS hot path (#14934)
Replaces the reflection-based implementation of proxycfg's
ConfigSnapshot.Clone with code generated by deep-copy.

While load testing server-based xDS (for consul-dataplane) we discovered
this method is extremely expensive. The ConfigSnapshot struct, directly
or indirectly, contains a copy of many of the structs in the agent/structs
package, which creates a large graph for copystructure.Copy to traverse
at runtime, on every proxy reconfiguration.
2022-10-14 10:26:42 +01:00
freddygv 89596f13c4 Use split var in tests 2022-10-13 17:12:47 -06:00
freddygv b4e48f0a70 Use split wildcard partition name
This way OSS avoids passing a non-empty label, which will be rejected in
OSS consul.
2022-10-13 16:55:28 -06:00
Freddy 909fc33271
Merge pull request #14935 from hashicorp/fix/alias-leak 2022-10-13 16:31:15 -06:00
freddygv 452dc2867c Lint 2022-10-13 15:55:55 -06:00
Derek Menteer 092e5fd074 Reset wait on ensureServerAddrSubscription 2022-10-13 15:58:26 -05:00
freddygv 437a513d9b Fix CA init error code 2022-10-13 14:58:11 -06:00
freddygv 37a765f8df Update leader routine to maybe use gateways 2022-10-13 14:58:00 -06:00
freddygv 239f0e3084 Update peering establishment to maybe use gateways
When peering through mesh gateways we expect outbound dials to peer
servers to flow through the local mesh gateway addresses.

Now when establishing a peering we get a list of dial addresses as a
ring buffer that includes local mesh gateway addresses if the local DC
is configured to peer through mesh gateways. The ring buffer includes
the mesh gateway addresses first, but also includes the remote server
addresses as a fallback.

This fallback is present because it's possible that direct egress from
the servers may be allowed. If not allowed then the leader will cycle
back to a mesh gateway address through the ring.

When attempting to dial the remote servers we retry up to a fixed
timeout. If using mesh gateways we also have an initial wait in
order to allow for the mesh gateways to configure themselves.

Note that if we encounter a permission denied error we do not retry
since that error indicates that the secret in the peering token is
invalid.
2022-10-13 14:57:55 -06:00
malizz 27d0181806
increase protobuf size limit for cluster peering (#14976) 2022-10-13 13:46:51 -07:00
Derek Menteer ff01c11672 Address PR comments. 2022-10-13 14:11:02 -05:00
Derek Menteer cc0a05ffa0 Disallow peering to the same cluster. 2022-10-13 14:11:02 -05:00
Derek Menteer d47c9b446c Prevent consul peer-exports by discovery chain. 2022-10-13 12:45:09 -05:00
Derek Menteer ee49db9a2f Prevent the "consul" service from being exported. 2022-10-13 12:45:09 -05:00
Derek Menteer bfa4adbfce Add remote peer partition and datacenter info. 2022-10-13 10:37:41 -05:00
Dan Upton de7f380385
xds: properly merge central config for "agentless" services (#14962) 2022-10-13 12:04:59 +01:00
Dan Upton 36a3d00f0d
bug: fix goroutine leaks caused by incorrect usage of `WatchCh` (#14916)
memdb's `WatchCh` method creates a goroutine that will publish to the
returned channel when the watchset is triggered or the given context
is canceled. Although this is called out in its godoc comment, it's
not obvious that this method creates a goroutine who's lifecycle you
need to manage.

In the xDS capacity controller, we were calling `WatchCh` on each
iteration of the control loop, meaning the number of goroutines would
grow on each autopilot event until there was catalog churn.

In the catalog config source, we were calling `WatchCh` with the
background context, meaning that the goroutine would keep running after
the sync loop had terminated.
2022-10-13 12:04:27 +01:00
Hans Hasselberg 56580d6fa6
adding configuration option cloud.scada_address (#14936)
* adding scada_address

* config tests

* add changelog entry
2022-10-13 11:31:28 +02:00
Paul Glass be1a4438a9
Add consul.xds.server.streamStart metric (#14957)
This adds a new consul.xds.server.streamStart metric to measure the time taken to first generate xDS resources after an xDS stream is opened.
2022-10-12 14:17:58 -05:00
Riddhi Shah 474d9cfcdc
Service http checks data source for agentless proxies (#14924)
Adds another datasource for proxycfg.HTTPChecks, for use on server agents. Typically these checks are performed by local client agents and there is no equivalent of this in agentless (where servers configure consul-dataplane proxies).
Hence, the data source is mostly a no-op on servers but in the case where the service is present within the local state, it delegates to the cache data source.
2022-10-12 07:49:56 -07:00
Freddy 4cf0bf4865
Merge pull request #14958 from hashicorp/peering/nonce 2022-10-12 08:18:15 -06:00
freddygv 4d1e7c4cbb Actually track nonce in test 2022-10-12 07:50:17 -06:00
Derek Menteer 00312bcf57 Fix incorrect backoff-wait logic. 2022-10-12 08:01:10 -05:00
freddygv c9d171c031 Add basic nonce management
This commit adds a monotonically increasing nonce to include in peering
replication response messages. Every ack/nack from the peer handling a
response will include this nonce, allowing to correlate the ack/nack
with a specific resource.

At the moment nothing is done with the nonce when it is received. In the
future we may want to add functionality such as retries on NACKs,
depending on the class of error.
2022-10-11 19:02:04 -06:00
Paul Glass 8cf430140a
gRPC server metrics (#14922)
* Move stats.go from grpc-internal to grpc-middleware
* Update grpc server metrics with server type label
* Add stats test to grpc-external
* Remove global metrics instance from grpc server tests
2022-10-11 17:00:32 -05:00
cskh 45278cb69e
fix(peering): add missing grpc_tls_port for server address reconciliation (#14944) 2022-10-11 10:56:29 -04:00
freddygv 9f0ab69aef Fix alias check leak
Preivously when alias check was removed it would not be stopped nor
cleaned up from the associated aliasChecks map.

This means that any time an alias check was deregistered we would
leak a goroutine for CheckAlias.run() because the stopCh would never
be closed.

This issue mostly affects service mesh deployments on platforms where
the client agent is mostly static but proxy services come and go
regularly, since by default sidecars are registered with an alias check.
2022-10-10 16:42:29 -06:00
James Oulman a8695c88d4
Configure Envoy alpn_protocols based on service protocol (#14356)
* Configure Envoy alpn_protocols based on service protocol

* define alpnProtocols in a more standard way

* http2 protocol should be h2 only

* formatting

* add test for getAlpnProtocol()

* create changelog entry

* change scope is connect-proxy

* ignore errors on ParseProxyConfig; fixes linter

* add tests for grpc and http2 public listeners

* remove newlines from PR

* Add alpn_protocol configuration for ingress gateway

* Guard against nil tlsContext

* add ingress gateway w/ TLS tests for gRPC and HTTP2

* getAlpnProtocols: add TCP protocol test

* add tests for ingress gateway with grpc/http2 and per-listener TLS config

* add tests for ingress gateway with grpc/http2 and per-listener TLS config

* add Gateway level TLS config with mixed protocol listeners to validate ALPN

* update changelog to include ingress-gateway

* add http/1.1 to http2 ALPN

* go fmt

* fix test on custom-trace-listener
2022-10-10 13:13:56 -07:00
freddygv 55b5c1a073 Fixup test 2022-10-10 13:20:14 -06:00
Chris S. Kim 7f48033d0b Fix nil pointer 2022-10-10 13:20:14 -06:00
Chris S. Kim 9d4fb0445a Include stream-related information in peering endpoints 2022-10-10 13:20:14 -06:00
Paul Glass a3fccf5e5b
Merge central config for GetEnvoyBootstrapParams (#14869)
This fixes GetEnvoyBootstrapParams to merge in proxy-defaults and service-defaults.

Co-authored-by: Dan Upton <daniel@floppy.co>
2022-10-10 12:40:27 -05:00
Freddy 8d93f120ea
Merge pull request #14796 from hashicorp/peering/use-connect-ca 2022-10-07 10:37:37 -06:00
freddygv ae9b3eb662 Fixup test 2022-10-07 09:34:16 -06:00
freddygv 6ef8d329d2 Require Connect and TLS to generate peering tokens
By requiring Connect and a gRPC TLS listener we can automatically
configure TLS for all peering control-plane traffic.
2022-10-07 09:06:29 -06:00
freddygv a21e5799f7 Use internal server certificate for peering TLS
A previous commit introduced an internally-managed server certificate
to use for peering-related purposes.

Now the peering token has been updated to match that behavior:
- The server name matches the structure of the server cert
- The CA PEMs correspond to the Connect CA

Note that if Conect is disabled, and by extension the Connect CA, we
fall back to the previous behavior of returning the manually configured
certs and local server SNI.

Several tests were updated to use the gRPC TLS port since they enable
Connect by default. This means that the peering token will embed the
Connect CA, and the dialer will expect a TLS listener.
2022-10-07 09:05:32 -06:00
freddygv 1c696922fe Simplify mgw watch mgmt 2022-10-07 08:54:37 -06:00
freddygv b67d001b2c Use existing query options to build ctx 2022-10-07 08:46:53 -06:00
DanStough df94470e76 feat: xDS updates for peerings control plane through mesh gw 2022-10-07 08:46:42 -06:00
Eric Haberkorn 2f08fab317
Make the mesh gateway changes to allow `local` mode for cluster peering data plane traffic (#14817)
Make the mesh gateway changes to allow `local` mode for cluster peering data plane traffic
2022-10-06 09:54:14 -04:00
cskh 53ff317b01
fix: missing UDP field in checkType (#14885)
* fix: missing UDP field in checkType

* Add changelog

* Update doc
2022-10-05 15:57:21 -04:00
Derek Menteer fbee1272e7
Fix explicit tproxy listeners with discovery chains. (#14751)
Fix explicit tproxy listeners with discovery chains.
2022-10-05 14:38:25 -05:00
Alex Oskotsky 4d9309327f
Add the ability to retry on reset connection to service-routers (#12890) 2022-10-05 13:06:44 -04:00
John Murret 08203ace4a
Upgrade serf to v0.10.1 and memberlist to v0.5.0 to get memberlist size metrics and broadcast queue depth metric (#14873)
* updating to serf v0.10.1 and memberlist v0.5.0 to get memberlist size metrics and memberlist broadcast queue depth metric

* update changelog

* update changelog

* correcting changelog

* adding "QueueCheckInterval" for memberlist to test

* updating integration test containers to grab latest api
2022-10-04 17:51:37 -06:00
Evan Culver 42423ffce2
connect: Bump Envoy 1.20 to 1.20.7, 1.21 to 1.21.5 and 1.22 to 1.22.5 (#14831) 2022-10-04 13:15:01 -07:00
Eric Haberkorn 2178e38204
Rename `PeerName` to `Peer` on prepared queries and exported services (#14854) 2022-10-04 14:46:15 -04:00
Freddy 89141256c7
Merge pull request #14734 from hashicorp/NET-643-update-mesh-gateway-envoy-config-for-inbound-peering-control-plane-traffic 2022-10-03 12:54:11 -06:00
freddygv 0d61aa5d37 Update xds generation for peering over mesh gws
This commit adds the xDS resources needed for INBOUND traffic from peer
clusters:

- 1 filter chain for all inbound peering requests.
- 1 cluster for all inbound peering requests.
- 1 endpoint per voting server with the gRPC TLS port configured.

There is one filter chain and cluster because unlike with WAN
federation, peer clusters will not attempt to dial individual servers.
Peer clusters will only dial the local mesh gateway addresses.
2022-10-03 12:42:27 -06:00
freddygv 2c5caec97c Share mgw addrs in peering stream if needed
This commit adds handling so that the replication stream considers
whether the user intends to peer through mesh gateways.

The subscription will return server or mesh gateway addresses depending
on the mesh configuration setting. These watches can be updated at
runtime by modifying the mesh config entry.
2022-10-03 11:42:20 -06:00
freddygv 17463472b7 Return mesh gateway addrs if peering through mgw 2022-10-03 11:35:10 -06:00
chappie f49332a151
Merge pull request #14811 from hashicorp/chappie/dns
Add DNS gRPC proxying support
2022-10-03 08:02:48 -07:00
Chris Chapman 1b24aafb23
Making suggested comments 2022-09-30 15:03:33 -07:00
Chris Chapman 399fafb679
Making suggested changes 2022-09-30 14:51:12 -07:00
Chris Chapman 8e44a8c644
Update comment 2022-09-30 09:35:01 -07:00
DanStough 16fe27c9b8 chore: fix flakey scada provider test 2022-09-30 11:56:40 -04:00
Chris Chapman c4c5f900e0
Bind a dns mux handler to gRPC proxy 2022-09-29 21:44:45 -07:00
Chris Chapman 175e6e56f9
Adding grpc handler for dns proxy 2022-09-29 21:19:51 -07:00
Eric Haberkorn 5fd1e6daea
Add exported services event to cluster peering replication. (#14797) 2022-09-29 15:37:19 -04:00
Ashwin Venkatesh ddcd3e06e7
bug: watch local mesh gateways in non-default partitions with agentless (#14799) 2022-09-29 13:19:04 -04:00
cskh 4ece020bf1
feat(ingress gateway: support configuring limits in ingress-gateway c… (#14749)
* feat(ingress gateway: support configuring limits in ingress-gateway config entry

- a new Defaults field with max_connections, max_pending_connections, max_requests
  is added to ingress gateway config entry
- new field max_connections, max_pending_connections, max_requests in
  individual services to overwrite the value in Default
- added unit test and integration test
- updated doc

Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
Co-authored-by: Dan Stough <dan.stough@hashicorp.com>
2022-09-28 14:56:46 -04:00
malizz 5c470b28dd
Support Stale Queries for Trust Bundle Lookups (#14724)
* initial commit

* add tags, add conversations

* add test for query options utility functions

* update previous tests

* fix test

* don't error out on empty context

* add changelog

* update decode config
2022-09-28 09:56:59 -07:00
Eric Haberkorn e80b7068a6
Enable outbound peered requests to go through local mesh gateway (#14763) 2022-09-27 09:49:28 -04:00
Nick Ethier 5e4b3ef5d4
add HCP integration component (#14723)
* add HCP integration

* lint: use non-deprecated logging interface
2022-09-26 14:58:15 -04:00
Derek Menteer d9e42b0f1c
Add envoy connection balancing. (#14616)
Add envoy connection balancing config.
2022-09-26 11:29:06 -05:00
Chris S. Kim 7ec8a0667a Add new internal endpoint to list exported services to a peer 2022-09-23 09:43:56 -04:00
freddygv 520507232f Manage local server watches depending on mesh cfg
Routing peering control plane traffic through mesh gateways can be
enabled or disabled at runtime with the mesh config entry.

This commit updates proxycfg to add or cancel watches for local servers
depending on this central config.

Note that WAN federation over mesh gateways is determined by a service
metadata flag, and any updates to the gateway service registration will
force the creation of a new snapshot. If enabled, WAN-fed over mesh
gateways will trigger a local server watch on initialize().

Because of this we will only add/remove server watches if WAN federation
over mesh gateways is disabled.
2022-09-22 19:32:10 -06:00
Alessandro De Blasis 6e99434215 fix(check): added missing OSService props 2022-09-21 13:10:21 +01:00
Alessandro De Blasis 6471184754 fix(checks): os_service OK message in output 2022-09-21 09:27:33 +01:00
Alessandro De Blasis 7f7c320746 fix(checks): os_service lifecycle bugfix 2022-09-21 09:26:47 +01:00
Alessandro De Blasis 3b9061ab5b fix(agent): uninitialized map panic error 2022-09-21 09:25:54 +01:00
malizz a3fc665eef
increase the size of txn to support vault (#14599)
* increase the size of txn to support vault

* add test, revert change to acl endpoint

* add changelog

* update test, add passing test case

* Update .changelog/14599.txt

Co-authored-by: Freddy <freddygv@users.noreply.github.com>

Co-authored-by: Freddy <freddygv@users.noreply.github.com>
2022-09-19 09:07:19 -07:00
freddygv 8166a870b6 Add awareness of server mode to TLS configurator
Preivously the TLS configurator would default to presenting auto TLS
certificates as client certificates.

Server agents should not have this behavior and should instead present
the manually configured certs. The autoTLS certs for servers are
exclusively used for peering and should not be used as the default for
outbound communication.
2022-09-16 17:57:10 -06:00
freddygv 107e4d8494 Test fixes
- Pulls in CLI test fix from main
- Updates psutils to fix TestAgent_Host on M1 Mac
2022-09-16 17:57:10 -06:00
freddygv 0c3853a2d0 Add server certificate manager
This certificate manager will request a leaf certificate for server
agents and then keep them up to date.
2022-09-16 17:57:10 -06:00
freddygv ef99b30cb8 Generate ACL token for server management
This commit introduces a new ACL token used for internal server
management purposes.

It has a few key properties:
- It has unlimited permissions.
- It is persisted through Raft as System Metadata rather than in the
ACL tokens table. This is to avoid users seeing or modifying it.
- It is re-generated on leadership establishment.
2022-09-16 17:54:34 -06:00
freddygv a33a014b9c Add handling in agent cache for server leaf certs 2022-09-16 17:54:34 -06:00
Kyle Havlovitz 40da079f18
Merge pull request #14598 from hashicorp/root-removal-fix
connect/ca: Don't discard old roots on primaryInitialize
2022-09-15 14:36:01 -07:00
Kyle Havlovitz fe10009a12 connect/ca: don't discard old roots on primaryInitialize 2022-09-15 12:59:09 -07:00
Gabriel Santos 09c00ff39a
Middleware: `RequestRecorder` reports calls below 1ms as decimal value (#12905)
* Typos

* Test failing

* Convert values <1ms to decimal

* Fix test

* Update docs and test error msg

* Applied suggested changes to test case

* Changelog file and suggested changes

* Update .changelog/12905.txt

Co-authored-by: Chris S. Kim <kisunji92@gmail.com>

* suggested change - start duration with microseconds instead of nanoseconds

* fix error

* suggested change - floats

Co-authored-by: alex <8968914+acpana@users.noreply.github.com>
Co-authored-by: Chris S. Kim <kisunji92@gmail.com>
2022-09-15 13:04:37 -04:00
Daniel Graña 13ac6356a8
[BUGFIX] Do not use interval as timeout (#14619)
Do not use interval as timeout
2022-09-15 12:39:48 -04:00
Evan Culver aa40adf97e
connect: Bump latest Envoy to 1.23.1 in test matrix (#14573) 2022-09-14 13:20:16 -07:00
DanStough f9b4fa17f1 fix(peering): generate token metrics only for leader 2022-09-14 11:37:30 -04:00
DanStough b37a2ba889 feat(peering): validate server name conflicts on establish 2022-09-14 11:37:30 -04:00
Kyle Havlovitz ea4d95a5c6
Merge pull request #14516 from hashicorp/ca-ttl-fixes
Fix inconsistent TTL behavior in CA providers
2022-09-13 16:07:36 -07:00
Kyle Havlovitz 33e616987c Update intermediate pki mount/role when reconfiguring Vault provider 2022-09-13 15:42:26 -07:00
Kyle Havlovitz 1ded025400 connect/ca: Clarify behavior around IntermediateCertTTL in CA config 2022-09-13 15:42:26 -07:00
DanStough fca4042bd9 feat: add PeerThroughMeshGateways to mesh config 2022-09-13 17:19:54 -04:00
Derek Menteer 5d1487e167
Add CSR check for number of URIs. (#14579)
Add CSR check for number of URIs.
2022-09-13 14:21:47 -05:00
Derek Menteer cfcd9f2a2c Add input validation for auto-config JWT authorization checks. 2022-09-13 11:16:36 -05:00
cskh 6196be1f98
Config-entry: Support proxy config in service-defaults (#14395)
* Config-entry: Support proxy config in service-defaults

* Update website/content/docs/connect/config-entries/service-defaults.mdx

Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
2022-09-12 10:41:58 -04:00
Eric Haberkorn 1490eedfbc
Implement Cluster Peering Redirects (#14445)
implement cluster peering redirects
2022-09-09 13:58:28 -04:00
skpratt cf6c1d9388
add non-double-prefixed metrics (#14193) 2022-09-09 12:13:43 -05:00
skpratt 1ae31a520a
PR #14057 follow up fix: service id parsing from sidecar id (#14541)
* fix service id parsing from sidecar id

* simplify suffix trimming
2022-09-09 09:47:10 -05:00
Dan Upton 9fe6c33c0d
xDS Load Balancing (#14397)
Prior to #13244, connect proxies and gateways could only be configured by an
xDS session served by the local client agent.

In an upcoming release, it will be possible to deploy a Consul service mesh
without client agents. In this model, xDS sessions will be handled by the
servers themselves, which necessitates load-balancing to prevent a single
server from receiving a disproportionate amount of load and becoming
overwhelmed.

This introduces a simple form of load-balancing where Consul will attempt to
achieve an even spread of load (xDS sessions) between all healthy servers.
It does so by implementing a concurrent session limiter (limiter.SessionLimiter)
and adjusting the limit according to autopilot state and proxy service
registrations in the catalog.

If a server is already over capacity (i.e. the session limit is lowered),
Consul will begin draining sessions to rebalance the load. This will result
in the client receiving a `RESOURCE_EXHAUSTED` status code. It is the client's
responsibility to observe this response and reconnect to a different server.

Users of the gRPC client connection brokered by the
consul-server-connection-manager library will get this for free.

The rate at which Consul will drain sessions to rebalance load is scaled
dynamically based on the number of proxies in the catalog.
2022-09-09 15:02:01 +01:00
Derek Menteer 8efe862b76 Merge branch 'main' of github.com:hashicorp/consul into derekm/split-grpc-ports 2022-09-08 14:53:08 -05:00
Derek Menteer 75dec4c31d Remove rebuilding grpc server. 2022-09-08 13:45:44 -05:00
Derek Menteer 6aaf1c6035 Various cleanups. 2022-09-08 10:51:50 -05:00
Chris S. Kim 331c756471
Reuse http.DefaultTransport in UIMetricsProxy (#14521)
http.Transport keeps a pool of connections and should be reused when possible. We instantiate a new http.DefaultTransport for every metrics request, making large numbers of concurrent requests inefficiently spin up new connections instead of reusing open ones.
2022-09-08 11:02:05 -04:00
Chris S. Kim 9b5c5c5062
Merge pull request #14285 from hashicorp/NET-638-push-server-address-updates-to-the-peer
peering: Subscribe to server address changes and push updates to peers
2022-09-07 09:30:45 -04:00
skpratt 02559085ad
move port and default check logic to locked step (#14057) 2022-09-06 19:35:31 -05:00
Freddy a7f38384ae
Add SpiffeID for Consul server agents (#14485)
Co-authored-by: Eric Haberkorn <erichaberkorn@gmail.com>

By adding a SpiffeID for server agents, servers can now request a leaf
certificate from the Connect CA.

This new Spiffe ID has a key property: servers are identified by their
datacenter name and trust domain. All servers that share these
attributes will share a ServerURI.

The aim is to use these certificates to verify the server name of ANY
server in a Consul datacenter.
2022-09-06 17:58:13 -06:00
Daniel Upton 128055c44c proxycfg-glue: server-local implementation of IntentionUpstreamsDestination
This is the OSS portion of enterprise PR 2463.

Generalises the serverIntentionUpstreams type to support matching on a
service or destination.
2022-09-06 23:27:25 +01:00
Daniel Upton 4b76d8a8ff proxycfg-glue: server-local implementation of InternalServiceDump
This is the OSS portion of enterprise PR 2489.

This PR introduces a server-local implementation of the
proxycfg.InternalServiceDump interface that sources data from a blocking query
against the server's state store.

For simplicity, it only implements the subset of the Internal.ServiceDump RPC
handler actually used by proxycfg - as such the result type has been changed
to IndexedCheckServiceNodes to avoid confusion.
2022-09-06 23:27:25 +01:00
Daniel Upton 8cd6c9f95e proxycfg-glue: server-local implementation of ResolvedServiceConfig
This is the OSS portion of enterprise PR 2460.

Introduces a server-local implementation of the proxycfg.ResolvedServiceConfig
interface that sources data from a blocking query against the server's state
store.

It moves the service config resolution logic into the agent/configentry package
so that it can be used in both the RPC handler and data source.

I've also done a little re-arranging and adding comments to call out data
sources for which there is to be no server-local equivalent.
2022-09-06 23:27:25 +01:00
Derek Menteer b50bc443f3 Merge branch 'main' of github.com:hashicorp/consul into derekm/split-grpc-ports 2022-09-06 10:51:04 -05:00
Derek Menteer d771725a14 Add kv txn get-not-exists operation. 2022-09-06 10:28:59 -05:00
Chris S. Kim 0148263780 PR feedback on terminated state checking 2022-09-06 10:28:20 -04:00
Chris S. Kim 9ad8bf67a5 Add testcase for parsing grpc_port 2022-09-06 10:17:44 -04:00
Kyle Havlovitz a484a759c8
Merge pull request #14429 from hashicorp/ca-prune-intermediates
Prune old expired intermediate certs when appending a new one
2022-09-02 15:34:33 -07:00
cskh 4641a78d27
fix(txn api): missing proxy config in registering proxy service (#14471)
* fix(txn api): missing proxy config in registering proxy service
2022-09-02 14:28:05 -04:00
Chris S. Kim cd51b2f400 Properly assert for ServerAddresses replication request 2022-09-02 11:44:54 -04:00
Chris S. Kim 258c0a1bc1 Fix terminate not returning early 2022-09-02 11:44:38 -04:00
Derek Menteer cb478b0e61 Address PR comments. 2022-09-01 16:54:24 -05:00
Kyle Havlovitz 90fa16c8b5 Prune intermediates before appending new one 2022-09-01 14:24:30 -07:00
Luke Kysow 3cfea70273
Use proxy address for default check (#14433)
When a sidecar proxy is registered, a check is automatically added.
Previously, the address this check used was the underlying service's
address instead of the proxy's address, even though the check is testing
if the proxy is up.

This worked in most cases because the proxy ran on the same IP as the
underlying service but it's not guaranteed and so the proper default
address should be the proxy's address.
2022-09-01 14:03:35 -07:00
malizz c5cbd45b7d
fix TestProxyConfigEntry (#14435) 2022-09-01 11:37:47 -07:00
malizz ef5f697121
Add additional parameters to envoy passive health check config (#14238)
* draft commit

* add changelog, update test

* remove extra param

* fix test

* update type to account for nil value

* add test for custom passive health check

* update comments and tests

* update description in docs

* fix missing commas
2022-09-01 09:59:11 -07:00
Chris S. Kim e70ba97e45 Add Internal.ServiceDump support for querying by PeerName 2022-09-01 10:32:59 -04:00
Chris S. Kim 7b338c8d00
Merge pull request #13998 from jorgemarey/f-new-tracing-envoy
Add new envoy tracing configuration
2022-09-01 08:57:23 -04:00
Derek Menteer ab9d421ba2 Change serf-tag references to field references. 2022-08-31 16:38:42 -05:00
malizz ad30192499
validate args before deleting proxy defaults (#14290)
* validate args before deleting proxy defaults

* add changelog

* validate name when normalizing proxy defaults

* add test for proxyConfigEntry

* add comments
2022-08-31 13:03:38 -07:00
Kyle Havlovitz c5370d52e9 Prune old expired intermediate certs when appending a new one 2022-08-31 11:41:58 -07:00
Alessandro De Blasis fdc9fb8e6c Merge remote-tracking branch 'hashicorp/main' into feature/health-checks_windows_service 2022-08-30 18:49:20 +01:00
Eric Haberkorn 06e7f3cadb
Finish up cluster peering failover (#14396) 2022-08-30 11:46:34 -04:00
Chris S. Kim 9c157e40a3 Merge branch 'main' into NET-638-push-server-address-updates-to-the-peer
# Conflicts:
#	agent/grpc-external/services/peerstream/stream_test.go
2022-08-30 11:09:25 -04:00
Jorge Marey e3813586f3 Fix typos. Add test. Add documentation 2022-08-30 16:59:02 +02:00
Jorge Marey 4d8f5ab539 Add new tracing configuration 2022-08-30 16:59:02 +02:00
Freddy f27a9effca
Merge pull request #13496 from maxb/fix-kv_entries-metric 2022-08-29 15:35:11 -06:00
Freddy 69d99aa8c0
Merge pull request #14364 from hashicorp/peering/term-delete 2022-08-29 15:33:18 -06:00
Max Bowsher 3aefc4123f Merge branch 'main' into fix-kv_entries-metric 2022-08-29 22:22:10 +01:00
Chris S. Kim 7b267f5c01
Merge pull request #14371 from hashicorp/kisunji/peering-metrics-update
Adjust metrics reporting for peering tracker
2022-08-29 17:16:19 -04:00
Chris S. Kim e4a154c88e Add heartbeat timeout grace period when accounting for peering health 2022-08-29 16:32:26 -04:00
Derek Menteer b641dcf03d Expose `grpc_tls` via serf for cluster peering. 2022-08-29 13:43:49 -05:00
Derek Menteer 4a01d75cf8 Add separate grpc_tls port.
To ease the transition for users, the original gRPC
port can still operate in a deprecated mode as either
plain-text or TLS mode. This behavior should be removed
in a future release whenever we no longer support this.

The resulting behavior from this commit is:
  `ports.grpc > 0 && ports.grpc_tls > 0` spawns both plain-text and tls ports.
  `ports.grpc > 0 && grpc.tls == undefined` spawns a single plain-text port.
  `ports.grpc > 0 && grpc.tls != undefined` spawns a single tls port (backwards compat mode).
2022-08-29 13:43:43 -05:00
freddygv f790d84c04 Add validation to prevent switching dialing mode
This prevents unexpected changes to the output of ShouldDial, which
should never change unless a peering is deleted and recreated.
2022-08-29 12:31:13 -06:00
Eric Haberkorn 13992d5dc8
Update max_ejection_percent on outlier detection for peered clusters to 100% (#14373)
We can't trust health checks on peered services when service resolvers,
splitters and routers are used.
2022-08-29 13:46:41 -04:00
Alessandro De Blasis 37bd70ca59 fix(agent): removed redundant code in docker check as well 2022-08-29 18:15:59 +01:00
Alessandro De Blasis b5cf0524b7 fix(agent): removed redundant check on prev. running check 2022-08-29 17:53:39 +01:00
Chris S. Kim a58e943502 Rename test 2022-08-29 10:34:50 -04:00
Chris S. Kim 78bf8437d8 Fix test 2022-08-29 10:20:30 -04:00
Eric Haberkorn 2a370d456b
Update the structs and discovery chain for service resolver redirects to cluster peers. (#14366) 2022-08-29 09:51:32 -04:00
Alessandro De Blasis 260c37f9fd Merge remote-tracking branch 'hashicorp/main' into feature/health-checks_windows_service
Signed-off-by: Alessandro De Blasis <alex@deblasis.net>
2022-08-28 18:09:31 +01:00
Alessandro De Blasis 31832ec908 fix(OSServiceCheck): fixes following code-review 2022-08-28 17:56:30 +01:00
Chris S. Kim b1025f2dd9 Adjust metrics reporting for peering tracker 2022-08-26 17:34:17 -04:00
freddygv 19f25fc3a5 Allow terminated peerings to be deleted
Peerings are terminated when a peer decides to delete the peering from
their end. Deleting a peering sends a termination message to the peer
and triggers them to mark the peering as terminated but does NOT delete
the peering itself. This is to prevent peerings from disappearing from
both sides just because one side deleted them.

Previously the Delete endpoint was skipping the deletion if the peering
was not marked as active. However, terminated peerings are also
inactive.

This PR makes some updates so that peerings marked as terminated can be
deleted by users.
2022-08-26 10:52:47 -06:00
Chris S. Kim 7e95b35102 Fix casing 2022-08-26 11:56:26 -04:00
Chris S. Kim 516a6daefa Merge branch 'main' into catalog-service-list-filter 2022-08-26 11:16:06 -04:00
Chris S. Kim a2c857df40 Fix tests for enterprise 2022-08-26 11:14:02 -04:00
Chris S. Kim a5e9ea6d96 Merge branch 'main' into NET-638-push-server-address-updates-to-the-peer
# Conflicts:
#	agent/grpc-external/services/peerstream/stream_test.go
2022-08-26 10:43:56 -04:00
Chris S. Kim a8090268d4
Replace ring buffer with async version (#14314)
We need to watch for changes to peerings and update the server addresses which get served by the ring buffer.

Also, if there is an active connection for a peer, we are getting up-to-date server addresses from the replication stream and can safely ignore the token's addresses which may be stale.
2022-08-26 10:27:13 -04:00
alex f64af3be24
peering: add peer health metric (#14004)
Signed-off-by: acpana <8968914+acpana@users.noreply.github.com>
2022-08-25 16:32:59 -07:00
Chris S. Kim 2e75833133 Exit loop when context is cancelled 2022-08-25 11:48:25 -04:00
cskh 7ee1c857c3
Fix: the inboundconnection limit filter should be placed in front of http co… (#14325)
* fix: the inboundconnection limit should be placed in front of http connection manager

Co-authored-by: Freddy <freddygv@users.noreply.github.com>
2022-08-24 14:13:10 -04:00
Chris S. Kim eac63fea1f Update test comment 2022-08-24 13:50:24 -04:00
Chris S. Kim 6f98c853b8 Add check for zero-length server addresses 2022-08-24 13:30:52 -04:00
skpratt c039028401
no-op: refactor usagemetrics tests for clarity and DRY cases (#14313) 2022-08-24 12:00:09 -05:00
Pablo Ruiz García 4188769c32
Added new auto_encrypt.grpc_server_tls config option to control AutoTLS enabling of GRPC Server's TLS usage
Fix for #14253

Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>
2022-08-24 12:31:38 -04:00
Dan Upton 20c87d235f
dataplane: update envoy bootstrap params for consul-dataplane (#14017)
Contains 2 changes to the GetEnvoyBootstrapParams response to support
consul-dataplane.

Exposing node_name and node_id:

consul-dataplane will support providing either the node_id or node_name in its
configuration. Unfortunately, supporting both in the xDS meta adds a fair amount
of complexity (partly because most tables are currently indexed on node_name)
so for now we're going to return them both from the bootstrap params endpoint,
allowing consul-dataplane to exchange a node_id for a node_name (which it will
supply in the xDS meta).

Properly setting service for gateways:

To avoid the need to special case gateways in consul-dataplane, service will now
either be the destination service name for connect proxies, or the gateway
service name. This means it can be used as-is in Envoy configuration (i.e. as a
cluster name or in metric tags).
2022-08-24 12:03:15 +01:00