Commit graph

4980 commits

Author SHA1 Message Date
Dan Upton 618deae657
xds: don't attempt to load-balance sessions for local proxies (#15789)
Previously, we'd begin a session with the xDS concurrency limiter
regardless of whether the proxy was registered in the catalog or in
the server's local agent state.

This caused problems for users who run `consul connect envoy` directly
against a server rather than a client agent, as the server's locally
registered proxies wouldn't be included in the limiter's capacity.

Now, the `ConfigSource` is responsible for beginning the session and we
only do so for services in the catalog.

Fixes: https://github.com/hashicorp/consul/issues/15753
2023-01-18 12:33:21 -06:00
Chris S. Kim c262065276
Warn if ACL is enabled but no token is provided to Envoy (#15967) 2023-01-16 12:31:56 -05:00
Dhia Ayachi 9e99715ee8
avoid logging RPC errors when it's specific rate limiter errors (#15968)
* avoid logging RPC errors when it's specific rate limiter errors

* simplify if statements
2023-01-16 12:08:09 -05:00
Derek Menteer f926c7643f
Enforce lowercase peer names. (#15697)
Enforce lowercase peer names.

Prior to this change peer names could be mixed case.
This can cause issues, as peer names are used as DNS labels
in various locations. It also caused issues with envoy
configuration.
2023-01-13 14:20:28 -06:00
Dan Stough 27d74de701
feat: add access logs to dataplane bootstrap rpc (#15951) 2023-01-11 13:40:09 -05:00
Matt Keeler 554f1e6fee
Protobuf Modernization (#15949)
* Protobuf Modernization

Remove direct usage of golang/protobuf in favor of google.golang.org/protobuf

Marshallers (protobuf and json) needed some changes to account for different APIs.

Moved to using the google.golang.org/protobuf/types/known/* for the well known types including replacing some custom Struct manipulation with whats available in the structpb well known type package.

This also updates our devtools script to install protoc-gen-go from the right location so that files it generates conform to the correct interfaces.

* Fix go-mod-tidy make target to work on all modules
2023-01-11 09:39:10 -05:00
Paul Glass 1bf1686ebc
Add new config_file_service_registration token (#15828) 2023-01-10 10:24:02 -06:00
Chris S. Kim 82d6d12a13
Output user-friendly name for anonymous token (#15884) 2023-01-09 12:28:53 -06:00
Dan Upton f24f8c681f
Rate limit improvements and fixes (#15917)
- Fixes a panic when Operation.SourceAddr is nil (internal net/rpc calls)
- Adds proper HTTP response codes (429 and 503) for rate limit errors
- Makes the error messages clearer
- Enables automatic retries for rate-limit errors in the net/rpc stack
2023-01-09 10:20:05 +00:00
Semir Patel 2a90faa4b1
emit metrics for global rate limiting (#15891) 2023-01-06 17:49:33 -06:00
Dhia Ayachi f17bc5ed73
inject logger and create logdrop sink (#15822)
* inject logger and create logdrop sink

* init sink with an empty struct instead of nil

* wrap a logger instead of a sink and add a discard logger to avoid double logging

* fix compile errors

* fix linter errors

* Fix bug where log arguments aren't properly formatted

* Move log sink construction outside of handler

* Add prometheus definition and docs for log drop counter

Co-authored-by: Daniel Upton <daniel@floppy.co>
2023-01-06 11:33:53 -07:00
Eric Haberkorn 01a0142d1f
Add the Lua Envoy extension (#15906) 2023-01-06 12:13:40 -05:00
Paul Glass a36839d9c3
Fix TLS_BadVerify test assertions on macOS (#15903) 2023-01-05 11:47:45 -06:00
Dan Upton 76fea384a3
grpc/acl: fix bug where ACL token was required even if disabled (#15904)
Fixes a bug introduced by #15346 where we'd always require an ACL
token even if ACLs were disabled because we were erroneously
treating `nil` identity as anonymous.
2023-01-05 16:31:18 +00:00
Dan Upton 15c7c03fa5
grpc: switch servers and retry on error (#15892)
This is the OSS portion of enterprise PR 3822.

Adds a custom gRPC balancer that replicates the router's server cycling
behavior. Also enables automatic retries for RESOURCE_EXHAUSTED errors,
which we now get for free.
2023-01-05 10:21:27 +00:00
Nick Irvine 2c37b0afd1
fix: return error when config file with unknown extension is passed (#15107) 2023-01-04 16:57:00 -08:00
Florian Apolloner cb5389cc89
Allow Operator Generated bootstrap token (#14437)
Add support to provide an initial token via the bootstrap HTTP API, similar to hashicorp/nomad#12520
2023-01-04 20:19:33 +00:00
Semir Patel 8242459c66
Wire up the rate limiter to net/rpc calls (#15879) 2023-01-04 13:38:44 -06:00
Dan Upton 1d95609fb7
grpc: protoc plugin for generating gRPC rate limit specifications (#15564)
Adds automation for generating the map of `gRPC Method Name → Rate Limit Type`
used by the middleware introduced in #15550, and will ensure we don't forget
to add new endpoints.

Engineers must annotate their RPCs in the proto file like so:

```
rpc Foo(FooRequest) returns (FooResponse) {
  option (consul.internal.ratelimit.spec) = {
    operation_type: READ,
  };
}
```

When they run `make proto` a protoc plugin `protoc-gen-consul-rate-limit` will
be installed that writes rate-limit specs as a JSON array to a file called
`.ratelimit.tmp` (one per protobuf package/directory).

After running Buf, `make proto` will execute a post-process script that will
ingest all of the `.ratelimit.tmp` files and generate a Go file containing the
mappings in the `agent/grpc-middleware` package. In the enterprise repository,
it will write an additional file with the enterprise-only endpoints.

If an engineer forgets to add the annotation to a new RPC, the plugin will
return an error like so:

```
RPC Foo is missing rate-limit specification, fix it with:

	import "proto-public/annotations/ratelimit/ratelimit.proto";

	service Bar {
	  rpc Foo(...) returns (...) {
	    option (hashicorp.consul.internal.ratelimit.spec) = {
	      operation_type: OPERATION_READ | OPERATION_WRITE | OPERATION_EXEMPT,
	    };
	  }
	}
```

In the future, this annotation can be extended to support rate-limit
category (e.g. KV vs Catalog) and to determine the retry policy.
2023-01-04 16:07:02 +00:00
Dan Upton 4719b717ea
grpc/acl: relax permissions required for "core" endpoints (#15346)
Previously, these endpoints required `service:write` permission on _any_
service as a sort of proxy for "is the caller allowed to participate in
the mesh?".

Now, they're called as part of the process of establishing a server
connection by any consumer of the consul-server-connection-manager
library, which will include non-mesh workloads (e.g. Consul KV as a
storage backend for Vault) as well as ancillary components such as
consul-k8s' acl-init process, which likely won't have `service:write`
permission.

So this commit relaxes those requirements to accept *any* valid ACL token
on the following gRPC endpoints:

- `hashicorp.consul.dataplane.DataplaneService/GetSupportedDataplaneFeatures`
- `hashicorp.consul.serverdiscovery.ServerDiscoveryService/WatchServers`
- `hashicorp.consul.connectca.ConnectCAService/WatchRoots`
2023-01-04 12:40:34 +00:00
Derek Menteer 2af14c0084
Fix issue with incorrect proxycfg watch on upstream peer-targets. (#15865)
This fixes an issue where the incorrect partition was given to the
upstream target watch, which meant that failover logic would not
work correctly.
2023-01-03 10:44:08 -06:00
Derek Menteer 76cc876ac3
Fix agent cache incorrectly notifying unchanged protobufs. (#15866)
Fix agent cache incorrectly notifying unchanged protobufs.

This change fixes a situation where the protobuf private fields
would be read by reflect.DeepEqual() and indicate data was modified.
This resulted in change notifications being fired every time, which
could cause performance problems in proxycfg.
2023-01-03 10:11:56 -06:00
Dan Upton 006138beb4
Wire in rate limiter to handle internal and external gRPC calls (#15857) 2022-12-23 13:42:16 -06:00
Dan Stough 38d65efb72
[OSS] feat: access logs for listeners and listener filters (#15864)
* feat: access logs for listeners and listener filters

* changelog

* fix integration test
2022-12-22 15:18:15 -05:00
Nitya Dhanushkodi e0e4505f44
add extensions for local service to GetExtensionConfigurations (#15871)
This gets the extensions information for the local service into the snapshot and ExtensionConfigurations for a proxy. It grabs the extensions from config entries and puts them in structs.NodeService.Proxy field, which already is copied into the config snapshot.

Also:
* add EnvoyExtensions to api.AgentService so that it matches structs.NodeService
2022-12-22 10:03:33 -08:00
Nitya Dhanushkodi 2800774f68
[OSS] extensions: refactor PluginConfiguration into a more generic type ExtensionConfiguration (#15846)
* extensions: refactor PluginConfiguration into a more generic type
ExtensionConfiguration

Also:
* adds endpoints configuration to lambda golden tests
* uses string constant for builtin/aws/lambda
Co-authored-by: Eric <eric@haberkorn.co>
2022-12-20 22:26:20 -08:00
John Murret 2a0aeb2349
Rate Limit Handler - ensure rate limiting is not in the code path when not configured (#15819)
* Rate limiting handler - ensure configuration has changed before modifying limiters

* Updating test to validate arguments to UpdateConfig

* Removing duplicate test.  Updating mock.

* Renaming NullRateLimiter to NullRequestLimitsHandler

* Rate Limit Handler - ensure rate limiting is not in the code path when not configured

* Update agent/consul/rate/handler.go

Co-authored-by: Dhia Ayachi <dhia@hashicorp.com>

* formatting handler.go

* Rate limiting handler - ensure configuration has changed before modifying limiters

* Updating test to validate arguments to UpdateConfig

* Removing duplicate test.  Updating mock.

* adding logging for when UpdateConfig is called but the config has not changed.

* Update agent/consul/rate/handler.go

Co-authored-by: Dhia Ayachi <dhia@hashicorp.com>

* Update agent/consul/rate/handler_test.go

Co-authored-by: Dan Upton <daniel@floppy.co>

* modifying existing variable name based on pr feedback

* updating a broken merge conflict;

Co-authored-by: Dhia Ayachi <dhia@hashicorp.com>
Co-authored-by: Dan Upton <daniel@floppy.co>
2022-12-20 15:00:22 -07:00
John Murret 8c33d7cc0e
Rate limiting handler - ensure configuration has changed before modifying limiters (#15805)
* Rate limiting handler - ensure configuration has changed before modifying limiters

* Updating test to validate arguments to UpdateConfig

* Removing duplicate test.  Updating mock.

* adding logging for when UpdateConfig is called but the config has not changed.

* Update agent/consul/rate/handler.go

Co-authored-by: Dhia Ayachi <dhia@hashicorp.com>

Co-authored-by: Dhia Ayachi <dhia@hashicorp.com>
2022-12-20 14:12:03 -07:00
Michael Wilkerson ebed9e048f
Enhancement: Consul Compatibility Checking (#15818)
* add functions for returning the max and min Envoy major versions
- added an UnsupportedEnvoyVersions list
- removed an unused error from TestDetermineSupportedProxyFeaturesFromString
- modified minSupportedVersion to use the function for getting the Min Envoy major version. Using just the major version without the patch is equivalent to using `.0`

* added a function for executing the envoy --version command
- added a new exec.go file to not be locked to unix system

* added envoy version check when using consul connect envoy

* added changelog entry

* added docs change
2022-12-20 09:58:19 -08:00
Derek Menteer e25f7313e4
Fix incorrect protocol check on discovery chains with peer targets. (#15833) 2022-12-20 10:15:03 -06:00
Semir Patel 971089482c
Map net/rpc endpoints to a read/write/exempt op for rate-limiting (#15825)
Also fixed TestRequestRecorder flaky tests due to loss of precision in elapsed time in the test.
2022-12-19 16:04:52 -06:00
Nitya Dhanushkodi 8386bf19bf
extensions: refactor serverless plugin to use extensions from config entry fields (#15817)
docs: update config entry docs and the Lambda manual registration docs

Co-authored-by: Nitya Dhanushkodi <nitya@hashicorp.com>
Co-authored-by: Eric <eric@haberkorn.co>
2022-12-19 12:19:37 -08:00
Chris S. Kim f129a6a9d0
Break instead (#15844) 2022-12-19 11:53:05 -07:00
Chris S. Kim f8868c7ccf
Add custom balancer to always remove subConns (#15701)
The new balancer is a patched version of gRPC's default pick_first balancer
which removes the behavior of preserving the active subconnection if
a list of new addresses contains the currently active address.
2022-12-19 17:39:31 +00:00
Andrew Stucki 1ff0906a3e
Add async reconciliation controller subpackage (#15534)
* Add async reconciliation controller subpackage

* Address initial feedback

* Add tests for panic assertions

* Fix comment
2022-12-16 16:49:26 -05:00
Dhia Ayachi a1ceeff461
add missing code and fix enterprise specific code (#15375)
* add missing code and fix enterprise specific code

* fix retry

* fix flaky tests

* fix linter error in test
2022-12-16 16:31:05 -05:00
Dhia Ayachi 108653b9e3
add log-drop package (#15670)
* add log-drop package

* refactor to extract level

* extract metrics

* Apply suggestions from code review

Co-authored-by: Dan Upton <daniel@floppy.co>

* fix compile errors

* change to implement a log sink

* fix tests to remove sleep

* rename and add go docs

* fix expending variadic

Co-authored-by: Dan Upton <daniel@floppy.co>
2022-12-15 12:52:48 -05:00
Paul Glass 62df6a7513
Deprecate -join and -join-wan (#15598) 2022-12-14 20:28:25 +00:00
Dhia Ayachi 11f245f24f
Server side rate limiter: handle the race condition for limiters tree write in multilimiter (#15767)
* change to perform all tree writes in the same go routine to avoid race condition.

* rename runStoreOnce to reconcile

* Apply suggestions from code review

Co-authored-by: Dan Upton <daniel@floppy.co>

* reduce nesting

Co-authored-by: Dan Upton <daniel@floppy.co>
2022-12-14 17:32:11 +00:00
Semir Patel 1f82e82e04
Pass remote addr of incoming HTTP requests through to RPC(..) calls (#15700) 2022-12-14 09:24:22 -06:00
John Murret 700c693b33
adding config for request_limits (#15531)
* server: add placeholder glue for rate limit handler

This commit adds a no-op implementation of the rate-limit handler and
adds it to the `consul.Server` struct and setup code.

This allows us to start working on the net/rpc and gRPC interceptors and
config logic.

* Add handler errors

* Set the global read and write limits

* fixing multilimiter moving packages

* Fix typo

* Simplify globalLimit usage

* add multilimiter and tests

* exporting LimitedEntity

* Apply suggestions from code review

Co-authored-by: John Murret <john.murret@hashicorp.com>

* add config update and rename config params

* add doc string and split config

* Apply suggestions from code review

Co-authored-by: Dan Upton <daniel@floppy.co>

* use timer to avoid go routine leak and change the interface

* add comments to tests

* fix failing test

* add prefix with config edge, refactor tests

* Apply suggestions from code review

Co-authored-by: Dan Upton <daniel@floppy.co>

* refactor to apply configs for limiters under a prefix

* add fuzz tests and fix bugs found. Refactor reconcile loop to have a simpler logic

* make KeyType an exported type

* split the config and limiter trees to fix race conditions in config update

* rename variables

* fix race in test and remove dead code

* fix reconcile loop to not create a timer on each loop

* add extra benchmark tests and fix tests

* fix benchmark test to pass value to func

* server: add placeholder glue for rate limit handler

This commit adds a no-op implementation of the rate-limit handler and
adds it to the `consul.Server` struct and setup code.

This allows us to start working on the net/rpc and gRPC interceptors and
config logic.

* Set the global read and write limits

* fixing multilimiter moving packages

* add server configuration for global rate limiting.

* remove agent test

* remove added stuff from handler

* remove added stuff from multilimiter

* removing unnecessary TODOs

* Removing TODO comment from handler

* adding in defaulting to infinite

* add disabled status in there

* adding in documentation for disabled mode.

* make disabled the default.

* Add mock and agent test

* addig documentation and missing mock file.

* Fixing test TestLoad_IntegrationWithFlags

* updating docs based on PR feedback.

* Updating Request Limits mode to use int based on PR feedback.

* Adding RequestLimits struct so we have a nested struct in ReloadableConfig.

* fixing linting references

* Update agent/consul/rate/handler.go

Co-authored-by: Dan Upton <daniel@floppy.co>

* Update agent/consul/config.go

Co-authored-by: Dan Upton <daniel@floppy.co>

* removing the ignore of the request limits in JSON.  addingbuilder logic to convert any read rate or write rate less than 0 to rate.Inf

* added conversion function to convert request limits object to handler config.

* Updating docs to reflect gRPC and RPC are rate limit and as a result, HTTP requests are as well.

* Updating values for TestLoad_FullConfig() so that they were different and discernable.

* Updating TestRuntimeConfig_Sanitize

* Fixing TestLoad_IntegrationWithFlags test

* putting nil check in place

* fixing rebase

* removing change for missing error checks.  will put in another PR

* Rebasing after default multilimiter config change

* resolving rebase issues

* updating reference for incomingRPCLimiter to use interface

* updating interface

* Updating interfaces

* Fixing mock reference

Co-authored-by: Daniel Upton <daniel@floppy.co>
Co-authored-by: Dhia Ayachi <dhia@hashicorp.com>
2022-12-13 13:09:55 -07:00
Dan Stough b7c51a31c4
feat: add access logging API to proxy defaults (#15780) 2022-12-13 14:52:18 -05:00
cskh 3e37a449c8
feat(ingress-gateway): support outlier detection of upstream service for ingress gateway (#15614)
* feat(ingress-gateway): support outlier detection of upstream service for ingress gateway

* changelog

Co-authored-by: Eric Haberkorn <erichaberkorn@gmail.com>
2022-12-13 11:51:37 -05:00
Derek Menteer 50a5549f8a
Fix DialedDirectly configuration for Consul dataplane. (#15760)
Fix DialedDirectly configuration for Consul dataplane.
2022-12-13 09:16:31 -06:00
Dan Upton c73707ca3c
grpc: add rate-limiting middleware (#15550)
Implements the gRPC middleware for rate-limiting as a tap.ServerInHandle
function (executed before the request is unmarshaled).

Mappings between gRPC methods and their operation type are generated by
a protoc plugin introduced by #15564.
2022-12-13 15:01:56 +00:00
Dan Upton 4894848993
server: add placeholder glue for rate limit handler (#15539)
Adds a no-op implementation of the rate-limit handler and exposes
it on the consul.Server struct.

It allows us to start working on the net/rpc and gRPC interceptors
and config (re)loading logic, without having to implement the full
handler up-front.

Co-authored-by: John Murret <john.murret@hashicorp.com>
Co-authored-by: Dhia Ayachi <dhia@hashicorp.com>
2022-12-13 11:41:54 +00:00
John Murret fe0432ade5
agent: Fix assignment of error when auto-reloading cert and key file changes. (#15769)
* Adding the setting of errors missing in config file watcher code in agent.

* add changelog
2022-12-12 12:24:39 -07:00
R.B. Boyer a52a774c09
test: remove variable shadowing in TestDNS_ServiceLookup_ARecordLimits (#15740) 2022-12-09 10:19:02 -06:00
Eric Haberkorn 5dd131fee8
Remove the connect.enable_serverless_plugin agent configuration option (#15710) 2022-12-08 14:46:42 -05:00
Dhia Ayachi b459d58e8d
add multilimiter and tests (#15467)
* add multilimiter and tests

* exporting LimitedEntity

* go mod tidy

* Apply suggestions from code review

Co-authored-by: John Murret <john.murret@hashicorp.com>

* add config update and rename config params

* add doc string and split config

* Apply suggestions from code review

Co-authored-by: Dan Upton <daniel@floppy.co>

* use timer to avoid go routine leak and change the interface

* add comments to tests

* fix failing test

* add prefix with config edge, refactor tests

* Apply suggestions from code review

Co-authored-by: Dan Upton <daniel@floppy.co>

* refactor to apply configs for limiters under a prefix

* add fuzz tests and fix bugs found. Refactor reconcile loop to have a simpler logic

* make KeyType an exported type

* split the config and limiter trees to fix race conditions in config update

* rename variables

* fix race in test and remove dead code

* fix reconcile loop to not create a timer on each loop

* add extra benchmark tests and fix tests

* fix benchmark test to pass value to func

* use a separate go routine to write limiters (#15643)

* use a separate go routine to write limiters

* Add updating limiter when another limiter is created

* fix waiter to be a ticker, so we commit more than once.

* fix tests and add tests for coverage

* unexport members and add tests

* make UpdateConfig thread safe and multi call to Run safe

* replace swith with if

* fix review comments

* replace time.sleep with retries

* fix flaky test and remove unnecessary init

* fix test races

* remove unnecessary negative test case

* remove fixed todo

Co-authored-by: John Murret <john.murret@hashicorp.com>
Co-authored-by: Dan Upton <daniel@floppy.co>
2022-12-08 14:42:07 -05:00
cskh df06ab4181
Flakiness test: case-cfg-splitter-peering-ingress-gateways (#15707)
* integ-test: fix flaky test - case-cfg-splitter-peering-ingress-gateways

* add retry peering to all peering cases

Co-authored-by: Dan Stough <dan.stough@hashicorp.com>
2022-12-07 20:19:34 -05:00
Derek Menteer f17a4f07c5
Fix local mesh gateway with peering discovery chains. (#15690)
Fix local mesh gateway with peering discovery chains.

Prior to this patch, discovery chains with peers would not
properly honor the mesh gateway mode for two reasons.

1. An incorrect target upstream ID was used to lookup the
mesh gateway mode. To fix this, the parent upstream uid is
now used instead of the discovery-chain-target-uid to find
the intended mesh gateway mode.

2. The watch for local mesh gateways was never initialized
for discovery chains. To fix this, the discovery chains are
now scanned, and a local GW watch is spawned if: the mesh
gateway mode is local and the target is a peering connection.
2022-12-07 13:07:42 -06:00
R.B. Boyer ec0857075e
connect: use -dev-no-store-token for test vaults to reduce source of flakes (#15691)
It turns out that by default the dev mode vault server will attempt to interact with the
filesystem to store the provided root token. If multiple vault instances are running
they'll all awkwardly share the filesystem and if timing results in one server stopping
while another one is starting then the starting one will error with:

    Error initializing Dev mode: rename /home/circleci/.vault-token.tmp /home/circleci/.vault-token: no such file or directory

This change uses `-dev-no-store-token` to bypass that source of flakes. Also the
stdout/stderr from the vault process is included if the test fails.

The introduction of more `t.Parallel` use in https://github.com/hashicorp/consul/pull/15669
increased the likelihood of this failure, but any of the tests with multiple vaults in use
(or running multiple package tests in parallel that all use vault) were eventually going
to flake on this.
2022-12-06 13:15:13 -06:00
R.B. Boyer ba6b24babf
connect: ensure all vault connect CA tests use limited privilege tokens (#15669)
All of the current integration tests where Vault is the Connect CA now use non-root tokens for the test. This helps us detect privilege changes in the vault model so we can keep our guides up to date.

One larger change was that the RenewIntermediate function got refactored slightly so it could be used from a test, rather than the large duplicated function we were testing in a test which seemed error prone.
2022-12-06 10:06:36 -06:00
R.B. Boyer a88d1239e3
Detect Vault 1.11+ import in secondary datacenters and update default issuer (#15661)
The fix outlined and merged in #15253 fixed the issue as it occurs in the primary
DC. There is a similar issue that arises when vault is used as the Connect CA in a
secondary datacenter that is fixed by this PR.

Additionally: this PR adds support to run the existing suite of vault related integration
tests against the last 4 versions of vault (1.9, 1.10, 1.11, 1.12)
2022-12-05 15:39:21 -06:00
Chris S. Kim 5d06668248
Add warn log when all ACL policies are filtered out (#15632) 2022-12-05 11:26:10 -05:00
cskh 426c2b72d2
integ-test: test consul upgrade from the snapshot of a running cluster (#15595)
* integ-test: test consul upgrade from the snapshot of a running cluster

* use Target version as default


Co-authored-by: Dan Stough <dan.stough@hashicorp.com>
2022-12-01 10:39:09 -05:00
R.B. Boyer a8411976a8
peering: better represent non-passing states during peer check flattening (#15615)
During peer stream replication we flatten checks from the source cluster and build one thin overall check to hide the irrelevant details from the consuming cluster. This flattening logic did correctly flip to non-passing if there were any non-passing checks, but WHICH status it got during that was random (warn/error).

Also it didn't represent "maintenance" operations. There is an api package call AggregatedStatus which more correctly flattened check statuses.

This PR replicated the more complete logic into the peer stream package.
2022-11-30 11:29:21 -06:00
Freddy 7641d10184
Remove log line about server mgmt token init (#15610)
* Remove log line about server mgmt token init

Currently the server management token is only being bootstrapped in the
primary datacenter. That means that servers on the secondary datacenter
will never have this token available, and would log this line any time a
token is resolved.

Bootstrapping the token in secondary datacenters will be done in a
follow-up.

* Add changelog entry
2022-11-29 17:56:03 -05:00
James Oulman 71f7f2e3dc
Add support for configuring Envoys route idle_timeout (#14340)
* Add idleTimeout

Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
Co-authored-by: Dhia Ayachi <dhia@hashicorp.com>
2022-11-29 17:43:15 -05:00
Derek Menteer 79bef1982f
Add peering .service and .node DNS lookups. (#15596)
Add peering `.service` and `.node` DNS lookups.
2022-11-29 12:23:18 -06:00
cskh 92e71318c1
fix(peering): increase the gRPC limit to 8MB (#15503)
* fix(peering): increase the gRPC limit to 50MB

* changelog

* update gRPC limit to 8MB
2022-11-28 17:48:43 -05:00
Chris S. Kim efffcd56d0
Fix Vault managed intermediate PKI bug (#15525) 2022-11-28 16:17:58 -05:00
Chris S. Kim 4ad4cb1183
Use backport-compatible assertion (#15546)
* Use backport-compatible assertion

* Add workaround for broken apt-get
2022-11-24 11:44:20 -05:00
Chris S. Kim d146a3d542
Use rpcHoldTimeout to calculate blocking timeout (#15541)
Adds buffer to clients so that servers have time to respond to blocking queries.
2022-11-24 10:13:02 -05:00
Jared Kirschner b97acfb107
Support RFC 2782 for prepared query DNS lookups (#14465)
Format:
	_<query id or name>._tcp.query[.<datacenter>].<domain>
2022-11-20 17:21:24 -05:00
Alexander Scheel 8ef3fe3812
Detect Vault 1.11+ import, update default issuer (#15253)
Consul used to rely on implicit issuer selection when calling Vault endpoints to issue new CSRs. Vault 1.11+ changed that behavior, which caused Consul to check the wrong (previous) issuer when renewing its Intermediate CA. This patch allows Consul to explicitly set a default issuer when it detects that the response from Vault is 1.11+.

Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
2022-11-17 16:29:49 -05:00
cskh 248aef38cc
fix: clarifying error message when acquiring a lock in remote dc (#15394)
* fix: clarifying error message when acquiring a lock in remote dc

* Update website/content/commands/lock.mdx

Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>
2022-11-16 15:27:37 -05:00
Kyle Havlovitz f5c5d2f5c6
auto-config: relax node name validation for JWT authorization (#15370)
* auto-config: relax node name validation for JWT authorization

This changes the JWT authorization logic to allow all non-whitespace,
non-quote characters when validating node names. Consul had previously
allowed these characters in node names, until this validation was added
to fix a security vulnerability with whitespace/quotes being passed to
the `bexpr` library. This unintentionally broke node names with
characters like `.` which aren't related to this vulnerability.

* Update website/content/docs/agent/config/cli-flags.mdx

Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>

Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>
2022-11-14 18:24:40 -06:00
Dhia Ayachi 219a3c5bd3
Leadership transfer cmd (#14132)
* add leadership transfer command

* add RPC call test (flaky)

* add missing import

* add changelog

* add command registration

* Apply suggestions from code review

Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>

* add the possibility of providing an id to raft leadership transfer. Add few tests.

* delete old file from cherry pick

* rename changelog filename to PR #

* rename changelog and fix import

* fix failing test

* check for OperatorWrite

Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>

* rename from leader-transfer to transfer-leader

* remove version check and add test for operator read

* move struct to operator.go

* first pass

* add code for leader transfer in the grpc backend and tests

* wire the http endpoint to the new grpc endpoint

* remove the RPC endpoint

* remove non needed struct

* fix naming

* add mog glue to API

* fix comment

* remove dead code

* fix linter error

* change package name for proto file

* remove error wrapping

* fix failing test

* add command registration

* add grpc service mock tests

* fix receiver to be pointer

* use defined values

Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>

* reuse MockAclAuthorizer

* add documentation

* remove usage of external.TokenFromContext

* fix failing tests

* fix proto generation

* Apply suggestions from code review

Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com>

* Apply suggestions from code review

Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com>

* Apply suggestions from code review

* add more context in doc for the reason

* Apply suggestions from docs code review

Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>

* regenerate proto

* fix linter errors

Co-authored-by: github-team-consul-core <github-team-consul-core@hashicorp.com>
Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
Co-authored-by: Jared Kirschner <85913323+jkirschner-hashicorp@users.noreply.github.com>
Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
2022-11-14 15:35:12 -05:00
Freddy 0cc3fac6c4
Ensure that NodeDump imported nodes are filtered (#15356) 2022-11-14 12:35:20 -07:00
Freddy e96c0e1dad
Fixup authz for data imported from peers (#15347)
There are a few changes that needed to be made to to handle authorizing
reads for imported data:

- If the data was imported from a peer we should not attempt to read the
  data using the traditional authz rules. This is because the name of
  services/nodes in a peer cluster are not equivalent to those of the
  importing cluster.

- If the data was imported from a peer we need to check whether the
  token corresponds to a service, meaning that it has service:write
  permissions, or to a local read only token that can read all
  nodes/services in a namespace.

This required changes at the policyAuthorizer level, since that is the
only view available to OSS Consul, and at the enterprise
partition/namespace level.
2022-11-14 11:36:27 -07:00
Kyle Havlovitz 7be442ee63
connect: strip port from DNS SANs for ingress gateway leaf cert (#15320)
* connect: strip port from DNS SANs for ingress gateway leaf cert

* connect: format DNS SANs in CreateCSR

* connect: Test wildcard case when formatting SANs
2022-11-14 10:27:03 -08:00
Derek Menteer 0c07a36408
Prevent serving TLS via ports.grpc (#15339)
Prevent serving TLS via ports.grpc

We remove the ability to run the ports.grpc in TLS mode to avoid
confusion and to simplify configuration. This breaking change
ensures that any user currently using ports.grpc in an encrypted
mode will receive an error message indicating that ports.grpc_tls
must be explicitly used.

The suggested action for these users is to simply swap their ports.grpc
to ports.grpc_tls in the configuration file. If both ports are defined,
or if the user has not configured TLS for grpc, then the error message
will not be printed.
2022-11-11 14:29:22 -06:00
Dan Stough ee56e06f22
[OSS] fix: wait and try longer to peer through mesh gw (#15328) 2022-11-10 13:54:00 -05:00
Kyle Schochenmaier 2b1e5f69e2
removes ioutil usage everywhere which was deprecated in go1.16 (#15297)
* update go version to 1.18 for api and sdk, go mod tidy
* removes ioutil usage everywhere which was deprecated in go1.16 in favour of io and os packages. Also introduces a lint rule which forbids use of ioutil going forward.
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
2022-11-10 10:26:01 -06:00
malizz 8d2ed1999d
update ACLs for cluster peering (#15317)
* update ACLs for cluster peering

* add changelog

* Update .changelog/15317.txt

Co-authored-by: Eric Haberkorn <erichaberkorn@gmail.com>

Co-authored-by: Eric Haberkorn <erichaberkorn@gmail.com>
2022-11-09 13:02:58 -08:00
malizz b823d79fcf
update config defaults, add docs (#15302)
* update config defaults, add docs

* update grpc tls port for non-default values

* add changelog

* Update website/content/docs/upgrading/upgrade-specific.mdx

Co-authored-by: Derek Menteer <105233703+hashi-derek@users.noreply.github.com>

* Update website/content/docs/agent/config/config-files.mdx

Co-authored-by: Derek Menteer <105233703+hashi-derek@users.noreply.github.com>

* update logic for setting grpc tls port value

* move default config to default.go, update changelog

* update docs

* Fix config tests.

* Fix linter error.

* Fix ConnectCA tests.

* Cleanup markdown on upgrade notes.

Co-authored-by: Derek Menteer <105233703+hashi-derek@users.noreply.github.com>
Co-authored-by: Derek Menteer <derek.menteer@hashicorp.com>
2022-11-09 09:29:55 -08:00
Eric Haberkorn 69914f59f7
Log Warnings When Peering With Mesh Gateway Mode None (#15304)
warn when mesh gateway mode is set to none for peering
2022-11-09 11:48:58 -05:00
Derek Menteer 9e76d274ec
Fix mesh gateway configuration with proxy-defaults (#15186)
* Fix mesh gateway proxy-defaults not affecting upstreams.

* Clarify distinction with upstream settings

Top-level mesh gateway mode in proxy-defaults and service-defaults gets
merged into NodeService.Proxy.MeshGateway, and only gets merged with
the mode attached to an an upstream in proxycfg/xds.

* Fix mgw mode usage for peered upstreams

There were a couple issues with how mgw mode was being handled for
peered upstreams.

For starters, mesh gateway mode from proxy-defaults
and the top-level of service-defaults gets stored in
NodeService.Proxy.MeshGateway, but the upstream watch for peered data
was only considering the mesh gateway config attached in
NodeService.Proxy.Upstreams[i]. This means that applying a mesh gateway
mode via global proxy-defaults or service-defaults on the downstream
would not have an effect.

Separately, transparent proxy watches for peered upstreams didn't
consider mesh gateway mode at all.

This commit addresses the first issue by ensuring that we overlay the
upstream config for peered upstreams as we do for non-peered. The second
issue is addressed by re-using setupWatchesForPeeredUpstream when
handling transparent proxy updates.

Note that for transparent proxies we do not yet support mesh gateway
mode per upstream, so the NodeService.Proxy.MeshGateway mode is used.

* Fix upstream mesh gateway mode handling in xds

This commit ensures that when determining the mesh gateway mode for
peered upstreams we consider the NodeService.Proxy.MeshGateway config as
a baseline.

In absense of this change, setting a mesh gateway mode via
proxy-defaults or the top-level of service-defaults will not have an
effect for peered upstreams.

* Merge service/proxy defaults in cfg resolver

Previously the mesh gateway mode for connect proxies would be
merged at three points:

1. On servers, in ComputeResolvedServiceConfig.
2. On clients, in MergeServiceConfig.
3. On clients, in proxycfg/xds.

The first merge returns a ServiceConfigResponse where there is a
top-level MeshGateway config from proxy/service-defaults, along with
per-upstream config.

The second merge combines per-upstream config specified at the service
instance with per-upstream config specified centrally.

The third merge combines the NodeService.Proxy.MeshGateway
config containing proxy/service-defaults data with the per-upstream
mode. This third merge is easy to miss, which led to peered upstreams
not considering the mesh gateway mode from proxy-defaults.

This commit removes the third merge, and ensures that all mesh gateway
config is available at the upstream. This way proxycfg/xds do not need
to do additional overlays.

* Ensure that proxy-defaults is considered in wc

Upstream defaults become a synthetic Upstream definition under a
wildcard key "*". Now that proxycfg/xds expect Upstream definitions to
have the final MeshGateway values, this commit ensures that values from
proxy-defaults/service-defaults are the default for this synthetic
upstream.

* Add changelog.

Co-authored-by: freddygv <freddy@hashicorp.com>
2022-11-09 10:14:29 -06:00
Dan Upton acfdbb23a9
chore: remove unused argument from MergeNodeServiceWithCentralConfig (#15024)
Previously, the MergeNodeServiceWithCentralConfig method accepted a
ServiceSpecificRequest argument, of which only the Datacenter and
QueryOptions fields were used.

Digging a little deeper, it turns out these fields were only passed
down to the ComputeResolvedServiceConfig method (through the
ServiceConfigRequest struct) which didn't actually use them.

As such, not all call-sites passed a valid ServiceSpecificRequest
so it's safer to remove the argument altogether to prevent future
changes from depending on it.
2022-11-09 14:54:57 +00:00
Derek Menteer a8eb047ee6
Bring back parameter ServerExternalAddresses in GenerateToken endpoint (#15267)
Re-add ServerExternalAddresses parameter in GenerateToken endpoint

This reverts commit 5e156772f6a7fba5324eb6804ae4e93c091229a6
and adds extra functionality to support newer peering behaviors.
2022-11-08 14:55:18 -06:00
cskh 3d2d7a77cb
fix(mesh-gateway): remove deregistered service from mesh gateway (#15272)
* fix(mesh-gateway): remove deregistered service from mesh gateway

* changelog

Co-authored-by: Derek Menteer <105233703+hashi-derek@users.noreply.github.com>
Co-authored-by: Evan Culver <eculver@users.noreply.github.com>
2022-11-07 20:30:15 -05:00
Freddy eee0fb1035
Avoid blocking child type updates on parent ack (#15083) 2022-11-07 18:10:42 -07:00
Derek Menteer 4672d8bd3c
Backport test fix from ent. (#15279) 2022-11-07 12:17:46 -06:00
Chris S. Kim dbe3dc96f3
Update hcp-scada-provider to fix diamond dependency problem with go-msgpack (#15185) 2022-11-07 11:34:30 -05:00
Eric Haberkorn d6b614110a
Fix a bug in mesh gateway proxycfg where ACL tokens aren't passed. (#15273) 2022-11-07 10:00:11 -05:00
Dan Stough 3eb3cf3b0d
fix: persist peering CA updates to dialing clusters (#15243)
fix: persist peering CA updates to dialing clusters
2022-11-04 12:53:20 -04:00
Derek Menteer 7bcded133e
Backport tests from ent. (#15260)
* Backport agent tests.

Original commit: 0710b2d12fb51a29cedd1119b5fb086e5c71f632
Original commit: aaedb3c28bfe247266f21013d500147d8decb7cd (partial)

* Backport test fix and reduce flaky failures.
2022-11-04 10:19:24 -05:00
Derek Menteer 9245a44e68
Backport test from ENT: "Fix missing test fields" (#15258)
* Backport test from ENT: "Fix missing test fields"

Original Author: Sarah Pratt
Original Commit: a5c88bef7a969ea5d06ed898d142ab081ba65c69

* Update with proper linting.
2022-11-04 09:29:16 -05:00
Derek Menteer 261ba1e65d
Backport various fixes from ENT. (#15254)
* Regenerate golden files.

* Backport from ENT: "Avoid race"

Original commit: 5006c8c858b0e332be95271ef9ba35122453315b
Original author: freddygv

* Backport from ENT: "chore: fix flake peerstream test"

Original commit: b74097e7135eca48cc289798c5739f9ef72c0cc8
Original author: DanStough
2022-11-03 16:34:57 -05:00
malizz 24ddeac74b
convert stream status time fields to pointers (#15252) 2022-11-03 11:51:22 -07:00
sarahalsmiller befefe42ee
Added check for empty peeringsni in restrictPeeringEndpoints (#15239)
Add check for empty peeringSNI in restrictPeeringEndpoints

Co-authored-by: Derek Menteer <derek.menteer@hashicorp.com>
2022-11-02 17:20:52 -05:00
Derek Menteer f704e72f3e
Prevent peering acceptor from subscribing to addr updates. (#15214) 2022-11-02 07:55:41 -05:00
Dan Stough 19ec59c930
test: refactor testcontainers and add peering integ tests (#15084) 2022-11-01 15:03:23 -04:00
Derek Menteer cad89029dd Decrease retry time for failed peering connections. 2022-10-31 14:30:27 -05:00
R.B. Boyer 879584a773
test: fix flaky TestSubscribeBackend_IntegrationWithServer_DeliversAllMessages test (#15195)
Allow for some message duplication in subscription events during assertions.

I'm pretty sure the subscriptions machinery allows for messages to occasionally
be duplicated instead of dropping them, as a once-and-only-once queue is a pipe
dream and you have to pick one of the other two options.
2022-10-31 12:10:43 -05:00
Evan Culver 548cf6f7a4
connect: Add Envoy 1.24 to integration tests, remove Envoy 1.20 (#15093) 2022-10-31 10:50:45 -05:00
Derek Menteer 58f15db4c4 Allow peering endpoints to bypass verify_incoming. 2022-10-31 09:56:30 -05:00
Derek Menteer 065e538de3 Add tests. 2022-10-31 08:45:00 -05:00
Derek Menteer 59a385bc9a Fix peered service protocols using proxy-defaults. 2022-10-31 08:45:00 -05:00
Eric Haberkorn 57fb729547
Fix peering metrics bug (#15178)
This bug was caused by the peering health metric being set to NaN.
2022-10-28 10:51:12 -04:00
Chris S. Kim a0ac76ecf5
Allow consul debug on non-ACL consul servers (#15155) 2022-10-27 09:25:18 -04:00
cskh 57380ea752
fix(peering): nil pointer in calling handleUpdateService (#15160)
* fix(peering): nil pointer in calling handleUpdateService

* changelog
2022-10-26 11:50:34 -04:00
Eric Haberkorn 74baaf910c
fix bug that resulted in generating Envoy configs that use CDS with an EDS configuration (#15140) 2022-10-25 14:49:57 -04:00
Luke Kysow 4956b81333
ingress-gateways: don't log error when registering gateway (#15001)
* ingress-gateways: don't log error when registering gateway

Previously, when an ingress gateway was registered without a
corresponding ingress gateway config entry, an error was logged
because the watch on the config entry returned a nil result.
This is expected so don't log an error.
2022-10-25 10:55:44 -07:00
Luke Kysow 6b1ec05470
autoencrypt: helpful error for clients with wrong dc (#14832)
* autoencrypt: helpful error for clients with wrong dc

If clients have set a different datacenter than the servers they're
connecting with for autoencrypt, give a helpful error message.
2022-10-25 10:13:41 -07:00
R.B. Boyer a01936442c
cache: refactor agent cache fetching to prevent unnecessary fetches on error (#14956)
This continues the work done in #14908 where a crude solution to prevent a
goroutine leak was implemented. The former code would launch a perpetual
goroutine family every iteration (+1 +1) and the fixed code simply caused a
new goroutine family to first cancel the prior one to prevent the
leak (-1 +1 == 0).

This PR refactors this code completely to:

- make it more understandable
- remove the recursion-via-goroutine strangeness
- prevent unnecessary RPC fetches when the prior one has errored.

The core issue arose from a conflation of the entry.Fetching field to mean:

- there is an RPC (blocking query) in flight right now
- there is a goroutine running to manage the RPC fetch retry loop

The problem is that the goroutine-leak-avoidance check would treat
Fetching like (2), but within the body of a goroutine it would flip that
boolean back to false before the retry sleep. This would cause a new
chain of goroutines to launch which #14908 would correct crudely.

The refactored code uses a plain for-loop and changes the semantics
to track state for "is there a goroutine associated with this cache entry"
instead of the former.

We use a uint64 unique identity per goroutine instead of a boolean so
that any orphaned goroutines can tell when they've been replaced when
the expiry loop deletes a cache entry while the goroutine is still running
and is later replaced.
2022-10-25 10:27:26 -05:00
R.B. Boyer bcbe7b225f
test: ensure that all dependencies in a test agent use the test logger (#14996) 2022-10-24 17:02:38 -05:00
Chris S. Kim 5e901bfa01 Remove invalid 1xx HTTP codes
These tests started failing in go1.19, presumably due to
support for valid 1xx responses being added.

https://github.com/golang/go/issues/56346
2022-10-24 16:12:08 -04:00
Chris S. Kim ae1646706f Regenerate files according to 1.19.2 formatter 2022-10-24 16:12:08 -04:00
cskh a5acb987fa
fix(peering): replicating wan address (#15108)
* fix(peering): replicating wan address

* add changelog

* unit test
2022-10-24 15:44:57 -04:00
Iryna Shustava a3a6743e0a
proxycfg: watch service-defaults config entries (#15025)
To support Destinations on the service-defaults (for tproxy with terminating gateway), we need to now also make servers watch service-defaults config entries.
2022-10-24 12:50:28 -06:00
Chris S. Kim 06f583a7c2 Move oss-only test to its own file 2022-10-24 14:17:43 -04:00
R.B. Boyer bf05547080
test: fix flaky TestHealthServiceNodes_NodeMetaFilter by waiting until the streaming subsystem has a valid grpc connection (#15019)
Also potentially unflakes TestHealthIngressServiceNodes for similar
reasons.
2022-10-24 13:09:53 -05:00
R.B. Boyer 87432a8dd4
chore: update golangci-lint to v1.50.1 (#15022) 2022-10-24 11:48:02 -05:00
Venu Yanamandra 3dd12a2960
Update error message when restoring ENT snapshot in OSS (#15066) 2022-10-24 11:40:26 -04:00
freddygv 483720a443 Return forbidden on permission denied
This commit updates the establish endpoint to bubble up a 403 status
code to callers when the establishment secret from the token is invalid.
This is a signal that a new peering token must be generated.
2022-10-20 17:11:49 -06:00
Chris S. Kim 569c3bce88 Update expected encoding in test
go-memdb was updated in v1.3.3 to make integers in indexes sortable, which changed how integers were encoded.
2022-10-20 14:32:42 -04:00
freddygv f3548167fc Use plain TaggedAddressWAN 2022-10-19 16:32:44 -06:00
freddygv 1b589ba964 Add unit test 2022-10-19 16:26:15 -06:00
cskh c0dc93e5b8 fix: wan address isn't used by peering token 2022-10-19 16:33:25 -04:00
Nitya Dhanushkodi 598670e376
Remove ability to specify external addresses in GenerateToken endpoint (#14930)
* Reverts "update generate token endpoint to take external addresses (#13844)"

This reverts commit f47319b7c6b6e7c7dd720a5af927ad2d33fa536d.
2022-10-19 09:31:36 -07:00
Kyle Havlovitz 3c13cf0994
Merge pull request #15035 from hashicorp/vault-ttl-update-warn
Warn instead of returning error when missing intermediate mount tune permissions
2022-10-18 15:41:52 -07:00
cskh e18434bcb1
peering: skip registering duplicate node and check from the peer (#14994)
* peering: skip register duplicate node and check from the peer

* Prebuilt the nodes map and checks map to avoid repeated for loop

* use key type to struct: node id, service id, and check id
2022-10-18 16:19:24 -04:00
Chris S. Kim e4c20ec190
Refactor client RPC timeouts (#14965)
Fix an issue where rpc_hold_timeout was being used as the timeout for non-blocking queries. Users should be able to tune read timeouts without fiddling with rpc_hold_timeout. A new configuration `rpc_read_timeout` is created.

Refactor some implementation from the original PR 11500 to remove the misleading linkage between RPCInfo's timeout (used to retry in case of certain modes of failures) and the client RPC timeouts.
2022-10-18 15:05:09 -04:00
Kyle Havlovitz 0a968e53b5 Warn instead of returning an error when intermediate mount tune permission is missing 2022-10-18 12:01:25 -07:00
R.B. Boyer 0712e1a456
test: possibly fix flake in TestIntentionGetExact (#15021)
Restructure test setup to be similar to TestAgent_ServerCertificate
and see if that's enough to avoid flaking after join.
2022-10-18 10:51:20 -05:00
R.B. Boyer 9f41cc4a25
cache: prevent goroutine leak in agent cache (#14908)
There is a bug in the error handling code for the Agent cache subsystem discovered:

1. NotifyCallback calls notifyBlockingQuery which calls getWithIndex in
   a loop (which backs off on-error up to 1 minute)

2. getWithIndex calls fetch if there’s no valid entry in the cache

3. fetch starts a goroutine which calls Fetch on the cache-type, waits
   for a while (again with backoff up to 1 minute for errors) and then
   calls fetch to trigger a refresh

The end result being that every 1 minute notifyBlockingQuery spawns an
ancestry of goroutines that essentially lives forever.

This PR ensures that the goroutine started by `fetch` cancels any prior
goroutine spawned by the same line for the same key.

In isolated testing where a cache type was tweaked to indefinitely
error, this patch prevented goroutine counts from skyrocketing.
2022-10-17 14:38:10 -05:00
R.B. Boyer ca916eec32
ca: fix a masked bug in leaf cert generation that would not be notified of root cert rotation after the first one (#15005)
In practice this was masked by #14956 and was only uncovered fixing the
other bug.

  go test ./agent -run TestAgentConnectCALeafCert_goodNotLocal

would fail when only #14956 was fixed.
2022-10-17 13:24:27 -05:00
Chris S. Kim 58c041eb6e
Merge pull request #13388 from deblasis/feature/health-checks_windows_service
Feature: Health checks windows service
2022-10-17 09:26:19 -04:00
Dan Upton 90129919a8
proxycfg: fix goroutine leak when service is re-registered (#14988)
Fixes a bug where we'd leak a goroutine in state.run when the given
context was canceled while there was a pending update.
2022-10-17 11:31:10 +01:00
Kyle Havlovitz 096ca5e4b0 Extend tcp keepalive settings to work for terminating gateways as well 2022-10-14 17:05:46 -07:00
Kyle Havlovitz f8e745315f Update docs and add tcp_keepalive_probes setting 2022-10-14 17:05:46 -07:00
Kyle Havlovitz 526d49c6ff Add TCP keepalive settings to proxy config for mesh gateways 2022-10-14 17:05:46 -07:00
Derek Menteer 25d3d244f0 Fix issue with incorrect method signature on test. 2022-10-14 11:04:57 -05:00
Freddy bbf6b17e44
Merge pull request #14981 from hashicorp/peering/dial-through-gateways 2022-10-14 09:44:56 -06:00
Dan Upton 3b9297f95a
proxycfg: rate-limit delivery of config snapshots (#14960)
Adds a user-configurable rate limiter to proxycfg snapshot delivery,
with a default limit of 250 updates per second.

This addresses a problem observed in our load testing of Consul
Dataplane where updating a "global" resource such as a wildcard
intention or the proxy-defaults config entry could starve the Raft or
Memberlist goroutines of CPU time, causing general cluster instability.
2022-10-14 15:52:00 +01:00
Derek Menteer 6c355134e8 Add tests for peering state snapshots / restores. 2022-10-14 09:48:04 -05:00
Derek Menteer 27bbdced8d Add test for ExportedServicesForAllPeersByName 2022-10-14 09:48:04 -05:00
Dan Upton 0a0534a094
perf: remove expensive reflection from xDS hot path (#14934)
Replaces the reflection-based implementation of proxycfg's
ConfigSnapshot.Clone with code generated by deep-copy.

While load testing server-based xDS (for consul-dataplane) we discovered
this method is extremely expensive. The ConfigSnapshot struct, directly
or indirectly, contains a copy of many of the structs in the agent/structs
package, which creates a large graph for copystructure.Copy to traverse
at runtime, on every proxy reconfiguration.
2022-10-14 10:26:42 +01:00
freddygv 89596f13c4 Use split var in tests 2022-10-13 17:12:47 -06:00
freddygv b4e48f0a70 Use split wildcard partition name
This way OSS avoids passing a non-empty label, which will be rejected in
OSS consul.
2022-10-13 16:55:28 -06:00
Freddy 909fc33271
Merge pull request #14935 from hashicorp/fix/alias-leak 2022-10-13 16:31:15 -06:00
freddygv 452dc2867c Lint 2022-10-13 15:55:55 -06:00
Derek Menteer 092e5fd074 Reset wait on ensureServerAddrSubscription 2022-10-13 15:58:26 -05:00
freddygv 437a513d9b Fix CA init error code 2022-10-13 14:58:11 -06:00
freddygv 37a765f8df Update leader routine to maybe use gateways 2022-10-13 14:58:00 -06:00
freddygv 239f0e3084 Update peering establishment to maybe use gateways
When peering through mesh gateways we expect outbound dials to peer
servers to flow through the local mesh gateway addresses.

Now when establishing a peering we get a list of dial addresses as a
ring buffer that includes local mesh gateway addresses if the local DC
is configured to peer through mesh gateways. The ring buffer includes
the mesh gateway addresses first, but also includes the remote server
addresses as a fallback.

This fallback is present because it's possible that direct egress from
the servers may be allowed. If not allowed then the leader will cycle
back to a mesh gateway address through the ring.

When attempting to dial the remote servers we retry up to a fixed
timeout. If using mesh gateways we also have an initial wait in
order to allow for the mesh gateways to configure themselves.

Note that if we encounter a permission denied error we do not retry
since that error indicates that the secret in the peering token is
invalid.
2022-10-13 14:57:55 -06:00
malizz 27d0181806
increase protobuf size limit for cluster peering (#14976) 2022-10-13 13:46:51 -07:00
Derek Menteer ff01c11672 Address PR comments. 2022-10-13 14:11:02 -05:00
Derek Menteer cc0a05ffa0 Disallow peering to the same cluster. 2022-10-13 14:11:02 -05:00
Derek Menteer d47c9b446c Prevent consul peer-exports by discovery chain. 2022-10-13 12:45:09 -05:00
Derek Menteer ee49db9a2f Prevent the "consul" service from being exported. 2022-10-13 12:45:09 -05:00
Derek Menteer bfa4adbfce Add remote peer partition and datacenter info. 2022-10-13 10:37:41 -05:00
Dan Upton de7f380385
xds: properly merge central config for "agentless" services (#14962) 2022-10-13 12:04:59 +01:00
Dan Upton 36a3d00f0d
bug: fix goroutine leaks caused by incorrect usage of WatchCh (#14916)
memdb's `WatchCh` method creates a goroutine that will publish to the
returned channel when the watchset is triggered or the given context
is canceled. Although this is called out in its godoc comment, it's
not obvious that this method creates a goroutine who's lifecycle you
need to manage.

In the xDS capacity controller, we were calling `WatchCh` on each
iteration of the control loop, meaning the number of goroutines would
grow on each autopilot event until there was catalog churn.

In the catalog config source, we were calling `WatchCh` with the
background context, meaning that the goroutine would keep running after
the sync loop had terminated.
2022-10-13 12:04:27 +01:00
Hans Hasselberg 56580d6fa6
adding configuration option cloud.scada_address (#14936)
* adding scada_address

* config tests

* add changelog entry
2022-10-13 11:31:28 +02:00
Paul Glass be1a4438a9
Add consul.xds.server.streamStart metric (#14957)
This adds a new consul.xds.server.streamStart metric to measure the time taken to first generate xDS resources after an xDS stream is opened.
2022-10-12 14:17:58 -05:00
Riddhi Shah 474d9cfcdc
Service http checks data source for agentless proxies (#14924)
Adds another datasource for proxycfg.HTTPChecks, for use on server agents. Typically these checks are performed by local client agents and there is no equivalent of this in agentless (where servers configure consul-dataplane proxies).
Hence, the data source is mostly a no-op on servers but in the case where the service is present within the local state, it delegates to the cache data source.
2022-10-12 07:49:56 -07:00
Freddy 4cf0bf4865
Merge pull request #14958 from hashicorp/peering/nonce 2022-10-12 08:18:15 -06:00
freddygv 4d1e7c4cbb Actually track nonce in test 2022-10-12 07:50:17 -06:00
Derek Menteer 00312bcf57 Fix incorrect backoff-wait logic. 2022-10-12 08:01:10 -05:00
freddygv c9d171c031 Add basic nonce management
This commit adds a monotonically increasing nonce to include in peering
replication response messages. Every ack/nack from the peer handling a
response will include this nonce, allowing to correlate the ack/nack
with a specific resource.

At the moment nothing is done with the nonce when it is received. In the
future we may want to add functionality such as retries on NACKs,
depending on the class of error.
2022-10-11 19:02:04 -06:00
Paul Glass 8cf430140a
gRPC server metrics (#14922)
* Move stats.go from grpc-internal to grpc-middleware
* Update grpc server metrics with server type label
* Add stats test to grpc-external
* Remove global metrics instance from grpc server tests
2022-10-11 17:00:32 -05:00
cskh 45278cb69e
fix(peering): add missing grpc_tls_port for server address reconciliation (#14944) 2022-10-11 10:56:29 -04:00
freddygv 9f0ab69aef Fix alias check leak
Preivously when alias check was removed it would not be stopped nor
cleaned up from the associated aliasChecks map.

This means that any time an alias check was deregistered we would
leak a goroutine for CheckAlias.run() because the stopCh would never
be closed.

This issue mostly affects service mesh deployments on platforms where
the client agent is mostly static but proxy services come and go
regularly, since by default sidecars are registered with an alias check.
2022-10-10 16:42:29 -06:00
James Oulman a8695c88d4
Configure Envoy alpn_protocols based on service protocol (#14356)
* Configure Envoy alpn_protocols based on service protocol

* define alpnProtocols in a more standard way

* http2 protocol should be h2 only

* formatting

* add test for getAlpnProtocol()

* create changelog entry

* change scope is connect-proxy

* ignore errors on ParseProxyConfig; fixes linter

* add tests for grpc and http2 public listeners

* remove newlines from PR

* Add alpn_protocol configuration for ingress gateway

* Guard against nil tlsContext

* add ingress gateway w/ TLS tests for gRPC and HTTP2

* getAlpnProtocols: add TCP protocol test

* add tests for ingress gateway with grpc/http2 and per-listener TLS config

* add tests for ingress gateway with grpc/http2 and per-listener TLS config

* add Gateway level TLS config with mixed protocol listeners to validate ALPN

* update changelog to include ingress-gateway

* add http/1.1 to http2 ALPN

* go fmt

* fix test on custom-trace-listener
2022-10-10 13:13:56 -07:00
freddygv 55b5c1a073 Fixup test 2022-10-10 13:20:14 -06:00
Chris S. Kim 7f48033d0b Fix nil pointer 2022-10-10 13:20:14 -06:00
Chris S. Kim 9d4fb0445a Include stream-related information in peering endpoints 2022-10-10 13:20:14 -06:00
Paul Glass a3fccf5e5b
Merge central config for GetEnvoyBootstrapParams (#14869)
This fixes GetEnvoyBootstrapParams to merge in proxy-defaults and service-defaults.

Co-authored-by: Dan Upton <daniel@floppy.co>
2022-10-10 12:40:27 -05:00
Freddy 8d93f120ea
Merge pull request #14796 from hashicorp/peering/use-connect-ca 2022-10-07 10:37:37 -06:00
freddygv ae9b3eb662 Fixup test 2022-10-07 09:34:16 -06:00
freddygv 6ef8d329d2 Require Connect and TLS to generate peering tokens
By requiring Connect and a gRPC TLS listener we can automatically
configure TLS for all peering control-plane traffic.
2022-10-07 09:06:29 -06:00
freddygv a21e5799f7 Use internal server certificate for peering TLS
A previous commit introduced an internally-managed server certificate
to use for peering-related purposes.

Now the peering token has been updated to match that behavior:
- The server name matches the structure of the server cert
- The CA PEMs correspond to the Connect CA

Note that if Conect is disabled, and by extension the Connect CA, we
fall back to the previous behavior of returning the manually configured
certs and local server SNI.

Several tests were updated to use the gRPC TLS port since they enable
Connect by default. This means that the peering token will embed the
Connect CA, and the dialer will expect a TLS listener.
2022-10-07 09:05:32 -06:00
freddygv 1c696922fe Simplify mgw watch mgmt 2022-10-07 08:54:37 -06:00
freddygv b67d001b2c Use existing query options to build ctx 2022-10-07 08:46:53 -06:00
DanStough df94470e76 feat: xDS updates for peerings control plane through mesh gw 2022-10-07 08:46:42 -06:00
Eric Haberkorn 2f08fab317
Make the mesh gateway changes to allow local mode for cluster peering data plane traffic (#14817)
Make the mesh gateway changes to allow `local` mode for cluster peering data plane traffic
2022-10-06 09:54:14 -04:00
cskh 53ff317b01
fix: missing UDP field in checkType (#14885)
* fix: missing UDP field in checkType

* Add changelog

* Update doc
2022-10-05 15:57:21 -04:00
Derek Menteer fbee1272e7
Fix explicit tproxy listeners with discovery chains. (#14751)
Fix explicit tproxy listeners with discovery chains.
2022-10-05 14:38:25 -05:00
Alex Oskotsky 4d9309327f
Add the ability to retry on reset connection to service-routers (#12890) 2022-10-05 13:06:44 -04:00
John Murret 08203ace4a
Upgrade serf to v0.10.1 and memberlist to v0.5.0 to get memberlist size metrics and broadcast queue depth metric (#14873)
* updating to serf v0.10.1 and memberlist v0.5.0 to get memberlist size metrics and memberlist broadcast queue depth metric

* update changelog

* update changelog

* correcting changelog

* adding "QueueCheckInterval" for memberlist to test

* updating integration test containers to grab latest api
2022-10-04 17:51:37 -06:00
Evan Culver 42423ffce2
connect: Bump Envoy 1.20 to 1.20.7, 1.21 to 1.21.5 and 1.22 to 1.22.5 (#14831) 2022-10-04 13:15:01 -07:00
Eric Haberkorn 2178e38204
Rename PeerName to Peer on prepared queries and exported services (#14854) 2022-10-04 14:46:15 -04:00
Freddy 89141256c7
Merge pull request #14734 from hashicorp/NET-643-update-mesh-gateway-envoy-config-for-inbound-peering-control-plane-traffic 2022-10-03 12:54:11 -06:00
freddygv 0d61aa5d37 Update xds generation for peering over mesh gws
This commit adds the xDS resources needed for INBOUND traffic from peer
clusters:

- 1 filter chain for all inbound peering requests.
- 1 cluster for all inbound peering requests.
- 1 endpoint per voting server with the gRPC TLS port configured.

There is one filter chain and cluster because unlike with WAN
federation, peer clusters will not attempt to dial individual servers.
Peer clusters will only dial the local mesh gateway addresses.
2022-10-03 12:42:27 -06:00
freddygv 2c5caec97c Share mgw addrs in peering stream if needed
This commit adds handling so that the replication stream considers
whether the user intends to peer through mesh gateways.

The subscription will return server or mesh gateway addresses depending
on the mesh configuration setting. These watches can be updated at
runtime by modifying the mesh config entry.
2022-10-03 11:42:20 -06:00
freddygv 17463472b7 Return mesh gateway addrs if peering through mgw 2022-10-03 11:35:10 -06:00
chappie f49332a151
Merge pull request #14811 from hashicorp/chappie/dns
Add DNS gRPC proxying support
2022-10-03 08:02:48 -07:00
Chris Chapman 1b24aafb23
Making suggested comments 2022-09-30 15:03:33 -07:00
Chris Chapman 399fafb679
Making suggested changes 2022-09-30 14:51:12 -07:00
Chris Chapman 8e44a8c644
Update comment 2022-09-30 09:35:01 -07:00
DanStough 16fe27c9b8 chore: fix flakey scada provider test 2022-09-30 11:56:40 -04:00
Chris Chapman c4c5f900e0
Bind a dns mux handler to gRPC proxy 2022-09-29 21:44:45 -07:00
Chris Chapman 175e6e56f9
Adding grpc handler for dns proxy 2022-09-29 21:19:51 -07:00
Eric Haberkorn 5fd1e6daea
Add exported services event to cluster peering replication. (#14797) 2022-09-29 15:37:19 -04:00
Ashwin Venkatesh ddcd3e06e7
bug: watch local mesh gateways in non-default partitions with agentless (#14799) 2022-09-29 13:19:04 -04:00
cskh 4ece020bf1
feat(ingress gateway: support configuring limits in ingress-gateway c… (#14749)
* feat(ingress gateway: support configuring limits in ingress-gateway config entry

- a new Defaults field with max_connections, max_pending_connections, max_requests
  is added to ingress gateway config entry
- new field max_connections, max_pending_connections, max_requests in
  individual services to overwrite the value in Default
- added unit test and integration test
- updated doc

Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
Co-authored-by: Dan Stough <dan.stough@hashicorp.com>
2022-09-28 14:56:46 -04:00