Commit Graph

38 Commits

Author SHA1 Message Date
hc-github-team-consul-core 48b9a0ff8c
Backport of Fix xDS missing endpoint race condition. into release/1.16.x (#19873)
Fix xDS missing endpoint race condition.

This fixes the following race condition:
- Send update endpoints
- Send update cluster
- Recv ACK endpoints
- Recv ACK cluster

Prior to this fix, it would have resulted in the endpoints NOT existing in
Envoy. This occurred because the cluster update implicitly clears the endpoints
in Envoy, but we would never re-send the endpoint data to compensate for the
loss, because we would incorrectly ACK the invalid old endpoint hash. Since the
endpoint's hash did not actually change, they would not be resent.

The fix for this is to effectively clear out the invalid pending ACKs for child
resources whenever the parent changes. This ensures that we do not store the
child's hash as accepted when the race occurs.

An escape-hatch environment variable `XDS_PROTOCOL_LEGACY_CHILD_RESEND` was
added so that users can revert back to the old legacy behavior in the event
that this produces unknown side-effects.

This bug report and fix was mostly implemented by @ksmiley with some minor
tweaks.

Co-authored-by: Derek Menteer <derek.menteer@hashicorp.com>
Co-authored-by: Keith Smiley <ksmiley@salesforce.com>
2023-12-08 12:16:43 -06:00
hc-github-team-consul-core f6c31df709
Backport of [NET-4703] Prevent partial application of Envoy extensions into release/1.16.x (#18332)
backport of commit a920c7195b4ea298bf32b2d535a7eed60e6e0050

Co-authored-by: Michael Zalimeni <michael.zalimeni@hashicorp.com>
2023-07-31 19:37:54 +00:00
hc-github-team-consul-core 0d8d74de0c
backport of commit 43d48124139eb3808cb9ebe6ebda83c7e66481d7 (#17742)
Co-authored-by: Chris Thain <chris.m.thain@gmail.com>
2023-06-14 17:18:59 +00:00
hc-github-team-consul-core 4d369c4aa4
backport of commit 2735bbe60f316a4d4539752a8dd63a3ca360e49b (#17613)
Co-authored-by: Eric <eric@haberkorn.co>
2023-06-08 14:41:44 +00:00
Ronald dd0e8eec14
copyright headers for agent folder (#16704)
* copyright headers for agent folder

* Ignore test data files

* fix proto files and remove headers in agent/uiserver folder

* ignore deep-copy files
2023-03-28 14:39:22 -04:00
Andrew Stucki c8e5a1a684
Inline API Gateway TLS cert code (#16295)
* Include secret type when building resources from config snapshot

* First pass at generating envoy secrets from api-gateway snapshot

* Update comments for xDS update order

* Add secret type + corresponding golden files to existing tests

* Initialize test helpers for testing api-gateway resource generation

* Generate golden files for new api-gateway xDS resource test

* Support ADS for TLS certificates on api-gateway

* Configure TLS on api-gateway listeners

* Inline TLS cert code

* update tests

* Add SNI support so we can have multiple certificates

* Remove commented out section from helper

* regen deep-copy

* Add tcp tls test

---------

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
2023-02-17 12:46:03 -05:00
Nathan Coleman 423257c473
Implement APIGateway proxycfg snapshot (#16194)
* Stub proxycfg handler for API gateway

* Add Service Kind constants/handling for API Gateway

* Begin stubbing for SDS

* Add new Secret type to xDS order of operations

* Continue stubbing of SDS

* Iterate on proxycfg handler for API gateway

* Handle BoundAPIGateway config entry subscription in proxycfg-glue

* Add API gateway to config snapshot validation

* Add API gateway to config snapshot clone, leaf, etc.

* Subscribe to bound route + cert config entries on bound-api-gateway

* Track routes + certs on API gateway config snapshot

* Generate DeepCopy() for types used in watch.Map

* Watch all active references on api-gateway, unwatch inactive

* Track loading of initial bound-api-gateway config entry

* Use proper proto package for SDS mapping

* Use ResourceReference instead of ServiceName, collect resources

* Fix typo, add + remove TODOs

* Watch discovery chains for TCPRoute

* Add TODO for updating gateway services for api-gateway

* make proto

* Regenerate deep-copy for proxycfg

* Set datacenter on upstream ID from query source

* Watch discovery chains for http-route service backends

* Add ServiceName getter to HTTP+TCP Service structs

* Clean up unwatched discovery chains on API Gateway

* Implement watch for ingress leaf certificate

* Collect upstreams on http-route + tcp-route updates

* Remove unused GatewayServices update handler

* Remove unnecessary gateway services logic for API Gateway

* Remove outdate TODO

* Use .ToIngress where appropriate, including TODO for cleaning up

* Cancel before returning error

* Remove GatewayServices subscription

* Add godoc for handlerAPIGateway functions

* Update terminology from Connect => Consul Service Mesh

Consistent with terminology changes in https://github.com/hashicorp/consul/pull/12690

* Add missing TODO

* Remove duplicate switch case

* Rerun deep-copy generator

* Use correct property on config snapshot

* Remove unnecessary leaf cert watch

* Clean up based on code review feedback

* Note handler properties that are initialized but set elsewhere

* Add TODO for moving helper func into structs pkg

* Update generated DeepCopy code

* gofmt

* Generate DeepCopy() for API gateway listener types

* Improve variable name

* Regenerate DeepCopy() code

* Fix linting issue

* Temporarily remove the secret type from resource generation
2023-02-08 15:52:12 -06:00
Nitya Dhanushkodi 77f6b20db0
refactor: remove troubleshoot module dependency on consul top level module (#16162)
Ensure nothing in the troubleshoot go module depends on consul's top level module. This is so we can import troubleshoot into consul-k8s and not import all of consul.

* turns troubleshoot into a go module [authored by @curtbushko]
* gets the envoy protos into the troubleshoot module [authored by @curtbushko]
* adds a new go module `envoyextensions` which has xdscommon and extensioncommon folders that both the xds package and the troubleshoot package can import
* adds testing and linting for the new go modules
* moves the unit tests in `troubleshoot/validateupstream` that depend on proxycfg/xds into the xds package, with a comment describing why those tests cannot be in the troubleshoot package
* fixes all the imports everywhere as a result of these changes 

Co-authored-by: Curt Bushko <cbushko@gmail.com>
2023-02-06 09:14:35 -08:00
Nitya Dhanushkodi 6151bcfa75
refactor: move service to service validation to troubleshoot package (#16132)
This is to reduce the dependency on xds from within the troubleshoot package.
2023-02-02 22:18:10 -08:00
Derek Menteer 5572f1584d
Add Envoy extension metrics. (#16114) 2023-01-31 14:50:30 -06:00
Derek Menteer 81cf8f7de3
Add extension validation on config save and refactor extensions. (#16110) 2023-01-30 15:35:26 -06:00
Dan Upton 618deae657
xds: don't attempt to load-balance sessions for local proxies (#15789)
Previously, we'd begin a session with the xDS concurrency limiter
regardless of whether the proxy was registered in the catalog or in
the server's local agent state.

This caused problems for users who run `consul connect envoy` directly
against a server rather than a client agent, as the server's locally
registered proxies wouldn't be included in the limiter's capacity.

Now, the `ConfigSource` is responsible for beginning the session and we
only do so for services in the catalog.

Fixes: https://github.com/hashicorp/consul/issues/15753
2023-01-18 12:33:21 -06:00
Chris S. Kim c262065276
Warn if ACL is enabled but no token is provided to Envoy (#15967) 2023-01-16 12:31:56 -05:00
Matt Keeler 554f1e6fee
Protobuf Modernization (#15949)
* Protobuf Modernization

Remove direct usage of golang/protobuf in favor of google.golang.org/protobuf

Marshallers (protobuf and json) needed some changes to account for different APIs.

Moved to using the google.golang.org/protobuf/types/known/* for the well known types including replacing some custom Struct manipulation with whats available in the structpb well known type package.

This also updates our devtools script to install protoc-gen-go from the right location so that files it generates conform to the correct interfaces.

* Fix go-mod-tidy make target to work on all modules
2023-01-11 09:39:10 -05:00
Eric Haberkorn 01a0142d1f
Add the Lua Envoy extension (#15906) 2023-01-06 12:13:40 -05:00
Nitya Dhanushkodi 2800774f68
[OSS] extensions: refactor PluginConfiguration into a more generic type ExtensionConfiguration (#15846)
* extensions: refactor PluginConfiguration into a more generic type
ExtensionConfiguration

Also:
* adds endpoints configuration to lambda golden tests
* uses string constant for builtin/aws/lambda
Co-authored-by: Eric <eric@haberkorn.co>
2022-12-20 22:26:20 -08:00
Eric Haberkorn 5dd131fee8
Remove the `connect.enable_serverless_plugin` agent configuration option (#15710) 2022-12-08 14:46:42 -05:00
Freddy eee0fb1035
Avoid blocking child type updates on parent ack (#15083) 2022-11-07 18:10:42 -07:00
Paul Glass be1a4438a9
Add consul.xds.server.streamStart metric (#14957)
This adds a new consul.xds.server.streamStart metric to measure the time taken to first generate xDS resources after an xDS stream is opened.
2022-10-12 14:17:58 -05:00
malizz 5c470b28dd
Support Stale Queries for Trust Bundle Lookups (#14724)
* initial commit

* add tags, add conversations

* add test for query options utility functions

* update previous tests

* fix test

* don't error out on empty context

* add changelog

* update decode config
2022-09-28 09:56:59 -07:00
Dan Upton 9fe6c33c0d
xDS Load Balancing (#14397)
Prior to #13244, connect proxies and gateways could only be configured by an
xDS session served by the local client agent.

In an upcoming release, it will be possible to deploy a Consul service mesh
without client agents. In this model, xDS sessions will be handled by the
servers themselves, which necessitates load-balancing to prevent a single
server from receiving a disproportionate amount of load and becoming
overwhelmed.

This introduces a simple form of load-balancing where Consul will attempt to
achieve an even spread of load (xDS sessions) between all healthy servers.
It does so by implementing a concurrent session limiter (limiter.SessionLimiter)
and adjusting the limit according to autopilot state and proxy service
registrations in the catalog.

If a server is already over capacity (i.e. the session limit is lowered),
Consul will begin draining sessions to rebalance the load. This will result
in the client receiving a `RESOURCE_EXHAUSTED` status code. It is the client's
responsibility to observe this response and reconnect to a different server.

Users of the gRPC client connection brokered by the
consul-server-connection-manager library will get this for free.

The rate at which Consul will drain sessions to rebalance load is scaled
dynamically based on the number of proxies in the catalog.
2022-09-09 15:02:01 +01:00
Daniel Upton 1cd7ec0543 proxycfg: terminate stream on irrecoverable errors
This is the OSS portion of enterprise PR 2339.

It improves our handling of "irrecoverable" errors in proxycfg data sources.

The canonical example of this is what happens when the ACL token presented by
Envoy is deleted/revoked. Previously, the stream would get "stuck" until the
xDS server re-checked the token (after 5 minutes) and terminated the stream.

Materializers would also sit burning resources retrying something that could
never succeed.

Now, it is possible for data sources to mark errors as "terminal" which causes
the xDS stream to be closed immediately. Similarly, the submatview.Store will
evict materializers when it observes they have encountered such an error.
2022-08-23 20:17:49 +01:00
Dan Upton 34140ff3e0
grpc: rename public/private directories to external/internal (#13721)
Previously, public referred to gRPC services that are both exposed on
the dedicated gRPC port and have their definitions in the proto-public
directory (so were considered usable by 3rd parties). Whereas private
referred to services on the multiplexed server port that are only usable
by agents and other servers.

Now, we're splitting these definitions, such that external/internal
refers to the port and public/private refers to whether they can be used
by 3rd parties.

This is necessary because the peering replication API needs to be
exposed on the dedicated port, but is not (yet) suitable for use by 3rd
parties.
2022-07-13 16:33:48 +01:00
Dan Upton 5cd31933d1
xds: remove HTTPCheckFetcher dependency (#13366)
This is the OSS portion of enterprise PR 1994

Rather than directly interrogating the agent-local state for HTTP
checks using the `HTTPCheckFetcher` interface, we now rely on the
config snapshot containing the checks.

This reduces the number of changes required to support server xDS
sessions.

It's not clear why the fetching approach was introduced in
931d167ebb2300839b218d08871f22323c60175d.
2022-06-06 15:15:33 +01:00
Dan Upton a6a6d5a8ee
Enable servers to configure arbitrary proxies from the catalog (#13244)
OSS port of enterprise PR 1822

Includes the necessary changes to the `proxycfg` and `xds` packages to enable
Consul servers to configure arbitrary proxies using catalog data.

Broadly, `proxycfg.Manager` now has public methods for registering,
deregistering, and listing registered proxies — the existing local agent
state-sync behavior has been moved into a separate component that makes use of
these methods.

When an xDS session is started for a proxy service in the catalog, a goroutine
will be spawned to watch the service in the server's state store and
re-register it with the `proxycfg.Manager` whenever it is updated (and clean
it up when the client goes away).
2022-05-27 12:38:52 +01:00
Evan Culver 9d0b5bf8e9
connect: Add Envoy 1.22 to integration tests, remove Envoy 1.18 (#12805)
Co-authored-by: R.B. Boyer <rb@hashicorp.com>
2022-04-18 09:36:07 -07:00
R.B. Boyer 3060b5cb8f
xds: errors from the xds serverless plugin are fatal (#12682) 2022-04-01 10:30:26 -05:00
Eric 07bf0c5ce8 Create and wire up the serverless patcher 2022-03-15 10:12:57 -04:00
Eric 3f9b84e6a4 Make an xdscommon package that will be shared between Consul and Envoy plugins 2022-03-08 14:57:23 -05:00
Freddy bb129384b7
Prevent xDS tight loop on cfg errors (#12195) 2022-02-10 15:37:36 -07:00
R.B. Boyer 0cd0d505fa
xds: allow only one outstanding delta request at a time (#12236)
Fixes #11876

This enforces that multiple xDS mutations are not issued on the same ADS connection at once, so that we can 100% control the order that they are applied. The original code made assumptions about the way multiple in-flight mutations were applied on the Envoy side that was incorrect.
2022-02-08 10:36:48 -06:00
R.B. Boyer b999b3edfc
xds: fix for delta xDS reconnect bug in LDS/CDS (#12174)
When a wildcard xDS type (LDS/CDS/SRDS) reconnects from a delta xDS stream,
prior to envoy `1.19.0` it would populate the `ResourceNamesSubscribe` field
with the full list of currently subscribed items, instead of simply omitting it
to infer that it wanted everything (which is what wildcard mode means).

This upstream issue was filed in envoyproxy/envoy#16063 and fixed in
envoyproxy/envoy#16153 which went out in Envoy `1.19.0` and is fixed in later
versions (later refactored in envoyproxy/envoy#16855).

This PR conditionally forces LDS/CDS to be wildcard-only even when the
connected Envoy requests a non-wildcard subscription, but only does so on
versions prior to `1.19.0`, as we should not need to do this on later versions.

This fixes the failure case as described here: #11833 (comment)

Co-authored-by: Huan Wang <fredwanghuan@gmail.com>
2022-01-25 11:24:27 -06:00
Evan Culver 88a899d06a
connect: remove support for Envoy 1.15 2021-09-22 11:48:50 -07:00
R.B. Boyer 2773bd94d7
xds: fix representation of incremental xDS subscriptions (#10987)
Fixes #10563

The `resourceVersion` map was doing two jobs prior to this PR. The first job was
to track what version of every resource we know envoy currently has. The
second was to track subscriptions to those resources (by way of the empty
string for a version). This mostly works out fine, but occasionally leads to
consul removing a resource and accidentally (effectively) unsubscribing at the
same time.

The fix separates these two jobs. When all of the resources for a subscription
are removed we continue to track the subscription until envoy explicitly
unsubscribes
2021-09-21 09:58:56 -05:00
Daniel Nephin 9df2464c7c xds: document how authorization works 2021-08-17 19:26:34 -04:00
R.B. Boyer 8d5f81b460
xds: ensure that dependent xDS resources are reconfigured during primary type warming (#10381)
Updates to a cluster will clear the associated endpoints, and updates to
a listener will clear the associated routes. Update the incremental xDS
logic to account for this implicit cleanup so that we can finish warming
the clusters and listeners.

Fixes #10379
2021-06-14 17:20:27 -05:00
R.B. Boyer 7c9763d027
xds: emit a labeled gauge of connected xDS streams by version (#10243)
Fixes #10099
2021-05-14 13:59:13 -05:00
R.B. Boyer 91bee6246f
Support Incremental xDS mode (#9855)
This adds support for the Incremental xDS protocol when using xDS v3. This is best reviewed commit-by-commit and will not be squashed when merged.

Union of all commit messages follows to give an overarching summary:

xds: exclusively support incremental xDS when using xDS v3

Attempts to use SoTW via v3 will fail, much like attempts to use incremental via v2 will fail.
Work around a strange older envoy behavior involving empty CDS responses over incremental xDS.
xds: various cleanups and refactors that don't strictly concern the addition of incremental xDS support

Dissolve the connectionInfo struct in favor of per-connection ResourceGenerators instead.
Do a better job of ensuring the xds code uses a well configured logger that accurately describes the connected client.
xds: pull out checkStreamACLs method in advance of a later commit

xds: rewrite SoTW xDS protocol tests to use protobufs rather than hand-rolled json strings

In the test we very lightly reuse some of the more boring protobuf construction helper code that is also technically under test. The important thing of the protocol tests is testing the protocol. The actual inputs and outputs are largely already handled by the xds golden output tests now so these protocol tests don't have to do double-duty.

This also updates the SoTW protocol test to exclusively use xDS v2 which is the only variant of SoTW that will be supported in Consul 1.10.

xds: default xds.Server.AuthCheckFrequency at use-time instead of construction-time
2021-04-29 13:54:05 -05:00