Commit Graph

4954 Commits

Author SHA1 Message Date
John Maguire 74dfee9359
Fix struct tags for TCPService enterprise meta (#16781)
* Fix struct tags for TCPService enterprise meta

* Add changelog
2023-03-27 16:17:04 +00:00
Semir Patel 9f607d4970
Read(...) endpoint for the resource service (#16655) 2023-03-27 10:35:39 -05:00
Derek Menteer 5be6469506
Change partition for peers in discovery chain targets (#16769)
This commit swaps the partition field to the local partition for
discovery chains targeting peers. Prior to this change, peer upstreams
would always use a value of default regardless of which partition they
exist in. This caused several issues in xds / proxycfg because of id
mismatches.

Some prior fixes were made to deal with one-off id mismatches that this
PR also cleans up, since they are no longer needed.
2023-03-24 15:40:19 -05:00
John Eikenberry 56b93388bc
tests instantiating clients w/o shutting down (#16755)
noticed via their port still in use messages.
2023-03-24 16:54:11 +00:00
Poonam Jadhav 82a81dc486
fix: remove unused tenancy category from rate limit spec (#16740) 2023-03-23 12:14:59 -04:00
Dhia Ayachi 8fb85d6b7a
delete config when nil (#16690)
* delete config when nil

* fix mock interface implementation

* fix handler test to use the right assertion

* extract DeleteConfig as a separate API.

* fix mock limiter implementation to satisfy the new interface

* fix failing tests

* add test comments
2023-03-22 15:19:54 -04:00
Eric Haberkorn 672c1eea41
add enterprise xds tests (#16738) 2023-03-22 14:56:18 -04:00
Eric Haberkorn d7c81a3b1d
fix bug where pqs that failover to a cluster peer dont un-fail over (#16729) 2023-03-22 09:24:13 -04:00
cskh 3a4835e258
fix: gracefully fail on invalid port number (#16721) 2023-03-21 22:29:21 -04:00
John Maguire b97ff15b4c
Remove unused are hosts set check (#16691)
* Remove unused are hosts set check

* Remove all traces of unused 'AreHostsSet' parameter

* Remove unused Hosts attribute

* Remove commented out use of snap.APIGateway.Hosts
2023-03-21 16:23:23 +00:00
Nitya Dhanushkodi 69bd62f9c3
peering: peering partition failover fixes (#16673)
add local source partition for peered upstreams
2023-03-20 10:00:29 -07:00
John Maguire 2e07180662
Fix route subscription when using namespaces (#16677)
* Fix route subscription when using namespaces

* Update changelog

* Fix changelog entry to reference that the bug was enterprise only
2023-03-20 12:42:30 -04:00
Melisa Griffin fa1b6e7450
Adds check to verify that the API Gateway is being created with at least one listener 2023-03-20 12:37:30 -04:00
Poonam Jadhav 3be683fcc1
feat: add category annotation to RPC and gRPC methods (#16646) 2023-03-20 11:24:29 -04:00
Eric Haberkorn 2bf2e81a6b
add sameness groups to discovery chains (#16671) 2023-03-20 09:12:37 -04:00
Andrew Stucki a597cb3d57
[API Gateway] Fix invalid cluster causing gateway programming delay (#16661)
* Add test for http routes

* Add fix

* Fix tests

* Add changelog entry

* Refactor and fix flaky tests
2023-03-17 13:31:04 -04:00
Eric Haberkorn 68046060ea
add sameness group support to service resolver failover and redirects (#16664) 2023-03-17 10:48:06 -04:00
Eric Haberkorn 65bfef6df1
fix confusing spiffe ids in golden tests (#16643) 2023-03-15 14:30:36 -04:00
wangxinyi7 4aa1b5ee01
net 2731 ip config entry OSS version (#16642)
* ip config entry

* name changing

* move to ent

* ent version

* renaming

* change format

* renaming

* refactor

* add default values
2023-03-15 11:21:24 -07:00
John Maguire 7fca314579
Update e2e tests for namespaces (#16627)
* Refactored "NewGatewayService" to handle namespaces, fixed
TestHTTPRouteFlattening test

* Fixed existing http_route tests for namespacing

* Squash aclEnterpriseMeta for ResourceRefs and HTTPServices, accept
namespace for creating connect services and regular services

* Use require instead of assert after creating namespaces in
http_route_tests

* Refactor NewConnectService and NewGatewayService functions to use cfg
objects to reduce number of method args

* Rename field on SidecarConfig in tests from `SidecarServiceName` to
`Name` to avoid stutter
2023-03-15 17:51:36 +00:00
Freddy a5bd98ae3e
Backport ENT-4704 (#16612) 2023-03-14 14:55:11 -06:00
Derek Menteer f3be5d9b80
Fix issue with trust bundle read ACL check. (#16630)
This commit fixes an issue where trust bundles could not be read
by services in a non-default namespace, unless they had excessive
ACL permissions given to them.

Prior to this change, `service:write` was required in the default
namespace in order to read the trust bundle. Now, `service:write`
to a service in any namespace is sufficient.
2023-03-14 12:24:33 -05:00
Chris S. Kim bb4baeba95
Preserve CARoots when updating Vault CA configuration (#16592)
If a CA config update did not cause a root change, the codepath would return early and skip some steps which preserve its intermediate certificates and signing key ID. This commit re-orders some code and prevents updates from generating new intermediate certificates.
2023-03-13 17:32:59 -04:00
Derek Menteer 5d17b2c90b
Add sameness-group configuration entry. (#16608)
This commit adds a sameness-group config entry to the API and structs packages. It includes some validation logic and a new memdb index that tracks the default sameness-group for each partition. Sameness groups will simplify the effort of managing failovers / intentions / exports for peers and partitions.

Note that this change purely to introduce the configuration entry and does not include the full functionality of sameness-groups.
2023-03-13 16:19:11 -05:00
Ashvitha f514182f3e
Allow HCP metrics collection for Envoy proxies
Co-authored-by: Ashvitha Sridharan <ashvitha.sridharan@hashicorp.com>
Co-authored-by: Freddy <freddygv@users.noreply.github.com>

Add a new envoy flag: "envoy_hcp_metrics_bind_socket_dir", a directory
where a unix socket will be created with the name
`<namespace>_<proxy_id>.sock` to forward Envoy metrics.

If set, this will configure:
- In bootstrap configuration a local stats_sink and static cluster.
  These will forward metrics to a loopback listener sent over xDS.

- A dynamic listener listening at the socket path that the previously
  defined static cluster is sending metrics to.

- A dynamic cluster that will forward traffic received at this listener
  to the hcp-metrics-collector service.


Reasons for having a static cluster pointing at a dynamic listener:
- We want to secure the metrics stream using TLS, but the stats sink can
  only be defined in bootstrap config. With dynamic listeners/clusters
  we can use the proxy's leaf certificate issued by the Connect CA,
  which isn't available at bootstrap time.

- We want to intelligently route to the HCP collector. Configuring its
  addreess at bootstrap time limits our flexibility routing-wise. More
  on this below.

Reasons for defining the collector as an upstream in `proxycfg`:
- The HCP collector will be deployed as a mesh service.

- Certificate management is taken care of, as mentioned above.

- Service discovery and routing logic is automatically taken care of,
  meaning that no code changes are required in the xds package.

- Custom routing rules can be added for the collector using discovery
  chain config entries. Initially the collector is expected to be
  deployed to each admin partition, but in the future could be deployed
  centrally in the default partition. These config entries could even be
  managed by HCP itself.
2023-03-10 13:52:54 -07:00
Eric Haberkorn d449096190
Add Peer Locality to Discovery Chains (#16588)
Add peer locality to discovery chains
2023-03-10 12:59:47 -05:00
Eric Haberkorn 0351f48bfd
allow setting locality on services and nodes (#16581) 2023-03-10 09:36:15 -05:00
Semir Patel a4780c60b8
GRPC stub for the ResourceService (#16528) 2023-03-09 13:40:23 -06:00
Andrew Stucki 28869784e1
auto-updated agent/uiserver/dist/ from commit 63204b518 (#16587)
Co-authored-by: hc-github-team-consul-core <github-team-consul-core@hashicorp.com>
2023-03-09 13:56:53 -05:00
Eric Haberkorn a66433e527
fix bug that can lead to peering service deletes impacting the state of local services (#16570) 2023-03-08 11:24:03 -05:00
Eric Haberkorn 1d9a09f276
add agent locality and replicate it across peer streams (#16522) 2023-03-07 14:05:23 -05:00
John Eikenberry 25ed13261b
support vault auth config for alicloud ca provider
Add support for using existing vault auto-auth configurations as the
provider configuration when using Vault's CA provider with AliCloud.

AliCloud requires 2 extra fields to enable it to use STS (it's preferred
auth setup). Our vault-plugin-auth-alicloud package contained a method
to help generate them as they require you to make an http call to
a faked endpoint proxy to get them (url and headers base64 encoded).
2023-03-07 03:02:05 +00:00
Melisa Griffin dac0cc90ed
NET-2904 Fixes API Gateway Route Service Weight Division Error 2023-03-06 08:41:57 -05:00
Melisa Griffin 71379b9621
NET-2903 Normalize weight for http routes (#16512)
* NET-2903 Normalize weight for http routes

* Update website/content/docs/connect/gateways/api-gateway/configuration/http-route.mdx

Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>
2023-03-03 16:39:59 -05:00
R.B. Boyer b089f93292
proxycfg: ensure that an irrecoverable error in proxycfg closes the xds session and triggers a replacement proxycfg watcher (#16497)
Receiving an "acl not found" error from an RPC in the agent cache and the
streaming/event components will cause any request loops to cease under the
assumption that they will never work again if the token was destroyed. This
prevents log spam (#14144, #9738).

Unfortunately due to things like:

- authz requests going to stale servers that may not have witnessed the token
  creation yet

- authz requests in a secondary datacenter happening before the tokens get
  replicated to that datacenter

- authz requests from a primary TO a secondary datacenter happening before the
  tokens get replicated to that datacenter

The caller will get an "acl not found" *before* the token exists, rather than
just after. The machinery added above in the linked PRs will kick in and
prevent the request loop from looping around again once the tokens actually
exist.

For `consul-dataplane` usages, where xDS is served by the Consul servers
rather than the clients ultimately this is not a problem because in that
scenario the `agent/proxycfg` machinery is on-demand and launched by a new xDS
stream needing data for a specific service in the catalog. If the watching
goroutines are terminated it ripples down and terminates the xDS stream, which
CDP will eventually re-establish and restart everything.

For Consul client usages, the `agent/proxycfg` machinery is ahead-of-time
launched at service registration time (called "local" in some of the proxycfg
machinery) so when the xDS stream comes in the data is already ready to go. If
the watching goroutines terminate it should terminate the xDS stream, but
there's no mechanism to re-spawn the watching goroutines. If the xDS stream
reconnects it will see no `ConfigSnapshot` and will not get one again until
the client agent is restarted, or the service is re-registered with something
changed in it.

This PR fixes a few things in the machinery:

- there was an inadvertent deadlock in fetching snapshot from the proxycfg
  machinery by xDS, such that when the watching goroutine terminated the
  snapshots would never be fetched. This caused some of the xDS machinery to
  get indefinitely paused and not finish the teardown properly.

- Every 30s we now attempt to re-insert all locally registered services into
  the proxycfg machinery.

- When services are re-inserted into the proxycfg machinery we special case
  "dead" ones such that we unilaterally replace them rather that doing that
  conditionally.
2023-03-03 14:27:53 -06:00
John Eikenberry 95288615fa
add provider ca support for approle auth-method
Adds support for the approle auth-method. Only handles using the approle
role/secret to auth and it doesn't support the agent's extra management
configuration options (wrap and delete after read) as they are not
required as part of the auth (ie. they are vault agent things).
2023-03-03 19:29:53 +00:00
Andrew Stucki ba667221a5
Fix resolution of service resolvers with subsets for external upstreams (#16499)
* Fix resolution of service resolvers with subsets for external upstreams

* Add tests

* Add changelog entry

* Update view filter logic
2023-03-03 14:17:11 -05:00
Eric Haberkorn 5c8414e772
Add support for failover policies (#16505) 2023-03-03 11:12:38 -05:00
Andrew Stucki 6ca1c9f15c
Fix issue where terminating gateway service resolvers weren't properly cleaned up (#16498)
* Fix issue where terminating gateway service resolvers weren't properly cleaned up

* Add integration test for cleaning up resolvers

* Add changelog entry

* Use state test and drop integration test
2023-03-03 09:56:57 -05:00
Andrew Stucki 2916821b55
Add ServiceResolver RequestTimeout for route timeouts to make TerminatingGateway upstream timeouts configurable (#16495)
* Leverage ServiceResolver ConnectTimeout for route timeouts to make TerminatingGateway upstream timeouts configurable

* Regenerate golden files

* Add RequestTimeout field

* Add changelog entry
2023-03-03 09:37:12 -05:00
John Eikenberry 1a065c08bc
add provider ca auth support for kubernetes
Adds support for Kubernetes jwt/token file based auth. Only needs to
read the file and save the contents as the jwt/token.
2023-03-02 22:05:40 +00:00
John Eikenberry a75800a988
add provider ca support for jwt file base auth
Adds support for a jwt token in a file. Simply reads the file and sends
the read in jwt along to the vault login.

It also supports a legacy mode with the jwt string being passed
directly. In which case the path is made optional.
2023-03-02 20:33:06 +00:00
Chris S. Kim fea543993d
Speed up test by registering services concurrently (#16509) 2023-03-02 14:36:44 -05:00
John Eikenberry 5ac637f07d
add provider ca auth-method support for azure
Does the required dance with the local HTTP endpoint to get the required
data for the jwt based auth setup in Azure. Keeps support for 'legacy'
mode where all login data is passed on via the auth methods parameters.
Refactored check for hardcoded /login fields.
2023-03-01 00:07:33 +00:00
Dan Upton 118ffb1e95
grpc: fix data race in balancer registration (#16229)
Registering gRPC balancers is thread-unsafe because they are stored in a
global map variable that is accessed without holding a lock. Therefore,
it's expected that balancers are registered _once_ at the beginning of
your program (e.g. in a package `init` function) and certainly not after
you've started dialing connections, etc.

> NOTE: this function must only be called during initialization time
> (i.e. in an init() function), and is not thread-safe.

While this is fine for us in production, it's challenging for tests that
spin up multiple agents in-memory. We currently register a balancer per-
agent which holds agent-specific state that cannot safely be shared.

This commit introduces our own registry that _is_ thread-safe, and
implements the Builder interface such that we can call gRPC's `Register`
method once, on start-up. It uses the same pattern as our resolver
registry where we use the dial target's host (aka "authority"), which is
unique per-agent, to determine which builder to use.
2023-02-28 10:18:38 +00:00
Andrew Stucki 682eec50a8
Fix attempt for test fail panics in xDS (#16319)
* Fix attempt for test fail panics in xDS

* switch to a mutex pointer
2023-02-24 17:00:31 -05:00
Chris S. Kim 652b74dd37
Fix various flaky tests (#16396) 2023-02-23 14:52:18 -05:00
Eric Haberkorn be0eda24c9
Refactor the disco chain -> xds logic (#16392) 2023-02-23 11:32:32 -05:00
Paul Banks bc344e81da
Correct WAL metrics registrations (#16388) 2023-02-23 14:07:17 +00:00
Dhia Ayachi fa95f65aab
Rate limiter/add ip prefix (#16342)
* add support for prefixes in the config tree

* fix to use default config when the prefix have no config
2023-02-22 15:15:51 -05:00