Commit Graph

5065 Commits

Author SHA1 Message Date
Dhia Ayachi cdc47ea200
add necessary plumbing to implement per server ip based rate limiting (#17436) 2023-05-23 15:37:01 -04:00
R.B. Boyer 3ed4f7a33a
extract some config entry helpers into package (#17434) 2023-05-23 12:15:30 -05:00
Paul Glass 71992b9c3b
Only synthesize anonymous token in primary DC (#17231)
* Only synthesize anonymous token in primary DC
* Add integration test for wan fed issue
2023-05-23 09:38:04 -05:00
Michael Zalimeni 4cae008559
Disable remote proxy patching except AWS Lambda (#17415)
To avoid unintended tampering with remote downstreams via service
config, refactor BasicEnvoyExtender and RuntimeConfig to disallow
typical Envoy extensions from being applied to non-local proxies.

Continue to allow this behavior for AWS Lambda and the read-only
Validate builtin extensions.

Addresses CVE-2023-2816.
2023-05-23 11:55:06 +00:00
sarahalsmiller eccdf81977
xds: generate listeners directly from API gateway snapshot (#17398)
* API Gateway XDS Primitives, endpoints and clusters (#17002)

* XDS primitive generation for endpoints and clusters

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>

* server_test

* deleted extra file

* add missing parents to test

---------

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>

* Routes for API Gateway (#17158)

* XDS primitive generation for endpoints and clusters

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>

* server_test

* deleted extra file

* add missing parents to test

* checkpoint

* delete extra file

* httproute flattening code

* linting issue

* so close on this, calling for tonight

* unit test passing

* add in header manip to virtual host

* upstream rebuild commented out

* Use consistent upstream name whether or not we're rebuilding

* Start working through route naming logic

* Fix typos in test descriptions

* Simplify route naming logic

* Simplify RebuildHTTPRouteUpstream

* Merge additional compiled discovery chains instead of overwriting

* Use correct chain for flattened route, clean up + add TODOs

* Remove empty conditional branch

* Restore previous variable declaration

Limit the scope of this PR

* Clean up, improve TODO

* add logging, clean up todos

* clean up function

---------

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>

* checkpoint, skeleton, tests not passing

* checkpoint

* endpoints xds cluster configuration

* resources test fix

* fix reversion in resources_test

* checkpoint

* Update agent/proxycfg/api_gateway.go

Co-authored-by: John Maguire <john.maguire@hashicorp.com>

* unit tests passing

* gofmt

* add deterministic sorting to appease the unit test gods

* remove panic

* Find ready upstream matching listener instead of first in list

* Clean up, improve TODO

* Modify getReadyUpstreams to filter upstreams by listener (#17410)

Each listener would previously have all upstreams from any route that bound to the listener. This is problematic when a route bound to one listener also binds to other listeners and so includes upstreams for multiple listeners. The list for a given listener would then wind up including upstreams for other listeners.

* clean up todos, references to api gateway in listeners_ingress

* merge in Nathan's fix

* Update agent/consul/discoverychain/gateway.go

* cleanup current todos, remove snapshot manipulation from generation code

* Update agent/structs/config_entry_gateways.go

Co-authored-by: Thomas Eckert <teckert@hashicorp.com>

* Update agent/consul/discoverychain/gateway.go

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>

* Update agent/consul/discoverychain/gateway.go

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>

* Update agent/proxycfg/snapshot.go

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>

* clarified header comment for FlattenHTTPRoute, changed RebuildHTTPRouteUpstream to BuildHTTPRouteUpstream

* simplify cert logic

* Delete scratch

* revert route related changes in listener PR

* Update agent/consul/discoverychain/gateway.go

* Update agent/proxycfg/snapshot.go

* clean up uneeded extra lines in endpoints

---------

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
Co-authored-by: John Maguire <john.maguire@hashicorp.com>
Co-authored-by: Thomas Eckert <teckert@hashicorp.com>
2023-05-22 17:36:29 -04:00
R.B. Boyer e1110ea82d
prototest: fix early return condition in AssertElementsMatch (#17416) 2023-05-22 13:49:50 -05:00
sarahalsmiller 0477d15a5a
xds: generate clusters directly from API gateway snapshot (#17391)
* endpoints xds cluster configuration

* clusters xds native generation

* resources test fix

* fix reversion in resources_test

* Update agent/proxycfg/api_gateway.go

Co-authored-by: John Maguire <john.maguire@hashicorp.com>

* gofmt

* Modify getReadyUpstreams to filter upstreams by listener (#17410)

Each listener would previously have all upstreams from any route that bound to the listener. This is problematic when a route bound to one listener also binds to other listeners and so includes upstreams for multiple listeners. The list for a given listener would then wind up including upstreams for other listeners.

* Update agent/proxycfg/api_gateway.go

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>

* Restore import blocking

* Undo removal of unrelated code

---------

Co-authored-by: John Maguire <john.maguire@hashicorp.com>
Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
2023-05-22 12:00:13 -04:00
Matt Keeler cd3dc460c5
Allow resource updates to omit an owner refs UID (#17423)
This change enables workflows where you are reapplying a resource that should have an owner ref to publish modifications to the resources data without performing a read to figure out the current owner resource incarnations UID.

Basically we want workflows similar to `kubectl apply` or `consul config write` to be able to work seamlessly even for owned resources.

In these cases the users intention is to have the resource owned by the “current” incarnation of the owner resource.
2023-05-22 10:44:49 -04:00
Ronald aad135529f
JWT Authentication with service intentions: xds package update (#17414)
* JWT Authentication with service intentions: update xds package to translate config to envoy
2023-05-19 18:14:16 -04:00
sarahalsmiller 97532900a5
xds: generate endpoints directly from API gateway snapshot (#17390)
* endpoints xds cluster configuration

* resources test fix

* fix reversion in resources_test

* Update agent/proxycfg/api_gateway.go

Co-authored-by: John Maguire <john.maguire@hashicorp.com>

* gofmt

* Modify getReadyUpstreams to filter upstreams by listener (#17410)

Each listener would previously have all upstreams from any route that bound to the listener. This is problematic when a route bound to one listener also binds to other listeners and so includes upstreams for multiple listeners. The list for a given listener would then wind up including upstreams for other listeners.

* Update agent/proxycfg/api_gateway.go

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>

* Restore import blocking

* Skip to next route if route has no upstreams

* cleanup

* change set from bool to empty struct

---------

Co-authored-by: John Maguire <john.maguire@hashicorp.com>
Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
2023-05-19 18:50:59 +00:00
Matt Keeler 6216a96f93
Add the workload health controller (#17215) 2023-05-19 13:53:29 -04:00
Kyle Havlovitz 3a8afcea57
Pull virtual IPs for filter chains from discovery chains (#17375) 2023-05-17 11:18:39 -07:00
R.B. Boyer ce6bf1d82e
fix two typos (#17389) 2023-05-17 08:50:26 -07:00
Connor 6532ede487
Rename hcp-metrics-collector to consul-telemetry-collector (#17327)
* Rename hcp-metrics-collector to consul-telemetry-collector

* Fix docs

* Fix doc comment

---------

Co-authored-by: Ashvitha Sridharan <ashvitha.sridharan@hashicorp.com>
2023-05-16 14:36:05 -04:00
Dan Bond 5d07624e80
agent: don't write server metadata in dev mode (#17383)
Signed-off-by: Dan Bond <danbond@protonmail.com>
2023-05-16 02:50:27 -07:00
wangxinyi7 c2a479bffa
counterpart of the ent in oss (#17367) 2023-05-15 10:49:43 -07:00
Semir Patel 3e0d71cf22
Support update resource with change in GroupVersion (#17330) 2023-05-15 09:42:01 -05:00
Matt Keeler e23577b8aa
Add a Node health controller (#17214)
This will aggregate all HealthStatus objects owned by the Node and update the status of the Node with an overall health.
2023-05-15 09:55:03 -04:00
Dan Upton 7abd829d0b
resource: handle `ErrWatchClosed` in `WatchList` endpoint (#17289) 2023-05-15 12:35:10 +01:00
Dan Bond 6bb7782745
agent: prevent very old servers re-joining a cluster with stale data (#17171)
* agent: configure server lastseen timestamp

Signed-off-by: Dan Bond <danbond@protonmail.com>

* use correct config

Signed-off-by: Dan Bond <danbond@protonmail.com>

* add comments

Signed-off-by: Dan Bond <danbond@protonmail.com>

* use default age in test golden data

Signed-off-by: Dan Bond <danbond@protonmail.com>

* add changelog

Signed-off-by: Dan Bond <danbond@protonmail.com>

* fix runtime test

Signed-off-by: Dan Bond <danbond@protonmail.com>

* agent: add server_metadata

Signed-off-by: Dan Bond <danbond@protonmail.com>

* update comments

Signed-off-by: Dan Bond <danbond@protonmail.com>

* correctly check if metadata file does not exist

Signed-off-by: Dan Bond <danbond@protonmail.com>

* follow instructions for adding new config

Signed-off-by: Dan Bond <danbond@protonmail.com>

* add comments

Signed-off-by: Dan Bond <danbond@protonmail.com>

* update comments

Signed-off-by: Dan Bond <danbond@protonmail.com>

* Update agent/agent.go

Co-authored-by: Dan Upton <daniel@floppy.co>

* agent/config: add validation for duration with min

Signed-off-by: Dan Bond <danbond@protonmail.com>

* docs: add new server_rejoin_age_max config definition

Signed-off-by: Dan Bond <danbond@protonmail.com>

* agent: add unit test for checking server last seen

Signed-off-by: Dan Bond <danbond@protonmail.com>

* agent: log continually for 60s before erroring

Signed-off-by: Dan Bond <danbond@protonmail.com>

* pr comments

Signed-off-by: Dan Bond <danbond@protonmail.com>

* remove unneeded todo

* agent: fix error message

Signed-off-by: Dan Bond <danbond@protonmail.com>

---------

Signed-off-by: Dan Bond <danbond@protonmail.com>
Co-authored-by: Dan Upton <daniel@floppy.co>
2023-05-15 04:05:47 -07:00
Hans Hasselberg 90a25d39f1
Add new fields to HCP bootstrap config request and push state request
To support linking cluster, HCP needs to know the datacenter and if ACLs are enabled. Otherwise hosted Consul Core UI won't work properly.
2023-05-12 21:01:56 -06:00
Eric Haberkorn d645fa5ea1
sidecar-proxy refactor (#17328) 2023-05-12 16:49:42 -04:00
Chris Thain f99593a054
Add Network Filter Support for Envoy Extensions (#17325) 2023-05-12 09:52:50 -07:00
Kyle Havlovitz 73897656d5
Attach service virtual IP info to compiled discovery chain (#17295)
* Add v1/internal/service-virtual-ip for manually setting service VIPs

* Attach service virtual IP info to compiled discovery chain

* Separate auto-assigned and manual VIPs in response
2023-05-12 02:28:16 +00:00
Kyle Havlovitz b6d5d5649d
Add /v1/internal/service-virtual-ip for manually setting service VIPs (#17294) 2023-05-12 00:38:52 +00:00
R.B. Boyer 0b79707beb
grpc: ensure grpc resolver correctly uses lan/wan addresses on servers (#17270)
The grpc resolver implementation is fed from changes to the
router.Router. Within the router there is a map of various areas storing
the addressing information for servers in those areas. All map entries
are of the WAN variety except a single special entry for the LAN.

Addressing information in the LAN "area" are local addresses intended
for use when making a client-to-server or server-to-server request.

The client agent correctly updates this LAN area when receiving lan serf
events, so by extension the grpc resolver works fine in that scenario.

The server agent only initially populates a single entry in the LAN area
(for itself) on startup, and then never mutates that area map again.
For normal RPCs a different structure is used for LAN routing.

Additionally when selecting a server to contact in the local datacenter
it will randomly select addresses from either the LAN or WAN addressed
entries in the map.

Unfortunately this means that the grpc resolver stack as it exists on
server agents is either broken or only accidentally functions by having
servers dial each other over the WAN-accessible address. If the operator
disables the serf wan port completely likely this incidental functioning
would break.

This PR enforces that local requests for servers (both for stale reads
or leader forwarded requests) exclusively use the LAN "area" information
and also fixes it so that servers keep that area up to date in the
router.

A test for the grpc resolver logic was added, as well as a higher level
full-stack test to ensure the externally perceived bug does not return.
2023-05-11 11:08:57 -05:00
Dan Upton f72d75d6b2
resource: add missing validation to the `List` and `WatchList` endpoints (#17213) 2023-05-10 10:38:48 +01:00
Derek Menteer 91051761f3
Fix ent bug caused by #17241. (#17278)
Fix ent bug caused by #17241

All tests passed in OSS, but not ENT. This is a patch to resolve
the problem for both.
2023-05-09 16:36:29 -05:00
cskh 3efe8406e4
snapshot: some improvments to the snapshot process (#17236)
* snapshot: some improvments to the snapshot process

Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
2023-05-09 15:28:52 -04:00
Semir Patel f8b900d555
Reaper controller for cascading deletes of owner resources (#17256) 2023-05-09 13:57:40 -05:00
Freddy 0459069523
Hash namespace+proxy ID when creating socket path (#17204)
UNIX domain socket paths are limited to 104-108 characters, depending on
the OS. This limit was quite easy to exceed when testing the feature on
Kubernetes, due to how proxy IDs encode the Pod ID eg:
metrics-collector-59467bcb9b-fkkzl-hcp-metrics-collector-sidecar-proxy

To ensure we stay under that character limit this commit makes a
couple changes:
- Use a b64 encoded SHA1 hash of the namespace + proxy ID to create a
  short and deterministic socket file name.
- Add validation to proxy registrations and proxy-defaults to enforce a
  limit on the socket directory length.
2023-05-09 12:20:26 -06:00
Dan Upton 270df96301
resource: add helpers for more efficiently comparing IDs etc (#17224) 2023-05-09 19:02:24 +01:00
Derek Menteer 3ce5277217
Fix multiple issues related to proxycfg health queries. (#17241)
Fix multiple issues related to proxycfg health queries.

1. The datacenter was not being provided to a proxycfg query, which resulted in
bypassing agentless query optimizations and using the normal API instead.

2. The health rpc endpoint would return a zero index when insufficient ACLs were
detected. This would result in the agent cache performing an infinite loop of
queries in rapid succession without backoff.
2023-05-09 12:37:58 -05:00
Dan Upton 91f76b6fb2
controller: deduplicate items in queue (#17168) 2023-05-09 18:14:20 +01:00
Dan Upton 979ef66885
Controller Runtime 2023-05-09 15:25:55 +01:00
Matt Keeler 6919dabb50
Register new catalog & mesh protobuf types with the resource registry (#17225) 2023-05-08 15:36:35 -04:00
Derek Menteer 73b65228f5
Fix issue with peer stream node cleanup. (#17235)
Fix issue with peer stream node cleanup.

This commit encompasses a few problems that are closely related due to their
proximity in the code.

1. The peerstream utilizes node IDs in several locations to determine which
nodes / services / checks should be cleaned up or created. While VM deployments
with agents will likely always have a node ID, agentless uses synthetic nodes
and does not populate the field. This means that for consul-k8s deployments, all
services were likely bundled together into the same synthetic node in some code
paths (but not all), resulting in strange behavior. The Node.Node field should
be used instead as a unique identifier, as it should always be populated.

2. The peerstream cleanup process for unused nodes uses an incorrect query for
node deregistration. This query is NOT namespace aware and results in the node
(and corresponding services) being deregistered prematurely whenever it has zero
default-namespace services and 1+ non-default-namespace services registered on
it. This issue is tricky to find due to the incorrect logic mentioned in #1,
combined with the fact that the affected services must be co-located on the same
node as the currently deregistering service for this to be encountered.

3. The stream tracker did not understand differences between services in
different namespaces and could therefore report incorrect numbers. It was
updated to utilize the full service name to avoid conflicts and return proper
results.
2023-05-08 13:13:25 -05:00
Semir Patel 9615837c60
resource: List resources by owner (#17190) 2023-05-08 12:26:19 -05:00
Dan Upton 34786c71cd
controller: make the `WorkQueue` generic (#16982) 2023-05-05 15:38:22 +01:00
John Eikenberry 0210211a69
enable auto-tidy expired issuers in vault (as CA)
When using vault as a CA and generating the local signing cert, try to
enable the PKI endpoint's auto-tidy feature with it set to tidy expired
issuers.
2023-05-03 20:30:37 +00:00
Nathan Coleman f5668b3621
Use auth context when evaluating service read permissions (#17207)
Co-authored-by: Blake Covarrubias <1812+blake@users.noreply.github.com>
2023-05-02 16:23:42 -04:00
Poonam Jadhav 2a33111b21
feat: add no-op reporting background routine (#17178) 2023-04-28 20:07:03 -04:00
Eric Haberkorn 47a7e52098
fix panic in `injectSANMatcher` when `tlsContext` is `nil` (#17185) 2023-04-28 16:27:57 -04:00
Paul Glass e1cff98a8f
Permissive mTLS: Config entry filtering and CLI warnings (#17183)
This adds filtering for service-defaults: consul config list -filter 'MutualTLSMode == "permissive"'.

It adds CLI warnings when the CLI writes a config entry and sees that either service-defaults or proxy-defaults contains MutualTLSMode=permissive, or sees that the mesh config entry contains AllowEnablingPermissiveMutualTLSMode=true.
2023-04-28 12:51:36 -05:00
R.B. Boyer 064392441f
peering: ensure that merged central configs of peered upstreams for partitioned downstreams work (#17179)
Partitioned downstreams with peered upstreams could not properly merge central config info (i.e. proxy-defaults and service-defaults things like mesh gateway modes) if the upstream had an empty DestinationPartition field in Enterprise.

Due to data flow, if this setup is done using Consul client agents the field is never empty and thus does not experience the bug.

When a service is registered directly to the catalog as is the case for consul-dataplane use this field may be empty and and the internal machinery of the merging function doesn't handle this well.

This PR ensures the internal machinery of that function is referentially self-consistent.
2023-04-28 12:36:08 -05:00
Semir Patel 2601f0488c
Sync .golangci.yml from ENT (#17180) 2023-04-28 17:14:37 +00:00
John Landa b9cf6579e6
Remove artificial ACLTokenMaxTTL limit for configuring acl token expiry (#17066)
* Remove artificial ACLTokenMaxTTL limit for configuring acl token expiry

* Add changelog

* Remove test on default MaxTokenTTL

* Change to imperitive tense for changelog entry
2023-04-28 10:57:30 -05:00
Semir Patel 896c39d98c
Create tombstone on resource `Delete` (#17108) 2023-04-28 10:49:08 -05:00
Dan Upton 6d024775a0
resource: owner references must include a uid (#17169) 2023-04-28 11:22:42 +01:00
Freddy 29d5811f0d
Update HCP bootstrapping to support existing clusters (#16916)
* Persist HCP management token from server config

We want to move away from injecting an initial management token into
Consul clusters linked to HCP. The reasoning is that by using a separate
class of token we can have more flexibility in terms of allowing HCP's
token to co-exist with the user's management token.

Down the line we can also more easily adjust the permissions attached to
HCP's token to limit it's scope.

With these changes, the cloud management token is like the initial
management token in that iit has the same global management policy and
if it is created it effectively bootstraps the ACL system.

* Update SDK and mock HCP server

The HCP management token will now be sent in a special field rather than
as Consul's "initial management" token configuration.

This commit also updates the mock HCP server to more accurately reflect
the behavior of the CCM backend.

* Refactor HCP bootstrapping logic and add tests

We want to allow users to link Consul clusters that already exist to
HCP. Existing clusters need care when bootstrapped by HCP, since we do
not want to do things like change ACL/TLS settings for a running
cluster.

Additional changes:

* Deconstruct MaybeBootstrap so that it can be tested. The HCP Go SDK
  requires HTTPS to fetch a token from the Auth URL, even if the backend
  server is mocked. By pulling the hcp.Client creation out we can modify
  its TLS configuration in tests while keeping the secure behavior in
  production code.

* Add light validation for data received/loaded.

* Sanitize initial_management token from received config, since HCP will
  only ever use the CloudConfig.MangementToken.

* Add changelog entry
2023-04-27 22:27:39 +02:00