1082 Commits

Author SHA1 Message Date
Dan Stough e502be8c6e
[OSS] gRPC Blocking Queries (#17426)
* feat: initial grpc blocking queries

* changelog and docs update
2023-05-23 17:29:10 -04:00
Paul Glass 71992b9c3b
Only synthesize anonymous token in primary DC (#17231)
* Only synthesize anonymous token in primary DC
* Add integration test for wan fed issue
2023-05-23 09:38:04 -05:00
Michael Zalimeni 4cae008559
Disable remote proxy patching except AWS Lambda (#17415)
To avoid unintended tampering with remote downstreams via service
config, refactor BasicEnvoyExtender and RuntimeConfig to disallow
typical Envoy extensions from being applied to non-local proxies.

Continue to allow this behavior for AWS Lambda and the read-only
Validate builtin extensions.

Addresses CVE-2023-2816.
2023-05-23 11:55:06 +00:00
John Landa 4859cfb47b
Add ACLs Enabled field to consul agent startup status message (#17086)
* Add ACLs Enabled field to consul agent startup status message

* Add changelog

* Update startup messages to include default ACL policy configuration

* Correct import groupings
2023-05-16 13:47:02 -05:00
Connor 6532ede487
Rename hcp-metrics-collector to consul-telemetry-collector (#17327)
* Rename hcp-metrics-collector to consul-telemetry-collector

* Fix docs

* Fix doc comment

---------

Co-authored-by: Ashvitha Sridharan <ashvitha.sridharan@hashicorp.com>
2023-05-16 14:36:05 -04:00
Dan Stough 4a245b2bff
fix(connect envoy): set initial_fetch_timeout to wait for initial xDS… (#17317)
* fix(connect envoy): set initial_fetch_timeout to wait for initial xDS indefinitely

---------

Co-authored-by: Kiril Angov <kiril.angov@gmail.com>
2023-05-15 10:45:16 -04:00
Dan Bond 6bb7782745
agent: prevent very old servers re-joining a cluster with stale data (#17171)
* agent: configure server lastseen timestamp

Signed-off-by: Dan Bond <danbond@protonmail.com>

* use correct config

Signed-off-by: Dan Bond <danbond@protonmail.com>

* add comments

Signed-off-by: Dan Bond <danbond@protonmail.com>

* use default age in test golden data

Signed-off-by: Dan Bond <danbond@protonmail.com>

* add changelog

Signed-off-by: Dan Bond <danbond@protonmail.com>

* fix runtime test

Signed-off-by: Dan Bond <danbond@protonmail.com>

* agent: add server_metadata

Signed-off-by: Dan Bond <danbond@protonmail.com>

* update comments

Signed-off-by: Dan Bond <danbond@protonmail.com>

* correctly check if metadata file does not exist

Signed-off-by: Dan Bond <danbond@protonmail.com>

* follow instructions for adding new config

Signed-off-by: Dan Bond <danbond@protonmail.com>

* add comments

Signed-off-by: Dan Bond <danbond@protonmail.com>

* update comments

Signed-off-by: Dan Bond <danbond@protonmail.com>

* Update agent/agent.go

Co-authored-by: Dan Upton <daniel@floppy.co>

* agent/config: add validation for duration with min

Signed-off-by: Dan Bond <danbond@protonmail.com>

* docs: add new server_rejoin_age_max config definition

Signed-off-by: Dan Bond <danbond@protonmail.com>

* agent: add unit test for checking server last seen

Signed-off-by: Dan Bond <danbond@protonmail.com>

* agent: log continually for 60s before erroring

Signed-off-by: Dan Bond <danbond@protonmail.com>

* pr comments

Signed-off-by: Dan Bond <danbond@protonmail.com>

* remove unneeded todo

* agent: fix error message

Signed-off-by: Dan Bond <danbond@protonmail.com>

---------

Signed-off-by: Dan Bond <danbond@protonmail.com>
Co-authored-by: Dan Upton <daniel@floppy.co>
2023-05-15 04:05:47 -07:00
R.B. Boyer 0b79707beb
grpc: ensure grpc resolver correctly uses lan/wan addresses on servers (#17270)
The grpc resolver implementation is fed from changes to the
router.Router. Within the router there is a map of various areas storing
the addressing information for servers in those areas. All map entries
are of the WAN variety except a single special entry for the LAN.

Addresses in the LAN "area" are local addresses intended
for use when making a client-to-server or server-to-server request.

The client agent correctly updates this LAN area when receiving lan serf
events, so by extension the grpc resolver works fine in that scenario.

The server agent only initially populates a single entry in the LAN area
(for itself) on startup, and then never mutates that area map again.
For normal RPCs a different structure is used for LAN routing.

Additionally when selecting a server to contact in the local datacenter
it will randomly select addresses from either the LAN or WAN addressed
entries in the map.

Unfortunately this means that the grpc resolver stack as it exists on
server agents is either broken or only accidentally functions by having
servers dial each other over the WAN-accessible address. If the operator
disables the serf wan port completely, this incidental functioning would
likely break.

This PR enforces that local requests for servers (both stale reads
and leader-forwarded requests) exclusively use the LAN "area" information
and also fixes it so that servers keep that area up to date in the
router.

A test for the grpc resolver logic was added, as well as a higher level
full-stack test to ensure the externally perceived bug does not return.
2023-05-11 11:08:57 -05:00
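
The selection rule this PR enforces can be sketched as follows. This is a simplified model under stated assumptions, not the router or resolver code: the `lanArea` key, the `Server` type, and `LocalServerAddrs` are all hypothetical. It shows only the idea that local requests draw addresses from the LAN "area" and never from WAN entries.

```go
package main

import "sort"

// lanArea is a hypothetical key for the single LAN entry in the area map.
const lanArea = "lan"

// Server is a simplified, hypothetical view of a server's addressing info.
type Server struct {
	Name       string
	Addr       string
	Datacenter string
}

// LocalServerAddrs returns addresses for servers in localDC, reading only
// the LAN area so that WAN-addressed entries are never chosen for local
// requests. The result is sorted to keep the output deterministic.
func LocalServerAddrs(areas map[string][]Server, localDC string) []string {
	var addrs []string
	for _, s := range areas[lanArea] {
		if s.Datacenter == localDC {
			addrs = append(addrs, s.Addr)
		}
	}
	sort.Strings(addrs)
	return addrs
}
```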
cskh 3efe8406e4
snapshot: some improvements to the snapshot process (#17236)
* snapshot: some improvements to the snapshot process

Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
2023-05-09 15:28:52 -04:00
Derek Menteer 3ce5277217
Fix multiple issues related to proxycfg health queries. (#17241)
Fix multiple issues related to proxycfg health queries.

1. The datacenter was not being provided to a proxycfg query, which resulted in
bypassing agentless query optimizations and using the normal API instead.

2. The health rpc endpoint would return a zero index when insufficient ACLs were
detected. This would result in the agent cache performing an infinite loop of
queries in rapid succession without backoff.
2023-05-09 12:37:58 -05:00
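
Item 2 describes why a zero index is dangerous: a blocking query sent with index 0 returns immediately, so a caller that reuses the returned index spins without backoff. The fix in this commit is on the server side. The sketch below instead shows a defensive client-side guard of the kind Consul's blocking-query guidance recommends. `NextWaitIndex` is a hypothetical helper, not part of the agent cache.

```go
package main

// NextWaitIndex returns the index to send with the next blocking query,
// given the index used for the previous request and the index returned.
func NextWaitIndex(last, returned uint64) uint64 {
	switch {
	case returned == 0:
		// A zero index would make the next request return immediately,
		// so clamp to 1 to make sure the next query actually blocks.
		return 1
	case returned < last:
		// The index went backwards (for example after a restore):
		// reset and start over with a non-blocking request.
		return 0
	default:
		return returned
	}
}
```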
Derek Menteer 73b65228f5
Fix issue with peer stream node cleanup. (#17235)
Fix issue with peer stream node cleanup.

This commit encompasses a few problems that are closely related due to their
proximity in the code.

1. The peerstream utilizes node IDs in several locations to determine which
nodes / services / checks should be cleaned up or created. While VM deployments
with agents will likely always have a node ID, agentless uses synthetic nodes
and does not populate the field. This means that for consul-k8s deployments, all
services were likely bundled together into the same synthetic node in some code
paths (but not all), resulting in strange behavior. The Node.Node field should
be used instead as a unique identifier, as it should always be populated.

2. The peerstream cleanup process for unused nodes uses an incorrect query for
node deregistration. This query is NOT namespace aware and results in the node
(and corresponding services) being deregistered prematurely whenever it has zero
default-namespace services and 1+ non-default-namespace services registered on
it. This issue is tricky to find due to the incorrect logic mentioned in #1,
combined with the fact that the affected services must be co-located on the same
node as the currently deregistering service for this to be encountered.

3. The stream tracker did not understand differences between services in
different namespaces and could therefore report incorrect numbers. It was
updated to utilize the full service name to avoid conflicts and return proper
results.
2023-05-08 13:13:25 -05:00
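
Items 2 and 3 both come down to treating the namespace as part of a service's identity. A minimal sketch under that assumption follows. The `Service` type, `ShouldDeregisterNode`, and `ServiceKey` are hypothetical and much simpler than the peerstream code.

```go
package main

// Service is a hypothetical, simplified catalog entry.
type Service struct {
	Node      string // node name (Node.Node), assumed to be always populated
	Namespace string
	Name      string
}

// ShouldDeregisterNode reports whether a node has no remaining services in
// any namespace. Checking only the default namespace would deregister the
// node too early.
func ShouldDeregisterNode(services []Service, node string) bool {
	for _, s := range services {
		if s.Node == node {
			return false
		}
	}
	return true
}

// ServiceKey builds a namespace-qualified key so that services with the
// same name in different namespaces are counted separately.
func ServiceKey(s Service) string {
	return s.Namespace + "/" + s.Name
}
```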
John Murret 7c101c27c3
security: update go version to 1.20.4 (#17240)
* update go version to 1.20.3

* add changelog

* rename changelog file to remove underscore

* update to use 1.20.4

* update change log entry to reflect 1.20.4
2023-05-08 11:57:11 -06:00
John Eikenberry 0210211a69
enable auto-tidy expired issuers in vault (as CA)
When using vault as a CA and generating the local signing cert, try to
enable the PKI endpoint's auto-tidy feature with it set to tidy expired
issuers.
2023-05-03 20:30:37 +00:00
Eric Haberkorn 47a7e52098
fix panic in `injectSANMatcher` when `tlsContext` is `nil` (#17185) 2023-04-28 16:27:57 -04:00
Paul Glass e1cff98a8f
Permissive mTLS: Config entry filtering and CLI warnings (#17183)
This adds filtering for service-defaults: consul config list -filter 'MutualTLSMode == "permissive"'.

It adds CLI warnings when the CLI writes a config entry and sees that either service-defaults or proxy-defaults contains MutualTLSMode=permissive, or sees that the mesh config entry contains AllowEnablingPermissiveMutualTLS=true.
2023-04-28 12:51:36 -05:00
R.B. Boyer 064392441f
peering: ensure that merged central configs of peered upstreams for partitioned downstreams work (#17179)
Partitioned downstreams with peered upstreams could not properly merge central config info (i.e. proxy-defaults and service-defaults things like mesh gateway modes) if the upstream had an empty DestinationPartition field in Enterprise.

Due to data flow, if this setup is done using Consul client agents the field is never empty and thus does not experience the bug.

When a service is registered directly to the catalog, as is the case for consul-dataplane use, this field may be empty, and the internal machinery of the merging function doesn't handle this well.

This PR ensures the internal machinery of that function is referentially self-consistent.
2023-04-28 12:36:08 -05:00
John Landa b9cf6579e6
Remove artificial ACLTokenMaxTTL limit for configuring acl token expiry (#17066)
* Remove artificial ACLTokenMaxTTL limit for configuring acl token expiry

* Add changelog

* Remove test on default MaxTokenTTL

* Change to imperative tense for changelog entry
2023-04-28 10:57:30 -05:00
Freddy 29d5811f0d
Update HCP bootstrapping to support existing clusters (#16916)
* Persist HCP management token from server config

We want to move away from injecting an initial management token into
Consul clusters linked to HCP. The reasoning is that by using a separate
class of token we can have more flexibility in terms of allowing HCP's
token to co-exist with the user's management token.

Down the line we can also more easily adjust the permissions attached to
HCP's token to limit its scope.

With these changes, the cloud management token is like the initial
management token in that it has the same global management policy and
if it is created it effectively bootstraps the ACL system.

* Update SDK and mock HCP server

The HCP management token will now be sent in a special field rather than
as Consul's "initial management" token configuration.

This commit also updates the mock HCP server to more accurately reflect
the behavior of the CCM backend.

* Refactor HCP bootstrapping logic and add tests

We want to allow users to link Consul clusters that already exist to
HCP. Existing clusters need care when bootstrapped by HCP, since we do
not want to do things like change ACL/TLS settings for a running
cluster.

Additional changes:

* Deconstruct MaybeBootstrap so that it can be tested. The HCP Go SDK
  requires HTTPS to fetch a token from the Auth URL, even if the backend
  server is mocked. By pulling the hcp.Client creation out we can modify
  its TLS configuration in tests while keeping the secure behavior in
  production code.

* Add light validation for data received/loaded.

* Sanitize initial_management token from received config, since HCP will
  only ever use the CloudConfig.MangementToken.

* Add changelog entry
2023-04-27 22:27:39 +02:00
John Maguire d19a7dad68
APIGW: Update how status conditions for certificates are handled (#17115)
* Move status condition for invalid certificate to reference the listener
that is using the certificate

* Fix where we set the condition status for listeners and certificate
refs, added tests

* Add changelog
2023-04-27 15:54:44 +00:00
Semir Patel 406c1afc04
Support Envoy's MaxEjectionPercent and BaseEjectionTime config entries for passive health checks (#15979)
* Add MaxEjectionPercent to config entry

* Add BaseEjectionTime to config entry

* Add MaxEjectionPercent and BaseEjectionTime to protobufs

* Add MaxEjectionPercent and BaseEjectionTime to api

* Fix integration test breakage

* Verify MaxEjectionPercent and BaseEjectionTime in integration test upstream configs

* Website docs for MaxEjectionPercent and BaseEjectionTime

* Add `make docs` to browse docs at http://localhost:3000

* Changelog entry

* so that is the difference between consul-docker and dev-docker

* blah

* update proto funcs

* update proto

---------

Co-authored-by: Maliz <maliheh.monshizadeh@hashicorp.com>
2023-04-26 15:59:48 -07:00
Anita Akaeze b0674f7d6d
Merge pull request #5200 from hashicorp/NET-3758 (#17102)
* Merge pull request #5200 from hashicorp/NET-3758

NET-3758: connect: update supported envoy versions to 1.26.0

* lint
2023-04-24 18:23:24 +00:00
Paul Banks 062cd72607
Bump raft to 1.5.0 (#17081)
* Bump raft to 1.5.0

* Add CHANGELOG entry

* Add CHANGELOG entry with right extension (thanks VSCode)

* Add CHANGELOG entry with right extension (thanks VSCode)

* Go mod tidy
2023-04-21 20:13:55 +01:00
Paul Glass d8d89d4b59
Permissive mTLS (#17035)
This implements permissive mTLS, which allows toggling services into "permissive" mTLS mode.
Permissive mTLS mode allows incoming "non Consul-mTLS" traffic to be forwarded unmodified to the application.

* Update service-defaults and proxy-defaults config entries with a MutualTLSMode field
* Update the mesh config entry with an AllowEnablingPermissiveMutualTLS field and implement the necessary validation. AllowEnablingPermissiveMutualTLS must be true to allow changing to MutualTLSMode=permissive, but this does not require that all proxy-defaults and service-defaults are currently in strict mode.
* Update xDS listener config to add a "permissive filter chain" when MutualTLSMode=permissive for a particular service. The permissive filter chain matches incoming traffic by the destination port. If the destination port matches the service port from the catalog, then no mTLS is required and the traffic is forwarded unmodified to the application.
2023-04-19 14:45:00 -05:00
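
The matching rule for the permissive filter chain can be sketched as a small decision function. This models only the rule stated in the commit message (the destination port is compared with the catalog service port). The `MutualTLSMode` type here and `RequiresMTLS` are hypothetical names for illustration, not the xDS code.

```go
package main

// MutualTLSMode mirrors the two modes named in the commit message.
type MutualTLSMode string

const (
	ModeStrict     MutualTLSMode = "strict"
	ModePermissive MutualTLSMode = "permissive"
)

// RequiresMTLS reports whether incoming traffic must use Consul mTLS.
// In permissive mode, traffic whose destination port equals the service
// port from the catalog is forwarded unmodified without mTLS; all other
// traffic still requires mTLS.
func RequiresMTLS(mode MutualTLSMode, destPort, servicePort int) bool {
	if mode == ModePermissive && destPort == servicePort {
		return false
	}
	return true
}
```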
R.B. Boyer 5e019393d3
Revert "cache: refactor agent cache fetching to prevent unnecessary f… (#16818) (#17046)
Revert "cache: refactor agent cache fetching to prevent unnecessary fetches on error (#14956)"

Co-authored-by: Derek Menteer <105233703+hashi-derek@users.noreply.github.com>
2023-04-19 13:17:21 -05:00
Kyle Havlovitz 26128548a5
Avoid decoding nil pointer in map walker (#17048) 2023-04-19 10:23:38 -07:00
Kevin Wang 63e9c08234
Bump the golang.org/x/net to 0.7.0 to address CVE-2022-41723 (#16754)
* Bump the golang.org/x/net to 0.7.0 to address CVE-2022-41723

https://nvd.nist.gov/vuln/detail/CVE-2022-41723

* Add changelog entry

---------

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
2023-04-18 17:31:08 +00:00
Andrei Komarov 5c35095490
api: enable query options on agent force-leave endpoint (#15987) 2023-04-18 11:31:48 -05:00
Nathan Coleman ad5a4201d5
Update list of Envoy versions (#16889)
* Update list of Envoy versions

* Update docs + CI + tests

* Add changelog entry

* Add newly-released Envoy versions 1.23.8 and 1.24.6

* Add newly-released Envoy version 1.22.11
2023-04-12 17:43:15 -04:00
Dhia Ayachi 825663b38a
Memdb Txn Commit race condition fix (#16871)
* Add a test to reproduce the race condition

* Fix race condition by publishing the event after the commit and adding a lock to prevent out of order events.

* split publish to generate the list of events before committing the transaction.

* add changelog

* remove extra func

* Apply suggestions from code review

Co-authored-by: Dan Upton <daniel@floppy.co>

* add comment to explain test

---------

Co-authored-by: Dan Upton <daniel@floppy.co>
2023-04-12 13:18:01 -04:00
Derek Menteer 2a13c9af1f
Remove deprecated service-defaults upstream behavior. (#16957)
Prior to this change, peer services would be targeted by service-default
overrides as long as the new `peer` field was not found in the config entry.
This commit removes that deprecated backwards-compatibility behavior. Now
it is necessary to specify the `peer` field in order for upstream overrides
to apply to a peer upstream.
2023-04-11 10:20:33 -05:00
Chris Thain f9126b6c3a
Wasm Envoy HTTP extension (#16877) 2023-04-06 14:12:07 -07:00
Freddy 04e6e79b09
Allow dialer to re-establish terminated peering (#16776)
Currently, if an acceptor peer deletes a peering the dialer's peering
will eventually get to a "terminated" state. If the two clusters need to
be re-peered the acceptor will re-generate the token but the dialer will
encounter this error on the call to establish:

"failed to get addresses to dial peer: failed to refresh peer server
addresses, will continue to use initial addresses: there is no active
peering for "<<<ID>>>""

This is because in `exchangeSecret().GetDialAddresses()` we will get an
error if fetching addresses for an inactive peering. The peering shows
up as inactive at this point because of the existing terminated state.

Rather than checking whether a peering is active we can instead check
whether it was deleted. This way users do not need to delete terminated
peerings in the dialing cluster before re-establishing them.
2023-04-03 12:07:45 -06:00
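
The change in the check can be illustrated with a short sketch. The `Peering` type, its fields, and `CheckDialable` are hypothetical simplifications. The point is only that a terminated but not deleted peering no longer blocks fetching dial addresses.

```go
package main

import "errors"

// Peering is a hypothetical, simplified peering record.
type Peering struct {
	State   string // for example "ACTIVE" or "TERMINATED"
	Deleted bool   // true once the peering has been marked for deletion
}

// ErrPeeringDeleted is returned when addresses are requested for a
// peering that has been deleted.
var ErrPeeringDeleted = errors.New("peering was deleted")

// CheckDialable rejects only deleted peerings. It deliberately ignores
// State, so a terminated peering can be re-established without first
// deleting it in the dialing cluster.
func CheckDialable(p Peering) error {
	if p.Deleted {
		return ErrPeeringDeleted
	}
	return nil
}
```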
Eric Haberkorn b97a3a17d8
add order by locality failover to Consul enterprise (#16791) 2023-03-30 10:08:38 -04:00
John Maguire 09512ae32d
Update normalization of route refs (#16789)
* Use merge of enterprise metas rather than new custom method

* Add merge logic for tcp routes

* Add changelog

* Normalize certificate refs on gateways

* Fix infinite call loop

* Explicitly call enterprise meta
2023-03-28 11:23:49 -04:00
Michael Wilkerson baa1fd3cd6
changes to support new PQ enterprise fields (#16793) 2023-03-27 15:40:49 -07:00
John Maguire 74dfee9359
Fix struct tags for TCPService enterprise meta (#16781)
* Fix struct tags for TCPService enterprise meta

* Add changelog
2023-03-27 16:17:04 +00:00
Derek Menteer 5be6469506
Change partition for peers in discovery chain targets (#16769)
This commit swaps the partition field to the local partition for
discovery chains targeting peers. Prior to this change, peer upstreams
would always use a value of default regardless of which partition they
exist in. This caused several issues in xds / proxycfg because of id
mismatches.

Some prior fixes were made to deal with one-off id mismatches that this
PR also cleans up, since they are no longer needed.
2023-03-24 15:40:19 -05:00
Luke Kysow ea91629a83
Changelog for audit logging fix. (#16700)
* Changelog for audit logging fix.
2023-03-22 13:06:53 -07:00
Eric Haberkorn d7c81a3b1d
fix bug where pqs that failover to a cluster peer don't un-fail over (#16729) 2023-03-22 09:24:13 -04:00
Nitya Dhanushkodi 69bd62f9c3
peering: peering partition failover fixes (#16673)
add local source partition for peered upstreams
2023-03-20 10:00:29 -07:00
John Maguire 2e07180662
Fix route subscription when using namespaces (#16677)
* Fix route subscription when using namespaces

* Update changelog

* Fix changelog entry to reference that the bug was enterprise only
2023-03-20 12:42:30 -04:00
Melisa Griffin fa1b6e7450
Adds check to verify that the API Gateway is being created with at least one listener 2023-03-20 12:37:30 -04:00
Dhia Ayachi 5a9948fab7
Snapshot restore tests (#16647)
* add snapshot restore test

* add logstore as test parameter

* Use the correct image version

* make sure we read the logs from a follower to test the follower snapshot install path.

* update to raft-wal v0.3.0

* add changelog.

* updating changelog for bug description and removed integration test.

* setting up test container builder to only set logStore for 1.15 and higher

---------

Co-authored-by: Paul Banks <pbanks@hashicorp.com>
Co-authored-by: John Murret <john.murret@hashicorp.com>
2023-03-18 14:43:22 -06:00
Andrew Stucki a597cb3d57
[API Gateway] Fix invalid cluster causing gateway programming delay (#16661)
* Add test for http routes

* Add fix

* Fix tests

* Add changelog entry

* Refactor and fix flaky tests
2023-03-17 13:31:04 -04:00
Valeriia Ruban 64f5e20793
fix: add AccessorID property to PUT token request (#16660) 2023-03-16 18:57:59 -07:00
Valeriia Ruban f404d3eb13
feat: update typography to consume hds styles (#16577) 2023-03-14 19:49:14 -07:00
Derek Menteer f3be5d9b80
Fix issue with trust bundle read ACL check. (#16630)
This commit fixes an issue where trust bundles could not be read
by services in a non-default namespace, unless they had excessive
ACL permissions given to them.

Prior to this change, `service:write` was required in the default
namespace in order to read the trust bundle. Now, `service:write`
to a service in any namespace is sufficient.
2023-03-14 12:24:33 -05:00
Chris S. Kim bb4baeba95
Preserve CARoots when updating Vault CA configuration (#16592)
If a CA config update did not cause a root change, the codepath would return early and skip some steps which preserve its intermediate certificates and signing key ID. This commit re-orders some code and prevents updates from generating new intermediate certificates.
2023-03-13 17:32:59 -04:00
Ashvitha f514182f3e
Allow HCP metrics collection for Envoy proxies
Co-authored-by: Ashvitha Sridharan <ashvitha.sridharan@hashicorp.com>
Co-authored-by: Freddy <freddygv@users.noreply.github.com>

Add a new envoy flag: "envoy_hcp_metrics_bind_socket_dir", a directory
where a unix socket will be created with the name
`<namespace>_<proxy_id>.sock` to forward Envoy metrics.

If set, this will configure:
- In bootstrap configuration a local stats_sink and static cluster.
  These will forward metrics to a loopback listener sent over xDS.

- A dynamic listener listening at the socket path that the previously
  defined static cluster is sending metrics to.

- A dynamic cluster that will forward traffic received at this listener
  to the hcp-metrics-collector service.


Reasons for having a static cluster pointing at a dynamic listener:
- We want to secure the metrics stream using TLS, but the stats sink can
  only be defined in bootstrap config. With dynamic listeners/clusters
  we can use the proxy's leaf certificate issued by the Connect CA,
  which isn't available at bootstrap time.

- We want to intelligently route to the HCP collector. Configuring its
  address at bootstrap time limits our flexibility routing-wise. More
  on this below.

Reasons for defining the collector as an upstream in `proxycfg`:
- The HCP collector will be deployed as a mesh service.

- Certificate management is taken care of, as mentioned above.

- Service discovery and routing logic is automatically taken care of,
  meaning that no code changes are required in the xds package.

- Custom routing rules can be added for the collector using discovery
  chain config entries. Initially the collector is expected to be
  deployed to each admin partition, but in the future could be deployed
  centrally in the default partition. These config entries could even be
  managed by HCP itself.
2023-03-10 13:52:54 -07:00
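
The socket naming scheme from this commit, `<namespace>_<proxy_id>.sock` inside the directory given by `envoy_hcp_metrics_bind_socket_dir`, can be sketched as a small helper. `MetricsSocketPath` is a hypothetical function name, and the sketch assumes Unix-style paths. Any sanitizing or length limits the real code may apply are not shown.

```go
package main

import (
	"fmt"
	"path"
)

// MetricsSocketPath joins the configured socket directory with a file name
// of the form <namespace>_<proxy_id>.sock.
func MetricsSocketPath(dir, namespace, proxyID string) string {
	return path.Join(dir, fmt.Sprintf("%s_%s.sock", namespace, proxyID))
}
```

For example, a directory of `/tmp/consul/hcp-metrics`, namespace `default`, and proxy ID `web-sidecar-proxy` would give `/tmp/consul/hcp-metrics/default_web-sidecar-proxy.sock`.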
Tyler Wendlandt 00caa78594
UI: Fix htmlsafe errors throughout the app (#16574)
* Upgrade ember-intl

* Add changelog

* Add yarn lock
2023-03-09 12:43:35 -07:00