Commit Graph

483 Commits

Author SHA1 Message Date
freddygv ad6dbe081a Add agent cache-type for TrustBundleListByService
There are a handful of changes in this commit:
* When querying trust bundles for a service we need to be able to
  specify the namespace of the service.
* The endpoint needs to track the index because the cache watches use
  it.
* Extracted bulk of the endpoint's logic to a state store function
  so that index tracking could be tested more easily.
* Removed check for service existence, deferring that sort of work to ACL authz
* Added the cache type
2022-06-01 17:05:10 -06:00
Freddy 6ef38eaea7
Configure upstream TLS context with peer root certs (#13321)
For mTLS to work between two proxies in peered clusters with different root CAs,
proxies need to configure their outbound listener to use different root certificates
for validation.

Up until peering was introduced proxies would only ever use one set of root certificates
to validate all mesh traffic, both inbound and outbound. Now an upstream proxy
may have a leaf certificate signed by a CA that's different from the dialing proxy's.

This PR makes changes to proxycfg and xds so that the upstream TLS validation
uses different root certificates depending on which cluster is being dialed.
2022-06-01 15:53:52 -06:00
Dan Upton e6dc26e087
proxycfg: replace direct agent cache usage with interfaces (#13320)
This is the OSS portion of enterprise PRs 1904, 1905, 1906, 1907, 1949,
and 1971.

It replaces the proxycfg manager's direct dependency on the agent cache
with interfaces that will be implemented differently when serving xDS
sessions from a Consul server.
2022-06-01 16:18:06 +01:00
Dan Upton a6a6d5a8ee
Enable servers to configure arbitrary proxies from the catalog (#13244)
OSS port of enterprise PR 1822

Includes the necessary changes to the `proxycfg` and `xds` packages to enable
Consul servers to configure arbitrary proxies using catalog data.

Broadly, `proxycfg.Manager` now has public methods for registering,
deregistering, and listing registered proxies — the existing local agent
state-sync behavior has been moved into a separate component that makes use of
these methods.

When an xDS session is started for a proxy service in the catalog, a goroutine
will be spawned to watch the service in the server's state store and
re-register it with the `proxycfg.Manager` whenever it is updated (and clean
it up when the client goes away).
2022-05-27 12:38:52 +01:00
Dan Upton 30775ed54d
proxycfg: remove dependency on `cache.UpdateEvent` (#13144)
OSS portion of enterprise PR 1857.

This removes (most) references to the `cache.UpdateEvent` type in the
`proxycfg` package.

As we're going to be direct usage of the agent cache with interfaces that
can be satisfied by alternative server-local datasources, it doesn't make
sense to depend on this type everywhere anymore (particularly on the
`state.ch` channel).

We also plan to extract `proxycfg` out of Consul into a shared library in
the future, which would require removing this dependency.

Aside from a fairly rote find-and-replace, the main change is that the
`cache.Cache` and `health.Client` types now accept a callback function
parameter, rather than a `chan<- cache.UpdateEvents`. This allows us to
do the type conversion without running another goroutine.
2022-05-20 15:47:40 +01:00
cskh df27fa0c84
Retry on bad dogstatsd connection (#13091)
- Introduce a new telemetry configurable parameter retry_failed_connection. User can set the value to true to let consul agent continue its start process on failed connection to datadog server. When set to false, agent will stop on failed start. The default behavior is true.

Co-authored-by: Dan Upton <daniel@floppy.co>
Co-authored-by: Evan Culver <eculver@users.noreply.github.com>
2022-05-19 16:03:46 -04:00
Matt Keeler 42aec5caf4
Watch the singular service resolver instead of the list + filtering to 1 (#13012)
* Watch the singular service resolver instead of the list + filtering to 1

* Rename the ConfigEntries cache type to ConfigEntryList
2022-05-12 16:34:17 -04:00
Mark Anderson be059184dd Fix up enterprise version tag.
Changes to how the version string was handled created small regression with the release of consul 1.12.0 enterprise.

Many tools use the Config:Version field reported by the agent/self resource to determine whether Consul is an enterprise or OSS instance, expect something like 1.12.0+ent for enterprise and simply 1.12.0 for OSS. This was accidentally broken during the runup to 1.12.x

This work fixes the value returned by both the self endpoint in ["Config"]["Version"] and the metrics consul.version field.

Signed-off-by: Mark Anderson <manderson@hashicorp.com>
2022-05-09 10:07:19 -07:00
Dhia Ayachi 9dc5200155
update raft to v1.3.8 (#12844)
* update raft to v1.3.7

* add changelog

* fix compilation error

* fix HeartbeatTimeout

* fix ElectionTimeout to reload only if value is valid

* fix default values for `ElectionTimeout` and `HeartbeatTimeout`

* fix test defaults

* bump raft to v1.3.8
2022-04-25 10:19:26 -04:00
R.B. Boyer 809344a6f5
peering: initial sync (#12842)
- Add endpoints related to peering: read, list, generate token, initiate peering
- Update node/service/check table indexing to account for peers
- Foundational changes for pushing service updates to a peer
- Plumb peer name through Health.ServiceNodes path

see: ENT-1765, ENT-1280, ENT-1283, ENT-1283, ENT-1756, ENT-1739, ENT-1750, ENT-1679,
     ENT-1709, ENT-1704, ENT-1690, ENT-1689, ENT-1702, ENT-1701, ENT-1683, ENT-1663,
     ENT-1650, ENT-1678, ENT-1628, ENT-1658, ENT-1640, ENT-1637, ENT-1597, ENT-1634,
     ENT-1613, ENT-1616, ENT-1617, ENT-1591, ENT-1588, ENT-1596, ENT-1572, ENT-1555

Co-authored-by: R.B. Boyer <rb@hashicorp.com>
Co-authored-by: freddygv <freddy@hashicorp.com>
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: Evan Culver <eculver@hashicorp.com>
Co-authored-by: Nitya Dhanushkodi <nitya@hashicorp.com>
2022-04-21 17:34:40 -05:00
Jared Kirschner 9a81a77264
Merge pull request #12725 from hashicorp/clarify-service-deregister-after-critical-message
improve error msg for deregister critical service
2022-04-07 18:01:54 -04:00
Jared Kirschner 1835e761fd improve error msg for deregister critical service
If a service is automatically registered because it has a critical health check
for longer than deregister_critical_service_after, the error message will now
include:
- mention of the deregister_critical_service_after option
- the value of deregister_critical_service_after for that check
2022-04-07 14:50:02 -07:00
Kyle Havlovitz a2c28526a7
Merge pull request #12685 from hashicorp/http-check-redirect-option
Add a field to disable following redirects on http checks
2022-04-07 11:29:27 -07:00
Kyle Havlovitz b3db499c74 Add a field to disable following redirects on http checks 2022-04-05 16:12:18 -07:00
Mark Anderson ed3e42296d Fixup acl.EnterpriseMeta
Signed-off-by: Mark Anderson <manderson@hashicorp.com>
2022-04-05 15:11:49 -07:00
Dhia Ayachi cdcb249449
add a rate limiter to config auto-reload (#12490)
* add config watcher to the config package

* add logging to watcher

* add test and refactor to add WatcherEvent.

* add all API calls and fix a bug with recreated files

* add tests for watcher

* remove the unnecessary use of context

* Add debug log and a test for file rename

* use inode to detect if the file is recreated/replaced and only listen to create events.

* tidy ups (#1535)

* tidy ups

* Add tests for inode reconcile

* fix linux vs windows syscall

* fix linux vs windows syscall

* fix windows compile error

* increase timeout

* use ctime ID

* remove remove/creation test as it's a use case that fail in linux

* fix linux/windows to use Ino/CreationTime

* fix the watcher to only overwrite current file id

* fix linter error

* fix remove/create test

* set reconcile loop to 200 Milliseconds

* fix watcher to not trigger event on remove, add more tests

* on a remove event try to add the file back to the watcher and trigger the handler if success

* fix race condition

* fix flaky test

* fix race conditions

* set level to info

* fix when file is removed and get an event for it after

* fix to trigger handler when we get a remove but re-add fail

* fix error message

* add tests for directory watch and fixes

* detect if a file is a symlink and return an error on Add

* rename Watcher to FileWatcher and remove symlink deref

* add fsnotify@v1.5.1

* fix go mod

* do not reset timer on errors, rename OS specific files

* rename New func

* events trigger on write and rename

* add missing test

* fix flaking tests

* fix flaky test

* check reconcile when removed

* delete invalid file

* fix test to create files with different mod time.

* back date file instead of sleeping

* add watching file in agent command.

* fix watcher call to use new API

* add configuration and stop watcher when server stop

* add certs as watched files

* move FileWatcher to the agent start instead of the command code

* stop watcher before replacing it

* save watched files in agent

* add add and remove interfaces to the file watcher

* fix remove to not return an error

* use `Add` and `Remove` to update certs files

* fix tests

* close events channel on the file watcher even when the context is done

* extract `NotAutoReloadableRuntimeConfig` is a separate struct

* fix linter errors

* add Ca configs and outgoing verify to the not auto reloadable config

* add some logs and fix to use background context

* add tests to auto-config reload

* remove stale test

* add tests to changes to config files

* add check to see if old cert files still trigger updates

* rename `NotAutoReloadableRuntimeConfig` to `StaticRuntimeConfig`

* fix to re add both key and cert file. Add test to cover this case.

* review suggestion

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* add check to static runtime config changes

* fix test

* add changelog file

* fix review comments

* Apply suggestions from code review

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* update flag description

Co-authored-by: FFMMM <FFMMM@users.noreply.github.com>

* fix compilation error

* add static runtime config support

* fix test

* fix review comments

* fix log test

* Update .changelog/12329.txt

Co-authored-by: Dan Upton <daniel@floppy.co>

* transfer tests to runtime_test.go

* fix filewatcher Replace to not deadlock.

* avoid having lingering locks

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* split ReloadConfig func

* fix warning message

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* convert `FileWatcher` into an interface

* fix compilation errors

* fix tests

* extract func for adding and removing files

* add a coalesceTimer with a very small timer

* extract coaelsce Timer and add a shim for testing

* add tests to coalesceTimer fix to send remaining events

* set `coalesceTimer` to 1 Second

* support symlink, fix a nil deref.

* fix compile error

* fix compile error

* refactor file watcher rate limiting to be a Watcher implementation

* fix linter issue

* fix runtime config

* fix runtime test

* fix flaky tests

* fix compile error

* Apply suggestions from code review

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* fix agent New to return an error if File watcher New return an error

* quit timer loop if ctx is canceled

* Apply suggestions from code review

Co-authored-by: Chris S. Kim <ckim@hashicorp.com>

Co-authored-by: Ashwin Venkatesh <ashwin@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
Co-authored-by: FFMMM <FFMMM@users.noreply.github.com>
Co-authored-by: Daniel Upton <daniel@floppy.co>
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
2022-04-04 11:31:39 -04:00
FFMMM b0cba2ec03
mark disable_compat_1.9 to deprecate in 1.13, change default to true (#12675)
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
2022-04-01 10:35:56 -07:00
Dhia Ayachi 8552efa955
auto-reload configuration when config files change (#12329)
* add config watcher to the config package

* add logging to watcher

* add test and refactor to add WatcherEvent.

* add all API calls and fix a bug with recreated files

* add tests for watcher

* remove the unnecessary use of context

* Add debug log and a test for file rename

* use inode to detect if the file is recreated/replaced and only listen to create events.

* tidy ups (#1535)

* tidy ups

* Add tests for inode reconcile

* fix linux vs windows syscall

* fix linux vs windows syscall

* fix windows compile error

* increase timeout

* use ctime ID

* remove remove/creation test as it's a use case that fail in linux

* fix linux/windows to use Ino/CreationTime

* fix the watcher to only overwrite current file id

* fix linter error

* fix remove/create test

* set reconcile loop to 200 Milliseconds

* fix watcher to not trigger event on remove, add more tests

* on a remove event try to add the file back to the watcher and trigger the handler if success

* fix race condition

* fix flaky test

* fix race conditions

* set level to info

* fix when file is removed and get an event for it after

* fix to trigger handler when we get a remove but re-add fail

* fix error message

* add tests for directory watch and fixes

* detect if a file is a symlink and return an error on Add

* rename Watcher to FileWatcher and remove symlink deref

* add fsnotify@v1.5.1

* fix go mod

* do not reset timer on errors, rename OS specific files

* rename New func

* events trigger on write and rename

* add missing test

* fix flaking tests

* fix flaky test

* check reconcile when removed

* delete invalid file

* fix test to create files with different mod time.

* back date file instead of sleeping

* add watching file in agent command.

* fix watcher call to use new API

* add configuration and stop watcher when server stop

* add certs as watched files

* move FileWatcher to the agent start instead of the command code

* stop watcher before replacing it

* save watched files in agent

* add add and remove interfaces to the file watcher

* fix remove to not return an error

* use `Add` and `Remove` to update certs files

* fix tests

* close events channel on the file watcher even when the context is done

* extract `NotAutoReloadableRuntimeConfig` is a separate struct

* fix linter errors

* add Ca configs and outgoing verify to the not auto reloadable config

* add some logs and fix to use background context

* add tests to auto-config reload

* remove stale test

* add tests to changes to config files

* add check to see if old cert files still trigger updates

* rename `NotAutoReloadableRuntimeConfig` to `StaticRuntimeConfig`

* fix to re add both key and cert file. Add test to cover this case.

* review suggestion

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* add check to static runtime config changes

* fix test

* add changelog file

* fix review comments

* Apply suggestions from code review

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* update flag description

Co-authored-by: FFMMM <FFMMM@users.noreply.github.com>

* fix compilation error

* add static runtime config support

* fix test

* fix review comments

* fix log test

* Update .changelog/12329.txt

Co-authored-by: Dan Upton <daniel@floppy.co>

* transfer tests to runtime_test.go

* fix filewatcher Replace to not deadlock.

* avoid having lingering locks

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* split ReloadConfig func

* fix warning message

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* convert `FileWatcher` into an interface

* fix compilation errors

* fix tests

* extract func for adding and removing files

Co-authored-by: Ashwin Venkatesh <ashwin@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
Co-authored-by: FFMMM <FFMMM@users.noreply.github.com>
Co-authored-by: Daniel Upton <daniel@floppy.co>
2022-03-31 15:11:49 -04:00
Dan Upton fb441e323a
Restructure gRPC server setup (#12586)
OSS sync of enterprise changes at 0b44395e
2022-03-22 12:40:24 +00:00
Dan Upton 57f0f42733
Support per-listener TLS configuration ⚙️ (#12504)
Introduces the capability to configure TLS differently for Consul's
listeners/ports (i.e. HTTPS, gRPC, and the internal multiplexed RPC
port) which is useful in scenarios where you may want the HTTPS or
gRPC interfaces to present a certificate signed by a well-known/public
CA, rather than the certificate used for internal communication which
must have a SAN in the form `server.<dc>.consul`.
2022-03-18 10:46:58 +00:00
Eric 07bf0c5ce8 Create and wire up the serverless patcher 2022-03-15 10:12:57 -04:00
Evan Culver 067223337d
checks: populate interval and timeout when registering services (#11138) 2022-02-18 12:05:33 -08:00
Dhia Ayachi de7598f064
fix race when starting a service while the agent `serviceManager` is … (#12302)
* fix race when starting a service while the agent `serviceManager` is stopping

* add changelog
2022-02-10 13:30:49 -05:00
Daniel Nephin ed1cc5f255 acl: remove ResolveTokenToIdentity
By exposing the AccessorID from the primary ResolveToken method we can
remove this duplication.
2022-01-22 14:47:59 -05:00
Daniel Nephin 26f0ebd96f acl: return a resposne from ResolveToken that includes the ACLIdentity
So that we can duplicate duplicate methods.
2022-01-22 14:33:09 -05:00
Dan Upton 088ba2edaf
[OSS] Remove remaining references to master (#11827) 2022-01-20 12:47:50 +00:00
Dan Upton 8bc11b08dc
Rename `ACLMasterToken` => `ACLInitialManagementToken` (#11746) 2021-12-07 12:39:28 +00:00
Freddy 3791d6d7da
Merge pull request #11720 from hashicorp/bbolt 2021-12-03 14:44:36 -07:00
R.B. Boyer 6ec84cfbe2
agent: add variation of force-leave that exclusively works on the WAN (#11722)
Fixes #6548
2021-12-02 17:15:10 -06:00
Daniel Nephin 8e2c71528f config: add NoFreelistSync option
# Conflicts:
#	agent/config/testdata/TestRuntimeConfig_Sanitize-enterprise.golden
#	agent/consul/server.go
2021-12-02 16:56:15 -05:00
R.B. Boyer fa7a66cd30
agent: purge service/check registration files for incorrect partitions on reload (#11607) 2021-11-18 14:44:20 -06:00
R.B. Boyer 086ff42b56
partitions: various refactors to support partitioning the serf LAN pool (#11568) 2021-11-15 09:51:14 -06:00
R.B. Boyer 7fbf749bc4
segments: ensure that the serf_lan_allowed_cidrs applies to network segments (#11495) 2021-11-04 17:17:19 -05:00
Mark Anderson e9a0fa7d36
Remove some usage of md5 from the system (#11491)
* Remove some usage of md5 from the system

OSS side of https://github.com/hashicorp/consul-enterprise/pull/1253

This is a potential security issue because an attacker could conceivably manipulate inputs to cause persistence files to collide, effectively deleting the persistence file for one of the colliding elements.

Signed-off-by: Mark Anderson <manderson@hashicorp.com>
2021-11-04 13:07:54 -07:00
Daniel Nephin 0ec2a804df
Merge pull request #10690 from tarat44/h2c-support-in-ping-checks
add support for h2c in h2 ping health checks
2021-11-02 13:53:06 -04:00
Daniel Nephin 00ed2b243f
Merge pull request #10771 from hashicorp/dnephin/emit-telemetry-metrics-immediately
telemetry: improve cert expiry metrics
2021-11-01 18:31:03 -04:00
R.B. Boyer 017e9d5ae4
agent: add a clone function for duplicating the serf lan configuration (#11443) 2021-10-28 16:11:26 -05:00
Daniel Nephin 0a19d7fd76 agent: move agent tls metric monitor to a more appropriate place
And add a test for it
2021-10-27 16:26:09 -04:00
Daniel Nephin 1b2144c982 telemetry: set cert expiry metrics to NaN on start
So that followers do not report 0, which would make alerting difficult.
2021-10-27 15:19:25 -04:00
freddygv 183849416b Register the ExportingPartitions cache type 2021-10-27 11:15:25 -06:00
R.B. Boyer e27e58c6cc
agent: refactor the agent delegate interface to be partition friendly (#11429) 2021-10-26 15:08:55 -05:00
Chris S. Kim 27f8a85664
agent: Ensure partition is considered in agent endpoints (#11427) 2021-10-26 15:20:57 -04:00
tarat44 f8b47cdfcd change config option to H2PingUseTLS 2021-10-05 00:12:21 -04:00
tarat44 ed4ca3db49 add support for h2c in h2 ping health checks 2021-10-04 22:51:08 -04:00
Daniel Nephin b8da06a34d acl: remove ACL upgrading from Clients
As part of removing the legacy ACL system ACL upgrading and the flag for
legacy ACLs is removed from Clients.

This commit also removes the 'acls' serf tag from client nodes. The tag is only ever read
from server nodes.

This commit also introduces a constant for the acl serf tag, to make it easier to track where
it is used.
2021-09-29 14:02:38 -04:00
Daniel Nephin bd28d23b55 command/envoy: stop using the DebugConfig from Self endpoint
The DebugConfig in the self endpoint can change at any time. It's not a stable API.

This commit adds the XDSPort to a stable part of the XDS api, and changes the envoy command to read
this new field.

It includes support for the old API as well, in case a newer CLI is used with an older API, and
adds a test for both cases.
2021-09-29 13:21:28 -04:00
Daniel Nephin 402d3792b6 Revert "Merge pull request #10588 from hashicorp/dnephin/config-fix-ports-grpc"
This reverts commit 74fb650b6b966588f8faeec26935a858af2b8bb5, reversing
changes made to 58bd8173364effb98b9fd9f9b98d31dd887a9bac.
2021-09-29 12:28:41 -04:00
Chris S. Kim 90fe20c3a2
agent: Clean up unused built-in proxy config (#11165) 2021-09-28 11:29:10 -04:00
Daniel Nephin 44d91ea56f
Add failures_before_warning to checks (#10969)
Signed-off-by: Jakub Sokołowski <jakub@status.im>

* agent: add failures_before_warning setting

The new setting allows users to specify the number of check failures
that have to happen before a service status us updated to be `warning`.
This allows for more visibility for detected issues without creating
alerts and pinging administrators. Unlike the previous behavior, which
caused the service status to not update until it reached the configured
`failures_before_critical` setting, now Consul updates the Web UI view
with the `warning` state and the output of the service check when
`failures_before_warning` is breached.

The default value of `FailuresBeforeWarning` is the same as the value of
`FailuresBeforeCritical`, which allows for retaining the previous default
behavior of not triggering a warning.

When `FailuresBeforeWarning` is set to a value higher than that of
`FailuresBeforeCritical it has no effect as `FailuresBeforeCritical`
takes precedence.

Resolves: https://github.com/hashicorp/consul/issues/10680

Signed-off-by: Jakub Sokołowski <jakub@status.im>

Co-authored-by: Jakub Sokołowski <jakub@status.im>
2021-09-14 12:47:52 -04:00
R.B. Boyer 61f1c01b83
agent: ensure that most agent behavior correctly respects partition configuration (#10880) 2021-08-19 15:09:42 -05:00