Commit graph

453 commits

Author SHA1 Message Date
R.B. Boyer fa7a66cd30
agent: purge service/check registration files for incorrect partitions on reload (#11607) 2021-11-18 14:44:20 -06:00
R.B. Boyer 086ff42b56
partitions: various refactors to support partitioning the serf LAN pool (#11568) 2021-11-15 09:51:14 -06:00
R.B. Boyer 7fbf749bc4
segments: ensure that the serf_lan_allowed_cidrs applies to network segments (#11495) 2021-11-04 17:17:19 -05:00
Mark Anderson e9a0fa7d36
Remove some usage of md5 from the system (#11491)
* Remove some usage of md5 from the system

OSS side of https://github.com/hashicorp/consul-enterprise/pull/1253

This is a potential security issue because an attacker could conceivably manipulate inputs to cause persistence files to collide, effectively deleting the persistence file for one of the colliding elements.

Signed-off-by: Mark Anderson <manderson@hashicorp.com>
2021-11-04 13:07:54 -07:00
Daniel Nephin 0ec2a804df
Merge pull request #10690 from tarat44/h2c-support-in-ping-checks
add support for h2c in h2 ping health checks
2021-11-02 13:53:06 -04:00
Daniel Nephin 00ed2b243f
Merge pull request #10771 from hashicorp/dnephin/emit-telemetry-metrics-immediately
telemetry: improve cert expiry metrics
2021-11-01 18:31:03 -04:00
R.B. Boyer 017e9d5ae4
agent: add a clone function for duplicating the serf lan configuration (#11443) 2021-10-28 16:11:26 -05:00
Daniel Nephin 0a19d7fd76 agent: move agent tls metric monitor to a more appropriate place
And add a test for it
2021-10-27 16:26:09 -04:00
Daniel Nephin 1b2144c982 telemetry: set cert expiry metrics to NaN on start
So that followers do not report 0, which would make alerting difficult.
2021-10-27 15:19:25 -04:00
freddygv 183849416b Register the ExportingPartitions cache type 2021-10-27 11:15:25 -06:00
R.B. Boyer e27e58c6cc
agent: refactor the agent delegate interface to be partition friendly (#11429) 2021-10-26 15:08:55 -05:00
Chris S. Kim 27f8a85664
agent: Ensure partition is considered in agent endpoints (#11427) 2021-10-26 15:20:57 -04:00
tarat44 f8b47cdfcd change config option to H2PingUseTLS 2021-10-05 00:12:21 -04:00
tarat44 ed4ca3db49 add support for h2c in h2 ping health checks 2021-10-04 22:51:08 -04:00
Daniel Nephin b8da06a34d acl: remove ACL upgrading from Clients
As part of removing the legacy ACL system ACL upgrading and the flag for
legacy ACLs is removed from Clients.

This commit also removes the 'acls' serf tag from client nodes. The tag is only ever read
from server nodes.

This commit also introduces a constant for the acl serf tag, to make it easier to track where
it is used.
2021-09-29 14:02:38 -04:00
Daniel Nephin bd28d23b55 command/envoy: stop using the DebugConfig from Self endpoint
The DebugConfig in the self endpoint can change at any time. It's not a stable API.

This commit adds the XDSPort to a stable part of the XDS api, and changes the envoy command to read
this new field.

It includes support for the old API as well, in case a newer CLI is used with an older API, and
adds a test for both cases.
2021-09-29 13:21:28 -04:00
Daniel Nephin 402d3792b6 Revert "Merge pull request #10588 from hashicorp/dnephin/config-fix-ports-grpc"
This reverts commit 74fb650b6b966588f8faeec26935a858af2b8bb5, reversing
changes made to 58bd8173364effb98b9fd9f9b98d31dd887a9bac.
2021-09-29 12:28:41 -04:00
Chris S. Kim 90fe20c3a2
agent: Clean up unused built-in proxy config (#11165) 2021-09-28 11:29:10 -04:00
Daniel Nephin 44d91ea56f
Add failures_before_warning to checks (#10969)
Signed-off-by: Jakub Sokołowski <jakub@status.im>

* agent: add failures_before_warning setting

The new setting allows users to specify the number of check failures
that have to happen before a service status us updated to be `warning`.
This allows for more visibility for detected issues without creating
alerts and pinging administrators. Unlike the previous behavior, which
caused the service status to not update until it reached the configured
`failures_before_critical` setting, now Consul updates the Web UI view
with the `warning` state and the output of the service check when
`failures_before_warning` is breached.

The default value of `FailuresBeforeWarning` is the same as the value of
`FailuresBeforeCritical`, which allows for retaining the previous default
behavior of not triggering a warning.

When `FailuresBeforeWarning` is set to a value higher than that of
`FailuresBeforeCritical it has no effect as `FailuresBeforeCritical`
takes precedence.

Resolves: https://github.com/hashicorp/consul/issues/10680

Signed-off-by: Jakub Sokołowski <jakub@status.im>

Co-authored-by: Jakub Sokołowski <jakub@status.im>
2021-09-14 12:47:52 -04:00
R.B. Boyer 61f1c01b83
agent: ensure that most agent behavior correctly respects partition configuration (#10880) 2021-08-19 15:09:42 -05:00
Daniel Nephin 0d69b49f41 config: remove ACLResolver settings from RuntimeConfig 2021-08-17 13:32:52 -04:00
Daniel Nephin 75baa22e64 acl: remove ACLResolver config fields from consul.Config 2021-08-17 13:32:52 -04:00
Daniel Nephin 7f71a672f3
Merge pull request #10807 from hashicorp/dnephin/remove-acl-datacenter
config: remove ACLDatacenter
2021-08-17 13:07:09 -04:00
Daniel Nephin 7c865d03ac proxycfg: Lookup the agent token as a default
When no ACL token is provided with the service registration.
2021-08-12 15:51:34 -04:00
Daniel Nephin 047abdd73c acl: remove ACLDatacenter
This field has been unnecessary for a while now. It was always set to the same value
as PrimaryDatacenter. So we can remove the duplicate field and use PrimaryDatacenter
directly.

This change was made by GoLand refactor, which did most of the work for me.
2021-08-06 18:27:00 -04:00
Daniel Nephin 1673b3a68c telemetry: add a metric for agent TLS cert expiry 2021-08-04 13:51:44 -04:00
Daniel Nephin 057e8320f9 streaming: set a default timeout
The blocking query backend sets the default value on the server side.
The streaming backend does not using blocking queries, so we must set the timeout on
the client.
2021-07-28 17:50:00 -04:00
R.B. Boyer 62ac98b564
agent/structs: add a bunch more EnterpriseMeta helper functions to help with partitioning (#10669) 2021-07-22 13:20:45 -05:00
Daniel Nephin 57c5a40869
Merge pull request #10588 from hashicorp/dnephin/config-fix-ports-grpc
config: rename `ports.grpc` to `ports.xds`
2021-07-13 13:11:38 -04:00
Daniel Nephin 1e23d181b5 config: remove misleading UseTLS field
This field was documented as enabling TLS for outgoing RPC, but that was not the case.
All this field did was set the use_tls serf tag.

Instead of setting this field in a place far from where it is used, move the logic to where
the serf tag is set, so that the code is much more obvious.
2021-07-09 19:01:45 -04:00
Daniel Nephin 3c60a46376 config: remove duplicate TLSConfig fields from agent/consul.Config
tlsutil.Config already presents an excellent structure for this
configuration. Copying the runtime config fields to agent/consul.Config
makes code harder to trace, and provides no advantage.

Instead of copying the fields around, use the tlsutil.Config struct
directly instead.

This is one small step in removing the many layers of duplicate
configuration.
2021-07-09 18:49:42 -04:00
Daniel Nephin 2ab6be6a88 config: update GRPCPort and addr in runtime config 2021-07-09 12:31:53 -04:00
Daniel Nephin 9c6458c6c2 rename GRPC->XDS where appropriate 2021-07-09 12:17:45 -04:00
Daniel Nephin a4a390d7c5 streaming: fix enable of streaming in the client
And add checks to all the tests that explicitly use streaming.
2021-06-28 17:23:14 -04:00
Matt Keeler 7e4ea16149 Move some things around to allow for license updating via config reload
The bulk of this commit is moving the LeaderRoutineManager from the agent/consul package into its own package: lib/gort. It also got a renaming and its Start method now requires a context. Requiring that context required updating a whole bunch of other places in the code.
2021-05-25 09:57:50 -04:00
Matt Keeler 58b934133d hcs-1936: Prepare for adding license auto-retrieval to auto-config in enterprise 2021-05-24 13:20:30 -04:00
Matt Keeler 82f5cb3f08 Preparation for changing where license management is done. 2021-05-24 10:19:31 -04:00
Iryna Shustava 7a41dbd9b6
Save exposed ports in agent's store and expose them via API (#10173)
* Save exposed HTTP or GRPC ports to the agent's store
* Add those the health checks API so we can retrieve them from the API
* Change redirect-traffic command to also exclude those ports from inbound traffic redirection when expose.checks is set to true.
2021-05-12 13:51:39 -07:00
Paul Banks d47eea3a3f
Make Raft trailing logs and snapshot timing reloadable (#10129)
* WIP reloadable raft config

* Pre-define new raft gauges

* Update go-metrics to change gauge reset behaviour

* Update raft to pull in new metric and reloadable config

* Add snapshot persistance timing and installSnapshot to our 'protected' list as they can be infrequent but are important

* Update telemetry docs

* Update config and telemetry docs

* Add note to oldestLogAge on when it is visible

* Add changelog entry

* Update website/content/docs/agent/options.mdx

Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>

Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
2021-05-04 15:36:53 +01:00
R.B. Boyer 91bee6246f
Support Incremental xDS mode (#9855)
This adds support for the Incremental xDS protocol when using xDS v3. This is best reviewed commit-by-commit and will not be squashed when merged.

Union of all commit messages follows to give an overarching summary:

xds: exclusively support incremental xDS when using xDS v3

Attempts to use SoTW via v3 will fail, much like attempts to use incremental via v2 will fail.
Work around a strange older envoy behavior involving empty CDS responses over incremental xDS.
xds: various cleanups and refactors that don't strictly concern the addition of incremental xDS support

Dissolve the connectionInfo struct in favor of per-connection ResourceGenerators instead.
Do a better job of ensuring the xds code uses a well configured logger that accurately describes the connected client.
xds: pull out checkStreamACLs method in advance of a later commit

xds: rewrite SoTW xDS protocol tests to use protobufs rather than hand-rolled json strings

In the test we very lightly reuse some of the more boring protobuf construction helper code that is also technically under test. The important thing of the protocol tests is testing the protocol. The actual inputs and outputs are largely already handled by the xds golden output tests now so these protocol tests don't have to do double-duty.

This also updates the SoTW protocol test to exclusively use xDS v2 which is the only variant of SoTW that will be supported in Consul 1.10.

xds: default xds.Server.AuthCheckFrequency at use-time instead of construction-time
2021-04-29 13:54:05 -05:00
Daniel Nephin d257acee24 rpcclient: close the grpc.ClientConn on shutdown 2021-04-27 19:03:16 -04:00
Daniel Nephin 3cda0a7cc4 health: create health.Client in Agent.New 2021-04-27 19:03:16 -04:00
Daniel Nephin 318bbd3e30 health: use blocking queries for near query parameter 2021-04-27 19:03:16 -04:00
Daniel Nephin 18c9e73832 connect: do not set QuerySource.Node
Setting this field to a value is equivalent to using the 'near' query paramter.
The intent is to sort the results by proximity to the node requesting
them. However with connect we send the results to envoy, which doesn't
care about the order, so setting this field is increasing the work
performed for no gain.

It is necessary to unset this field now because we would like connect
to use streaming, but streaming does not support sorting by proximity.
2021-04-27 19:03:16 -04:00
Matt Keeler aa0eb60f57
Move static token resolution into the ACLResolver (#10013) 2021-04-14 12:39:35 -04:00
Tara Tufano b8e7a90f77
add http2 ping health checks (#8431)
* add http2 ping checks

* fix test issue

* add h2ping check to config resources

* add new test and docs for h2ping

* fix grammatical inconsistency in H2PING documentation

* resolve rebase conflicts, add test for h2ping tls verification failure

* api documentation for h2ping

* update test config data with H2PING

* add H2PING to protocol buffers and update changelog

* fix typo in changelog entry
2021-04-09 15:12:10 -04:00
R.B. Boyer af78561018
api: ensure v1/health/ingress/:service endpoint works properly when streaming is enabled (#9967)
The streaming cache type for service health has no way to handle v1/health/ingress/:service queries as there is no equivalent topic that would return the appropriate data.

Ensure that attempts to use this endpoint will use the old cache-type for now so that they return appropriate data when streaming is enabled.
2021-04-05 13:23:00 -05:00
freddygv 6c43195e2a Merge master and fix upstream config protocol defaulting 2021-03-17 21:13:40 -06:00
freddygv 3c97e5a777 Update proxycfg for transparent proxy 2021-03-17 13:40:39 -06:00
Christopher Broglie 94b02c3954 Add support for configuring TLS ServerName for health checks
Some TLS servers require SNI, but the Golang HTTP client doesn't
include it in the ClientHello when connecting to an IP address. This
change adds a new TLSServerName field to health check definitions to
optionally set it. This fixes #9473.
2021-03-16 18:16:44 -04:00