Commit graph

1302 commits

Author SHA1 Message Date
Dhia Ayachi e5dbf5e55b
Add ca certificate metrics (#10504)
* add intermediate ca metric routine

* add Gauge config for intermediate cert

* Stop metrics routine when stopping leader

* add changelog entry

* updage changelog

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>

* use variables instead of a map

* go imports sort

* Add metrics for primary and secondary ca

* start metrics routine in the right DC

* add telemetry documentation

* update docs

* extract expiry fetching in a func

* merge metrics for primary and secondary into signing ca metric

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
2021-07-07 09:41:01 -04:00
Daniel Nephin 14527dd005
Merge pull request #10552 from hashicorp/dnephin/ca-remove-rotation-period
ca: remove unused RotationPeriod field
2021-07-06 18:49:33 -04:00
jkirschner-hashicorp 31bbab8ae7
Merge pull request #10560 from jkirschner-hashicorp/change-sane-to-reasonable
Replace use of 'sane' where appropriate
2021-07-06 11:46:04 -04:00
Daniel Nephin b4a10443d1 ca: remove unused RotationPeriod field
This field was never used. Since it is persisted as part of a map[string]interface{} it
is pretty easy to remove it.
2021-07-05 19:15:44 -04:00
Jared Kirschner 4c3b1b8b7b Replace use of 'sane' where appropriate
HashiCorp voice, style, and language guidelines recommend avoiding ableist
language unless its reference to ability is accurate in a particular use.
2021-07-02 12:18:46 -04:00
Dhia Ayachi b57cf27e8f
Format certificates properly (rfc7468) with a trailing new line (#10411)
* trim carriage return from certificates when inserting rootCA in the inMemDB

* format rootCA properly when returning the CA on the connect CA endpoint

* Fix linter warnings

* Fix providers to trim certs before returning it

* trim newlines on write when possible

* add changelog

* make sure all provider return a trailing newline after the root and intermediate certs

* Fix endpoint to return trailing new line

* Fix failing test with vault provider

* make test more robust

* make sure all provider return a trailing newline after the leaf certs

* Check for suffix before removing newline and use function

* Add comment to consul provider

* Update change log

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* fix typo

* simplify code callflow

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>

* extract requireNewLine as shared func

* remove dependency to testify in testing file

* remove extra newline in vault provider

* Add cert newline fix to envoy xds

* remove new line from mock provider

* Remove adding a new line from provider and fix it when the cert is read

* Add a comment to explain the fix

* Add missing for leaf certs

* fix missing new line

* fix missing new line in leaf certs

* remove extra new line in test

* updage changelog

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>

* fix in vault provider and when reading cache (RPC call)

* fix AWS provider

* fix failing test in the provider

* remove comments and empty lines

* add check for empty cert in test

* fix linter warnings

* add new line for leaf and private key

* use string concat instead of Sprintf

* fix new lines for leaf signing

* preallocate slice and remove append

* Add new line to `SignIntermediate` and `CrossSignCA`

Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
2021-06-30 20:48:29 -04:00
Daniel Nephin 86244967c5 docs: correct some misleading telemetry docs
The query metrics are actually reported for all read queries, not only
ones that use a MinIndex to block for updates.

Also clarify the raft.apply metric is only on the leader.
2021-06-28 12:20:53 -04:00
R.B. Boyer c3d5a2a5ab
connect/ca: cease including the common name field in generated certs (#10424)
As part of this change, we ensure that the SAN extensions are marked as
critical when the subject is empty so that AWS PCA tolerates the loss of
common names well and continues to function as a Connect CA provider.

Parts of this currently hack around a bug in crypto/x509 and can be
removed after https://go-review.googlesource.com/c/go/+/329129 lands in
a Go release.

Note: the AWS PCA tests do not run automatically, but the following
passed locally for me:

    ENABLE_AWS_PCA_TESTS=1 go test ./agent/connect/ca -run TestAWS
2021-06-25 13:00:00 -05:00
Daniel Nephin 72b30174fa ca: replace ca.PrimaryIntermediateProviders
With an optional interface that providers can use to indicate if they
use an intermediate cert in the primary DC.

This removes the need to look up the provider config when renewing the
intermediate.
2021-06-23 15:47:30 -04:00
Daniel Nephin f4c1f982d1
Merge pull request #9924 from hashicorp/dnephin/cert-expiration-metric
connect: emit a metric for the seconds until root CA expiry
2021-06-18 14:18:55 -04:00
Daniel Nephin e36800cefa Update metric name
and handle the case where there is no active root CA.
2021-06-14 17:01:16 -04:00
Daniel Nephin 548796ae13 connect: emit a metric for the number of seconds until root CA expiration 2021-06-14 16:57:01 -04:00
Freddy f399fd2add
Rename CatalogDestinationsOnly (#10397)
CatalogDestinationsOnly is a passthrough that would enable dialing
addresses outside of Consul's catalog. However, when this flag is set to
true only _connect_ endpoints for services can be dialed.

This flag is being renamed to signal that non-Connect endpoints can't be
dialed by transparent proxies when the value is set to true.
2021-06-14 14:15:09 -06:00
Freddy 61ae2995b7
Add flag for transparent proxies to dial individual instances (#10329) 2021-06-09 14:34:17 -06:00
Daniel Nephin 20f7a72792 stream: remove bufferItem.NextLink
Both NextLink and NextNoBlock had the same logic, with slightly
different return values. By adding a bool return value (similar to map
lookups) we can remove the duplicate method.
2021-06-07 17:04:46 -04:00
Daniel Nephin 48f388f590 stream: fix a bug with creating a snapshot
The head of the topic buffer was being ignored when creating a snapshot. This commit fixes
the bug by ensuring that the head of the topic buffer is included in the snapshot
before handing it off to the subscription.
2021-06-04 18:33:04 -04:00
Paul Ewing e454a9aae0
usagemetrics: add cluster members to metrics API (#10340)
This PR adds cluster members to the metrics API. The number of members per
segment are reported as well as the total number of members.

Tested by running a multi-node cluster locally and ensuring the numbers were
correct. Also added unit test coverage to add the new expected gauges to
existing test cases.
2021-06-03 08:25:53 -07:00
Daniel Nephin 0dfb7da610 grpc: fix a data race by using a static resolver
We have seen test flakes caused by 'concurrent map read and map write', and the race detector
reports the problem as well (prevent us from running some tests with -race).

The root of the problem is the grpc expects resolvers to be registered at init time
before any requests are made, but we were using a separate resolver for each test.

This commit introduces a resolver registry. The registry is registered as the single
resolver for the consul scheme. Each test uses the Authority section of the target
(instead of the scheme) to identify the resolver that should be used for the test.
The scheme is used for lookup, which is why it can no longer be used as the unique
key.

This allows us to use a lock around the map of resolvers, preventing the data race.
2021-06-02 11:35:38 -04:00
Dhia Ayachi 0c13f80d5a
RPC Timeout/Retries account for blocking requests (#8978) 2021-05-27 17:29:43 -04:00
Matt Keeler 7e4ea16149 Move some things around to allow for license updating via config reload
The bulk of this commit is moving the LeaderRoutineManager from the agent/consul package into its own package: lib/gort. It also got a renaming and its Start method now requires a context. Requiring that context required updating a whole bunch of other places in the code.
2021-05-25 09:57:50 -04:00
Matt Keeler 58b934133d hcs-1936: Prepare for adding license auto-retrieval to auto-config in enterprise 2021-05-24 13:20:30 -04:00
Matt Keeler 82f5cb3f08 Preparation for changing where license management is done. 2021-05-24 10:19:31 -04:00
Daniel Nephin f2cf586414 Refactor of serf feature flag tags.
This refactor is to make it easier to see how serf feature flags are
encoded as serf tags, and where those feature flags are read.

- use constants for both the prefix and feature flag name. A constant
  makes it much easier for an IDE to locate the read and write location.
- isolate the feature-flag encoding logic in the metadata package, so
  that the feature flag prefix can be unexported. Only expose a function
  for encoding the flags into tags. This logic is now next to the logic
  which reads the tags.
- remove the duplicate `addEnterpriseSerfTags` functions. Both Client
  and Server structs had the same implementation. And neither
  implementation needed the method receiver.
2021-05-20 12:57:06 -04:00
R.B. Boyer b90877b440
server: ensure that central service config flattening properly resets the state each time (#10239)
The prior solution to call reply.Reset() aged poorly since newer fields
were added to the reply, but not added to Reset() leading serial
blocking query loops on the server to blend replies.

This could manifest as a service-defaults protocol change from
default=>http not reverting back to default after the config entry
reponsible was deleted.
2021-05-14 10:21:44 -05:00
Daniel Nephin 3dd951ab1e testing: don't run t.Parallel in a goroutine
TestACLEndpoint_Login_with_TokenLocality was reguardly being reported as failed even though
it was not failing. I took another look and I suspect it is because t.Parllel was being
called in a goroutine.

This would lead to strange behaviour which apparently confused the 'go test' runner.
2021-05-10 13:30:10 -04:00
Daniel Nephin c9ae72e72f
Merge pull request #10075 from hashicorp/dnephin/handle-raft-apply-errors
rpc: some cleanup of canRetry and ForwardRPC
2021-05-06 16:59:53 -04:00
Daniel Nephin 39d7d07922 state: reduce arguments to validateProposedConfigEntryInServiceGraph 2021-05-06 13:47:40 -04:00
Daniel Nephin 4905ac6f44 rpc: add tests for canRetry
Also accept an RPCInfo instead of interface{}. Accepting an interface
lead to a bug where the caller was expecting the arg to be the response
when in fact it was always passed the request. By accepting RPCInfo
it should indicate that this is actually the request value.

One caller of canRetry already passed an RPCInfo, the second handles
the type assertion before calling canRetry.
2021-05-06 13:30:07 -04:00
Daniel Nephin c38f4869ad rpc: remove unnecessary arg to ForwardRPC 2021-05-06 13:30:07 -04:00
Daniel Nephin 55f620d636
Merge pull request #10155 from hashicorp/dnephin/config-entry-remove-fields
config-entry: remove Kind and Name field from Mesh config entry
2021-05-04 17:27:56 -04:00
Daniel Nephin 0e5e1270b6 config-entries: add a test for the API client
Also fixes a bug with listing kind=mesh config entries. ValidateConfigEntryKind was only being used by
the List endpoint, and was yet another place where we have to enumerate all the kinds.

This commit removes ValidateConfigEntryKind and uses MakeConfigEntry instead. This change removes
the need to maintain two separate functions at the cost of creating an instance of the config entry which will be thrown away immediately.
2021-05-04 17:14:21 -04:00
Daniel Nephin df98027ad1 lint: fix warning by removing reference to deprecated interface 2021-05-04 14:09:14 -04:00
Paul Banks d47eea3a3f
Make Raft trailing logs and snapshot timing reloadable (#10129)
* WIP reloadable raft config

* Pre-define new raft gauges

* Update go-metrics to change gauge reset behaviour

* Update raft to pull in new metric and reloadable config

* Add snapshot persistance timing and installSnapshot to our 'protected' list as they can be infrequent but are important

* Update telemetry docs

* Update config and telemetry docs

* Add note to oldestLogAge on when it is visible

* Add changelog entry

* Update website/content/docs/agent/options.mdx

Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>

Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
2021-05-04 15:36:53 +01:00
Luke Kysow eb84a856c4
Give descriptive error if auth method not found (#10163)
* Give descriptive error if auth method not found

Previously during a `consul login -method=blah`, if the auth method was not found, the
error returned would be "ACL not found". This is potentially confusing
because there may be many different ACLs involved in a login: the ACL of
the Consul client, perhaps the binding rule or the auth method.

Now the error will be "auth method blah not found", which is much easier
to debug.
2021-05-03 13:39:13 -07:00
Daniel Nephin bf4c289804 config-entry: remove Kind and Name field from Mesh config entry
No config entry needs a Kind field. It is only used to determine the Go type to
target. As we introduce new config entries (like this one) we can remove the kind field
and have the GetKind method return the single supported value.

In this case (similar to proxy-defaults) the Name field is also unnecessary. We always
use the same value. So we can omit the name field entirely.
2021-04-29 17:11:21 -04:00
Freddy 401f3010e0
Rename "cluster" config entry to "mesh" (#10127)
This config entry is being renamed primarily because in k8s the name
cluster could be confusing given that the config entry applies across
federated datacenters.

Additionally, this config entry will only apply to Consul as a service
mesh, so the more generic "cluster" name is not needed.
2021-04-28 16:13:29 -06:00
Daniel Nephin 3cda0a7cc4 health: create health.Client in Agent.New 2021-04-27 19:03:16 -04:00
Matt Keeler 6c639be8ec
Add prometheus guage definitions for replication metrics. (#10109) 2021-04-23 17:05:33 -04:00
Matt Keeler 09bf05ec5d
Add replication metrics (#10073) 2021-04-22 11:20:53 -04:00
Daniel Nephin 281d7616fa
Merge pull request #10045 from hashicorp/dnephin/state-proxy-defaults
state: remove config-entries kind index
2021-04-20 17:12:50 -04:00
Daniel Nephin 6d1a5b3629 Handle ErrChunkingResubmit.Error properly
Previously canRetry was attempting to retrieve this error from args, however there was never
any callers that would pass an error to args.

With the change to raftApply to move this error to the error return value, it is now possible
to receive this error from the err argument.

This commit updates canRetry to check for ErrChunkingResubmit in err.
2021-04-20 13:29:31 -04:00
Daniel Nephin 8654adfc53 Handle FSM.Apply errors in raftApply
Previously we were inconsistently checking the response for errors. This
PR moves the response-is-error check into raftApply, so that all callers
can look at only the error response, instead of having to know that
errors could come from two places.

This should expose a few more errors that were previously hidden because
in some calls to raftApply we were ignoring the response return value.

Also handle errors more consistently. In some cases we would log the
error before returning it. This can be very confusing because it can
result in the same error being logged multiple times. Instead return
a wrapped error.
2021-04-20 13:29:29 -04:00
Daniel Nephin 95b361ecc8 state: remove unnecessary kind index
The query can be performed using a prefix query on the ID index.

Also backport some enterprise changes to prevent conflicts.
2021-04-15 17:37:28 -04:00
Daniel Nephin eb7f4b7ea4 state: use index constants for ConfigEntry indexes 2021-04-15 17:30:07 -04:00
Freddy 5a9b75a443
Merge pull request #10016 from hashicorp/topology-update 2021-04-15 14:11:23 -06:00
R.B. Boyer c88512fe14
connect: update centralized upstreams representation in service-defaults (#10015) 2021-04-15 14:21:44 -05:00
Daniel Nephin 2a10f01bf5 snapshot: fix saving of auth methods
Previously only a single auth method would be saved to the snapshot. This commit fixes the typo
and adds to the test, to show that all auth methods are now saved.
2021-04-14 16:51:21 -04:00
freddygv 2ff8b9f2f5 Avoid returning a nil slice 2021-04-14 10:52:05 -06:00
Matt Keeler aa0eb60f57
Move static token resolution into the ACLResolver (#10013) 2021-04-14 12:39:35 -04:00
freddygv 7fd4c569ce Update viz endpoint to include topology from intentions 2021-04-14 10:20:15 -06:00