The query metrics are actually reported for all read queries, not only
ones that use a MinIndex to block for updates.
Also clarify the raft.apply metric is only on the leader.
As part of this change, we ensure that the SAN extensions are marked as
critical when the subject is empty so that AWS PCA tolerates the loss of
common names well and continues to function as a Connect CA provider.
Parts of this currently hack around a bug in crypto/x509 and can be
removed after https://go-review.googlesource.com/c/go/+/329129 lands in
a Go release.
Note: the AWS PCA tests do not run automatically, but the following
passed locally for me:
ENABLE_AWS_PCA_TESTS=1 go test ./agent/connect/ca -run TestAWS
With an optional interface that providers can use to indicate if they
use an intermediate cert in the primary DC.
This removes the need to look up the provider config when renewing the
intermediate.
CatalogDestinationsOnly is a passthrough that would enable dialing
addresses outside of Consul's catalog. However, when this flag is set to
true only _connect_ endpoints for services can be dialed.
This flag is being renamed to signal that non-Connect endpoints can't be
dialed by transparent proxies when the value is set to true.
Both NextLink and NextNoBlock had the same logic, with slightly
different return values. By adding a bool return value (similar to map
lookups) we can remove the duplicate method.
The head of the topic buffer was being ignored when creating a snapshot. This commit fixes
the bug by ensuring that the head of the topic buffer is included in the snapshot
before handing it off to the subscription.
This PR adds cluster members to the metrics API. The number of members per
segment are reported as well as the total number of members.
Tested by running a multi-node cluster locally and ensuring the numbers were
correct. Also added unit test coverage to add the new expected gauges to
existing test cases.
We have seen test flakes caused by 'concurrent map read and map write', and the race detector
reports the problem as well (prevent us from running some tests with -race).
The root of the problem is the grpc expects resolvers to be registered at init time
before any requests are made, but we were using a separate resolver for each test.
This commit introduces a resolver registry. The registry is registered as the single
resolver for the consul scheme. Each test uses the Authority section of the target
(instead of the scheme) to identify the resolver that should be used for the test.
The scheme is used for lookup, which is why it can no longer be used as the unique
key.
This allows us to use a lock around the map of resolvers, preventing the data race.
The bulk of this commit is moving the LeaderRoutineManager from the agent/consul package into its own package: lib/gort. It also got a renaming and its Start method now requires a context. Requiring that context required updating a whole bunch of other places in the code.
This refactor is to make it easier to see how serf feature flags are
encoded as serf tags, and where those feature flags are read.
- use constants for both the prefix and feature flag name. A constant
makes it much easier for an IDE to locate the read and write location.
- isolate the feature-flag encoding logic in the metadata package, so
that the feature flag prefix can be unexported. Only expose a function
for encoding the flags into tags. This logic is now next to the logic
which reads the tags.
- remove the duplicate `addEnterpriseSerfTags` functions. Both Client
and Server structs had the same implementation. And neither
implementation needed the method receiver.
The prior solution to call reply.Reset() aged poorly since newer fields
were added to the reply, but not added to Reset() leading serial
blocking query loops on the server to blend replies.
This could manifest as a service-defaults protocol change from
default=>http not reverting back to default after the config entry
reponsible was deleted.
TestACLEndpoint_Login_with_TokenLocality was reguardly being reported as failed even though
it was not failing. I took another look and I suspect it is because t.Parllel was being
called in a goroutine.
This would lead to strange behaviour which apparently confused the 'go test' runner.
Also accept an RPCInfo instead of interface{}. Accepting an interface
lead to a bug where the caller was expecting the arg to be the response
when in fact it was always passed the request. By accepting RPCInfo
it should indicate that this is actually the request value.
One caller of canRetry already passed an RPCInfo, the second handles
the type assertion before calling canRetry.
Also fixes a bug with listing kind=mesh config entries. ValidateConfigEntryKind was only being used by
the List endpoint, and was yet another place where we have to enumerate all the kinds.
This commit removes ValidateConfigEntryKind and uses MakeConfigEntry instead. This change removes
the need to maintain two separate functions at the cost of creating an instance of the config entry which will be thrown away immediately.
* WIP reloadable raft config
* Pre-define new raft gauges
* Update go-metrics to change gauge reset behaviour
* Update raft to pull in new metric and reloadable config
* Add snapshot persistance timing and installSnapshot to our 'protected' list as they can be infrequent but are important
* Update telemetry docs
* Update config and telemetry docs
* Add note to oldestLogAge on when it is visible
* Add changelog entry
* Update website/content/docs/agent/options.mdx
Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
* Give descriptive error if auth method not found
Previously during a `consul login -method=blah`, if the auth method was not found, the
error returned would be "ACL not found". This is potentially confusing
because there may be many different ACLs involved in a login: the ACL of
the Consul client, perhaps the binding rule or the auth method.
Now the error will be "auth method blah not found", which is much easier
to debug.
No config entry needs a Kind field. It is only used to determine the Go type to
target. As we introduce new config entries (like this one) we can remove the kind field
and have the GetKind method return the single supported value.
In this case (similar to proxy-defaults) the Name field is also unnecessary. We always
use the same value. So we can omit the name field entirely.
This config entry is being renamed primarily because in k8s the name
cluster could be confusing given that the config entry applies across
federated datacenters.
Additionally, this config entry will only apply to Consul as a service
mesh, so the more generic "cluster" name is not needed.
Previously canRetry was attempting to retrieve this error from args, however there was never
any callers that would pass an error to args.
With the change to raftApply to move this error to the error return value, it is now possible
to receive this error from the err argument.
This commit updates canRetry to check for ErrChunkingResubmit in err.
Previously we were inconsistently checking the response for errors. This
PR moves the response-is-error check into raftApply, so that all callers
can look at only the error response, instead of having to know that
errors could come from two places.
This should expose a few more errors that were previously hidden because
in some calls to raftApply we were ignoring the response return value.
Also handle errors more consistently. In some cases we would log the
error before returning it. This can be very confusing because it can
result in the same error being logged multiple times. Instead return
a wrapped error.
Previously only a single auth method would be saved to the snapshot. This commit fixes the typo
and adds to the test, to show that all auth methods are now saved.
This PR replaces the original boolean used to configure transparent
proxy mode. It was replaced with a string mode that can be set to:
- "": Empty string is the default for when the setting should be
defaulted from other configuration like config entries.
- "direct": Direct mode is how applications originally opted into the
mesh. Proxy listeners need to be dialed directly.
- "transparent": Transparent mode enables configuring Envoy as a
transparent proxy. Traffic must be captured and redirected to the
inbound and outbound listeners.
This PR also adds a struct for transparent proxy specific configuration.
Initially this is not stored as a pointer. Will revisit that decision
before GA.