The bulk of this commit is moving the LeaderRoutineManager from the agent/consul package into its own package: lib/gort. It also got a renaming and its Start method now requires a context. Requiring that context required updating a whole bunch of other places in the code.
This refactor is to make it easier to see how serf feature flags are
encoded as serf tags, and where those feature flags are read.
- use constants for both the prefix and feature flag name. A constant
makes it much easier for an IDE to locate the read and write location.
- isolate the feature-flag encoding logic in the metadata package, so
that the feature flag prefix can be unexported. Only expose a function
for encoding the flags into tags. This logic is now next to the logic
which reads the tags.
- remove the duplicate `addEnterpriseSerfTags` functions. Both Client
and Server structs had the same implementation. And neither
implementation needed the method receiver.
The prior solution to call reply.Reset() aged poorly since newer fields
were added to the reply, but not added to Reset() leading serial
blocking query loops on the server to blend replies.
This could manifest as a service-defaults protocol change from
default=>http not reverting back to default after the config entry
reponsible was deleted.
TestACLEndpoint_Login_with_TokenLocality was reguardly being reported as failed even though
it was not failing. I took another look and I suspect it is because t.Parllel was being
called in a goroutine.
This would lead to strange behaviour which apparently confused the 'go test' runner.
Also accept an RPCInfo instead of interface{}. Accepting an interface
lead to a bug where the caller was expecting the arg to be the response
when in fact it was always passed the request. By accepting RPCInfo
it should indicate that this is actually the request value.
One caller of canRetry already passed an RPCInfo, the second handles
the type assertion before calling canRetry.
Also fixes a bug with listing kind=mesh config entries. ValidateConfigEntryKind was only being used by
the List endpoint, and was yet another place where we have to enumerate all the kinds.
This commit removes ValidateConfigEntryKind and uses MakeConfigEntry instead. This change removes
the need to maintain two separate functions at the cost of creating an instance of the config entry which will be thrown away immediately.
* WIP reloadable raft config
* Pre-define new raft gauges
* Update go-metrics to change gauge reset behaviour
* Update raft to pull in new metric and reloadable config
* Add snapshot persistance timing and installSnapshot to our 'protected' list as they can be infrequent but are important
* Update telemetry docs
* Update config and telemetry docs
* Add note to oldestLogAge on when it is visible
* Add changelog entry
* Update website/content/docs/agent/options.mdx
Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
Co-authored-by: Matt Keeler <mkeeler@users.noreply.github.com>
* Give descriptive error if auth method not found
Previously during a `consul login -method=blah`, if the auth method was not found, the
error returned would be "ACL not found". This is potentially confusing
because there may be many different ACLs involved in a login: the ACL of
the Consul client, perhaps the binding rule or the auth method.
Now the error will be "auth method blah not found", which is much easier
to debug.
No config entry needs a Kind field. It is only used to determine the Go type to
target. As we introduce new config entries (like this one) we can remove the kind field
and have the GetKind method return the single supported value.
In this case (similar to proxy-defaults) the Name field is also unnecessary. We always
use the same value. So we can omit the name field entirely.
This config entry is being renamed primarily because in k8s the name
cluster could be confusing given that the config entry applies across
federated datacenters.
Additionally, this config entry will only apply to Consul as a service
mesh, so the more generic "cluster" name is not needed.
Previously canRetry was attempting to retrieve this error from args, however there was never
any callers that would pass an error to args.
With the change to raftApply to move this error to the error return value, it is now possible
to receive this error from the err argument.
This commit updates canRetry to check for ErrChunkingResubmit in err.
Previously we were inconsistently checking the response for errors. This
PR moves the response-is-error check into raftApply, so that all callers
can look at only the error response, instead of having to know that
errors could come from two places.
This should expose a few more errors that were previously hidden because
in some calls to raftApply we were ignoring the response return value.
Also handle errors more consistently. In some cases we would log the
error before returning it. This can be very confusing because it can
result in the same error being logged multiple times. Instead return
a wrapped error.
Previously only a single auth method would be saved to the snapshot. This commit fixes the typo
and adds to the test, to show that all auth methods are now saved.
This PR replaces the original boolean used to configure transparent
proxy mode. It was replaced with a string mode that can be set to:
- "": Empty string is the default for when the setting should be
defaulted from other configuration like config entries.
- "direct": Direct mode is how applications originally opted into the
mesh. Proxy listeners need to be dialed directly.
- "transparent": Transparent mode enables configuring Envoy as a
transparent proxy. Traffic must be captured and redirected to the
inbound and outbound listeners.
This PR also adds a struct for transparent proxy specific configuration.
Initially this is not stored as a pointer. Will revisit that decision
before GA.
This is needed in case the client proxy is in TransparentProxy mode.
Typically they won't have explicit configuration for every upstream, so
this ensures the settings can be applied to all of them when generating
xDS config.
New clients in transparent proxy mode can send requests for service
config resolution without any upstream args because they do not have
explicitly defined upstreams.
Old clients on the other hand will never send requests without the
Upstreams args unless they don't have upstreams, in which case we do not
send back upstream config.
As part of this change the indexer will now be case insensitive by using
the lower case value. This should be safe because previously we always
had lower case strings.
This change was made out of convenience. All the other indexers use
lowercase, so we can re-use the indexFromQuery function by using
lowercase here as well.
Previously we were encoding the UUID as a string, but the index it references uses a UUID
so this index can also use an encoded UUID to save a bit of memory.
Prefix queries are generally being used to match part of a partial
index. We can support these indexes by using a function that accept
different types for each subset of the index.
What I found interesting is that in the generic StringFieldIndexer the
implementation for PrefixFromArgs would remove the trailing null, but
at least in these 2 cases we actually want a null terminated string.
We simply want fewer components in the string.
The TestServiceHealthEventsFromChanges function was over 1400 lines.
Attempting to debug test failures in test functions this large is
difficult. It requires scrolling to the line which defines the testcase
because the failure message only includes the line number of the
assertion, not the line number of the test case.
This is an excellent example of where test tables stop working well, and
start being a problem. To mitigate this problem, the runCase pattern can
be used. When one of these tests fails, a failure message will print the
line number of both the test case and the assertion. This allows a
developer to quickly jump to both of the relevant lines, signficanting
reducing the time it takes to debug test failures.
For example, one such failure could look like this:
catalog_events_test.go:1610: case: service reg, new node
catalog_events_test.go:1605: assertion failed: values are not equal
ResolveServiceConfig is called by service manager before the proxy
registration is in the catalog. Therefore we should pass proxy
registration flags in the request rather than trying to fetch
them from the state store (where they may not exist yet).
This is done because after removing ID and NodeName from
ServiceConfigRequest we will no longer know whether a request coming in
is for a Consul client earlier than v1.10.
This enables it to be called for many upstreams or downstreams of a
service while only querying intentions once.
Additionally, decisions are now optionally denied due to L7 permissions
being present. This enables the function to be used to filter for
potential upstreams/downstreams of a service.
Previously this type was defined in structs, but unlike the other types in structs this type
is not used by RPC requests. By moving it to state we can better indicate that this is not
an API type, but part of the state implementation.
I added this recently without realizing that the method already existed and was named
NamespaceOrEmpty. Replace all calls to GetNamespace with NamespaceOrEmpty or NamespaceOrDefault
as appropriate.
Document that this comparison should roughly match MatchesKey
Only sort by overrideKey or service name, but not both
Add namespace to the sort.
The client side also builds a map of these based on the namespace/node/service key, so the only order
that really matters is the ordering of register/dereigster events.
Refactored out a function that can be used for both the snapshot and stream of events to translate
an event into an appropriate connect event.
Previously terminating gateway events would have used the wrong key in the snapshot, which would have
caused them to be filtered out later on.
Also removed an unused function, and some commented out code.
Health of a terminating gateway instance changes
- Generate an event for creating/destroying this instance of the terminating gateway,
duplicate it for each affected service
Co-Authored-By: Kyle Havlovitz <kylehav@gmail.com>
These new functional indexers provide a few advantages:
1. enterprise differences can be isolated to a single function (the
indexer function), making code easier to change
2. as a consequence of (1) we no longer need to wrap all the calls to
Txn operations, making code easier to read.
3. by removing reflection we should increase the performance of all
operations.
One important change is in making all the function signatures the same.
https://blog.golang.org/errors-are-values
An extra boolean return value for SingleIndexer.FromObject is superfluous.
The error value can indicate when the index value could not be created.
By removing this extra return value we can use the same signature for both
indexer functions.
This has the nice properly of a function being usable for both indexing operations.
By using a new pattern for more specific indexes. This allows us to use
the same index for both service checks and node checks. It removes the
abstraction around memdb.Txn operations, and isolates all of the
enterprise differences in a single place (the indexer).
Previously a snapshot created as part of a resumse-stream request could have incorrectly
cached the newSnapshotToFollow event. This would cause clients to error because they
received an unexpected framing event.