These labels should be set by whatever process scrapes Consul (for
prometheus), or by the agent that receives them (for datadog/statsd).
We need to remove them here because the labels are part of the "metric
key", so we'd have to pre-declare the metrics with the labels. We could
do that, but that is extra work for labels that should be added from
elsewhere.
Also renames the closure to be more descriptive.
Prometheus scrapes metrics from each process, so when leadership transfers to a different node
the previous leader would still be reporting the old cached value.
By setting NaN, I believe we should zero-out the value, so that prometheus should only consider the
value from the new leader.
Emit the metric immediately so that after restarting an agent, the new expiry time will be
emitted. This is particularly important when this metric is being monitored, because we want
the alert to resovle itself immediately.
Also fixed a bug that was exposed in one of these metrics. The CARoot can be nil, so we have
to handle that case.
TestSubscribeBackend_IntegrationWithServer_DeliversAllMessages has been
flaking a few times. This commit cleans up the test a bit, and improves
the failure output.
I don't believe this actually fixes the flake, but I'm not able to
reproduce it reliably.
The failure appears to be that the event with Port=0 is being sent in
both the snapshot and as the first event after the EndOfSnapshot event.
Hopefully the improved logging will show us if these are really
duplicate events, or actually different events with different indexes.
partitionAuthorizer.config can be nil if it wasn't provided on calls to
newPartitionAuthorizer outside of the ACLResolver. This usage happens
often in tests.
This commit: adds a nil check when the config is going to be used,
updates non-test usage of NewPolicyAuthorizerWithDefaults to pass a
non-nil config, and dettaches setEnterpriseConf from the ACLResolver.
When issuing cross-partition service discovery requests, ACL filtering
often checks for NodeRead privileges. This is because the common return
type is a CheckServiceNode, which contains node data.
useInDatacenter was used to determine whether the mesh gateway mode of
the upstream should be returned in the discovery chain target. This
commit makes it so that the mesh gateway mode is returned every time,
and it is up to the caller to decide whether mesh gateways should be
watched or used.
Existing config entries prefixed by service- are specific to individual
services. Since this config entry applies to partitions it is being
renamed.
Additionally, the Partition label was changed to Name because using
Partition at the top-level and in the enterprise meta was leading to the
enterprise meta partition being dropped by msgpack.
The code for this was already removed, which suggests this is not actually testing what it claims.
I'm guessing these are still resolving because the tokens are converted to non-legacy tokens?
It seems like this was missing. Previously this was only called by init of ACLs during an upgrade.
Now that legacy ACLs are removed, nothing was calling stop.
Also remove an unused method from client.
This function is only run when the CAManager is a primary. Extracting this function
makes it clear which parts of UpdateConfiguration are run only in the primary and
also makes the cleanup logic simpler. Instead of both a defer and a local var we
can call the cleanup function in two places.
This commit renames functions to use a consistent pattern for identifying the functions that
can only be called when the Manager is run as the primary or secondary.
This is a step toward eventually creating separate types and moving these methods off of CAManager.
Add changelog to document what changed.
Add entry to telemetry section of the website to document what changed
Add docs to the usagemetric endpoint to help document the metrics in code
This commit two test failures:
1. Remove check for "in legacy ACL mode", the actual upgrade will be removed in a following commit.
2. Remove the early WaitForLeader in dc2, because with it the test was
failing with ACL not found.
TestAgentLeaks_Server was reporting a goroutine leak without this. Not sure if it would actually
be a leak in production or if this is due to the test setup, but seems easy enough to call it
this way until we remove legacyACLTokenUpgrade.
We no long need to read the acl serf tag, because servers are always either ACL enabled or
ACL disabled.
We continue to write the tag so that during an upgarde older servers will see the tag.
This commit two test failures:
1. Remove check for "in legacy ACL mode", the actual upgrade will be removed in a following commit.
2. Use the root token in WaitForLeader, because without it the test was
failing with ACL not found.
As part of removing the legacy ACL system ACL upgrading and the flag for
legacy ACLs is removed from Clients.
This commit also removes the 'acls' serf tag from client nodes. The tag is only ever read
from server nodes.
This commit also introduces a constant for the acl serf tag, to make it easier to track where
it is used.
Replace it with an implementation that returns an error, and rename some symbols
to use a Deprecated suffix to make it clear.
Also remove the ACLRequest struct, which is no longer referenced.
When converting these tests from the legacy ACL system to the new RPC endpoints I
initially changed most things to use _prefix rules, because that was equivalent to
the old legacy rules.
This commit modifies a few of those rules to be a bit more specific by replacing the _prefix
rule with a non-prefix one where possible.
In preparation for removing ACL.Apply.
Tests for ACL.Apply, ACL.GetPolicy, and ACL upgrades were removed
because all 3 of those will be removed shortly.
The forth test appears to be for the ACLResolver cache, so the test was moved to the correct
test file, and the name was updated to make it obvious what is being tested.
structs.ACLForceSet was deprecated 4 years ago, it should be safe to remove now.
ACLBootstrapNow was removed in a recent commit. While it is technically possible that a cluster with mixed version
could still attempt a legacy boostrap, we documented that the legacy system was deprecated in 1.4, so no
clusters that are being upgraded should be attempting a legacy boostrap.
* Port consul-enterprise #1123 to OSS
Signed-off-by: Mark Anderson <manderson@hashicorp.com>
* Fixup missing query field
Signed-off-by: Mark Anderson <manderson@hashicorp.com>
* change to re-trigger ci system
Signed-off-by: Mark Anderson <manderson@hashicorp.com>
* move intFromBool to be available for oss
* add expiry indexes
* remove dead code: `TokenExpirationIndex`
* fix remove indexer `TokenExpirationIndex`
* fix rebase issue
* convert `Roles` index to use `indexerSingle`
* split authmethod write indexer to oss and ent
* add index locality
* add locality unit tests
* move intFromBool to be available for oss
* use Bool func
* refactor `aclTokenList` to merge func
Some users are defining routing configurations that do not have associated services. This commit surfaces these configs in the topology visualization. Also fixes a minor internal bug with non-transparent proxy upstream/downstream references.
1) xds and grpc servers:
1.1) to use recovery middleware with callback that prints stack trace to log
1.2) callback turn the panic into a core.Internal error
2) added unit test for grpc server