Emit the metric immediately so that after restarting an agent, the new expiry time will be
emitted. This is particularly important when this metric is being monitored, because we want
the alert to resovle itself immediately.
Also fixed a bug that was exposed in one of these metrics. The CARoot can be nil, so we have
to handle that case.
TestSubscribeBackend_IntegrationWithServer_DeliversAllMessages has been
flaking a few times. This commit cleans up the test a bit, and improves
the failure output.
I don't believe this actually fixes the flake, but I'm not able to
reproduce it reliably.
The failure appears to be that the event with Port=0 is being sent in
both the snapshot and as the first event after the EndOfSnapshot event.
Hopefully the improved logging will show us if these are really
duplicate events, or actually different events with different indexes.
partitionAuthorizer.config can be nil if it wasn't provided on calls to
newPartitionAuthorizer outside of the ACLResolver. This usage happens
often in tests.
This commit: adds a nil check when the config is going to be used,
updates non-test usage of NewPolicyAuthorizerWithDefaults to pass a
non-nil config, and dettaches setEnterpriseConf from the ACLResolver.
When issuing cross-partition service discovery requests, ACL filtering
often checks for NodeRead privileges. This is because the common return
type is a CheckServiceNode, which contains node data.
useInDatacenter was used to determine whether the mesh gateway mode of
the upstream should be returned in the discovery chain target. This
commit makes it so that the mesh gateway mode is returned every time,
and it is up to the caller to decide whether mesh gateways should be
watched or used.
Existing config entries prefixed by service- are specific to individual
services. Since this config entry applies to partitions it is being
renamed.
Additionally, the Partition label was changed to Name because using
Partition at the top-level and in the enterprise meta was leading to the
enterprise meta partition being dropped by msgpack.
The code for this was already removed, which suggests this is not actually testing what it claims.
I'm guessing these are still resolving because the tokens are converted to non-legacy tokens?
It seems like this was missing. Previously this was only called by init of ACLs during an upgrade.
Now that legacy ACLs are removed, nothing was calling stop.
Also remove an unused method from client.
This function is only run when the CAManager is a primary. Extracting this function
makes it clear which parts of UpdateConfiguration are run only in the primary and
also makes the cleanup logic simpler. Instead of both a defer and a local var we
can call the cleanup function in two places.
This commit renames functions to use a consistent pattern for identifying the functions that
can only be called when the Manager is run as the primary or secondary.
This is a step toward eventually creating separate types and moving these methods off of CAManager.
Add changelog to document what changed.
Add entry to telemetry section of the website to document what changed
Add docs to the usagemetric endpoint to help document the metrics in code
This commit two test failures:
1. Remove check for "in legacy ACL mode", the actual upgrade will be removed in a following commit.
2. Remove the early WaitForLeader in dc2, because with it the test was
failing with ACL not found.
TestAgentLeaks_Server was reporting a goroutine leak without this. Not sure if it would actually
be a leak in production or if this is due to the test setup, but seems easy enough to call it
this way until we remove legacyACLTokenUpgrade.
We no long need to read the acl serf tag, because servers are always either ACL enabled or
ACL disabled.
We continue to write the tag so that during an upgarde older servers will see the tag.
This commit two test failures:
1. Remove check for "in legacy ACL mode", the actual upgrade will be removed in a following commit.
2. Use the root token in WaitForLeader, because without it the test was
failing with ACL not found.
As part of removing the legacy ACL system ACL upgrading and the flag for
legacy ACLs is removed from Clients.
This commit also removes the 'acls' serf tag from client nodes. The tag is only ever read
from server nodes.
This commit also introduces a constant for the acl serf tag, to make it easier to track where
it is used.
Replace it with an implementation that returns an error, and rename some symbols
to use a Deprecated suffix to make it clear.
Also remove the ACLRequest struct, which is no longer referenced.
When converting these tests from the legacy ACL system to the new RPC endpoints I
initially changed most things to use _prefix rules, because that was equivalent to
the old legacy rules.
This commit modifies a few of those rules to be a bit more specific by replacing the _prefix
rule with a non-prefix one where possible.
In preparation for removing ACL.Apply.
Tests for ACL.Apply, ACL.GetPolicy, and ACL upgrades were removed
because all 3 of those will be removed shortly.
The forth test appears to be for the ACLResolver cache, so the test was moved to the correct
test file, and the name was updated to make it obvious what is being tested.
structs.ACLForceSet was deprecated 4 years ago, it should be safe to remove now.
ACLBootstrapNow was removed in a recent commit. While it is technically possible that a cluster with mixed version
could still attempt a legacy boostrap, we documented that the legacy system was deprecated in 1.4, so no
clusters that are being upgraded should be attempting a legacy boostrap.
* Port consul-enterprise #1123 to OSS
Signed-off-by: Mark Anderson <manderson@hashicorp.com>
* Fixup missing query field
Signed-off-by: Mark Anderson <manderson@hashicorp.com>
* change to re-trigger ci system
Signed-off-by: Mark Anderson <manderson@hashicorp.com>
* move intFromBool to be available for oss
* add expiry indexes
* remove dead code: `TokenExpirationIndex`
* fix remove indexer `TokenExpirationIndex`
* fix rebase issue
* convert `Roles` index to use `indexerSingle`
* split authmethod write indexer to oss and ent
* add index locality
* add locality unit tests
* move intFromBool to be available for oss
* use Bool func
* refactor `aclTokenList` to merge func
Some users are defining routing configurations that do not have associated services. This commit surfaces these configs in the topology visualization. Also fixes a minor internal bug with non-transparent proxy upstream/downstream references.
1) xds and grpc servers:
1.1) to use recovery middleware with callback that prints stack trace to log
1.2) callback turn the panic into a core.Internal error
2) added unit test for grpc server
Remove the error return, so that not handling is not reported as an
error by errcheck. It was returning the error passed as an arg
unmodified so there is no reason to return the same value that was
passed in.
Remove the term upstreams to remove any confusion with the term used in
service mesh.
Remove the AutoDisable field, and replace it with the TTL value, using 0
to indicate the setting is turned off.
Replace "not Before" with "After".
Add some test coverage to show the behaviour is still correct.
This field was never user-configurable. We always overwrote the value with 120s from
NonUserSource. However, we also never copied the value from RuntimeConfig to consul.Config,
So the value in NonUserSource was always ignored, and we used the default value of 30s
set by consul.DefaultConfig.
All of this code is an unnecessary distraction because a user can not actually configure
this value.
This commit removes the fields and uses a constant value instad. Someone attempting to set
acl.disabled_ttl in their config will now get an error about an unknown field, but previously
the value was completely ignored, so the new behaviour seems more correct.
We have to keep this field in the AutoConfig response for backwards compatibility, but the value
will be ignored by the client, so it doesn't really matter what value we set.
Tests only specified one of the fields, but in production we copy the
value from a single place, so we can do the same in tests.
The AutoConfig test broke because of the problem noticed in a previous
commit. The DisabledTTL is not wired up properly so it reports 0s here.
Changed the test to use an explicit value.
Follow up to https://github.com/hashicorp/consul/pull/10737#discussion_r682147950
Renames all variables for acl.Authorizer to use `authz`. Previously some
places used `rule` which I believe was an old name carried over from the
legacy ACL system.
A couple places also used authorizer.
This commit also removes another couple of authorizer nil checks that
are no longer necessary.
The constructor for Server is not at all the appropriate place to be setting default
values for a config struct that was passed in.
In production this value is always set from agent/config. In tests we should set the
default in a test helper.
This field has been unnecessary for a while now. It was always set to the same value
as PrimaryDatacenter. So we can remove the duplicate field and use PrimaryDatacenter
directly.
This change was made by GoLand refactor, which did most of the work for me.
This method suffered from similar naming to a couple other methods on Server, and had not great
re-use (2 callers). By copying a few of the lines into one of the callers we can move the
implementation into the second caller.
Once moved, we can see that ResolveTokenAndDefaultMeta is identical in both Client and Server, and
likely should be further refactored, possibly into ACLResolver.
This change is being made to make ACL resolution easier to trace.
This method was an alias for ACLResolver.ResolveTokenToIdentityAndAuthorizer. By removing the
method that does nothing the code becomes easier to trace.
ACL filtering only needs an authorizer and a logger. We can decouple filtering from
the ACLResolver by passing in the necessary logger.
This change is being made in preparation for moving the ACLResolver into an acl package
filterACLWithAuthorizer could never return an error. This change moves us a little bit
closer to being able to enable errcheck and catch problems caused by unhandled error
return values.
These functions are moved to the one place they are called to improve code locality.
They are being moved out of agent/consul/acl.go in preparation for moving
ACLResolver to an acl package.
These functions are used in only one place. Move the functions next to their one caller
to improve code locality.
This change is being made in preparation for moving the ACLResolver into an
acl package. The moved functions were previously in the same file as the ACLResolver.
By moving them out of that file we may be able to move the entire file
with fewer modifications.
* defer setting the state before returning to avoid being stuck in `INITIALIZING` state
* add changelog
* move comment with the right if statement
* ca: report state transition error from setSTate
* update comment to reflect state transition
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Follow up to: https://github.com/hashicorp/consul/pull/10738#discussion_r680190210
Previously we were passing an Authorizer that would always allow the
operation, then later checking the authorization using vetServiceTxnOp.
On the surface this seemed strange, but I think it was actually masking
a bug as well. Over time `servicePreApply` was changed to add additional
authorization for `service.Proxy.DestinationServiceName`, but because
we were passing a nil Authorizer, that authorization was not handled on
the txn_endpoint.
`TxnServiceOp.FillAuthzContext` has some special handling in enterprise,
so we need to make sure to continue to use that from the Txn endpoint.
This commit removes the `vetServiceTxnOp` function, and passes in the
`FillAuthzContext` function so that `servicePreApply` can be used by
both the catalog and txn endpoints. This should be much less error prone
and prevent bugs like this in the future.
1. do not emit the metric if Query fails
2. properly check for PrimaryUsersIntermediate, the logic was inverted
Also improve the logging by including the metric name in the log message
* fix state index for `CAOpSetRootsAndConfig` op
* add changelog
* Update changelog
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
* remove the change log as it's not needed
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
These checks were a bit more involved. They were previously skipping some code paths
when the authorizer was nil. After looking through these it seems correct to remove the
authz == nil check, since it will never evaluate to true.
These case are already impossible conditions, because most of these functions already start
with a check for ACLs being disabled. So the code path being removed could never be reached.
The one other case (ConnectAuthorized) was already changed in a previous commit. This commit
removes an impossible branch because authz == nil can never be true.
* return an error when the index is not valid
* check response as bool when applying `CAOpSetConfig`
* remove check for bool response
* fix error message and add check to test
* fix comment
* add changelog
A previous commit used SetHash on two of the cases to fix a data race. This commit applies
that change to all cases. Using SetHash in this test helper should ensure that the
test helper behaves closer to production.
Some global variables are patched to shorter values in these tests. But the goroutines that read
them can outlive the test because nothing waited for them to exit.
This commit adds a Wait() method to the routine manager, so that tests can wait for the goroutines
to exit. This prevents the data race because the 'reset to original value' can happen
after all other goroutines have stopped.
* ca: move provider creation into CAManager
This further decouples the CAManager from Server. It reduces the interface between them and
removes the need for the SetLogger method on providers.
* ca: move SignCertificate to CAManager
To reduce the scope of Server, and keep all the CA logic together
* ca: move SignCertificate to the file where it is used
* auto-config: move autoConfigBackend impl off of Server
Most of these methods are used exclusively for the AutoConfig RPC
endpoint. This PR uses a pattern that we've used in other places as an
incremental step to reducing the scope of Server.
* fix linter issues
* check error when `raftApplyMsgpack`
* ca: move SignCertificate to CAManager
To reduce the scope of Server, and keep all the CA logic together
* check expiry date of the intermediate before using it to sign a leaf
* fix typo in comment
Co-authored-by: Kyle Havlovitz <kylehav@gmail.com>
* Fix test name
* do not check cert start date
* wrap error to mention it is the intermediate expired
* Fix failing test
* update comment
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
* use shim to avoid sleep in test
* add root cert validation
* remove duplicate code
* Revert "fix linter issues"
This reverts commit 6356302b54f06c8f2dee8e59740409d49e84ef24.
* fix import issue
* gofmt leader_connect_ca
* add changelog entry
* update error message
Co-authored-by: Freddy <freddygv@users.noreply.github.com>
* fix error message in test
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Co-authored-by: Kyle Havlovitz <kylehav@gmail.com>
Co-authored-by: Freddy <freddygv@users.noreply.github.com>