This will both save on unnecessary raft operations as well as
unnecessarily incrementing the raft modify index of config entries
subject to no-op updates.
This commit syncs ENT changes to the OSS repo.
Original commit details in ENT:
```
commit 569d25f7f4578981c3801e6e067295668210f748
Author: FFMMM <FFMMM@users.noreply.github.com>
Date: Thu Feb 10 10:23:33 2022 -0800
Vendor fork net rpc (#1538)
* replace net/rpc w consul-net-rpc/net/rpc
Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>
* replace msgpackrpc and go-msgpack with fork from mono repo
Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>
* gofmt all files touched
Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>
```
Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>
* First phase of refactoring PermissionDeniedError
Add extended type PermissionDeniedByACLError that captures information
about the accessor, particular permission type and the object and name
of the thing being checked.
It may be worth folding the test and error return into a single helper
function, that can happen at a later date.
Signed-off-by: Mark Anderson <manderson@hashicorp.com>
Due to timing, a transparent proxy could have two upstreams to dial
directly with the same address.
For example:
- The orders service can dial upstreams shipping and payment directly.
- An instance of shipping at address 10.0.0.1 is deregistered.
- Payments is scaled up and scheduled to have address 10.0.0.1.
- The orders service receives the event for the new payments instance
before seeing the deregistration for the shipping instance. At this
point two upstreams have the same passthrough address and Envoy will
reject the listener configuration.
To disambiguate this commit considers the Raft index when storing
passthrough addresses. In the example above, 10.0.0.1 would only be
associated with the newer payments service instance.
This commit makes two changes to the validation.
Previously we would call this validation in GenerateRoot, which happens
both on initialization (when a follower becomes leader), and when a
configuration is updated. We only want to do this validation during
config update so the logic was moved to the UpdateConfiguration
function.
Previously we would compare the config values against the actual cert.
This caused problems when the cert was created manually in Vault (not
created by Consul). Now we compare the new config against the previous
config. Using a already created CA cert should never error now.
Adding the key bit and types to the config should only error when
the previous values were not the defaults.
These two tests require debug logging enabled, because they look for log lines.
Also switched to testify assertions because the previous errors were not clear.
This test found a bug in the secondary. We were appending the root cert
to the PEM, but that cert was already appended. This was failing
validation in Vault here:
https://github.com/hashicorp/vault/blob/sdk/v0.3.0/sdk/helper/certutil/types.go#L329
Previously this worked because self signed certs have the same
SubjectKeyID and AuthorityKeyID. So having the same self-signed cert
repeated doesn't fail that check.
However with an intermediate that is not self-signed, those values are
different, and so we fail the check. A test I added in a previous commit
should show that this continues to work with self-signed root certs as
well.
This is safer than embedding two interface because there are a number of
places where we check the concrete type. If we check the concrete type
on the top-level interface it will fail. So instead expose the
ACLIdentity from a method.
This change allows us to remove one of the last remaining duplicate
resolve token methods (Server.ResolveToken).
With this change we are down to only 2, where the second one also
handles setting the default EnterpriseMeta from the token.
Now that ACLResolver is embedded we don't need ResolveTokenToIdentity on
Client and Server.
Moving ResolveTokenAndDefaultMeta to ACLResolver removes the duplicate
implementation.
set -euo pipefail
unset CDPATH
cd "$(dirname "$0")"
for f in $(git grep '\brequire := require\.New(' | cut -d':' -f1 | sort -u); do
echo "=== require: $f ==="
sed -i '/require := require.New(t)/d' $f
# require.XXX(blah) but not require.XXX(tblah) or require.XXX(rblah)
sed -i 's/\brequire\.\([a-zA-Z0-9_]*\)(\([^tr]\)/require.\1(t,\2/g' $f
# require.XXX(tblah) but not require.XXX(t, blah)
sed -i 's/\brequire\.\([a-zA-Z0-9_]*\)(\(t[^,]\)/require.\1(t,\2/g' $f
# require.XXX(rblah) but not require.XXX(r, blah)
sed -i 's/\brequire\.\([a-zA-Z0-9_]*\)(\(r[^,]\)/require.\1(t,\2/g' $f
gofmt -s -w $f
done
for f in $(git grep '\bassert := assert\.New(' | cut -d':' -f1 | sort -u); do
echo "=== assert: $f ==="
sed -i '/assert := assert.New(t)/d' $f
# assert.XXX(blah) but not assert.XXX(tblah) or assert.XXX(rblah)
sed -i 's/\bassert\.\([a-zA-Z0-9_]*\)(\([^tr]\)/assert.\1(t,\2/g' $f
# assert.XXX(tblah) but not assert.XXX(t, blah)
sed -i 's/\bassert\.\([a-zA-Z0-9_]*\)(\(t[^,]\)/assert.\1(t,\2/g' $f
# assert.XXX(rblah) but not assert.XXX(r, blah)
sed -i 's/\bassert\.\([a-zA-Z0-9_]*\)(\(r[^,]\)/assert.\1(t,\2/g' $f
gofmt -s -w $f
done
Remove some unnecessary comments around query_blocking metric. The only
line that needs any comments in the atomic decrement.
Cleanup the block and return comments and logic. The old comment about
AbandonCh may have been relevant before, but it is expected behaviour
now.
The logic was simplified by inverting the err condition.
This helps keep the logic in blockingQuery more focused. In the future we
may have a separate struct for RPC queries which may allow us to move this
off of Server.
This safeguard should be safe to apply in general. We are already
applying it to non-blocking queries that call blockingQuery, so it
should be fine to apply it to others.
To remove the TODO, and make it more readable.
In general this reduces the scope of variables, making them easier to reason about.
It also introduces more early returns so that we can see the flow from the structure
of the function.
`newIntermediate` is always equal to `needsNewIntermediate`, so we can
remove the extra variable and use the original directly.
Also remove the `activeRoot.ID != newActiveRoot.ID` case from an if,
because that case is already checked above, and `needsNewIntermediate` will
already be true in that case.
This condition now reads a lot better:
> Persist a new root if we did not have one before, or if generated a new intermediate.
In the previous commit the single use of this storedRoot was removed.
In this commit the original objective is completed. The
Provider.ActiveRoot is being removed because
1. the secondary should get the active root from the Consul primary DC,
not the provider, so that secondary DCs do not need to communicate
with a provider instance in a different DC.
2. so that the Provider.ActiveRoot interface can be changed without
impacting other code paths.
This method had only one caller, which always looked for the active
root. This commit moves the lookup into the method to reduce the logic
in the one caller.
This is being done in preparation for a larger change. Keeping this
separate so it is easier to see.
The `storedRootID != primaryRoots.ActiveRootID` is being removed because
these can never be different.
The `storedRootID` comes from `provider.ActiveRoot`, the
`primaryRoots.ActiveRootID` comes from the store `CARoot` from the
primary. In both cases the source of the data is the primary DC.
Technically they could be different if someone modified the provider
outside of Consul, but that would break many things, so is not a
supported flow.
If these were out of sync because of ordering of events then the
secondary will soon receive an update to `primaryRoots` and everything
will be sorted out again.
ActiveRoot should not be called from the secondary DC, because there
should not be a requirement to run the same Vault instance in a
secondary DC. SignIntermediate is called in a secondary DC, so it should
not call ActiveRoot
We would also like to change the interface of ActiveRoot so that we can
support using an intermediate cert as the primary CA in Consul. In
preparation for making that change I am reducing the number of calls to
ActiveRoot, so that there are fewer code paths to modify when the
interface changes.
This change required a change to the mockCAServerDelegate we use in
tests. It was returning the RootCert for SignIntermediate, but that is
not an accurate fake of production. In production this would also be a
separate cert.
Immediately above this line we are already appending the full list of
intermediates. The `provider.ActiveIntermediate` MUST be in this list of
intermediates because it must be available to all the other non-leader
Servers. If it was not in this list of intermediates then any proxy
that received data from a non-leader would have the wrong certs.
This is being removed now because we are planning on changing the
`Provider.ActiveIntermediate` interface, and removing these extra calls ahead of
time helps make that change easier.
Using tracing and cpu profiling I found that the majority of the time in
these test cases is spent generating a private key. We really don't need
separate private keys, so we can generate only one and use it for all
cases.
With this change the test runs much faster.
Fix the name to match the function it is testing
Remove unused code
Fix the signature, instead of returning (error, string) which should be (string, error)
accept a testing.T to emit errors.
Handle the error from encode.
The only function passed to SnapshotRPC today always returns a nil error, so there's no
way to exercise this bug in practice. This change is being made for correctness so that
it doesn't become a problem in the future, if we ever pass a different function to
SnapshotRPC.
Error messages related to service and check operations previously included
the following substrings:
- service %q
- check %q
From this error message, it isn't clear that the expected field is the ID for
the entity, not the name. For example, if the user has a service named test,
the error message would read 'Unknown service "test"'. This is misleading -
a service with that *name* does exist, but not with that *ID*.
The substrings above have been modified to make it clear that ID is needed,
not name:
- service with ID %q
- check with ID %q
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: Dan Upton <daniel@floppy.co>
This query has been incorrectly querying by accessor ID since New ACLs
were added. However, the legacy token compat allowed this to continue to
work, since it made a fallback query for the anonymousToken ID.
PR #11184 removed this legacy token query, which means that the query by
accessor ID is now the only check for the anonymous token's existence.
This PR updates the GetBySecret call to use the secret ID of the token.
These helper functions actually end up hiding important setup details
that should be visible from the test case. We already have a convenient
way of setting this config when calling newTestServerWithConfig.
I suspect one problem was that we set structs.IntermediateCertRenewInterval to 1ms, which meant
that in some cases the intermediate could renew before we stored the original value.
Another problem was that the 'wait for intermediate' loop was calling the provider.ActiveIntermediate,
but the comparison needs to use the RPC endpoint to accurately represent a user request. So
changing the 'wait for' to use the state store ensures we don't race.
Also moves the patching into a separate function.
Removes the addition of ca.CertificateTimeDriftBuffer as part of calculating halfTime. This was added
in a previous commit to attempt to fix the flake, but it did not appear to fix the problem. Adding the
time here was making the tests fail when using the shared patch
function. It's not clear to me why, but there's no reason we should be
including this time in the halfTime calculation.
Use the new verifyLearfCert to show the cert verifies with intermediates
from both sources. This required using the RPC interface so that the
leaf pem was constructed correctly.
Add IndexedCARoots.Active since that is a common operation we see in a
few places.
Previously we had a couple copies that reproduced the FSM operation.
These copies introduce risk that the test does not accurately match
production.
This PR removes the test versions of the FSM operation, and exports the
real production FSM operation so that it can be used in tests.
The consul provider tests did need to change because of this. Previously
we would return a hardcoded value of 2, but in production this value is
always incremented.
Failing over to a partition is more siimilar to failing over to another
datacenter than it is to failing over to a namespace. In a future
release we should update how localities for failover are specified. We
should be able to accept a list of localities which can include both
partition and datacenter.
* Add partition fields to targets like service route destinations
* Update validation to prevent cross-DC + cross-partition references
* Handle partitions when reading config entries for disco chain
* Encode partition in compiled targets
Given that we do not allow wildcard partitions in intentions, no one ixn
can override the DefaultAllow setting. Only the default ACL policy
applies across all partitions.
This table purposefully does not index by partition/namespace. It's a
global view into all service names.
This table is intended to replace the current serviceListTxn watch in
intentionTopologyTxn. For cross-partition transparent proxying we need
to be able to calculate upstreams from intentions in any partition. This
means that the existing serviceListTxn function is insufficient since
it's scoped to a partition.
Moving away from that function is also beneficial because it watches the
main "services" table, so watchers will wake up when any instance is
registered or deregistered.
* state: port KV and Tombstone tables to new pattern
* go fmt'ed
* handle wildcards for tombstones
* Fix graveyard ent vs oss
* fix oss compilation error
* add partition to tombstones and kv state store indexes
* refactor to use `indexWithEnterpriseIndexable`
* Apply suggestions from code review
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
* add `singleValueID` implementation assertions
* partition `tableSessions` table
* fix sessions to use UUID and fix prefix index
* fix oss build
* clean up unused functions
* fix oss compilation
* add a partition indexer for sessions
* Fix oss to not have partition index
* fix oss tests
* remove unused operations_ent.go and operations_oss.go func
* remove unused const
* convert `IndexID` of `session_checks` table
* convert `indexSession` of `session_checks` table
* convert `indexNodeCheck` of `session_checks` table
* partition `indexID` and `indexSession` of `tableSessionChecks`
* fix oss linter
* fix review comments
* remove partition for Checks as it's always use the session partition
* fix tests
* fix tests
* do not namespace nodeChecks index
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
Cross port of ent #1383 "Reject non-default datacenter when making partitioned ACLs"
On the OSS side this is a minor refactor to add some more checks that are only applicable to enterprise code.
Signed-off-by: Mark Anderson <manderson@hashicorp.com>
The test added in this commit shows the problem. Previously the
SigningKeyID was set to the RootCert not the local leaf signing cert.
This same bug was fixed in two other places back in 2019, but this last one was
missed.
While fixing this bug I noticed I had the same few lines of code in 3
places, so I extracted a new function for them.
There would be 4 places, but currently the InitializeCA flow sets this
SigningKeyID in a different way, so I've left that alone for now.
While working on the CA system it is important to be able to run all the
tests related to the system, without having to wait for unrelated tests.
There are many slow and unrelated tests in agent/consul, so we need some
way to filter to only the relevant tests.
This PR renames all the CA system related tests to start with either
`TestCAMananger` for tests of internal operations that don't have RPC
endpoint, or `TestConnectCA` for tests of RPC endpoints. This allows us
to run all the test with:
go test -run 'TestCAMananger|TestConnectCA' ./agent/consul
The test naming follows an undocumented convention of naming tests as
follows:
Test[<struct name>_]<function name>[_<test case description>]
I tried to always keep Primary/Secondary at the end of the description,
and _Vault_ has to be in the middle because of our regex to run those
tests as a separate CI job.
You may notice some of the test names changed quite a bit. I did my best
to identify the underlying method being tested, but I may have been
slightly off in some cases.
As a method on the struct type this would not be safe to call without first checking
c.isIntermediateUsedToSignLeaf.
So for now, move this logic to the CAMananger, so that it is always correct.
We were not adding the local signing cert to the CARoot. This commit
fixes that bug, and also adds support for fixing existing CARoot on
upgrade.
Also update the tests for both primary and secondary to be more strict.
Check the SigningKeyID is correct after initialization and rotation.
Previously we believe it was necessary for all code that required ports
to use freeport to prevent conflicts.
https://github.com/dnephin/freeport-test shows that it is actually save
to use port 0 (`127.0.0.1:0`) as long as it is passed directly to
`net.Listen`, and the listener holds the port for as long as it is
needed.
This works because freeport explicitly avoids the ephemeral port range,
and port 0 always uses that range. As you can see from the test output
of https://github.com/dnephin/freeport-test, the two systems never use
overlapping ports.
This commit converts all uses of freeport that were being passed
directly to a net.Listen to use port 0 instead. This allows us to remove
a bit of wrapping we had around httptest, in a couple places.
In d2ab767fef21244e9fe3b9887ea70fc177912381 raftApply was changed to handle this check in
a single place, instad of having every caller check it. It looks like these few places
were missed when I did that clean up.
This commit removes the remaining resp.(error) checks, since they are all no-ops now.
This function is only ever called from operations that have already acquired the state lock, so checking
the value of state can never fail.
This change is being made in preparation for splitting out a separate type for the secondary logic. The
state can't easily be shared, so really only the expored top-level functions should acquire the 'state lock'.
This commit removes the actingSecondaryCA field, and removes the stateLock around it. This field
was acting as a proxy for providerRoot != nil, so replace it with that check instead.
The two methods which called secondarySetCAConfigured already set the state, so checking the
state again at this point will not catch runtime errors (only programming errors, which we can catch with tests).
In general, handling state transitions should be done on the "entrypoint" methods where execution starts, not
in every internal method.
This is being done to remove some unnecessary references to c.state, in preparations for extracting
types for primary/secondary.
This makes it easier to fake, which will allow me to use the ConsulProvider as
an 'external PKI' to test a customer setup where the actual root CA is not
the root we use for the Consul CA.
Replaces a call to the state store to fetch the clusterID with the
clusterID field already available on the built-in provider.
* state: port KV and Tombstone tables to new pattern
* go fmt'ed
* handle wildcards for tombstones
* Fix graveyard ent vs oss
* fix oss compilation error
* add partition to tombstones and kv state store indexes
* refactor to use `indexWithEnterpriseIndexable`
* Apply suggestions from code review
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
* partition `tableSessions` table
* fix sessions to use UUID and fix prefix index
* fix oss build
* clean up unused functions
* fix oss compilation
* add a partition indexer for sessions
* Fix oss to not have partition index
* fix oss tests
* remove unused operations_ent.go and operations_oss.go func
* convert `indexNodeCheck` of `session_checks` table
* partition `indexID` and `indexSession` of `tableSessionChecks`
* remove partition for Checks as it's always use the session partition
* partition sessions index id table
* fix rebase issues
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
* state: port KV and Tombstone tables to new pattern
* go fmt'ed
* handle wildcards for tombstones
* Fix graveyard ent vs oss
* fix oss compilation error
* add partition to tombstones and kv state store indexes
* refactor to use `indexWithEnterpriseIndexable`
* Apply suggestions from code review
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
* add `singleValueID` implementation assertions
* partition `tableSessions` table
* fix sessions to use UUID and fix prefix index
* fix oss build
* clean up unused functions
* fix oss compilation
* add a partition indexer for sessions
* Fix oss to not have partition index
* fix oss tests
* remove unused operations_ent.go and operations_oss.go func
* remove unused const
* convert `IndexID` of `session_checks` table
* convert `indexSession` of `session_checks` table
* convert `indexNodeCheck` of `session_checks` table
* partition `indexID` and `indexSession` of `tableSessionChecks`
* fix oss linter
* fix review comments
* remove partition for Checks as it's always use the session partition
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
The TrustDomain is populated from the Host() method which includes the
hard-coded "consul" domain. This means that despite having an empty
cluster ID, the TrustDomain won't be empty.
There are two restrictions:
- Writes from the primary DC which explicitly target a secondary DC.
- Writes to a secondary DC that do not explicitly target the primary DC.
The first restriction is because the config entry is not supported in
secondary datacenters.
The second restriction is to prevent the scenario where a user writes
the config entry to a secondary DC, the write gets forwarded to the
primary, but then the config entry does not apply in the secondary.
This makes the scope more explicit.
Currently getCARoots could return an empty object with an empty trust
domain before the CA is initialized. This commit returns an error while
there is no CA config or no trust domain.
There could be a CA config and no trust domain because the CA config can
be created in InitializeCA before initialization succeeds.
* state: port KV and Tombstone tables to new pattern
* go fmt'ed
* handle wildcards for tombstones
* Fix graveyard ent vs oss
* fix oss compilation error
* add partition to tombstones and kv state store indexes
* refactor to use `indexWithEnterpriseIndexable`
* Apply suggestions from code review
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
* add `singleValueID` implementation assertions
* partition `tableSessions` table
* fix sessions to use UUID and fix prefix index
* fix oss build
* clean up unused functions
* fix oss compilation
* add a partition indexer for sessions
* Fix oss to not have partition index
* fix oss tests
* remove unused func `prefixIndexFromServiceNameAsString`
* fix test error check
* remove unused operations_ent.go and operations_oss.go func
* remove unused const
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
* state: port KV and Tombstone tables to new pattern
* go fmt'ed
* handle wildcards for tombstones
* Fix graveyard ent vs oss
* fix oss compilation error
* add partition to tombstones and kv state store indexes
* refactor to use `indexWithEnterpriseIndexable`
* partition kvs indexID table
* add `partitionedIndexEntryName` in oss for test purpose
* Apply suggestions from code review
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
* add `singleValueID` implementation assertions
* remove entmeta reference from oss
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: R.B. Boyer <4903+rboyer@users.noreply.github.com>
Previously secondaryInitialize would return nil in this case, which prevented the
deferred initialize from happening, and left the CA in an uninitialized state until a config
update or root rotation.
To fix this I extracted the common parts into the delegate implementation. However looking at this
again, it seems like the handling in secondaryUpdateRoots is impossible, because that function
should never be called before the secondary is initialzied. I beleive we can remove some of that
logic in a follow up.
* add root_cert_ttl option for consul connect, vault ca providers
Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>
* Apply suggestions from code review
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
* add changelog, pr feedback
Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>
* Update .changelog/11428.txt, more docs
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
* Update website/content/docs/agent/options.mdx
Co-authored-by: Kyle Havlovitz <kylehav@gmail.com>
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Co-authored-by: Kyle Havlovitz <kylehav@gmail.com>
These labels should be set by whatever process scrapes Consul (for
prometheus), or by the agent that receives them (for datadog/statsd).
We need to remove them here because the labels are part of the "metric
key", so we'd have to pre-declare the metrics with the labels. We could
do that, but that is extra work for labels that should be added from
elsewhere.
Also renames the closure to be more descriptive.
Prometheus scrapes metrics from each process, so when leadership transfers to a different node
the previous leader would still be reporting the old cached value.
By setting NaN, I believe we should zero-out the value, so that prometheus should only consider the
value from the new leader.
Emit the metric immediately so that after restarting an agent, the new expiry time will be
emitted. This is particularly important when this metric is being monitored, because we want
the alert to resovle itself immediately.
Also fixed a bug that was exposed in one of these metrics. The CARoot can be nil, so we have
to handle that case.
TestSubscribeBackend_IntegrationWithServer_DeliversAllMessages has been
flaking a few times. This commit cleans up the test a bit, and improves
the failure output.
I don't believe this actually fixes the flake, but I'm not able to
reproduce it reliably.
The failure appears to be that the event with Port=0 is being sent in
both the snapshot and as the first event after the EndOfSnapshot event.
Hopefully the improved logging will show us if these are really
duplicate events, or actually different events with different indexes.
partitionAuthorizer.config can be nil if it wasn't provided on calls to
newPartitionAuthorizer outside of the ACLResolver. This usage happens
often in tests.
This commit: adds a nil check when the config is going to be used,
updates non-test usage of NewPolicyAuthorizerWithDefaults to pass a
non-nil config, and dettaches setEnterpriseConf from the ACLResolver.
When issuing cross-partition service discovery requests, ACL filtering
often checks for NodeRead privileges. This is because the common return
type is a CheckServiceNode, which contains node data.
useInDatacenter was used to determine whether the mesh gateway mode of
the upstream should be returned in the discovery chain target. This
commit makes it so that the mesh gateway mode is returned every time,
and it is up to the caller to decide whether mesh gateways should be
watched or used.
Existing config entries prefixed by service- are specific to individual
services. Since this config entry applies to partitions it is being
renamed.
Additionally, the Partition label was changed to Name because using
Partition at the top-level and in the enterprise meta was leading to the
enterprise meta partition being dropped by msgpack.
The code for this was already removed, which suggests this is not actually testing what it claims.
I'm guessing these are still resolving because the tokens are converted to non-legacy tokens?
It seems like this was missing. Previously this was only called by init of ACLs during an upgrade.
Now that legacy ACLs are removed, nothing was calling stop.
Also remove an unused method from client.
This function is only run when the CAManager is a primary. Extracting this function
makes it clear which parts of UpdateConfiguration are run only in the primary and
also makes the cleanup logic simpler. Instead of both a defer and a local var we
can call the cleanup function in two places.
This commit renames functions to use a consistent pattern for identifying the functions that
can only be called when the Manager is run as the primary or secondary.
This is a step toward eventually creating separate types and moving these methods off of CAManager.
Add changelog to document what changed.
Add entry to telemetry section of the website to document what changed
Add docs to the usagemetric endpoint to help document the metrics in code
This commit two test failures:
1. Remove check for "in legacy ACL mode", the actual upgrade will be removed in a following commit.
2. Remove the early WaitForLeader in dc2, because with it the test was
failing with ACL not found.
TestAgentLeaks_Server was reporting a goroutine leak without this. Not sure if it would actually
be a leak in production or if this is due to the test setup, but seems easy enough to call it
this way until we remove legacyACLTokenUpgrade.
We no long need to read the acl serf tag, because servers are always either ACL enabled or
ACL disabled.
We continue to write the tag so that during an upgarde older servers will see the tag.
This commit two test failures:
1. Remove check for "in legacy ACL mode", the actual upgrade will be removed in a following commit.
2. Use the root token in WaitForLeader, because without it the test was
failing with ACL not found.
As part of removing the legacy ACL system ACL upgrading and the flag for
legacy ACLs is removed from Clients.
This commit also removes the 'acls' serf tag from client nodes. The tag is only ever read
from server nodes.
This commit also introduces a constant for the acl serf tag, to make it easier to track where
it is used.
Replace it with an implementation that returns an error, and rename some symbols
to use a Deprecated suffix to make it clear.
Also remove the ACLRequest struct, which is no longer referenced.
When converting these tests from the legacy ACL system to the new RPC endpoints I
initially changed most things to use _prefix rules, because that was equivalent to
the old legacy rules.
This commit modifies a few of those rules to be a bit more specific by replacing the _prefix
rule with a non-prefix one where possible.
In preparation for removing ACL.Apply.
Tests for ACL.Apply, ACL.GetPolicy, and ACL upgrades were removed
because all 3 of those will be removed shortly.
The forth test appears to be for the ACLResolver cache, so the test was moved to the correct
test file, and the name was updated to make it obvious what is being tested.
structs.ACLForceSet was deprecated 4 years ago, it should be safe to remove now.
ACLBootstrapNow was removed in a recent commit. While it is technically possible that a cluster with mixed version
could still attempt a legacy boostrap, we documented that the legacy system was deprecated in 1.4, so no
clusters that are being upgraded should be attempting a legacy boostrap.
* Port consul-enterprise #1123 to OSS
Signed-off-by: Mark Anderson <manderson@hashicorp.com>
* Fixup missing query field
Signed-off-by: Mark Anderson <manderson@hashicorp.com>
* change to re-trigger ci system
Signed-off-by: Mark Anderson <manderson@hashicorp.com>
* move intFromBool to be available for oss
* add expiry indexes
* remove dead code: `TokenExpirationIndex`
* fix remove indexer `TokenExpirationIndex`
* fix rebase issue
* convert `Roles` index to use `indexerSingle`
* split authmethod write indexer to oss and ent
* add index locality
* add locality unit tests
* move intFromBool to be available for oss
* use Bool func
* refactor `aclTokenList` to merge func
Some users are defining routing configurations that do not have associated services. This commit surfaces these configs in the topology visualization. Also fixes a minor internal bug with non-transparent proxy upstream/downstream references.
1) xds and grpc servers:
1.1) to use recovery middleware with callback that prints stack trace to log
1.2) callback turn the panic into a core.Internal error
2) added unit test for grpc server
Remove the error return, so that not handling is not reported as an
error by errcheck. It was returning the error passed as an arg
unmodified so there is no reason to return the same value that was
passed in.
Remove the term upstreams to remove any confusion with the term used in
service mesh.
Remove the AutoDisable field, and replace it with the TTL value, using 0
to indicate the setting is turned off.
Replace "not Before" with "After".
Add some test coverage to show the behaviour is still correct.
This field was never user-configurable. We always overwrote the value with 120s from
NonUserSource. However, we also never copied the value from RuntimeConfig to consul.Config,
So the value in NonUserSource was always ignored, and we used the default value of 30s
set by consul.DefaultConfig.
All of this code is an unnecessary distraction because a user can not actually configure
this value.
This commit removes the fields and uses a constant value instad. Someone attempting to set
acl.disabled_ttl in their config will now get an error about an unknown field, but previously
the value was completely ignored, so the new behaviour seems more correct.
We have to keep this field in the AutoConfig response for backwards compatibility, but the value
will be ignored by the client, so it doesn't really matter what value we set.
Tests only specified one of the fields, but in production we copy the
value from a single place, so we can do the same in tests.
The AutoConfig test broke because of the problem noticed in a previous
commit. The DisabledTTL is not wired up properly so it reports 0s here.
Changed the test to use an explicit value.
Follow up to https://github.com/hashicorp/consul/pull/10737#discussion_r682147950
Renames all variables for acl.Authorizer to use `authz`. Previously some
places used `rule` which I believe was an old name carried over from the
legacy ACL system.
A couple places also used authorizer.
This commit also removes another couple of authorizer nil checks that
are no longer necessary.
The constructor for Server is not at all the appropriate place to be setting default
values for a config struct that was passed in.
In production this value is always set from agent/config. In tests we should set the
default in a test helper.
This field has been unnecessary for a while now. It was always set to the same value
as PrimaryDatacenter. So we can remove the duplicate field and use PrimaryDatacenter
directly.
This change was made by GoLand refactor, which did most of the work for me.
This method suffered from similar naming to a couple other methods on Server, and had not great
re-use (2 callers). By copying a few of the lines into one of the callers we can move the
implementation into the second caller.
Once moved, we can see that ResolveTokenAndDefaultMeta is identical in both Client and Server, and
likely should be further refactored, possibly into ACLResolver.
This change is being made to make ACL resolution easier to trace.
This method was an alias for ACLResolver.ResolveTokenToIdentityAndAuthorizer. By removing the
method that does nothing the code becomes easier to trace.
ACL filtering only needs an authorizer and a logger. We can decouple filtering from
the ACLResolver by passing in the necessary logger.
This change is being made in preparation for moving the ACLResolver into an acl package
filterACLWithAuthorizer could never return an error. This change moves us a little bit
closer to being able to enable errcheck and catch problems caused by unhandled error
return values.
These functions are moved to the one place they are called to improve code locality.
They are being moved out of agent/consul/acl.go in preparation for moving
ACLResolver to an acl package.
These functions are used in only one place. Move the functions next to their one caller
to improve code locality.
This change is being made in preparation for moving the ACLResolver into an
acl package. The moved functions were previously in the same file as the ACLResolver.
By moving them out of that file we may be able to move the entire file
with fewer modifications.
* defer setting the state before returning to avoid being stuck in `INITIALIZING` state
* add changelog
* move comment with the right if statement
* ca: report state transition error from setSTate
* update comment to reflect state transition
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
Follow up to: https://github.com/hashicorp/consul/pull/10738#discussion_r680190210
Previously we were passing an Authorizer that would always allow the
operation, then later checking the authorization using vetServiceTxnOp.
On the surface this seemed strange, but I think it was actually masking
a bug as well. Over time `servicePreApply` was changed to add additional
authorization for `service.Proxy.DestinationServiceName`, but because
we were passing a nil Authorizer, that authorization was not handled on
the txn_endpoint.
`TxnServiceOp.FillAuthzContext` has some special handling in enterprise,
so we need to make sure to continue to use that from the Txn endpoint.
This commit removes the `vetServiceTxnOp` function, and passes in the
`FillAuthzContext` function so that `servicePreApply` can be used by
both the catalog and txn endpoints. This should be much less error prone
and prevent bugs like this in the future.
1. do not emit the metric if Query fails
2. properly check for PrimaryUsersIntermediate, the logic was inverted
Also improve the logging by including the metric name in the log message
* fix state index for `CAOpSetRootsAndConfig` op
* add changelog
* Update changelog
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
* remove the change log as it's not needed
Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
These checks were a bit more involved. They were previously skipping some code paths
when the authorizer was nil. After looking through these it seems correct to remove the
authz == nil check, since it will never evaluate to true.
These case are already impossible conditions, because most of these functions already start
with a check for ACLs being disabled. So the code path being removed could never be reached.
The one other case (ConnectAuthorized) was already changed in a previous commit. This commit
removes an impossible branch because authz == nil can never be true.