Commit Graph

1572 Commits

Author SHA1 Message Date
Daniel Nephin eaaceedf31
Merge pull request #11338 from hashicorp/dnephin/ca-manager-isolate-secondary
ca: clearly identify methods that are primary-only or secondary-only
2021-11-01 14:10:31 -04:00
Daniel Upton a620b6be2e
Support Check-And-Set deletion of config entries (#11419)
Implements #11372
2021-11-01 16:42:01 +00:00
Dhia Ayachi 4d763ef9e6
regenerate expired certs (#11462)
* regenerate expired certs

* add documentation to generate tests certificates
2021-11-01 11:40:16 -04:00
R.B. Boyer 017e9d5ae4
agent: add a clone function for duplicating the serf lan configuration (#11443) 2021-10-28 16:11:26 -05:00
Daniel Nephin 0a19d7fd76 agent: move agent tls metric monitor to a more appropriate place
And add a test for it
2021-10-27 16:26:09 -04:00
Daniel Nephin 1b2144c982 telemetry: set cert expiry metrics to NaN on start
So that followers do not report 0, which would make alerting difficult.
2021-10-27 15:19:25 -04:00
Daniel Nephin a7fcf14c5c telemetry: fix cert expiry metrics by removing labels
These labels should be set by whatever process scrapes Consul (for
prometheus), or by the agent that receives them (for datadog/statsd).

We need to remove them here because the labels are part of the "metric
key", so we'd have to pre-declare the metrics with the labels. We could
do that, but that is extra work for labels that should be added from
elsewhere.

Also renames the closure to be more descriptive.
2021-10-27 15:19:25 -04:00
Daniel Nephin 4300daa2e6 telemetry: only emit leader cert expiry metrics on the servers 2021-10-27 15:19:25 -04:00
Daniel Nephin 9de725c17d telemetry: prevent stale values from cert monitors
Prometheus scrapes metrics from each process, so when leadership transfers to a different node
the previous leader would still be reporting the old cached value.

By setting NaN, I believe we should zero-out the value, so that prometheus should only consider the
value from the new leader.
2021-10-27 15:19:25 -04:00
Daniel Nephin 616cc9b6f8 telemetry: improve cert expiry metrics
Emit the metric immediately so that after restarting an agent, the new expiry time will be
emitted. This is particularly important when this metric is being monitored, because we want
the alert to resovle itself immediately.

Also fixed a bug that was exposed in one of these metrics. The CARoot can be nil, so we have
to handle that case.
2021-10-27 15:19:25 -04:00
Daniel Nephin 24951f0c7e subscribe: attempt to fix a flaky test
TestSubscribeBackend_IntegrationWithServer_DeliversAllMessages has been
flaking a few times. This commit cleans up the test a bit, and improves
the failure output.

I don't believe this actually fixes the flake, but I'm not able to
reproduce it reliably.

The failure appears to be that the event with Port=0 is being sent in
both the snapshot and as the first event after the EndOfSnapshot event.

Hopefully the improved logging will show us if these are really
duplicate events, or actually different events with different indexes.
2021-10-27 15:09:09 -04:00
freddygv 592965d61e Rework acl exports interface 2021-10-27 12:50:39 -06:00
Freddy 9bbeea0432
Merge pull request #11433 from hashicorp/exported-service-acls
[OSS] acl: Expand ServiceRead and NodeRead to account for partition exports
2021-10-27 12:48:08 -06:00
Freddy d8ae915160
Merge pull request #11431 from hashicorp/ap/exports-proxycfg
[OSS] Update partitioned mesh gw handling for connect proxies
2021-10-27 11:27:43 -06:00
Freddy 8e23a6a0cc
Merge pull request #11416 from hashicorp/ap/exports-update
Rename service-exports to partition-exports
2021-10-27 11:27:31 -06:00
freddygv af662c8c1c Avoid mixing named and unnamed params 2021-10-26 23:42:25 -06:00
freddygv 1de62bb0a2 Avoid passing nil config pointer 2021-10-26 23:42:25 -06:00
freddygv 4a2e40aa3c Avoid panic on nil partitionAuthorizer config
partitionAuthorizer.config can be nil if it wasn't provided on calls to
newPartitionAuthorizer outside of the ACLResolver. This usage happens
often in tests.

This commit: adds a nil check when the config is going to be used,
updates non-test usage of NewPolicyAuthorizerWithDefaults to pass a
non-nil config, and dettaches setEnterpriseConf from the ACLResolver.
2021-10-26 23:42:25 -06:00
freddygv 015d85cd74 Update NodeRead for partition-exports
When issuing cross-partition service discovery requests, ACL filtering
often checks for NodeRead privileges. This is because the common return
type is a CheckServiceNode, which contains node data.
2021-10-26 23:42:11 -06:00
Kyle Havlovitz afb0976eac acl: pass PartitionInfo through ent ACLConfig 2021-10-26 23:41:52 -06:00
Kyle Havlovitz 56d1858c4a acl: Expand ServiceRead logic to look at service-exports for cross-partition 2021-10-26 23:41:32 -06:00
freddygv 3966677aaf Finish removing useInDatacenter 2021-10-26 23:36:01 -06:00
freddygv feaebde1f1 Remove useInDatacenter from disco chain requests
useInDatacenter was used to determine whether the mesh gateway mode of
the upstream should be returned in the discovery chain target. This
commit makes it so that the mesh gateway mode is returned every time,
and it is up to the caller to decide whether mesh gateways should be
watched or used.
2021-10-26 23:35:21 -06:00
R.B. Boyer e27e58c6cc
agent: refactor the agent delegate interface to be partition friendly (#11429) 2021-10-26 15:08:55 -05:00
Chris S. Kim 27f8a85664
agent: Ensure partition is considered in agent endpoints (#11427) 2021-10-26 15:20:57 -04:00
freddygv c3e381b4c1 Rename service-exports to partition-exports
Existing config entries prefixed by service- are specific to individual
services. Since this config entry applies to partitions it is being
renamed.

Additionally, the Partition label was changed to Name because using
Partition at the top-level and in the enterprise meta was leading to the
enterprise meta partition being dropped by msgpack.
2021-10-25 17:58:48 -06:00
Daniel Nephin f24bad2a52
Merge pull request #11232 from hashicorp/dnephin/acl-legacy-remove-docs
acl: add docs and changelog for the removal of the legacy ACL system
2021-10-25 18:38:00 -04:00
Daniel Nephin f7cdd210fe Update agent/consul/acl_client.go
Co-authored-by: Freddy <freddygv@users.noreply.github.com>
2021-10-25 17:25:14 -04:00
Daniel Nephin 732b841dd7 state: remove support for updating legacy ACL tokens 2021-10-25 17:25:14 -04:00
Daniel Nephin 76b007dacd acl: remove init check for legacy anon token
This token should always already be migrated from a previous version.
2021-10-25 17:25:14 -04:00
Daniel Nephin 8ae6ee4e36 acl: remove legacy parameter to ACLDatacenter
It is no longer used now that legacy ACLs have been removed.
2021-10-25 17:25:14 -04:00
Daniel Nephin d778113773 acl: remove ACLTokenTypeManagement 2021-10-25 17:25:14 -04:00
Daniel Nephin 88c6aeea34 acl: remove legacy arg to store.ACLTokenSet
And remove the tests for legacy=true
2021-10-25 17:25:14 -04:00
Daniel Nephin b31a7fc498 acl: remove EmbeddedPolicy
This method is no longer. It only existed for legacy tokens, which are no longer supported.
2021-10-25 17:25:14 -04:00
Daniel Nephin ceaa36f983 acl: remove tests for resolving legacy tokens
The code for this was already removed, which suggests this is not actually testing what it claims.

I'm guessing these are still resolving because the tokens are converted to non-legacy tokens?
2021-10-25 17:25:14 -04:00
Daniel Nephin a46e3bd2fc acl: stop replication on leadership lost
It seems like this was missing. Previously this was only called by init of ACLs during an upgrade.
Now that legacy ACLs are  removed, nothing was calling stop.

Also remove an unused method from client.
2021-10-25 17:24:12 -04:00
Daniel Nephin 15cd8c7ab8 Remove incorrect TODO 2021-10-25 17:20:06 -04:00
Daniel Nephin 589b238374 acl: move the legacy ACL struct to the one package where it is used
It is now only used for restoring snapshots. We can remove it in phase 2.
2021-10-25 17:20:06 -04:00
Daniel Nephin 0ba5d0afcd acl: remove most of the rest of structs/acl_legacy.go 2021-10-25 17:20:06 -04:00
FFMMM 6433a57d3c
fix autopilot_failure_tolerance, add autopilot metrics test case (#11399)
Signed-off-by: FFMMM <FFMMM@users.noreply.github.com>
2021-10-25 10:55:59 -07:00
Dhia Ayachi 75f69a98a2
fix leadership transfer on leave suggestions (#11387)
* add suggestions

* set isLeader to false when leadership transfer succeed
2021-10-21 14:02:26 -04:00
Dhia Ayachi 2d1ac1f7d0
try to perform a leadership transfer when leaving (#11376)
* try to perform a leadership transfer when leaving

* add a changelog
2021-10-21 12:44:31 -04:00
Kyle Havlovitz 752a285552 Add new service-exports config entry 2021-10-20 12:24:18 -07:00
R.B. Boyer 55dd52cb17
acl: small OSS refactors to help ensure that auth methods with namespace rules work with partitions (#11323) 2021-10-14 15:38:05 -05:00
freddygv f76fddb28e Use stored entmeta to fill authzContext 2021-10-14 08:57:40 -06:00
freddygv bdf3e951f8 Ensure partition is handled by auto-encrypt 2021-10-14 08:32:45 -06:00
Chris S. Kim 0a6d683c84
Update Intentions.List with partitions (#11299) 2021-10-13 10:47:12 -04:00
Connor 2cd80e5f66
Merge pull request #11222 from hashicorp/clly/service-mesh-metrics
Start tracking connect service mesh usage metrics
2021-10-11 14:35:03 -05:00
Connor Kelly 2119351f77
Replace fmt.Sprintf with function 2021-10-11 12:43:38 -05:00
Daniel Nephin 571acb872e ca: extract primaryUpdateRootCA
This function is only run when the CAManager is a primary. Extracting this function
makes it clear which parts of UpdateConfiguration are run only in the primary and
also makes the cleanup logic simpler. Instead of both a defer and a local var we
can call the cleanup function in two places.
2021-10-10 15:26:55 -04:00
Daniel Nephin a65594d8ec ca: rename functions to use a primary or secondary prefix
This commit renames functions to use a consistent pattern for identifying the functions that
can only be called when the Manager is run as the primary or secondary.

This is a step toward eventually creating separate types and moving these methods off of CAManager.
2021-10-10 15:26:55 -04:00
Daniel Nephin 20f0efd8c1 ca: make receiver variable name consistent
Every other method uses c not ca
2021-10-10 15:26:55 -04:00
FFMMM 7f28301212
fix consul_autopilot_healthy metric emission (#11231)
https://github.com/hashicorp/consul/issues/10730
2021-10-08 10:31:50 -07:00
Connor Kelly 38986d6371
Rename ConfigUsageEnterprise to EnterpriseConfigEntryUsage 2021-10-08 10:53:34 -05:00
Connor Kelly 76b3c4ed3c
Rename and prefix ConfigEntry in Usage table
Rename ConfigUsage functions to ConfigEntry

prefix ConfigEntry kinds with the ConfigEntry table name to prevent
potential conflicts
2021-10-07 16:19:55 -05:00
Connor Kelly 0e39a7a333
Add connect specific prefix to Usage table
Ensure that connect Kind's are separate from ConfigEntry Kind's to
prevent miscounting
2021-10-07 16:16:23 -05:00
Daniel Nephin 51e498717f docs: add notice that legacy ACLs have been removed.
Add changelog

Also remove a metric that is no longer emitted that was missed in a
previous step.
2021-10-05 18:30:22 -04:00
Connor Kelly f9ba7c39b5
Add changelog, website and metric docs
Add changelog to document what changed.
Add entry to telemetry section of the website to document what changed
Add docs to the usagemetric endpoint to help document the metrics in code
2021-10-05 13:34:24 -05:00
Daniel Nephin e03b7e4c68
Merge pull request #11182 from hashicorp/dnephin/acl-legacy-remove-upgrade
acl: remove upgrade from legacy, start in non-legacy mode
2021-10-04 17:25:39 -04:00
Daniel Nephin b9f0014d70 acl: remove updateEnterpriseSerfTags
The only remaining caller is a test helper, and the tests don't use the enterprise gossip
pools.
2021-10-04 17:01:51 -04:00
Daniel Nephin 5ac360b22d
Merge pull request #11126 from hashicorp/dnephin/acl-legacy-remove-resolve-and-get-policy
acl: remove ACL.GetPolicy RPC endpoint and ACLResolver.resolveTokenLegacy
2021-10-04 16:29:51 -04:00
Connor Kelly ed5693b537
Add metrics to count the number of service-mesh config entries 2021-10-04 14:50:17 -05:00
Connor Kelly 9c487389cf
Add metrics to count connect native service mesh instances
This will add the counts of the service mesh instances tagged by
whether or not it is connect native
2021-10-04 14:37:05 -05:00
Connor Kelly 8000ea45ca
Add metrics to count service mesh Kind instance counts
This will add the counts of service mesh instances tagged by the
different ServiceKind's.
2021-10-04 14:36:59 -05:00
Daniel Nephin b6435259c3 acl: fix test failures caused by remocving legacy ACLs
This commit two test failures:

1. Remove check for "in legacy ACL mode", the actual upgrade will be removed in a following commit.
2. Remove the early WaitForLeader in dc2, because with it the test was
   failing with ACL not found.
2021-10-01 18:03:10 -04:00
Dhia Ayachi 8bd52995d1
fix token list by auth method (#11196)
* add tests to OIDC authmethod and fix entMeta when retrieving auth-methods

* fix oss compilation error
2021-10-01 12:00:43 -04:00
Daniel Nephin ec935a2486 acl: call stop for the upgrade goroutine when done
TestAgentLeaks_Server was reporting a goroutine leak without this. Not sure if it would actually
be a leak in production or if this is due to the test setup, but seems easy enough to call it
this way until we remove legacyACLTokenUpgrade.
2021-09-29 17:36:43 -04:00
Daniel Nephin 0c077d0527 acl: only run startACLUpgrade once
Since legacy ACL tokens can no longer be created we only need to run this upgrade a single
time when leadership is estalbished.
2021-09-29 16:22:01 -04:00
Daniel Nephin f21097beda acl: remove reading of serf acl tags
We no long need to read the acl serf tag, because servers are always either ACL enabled or
ACL disabled.

We continue to write the tag so that during an upgarde older servers will see the tag.
2021-09-29 15:45:11 -04:00
Daniel Nephin b866e3c4f4 acl: fix test failure
For some reason removing legacy ACL upgrade requires using an ACL token now
for this WaitForLeader.
2021-09-29 15:21:30 -04:00
Daniel Nephin ebb2388605 acl: remove legacy ACL upgrades from Server
As part of removing the legacy ACL system
2021-09-29 15:19:23 -04:00
Daniel Nephin 41a97360ca acl: fix test failures caused by remocving legacy ACLs
This commit two test failures:

1. Remove check for "in legacy ACL mode", the actual upgrade will be removed in a following commit.
2. Use the root token in WaitForLeader, because without it the test was
   failing with ACL not found.
2021-09-29 15:15:50 -04:00
Daniel Nephin b73b68d696 acl: remove ACL.GetPolicy endpoint and resolve legacy acls
And all code that was no longer used once those two were removed.
2021-09-29 14:33:19 -04:00
Daniel Nephin b8da06a34d acl: remove ACL upgrading from Clients
As part of removing the legacy ACL system ACL upgrading and the flag for
legacy ACLs is removed from Clients.

This commit also removes the 'acls' serf tag from client nodes. The tag is only ever read
from server nodes.

This commit also introduces a constant for the acl serf tag, to make it easier to track where
it is used.
2021-09-29 14:02:38 -04:00
Daniel Nephin 33a5448604
Merge pull request #11136 from hashicorp/dnephin/acl-resolver-fix-default-authz
acl: fix default Authorizer for down_policy extend-cache/async-cache
2021-09-29 13:45:12 -04:00
Daniel Nephin 2995ac61f2 acl: remove the last of the legacy FSM
Replace it with an implementation that returns an error, and rename some symbols
to use a Deprecated suffix to make it clear.

Also remove the ACLRequest struct, which is no longer referenced.
2021-09-29 12:42:23 -04:00
Daniel Nephin a8358f7575 acl: remove bootstrap-init FSM operation 2021-09-29 12:42:23 -04:00
Daniel Nephin ea2e0ad2ec acl: remove initializeLegacyACL from leader init 2021-09-29 12:42:23 -04:00
Daniel Nephin 4e36442583 acl: remove ACLDelete FSM command, and state store function
These are no longer used now that ACL.Apply has been removed.
2021-09-29 12:42:23 -04:00
Daniel Nephin 7e37c9a765 acl: remove legacy field to ACLBoostrap 2021-09-29 12:42:23 -04:00
Daniel Nephin d4c48a3f23
Merge pull request #11101 from hashicorp/dnephin/acl-legacy-remove-rpc-2
acl: remove legacy ACL.Apply RPC
2021-09-29 12:23:55 -04:00
Daniel Nephin 69a83aefcf
Merge pull request #11177 from hashicorp/dnephin/remove-entmeta-methods
structs: remove EnterpriseMeta helper methods
2021-09-29 12:08:07 -04:00
Daniel Nephin acb62aa896
Merge pull request #10986 from hashicorp/dnephin/acl-legacy-remove-rpc
acl: remove legacy ACL RPC - part 1
2021-09-29 12:04:09 -04:00
Daniel Nephin 1bc07c5166 structs: rename the last helper method.
This one gets used a bunch, but we can rename it to make the behaviour more obvious.
2021-09-29 11:48:38 -04:00
Daniel Nephin 93b3e110b6 structs: remove another helper
We already have a helper funtion.
2021-09-29 11:48:03 -04:00
Chris S. Kim 3f79aaf509
Cleanup unnecessary normalizing method (#11169) 2021-09-28 15:31:12 -04:00
Daniel Nephin 4ed9476a61
Merge pull request #11084 from krastin/krastin-autopilot-loggingtypo
Fix a tiny typo in logging in autopilot.go
2021-09-28 15:11:11 -04:00
Daniel Nephin 30fe14eed3 acl: fix default authorizer for down_policy
This was causing a nil panic because a nil authorizer is no longer valid after the cleanup done
in https://github.com/hashicorp/consul/pull/10632.
2021-09-23 18:12:22 -04:00
Daniel Nephin a6a7069ecf Remove t.Parallel from TestACLResolver_DownPolicy
These tests run in under 10ms, t.Parallel does nothing but slow them down and
make failures harder to debug when one panics.
2021-09-23 18:12:22 -04:00
Dhia Ayachi 4505cb2920
Refactor table index acl phase 2 (#11133)
* extract common methods from oss and ent

* remove unreachable code

* add missing normalize for binding rules

* fix oss to use Query
2021-09-23 15:26:09 -04:00
Dhia Ayachi ebe333b947
Refactor table index (#11131)
* convert tableIndex to use the new pattern

* make `indexFromString` available for oss as well

* refactor `indexUpdateMaxTxn`
2021-09-23 11:06:23 -04:00
Daniel Nephin 3e6dc2a843 acl: remove ACL.Apply
As part of removing the legacy ACL system.
2021-09-22 18:28:08 -04:00
Daniel Nephin 2ce64e2837 acl: made acl rules in tests slightly more specific
When converting these tests from the legacy ACL system to the new RPC endpoints I
initially changed most things to use _prefix rules, because that was equivalent to
the old legacy rules.

This commit modifies a few of those rules to be a bit more specific by replacing the _prefix
rule with a non-prefix one where possible.
2021-09-22 18:24:56 -04:00
Mark Anderson c87d57bfeb
partitions/authmethod-index work from enterprise (#11056)
* partitions/authmethod-index work from enterprise

Signed-off-by: Mark Anderson <manderson@hashicorp.com>
2021-09-22 13:19:20 -07:00
R.B. Boyer ba13416b57
grpc: strip local ACL tokens from RPCs during forwarding if crossing datacenters (#11099)
Fixes #11086
2021-09-22 13:14:26 -05:00
Connor bc04a155fb
Merge pull request #11090 from hashicorp/clly/kv-usage-metrics
Add KVUsage to consul state usage metrics
2021-09-22 11:26:56 -05:00
Connor Kelly bfe6b64ca7
Strip out go 1.17 bits 2021-09-22 11:04:48 -05:00
Daniel Nephin b40bdc9e98 acl: remove remaining tests that use ACL.Apply
In preparation for removing ACL.Apply.

Tests for ACL.Apply, ACL.GetPolicy, and ACL upgrades were removed
because all 3 of those will be removed shortly.

The forth test appears to be for the ACLResolver cache, so the test was moved to the correct
test file, and the name was updated to make it obvious what is being tested.
2021-09-21 19:35:26 -04:00
Daniel Nephin ab91d254a3 fsm: restore the legacy commands
and emit a helpful error message.
2021-09-21 18:35:12 -04:00
Daniel Nephin 0180dd67ff Convert tests to the new ACL system
In preparation for removing ACL.Apply
2021-09-21 18:35:12 -04:00