open-nomad/nomad/state
Tim Gross 903b5baaa4
keyring: safely handle missing keys and restore GC (#15092)
When replication of a single key fails, the replication loop breaks early and
therefore keys that fall later in the sorting order will never get
replicated. This is particularly a problem for clusters impacted by the bug that
caused #14981 and that were later upgraded; the keys that were never replicated
can now never be replicated, and so we need to handle them safely.

Included in the replication fix:
* Refactor the replication loop so that each key is replicated in a function
  call that returns an error, to make the workflow clearer and reduce nesting.
  Log the error and continue.
* Improve stability of keyring replication tests. We no longer block leadership
  on initializing the keyring, so there's a race condition in the keyring tests
  where we can test for the existence of the root key before the keyring has
  been initialized. Change this to an "eventually" test.

But these fixes aren't enough to fix #14981, because affected clusters will
still see an error once a second complaining about the missing key. So we also
need to fix keyring GC so the keys can be removed from the state store. Now we'll store the
key ID used to sign a workload identity in the Allocation, and we'll index the
Allocation table on that so we can track whether any live Allocation was signed
with a particular key ID.
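
As a rough sketch of the GC check this enables, assuming an in-memory map in place of the real state store index: the `alloc` type, its `SigningKeyID` and `Terminal` fields, and the `allocIndex` helpers below are hypothetical names for illustration and are not taken from the Nomad source.

```go
package main

import "fmt"

// alloc is a hypothetical, trimmed-down allocation record.
type alloc struct {
	ID           string
	SigningKeyID string // ID of the key that signed the workload identity
	Terminal     bool   // true once the allocation is no longer live
}

// allocIndex maps a signing key ID to the allocations signed with it,
// standing in for a secondary index on the allocation table.
type allocIndex map[string][]alloc

// buildIndex groups allocations by signing key ID. Allocations without a
// recorded key ID are left out of the index.
func buildIndex(allocs []alloc) allocIndex {
	idx := allocIndex{}
	for _, a := range allocs {
		if a.SigningKeyID == "" {
			continue
		}
		idx[a.SigningKeyID] = append(idx[a.SigningKeyID], a)
	}
	return idx
}

// keyInUse reports whether any live allocation was signed with keyID.
// A GC pass would only consider removing keys for which this is false.
func (idx allocIndex) keyInUse(keyID string) bool {
	for _, a := range idx[keyID] {
		if !a.Terminal {
			return true
		}
	}
	return false
}

func main() {
	idx := buildIndex([]alloc{
		{ID: "alloc-1", SigningKeyID: "key-old", Terminal: true},
		{ID: "alloc-2", SigningKeyID: "key-new", Terminal: false},
	})

	fmt.Println(idx.keyInUse("key-old")) // prints false
	fmt.Println(idx.keyInUse("key-new")) // prints true
}
```

Looking allocations up by key ID avoids scanning every allocation for each key considered during GC.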
2022-11-01 15:00:50 -04:00
indexer core: add ACL token expiry state, struct, and RPC handling. (#13718) 2022-07-13 15:40:34 +02:00
paginator build: run gofmt on all go source files 2022-08-16 11:14:11 -05:00
autopilot.go autopilot: correctly return errors within state functions. (#12714) 2022-04-21 08:54:50 +02:00
autopilot_test.go ci: swap ci parallelization for unconstrained gomaxprocs 2022-03-15 12:58:52 -05:00
deployment_events_test.go ci: swap ci parallelization for unconstrained gomaxprocs 2022-03-15 12:58:52 -05:00
events.go acl: add ACL roles to event stream topic and resolve policies. (#14923) 2022-10-20 09:43:35 +02:00
events_test.go acl: add ACL roles to event stream topic and resolve policies. (#14923) 2022-10-20 09:43:35 +02:00
iterator.go csi: use node MaxVolumes during scheduling (#7565) 2020-03-31 17:16:47 -04:00
schema.go keyring: safely handle missing keys and restore GC (#15092) 2022-11-01 15:00:50 -04:00
schema_test.go ci: swap ci parallelization for unconstrained gomaxprocs 2022-03-15 12:58:52 -05:00
state_changes.go events: Use single eventsFromChanges func (#9281) 2020-11-05 13:06:52 -08:00
state_store.go keyring: safely handle missing keys and restore GC (#15092) 2022-11-01 15:00:50 -04:00
state_store_acl.go acl: add ACL roles to event stream topic and resolve policies. (#14923) 2022-10-20 09:43:35 +02:00
state_store_acl_test.go cleanup: rename Equals to Equal for consistency (#14759) 2022-10-10 09:28:46 -05:00
state_store_oss.go gofmt all the files 2021-10-01 10:14:28 -04:00
state_store_restore.go Merge branch 'main' into f-gh-13120-sso-umbrella-merged-main 2022-08-30 08:59:13 +01:00
state_store_restore_test.go Merge branch 'main' into f-gh-13120-sso-umbrella-merged-main 2022-08-30 08:59:13 +01:00
state_store_service_regisration_test.go cleanup: rename Equals to Equal for consistency (#14759) 2022-10-10 09:28:46 -05:00
state_store_service_registration.go cleanup: rename Equals to Equal for consistency (#14759) 2022-10-10 09:28:46 -05:00
state_store_test.go refactor eval delete safety check (#15070) 2022-10-28 09:10:33 -04:00
state_store_variables.go cleanup: rename Equals to Equal for consistency (#14759) 2022-10-10 09:28:46 -05:00
state_store_variables_oss.go rename SecureVariables to Variables throughout 2022-08-26 16:06:24 -04:00
state_store_variables_test.go cleanup: rename Equals to Equal for consistency (#14759) 2022-10-10 09:28:46 -05:00
testing.go CSI: allow updates to volumes on re-registration (#12167) 2022-03-07 11:06:59 -05:00