open-vault/vault
Vishal Nayak 3e55e79a3f
Autopilot: Server Stabilization, State and Dead Server Cleanup (#10856)
* k8s doc: update for 0.9.1 and 0.8.0 releases (#10825)

* k8s doc: update for 0.9.1 and 0.8.0 releases

* Update website/content/docs/platform/k8s/helm/configuration.mdx

Co-authored-by: Theron Voran <tvoran@users.noreply.github.com>

* Autopilot initial commit

* Move autopilot related backend implementations to its own file

* Abstract promoter creation

* Add nil check for health

* Add server state oss no-ops

* Config ext stub for oss

* Make way for non-voters

* s/health/state

* s/ReadReplica/NonVoter

* Add synopsis and description

* Remove struct tags from AutopilotConfig

* Use var for config storage path

* Handle nil config when reading

* Enable testing autopilot by using inmem cluster

* First passing test

* Only report the server as known if it is present in raft config

* Autopilot defaults to on for all existing and new clusters

* Add locking to some functions

* Persist initial config

* Clarify the command usage doc

* Add health metric for each node

* Fix audit logging issue

* Don't set DisablePerformanceStandby to true in test

* Use node id label for health metric

* Log updates to autopilot config

* Less aggressively consume config loading failures

* Return a mutable config

* Return early from known servers if raft config is unable to be pulled

* Update metrics name

* Reduce log level for potentially noisy log

* Add knob to disable autopilot

* Don't persist if default config is in use

* Autopilot: Dead server cleanup (#10857)

* Dead server cleanup

* Initialize channel in any case

* Fix a bunch of tests

* Fix panic

* Add follower locking in heartbeat tracker

* Add LastContactFailureThreshold to config

* Add log when marking node as dead

* Update follower state locking in heartbeat tracker

* Avoid follower states being nil

* Pull test to its own file

* Add execution status to state response

* Optionally enable autopilot in some tests

* Updates

* Added API function to fetch autopilot configuration

* Add test for default autopilot configuration

* Configuration tests

* Add State API test

* Update test

* Added TestClusterOptions.PhysicalFactoryConfig

* Update locking

* Adjust locking in heartbeat tracker

* s/last_contact_failure_threshold/left_server_last_contact_threshold

* Add disabling autopilot as a core config option

* Disable autopilot in some tests

* s/left_server_last_contact_threshold/dead_server_last_contact_threshold

* Set the lastheartbeat of followers to now when setting up active node

* Don't use config defaults from CLI command

* Remove config file support

* Remove HCL test as well

* Persist only supplied config; merge supplied config with default to operate

* Use pointer to structs for storing follower information

* Test update

* Retrieve non voter status from configbucket and set it up when a node comes up

* Manage desired suffrage

* Consider bucket being created already

* Move desired suffrage to its own entry

* s/DesiredSuffrageKey/LocalNodeConfigKey

* s/witnessSuffrage/recordSuffrage

* Fix test compilation

* Handle local node config post a snapshot install

* Commit to storage first; then record suffrage in fsm

* No need to handle the local node config being nil post snapshot restore

* Reconcile autopilot config when a new leader takes over duty

* Grab fsm lock when recording suffrage

* s/Suffrage/DesiredSuffrage in FollowerState

* Instantiate autopilot only in leader

* Default to old ways in more scenarios

* Make API gracefully handle 404

* Address some feedback

* Make IsDead an atomic.Value

* Simplify follower heartbeat tracking

* Use uber.atomic

* Don't have multiple causes for having autopilot disabled

* Don't remove node from follower states if we fail to remove the dead server

* Autopilot server removals map (#11019)

* Don't remove node from follower states if we fail to remove the dead server

* Use map to track dead server removals

* Use lock and map

* Use delegate lock

* Adjust when to remove entry from map

* Only hold the lock while accessing map

* Fix race

* Don't set default min_quorum

* Fix test

* Ensure follower states is not nil before starting autopilot

* Fix race

Co-authored-by: Jason O'Donnell <2160810+jasonodonnell@users.noreply.github.com>
Co-authored-by: Theron Voran <tvoran@users.noreply.github.com>
2021-03-03 13:59:50 -05:00
activity Backport some OSS changes (#10267) 2020-10-29 16:47:34 -07:00
cluster Port changes from enterprise lease fix (#10020) 2020-09-22 14:47:13 -07:00
external_tests Autopilot: Server Stabilization, State and Dead Server Cleanup (#10856) 2021-03-03 13:59:50 -05:00
quotas Fix error in log; add additional log on conflicting quotas. (#10888) 2021-02-10 12:24:35 -06:00
replication
seal Same seal migration oss (#10224) 2020-10-23 14:16:04 -04:00
acl.go
acl_test.go Update HCL dependency to fix ParseACLPolicy error on invalid syntax (#10156) 2020-11-30 09:17:33 -06:00
acl_util.go
activity_log.go Add Partial Month Client Count API for Activity Log (#11022) 2021-03-01 16:15:59 -07:00
activity_log_test.go Add Partial Month Client Count API for Activity Log (#11022) 2021-03-01 16:15:59 -07:00
activity_log_testing_util.go Add Partial Month Client Count API for Activity Log (#11022) 2021-03-01 16:15:59 -07:00
activity_log_util.go Backport some OSS changes (#10267) 2020-10-29 16:47:34 -07:00
audit.go Send a test message before committing a new audit device. (#10520) 2020-12-16 16:00:32 -06:00
audit_broker.go
audit_test.go Shutdown Test Cores when Tests Complete (#10912) 2021-02-12 13:04:48 -07:00
audited_headers.go
audited_headers_test.go
auth.go Backport some OSS changes (#10267) 2020-10-29 16:47:34 -07:00
auth_test.go Shutdown Test Cores when Tests Complete (#10912) 2021-02-12 13:04:48 -07:00
barrier.go OSS side barrier encryption tracking and automatic rotation (#11007) 2021-02-25 14:27:25 -06:00
barrier_access.go
barrier_aes_gcm.go OSS side barrier encryption tracking and automatic rotation (#11007) 2021-02-25 14:27:25 -06:00
barrier_aes_gcm_test.go OSS side barrier encryption tracking and automatic rotation (#11007) 2021-02-25 14:27:25 -06:00
barrier_test.go OSS side barrier encryption tracking and automatic rotation (#11007) 2021-02-25 14:27:25 -06:00
barrier_view.go
barrier_view_test.go
barrier_view_util.go
capabilities.go
capabilities_test.go
cluster.go OSS parts of the new client controlled consistency feature (#10974) 2021-02-24 06:58:10 -05:00
cluster_test.go Shutdown Test Cores when Tests Complete (#10912) 2021-02-12 13:04:48 -07:00
core.go Autopilot: Server Stabilization, State and Dead Server Cleanup (#10856) 2021-03-03 13:59:50 -05:00
core_metrics.go OSS side barrier encryption tracking and automatic rotation (#11007) 2021-02-25 14:27:25 -06:00
core_metrics_test.go Fix crash when KV store has a zero-length key. (#9881) 2020-09-02 17:43:44 -05:00
core_test.go Shutdown Test Cores when Tests Complete (#10912) 2021-02-12 13:04:48 -07:00
core_util.go OSS parts of the new client controlled consistency feature (#10974) 2021-02-24 06:58:10 -05:00
cors.go
counters.go Move "counters" path to the logical system's local path list. (#10314) 2020-11-02 21:59:55 -06:00
counters_test.go Fix token counters test (#7867) 2019-11-12 13:33:28 -05:00
deadlock.go Add option to detect deadlocks in Core.stateLock using build tag deadlock (#8524) 2020-03-10 16:01:20 -04:00
dynamic_system_view.go OSS parts of the new client controlled consistency feature (#10974) 2021-02-24 06:58:10 -05:00
dynamic_system_view_test.go Add user configurable password policies available to secret engines (#8637) 2020-05-27 12:28:00 -06:00
expiration.go expiration: Add a few metrics to measure revoke queue lengths (#10955) 2021-02-26 16:00:39 -08:00
expiration_integ_test.go
expiration_test.go Vault-1403 Switch Expiration Manager to use Fairsharing Backpressure (#1709) (#10932) 2021-02-17 14:30:27 -08:00
expiration_util.go
generate_root.go Same seal migration oss (#10224) 2020-10-23 14:16:04 -04:00
generate_root_recovery.go Same seal migration oss (#10224) 2020-10-23 14:16:04 -04:00
generate_root_test.go Shamir seals now come in two varieties: legacy and new-style. (#7694) 2019-10-18 14:46:00 -04:00
ha.go core: Record the time a node became active (#10489) 2020-12-11 16:50:19 -08:00
ha_test.go fix deadlock on core state lock (#10456) 2020-12-10 06:50:11 -05:00
identity_lookup.go
identity_lookup_test.go
identity_store.go OSS parts of the new client controlled consistency feature (#10974) 2021-02-24 06:58:10 -05:00
identity_store_aliases.go
identity_store_aliases_test.go
identity_store_entities.go OSS parts of the new client controlled consistency feature (#10974) 2021-02-24 06:58:10 -05:00
identity_store_entities_test.go VAULT-417: check expired context in entity API (#1445) (#9925) 2020-09-10 16:31:32 -06:00
identity_store_group_aliases.go
identity_store_group_aliases_test.go
identity_store_groups.go Fix use of identity/group endpoint to edit group by name (#10812) 2021-01-29 16:50:08 -06:00
identity_store_groups_test.go Fix use of identity/group endpoint to edit group by name (#10812) 2021-01-29 16:50:08 -06:00
identity_store_oidc.go Fix identity token caching (#8412) 2020-02-26 15:56:19 -05:00
identity_store_oidc_test.go Run CI tests in docker instead of a machine. (#8948) 2020-09-15 10:01:26 -04:00
identity_store_oidc_util.go Fix identity token caching (#8412) 2020-02-26 15:56:19 -05:00
identity_store_schema.go
identity_store_structs.go Fix identity case sensitivity loading in secondary cluster (#7327) 2019-09-30 10:27:25 -04:00
identity_store_test.go Fix a race caused by assignment to core.metricSink (#9560) 2020-07-22 13:52:10 -04:00
identity_store_upgrade.go
identity_store_util.go OSS parts of the new client controlled consistency feature (#10974) 2021-02-24 06:58:10 -05:00
init.go Be consistent with how we report init status. (#10498) 2020-12-08 13:55:34 -05:00
init_test.go Shutdown Test Cores when Tests Complete (#10912) 2021-02-12 13:04:48 -07:00
keyring.go Two minor changes not reflected OSS side (#11020) 2021-02-26 14:23:56 -06:00
keyring_test.go
lock.go Add option to detect deadlocks in Core.stateLock using build tag deadlock (#8524) 2020-03-10 16:01:20 -04:00
logical_cubbyhole.go Reject read and write requests to cubbyhole with an empty path (#8971) 2020-05-11 14:15:36 -05:00
logical_cubbyhole_test.go
logical_passthrough.go
logical_passthrough_test.go
logical_raw.go Run go fmt (#7823) 2019-11-07 08:54:34 -08:00
logical_system.go Add Partial Month Client Count API for Activity Log (#11022) 2021-03-01 16:15:59 -07:00
logical_system_activity.go Add Partial Month Client Count API for Activity Log (#11022) 2021-03-01 16:15:59 -07:00
logical_system_helpers.go Fix remove peers check (#10758) 2021-01-25 14:20:46 -05:00
logical_system_integ_test.go Fix wrong err return value in plugin reload status command (#9348) 2020-06-30 13:33:30 -05:00
logical_system_paths.go OSS side barrier encryption tracking and automatic rotation (#11007) 2021-02-25 14:27:25 -06:00
logical_system_pprof.go sys/pprof: fix pprof index description (#7564) 2019-10-03 17:02:41 -07:00
logical_system_quotas.go Fix quota enforcing old path issue (#10689) 2021-02-09 05:46:09 -05:00
logical_system_raft.go Autopilot: Server Stabilization, State and Dead Server Cleanup (#10856) 2021-03-03 13:59:50 -05:00
logical_system_test.go OSS side barrier encryption tracking and automatic rotation (#11007) 2021-02-25 14:27:25 -06:00
logical_system_util.go Move the declaration to a OSS build tag file to not have it collide w… (#10750) 2021-01-25 09:35:19 -05:00
mount.go Backport some OSS changes (#10267) 2020-10-29 16:47:34 -07:00
mount_test.go Shutdown Test Cores when Tests Complete (#10912) 2021-02-12 13:04:48 -07:00
mount_util.go Port filtered paths changes back to OSS (#7741) 2019-10-27 13:30:38 -07:00
namespaces.go Backport some OSS changes (#10267) 2020-10-29 16:47:34 -07:00
plugin_catalog.go DBPW - Copy newdbplugin package to dbplugin/v5 (#10151) 2020-10-15 13:20:12 -06:00
plugin_catalog_test.go Adding snowflake as a bundled database secrets plugin (#10603) 2021-01-07 09:30:24 -08:00
plugin_reload.go Fix wrong err return value in plugin reload status command (#9348) 2020-06-30 13:33:30 -05:00
policy.go Add identity templating helper to sdk/framework (#8088) 2020-01-06 10:16:52 -08:00
policy_store.go Add maximum amount of random entropy requested (#7144) 2019-07-24 18:22:23 -07:00
policy_store_test.go core/policy & core/token: Remove Dead Test Code (#7774) 2019-11-04 10:36:07 +01:00
policy_store_util.go
policy_test.go
policy_util.go
raft.go Autopilot: Server Stabilization, State and Dead Server Cleanup (#10856) 2021-03-03 13:59:50 -05:00
rekey.go Fix panic in RekeyVerifyRestart (#9930) (#10099) 2020-10-07 11:06:17 -07:00
rekey_test.go Shutdown Test Cores when Tests Complete (#10912) 2021-02-12 13:04:48 -07:00
request_forwarding.go Eliminate global that caused race tests to fail in ent with an internal config setting. (#9604) 2020-07-27 16:10:26 -04:00
request_forwarding_rpc.go Autopilot: Server Stabilization, State and Dead Server Cleanup (#10856) 2021-03-03 13:59:50 -05:00
request_forwarding_rpc_util.go
request_forwarding_service.pb.go Autopilot: Server Stabilization, State and Dead Server Cleanup (#10856) 2021-03-03 13:59:50 -05:00
request_forwarding_service.proto Autopilot: Server Stabilization, State and Dead Server Cleanup (#10856) 2021-03-03 13:59:50 -05:00
request_handling.go OSS parts of the new client controlled consistency feature (#10974) 2021-02-24 06:58:10 -05:00
request_handling_test.go auth: store period value on tokens created via login (#7885) 2020-10-26 16:25:56 -04:00
request_handling_util.go OSS side barrier encryption tracking and automatic rotation (#11007) 2021-02-25 14:27:25 -06:00
rollback.go Fixed a bunch of typos (#7146) 2019-07-18 21:10:15 -04:00
rollback_test.go
router.go Resource Quotas: Rate Limiting (#9330) 2020-06-26 17:13:16 -04:00
router_access.go
router_test.go Fix a deadlock if a panic happens during request handling (#6920) 2019-06-19 09:40:57 -04:00
router_testing.go AWS upgrade role entries (#7025) 2019-07-05 16:55:40 -07:00
seal.go Same seal migration oss (#10224) 2020-10-23 14:16:04 -04:00
seal_access.go Migrate built in auto seal to go-kms-wrapping (#8118) 2020-01-10 20:39:52 -05:00
seal_autoseal.go Ensure that perf standbys can perform seal migrations. (#9690) 2020-08-10 08:35:57 -04:00
seal_autoseal_test.go Migrate built in auto seal to go-kms-wrapping (#8118) 2020-01-10 20:39:52 -05:00
seal_test.go Shamir seals now come in two varieties: legacy and new-style. (#7694) 2019-10-18 14:46:00 -04:00
seal_testing.go Revert "Vault Dependency Upgrades [VAULT-871] (#10903)" (#10939) 2021-02-18 15:40:18 -05:00
seal_testing_util.go Revert "Vault Dependency Upgrades [VAULT-871] (#10903)" (#10939) 2021-02-18 15:40:18 -05:00
sealunwrapper.go Migrate built in auto seal to go-kms-wrapping (#8118) 2020-01-10 20:39:52 -05:00
sealunwrapper_test.go Migrate built in auto seal to go-kms-wrapping (#8118) 2020-01-10 20:39:52 -05:00
testing.go Autopilot: Server Stabilization, State and Dead Server Cleanup (#10856) 2021-03-03 13:59:50 -05:00
testing_util.go Revert "Vault Dependency Upgrades [VAULT-871] (#10903)" (#10939) 2021-02-18 15:40:18 -05:00
token_store.go make token create case insensitive [VAULT-1021] (#10743) 2021-01-27 09:56:54 -08:00
token_store_test.go Revert "Vault Dependency Upgrades [VAULT-871] (#10903)" (#10939) 2021-02-18 15:40:18 -05:00
token_store_util.go
ui.go Fix UI custom header values (#10511) 2020-12-15 15:58:03 +01:00
ui_test.go Fix UI custom header values (#10511) 2020-12-15 15:58:03 +01:00
util.go
util_test.go
wrapping.go OSS parts of the new client controlled consistency feature (#10974) 2021-02-24 06:58:10 -05:00
wrapping_util.go