Add some metrics helpful for monitoring raft cluster state.
Furthermore, we weren't emitting bolt metrics on regular (non-perf) standbys, and there were other metrics
in metricsLoop that would make sense to include in OSS but weren't. We now have an active-node-only func,
emitMetricsActiveNode. This runs metricsLoop on the active node. Standbys and perf-standbys run metricsLoop
from a goroutine managed by the runStandby rungroup.
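A minimal sketch of that split, assuming a stripped-down Core (the stopCh plumbing, ticker interval, and main function below are illustrative; only emitMetricsActiveNode, metricsLoop, and runStandby are names from this change):

```go
// Sketch only: simplified stand-in for the real Core in the vault package.
package main

import (
	"fmt"
	"time"
)

type Core struct {
	stopCh chan struct{} // illustrative; the real Core wires shutdown differently
}

// metricsLoop periodically emits gauges (bolt metrics, raft state, ...).
// It runs on every node type; which gauges are emitted can still be gated inside.
func (c *Core) metricsLoop(stopCh chan struct{}) {
	ticker := time.NewTicker(10 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			fmt.Println("emitting metrics")
		case <-stopCh:
			return
		}
	}
}

// emitMetricsActiveNode is the active-node-only entry point: the active node
// runs metricsLoop through it, while standbys and perf-standbys run metricsLoop
// from a goroutine managed by the runStandby rungroup.
func (c *Core) emitMetricsActiveNode(stopCh chan struct{}) {
	c.metricsLoop(stopCh)
}

func main() {
	c := &Core{stopCh: make(chan struct{})}
	go c.emitMetricsActiveNode(c.stopCh) // active-node path
	time.Sleep(25 * time.Second)
	close(c.stopCh)
}
```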
Make sure that autopilot is disabled when we step down from active node state. Forward autopilot state requests to the active node. Avoid self-dialing due to stale advertisement.
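A rough illustration of the step-down and forwarding behavior (the node type and its fields and methods are assumptions for this sketch, not Vault's actual API; the point is the "disable on step-down" and "never dial our own stale advertisement" guards):

```go
// Sketch only: illustrative types, not Vault's actual forwarding code.
package main

import (
	"errors"
	"fmt"
)

type node struct {
	localAddr            string
	isActive             bool
	autopilotRunning     bool
	advertisedActiveAddr string // may be stale right after a leadership change
}

// stepDown is called when this node loses active status; autopilot must
// only ever run on the active node, so it is stopped here.
func (n *node) stepDown() {
	n.isActive = false
	n.autopilotRunning = false
}

// autopilotState serves the state locally on the active node and forwards
// everywhere else, refusing to dial itself if the advertised active address
// is stale and still points at this node.
func (n *node) autopilotState() (string, error) {
	if n.isActive {
		return "local autopilot state", nil
	}
	if n.advertisedActiveAddr == "" || n.advertisedActiveAddr == n.localAddr {
		return "", errors.New("no known active node to forward to")
	}
	return fmt.Sprintf("state forwarded from %s", n.advertisedActiveAddr), nil
}

func main() {
	n := &node{localAddr: "10.0.0.2:8201", isActive: true, autopilotRunning: true}
	n.stepDown()
	n.advertisedActiveAddr = "10.0.0.2:8201" // stale: still points at ourselves
	fmt.Println(n.autopilotState())
}
```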
* k8s doc: update for 0.9.1 and 0.8.0 releases (#10825)
* k8s doc: update for 0.9.1 and 0.8.0 releases
* Update website/content/docs/platform/k8s/helm/configuration.mdx
Co-authored-by: Theron Voran <tvoran@users.noreply.github.com>
* Autopilot initial commit
* Move autopilot related backend implementations to its own file
* Abstract promoter creation
* Add nil check for health
* Add server state oss no-ops
* Config ext stub for oss
* Make way for non-voters
* s/health/state
* s/ReadReplica/NonVoter
* Add synopsis and description
* Remove struct tags from AutopilotConfig
* Use var for config storage path
* Handle nil-config when reading
* Enable testing autopilot by using inmem cluster
* First passing test
* Only report the server as known if it is present in raft config
* Autopilot defaults to on for all existing and new clusters
* Add locking to some functions
* Persist initial config
* Clarify the command usage doc
* Add health metric for each node
* Fix audit logging issue
* Don't set DisablePerformanceStandby to true in test
* Use node id label for health metric
* Log updates to autopilot config
* Less aggressively consume config loading failures
* Return a mutable config
* Return early from known servers if raft config is unable to be pulled
* Update metrics name
* Reduce log level for potentially noisy log
* Add knob to disable autopilot
* Don't persist if default config is in use
* Autopilot: Dead server cleanup (#10857)
* Dead server cleanup
* Initialize channel in any case
* Fix a bunch of tests
* Fix panic
* Add follower locking in heartbeat tracker
* Add LastContactFailureThreshold to config
* Add log when marking node as dead
* Update follower state locking in heartbeat tracker
* Avoid follower states being nil
* Pull test to its own file
* Add execution status to state response
* Optionally enable autopilot in some tests
* Updates
* Added API function to fetch autopilot configuration
* Add test for default autopilot configuration
* Configuration tests
* Add State API test
* Update test
* Added TestClusterOptions.PhysicalFactoryConfig
* Update locking
* Adjust locking in heartbeat tracker
* s/last_contact_failure_threshold/left_server_last_contact_threshold
* Add disabling autopilot as a core config option
* Disable autopilot in some tests
* s/left_server_last_contact_threshold/dead_server_last_contact_threshold
* Set the lastheartbeat of followers to now when setting up active node
* Don't use config defaults from CLI command
* Remove config file support
* Remove HCL test as well
* Persist only supplied config; merge supplied config with default to operate
* Use pointer to structs for storing follower information
* Test update
* Retrieve non-voter status from configbucket and set it up when a node comes up
* Manage desired suffrage
* Consider bucket being created already
* Move desired suffrage to its own entry
* s/DesiredSuffrageKey/LocalNodeConfigKey
* s/witnessSuffrage/recordSuffrage
* Fix test compilation
* Handle local node config post a snapshot install
* Commit to storage first; then record suffrage in fsm
* No need to handle the nil local node config case post snapshot restore
* Reconcile autopilot config when a new leader takes over duty
* Grab fsm lock when recording suffrage
* s/Suffrage/DesiredSuffrage in FollowerState
* Instantiate autopilot only in leader
* Default to old ways in more scenarios
* Make API gracefully handle 404
* Address some feedback
* Make IsDead an atomic.Value
* Simplify follower heartbeat tracking
* Use uber.atomic
* Don't have multiple causes for having autopilot disabled
* Don't remove node from follower states if we fail to remove the dead server
* Autopilot server removals map (#11019) (see the sketch after this list)
* Don't remove node from follower states if we fail to remove the dead server
* Use map to track dead server removals
* Use lock and map
* Use delegate lock
* Adjust when to remove entry from map
* Only hold the lock while accessing map
* Fix race
* Don't set default min_quorum
* Fix test
* Ensure follower states is not nil before starting autopilot
* Fix race
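A sketch of the server-removals map described in the "Autopilot server removals map" items above (the delegate type, field names, and removal callback are illustrative assumptions; the lock is held only while the map is touched, and the follower-state entry is dropped only after the removal succeeds):

```go
// Sketch only: illustrative names, not Vault's actual autopilot delegate.
package main

import (
	"fmt"
	"sync"
)

type delegate struct {
	mu             sync.Mutex
	pendingRemoval map[string]bool // server IDs with a removal in flight
}

// markRemovalStarted records an in-flight removal, holding the lock only
// while the map is accessed. It reports false if a removal is already running.
func (d *delegate) markRemovalStarted(id string) bool {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.pendingRemoval[id] {
		return false
	}
	d.pendingRemoval[id] = true
	return true
}

func (d *delegate) clearRemoval(id string) {
	d.mu.Lock()
	defer d.mu.Unlock()
	delete(d.pendingRemoval, id)
}

// removeDeadServer attempts to remove a dead server. If the removal fails,
// the node stays in the follower states so the cleanup can be retried later.
func (d *delegate) removeDeadServer(id string, remove func(string) error) error {
	if !d.markRemovalStarted(id) {
		return nil // another removal for this server is already in flight
	}
	defer d.clearRemoval(id)
	if err := remove(id); err != nil {
		return err // keep the follower-state entry
	}
	// Removal succeeded; only now drop the node from follower states.
	return nil
}

func main() {
	d := &delegate{pendingRemoval: map[string]bool{}}
	err := d.removeDeadServer("node-3", func(id string) error {
		fmt.Println("removing", id)
		return nil
	})
	fmt.Println("err:", err)
}
```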
Co-authored-by: Jason O'Donnell <2160810+jasonodonnell@users.noreply.github.com>
Co-authored-by: Theron Voran <tvoran@users.noreply.github.com>