Commit graph

2151 commits

Author SHA1 Message Date
Nick Cabatoff 41d9030fbb
Disable autopilot in raft-ha mode. (#11181)
* Disable autopilot in raft-ha mode.

* Also don't run autopilot on DR secondaries.
2021-03-23 14:13:44 -07:00
Brian Kassouf d01a068929
Remove retry from new raft test (#11158) 2021-03-19 12:41:57 -07:00
Nick Cabatoff b3af58d758
Expose snapshot_interval tunable instead of setting it in prod code for the sake of a test. (#11160) 2021-03-19 15:41:42 -04:00
Brian Kassouf 28aba513f2
storage/raft: Ensure peers are informed of their correct suffrage when added with AutoPilot (#11155)
* storage/raft: Ensure peers are informed of their correct suffrage when added with AutoPilot

* Add test ensuring peer sets are equivalent
2021-03-19 11:53:50 -07:00
Scott Miller 535bcf289e
Fix handling of minimum operations, and forward rotate/config requests to Primary (#11116)
* Boost max_operations to the greater of that specified or absoluteMinOperations

* Forward rotation config requests to the primary

* Reject rotation configs outside the min/max range

* Minor wording fix
2021-03-18 15:08:47 -05:00
Nick Cabatoff 411495514c
Add a test for server stabilization (#11128) 2021-03-17 17:23:13 -04:00
Vishal Nayak 9839e76192
Remove unneeded fields from state output (#11073) 2021-03-10 12:08:12 -05:00
Brian Kassouf aa00b53ba1
Make sure we sanitize the rotation config on each clone (#11050)
* Make sure we sanitize the rotation config on each clone

* Add regression test for missing rotation config

* use Equals

* simplify

Co-authored-by: Scott G. Miller <smiller@hashicorp.com>
2021-03-08 10:59:21 -06:00
Vishal Nayak 3e55e79a3f
Autopilot: Server Stabilization, State and Dead Server Cleanup (#10856)
* k8s doc: update for 0.9.1 and 0.8.0 releases (#10825)

* k8s doc: update for 0.9.1 and 0.8.0 releases

* Update website/content/docs/platform/k8s/helm/configuration.mdx

Co-authored-by: Theron Voran <tvoran@users.noreply.github.com>

Co-authored-by: Theron Voran <tvoran@users.noreply.github.com>

* Autopilot initial commit

* Move autopilot related backend implementations to its own file

* Abstract promoter creation

* Add nil check for health

* Add server state oss no-ops

* Config ext stub for oss

* Make way for non-voters

* s/health/state

* s/ReadReplica/NonVoter

* Add synopsis and description

* Remove struct tags from AutopilotConfig

* Use var for config storage path

* Handle nin-config when reading

* Enable testing autopilot by using inmem cluster

* First passing test

* Only report the server as known if it is present in raft config

* Autopilot defaults to on for all existing and new clusters

* Add locking to some functions

* Persist initial config

* Clarify the command usage doc

* Add health metric for each node

* Fix audit logging issue

* Don't set DisablePerformanceStandby to true in test

* Use node id label for health metric

* Log updates to autopilot config

* Less aggressively consume config loading failures

* Return a mutable config

* Return early from known servers if raft config is unable to be pulled

* Update metrics name

* Reduce log level for potentially noisy log

* Add knob to disable autopilot

* Don't persist if default config is in use

* Autopilot: Dead server cleanup (#10857)

* Dead server cleanup

* Initialize channel in any case

* Fix a bunch of tests

* Fix panic

* Add follower locking in heartbeat tracker

* Add LastContactFailureThreshold to config

* Add log when marking node as dead

* Update follower state locking in heartbeat tracker

* Avoid follower states being nil

* Pull test to its own file

* Add execution status to state response

* Optionally enable autopilot in some tests

* Updates

* Added API function to fetch autopilot configuration

* Add test for default autopilot configuration

* Configuration tests

* Add State API test

* Update test

* Added TestClusterOptions.PhysicalFactoryConfig

* Update locking

* Adjust locking in heartbeat tracker

* s/last_contact_failure_threshold/left_server_last_contact_threshold

* Add disabling autopilot as a core config option

* Disable autopilot in some tests

* s/left_server_last_contact_threshold/dead_server_last_contact_threshold

* Set the lastheartbeat of followers to now when setting up active node

* Don't use config defaults from CLI command

* Remove config file support

* Remove HCL test as well

* Persist only supplied config; merge supplied config with default to operate

* Use pointer to structs for storing follower information

* Test update

* Retrieve non voter status from configbucket and set it up when a node comes up

* Manage desired suffrage

* Consider bucket being created already

* Move desired suffrage to its own entry

* s/DesiredSuffrageKey/LocalNodeConfigKey

* s/witnessSuffrage/recordSuffrage

* Fix test compilation

* Handle local node config post a snapshot install

* Commit to storage first; then record suffrage in fsm

* No need of local node config being nili case, post snapshot restore

* Reconcile autopilot config when a new leader takes over duty

* Grab fsm lock when recording suffrage

* s/Suffrage/DesiredSuffrage in FollowerState

* Instantiate autopilot only in leader

* Default to old ways in more scenarios

* Make API gracefully handle 404

* Address some feedback

* Make IsDead an atomic.Value

* Simplify follower hearbeat tracking

* Use uber.atomic

* Don't have multiple causes for having autopilot disabled

* Don't remove node from follower states if we fail to remove the dead server

* Autopilot server removals map (#11019)

* Don't remove node from follower states if we fail to remove the dead server

* Use map to track dead server removals

* Use lock and map

* Use delegate lock

* Adjust when to remove entry from map

* Only hold the lock while accessing map

* Fix race

* Don't set default min_quorum

* Fix test

* Ensure follower states is not nil before starting autopilot

* Fix race

Co-authored-by: Jason O'Donnell <2160810+jasonodonnell@users.noreply.github.com>
Co-authored-by: Theron Voran <tvoran@users.noreply.github.com>
2021-03-03 13:59:50 -05:00
swayne275 d74f82346b
Add Partial Month Client Count API for Activity Log (#11022)
* sketch out partial month activity log client API

* unit test partialMonthClientCount

* cleanup api

* add api doc, fix test, update api nomenclature to match existing

* cleanup

* add PR changelog file

* integration test for API

* report entities and tokens separately
2021-03-01 16:15:59 -07:00
Scott Miller 08d8f65e01
Take the state lock in checkBarrierRotate, and don't save on seal (#11028)
* Use the state lock, and don't bother a last minute check on seal

* defer
2021-03-01 16:32:17 -06:00
Brian Kassouf cbb8b21520 Fix test build 2021-03-01 12:29:12 -08:00
Brian Kassouf 1bc410783d OSS/ENT Drift 2021-03-01 10:51:04 -08:00
Brian Kassouf a112161f60
expiration: Add a few metrics to measure revoke queue lengths (#10955)
* expiration: Add a few metrics to measure revoke queue lengths

* Update the metric names

* Add appropriate cluster labels

* Add metrics to docs

* Update jobmanager.go
2021-02-26 16:00:39 -08:00
Scott Miller a7b372b447
Two minor changes not reflected OSS side (#11020) 2021-02-26 14:23:56 -06:00
Scott Miller b13b27f37e
OSS side barrier encryption tracking and automatic rotation (#11007)
* Automatic barrier key rotation, OSS portion

* Fix build issues

* Vendored version

* Add missing encs field, not sure where this got lost.
2021-02-25 14:27:25 -06:00
Nick Cabatoff c1ddfbb538
OSS parts of the new client controlled consistency feature (#10974) 2021-02-24 06:58:10 -05:00
swayne275 38a647c6e5
remove noisy log, simplify job interface (#10975) 2021-02-22 15:00:24 -07:00
Brian Kassouf 34a7fc0286
replication: Don't write request coutners on DR Secondary nodes (#10936) 2021-02-22 09:04:41 -08:00
Brian Kassouf 0ad63e5a20
core/expiration: Add backoff jitter to the expiration retries (#10937) 2021-02-18 20:20:01 -08:00
Hridoy Roy 4a96126d5a
Revert "Vault Dependency Upgrades [VAULT-871] (#10903)" (#10939)
This reverts commit eb74ca61fc4dcb7038f39defb127d5d639ba0ca1.
2021-02-18 15:40:18 -05:00
Hridoy Roy a26d1300e8
Vault Dependency Upgrades [VAULT-871] (#10903)
* upgrade vault dependency set

* etcd and grpc issues:

* better for tests

* testing

* all upgrades for hashicorp deps

* kubernetes plugin upgrade seems to work

* kubernetes plugin upgrade seems to work

* etcd and a bunch of other stuff

* all vulnerable packages upgraded

* k8s is broken in linux env but not locally

* test fixes

* fix testing

* fix etcd and grpc

* fix etcd and grpc

* use master branch of go-testing-interface

* roll back etcd upgrade

* have to fix grpc since other vendors pull in grpc 1.35.0 but we cant due to etcd

* rolling back in the replace directives

* a few more testing dependencies to clean up

* fix go mod vendor
2021-02-18 12:31:57 -08:00
swayne275 e4119a6a8a
Vault-1403 Switch Expiration Manager to use Fairsharing Backpressure (#1709) (#10932)
* basic pool and start testing

* refactor a bit for testing

* workFunc, start/stop safety, testing

* cleanup function for worker quit, more tests

* redo public/private members

* improve tests, export types, switch uuid package

* fix loop capture bug, cleanup

* cleanup tests

* update worker pool file name, other improvements

* add job manager prototype

* remove remnants

* add functions to wait for job manager and worker pool to stop, other fixes

* test job manager functionality, fix bugs

* encapsulate how jobs are distributed to workers

* make worker job channel read only

* add job interface, more testing, fixes

* set name for dispatcher

* fix test races

* wire up expiration manager most of the way

* dispatcher and job manager constructors don't return errors

* logger now dependency injected

* make some members private, test fcn to get worker pool size

* make GetNumWorkers public

* Update helper/fairshare/jobmanager_test.go

Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>

* update fairsharing usage, add tests

* make workerpool private

* remove custom worker names

* concurrency improvements

* remove worker pool cleanup function

* remove cleanup func from job manager, remove non blocking stop from fairshare

* update job manager for new constructor

* stop job manager when expiration manager stopped

* unset env var after test

* stop fairshare when started in tests

* stop leaking job manager goroutine

* prototype channel for waking up to assign work

* fix typo/bug and add tests

* improve job manager wake up, fix test typo

* put channel drain back

* better start/pause test for job manager

* comment cleanup

* degrade possible noisy log

* remove closure, clean up context

* improve revocation context timer

* test: reduce number of revocation workers during many tests

* Update vault/expiration.go

Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>

* feedback tweaks

Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>

Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>
2021-02-17 14:30:27 -08:00
swayne275 6e1b183f79
Shutdown Test Cores when Tests Complete (#10912)
* Shutdown Test Cores when Tests Complete

* go mod vendor
2021-02-12 13:04:48 -07:00
Jim Kalafut 42bae71806
Improve error messages (#10843)
- Fix: "bytes" should be less than %!s(int=131072) message
- Also add a missing openapi type that was throwing warnings
2021-02-11 19:51:12 -08:00
Michael Golowka 108d4c6a68
MySQL - Add username customization (#10834) 2021-02-11 14:08:32 -07:00
Vishal Nayak 53cb1deb38
Revert "Read-replica instead of non-voter (#10875)" (#10890)
This reverts commit fc745670cf34821f5834357d9caebc3351dbc1e7.
2021-02-10 16:41:58 -05:00
Mark Gritter 85c1ae1002
Fix error in log; add additional log on conflicting quotas. (#10888) 2021-02-10 12:24:35 -06:00
Ian Ferguson 865df63c76
Correct lock acquisition order in the pathEntityMergeID identity to fix deadlock condition (#10877) 2021-02-10 11:05:16 -05:00
Vishal Nayak a2394e7353
Read-replica instead of non-voter (#10875) 2021-02-10 09:58:18 -05:00
Mark Gritter c5fd996a36
Fix flaky ActivityLog unit test (#10860)
* Wait for initial retention run to finish before adding segments.
2021-02-09 16:34:49 -06:00
Vishal Nayak 8613ba88a6
Fix quota enforcing old path issue (#10689)
* Fix db indexing issue

* Add CL update
2021-02-09 05:46:09 -05:00
Mark Gritter d0994340fb
Fill in missing lease ID deterministically. Generate a UUID on creation. (#10855) 2021-02-08 13:46:59 -06:00
Nick Cabatoff 75c955b3c3
Apply OSS part of ENT change re waitForReplicationState. (#10837) 2021-02-04 09:10:35 -05:00
Mark Gritter 3ec15c4927
Fix use of identity/group endpoint to edit group by name (#10812)
* Updates identity/group to allow updating a group by name (#10223)
* Now that lookup by name is outside handleGroupUpdateCommon, do not
use the second name lookup as the object to update.
* Added changelog.

Co-authored-by: dr-db <25711615+dr-db@users.noreply.github.com>
2021-01-29 16:50:08 -06:00
Mark Gritter ce858de180
Fix for test failing on January 29th: advance months using timeutil, not AddDate. (#10808) 2021-01-29 11:48:22 -06:00
Hridoy Roy 537189cab8
make token create case insensitive [VAULT-1021] (#10743)
* make token create case insensitive

* changelog

* comment update
2021-01-27 09:56:54 -08:00
Aleksandr Bezobchuk 2ec8f9a222
metrics: activity log (#10514)
* core: add vault.identity.entity.active.monthly log
* Fixed end-of-month metrics and unit test.
* Added metric covering month-to-date (not broken down by namespace.)
* Updated documentation
* Added changelog.

Co-authored-by: mgritter <mgritter@hashicorp.com>
2021-01-26 16:37:07 -06:00
Vishal Nayak fcbbc5f7d8
Remove peer DR op token check only on secondaries (#10765) 2021-01-25 17:35:58 -05:00
Vishal Nayak 904bacd55e
Fix remove peers check (#10758) 2021-01-25 14:20:46 -05:00
Vishal Nayak c74c381fb1
Move the declaration to a OSS build tag file to not have it collide w… (#10750)
* Move the declaration to a OSS build tag file to not have it collide with ent declarations

* Add comment

* Remove comment to trigger ci
2021-01-25 09:35:19 -05:00
Vishal Nayak 8ebf0ae794
Fix build (#10749) 2021-01-22 16:40:22 -05:00
Vishal Nayak 5d270db1df
Add list peers to DR secondaries (#10746) 2021-01-22 11:50:59 -05:00
Mark Gritter fd55aa8378
Implement sys/seal-status and sys/leader in system backend (#10725)
* Implement sys/seal-status and sys/leader as normal API calls
(so that they can be used in namespaces.)
* Added changelog.
2021-01-20 14:04:24 -06:00
Nick Cabatoff 8cbc63d572
Add configuration to specify a TLS ServerName to use in the TLS handshake when performing a raft join. (#10698) 2021-01-19 17:54:28 -05:00
Nick Cabatoff c2bdeb9e7d
Minimal change to ensure that the bulky leaseEntry isn't kept in memory. (#10726) 2021-01-19 17:51:41 -05:00
Hridoy Roy 0becd555cf
Protect part of emitMetrics from panic behavior during post-seal (#10708)
* vault/core_metrics.go

* changelog

* comments
2021-01-19 14:06:50 -08:00
Scott Miller 77d27cb968
Add NIST guidance on rotating keys used for AES-GCM encryption (#10612)
* Add NIST guidance on rotating keys used for AES-GCM encryption

* Capture more places barrier encryption is used

* spacing issue

* Probabilistically track an estimated encryption count by key term

* Un-reorder imports

* wip

* get rid of sampling
2021-01-07 15:37:37 -06:00
Scott Miller c3e0d06216
Make the error response to the sys/internal/ui/mounts with no client token consistent (#10650)
* Make the error response to the sys/internal/ui/mounts with no client token consistent

* changelog

* Don't test against an empty mount path

* One other spot

* Instead, do all token checks first and early out before even looking for the mount
2021-01-07 11:46:08 -06:00
Lauren Voswinkel 7189a67a33
Adding snowflake as a bundled database secrets plugin (#10603)
* Adding snowflake as a bundled database secrets plugin

* Add snowflake-database-plugin to expected bundled plugins

* Add snowflake plugin name to the mockBuiltinRegistry
2021-01-07 09:30:24 -08:00