Commit graph

2143 commits

Author SHA1 Message Date
Vishal Nayak 3e55e79a3f
Autopilot: Server Stabilization, State and Dead Server Cleanup (#10856)
* k8s doc: update for 0.9.1 and 0.8.0 releases (#10825)

* k8s doc: update for 0.9.1 and 0.8.0 releases

* Update website/content/docs/platform/k8s/helm/configuration.mdx

Co-authored-by: Theron Voran <tvoran@users.noreply.github.com>

Co-authored-by: Theron Voran <tvoran@users.noreply.github.com>

* Autopilot initial commit

* Move autopilot related backend implementations to its own file

* Abstract promoter creation

* Add nil check for health

* Add server state oss no-ops

* Config ext stub for oss

* Make way for non-voters

* s/health/state

* s/ReadReplica/NonVoter

* Add synopsis and description

* Remove struct tags from AutopilotConfig

* Use var for config storage path

* Handle nin-config when reading

* Enable testing autopilot by using inmem cluster

* First passing test

* Only report the server as known if it is present in raft config

* Autopilot defaults to on for all existing and new clusters

* Add locking to some functions

* Persist initial config

* Clarify the command usage doc

* Add health metric for each node

* Fix audit logging issue

* Don't set DisablePerformanceStandby to true in test

* Use node id label for health metric

* Log updates to autopilot config

* Less aggressively consume config loading failures

* Return a mutable config

* Return early from known servers if raft config is unable to be pulled

* Update metrics name

* Reduce log level for potentially noisy log

* Add knob to disable autopilot

* Don't persist if default config is in use

* Autopilot: Dead server cleanup (#10857)

* Dead server cleanup

* Initialize channel in any case

* Fix a bunch of tests

* Fix panic

* Add follower locking in heartbeat tracker

* Add LastContactFailureThreshold to config

* Add log when marking node as dead

* Update follower state locking in heartbeat tracker

* Avoid follower states being nil

* Pull test to its own file

* Add execution status to state response

* Optionally enable autopilot in some tests

* Updates

* Added API function to fetch autopilot configuration

* Add test for default autopilot configuration

* Configuration tests

* Add State API test

* Update test

* Added TestClusterOptions.PhysicalFactoryConfig

* Update locking

* Adjust locking in heartbeat tracker

* s/last_contact_failure_threshold/left_server_last_contact_threshold

* Add disabling autopilot as a core config option

* Disable autopilot in some tests

* s/left_server_last_contact_threshold/dead_server_last_contact_threshold

* Set the lastheartbeat of followers to now when setting up active node

* Don't use config defaults from CLI command

* Remove config file support

* Remove HCL test as well

* Persist only supplied config; merge supplied config with default to operate

* Use pointer to structs for storing follower information

* Test update

* Retrieve non voter status from configbucket and set it up when a node comes up

* Manage desired suffrage

* Consider bucket being created already

* Move desired suffrage to its own entry

* s/DesiredSuffrageKey/LocalNodeConfigKey

* s/witnessSuffrage/recordSuffrage

* Fix test compilation

* Handle local node config post a snapshot install

* Commit to storage first; then record suffrage in fsm

* No need of local node config being nili case, post snapshot restore

* Reconcile autopilot config when a new leader takes over duty

* Grab fsm lock when recording suffrage

* s/Suffrage/DesiredSuffrage in FollowerState

* Instantiate autopilot only in leader

* Default to old ways in more scenarios

* Make API gracefully handle 404

* Address some feedback

* Make IsDead an atomic.Value

* Simplify follower hearbeat tracking

* Use uber.atomic

* Don't have multiple causes for having autopilot disabled

* Don't remove node from follower states if we fail to remove the dead server

* Autopilot server removals map (#11019)

* Don't remove node from follower states if we fail to remove the dead server

* Use map to track dead server removals

* Use lock and map

* Use delegate lock

* Adjust when to remove entry from map

* Only hold the lock while accessing map

* Fix race

* Don't set default min_quorum

* Fix test

* Ensure follower states is not nil before starting autopilot

* Fix race

Co-authored-by: Jason O'Donnell <2160810+jasonodonnell@users.noreply.github.com>
Co-authored-by: Theron Voran <tvoran@users.noreply.github.com>
2021-03-03 13:59:50 -05:00
swayne275 d74f82346b
Add Partial Month Client Count API for Activity Log (#11022)
* sketch out partial month activity log client API

* unit test partialMonthClientCount

* cleanup api

* add api doc, fix test, update api nomenclature to match existing

* cleanup

* add PR changelog file

* integration test for API

* report entities and tokens separately
2021-03-01 16:15:59 -07:00
Scott Miller 08d8f65e01
Take the state lock in checkBarrierRotate, and don't save on seal (#11028)
* Use the state lock, and don't bother a last minute check on seal

* defer
2021-03-01 16:32:17 -06:00
Brian Kassouf cbb8b21520 Fix test build 2021-03-01 12:29:12 -08:00
Brian Kassouf 1bc410783d OSS/ENT Drift 2021-03-01 10:51:04 -08:00
Brian Kassouf a112161f60
expiration: Add a few metrics to measure revoke queue lengths (#10955)
* expiration: Add a few metrics to measure revoke queue lengths

* Update the metric names

* Add appropriate cluster labels

* Add metrics to docs

* Update jobmanager.go
2021-02-26 16:00:39 -08:00
Scott Miller a7b372b447
Two minor changes not reflected OSS side (#11020) 2021-02-26 14:23:56 -06:00
Scott Miller b13b27f37e
OSS side barrier encryption tracking and automatic rotation (#11007)
* Automatic barrier key rotation, OSS portion

* Fix build issues

* Vendored version

* Add missing encs field, not sure where this got lost.
2021-02-25 14:27:25 -06:00
Nick Cabatoff c1ddfbb538
OSS parts of the new client controlled consistency feature (#10974) 2021-02-24 06:58:10 -05:00
swayne275 38a647c6e5
remove noisy log, simplify job interface (#10975) 2021-02-22 15:00:24 -07:00
Brian Kassouf 34a7fc0286
replication: Don't write request coutners on DR Secondary nodes (#10936) 2021-02-22 09:04:41 -08:00
Brian Kassouf 0ad63e5a20
core/expiration: Add backoff jitter to the expiration retries (#10937) 2021-02-18 20:20:01 -08:00
Hridoy Roy 4a96126d5a
Revert "Vault Dependency Upgrades [VAULT-871] (#10903)" (#10939)
This reverts commit eb74ca61fc4dcb7038f39defb127d5d639ba0ca1.
2021-02-18 15:40:18 -05:00
Hridoy Roy a26d1300e8
Vault Dependency Upgrades [VAULT-871] (#10903)
* upgrade vault dependency set

* etcd and grpc issues:

* better for tests

* testing

* all upgrades for hashicorp deps

* kubernetes plugin upgrade seems to work

* kubernetes plugin upgrade seems to work

* etcd and a bunch of other stuff

* all vulnerable packages upgraded

* k8s is broken in linux env but not locally

* test fixes

* fix testing

* fix etcd and grpc

* fix etcd and grpc

* use master branch of go-testing-interface

* roll back etcd upgrade

* have to fix grpc since other vendors pull in grpc 1.35.0 but we cant due to etcd

* rolling back in the replace directives

* a few more testing dependencies to clean up

* fix go mod vendor
2021-02-18 12:31:57 -08:00
swayne275 e4119a6a8a
Vault-1403 Switch Expiration Manager to use Fairsharing Backpressure (#1709) (#10932)
* basic pool and start testing

* refactor a bit for testing

* workFunc, start/stop safety, testing

* cleanup function for worker quit, more tests

* redo public/private members

* improve tests, export types, switch uuid package

* fix loop capture bug, cleanup

* cleanup tests

* update worker pool file name, other improvements

* add job manager prototype

* remove remnants

* add functions to wait for job manager and worker pool to stop, other fixes

* test job manager functionality, fix bugs

* encapsulate how jobs are distributed to workers

* make worker job channel read only

* add job interface, more testing, fixes

* set name for dispatcher

* fix test races

* wire up expiration manager most of the way

* dispatcher and job manager constructors don't return errors

* logger now dependency injected

* make some members private, test fcn to get worker pool size

* make GetNumWorkers public

* Update helper/fairshare/jobmanager_test.go

Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>

* update fairsharing usage, add tests

* make workerpool private

* remove custom worker names

* concurrency improvements

* remove worker pool cleanup function

* remove cleanup func from job manager, remove non blocking stop from fairshare

* update job manager for new constructor

* stop job manager when expiration manager stopped

* unset env var after test

* stop fairshare when started in tests

* stop leaking job manager goroutine

* prototype channel for waking up to assign work

* fix typo/bug and add tests

* improve job manager wake up, fix test typo

* put channel drain back

* better start/pause test for job manager

* comment cleanup

* degrade possible noisy log

* remove closure, clean up context

* improve revocation context timer

* test: reduce number of revocation workers during many tests

* Update vault/expiration.go

Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>

* feedback tweaks

Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>

Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>
2021-02-17 14:30:27 -08:00
swayne275 6e1b183f79
Shutdown Test Cores when Tests Complete (#10912)
* Shutdown Test Cores when Tests Complete

* go mod vendor
2021-02-12 13:04:48 -07:00
Jim Kalafut 42bae71806
Improve error messages (#10843)
- Fix: "bytes" should be less than %!s(int=131072) message
- Also add a missing openapi type that was throwing warnings
2021-02-11 19:51:12 -08:00
Michael Golowka 108d4c6a68
MySQL - Add username customization (#10834) 2021-02-11 14:08:32 -07:00
Vishal Nayak 53cb1deb38
Revert "Read-replica instead of non-voter (#10875)" (#10890)
This reverts commit fc745670cf34821f5834357d9caebc3351dbc1e7.
2021-02-10 16:41:58 -05:00
Mark Gritter 85c1ae1002
Fix error in log; add additional log on conflicting quotas. (#10888) 2021-02-10 12:24:35 -06:00
Ian Ferguson 865df63c76
Correct lock acquisition order in the pathEntityMergeID identity to fix deadlock condition (#10877) 2021-02-10 11:05:16 -05:00
Vishal Nayak a2394e7353
Read-replica instead of non-voter (#10875) 2021-02-10 09:58:18 -05:00
Mark Gritter c5fd996a36
Fix flaky ActivityLog unit test (#10860)
* Wait for initial retention run to finish before adding segments.
2021-02-09 16:34:49 -06:00
Vishal Nayak 8613ba88a6
Fix quota enforcing old path issue (#10689)
* Fix db indexing issue

* Add CL update
2021-02-09 05:46:09 -05:00
Mark Gritter d0994340fb
Fill in missing lease ID deterministically. Generate a UUID on creation. (#10855) 2021-02-08 13:46:59 -06:00
Nick Cabatoff 75c955b3c3
Apply OSS part of ENT change re waitForReplicationState. (#10837) 2021-02-04 09:10:35 -05:00
Mark Gritter 3ec15c4927
Fix use of identity/group endpoint to edit group by name (#10812)
* Updates identity/group to allow updating a group by name (#10223)
* Now that lookup by name is outside handleGroupUpdateCommon, do not
use the second name lookup as the object to update.
* Added changelog.

Co-authored-by: dr-db <25711615+dr-db@users.noreply.github.com>
2021-01-29 16:50:08 -06:00
Mark Gritter ce858de180
Fix for test failing on January 29th: advance months using timeutil, not AddDate. (#10808) 2021-01-29 11:48:22 -06:00
Hridoy Roy 537189cab8
make token create case insensitive [VAULT-1021] (#10743)
* make token create case insensitive

* changelog

* comment update
2021-01-27 09:56:54 -08:00
Aleksandr Bezobchuk 2ec8f9a222
metrics: activity log (#10514)
* core: add vault.identity.entity.active.monthly log
* Fixed end-of-month metrics and unit test.
* Added metric covering month-to-date (not broken down by namespace.)
* Updated documentation
* Added changelog.

Co-authored-by: mgritter <mgritter@hashicorp.com>
2021-01-26 16:37:07 -06:00
Vishal Nayak fcbbc5f7d8
Remove peer DR op token check only on secondaries (#10765) 2021-01-25 17:35:58 -05:00
Vishal Nayak 904bacd55e
Fix remove peers check (#10758) 2021-01-25 14:20:46 -05:00
Vishal Nayak c74c381fb1
Move the declaration to a OSS build tag file to not have it collide w… (#10750)
* Move the declaration to a OSS build tag file to not have it collide with ent declarations

* Add comment

* Remove comment to trigger ci
2021-01-25 09:35:19 -05:00
Vishal Nayak 8ebf0ae794
Fix build (#10749) 2021-01-22 16:40:22 -05:00
Vishal Nayak 5d270db1df
Add list peers to DR secondaries (#10746) 2021-01-22 11:50:59 -05:00
Mark Gritter fd55aa8378
Implement sys/seal-status and sys/leader in system backend (#10725)
* Implement sys/seal-status and sys/leader as normal API calls
(so that they can be used in namespaces.)
* Added changelog.
2021-01-20 14:04:24 -06:00
Nick Cabatoff 8cbc63d572
Add configuration to specify a TLS ServerName to use in the TLS handshake when performing a raft join. (#10698) 2021-01-19 17:54:28 -05:00
Nick Cabatoff c2bdeb9e7d
Minimal change to ensure that the bulky leaseEntry isn't kept in memory. (#10726) 2021-01-19 17:51:41 -05:00
Hridoy Roy 0becd555cf
Protect part of emitMetrics from panic behavior during post-seal (#10708)
* vault/core_metrics.go

* changelog

* comments
2021-01-19 14:06:50 -08:00
Scott Miller 77d27cb968
Add NIST guidance on rotating keys used for AES-GCM encryption (#10612)
* Add NIST guidance on rotating keys used for AES-GCM encryption

* Capture more places barrier encryption is used

* spacing issue

* Probabilistically track an estimated encryption count by key term

* Un-reorder imports

* wip

* get rid of sampling
2021-01-07 15:37:37 -06:00
Scott Miller c3e0d06216
Make the error response to the sys/internal/ui/mounts with no client token consistent (#10650)
* Make the error response to the sys/internal/ui/mounts with no client token consistent

* changelog

* Don't test against an empty mount path

* One other spot

* Instead, do all token checks first and early out before even looking for the mount
2021-01-07 11:46:08 -06:00
Lauren Voswinkel 7189a67a33
Adding snowflake as a bundled database secrets plugin (#10603)
* Adding snowflake as a bundled database secrets plugin

* Add snowflake-database-plugin to expected bundled plugins

* Add snowflake plugin name to the mockBuiltinRegistry
2021-01-07 09:30:24 -08:00
Mark Gritter d076d95d37
Feature flags API (#10613)
* Added sys/internal/ui/feature-flags endpoint.
* Added documentation for new API endpoint.
* Added integration test.
Co-authored-by: swayne275 <swayne@hashicorp.com>
2021-01-06 16:05:00 -06:00
Nick Cabatoff e856174d15
Fix test for expiring root tokens creating non-expiring root tokens (#10632)
Test was failing (once we specified the expected error to check) because when we create a token via the TokenStore, without registering the lease in the expiration manager, lookupInternal will see that there is an expiring token with no lease and delete it immediately, yielding the "no parent found" error.
2021-01-04 09:48:22 -05:00
swayne275 a961bdc318
Fix setting Activity Log enable flag through the API (#10594)
* fix setting enable, update tests

* improve wording

* fix typo - left the testing enabled set in originally

* improve warning handling

* move from nested if to switch - TIL
2020-12-18 11:20:32 -07:00
Mark Gritter 8c67bed7ae
Send a test message before committing a new audit device. (#10520)
* Send a test message before committing a new audit device.
Also, lower timeout on connection attempts in socket device.
* added changelog
* go mod vendor (picked up some unrelated changes.)
* Skip audit device check in integration test.
Co-authored-by: swayne275 <swayne@hashicorp.com>
2020-12-16 16:00:32 -06:00
Aleksandr Bezobchuk ae6267cc9b
core: add warning when disabling activity (#10485) 2020-12-15 14:11:28 -05:00
Michel Vocks 191aa65bc3
Fix UI custom header values (#10511)
* Fix UI custom header values

* Fix changelog entry

* Introduce param for multi values

* Fix multivalue

* multivalue should be bool

* Sort imports

* Fix conflict

* Remove changelog entry

* Revert entry delete
2020-12-15 15:58:03 +01:00
swayne275 cdf933adf1
say how many leases there are when threshold exceeded (#10567) 2020-12-14 16:00:19 -07:00
Aleksandr Bezobchuk 3bce568535
rate limit: fix initialize defaults (#10536) 2020-12-14 14:55:52 -05:00