Commit graph

474 commits

Author SHA1 Message Date
Scott Miller 10270b6985
Add a periodic test of the autoseal to detect loss of connectivity. (#13078)
* Add a periodic test of the autoseal to detect loss of connectivity

* Keep the logic adjacent to autoseal

* imports

* typo, plus unnecessary constant time compare

* changelog

* pr feedback

* More feedback

* Add locking and a unit test

* unnecessary

* Add timeouts to encrypt/decrypt operations, capture activeContext before starting loop

* Add a block scope for the timeout

* copy/paste ftl

* Refactor to use two timeouts, and cleanup the repetitive failure code

* Readd 0ing gauge

* use millis

* Invert the unit test logic
2021-11-10 14:46:07 -06:00
Nick Cabatoff 40640ef43f
Fix errors logged on standbys when we try to write versions to storage (#13042) 2021-11-08 10:04:17 -05:00
swayne275 92e6972b86
oss LastDRWAL (#12931) 2021-10-26 15:17:20 -06:00
swayne275 95e5cdd000
VAULT 2844: remove legacy lease revocation strategy (#12888)
* remove legacy lease revocation strategy

* add deprecation change log note

* remove VAULT_16_REVOKE_PERMITPOOL

* update changelog
2021-10-22 17:37:01 -06:00
Hridoy Roy 1c427d3286
Port: add client ID to TWEs in activity log [vault-3136] (#12820)
* port for tracking twes as clients

* comment clean up

* changelog

* change changelog entry phrasing
2021-10-14 09:10:59 -07:00
hghaf099 ad2ef412cc
Customizing HTTP headers in the config file (#12485)
* Customizing HTTP headers in the config file

* Add changelog, fix bad imports

* fixing some bugs

* fixing interaction of custom headers and /ui

* Defining a member in core to set custom response headers

* missing additional file

* Some refactoring

* Adding automated tests for the feature

* Changing some error messages based on some recommendations

* Incorporating custom response headers struct into the request context

* removing some unused references

* fixing a test

* changing some error messages, removing a default header value from /ui

* fixing a test

* wrapping ResponseWriter to set the custom headers

* adding a new test

* some cleanup

* removing some extra lines

* Addressing comments

* fixing some agent tests

* skipping custom headers from agent listener config,
removing two of the default headers as they cause issues with Vault in UI mode
Adding X-Content-Type-Options to the ui default headers
Let Content-Type be set as before

* Removing default custom headers, and renaming some function varibles

* some refacotring

* Refactoring and addressing comments

* removing a function and fixing comments
2021-10-13 11:06:33 -04:00
swayne275 2edac287ae
update function signature and call (#12806) 2021-10-11 18:21:38 -06:00
swayne275 b9fde1dd6f
oss port (#12755) 2021-10-07 11:25:16 -06:00
vinay-gopalan 458927c2ed
[VAULT-3157] Move mergeStates utils from Agent to api module (#12731)
* move merge and compare states to vault core

* move MergeState, CompareStates and ParseRequiredStates to api package

* fix merge state reference in API Proxy

* move mergeStates test to api package

* add changelog

* ghost commit to trigger CI

* rename CompareStates to CompareReplicationStates

* rename MergeStates and make compareStates and parseStates private methods

* improved error messaging in parseReplicationState

* export ParseReplicationState for enterprise files
2021-10-06 10:57:06 -07:00
Brian Kassouf 39a9727c8b
Update protobuf & grpc libraries and protoc plugins (#12679) 2021-09-29 18:25:15 -07:00
Scott Miller 0f6543fb41
Upgrade go-limiter to fix building on 1.17 (#12358)
* Upgrade go-limiter

* Modify quota system to pass contexts to upgraded go-limiter

* One more spot

* Add context vars to unit tests

* missed one
2021-09-01 16:28:47 -05:00
Nick Cabatoff 0762f9003d
Refactor usages of Core in IdentityStore so they can be decoupled. (#12461) 2021-08-30 15:31:11 -04:00
Pratyoy Mukhopadhyay 113b6885c3
[VAULT-2852] deprecate req counters in oss (#12197) 2021-07-29 10:21:40 -07:00
Jeff Mitchell f7147025dd
Migrate to sdk/internalshared libs in go-secure-stdlib (#12090)
* Swap sdk/helper libs to go-secure-stdlib

* Migrate to go-secure-stdlib reloadutil

* Migrate to go-secure-stdlib kv-builder

* Migrate to go-secure-stdlib gatedwriter
2021-07-15 20:17:31 -04:00
Pratyoy Mukhopadhyay adbaad17db
[VAULT-708] Zero out request counter on preSeal (#11970)
* [VAULT-708] Zero out request counter on preSeal

* [VAULT-708] Added changelog entry

* [VAULT-708] Add comment clarifying request counter zeroing
2021-07-07 14:03:39 -05:00
Nick Cabatoff 2ce2617a8a
Reorganize request handling code so that we don't touch storage until we have the stateLock. (#11835) 2021-06-11 13:18:16 -04:00
Hridoy Roy 65e3489c45
Diagnose resource creation checks (#11627)
* initial refactoring of unseal step in run

* remove waitgroup

* remove waitgroup

* backup work

* backup

* backup

* completely modularize run and move into diagnose

* add diagnose errors for incorrect number of unseal keys

* comment tests back in

* backup

* first subspan

* finished subspanning but running into error with timeouts

* remove runtime checks

* merge main branch

* meeting updates

* remove telemetry block

* roy comment

* subspans for seal finalization and wrapping diagnose latency checks

* backup while I fix something else

* fix storage latency test errors

* runtime checks

* diagnose with timeout on seal
2021-06-10 12:29:32 -07:00
Josh Black c8cfcd9514
OSS parts of sighup license reload (#11767) 2021-06-04 10:24:35 -07:00
swayne275 35aad1df4a
Vault 2256: fix lease count quotas causing panics on dr secondaries (#11742)
* lift relevant changes from ent to oss

* fix silent error bug in quotas
2021-06-02 10:12:05 -06:00
Nick Cabatoff 2adef1f878
OSS parts of #1891 (sys/health license addition) (#11676) 2021-05-20 13:32:15 -04:00
Nick Cabatoff 01f96f18ce
VAULT-2439: OSS parts of #1889 (raft licensing init) (#11665) 2021-05-19 16:07:58 -04:00
Nick Cabatoff e212ec5d8e
OSS parts of ent PR #1857: license autoloading init changes. (#11623) 2021-05-17 14:10:26 -04:00
Lars Lehtonen 53dd619d2f
vault: deprecate errwrap.Wrapf() (#11577) 2021-05-11 13:12:54 -04:00
Nick Cabatoff 53c7d1de7d
config for autoloading license (oss parts) 2021-05-07 08:55:41 -04:00
Nick Cabatoff 663ad150a7
Make TestActivityLog_MultipleFragmentsAndSegments timeout on its own (#11490)
* The main driver for this change was to make the read from a.newFragmentCh timeout quickly rather than waiting for the test timeout (much longer).  While testing the change I observed a panic during shutdown, but it was swallowed and moreover there was no stack trace so it wasn't obvious.  I'm hoping we can get rid of the recover, so I fixed the issue in the activitylog tests that needed it.
2021-05-06 10:19:53 -04:00
Josh Black 06809930a3
Add HTTP response headers for hostname and raft node ID (if applicable) (#11289) 2021-04-20 15:25:04 -07:00
Hridoy Roy fde9f2f71d
Add More TLS Tests and Verification of TLS Root Certificate (#11300)
* tls tests and root verification

* make the certificate verification check correct for non root CA case

* add expiry test

* addressed comments but struggling with the bug in parsing Cas and inters from single file:

* final checks on tls and listener

* cleanup
2021-04-12 08:39:40 -07:00
Brian Kassouf 303c2aee7c
Run a more strict formatter over the code (#11312)
* Update tooling

* Run gofumpt

* go mod vendor
2021-04-08 09:43:39 -07:00
Nick Cabatoff c2673ee86a
Move SanitizedConfig back to a shared-ent file. (#11291) 2021-04-07 10:25:05 -04:00
Hridoy Roy 049f2513e6
Initial Diagnose Command for TLS and Listener [VAULT-1896, VAULT-1899] (#11249)
* sanity checks for tls config in diagnose

* backup

* backup

* backup

* added necessary tests

* remove comment

* remove parallels causing test flakiness

* comments

* small fix

* separate out config hcl test case into new hcl file

* newline

* addressed comments

* addressed comments

* addressed comments

* addressed comments

* addressed comments

* reload funcs should be allowed to be nil
2021-04-06 16:40:43 -07:00
Vishal Nayak 3e55e79a3f
Autopilot: Server Stabilization, State and Dead Server Cleanup (#10856)
* k8s doc: update for 0.9.1 and 0.8.0 releases (#10825)

* k8s doc: update for 0.9.1 and 0.8.0 releases

* Update website/content/docs/platform/k8s/helm/configuration.mdx

Co-authored-by: Theron Voran <tvoran@users.noreply.github.com>

Co-authored-by: Theron Voran <tvoran@users.noreply.github.com>

* Autopilot initial commit

* Move autopilot related backend implementations to its own file

* Abstract promoter creation

* Add nil check for health

* Add server state oss no-ops

* Config ext stub for oss

* Make way for non-voters

* s/health/state

* s/ReadReplica/NonVoter

* Add synopsis and description

* Remove struct tags from AutopilotConfig

* Use var for config storage path

* Handle nin-config when reading

* Enable testing autopilot by using inmem cluster

* First passing test

* Only report the server as known if it is present in raft config

* Autopilot defaults to on for all existing and new clusters

* Add locking to some functions

* Persist initial config

* Clarify the command usage doc

* Add health metric for each node

* Fix audit logging issue

* Don't set DisablePerformanceStandby to true in test

* Use node id label for health metric

* Log updates to autopilot config

* Less aggressively consume config loading failures

* Return a mutable config

* Return early from known servers if raft config is unable to be pulled

* Update metrics name

* Reduce log level for potentially noisy log

* Add knob to disable autopilot

* Don't persist if default config is in use

* Autopilot: Dead server cleanup (#10857)

* Dead server cleanup

* Initialize channel in any case

* Fix a bunch of tests

* Fix panic

* Add follower locking in heartbeat tracker

* Add LastContactFailureThreshold to config

* Add log when marking node as dead

* Update follower state locking in heartbeat tracker

* Avoid follower states being nil

* Pull test to its own file

* Add execution status to state response

* Optionally enable autopilot in some tests

* Updates

* Added API function to fetch autopilot configuration

* Add test for default autopilot configuration

* Configuration tests

* Add State API test

* Update test

* Added TestClusterOptions.PhysicalFactoryConfig

* Update locking

* Adjust locking in heartbeat tracker

* s/last_contact_failure_threshold/left_server_last_contact_threshold

* Add disabling autopilot as a core config option

* Disable autopilot in some tests

* s/left_server_last_contact_threshold/dead_server_last_contact_threshold

* Set the lastheartbeat of followers to now when setting up active node

* Don't use config defaults from CLI command

* Remove config file support

* Remove HCL test as well

* Persist only supplied config; merge supplied config with default to operate

* Use pointer to structs for storing follower information

* Test update

* Retrieve non voter status from configbucket and set it up when a node comes up

* Manage desired suffrage

* Consider bucket being created already

* Move desired suffrage to its own entry

* s/DesiredSuffrageKey/LocalNodeConfigKey

* s/witnessSuffrage/recordSuffrage

* Fix test compilation

* Handle local node config post a snapshot install

* Commit to storage first; then record suffrage in fsm

* No need of local node config being nili case, post snapshot restore

* Reconcile autopilot config when a new leader takes over duty

* Grab fsm lock when recording suffrage

* s/Suffrage/DesiredSuffrage in FollowerState

* Instantiate autopilot only in leader

* Default to old ways in more scenarios

* Make API gracefully handle 404

* Address some feedback

* Make IsDead an atomic.Value

* Simplify follower hearbeat tracking

* Use uber.atomic

* Don't have multiple causes for having autopilot disabled

* Don't remove node from follower states if we fail to remove the dead server

* Autopilot server removals map (#11019)

* Don't remove node from follower states if we fail to remove the dead server

* Use map to track dead server removals

* Use lock and map

* Use delegate lock

* Adjust when to remove entry from map

* Only hold the lock while accessing map

* Fix race

* Don't set default min_quorum

* Fix test

* Ensure follower states is not nil before starting autopilot

* Fix race

Co-authored-by: Jason O'Donnell <2160810+jasonodonnell@users.noreply.github.com>
Co-authored-by: Theron Voran <tvoran@users.noreply.github.com>
2021-03-03 13:59:50 -05:00
Scott Miller 08d8f65e01
Take the state lock in checkBarrierRotate, and don't save on seal (#11028)
* Use the state lock, and don't bother a last minute check on seal

* defer
2021-03-01 16:32:17 -06:00
Scott Miller b13b27f37e
OSS side barrier encryption tracking and automatic rotation (#11007)
* Automatic barrier key rotation, OSS portion

* Fix build issues

* Vendored version

* Add missing encs field, not sure where this got lost.
2021-02-25 14:27:25 -06:00
Nick Cabatoff c1ddfbb538
OSS parts of the new client controlled consistency feature (#10974) 2021-02-24 06:58:10 -05:00
swayne275 e4119a6a8a
Vault-1403 Switch Expiration Manager to use Fairsharing Backpressure (#1709) (#10932)
* basic pool and start testing

* refactor a bit for testing

* workFunc, start/stop safety, testing

* cleanup function for worker quit, more tests

* redo public/private members

* improve tests, export types, switch uuid package

* fix loop capture bug, cleanup

* cleanup tests

* update worker pool file name, other improvements

* add job manager prototype

* remove remnants

* add functions to wait for job manager and worker pool to stop, other fixes

* test job manager functionality, fix bugs

* encapsulate how jobs are distributed to workers

* make worker job channel read only

* add job interface, more testing, fixes

* set name for dispatcher

* fix test races

* wire up expiration manager most of the way

* dispatcher and job manager constructors don't return errors

* logger now dependency injected

* make some members private, test fcn to get worker pool size

* make GetNumWorkers public

* Update helper/fairshare/jobmanager_test.go

Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>

* update fairsharing usage, add tests

* make workerpool private

* remove custom worker names

* concurrency improvements

* remove worker pool cleanup function

* remove cleanup func from job manager, remove non blocking stop from fairshare

* update job manager for new constructor

* stop job manager when expiration manager stopped

* unset env var after test

* stop fairshare when started in tests

* stop leaking job manager goroutine

* prototype channel for waking up to assign work

* fix typo/bug and add tests

* improve job manager wake up, fix test typo

* put channel drain back

* better start/pause test for job manager

* comment cleanup

* degrade possible noisy log

* remove closure, clean up context

* improve revocation context timer

* test: reduce number of revocation workers during many tests

* Update vault/expiration.go

Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>

* feedback tweaks

Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>

Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>
2021-02-17 14:30:27 -08:00
Vishal Nayak 53cb1deb38
Revert "Read-replica instead of non-voter (#10875)" (#10890)
This reverts commit fc745670cf34821f5834357d9caebc3351dbc1e7.
2021-02-10 16:41:58 -05:00
Vishal Nayak a2394e7353
Read-replica instead of non-voter (#10875) 2021-02-10 09:58:18 -05:00
Brian Kassouf 275ca323e8
core: Record the time a node became active (#10489)
* core: Record the time a node became active

* Update vault/core.go

Co-authored-by: Nick Cabatoff <ncabatoff@hashicorp.com>

* Add omitempty field

* Update vendor

* Added CL entry and fixed test

* Fix test

* Fix command package tests

Co-authored-by: Nick Cabatoff <ncabatoff@hashicorp.com>
2020-12-11 16:50:19 -08:00
Nick Cabatoff b425be1a93
Fix race with test that mutates KeyRotateGracePeriod: make the global be a Core field instead. (#10512) 2020-12-08 13:57:44 -05:00
swayne275 88eaf5f4c3
Fix Racy Activity Log Tests (#10484)
* fix racy activity log tests and move testing utilities elsewhere

* remove TODO

* move SetEnable out of activity log

* clarify not waiting on waitgroup

* remove todo
2020-12-02 13:48:13 -07:00
Brian Kassouf 81a86f48e8
Backport some OSS changes (#10267)
* Backport some OSS changes

* go mod vendor
2020-10-29 16:47:34 -07:00
Nick Cabatoff 0d6a929a4c
Same seal migration oss (#10224)
* Refactoring and test improvements.

* Support migrating from a given type of autoseal to that same type but with different parameters.
2020-10-23 14:16:04 -04:00
Aleksandr Bezobchuk 0d6a0ec589
Merge PR #10010: Rate Limit Quotas: Allow Exempt Paths to be Configurable 2020-10-16 14:58:19 -04:00
Nick Cabatoff 66274607b7
OSS changes for enterprise automated snapshots (#10160) 2020-10-16 14:57:11 -04:00
Brian Kassouf 84dbca38a1
Revert "Migrate internalshared out (#9727)" (#10141)
This reverts commit ee6391b691ac12ab6ca13c3912404f1d3a842bd6.
2020-10-13 16:38:21 -07:00
Jeff Mitchell e6881c8147
Migrate internalshared out (#9727)
* Migrate internalshared out

* fix merge issue

* fix merge issue

* go mod vendor

Co-authored-by: Brian Kassouf <bkassouf@hashicorp.com>
2020-10-12 11:56:24 -07:00
Mark Gritter 587ed7d499
Disable usage metrics on performance standby nodes. (#9966) 2020-09-15 17:12:28 -05:00
Mark Gritter 1b2c20e07c
Merge activity log work to date on enterprise back into oss. (#9900)
* Added stub class for activity logging. (#1435)
* Define activity fragments and starter methods for manipulating them. (#1441)
2020-09-08 14:22:09 -05:00
ncabatoff 4134ef2e98
Ensure that perf standbys can perform seal migrations. (#9690) 2020-08-10 08:35:57 -04:00
Rodrigo D. L d0df8bfa21
adding new config flag disable_sentinel_trace (#9696) 2020-08-10 06:23:44 -04:00