Commit graph

108 commits

Author SHA1 Message Date
Josh Black 416504d8c3
Add autopilot automated upgrades and redundancy zones (#15521) 2022-05-20 16:49:11 -04:00
Nick Cabatoff bc9f69af2e
Forward autopilot state reqs, avoid self-dialing (#15493)
Make sure that autopilot is disabled when we step down from active node state.  Forward autopilot state requests to the active node.  Avoid self-dialing due to stale advertisement.
2022-05-18 14:50:18 -04:00
Nick Cabatoff c5928c1d15
Raft: use a larger initial heartbeat/election timeout (#15042) 2022-04-29 08:32:16 -04:00
Josh Black 41a4b7a170
Ensure initialMmapSize is 0 on Windows (#14977)
* ensure initialMmapSize is 0 on windows

* add changelog
2022-04-08 12:07:21 -07:00
Austin Gebauer f9ce8d7615
Adds Vault version prerelease and metadata to logical.PluginEnvironment (#14851) 2022-04-04 22:31:01 -07:00
hghaf099 9ae2a85700
Fixing excessive unix file permissions (#14791)
* Fixing excessive unix file permissions

* CL

* reduce the permission from 750 to 700
2022-04-01 12:57:38 -04:00
hghaf099 aafb5d6427
VAULT-4240 time.After() in a select statement can lead to memory leak (#14814)
* VAULT-4240 time.After() in a select statement can lead to memory leak

* CL
2022-04-01 10:17:11 -04:00
Josh Black e83471d7de
Login MFA (#14025)
* Login MFA

* ENT OSS segragation (#14088)

* Delete method id if not used in an MFA enforcement config (#14063)

* Delete an MFA methodID only if it is not used by an MFA enforcement config

* Fixing a bug: mfa/validate is an unauthenticated path, and goes through the handleLoginRequest path

* adding use_passcode field to DUO config (#14059)

* add changelog

* preventing replay attack on MFA passcodes (#14056)

* preventing replay attack on MFA passcodes

* using %w instead of %s for error

* Improve CLI command for login mfa (#14106)

CLI prints a warning message indicating the login request needs to get validated

* adding the validity period of a passcode to error messages (#14115)

* PR feedback

* duo to handle preventing passcode reuse

Co-authored-by: hghaf099 <83242695+hghaf099@users.noreply.github.com>
Co-authored-by: hamid ghaf <hamid@hashicorp.com>
2022-02-17 13:08:51 -08:00
John-Michael Faircloth 1cf74e1179
feature: multiplexing support for database plugins (#14033)
* feat: DB plugin multiplexing (#13734)

* WIP: start from main and get a plugin runner from core

* move MultiplexedClient map to plugin catalog
- call sys.NewPluginClient from PluginFactory
- updates to getPluginClient
- thread through isMetadataMode

* use go-plugin ClientProtocol interface
- call sys.NewPluginClient from dbplugin.NewPluginClient

* move PluginSets to dbplugin package
- export dbplugin HandshakeConfig
- small refactor of PluginCatalog.getPluginClient

* add removeMultiplexedClient; clean up on Close()
- call client.Kill from plugin catalog
- set rpcClient when muxed client exists

* add ID to dbplugin.DatabasePluginClient struct

* only create one plugin process per plugin type

* update NewPluginClient to return connection ID to sdk
- wrap grpc.ClientConn so we can inject the ID into context
- get ID from context on grpc server

* add v6 multiplexing  protocol version

* WIP: backwards compat for db plugins

* Ensure locking on plugin catalog access

- Create public GetPluginClient method for plugin catalog
- rename postgres db plugin

* use the New constructor for db plugins

* grpc server: use write lock for Close and rlock for CRUD

* cleanup MultiplexedClients on Close

* remove TODO

* fix multiplexing regression with grpc server connection

* cleanup grpc server instances on close

* embed ClientProtocol in Multiplexer interface

* use PluginClientConfig arg to make NewPluginClient plugin type agnostic

* create a new plugin process for non-muxed plugins

* feat: plugin multiplexing: handle plugin client cleanup (#13896)

* use closure for plugin client cleanup

* log and return errors; add comments

* move rpcClient wrapping to core for ID injection

* refactor core plugin client and sdk

* remove unused ID method

* refactor and only wrap clientConn on multiplexed plugins

* rename structs and do not export types

* Slight refactor of system view interface

* Revert "Slight refactor of system view interface"

This reverts commit 73d420e5cd2f0415e000c5a9284ea72a58016dd6.

* Revert "Revert "Slight refactor of system view interface""

This reverts commit f75527008a1db06d04a23e04c3059674be8adb5f.

* only provide pluginRunner arg to the internal newPluginClient method

* embed ClientProtocol in pluginClient and name logger

* Add back MLock support

* remove enableMlock arg from setupPluginCatalog

* rename plugin util interface to PluginClient

Co-authored-by: Brian Kassouf <bkassouf@hashicorp.com>

* feature: multiplexing: fix unit tests (#14007)

* fix grpc_server tests and add coverage

* update run_config tests

* add happy path test case for grpc_server ID from context

* update test helpers

* feat: multiplexing: handle v5 plugin compiled with new sdk

* add mux supported flag and increase test coverage

* set multiplexingSupport field in plugin server

* remove multiplexingSupport field in sdk

* revert postgres to non-multiplexed

* add comments on grpc server fields

* use pointer receiver on grpc server methods

* add changelog

* use pointer for grpcserver instance

* Use a gRPC server to determine if a plugin should be multiplexed

* Apply suggestions from code review

Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>

* add lock to removePluginClient

* add multiplexingSupport field to externalPlugin struct

* do not send nil to grpc MultiplexingSupport

* check err before logging

* handle locking scenario for cleanupFunc

* allow ServeConfigMultiplex to dispense v5 plugin

* reposition structs, add err check and comments

* add comment on locking for cleanupExternalPlugin

Co-authored-by: Brian Kassouf <bkassouf@hashicorp.com>
Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>
2022-02-17 08:50:33 -06:00
davidadeleon 96dfbfbd02
Raft/fix raft telemetry metric unit (#13749)
Converting raft time metrics to Milliseconds over Default Nanoseconds to maintain consistency
2022-01-24 10:51:35 -05:00
Nick Cabatoff 07555c8bfc
Depend explicitly on go-msgpack v1.1.5 (#13693) 2022-01-19 10:32:19 -05:00
Nick Cabatoff 4ee4374b3e
Use MAP_POPULATE for our bbolt mmaps (#13573)
* Use MAP_POPULATE for our bbolt mmaps, assuming the files fit in memory.  This should improve startup times when freelist sync is disabled.
2022-01-11 08:16:53 -05:00
Scott Miller 89f617a97c
Convert to Go 1.17 go:build directive (#13579) 2022-01-05 12:02:03 -06:00
Jim Kalafut 22c4ae5933
Rename master key to root key (#13324)
* See what it looks like to replace "master key" with "root key".  There are two places that would require more challenging code changes: the storage path `core/master`, and its contents (the JSON-serialized EncodedKeyringtructure.)

* Restore accidentally deleted line

* Add changelog

* Update root->recovery

* Fix test

Co-authored-by: Nick Cabatoff <ncabatoff@hashicorp.com>
2021-12-06 17:12:20 -08:00
Nick Cabatoff 997a5ace91
Prevent raft transactions from containing overlarge keys. (#13286) 2021-11-26 08:38:39 -05:00
Nick Cabatoff f85908e1df
Return an error when trying to store a too-large key with Raft (#13282) 2021-11-25 14:07:03 -05:00
Josh Black d7c54b50e7
fix bolt 32 bit test (#13249) 2021-11-23 10:50:15 -08:00
Josh Black fe0dd6f867
Add InitialMmapSize to bolt options (#13178) 2021-11-22 20:16:57 -08:00
Mayo 0bd0339c0b
cleanup unused code and fix t.Fatal usage in goroutine in testing (#11694) 2021-09-30 07:33:14 -04:00
akshya96 c643dc1d53
Add Custom metadata field to alias (#12502)
* adding changes

* removing q.Q

* removing empty lines

* testing

* checking tests

* fixing tests

* adding changes

* added requested changes

* added requested changes

* added policy templating changes and fixed tests

* adding proto changes

* making changes

* adding unit tests

* using suggested function
2021-09-17 11:03:47 -07:00
Pratyoy Mukhopadhyay 5fda05adee
[VAULT-3226] Use os.rename on windows os (#12377)
* [VAULT-3226] Use os.rename on windows os

* [VAULT-3226] Add changelog
2021-08-19 16:05:53 -07:00
Nick Cabatoff 72499c3215
Check to make sure context isn't expired before doing a raft operation. (#12162) 2021-08-19 12:03:56 -04:00
Pratyoy Mukhopadhyay fa29e780f0
[NO-TICKET] Upgrade protoc-gen-go to 1.26, upgrade protoc to 3.17.3 (#12171)
* [NO-TICKET] Set protoc-gen-go to 1.23, upgrade protoc to 3.17.3

* [NO-TICKET] Upgrade version of protoc-gen-go to 1.26
2021-07-28 14:51:36 -07:00
Nick Cabatoff f7ecb978a6
Use a mode when opening the db file that won't result in excessive perms. (#12160) 2021-07-23 13:43:50 -04:00
Jeff Mitchell f7147025dd
Migrate to sdk/internalshared libs in go-secure-stdlib (#12090)
* Swap sdk/helper libs to go-secure-stdlib

* Migrate to go-secure-stdlib reloadutil

* Migrate to go-secure-stdlib kv-builder

* Migrate to go-secure-stdlib gatedwriter
2021-07-15 20:17:31 -04:00
Nick Cabatoff a3ac49aa05
VAULT-2809: Tweak creation of vault.db file (#12034) 2021-07-09 14:45:50 -04:00
Hridoy Roy e2614979f7
Diagnose Storage Panic Bugfixes (#11923)
* partial

* fix raft panics and ensure checks are skipped if storage isnt initialized

* cleanup directories

* newline

* typo in nil check

* another nil check
2021-06-24 09:56:38 -07:00
Nick Cabatoff ccae681628
Remove fragile link to docs from code. (#11928) 2021-06-23 15:43:44 -04:00
Brian Kassouf a794a6244f
raft: Set BatchApplyCh for more consistent batch sizes (#11907)
* raft: Set BatchApplyCh for more consistent batch sizes

* Add changelog file
2021-06-21 12:00:41 -07:00
Josh Black 8c069936e9
Add new boltdb options (#11895) 2021-06-21 11:35:40 -07:00
Hridoy Roy e38f991054
Diagnose checks for raft quorum status and file backend permissions (#11771)
* raft file and quorum checks

* raft checks

* backup

* raft file checks test

* address comments and add more raft and file and process checks

* syntax issues

* modularize functions to compile differently on different os

* compile raft checks everywhere

* more build tag issues

* raft-diagnose

* correct file permission checks

* upgrade tests and add a getConfigOffline test that currently does not work

* comment

* update file checks method signature on windows

* Update physical/raft/raft_test.go

Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>

* raft tests

* add todo comment for windows root ownership

* voter count message

* raft checks test fixes

Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>
2021-06-17 10:04:21 -07:00
Brian Kassouf 0d9ea8a4b7
physical/raft: Add a function that gets the offline, stale configuration (#11821)
* Add a function that gets the offline, stale configuration

* Fix comment
2021-06-11 10:25:02 -07:00
Lars Lehtonen 5ac47a9265
physical: deprecate errwrap.Wrapf() (#11692) 2021-05-31 12:54:05 -04:00
swayne275 335e4c3711
Introduce Logical Unrecoverable Error, Use it in Expiration Manager (#11477)
* build out zombie lease system

* add typo for CI

* undo test CI commit

* time equality test isn't working on CI, so let's see what this does...

* add unrecoverable proto error, make proto, go mod vendor

* zombify leases if unrecoverable error, tests

* test fix: somehow pointer in pointer rx is null after pointer rx called

* tweaks based on roy feedback

* improve zombie errors

* update which errors are unrecoverable

* combine zombie logic

* keep subset of zombie lease in memory
2021-05-03 17:56:06 -06:00
Scott Miller 85fbd45e1c
Create helpers which integrate with OpenTelemetry for diagnose collection (#11454)
* Create helpers which integrate with OpenTelemetry for diagnose collection

* Go mod vendor

* Comments

* Update vault/diagnose/helpers.go

Co-authored-by: swayne275 <swayne275@gmail.com>

* Add unit test/example

* tweak output

* More comments

* add spot check concept

* Get unit tests working on Result structs

* Fix unit test

* Get unit tests working, and make diagnose sessions local rather than global

* Comments

* Last comments

* No need for init

* :|

* Fix helpers_test

Co-authored-by: swayne275 <swayne275@gmail.com>
2021-04-29 13:32:41 -05:00
Vishal Nayak 406abc19dc
Autopilot: Return leader info via delegate (#11247)
* Autopilot: Return leader info via delegate

* Pull in the new raft-autopilot lib dependencies

* update deps

* Add CL
2021-04-27 15:54:26 -04:00
Josh Black ec105f288f
Switch to shared raft-boltdb library and add metrics (#11269) 2021-04-26 16:01:26 -07:00
Brian Kassouf 303c2aee7c
Run a more strict formatter over the code (#11312)
* Update tooling

* Run gofumpt

* go mod vendor
2021-04-08 09:43:39 -07:00
Hridoy Roy 049f2513e6
Initial Diagnose Command for TLS and Listener [VAULT-1896, VAULT-1899] (#11249)
* sanity checks for tls config in diagnose

* backup

* backup

* backup

* added necessary tests

* remove comment

* remove parallels causing test flakiness

* comments

* small fix

* separate out config hcl test case into new hcl file

* newline

* addressed comments

* addressed comments

* addressed comments

* addressed comments

* addressed comments

* reload funcs should be allowed to be nil
2021-04-06 16:40:43 -07:00
Nick Cabatoff b3af58d758
Expose snapshot_interval tunable instead of setting it in prod code for the sake of a test. (#11160) 2021-03-19 15:41:42 -04:00
Brian Kassouf 28aba513f2
storage/raft: Ensure peers are informed of their correct suffrage when added with AutoPilot (#11155)
* storage/raft: Ensure peers are informed of their correct suffrage when added with AutoPilot

* Add test ensuring peer sets are equivalent
2021-03-19 11:53:50 -07:00
Nick Cabatoff 411495514c
Add a test for server stabilization (#11128) 2021-03-17 17:23:13 -04:00
Vishal Nayak fb2df6ca73
Fix autopilot fsm race (#11091)
* Fix autopilot fsm race

* No need to grab backend's lock
2021-03-11 13:14:11 -05:00
Vishal Nayak 9839e76192
Remove unneeded fields from state output (#11073) 2021-03-10 12:08:12 -05:00
Vishal Nayak e5b6ec4d05
Reset IsDead upon each heartbeat (#11049) 2021-03-05 19:50:36 -05:00
Vishal Nayak 910b45413b
Handle error (#11039) 2021-03-03 15:55:50 -05:00
Vishal Nayak 3e55e79a3f
Autopilot: Server Stabilization, State and Dead Server Cleanup (#10856)
* k8s doc: update for 0.9.1 and 0.8.0 releases (#10825)

* k8s doc: update for 0.9.1 and 0.8.0 releases

* Update website/content/docs/platform/k8s/helm/configuration.mdx

Co-authored-by: Theron Voran <tvoran@users.noreply.github.com>

Co-authored-by: Theron Voran <tvoran@users.noreply.github.com>

* Autopilot initial commit

* Move autopilot related backend implementations to its own file

* Abstract promoter creation

* Add nil check for health

* Add server state oss no-ops

* Config ext stub for oss

* Make way for non-voters

* s/health/state

* s/ReadReplica/NonVoter

* Add synopsis and description

* Remove struct tags from AutopilotConfig

* Use var for config storage path

* Handle nin-config when reading

* Enable testing autopilot by using inmem cluster

* First passing test

* Only report the server as known if it is present in raft config

* Autopilot defaults to on for all existing and new clusters

* Add locking to some functions

* Persist initial config

* Clarify the command usage doc

* Add health metric for each node

* Fix audit logging issue

* Don't set DisablePerformanceStandby to true in test

* Use node id label for health metric

* Log updates to autopilot config

* Less aggressively consume config loading failures

* Return a mutable config

* Return early from known servers if raft config is unable to be pulled

* Update metrics name

* Reduce log level for potentially noisy log

* Add knob to disable autopilot

* Don't persist if default config is in use

* Autopilot: Dead server cleanup (#10857)

* Dead server cleanup

* Initialize channel in any case

* Fix a bunch of tests

* Fix panic

* Add follower locking in heartbeat tracker

* Add LastContactFailureThreshold to config

* Add log when marking node as dead

* Update follower state locking in heartbeat tracker

* Avoid follower states being nil

* Pull test to its own file

* Add execution status to state response

* Optionally enable autopilot in some tests

* Updates

* Added API function to fetch autopilot configuration

* Add test for default autopilot configuration

* Configuration tests

* Add State API test

* Update test

* Added TestClusterOptions.PhysicalFactoryConfig

* Update locking

* Adjust locking in heartbeat tracker

* s/last_contact_failure_threshold/left_server_last_contact_threshold

* Add disabling autopilot as a core config option

* Disable autopilot in some tests

* s/left_server_last_contact_threshold/dead_server_last_contact_threshold

* Set the lastheartbeat of followers to now when setting up active node

* Don't use config defaults from CLI command

* Remove config file support

* Remove HCL test as well

* Persist only supplied config; merge supplied config with default to operate

* Use pointer to structs for storing follower information

* Test update

* Retrieve non voter status from configbucket and set it up when a node comes up

* Manage desired suffrage

* Consider bucket being created already

* Move desired suffrage to its own entry

* s/DesiredSuffrageKey/LocalNodeConfigKey

* s/witnessSuffrage/recordSuffrage

* Fix test compilation

* Handle local node config post a snapshot install

* Commit to storage first; then record suffrage in fsm

* No need of local node config being nili case, post snapshot restore

* Reconcile autopilot config when a new leader takes over duty

* Grab fsm lock when recording suffrage

* s/Suffrage/DesiredSuffrage in FollowerState

* Instantiate autopilot only in leader

* Default to old ways in more scenarios

* Make API gracefully handle 404

* Address some feedback

* Make IsDead an atomic.Value

* Simplify follower hearbeat tracking

* Use uber.atomic

* Don't have multiple causes for having autopilot disabled

* Don't remove node from follower states if we fail to remove the dead server

* Autopilot server removals map (#11019)

* Don't remove node from follower states if we fail to remove the dead server

* Use map to track dead server removals

* Use lock and map

* Use delegate lock

* Adjust when to remove entry from map

* Only hold the lock while accessing map

* Fix race

* Don't set default min_quorum

* Fix test

* Ensure follower states is not nil before starting autopilot

* Fix race

Co-authored-by: Jason O'Donnell <2160810+jasonodonnell@users.noreply.github.com>
Co-authored-by: Theron Voran <tvoran@users.noreply.github.com>
2021-03-03 13:59:50 -05:00
Nick Cabatoff c1ddfbb538
OSS parts of the new client controlled consistency feature (#10974) 2021-02-24 06:58:10 -05:00
Vishal Nayak 53cb1deb38
Revert "Read-replica instead of non-voter (#10875)" (#10890)
This reverts commit fc745670cf34821f5834357d9caebc3351dbc1e7.
2021-02-10 16:41:58 -05:00
Vishal Nayak a2394e7353
Read-replica instead of non-voter (#10875) 2021-02-10 09:58:18 -05:00