* Raft retry join
* update
* Make retry join work with shamir seal
* Return upon context completion
* Update vault/raft.go
Co-Authored-By: Brian Kassouf <briankassouf@users.noreply.github.com>
* Address some review comments
* send leader information slice as a parameter
* Make retry join work properly with Shamir case. This commit has a blocking issue
* Fix join goroutine exiting before the job is done
* Polishing changes
* Don't return after a successful join during unseal
* Added config parsing test
* Add test and fix bugs
* minor changes
* Address review comments
* Fix build error
Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>
* move ServiceDiscovery into methods
* add ServiceDiscoveryFactory
* add serviceDiscovery field to vault.Core
* refactor ConsulServiceDiscovery into separate struct
* cleanup
* revert accidental change to go.mod
* cleanup
* get rid of un-needed struct tags in vault.CoreConfig
* add service_discovery parser
* add ServiceDiscovery to config
* cleanup
* cleanup
* add test for ConfigServiceDiscovery to Core
* unit testing for config service_discovery stanza
* cleanup
* get rid of un-needed redirect_addr stuff in service_discovery stanza
* improve test suite
* cleanup
* clean up test a bit
* create docs for service_discovery
* check if service_discovery is configured, but storage does not support HA
* tinker with test
* tinker with test
* tweak docs
* move ServiceDiscovery into its own package
* tweak a variable name
* fix comment
* rename service_discovery to service_registration
* tweak service_registration config
* Revert "tweak service_registration config"
This reverts commit 5509920a8ab4c5a216468f262fc07c98121dce35.
* simplify naming
* refactor into ./serviceregistration/consul
Shamir seals now come in two varieties: legacy and new-style. Legacy
Shamir is automatically converted to new-style when a rekey operation
is performed. All new Vault initializations using Shamir are new-style.
New-style Shamir writes an encrypted master key to storage, just like
AutoUnseal. The stored master key is encrypted using the shared key that
is split via Shamir's algorithm. Thus when unsealing, we take the key
fragments given, combine them into a Key-Encryption-Key, and use that
to decrypt the master key on disk. Then the master key is used to read
the keyring that decrypts the barrier.
* Initial work
* rework
* s/dr/recovery
* Add sys/raw support to recovery mode (#7577)
* Factor the raw paths out so they can be run with a SystemBackend.
# Conflicts:
# vault/logical_system.go
* Add handleLogicalRecovery which is like handleLogical but is only
sufficient for use with the sys-raw endpoint in recovery mode. No
authentication is done yet.
* Integrate with recovery-mode. We now handle unauthenticated sys/raw
requests, albeit on path v1/raw instead v1/sys/raw.
* Use sys/raw instead raw during recovery.
* Don't bother persisting the recovery token. Authenticate sys/raw
requests with it.
* RecoveryMode: Support generate-root for autounseals (#7591)
* Recovery: Abstract config creation and log settings
* Recovery mode integration test. (#7600)
* Recovery: Touch up (#7607)
* Recovery: Touch up
* revert the raw backend creation changes
* Added recovery operation token prefix
* Move RawBackend to its own file
* Update API path and hit it using CLI flag on generate-root
* Fix a panic triggered when handling a request that yields a nil response. (#7618)
* Improve integ test to actually make changes while in recovery mode and
verify they're still there after coming back in regular mode.
* Refuse to allow a second recovery token to be generated.
* Resize raft cluster to size 1 and start as leader (#7626)
* RecoveryMode: Setup raft cluster post unseal (#7635)
* Setup raft cluster post unseal in recovery mode
* Remove marking as unsealed as its not needed
* Address review comments
* Accept only one seal config in recovery mode as there is no scope for migration
When starting a vault dev server the token helper is invoked to store
the dev root token.
This option gives the user the ability to not store the token.
Storing the token can be undesirable in certain circumstances
(e.g. running local tests) as the user's existing vault token is
clobbered without warning.
Fixes#1861
* Read config before creating logger when booting vault server
* Allow for specifying log output in JSON format in a config file, via a 'log_level' flag
* Create parser for log format flag
* Allow for specifying log format in a config file, via a 'log_format' flag. Also, get rid of 'log_json' flag.
* Add 'log-format' command line flag
* Update documentation to include description of log_format setting
* Tweak comment for VAULT_LOG_FORMAT environment variable
* add test for ParseEnvLogFormat()
* clarify how log format is set
* fix typos in documentation
* Work on raft backend
* Add logstore locally
* Add encryptor and unsealable interfaces
* Add clustering support to raft
* Remove client and handler
* Bootstrap raft on init
* Cleanup raft logic a bit
* More raft work
* Work on TLS config
* More work on bootstrapping
* Fix build
* More work on bootstrapping
* More bootstrapping work
* fix build
* Remove consul dep
* Fix build
* merged oss/master into raft-storage
* Work on bootstrapping
* Get bootstrapping to work
* Clean up FMS and node-id
* Update local node ID logic
* Cleanup node-id change
* Work on snapshotting
* Raft: Add remove peer API (#906)
* Add remove peer API
* Add some comments
* Fix existing snapshotting (#909)
* Raft get peers API (#912)
* Read raft configuration
* address review feedback
* Use the Leadership Transfer API to step-down the active node (#918)
* Raft join and unseal using Shamir keys (#917)
* Raft join using shamir
* Store AEAD instead of master key
* Split the raft join process to answer the challenge after a successful unseal
* get the follower to standby state
* Make unseal work
* minor changes
* Some input checks
* reuse the shamir seal access instead of new default seal access
* refactor joinRaftSendAnswer function
* Synchronously send answer in auto-unseal case
* Address review feedback
* Raft snapshots (#910)
* Fix existing snapshotting
* implement the noop snapshotting
* Add comments and switch log libraries
* add some snapshot tests
* add snapshot test file
* add TODO
* More work on raft snapshotting
* progress on the ConfigStore strategy
* Don't use two buckets
* Update the snapshot store logic to hide the file logic
* Add more backend tests
* Cleanup code a bit
* [WIP] Raft recovery (#938)
* Add recovery functionality
* remove fmt.Printfs
* Fix a few fsm bugs
* Add max size value for raft backend (#942)
* Add max size value for raft backend
* Include physical.ErrValueTooLarge in the message
* Raft snapshot Take/Restore API (#926)
* Inital work on raft snapshot APIs
* Always redirect snapshot install/download requests
* More work on the snapshot APIs
* Cleanup code a bit
* On restore handle special cases
* Use the seal to encrypt the sha sum file
* Add sealer mechanism and fix some bugs
* Call restore while state lock is held
* Send restore cb trigger through raft log
* Make error messages nicer
* Add test helpers
* Add snapshot test
* Add shamir unseal test
* Add more raft snapshot API tests
* Fix locking
* Change working to initalize
* Add underlying raw object to test cluster core
* Move leaderUUID to core
* Add raft TLS rotation logic (#950)
* Add TLS rotation logic
* Cleanup logic a bit
* Add/Remove from follower state on add/remove peer
* add comments
* Update more comments
* Update request_forwarding_service.proto
* Make sure we populate all nodes in the followerstate obj
* Update times
* Apply review feedback
* Add more raft config setting (#947)
* Add performance config setting
* Add more config options and fix tests
* Test Raft Recovery (#944)
* Test raft recovery
* Leave out a node during recovery
* remove unused struct
* Update physical/raft/snapshot_test.go
* Update physical/raft/snapshot_test.go
* fix vendoring
* Switch to new raft interface
* Remove unused files
* Switch a gogo -> proto instance
* Remove unneeded vault dep in go.sum
* Update helper/testhelpers/testhelpers.go
Co-Authored-By: Calvin Leung Huang <cleung2010@gmail.com>
* Update vault/cluster/cluster.go
* track active key within the keyring itself (#6915)
* track active key within the keyring itself
* lookup and store using the active key ID
* update docstring
* minor refactor
* Small text fixes (#6912)
* Update physical/raft/raft.go
Co-Authored-By: Calvin Leung Huang <cleung2010@gmail.com>
* review feedback
* Move raft logical system into separate file
* Update help text a bit
* Enforce cluster addr is set and use it for raft bootstrapping
* Fix tests
* fix http test panic
* Pull in latest raft-snapshot library
* Add comment
* http timeout fields are configurable
* move return statement for server config tests outside of range loop
* adds documentation for configurable listener http_* values
* fixed some formatting for the docs markdown
If you were migrating to Shamir but didn't specify a Shamir block
migration would fail. Being explicit is nice but it's also not really
obvious since you don't need the block normally.
* Add ability to migrate autoseal to autoseal
This adds the ability to migrate from shamir to autoseal, autoseal to
shamir, or autoseal to autoseal, by allowing multiple seal stanzas. A
disabled stanza will be used as the config being migrated from; this can
also be used to provide an unwrap seal on ent over multiple unseals.
A new test is added to ensure that autoseal to autoseal works as
expected.
* Fix test
* Provide default shamir info if not given in config
* Linting feedback
* Remove context var that isn't used
* Don't run auto unseal watcher when in migration, and move SetCores to SetSealsForMigration func
* Slight logic cleanup
* Fix test build and fix bug
* Updates
* remove GetRecoveryKey function
* initial commit for prometheus and sys/metrics support
* Throw an error if prometheusRetentionTime is 0,add prometheus in devmode
* return when format=prometheus is used and prom is disable
* parse prometheus_retention_time from string instead of int
* Initialize config.Telemetry if nil
* address PR issues
* add sys/metrics framework.Path in a factory
* Apply requiredMountTable entries's MountConfig to existing core table
* address pr comments
* enable prometheus sink by default
* Move Metric-related code in a separate metricsutil helper
* Add helper for checking if an error is a fatal error
The double-double negative was really confusing, and this pattern is used a few places in Vault. This negates the double negative, making the devx a bit easier to follow.
* Check return value of UnsealWithStoredKeys in sys/init
* Return proper error types when attempting unseal with stored key
Prior to this commit, "nil" could have meant unsupported auto-unseal, a transient error, or success. This updates the function to return the correct error type, signaling to the caller whether they should retry or fail.
* Continuously attempt to unseal if sealed keys are supported
This fixes a bug that occurs on bootstrapping an initial cluster. Given a collection of Vault nodes and an initialized storage backend, they will all go into standby waiting for initialization. After one node is initialized, the other nodes had no mechanism by which they "re-check" to see if unseal keys are present. This adds a goroutine to the server command which continually waits for unseal keys to exist. It exits in the following conditions:
- the node is unsealed
- the node does not support stored keys
- a fatal error occurs (as defined by Vault)
- the server is shutting down
In all other situations, the routine wakes up at the specified interval and attempts to unseal with the stored keys.
The result will still pass gofmtcheck and won't trigger additional
changes if someone isn't using goimports, but it will avoid the
piecemeal imports changes we've been seeing.
* Continue on plugin registration error in dev mode
* Continue only on unknown type error
* Continue only on unknown type error
* Print plugin registration error on exit
Co-Authored-By: calvn <cleung2010@gmail.com>
* Add request timeouts in normal request path and to expirations
* Add ability to adjust default max request duration
* Some test fixes
* Ensure tests have defaults set for max request duration
* Add context cancel checking to inmem/file
* Fix tests
* Fix tests
* Set default max request duration to basically infinity for this release for BC
* Address feedback
* Tackle #4929 a different way
This turns c.sealed into an atomic, which allows us to call sealInternal
without a lock. By doing so we can better control lock grabbing when a
condition causing the standby loop to get out of active happens. This
encapsulates that logic into two distinct pieces (although they could
be combined into one), and makes lock guarding more understandable.
* Re-add context canceling to the non-HA version of sealInternal
* Return explicitly after stopCh triggered
This change makes it so that if a lease is revoked through user action,
we set the expiration time to now and update pending, just as we do with
tokens. This allows the normal retry logic to apply in these cases as
well, instead of just erroring out immediately. The idea being that once
you tell Vault to revoke something it should keep doing its darndest to
actually make that happen.
* Allow max request size to be user-specified
This turned out to be way more impactful than I'd expected because I
felt like the right granularity was per-listener, since an org may want
to treat external clients differently from internal clients. It's pretty
straightforward though.
This also introduces actually using request contexts for values, which
so far we have not done (using our own logical.Request struct instead),
but this allows non-logical methods to still get this benefit.
* Switch to ioutil.ReadAll()
* Add an idle timeout for the server
Because tidy operations can be long-running, this also changes all tidy
operations to behave the same operationally (kick off the process, get a
warning back, log errors to server log) and makes them all run in a
goroutine.
This could mean a sort of hard stop if Vault gets sealed because the
function won't have the read lock. This should generally be okay
(running tidy again should pick back up where it left off), but future
work could use cleanup funcs to trigger the functions to stop.
* Fix up tidy test
* Add deadline to cluster connections and an idle timeout to the cluster server, plus add readheader/read timeout to api server
This can be used when errors are happening early on to avoid them being
swallowed by logGate.
This also does a bit of cleanup of format env var checking --
helper/logging internally looks for this so it was totally unnecessary
since moving to hclog.
* adding UI handlers and UI header configuration
* forcing specific static headers
* properly getting UI config value from config/environment
* fixing formatting in stub UI text
* use http.Header
* case-insensitive X-Vault header check
* fixing var name
* wrap both stubbed and real UI in header handler
* adding test for >1 keys
* logbridge with hclog and identical output
* Initial search & replace
This compiles, but there is a fair amount of TODO
and commented out code, especially around the
plugin logclient/logserver code.
* strip logbridge
* fix majority of tests
* update logxi aliases
* WIP fixing tests
* more test fixes
* Update test to hclog
* Fix format
* Rename hclog -> log
* WIP making hclog and logxi love each other
* update logger_test.go
* clean up merged comments
* Replace RawLogger interface with a Logger
* Add some logger names
* Replace Trace with Debug
* update builtin logical logging patterns
* Fix build errors
* More log updates
* update log approach in command and builtin
* More log updates
* update helper, http, and logical directories
* Update loggers
* Log updates
* Update logging
* Update logging
* Update logging
* Update logging
* update logging in physical
* prefixing and lowercase
* Update logging
* Move phyisical logging name to server command
* Fix som tests
* address jims feedback so far
* incorporate brians feedback so far
* strip comments
* move vault.go to logging package
* update Debug to Trace
* Update go-plugin deps
* Update logging based on review comments
* Updates from review
* Unvendor logxi
* Remove null_logger.go
* Redo the API client quite a bit to make the behavior of NewClient more
predictable and add locking to make it safer to use with Clone() and if
multiple goroutines for some reason decide to change things.
Along the way I discovered that currently, the x/net/http2 package is
broke with the built-in h2 support in released Go. For those using
DefaultConfig (the vast majority of cases) this will be a non-event.
Others can manually call http2.ConfigureTransport as needed. We should
keep an eye on commits on that repo and consider more updates before
release. Alternately we could go back revisions but miss out on bug
fixes; my theory is that this is not a purposeful break and I'll be
following up on this in the Go issue tracker.
In a few tests that don't use NewTestCluster, either for legacy or other
reasons, ensure that http2.ConfigureTransport is called.
* Use tls config cloning
* Don't http2.ConfigureServer anymore as current Go seems to work properly without requiring the http2 package
* Address feedback
* disable raw endpoint by default
* adding docs
* config option raw -> raw_storage_endpoint
* docs updates
* adding listing on raw endpoint
* reworking tests for enabled raw endpoints
* root protecting base raw endpoint
* Add basic autocompletion
* Add autocomplete to some common commands
* Autocomplete the generate-root flags
* Add information about autocomplete to the docs
* Normalize "X arguments expected" messages
* Use "Vault" when referring to the product and "vault" when referring to an instance of the product
* Various minor tweaks to improve readability and/or provide clarity
Adds HUP support for audit log files to close and reopen. This makes it
much easier to deal with normal log rotation methods.
As part of testing this I noticed that HUP and other items that come out
of command/server.go are going to stderr, which is where our normal log
lines go. This isn't so much problematic with our normal output but as
we officially move to supporting other formats this can cause
interleaving issues, so I moved those to stdout instead.