Shamir seals now come in two varieties: legacy and new-style. Legacy
Shamir is automatically converted to new-style when a rekey operation
is performed. All new Vault initializations using Shamir are new-style.
New-style Shamir writes an encrypted master key to storage, just like
AutoUnseal. The stored master key is encrypted using the shared key that
is split via Shamir's algorithm. Thus when unsealing, we take the key
fragments given, combine them into a Key-Encryption-Key, and use that
to decrypt the master key on disk. Then the master key is used to read
the keyring that decrypts the barrier.
* Initial work
* rework
* s/dr/recovery
* Add sys/raw support to recovery mode (#7577)
* Factor the raw paths out so they can be run with a SystemBackend.
# Conflicts:
# vault/logical_system.go
* Add handleLogicalRecovery which is like handleLogical but is only
sufficient for use with the sys-raw endpoint in recovery mode. No
authentication is done yet.
* Integrate with recovery-mode. We now handle unauthenticated sys/raw
requests, albeit on path v1/raw instead v1/sys/raw.
* Use sys/raw instead raw during recovery.
* Don't bother persisting the recovery token. Authenticate sys/raw
requests with it.
* RecoveryMode: Support generate-root for autounseals (#7591)
* Recovery: Abstract config creation and log settings
* Recovery mode integration test. (#7600)
* Recovery: Touch up (#7607)
* Recovery: Touch up
* revert the raw backend creation changes
* Added recovery operation token prefix
* Move RawBackend to its own file
* Update API path and hit it using CLI flag on generate-root
* Fix a panic triggered when handling a request that yields a nil response. (#7618)
* Improve integ test to actually make changes while in recovery mode and
verify they're still there after coming back in regular mode.
* Refuse to allow a second recovery token to be generated.
* Resize raft cluster to size 1 and start as leader (#7626)
* RecoveryMode: Setup raft cluster post unseal (#7635)
* Setup raft cluster post unseal in recovery mode
* Remove marking as unsealed as its not needed
* Address review comments
* Accept only one seal config in recovery mode as there is no scope for migration
* sys: add host-info endpoint, add client API method
* remove old commented handler
* add http tests, fix bugs
* query all partitions for disk usage
* fix Timestamp decoding
* add comments for clarification
* dont append a nil entry on disk usage query error
* remove HostInfo from the sdk api
We can use Logical().Read(...) to query this endpoint since the payload is contained with the data object. All warnings are preserved under Secret.Warnings.
* ensure that we're testing failure case against a standby node
* add and use TestWaitStandby to ensure core is on standby
* remove TestWaitStandby
* respond with local-only error
* move HostInfo into its own helper package
* fix imports; use new no-forward handler
* add cpu times to collection
* emit clearer multierrors/warnings by collection type
* add comments on HostInfo fields
Generalization of the PhysicalFactory notion introduced by Raft, so it can be used by other storage backends in tests. These are the OSS changes needed for my rework of the ent integ tests and cluster helpers.
There are a few different things happening in this change. First, some code that previously lived in enterprise has moved here: this includes some helper code for manipulating clusters and for building storage backends. Second, the existing cluster-building code using inmem storage has been generalized to allow various storage backends. Third, added support for creating two-cluster DR setups. Finally, there are tweaks to handle edge cases that
result in intermittent failures, or to eliminate sleeps in favour of polling to detect state changes.
Also: generalize TestClusterOptions.PhysicalFactory so it can be used either
as a per-core factory (for raft) or a per-cluster factory (for other
storage backends.)
We're waiting to see standbys receive wals but aren't generating traffic, so the condition is never satisfied. Fixed by continuously updating a KV value. It's a little weird to do so in each of the
goroutines, but there's no harm and it's simplest.
Various improvements to testhelpers.
* WaitForActiveNodeAndPerfStandbys is used to make sure a cluster is fully ready, i.e. both its active node and perf standbys are in a good state.
* WaitForReplicationStatus is like WaitForReplicationState but uses the API, part of a general
effort to move us away from interacting with Core directly in these tests.
* WaitForPerfReplicationWorking is similar to some code that exists in ent already: it writes to the primary and waits to see that appear on the secondary.
* Implement SetCredentials for MongoDB, adding support for static accounts
* rework SetCredentials to split from CreateUser, and to parse the url for database
* Add integration test for mongodb static account rotation
* check the length of the password results to avoid out-of-bounds
* remove unused method
* use the pre-existing test helper for this. Add parse method to helper
* remove unused command
* storage/raft: When restoring a snapshot preseal first
* best-effort allow standbys to apply the restoreOp before sealing active node
* Don't cache the raft tls key
* Update physical/raft/raft.go
* Move pending raft peers to core
* Fix race on close bool
* Extend the leaderlease time for tests
* Update raft deps
* Fix audit hashing
* Fix race with auditing
* Add OIDC token generation to Identity
There are a few open TODOs and some remaining cleanup, but this is
functionally complete and ready for review.
(Tests will being added soon.)
* Simplified key update endpoint
* Cache the config
* Fix Issuer handling
* Suppose base64-encoded templates (#6919)
* Cache JWKS and switch to go-cache (#6918)
* Address review comments
* Add warning if neither Issue nor api_addr are set
* adds tests (#6937)
* adds help synopsis and descriptions to the framework path for the oid… (#6930)
* adds help synopsis and descriptions to the framework path for the oidc backend
* Update vault/identity_store_oidc.go
Co-Authored-By: Jim Kalafut <jim@kalafut.net>
* Add Now parameter to PopulateStringInput
* Addressing review comments
* Refactor template processing to improve mode-specific handling
* adds a test for the periodic func (#6943)
* adds a test for the periodic func
* removes commented out code
* adds a comment
* Add comments
* Work on raft backend
* Add logstore locally
* Add encryptor and unsealable interfaces
* Add clustering support to raft
* Remove client and handler
* Bootstrap raft on init
* Cleanup raft logic a bit
* More raft work
* Work on TLS config
* More work on bootstrapping
* Fix build
* More work on bootstrapping
* More bootstrapping work
* fix build
* Remove consul dep
* Fix build
* merged oss/master into raft-storage
* Work on bootstrapping
* Get bootstrapping to work
* Clean up FMS and node-id
* Update local node ID logic
* Cleanup node-id change
* Work on snapshotting
* Raft: Add remove peer API (#906)
* Add remove peer API
* Add some comments
* Fix existing snapshotting (#909)
* Raft get peers API (#912)
* Read raft configuration
* address review feedback
* Use the Leadership Transfer API to step-down the active node (#918)
* Raft join and unseal using Shamir keys (#917)
* Raft join using shamir
* Store AEAD instead of master key
* Split the raft join process to answer the challenge after a successful unseal
* get the follower to standby state
* Make unseal work
* minor changes
* Some input checks
* reuse the shamir seal access instead of new default seal access
* refactor joinRaftSendAnswer function
* Synchronously send answer in auto-unseal case
* Address review feedback
* Raft snapshots (#910)
* Fix existing snapshotting
* implement the noop snapshotting
* Add comments and switch log libraries
* add some snapshot tests
* add snapshot test file
* add TODO
* More work on raft snapshotting
* progress on the ConfigStore strategy
* Don't use two buckets
* Update the snapshot store logic to hide the file logic
* Add more backend tests
* Cleanup code a bit
* [WIP] Raft recovery (#938)
* Add recovery functionality
* remove fmt.Printfs
* Fix a few fsm bugs
* Add max size value for raft backend (#942)
* Add max size value for raft backend
* Include physical.ErrValueTooLarge in the message
* Raft snapshot Take/Restore API (#926)
* Inital work on raft snapshot APIs
* Always redirect snapshot install/download requests
* More work on the snapshot APIs
* Cleanup code a bit
* On restore handle special cases
* Use the seal to encrypt the sha sum file
* Add sealer mechanism and fix some bugs
* Call restore while state lock is held
* Send restore cb trigger through raft log
* Make error messages nicer
* Add test helpers
* Add snapshot test
* Add shamir unseal test
* Add more raft snapshot API tests
* Fix locking
* Change working to initalize
* Add underlying raw object to test cluster core
* Move leaderUUID to core
* Add raft TLS rotation logic (#950)
* Add TLS rotation logic
* Cleanup logic a bit
* Add/Remove from follower state on add/remove peer
* add comments
* Update more comments
* Update request_forwarding_service.proto
* Make sure we populate all nodes in the followerstate obj
* Update times
* Apply review feedback
* Add more raft config setting (#947)
* Add performance config setting
* Add more config options and fix tests
* Test Raft Recovery (#944)
* Test raft recovery
* Leave out a node during recovery
* remove unused struct
* Update physical/raft/snapshot_test.go
* Update physical/raft/snapshot_test.go
* fix vendoring
* Switch to new raft interface
* Remove unused files
* Switch a gogo -> proto instance
* Remove unneeded vault dep in go.sum
* Update helper/testhelpers/testhelpers.go
Co-Authored-By: Calvin Leung Huang <cleung2010@gmail.com>
* Update vault/cluster/cluster.go
* track active key within the keyring itself (#6915)
* track active key within the keyring itself
* lookup and store using the active key ID
* update docstring
* minor refactor
* Small text fixes (#6912)
* Update physical/raft/raft.go
Co-Authored-By: Calvin Leung Huang <cleung2010@gmail.com>
* review feedback
* Move raft logical system into separate file
* Update help text a bit
* Enforce cluster addr is set and use it for raft bootstrapping
* Fix tests
* fix http test panic
* Pull in latest raft-snapshot library
* Add comment
* Port over some SP v2 bits
Specifically:
* Add too-large handling to Physical (Consul only for now)
* Contextify some identity funcs
* Update SP protos
* Add size limiting to inmem storage
Merge both functions for creating mongodb containers into one.
Add retries to docker container cleanups.
Require $VAULT_ACC be set to enable AWS tests.
* Listener refactoring and file system permissions
* added listenerutil and move some common code there
* Added test for verifying socket file permissions
* Change default port of agent to 8200
* address review feedback
* Address review feedback
* Read socket options from listener config
* Implemented a warning when tls_cipher_suites includes only cipher suites which are not supprted by the HTTP/2 spec
* Added test for cipher suites
* Added hard fail on startup when all defined cipher suites are blacklisted. Added warning when some ciphers are blacklisted.
* Replaced hard failure with warning. Removed bad cipher util function and replaced it by external library.
* Added missing dependency. Fixed renaming of package name.
* Port over OSS cluster port refactor components
* Start forwarding
* Cleanup a bit
* Fix copy error
* Return error from perf standby creation
* Add some more comments
* Fix copy/paste error
* initial commit for prometheus and sys/metrics support
* Throw an error if prometheusRetentionTime is 0,add prometheus in devmode
* return when format=prometheus is used and prom is disable
* parse prometheus_retention_time from string instead of int
* Initialize config.Telemetry if nil
* address PR issues
* add sys/metrics framework.Path in a factory
* Apply requiredMountTable entries's MountConfig to existing core table
* address pr comments
* enable prometheus sink by default
* Move Metric-related code in a separate metricsutil helper
* Two things:
* Change how we populate and clear leader UUID. This fixes a case where
if a standby disconnects from an active node and reconnects, without the
active node restarting, the UUID doesn't change so triggers on a new
active node don't get run.
* Add a bunch of test helpers and minor updates to things.
* Make useCache explicit everywhere in lock manager
This also clears up a case where we could insert into the cache when it
wasn't active
* Address feedback
The result will still pass gofmtcheck and won't trigger additional
changes if someone isn't using goimports, but it will avoid the
piecemeal imports changes we've been seeing.
* Strip empty strings from database revocation stmts
It's technically valid to give empty strings as statements to run on
most databases. However, in the case of revocation statements, it's not
only generally inadvisable but can lead to lack of revocations when you
expect them. This strips empty strings from the array of revocation
statements.
It also makes two other changes:
* Return statements on read as empty but valid arrays rather than nulls,
so that typing information is inferred (this is more in line with the
rest of Vault these days)
* Changes field data for TypeStringSlice and TypeCommaStringSlice such
that a client-supplied value of `""` doesn't turn into `[]string{""}`
but rather `[]string{}`.
The latter and the explicit revocation statement changes are related,
and defense in depth.
* Remove DEL characters from password input
iTerm password manager sends \x03\0x7f before sending a password
from its password manager to make sure the password is not being
echoed to the screen. Unfortunately, vault login does not handle
the Space DEL sequence, causing the login to fail when using the
password manager. This patch uses a simple method to delete the
sequence if present anywhere in the string, although it is strictly
only needed at the start of input.
* Simplify iTerm handling to only remove iTerm prefix
The logic now only removes the two byte prefix sent in by iTerm
instead of trying to remove all deletes in the string.
This has been tested to work with the iTerm password manager.
As a small correction, the byte sequence is \x20\x7f. The
earlier commit message incorrectly stated it was \x03\x7f.
* Support registering plugin with name only
* Make RegisterPlugin backwards compatible
* Add CLI backwards compat command to plugin info and deregister
* Add server-side deprecation warnings if old read/dereg API endpoints are called
* Address feedback
* Add test file for testing path_restore in Transit backend. Fails because 'force' is not implemented yet
* initial implementation of 'force', to force restore of existing transit key atomically
* Initial implemntation of returning 529 for rate limits
- bump aws iam and sts packages to v1.14.31 to get mocking interface
- promote the iam and sts clients to the aws backend struct, for mocking in tests
- this also promotes some functions to methods on the Backend struct, so
that we can use the injected client
Generating creds requires reading config/root for credentials to contact
IAM. Here we make pathConfigRoot a method on aws/backend so we can clear
the clients on successful update of config/root path. Adds a mutex to
safely clear the clients
* refactor locking and unlocking into methods on *backend
* refactor/simply the locking
* check client after grabbing lock
* Disallow adding CA's serial to revocation list
* Allow disabling revocation list generation. This returns an empty (but
signed) list, but does not affect tracking of revocations so turning it
back on will populate the list properly.
* Nomad: updating max token length to 256
* Initial support for supporting custom max token name length for Nomad
* simplify/correct tests
* document nomad max_token_name_length
* removed support for max token length env var. Rename field for clarity
* cleanups after removing env var support
* move RandomWithPrefix to testhelpers
* fix spelling
* Remove default 256 value. Use zero as a sentinel value and ignore it
* update docs
* Initial work on templating
* Add check for unbalanced closing in front
* Add missing templated assignment
* Add first cut of end-to-end test on templating.
* Make template errors be 403s and finish up testing
* Review feedback
* Merge Identity Entities if two claim the same alias
Past bugs/race conditions meant two entities could be created each
claiming the same alias. There are planned longer term fixes for this
(outside of the race condition being fixed in 0.10.4) that involve
changing the data model, but this is an immediate workaround that has
the same net effect: if two entities claim the same alias, assume they
were created due to this race condition and merge them.
In this situation, also automatically merge policies so we don't lose
e.g. RGPs.