* Add helper for checking if an error is a fatal error
The double-double negative was really confusing, and this pattern is used a few places in Vault. This negates the double negative, making the devx a bit easier to follow.
* Check return value of UnsealWithStoredKeys in sys/init
* Return proper error types when attempting unseal with stored key
Prior to this commit, "nil" could have meant unsupported auto-unseal, a transient error, or success. This updates the function to return the correct error type, signaling to the caller whether they should retry or fail.
* Continuously attempt to unseal if sealed keys are supported
This fixes a bug that occurs on bootstrapping an initial cluster. Given a collection of Vault nodes and an initialized storage backend, they will all go into standby waiting for initialization. After one node is initialized, the other nodes had no mechanism by which they "re-check" to see if unseal keys are present. This adds a goroutine to the server command which continually waits for unseal keys to exist. It exits in the following conditions:
- the node is unsealed
- the node does not support stored keys
- a fatal error occurs (as defined by Vault)
- the server is shutting down
In all other situations, the routine wakes up at the specified interval and attempts to unseal with the stored keys.
* Upgrade to new Cloud KMS client libraries
We recently released the new Cloud KMS client libraries which use GRPC
instead of HTTP. They are faster and look nicer (</opinion>), but more
importantly they drastically simplify a lot of the logic around client
creation, encryption, and decryption. In particular, we can drop all the
logic around looking up credentials and base64-encoding/decoding.
Tested on a brand new cluster (no pre-existing unseal keys) and against
a cluster with stored keys from a previous version of Vault to ensure no
regressions.
* Use the default scopes the client requests
The client already does the right thing here, so we don't need to
surface it, especially since we aren't allowing users to configure it.
* fix cubbyhole deletion
* Fix error handling
* Move the cubbyhole tidy logic to token store and track the revocation count
* Move fetching of cubby keys before the tidy loop
* Fix context getting cancelled
* Test the cubbyhole cleanup logic
* Add progress counter for cubbyhole cleanup
* Minor polish
* Use map instead of slice for faster computation
* Add test for cubbyhole deletion
* Add a log statement for deletion
* Add SHA1 hashed tokens into the mix
The result will still pass gofmtcheck and won't trigger additional
changes if someone isn't using goimports, but it will avoid the
piecemeal imports changes we've been seeing.
This changes the behavior of the GCPCKMS auto-unsealer setup to attempt
encryption instead of a key lookup. Key lookups are a different API
method not covered by roles/cloudkms.cryptoKeyEncrypterDecrypter. This
means users must grant an extended scope to their service account
(granting the ability to read key data) which only seems to be used to
validate the existence of the key.
Worse, the only roles that include this permission are overly verbose
(e.g. roles/viewer which gives readonly access to everything in the
project and roles/cloudkms.admin which gives full control over all key
operations). This leaves the user stuck between choosing to create a
custom IAM role (which isn't fun) or grant overly broad permissions.
By changing to an encrypt call, we get better verification of the unseal
permissions and users can reduce scope to a single role.
* Refactor mount tune to support upsert options values and unset options.
* Do not allow unsetting options map
* add secret tune version regression test
* Only accept valid options version
* s/meVersion/optVersion/
We're having issues with leases in the GCS backend storage being
corrupted and failing MAC checking. When that happens, we need to know
the lease ID so we can address the corruption by hand and take
appropriate action.
This will hopefully prevent any instances of incomplete data being sent
to GSS
* Use common abstraction for entity deletion
* Update group memberships before deleting entity
* Added test
* Fix return statements
* Update comment
* Cleanup member entity IDs while loading groups
* Added test to ensure that upgrade happens properly
* Ensure that the group gets persisted if upgrade code modifies it
* Continue on plugin registration error in dev mode
* Continue only on unknown type error
* Continue only on unknown type error
* Print plugin registration error on exit
Co-Authored-By: calvn <cleung2010@gmail.com>
* Support registering plugin with name only
* Make RegisterPlugin backwards compatible
* Add CLI backwards compat command to plugin info and deregister
* Add server-side deprecation warnings if old read/dereg API endpoints are called
* Address feedback
The slice returned by `collectGroupsReverseDFS` is an updated copy of
the slice given to it when called. Appending `pGroups` to `groups`
therefore led to expontential memory usage as the slice was repeatedly
appended to itself.
Fixes#5605
* case insensitive identity names
* TestIdentityStore_GroupHierarchyCases
* address review feedback
* Use errwrap.Contains instead of errwrap.ContainsType
* Warn about duplicate names all the time to help fix them
* Address review feedback
* Support Authorization Bearer as token header
* add requestAuth test
* remove spew debug output in test
* Add Authorization in CORS Allowed headers
* use const where applicable
* use less allocations in bearer token checking
* address PR comments on tests and apply last commit
* reorder error checking in a TestHandler_requestAuth
* logical/aws: Harden WAL entry creation
If AWS IAM user creation failed in any way, the WAL corresponding to the
IAM user would get left around and Vault would try to roll it back.
However, because the user never existed, the rollback failed. Thus, the
WAL would essentially get "stuck" and Vault would continually attempt to
roll it back, failing every time. A similar situation could arise if the
IAM user that Vault created got deleted out of band, or if Vault deleted
it but was unable to write the lease revocation back to storage (e.g., a
storage failure).
This attempts to harden it in two ways. One is by deleting the WAL log
entry if the IAM user creation fails. However, the WAL deletion could
still fail, and this wouldn't help where the user is deleted out of
band, so second, consider the user rolled back if the user just doesn't
exist, under certain circumstances.
Fixes#5190
* Fix segfault in expiration unit tests
TestExpiration_Tidy was passing in a leaseEntry that had a nil Secret,
which then caused a segfault as the changes to revokeEntry didn't check
whether Secret was nil; this is probably unlikely to occur in real life,
but good to be extra cautious.
* Fix potential segfault
Missed the else...
* Respond to PR feedback
* Fix for using ExplicitMaxTTL in auth method plugins.
* Reverted pb.go files for readability of PR.
* Fixed indenting of comment.
* Reverted unintended change by go test.
This works around a very, very common error where people write policies
to affect listing but forget the slash at the end. If there is no exact
rule with a slash at the end when doing a list, we look to see if there
is a rule without it, and if so, use those capabilities.
Fixes #mass-user-confusion
* Initial work on templating
* Add check for unbalanced closing in front
* Add missing templated assignment
* Add first cut of end-to-end test on templating.
* Make template errors be 403s and finish up testing
* Review feedback
* Merge Identity Entities if two claim the same alias
Past bugs/race conditions meant two entities could be created each
claiming the same alias. There are planned longer term fixes for this
(outside of the race condition being fixed in 0.10.4) that involve
changing the data model, but this is an immediate workaround that has
the same net effect: if two entities claim the same alias, assume they
were created due to this race condition and merge them.
In this situation, also automatically merge policies so we don't lose
e.g. RGPs.
* plumbing request context to expiration manager
* moar context
* address feedback
* only using active context for revoke prefix
* using active context for revoke commands
* cancel tidy on active context
* address feedback
* core: Cancel context before taking state lock
* Create active context outside of postUnseal
* Attempt to drain requests before canceling context
* fix test