* Upgrade to new Cloud KMS client libraries
We recently released the new Cloud KMS client libraries which use GRPC
instead of HTTP. They are faster and look nicer (</opinion>), but more
importantly they drastically simplify a lot of the logic around client
creation, encryption, and decryption. In particular, we can drop all the
logic around looking up credentials and base64-encoding/decoding.
Tested on a brand new cluster (no pre-existing unseal keys) and against
a cluster with stored keys from a previous version of Vault to ensure no
regressions.
* Use the default scopes the client requests
The client already does the right thing here, so we don't need to
surface it, especially since we aren't allowing users to configure it.
* fix cubbyhole deletion
* Fix error handling
* Move the cubbyhole tidy logic to token store and track the revocation count
* Move fetching of cubby keys before the tidy loop
* Fix context getting cancelled
* Test the cubbyhole cleanup logic
* Add progress counter for cubbyhole cleanup
* Minor polish
* Use map instead of slice for faster computation
* Add test for cubbyhole deletion
* Add a log statement for deletion
* Add SHA1 hashed tokens into the mix
The result will still pass gofmtcheck and won't trigger additional
changes if someone isn't using goimports, but it will avoid the
piecemeal imports changes we've been seeing.
This changes the behavior of the GCPCKMS auto-unsealer setup to attempt
encryption instead of a key lookup. Key lookups are a different API
method not covered by roles/cloudkms.cryptoKeyEncrypterDecrypter. This
means users must grant an extended scope to their service account
(granting the ability to read key data) which only seems to be used to
validate the existence of the key.
Worse, the only roles that include this permission are overly verbose
(e.g. roles/viewer which gives readonly access to everything in the
project and roles/cloudkms.admin which gives full control over all key
operations). This leaves the user stuck between choosing to create a
custom IAM role (which isn't fun) or grant overly broad permissions.
By changing to an encrypt call, we get better verification of the unseal
permissions and users can reduce scope to a single role.
* Refactor mount tune to support upsert options values and unset options.
* Do not allow unsetting options map
* add secret tune version regression test
* Only accept valid options version
* s/meVersion/optVersion/
We're having issues with leases in the GCS backend storage being
corrupted and failing MAC checking. When that happens, we need to know
the lease ID so we can address the corruption by hand and take
appropriate action.
This will hopefully prevent any instances of incomplete data being sent
to GSS
* Use common abstraction for entity deletion
* Update group memberships before deleting entity
* Added test
* Fix return statements
* Update comment
* Cleanup member entity IDs while loading groups
* Added test to ensure that upgrade happens properly
* Ensure that the group gets persisted if upgrade code modifies it
* Continue on plugin registration error in dev mode
* Continue only on unknown type error
* Continue only on unknown type error
* Print plugin registration error on exit
Co-Authored-By: calvn <cleung2010@gmail.com>
* Support registering plugin with name only
* Make RegisterPlugin backwards compatible
* Add CLI backwards compat command to plugin info and deregister
* Add server-side deprecation warnings if old read/dereg API endpoints are called
* Address feedback
The slice returned by `collectGroupsReverseDFS` is an updated copy of
the slice given to it when called. Appending `pGroups` to `groups`
therefore led to expontential memory usage as the slice was repeatedly
appended to itself.
Fixes#5605
* case insensitive identity names
* TestIdentityStore_GroupHierarchyCases
* address review feedback
* Use errwrap.Contains instead of errwrap.ContainsType
* Warn about duplicate names all the time to help fix them
* Address review feedback
* Support Authorization Bearer as token header
* add requestAuth test
* remove spew debug output in test
* Add Authorization in CORS Allowed headers
* use const where applicable
* use less allocations in bearer token checking
* address PR comments on tests and apply last commit
* reorder error checking in a TestHandler_requestAuth
* logical/aws: Harden WAL entry creation
If AWS IAM user creation failed in any way, the WAL corresponding to the
IAM user would get left around and Vault would try to roll it back.
However, because the user never existed, the rollback failed. Thus, the
WAL would essentially get "stuck" and Vault would continually attempt to
roll it back, failing every time. A similar situation could arise if the
IAM user that Vault created got deleted out of band, or if Vault deleted
it but was unable to write the lease revocation back to storage (e.g., a
storage failure).
This attempts to harden it in two ways. One is by deleting the WAL log
entry if the IAM user creation fails. However, the WAL deletion could
still fail, and this wouldn't help where the user is deleted out of
band, so second, consider the user rolled back if the user just doesn't
exist, under certain circumstances.
Fixes#5190
* Fix segfault in expiration unit tests
TestExpiration_Tidy was passing in a leaseEntry that had a nil Secret,
which then caused a segfault as the changes to revokeEntry didn't check
whether Secret was nil; this is probably unlikely to occur in real life,
but good to be extra cautious.
* Fix potential segfault
Missed the else...
* Respond to PR feedback
* Fix for using ExplicitMaxTTL in auth method plugins.
* Reverted pb.go files for readability of PR.
* Fixed indenting of comment.
* Reverted unintended change by go test.
This works around a very, very common error where people write policies
to affect listing but forget the slash at the end. If there is no exact
rule with a slash at the end when doing a list, we look to see if there
is a rule without it, and if so, use those capabilities.
Fixes #mass-user-confusion
* Initial work on templating
* Add check for unbalanced closing in front
* Add missing templated assignment
* Add first cut of end-to-end test on templating.
* Make template errors be 403s and finish up testing
* Review feedback
* Merge Identity Entities if two claim the same alias
Past bugs/race conditions meant two entities could be created each
claiming the same alias. There are planned longer term fixes for this
(outside of the race condition being fixed in 0.10.4) that involve
changing the data model, but this is an immediate workaround that has
the same net effect: if two entities claim the same alias, assume they
were created due to this race condition and merge them.
In this situation, also automatically merge policies so we don't lose
e.g. RGPs.
* plumbing request context to expiration manager
* moar context
* address feedback
* only using active context for revoke prefix
* using active context for revoke commands
* cancel tidy on active context
* address feedback
* core: Cancel context before taking state lock
* Create active context outside of postUnseal
* Attempt to drain requests before canceling context
* fix test
* Add request timeouts in normal request path and to expirations
* Add ability to adjust default max request duration
* Some test fixes
* Ensure tests have defaults set for max request duration
* Add context cancel checking to inmem/file
* Fix tests
* Fix tests
* Set default max request duration to basically infinity for this release for BC
* Address feedback
* Tackle #4929 a different way
This turns c.sealed into an atomic, which allows us to call sealInternal
without a lock. By doing so we can better control lock grabbing when a
condition causing the standby loop to get out of active happens. This
encapsulates that logic into two distinct pieces (although they could
be combined into one), and makes lock guarding more understandable.
* Re-add context canceling to the non-HA version of sealInternal
* Return explicitly after stopCh triggered
1) In backends, ensure they are now using TokenPolicies
2) Don't reassign auth.Policies until after expmgr registration as we
don't need them at that point
Fixes#4829
It's not obvious why this should be secret, and if it were considered
secret, when and what anything would ever be allowed to access it.
Likely the right way to tie secret values to particular
entities/aliases/groups would be to use the upcoming templated ACL
feature.
1) Disable MaxRetries in test cluster clients. We generally want to fail
as fast as possible in tests so adding unpredictable timing in doesn't
help things, especially if we're timing sensitive in the test.
2) EquivalentPolicies is supposed to return true if only one set
contains `default` and the other is empty, but if one set was nil
instead of simply a zero length slice it would always return false. This
means that renewing against, say, `userpass` when not actually
specifying any user policies would always fail.
* Add description flag to secrets and auth tune subcommands
* Allow empty description to be provided in secret and auth mount tune
* Use flagNameDescription
This change makes it so that if a lease is revoked through user action,
we set the expiration time to now and update pending, just as we do with
tokens. This allows the normal retry logic to apply in these cases as
well, instead of just erroring out immediately. The idea being that once
you tell Vault to revoke something it should keep doing its darndest to
actually make that happen.
* Allow max request size to be user-specified
This turned out to be way more impactful than I'd expected because I
felt like the right granularity was per-listener, since an org may want
to treat external clients differently from internal clients. It's pretty
straightforward though.
This also introduces actually using request contexts for values, which
so far we have not done (using our own logical.Request struct instead),
but this allows non-logical methods to still get this benefit.
* Switch to ioutil.ReadAll()
* Add an idle timeout for the server
Because tidy operations can be long-running, this also changes all tidy
operations to behave the same operationally (kick off the process, get a
warning back, log errors to server log) and makes them all run in a
goroutine.
This could mean a sort of hard stop if Vault gets sealed because the
function won't have the read lock. This should generally be okay
(running tidy again should pick back up where it left off), but future
work could use cleanup funcs to trigger the functions to stop.
* Fix up tidy test
* Add deadline to cluster connections and an idle timeout to the cluster server, plus add readheader/read timeout to api server
If we have a panic defer functions are run but unlocks aren't. Since we
can't really trust plugins and storage, this backs out the changes for
those parts of the request path.
* This changes the way policies are reported in audit logs.
Previously, only policies tied to tokens would be reported. This could
make it difficult to perform after-the-fact analysis based on both the
initial response entry and further requests. Now, the full set of
applicable policies from both the token and any derived policies from
Identity are reported.
To keep things consistent, token authentications now also return the
full set of policies in api.Secret.Auth responses, so this both makes it
easier for users to understand their actual full set, and it matches
what the audit logs now report.
* Remove a lot of deferred functions in the request path.
There is an interesting benchmark at https://www.reddit.com/r/golang/comments/3h21nk/simple_micro_benchmark_to_measure_the_overhead_of/
It shows that defer actually adds quite a lot of overhead -- maybe 100ns
per call but we defer a *lot* of functions in the request path. So this
removes some of the ones in request handling, ha, barrier, router, and
physical cache.
One meta-note: nearly every metrics function is in a defer which means
every metrics call we add could add a non-trivial amount of time, e.g.
for every 10 extra metrics statements we add 1ms to a request. I don't
know how to solve this right now without doing what I did in some of
these cases and putting that call into a simple function call that then
goes before each return.
* Simplify barrier defer cleanup
* Store lease times suitable for export in pending
This essentially caches lease information for token lookups, preventing
going to disk over and over.
* Simplify logic
Taking inspiration from
https://github.com/golang/go/issues/17604#issuecomment-256384471
suggests that taking the address of a stack variable for use in atomics
works (at least, the race detector doesn't complain) but is doing it
wrong.
The only other change is a change in Leader() detecting if HA is enabled
to fast-path out. This value never changes after NewCore, so we don't
need to grab the read lock to check it.
* Add entity information request to system view
* fixing a few comments
* sharing types between plugin and logical
* sharing types between plugin and logical
* fixing output directory for proto
* removing extra replacement
* adding mount type lookup
* empty entities return nil instead of error
* adding some comments
* Add key information to list endpoints in identity.
Also fixes some bugs from before where we were persisting data that we
should not have been (mount type/path).
* Add cached lookups of real time mount info
* Optimize revokeSalted by not calling view.List twice
* Minor comment update
* Do not go through the orphaning dance if we are revoking the entire tree
* Update comment
* Hand off lease expiration to expiration manager via timers
* Use sync.Map as the cache to track token deletion state
* Add CreateOrFetchRevocationLeaseByToken to hand off token revocation to exp manager
* Update revoke and revoke-self handlers
* Fix tests
* revokeSalted: Move token entry deletion into the deferred func
* Fix test race
* Add blocking lease revocation test
* Remove test log
* Add HandlerFunc on NoopBackend, adjust locks, and add test
* Add sleep to allow for revocations to settle
* Various updates
* Rename some functions and variables to be more clear
* Change step-down and seal to use expmgr for revoke functionality like
during request handling
* Attempt to WAL the token as being invalid as soon as possible so that
further usage will fail even if revocation does not fully complete
* Address feedback
* Return invalid lease on negative TTL
* Revert "Return invalid lease on negative TTL"
This reverts commit a39597ecdc23cf7fc69fe003eef9f10d533551d8.
* Extend sleep on tests
This takes place in two parts, since working on this exposed an issue
with response wrapping when there is a raw body set. The changes are (in
diff order):
* A CurrentWrappingLookupFunc has been added to return the current
value. This is necessary for the lookahead call since we don't want the
lookahead call to be wrapped.
* Support for unwrapping < 0.6.2 tokens via the API/CLI has been
removed, because we now have backends returning 404s with data and can't
rely on the 404 trick. These can still be read manually via
cubbyhole/response.
* KV preflight version request now ensures that its calls is not
wrapped, and restores any given function after.
* When responding with a raw body, instead of always base64-decoding a
string value and erroring on failure, on failure we assume that it
simply wasn't a base64-encoded value and use it as is.
* A test that fails on master and works now that ensures that raw body
responses that are wrapped and then unwrapped return the expected
values.
* A flag for response data that indicates to the wrapping handling that
the data contained therein is already JSON decoded (more later).
* RespondWithStatusCode now defaults to a string so that the value is
HMAC'd during audit. The function always JSON encodes the body, so
before now it was always returning []byte which would skip HMACing. We
don't know what's in the data, so this is a "better safe than sorry"
issue. If different behavior is needed, backends can always manually
populate the data instead of relying on the helper function.
* We now check unwrapped data after unwrapping to see if there were raw
flags. If so, we try to detect whether the value can be unbase64'd. The
reason is that if it can it was probably originally a []byte and
shouldn't be audit HMAC'd; if not, it was probably originally a string
and should be. In either case, we then set the value as the raw body and
hit the flag indicating that it's already been JSON decoded so not to
try again before auditing. Doing it this way ensures the right typing.
* There is now a check to see if the data coming from unwrapping is
already JSON decoded and if so the decoding is skipped before setting
the audit response.