This change makes it so that if a lease is revoked through user action,
we set the expiration time to now and update pending, just as we do with
tokens. This allows the normal retry logic to apply in these cases as
well, instead of just erroring out immediately. The idea is that once
you tell Vault to revoke something, it should keep doing its darndest
to actually make that happen.
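A minimal sketch of the idea, with illustrative names (`pending`, `revokeViaTimer`) standing in for the expiration manager's actual internals:

```go
// Illustrative stand-ins for the expiration manager's internals.
type leaseEntry struct {
	LeaseID    string
	ExpireTime time.Time
}

type ExpirationManager struct {
	pendingLock sync.Mutex
	pending     map[string]*time.Timer // lease ID -> expiration timer
}

// revokeViaTimer marks a user-revoked lease as already expired and
// re-arms its pending timer, so the normal expiration/retry machinery
// drives the revocation to completion rather than a single attempt
// that errors out immediately.
func (m *ExpirationManager) revokeViaTimer(le *leaseEntry) {
	le.ExpireTime = time.Now()
	m.pendingLock.Lock()
	defer m.pendingLock.Unlock()
	if timer, ok := m.pending[le.LeaseID]; ok {
		timer.Reset(0) // fire the expiration path immediately
	}
}
```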
* Add an idle timeout for the server
Because tidy operations can be long-running, this also changes all tidy
operations to behave the same operationally (kick off the process, get a
warning back, log errors to the server log) and makes them all run in a
goroutine.
This could mean a sort of hard stop if Vault gets sealed because the
function won't have the read lock. This should generally be okay
(running tidy again should pick back up where it left off), but future
work could use cleanup funcs to trigger the functions to stop.
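The shared pattern looks roughly like this (a sketch; `tidyEntries` and the backend type are stand-ins for each backend's actual tidy logic):

```go
// handleTidy kicks the work off in a goroutine and returns immediately
// with a warning; errors surface only in the server log.
func (b *backend) handleTidy(logger hclog.Logger) (*logical.Response, error) {
	go func() {
		if err := b.tidyEntries(); err != nil {
			logger.Error("error running tidy", "error", err)
		}
	}()
	resp := &logical.Response{}
	resp.AddWarning("Tidy operation successfully started. Any information " +
		"from the operation will be printed to Vault's server logs.")
	return resp, nil
}
```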
* Fix up tidy test
* Add a deadline to cluster connections and an idle timeout to the cluster server, plus read-header and read timeouts to the API server
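For the API server this is standard net/http configuration; for cluster connections it is a per-connection deadline. The values below are illustrative, not the ones Vault ships:

```go
// API server: bound how long a client may take to send headers and the
// request itself, and reap idle keep-alive connections.
srv := &http.Server{
	Handler:           handler,
	ReadHeaderTimeout: 10 * time.Second,
	ReadTimeout:       30 * time.Second,
	IdleTimeout:       5 * time.Minute,
}

// Cluster connections get explicit per-connection deadlines instead.
_ = conn.SetDeadline(time.Now().Add(30 * time.Second))
```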
* Store lease times suitable for export in pending
This essentially caches lease information for token lookups, preventing
going to disk over and over.
* Simplify logic
This takes inspiration from
https://github.com/golang/go/issues/17604#issuecomment-256384471, which
suggests that taking the address of a stack variable for use in atomics
works (at least, the race detector doesn't complain) but is doing it
wrong.
The only other change is in Leader(), which now checks whether HA is
enabled in order to fast-path out. This value never changes after
NewCore, so we don't need to grab the read lock to check it.
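A sketch of both points; the fields, error value, and the simplified Leader() signature are illustrative:

```go
var ErrHANotEnabled = errors.New("Vault is not configured for highly-available mode")

type Core struct {
	// haEnabled is set once in NewCore and never written again, so it
	// can be read without the state lock.
	haEnabled bool

	// Values used with sync/atomic live in long-lived allocations
	// (struct fields or new()), never as the address of a local
	// stack variable.
	sealed *uint32

	stateLock sync.RWMutex
}

func NewCore(haEnabled bool) *Core {
	return &Core{
		haEnabled: haEnabled,
		sealed:    new(uint32), // heap-allocated for atomic use
	}
}

func (c *Core) Leader() (bool, error) {
	// Fast path: no read lock needed for a write-once value.
	if !c.haEnabled {
		return false, ErrHANotEnabled
	}
	c.stateLock.RLock()
	defer c.stateLock.RUnlock()
	// ... full leadership check under the lock ...
	return false, nil
}
```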
* Hand off lease expiration to expiration manager via timers
* Use sync.Map as the cache to track token deletion state
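Roughly the following shape (the key and semantics are illustrative):

```go
// tokensBeingDeleted lets concurrent code distinguish "mid-deletion"
// from "not found" without a global mutex around a plain map.
var tokensBeingDeleted sync.Map // salted token ID -> struct{}

func startDeletion(saltedID string) {
	tokensBeingDeleted.Store(saltedID, struct{}{})
}

func isBeingDeleted(saltedID string) bool {
	_, ok := tokensBeingDeleted.Load(saltedID)
	return ok
}

func finishDeletion(saltedID string) {
	tokensBeingDeleted.Delete(saltedID)
}
```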
* Add CreateOrFetchRevocationLeaseByToken to hand off token revocation to exp manager
* Update revoke and revoke-self handlers
* Fix tests
* revokeSalted: Move token entry deletion into the deferred func
* Fix test race
* Add blocking lease revocation test
* Remove test log
* Add HandlerFunc on NoopBackend, adjust locks, and add test
* Add sleep to allow for revocations to settle
* Various updates
* Rename some functions and variables to be more clear
* Change step-down and seal to use expmgr for revoke functionality like
during request handling
* Attempt to WAL the token as being invalid as soon as possible so that
further usage will fail even if revocation does not fully complete
* Address feedback
* Return invalid lease on negative TTL
* Revert "Return invalid lease on negative TTL"
This reverts commit a39597ecdc23cf7fc69fe003eef9f10d533551d8.
* Extend sleep on tests
* govet cleanup in token store
* adding general ttl handling to login requests
* consolidating TTL calculation to system view (see the sketch after this list)
* deprecate LeaseExtend
* deprecate LeaseExtend
* set the increment to the correct value
* move calculateTTL out of SystemView
* remove unused value
* add back clearing of lease id
* implement core ttl in some backends
* removing increment and issue time from lease options
* adding ttl tests, fixing some compile issues
* adding ttl tests
* fixing some explicit max TTL logic
* fixing up some tests
* removing unneeded test
* off-by-one errors...
* adding back some logic for backwards compatibility
* adding period to return on renewal
* tweaking max ttl capping slightly
* use the appropriate precision for ttl calculation
* deprecate proto fields instead of delete
* addressing feedback
* moving TTL handling for backends to core
* mongo is a secret backend, not auth
* adding estimated ttl for backends that also manage the expiration time
* set the estimate values before calling the renew request
* moving calculate TTL to framework, revert removal of increment and issue time from logical
* minor edits
* addressing feedback
* address more feedback
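Taken together, the TTL commits above consolidate capping into one place. A simplified sketch of that calculation, with illustrative names and precedence:

```go
// calculateTTL resolves a requested TTL against the mount's and
// system's limits plus an optional explicit max, returning a warning
// when the value gets capped.
func calculateTTL(requested, mountMax, sysMax, explicitMax time.Duration) (time.Duration, string) {
	maxTTL := sysMax
	if mountMax > 0 {
		maxTTL = mountMax // a tuned mount max overrides the system max
	}
	if explicitMax > 0 && explicitMax < maxTTL {
		maxTTL = explicitMax
	}

	ttl := requested
	var warning string
	if ttl == 0 || ttl > maxTTL {
		warning = fmt.Sprintf(
			"TTL of %v exceeded the effective max_ttl of %v; TTL value is capped accordingly",
			ttl, maxTTL)
		ttl = maxTTL
	}
	return ttl, warning
}
```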
* logbridge with hclog and identical output
* Initial search & replace
This compiles, but there is a fair amount of TODO
and commented out code, especially around the
plugin logclient/logserver code.
* strip logbridge
* fix majority of tests
* update logxi aliases
* WIP fixing tests
* more test fixes
* Update test to hclog
* Fix format
* Rename hclog -> log
* WIP making hclog and logxi love each other
* update logger_test.go
* clean up merged comments
* Replace RawLogger interface with a Logger
* Add some logger names (see the hclog sketch after this list)
* Replace Trace with Debug
* update builtin logical logging patterns
* Fix build errors
* More log updates
* update log approach in command and builtin
* More log updates
* update helper, http, and logical directories
* Update loggers
* Log updates
* Update logging
* Update logging
* Update logging
* Update logging
* update logging in physical
* prefixing and lowercase
* Update logging
* Move physical logging name to server command
* Fix some tests
* address Jim's feedback so far
* incorporate Brian's feedback so far
* strip comments
* move vault.go to logging package
* update Debug to Trace
* Update go-plugin deps
* Update logging based on review comments
* Updates from review
* Unvendor logxi
* Remove null_logger.go
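For reference, the post-migration pattern the commits above converge on, as a runnable sketch using go-hclog (names and values are arbitrary):

```go
package main

import (
	hclog "github.com/hashicorp/go-hclog"
)

func main() {
	// One root logger; subsystems derive named children, so entries
	// carry a hierarchical prefix such as "vault.expiration".
	logger := hclog.New(&hclog.LoggerOptions{
		Name:  "vault",
		Level: hclog.Trace, // Trace replaces the most verbose logxi level
	})

	expLogger := logger.Named("expiration")
	expLogger.Debug("restored leases", "count", 42) // structured key/value pairs
	expLogger.Trace("entering restore loop")
}
```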
* Start work on passing context to backends
* More work on passing context
* Unindent logical system
* Unindent token store
* Unindent passthrough
* Unindent cubbyhole
* Fix tests
* use requestContext in rollback and expiration managers
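The shape the context-passing commits above work toward, as a sketch with simplified local types in place of Vault's logical package:

```go
// A context-aware handler can abort promptly when the request context
// is cancelled (for example on seal or step-down) instead of polling
// core state directly.
func (b *backend) handleRead(ctx context.Context, req *Request) (*Response, error) {
	select {
	case <-ctx.Done():
		return nil, ctx.Err() // request cancelled; stop immediately
	default:
	}
	entry, err := b.storage.Get(ctx, req.Path) // storage calls take ctx too
	if err != nil {
		return nil, err
	}
	return &Response{Data: entry}, nil
}
```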
* Add logic for using Auth.Period when handling auth login/renew requests
* Set auth.TTL if not set in handleLoginRequest
* Always set auth.TTL = te.TTL on handleLoginRequest, check TTL and period against sys values on RenewToken
* Get sysView from le.Path, revert tests
* Add back auth.Policies
* Fix TokenStore tests, add resp warning when capping values
* Use switch for ttl/period check on RenewToken
* Move comments around
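The ttl/period switch referenced above, sketched with illustrative names:

```go
// renewTTL resolves the TTL on token renewal: a periodic token renews
// for its period regardless of the requested increment; otherwise the
// increment (or the default) applies. Everything is capped by the
// system view's max TTL.
func renewTTL(period, increment, defaultTTL, sysMaxTTL time.Duration) time.Duration {
	var ttl time.Duration
	switch {
	case period > 0:
		ttl = period
	case increment > 0:
		ttl = increment
	default:
		ttl = defaultTTL
	}
	if ttl > sysMaxTTL {
		ttl = sysMaxTTL // callers attach a response warning when capping
	}
	return ttl
}
```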
* Start work on context aware backends
* Start work on moving the database plugins to gRPC in order to pass context
* Add context to builtin database plugins
* use byte slice instead of string
* Context all the things
* Move proto messages to the dbplugin package
* Add a gRPC mechanism for running backend plugins (see the sketch after this list)
* Serve the gRPC plugin
* Add backwards compatibility to the database plugins
* Remove backend plugin changes
* Remove backend plugin changes
* Cleanup the transport implementations
* If the gRPC connection is in an unexpected state, restart the plugin
* Fix tests
* Fix tests
* Remove context from the request object, replace it with context.TODO
* Add a test to verify netRPC plugins still work
* Remove unused mapstructure call
* Code review fixes
* Code review fixes
* Code review fixes
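The serving side of that mechanism, as a sketch built on hashicorp/go-plugin. `DatabasePlugin` is a stand-in that would implement both `plugin.Plugin` (netRPC) and `plugin.GRPCPlugin` (gRPC), which is what preserves backwards compatibility; the handshake values are illustrative:

```go
package main

import (
	plugin "github.com/hashicorp/go-plugin"
)

var handshake = plugin.HandshakeConfig{
	ProtocolVersion:  1,
	MagicCookieKey:   "EXAMPLE_MAGIC_COOKIE",
	MagicCookieValue: "example",
}

func main() {
	plugin.Serve(&plugin.ServeConfig{
		HandshakeConfig: handshake,
		Plugins: map[string]plugin.Plugin{
			"database": &DatabasePlugin{}, // implements Plugin and GRPCPlugin
		},
		// With GRPCServer set, go-plugin negotiates gRPC when both
		// sides support it and falls back to netRPC for older cores.
		GRPCServer: plugin.DefaultGRPCServer,
	})
}
```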
* Move location of quit channel closing in exp manager
If the channel is closed only after the timers are stopped, any timer
that fires before all timers have been stopped will still run the
revocation function. With plugin
auto-crash-recovery this could end up instantiating a plugin that could
then try to unwrap a token from a nil token store.
This also plumbs in core so that we can grab a read lock during the
operation and check standby/sealed status before running it (after
grabbing the lock).
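Sketched, with illustrative fields:

```go
// Stop signals quit *before* stopping timers; any timer that fires in
// the gap sees the closed channel and bails out instead of starting a
// revocation against a core that is shutting down.
func (m *ExpirationManager) Stop() {
	close(m.quitCh)
	m.pendingLock.Lock()
	for _, timer := range m.pending {
		timer.Stop()
	}
	m.pendingLock.Unlock()
}

func (m *ExpirationManager) expireID(leaseID string) {
	select {
	case <-m.quitCh:
		return // shutting down
	default:
	}
	// ... grab core's read lock, check sealed/standby, then revoke ...
}
```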
* Use context instead of checking core values directly
* Use official Go context in a few key places
1) Ensure that if we fail to generate a lease for a secret, we attempt to revoke it
2) Ensure that any lease that is registered never has a blank token
In theory, number 2 will let us a) find places where this *is* the case, and b) see errors when revoking tokens due to a blank client token, which would suggest that client token values are being stripped somewhere along the way, which is itself instructive.
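A sketch of the two guarantees; `createLease`, `revokeSecret`, and the surrounding types are illustrative:

```go
func (m *ExpirationManager) RegisterSecret(clientToken string, secret *Secret) (string, error) {
	// Guarantee 2: never register a lease with a blank client token.
	if clientToken == "" {
		return "", fmt.Errorf("cannot register a lease with an empty client token")
	}

	leaseID, err := m.createLease(secret)
	if err != nil {
		// Guarantee 1: if we can't produce a lease, make a best-effort
		// attempt to revoke the secret so it doesn't leak unmanaged.
		if revokeErr := m.revokeSecret(secret); revokeErr != nil {
			err = fmt.Errorf("%v (additionally, revocation failed: %v)", err, revokeErr)
		}
		return "", err
	}
	return leaseID, nil
}
```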
* Add a benchmark for expiration.Restore
* Add benchmarks for consul Restore functions
* Add a parallel version of expiration.Restore (see the sketch after this list)
* remove debug code
* Up the MaxIdleConnsPerHost
* Add tests for etcd
* Return errors and ensure go routines are exited
* Refactor inmem benchmark
* Add s3 bench and refactor a bit
* Few tweaks
* Fix race with waitgroup.Add()
* Fix waitgroup race condition
* Move wait above the info log
* Add helper/consts package to store consts that are needed in cyclic packages
* Remove unused benchmarks
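The fixed parallel-restore shape referenced above, as a sketch (the real code batches lease IDs out of storage):

```go
// restoreLeases fans restore work out to goroutines. wg.Add runs
// before each goroutine is launched, never inside it, which is what
// fixes the WaitGroup race; Wait happens before any completion logging.
func restoreLeases(leaseIDs []string, restore func(string) error) error {
	var (
		wg   sync.WaitGroup
		once sync.Once
		quit = make(chan struct{})
		errs = make(chan error, len(leaseIDs))
	)

	for _, id := range leaseIDs {
		wg.Add(1) // before the goroutine starts, not inside it
		go func(id string) {
			defer wg.Done()
			select {
			case <-quit:
				return // another worker failed; exit promptly
			default:
			}
			if err := restore(id); err != nil {
				errs <- err
				once.Do(func() { close(quit) })
			}
		}(id)
	}

	wg.Wait() // finish all workers before reporting
	select {
	case err := <-errs:
		return err
	default:
		return nil
	}
}
```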
This should help with transient issues. Full control over the min/max
delays and the number of retries (including the ability to turn them
off) is provided in the API and via env vars.
Fix tests.
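The backoff being described has a standard shape; a generic sketch (the real knobs live in the API client and its environment variables):

```go
// retryWait grows the delay exponentially from min, clamped at max; a
// retry count of zero disables retrying entirely.
func retryWait(attempt int, min, max time.Duration) time.Duration {
	wait := min << uint(attempt)
	if wait <= 0 || wait > max { // <= 0 guards shift overflow
		wait = max
	}
	return wait
}
```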
The expiration manager would never be poked to remove token entries upon
token revocation, if that revocation was initiated in the token store
itself. This may have been done to avoid deadlock, since during revocation of
tokens the expiration manager is called, which then calls back into the
token store, and so on.
This adds a way to skip that last call back into the token store if we
know that we're on the revocation path because we're in the middle of
revoking a token. That way the lease is cleaned up. This both prevents
log entries appearing for already-revoked tokens, and it also releases
timer/memory resources since we're not keeping the leases around.
In some situations, it can be impossible to revoke leases (for instance,
if someone has gone and manually removed users created by Vault). This
not only causes Vault to cycle trying to revoke them, but also
prevents mounts from being unmounted, leaving them in a tainted state
where the only operations allowed are to revoke (or rollback), which
will never successfully complete.
This adds a new endpoint that works similarly to `revoke-prefix` but
ignores errors coming from a backend upon revocation (it does not ignore
errors coming from within the expiration manager, such as errors
accessing the data store). This can be used to force Vault to abandon
leases.
Like `revoke-prefix`, this is a very sensitive operation and requires
`sudo`. It is implemented as a separate endpoint, rather than an
argument to `revoke-prefix`, to ensure that control can be delegated
appropriately, as even most administrators should not normally have
this privilege.
Fixes #1135
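Usage mirrors `revoke-prefix`; with the Go API client it can be driven through a generic write (the prefix below is only an example):

```go
// Force-abandon all leases under a prefix, ignoring errors returned by
// the backend's revocation attempts (requires sudo on the path).
// Errors that do surface come from the expiration manager itself, such
// as storage failures.
if _, err := client.Logical().Write("sys/revoke-force/aws/creds/deploy", nil); err != nil {
	return err
}
```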
Due to a typo, revoking created index entries rather than removing
them. This adds a failing, then fixed, test case (and helper function)
to ensure that index entries are properly removed on revoke.
Fixes #749
In order to implement this efficiently, I have introduced the concept of
"singleton" backends -- currently, 'sys' and 'cubbyhole'. There isn't
much reason to allow sys to be mounted at multiple places, and there
isn't much reason you'd need multiple per-token storage areas. By
restricting it to just one, I can store that particular mount instead of
iterating through them in order to call the appropriate revoke function.
Additionally, because revocation on the backend needs to be triggered by
the token store, the token store's salt is kept in the router and
client tokens going to the cubbyhole backend are double-salted by the
router. This allows the token store to drive when revocation happens
using its salted tokens.
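The double-salting works roughly like this (a sketch; `SaltID` mirrors Vault's salt helper, and the router field is illustrative):

```go
// The token store salts IDs before using them. The router holds the
// same salt, so a cubbyhole-bound client token is salted a second time:
//
//   token store:  saltedID  = salt.SaltID(clientToken)
//   router:       storageID = salt.SaltID(saltedID)
//
// When the token store revokes one of its salted tokens, the router can
// derive the same cubbyhole storage ID and destroy that token's data.
func (r *Router) cubbyholeID(saltedToken string) string {
	return r.tokenStoreSalt.SaltID(saltedToken)
}
```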