* Populate a token_ttl and token_issue_time field on the Auth struct of audit log entries, and in the Auth portion of a response for login methods
* Revert go fmt, better zero checking
* Update unit tests
* changelog++
* token/renewal: return full set of token and identity policies in the policies field
* extend tests to cover additional token and identity policies on a token
* verify identity_policies returned on login and renewals
* core: revoke the proper token on partial failures from token-related requests
* move test to vault package, move test trigger to expiration manager
* update logging messages for clarity
* docstring fix
* Don't allow registering a non-root zero TTL token lease
This is defense-in-depth in that such a token was not allowed to be
used; however it's also a bug fix in that this would then cause no lease
to be generated but the token entry to be written, meaning the token
entry would stick around until it was attempted to be used or tidied (in
both cases the internal lookup would see that this was invalid and do a
revoke on the spot).
* Fix tests
* tidy
* Initial work
* rework
* s/dr/recovery
* Add sys/raw support to recovery mode (#7577)
* Factor the raw paths out so they can be run with a SystemBackend.
# Conflicts:
# vault/logical_system.go
* Add handleLogicalRecovery which is like handleLogical but is only
sufficient for use with the sys-raw endpoint in recovery mode. No
authentication is done yet.
* Integrate with recovery-mode. We now handle unauthenticated sys/raw
requests, albeit on path v1/raw instead v1/sys/raw.
* Use sys/raw instead raw during recovery.
* Don't bother persisting the recovery token. Authenticate sys/raw
requests with it.
* RecoveryMode: Support generate-root for autounseals (#7591)
* Recovery: Abstract config creation and log settings
* Recovery mode integration test. (#7600)
* Recovery: Touch up (#7607)
* Recovery: Touch up
* revert the raw backend creation changes
* Added recovery operation token prefix
* Move RawBackend to its own file
* Update API path and hit it using CLI flag on generate-root
* Fix a panic triggered when handling a request that yields a nil response. (#7618)
* Improve integ test to actually make changes while in recovery mode and
verify they're still there after coming back in regular mode.
* Refuse to allow a second recovery token to be generated.
* Resize raft cluster to size 1 and start as leader (#7626)
* RecoveryMode: Setup raft cluster post unseal (#7635)
* Setup raft cluster post unseal in recovery mode
* Remove marking as unsealed as its not needed
* Address review comments
* Accept only one seal config in recovery mode as there is no scope for migration
* Fix various read only storage errors
A mistake we've seen multiple times in our own plugins and that we've
seen in the GCP plugin now is that control flow (how the code is
structured, helper functions, etc.) can obfuscate whether an error came
from storage or some other Vault-core location (in which case likely it
needs to be a 5XX message) or because of user input (thus 4XX). Error
handling for functions therefore often ends up always treating errors as
either user related or internal.
When the error is logical.ErrReadOnly this means that treating errors as
user errors skips the check that triggers forwarding, instead returning
a read only view error to the user.
While it's obviously more correct to fix that code, it's not always
immediately apparent to reviewers or fixers what the issue is and fixing
it when it's found both requires someone to hit the problem and report
it (thus exposing bugs to users) and selective targeted refactoring that
only helps that one specific case.
If instead we check whether the logical.Response is an error and, if so,
whether it contains the error value, we work around this in all of these
cases automatically. It feels hacky since it's a coding mistake, but
it's one we've made too multiple times, and avoiding bugs altogether is
better for our users.
Earlier in tokenutil's dev it seemed like there was no reason to allow
auth plugins to toggle renewability off. However, it turns out Centrify
makes use of this for sensible reasons. As a result, move the forcing-on
of renewability into tokenutil, but then allow overriding after
PopulateTokenAuth is called.
* Fix a deadlock if a panic happens during request handling
During request handling, if a panic is created, deferred functions are
run but otherwise execution stops. #5889 changed some locks to
non-defers but had the side effect of causing the read lock to not be
released if the request panicked. This fixes that and addresses a few
other potential places where things could go wrong:
1) In sealInitCommon we always now defer a function that unlocks the
read lock if it hasn't been unlocked already
2) In StepDown we defer the RUnlock but we also had two error cases that
were calling it manually. These are unlikely to be hit but if they were
I believe would cause a panic.
* Add panic recovery test
Move audit.LogInput to sdk/logical. Allow the Data values in audited
logical.Request and Response to implement OptMarshaler, in which case
we delegate hashing/serializing responsibility to them. Add new
ClientCertificateSerialNumber audit request field.
SystemView can now be cast to ExtendedSystemView to expose the Auditor
interface, which allows submitting requests and responses to the audit
broker.
* Port over some SP v2 bits
Specifically:
* Add too-large handling to Physical (Consul only for now)
* Contextify some identity funcs
* Update SP protos
* Add size limiting to inmem storage
Increment a counter whenever a request is received.
The in-memory counter is persisted to counters/requests/YYYY/MM.
When the month wraps around, we reset the in-memory counter to
zero.
Add an endpoint for querying the request counters across all time.
* Fixes a regression in forwarding from #6115
Although removing the authentication header is good defense in depth,
for forwarding mechanisms that use the raw request, we never add it
back. This caused perf standby tests to throw errors. Instead, once
we're past the point at which we would do any raw forwarding, but before
routing the request, remove the header.
To speed this up, a flag is set in the logical.Request to indicate where
the token is sourced from. That way we don't iterate through maps
unnecessarily.
The result will still pass gofmtcheck and won't trigger additional
changes if someone isn't using goimports, but it will avoid the
piecemeal imports changes we've been seeing.
* Fix for using ExplicitMaxTTL in auth method plugins.
* Reverted pb.go files for readability of PR.
* Fixed indenting of comment.
* Reverted unintended change by go test.
* Initial work on templating
* Add check for unbalanced closing in front
* Add missing templated assignment
* Add first cut of end-to-end test on templating.
* Make template errors be 403s and finish up testing
* Review feedback
* plumbing request context to expiration manager
* moar context
* address feedback
* only using active context for revoke prefix
* using active context for revoke commands
* cancel tidy on active context
* address feedback
* core: Cancel context before taking state lock
* Create active context outside of postUnseal
* Attempt to drain requests before canceling context
* fix test
* Add request timeouts in normal request path and to expirations
* Add ability to adjust default max request duration
* Some test fixes
* Ensure tests have defaults set for max request duration
* Add context cancel checking to inmem/file
* Fix tests
* Fix tests
* Set default max request duration to basically infinity for this release for BC
* Address feedback
* Tackle #4929 a different way
This turns c.sealed into an atomic, which allows us to call sealInternal
without a lock. By doing so we can better control lock grabbing when a
condition causing the standby loop to get out of active happens. This
encapsulates that logic into two distinct pieces (although they could
be combined into one), and makes lock guarding more understandable.
* Re-add context canceling to the non-HA version of sealInternal
* Return explicitly after stopCh triggered
1) In backends, ensure they are now using TokenPolicies
2) Don't reassign auth.Policies until after expmgr registration as we
don't need them at that point
Fixes#4829
* Allow max request size to be user-specified
This turned out to be way more impactful than I'd expected because I
felt like the right granularity was per-listener, since an org may want
to treat external clients differently from internal clients. It's pretty
straightforward though.
This also introduces actually using request contexts for values, which
so far we have not done (using our own logical.Request struct instead),
but this allows non-logical methods to still get this benefit.
* Switch to ioutil.ReadAll()
If we have a panic defer functions are run but unlocks aren't. Since we
can't really trust plugins and storage, this backs out the changes for
those parts of the request path.
* This changes the way policies are reported in audit logs.
Previously, only policies tied to tokens would be reported. This could
make it difficult to perform after-the-fact analysis based on both the
initial response entry and further requests. Now, the full set of
applicable policies from both the token and any derived policies from
Identity are reported.
To keep things consistent, token authentications now also return the
full set of policies in api.Secret.Auth responses, so this both makes it
easier for users to understand their actual full set, and it matches
what the audit logs now report.
* Remove a lot of deferred functions in the request path.
There is an interesting benchmark at https://www.reddit.com/r/golang/comments/3h21nk/simple_micro_benchmark_to_measure_the_overhead_of/
It shows that defer actually adds quite a lot of overhead -- maybe 100ns
per call but we defer a *lot* of functions in the request path. So this
removes some of the ones in request handling, ha, barrier, router, and
physical cache.
One meta-note: nearly every metrics function is in a defer which means
every metrics call we add could add a non-trivial amount of time, e.g.
for every 10 extra metrics statements we add 1ms to a request. I don't
know how to solve this right now without doing what I did in some of
these cases and putting that call into a simple function call that then
goes before each return.
* Simplify barrier defer cleanup
* Hand off lease expiration to expiration manager via timers
* Use sync.Map as the cache to track token deletion state
* Add CreateOrFetchRevocationLeaseByToken to hand off token revocation to exp manager
* Update revoke and revoke-self handlers
* Fix tests
* revokeSalted: Move token entry deletion into the deferred func
* Fix test race
* Add blocking lease revocation test
* Remove test log
* Add HandlerFunc on NoopBackend, adjust locks, and add test
* Add sleep to allow for revocations to settle
* Various updates
* Rename some functions and variables to be more clear
* Change step-down and seal to use expmgr for revoke functionality like
during request handling
* Attempt to WAL the token as being invalid as soon as possible so that
further usage will fail even if revocation does not fully complete
* Address feedback
* Return invalid lease on negative TTL
* Revert "Return invalid lease on negative TTL"
This reverts commit a39597ecdc23cf7fc69fe003eef9f10d533551d8.
* Extend sleep on tests
This takes place in two parts, since working on this exposed an issue
with response wrapping when there is a raw body set. The changes are (in
diff order):
* A CurrentWrappingLookupFunc has been added to return the current
value. This is necessary for the lookahead call since we don't want the
lookahead call to be wrapped.
* Support for unwrapping < 0.6.2 tokens via the API/CLI has been
removed, because we now have backends returning 404s with data and can't
rely on the 404 trick. These can still be read manually via
cubbyhole/response.
* KV preflight version request now ensures that its calls is not
wrapped, and restores any given function after.
* When responding with a raw body, instead of always base64-decoding a
string value and erroring on failure, on failure we assume that it
simply wasn't a base64-encoded value and use it as is.
* A test that fails on master and works now that ensures that raw body
responses that are wrapped and then unwrapped return the expected
values.
* A flag for response data that indicates to the wrapping handling that
the data contained therein is already JSON decoded (more later).
* RespondWithStatusCode now defaults to a string so that the value is
HMAC'd during audit. The function always JSON encodes the body, so
before now it was always returning []byte which would skip HMACing. We
don't know what's in the data, so this is a "better safe than sorry"
issue. If different behavior is needed, backends can always manually
populate the data instead of relying on the helper function.
* We now check unwrapped data after unwrapping to see if there were raw
flags. If so, we try to detect whether the value can be unbase64'd. The
reason is that if it can it was probably originally a []byte and
shouldn't be audit HMAC'd; if not, it was probably originally a string
and should be. In either case, we then set the value as the raw body and
hit the flag indicating that it's already been JSON decoded so not to
try again before auditing. Doing it this way ensures the right typing.
* There is now a check to see if the data coming from unwrapping is
already JSON decoded and if so the decoding is skipped before setting
the audit response.
* govet cleanup in token store
* adding general ttl handling to login requests
* consolidating TTL calculation to system view
* deprecate LeaseExtend
* deprecate LeaseExtend
* set the increment to the correct value
* move calculateTTL out of SystemView
* remove unused value
* add back clearing of lease id
* implement core ttl in some backends
* removing increment and issue time from lease options
* adding ttl tests, fixing some compile issue
* adding ttl tests
* fixing some explicit max TTL logic
* fixing up some tests
* removing unneeded test
* off by one errors...
* adding back some logic for bc
* adding period to return on renewal
* tweaking max ttl capping slightly
* use the appropriate precision for ttl calculation
* deprecate proto fields instead of delete
* addressing feedback
* moving TTL handling for backends to core
* mongo is a secret backend not auth
* adding estimated ttl for backends that also manage the expiration time
* set the estimate values before calling the renew request
* moving calculate TTL to framework, revert removal of increment and issue time from logical
* minor edits
* addressing feedback
* address more feedback
* logbridge with hclog and identical output
* Initial search & replace
This compiles, but there is a fair amount of TODO
and commented out code, especially around the
plugin logclient/logserver code.
* strip logbridge
* fix majority of tests
* update logxi aliases
* WIP fixing tests
* more test fixes
* Update test to hclog
* Fix format
* Rename hclog -> log
* WIP making hclog and logxi love each other
* update logger_test.go
* clean up merged comments
* Replace RawLogger interface with a Logger
* Add some logger names
* Replace Trace with Debug
* update builtin logical logging patterns
* Fix build errors
* More log updates
* update log approach in command and builtin
* More log updates
* update helper, http, and logical directories
* Update loggers
* Log updates
* Update logging
* Update logging
* Update logging
* Update logging
* update logging in physical
* prefixing and lowercase
* Update logging
* Move phyisical logging name to server command
* Fix som tests
* address jims feedback so far
* incorporate brians feedback so far
* strip comments
* move vault.go to logging package
* update Debug to Trace
* Update go-plugin deps
* Update logging based on review comments
* Updates from review
* Unvendor logxi
* Remove null_logger.go
* Add some requirements for versioned k/v
* Add a warning message when an upgrade is triggered
* Add path help values
* Make the kv header a const
* Add the uid to mount entry instead of options map
* Pass the backend aware uuid to the mounts and plugins
* Fix comment
* Add options to secret/auth enable and tune CLI commands (#4170)
* Switch mount/tune options to use TypeKVPairs (#4171)
* switching options to TypeKVPairs, adding bool parse for versioned flag
* flipping bool check
* Fix leases coming back from non-leased pluin kv store
* add a test for updating mount options
* Fix tests
* Start work on passing context to backends
* More work on passing context
* Unindent logical system
* Unindent token store
* Unindent passthrough
* Unindent cubbyhole
* Fix tests
* use requestContext in rollback and expiration managers
* Add logic for using Auth.Period when handling auth login/renew requests
* Set auth.TTL if not set in handleLoginRequest
* Always set auth.TTL = te.TTL on handleLoginRequest, check TTL and period against sys values on RenewToken
* Get sysView from le.Path, revert tests
* Add back auth.Policies
* Fix TokenStore tests, add resp warning when capping values
* Use switch for ttl/period check on RenewToken
* Move comments around
* external identity groups
* add local LDAP groups as well to group aliases
* add group aliases for okta credential backend
* Fix panic in tests
* fix build failure
* remove duplicated struct tag
* add test steps to test out removal of group member during renewals
* Add comment for having a prefix check in router
* fix tests
* s/parent_id/canonical_id
* s/parent/canonical in comments and errors
This removes all references I could find to:
- credential provider
- authentication backend
- authentication provider
- auth provider
- auth backend
in favor of the unified:
- auth method
* porting identity to OSS
* changes that glue things together
* add testing bits
* wrapped entity id
* fix mount error
* some more changes to core
* fix storagepacker tests
* fix some more tests
* fix mount tests
* fix http mount tests
* audit changes for identity
* remove upgrade structs on the oss side
* added go-memdb to vendor
* Store original request path in WrapInfo as CreationPath
* Add wrapping_token_creation_path to CLI output
* Add CreationPath to AuditResponseWrapInfo
* Fix tests
* Add and fix tests, update API docs with new sample responses
* exclude /sys/leases/renew from registering with expiration manager
* adding sys/leases/renew to return full secret object, adding tests to catch renew errors
A change in copystructure has caused some panics due to the custom copy
function. I'm more nervous about production panics than I am about
keeping some bad code wiping out some existing warnings, so remove the
custom copy function and just allow direct setting of Warnings.
* Add /sys/config/audited-headers endpoint for configuring the headers that will be audited
* Remove some debug lines
* Add a persistant layer and refactor a bit
* update the api endpoints to be more restful
* Add comments and clean up a few functions
* Remove unneeded hash structure functionaility
* Fix existing tests
* Add tests
* Add test for Applying the header config
* Add Benchmark for the ApplyConfig method
* ResetTimer on the benchmark:
* Update the headers comment
* Add test for audit broker
* Use hyphens instead of camel case
* Add size paramater to the allocation of the result map
* Fix the tests for the audit broker
* PR feedback
* update the path and permissions on config/* paths
* Add docs file
* Fix TestSystemBackend_RootPaths test