This PR relates to a feature request logged through HashiCorp commercial
support.
Vault lacks pagination in its APIs. As a result, certain list operations
can return **very** large responses. The user's chosen audit sinks may
experience difficulty consuming audit records that swell to tens of
megabytes of JSON.
In our case, one of the systems consuming audit log data could not cope,
and failed.
The responses of list operations are typically not very interesting: they are
mostly lists of keys, and even when they include a "key_info" field, they do
not return confidential information. They become even less interesting once
HMAC-ed by the audit system.
Some example Vault "list" operations that are prone to becoming very
large in an active Vault installation are:

* `auth/token/accessors/`
* `identity/entity/id/`
* `identity/entity-alias/id/`
* `pki/certs/`
In response, I've coded a new option that can be applied to audit
backends, `elide_list_responses`. When enabled, response data is elided
from audit logs, but only when the operation type is "list".
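As a minimal sketch of enabling the option through the Go API client (the
device mount path, the choice of a `file` device, and the `file_path` value
here are illustrative, not prescribed):

```go
package main

import (
	"log"

	"github.com/hashicorp/vault/api"
)

func main() {
	client, err := api.NewClient(api.DefaultConfig())
	if err != nil {
		log.Fatal(err)
	}

	// Enable a file audit device with list-response elision turned on.
	err = client.Sys().EnableAuditWithOptions("file", &api.EnableAuditOptions{
		Type: "file",
		Options: map[string]string{
			"file_path":            "/var/log/vault_audit.log",
			"elide_list_responses": "true",
		},
	})
	if err != nil {
		log.Fatal(err)
	}
}
```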
For added safety, the elision only applies to the "keys" and "key_info"
fields within the response data; these are conventionally the only fields
present in a list response (see `logical.ListResponse` and
`logical.ListResponseWithInfo`). However, other fields are technically
possible if a plugin author writes unusual code, and these will be
preserved in the audit log even with this option enabled.
The elision replaces the values of the "keys" and "key_info" fields with
an integer count of the number of entries. This allows even the elided
audit logs to still be useful for answering questions like "Was any data
returned?" or "How many records were listed?".
* fix adding clientID to request in audit log
* fix boolean statement
* use standard encoding for client ID instead of urlEncoding
* change encoding in tests
* add in client counts to request handling
* remove redundant client ID generation in request handling
* directly add clientID to req after handling token usage
* Send a test message before committing a new audit device.
Also, lower timeout on connection attempts in socket device.
* added changelog
* go mod vendor (picked up some unrelated changes.)
* Skip audit device check in integration test.
Co-authored-by: swayne275 <swayne@hashicorp.com>
* Populate token_ttl and token_issue_time fields on the Auth struct of audit log entries, and in the Auth portion of a response for login methods (see the sketch after this list)
* Revert go fmt, better zero checking
* Update unit tests
* changelog++
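A hedged sketch of the zero checking mentioned above; the struct and function
names are illustrative, not the exact audit formatter code:

```go
package audit

import "time"

// Auth is abbreviated here to show only the new fields.
type Auth struct {
	TokenTTL       int64  `json:"token_ttl,omitempty"`
	TokenIssueTime string `json:"token_issue_time,omitempty"`
}

// populateAuth sketches the "better zero checking": fields are only set
// when the source values are non-zero, so existing entries are unchanged.
func populateAuth(ttl time.Duration, issueTime time.Time) Auth {
	var out Auth
	if ttl > 0 {
		out.TokenTTL = int64(ttl.Seconds())
	}
	if !issueTime.IsZero() {
		out.TokenIssueTime = issueTime.Format(time.RFC3339)
	}
	return out
}
```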
Add support for hashing time.Time within slices, which unbreaks auditing of requests returning the request counters.
Break Hash into struct-specific func like HashAuth, HashRequest. Move all the copying/hashing logic from FormatRequest/FormatResponse into the new Hash* funcs. HashStructure now modifies in place instead of copying.
Instead of returning an error when trying to hash map keys of type time.Time, ignore them, i.e. pass them through unhashed.
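A simplified sketch of that pass-through behavior (the real code walks
structures reflectively; `hmacString` stands in for the salted HMAC):

```go
package audit

import "time"

// hashValue sketches the rules above: time.Time values pass through
// unhashed, strings are HMAC-ed, and containers are modified in place.
func hashValue(v interface{}, hmacString func(string) string) interface{} {
	switch t := v.(type) {
	case time.Time:
		// Ignore rather than error: pass through unhashed.
		return t
	case string:
		return hmacString(t)
	case []interface{}:
		for i, e := range t {
			t[i] = hashValue(e, hmacString)
		}
		return t
	case map[string]interface{}:
		for k, e := range t {
			t[k] = hashValue(e, hmacString)
		}
		return t
	default:
		return v
	}
}
```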
Enable auditing on test clusters by default if the caller didn't specify any audit backends. If they do, they're responsible for setting it up.
Move audit.LogInput to sdk/logical. Allow the Data values in audited
logical.Request and Response to implement OptMarshaler, in which case
we delegate hashing/serializing responsibility to them. Add new
ClientCertificateSerialNumber audit request field.
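A sketch of the delegation; the interface shape follows the description
above, and `marshalAuditValue` is an illustrative helper, not the actual
formatter entry point:

```go
package audit

import "encoding/json"

// MarshalOptions carries the audit HMAC function to the value.
type MarshalOptions struct {
	ValueHasher func(string) string
}

// OptMarshaler lets a Data value own its hashing and serialization.
type OptMarshaler interface {
	MarshalJSONWithOptions(*MarshalOptions) ([]byte, error)
}

// marshalAuditValue delegates to OptMarshaler when implemented, otherwise
// falls back to ordinary JSON marshaling of the (already hashed) value.
func marshalAuditValue(v interface{}, hasher func(string) string) ([]byte, error) {
	if om, ok := v.(OptMarshaler); ok {
		return om.MarshalJSONWithOptions(&MarshalOptions{ValueHasher: hasher})
	}
	return json.Marshal(v)
}
```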
SystemView can now be cast to ExtendedSystemView to expose the Auditor
interface, which allows submitting requests and responses to the audit
broker.
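A sketch of the cast in use; the interface shapes below follow the
description rather than the exact SDK definitions:

```go
package example

import (
	"context"
	"errors"

	"github.com/hashicorp/vault/sdk/logical"
)

// Auditor accepts requests and responses for the audit broker.
type Auditor interface {
	AuditRequest(ctx context.Context, in *logical.LogInput) error
	AuditResponse(ctx context.Context, in *logical.LogInput) error
}

// ExtendedSystemView adds the Auditor accessor to a plain SystemView.
type ExtendedSystemView interface {
	logical.SystemView
	Auditor() Auditor
}

func auditRequest(sv logical.SystemView, in *logical.LogInput) error {
	esv, ok := sv.(ExtendedSystemView)
	if !ok {
		return errors.New("system view does not support auditing")
	}
	return esv.Auditor().AuditRequest(context.Background(), in)
}
```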
* This changes the way policies are reported in audit logs.
Previously, only policies tied to tokens would be reported. This could
make it difficult to perform after-the-fact analysis based on both the
initial response entry and further requests. Now, the full set of
applicable policies from both the token and any derived policies from
Identity are reported.
To keep things consistent, token authentications now also return the
full set of policies in api.Secret.Auth responses, so this both makes it
easier for users to understand their actual full set, and it matches
what the audit logs now report.
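A minimal sketch of assembling that full set (a hypothetical helper, not
the actual core code):

```go
package example

// fullPolicySet merges token policies with identity-derived policies,
// deduplicating, so audit entries and api.Secret.Auth report the same set.
func fullPolicySet(tokenPolicies, identityPolicies []string) []string {
	seen := make(map[string]struct{}, len(tokenPolicies)+len(identityPolicies))
	out := make([]string, 0, len(tokenPolicies)+len(identityPolicies))
	for _, list := range [][]string{tokenPolicies, identityPolicies} {
		for _, p := range list {
			if _, ok := seen[p]; ok {
				continue
			}
			seen[p] = struct{}{}
			out = append(out, p)
		}
	}
	return out
}
```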
* porting identity to OSS
* changes that glue things together
* add testing bits
* wrapped entity id
* fix mount error
* some more changes to core
* fix storagepacker tests
* fix some more tests
* fix mount tests
* fix http mount tests
* audit changes for identity
* remove upgrade structs on the oss side
* added go-memdb to vendor
* Store original request path in WrapInfo as CreationPath (sketched after this list)
* Add wrapping_token_creation_path to CLI output
* Add CreationPath to AuditResponseWrapInfo
* Fix tests
* Add and fix tests, update API docs with new sample responses
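A sketch of the field additions, abbreviated to the relevant parts; the
struct shapes and JSON tags here are illustrative:

```go
package example

import "time"

// WrapInfo gains CreationPath, recording the original request path.
type WrapInfo struct {
	Token        string
	TTL          time.Duration
	CreationTime time.Time
	CreationPath string // the path the wrapping call was made against
}

// AuditResponseWrapInfo mirrors the new field in audit log entries.
type AuditResponseWrapInfo struct {
	TTL          int    `json:"ttl"`
	Token        string `json:"token"`
	CreationTime string `json:"creation_time"`
	CreationPath string `json:"creation_path"`
}
```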
* audit: Added token_num_uses to audit response
* Fixed jsonx tests
* Revert logical auth to NumUses instead of TokenNumUses
* s/TokenNumUses/NumUses
* Audit: Add num uses to audit requests as well
* Added RemainingUses to distinguish NumUses in audit requests (see the sketch after this list)
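A sketch of the distinction, abbreviated to the relevant fields (names
follow the bullets above; treat the shapes as illustrative):

```go
package example

// In the response's auth block, NumUses is the use count granted at login.
type AuditAuth struct {
	NumUses int `json:"num_uses,omitempty"`
}

// In the request block, RemainingUses reflects what is left on the client
// token at request time, hence the separate name.
type AuditRequest struct {
	RemainingUses int `json:"remaining_uses,omitempty"`
}
```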
* Add /sys/config/audited-headers endpoint for configuring the headers that will be audited (a sketch of the apply step follows this list)
* Remove some debug lines
* Add a persistent layer and refactor a bit
* update the api endpoints to be more restful
* Add comments and clean up a few functions
* Remove unneeded hash structure functionality
* Fix existing tests
* Add tests
* Add test for applying the header config
* Add Benchmark for the ApplyConfig method
* ResetTimer on the benchmark
* Update the headers comment
* Add test for audit broker
* Use hyphens instead of camel case
* Add size parameter to the allocation of the result map
* Fix the tests for the audit broker
* PR feedback
* update the path and permissions on config/* paths
* Add docs file
* Fix TestSystemBackend_RootPaths test
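A sketch of the apply step described in this list: configured header names
are matched case-insensitively (stored lower-cased, hyphenated), the result
map is allocated with a size hint, and values are optionally HMAC-ed. Type
and function names here are illustrative:

```go
package example

import "strings"

type auditedHeaderSettings struct {
	HMAC bool // whether this header's values are HMAC-ed before logging
}

func applyHeaderConfig(conf map[string]*auditedHeaderSettings,
	reqHeaders map[string][]string, hashFunc func(string) string) map[string][]string {

	// Size hint: at most len(conf) configured headers can match.
	result := make(map[string][]string, len(conf))
	for name, vals := range reqHeaders {
		settings, ok := conf[strings.ToLower(name)]
		if !ok {
			continue
		}
		out := make([]string, len(vals))
		for i, v := range vals {
			if settings.HMAC {
				v = hashFunc(v)
			}
			out[i] = v
		}
		result[strings.ToLower(name)] = out
	}
	return result
}
```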
request and response. This makes it far easier to properly check
validity elsewhere in Vault because we simply replace the request client
token with the inner value.
This fixes #1911, though not directly; it doesn't address the cause of the
panic. However, it turns out that this is the correct fix anyway, because
it ensures that the value being logged is in RFC 3339 format, which is what
the time turns into in JSON (but not its normal string form), so what we
audit log (and HMAC) matches what we return.
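A small demonstration of the mismatch being fixed; formatting the value as
RFC 3339 up front keeps the audited (and HMAC-ed) string identical to what
encoding/json emits:

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

func main() {
	t := time.Date(2017, 5, 1, 12, 0, 0, 0, time.UTC)

	// Default string form: not what appears in a JSON response.
	fmt.Println(t.String()) // 2017-05-01 12:00:00 +0000 UTC

	// What encoding/json emits: RFC 3339.
	b, _ := json.Marshal(t)
	fmt.Println(string(b)) // "2017-05-01T12:00:00Z"

	// Formatting explicitly matches the JSON form.
	fmt.Println(t.Format(time.RFC3339)) // 2017-05-01T12:00:00Z
}
```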