* Add OIDC token generation to Identity
There are a few open TODOs and some remaining cleanup, but this is
functionally complete and ready for review.
(Tests will being added soon.)
* Simplified key update endpoint
* Cache the config
* Fix Issuer handling
* Suppose base64-encoded templates (#6919)
* Cache JWKS and switch to go-cache (#6918)
* Address review comments
* Add warning if neither Issue nor api_addr are set
* adds tests (#6937)
* adds help synopsis and descriptions to the framework path for the oid… (#6930)
* adds help synopsis and descriptions to the framework path for the oidc backend
* Update vault/identity_store_oidc.go
Co-Authored-By: Jim Kalafut <jim@kalafut.net>
* Add Now parameter to PopulateStringInput
* Addressing review comments
* Refactor template processing to improve mode-specific handling
* adds a test for the periodic func (#6943)
* adds a test for the periodic func
* removes commented out code
* adds a comment
* Add comments
* Work on raft backend
* Add logstore locally
* Add encryptor and unsealable interfaces
* Add clustering support to raft
* Remove client and handler
* Bootstrap raft on init
* Cleanup raft logic a bit
* More raft work
* Work on TLS config
* More work on bootstrapping
* Fix build
* More work on bootstrapping
* More bootstrapping work
* fix build
* Remove consul dep
* Fix build
* merged oss/master into raft-storage
* Work on bootstrapping
* Get bootstrapping to work
* Clean up FMS and node-id
* Update local node ID logic
* Cleanup node-id change
* Work on snapshotting
* Raft: Add remove peer API (#906)
* Add remove peer API
* Add some comments
* Fix existing snapshotting (#909)
* Raft get peers API (#912)
* Read raft configuration
* address review feedback
* Use the Leadership Transfer API to step-down the active node (#918)
* Raft join and unseal using Shamir keys (#917)
* Raft join using shamir
* Store AEAD instead of master key
* Split the raft join process to answer the challenge after a successful unseal
* get the follower to standby state
* Make unseal work
* minor changes
* Some input checks
* reuse the shamir seal access instead of new default seal access
* refactor joinRaftSendAnswer function
* Synchronously send answer in auto-unseal case
* Address review feedback
* Raft snapshots (#910)
* Fix existing snapshotting
* implement the noop snapshotting
* Add comments and switch log libraries
* add some snapshot tests
* add snapshot test file
* add TODO
* More work on raft snapshotting
* progress on the ConfigStore strategy
* Don't use two buckets
* Update the snapshot store logic to hide the file logic
* Add more backend tests
* Cleanup code a bit
* [WIP] Raft recovery (#938)
* Add recovery functionality
* remove fmt.Printfs
* Fix a few fsm bugs
* Add max size value for raft backend (#942)
* Add max size value for raft backend
* Include physical.ErrValueTooLarge in the message
* Raft snapshot Take/Restore API (#926)
* Inital work on raft snapshot APIs
* Always redirect snapshot install/download requests
* More work on the snapshot APIs
* Cleanup code a bit
* On restore handle special cases
* Use the seal to encrypt the sha sum file
* Add sealer mechanism and fix some bugs
* Call restore while state lock is held
* Send restore cb trigger through raft log
* Make error messages nicer
* Add test helpers
* Add snapshot test
* Add shamir unseal test
* Add more raft snapshot API tests
* Fix locking
* Change working to initalize
* Add underlying raw object to test cluster core
* Move leaderUUID to core
* Add raft TLS rotation logic (#950)
* Add TLS rotation logic
* Cleanup logic a bit
* Add/Remove from follower state on add/remove peer
* add comments
* Update more comments
* Update request_forwarding_service.proto
* Make sure we populate all nodes in the followerstate obj
* Update times
* Apply review feedback
* Add more raft config setting (#947)
* Add performance config setting
* Add more config options and fix tests
* Test Raft Recovery (#944)
* Test raft recovery
* Leave out a node during recovery
* remove unused struct
* Update physical/raft/snapshot_test.go
* Update physical/raft/snapshot_test.go
* fix vendoring
* Switch to new raft interface
* Remove unused files
* Switch a gogo -> proto instance
* Remove unneeded vault dep in go.sum
* Update helper/testhelpers/testhelpers.go
Co-Authored-By: Calvin Leung Huang <cleung2010@gmail.com>
* Update vault/cluster/cluster.go
* track active key within the keyring itself (#6915)
* track active key within the keyring itself
* lookup and store using the active key ID
* update docstring
* minor refactor
* Small text fixes (#6912)
* Update physical/raft/raft.go
Co-Authored-By: Calvin Leung Huang <cleung2010@gmail.com>
* review feedback
* Move raft logical system into separate file
* Update help text a bit
* Enforce cluster addr is set and use it for raft bootstrapping
* Fix tests
* fix http test panic
* Pull in latest raft-snapshot library
* Add comment
* Add priority queue to sdk
* fix issue of storing pointers and now copy
* update to use copy structure
* Remove file, put Item struct def. into other file
* add link
* clean up docs
* refactor internal data structure to hide heap method implementations. Other cleanup after feedback
* rename PushItem and PopItem to just Push/Pop, after encapsulating the heap methods
* updates after feedback
* refactoring/renaming
* guard against pushing a nil item
* minor updates after feedback
* Add SetCredentials, GenerateCredentials gRPC methods to combined database backend gPRC
* Initial Combined database backend implementation of static accounts and automatic rotation
* vendor updates
* initial implementation of static accounts with Combined database backend, starting with PostgreSQL implementation
* add lock and setup of rotation queue
* vendor the queue
* rebase on new method signature of queue
* remove mongo tests for now
* update default role sql
* gofmt after rebase
* cleanup after rebasing to remove checks for ErrNotFound error
* rebase cdcr-priority-queue
* vendor dependencies with 'go mod vendor'
* website database docs for Static Role support
* document the rotate-role API endpoint
* postgres specific static role docs
* use constants for paths
* updates from review
* remove dead code
* combine and clarify error message for older plugins
* Update builtin/logical/database/backend.go
Co-Authored-By: Jim Kalafut <jim@kalafut.net>
* cleanups from feedback
* code and comment cleanups
* move db.RLock higher to protect db.GenerateCredentials call
* Return output with WALID if we failed to delete the WAL
* Update builtin/logical/database/path_creds_create.go
Co-Authored-By: Jim Kalafut <jim@kalafut.net>
* updates after running 'make fmt'
* update after running 'make proto'
* Update builtin/logical/database/path_roles.go
Co-Authored-By: Brian Kassouf <briankassouf@users.noreply.github.com>
* Update builtin/logical/database/path_roles.go
Co-Authored-By: Brian Kassouf <briankassouf@users.noreply.github.com>
* update comment and remove and rearrange some dead code
* Update website/source/api/secret/databases/index.html.md
Co-Authored-By: Jim Kalafut <jim@kalafut.net>
* cleanups after review
* Update sdk/database/dbplugin/grpc_transport.go
Co-Authored-By: Brian Kassouf <briankassouf@users.noreply.github.com>
* code cleanup after feedback
* remove PasswordLastSet; it's not used
* document GenerateCredentials and SetCredentials
* Update builtin/logical/database/path_rotate_credentials.go
Co-Authored-By: Brian Kassouf <briankassouf@users.noreply.github.com>
* wrap pop and popbykey in backend methods to protect against nil cred rotation queue
* use strings.HasPrefix instead of direct equality check for path
* Forgot to commit this
* updates after feedback
* re-purpose an outdated test to now check that static and dynamic roles cannot share a name
* check for unique name across dynamic and static roles
* refactor loadStaticWALs to return a map of name/setCredentialsWAL struct to consolidate where we're calling set credentials
* remove commented out code
* refactor to have loadstaticwals filter out wals for roles that no longer exist
* return error if nil input given
* add nil check for input into setStaticAccount
* Update builtin/logical/database/path_roles.go
Co-Authored-By: Brian Kassouf <briankassouf@users.noreply.github.com>
* add constant for queue tick time in seconds, used for comparrison in updates
* Update builtin/logical/database/path_roles.go
Co-Authored-By: Jim Kalafut <jim@kalafut.net>
* code cleanup after review
* remove misplaced code comment
* remove commented out code
* create a queue in the Factory method, even if it's never used
* update path_roles to use a common set of fields, with specific overrides for dynamic/static roles by type
* document new method
* move rotation things into a specific file
* rename test file and consolidate some static account tests
* Update builtin/logical/database/path_roles.go
Co-Authored-By: Brian Kassouf <briankassouf@users.noreply.github.com>
* Update builtin/logical/database/rotation.go
Co-Authored-By: Brian Kassouf <briankassouf@users.noreply.github.com>
* Update builtin/logical/database/rotation.go
Co-Authored-By: Brian Kassouf <briankassouf@users.noreply.github.com>
* Update builtin/logical/database/rotation.go
Co-Authored-By: Brian Kassouf <briankassouf@users.noreply.github.com>
* Update builtin/logical/database/rotation.go
Co-Authored-By: Brian Kassouf <briankassouf@users.noreply.github.com>
* Update builtin/logical/database/rotation.go
Co-Authored-By: Brian Kassouf <briankassouf@users.noreply.github.com>
* update code comments, method names, and move more methods into rotation.go
* update comments to be capitalized
* remove the item from the queue before we try to destroy it
* findStaticWAL returns an error
* use lowercase keys when encoding WAL entries
* small cleanups
* remove vestigial static account check
* remove redundant DeleteWAL call in populate queue
* if we error on loading role, push back to queue with 10 second backoff
* poll in initqueue to make sure the backend is setup and can write/delete data
* add revoke_user_on_delete flag to allow users to opt-in to revoking the static database user on delete of the Vault role. Default false
* add code comments on read-only loop
* code comment updates
* re-push if error returned from find static wal
* add locksutil and acquire locks when pop'ing from the queue
* grab exclusive locks for updating static roles
* Add SetCredentials and GenerateCredentials stubs to mockPlugin
* add a switch in initQueue to listen for cancelation
* remove guard on zero time, it should have no affect
* create a new context in Factory to pass on and use for closing the backend queue
* restore master copy of vendor dir
* Fix a deadlock if a panic happens during request handling
During request handling, if a panic is created, deferred functions are
run but otherwise execution stops. #5889 changed some locks to
non-defers but had the side effect of causing the read lock to not be
released if the request panicked. This fixes that and addresses a few
other potential places where things could go wrong:
1) In sealInitCommon we always now defer a function that unlocks the
read lock if it hasn't been unlocked already
2) In StepDown we defer the RUnlock but we also had two error cases that
were calling it manually. These are unlikely to be hit but if they were
I believe would cause a panic.
* Add panic recovery test
This allows us to truly delete policies when we've either invalidated it
(which since they're singletons/default should only happen when we're
doing a namespace delete) or are doing a namespace delete on the local
node.
When unmounting, the router entry would be tainted, preventing routing.
However, we would then unmount the router before clearing storage, so if
an error occurred the router would have forgotten the path. For auth
mounts this isn't a problem since they had a secondary check, but
regular mounts didn't (not sure why, but this is true back to at least
0.2.0). This meant you could then create a duplicate mount using the
same path which would then not conflict in the router until postUnseal.
This adds the extra check to regular mounts, and also moves the location
of the router unmount.
This also ensures that on the next router.Mount, tainted is set to the
mount entry's tainted status.
Fixes#6769
Move audit.LogInput to sdk/logical. Allow the Data values in audited
logical.Request and Response to implement OptMarshaler, in which case
we delegate hashing/serializing responsibility to them. Add new
ClientCertificateSerialNumber audit request field.
SystemView can now be cast to ExtendedSystemView to expose the Auditor
interface, which allows submitting requests and responses to the audit
broker.
* Port over some SP v2 bits
Specifically:
* Add too-large handling to Physical (Consul only for now)
* Contextify some identity funcs
* Update SP protos
* Add size limiting to inmem storage
These paths are handled directly in handler.go, but the list of special
paths here impacts the x-vault-unauthenticated field in generated
OpenAPI.
Fixes: #6651
Merge both functions for creating mongodb containers into one.
Add retries to docker container cleanups.
Require $VAULT_ACC be set to enable AWS tests.
* Don't bother trying to save metrics when we don't have a barrier. Use stateLock.
* Use c.barrier instead of c.systemBarrierView, thus we don't need locking
and don't need to worry about race with mount setup.
* Remove unneccessary lock.
* Fix a locking issue in the Rollback manager
* Update rollback.go
* Update rollback.go
* move state creation
* Update vault/rollback.go
Co-Authored-By: briankassouf <briankassouf@users.noreply.github.com>
* Simplify logic by canceling the lock grab
* Use context instead of a chan
* Update vault/rollback.go
* fixing dockertest to run on travis
* try a repo local directory
* precreate the directory
* strip extraneous comment
* check directory was created
* try to print container logs
* try writing out client logs
* one last try
* Attempt to fix test
* convert to insecure tls
* strip test-temp
Increment a counter whenever a request is received.
The in-memory counter is persisted to counters/requests/YYYY/MM.
When the month wraps around, we reset the in-memory counter to
zero.
Add an endpoint for querying the request counters across all time.
* Add ability to migrate autoseal to autoseal
This adds the ability to migrate from shamir to autoseal, autoseal to
shamir, or autoseal to autoseal, by allowing multiple seal stanzas. A
disabled stanza will be used as the config being migrated from; this can
also be used to provide an unwrap seal on ent over multiple unseals.
A new test is added to ensure that autoseal to autoseal works as
expected.
* Fix test
* Provide default shamir info if not given in config
* Linting feedback
* Remove context var that isn't used
* Don't run auto unseal watcher when in migration, and move SetCores to SetSealsForMigration func
* Slight logic cleanup
* Fix test build and fix bug
* Updates
* remove GetRecoveryKey function
* First pass at filtered-path endpoint. It seems to be working, but there are tests missing, and possibly some optimization to handle large key sets.
* Vendor go-cmp.
* Fix incomplete vendoring of go-cmp.
* Improve test coverage. Fix bug whereby access to a subtree named X would expose existence of a the key named X at the same level.
* Add benchmarks, which showed that hasNonDenyCapability would be "expensive" to call for every member of a large folder. Made a couple of minor tweaks so that now it can be done without allocations.
* Comment cleanup.
* Review requested changes: rename some funcs, use routeCommon instead of
querying storage directly.
* Keep the same endpoint for now, but move it from a LIST to a POST and allow multiple paths to be queried in one operation.
* Modify test to pass multiple paths in at once.
* Add endpoint to default policy.
* Move endpoint to /sys/access/filtered-path.
* Handle ns lease and token renew/revoke via relative paths
* s/usin/using/
* add token and lease lookup paths; set ctx only on non-nil ns
Addtionally, use client token's ns for auth/token/lookup if no token is provided
* Adding Transit Autoseal
* adding tests
* adding more tests
* updating seal info
* send a value to test and set current key id
* updating message
* cleanup
* Adding tls config, addressing some feedback
* adding tls testing
* renaming config fields for tls
* Path globbing
* Add glob support at the beginning
* Ensure when evaluating an ACL that our path never has a leading slash. This already happens in the normal request path but not in tests; putting it here provides it for tests and extra safety in case the request path changes
* Simplify the algorithm, we don't really need to validate the prefix first as glob won't apply if it doesn't
* Add path segment wildcarding
* Disable path globbing for now
* Remove now-unneeded test
* Remove commented out globbing bits
* Remove more holdover glob bits
* Rename k var to something more clear
* Port over OSS cluster port refactor components
* Start forwarding
* Cleanup a bit
* Fix copy error
* Return error from perf standby creation
* Add some more comments
* Fix copy/paste error
* initial commit for prometheus and sys/metrics support
* Throw an error if prometheusRetentionTime is 0,add prometheus in devmode
* return when format=prometheus is used and prom is disable
* parse prometheus_retention_time from string instead of int
* Initialize config.Telemetry if nil
* address PR issues
* add sys/metrics framework.Path in a factory
* Apply requiredMountTable entries's MountConfig to existing core table
* address pr comments
* enable prometheus sink by default
* Move Metric-related code in a separate metricsutil helper
* Fixes a regression in forwarding from #6115
Although removing the authentication header is good defense in depth,
for forwarding mechanisms that use the raw request, we never add it
back. This caused perf standby tests to throw errors. Instead, once
we're past the point at which we would do any raw forwarding, but before
routing the request, remove the header.
To speed this up, a flag is set in the logical.Request to indicate where
the token is sourced from. That way we don't iterate through maps
unnecessarily.
* Merge entities during unseal only on the primary
* Add another guard check
* Add perf standby to the check
* Make primary to not differ from case-insensitivity status w.r.t secondaries
* Ensure mutual exclusivity between loading and invalidations
* Both primary and secondaries won't persist during startup and invalidations
* Allow primary to persist when loading case sensitively
* Using core.perfStandby
* Add a tweak in core for testing
* Address review feedback
* update memdb but not storage in secondaries
* Wire all the things directly do mergeEntity
* Fix persist behavior
* Address review feedback
* Two things:
* Change how we populate and clear leader UUID. This fixes a case where
if a standby disconnects from an active node and reconnects, without the
active node restarting, the UUID doesn't change so triggers on a new
active node don't get run.
* Add a bunch of test helpers and minor updates to things.
This lets other parts of Vault that can't depend on the vault package
take advantage of the subview functionality.
This also allows getting rid of BarrierStorage and vault.Entry, two
totally redundant abstractions.
* Add helper for checking if an error is a fatal error
The double-double negative was really confusing, and this pattern is used a few places in Vault. This negates the double negative, making the devx a bit easier to follow.
* Check return value of UnsealWithStoredKeys in sys/init
* Return proper error types when attempting unseal with stored key
Prior to this commit, "nil" could have meant unsupported auto-unseal, a transient error, or success. This updates the function to return the correct error type, signaling to the caller whether they should retry or fail.
* Continuously attempt to unseal if sealed keys are supported
This fixes a bug that occurs on bootstrapping an initial cluster. Given a collection of Vault nodes and an initialized storage backend, they will all go into standby waiting for initialization. After one node is initialized, the other nodes had no mechanism by which they "re-check" to see if unseal keys are present. This adds a goroutine to the server command which continually waits for unseal keys to exist. It exits in the following conditions:
- the node is unsealed
- the node does not support stored keys
- a fatal error occurs (as defined by Vault)
- the server is shutting down
In all other situations, the routine wakes up at the specified interval and attempts to unseal with the stored keys.
* Upgrade to new Cloud KMS client libraries
We recently released the new Cloud KMS client libraries which use GRPC
instead of HTTP. They are faster and look nicer (</opinion>), but more
importantly they drastically simplify a lot of the logic around client
creation, encryption, and decryption. In particular, we can drop all the
logic around looking up credentials and base64-encoding/decoding.
Tested on a brand new cluster (no pre-existing unseal keys) and against
a cluster with stored keys from a previous version of Vault to ensure no
regressions.
* Use the default scopes the client requests
The client already does the right thing here, so we don't need to
surface it, especially since we aren't allowing users to configure it.
* fix cubbyhole deletion
* Fix error handling
* Move the cubbyhole tidy logic to token store and track the revocation count
* Move fetching of cubby keys before the tidy loop
* Fix context getting cancelled
* Test the cubbyhole cleanup logic
* Add progress counter for cubbyhole cleanup
* Minor polish
* Use map instead of slice for faster computation
* Add test for cubbyhole deletion
* Add a log statement for deletion
* Add SHA1 hashed tokens into the mix
The result will still pass gofmtcheck and won't trigger additional
changes if someone isn't using goimports, but it will avoid the
piecemeal imports changes we've been seeing.
This changes the behavior of the GCPCKMS auto-unsealer setup to attempt
encryption instead of a key lookup. Key lookups are a different API
method not covered by roles/cloudkms.cryptoKeyEncrypterDecrypter. This
means users must grant an extended scope to their service account
(granting the ability to read key data) which only seems to be used to
validate the existence of the key.
Worse, the only roles that include this permission are overly verbose
(e.g. roles/viewer which gives readonly access to everything in the
project and roles/cloudkms.admin which gives full control over all key
operations). This leaves the user stuck between choosing to create a
custom IAM role (which isn't fun) or grant overly broad permissions.
By changing to an encrypt call, we get better verification of the unseal
permissions and users can reduce scope to a single role.
* Refactor mount tune to support upsert options values and unset options.
* Do not allow unsetting options map
* add secret tune version regression test
* Only accept valid options version
* s/meVersion/optVersion/
We're having issues with leases in the GCS backend storage being
corrupted and failing MAC checking. When that happens, we need to know
the lease ID so we can address the corruption by hand and take
appropriate action.
This will hopefully prevent any instances of incomplete data being sent
to GSS
* Use common abstraction for entity deletion
* Update group memberships before deleting entity
* Added test
* Fix return statements
* Update comment
* Cleanup member entity IDs while loading groups
* Added test to ensure that upgrade happens properly
* Ensure that the group gets persisted if upgrade code modifies it
* Continue on plugin registration error in dev mode
* Continue only on unknown type error
* Continue only on unknown type error
* Print plugin registration error on exit
Co-Authored-By: calvn <cleung2010@gmail.com>
* Support registering plugin with name only
* Make RegisterPlugin backwards compatible
* Add CLI backwards compat command to plugin info and deregister
* Add server-side deprecation warnings if old read/dereg API endpoints are called
* Address feedback
The slice returned by `collectGroupsReverseDFS` is an updated copy of
the slice given to it when called. Appending `pGroups` to `groups`
therefore led to expontential memory usage as the slice was repeatedly
appended to itself.
Fixes#5605
* case insensitive identity names
* TestIdentityStore_GroupHierarchyCases
* address review feedback
* Use errwrap.Contains instead of errwrap.ContainsType
* Warn about duplicate names all the time to help fix them
* Address review feedback
* Support Authorization Bearer as token header
* add requestAuth test
* remove spew debug output in test
* Add Authorization in CORS Allowed headers
* use const where applicable
* use less allocations in bearer token checking
* address PR comments on tests and apply last commit
* reorder error checking in a TestHandler_requestAuth
* logical/aws: Harden WAL entry creation
If AWS IAM user creation failed in any way, the WAL corresponding to the
IAM user would get left around and Vault would try to roll it back.
However, because the user never existed, the rollback failed. Thus, the
WAL would essentially get "stuck" and Vault would continually attempt to
roll it back, failing every time. A similar situation could arise if the
IAM user that Vault created got deleted out of band, or if Vault deleted
it but was unable to write the lease revocation back to storage (e.g., a
storage failure).
This attempts to harden it in two ways. One is by deleting the WAL log
entry if the IAM user creation fails. However, the WAL deletion could
still fail, and this wouldn't help where the user is deleted out of
band, so second, consider the user rolled back if the user just doesn't
exist, under certain circumstances.
Fixes#5190
* Fix segfault in expiration unit tests
TestExpiration_Tidy was passing in a leaseEntry that had a nil Secret,
which then caused a segfault as the changes to revokeEntry didn't check
whether Secret was nil; this is probably unlikely to occur in real life,
but good to be extra cautious.
* Fix potential segfault
Missed the else...
* Respond to PR feedback
* Fix for using ExplicitMaxTTL in auth method plugins.
* Reverted pb.go files for readability of PR.
* Fixed indenting of comment.
* Reverted unintended change by go test.
This works around a very, very common error where people write policies
to affect listing but forget the slash at the end. If there is no exact
rule with a slash at the end when doing a list, we look to see if there
is a rule without it, and if so, use those capabilities.
Fixes #mass-user-confusion