* Carefully move changes from the plugin-cluster-reload branch into this clean branch off master.
* Don't test this at this level, adequately covered in the api level tests
* Change PR link
* go.mod
* Vendoring
* Vendor api/sys_plugins.go
* Revert "Some of the OSS changes were clobbered when merging with quotas out of, master (#9343)"
This reverts commit 8719a9b7c4d6ca7afb2e0a85e7c570cc17081f41.
* Revert "OSS side of Global Plugin Reload (#9340)"
This reverts commit f98afb998ae50346849050e882b6be50807983ad.
* Add the initialized tag to Consul registration for parity with k8s (and for easy automated testing). Ensure that whenever we flag Vault as unsealed, we also flag it as initialized.
* Update API docs.
Co-authored-by: Jason O'Donnell <2160810+jasonodonnell@users.noreply.github.com>
* raft: initial work on raft ha storage support
* add note on join
* add todo note
* raft: add support for bootstrapping and joining existing nodes
* raft: gate bootstrap join by reading leader api address from storage
* raft: properly check for raft-only for certain conditionals
* raft: add bootstrap to api and cli
* raft: fix bootstrap cli command
* raft: add test for setting up new cluster with raft HA
* raft: extend TestRaft_HA_NewCluster to include inmem and consul backends
* raft: add test for updating an existing cluster to use raft HA
* raft: remove debug log lines, clean up verifyRaftPeers
* raft: minor cleanup
* raft: minor cleanup
* Update physical/raft/raft.go
Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>
* Update vault/ha.go
Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>
* Update vault/ha.go
Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>
* Update vault/logical_system_raft.go
Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>
* Update vault/raft.go
Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>
* Update vault/raft.go
Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>
* address feedback comments
* address feedback comments
* raft: refactor tls keyring logic
* address feedback comments
* Update vault/raft.go
Co-authored-by: Alexander Bezobchuk <alexanderbez@users.noreply.github.com>
* Update vault/raft.go
Co-authored-by: Alexander Bezobchuk <alexanderbez@users.noreply.github.com>
* address feedback comments
* testing: fix import ordering
* raft: rename var, cleanup comment line
* docs: remove ha_storage restriction note on raft
* docs: more raft HA interaction updates with migration and recovery mode
* docs: update the raft join command
* raft: update comments
* raft: add missing isRaftHAOnly check for clearing out state set earlier
* raft: update a few ha_storage config checks
* Update command/operator_raft_bootstrap.go
Co-authored-by: Vishal Nayak <vishalnayak@users.noreply.github.com>
* raft: address feedback comments
* raft: fix panic when checking for config.HAStorage.Type
* Update vault/raft.go
Co-authored-by: Alexander Bezobchuk <alexanderbez@users.noreply.github.com>
* Update website/pages/docs/commands/operator/raft.mdx
Co-authored-by: Alexander Bezobchuk <alexanderbez@users.noreply.github.com>
* raft: remove bootstrap cli command
* Update vault/raft.go
Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>
* Update vault/raft.go
Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>
* raft: address review feedback
* raft: revert vendored sdk
* raft: don't send applied index and node ID info if we're HA-only
Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>
Co-authored-by: Alexander Bezobchuk <alexanderbez@users.noreply.github.com>
Co-authored-by: Vishal Nayak <vishalnayak@users.noreply.github.com>
* Replaced ClusterMetricSink's cluster name with an atomic.Value.
This should permit go-race tests to pass which seal and unseal
the core.
* Replace metric sink before unseal to avoid data races.
* This package is new for 1.5 so this is not a breaking change.
* This is being moved because this code was originally intended to be used
within plugins, however the design of password policies has changed such
that this is no longer needed. Thus, this code doesn't need to be in the
public SDK.
* move adjustForSealMigration to vault package
* fix adjustForSealMigration
* begin working on new seal migration test
* create shamir seal migration test
* refactor testhelpers
* add VerifyRaftConfiguration to testhelpers
* stub out TestTransit
* Revert "refactor testhelpers"
This reverts commit 39593defd0d4c6fd79aedfd37df6298391abb9db.
* get shamir test working again
* stub out transit join
* work on transit join
* remove debug code
* initTransit now works with raft join
* runTransit works with inmem
* work on runTransit with raft
* runTransit works with raft
* cleanup tests
* TestSealMigration_TransitToShamir_Pre14
* TestSealMigration_ShamirToTransit_Pre14
* split for pre-1.4 testing
* add simple tests for transit and shamir
* fix typo in test suite
* debug wrapper type
* test debug
* test-debug
* refactor core migration
* Revert "refactor core migration"
This reverts commit a776452d32a9dca7a51e3df4a76b9234d8c0c7ce.
* begin refactor of adjustForSealMigration
* fix bug in adjustForSealMigration
* clean up tests
* clean up core refactoring
* fix bug in shamir->transit migration
* stub out test that brings individual nodes up and down
* refactor NewTestCluster
* pass listeners into newCore()
* simplify cluster address setup
* simplify extra test core setup
* refactor TestCluster for readability
* refactor TestCluster for readability
* refactor TestCluster for readability
* add shutdown func to TestCore
* add cleanup func to TestCore
* create RestartCore
* stub out TestSealMigration_ShamirToTransit_Post14
* refactor address handling in NewTestCluster
* fix listener setup in newCore()
* remove unnecessary lock from setSealsForMigration()
* rename sealmigration test package
* use ephemeral ports below 30000
* work on post-1.4 migration testing
* clean up pre-1.4 test
* TestSealMigration_ShamirToTransit_Post14 works for non-raft
* work on raft TestSealMigration_ShamirToTransit_Post14
* clean up test code
* refactor TestClusterCore
* clean up TestClusterCore
* stub out some temporary tests
* use HardcodedServerAddressProvider in seal migration tests
* work on raft for TestSealMigration_ShamirToTransit_Post14
* always use hardcoded raft address provider in seal migration tests
* debug TestSealMigration_ShamirToTransit_Post14
* fix bug in RestartCore
* remove debug code
* TestSealMigration_ShamirToTransit_Post14 works now
* clean up debug code
* clean up tests
* cleanup tests
* refactor test code
* stub out TestSealMigration_TransitToShamir_Post14
* set seals properly for transit->shamir migration
* migrateFromTransitToShamir_Post14 works for inmem
* migrateFromTransitToShamir_Post14 works for raft
* use base ports per-test
* fix seal verification test code
* simplify seal migration test suite
* simplify test suite
* cleanup test suite
* use explicit ports below 30000
* simplify use of numTestCores
* Update vault/external_tests/sealmigration/seal_migration_test.go
Co-authored-by: Calvin Leung Huang <cleung2010@gmail.com>
* Update vault/external_tests/sealmigration/seal_migration_test.go
Co-authored-by: Calvin Leung Huang <cleung2010@gmail.com>
* clean up imports
* rename to StartCore()
* Update vault/testing.go
Co-authored-by: Calvin Leung Huang <cleung2010@gmail.com>
* simplify test suite
* clean up tests
Co-authored-by: Calvin Leung Huang <cleung2010@gmail.com>
* Changes to expiration manager to walk tokens (including non-expiring ones.)
* Count by namespace in token manager.
* Keep a dictionary of policy lists and deduplicate based on it.
* enable seal wrap in all seal migration tests
* move adjustForSealMigration to vault package
* fix adjustForSealMigration
* begin working on new seal migration test
* create shamir seal migration test
* refactor testhelpers
* add VerifyRaftConfiguration to testhelpers
* stub out TestTransit
* Revert "refactor testhelpers"
This reverts commit 39593defd0d4c6fd79aedfd37df6298391abb9db.
* get shamir test working again
* stub out transit join
* work on transit join
* Revert "move resuable storage test to avoid creating import cycle"
This reverts commit b3ff2317381a5af12a53117f87d1c6fbb093af6b.
* remove debug code
* initTransit now works with raft join
* runTransit works with inmem
* work on runTransit with raft
* runTransit works with raft
* get rid of dis-used test
* cleanup tests
* TestSealMigration_TransitToShamir_Pre14
* TestSealMigration_ShamirToTransit_Pre14
* split for pre-1.4 testing
* add simple tests for transit and shamir
* fix typo in test suite
* debug wrapper type
* test debug
* test-debug
* refactor core migration
* Revert "refactor core migration"
This reverts commit a776452d32a9dca7a51e3df4a76b9234d8c0c7ce.
* begin refactor of adjustForSealMigration
* fix bug in adjustForSealMigration
* clean up tests
* clean up core refactoring
* fix bug in shamir->transit migration
* remove unnecessary lock from setSealsForMigration()
* rename sealmigration test package
* use ephemeral ports below 30000
* simplify use of numTestCores
* Add token creation counters.
* Created a utility to change TTL to bucket name.
* Add counter covering token creation for response wrapping.
* Fix namespace label, with a new utility function.
* Refactor PG container creation.
* Rework rotation tests to use shorter sleeps.
* Refactor rotation tests.
* Add a static role rotation test for MongoDB Atlas.
* Populate a token_ttl and token_issue_time field on the Auth struct of audit log entries, and in the Auth portion of a response for login methods
* Revert go fmt, better zero checking
* Update unit tests
* changelog++
* Add random string generator with rules engine
This adds a random string generation library that validates random
strings against a set of rules. The library is designed for use as generating
passwords, but can be used to generate any random strings.
* storage/raft: Advertise the configured cluster address
* Don't allow raft to start with unspecified IP
* Fix concurrent map write panic
* Add test file
* changelog++
* changelog++
* changelog++
* Update tcp_layer.go
* Update tcp_layer.go
* Only set the adverise addr if set
* storage/raft: Add committed and applied indexes to the status output
* Update api vendor
* changelog++
* Update http/sys_leader.go
Co-authored-by: Jim Kalafut <jkalafut@hashicorp.com>
Co-authored-by: Jim Kalafut <jkalafut@hashicorp.com>
* serivceregistration: refactor service registration logic to run later
* move state check to the internal func
* sr/kubernetes: update setInitialStateInternal godoc
* sr/kubernetes: remove return in setInitialState
* core/test: fix mockServiceRegistration
* address review feedback
* stub out reusable storage test
* implement reusable inmem test
* work on reusable raft test
* stub out simple raft test
* switch to reusable raft storage
* cleanup tests
* cleanup tests
* refactor tests
* verify raft configuration
* cleanup tests
* stub out reuseStorage
* use common base address across clusters
* attempt to reuse raft cluster
* tinker with test
* fix typo
* start debugging
* debug raft configuration
* add BaseClusterListenPort to TestCluster options
* use BaseClusterListenPort in test
* raft join works now
* misc cleanup of raft tests
* use configurable base port for raft test
* clean up raft tests
* add parallelized tests for all backends
* clean up reusable storage tests
* remove debugging code from startClusterListener()
* improve comments in testhelpers
* improve comments in teststorage
* improve comments and test logging
* fix typo in vault/testing
* fix typo in comments
* remove debugging code
* make number of cores parameterizable in test
* raft: check for nil on concrete type in SetupCluster
* raft: move check to its own func
* raft: func cleanup
* raft: disallow disable_clustering = true when raft storage is used
* docs: update disable_clustering to mention new behavior
This avoids SetConfig from having to grab a write lock which is called on a SIGHUP, and may block, along with a long-running requests that has a read lock held, any other operation that requires a state lock.
* token/renewal: return full set of token and identity policies in the policies field
* extend tests to cover additional token and identity policies on a token
* verify identity_policies returned on login and renewals
* fix#7623: add missed description field for GET /sys/auth/:path/tune endpoint
* fix#7623: allow empty description
* fix#7623: update tests with description field
* provide vault server flag to exit on core shutdown
* Update command/server.go
Co-Authored-By: Jeff Mitchell <jeffrey.mitchell@gmail.com>
Co-authored-by: Jeff Mitchell <jeffrey.mitchell@gmail.com>
Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>
* Seal migration after unsealing
* Refactor migration fields migrationInformation in core
* Perform seal migration as part of postUnseal
* Remove the sleep logic
* Use proper seal in the unseal function
* Fix migration from Auto to Shamir
* Fix the recovery config missing issue
* Address the non-ha migration case
* Fix the multi cluster case
* Avoid re-running seal migration
* Run the post migration code in new leaders
* Fix the issue of wrong recovery being set
* Address review feedback
* Add more complete testing coverage for seal migrations. (#8247)
* Add more complete testing coverage for seal migrations. Also remove VAULT_ACC gate from some tests that just depend on docker, cleanup dangling recovery config in storage after migration, and fix a call in adjustCoreForSealMigration that seems broken.
* Fix the issue of wrong recovery key being set
* Adapt tests to work with multiple cores.
* Add missing line to disable raft join.
Co-authored-by: Vishal Nayak <vishalnayak@users.noreply.github.com>
* Fix all known issues
* Remove warning
* Review feedback.
* Revert my previous change that broke raft tests. We'll need to come back and at least comment
this once we better understand why it's needed.
* Don't allow migration between same types for now
* Disable auto to auto tests for now since it uses migration between same types which is not allowed
* Update vault/core.go
Co-Authored-By: Brian Kassouf <briankassouf@users.noreply.github.com>
* Add migration logs
* Address review comments
* Add the recovery config check back
* Skip a few steps if migration is already done
* Return from waitForLeadership if migration fails
Co-authored-by: ncabatoff <nick.cabatoff@gmail.com>
Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>
* external_tests: ensure derived cores are stable before proceeding on tests
* testhelpers: add min duration tolerance when checking stability on derived core
* use observer pattern for service discovery
* update perf standby method
* fix test
* revert usersTags to being called serviceTags
* use previous consul code
* vault isnt a performance standby before starting
* log err
* changes from feedback
* add Run method to interface
* changes from feedback
* fix core test
* update example
* Use Shamir as KeK when migrating from auto-seal to shamir
* Use the correct number of shares/threshold for the migrated seal.
* Fix log message
* Add WaitForActiveNode to test
* Make test fail
* Minor updates
* Test with more shares and a threshold
* Add seal/unseal step to the test
* Update the logic that prepares seal migration (#8187)
* Update the logic that preps seal migration
* Add test and update recovery logic
Co-authored-by: ncabatoff <nick.cabatoff@gmail.com>
Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>
* Raft retry join
* update
* Make retry join work with shamir seal
* Return upon context completion
* Update vault/raft.go
Co-Authored-By: Brian Kassouf <briankassouf@users.noreply.github.com>
* Address some review comments
* send leader information slice as a parameter
* Make retry join work properly with Shamir case. This commit has a blocking issue
* Fix join goroutine exiting before the job is done
* Polishing changes
* Don't return after a successful join during unseal
* Added config parsing test
* Add test and fix bugs
* minor changes
* Address review comments
* Fix build error
Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>
* plugin: fix panic on router.MatchingSystemView if backend is nil
* correctly determine the plugin binary file in the directory
* docs: simplify plugin file removal
* move ServiceDiscovery into methods
* add ServiceDiscoveryFactory
* add serviceDiscovery field to vault.Core
* refactor ConsulServiceDiscovery into separate struct
* cleanup
* revert accidental change to go.mod
* cleanup
* get rid of un-needed struct tags in vault.CoreConfig
* add service_discovery parser
* add ServiceDiscovery to config
* cleanup
* cleanup
* add test for ConfigServiceDiscovery to Core
* unit testing for config service_discovery stanza
* cleanup
* get rid of un-needed redirect_addr stuff in service_discovery stanza
* improve test suite
* cleanup
* clean up test a bit
* create docs for service_discovery
* check if service_discovery is configured, but storage does not support HA
* tinker with test
* tinker with test
* tweak docs
* move ServiceDiscovery into its own package
* tweak a variable name
* fix comment
* rename service_discovery to service_registration
* tweak service_registration config
* Revert "tweak service_registration config"
This reverts commit 5509920a8ab4c5a216468f262fc07c98121dce35.
* simplify naming
* refactor into ./serviceregistration/consul
Go 1.13 flipped TLS 1.3 to opt-out instead of opt-in, and its TLS 1.3
support does not allow configuring cipher suites. Simply remove the
affected test; it's not relevant going forward and there's ample
evidence it works properly prior to Go 1.13.
* core: revoke the proper token on partial failures from token-related requests
* move test to vault package, move test trigger to expiration manager
* update logging messages for clarity
* docstring fix
* Don't allow registering a non-root zero TTL token lease
This is defense-in-depth in that such a token was not allowed to be
used; however it's also a bug fix in that this would then cause no lease
to be generated but the token entry to be written, meaning the token
entry would stick around until it was attempted to be used or tidied (in
both cases the internal lookup would see that this was invalid and do a
revoke on the spot).
* Fix tests
* tidy
* vault: remove dead test helper function testMakeBatchTokenViaCore()
* vault: remove dead test helper function testMakeBatchTokenViaBackend()
* vault: remove dead test helper function mockPolicyStoreNoCache()
* vault: remove dead test helper function mockPolicyStore()
* vault: remove unused test imports
* Handpick cluster cipher suites when they're not user-set
There is an undocumented way for users to choose cluster cipher suites
but for the most part this is to paper over the fact that there are
undesirable suites in TLS 1.2.
If not explicitly set, have the set of cipher suites for the cluster
port come from a hand-picked list; either the allowed TLS 1.3 set (for
forwards compatibility) or the three identical ones for TLS 1.2.
The 1.2 suites have been supported in Go until at least as far back as
Go 1.9 from two years ago. As a result in cases where no specific suites
have been chosen this _ought_ to have no compatibility issues.
Also includes a useful test script.
* Abstract generate-root authentication into the strategy interface
* Generate root strategy ncabatoff (#7700)
* Adapt to new shamir-as-kek reality.
* Don't try to verify the master key when we might still be sealed (in
recovery mode). Instead, verify it in the authenticate methods.
because when unsealing it wouldn't wait for core 0 to come up and become
the active node. Much of our testing code assumes that core0 is the
active node.
Shamir seals now come in two varieties: legacy and new-style. Legacy
Shamir is automatically converted to new-style when a rekey operation
is performed. All new Vault initializations using Shamir are new-style.
New-style Shamir writes an encrypted master key to storage, just like
AutoUnseal. The stored master key is encrypted using the shared key that
is split via Shamir's algorithm. Thus when unsealing, we take the key
fragments given, combine them into a Key-Encryption-Key, and use that
to decrypt the master key on disk. Then the master key is used to read
the keyring that decrypts the barrier.
* core: add postSealMigration method
The postSealMigration method is called at the end of the postUnseal
method if a seal migration has occurred. This starts a seal rewrap
process in the enterprise version of. It is a no-op in the OSS version.
* Initial work
* rework
* s/dr/recovery
* Add sys/raw support to recovery mode (#7577)
* Factor the raw paths out so they can be run with a SystemBackend.
# Conflicts:
# vault/logical_system.go
* Add handleLogicalRecovery which is like handleLogical but is only
sufficient for use with the sys-raw endpoint in recovery mode. No
authentication is done yet.
* Integrate with recovery-mode. We now handle unauthenticated sys/raw
requests, albeit on path v1/raw instead v1/sys/raw.
* Use sys/raw instead raw during recovery.
* Don't bother persisting the recovery token. Authenticate sys/raw
requests with it.
* RecoveryMode: Support generate-root for autounseals (#7591)
* Recovery: Abstract config creation and log settings
* Recovery mode integration test. (#7600)
* Recovery: Touch up (#7607)
* Recovery: Touch up
* revert the raw backend creation changes
* Added recovery operation token prefix
* Move RawBackend to its own file
* Update API path and hit it using CLI flag on generate-root
* Fix a panic triggered when handling a request that yields a nil response. (#7618)
* Improve integ test to actually make changes while in recovery mode and
verify they're still there after coming back in regular mode.
* Refuse to allow a second recovery token to be generated.
* Resize raft cluster to size 1 and start as leader (#7626)
* RecoveryMode: Setup raft cluster post unseal (#7635)
* Setup raft cluster post unseal in recovery mode
* Remove marking as unsealed as its not needed
* Address review comments
* Accept only one seal config in recovery mode as there is no scope for migration
* add storage route
* template out the routes and new raft storage overview
* fetch raft config and add new server model
* pngcrush the favicon
* add view components and binary-file component
* add form-save-buttons component
* adjust rawRequest so that it can send a request body and returns the response on errors
* hook up restore
* rename binary-file to file-to-array-buffer
* add ember-service-worker
* use forked version of ember-service-worker for now
* scope the service worker to a single endpoint
* show both download buttons for now
* add service worker download with a fallback to JS in-mem download
* add remove peer functionality
* lint go file
* add storage-type to the cluster and node models
* update edit for to take a cancel action
* separate out a css table styles to be used by http-requests-table and on the raft-overview component
* add raft-join adapter, model, component and use on the init page
* fix styling and gate the menu item on the cluster using raft storage
* style tweaks to the raft-join component
* fix linting
* add form-save-buttons component to storybook
* add cancel functionality for backup uploads, and add a success message for successful uploads
* add component tests
* add filesize.js
* add filesize and modified date to file-to-array-buffer
* fix linting
* fix server section showing in the cluster nav
* don't use babel transforms in service worker lib because we don't want 2 copies of babel polyfill
* add file-to-array-buffer to storybook
* add comments and use removeObjectURL to raft-storage-overview
* update alert-banner markdown
* messaging change for upload alert banner
* Update ui/app/templates/components/raft-storage-restore.hbs
Co-Authored-By: Joshua Ogle <joshua@joshuaogle.com>
* more comments
* actually render the label if passed and update stories with knobs
Seal keys can be rotated. When this happens, the barrier and recovery
keys should be re-encrypted with the new seal key. This change
automatically re-encrypts the barrier and recovery keys with the latest
seal key on the active node during the 'postUnseal' phase.
* sys: add host-info endpoint, add client API method
* remove old commented handler
* add http tests, fix bugs
* query all partitions for disk usage
* fix Timestamp decoding
* add comments for clarification
* dont append a nil entry on disk usage query error
* remove HostInfo from the sdk api
We can use Logical().Read(...) to query this endpoint since the payload is contained with the data object. All warnings are preserved under Secret.Warnings.
* ensure that we're testing failure case against a standby node
* add and use TestWaitStandby to ensure core is on standby
* remove TestWaitStandby
* respond with local-only error
* move HostInfo into its own helper package
* fix imports; use new no-forward handler
* add cpu times to collection
* emit clearer multierrors/warnings by collection type
* add comments on HostInfo fields
The OIDC Discovery standard requires the response_types_supported field
to be returned in the .well-known/openid-configuration response.
Also, the AWS IAM OIDC consumer won't accept Vault as an identity
provider without this field.
Based on examples in the OIDC Core documentation, it appears Vault
supports only the `id_token` flow, and thus that is the only value that
makes sense to be set in this field. See:
https://openid.net/specs/openid-connect-core-1_0.html#AuthorizationExamples
* sys/pprof: add pprof routes to the system backend
* sys/pprof: add pprof paths to handler with local-only check
* fix trailing slash on pprof index endpoint
* use new no-forward handler on pprof
* go mod tidy
* add pprof external tests
* disallow streaming requests to exceed DefaultMaxRequestDuration
* add max request duration test
This allows logical operations (along with a non-nil response writer) to
process http handler funcs within the operation function while keeping
auth and audit checks that the logical request flow provides.
* Move SudoPrivilege out of SystemView
We only use this in token store and it literally doesn't work anything
that isn't the token store or system mount, so we should stop exposing
something that doesn't work.
* Reconcile extended system view with sdk/logical a bit and put an explanation for why SudoPrivilege isn't moved over
Generalization of the PhysicalFactory notion introduced by Raft, so it can be used by other storage backends in tests. These are the OSS changes needed for my rework of the ent integ tests and cluster helpers.
* Fix create token sudo non-root namespace check
* Moved path trimming to SudoPrivilege
* Changed to tokenCtx instead of request ctx
* Use root context for AllowOperation; details in comment
* also flush nilNamespace when a namespace is flushed
* adds test cases with nilNamespace.ID
* adds a test case
* adds a test for oidcCache.Flush
* fixed a typo in an error message
There are a few different things happening in this change. First, some code that previously lived in enterprise has moved here: this includes some helper code for manipulating clusters and for building storage backends. Second, the existing cluster-building code using inmem storage has been generalized to allow various storage backends. Third, added support for creating two-cluster DR setups. Finally, there are tweaks to handle edge cases that
result in intermittent failures, or to eliminate sleeps in favour of polling to detect state changes.
Also: generalize TestClusterOptions.PhysicalFactory so it can be used either
as a per-core factory (for raft) or a per-cluster factory (for other
storage backends.)
* Add maximum amount of random characters requested at any given time
* Readability changes
* Removing sys/tools/random from the default policy
* Setting the maxBytes value as const
* Declaring maxBytes in the package to use it everywhere
* Using maxBytes in the error message
This eliminates a bunch of noise at the end of a test's logs, making it
easier to diagnose test failures.
Add TestCluster.Logger. This is intendend to be used when test helpers
want to log something. If TestClusterOptions.Logger is non-nil, it will
be used as the cluster logger, otherwise we create one based on test
name. In the absence of CoreConfig.Logger, this logger is also used as
the parent of each core's log. This makes it easy when running parallel
tests to identify which log message came from which test.
* flush identity/oidc cache by namespace
* separates and unit tests the logic that looks for a namespace id within a namespace key
* applies pr feedback
* renames nskeyContainsID to isNamespacedKey
We already have a separate log line if rollback fails. It really fills
up logs to always note when rollback is occurring and it usually isn't
useful for incidents.
Wrapping validation was deferring the function to audit log before
actually checking if we were sealed or standby, and without having the
read lock grabbed.
* upgrade aws roles
* test upgrade aws roles
* Initialize aws credential backend at mount time
* add a TODO
* create end-to-end test for builtin/credential/aws
* fix bug in initializer
* improve comments
* add Initialize() to logical.Backend
* use Initialize() in Core.enableCredentialInternal()
* use InitializeRequest to call Initialize()
* improve unit testing for framework.Backend
* call logical.Backend.Initialize() from all of the places that it needs to be called.
* implement backend.proto changes for logical.Backend.Initialize()
* persist current role storage version when upgrading aws roles
* format comments correctly
* improve comments
* use postUnseal funcs to initialize backends
* simplify test suite
* improve test suite
* simplify logic in aws role upgrade
* simplify aws credential initialization logic
* simplify logic in aws role upgrade
* use the core's activeContext for initialization
* refactor builtin/plugin/Backend
* use a goroutine to upgrade the aws roles
* misc improvements and cleanup
* do not run AWS role upgrade on DR Secondary
* always call logical.Backend.Initialize() when loading a plugin.
* improve comments
* on standbys and DR secondaries we do not want to run any kind of upgrade logic
* fix awsVersion struct
* clarify aws version upgrade
* make the upgrade logic for aws auth more explicit
* aws upgrade is now called from a switch
* fix fallthrough bug
* simplify logic
* simplify logic
* rename things
* introduce currentAwsVersion const to track aws version
* improve comments
* rearrange things once more
* conglomerate things into one function
* stub out aws auth initialize e2e test
* improve aws auth initialize e2e test
* finish aws auth initialize e2e test
* tinker with aws auth initialize e2e test
* tinker with aws auth initialize e2e test
* tinker with aws auth initialize e2e test
* fix typo in test suite
* simplify logic a tad
* rearrange assignment
* Fix a few lifecycle related issues in #7025 (#7075)
* Fix panic when plugin fails to load
* Fix various read only storage errors
A mistake we've seen multiple times in our own plugins and that we've
seen in the GCP plugin now is that control flow (how the code is
structured, helper functions, etc.) can obfuscate whether an error came
from storage or some other Vault-core location (in which case likely it
needs to be a 5XX message) or because of user input (thus 4XX). Error
handling for functions therefore often ends up always treating errors as
either user related or internal.
When the error is logical.ErrReadOnly this means that treating errors as
user errors skips the check that triggers forwarding, instead returning
a read only view error to the user.
While it's obviously more correct to fix that code, it's not always
immediately apparent to reviewers or fixers what the issue is and fixing
it when it's found both requires someone to hit the problem and report
it (thus exposing bugs to users) and selective targeted refactoring that
only helps that one specific case.
If instead we check whether the logical.Response is an error and, if so,
whether it contains the error value, we work around this in all of these
cases automatically. It feels hacky since it's a coding mistake, but
it's one we've made too multiple times, and avoiding bugs altogether is
better for our users.
* storage/raft: When restoring a snapshot preseal first
* best-effort allow standbys to apply the restoreOp before sealing active node
* Don't cache the raft tls key
* Update physical/raft/raft.go
* Move pending raft peers to core
* Fix race on close bool
* Extend the leaderlease time for tests
* Update raft deps
* Fix audit hashing
* Fix race with auditing
* adds allowed_roles field to identity token keys and updates tests
* removed a comment that was redundant
* allowed_roles uses role client_id s instead of role names
* renamed allowed_roles to allowed_clients
* renamed allowed_clients to allowed_clientIDs
* WIP
* Kinda working?
* Handle nil during rotation
* Update discovery document
* WIP
* removes some warning messages and checks on keys when creating a role
* Path issuer ns/specific
* Fix nspath handling
* Update issuer handling
* Add locking around key updates
* Cleanup
* Fix nextRun handling
* saving work
* Include namespace in token
* saving work
* saving work
* happy path
* saving work
* sharing debug msgs
* Merge branch 'master' into refactor_periodic_func_test
# Conflicts:
# vault/identity_store_oidc.go
# vault/identity_store_oidc_test.go
* use MatchingStorageByAPIPath instead of logical.InmemStorage
At the level of role config it doesn't mean anything to use
default-service or default-batch; that's for mount tuning. So disallow
it in tokenutil. This also fixes the fact that the switch statement
wasn't right.
Add support for hashing time.Time within slices, which unbreaks auditing of requests returning the request counters.
Break Hash into struct-specific func like HashAuth, HashRequest. Move all the copying/hashing logic from FormatRequest/FormatResponse into the new Hash* funcs. HashStructure now modifies in place instead of copying.
Instead of returning an error when trying to hash map keys of type time.Time, ignore them, i.e. pass them through unhashed.
Enable auditing on test clusters by default if the caller didn't specify any audit backends. If they do, they're responsible for setting it up.
* adds allowed_roles field to identity token keys and updates tests
* removed a comment that was redundant
* allowed_roles uses role client_id s instead of role names
* renamed allowed_roles to allowed_clients
* renamed allowed_clients to allowed_clientIDs
* removes some warning messages and checks on keys when creating a role
* removes name field being set unneededly
Earlier in tokenutil's dev it seemed like there was no reason to allow
auth plugins to toggle renewability off. However, it turns out Centrify
makes use of this for sensible reasons. As a result, move the forcing-on
of renewability into tokenutil, but then allow overriding after
PopulateTokenAuth is called.
* Implemented token backend support for identity
* Fixed tests
* Refactored a few checks for the token entity overwrite. Fixed tests.
* Moved entity alias check up so that the entity and entity alias is only created when it has been specified in allowed_entity_aliases list
* go mod vendor
* Added glob pattern
* Optimized allowed entity alias check
* Added test for asterisk only
* Changed to glob pattern anywhere
* Changed response code in case of failure. Changed globbing pattern check. Added docs.
* Added missing token role get parameter. Added more samples
* Fixed failing tests
* Corrected some cosmetical review points
* Changed response code for invalid provided entity alias
* Fixed minor things
* Fixed failing test
If only a non-_token field is provided we don't want to clear out the
Token version of the params, we want to set both. Otherwise we can't
rely on using the Token version of the parameter when creating the Auth
struct.
* Add OIDC token generation to Identity
There are a few open TODOs and some remaining cleanup, but this is
functionally complete and ready for review.
(Tests will being added soon.)
* Simplified key update endpoint
* Cache the config
* Fix Issuer handling
* Suppose base64-encoded templates (#6919)
* Cache JWKS and switch to go-cache (#6918)
* Address review comments
* Add warning if neither Issue nor api_addr are set
* adds tests (#6937)
* adds help synopsis and descriptions to the framework path for the oid… (#6930)
* adds help synopsis and descriptions to the framework path for the oidc backend
* Update vault/identity_store_oidc.go
Co-Authored-By: Jim Kalafut <jim@kalafut.net>
* Add Now parameter to PopulateStringInput
* Addressing review comments
* Refactor template processing to improve mode-specific handling
* adds a test for the periodic func (#6943)
* adds a test for the periodic func
* removes commented out code
* adds a comment
* Add comments
* Work on raft backend
* Add logstore locally
* Add encryptor and unsealable interfaces
* Add clustering support to raft
* Remove client and handler
* Bootstrap raft on init
* Cleanup raft logic a bit
* More raft work
* Work on TLS config
* More work on bootstrapping
* Fix build
* More work on bootstrapping
* More bootstrapping work
* fix build
* Remove consul dep
* Fix build
* merged oss/master into raft-storage
* Work on bootstrapping
* Get bootstrapping to work
* Clean up FMS and node-id
* Update local node ID logic
* Cleanup node-id change
* Work on snapshotting
* Raft: Add remove peer API (#906)
* Add remove peer API
* Add some comments
* Fix existing snapshotting (#909)
* Raft get peers API (#912)
* Read raft configuration
* address review feedback
* Use the Leadership Transfer API to step-down the active node (#918)
* Raft join and unseal using Shamir keys (#917)
* Raft join using shamir
* Store AEAD instead of master key
* Split the raft join process to answer the challenge after a successful unseal
* get the follower to standby state
* Make unseal work
* minor changes
* Some input checks
* reuse the shamir seal access instead of new default seal access
* refactor joinRaftSendAnswer function
* Synchronously send answer in auto-unseal case
* Address review feedback
* Raft snapshots (#910)
* Fix existing snapshotting
* implement the noop snapshotting
* Add comments and switch log libraries
* add some snapshot tests
* add snapshot test file
* add TODO
* More work on raft snapshotting
* progress on the ConfigStore strategy
* Don't use two buckets
* Update the snapshot store logic to hide the file logic
* Add more backend tests
* Cleanup code a bit
* [WIP] Raft recovery (#938)
* Add recovery functionality
* remove fmt.Printfs
* Fix a few fsm bugs
* Add max size value for raft backend (#942)
* Add max size value for raft backend
* Include physical.ErrValueTooLarge in the message
* Raft snapshot Take/Restore API (#926)
* Inital work on raft snapshot APIs
* Always redirect snapshot install/download requests
* More work on the snapshot APIs
* Cleanup code a bit
* On restore handle special cases
* Use the seal to encrypt the sha sum file
* Add sealer mechanism and fix some bugs
* Call restore while state lock is held
* Send restore cb trigger through raft log
* Make error messages nicer
* Add test helpers
* Add snapshot test
* Add shamir unseal test
* Add more raft snapshot API tests
* Fix locking
* Change working to initalize
* Add underlying raw object to test cluster core
* Move leaderUUID to core
* Add raft TLS rotation logic (#950)
* Add TLS rotation logic
* Cleanup logic a bit
* Add/Remove from follower state on add/remove peer
* add comments
* Update more comments
* Update request_forwarding_service.proto
* Make sure we populate all nodes in the followerstate obj
* Update times
* Apply review feedback
* Add more raft config setting (#947)
* Add performance config setting
* Add more config options and fix tests
* Test Raft Recovery (#944)
* Test raft recovery
* Leave out a node during recovery
* remove unused struct
* Update physical/raft/snapshot_test.go
* Update physical/raft/snapshot_test.go
* fix vendoring
* Switch to new raft interface
* Remove unused files
* Switch a gogo -> proto instance
* Remove unneeded vault dep in go.sum
* Update helper/testhelpers/testhelpers.go
Co-Authored-By: Calvin Leung Huang <cleung2010@gmail.com>
* Update vault/cluster/cluster.go
* track active key within the keyring itself (#6915)
* track active key within the keyring itself
* lookup and store using the active key ID
* update docstring
* minor refactor
* Small text fixes (#6912)
* Update physical/raft/raft.go
Co-Authored-By: Calvin Leung Huang <cleung2010@gmail.com>
* review feedback
* Move raft logical system into separate file
* Update help text a bit
* Enforce cluster addr is set and use it for raft bootstrapping
* Fix tests
* fix http test panic
* Pull in latest raft-snapshot library
* Add comment
* Add priority queue to sdk
* fix issue of storing pointers and now copy
* update to use copy structure
* Remove file, put Item struct def. into other file
* add link
* clean up docs
* refactor internal data structure to hide heap method implementations. Other cleanup after feedback
* rename PushItem and PopItem to just Push/Pop, after encapsulating the heap methods
* updates after feedback
* refactoring/renaming
* guard against pushing a nil item
* minor updates after feedback
* Add SetCredentials, GenerateCredentials gRPC methods to combined database backend gPRC
* Initial Combined database backend implementation of static accounts and automatic rotation
* vendor updates
* initial implementation of static accounts with Combined database backend, starting with PostgreSQL implementation
* add lock and setup of rotation queue
* vendor the queue
* rebase on new method signature of queue
* remove mongo tests for now
* update default role sql
* gofmt after rebase
* cleanup after rebasing to remove checks for ErrNotFound error
* rebase cdcr-priority-queue
* vendor dependencies with 'go mod vendor'
* website database docs for Static Role support
* document the rotate-role API endpoint
* postgres specific static role docs
* use constants for paths
* updates from review
* remove dead code
* combine and clarify error message for older plugins
* Update builtin/logical/database/backend.go
Co-Authored-By: Jim Kalafut <jim@kalafut.net>
* cleanups from feedback
* code and comment cleanups
* move db.RLock higher to protect db.GenerateCredentials call
* Return output with WALID if we failed to delete the WAL
* Update builtin/logical/database/path_creds_create.go
Co-Authored-By: Jim Kalafut <jim@kalafut.net>
* updates after running 'make fmt'
* update after running 'make proto'
* Update builtin/logical/database/path_roles.go
Co-Authored-By: Brian Kassouf <briankassouf@users.noreply.github.com>
* Update builtin/logical/database/path_roles.go
Co-Authored-By: Brian Kassouf <briankassouf@users.noreply.github.com>
* update comment and remove and rearrange some dead code
* Update website/source/api/secret/databases/index.html.md
Co-Authored-By: Jim Kalafut <jim@kalafut.net>
* cleanups after review
* Update sdk/database/dbplugin/grpc_transport.go
Co-Authored-By: Brian Kassouf <briankassouf@users.noreply.github.com>
* code cleanup after feedback
* remove PasswordLastSet; it's not used
* document GenerateCredentials and SetCredentials
* Update builtin/logical/database/path_rotate_credentials.go
Co-Authored-By: Brian Kassouf <briankassouf@users.noreply.github.com>
* wrap pop and popbykey in backend methods to protect against nil cred rotation queue
* use strings.HasPrefix instead of direct equality check for path
* Forgot to commit this
* updates after feedback
* re-purpose an outdated test to now check that static and dynamic roles cannot share a name
* check for unique name across dynamic and static roles
* refactor loadStaticWALs to return a map of name/setCredentialsWAL struct to consolidate where we're calling set credentials
* remove commented out code
* refactor to have loadstaticwals filter out wals for roles that no longer exist
* return error if nil input given
* add nil check for input into setStaticAccount
* Update builtin/logical/database/path_roles.go
Co-Authored-By: Brian Kassouf <briankassouf@users.noreply.github.com>
* add constant for queue tick time in seconds, used for comparrison in updates
* Update builtin/logical/database/path_roles.go
Co-Authored-By: Jim Kalafut <jim@kalafut.net>
* code cleanup after review
* remove misplaced code comment
* remove commented out code
* create a queue in the Factory method, even if it's never used
* update path_roles to use a common set of fields, with specific overrides for dynamic/static roles by type
* document new method
* move rotation things into a specific file
* rename test file and consolidate some static account tests
* Update builtin/logical/database/path_roles.go
Co-Authored-By: Brian Kassouf <briankassouf@users.noreply.github.com>
* Update builtin/logical/database/rotation.go
Co-Authored-By: Brian Kassouf <briankassouf@users.noreply.github.com>
* Update builtin/logical/database/rotation.go
Co-Authored-By: Brian Kassouf <briankassouf@users.noreply.github.com>
* Update builtin/logical/database/rotation.go
Co-Authored-By: Brian Kassouf <briankassouf@users.noreply.github.com>
* Update builtin/logical/database/rotation.go
Co-Authored-By: Brian Kassouf <briankassouf@users.noreply.github.com>
* Update builtin/logical/database/rotation.go
Co-Authored-By: Brian Kassouf <briankassouf@users.noreply.github.com>
* update code comments, method names, and move more methods into rotation.go
* update comments to be capitalized
* remove the item from the queue before we try to destroy it
* findStaticWAL returns an error
* use lowercase keys when encoding WAL entries
* small cleanups
* remove vestigial static account check
* remove redundant DeleteWAL call in populate queue
* if we error on loading role, push back to queue with 10 second backoff
* poll in initqueue to make sure the backend is setup and can write/delete data
* add revoke_user_on_delete flag to allow users to opt-in to revoking the static database user on delete of the Vault role. Default false
* add code comments on read-only loop
* code comment updates
* re-push if error returned from find static wal
* add locksutil and acquire locks when pop'ing from the queue
* grab exclusive locks for updating static roles
* Add SetCredentials and GenerateCredentials stubs to mockPlugin
* add a switch in initQueue to listen for cancelation
* remove guard on zero time, it should have no affect
* create a new context in Factory to pass on and use for closing the backend queue
* restore master copy of vendor dir
* Fix a deadlock if a panic happens during request handling
During request handling, if a panic is created, deferred functions are
run but otherwise execution stops. #5889 changed some locks to
non-defers but had the side effect of causing the read lock to not be
released if the request panicked. This fixes that and addresses a few
other potential places where things could go wrong:
1) In sealInitCommon we always now defer a function that unlocks the
read lock if it hasn't been unlocked already
2) In StepDown we defer the RUnlock but we also had two error cases that
were calling it manually. These are unlikely to be hit but if they were
I believe would cause a panic.
* Add panic recovery test
This allows us to truly delete policies when we've either invalidated it
(which since they're singletons/default should only happen when we're
doing a namespace delete) or are doing a namespace delete on the local
node.
When unmounting, the router entry would be tainted, preventing routing.
However, we would then unmount the router before clearing storage, so if
an error occurred the router would have forgotten the path. For auth
mounts this isn't a problem since they had a secondary check, but
regular mounts didn't (not sure why, but this is true back to at least
0.2.0). This meant you could then create a duplicate mount using the
same path which would then not conflict in the router until postUnseal.
This adds the extra check to regular mounts, and also moves the location
of the router unmount.
This also ensures that on the next router.Mount, tainted is set to the
mount entry's tainted status.
Fixes#6769
Move audit.LogInput to sdk/logical. Allow the Data values in audited
logical.Request and Response to implement OptMarshaler, in which case
we delegate hashing/serializing responsibility to them. Add new
ClientCertificateSerialNumber audit request field.
SystemView can now be cast to ExtendedSystemView to expose the Auditor
interface, which allows submitting requests and responses to the audit
broker.
* Port over some SP v2 bits
Specifically:
* Add too-large handling to Physical (Consul only for now)
* Contextify some identity funcs
* Update SP protos
* Add size limiting to inmem storage
These paths are handled directly in handler.go, but the list of special
paths here impacts the x-vault-unauthenticated field in generated
OpenAPI.
Fixes: #6651
Merge both functions for creating mongodb containers into one.
Add retries to docker container cleanups.
Require $VAULT_ACC be set to enable AWS tests.
* Don't bother trying to save metrics when we don't have a barrier. Use stateLock.
* Use c.barrier instead of c.systemBarrierView, thus we don't need locking
and don't need to worry about race with mount setup.
* Remove unneccessary lock.
* Fix a locking issue in the Rollback manager
* Update rollback.go
* Update rollback.go
* move state creation
* Update vault/rollback.go
Co-Authored-By: briankassouf <briankassouf@users.noreply.github.com>
* Simplify logic by canceling the lock grab
* Use context instead of a chan
* Update vault/rollback.go
* fixing dockertest to run on travis
* try a repo local directory
* precreate the directory
* strip extraneous comment
* check directory was created
* try to print container logs
* try writing out client logs
* one last try
* Attempt to fix test
* convert to insecure tls
* strip test-temp