* OSS portion of wrapper-v2
* Prefetch barrier type to avoid encountering an error in the simple BarrierType() getter
* Rename the OveriddenType to WrapperType and use it for the barrier type prefetch
* Fix unit test
* storage/raft: Fix cluster init with retry_join
Commit 8db66f4853abce3f432adcf1724b1f237b275415 introduced an error
wherein a join() would return nil (no error) with no information on its
channel if a joining node had been initialized. This was not handled
properly by the caller and resulted in a canceled `retry_join`.
Fix this by handling the `nil` channel respone by treating it as an
error and allowing the existing mechanics to work as intended.
* storage/raft: Improve retry_join go test
* storage/raft: Make VerifyRaftPeers pollable
* storage/raft: Add changelog entry for retry_join fix
* storage/raft: Add description to VerifyRaftPeers
strings.ReplaceAll(s, old, new) is a wrapper function for
strings.Replace(s, old, new, -1). But strings.ReplaceAll is more
readable and removes the hardcoded -1.
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
* Update gopsutil to v3
* Adds v2 field names in host-info response to allow eventual deprecation in favor of v3 field names
* Map v3 to v2 field names to keep host-info api compat
* copy gopsutil license into source
* raft: Ensure init before setting suffrage
As reported in https://hashicorp.atlassian.net/browse/VAULT-6773:
The /sys/storage/raft/join endpoint is intended to be unauthenticated. We rely
on the seal to manage trust.
It’s possible to use multiple join requests to switch nodes from voter to
non-voter. The screenshot shows a 3 node cluster where vault_2 is the leader,
and vault_3 and vault_4 are followers with non-voters set to false. sent two
requests to the raft join endpoint to have vault_3 and vault_4 join the cluster
with non_voters:true.
This commit fixes the issue by delaying the call to SetDesiredSuffrage until after
the initialization check, preventing unauthenticated mangling of voter status.
Tested locally using
https://github.com/hashicorp/vault-tools/blob/main/users/ncabatoff/cluster/raft.sh
and the reproducer outlined in VAULT-6773.
* raft: Return join err on failure
This is necessary to correctly distinguish errors returned from the Join
workflow. Previously, errors were being masked as timeouts.
* raft: Default autopilot parameters in teststorage
Change some defaults so we don't have to pass in parameters or set them
in the originating tests. These storage types are only used in two
places:
1) Raft HA testing
2) Seal migration testing
Both consumers have been tested and pass with this change.
* changelog: Unauthn voter status change bugfix
* VAULT-6613 add DetermineRoleFromLoginRequest function to Core
* Fix body handling
* Role resolution for rate limit quotas
* VAULT-6613 update precedence test
* Add changelog
* VAULT-6614 start of changes for roles in LCQs
* Expiration changes for leases
* Add role information to RequestAuth
* VAULT-6614 Test updates
* VAULT-6614 Add expiration test with roles
* VAULT-6614 fix comment
* VAULT-6614 Protobuf on OSS
* VAULT-6614 Add rlock to determine role code
* VAULT-6614 Try lock instead of rlock
* VAULT-6614 back to rlock while I think about this more
* VAULT-6614 Additional safety for nil dereference
* VAULT-6614 Use %q over %s
* VAULT-6614 Add overloading to plugin backends
* VAULT-6614 RLocks instead
* VAULT-6614 Fix return for backend factory
Add database plugin metrics around connections
This is a replacement for #15923 that takes into account recent lock
cleanup.
I went ahead and added back in the hanging plugin test, which I meant to
add in #15944 but forgot.
I tested this by spinning up a statsd sink in the tests and verifying I
got a stream of metrics:
```
$ nc -u -l 8125 | grep backend
test.swenson-Q9Q0L72D39.secrets.database.backend.connections.count.pgx.5.:1.000000|g
test.swenson-Q9Q0L72D39.secrets.database.backend.connections.count.pgx.5.:0.000000|g
test.swenson-Q9Q0L72D39.secrets.database.backend.connections.count.pgx.5.:1.000000|g
test.swenson-Q9Q0L72D39.secrets.database.backend.connections.count.pgx.5.:0.000000|g
```
We have to rework the shared gauge code to work without a full
`ClusterMetricSink`, since we don't have access to the core metrics from
within a plugin.
This only reports metrics every 10 minutes by default, but it solves
some problems we would have had with the gauge values becoming stale and
needing to be re-sent.
Co-authored-by: Tom Proctor <tomhjp@users.noreply.github.com>
- When an end-user specifies the all source type within
transit/random and sys/tools/random, only use the additional source
if we are actually configured with an external entropy source
* WIP replacing lib/pq
* change timezome param to be URI format
* add changelog
* add changelog for redshift
* update changelog
* add test for DSN style connection string
* more parseurl and quoteidentify to sdk; include copyright and license
* call dbutil.ParseURL instead, fix import ordering
Co-authored-by: Calvin Leung Huang <1883212+calvn@users.noreply.github.com>
This requires bumping https://github.com/mitchellh/go-testing-interface.
For this new version, we have to create a wrapper to convert
the stdlib `testing.TB` interface to the
`mitchellh/go-testing-interface` `T` interface, since it uses
`Parallel()` now, which is not supported by `testing.TB`. This had to be
added to a new package, `benchhelpers`, to avoid a circular dependency
in `testhelpers`.
We also have to *unbump* https://github.com/armon/go-metrics since
updating it breaks our usage of
https://github.com/google/go-metrics-stackdriver
I verified that the new `pkiCert` template function works with agent
injection using annotations like:
```yaml
vault.hashicorp.com/agent-inject-secret-sample.crt: "pki/issue/example-dot-com"
vault.hashicorp.com/agent-inject-template-sample.crt: |
{{ pkiCert "pki/issue/example-dot-com" "common_name=foo.example.com" "ttl=1h" }}
```
* Allow callers to choose the entropy source for the random endpoints
* Put source in the URL for sys as well
* changelog
* docs
* Fix unit tests, and add coverage
* refactor to use a single common implementation
* Update documentation
* one more tweak
* more cleanup
* Readd lost test expected code
* fmt
* VAULT-5422: Add rate limit for TOTP passcode attempts
* fixing the docs
* CL
* feedback
* Additional info in doc
* rate limit is done per entity per methodID
* refactoring a test
* rate limit OSS work for policy MFA
* adding max_validation_attempts to TOTP config
* feedback
* checking for non-nil reference
The URL password redaction operation did not handle the case where the
database connection URL was provided as a percent-encoded string, and
its password component contained reserved characters. It attempted to
redact the password by replacing the unescaped password in the
percent-encoded URL. This resulted in the password being revealed when
reading the configuration from Vault.
Unlike fips_140_3, fips will be a (FIPS) version-agnostic build tag.
The listener support will remain in 140-3 only, but the IsFIPS() check
should apply regardless of FIPS version.
We add two FIPS-only build files which validate the constraints of FIPS
builds here: fips must be specified with either fips_140_2 or fips_140_3
build tags, and fips and cgo must also be specified together.
Additionally, using only a version-specific FIPS build tag without the
version-agnostic FIPS tag should be a failure.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* remove mount accessor from MFA config
* Update login_mfa_duo_test.go
* DUO test with entity templating
* using identitytpl.PopulateString to perform templating
* minor refactoring
* fixing fmt failures in CI
* change username format to username template
* fixing username_template example
* Add support for PROXY protocol v2 in TCP listener
I did not find tests for this so I added one trying to cover different
configurations to make sure I did not break something. As far as I know,
the behavior should be exactly the same as before except for one thing
when proxy_protocol_behavior is set to "deny_unauthorized", unauthorized
requests were previously silently reject because of https://github.com/armon/go-proxyproto/blob/7e956b284f0a/protocol.go#L81-L84
but it will now be logged.
Also fixes https://github.com/hashicorp/vault/issues/9462 by adding
support for `PROXY UNKNOWN` for PROXY protocol v1.
Closes https://github.com/hashicorp/vault/issues/3807
* Add changelog
* Login MFA
* ENT OSS segragation (#14088)
* Delete method id if not used in an MFA enforcement config (#14063)
* Delete an MFA methodID only if it is not used by an MFA enforcement config
* Fixing a bug: mfa/validate is an unauthenticated path, and goes through the handleLoginRequest path
* adding use_passcode field to DUO config (#14059)
* add changelog
* preventing replay attack on MFA passcodes (#14056)
* preventing replay attack on MFA passcodes
* using %w instead of %s for error
* Improve CLI command for login mfa (#14106)
CLI prints a warning message indicating the login request needs to get validated
* adding the validity period of a passcode to error messages (#14115)
* PR feedback
* duo to handle preventing passcode reuse
Co-authored-by: hghaf099 <83242695+hghaf099@users.noreply.github.com>
Co-authored-by: hamid ghaf <hamid@hashicorp.com>
* port SSCT OSS
* port header hmac key to ent and generate token proto without make command
* remove extra nil check in request handling
* add changelog
* add comment to router.go
* change test var to use length constants
* remove local index is 0 check and extra defer which can be removed after use of ExternalID
* adds development workflow to mirage config
* adds mirage handler and factory for mfa workflow
* adds mfa handling to auth service and cluster adapter
* moves auth success logic from form to controller
* adds mfa form component
* shows delayed auth message for all methods
* adds new code delay to mfa form
* adds error views
* fixes merge conflict
* adds integration tests for mfa-form component
* fixes auth tests
* updates mfa response handling to align with backend
* updates mfa-form to handle multiple methods and constraints
* adds noDefault arg to Select component
* updates mirage mfa handler to align with backend and adds generator for various mfa scenarios
* adds tests
* flaky test fix attempt
* reverts test fix attempt
* adds changelog entry
* updates comments for todo items
* removes faker from mfa mirage factory and handler
* adds number to word helper
* fixes tests
* Revert "Merge branch 'main' into ui/mfa"
This reverts commit 8ee6a6aaa1b6c9ec16b985c10d91c3806819ec40, reversing
changes made to 2428dd6cca07bb41cda3f453619646ca3a88bfd0.
* format-ttl helper fix from main
* feat: DB plugin multiplexing (#13734)
* WIP: start from main and get a plugin runner from core
* move MultiplexedClient map to plugin catalog
- call sys.NewPluginClient from PluginFactory
- updates to getPluginClient
- thread through isMetadataMode
* use go-plugin ClientProtocol interface
- call sys.NewPluginClient from dbplugin.NewPluginClient
* move PluginSets to dbplugin package
- export dbplugin HandshakeConfig
- small refactor of PluginCatalog.getPluginClient
* add removeMultiplexedClient; clean up on Close()
- call client.Kill from plugin catalog
- set rpcClient when muxed client exists
* add ID to dbplugin.DatabasePluginClient struct
* only create one plugin process per plugin type
* update NewPluginClient to return connection ID to sdk
- wrap grpc.ClientConn so we can inject the ID into context
- get ID from context on grpc server
* add v6 multiplexing protocol version
* WIP: backwards compat for db plugins
* Ensure locking on plugin catalog access
- Create public GetPluginClient method for plugin catalog
- rename postgres db plugin
* use the New constructor for db plugins
* grpc server: use write lock for Close and rlock for CRUD
* cleanup MultiplexedClients on Close
* remove TODO
* fix multiplexing regression with grpc server connection
* cleanup grpc server instances on close
* embed ClientProtocol in Multiplexer interface
* use PluginClientConfig arg to make NewPluginClient plugin type agnostic
* create a new plugin process for non-muxed plugins
* feat: plugin multiplexing: handle plugin client cleanup (#13896)
* use closure for plugin client cleanup
* log and return errors; add comments
* move rpcClient wrapping to core for ID injection
* refactor core plugin client and sdk
* remove unused ID method
* refactor and only wrap clientConn on multiplexed plugins
* rename structs and do not export types
* Slight refactor of system view interface
* Revert "Slight refactor of system view interface"
This reverts commit 73d420e5cd2f0415e000c5a9284ea72a58016dd6.
* Revert "Revert "Slight refactor of system view interface""
This reverts commit f75527008a1db06d04a23e04c3059674be8adb5f.
* only provide pluginRunner arg to the internal newPluginClient method
* embed ClientProtocol in pluginClient and name logger
* Add back MLock support
* remove enableMlock arg from setupPluginCatalog
* rename plugin util interface to PluginClient
Co-authored-by: Brian Kassouf <bkassouf@hashicorp.com>
* feature: multiplexing: fix unit tests (#14007)
* fix grpc_server tests and add coverage
* update run_config tests
* add happy path test case for grpc_server ID from context
* update test helpers
* feat: multiplexing: handle v5 plugin compiled with new sdk
* add mux supported flag and increase test coverage
* set multiplexingSupport field in plugin server
* remove multiplexingSupport field in sdk
* revert postgres to non-multiplexed
* add comments on grpc server fields
* use pointer receiver on grpc server methods
* add changelog
* use pointer for grpcserver instance
* Use a gRPC server to determine if a plugin should be multiplexed
* Apply suggestions from code review
Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>
* add lock to removePluginClient
* add multiplexingSupport field to externalPlugin struct
* do not send nil to grpc MultiplexingSupport
* check err before logging
* handle locking scenario for cleanupFunc
* allow ServeConfigMultiplex to dispense v5 plugin
* reposition structs, add err check and comments
* add comment on locking for cleanupExternalPlugin
Co-authored-by: Brian Kassouf <bkassouf@hashicorp.com>
Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>
* Fix upndomain bug causing alias name to change
* Fix nil map
* Add changelog
* revert
* Update changelog
* Add test for alias metadata name
* Fix code comment
* Warn user supplying nonce values in FIPS mode for transit encryption requests
- Send back a warning within the response if an end-user supplies nonce
values that we use within the various transit encrypt apis.
- We do not send a warning if an end-user supplies a nonce value but we
don't use it.
- Affected api methods are encrypt, rewrap and datakey
- The warning is only sent when we are operating in FIPS mode.
- Add a 'Connect Timeout' query parameter to the test helper to set
a timeout value of 30 seconds in an attempt to address the following
failure we see at times in TestDeleteUser and TestUpdateUser
mssql_test.go:253: Failed to initialize: error verifying connection: TLS Handshake failed: cannot read handshake packet: EOF
* Update Go client libraries for etcd
* Added etcd server container to run etcd3 tests automatically.
* Removed etcd2 test case: it fails the backend tests but the failure is
unrelated to the uplift. The etcd2 backend implementation does not
remove empty nested nodes when removing leaf (see comments in #11980).
* prelim fairshare prototype, untested and prototype status
* add tests for new fairshare infra - this likely fails tests for being racy
* probably fix races for code and test
* one more lock to fix for races
* fairsharing queue work distribution, tests, fixes, etc
* comment, shorten wait time
* typos and comments
* fix inverted worker count logic
* Update helper/fairshare/jobmanager.go
typo
* Update helper/fairshare/jobmanager.go
clarify comment
* move back to round robin between queues
* improvements from self review
* add job manager stress test
* Refactor TLS parsing
The ParsePEMBundle and ParsePKIJSON functions in the certutil package assumes
both a client certificate and a custom CA are specified. Cassandra needs to
allow for either a client certificate, a custom CA, or both. This revamps the
parsing of pem_json and pem_bundle to accomodate for any of these configurations
* build out zombie lease system
* add typo for CI
* undo test CI commit
* time equality test isn't working on CI, so let's see what this does...
* add unrecoverable proto error, make proto, go mod vendor
* zombify leases if unrecoverable error, tests
* test fix: somehow pointer in pointer rx is null after pointer rx called
* tweaks based on roy feedback
* improve zombie errors
* update which errors are unrecoverable
* combine zombie logic
* keep subset of zombie lease in memory
* k8s doc: update for 0.9.1 and 0.8.0 releases (#10825)
* k8s doc: update for 0.9.1 and 0.8.0 releases
* Update website/content/docs/platform/k8s/helm/configuration.mdx
Co-authored-by: Theron Voran <tvoran@users.noreply.github.com>
Co-authored-by: Theron Voran <tvoran@users.noreply.github.com>
* Autopilot initial commit
* Move autopilot related backend implementations to its own file
* Abstract promoter creation
* Add nil check for health
* Add server state oss no-ops
* Config ext stub for oss
* Make way for non-voters
* s/health/state
* s/ReadReplica/NonVoter
* Add synopsis and description
* Remove struct tags from AutopilotConfig
* Use var for config storage path
* Handle nin-config when reading
* Enable testing autopilot by using inmem cluster
* First passing test
* Only report the server as known if it is present in raft config
* Autopilot defaults to on for all existing and new clusters
* Add locking to some functions
* Persist initial config
* Clarify the command usage doc
* Add health metric for each node
* Fix audit logging issue
* Don't set DisablePerformanceStandby to true in test
* Use node id label for health metric
* Log updates to autopilot config
* Less aggressively consume config loading failures
* Return a mutable config
* Return early from known servers if raft config is unable to be pulled
* Update metrics name
* Reduce log level for potentially noisy log
* Add knob to disable autopilot
* Don't persist if default config is in use
* Autopilot: Dead server cleanup (#10857)
* Dead server cleanup
* Initialize channel in any case
* Fix a bunch of tests
* Fix panic
* Add follower locking in heartbeat tracker
* Add LastContactFailureThreshold to config
* Add log when marking node as dead
* Update follower state locking in heartbeat tracker
* Avoid follower states being nil
* Pull test to its own file
* Add execution status to state response
* Optionally enable autopilot in some tests
* Updates
* Added API function to fetch autopilot configuration
* Add test for default autopilot configuration
* Configuration tests
* Add State API test
* Update test
* Added TestClusterOptions.PhysicalFactoryConfig
* Update locking
* Adjust locking in heartbeat tracker
* s/last_contact_failure_threshold/left_server_last_contact_threshold
* Add disabling autopilot as a core config option
* Disable autopilot in some tests
* s/left_server_last_contact_threshold/dead_server_last_contact_threshold
* Set the lastheartbeat of followers to now when setting up active node
* Don't use config defaults from CLI command
* Remove config file support
* Remove HCL test as well
* Persist only supplied config; merge supplied config with default to operate
* Use pointer to structs for storing follower information
* Test update
* Retrieve non voter status from configbucket and set it up when a node comes up
* Manage desired suffrage
* Consider bucket being created already
* Move desired suffrage to its own entry
* s/DesiredSuffrageKey/LocalNodeConfigKey
* s/witnessSuffrage/recordSuffrage
* Fix test compilation
* Handle local node config post a snapshot install
* Commit to storage first; then record suffrage in fsm
* No need of local node config being nili case, post snapshot restore
* Reconcile autopilot config when a new leader takes over duty
* Grab fsm lock when recording suffrage
* s/Suffrage/DesiredSuffrage in FollowerState
* Instantiate autopilot only in leader
* Default to old ways in more scenarios
* Make API gracefully handle 404
* Address some feedback
* Make IsDead an atomic.Value
* Simplify follower hearbeat tracking
* Use uber.atomic
* Don't have multiple causes for having autopilot disabled
* Don't remove node from follower states if we fail to remove the dead server
* Autopilot server removals map (#11019)
* Don't remove node from follower states if we fail to remove the dead server
* Use map to track dead server removals
* Use lock and map
* Use delegate lock
* Adjust when to remove entry from map
* Only hold the lock while accessing map
* Fix race
* Don't set default min_quorum
* Fix test
* Ensure follower states is not nil before starting autopilot
* Fix race
Co-authored-by: Jason O'Donnell <2160810+jasonodonnell@users.noreply.github.com>
Co-authored-by: Theron Voran <tvoran@users.noreply.github.com>
* upgrade vault dependency set
* etcd and grpc issues:
* better for tests
* testing
* all upgrades for hashicorp deps
* kubernetes plugin upgrade seems to work
* kubernetes plugin upgrade seems to work
* etcd and a bunch of other stuff
* all vulnerable packages upgraded
* k8s is broken in linux env but not locally
* test fixes
* fix testing
* fix etcd and grpc
* fix etcd and grpc
* use master branch of go-testing-interface
* roll back etcd upgrade
* have to fix grpc since other vendors pull in grpc 1.35.0 but we cant due to etcd
* rolling back in the replace directives
* a few more testing dependencies to clean up
* fix go mod vendor
* basic pool and start testing
* refactor a bit for testing
* workFunc, start/stop safety, testing
* cleanup function for worker quit, more tests
* redo public/private members
* improve tests, export types, switch uuid package
* fix loop capture bug, cleanup
* cleanup tests
* update worker pool file name, other improvements
* add job manager prototype
* remove remnants
* add functions to wait for job manager and worker pool to stop, other fixes
* test job manager functionality, fix bugs
* encapsulate how jobs are distributed to workers
* make worker job channel read only
* add job interface, more testing, fixes
* set name for dispatcher
* fix test races
* dispatcher and job manager constructors don't return errors
* logger now dependency injected
* make some members private, test fcn to get worker pool size
* make GetNumWorkers public
* Update helper/fairshare/jobmanager_test.go
Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>
* make workerpool private
* remove custom worker names
* concurrency improvements
* remove worker pool cleanup function
* remove cleanup func from job manager, remove non blocking stop from fairshare
* stop fairshare when started in tests
* stop leaking job manager goroutine
* prototype channel for waking up to assign work
* fix typo/bug and add tests
* improve job manager wake up, fix test typo
* put channel drain back
* better start/pause test for job manager
* go mod vendor
Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>
Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>
* Adding snowflake as a bundled database secrets plugin
* Add snowflake-database-plugin to expected bundled plugins
* Add snowflake plugin name to the mockBuiltinRegistry
* fix racy activity log tests and move testing utilities elsewhere
* remove TODO
* move SetEnable out of activity log
* clarify not waiting on waitgroup
* remove todo