Create global quotas of each type in every NewTestCluster. Also switch some key locks to use DeadlockMutex to make it easier to discover deadlocks in testing.
NewTestCluster also now starts the cluster, and the Start method becomes a no-op. Unless SkipInit is provided, we also wait for a node to become active, eliminating the need for WaitForActiveNode. This was needed because otherwise we can't safely make the quota api call. We can't do it in Start because Start doesn't return an error, and I didn't want to begin storing the testing object T instead TestCluster just so we could call t.Fatal inside Start.
The last change here was to address the problem of how to skip setting up quotas when creating a cluster with a nonstandard handler that might not even implement the quotas endpoint. The challenge is that because we were taking a func pointer to generate the real handler func, we didn't have any way to compare that func pointer to the standard handler-generating func http.Handler without creating a circular dependency between packages vault and http. The solution was to pass a method instead of an anonymous func pointer so that we can do reflection on it.
* Work to unify log-file for agent/server and add rotation
* Updates to rotation code, tried to centralise the log config setup
* logging + tests
* Move LogFile to ShareConfig in test
* Docs
* Rename integation_test.go->integration_test.go
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add ability to fetch container's network addresses
This lets us return the on-network container address, allowing us to
spawn client containers which contact server containers.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add integration tests with nginx, curl, wget, Go
We build new integration tests, spawning a test instance on nginx and
ensuring we can connect with a variety of clients against a variety of
CA and leaf certificate types. This will ultimately let us detect issues
with compatibility as we expand the matrix of supported servers and
clients.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Make runner reference unique
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Attempt to fix CI with longer wait
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Finish moving nginx tests to pkiext package
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* make fmt
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add more debugging, work on CircleCI
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Removes _builtin_ versions from mount storage where it already exists
* Stops new builtin versions being put into storage on mount creation/tuning
* Stops the plugin catalog from returning a builtin plugin that has been overridden, so it more accurately reflects the plugins that are available to actually run
* Started work on adding log-file support to Agent
* Allow log file to be picked up and appended
* Use NewLogFile everywhere
* Tried to pull out the config aggregation from Agent.Run
Co-authored-by: Nick Cabatoff <ncabatoff@hashicorp.com>
Give the default SetupLoginMFATOTP helper a more robust period/skew. 403 failures on test-go-race are likely due to TOTP code timeouts being too aggressive.
When running the test suite in CI (where requests are centralized from
relatively few IPs), we'd occasionally hit Dockerhub's rate limits.
Luckily Hashicorp runs a (limited) public mirror of the containers we
need, so we can switch to them here in the tests.
For consistency between developer and CI, we've opted to have the tests
always pull from the Hashicorp mirror, rather than updating the CI
runner to prefer the mirror.
We exclude nomad and influxdb as we don't presently mirror these repos.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Store login MFA secret with tokenhelper
* Clean up and refactor tokenhelper paths
* Refactor totp test code for re-use
* Add login MFA command tests
* Use longer sleep times and sha512 for totp test
* Add changelog
* Add tests for zlint-clean CA building
This test ensures that we can consistently pass ZLint's CA linting
tests on a root certificate generated by Vault. In particular, nominal
requirements are placed on the structure on the issuer's Subject, which
we supply, and the remaining requirements pass.
The one exception is we include both RFC and CA/BF BR lints in the
default zlint checks; this means ECDSA P-521 (which isn't accepted by
Mozilla's root store policies) is rejected, so we ignore to lints
related to that.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Add options to copy to/from container, fix stopping
Stopping the container takes a bit of time for some unknown reason so
I've instead opted to shorten the sleep in the zlint tests to avoid
consuming resources too long after the test finish.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Make zlint tests execute in parallel
This improves the overall test time of the zlint tests, making the
container build up front once (provisioning zlint), and then copying the
cert into the new container image later.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* make fmt
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Refactor Docker command execution
This refactor will allow others to interact with containers more easily,
providing two interfaces (RunCmdWithOutput and RunCmdInBackground) for
executing commands in running containers if they don't wish to do so
manually.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Allow building containerfiles in tests
By adding image building capabilities to testhelpers (and coupled with
the better command execution support), we can begin to build better,
more reliable integration tests on top of public base images without
needing to maintain separate forks of these images out-of-tree for any
shortcomings they might have.
In particular, rather than doing the rather messy echo hack for writing
clients.conf, it is far better to provision this via a slim
Containerfile overlay on top of the stock jumanjiman/radiusd:latest
image.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Correctly parse stdout/stderr in RunCmdWithOutput
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* ctx -> bCtx for BuildContext
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Update errors to use %w instead of %v
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Add plugin version to GRPC interface
Added a version interface in the sdk/logical so that it can be shared between all plugin types, and then wired it up to RunningVersion in the mounts, auth list, and database systems.
I've tested that this works with auth, database, and secrets plugin types, with the following logic to populate RunningVersion:
If a plugin has a PluginVersion() method implemented, then that is used
If not, and the plugin is built into the Vault binary, then the go.mod version is used
Otherwise, the it will be the empty string.
My apologies for the length of this PR.
* Placeholder backend should be external
We use a placeholder backend (previously a framework.Backend) before a
GRPC plugin is lazy-loaded. This makes us later think the plugin is a
builtin plugin.
So we added a `placeholderBackend` type that overrides the
`IsExternal()` method so that later we know that the plugin is external,
and don't give it a default builtin version.
* Allow exposing access to the underlying container
This exposes the Container response from the Docker API, allowing
consumers of the testhelper to interact with the newly started running
container instance. This will be useful for two reasons:
1. Allowing radiusd container to start its own daemon after modifying
its configuration.
2. For loading certificates into a future similar integration test
using the PKI secrets engine.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* Allow any client to connect to test radiusd daemon
This fixes test failures of the following form:
> 2022-09-07T10:46:19.332-0400 [TRACE] core: adding local paths: paths=[]
> 2022-09-07T10:46:19.333-0400 [INFO] core: enabled credential backend: path=mnt/ type=test
> 2022-09-07T10:46:19.334-0400 [WARN] Executing test step: step_number=1
> 2022-09-07T10:46:19.334-0400 [WARN] Executing test step: step_number=2
> 2022-09-07T10:46:29.334-0400 [WARN] Executing test step: step_number=3
> 2022-09-07T10:46:29.335-0400 [WARN] Executing test step: step_number=4
> 2022-09-07T10:46:39.336-0400 [WARN] Requesting RollbackOperation
> --- FAIL: TestBackend_acceptance (28.56s)
> testing.go:364: Failed step 4: erroneous response:
>
> &logical.Response{Secret:<nil>, Auth:<nil>, Data:map[string]interface {}{"error":"context deadline exceeded"}, Redirect:"", Warnings:[]string(nil), WrapInfo:(*wrapping.ResponseWrapInfo)(nil), Headers:map[string][]string(nil)}
> FAIL
> FAIL github.com/hashicorp/vault/builtin/credential/radius 29.238s
In particular, radiusd container ships with a default clients.conf which
restricts connections to ranges associated with the Docker daemon. When
creating new networks (such as in CircleCI) or when running via Podman
(which has its own set of network ranges), this initial config will no
longer be applicable. We thus need to write a new config into the image;
while we could do this by rebuilding a new image on top of the existing
layers (provisioning our config), we then need to manage these changes
and give hooks for the service setup to build it.
Thus, post-startup modification is probably easier to execute in our
case.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* OSS portion of wrapper-v2
* Prefetch barrier type to avoid encountering an error in the simple BarrierType() getter
* Rename the OveriddenType to WrapperType and use it for the barrier type prefetch
* Fix unit test
* storage/raft: Fix cluster init with retry_join
Commit 8db66f4853abce3f432adcf1724b1f237b275415 introduced an error
wherein a join() would return nil (no error) with no information on its
channel if a joining node had been initialized. This was not handled
properly by the caller and resulted in a canceled `retry_join`.
Fix this by handling the `nil` channel respone by treating it as an
error and allowing the existing mechanics to work as intended.
* storage/raft: Improve retry_join go test
* storage/raft: Make VerifyRaftPeers pollable
* storage/raft: Add changelog entry for retry_join fix
* storage/raft: Add description to VerifyRaftPeers
strings.ReplaceAll(s, old, new) is a wrapper function for
strings.Replace(s, old, new, -1). But strings.ReplaceAll is more
readable and removes the hardcoded -1.
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
* Update gopsutil to v3
* Adds v2 field names in host-info response to allow eventual deprecation in favor of v3 field names
* Map v3 to v2 field names to keep host-info api compat
* copy gopsutil license into source
* raft: Ensure init before setting suffrage
As reported in https://hashicorp.atlassian.net/browse/VAULT-6773:
The /sys/storage/raft/join endpoint is intended to be unauthenticated. We rely
on the seal to manage trust.
It’s possible to use multiple join requests to switch nodes from voter to
non-voter. The screenshot shows a 3 node cluster where vault_2 is the leader,
and vault_3 and vault_4 are followers with non-voters set to false. sent two
requests to the raft join endpoint to have vault_3 and vault_4 join the cluster
with non_voters:true.
This commit fixes the issue by delaying the call to SetDesiredSuffrage until after
the initialization check, preventing unauthenticated mangling of voter status.
Tested locally using
https://github.com/hashicorp/vault-tools/blob/main/users/ncabatoff/cluster/raft.sh
and the reproducer outlined in VAULT-6773.
* raft: Return join err on failure
This is necessary to correctly distinguish errors returned from the Join
workflow. Previously, errors were being masked as timeouts.
* raft: Default autopilot parameters in teststorage
Change some defaults so we don't have to pass in parameters or set them
in the originating tests. These storage types are only used in two
places:
1) Raft HA testing
2) Seal migration testing
Both consumers have been tested and pass with this change.
* changelog: Unauthn voter status change bugfix
* VAULT-6613 add DetermineRoleFromLoginRequest function to Core
* Fix body handling
* Role resolution for rate limit quotas
* VAULT-6613 update precedence test
* Add changelog
* VAULT-6614 start of changes for roles in LCQs
* Expiration changes for leases
* Add role information to RequestAuth
* VAULT-6614 Test updates
* VAULT-6614 Add expiration test with roles
* VAULT-6614 fix comment
* VAULT-6614 Protobuf on OSS
* VAULT-6614 Add rlock to determine role code
* VAULT-6614 Try lock instead of rlock
* VAULT-6614 back to rlock while I think about this more
* VAULT-6614 Additional safety for nil dereference
* VAULT-6614 Use %q over %s
* VAULT-6614 Add overloading to plugin backends
* VAULT-6614 RLocks instead
* VAULT-6614 Fix return for backend factory
Add database plugin metrics around connections
This is a replacement for #15923 that takes into account recent lock
cleanup.
I went ahead and added back in the hanging plugin test, which I meant to
add in #15944 but forgot.
I tested this by spinning up a statsd sink in the tests and verifying I
got a stream of metrics:
```
$ nc -u -l 8125 | grep backend
test.swenson-Q9Q0L72D39.secrets.database.backend.connections.count.pgx.5.:1.000000|g
test.swenson-Q9Q0L72D39.secrets.database.backend.connections.count.pgx.5.:0.000000|g
test.swenson-Q9Q0L72D39.secrets.database.backend.connections.count.pgx.5.:1.000000|g
test.swenson-Q9Q0L72D39.secrets.database.backend.connections.count.pgx.5.:0.000000|g
```
We have to rework the shared gauge code to work without a full
`ClusterMetricSink`, since we don't have access to the core metrics from
within a plugin.
This only reports metrics every 10 minutes by default, but it solves
some problems we would have had with the gauge values becoming stale and
needing to be re-sent.
Co-authored-by: Tom Proctor <tomhjp@users.noreply.github.com>
- When an end-user specifies the all source type within
transit/random and sys/tools/random, only use the additional source
if we are actually configured with an external entropy source
* WIP replacing lib/pq
* change timezome param to be URI format
* add changelog
* add changelog for redshift
* update changelog
* add test for DSN style connection string
* more parseurl and quoteidentify to sdk; include copyright and license
* call dbutil.ParseURL instead, fix import ordering
Co-authored-by: Calvin Leung Huang <1883212+calvn@users.noreply.github.com>
This requires bumping https://github.com/mitchellh/go-testing-interface.
For this new version, we have to create a wrapper to convert
the stdlib `testing.TB` interface to the
`mitchellh/go-testing-interface` `T` interface, since it uses
`Parallel()` now, which is not supported by `testing.TB`. This had to be
added to a new package, `benchhelpers`, to avoid a circular dependency
in `testhelpers`.
We also have to *unbump* https://github.com/armon/go-metrics since
updating it breaks our usage of
https://github.com/google/go-metrics-stackdriver
I verified that the new `pkiCert` template function works with agent
injection using annotations like:
```yaml
vault.hashicorp.com/agent-inject-secret-sample.crt: "pki/issue/example-dot-com"
vault.hashicorp.com/agent-inject-template-sample.crt: |
{{ pkiCert "pki/issue/example-dot-com" "common_name=foo.example.com" "ttl=1h" }}
```
* Allow callers to choose the entropy source for the random endpoints
* Put source in the URL for sys as well
* changelog
* docs
* Fix unit tests, and add coverage
* refactor to use a single common implementation
* Update documentation
* one more tweak
* more cleanup
* Readd lost test expected code
* fmt
* VAULT-5422: Add rate limit for TOTP passcode attempts
* fixing the docs
* CL
* feedback
* Additional info in doc
* rate limit is done per entity per methodID
* refactoring a test
* rate limit OSS work for policy MFA
* adding max_validation_attempts to TOTP config
* feedback
* checking for non-nil reference
The URL password redaction operation did not handle the case where the
database connection URL was provided as a percent-encoded string, and
its password component contained reserved characters. It attempted to
redact the password by replacing the unescaped password in the
percent-encoded URL. This resulted in the password being revealed when
reading the configuration from Vault.
Unlike fips_140_3, fips will be a (FIPS) version-agnostic build tag.
The listener support will remain in 140-3 only, but the IsFIPS() check
should apply regardless of FIPS version.
We add two FIPS-only build files which validate the constraints of FIPS
builds here: fips must be specified with either fips_140_2 or fips_140_3
build tags, and fips and cgo must also be specified together.
Additionally, using only a version-specific FIPS build tag without the
version-agnostic FIPS tag should be a failure.
Signed-off-by: Alexander Scheel <alex.scheel@hashicorp.com>
* remove mount accessor from MFA config
* Update login_mfa_duo_test.go
* DUO test with entity templating
* using identitytpl.PopulateString to perform templating
* minor refactoring
* fixing fmt failures in CI
* change username format to username template
* fixing username_template example
* Add support for PROXY protocol v2 in TCP listener
I did not find tests for this so I added one trying to cover different
configurations to make sure I did not break something. As far as I know,
the behavior should be exactly the same as before except for one thing
when proxy_protocol_behavior is set to "deny_unauthorized", unauthorized
requests were previously silently reject because of https://github.com/armon/go-proxyproto/blob/7e956b284f0a/protocol.go#L81-L84
but it will now be logged.
Also fixes https://github.com/hashicorp/vault/issues/9462 by adding
support for `PROXY UNKNOWN` for PROXY protocol v1.
Closes https://github.com/hashicorp/vault/issues/3807
* Add changelog
* Login MFA
* ENT OSS segragation (#14088)
* Delete method id if not used in an MFA enforcement config (#14063)
* Delete an MFA methodID only if it is not used by an MFA enforcement config
* Fixing a bug: mfa/validate is an unauthenticated path, and goes through the handleLoginRequest path
* adding use_passcode field to DUO config (#14059)
* add changelog
* preventing replay attack on MFA passcodes (#14056)
* preventing replay attack on MFA passcodes
* using %w instead of %s for error
* Improve CLI command for login mfa (#14106)
CLI prints a warning message indicating the login request needs to get validated
* adding the validity period of a passcode to error messages (#14115)
* PR feedback
* duo to handle preventing passcode reuse
Co-authored-by: hghaf099 <83242695+hghaf099@users.noreply.github.com>
Co-authored-by: hamid ghaf <hamid@hashicorp.com>
* port SSCT OSS
* port header hmac key to ent and generate token proto without make command
* remove extra nil check in request handling
* add changelog
* add comment to router.go
* change test var to use length constants
* remove local index is 0 check and extra defer which can be removed after use of ExternalID
* adds development workflow to mirage config
* adds mirage handler and factory for mfa workflow
* adds mfa handling to auth service and cluster adapter
* moves auth success logic from form to controller
* adds mfa form component
* shows delayed auth message for all methods
* adds new code delay to mfa form
* adds error views
* fixes merge conflict
* adds integration tests for mfa-form component
* fixes auth tests
* updates mfa response handling to align with backend
* updates mfa-form to handle multiple methods and constraints
* adds noDefault arg to Select component
* updates mirage mfa handler to align with backend and adds generator for various mfa scenarios
* adds tests
* flaky test fix attempt
* reverts test fix attempt
* adds changelog entry
* updates comments for todo items
* removes faker from mfa mirage factory and handler
* adds number to word helper
* fixes tests
* Revert "Merge branch 'main' into ui/mfa"
This reverts commit 8ee6a6aaa1b6c9ec16b985c10d91c3806819ec40, reversing
changes made to 2428dd6cca07bb41cda3f453619646ca3a88bfd0.
* format-ttl helper fix from main
* feat: DB plugin multiplexing (#13734)
* WIP: start from main and get a plugin runner from core
* move MultiplexedClient map to plugin catalog
- call sys.NewPluginClient from PluginFactory
- updates to getPluginClient
- thread through isMetadataMode
* use go-plugin ClientProtocol interface
- call sys.NewPluginClient from dbplugin.NewPluginClient
* move PluginSets to dbplugin package
- export dbplugin HandshakeConfig
- small refactor of PluginCatalog.getPluginClient
* add removeMultiplexedClient; clean up on Close()
- call client.Kill from plugin catalog
- set rpcClient when muxed client exists
* add ID to dbplugin.DatabasePluginClient struct
* only create one plugin process per plugin type
* update NewPluginClient to return connection ID to sdk
- wrap grpc.ClientConn so we can inject the ID into context
- get ID from context on grpc server
* add v6 multiplexing protocol version
* WIP: backwards compat for db plugins
* Ensure locking on plugin catalog access
- Create public GetPluginClient method for plugin catalog
- rename postgres db plugin
* use the New constructor for db plugins
* grpc server: use write lock for Close and rlock for CRUD
* cleanup MultiplexedClients on Close
* remove TODO
* fix multiplexing regression with grpc server connection
* cleanup grpc server instances on close
* embed ClientProtocol in Multiplexer interface
* use PluginClientConfig arg to make NewPluginClient plugin type agnostic
* create a new plugin process for non-muxed plugins
* feat: plugin multiplexing: handle plugin client cleanup (#13896)
* use closure for plugin client cleanup
* log and return errors; add comments
* move rpcClient wrapping to core for ID injection
* refactor core plugin client and sdk
* remove unused ID method
* refactor and only wrap clientConn on multiplexed plugins
* rename structs and do not export types
* Slight refactor of system view interface
* Revert "Slight refactor of system view interface"
This reverts commit 73d420e5cd2f0415e000c5a9284ea72a58016dd6.
* Revert "Revert "Slight refactor of system view interface""
This reverts commit f75527008a1db06d04a23e04c3059674be8adb5f.
* only provide pluginRunner arg to the internal newPluginClient method
* embed ClientProtocol in pluginClient and name logger
* Add back MLock support
* remove enableMlock arg from setupPluginCatalog
* rename plugin util interface to PluginClient
Co-authored-by: Brian Kassouf <bkassouf@hashicorp.com>
* feature: multiplexing: fix unit tests (#14007)
* fix grpc_server tests and add coverage
* update run_config tests
* add happy path test case for grpc_server ID from context
* update test helpers
* feat: multiplexing: handle v5 plugin compiled with new sdk
* add mux supported flag and increase test coverage
* set multiplexingSupport field in plugin server
* remove multiplexingSupport field in sdk
* revert postgres to non-multiplexed
* add comments on grpc server fields
* use pointer receiver on grpc server methods
* add changelog
* use pointer for grpcserver instance
* Use a gRPC server to determine if a plugin should be multiplexed
* Apply suggestions from code review
Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>
* add lock to removePluginClient
* add multiplexingSupport field to externalPlugin struct
* do not send nil to grpc MultiplexingSupport
* check err before logging
* handle locking scenario for cleanupFunc
* allow ServeConfigMultiplex to dispense v5 plugin
* reposition structs, add err check and comments
* add comment on locking for cleanupExternalPlugin
Co-authored-by: Brian Kassouf <bkassouf@hashicorp.com>
Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>
* Fix upndomain bug causing alias name to change
* Fix nil map
* Add changelog
* revert
* Update changelog
* Add test for alias metadata name
* Fix code comment
* Warn user supplying nonce values in FIPS mode for transit encryption requests
- Send back a warning within the response if an end-user supplies nonce
values that we use within the various transit encrypt apis.
- We do not send a warning if an end-user supplies a nonce value but we
don't use it.
- Affected api methods are encrypt, rewrap and datakey
- The warning is only sent when we are operating in FIPS mode.
- Add a 'Connect Timeout' query parameter to the test helper to set
a timeout value of 30 seconds in an attempt to address the following
failure we see at times in TestDeleteUser and TestUpdateUser
mssql_test.go:253: Failed to initialize: error verifying connection: TLS Handshake failed: cannot read handshake packet: EOF
* Update Go client libraries for etcd
* Added etcd server container to run etcd3 tests automatically.
* Removed etcd2 test case: it fails the backend tests but the failure is
unrelated to the uplift. The etcd2 backend implementation does not
remove empty nested nodes when removing leaf (see comments in #11980).
* prelim fairshare prototype, untested and prototype status
* add tests for new fairshare infra - this likely fails tests for being racy
* probably fix races for code and test
* one more lock to fix for races
* fairsharing queue work distribution, tests, fixes, etc
* comment, shorten wait time
* typos and comments
* fix inverted worker count logic
* Update helper/fairshare/jobmanager.go
typo
* Update helper/fairshare/jobmanager.go
clarify comment
* move back to round robin between queues
* improvements from self review
* add job manager stress test
* Refactor TLS parsing
The ParsePEMBundle and ParsePKIJSON functions in the certutil package assumes
both a client certificate and a custom CA are specified. Cassandra needs to
allow for either a client certificate, a custom CA, or both. This revamps the
parsing of pem_json and pem_bundle to accomodate for any of these configurations