* [QT-602] Run `proxy` and `agent` test scenarios (#23176)
Update our `proxy` and `agent` scenarios to support new variants and
perform baseline verification and their scenario specific verification.
We integrate these updated scenarios into the pipeline by adding them
to artifact samples.
We've also improved the reliability of the `autopilot` and `replication`
scenarios by refactoring our IP address gathering. Previously, we'd ask
vault for the primary IP address and use some Terraform logic to determine
followers. The leader IP address gathering script was also implicitly
responsible for ensuring that a found leader was within a given group of
hosts, and thus waiting for a given cluster to have a leader, and also for
doing some arithmetic and outputting `replication` specific output data.
We've broken these responsibilities into individual modules, improved their
error messages, and fixed various races and bugs, including:
* Fix a race between creating the file audit device and installing and starting
vault in the `replication` scenario.
* Fix how we determine our leader and follower IP addresses. We now query
vault instead of a prior implementation that inferred the followers and sometimes
did not allow all nodes to be an expected leader.
* Fix a bug where we'd always always fail on the first wrong condition
in the `vault_verify_performance_replication` module.
We also performed some maintenance tasks on Enos scenarios byupdating our
references from `oss` to `ce` to handle the naming and license changes. We
also enabled `shellcheck` linting for enos module scripts.
* Rename `oss` to `ce` for license and naming changes.
* Convert template enos scripts to scripts that take environment
variables.
* Add `shellcheck` linting for enos module scripts.
* Add additional `backend` and `seal` support to `proxy` and `agent`
scenarios.
* Update scenarios to include all baseline verification.
* Add `proxy` and `agent` scenarios to artifact samples.
* Remove IP address verification from the `vault_get_cluster_ips`
modules and implement a new `vault_wait_for_leader` module.
* Determine follower IP addresses by querying vault in the
`vault_get_cluster_ips` module.
* Move replication specific behavior out of the `vault_get_cluster_ips`
module and into it's own `replication_data` module.
* Extend initial version support for the `upgrade` and `autopilot`
scenarios.
We also discovered an issue with undo_logs that has been described in
the VAULT-20259. As such, we've disabled the undo_logs check until
it has been fixed.
* actions: fix actionlint error and linting logic (#23305)
Signed-off-by: Ryan Cragun <me@ryan.ec>
Rather than assuming a short sleep will work, we instead wait until netcat is listening of the socket. We've also configured the netcat listener to persist after the first connection, which allows Vault and us to check the connection without the process closing.
As we implemented this we also ran into AWS issues in us-east-1 and us-west-2, so we've changed our deploy regions until those issues are resolved.
Signed-off-by: Ryan Cragun <me@ryan.ec>
Replace our prior implementation of Enos test groups with the new Enos
sampling feature. With this feature we're able to describe which
scenarios and variant combinations are valid for a given artifact and
allow enos to create a valid sample field (a matrix of all compatible
scenarios) and take an observation (select some to run) for us. This
ensures that every valid scenario and variant combination will
now be a candidate for testing in the pipeline. See QT-504[0] for further
details on the Enos sampling capabilities.
Our prior implementation only tested the amd64 and arm64 zip artifacts,
as well as the Docker container. We now include the following new artifacts
in the test matrix:
* CE Amd64 Debian package
* CE Amd64 RPM package
* CE Arm64 Debian package
* CE Arm64 RPM package
Each artifact includes a sample definition for both pre-merge/post-merge
(build) and release testing.
Changes:
* Remove the hand crafted `enos-run-matrices` ci matrix targets and replace
them with per-artifact samples.
* Use enos sampling to generate different sample groups on all pull
requests.
* Update the enos scenario matrices to handle HSM and FIPS packages.
* Simplify enos scenarios by using shared globals instead of
cargo-culted locals.
Note: This will require coordination with vault-enterprise to ensure a
smooth migration to the new system. Integrating new scenarios or
modifying existing scenarios/variants should be much smoother after this
initial migration.
[0] https://github.com/hashicorp/enos/pull/102
Signed-off-by: Ryan Cragun <me@ryan.ec>
Co-authored-by: Ryan Cragun <me@ryan.ec>
The previous strategy for provisioning infrastructure targets was to use
the cheapest instances that could reliably perform as Vault cluster
nodes. With this change we introduce a new model for target node
infrastructure. We've replaced on-demand instances for a spot
fleet. While the spot price fluctuates based on dynamic pricing,
capacity, region, instance type, and platform, cost savings for our
most common combinations range between 20-70%.
This change only includes spot fleet targets for Vault clusters.
We'll be updating our Consul backend bidding in another PR.
* Create a new `vault_cluster` module that handles installation,
configuration, initializing, and unsealing Vault clusters.
* Create a `target_ec2_instances` module that can provision a group of
instances on-demand.
* Create a `target_ec2_spot_fleet` module that can bid on a fleet of
spot instances.
* Extend every Enos scenario to utilize the spot fleet target acquisition
strategy and the `vault_cluster` module.
* Update our Enos CI modules to handle both the `aws-nuke` permissions
and also the privileges to provision spot fleets.
* Only use us-east-1 and us-west-2 in our scenario matrices as costs are
lower than us-west-1.
Signed-off-by: Ryan Cragun <me@ryan.ec>