68 Commits

Author SHA1 Message Date
hc-github-team-secure-vault-core 7624576e39
backport of commit 9afd5e52ae31d6c3b7ab6833836647392bb318e6 (#23478)
Co-authored-by: Ryan Cragun <me@ryan.ec>
2023-10-03 19:29:40 +00:00
hc-github-team-secure-vault-core 15e85d26df
backport of commit 1b321e3e7ecf487741e722b1c9b224cbe1f3146e (#23413)
Co-authored-by: Ryan Cragun <me@ryan.ec>
2023-09-28 23:33:24 +00:00
hc-github-team-secure-vault-core b9e0d4666e
backport of commit 807bacbc9c0d499de206cfc1f901cea464d94195 (#23410)
Co-authored-by: Ryan Cragun <me@ryan.ec>
2023-09-28 22:51:49 +00:00
hc-github-team-secure-vault-core fd05101133
backport of commit 460b5de47b2b75b9cbeab06933f15774b7819d50 (#23358)
Co-authored-by: Ryan Cragun <me@ryan.ec>
2023-09-27 23:42:57 +00:00
hc-github-team-secure-vault-core 302284aafa
backport of commit 5cdce48a6a8380c185cf962a8e0768be006230e2 (#23347)
Co-authored-by: Ryan Cragun <me@ryan.ec>
2023-09-27 17:07:51 -06:00
hc-github-team-secure-vault-core fb88d3e4ec
backport of commit 7725117846a47dbd4faeecefa03c181251cbb371 (#23326)
Co-authored-by: Ryan Cragun <me@ryan.ec>
2023-09-27 12:59:02 -06:00
Ryan Cragun d2db7fbcdd
Backport [QT-602] Run `proxy` and `agent` test scenarios (#23176) into release/1.14.x (#23302)
* [QT-602] Run `proxy` and `agent` test scenarios (#23176)

Update our `proxy` and `agent` scenarios to support new variants and
perform baseline verification and their scenario specific verification.
We integrate these updated scenarios into the pipeline by adding them
to artifact samples.

We've also improved the reliability of the `autopilot` and `replication`
scenarios by refactoring our IP address gathering. Previously, we'd ask
vault for the primary IP address and use some Terraform logic to determine
followers. The leader IP address gathering script was also implicitly
responsible for ensuring that a found leader was within a given group of
hosts, and thus waiting for a given cluster to have a leader, and also for
doing some arithmetic and outputting `replication` specific output data.
We've broken these responsibilities into individual modules, improved their
error messages, and fixed various races and bugs, including:
* Fix a race between creating the file audit device and installing and starting
  vault in the `replication` scenario.
* Fix how we determine our leader and follower IP addresses. We now query
  vault instead of a prior implementation that inferred the followers and sometimes
  did not allow all nodes to be an expected leader.
* Fix a bug where we'd always fail on the first wrong condition
  in the `vault_verify_performance_replication` module.
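
As a rough illustration of determining the leader and followers from queried data rather than inferring them, here is a minimal Python sketch. It is not the module's actual script: the function name and the `address`/`is_leader` status shape are hypothetical stand-ins for whatever each Vault node reports.

```python
def leader_and_followers(node_statuses):
    """Split queried node statuses into one leader and its followers.

    node_statuses: list of dicts such as {"address": "10.0.0.1", "is_leader": False}.
    This shape is hypothetical; it stands in for what each node reports.
    """
    leaders = [n["address"] for n in node_statuses if n["is_leader"]]
    if len(leaders) != 1:
        # Any node may be the leader, but there must be exactly one.
        raise ValueError(f"expected exactly one leader, found {len(leaders)}")
    followers = [n["address"] for n in node_statuses if not n["is_leader"]]
    return leaders[0], followers


statuses = [
    {"address": "10.0.0.1", "is_leader": False},
    {"address": "10.0.0.2", "is_leader": True},
    {"address": "10.0.0.3", "is_leader": False},
]
print(leader_and_followers(statuses))
# ('10.0.0.2', ['10.0.0.1', '10.0.0.3'])
```

Because the roles come from reported state, any node in the group can turn out to be the leader without the caller assuming an order.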

We also performed some maintenance tasks on Enos scenarios by updating our
references from `oss` to `ce` to handle the naming and license changes. We
also enabled `shellcheck` linting for enos module scripts.

* Rename `oss` to `ce` for license and naming changes.
* Convert template enos scripts to scripts that take environment
  variables.
* Add `shellcheck` linting for enos module scripts.
* Add additional `backend` and `seal` support to `proxy` and `agent`
  scenarios.
* Update scenarios to include all baseline verification.
* Add `proxy` and `agent` scenarios to artifact samples.
* Remove IP address verification from the `vault_get_cluster_ips`
  modules and implement a new `vault_wait_for_leader` module.
* Determine follower IP addresses by querying vault in the
  `vault_get_cluster_ips` module.
* Move replication specific behavior out of the `vault_get_cluster_ips`
  module and into its own `replication_data` module.
* Extend initial version support for the `upgrade` and `autopilot`
  scenarios.

We also discovered an issue with undo_logs that has been described in
VAULT-20259. As such, we've disabled the undo_logs check until
it has been fixed.

* actions: fix actionlint error and linting logic (#23305)

Signed-off-by: Ryan Cragun <me@ryan.ec>
2023-09-27 10:53:12 -06:00
Ryan Cragun 9da2fc4b8b
test: wait for nc to be listening before enabling auditor (#23142) (#23150)
Rather than assuming a short sleep will work, we instead wait until netcat is listening on the socket. We've also configured the netcat listener to persist after the first connection, which allows Vault and us to check the connection without the process closing.
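
The idea of polling for a listener instead of sleeping can be sketched in Python as follows. This is an illustrative stand-in, not the scenario's shell script; `wait_for_listener` and its default timings are assumptions.

```python
import socket
import time


def wait_for_listener(host, port, timeout=10.0, interval=0.2):
    """Poll until something accepts TCP connections on host:port.

    Returns True once a connection succeeds, False if the deadline passes.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect proves the listener is up; close it at once.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: the listener is not ready yet.
            time.sleep(interval)
    return False
```

A probe like this only works if the listener survives the probe's own connection, which is why the persistent netcat listener matters.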

As we implemented this we also ran into AWS issues in us-east-1 and us-west-2, so we've changed our deploy regions until those issues are resolved.

Signed-off-by: Ryan Cragun <me@ryan.ec>
2023-09-18 15:10:37 -06:00
hc-github-team-secure-vault-core 79ec31895e
backport of commit d634700c9e80871c607f894ae31a1b6187777e6c (#22966)
Co-authored-by: Ryan Cragun <me@ryan.ec>
2023-09-11 18:27:51 +00:00
Ryan Cragun 8880b6eeb1
test: fix release testing from artifactory (#22941) (#22945)
Signed-off-by: Ryan Cragun <me@ryan.ec>
2023-09-08 21:32:39 +00:00
hc-github-team-secure-vault-core f52a686b91
[QT-506] Use enos scenario samples for testing (#22641) (#22933)
Replace our prior implementation of Enos test groups with the new Enos
sampling feature. With this feature we're able to describe which
scenarios and variant combinations are valid for a given artifact and
allow enos to create a valid sample field (a matrix of all compatible
scenarios) and take an observation (select some to run) for us. This
ensures that every valid scenario and variant combination will
now be a candidate for testing in the pipeline. See QT-504[0] for further
details on the Enos sampling capabilities.
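
To make the "sample field" and "observation" terms concrete, here is a small Python sketch. It is not Enos's implementation (see the linked PR for that); the variant names, the exclusion rule, and the seeded random selection are assumptions chosen for illustration.

```python
import itertools
import random


def sample_field(variants, is_valid):
    """Build every variant combination and keep the ones that are valid."""
    names = sorted(variants)
    combos = (
        dict(zip(names, values))
        for values in itertools.product(*(variants[n] for n in names))
    )
    return [c for c in combos if is_valid(c)]


def observe(field, count, seed):
    """Pick up to `count` combinations from the field, reproducibly."""
    rng = random.Random(seed)
    return rng.sample(field, min(count, len(field)))


variants = {
    "arch": ["amd64", "arm64"],
    "backend": ["consul", "raft"],
    "seal": ["awskms", "shamir"],
}
# Hypothetical exclusion rule, purely for illustration.
field = sample_field(
    variants,
    lambda c: not (c["arch"] == "arm64" and c["backend"] == "consul"),
)
print(len(field))  # 6 of the 8 combinations remain
print(observe(field, 2, seed=1))
```

Every valid combination is a candidate on each run, while the observation keeps the number of scenarios actually executed small.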

Our prior implementation only tested the amd64 and arm64 zip artifacts,
as well as the Docker container. We now include the following new artifacts
in the test matrix:
* CE Amd64 Debian package
* CE Amd64 RPM package
* CE Arm64 Debian package
* CE Arm64 RPM package

Each artifact includes a sample definition for both pre-merge/post-merge
(build) and release testing.

Changes:
* Remove the hand crafted `enos-run-matrices` ci matrix targets and replace
  them with per-artifact samples.
* Use enos sampling to generate different sample groups on all pull
  requests.
* Update the enos scenario matrices to handle HSM and FIPS packages.
* Simplify enos scenarios by using shared globals instead of
  cargo-culted locals.

Note: This will require coordination with vault-enterprise to ensure a
smooth migration to the new system. Integrating new scenarios or
modifying existing scenarios/variants should be much smoother after this
initial migration.

[0] https://github.com/hashicorp/enos/pull/102

Signed-off-by: Ryan Cragun <me@ryan.ec>
Co-authored-by: Ryan Cragun <me@ryan.ec>
2023-09-08 13:31:09 -06:00
Sarah Thompson 2ae56bd4ac
cherrypick of a9a4b0b9ff (#22813) 2023-09-06 18:24:39 +01:00
hc-github-team-secure-vault-core 6f2d433394
backport of commit 6ae9f8d4eddfdb134bcbabd3f58e633757a6afc9 (#22443)
Co-authored-by: Rebecca Willett <47540675+rebwill@users.noreply.github.com>
2023-08-18 16:30:36 +00:00
hc-github-team-secure-vault-core b4fa55858c
backport of commit 6654c425d2206624ff42cc7b7b92407a5e338311 (#22221)
Co-authored-by: Rebecca Willett <47540675+rebwill@users.noreply.github.com>
2023-08-08 11:11:03 -04:00
hc-github-team-secure-vault-core 04eed0b14c
backport of commit 6b21994d76b18c91397247dfd69bb01e46c5de25 (#21981)
Co-authored-by: Ryan Cragun <me@ryan.ec>
2023-07-20 20:51:07 +00:00
hc-github-team-secure-vault-core 32beec61bc
backport of commit fd1683698bad3556d21e783a26ec1bca5d0de671 (#21477)
Co-authored-by: Ryan Cragun <me@ryan.ec>
2023-06-27 16:58:02 +00:00
hc-github-team-secure-vault-core 324557f57e
enos: use on-demand targets (#21459) (#21464)
Add an updated `target_ec2_instances` module that is capable of
dynamically splitting target instances over subnet/az's that are
compatible with the AMI architecture and the associated instance type
for the architecture. Use the `target_ec2_instances` module where
necessary. Ensure that `raft` storage scenarios don't provision
unnecessary infrastructure with a new `target_ec2_shim` module.

After a lot of trial, the state of EC2 spot instance capacity, the
associated APIs, and current support for different fleet types in the AWS
Terraform provider have proven to make spot instances too unreliable for
scenario targets.

The current state of each method:
* `target_ec2_fleet`: unusable due to the fact that the `instant` type
  does not guarantee fulfillment of either `spot` or `on-demand`
  instance request types. The module does support both `on-demand` and
  `spot` request types and is capable of bidding across a maximum of
  four availability zones, which makes it an attractive choice if the
  `instant` type would always fulfill requests. Perhaps a `request` type
  with `wait_for_fulfillment` option like `aws_spot_fleet_request` would
  make it more viable for future consideration.
* `target_ec2_spot_fleet`: more reliable if bidding for target instances
  that have capacity in the chosen zone. Issues in the AWS provider
  prevent us from bidding across multiple zones successfully. Over the
  last 2-3 months target capacity for the instance types we'd prefer to
  use has dropped dramatically and the price is near-or-at on-demand.
  The volatility for nearly no cost savings means we should put this
  option on the shelf for now.
* `target_ec2_instances`: the most reliable method we've got. It is now
  capable of automatically determining which subnets and availability
  zones to provision targets in and has been updated to be usable for
  both Vault and Consul targets. By default we use the cheapest medium
  instance types that we've found are reliable to test vault.
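
One way to picture spreading targets over compatible zones is a round-robin over the subnets whose zone offers the instance type. The sketch below is Python rather than the module's Terraform, and the function name, subnet IDs, and zone names are hypothetical.

```python
def spread_over_subnets(instance_count, subnets_by_az, supported_azs):
    """Assign each target instance to a subnet in a compatible zone, round-robin."""
    compatible = [
        subnet
        for az, subnet in sorted(subnets_by_az.items())
        if az in supported_azs
    ]
    if not compatible:
        raise ValueError("no subnet is in a zone that offers the instance type")
    return [compatible[i % len(compatible)] for i in range(instance_count)]


# Hypothetical VPC layout: one subnet per availability zone.
subnets_by_az = {
    "us-east-2a": "subnet-a",
    "us-east-2b": "subnet-b",
    "us-east-2c": "subnet-c",
}
# Zones assumed to offer the chosen instance type.
supported = {"us-east-2a", "us-east-2c"}
print(spread_over_subnets(3, subnets_by_az, supported))
# ['subnet-a', 'subnet-c', 'subnet-a']
```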

* Update .gitignore
* enos/modules/create_vpc: create a subnet for every availability zone
* enos/modules/target_ec2_fleet: bid across the maximum of four
  availability zones for targets
* enos/modules/target_ec2_spot_fleet: attempt to make the spot fleet bid
  across more availability zones for targets
* enos/modules/target_ec2_instances: create module to use
  ec2:RunInstances for scenario targets
* enos/modules/target_ec2_shim: create shim module to satisfy the
  target module interface
* enos/scenarios: use target_ec2_shim for backend targets on raft
  storage scenarios
* enos/modules/az_finder: remove unused module

Signed-off-by: Ryan Cragun <me@ryan.ec>
Co-authored-by: Ryan Cragun <me@ryan.ec>
2023-06-26 16:54:39 -06:00
hc-github-team-secure-vault-core 58287739ec
backport of commit 5de6af60760dbcbefd8c8e4eb923f74a5720cf13 (#21440)
Co-authored-by: Ryan Cragun <me@ryan.ec>
2023-06-23 04:48:54 +00:00
hc-github-team-secure-vault-core be67c16299
backport of commit 8d22142a3e9d13435b1a65685317fefba7e2f5b3 (#21421)
Co-authored-by: Ryan Cragun <me@ryan.ec>
2023-06-22 22:14:22 +00:00
hc-github-team-secure-vault-core 386376573b
backport of commit ddff68c82a038bdfd1d16d8d389f5cc839e57b67 (#21230)
Co-authored-by: Ryan Cragun <me@ryan.ec>
2023-06-14 12:39:10 -06:00
hc-github-team-secure-vault-core 7d024b4f4e
backport of commit 2ec5a28f51fe0b5095a0554627fb3295c7f2ccb4 (#21148)
Co-authored-by: Ryan Cragun <me@ryan.ec>
2023-06-12 17:19:46 +00:00
hc-github-team-secure-vault-core 93af7c8756
backport of commit 27621e05d63ae14475e7a5ec8e8f23277d9eeb98 (#21137)
Co-authored-by: Ryan Cragun <me@ryan.ec>
2023-06-12 16:46:11 +00:00
hc-github-team-secure-vault-core 354d49e4eb
backport of commit b0aa808baaf13ca85061bcd20165559c6e8e4553 (#21114)
Co-authored-by: Ryan Cragun <me@ryan.ec>
2023-06-09 13:40:59 -06:00
hc-github-team-secure-vault-core fa3016696a
backport of commit b9f9f27e8e988c4f441f81df733fb0aa5c513290 (#21038)
Co-authored-by: Jaymala <jaymala@hashicorp.com>
2023-06-06 18:12:53 -07:00
hc-github-team-secure-vault-core 9a6d09e029
backport of commit 85128585837bcce2cf99f8e1f749c3a4aef204ca (#21032)
Co-authored-by: Jaymala <jaymala@hashicorp.com>
2023-06-06 17:34:55 -04:00
hc-github-team-secure-vault-core 69104f93b8
backport of commit dbe41c4fee5ce88a1f7ce83a64cc3a78116ab1b3 (#21007)
Co-authored-by: Mike Baum <mike.baum@hashicorp.com>
2023-06-06 07:11:15 -04:00
Mike Baum d323aa33df
Backport of audit file changes to release/1.14.x (#20985) 2023-06-05 11:46:59 -04:00
Ryan Cragun 1e752e0cba
ci: request vpc quota increase (#20360)
* Fix regions on two service quotas
* Request an increase in VPCs per region
* Pin github actions workflows

Signed-off-by: Ryan Cragun <me@ryan.ec>
2023-05-22 11:18:06 -06:00
Ryan Cragun 2b5cb8d26b
test: use correct pool allocation for spot strategy (#20593)
Determine the allocation pool size for the spot fleet by the allocation
strategy. This allows us to ensure a consistent attribute plan during
re-runs, which avoids rebuilding the target fleets.
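
The sketch below shows one plausible reading of this change, in Python rather than Terraform. It assumes that the pool count only applies to the `lowestPrice` strategy and otherwise falls back to 1; the function and parameter names are hypothetical and the module's real logic may differ.

```python
def instance_pools_to_use(allocation_strategy, requested_pools):
    """Derive the pool count from the strategy so repeated plans agree.

    Assumption: only the "lowestPrice" strategy honors a custom pool count;
    every other strategy gets a fixed value of 1.
    """
    if allocation_strategy == "lowestPrice":
        return requested_pools
    return 1


print(instance_pools_to_use("lowestPrice", 3))  # 3
print(instance_pools_to_use("diversified", 3))  # 1
```

Deriving the value from the strategy alone means the planned attribute is the same on every run, so a re-run has no difference to act on.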

Signed-off-by: Ryan Cragun <me@ryan.ec>
2023-05-16 14:00:20 -06:00
Jaymala fcfd0f9bb8
[QT-554] Remove Terraform validations from Enos replication scenario (#20570)
Signed-off-by: Jaymala Sinha <jaymala@hashicorp.com>
2023-05-12 16:06:46 -04:00
Ryan Cragun 33098fd1ae
enos: use initial version variable in autopilot (#20349)
Signed-off-by: Ryan Cragun <me@ryan.ec>
2023-04-25 12:37:11 -06:00
Ryan Cragun b74e4bc781
enos: use artifactory release for auto-pilot upgrade (#20332)
Signed-off-by: Ryan Cragun <me@ryan.ec>
2023-04-24 18:57:08 +00:00
Ryan Cragun a889ba1205
enos: always use the initial release during upgrades (#20321)
Signed-off-by: Ryan Cragun <me@ryan.ec>
2023-04-24 18:00:44 +00:00
Ryan Cragun deeb1ece5b
[QT-530] enos: allow-list all public IP addresses (#20304)
The security groups that allow access to remote machines in Enos
scenarios have been configured to only allow port 22 (SSH) from the
public IP address of the machine executing the Enos scenario. To achieve
this we previously utilized the `enos_environment.public_ip_address`
attribute. Sometime in mid March we started seeing sporadic SSH i/o
timeout errors when attempting to execute Enos resources against SSH
transport targets. We've only ever seen this when communicating from
Azure hosted runners to AWS hosted machines.

While testing we were able to confirm that in some cases the public IP
address resolved using DNS over UDP4 to Google and OpenDNS name servers
did not match what was resolved when using the HTTPS/TCP IP address
service hosted by AWS. The Enos data source was implemented in a way
that we'd attempt resolution of a single name server and only attempt
resolving from the next if the previous name server could not get a result.
We'd then allow-list that single IP address. That's a problem if we can
resolve two different public IP addresses depending on our endpoint address.

This change utilizes the new `enos_environment.public_ip_addresses`
attribute and subsequent behavior change. Now the data source will
attempt to resolve our public IP address via name servers hosted by
Google, OpenDNS, Cloudflare, and AWS. We then return a unique set of
these IP addresses and allow-list all of them in our security group. It
is our hope that this resolves these i/o timeout errors that seem like
they're caused by the security group black-holing our attempted access
because the IP we resolved does not match what we're actually exiting
with.
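
The "unique set" step can be sketched in Python as below. This is not the Enos provider's code: the resolver labels, the sample addresses (taken from documentation ranges), and the choice to emit single-host CIDRs are assumptions.

```python
import ipaddress


def allow_list_cidrs(resolved):
    """Collapse addresses reported by several resolvers into unique host CIDRs."""
    unique = {
        ipaddress.ip_address(addr)
        for addrs in resolved.values()
        for addr in addrs
    }
    # Sort by IP version first so IPv4 and IPv6 are never compared directly.
    ordered = sorted(unique, key=lambda ip: (ip.version, int(ip)))
    return [f"{ip}/{ip.max_prefixlen}" for ip in ordered]


# Hypothetical results: one resolver sees a different egress address.
resolved = {
    "google": ["203.0.113.7"],
    "opendns": ["203.0.113.7"],
    "cloudflare": ["198.51.100.20"],
    "aws": ["203.0.113.7"],
}
print(allow_list_cidrs(resolved))
# ['198.51.100.20/32', '203.0.113.7/32']
```

Allow-listing every distinct address covers the case where traffic leaves through an address that only some resolvers report.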

Signed-off-by: Ryan Cragun <me@ryan.ec>
2023-04-23 16:25:32 -06:00
Ryan Cragun a19f7dbda5
[QT-525] enos: use spot instances for Vault targets (#20037)
The previous strategy for provisioning infrastructure targets was to use
the cheapest instances that could reliably perform as Vault cluster
nodes. With this change we introduce a new model for target node
infrastructure. We've replaced on-demand instances with a spot
fleet. While the spot price fluctuates based on dynamic pricing,
capacity, region, instance type, and platform, cost savings for our
most common combinations range between 20-70%.

This change only includes spot fleet targets for Vault clusters.
We'll be updating our Consul backend bidding in another PR.

* Create a new `vault_cluster` module that handles installation,
  configuration, initializing, and unsealing Vault clusters.
* Create a `target_ec2_instances` module that can provision a group of
  instances on-demand.
* Create a `target_ec2_spot_fleet` module that can bid on a fleet of
  spot instances.
* Extend every Enos scenario to utilize the spot fleet target acquisition
  strategy and the `vault_cluster` module.
* Update our Enos CI modules to handle both the `aws-nuke` permissions
  and also the privileges to provision spot fleets.
* Only use us-east-1 and us-west-2 in our scenario matrices as costs are
  lower than us-west-1.

Signed-off-by: Ryan Cragun <me@ryan.ec>
2023-04-13 15:44:43 -04:00
Mike Baum 8de15e4827
[QT-523] Remove copyright/license header from raft config used in the Docker/K8S integration test (#19584) 2023-03-16 17:39:59 -04:00
Hamid Ghaf 27bb03bbc0
adding copyright header (#19555)
* adding copyright header

* fix fmt and a test
2023-03-15 09:00:52 -07:00
Jaymala 99d4151a38
Fetch replication status in its own resource (#19132)
* Fix json decode errors for Enos replication verification module

Signed-off-by: Jaymala Sinha <jaymala@hashicorp.com>

* Rewrite the pr connection check script

Signed-off-by: Jaymala Sinha <jaymala@hashicorp.com>

* Do not fail on get replication status

Signed-off-by: Jaymala Sinha <jaymala@hashicorp.com>

---------

Signed-off-by: Jaymala Sinha <jaymala@hashicorp.com>
2023-02-14 12:21:29 -05:00
Jaymala af957e7c9c
Add Vault log level support (#19083)
Signed-off-by: Jaymala Sinha <jaymala@hashicorp.com>
2023-02-08 17:41:16 -05:00
Mike Baum 225fbb78d2
[QT-304] Ensure Chrome is only installed for vault-enterprise UI Test workflows (#19003) 2023-02-06 16:29:33 -05:00
Mike Baum 3131c48501
[QT-304] Add enos ui scenario (#18518)
* Add enos ui scenario
* Add github action for running the UI scenario
2023-02-03 09:55:06 -05:00
Jaymala bedb7e4af9
Update replication verification to check connection status (#18921)
* Update replication verification to check connection status

Signed-off-by: Jaymala Sinha <jaymala@hashicorp.com>

* Output replication status after verifying connection

Signed-off-by: Jaymala Sinha <jaymala@hashicorp.com>

---------

Signed-off-by: Jaymala Sinha <jaymala@hashicorp.com>
2023-01-31 16:23:46 -05:00
Hamid Ghaf 4b4e0437e1
enos: default undo-logs to cluster behavior (#18771)
* enos: default undo-logs to cluster behavior

* change a step dependency

* rearrange steps, wait a bit longer for undo logs
2023-01-20 10:25:14 -05:00
Jaymala 9501b56ffa
Rename reusable enos-run workflow file (#18757)
* Rename reusable enos-run workflow file

Signed-off-by: Jaymala Sinha <jaymala@hashicorp.com>

* Update Enos README file

Signed-off-by: Jaymala Sinha <jaymala@hashicorp.com>

Signed-off-by: Jaymala Sinha <jaymala@hashicorp.com>
2023-01-18 16:37:38 -07:00
Mike Baum da2849217c
[QT-441] Switch over to using new vault_ci AWS account for enos CI workflows (#18398) 2023-01-18 16:09:19 -05:00
Josh Brand f38c69559b
Add enterprise resources to CI cleanup (#18758) 2023-01-18 14:14:19 -05:00
Jaymala cdae007e30
Fix arch for backend storage in Enos replication scenario (#18741)
Signed-off-by: Jaymala Sinha <jaymala@hashicorp.com>

Signed-off-by: Jaymala Sinha <jaymala@hashicorp.com>
2023-01-17 22:28:30 +00:00
Jaymala ca18e2fffe
[QT-19] Enable Enos replication scenario (#17748)
* Add initial replication scenario config

Signed-off-by: Jaymala Sinha <jaymala@hashicorp.com>

* Add support for replication with different backend and seal types

Signed-off-by: Jaymala Sinha <jaymala@hashicorp.com>

* Update Consul versions

Signed-off-by: Jaymala Sinha <jaymala@hashicorp.com>

* Additional config for replication scenario

Signed-off-by: Jaymala Sinha <jaymala@hashicorp.com>

* Update replication scenario modules

Signed-off-by: Jaymala Sinha <jaymala@hashicorp.com>

* Refactor replication modules

Signed-off-by: Jaymala Sinha <jaymala@hashicorp.com>

* Add more steps for replication

Signed-off-by: Jaymala Sinha <jaymala@hashicorp.com>

* Work in progress with unsealing followers on secondary cluster

Signed-off-by: Jaymala Sinha <jaymala@hashicorp.com>

* Add more replication scenario steps

Signed-off-by: Jaymala Sinha <jaymala@hashicorp.com>

* More updates

Signed-off-by: Jaymala Sinha <jaymala@hashicorp.com>

* Working shamir scenario

Signed-off-by: Jaymala Sinha <jaymala@hashicorp.com>

* Update to unify get Vault IP module

Signed-off-by: Jaymala Sinha <jaymala@hashicorp.com>

* Remove duplicate module

Signed-off-by: Jaymala Sinha <jaymala@hashicorp.com>

* Fix race condition for secondary followers unseal

Signed-off-by: Jaymala Sinha <jaymala@hashicorp.com>

* Use consistent naming for module directories

Signed-off-by: Jaymala Sinha <jaymala@hashicorp.com>

* Update replication scenario with latest test matrix

Signed-off-by: Jaymala Sinha <jaymala@hashicorp.com>

* Verify replication with awskms

Signed-off-by: Jaymala Sinha <jaymala@hashicorp.com>

* Add write and retrieve data support for all scenarios

* Update all scenarios to verify write and read kv data

Signed-off-by: Jaymala Sinha <jaymala@hashicorp.com>

* Fix write and read data modules

Signed-off-by: Jaymala Sinha <jaymala@hashicorp.com>

* Add comments explaining the module run

Signed-off-by: Jaymala Sinha <jaymala@hashicorp.com>

* Address review feedback and update consul version

Signed-off-by: Jaymala Sinha <jaymala@hashicorp.com>

* Address more review feedback

Signed-off-by: Jaymala Sinha <jaymala@hashicorp.com>

* Remove vault debug logging

Signed-off-by: Jaymala Sinha <jaymala@hashicorp.com>

* Exclude ent.fips1402 and ent.hsm.fips1402 packages from Enos test matrix

Signed-off-by: Jaymala Sinha <jaymala@hashicorp.com>

* Add verification for replication connection status

* Currently this verification fails on Consul due to VAULT-12332

Signed-off-by: Jaymala Sinha <jaymala@hashicorp.com>

* Add replication scenario to Enos README

Signed-off-by: Jaymala Sinha <jaymala@hashicorp.com>

* Update README as per review suggestions

Signed-off-by: Jaymala Sinha <jaymala@hashicorp.com>

* [QT-452] Add recovery keys to scenario outputs

Signed-off-by: Jaymala Sinha <jaymala@hashicorp.com>

* Fix replication output var

Signed-off-by: Jaymala Sinha <jaymala@hashicorp.com>

* Fix autopilot scenario deps and add retry for read data

Signed-off-by: Jaymala Sinha <jaymala@hashicorp.com>

Signed-off-by: Jaymala Sinha <jaymala@hashicorp.com>
2023-01-13 11:43:26 -05:00
Josh Brand c2ae1f1654
Add automated CI account cleanup & monitoring (#18659)
This uses aws-nuke and awslimitchecker to monitor the new Vault CI account, clean up lingering resources, and prevent resource quota exhaustion. Every night, aws-nuke scans all regions of the account for lingering resources that enos/terraform didn't clean up and deletes any that don't match the exclusion criteria. By default, we exclude corp-sec created resources, our own CI resources, and, when possible, anything created within the past 72 hours. Because this account is dedicated to CI, users should not expect resources to persist beyond this without additional configuration.
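
The exclusion criteria amount to a simple decision per resource, sketched here in Python. This is not aws-nuke configuration or code; the `owner` and `created_at` fields and the owner labels are hypothetical.

```python
from datetime import datetime, timedelta, timezone

GRACE_PERIOD = timedelta(hours=72)


def should_delete(resource, now, protected_owners=("corp-sec", "ci")):
    """Decide whether a lingering resource is eligible for nightly cleanup."""
    if resource.get("owner") in protected_owners:
        return False
    created = resource.get("created_at")
    # The age check applies only when a creation time is known.
    if created is not None and now - created < GRACE_PERIOD:
        return False
    return True


now = datetime(2023, 1, 11, tzinfo=timezone.utc)
stale = {"owner": "someone", "created_at": now - timedelta(days=10)}
fresh = {"owner": "someone", "created_at": now - timedelta(hours=5)}
print(should_delete(stale, now), should_delete(fresh, now))
# True False
```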
2023-01-11 17:24:08 -05:00
Mike Palmiotto 5932b34dad
Turn off undo logs for enos auto-upgrade scenario pre-v1.13 (#18526) 2022-12-22 12:37:05 -05:00