Commit graph

133 commits

Author SHA1 Message Date
Tim Gross 566dae7b19
e2e: add flag to bootstrap Nomad ACLs (#8961)
Adds a `nomad_acls` flag to our Terraform stack that bootstraps Nomad ACLs via
a `local-exec` provider. There's no way to set the `NOMAD_TOKEN` in the Nomad
TF provider if we're bootstrapping in the same Terraform stack, so instead of
using `resource.nomad_acl_token`, we also bootstrap a wide-open anonymous
policy. The resulting management token is exported as an environment var with
`$(terraform output environment)` and tests that want stricter ACLs will be
able to write them using that token.

This should also provide a basis to do similar work with Consul ACLs in the
future.
2020-09-28 09:22:36 -04:00
Tim Gross 147b16243d
e2e: use more recent instance type (#8954)
Newer EC2 instances are both cheaper and have generally better
performance.

The dnsmasq configuration had a hard-coded interface name, so in order to
accomodate instances with more recent networking that result in so-called
predictable interface names, the dnsmasq configuration needs to be replaced at
runtime with userdata to select the default interface.
2020-09-23 14:27:52 -04:00
Tim Gross 1fc525ec1e
e2e: add flags for provisioning Nomad Enterprise (#8929) 2020-09-23 10:39:04 -04:00
Tim Gross 3da61545d5
make sure dev-cluster has the option to run windows config (#8928) 2020-09-18 16:41:35 -04:00
Tim Gross ea1f6408bf
e2e: remove unused framework provisioning code (#8908) 2020-09-18 11:46:47 -04:00
Tim Gross c413fa5e49
e2e: test script for Terraform logic (#8907) 2020-09-18 11:46:40 -04:00
Tim Gross 9d37233eaf
e2e: provision cluster entirely through Terraform (#8748)
Have Terraform run the target-specific `provision.sh`/`provision.ps1` script
rather than the test runner code which needs to be customized for each
distro. Use Terraform's detection of variable value changes so that we can
re-run the provisioning without having to re-install Nomad on those specific
hosts that need it changed.

Allow the configuration "profile" (well-known directory) to be set by a
Terraform variable. The default configurations are installed during Packer
build time, and symlinked into the live configuration directory by the
provision script. Detect changes in the file contents so that we only upload
custom configuration files that have changed between Terraform runs
2020-09-18 11:27:24 -04:00
Tim Gross 990fcf7be4
e2e: documentation and minor tweaks to configs (#8912)
* remove outdated references to envchain in documentation
* add new host volume locations in userdata
* don't exit the entire script during provisioning, just return
2020-09-17 09:20:18 -04:00
Michael Schurter 5f3a71d0b9 docs: update scripts to 0.12.4 2020-09-09 15:22:37 -07:00
Tim Gross a47b1c1081
e2e: move configurations into profile-specific directories (#8828)
This changeset stages upcoming E2E provisioning improvements work. It splits
the existing shared configuration directory into 3 profiles:

* "full-cluster": the set of configurations currently in use
* "dev-cluster": a simplified set of mostly existing configurations that
  weren't in use.
* "custom": an empty profile for developers to keep non-standard
  configurations during complex feature development.

The tooling to switch between profiles will be in a later changeset.

Also drops some unused configuration knobs from the provisioning scripts to
make the next stage of work easier.
2020-09-04 11:23:32 -04:00
Tim Gross 93c1093274
e2e: remove unused EBS volumes and depends_on (#8827)
Our provisioning process for E2E doesn't require the `depends_on` fields to be
set for client instances, so dropping that field allows all instances to be
started in parallel.

We don't use the extra EBS volumes (they aren't even mounted), so remove them
to reduce costs.
2020-09-04 10:25:59 -04:00
Tim Gross 0577b03479
e2e: minor rename and cleanup (#8824) 2020-09-04 08:51:22 -04:00
Tim Gross e6cdd8e0c0
e2e: consolidate cloud-specific Consul configs (#8823)
The `-recursor` flag in the Consul service unit files is specific to a given
cloud, but we already have cloud-specific configuration files. Consolidate all
the cloud-specific items into the config.
2020-09-04 08:51:15 -04:00
Tim Gross bc6ad011fe
e2e: Linux AMI setup cleanup (#8821)
As we add new Linux targets for E2E, the existing setup.sh script will be used
only for Ubuntu. Rather than have the service and config files echo'd from the
script, move them into files we upload so they can be reused.

Includes some general noise reduction in the setup.sh script and removal of
unused bits.
2020-09-03 16:30:58 -04:00
Tim Gross 3a382f599f
e2e: minor TF refactor to split out vars and outputs (#8752) 2020-08-26 17:00:36 -04:00
Tim Gross 8c8b91e7b9
e2e: move systemd unit files into Packer build (#8751) 2020-08-26 16:45:09 -04:00
Tim Gross 693a8a2613
e2e: fix platform path for installing for Linux from s3 (#8708) 2020-08-21 09:20:09 -04:00
Tim Gross b23150057a
E2E: move Nomad installation to script on remote hosts (#8706)
This changeset moves the installation of Nomad binaries out of the
provisioning framework and into scripts that are installed on the remote host
during AMI builds.

This provides a few advantages:

* The provisioning framework can be reduced in scope (with the goal of moving
  most of it into the Terraform stack entirely).
* The scripts can be arbitrarily complex if we don't have to stuff them into
  ssh commands, so it's easier to make them idempotent. In this changeset, the
  scripts check the version of the existing binary and don't re-download when
  using the `--nomad_sha` or `--nomad_version` flags.
* The scripts can be OS/distro specific, which helps in building new test
  targets.
2020-08-20 16:10:00 -04:00
Tim Gross 0fd4a05b2f
E2E AMI cleanup (#8697)
* move CNI install/podman config to build-time
* move DNS config to userdata
* consolidate apt updates for performance
2020-08-20 10:09:31 -04:00
Tim Gross 9a3caa49db
e2e: remove unused spark dependency (#8695) 2020-08-19 14:59:36 -04:00
Tim Gross a49732816c
migrate AMI builds to new account (#8674) 2020-08-19 08:20:59 -04:00
Tim Gross d810dab50b
migrate E2E test runs to new AWS account (#8676) 2020-08-18 14:24:34 -04:00
Tim Gross d0b03cad7c
e2e: give containers access to dnsmasq DNS (#8536)
By default, Docker containers get /etc/resolv.conf bound into the container
with the localhost entry stripped out. In order to resolve using the host's
dnsmasq, we need to make sure the container uses the docker0 IP as its
nameserver and that dnsmasq is listening on that port and forwarding to either
the AWS VPC DNS (so that we can query private resources like EFS) or to the
Consul DNS.
2020-07-24 14:09:18 -04:00
Drew Bailey 01b01f7cac
use latest podman release (#8403) 2020-07-09 09:28:53 -04:00
Drew Bailey 327843acfa
base podman e2e test and provisioning updates (#8104)
* initial setup for terrform to install podman task driver

podman

* Update e2e provisioning to support root podman

Excludes setup for rootless podman. updates source ami to ubuntu 18.04
Installs podman and configures podman varlink

base podman test

ensure client status running

revert terraform directory changes

* back out random go-discover go mod change

* include podman varlink docs

* address comments
2020-06-03 14:06:58 -04:00
Tim Gross 932710ad7d
e2e: upgrade CNI to 0.8.6 (#7956) 2020-05-14 09:29:11 -04:00
Seth Hoenig 623c804046 e2e: upgrade consul in packer setup to 1.7.3 from 1.6.1
There have been a number of bug fixes and features particularly around
Connect that will help us in Nomad's e2e tests. Upgrade Consul in our
packer builder so e2e can make use of the new version.
2020-05-11 11:17:28 -06:00
Tim Gross 4935b304a0
e2e: add helper to Makefile for local file deployments (#7822) 2020-04-28 16:15:58 -04:00
Tim Gross ab3086a1f4
e2e: testing reliability (#7701)
* pin CSI plugin versions
* ensure failing CSI tests clean up
* allow NOMAD_SHA env var to override makefile
2020-04-13 10:25:24 -04:00
Mahmood Ali c8eddb9f6b fixup! e2e: add a convenient creation script 2020-04-09 11:04:26 -04:00
Mahmood Ali 8a4937d9ce e2e: add a convenient creation script
Add a convenience Makefile for creating e2e environment for manual
debugging.
2020-04-09 10:54:30 -04:00
Tim Gross bde13dfc0c
e2e: have TF write-out HCL for CSI volume registration (#7599) 2020-04-02 12:16:43 -04:00
Tim Gross cd1c6173f4 csi: e2e tests for EBS and EFS plugins (#7343)
This changeset provides two basic e2e tests for CSI plugins targeting
common AWS use cases.

The EBS test launches the EBS plugin (controller + nodes) and registers
an EBS volume as a Nomad CSI volume. We deploy a job that writes to
the volume, stop that job, and reuse the volume for another job which
should be able to read the data written by the first job.

The EFS test launches the EFS plugin (nodes-only) and registers an EFS
volume as a Nomad CSI volume. We deploy a job that writes to the
volume, stop that job, and reuse the volume for another job which
should be able to read the data written by the first job.

The writer jobs mount the CSI volume at a location within the alloc
dir.
2020-03-23 13:59:18 -04:00
Mahmood Ali 857ddf7aaf e2e: use unique CSI token
Use a unique per-cluster efs creation token, as https://www.terraform.io/docs/providers/aws/r/efs_file_system.html#creation_token.

Using a static value prevents having multiple test clusters.

[ci skip]
2020-03-15 21:55:26 -04:00
Tim Gross 79222c36bf
e2e: add EBS and EFS volumes for testing CSI (#7266)
This changeset adds volumes but does not mount them to instances so
that we can test the mounting ("staging") via CSI plugins. The CSI
plugins themselves will be installed as Nomad jobs.

In order to ensure we can always mount the EFS volume, this changeset
pins the deployment of the cluster to a specific subnet. In future
work we should spread the cluster out among several AZs and test that
behavior explicitly.
2020-03-04 10:44:51 -05:00
Tim Gross 940110b2de
e2e: improve provisioning defaults and documentation (#7062)
This changeset improves the ergonomics of running the Nomad e2e test
provisioning process by defaulting to a blank `nomad_sha` in the
Terraform configuration. By default, a user will now need to pass in
one of the Nomad version flags. But they won't have to manually edit
the `provisioning.json` file for the common case of deploying a
released version of Nomad, and won't need to put dummy values for
`nomad_sha`.

Includes general documentation improvements.
2020-02-04 10:37:00 -05:00
Tim Gross 7681f09ae4
e2e: packer builds should not be public (#6998) 2020-01-27 16:28:25 -05:00
Tim Gross 457e3ad5c6
e2e: document e2e provisioning process (#6976) 2020-01-22 16:55:17 -05:00
Tim Gross 2edbdfc8be
e2e: update framework to allow deploying Nomad (#6969)
The e2e framework instantiates clients for Nomad/Consul but the
provisioning of the actual Nomad cluster is left to Terraform. The
Terraform provisioning process uses `remote-exec` to deploy specific
versions of Nomad so that we don't have to bake an AMI every time we
want to test a new version. But Terraform treats the resulting
instances as immutable, so we can't use the same tooling to update the
version of Nomad in-place. This is a prerequisite for upgrade testing.

This changeset extends the e2e framework to provide the option of
deploying Nomad (and, in the future, Consul/Vault) with specific
versions to running infrastructure. This initial implementation is
focused on deploying to a single cluster via `ssh` (because that's our
current need), but provides interfaces to hook the test run at the
start of the run, the start of each suite, or the start of a given
test case.

Terraform work includes:
* provides Terraform output that written to JSON used by the framework
  to configure provisioning via `terraform output provisioning`.
* provides Terraform output that can be used by test operators to
  configure their shell via `$(terraform output environment)`
* drops `remote-exec` provisioning steps from Terraform
* makes changes to the deployment scripts to ensure they can be run
  multiple times w/ different versions against the same host.
2020-01-22 08:48:52 -05:00
Tim Gross 1e600d573d
e2e: improve reusability of provisioning scripts (#6942)
This changeset is part of the work to improve our E2E provisioning
process to allow our upgrade tests:

* Move more of the setup into the AMI image creation so it's a little
 more obvious to provisioning config authors which bits are essential
 to deploying a specific version of Nomad.

* Make the service file update do a systemd daemon-reload so that we
  can update an already-running cluster with the same script we use to
  deploy it initially.
2020-01-16 09:29:36 -05:00
Tim Gross b5bcfb533b
upgrade CNI plugins to 0.8.4 (#6921)
When multiple Connect-enabled task groups start on the same client
node, a race condition in the CNI plugins for creating iptables chains
causes one of the tasks to fail. We upstreamed a patch to CNI plugins
to make iptables chain creation idempotent.

This changeset updates end-to-end testing, development tooling, and
documentation to use 0.8.4 which includes our patch.
2020-01-09 10:57:07 -05:00
Tim Gross 9b2b4da3a4
e2e: run client/allocs metrics nightly tests vs Windows (#6850)
Adds Windows targets to the client/allocs metrics tests. Removes the
`allocstats` test, which covers less than these tests and is now
redundant.

Adds a firewall rule to our Windows instances so that the prometheus
server can scrape the Nomad HTTP API for metrics.
2019-12-16 08:34:17 -05:00
Tim Gross c9d92f845f
e2e: add a Windows client to test runner (#6735)
* Adds a constraint to prevent tests from landing on Windows
* Improve Terraform output for mixed windows/linux clients
* Makes some Windows client config fixes from 0.10.2 testing
2019-11-25 13:31:00 -05:00
Tim Gross e012c2b5bf
Infrastructure for Windows e2e testing (#6584)
Includes:
* baseline Windows AMI
* initial pass at Terraform configurations
* OpenSSH for Windows

Using OpenSSH is a lot nicer for Nomad developers than winrm would be,
plus it lets us avoid passing around the Windows password in the
clear.

Note that now we're copying up all the provisioning scripts and
configs as a zipped bundle because TF's file provisioner dies in the
middle of pushing up multiple files (whereas `scp -r` works fine).

We're also running all the provisioning scripts inside the userdata by
polling for the zip file to show up (gross!). This is because
`remote-exec` provisioners are failing on Windows with the same symptoms as:

https://github.com/hashicorp/terraform/issues/17728

If we can't fix this, it'll prevent us from having multiple Windows
clients running until TF supports count interpolation in the
`template_file`, which is planned for a later 0.12 release.
2019-11-19 11:06:10 -05:00
Drew Bailey 2befab6900
Merge pull request #6573 from hashicorp/update-cci-consul
updates default consul version to 1.6.1
2019-11-07 11:01:22 -05:00
Drew Bailey 1c2af019c6
update vagrant & packer consul versions 2019-11-07 10:13:14 -05:00
Tim Gross 3e9ae481ce
e2e: refactor Consul configurations (#6559)
Ensure that we're reusing the base configuration between client and
servers without the possibility of drift. Reduce the amount of `sed`
mangling of the configuration file, and make recommended changes from
`shellcheck` for this section of the provisioning script.

Fixes some rebase errors on the Nomad config as well.
2019-10-28 09:27:40 -04:00
Tim Gross ba7e7413ef
e2e: refactor Nomad configuration (#6560)
Share base configuration for telemetry and consul. Have the server
configurations respect the `var.server_count` config. Make changes
recommended by `shellcheck` in the provisioning scripts for this section.

Switch to OS/arch-tagged release bundles on S3 for compatibility with
adding Windows builds in the near future.
2019-10-28 08:21:02 -04:00
Tim Gross 8be403f47b
e2e: refactor Vault configuration (#6561)
Match the configuration directory layout we're using for Consul and
other services. Make recommended changes from `shellcheck` for this
section of the provisioning script.
2019-10-25 15:29:01 -04:00
Tim Gross 87b3abddd3
e2e: use sockaddr for IP address configuration (#6548)
Update the Consul and Vault configs to take advantage of their
included `go-sockaddr` library for getting the IP addresses we need in
a portable way. This particularly avoids problems with "predictable"
interface names provided by systemd.

Also adds the `sockaddr` binary to the Packer build so we can use it
in our provisioning scripts.
2019-10-25 14:08:38 -04:00
Tim Gross efbd680d4e
e2e: split Packer build scripts from TF provisioning (#6542)
Make a clear split between Packer and Terraform provisioning steps:
the scripts in the `packer/linux` directory are run when we build the
AMI whereas the stuff in shared are run at Terraform provisioning time.

Merging all runtime provisioning scripts into a single script for each
of server/client solves the following:

* Userdata scripts can't take arguments, they can only be templated
  and that means we have to do TF escaping in bash/powershell scripts.
* TF provisioning scripts race with userdata scripts.
2019-10-25 08:08:24 -04:00
Tim Gross c648c4f998
e2e: upgrade terraform to 0.12.x (#6489) 2019-10-14 11:27:08 -04:00
Tim Gross 15e912ddd6
e2e: move remote-exec inline to script (#6488)
A failing script in a `remote-exec` provisioner's `inline` stanza
won't fail the provisioning step. This lets us continue on to execute
tests against potentially broken deployments, rather than letting us
know the provisioning itself failed.
2019-10-14 10:23:41 -04:00
Danielle Lancashire c50d7f2727
e2e: Add Host Volume Configuration 2019-09-17 20:06:50 +02:00
Tim Gross 3469c50275
e2e: tag instances with origin (#6293)
When multiple developers are working on e2e testing, it helps to be
able to identify which infrastructure belongs to which Nomad SHA and
which developer. This adds tags to the EC2 instances.
2019-09-06 15:49:18 -04:00
Michael Schurter 228899c32f e2e: test demo job for connect 2019-09-04 12:40:08 -07:00
Tim Gross 7ee3333a2d e2e: filter default AMI by OS
Add an OS tag to Packer builds of our e2e test AMIs and then filters
by this in Terraform.
2019-08-30 16:51:13 -04:00
Danielle Lancashire d454dab39b
chore: Format hcl configurations 2019-07-20 16:55:07 +02:00
Michael Schurter a3fcb8fcca e2e: debug log level for everyone! 2019-07-18 06:55:27 -07:00
Michael Schurter ea68c930fe e2e: enable_debug=true for all agents
Enables the pprof http endpoint for debugging.
2019-07-17 15:20:45 -07:00
Preetha 0a2e21353f
Merge pull request #5912 from hashicorp/f-systemd-nofile
systemd: set a high but non-infinite fd limit
2019-07-11 12:31:12 -05:00
Preetha Appan 53397722f1
add module version constraint to e2e/terraform 2019-07-05 09:18:38 -05:00
Michael Schurter 803aa62b7a systemd: set a high but non-infinite fd limit 2019-07-02 09:13:24 -07:00
Lang Martin 2a11d66258 e2e readme minor changes to command + env val templates and order 2019-05-22 12:34:57 -04:00
Michael Schurter 2b7f398726 e2e: fix nomad service for systemd<230 2019-05-14 10:53:26 -07:00
Michael Schurter a1c3ce36bc
Merge pull request #5647 from hashicorp/e2e-tf
E2E Test Terraform/Packer Improvements
2019-05-06 15:42:52 -07:00
Mahmood Ali bfc907827c docs: update s3 urls to use virtual bucket style
In response to https://forums.aws.amazon.com/ann.jspa?annID=6776
2019-05-06 10:39:51 -04:00
Michael Schurter 93f3ac7a9c e2e: explain these scripts are for packer
It took me way too long to figure out these weren't used by TF.
2019-05-03 07:55:28 -07:00
Michael Schurter 7558747694 e2e: let the unindex clients do anything...
...and be debugable!
2019-05-03 07:54:55 -07:00
Michael Schurter f08dd66ffa e2e: ssh instructions + remove redundant naming 2019-05-03 07:54:34 -07:00
Michael Schurter 19889d6468 e2e: update deps and install nomad in packer
Nomad on the packer image will be overwritten by the sha specified in
the TF var, but including a base version on the packer image makes the
image valid for independent use.
2019-05-03 07:53:08 -07:00
Michael Schurter 13b62a68f7 e2e: enable systemd units so they start on boot 2019-05-03 07:52:03 -07:00
Preetha fac6d8c918
Merge pull request #5405 from hashicorp/e2e_metrics
Prometheus metrics for the e2e environment
2019-03-21 09:30:12 -05:00
Michael Schurter 30db07cccb docs: sync systemd unit files; update deploy guide
The systemd configs spread across our repo were fairly out of sync. This
should get them on our best practices.

The deployment guide also had some strange things like running Nomad as
a non-root user. It would be fine for servers but completely breaks
clients. For simplicity I simply removed the non-root user references.
2019-03-19 15:18:12 -07:00
Preetha Appan 428f80afcc
prometheus and fabio for metrics 2019-03-11 09:21:04 -05:00
Preetha Appan 419a2682b0
update to Consul 1.4.0 for e2e tests 2019-01-24 09:52:15 -06:00
Danielle Tomlinson 33547c99e7 e2e: Output setup instructions after terraform
This adds a message that provides environment setup instructions for
running e2e tests after running terraform apply.

This allows copy/pasting exports, rather than manually constructing
them.
2019-01-04 16:55:14 +01:00
Preetha 5501ff42c9
Update README.md 2019-01-03 15:31:19 -06:00
Preetha 0071307414
Update README.md 2019-01-03 13:19:04 -06:00
Preetha 758ae0ca7c
Update README.md 2019-01-03 12:12:43 -06:00
Preetha Appan fb8980b46d
added readme 2018-12-18 13:37:03 -06:00
Preetha Appan 1ba4674ce2
suggestions from code review 2018-12-17 15:06:22 -06:00
Jack Pearkes 844d981e77
Terraform configs for e2e tests 2018-12-17 11:40:09 -06:00