Until we have LCOW support in the E2E environment (which requires a Windows
2019 test target), we need to constrain E2E tests to the appropriate kernel
The E2E framework wraps testify's `require` so that by default we can stop
tests on errors, but the cleanup functions should use `assert` so that we
continue to try to cleanup the test environment even if there's a failure.
Exercises host volume and Docker volume functionality for the `exec` and `docker`
task driver, particularly around mounting locations within the container and
how this can be used with `template`.
Adds a `nomad_acls` flag to our Terraform stack that bootstraps Nomad ACLs via
a `local-exec` provider. There's no way to set the `NOMAD_TOKEN` in the Nomad
TF provider if we're bootstrapping in the same Terraform stack, so instead of
using `resource.nomad_acl_token`, we also bootstrap a wide-open anonymous
policy. The resulting management token is exported as an environment var with
`$(terraform output environment)` and tests that want stricter ACLs will be
able to write them using that token.
This should also provide a basis to do similar work with Consul ACLs in the
future.
Newer EC2 instances are both cheaper and have generally better
performance.
The dnsmasq configuration had a hard-coded interface name, so in order to
accomodate instances with more recent networking that result in so-called
predictable interface names, the dnsmasq configuration needs to be replaced at
runtime with userdata to select the default interface.
The conditional around some of the rescheduling tests was backwards, where we
were waiting for allocations to be rescheduled but testing for a count of
0. The test was passing but flaky because if the check happened quickly enough
before the scheduler rescheduled the allocations, it would pass.
Have Terraform run the target-specific `provision.sh`/`provision.ps1` script
rather than the test runner code which needs to be customized for each
distro. Use Terraform's detection of variable value changes so that we can
re-run the provisioning without having to re-install Nomad on those specific
hosts that need it changed.
Allow the configuration "profile" (well-known directory) to be set by a
Terraform variable. The default configurations are installed during Packer
build time, and symlinked into the live configuration directory by the
provision script. Detect changes in the file contents so that we only upload
custom configuration files that have changed between Terraform runs
* remove outdated references to envchain in documentation
* add new host volume locations in userdata
* don't exit the entire script during provisioning, just return
The CLI helpers in the rescheduling test were intended for shared use, but
until some other tests were written we didn't want to waste time making them
generic. This changeset refactors them and adds some new helpers associated
with the node drain tests (under separate PR).
The rescheduling test workloads were created before we had Windows targets in
the E2E nightly run. When these were recently ported to the e2e framework they
were missing the constraint to Linux machines.
Also added a little extra time to polling to avoid some flakiness on the first
run, and a minor readability adjustment to the job names.
Ports the rescheduling tests (which aren't running in CI) into the current
test framework so that they're run on nightly, and exercises the new CLI
helpers.
The E2E suite exercises the API, but not the CLI. This changeset adds a helper
function to send commands via a locally-built Nomad binary (which we'll need
to add to the E2E setup), and some helpers to parse the resulting structured
outputs in a way that tests can consume.
When running the Fabio and Prometheus jobs for the metrics suite
it seems the outer directory is required in the call when
registering the job.
error: "e2e/input/fabio.nomad: no such file or directory"
This changeset stages upcoming E2E provisioning improvements work. It splits
the existing shared configuration directory into 3 profiles:
* "full-cluster": the set of configurations currently in use
* "dev-cluster": a simplified set of mostly existing configurations that
weren't in use.
* "custom": an empty profile for developers to keep non-standard
configurations during complex feature development.
The tooling to switch between profiles will be in a later changeset.
Also drops some unused configuration knobs from the provisioning scripts to
make the next stage of work easier.
Our provisioning process for E2E doesn't require the `depends_on` fields to be
set for client instances, so dropping that field allows all instances to be
started in parallel.
We don't use the extra EBS volumes (they aren't even mounted), so remove them
to reduce costs.
The `-recursor` flag in the Consul service unit files is specific to a given
cloud, but we already have cloud-specific configuration files. Consolidate all
the cloud-specific items into the config.
As we add new Linux targets for E2E, the existing setup.sh script will be used
only for Ubuntu. Rather than have the service and config files echo'd from the
script, move them into files we upload so they can be reused.
Includes some general noise reduction in the setup.sh script and removal of
unused bits.
This changeset moves the installation of Nomad binaries out of the
provisioning framework and into scripts that are installed on the remote host
during AMI builds.
This provides a few advantages:
* The provisioning framework can be reduced in scope (with the goal of moving
most of it into the Terraform stack entirely).
* The scripts can be arbitrarily complex if we don't have to stuff them into
ssh commands, so it's easier to make them idempotent. In this changeset, the
scripts check the version of the existing binary and don't re-download when
using the `--nomad_sha` or `--nomad_version` flags.
* The scripts can be OS/distro specific, which helps in building new test
targets.
Just a smattering of attempted improvements as I read through this
again. Some of my goals:
- Tried to add more high level info to the intro to set the context
- Clarify the difference between *test* dev and *agent* dev workflows
- Add -timeout to provisioning step because cable Internet is lol
Controller plugins that land on the same node will collide over their CSI
`mount_dir`, so give them enough room in our tests that they don't land on the
same host.
Also, version bump the EBS node plugins to match the controllers.