420 Commits

Author SHA1 Message Date
Tim Gross d810dab50b
migrate E2E test runs to new AWS account (#8676) 2020-08-18 14:24:34 -04:00
Jasmine Dahilig ee522ab587 task lifecycle: e2e tests 2020-08-18 10:49:50 -07:00
Drew Bailey 76d7d926a7
skip podman e2e 2020-08-14 09:02:56 -04:00
Tim Gross 09a97bd158
e2e: spread CSI controller plugins across multiple DCs (#8629)
Controller plugins that land on the same node will collide over their CSI
`mount_dir`, so give them enough room in our tests that they don't land on the
same host.

Also, version bump the EBS node plugins to match the controllers.
2020-08-10 16:41:39 -04:00
Tim Gross 12984ed1c9
e2e: CSI EBS test should expect 2 controllers (#8617) 2020-08-10 09:41:21 -04:00
Tim Gross fa6ec931f8
e2e: CSI EBS version bump to 0.6.0 (#8618) 2020-08-10 09:41:13 -04:00
Tim Gross 5dba653b43
csi/e2e: add 2nd controller for node drain testing (#8573) 2020-07-31 08:03:49 -04:00
Tim Gross 87f9bfaf1e
e2e/csi: update EFS plugin test to use v1.0 (#8562) 2020-07-30 08:41:48 -04:00
Tim Gross d0b03cad7c
e2e: give containers access to dnsmasq DNS (#8536)
By default, Docker containers get /etc/resolv.conf bound into the container
with the localhost entry stripped out. In order to resolve using the host's
dnsmasq, we need to make sure the container uses the docker0 IP as its
nameserver and that dnsmasq is listening on that port and forwarding to either
the AWS VPC DNS (so that we can query private resources like EFS) or to the
Consul DNS.
2020-07-24 14:09:18 -04:00
Lang Martin deb37c91b7
e2e/bin/run: run & update only attempt to contact linux servers (#8517) 2020-07-24 10:52:12 -04:00
Seth Hoenig c202d0f134
Merge pull request #8335 from hashicorp/f-cnative-host-e2e
e2e: add tests for connect native
2020-07-10 10:24:43 -05:00
Seth Hoenig ac8b51b611 e2e: connect jobID code golf 2020-07-10 10:24:13 -05:00
Drew Bailey 01b01f7cac
use latest podman release (#8403) 2020-07-09 09:28:53 -04:00
Seth Hoenig a9991e9ab9 e2e: add tests for connect native
Adds 2 tests around Connect Native. Both make use of the example connect native
services in https://github.com/hashicorp/nomad-connect-examples

One of them runs without Consul ACLs enabled, the other with.
2020-07-01 15:54:28 -05:00
Tim Gross 23be116da0
csi: add -force flag to volume deregister (#8295)
The `nomad volume deregister` command currently returns an error if the volume
has any claims, but in cases where the claims can't be dropped because of
plugin errors, providing a `-force` flag gives the operator an escape hatch.

If the volume has no allocations or if they are all terminal, this flag
deletes the volume from the state store, immediately and implicitly dropping
all claims without further CSI RPCs. Note that this will not also
unmount/detach the volume, which we'll make the responsibility of a separate
`nomad volume detach` command.
2020-07-01 12:17:51 -04:00
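The precondition described for `-force` (no allocations, or only terminal ones) can be sketched as a small check. This is an illustration only: the `Alloc` type and `canForceDeregister` function are hypothetical names and do not come from Nomad's state store code.

```go
package main

import "fmt"

// Alloc is a hypothetical, simplified stand-in for an allocation that
// holds a claim on a volume.
type Alloc struct {
	ID       string
	Terminal bool // true once the allocation can no longer run
}

// canForceDeregister reports whether a forced deregister may proceed:
// the volume has no allocations, or all of them are terminal.
func canForceDeregister(allocs []Alloc) bool {
	for _, a := range allocs {
		if !a.Terminal {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(canForceDeregister(nil))
	fmt.Println(canForceDeregister([]Alloc{{ID: "a1", Terminal: true}}))
	fmt.Println(canForceDeregister([]Alloc{{ID: "a1", Terminal: true}, {ID: "a2"}}))
}
```

In this sketch a single live allocation is enough to block the forced path, which mirrors the wording of the commit message rather than any confirmed implementation detail.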
Drew Bailey 327843acfa
base podman e2e test and provisioning updates (#8104)
* initial setup for terraform to install podman task driver

podman

* Update e2e provisioning to support root podman

Excludes setup for rootless podman. Updates source AMI to Ubuntu 18.04.
Installs podman and configures podman varlink.

base podman test

ensure client status running

revert terraform directory changes

* back out random go-discover go mod change

* include podman varlink docs

* address comments
2020-06-03 14:06:58 -04:00
Seth Hoenig 889e7ddd0c build: use hashicorp hclfmt
We have been using fatih/hclfmt which is long abandoned. Instead, switch
to HashiCorp's own hclfmt implementation. There are some trivial changes in
behavior around whitespace.
2020-05-24 18:31:57 -05:00
Tim Gross 932710ad7d
e2e: upgrade CNI to 0.8.6 (#7956) 2020-05-14 09:29:11 -04:00
Seth Hoenig 623c804046 e2e: upgrade consul in packer setup to 1.7.3 from 1.6.1
There have been a number of bug fixes and features particularly around
Connect that will help us in Nomad's e2e tests. Upgrade Consul in our
packer builder so e2e can make use of the new version.
2020-05-11 11:17:28 -06:00
Seth Hoenig aae8a8504e e2e: set an expose service check in connect e2e testcase
Make sure exposed checks work in e2e by setting an expose
check on the e2e connect test.
2020-05-07 14:40:03 -06:00
Tim Gross 139c65c436
e2e: csi test can purge target job (#7823) 2020-05-01 13:25:50 -04:00
Tim Gross 4935b304a0
e2e: add helper to Makefile for local file deployments (#7822) 2020-04-28 16:15:58 -04:00
Tim Gross ab3086a1f4
e2e: testing reliability (#7701)
* pin CSI plugin versions
* ensure failing CSI tests clean up
* allow NOMAD_SHA env var to override makefile
2020-04-13 10:25:24 -04:00
Mahmood Ali c8eddb9f6b fixup! e2e: add a convenient creation script 2020-04-09 11:04:26 -04:00
Mahmood Ali 8a4937d9ce e2e: add a convenient creation script
Add a convenience Makefile for creating e2e environment for manual
debugging.
2020-04-09 10:54:30 -04:00
Lang Martin c0dbcbef5f
e2e: csi: wait for volume write claims to be released before starting read jobs (#7641) 2020-04-07 07:40:44 -04:00
Tim Gross 50f807060a
e2e: csi tests can only run on linux (#7635) 2020-04-06 11:57:59 -04:00
Tim Gross 73dc2ad443 e2e/csi: add waiting for alloc stop 2020-04-06 10:15:55 -04:00
Tim Gross d81797ea33
e2e: improve test reliability for CSI (#7616)
This changeset:

* adds eval status to the error messages emitted when we have
  placement failure in tests. The implementation here isn't quite
  perfect but it's a lot better than "condition not met".
* enforces the ordering of teardown of the CSI test
* doesn't pass the purge flag to one of the two CSI tests, so that we
  exercise both code paths.
2020-04-03 15:52:58 -04:00
Tim Gross 4c51687cbf
e2e: remove gometa from e2eutils (#7610) 2020-04-03 10:22:22 -04:00
Tim Gross bde13dfc0c
e2e: have TF write-out HCL for CSI volume registration (#7599) 2020-04-02 12:16:43 -04:00
Seth Hoenig fc6b02c817 e2e: minimize Consul ACL policies used in e2e tests
Issue #7523 documents the Consul ACLs used in each Consul interface
used by Nomad. Minimize the policies used in e2e tests so that we
are setting a good example.
2020-03-30 12:53:40 -06:00
Tim Gross cd1c6173f4 csi: e2e tests for EBS and EFS plugins (#7343)
This changeset provides two basic e2e tests for CSI plugins targeting
common AWS use cases.

The EBS test launches the EBS plugin (controller + nodes) and registers
an EBS volume as a Nomad CSI volume. We deploy a job that writes to
the volume, stop that job, and reuse the volume for another job which
should be able to read the data written by the first job.

The EFS test launches the EFS plugin (nodes-only) and registers an EFS
volume as a Nomad CSI volume. We deploy a job that writes to the
volume, stop that job, and reuse the volume for another job which
should be able to read the data written by the first job.

The writer jobs mount the CSI volume at a location within the alloc
dir.
2020-03-23 13:59:18 -04:00
Mahmood Ali 857ddf7aaf e2e: use unique CSI token
Use a unique per-cluster efs creation token, as https://www.terraform.io/docs/providers/aws/r/efs_file_system.html#creation_token.

Using a static value prevents having multiple test clusters.

[ci skip]
2020-03-15 21:55:26 -04:00
Tim Gross 79222c36bf
e2e: add EBS and EFS volumes for testing CSI (#7266)
This changeset adds volumes but does not mount them to instances so
that we can test the mounting ("staging") via CSI plugins. The CSI
plugins themselves will be installed as Nomad jobs.

In order to ensure we can always mount the EFS volume, this changeset
pins the deployment of the cluster to a specific subnet. In future
work we should spread the cluster out among several AZs and test that
behavior explicitly.
2020-03-04 10:44:51 -05:00
Mahmood Ali f5bd51ec30 e2e: avoid parsing Args in pkg init
Golang 1.13 introduced a change in test flag parsing:

> testing
> ...
> Testing flags are now registered in the new Init function, which is invoked by the generated main function for the test. As a result, testing flags are now only registered when running a test binary, and packages that call flag.Parse during package initialization may cause tests to fail.

https://golang.org/doc/go1.13#testing

Here, we ensure that e2e framework parsing occurs in TestMain, by only
initializing Framework at Run invocation.
2020-03-02 14:13:54 -05:00
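The idea of deferring flag parsing until the framework is actually run can be sketched as below. This is a minimal illustration under stated assumptions: the `Framework` type here is a hypothetical stand-in, and it uses its own `flag.FlagSet` so the example runs on its own, whereas the real framework is described as parsing from `TestMain`.

```go
package main

import (
	"flag"
	"fmt"
)

// Framework is a hypothetical, pared-down stand-in for an e2e framework.
type Framework struct {
	fs    *flag.FlagSet
	suite *string
}

// NewFramework only registers flags. It never parses them, so it is safe
// to call during package initialization.
func NewFramework() *Framework {
	fs := flag.NewFlagSet("e2e", flag.ContinueOnError)
	return &Framework{fs: fs, suite: fs.String("suite", "", "only run this suite")}
}

// Run parses the arguments at invocation time rather than at init time.
func (f *Framework) Run(args []string) (string, error) {
	if err := f.fs.Parse(args); err != nil {
		return "", err
	}
	return *f.suite, nil
}

func main() {
	f := NewFramework()
	suite, err := f.Run([]string{"-suite=Connect"})
	fmt.Println(suite, err)
}
```

The point of the split is that constructing the framework has no side effects on flag state; parsing happens only once the caller decides to run.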
Michael Schurter 2ab672c155 test: explicitly pass vars vs enclosing them 2020-02-14 11:10:33 -08:00
Michael Schurter aab1ad8c18 test: remove errgroup to take advantage of vet
go vet would have prevented the bug fixed in
6362e32161295fa959ebe46b93cea0ea1a9bdd72 but our use of errgroup
prevented that.

Rip out errgroup to take advantage of vet, and remove download limiting
now that we're downloading far fewer binaries overall.
2020-02-14 10:53:54 -08:00
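A minimal sketch of the plain-goroutine pattern that this commit and the follow-up "explicitly pass vars" commit point toward, assuming a hypothetical `fetchAll` helper (the actual test code is not shown in this log). Before Go 1.22 a `for` loop variable was shared across iterations, so passing it as an argument gave each goroutine its own copy.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// fetchAll starts one goroutine per URL and passes the loop variable in
// as an argument, so each goroutine works on its own copy.
func fetchAll(urls []string) []string {
	var (
		mu  sync.Mutex
		wg  sync.WaitGroup
		out []string
	)
	for _, u := range urls {
		wg.Add(1)
		go func(url string) { // explicit parameter instead of capturing u
			defer wg.Done()
			mu.Lock()
			defer mu.Unlock()
			out = append(out, "fetched "+url)
		}(u)
	}
	wg.Wait()
	sort.Strings(out) // goroutine order is not deterministic
	return out
}

func main() {
	fmt.Println(fetchAll([]string{"a", "b", "c"}))
}
```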
Michael Schurter fb3e228af6 test: sort vault tests by version 2020-02-14 10:33:17 -08:00
Michael Schurter bc9e35aafb test: capture url to fix flaky test 2020-02-14 10:32:58 -08:00
Michael Schurter 32ecac58b6 test: only test latest Z of each X.Y.Z release 2020-02-14 08:41:45 -08:00
Michael Schurter 8c332a3757
Merge pull request #7102 from hashicorp/test-limits
Fix some race conditions and flaky tests
2020-02-13 10:19:11 -08:00
Michael Schurter 3170dfd452 test: simplify code 2020-02-07 15:50:53 -08:00
Tim Gross 0c6e164e8f
e2e: add --quiet flag to s3 copy to reduce log spam (#7085) 2020-02-06 09:24:20 -05:00
Seth Hoenig 351d32cd81
Merge pull request #7071 from hashicorp/b-e2e-cacls-wait-longer
e2e: wait 2m rather than 10s after disabling consul acls
2020-02-04 14:05:10 -06:00
Drew Bailey 7bee040e61
simplify job, better error 2020-02-04 13:59:39 -05:00
Drew Bailey 8b6de8f3d2
fix check 2020-02-04 12:16:20 -05:00
Drew Bailey b10c7cc94e
rm unused field 2020-02-04 12:02:01 -05:00
Drew Bailey a716d57ad7
clean up 2020-02-04 11:59:28 -05:00
Drew Bailey 75053a0d10
get test passing, new util func to wait for not pending 2020-02-04 11:56:37 -05:00
Drew Bailey 5117a22c30
add e2e test for system sched ineligible nodes 2020-02-04 11:56:33 -05:00
Seth Hoenig f4a66ebd28 e2e: wait 2m rather than 10s after disabling consul acls
Pretty sure Consul / Nomad clients are often not ready yet after
the ConsulACLs test disables ACLs, by the time the next test starts
running.

Running locally things tend to work, but in TeamCity this seems to
be a recurring problem. However, when running locally I sometimes see
in the "show status" step after disabling ACLs that some nodes are
still initializing, suggesting we're right on the border of not waiting
long enough:

    nomad node status
    ID        DC   Name              Class   Drain  Eligibility  Status
    0e4dfce2  dc1  EC2AMAZ-JB3NF9P   <none>  false  eligible     ready
    6b90aa06  dc2  ip-172-31-16-225  <none>  false  eligible     ready
    7068558a  dc2  ip-172-31-20-143  <none>  false  eligible     ready
    e0ae3c5c  dc1  ip-172-31-25-165  <none>  false  eligible     ready
    15b59ed6  dc1  ip-172-31-23-199  <none>  false  eligible     initializing

Going to try waiting a full 2 minutes after disabling ACLs, hopefully that
will help things Just Work. In the future, we should probably be parsing the
output of the status checks and actually confirming all nodes are ready.

Even better, maybe that's something shipyard will have built-in.
2020-02-04 10:51:03 -06:00
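The follow-up suggested in this message, parsing the status output and confirming every node is ready, might look roughly like the sketch below. It assumes the table layout shown in the message (status in the last column, no whitespace inside a column); `allNodesReady` is a hypothetical helper, not part of the e2e framework.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// allNodesReady scans `nomad node status`-style table output and reports
// whether every node row ends with the status "ready". It also returns
// the IDs of the nodes that are not ready.
func allNodesReady(output string) (bool, []string) {
	var notReady []string
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Skip blank lines and the header row.
		if len(fields) < 2 || fields[0] == "ID" {
			continue
		}
		if fields[len(fields)-1] != "ready" {
			notReady = append(notReady, fields[0])
		}
	}
	return len(notReady) == 0, notReady
}

func main() {
	out := `ID        DC   Name              Class   Drain  Eligibility  Status
0e4dfce2  dc1  EC2AMAZ-JB3NF9P   <none>  false  eligible     ready
15b59ed6  dc1  ip-172-31-23-199  <none>  false  eligible     initializing`
	fmt.Println(allNodesReady(out))
}
```

A caller could poll this check until it passes instead of sleeping for a fixed two minutes.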
Tim Gross 0b48baf0ba
e2e: rename linux runner to avoid implicit build tag (#7070)
Go implicitly treats files ending with `_linux.go` as build tagged for
Linux only. This broke the e2e provisioning framework on macOS once we
tried importing it into the `e2e/consulacls` module.
2020-02-04 10:55:38 -05:00
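To illustrate the file-name rule that caused this, the sketch below guesses the operating system implied by a Go file name. It is deliberately simplified: the `knownOS` list is a short illustrative subset, `GOOS_GOARCH` and bare `GOARCH` suffixes are not handled, and the file names used are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// knownOS is a deliberately short, illustrative subset of GOOS values.
var knownOS = map[string]bool{"linux": true, "darwin": true, "windows": true}

// implicitGOOS returns the operating system implied by a Go file name such
// as "runner_linux.go", or "" if this simplified check finds none. It does
// not handle GOARCH suffixes (for example "_linux_amd64.go").
func implicitGOOS(name string) string {
	base := strings.TrimSuffix(name, ".go")
	base = strings.TrimSuffix(base, "_test")
	i := strings.LastIndex(base, "_")
	if i <= 0 {
		// No underscore-separated suffix, so no implied constraint.
		return ""
	}
	if goos := base[i+1:]; knownOS[goos] {
		return goos
	}
	return ""
}

func main() {
	for _, name := range []string{"runner_linux.go", "linux_runner.go", "linux.go"} {
		fmt.Printf("%s -> %q\n", name, implicitGOOS(name))
	}
}
```

Moving the OS name out of the trailing position, as in the hypothetical `linux_runner.go`, is one way a rename avoids the implicit constraint.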
Tim Gross 940110b2de
e2e: improve provisioning defaults and documentation (#7062)
This changeset improves the ergonomics of running the Nomad e2e test
provisioning process by defaulting to a blank `nomad_sha` in the
Terraform configuration. By default, a user will now need to pass in
one of the Nomad version flags. But they won't have to manually edit
the `provisioning.json` file for the common case of deploying a
released version of Nomad, and won't need to put dummy values for
`nomad_sha`.

Includes general documentation improvements.
2020-02-04 10:37:00 -05:00
Seth Hoenig 653c8fe9a5 e2e: turn no-ACLs connect tests back on
Also cleanup more missed debugging things >.>
2020-02-03 20:46:36 -06:00
Mahmood Ali 2424870937
Merge pull request #7055 from hashicorp/r-dev-tweaks-20200203
Grab bag of dev tweaks
2020-02-03 14:25:06 -05:00
Mahmood Ali 7171488e81 run "make hclfmt" 2020-02-03 12:15:53 -05:00
Seth Hoenig 057179edea e2e: remove leftover debug println statement 2020-02-03 11:15:38 -06:00
Seth Hoenig 9b20ca5b25 e2e: setup consul ACLs a little more correctly 2020-01-31 19:06:11 -06:00
Seth Hoenig 83c717a624 e2e: remove redundant extra API call for getting allocs 2020-01-31 19:06:07 -06:00
Seth Hoenig b212654b92 e2e: agent token was only being set for server0 2020-01-31 19:06:03 -06:00
Seth Hoenig f7a1e9cee3 e2e: use hclfmt on consul acls policy config files 2020-01-31 19:05:59 -06:00
Seth Hoenig e9e0d2e3fc e2e: uncomment test case that is not broken 2020-01-31 19:05:55 -06:00
Seth Hoenig df633ee45f e2e: do not use eventually when waiting for allocs
This test is causing panics. Unlike the other similar tests, this
one is using require.Eventually which is doing something bad, and
this change replaces it with a for-loop like the other tests.

Failure:

=== RUN   TestE2E/Connect
=== RUN   TestE2E/Connect/*connect.ConnectE2ETest
=== RUN   TestE2E/Connect/*connect.ConnectE2ETest/TestConnectDemo
=== RUN   TestE2E/Connect/*connect.ConnectE2ETest/TestMultiServiceConnect
=== RUN   TestE2E/Connect/*connect.ConnectClientStateE2ETest
panic: Fail in goroutine after TestE2E/Connect/*connect.ConnectE2ETest has completed

goroutine 38 [running]:
testing.(*common).Fail(0xc000656500)
	/opt/google/go/src/testing/testing.go:565 +0x11e
testing.(*common).Fail(0xc000656100)
	/opt/google/go/src/testing/testing.go:559 +0x96
testing.(*common).FailNow(0xc000656100)
	/opt/google/go/src/testing/testing.go:587 +0x2b
testing.(*common).Fatalf(0xc000656100, 0x1512f90, 0x10, 0xc000675f88, 0x1, 0x1)
	/opt/google/go/src/testing/testing.go:672 +0x91
github.com/hashicorp/nomad/e2e/connect.(*ConnectE2ETest).TestMultiServiceConnect.func1(0x0)
	/home/shoenig/go/src/github.com/hashicorp/nomad/e2e/connect/multi_service.go:72 +0x296
github.com/hashicorp/nomad/vendor/github.com/stretchr/testify/assert.Eventually.func1(0xc0004962a0, 0xc0002338f0)
	/home/shoenig/go/src/github.com/hashicorp/nomad/vendor/github.com/stretchr/testify/assert/assertions.go:1494 +0x27
created by github.com/hashicorp/nomad/vendor/github.com/stretchr/testify/assert.Eventually
	/home/shoenig/go/src/github.com/hashicorp/nomad/vendor/github.com/stretchr/testify/assert/assertions.go:1493 +0x272
FAIL	github.com/hashicorp/nomad/e2e	21.427s
2020-01-31 19:05:47 -06:00
Seth Hoenig 5e5fadbcdf e2e: remove forgotten unused field from new struct 2020-01-31 19:05:41 -06:00
Seth Hoenig fc498c2b96 e2e: e2e test for connect with consul acls
Provide script for managing Consul ACLs on a TF provisioned cluster for
e2e testing. Script can be used to 'enable' or 'disable' Consul ACLs,
and automatically takes care of the bootstrapping process if necessary.

The bootstrapping process takes a long time, so we may need to
extend the overall e2e timeout (20 minutes seems fine).

Introduces basic tests for Consul Connect with ACLs.
2020-01-31 19:05:36 -06:00
Seth Hoenig 93d347442f e2e: add a -suite flag to e2e.Framework
This change allows for providing the -suite=<Name> flag when
running the e2e framework. If set, only the matching e2e/Framework.TestSuite.Component
will be run, and all other suites will be skipped.
2020-01-29 14:57:43 -06:00
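The selection behavior described for `-suite` can be sketched as a simple filter. The `TestSuite` type here is a hypothetical minimal stand-in, and exact string matching on `Component` is an assumption; the commit message says only that the matching suite is run.

```go
package main

import "fmt"

// TestSuite is a hypothetical, minimal stand-in for a framework suite.
type TestSuite struct {
	Component string
}

// selectSuites returns the suites to run: all of them when filter is
// empty, otherwise only those whose Component matches exactly.
func selectSuites(suites []TestSuite, filter string) []TestSuite {
	if filter == "" {
		return suites
	}
	var out []TestSuite
	for _, s := range suites {
		if s.Component == filter {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	suites := []TestSuite{{Component: "Connect"}, {Component: "Vault"}}
	fmt.Println(selectSuites(suites, "Connect"))
	fmt.Println(len(selectSuites(suites, "")))
}
```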
Drew Bailey da4af9bef3
fix tests, update changelog 2020-01-29 13:55:39 -05:00
Tim Gross 7681f09ae4
e2e: packer builds should not be public (#6998) 2020-01-27 16:28:25 -05:00
Michael Schurter ed926a9d03
Merge pull request #6938 from hashicorp/e2e-vault
test: download Vault binaries for e2e test
2020-01-27 10:26:48 -08:00
Tim Gross 457e3ad5c6
e2e: document e2e provisioning process (#6976) 2020-01-22 16:55:17 -05:00
Tim Gross 29e1ed6b05
e2e: ensure group script check tests interpolation (#6972)
Fixes a bug introduced in 0aa58b9 where we're writing a test file to
a taskdir-interpolated location, which works when we `alloc exec` but
not in the jobspec for a group script check.

This changeset also makes the test safe to run multiple times by
namespacing the file with the alloc ID, which has the added bonus of
exercising our alloc interpolation code for group script checks.
2020-01-22 09:54:54 -05:00
Tim Gross 2edbdfc8be
e2e: update framework to allow deploying Nomad (#6969)
The e2e framework instantiates clients for Nomad/Consul but the
provisioning of the actual Nomad cluster is left to Terraform. The
Terraform provisioning process uses `remote-exec` to deploy specific
versions of Nomad so that we don't have to bake an AMI every time we
want to test a new version. But Terraform treats the resulting
instances as immutable, so we can't use the same tooling to update the
version of Nomad in-place. This is a prerequisite for upgrade testing.

This changeset extends the e2e framework to provide the option of
deploying Nomad (and, in the future, Consul/Vault) with specific
versions to running infrastructure. This initial implementation is
focused on deploying to a single cluster via `ssh` (because that's our
current need), but provides interfaces to hook the test run at the
start of the run, the start of each suite, or the start of a given
test case.

Terraform work includes:
* provides Terraform output, written to JSON, that the framework uses
  to configure provisioning via `terraform output provisioning`.
* provides Terraform output that can be used by test operators to
  configure their shell via `$(terraform output environment)`
* drops `remote-exec` provisioning steps from Terraform
* makes changes to the deployment scripts to ensure they can be run
  multiple times w/ different versions against the same host.
2020-01-22 08:48:52 -05:00
Tim Gross d6aac915a7
e2e: use valid jobspec for group check test (#6967)
Group service checks cannot interpolate task fields, because the task
fields are not available at the time the script check hook is created
for the group service. When f31482a was merged this e2e test began
failing because we are now correctly matching the script check ID to
the service ID, which revealed this jobspec was invalid.
2020-01-21 15:54:46 -05:00
Tim Gross 1e600d573d
e2e: improve reusability of provisioning scripts (#6942)
This changeset is part of the work to improve our E2E provisioning
process to allow our upgrade tests:

* Move more of the setup into the AMI image creation so it's a little
 more obvious to provisioning config authors which bits are essential
 to deploying a specific version of Nomad.

* Make the service file update do a systemd daemon-reload so that we
  can update an already-running cluster with the same script we use to
  deploy it initially.
2020-01-16 09:29:36 -05:00
Michael Schurter ffbfb60f40 test: restore e2e-test target and use -integration 2020-01-14 13:47:51 -08:00
Michael Schurter da4645e9a4 test: download Vault binaries for e2e test
Modernize Vault integration/e2e test a bit:

- Download from releases.hashicorp.com instead of using a hardcoded list
- Remove old unused make target e2e-test
- Use NOMAD_E2E env var instead of -integration flag
- Add a README

On my machine with ~250 Mbps internet it takes ~400s to download all
Vault binaries.
2020-01-14 11:02:02 -08:00
Nick Ethier 1f28633954
Merge pull request #6816 from hashicorp/b-multiple-envoy
connect: configure envoy to support multiple sidecars in the same alloc
2020-01-09 23:25:39 -05:00
Tim Gross b5bcfb533b
upgrade CNI plugins to 0.8.4 (#6921)
When multiple Connect-enabled task groups start on the same client
node, a race condition in the CNI plugins for creating iptables chains
causes one of the tasks to fail. We upstreamed a patch to CNI plugins
to make iptables chain creation idempotent.

This changeset updates end-to-end testing, development tooling, and
documentation to use 0.8.4 which includes our patch.
2020-01-09 10:57:07 -05:00
Tim Gross c11cc60674 commit a hclfmt to eliminate diffs after 'make dev' 2020-01-09 08:18:51 -05:00
Nick Ethier 7b931522f0 e2e: add test for multiple service sidecars in the same alloc 2020-01-06 12:48:35 -05:00
Tim Gross 4ba5691656
e2e: give metrics longer to settle (#6884)
Increase the shortened timeout after the first loop so that metrics
that take longer to come in aren't failing the test unnecessarily.

Move the check for empty alloc metrics into the loop so that if the
first values we get are empty we don't fail the test too early.
2019-12-20 10:39:35 -05:00
Tim Gross 9b2b4da3a4
e2e: run client/allocs metrics nightly tests vs Windows (#6850)
Adds Windows targets to the client/allocs metrics tests. Removes the
`allocstats` test, which covers less than these tests and is now
redundant.

Adds a firewall rule to our Windows instances so that the prometheus
server can scrape the Nomad HTTP API for metrics.
2019-12-16 08:34:17 -05:00
Tim Gross e439e927ed
e2e: run client/allocs metrics tests nightly (#6842)
Refactor the metrics end-to-end tests so they can be run with our e2e
test framework. Runs fabio/prometheus and a collection of jobs that
will cause metrics to be measured. We then query Prometheus to ensure
we're publishing those allocation metrics and some metrics from the
clients as well.

Includes adding a placeholder for running the same tests on Windows.
2019-12-12 12:45:16 -05:00
Seth Hoenig d81a091ccd
Merge pull request #6752 from hashicorp/docs-vault-token_period
docs: vault integration docs should reference new token_period field
2019-12-02 16:21:17 -05:00
Seth Hoenig 953e40c8ed docs: vault integration docs should reference new token_explicit_max_ttl field 2019-12-02 14:22:47 -06:00
Tim Gross 88cb95261b
e2e: add allocstats test for Windows (#6775)
Extends the BasicAllocStats test to include a test for Windows
clients, exercising stats via a powershell `raw_exec` job.

Adds `ListLinuxClientNodes` and `ListWindowsClientNodes` utils so that
we can scope tests to run only when Linux or Windows clients are
available. This prevents waiting on timeouts when running a subset of
the tests against a development cluster (vs our nightly test
cluster).
2019-11-26 08:05:42 -05:00
Mahmood Ali e626a145c6
Merge pull request #6713 from alrs/fix-e2e-cli-close-before-error
e2e/cli/command: close after error handling
2019-11-25 14:03:25 -05:00
Lars Lehtonen c9383ca17d
e2e/cli/command: Wait() after execution 2019-11-25 10:56:40 -08:00
Tim Gross c9d92f845f
e2e: add a Windows client to test runner (#6735)
* Adds a constraint to prevent tests from landing on Windows
* Improve Terraform output for mixed windows/linux clients
* Makes some Windows client config fixes from 0.10.2 testing
2019-11-25 13:31:00 -05:00
Tim Gross e012c2b5bf
Infrastructure for Windows e2e testing (#6584)
Includes:
* baseline Windows AMI
* initial pass at Terraform configurations
* OpenSSH for Windows

Using OpenSSH is a lot nicer for Nomad developers than winrm would be,
plus it lets us avoid passing around the Windows password in the
clear.

Note that now we're copying up all the provisioning scripts and
configs as a zipped bundle because TF's file provisioner dies in the
middle of pushing up multiple files (whereas `scp -r` works fine).

We're also running all the provisioning scripts inside the userdata by
polling for the zip file to show up (gross!). This is because
`remote-exec` provisioners are failing on Windows with the same symptoms as:

https://github.com/hashicorp/terraform/issues/17728

If we can't fix this, it'll prevent us from having multiple Windows
clients running until TF supports count interpolation in the
`template_file`, which is planned for a later 0.12 release.
2019-11-19 11:06:10 -05:00
Tim Gross 1210261fe2
hclfmt nomad jobspecs (#6724) 2019-11-19 10:36:41 -05:00
Drew Bailey 2befab6900
Merge pull request #6573 from hashicorp/update-cci-consul
updates default consul version to 1.6.1
2019-11-07 11:01:22 -05:00
Drew Bailey 1c2af019c6
update vagrant & packer consul versions 2019-11-07 10:13:14 -05:00
Drew Bailey 786989dbe3
New monitor pkg for shared monitor functionality
Adds new package that can be used by client and server RPC endpoints to
facilitate monitoring based off of a logger

clean up old code

small comment about write

rm old comment about minsize

rename to Monitor

Removes connection logic from monitor command

Keep connection logic in endpoints, use a channel to send results from
monitoring

use new multisink logger and interfaces

small test for dropped messages

update go-hclogger and update sink/intercept logger interfaces
2019-11-05 09:51:49 -05:00
Tim Gross 3e9ae481ce
e2e: refactor Consul configurations (#6559)
Ensure that we're reusing the base configuration between client and
servers without the possibility of drift. Reduce the amount of `sed`
mangling of the configuration file, and make recommended changes from
`shellcheck` for this section of the provisioning script.

Fixes some rebase errors on the Nomad config as well.
2019-10-28 09:27:40 -04:00
Tim Gross ba7e7413ef
e2e: refactor Nomad configuration (#6560)
Share base configuration for telemetry and consul. Have the server
configurations respect the `var.server_count` config. Make changes
recommended by `shellcheck` in the provisioning scripts for this section.

Switch to OS/arch-tagged release bundles on S3 for compatibility with
adding Windows builds in the near future.
2019-10-28 08:21:02 -04:00
Tim Gross 8be403f47b
e2e: refactor Vault configuration (#6561)
Match the configuration directory layout we're using for Consul and
other services. Make recommended changes from `shellcheck` for this
section of the provisioning script.
2019-10-25 15:29:01 -04:00
Tim Gross 87b3abddd3
e2e: use sockaddr for IP address configuration (#6548)
Update the Consul and Vault configs to take advantage of their
included `go-sockaddr` library for getting the IP addresses we need in
a portable way. This particularly avoids problems with "predictable"
interface names provided by systemd.

Also adds the `sockaddr` binary to the Packer build so we can use it
in our provisioning scripts.
2019-10-25 14:08:38 -04:00
Tim Gross efbd680d4e
e2e: split Packer build scripts from TF provisioning (#6542)
Make a clear split between Packer and Terraform provisioning steps:
the scripts in the `packer/linux` directory are run when we build the
AMI, whereas the scripts in `shared` are run at Terraform provisioning time.

Merging all runtime provisioning scripts into a single script for each
of server/client solves the following:

* Userdata scripts can't take arguments, they can only be templated
  and that means we have to do TF escaping in bash/powershell scripts.
* TF provisioning scripts race with userdata scripts.
2019-10-25 08:08:24 -04:00
Tim Gross c648c4f998
e2e: upgrade terraform to 0.12.x (#6489) 2019-10-14 11:27:08 -04:00
Tim Gross 15e912ddd6
e2e: move remote-exec inline to script (#6488)
A failing script in a `remote-exec` provisioner's `inline` stanza
won't fail the provisioning step. This lets us continue on to execute
tests against potentially broken deployments, rather than letting us
know the provisioning itself failed.
2019-10-14 10:23:41 -04:00
Danielle Lancashire 199d24d6bf
chore: initial hclfmt 2019-10-11 14:00:05 +02:00
Lang Martin 0648402150
Merge pull request #6373 from hashicorp/b-raft-proto-upgrade
raft protocol defaults to version 2
2019-09-26 14:33:09 -04:00
Tim Gross d965a15490 driver/networking: don't recreate existing network namespaces 2019-09-25 14:58:17 -04:00
Tim Gross e86a476bbb failing test for #6310 2019-09-25 14:58:17 -04:00
Lang Martin 6e0ec6302b script e2e/upgrades: cluster upgrade scripts 2019-09-24 14:35:45 -04:00
Danielle 940bbcc639
Merge pull request #6342 from hashicorp/f-host-volume-e2e
Add Host Volumes E2E test
2019-09-18 12:59:32 -07:00
Tim Gross adde9acf57
e2e: test infra for client node restarts (#6313)
Add a test helper that restarts a specific client node running under
systemd using a `raw_exec` job.
2019-09-18 10:10:14 -04:00
Tim Gross 7061dcef4b
e2e: move consul status check helpers to e2eutil (#6314) 2019-09-18 08:18:19 -04:00
Danielle Lancashire 05d172ef2b
e2e: init host volumes test 2019-09-18 00:34:48 +02:00
Danielle Lancashire c50d7f2727
e2e: Add Host Volume Configuration 2019-09-17 20:06:50 +02:00
Tim Gross 55ee7a220b
e2e: fixes for race conditions in testing (#6300)
- In script checks, ensure we're running `Exec` against the new running
  allocation and not the earlier stopped one.
- In script checks, allow `Exec` calls to error due to lack of pty when
  we use the exec to kill the task.
- In `utils.go/RegisterAllocs`, force query for allocations to wait on
  wait index returned by registration call.
2019-09-10 13:45:16 -04:00
Tim Gross 3469c50275
e2e: tag instances with origin (#6293)
When multiple developers are working on e2e testing, it helps to be
able to identify which infrastructure belongs to which Nomad SHA and
which developer. This adds tags to the EC2 instances.
2019-09-06 15:49:18 -04:00
Tim Gross ede48ae19c script checks: use cat instead of ls for exit code agreement 2019-09-06 11:17:00 -04:00
Tim Gross c9c612cc70 e2e: script check testing 2019-09-06 10:18:55 -04:00
Michael Schurter 228899c32f e2e: test demo job for connect 2019-09-04 12:40:08 -07:00
Tim Gross 7ee3333a2d e2e: filter default AMI by OS
Add an OS tag to Packer builds of our e2e test AMIs and then filter
by this in Terraform.
2019-08-30 16:51:13 -04:00
Danielle Lancashire d454dab39b
chore: Format hcl configurations 2019-07-20 16:55:07 +02:00
Michael Schurter a3fcb8fcca e2e: debug log level for everyone! 2019-07-18 06:55:27 -07:00
Michael Schurter ea68c930fe e2e: enable_debug=true for all agents
Enables the pprof http endpoint for debugging.
2019-07-17 15:20:45 -07:00
Preetha 0a2e21353f
Merge pull request #5912 from hashicorp/f-systemd-nofile
systemd: set a high but non-infinite fd limit
2019-07-11 12:31:12 -05:00
Preetha Appan 53397722f1
add module version constraint to e2e/terraform 2019-07-05 09:18:38 -05:00
Michael Schurter 803aa62b7a systemd: set a high but non-infinite fd limit 2019-07-02 09:13:24 -07:00
Lang Martin d15d09bcc1 e2e update shell scripts argument quoting 2019-06-04 15:52:32 -04:00
Lang Martin 071dccfcce e2e/deployment DeploymentsForJob fail instead of nil, error passing 2019-06-04 14:31:42 -04:00
Lang Martin fa09e5d5f4 e2e/deployment fail if the second deployment times out 2019-06-04 14:08:30 -04:00
Lang Martin e61597a098 e2e bin/update and bin/run, README 2019-06-04 13:42:07 -04:00
Lang Martin 1635fa3c00 e2e/deployment find the second deployment, use its status 2019-06-04 13:41:52 -04:00
Lang Martin e027b9001b Update e2e/deployment/deployment.go
Co-Authored-By: Mahmood Ali <mahmood@notnoop.com>
2019-05-22 12:34:57 -04:00
Lang Martin 7929ef28c7 e2e/deployment comment the job files for clarity 2019-05-22 12:34:57 -04:00
Lang Martin fe69f89476 e2e add deployment to the list of e2e tests, minor fixes 2019-05-22 12:34:57 -04:00
Lang Martin 2a11d66258 e2e readme minor changes to command + env val templates and order 2019-05-22 12:34:57 -04:00
Lang Martin 97fd114535 e2e utils remove ineffectual assignment of allocs 2019-05-22 12:34:57 -04:00
Lang Martin 01276455bd e2e README typo 2019-05-22 12:34:57 -04:00
Lang Martin 824d1366dd e2e utils error format arg match 2019-05-22 12:32:08 -04:00
Lang Martin 09a6dc2054 new e2e deployment test 2019-05-22 12:32:08 -04:00
Lang Martin d73606e54e e2e util split new alloc and await placement, new WaitForDeployment 2019-05-22 12:32:08 -04:00
Preetha 2dcd4291f8
Merge pull request #5702 from hashicorp/f-filter-by-create-index
Filter deployments by create index
2019-05-15 21:50:41 -05:00
Michael Schurter 2b7f398726 e2e: fix nomad service for systemd<230 2019-05-14 10:53:26 -07:00
Preetha Appan 07690d6f9e
Add flag similar to --all for allocs to be able to filter deployments by latest 2019-05-13 18:33:41 -05:00
Mahmood Ali 919827f2df
Merge pull request #5632 from hashicorp/f-nomad-exec-parts-01-base
nomad exec part 1: plumbing and docker driver
2019-05-09 18:09:27 -04:00
Mahmood Ali 2a555a7e74 add e2e tests for nomad exec 2019-05-09 16:49:08 -04:00
Michael Schurter a1c3ce36bc
Merge pull request #5647 from hashicorp/e2e-tf
E2E Test Terraform/Packer Improvements
2019-05-06 15:42:52 -07:00
Mahmood Ali bfc907827c docs: update s3 urls to use virtual bucket style
In response to https://forums.aws.amazon.com/ann.jspa?annID=6776
2019-05-06 10:39:51 -04:00
Michael Schurter 93f3ac7a9c e2e: explain these scripts are for packer
It took me way too long to figure out these weren't used by TF.
2019-05-03 07:55:28 -07:00
Michael Schurter 7558747694 e2e: let the unindex clients do anything...
...and be debuggable!
2019-05-03 07:54:55 -07:00
Michael Schurter f08dd66ffa e2e: ssh instructions + remove redundant naming 2019-05-03 07:54:34 -07:00
Michael Schurter 19889d6468 e2e: update deps and install nomad in packer
Nomad on the packer image will be overwritten by the sha specified in
the TF var, but including a base version on the packer image makes the
image valid for independent use.
2019-05-03 07:53:08 -07:00
Michael Schurter 13b62a68f7 e2e: enable systemd units so they start on boot 2019-05-03 07:52:03 -07:00
Chris Baker c0a7aee610
vault e2e: pass vault version into setup instead of having to infer it from test name 2019-04-10 10:34:10 -05:00
Chris Baker b290d774bc
e2e/vault: updated e2e vault tests to use version-specific policy creation endpoint (old servers are not compatible with new client) 2019-04-10 10:34:10 -05:00
Preetha Appan f0e2859c59
scripts for upgrade testing 2019-04-04 22:31:57 -05:00
Preetha Appan 17d4e80c16
small tweaks to load test jobs to make them work in Nomad 0.8.7 2019-04-02 20:38:56 -05:00
Preetha Appan 19b4bb7ec3
added cpu/disk/memory stress jobs for e2e tests 2019-04-01 22:28:18 -05:00
Preetha Appan 007d771174
Added nginx to e2e test 2019-04-01 14:52:58 -05:00
Preetha Appan a262be08e7
Remove unnecessary step in getting node client
All allocation stats are routable from the server
2019-04-01 10:45:41 -05:00
Preetha Appan f9019ae605
Add e2e test with raw exec job for verifying allocation resource stats 2019-03-31 09:46:23 -05:00
Preetha Appan dc370d2e6f
Use specific url prefix for metrics test
Also changed the output to show client node IP addresses
2019-03-27 11:04:06 -05:00
Michael Schurter 1fab175b26 test: properly skip client state in beforeall 2019-03-22 06:42:04 -07:00
Preetha fac6d8c918
Merge pull request #5405 from hashicorp/e2e_metrics
Prometheus metrics for the e2e environment
2019-03-21 09:30:12 -05:00
Preetha Appan bf8483c960
remove stray println 2019-03-21 09:23:37 -05:00
Michael Schurter 555d6d35ce test: skip slow state test without flag 2019-03-21 07:17:02 -07:00
Michael Schurter cd87afd15f e2e: add NomadAgent and basic client state test
The e2e test code is absolutely hideous and leaks processes and files
on disk. NomadAgent seems useful, but the clientstate e2e tests are very
messy and slow. The last test "Corrupt" is probably the most useful as
it explicitly corrupts the state file whereas the other tests attempt to
reproduce steps thought to cause corruption in earlier releases of
Nomad.
2019-03-21 07:14:34 -07:00
Michael Schurter 30db07cccb docs: sync systemd unit files; update deploy guide
The systemd configs spread across our repo were fairly out of sync. This
should get them on our best practices.

The deployment guide also had some strange things like running Nomad as
a non-root user. It would be fine for servers but completely breaks
clients. For simplicity I simply removed the non-root user references.
2019-03-19 15:18:12 -07:00
Preetha Appan 59e5ee18b0
Removed use of e2e framework 2019-03-11 09:21:04 -05:00
Preetha Appan 428f80afcc
prometheus and fabio for metrics 2019-03-11 09:21:04 -05:00
Mahmood Ali bd11a4c985 tests: disable upgrade e2e tests
Upgrade e2e tests are failing and we haven't had bandwidth to fix them yet.
Having them fail makes it easy for us to miss other failures and
regressions.

As such, skip the upgrade e2e tests until we fix them.
2019-02-27 08:40:09 -05:00
Alex Dadgar b938a71397 make nomad upgrade e2e build on non linux 2019-01-28 11:18:59 -08:00
Preetha 48b4ed91a3
Merge pull request #5249 from hashicorp/consul_e2e
Consul e2e tests
2019-01-28 10:46:49 -06:00
Preetha Appan 25a29cb52f
Moved in place upgrade canary test over to new e2e framework 2019-01-27 20:15:35 -06:00
Preetha Appan b4a722e08f
Basic consul registration e2e 2019-01-26 10:58:25 -06:00
Alex Dadgar d6412fd8e7 Fix double restart counting for templates
This PR fixes an issue where template restarts would count twice since
it was emitting a restarting event.
2019-01-25 15:38:13 -08:00
Preetha Appan 419a2682b0
update to Consul 1.4.0 for e2e tests 2019-01-24 09:52:15 -06:00
Nick Ethier e104528288
e2e/nomad09upgrade: remove test file that is not needed 2019-01-23 21:23:13 -05:00
Nick Ethier 22da77f3fa
e2e: removed unused code in nomad09upgrade suite 2019-01-23 20:54:23 -05:00
Nick Ethier ccdc1c4615
Merge branch 'f-driver-upgradepath-test' of https://github.com/hashicorp/nomad into f-driver-upgradepath-test
* 'f-driver-upgradepath-test' of https://github.com/hashicorp/nomad:
  Apply suggestions from code review
2019-01-23 20:26:32 -05:00
Nick Ethier 5b9013528e
drivers: add docker upgrade path and e2e test 2019-01-23 14:44:42 -05:00
Michael Schurter ce4a828fd1
Apply suggestions from code review
Co-Authored-By: nickethier <ncethier@gmail.com>
2019-01-23 14:09:49 -05:00
Nick Ethier a3510413ff
e2e: remove unused script 2019-01-22 23:29:15 -05:00
Nick Ethier fc370c3d5e
e2e/nomad09upgrade: add comments 2019-01-22 23:25:27 -05:00
Nick Ethier c30965eaf9
e2e: add tests for nomad driver upgrade path 2019-01-17 23:32:45 -05:00
Michael Schurter 641ba37c61 e2e: test task events for a failed sibling task 2019-01-08 14:39:37 -08:00
Michael Schurter b09c68ceaf e2e: wait for at least N nodes to be ready
Before it was *exactly* N nodes which limited test portability between
clusters.
2019-01-08 14:39:37 -08:00
Michael Schurter 2cf49f9121 e2e: add task events tests 2019-01-08 07:20:53 -08:00
Danielle Tomlinson d195680ec1 e2e: Add consultemplate test
This adds a basic test for consul template, that verifies the behaviour
of consul-template with task blocking and restarting of tasks
2019-01-07 17:53:55 +01:00
Danielle Tomlinson c13dc7f110
Merge pull request #5149 from hashicorp/dani/e2e-friendly
e2e: Output setup instructions after terraform
2019-01-04 22:14:03 +01:00
Danielle Tomlinson 33547c99e7 e2e: Output setup instructions after terraform
This adds a message that provides environment setup instructions for
running e2e tests after running terraform apply.

This allows copy/pasting exports, rather than manually constructing
them.
2019-01-04 16:55:14 +01:00
Mahmood Ali 606ab23235 goimport file 2019-01-04 08:53:50 -05:00
Preetha Appan 378dd74d2a
Added waiting on client node ready state before running e2e tests 2019-01-03 16:16:20 -06:00
Preetha 1e69a6645f
Update README.md 2019-01-03 16:15:59 -06:00
Preetha 5501ff42c9
Update README.md 2019-01-03 15:31:19 -06:00
Preetha Appan f458cb63dd
Increase alloc wait timeout in e2e test 2019-01-03 14:02:02 -06:00
Preetha 9e235f4cb6
Update e2e readme 2019-01-03 13:24:58 -06:00
Preetha 0071307414
Update README.md 2019-01-03 13:19:04 -06:00
Preetha 758ae0ca7c
Update README.md 2019-01-03 12:12:43 -06:00
Preetha Appan d182c0f5cd
Increase timeout in e2e test 2019-01-03 11:22:21 -06:00
Danielle Tomlinson d3b41a26c4 e2e: goimports e2eutil/utils.go 2019-01-03 13:31:49 +01:00
Preetha Appan 2845a556a3
Clean up map update code 2018-12-20 15:12:48 -06:00
Preetha Appan 1bebce3525
new e2e test for spread, and refactor affinity tests to share util methods 2018-12-19 21:25:32 -06:00