238 Commits

Author SHA1 Message Date
Tim Gross cd1c6173f4 csi: e2e tests for EBS and EFS plugins (#7343)
This changeset provides two basic e2e tests for CSI plugins targeting
common AWS use cases.

The EBS test launches the EBS plugin (controller + nodes) and registers
an EBS volume as a Nomad CSI volume. We deploy a job that writes to
the volume, stop that job, and reuse the volume for another job which
should be able to read the data written by the first job.

The EFS test launches the EFS plugin (nodes-only) and registers an EFS
volume as a Nomad CSI volume. We deploy a job that writes to the
volume, stop that job, and reuse the volume for another job which
should be able to read the data written by the first job.

The writer jobs mount the CSI volume at a location within the alloc
dir.
2020-03-23 13:59:18 -04:00
Mahmood Ali 857ddf7aaf e2e: use unique CSI token
Use a unique per-cluster EFS creation token; see https://www.terraform.io/docs/providers/aws/r/efs_file_system.html#creation_token.

Using a static value prevents having multiple test clusters.

[ci skip]
2020-03-15 21:55:26 -04:00
Tim Gross 79222c36bf
e2e: add EBS and EFS volumes for testing CSI (#7266)
This changeset adds volumes but does not mount them to instances so
that we can test the mounting ("staging") via CSI plugins. The CSI
plugins themselves will be installed as Nomad jobs.

In order to ensure we can always mount the EFS volume, this changeset
pins the deployment of the cluster to a specific subnet. In future
work we should spread the cluster out among several AZs and test that
behavior explicitly.
2020-03-04 10:44:51 -05:00
Mahmood Ali f5bd51ec30 e2e: avoid parsing Args in pkg init
Golang 1.13 introduced a change in test flag parsing:

> testing
> ...
> Testing flags are now registered in the new Init function, which is invoked by the generated main function for the test. As a result, testing flags are now only registered when running a test binary, and packages that call flag.Parse during package initialization may cause tests to fail.

https://golang.org/doc/go1.13#testing

Here, we ensure that e2e framework flag parsing occurs in TestMain, by
only initializing Framework at Run invocation.
2020-03-02 14:13:54 -05:00
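The fix in f5bd51ec30 follows a general pattern: do no flag parsing during package initialization, and parse only once the test binary's main function is running. The sketch below is a minimal illustration, not the framework's actual code. `options`, `parseArgs`, and the private FlagSet are hypothetical names; only the `-suite` flag name is borrowed from elsewhere in this log.

```go
package main

import "flag"

// options holds command-line settings for a hypothetical e2e framework.
type options struct {
	suite string
}

// parseArgs parses args on a private FlagSet. Nothing here runs at package
// initialization, so it cannot run ahead of the flags that the testing
// package registers later, in its Init function (Go 1.13+).
func parseArgs(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("e2e", flag.ContinueOnError)
	fs.StringVar(&o.suite, "suite", "", "run only the named suite")
	if err := fs.Parse(args); err != nil {
		return options{}, err
	}
	return o, nil
}
```

A framework that shares the global flag set would instead call `flag.Parse()` from `TestMain` (or from a `Run` method reached from it); the private FlagSet here only keeps the sketch testable in isolation.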
Michael Schurter 2ab672c155 test: explicitly pass vars vs enclosing them 2020-02-14 11:10:33 -08:00
Michael Schurter aab1ad8c18 test: remove errgroup to take advantage of vet
go vet would have prevented the bug fixed in
6362e32161295fa959ebe46b93cea0ea1a9bdd72 but our use of errgroup
prevented that.

Rip out errgroup to take advantage of vet, and remove download limiting
now that we're downloading far fewer binaries overall.
2020-02-14 10:53:54 -08:00
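For readers unfamiliar with the bug class behind 2ab672c155 and aab1ad8c18: before Go 1.22, a closure started inside a loop shared the loop variables with every other iteration, so goroutines could all observe the final value. `go vet` can flag that capture in a plain `go` statement, which is presumably why wrapping the call in errgroup hid it. A minimal sketch of passing the variables explicitly follows; the function `squares` is purely illustrative and is not code from the repository.

```go
package main

import "sync"

// squares computes n*n for each input concurrently. The loop variables are
// passed to the goroutine as arguments, so each goroutine works on its own
// copies instead of variables shared with later iterations.
func squares(nums []int) []int {
	out := make([]int, len(nums))
	var wg sync.WaitGroup
	for i, n := range nums {
		wg.Add(1)
		go func(i, n int) {
			defer wg.Done()
			out[i] = n * n // each goroutine writes a distinct index
		}(i, n)
	}
	wg.Wait()
	return out
}
```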
Michael Schurter fb3e228af6 test: sort vault tests by version 2020-02-14 10:33:17 -08:00
Michael Schurter bc9e35aafb test: capture url to fix flaky test 2020-02-14 10:32:58 -08:00
Michael Schurter 32ecac58b6 test: only test latest Z of each X.Y.Z release 2020-02-14 08:41:45 -08:00
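One way to pick "the latest Z of each X.Y.Z" can be sketched as below. This is an assumption about the selection rule, not the test's actual code: `version`, `parseVersion`, and `latestPatches` are hypothetical names, and the sketch only handles plain numeric `X.Y.Z` strings (no pre-release or build suffixes).

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// version is a plain numeric X.Y.Z triple.
type version struct{ major, minor, patch int }

// parseVersion accepts exactly three dot-separated integers.
func parseVersion(s string) (version, error) {
	parts := strings.Split(s, ".")
	if len(parts) != 3 {
		return version{}, fmt.Errorf("expected X.Y.Z, got %q", s)
	}
	var n [3]int
	for i, p := range parts {
		v, err := strconv.Atoi(p)
		if err != nil {
			return version{}, fmt.Errorf("bad component %q in %q: %w", p, s, err)
		}
		n[i] = v
	}
	return version{n[0], n[1], n[2]}, nil
}

// latestPatches keeps the highest patch number seen for each major.minor
// pair and returns the survivors sorted by major, then minor.
func latestPatches(versions []string) ([]string, error) {
	best := map[[2]int]int{}
	for _, s := range versions {
		v, err := parseVersion(s)
		if err != nil {
			return nil, err
		}
		key := [2]int{v.major, v.minor}
		if cur, ok := best[key]; !ok || v.patch > cur {
			best[key] = v.patch
		}
	}
	keys := make([][2]int, 0, len(best))
	for k := range best {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool {
		if keys[i][0] != keys[j][0] {
			return keys[i][0] < keys[j][0]
		}
		return keys[i][1] < keys[j][1]
	})
	out := make([]string, 0, len(keys))
	for _, k := range keys {
		out = append(out, fmt.Sprintf("%d.%d.%d", k[0], k[1], best[k]))
	}
	return out, nil
}
```

Comparing the patch component as an integer matters: a string comparison would rank 1.0.3 above 1.0.10.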
Michael Schurter 8c332a3757
Merge pull request #7102 from hashicorp/test-limits
Fix some race conditions and flaky tests
2020-02-13 10:19:11 -08:00
Michael Schurter 3170dfd452 test: simplify code 2020-02-07 15:50:53 -08:00
Tim Gross 0c6e164e8f
e2e: add --quiet flag to s3 copy to reduce log spam (#7085) 2020-02-06 09:24:20 -05:00
Seth Hoenig 351d32cd81
Merge pull request #7071 from hashicorp/b-e2e-cacls-wait-longer
e2e: wait 2m rather than 10s after disabling consul acls
2020-02-04 14:05:10 -06:00
Drew Bailey 7bee040e61
simplify job, better error 2020-02-04 13:59:39 -05:00
Drew Bailey 8b6de8f3d2
fix check 2020-02-04 12:16:20 -05:00
Drew Bailey b10c7cc94e
rm unused field 2020-02-04 12:02:01 -05:00
Drew Bailey a716d57ad7
clean up 2020-02-04 11:59:28 -05:00
Drew Bailey 75053a0d10
get test passing, new util func to wait for not pending 2020-02-04 11:56:37 -05:00
Drew Bailey 5117a22c30
add e2e test for system sched ineligible nodes 2020-02-04 11:56:33 -05:00
Seth Hoenig f4a66ebd28 e2e: wait 2m rather than 10s after disabling consul acls
Pretty sure Consul / Nomad clients are often not ready yet after
the ConsulACLs test disables ACLs, by the time the next test starts
running.

Running locally things tend to work, but in TeamCity this seems to
be a recurring problem. However, even when running locally I sometimes
see in the "show status" step after disabling ACLs that some nodes are
still initializing, suggesting we're right on the border of not waiting
long enough:

    nomad node status
    ID        DC   Name              Class   Drain  Eligibility  Status
    0e4dfce2  dc1  EC2AMAZ-JB3NF9P   <none>  false  eligible     ready
    6b90aa06  dc2  ip-172-31-16-225  <none>  false  eligible     ready
    7068558a  dc2  ip-172-31-20-143  <none>  false  eligible     ready
    e0ae3c5c  dc1  ip-172-31-25-165  <none>  false  eligible     ready
    15b59ed6  dc1  ip-172-31-23-199  <none>  false  eligible     initializing

Going to try waiting a full 2 minutes after disabling ACLs, hopefully that
will help things Just Work. In the future, we should probably be parsing the
output of the status checks and actually confirming all nodes are ready.

Even better, maybe that's something shipyard will have built-in.
2020-02-04 10:51:03 -06:00
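The follow-up idea in f4a66ebd28, parsing the status output and confirming every node is ready, could look roughly like the sketch below. It assumes the tabular layout shown in that commit message, with the node ID in the first column and Status in the last; `notReadyNodes` is a hypothetical helper, not an existing function in the e2e framework.

```go
package main

import (
	"bufio"
	"strings"
)

// notReadyNodes scans `nomad node status` style output and returns the IDs
// of nodes whose last column (Status) is anything other than "ready".
func notReadyNodes(output string) []string {
	var pending []string
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 0 || fields[0] == "ID" {
			continue // skip blank lines and the header row
		}
		if fields[len(fields)-1] != "ready" {
			pending = append(pending, fields[0])
		}
	}
	return pending
}
```

A caller could poll until the returned slice is empty instead of sleeping for a fixed two minutes. Splitting on whitespace is fragile if a column value ever contains spaces, so a real check would more likely query the Nomad API than parse CLI text.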
Tim Gross 0b48baf0ba
e2e: rename linux runner to avoid implicit build tag (#7070)
Go implicitly treats files ending with `_linux.go` as build tagged for
Linux only. This broke the e2e provisioning framework on macOS once we
tried importing it into the `e2e/consulacls` module.
2020-02-04 10:55:38 -05:00
Tim Gross 940110b2de
e2e: improve provisioning defaults and documentation (#7062)
This changeset improves the ergonomics of running the Nomad e2e test
provisioning process by defaulting to a blank `nomad_sha` in the
Terraform configuration. By default, a user will now need to pass in
one of the Nomad version flags. But they won't have to manually edit
the `provisioning.json` file for the common case of deploying a
released version of Nomad, and won't need to put dummy values for
`nomad_sha`.

Includes general documentation improvements.
2020-02-04 10:37:00 -05:00
Seth Hoenig 653c8fe9a5 e2e: turn no-ACLs connect tests back on
Also clean up more missed debugging things >.>
2020-02-03 20:46:36 -06:00
Mahmood Ali 2424870937
Merge pull request #7055 from hashicorp/r-dev-tweaks-20200203
Grab bag of dev tweaks
2020-02-03 14:25:06 -05:00
Mahmood Ali 7171488e81 run "make hclfmt" 2020-02-03 12:15:53 -05:00
Seth Hoenig 057179edea e2e: remove leftover debug println statement 2020-02-03 11:15:38 -06:00
Seth Hoenig 9b20ca5b25 e2e: setup consul ACLs a little more correctly 2020-01-31 19:06:11 -06:00
Seth Hoenig 83c717a624 e2e: remove redundant extra API call for getting allocs 2020-01-31 19:06:07 -06:00
Seth Hoenig b212654b92 e2e: agent token was only being set for server0 2020-01-31 19:06:03 -06:00
Seth Hoenig f7a1e9cee3 e2e: use hclfmt on consul acls policy config files 2020-01-31 19:05:59 -06:00
Seth Hoenig e9e0d2e3fc e2e: uncomment test case that is not broken 2020-01-31 19:05:55 -06:00
Seth Hoenig df633ee45f e2e: do not use eventually when waiting for allocs
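This test is causing panics. Unlike the other similar tests, this
one uses require.Eventually, which runs the condition in its own
goroutine; as the trace below shows, a Fatalf inside that condition
fired after the test had already completed. This change replaces it
with a for-loop like the other tests.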

Failure:

=== RUN   TestE2E/Connect
=== RUN   TestE2E/Connect/*connect.ConnectE2ETest
=== RUN   TestE2E/Connect/*connect.ConnectE2ETest/TestConnectDemo
=== RUN   TestE2E/Connect/*connect.ConnectE2ETest/TestMultiServiceConnect
=== RUN   TestE2E/Connect/*connect.ConnectClientStateE2ETest
panic: Fail in goroutine after TestE2E/Connect/*connect.ConnectE2ETest has completed

goroutine 38 [running]:
testing.(*common).Fail(0xc000656500)
	/opt/google/go/src/testing/testing.go:565 +0x11e
testing.(*common).Fail(0xc000656100)
	/opt/google/go/src/testing/testing.go:559 +0x96
testing.(*common).FailNow(0xc000656100)
	/opt/google/go/src/testing/testing.go:587 +0x2b
testing.(*common).Fatalf(0xc000656100, 0x1512f90, 0x10, 0xc000675f88, 0x1, 0x1)
	/opt/google/go/src/testing/testing.go:672 +0x91
github.com/hashicorp/nomad/e2e/connect.(*ConnectE2ETest).TestMultiServiceConnect.func1(0x0)
	/home/shoenig/go/src/github.com/hashicorp/nomad/e2e/connect/multi_service.go:72 +0x296
github.com/hashicorp/nomad/vendor/github.com/stretchr/testify/assert.Eventually.func1(0xc0004962a0, 0xc0002338f0)
	/home/shoenig/go/src/github.com/hashicorp/nomad/vendor/github.com/stretchr/testify/assert/assertions.go:1494 +0x27
created by github.com/hashicorp/nomad/vendor/github.com/stretchr/testify/assert.Eventually
	/home/shoenig/go/src/github.com/hashicorp/nomad/vendor/github.com/stretchr/testify/assert/assertions.go:1493 +0x272
FAIL	github.com/hashicorp/nomad/e2e	21.427s
2020-01-31 19:05:47 -06:00
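The for-loop replacement described in df633ee45f can be sketched as a polling helper that stays on the calling goroutine. `waitFor` and its parameters are hypothetical and not taken from the repository. The point is that the condition only reports true or false, and any failure is raised by the caller on the test's own goroutine.

```go
package main

import (
	"fmt"
	"time"
)

// waitFor calls cond up to attempts times, sleeping interval after each
// unsuccessful try. It starts no goroutines, so a caller that turns the
// returned error into t.Fatal does so on the test's own goroutine.
func waitFor(attempts int, interval time.Duration, cond func() bool) error {
	for i := 0; i < attempts; i++ {
		if cond() {
			return nil
		}
		time.Sleep(interval)
	}
	return fmt.Errorf("condition not met after %d attempts", attempts)
}
```

In a test this would be used along the lines of `require.NoError(t, waitFor(30, time.Second, allocsRunning))`, where `allocsRunning` is a placeholder for whatever check the test needs.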
Seth Hoenig 5e5fadbcdf e2e: remove forgotten unused field from new struct 2020-01-31 19:05:41 -06:00
Seth Hoenig fc498c2b96 e2e: e2e test for connect with consul acls
Provide script for managing Consul ACLs on a TF provisioned cluster for
e2e testing. Script can be used to 'enable' or 'disable' Consul ACLs,
and automatically takes care of the bootstrapping process if necessary.

The bootstrapping process takes a long time, so we may need to
extend the overall e2e timeout (20 minutes seems fine).

Introduces basic tests for Consul Connect with ACLs.
2020-01-31 19:05:36 -06:00
Seth Hoenig 93d347442f e2e: add a -suite flag to e2e.Framework
This change allows for providing the -suite=<Name> flag when
running the e2e framework. If set, only the matching e2e/Framework.TestSuite.Component
will be run, and all other suites will be skipped.
2020-01-29 14:57:43 -06:00
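The selection rule described in 93d347442f amounts to a filter on the suite's component name. A minimal sketch, assuming an exact string match: the `TestSuite` type here is a stand-in carrying only the `Component` field named in the commit message, and `selectSuites` is a hypothetical helper.

```go
package main

// TestSuite is a stand-in for the framework's suite type; only the
// Component field mentioned in the commit message is modeled.
type TestSuite struct {
	Component string
}

// selectSuites returns the suites whose Component equals name. An empty
// name means no -suite flag was given, so every suite is kept.
func selectSuites(all []TestSuite, name string) []TestSuite {
	if name == "" {
		return all
	}
	var picked []TestSuite
	for _, s := range all {
		if s.Component == name {
			picked = append(picked, s)
		}
	}
	return picked
}
```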
Drew Bailey da4af9bef3
fix tests, update changelog 2020-01-29 13:55:39 -05:00
Tim Gross 7681f09ae4
e2e: packer builds should not be public (#6998) 2020-01-27 16:28:25 -05:00
Michael Schurter ed926a9d03
Merge pull request #6938 from hashicorp/e2e-vault
test: download Vault binaries for e2e test
2020-01-27 10:26:48 -08:00
Tim Gross 457e3ad5c6
e2e: document e2e provisioning process (#6976) 2020-01-22 16:55:17 -05:00
Tim Gross 29e1ed6b05
e2e: ensure group script check tests interpolation (#6972)
Fixes a bug introduced in 0aa58b9 where we're writing a test file to
a taskdir-interpolated location, which works when we `alloc exec` but
not in the jobspec for a group script check.

This changeset also makes the test safe to run multiple times by
namespacing the file with the alloc ID, which has the added bonus of
exercising our alloc interpolation code for group script checks.
2020-01-22 09:54:54 -05:00
Tim Gross 2edbdfc8be
e2e: update framework to allow deploying Nomad (#6969)
The e2e framework instantiates clients for Nomad/Consul but the
provisioning of the actual Nomad cluster is left to Terraform. The
Terraform provisioning process uses `remote-exec` to deploy specific
versions of Nomad so that we don't have to bake an AMI every time we
want to test a new version. But Terraform treats the resulting
instances as immutable, so we can't use the same tooling to update the
version of Nomad in-place. This is a prerequisite for upgrade testing.

This changeset extends the e2e framework to provide the option of
deploying Nomad (and, in the future, Consul/Vault) with specific
versions to running infrastructure. This initial implementation is
focused on deploying to a single cluster via `ssh` (because that's our
current need), but provides interfaces to hook the test run at the
start of the run, the start of each suite, or the start of a given
test case.

Terraform work includes:
* provides Terraform output that is written to JSON and used by the
  framework to configure provisioning, via `terraform output provisioning`.
* provides Terraform output that can be used by test operators to
  configure their shell via `$(terraform output environment)`
* drops `remote-exec` provisioning steps from Terraform
* makes changes to the deployment scripts to ensure they can be run
  multiple times w/ different versions against the same host.
2020-01-22 08:48:52 -05:00
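The three hook points named in 2edbdfc8be (start of the run, start of each suite, start of each test case) suggest an interface shaped roughly like the one below. This is a sketch under assumptions: the interface name, method names, and signatures are hypothetical, and `recorder` exists only to show the order in which the hooks fire.

```go
package main

// Provisioner is a hypothetical interface with one hook per level at which
// the commit message says the framework can deploy Nomad.
type Provisioner interface {
	SetupTestRun() error
	SetupTestSuite(suite string) error
	SetupTestCase(suite, testCase string) error
}

// suite is a minimal description of a suite and its test cases.
type suite struct {
	name  string
	cases []string
}

// recorder implements Provisioner by logging each hook call in order.
type recorder struct{ events []string }

func (r *recorder) SetupTestRun() error {
	r.events = append(r.events, "run")
	return nil
}

func (r *recorder) SetupTestSuite(s string) error {
	r.events = append(r.events, "suite:"+s)
	return nil
}

func (r *recorder) SetupTestCase(s, c string) error {
	r.events = append(r.events, "case:"+s+"/"+c)
	return nil
}

// runAll fires the hooks once per run, once per suite, and once per case,
// stopping at the first error.
func runAll(p Provisioner, suites []suite) error {
	if err := p.SetupTestRun(); err != nil {
		return err
	}
	for _, s := range suites {
		if err := p.SetupTestSuite(s.name); err != nil {
			return err
		}
		for _, c := range s.cases {
			if err := p.SetupTestCase(s.name, c); err != nil {
				return err
			}
			// The test case body would run here.
		}
	}
	return nil
}
```

An ssh-based implementation would do its deployment work inside whichever hook matches the granularity a test needs, and leave the others as no-ops.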
Tim Gross d6aac915a7
e2e: use valid jobspec for group check test (#6967)
Group service checks cannot interpolate task fields, because the task
fields are not available at the time the script check hook is created
for the group service. When f31482a was merged this e2e test began
failing because we are now correctly matching the script check ID to
the service ID, which revealed this jobspec was invalid.
2020-01-21 15:54:46 -05:00
Tim Gross 1e600d573d
e2e: improve reusability of provisioning scripts (#6942)
This changeset is part of the work to improve our E2E provisioning
process to support our upgrade tests:

* Move more of the setup into the AMI image creation so it's a little
  more obvious to provisioning config authors which bits are essential
  to deploying a specific version of Nomad.

* Make the service file update do a systemd daemon-reload so that we
  can update an already-running cluster with the same script we use to
  deploy it initially.
2020-01-16 09:29:36 -05:00
Michael Schurter ffbfb60f40 test: restore e2e-test target and use -integration 2020-01-14 13:47:51 -08:00
Michael Schurter da4645e9a4 test: download Vault binaries for e2e test
Modernize Vault integration/e2e test a bit:

- Download from releases.hashicorp.com instead of using a hardcoded list
- Remove old unused make target e2e-test
- Use NOMAD_E2E env var instead of -integration flag
- Add a README

On my machine with ~250 Mbps internet it takes ~400s to download all
Vault binaries.
2020-01-14 11:02:02 -08:00
Nick Ethier 1f28633954
Merge pull request #6816 from hashicorp/b-multiple-envoy
connect: configure envoy to support multiple sidecars in the same alloc
2020-01-09 23:25:39 -05:00
Tim Gross b5bcfb533b
upgrade CNI plugins to 0.8.4 (#6921)
When multiple Connect-enabled task groups start on the same client
node, a race condition in the CNI plugins for creating iptables chains
causes one of the tasks to fail. We upstreamed a patch to CNI plugins
to make iptables chain creation idempotent.

This changeset updates end-to-end testing, development tooling, and
documentation to use 0.8.4 which includes our patch.
2020-01-09 10:57:07 -05:00
Tim Gross c11cc60674 commit a hclfmt to eliminate diffs after 'make dev' 2020-01-09 08:18:51 -05:00
Nick Ethier 7b931522f0 e2e: add test for multiple service sidecars in the same alloc 2020-01-06 12:48:35 -05:00
Tim Gross 4ba5691656
e2e: give metrics longer to settle (#6884)
Increase the shortened timeout after the first loop so that metrics
that take longer to come in aren't failing the test unnecessarily.

Move the check for empty alloc metrics into the loop so that if the
first values we get are empty we don't fail the test too early.
2019-12-20 10:39:35 -05:00