Commit graph

607 commits

Author SHA1 Message Date
Kris Hicks 2db8aa2a52 Add e2e test for raw exec 2021-01-28 12:03:19 +00:00
Kris Hicks 87188f04de Add PID namespacing and e2e test 2021-01-28 12:03:19 +00:00
Mahmood Ali c92bb342e1 e2e: skip node drain deadline/force tests 2021-01-27 08:42:16 -05:00
Mahmood Ali b12e8912a9 e2e: use f.NoError instead of requires 2021-01-27 08:36:23 -05:00
Mahmood Ali 1ac8b32e08 e2e: Disable Connect tests
The connect tests are very disruptive: restart consul/nomad agents with new
tokens.  The test seems particularly flaky, failing 32 times out of 73 in my
sample.

The tests are particularly problematic because they are disruptive and affect
other tests. On failure, the nomad or consul agent on the client can get into a
wedged state, so health/deployment info in subsequent tests may be wrong. In
some cases, the node will be deemed as fail, and then the subsequent tests may
fail when the node is deemed lost and the test allocations get migrated unexpectedly.
2021-01-26 10:01:14 -05:00
Mahmood Ali 36ce1e73eb e2e: deflake nodedrain test
The nodedrain deadline test asserts that all allocations are migrated by the
deadline. However, when the deadline is short (e.g. 10s), the test may fail
because of scheduler/client-propagation delays.

In one failing test, it took ~15s from the RPC call to the moment to the moment
the scheduler issued migration update, and then 3 seconds for the alloc to be
stopped.

Here, I increase the timeouts to avoid such false positives.
2021-01-26 10:01:14 -05:00
Mahmood Ali cf8f6f07d7 e2e: vault increase timeout
Increase the timeout for vaultsecrets.  As the default  interval is 0.1s, 10
retries mean it only retries for one second, a very short time for some waiting
scenarios in the test (e.g. starting allocs, etc).
2021-01-26 10:01:14 -05:00
Mahmood Ali 94ad40907c e2e: prefer testutil.WaitForResultRetries
Prefer testutil.WaitForResultRetries that emits more descriptive errors on
failures. `require.Evatually` fails with opaque "Condition never satisfied"
error message.
2021-01-26 10:01:14 -05:00
Mahmood Ali f3f8f15b7b e2e: special case "Unexpected EOF" errors
This is an attempt at deflaking the e2e exec tests, and a way to improve
messages.

e2e occasionally fail with "unexpected EOF" even though the exec output matches
expectations. I suspect there is a race in handling EOF in server/http handling.

Here, we special case this error and ensures we get all failures,
to help debug the case better.
2021-01-26 10:01:14 -05:00
Mahmood Ali 925d9ce952 e2e: tweak failure messages
Tweak the error messages for the flakiest tests, so that on test failure, we get
more output
2021-01-26 09:16:48 -05:00
Mahmood Ali 6aa3dec6cc e2e: use testify requires instead of t.Fatal
testify requires offer better error message that is easier to notice when seeing
a wall of text in the builds.
2021-01-26 09:14:47 -05:00
Mahmood Ali 236b4055a7 e2e: deflake consul/CheckRestart test
Ensure we pass the alloc ID to status.  Otherwise, the test may fail if there is
another spurious allocation running from another test.
2021-01-26 09:12:20 -05:00
Mahmood Ali 0aafd9af64 e2e: Fix build script and pass shellcheck 2021-01-26 09:11:37 -05:00
Mahmood Ali 4397eda209
Merge pull request #9798 from hashicorp/e2e-terraform-tweaks-20200113
This PR makes two ergonomics changes, meant to get e2e builds more reproducible and ease changes.

### AMI Management

First, we pin the server AMIs to the commits associated with the build.  No more using the latest AMI a developer build in a test branch, or accidentally using a stale AMI because we forgot to build one!  Packer is to tag the AMI images with the commit sha used to generate the image, and then Terraform would look up only the AMIs associated with that sha. To minimize churn, we use the SHA associated with the latest Packer configurations, rather than SHA of all.

This has few benefits: reproducibility and avoiding accidental AMI changes and contamination of changes across branches. Also, the change is a stepping stone to an e2e pipeline that builds new AMIs automatically if Packer files changed.

The downside is that new AMIs will be generated even for irrelevant changes (e.g. spelling, commits), but I suspect that's OK. Also, an engineer will be forced to build the AMI whenever they change Packer files while iterating on e2e scripts; this hasn't been an issue for me yet, and I'll be open for iterating on that later if it proves to be an issue.

### Config Files and Packer

Second, this PR moves e2e config hcl management to Terraform instead of Packer. Currently, the config files live in `./terraform/config`, but they are baked into the servers by Packer and changes are ignored.  This current behavior surprised me, as I spent a bit of time debugging why my config changes weren't applied.  Having Terraform manage them would ease engineer's iteration.  Also, make Packer management more consistent (Packer only works `e2e/terraform/packer`), and easing the logic for AMI change detection.

The config directory is very small (100KB), and having it as an upload step adds negligible time to `terraform apply`.
2021-01-25 13:20:28 -05:00
Mahmood Ali 39da228964 update readme about profiles and packer build 2021-01-25 11:40:26 -05:00
Seth Hoenig 8b05efcf88 consul/connect: Add support for Connect terminating gateways
This PR implements Nomad built-in support for running Consul Connect
terminating gateways. Such a gateway can be used by services running
inside the service mesh to access "legacy" services running outside
the service mesh while still making use of Consul's service identity
based networking and ACL policies.

https://www.consul.io/docs/connect/gateways/terminating-gateway

These gateways are declared as part of a task group level service
definition within the connect stanza.

service {
  connect {
    gateway {
      proxy {
        // envoy proxy configuration
      }
      terminating {
        // terminating-gateway configuration entry
      }
    }
  }
}

Currently Envoy is the only supported gateway implementation in
Consul. The gateay task can be customized by configuring the
connect.sidecar_task block.

When the gateway.terminating field is set, Nomad will write/update
the Configuration Entry into Consul on job submission. Because CEs
are global in scope and there may be more than one Nomad cluster
communicating with Consul, there is an assumption that any terminating
gateway defined in Nomad for a particular service will be the same
among Nomad clusters.

Gateways require Consul 1.8.0+, checked by a node constraint.

Closes #9445
2021-01-25 10:36:04 -06:00
Tim Gross 0b49e3da12 e2e: added tests for check restart behavior 2021-01-22 10:55:40 -05:00
Drew Bailey 630babb886
prevent double job status update (#9768)
* Prevent Job Statuses from being calculated twice

https://github.com/hashicorp/nomad/pull/8435 introduced atomic eval
insertion iwth job (de-)registration. This change removes a now obsolete
guard which checked if the index was equal to the job.CreateIndex, which
would empty the status. Now that the job regisration eval insetion is
atomic with the registration this check is no longer necessary to set
the job statuses correctly.

* test to ensure only single job event for job register

* periodic e2e

* separate job update summary step

* fix updatejobstability to use copy instead of modified reference of job

* update envoygatewaybindaddresses copy to prevent job diff on null vs empty

* set ConsulGatewayBindAddress to empty map instead of nil

fix nil assertions for empty map

rm unnecessary guard
2021-01-22 09:18:17 -05:00
Mahmood Ali 9dcdafe4cf e2e: show command output on failure
When a command fails, it's nice to have the full output, as it contains
diagnostic information. The status code isn't sufficient for debugging.
2021-01-21 10:32:16 -05:00
Mahmood Ali 923725bf3d e2e: deflake TestVolumeMounts
After submitting an update, the test ought to wait until the new
allocations are placed. Previously, we'd use the original to-be-stopped
allocations and the test fails when attempting to exec.
2021-01-21 10:28:41 -05:00
Mahmood Ali 95b7fc80b8 e2e deflake namespaces: only check namespace jobs
Deflake namespace e2e test by only asserting on jobs related to the
namespace tests. During our e2e tests, some left over jobs (e.g.
prometheus) are left running while being shutdown and cause the test to
fail.
2021-01-21 10:26:24 -05:00
Mahmood Ali 2e8bcac261 e2e: deflake events
Handle streamCh channel being closed.
2021-01-21 10:25:42 -05:00
Seth Hoenig 991884e715 consul/connect: Enable running multiple ingress gateways per Nomad agent
Connect ingress gateway services were being registered into Consul without
an explicit deterministic service ID. Consul would generate one automatically,
but then Nomad would have no way to register a second gateway on the same agent
as it would not supply 'proxy-id' during envoy bootstrap.

Set the ServiceID for gateways, and supply 'proxy-id' when doing envoy bootstrap.

Fixes #9834
2021-01-19 12:58:36 -06:00
Mahmood Ali 76ce6306a4 add helper for building ami 2021-01-15 10:49:13 -05:00
Mahmood Ali e51651c34a set sha 2021-01-15 10:49:13 -05:00
Mahmood Ali 82637715cf change ami naming 2021-01-15 10:49:12 -05:00
Mahmood Ali 0af1509a77 move config files to terraform 2021-01-15 10:49:12 -05:00
Seth Hoenig 536747f216 e2e: use jobspec2 Parse for parsing jobfile in e2e utils
We directly parse job files in e2eutil, but currently using jobspec
package. Instead, use the Parse method from the jobspec2 package so
we can parse job files with new features.
2021-01-13 14:00:40 -06:00
James Rasell d6cab8aa14
Merge pull request #9767 from hashicorp/f-e2e-job-scaling-suite
e2e: add job scaling test suite.
2021-01-11 18:35:07 +01:00
Seth Hoenig 64a8b795f2
Merge pull request #9766 from hashicorp/f-bump-cni-plugins-version
cni: bump CNI plugins version to v0.9.0
2021-01-11 09:59:43 -06:00
Tim Gross f97505e384 e2e: remove deprecated terraform syntax
Also bumps patch versions of some TF modules
2021-01-11 08:25:22 -05:00
James Rasell 4374d99071
e2e: add job scaling test suite. 2021-01-11 11:34:19 +01:00
Seth Hoenig fc5f48d936 cni: bump CNI version to v0.9.0
https://github.com/containernetworking/plugins/releases/tag/v0.9.0

Also make the copy-paste install instructions work with arm64 for
a better OOTB experience (AWS Graviton, Pi 4's).
2021-01-10 18:03:27 -06:00
James Rasell 108fa33393
Merge pull request #9747 from hashicorp/f-e2e-scaling-policy-suite
e2e: add ScalingPolicies test suite with initial test case.
2021-01-08 10:51:48 +01:00
James Rasell b087d68736
e2e: add ScalingPolicies test suite with initial test case. 2021-01-07 14:39:55 +01:00
James Rasell 02b9d9da87
e2e: move namespace tests into OSS. 2021-01-07 09:15:43 +01:00
Seth Hoenig 7da808b43a e2e: add terraform lockfile
Terraform v0.14 is producing a lockfile after running `terraform init`.
The docs suggest we should include this file in the git repository:

> You should include this file in your version control repository so
> that you can discuss potential changes to your external dependencies
> via code review, just as you would discuss potential changes to your
> configuration itself.

Sounds similar to go.sum

https://www.terraform.io/docs/configuration/dependency-lock.html#lock-file-location
2021-01-05 08:55:37 -06:00
Seth Hoenig 59f230714f e2e: add e2e test for service registration 2021-01-05 08:48:12 -06:00
Chris Baker 57b70a27ec modified e2e test so that it explicitly tested the use case in #6929 2021-01-04 22:25:39 +00:00
Chris Baker 9b125b8837 update template and artifact interpolation to use client-relative paths
resolves #9839
resolves #6929
resolves #6910

e2e: template env interpolation path testing
2021-01-04 22:25:34 +00:00
Tim Gross 26f4ee7fb1 e2e: dnsmasq configuration fixes
* systemd units require absolute paths
* ensure directory exists for dnsmasq
2021-01-04 15:40:57 -05:00
Tim Gross c4e57fb813 e2e: document some design goals 2020-12-17 10:33:33 -05:00
Tim Gross 88fc79c35e e2e: bump default version of dev cluster 2020-12-17 10:33:33 -05:00
Tim Gross 00bc6a7d13
e2e: move dnsmasq config into dnsmasq service unit (#9660)
Our dnsmasq configuration needs host-specific data that we can't configure in
the AMI build. But configuring this in userdata leads to a race between
userdata execution, docker.service startup, and dnsmasq.service startup. So
rather than letting dnsmasq come up with incorrect configuration and then
modifying it after the fact, do the configuration in the service's prestart,
and have it kick off a Docker restart when we're done.
2020-12-17 10:33:19 -05:00
Kris Hicks 0cf9cae656
Apply some suggested fixes from staticcheck (#9598) 2020-12-10 07:29:18 -08:00
Kris Hicks 0a3a748053
Add gosimple linter (#9590) 2020-12-09 11:05:18 -08:00
Drew Bailey 2f5710a3b7
use concrete type helper instead of interface surfing (#9585)
* use concrete type helper instead of interface surfing

* wrap err
2020-12-09 09:02:37 -05:00
Kris Hicks 93155ba3da
Add gocritic to golangci-lint config (#9556) 2020-12-08 12:47:04 -08:00
Seth Hoenig ad5918f754 e2e: upgrade terraform consul to 1.9.0 2020-12-03 13:01:14 -06:00
Drew Bailey 17de8ebcb1
API: Event stream use full name instead of Eval/Alloc (#9509)
* use full name for events

use evaluation and allocation instead of short name

* update api event stream package and shortnames

* update docs

* make sync; fix typo

* backwards compat not from 1.0.0-beta event stream api changes

* use api types instead of string

* rm backwards compat note that only changed between prereleases

* remove backwards incompat that only existed in prereleases
2020-12-03 11:48:18 -05:00
Jasmine Dahilig 6ea00284f1 lifecycle: update e2e test for service job with new docker signal #8932 2020-12-01 23:41:32 -08:00
Seth Hoenig 1b3d409eba e2e: use test framework Assertions in connect tests 2020-11-30 08:48:40 -06:00
Seth Hoenig 546a8bfb95 e2e: add e2e test for consul connect ingress gateway demo
Add the ingress gateway example from the noamd connect examples
to the e2e Connect suite. Includes the ACLs enabled version,
which means the nomad server consul acl policy will require
operator=write permission.
2020-11-25 16:54:02 -06:00
Seth Hoenig d850f17bc1 e2e: print consulacls scripts output as string
The clean up in #8908 inadvertently caused the output from the scripts
involved in the Consul ACL bootstrap process to be printed as a big blob
of bytes, which is slightly less useful than the text version.
2020-11-25 15:03:33 -06:00
Tim Gross 481f91034c
E2E: CSI driver provisioning (#9443)
* e2e/csi: wait longer for plugins to become healthy

Plugins are Docker containers, and as such sometimes we get delays in startup
due to pulling from the registry and this is a source of test flakiness. Give
the plugins a little longer to start up.

* e2e/csi: version bump for AWS EBS plugins
2020-11-25 09:05:22 -05:00
Seth Hoenig 74a34704c5
Merge pull request #8743 from hashicorp/f-task_network_warning
Validate and document 0.12 mbits/network deprecations
2020-11-23 15:36:18 -06:00
Tim Gross d686a51d60
e2e: prevent Ubuntu startup race conditions (#9428)
The cloud-init configuration runs on boot, which can result in a race
condition between that and service startup. This has caused provisioning
failures because Nomad expects the userdata to have configured a host volume
directory. Diagnosing this was also compounded by a warning being fired by
systemd for the Nomad unit file.

* Update the location of the `StartLimitIntervalSec` field to it's
  post-systemd-230 location.
* Ensure that the weekly AMI build is up-to-date to reduce the risk of
  unexpected system software changes.
* Move the host volume to a directory we can set up at AMI build time rather
  than in userdata.
2020-11-23 12:29:08 -05:00
Nick Ethier f1ea79f5a8 remove references to default mbits 2020-11-23 10:32:13 -06:00
Nick Ethier e8784c919f e2e: update jobs to use new network stanza format 2020-11-23 10:25:30 -06:00
Chris Baker 00841a8525 events: e2e test that API client honors the index flag 2020-11-21 16:38:24 +00:00
Michael Schurter 43b225b19d e2e: test template path interpolation 2020-11-18 10:48:58 -08:00
Tim Gross 7e4fd79eee
e2e: CSI test should detect un-deregisterable volumes (#9343)
Assert that deregistering a volume works without errors following a volume
reap. Use CLI helpers where feasible to exercise CSI command line. Dump plugin
allocation logs on deregistration failures for debugging purposes.
2020-11-13 09:31:21 -05:00
Jasmine Dahilig d6110cbed4
lifecycle: add poststop hook (#8194) 2020-11-12 08:01:42 -08:00
Drew Bailey 9a1fc720c8
enables audit log on full-cluster (#9315) 2020-11-11 08:33:01 -05:00
Tim Gross 08ae13d3b9
e2e: Windows provisioning improvements (#9246)
Small changes to the Windows 2016 Packer build for debuggability of
provisioning:

* improve verbosity of powershell error handling
* remove unused "tools" installation
* use ssh communicator for Packer to improve Packer build times and eliminate
  deprecated winrm remote access (unavailable from current macOS)
2020-11-09 13:29:40 -05:00
Drew Bailey c181973265
append custom path to custom_config_files (#9289)
* append custom path to custom_config_files

* remove config_path variable
2020-11-06 11:16:13 -05:00
Tim Gross dc8e20206d
E2E: switch packer build files to HCL2 (#9219)
Build configuration files need comments, and JSON is also just the worst, isn't it?
Upgrade our E2E packer configs to use the new HCL2 syntax.
2020-10-29 10:03:39 -04:00
Tim Gross 06c75460f3
e2e: provide precedence for version variables (#9216)
The `nomad_sha`, `nomad_version`, and `nomad_local_binary` variables for the
Nomad provisioning module assumed that only one would be set. By having the
override each other with an explicit precedence, it makes it easier to avoid
problems with Terraform's implicit variables behavior.

Set the expected default values in the `terraform.full.tfvars` to avoid
shadowing by any future changes to the `terraform.tfvars` file.

Update the Makefile to put the `-var` and `-var-file` in the correct order.
2020-10-29 09:15:22 -04:00
Tim Gross 57f694ff2e
E2E: AMI software version bumps and cleanup (#9213)
* remove unused vault installation from Windows AMI
* match Windows and Linux Consul versions
* bump AMI base Nomad to current stable
2020-10-29 08:27:50 -04:00
Tim Gross a2710c7a31
e2e: set default version for dev cluster (#9208) 2020-10-28 16:50:20 -04:00
Tim Gross 99c2a2df00
e2e: reduce risk of flaky Ubuntu AMI build (#9207)
The base Ubuntu AMI modifies apt sources during cloud-init. But the Packer
build can potentially start the setup script before that work is done,
resulting in errors trying to install base system dependencies like
`dnsmasq`. Delay the setup long enough to lose the race with cloud-init.
2020-10-28 15:13:44 -04:00
Tim Gross 7e4a35ad7e
e2e: use more specific names for OS/distros (#9204)
We intend to expand the nightly E2E test to cover multiple distros and
platforms. Change the naming structure for "Linux client" to the more precise
"Ubuntu Bionic", and "Windows" to "Windows 2016" to make it easier to add new
targets without additional refactoring.
2020-10-28 12:58:00 -04:00
Tim Gross be3f54d296
e2e: make dev cluster the default Terraform vars file (#9202)
Most of the time that a human is running the TF provisioning, they want the
"dev cluster" which is going to deploy an OSS sha, with fewer targets and
configuration alternatives. But the default `terraform.tfvars` is the nightly
E2E run. Because the nightly run is automated, there's no reason we can't have
it pick a non-default `terraform.full.tfvars` file and have the default be the
dev cluster.
2020-10-28 10:01:42 -04:00
Tim Gross 4fe1edfd63
Revert "e2e: fix destination of templates in VaultSecrets test (#9146)" (#9163)
This reverts commit 8aed53c177aea024d4f24d1fbb4d6e0881f04eab.
2020-10-23 09:01:25 -04:00
Tim Gross 1fb1c9c5d4
artifact/template: make destination path absolute inside taskdir (#9149)
Prior to Nomad 0.12.5, you could use `${NOMAD_SECRETS_DIR}/mysecret.txt` as
the `artifact.destination` and `template.destination` because we would always
append the destination to the task working directory. In the recent security
patch we treated the `destination` absolute path as valid if it didn't escape
the working directory, but this breaks backwards compatibility and
interpolation of `destination` fields.

This changeset partially reverts the behavior so that we always append the
destination, but we also perform the escape check on that new destination
after interpolation so the security hole is closed.

Also, ConsulTemplate test should exercise interpolation
2020-10-22 15:47:49 -04:00
Tim Gross 344e821ace
e2e: fix destination of templates in VaultSecrets test (#9146)
The `$NOMAD_SECRETS_DIR` environment variable is rendered as `/secrets`, which
prior to the recent security patch would unintentionally escape the file
sandbox and get dropped in a directory named `/secrets` where the Nomad client
binary was running. The `VaultSecrets` test was accidentally relying on this
behavior and that causes the test to fail.
2020-10-22 13:00:08 -04:00
Tim Gross 9fa38bac98
e2e: path fixes for local_binary uploads (#9137)
When uploading a local binary for provisioning, the location that we pass into
the provisioning script needs to be where we uploaded it to, not the source on
our laptop. Also, the null_resource for uploading needs to read in the private
key, not its path.
2020-10-21 10:20:22 -04:00
Drew Bailey 8451de99b2
adds two base event stream e2e tests (#9126)
* adds two base event stream e2e tests

test evaluation filter keys are included

* Apply suggestions from code review

Co-authored-by: Tim Gross <tgross@hashicorp.com>

* gc aftereach

Co-authored-by: Tim Gross <tgross@hashicorp.com>
2020-10-20 08:26:21 -04:00
Tim Gross 8fcdbe0592
e2e: add reporting to flaky spread test (#9115)
The spread test is infrequently flaky and it's hard to extract what's actually
happening. If the test fails, dump all the allocation metrics so that we can
debug the behavior.
2020-10-16 11:01:07 -04:00
Tim Gross 54d7f57662
e2e: fix flaky TaskEventsTest (#9114)
Assert that we get at least N task events, rather than exactly N. When a
task within an allocation dies, a sibling task can get an Allocation Unhealthy
event after it's also killed, even though it's not the origin of the event.
2020-10-16 10:22:40 -04:00
Tim Gross e0ff06be2f
e2e: networking test job needs to outlast assert (#9113)
The `e2ejob` utility asserts that a job is running for 5s, but with a sleep
time of 5s, the networking job can race with that check. Sleeping for a longer
period should guarantee that we're running long enough to pass the assert.

Also constrains the job to Linux because our Windows test targets don't yet
support Docker (LCOW), and expand the set of DCs we can safely land on.
2020-10-16 10:13:16 -04:00
Chris Baker 0a85d2bd24
Merge pull request #9089 from hashicorp/b-explicit-rune
fix go 1.15 pickiness
2020-10-14 10:37:36 -05:00
Tim Gross fe88003f29
e2e: eliminate race condition causing rescheduling test flake (#9085)
The autorevert test checks for reverted allocations to be placed and running
before checking the deployment status, but the deployment can be completed and
marked "successful" before we check it for "running" status. Instead, just
wait for it to be marked "successful" and assert we have the expected count of
deployment statuses.
2020-10-14 11:35:30 -04:00
Tim Gross 76f1f5e5df
e2e: use AMI filter for Ubuntu packer image (#9086)
Instead of hard-coding the base AMI for our Packer image for Ubuntu, use the
latest from Canonical so that we always have their current kernel patches.
2020-10-14 11:22:33 -04:00
Chris Baker d4bae840b2 fix go 1.15 pickiness 2020-10-14 15:19:54 +00:00
Nick Ethier f5250499b9
e2e/networking: use correct dc (#9088) 2020-10-14 11:14:09 -04:00
Tim Gross 115edb53a0
e2e: add flag to opt-in to creating EBS/EFS volumes (#9082)
For everyday developer use, we don't need volumes for testing CSI. Providing a
flag to opt-in speeds up deploying dev clusters and slightly reduces infra costs.

Skip CSI test if missing volume specs.
2020-10-14 10:29:33 -04:00
Tim Gross 65282a7cf1
E2E: vault secrets (#9081)
* rename vault API compatibility test for clarity
* exercise vault secrets lease renewal
2020-10-14 08:43:28 -04:00
Nick Ethier d45be0b5a6
client: add NetworkStatus to Allocation (#8657) 2020-10-12 13:43:04 -04:00
Yoan Blanc 891accb89a
use allow/deny instead of the colored alternatives (#9019)
Signed-off-by: Yoan Blanc <yoan@dosimple.ch>
2020-10-12 08:47:05 -04:00
Tim Gross 474c18102d
e2e: extend ConsulTemplate test and fix flakiness (#8997)
Add service discovery integration to the existing consul-template E2E test,
and verify both service and key updates force re-rendering. Fixes flakiness by
using the longer default wait config we use elsewhere.

Removes our last direct dependency on gomega.
2020-10-05 10:51:55 -04:00
Tim Gross 727277793b
e2e: bootstrap vault and provision Nomad with vault tokens (#9010)
Provisions vault with the policies described in the Nomad Vault integration
guide, and drops a configuration file for Nomad vault server configuration
with its token. The vault root token is exposed to the E2E runner so that
tests can write additional policies to vault.
2020-10-05 09:28:37 -04:00
Tim Gross b6292528fe
e2e: tfvars.dev file must override default tfvars file (#9005)
The `-var-file` flag for loading variables into Terraform overlays the default
variables file if present. This means that variables that are set in the
default variables file will take precedence if the overlay file does not have
them set.

Set `nomad_acls` and `nomad_enteprise` to `false` in the dev cluster.
2020-10-02 08:02:37 -04:00
Tim Gross 4bab91b81b
e2e: ensure tests are constrained to Linux (#8990)
Until we have LCOW support in the E2E environment (which requires a Windows
2019 test target), we need to constrain E2E tests to the appropriate kernel
2020-09-30 09:43:30 -04:00
Tim Gross e49410e97b
e2e: cleanup errors should use assert, not require (#8989)
The E2E framework wraps testify's `require` so that by default we can stop
tests on errors, but the cleanup functions should use `assert` so that we
continue to try to cleanup the test environment even if there's a failure.
2020-09-30 09:00:37 -04:00
Tim Gross fa1fa623f2
e2e: rework rescheduling progress deadline test (#8958)
Eliminate sources of randomness in the progress deadline test and clarify the
purpose of the test to check for progress deadline updates.
2020-09-29 11:02:16 -04:00
Tim Gross 6489c5f626
e2e: namespace support for CLI helpers (#8978)
Required to support tests for namespaces and other ENT features.
2020-09-28 16:37:34 -04:00
Tim Gross 6bed4ec45b
e2e: ENT placeholder for namespace/quotas tests (#8973) 2020-09-28 11:23:37 -04:00
Tim Gross 1311f32f1b
e2e: test for host volumes and Docker volumes (#8972)
Exercises host volume and Docker volume functionality for the `exec` and `docker`
task driver, particularly around mounting locations within the container and
how this can be used with `template`.
2020-09-28 11:14:13 -04:00
Tim Gross 566dae7b19
e2e: add flag to bootstrap Nomad ACLs (#8961)
Adds a `nomad_acls` flag to our Terraform stack that bootstraps Nomad ACLs via
a `local-exec` provider. There's no way to set the `NOMAD_TOKEN` in the Nomad
TF provider if we're bootstrapping in the same Terraform stack, so instead of
using `resource.nomad_acl_token`, we also bootstrap a wide-open anonymous
policy. The resulting management token is exported as an environment var with
`$(terraform output environment)` and tests that want stricter ACLs will be
able to write them using that token.

This should also provide a basis to do similar work with Consul ACLs in the
future.
2020-09-28 09:22:36 -04:00
Tim Gross 15d3f5ea7e
e2e: remove unused migrations test (#8955)
The areas of the code this test exercised were merged in with the node
drain tests.
2020-09-23 14:50:15 -04:00
Tim Gross 147b16243d
e2e: use more recent instance type (#8954)
Newer EC2 instances are both cheaper and have generally better
performance.

The dnsmasq configuration had a hard-coded interface name, so in order to
accomodate instances with more recent networking that result in so-called
predictable interface names, the dnsmasq configuration needs to be replaced at
runtime with userdata to select the default interface.
2020-09-23 14:27:52 -04:00
Tim Gross 1fc525ec1e
e2e: add flags for provisioning Nomad Enterprise (#8929) 2020-09-23 10:39:04 -04:00
Tim Gross 9cbc604308
e2e: node drain tests (#8906)
Exercise the `nomad node drain` features, driving them via the new CLI helpers.
2020-09-21 11:52:11 -04:00
Tim Gross 34093f7747
e2e: reschedule tests should check for non-zero rescheduled allocs (#8927)
The conditional around some of the rescheduling tests was backwards, where we
were waiting for allocations to be rescheduled but testing for a count of
0. The test was passing but flaky because if the check happened quickly enough
before the scheduler rescheduled the allocations, it would pass.
2020-09-21 08:17:24 -04:00
Tim Gross 3da61545d5
make sure dev-cluster has the option to run windows config (#8928) 2020-09-18 16:41:35 -04:00
Tim Gross ea1f6408bf
e2e: remove unused framework provisioning code (#8908) 2020-09-18 11:46:47 -04:00
Tim Gross c413fa5e49
e2e: test script for Terraform logic (#8907) 2020-09-18 11:46:40 -04:00
Tim Gross 9d37233eaf
e2e: provision cluster entirely through Terraform (#8748)
Have Terraform run the target-specific `provision.sh`/`provision.ps1` script
rather than the test runner code which needs to be customized for each
distro. Use Terraform's detection of variable value changes so that we can
re-run the provisioning without having to re-install Nomad on those specific
hosts that need it changed.

Allow the configuration "profile" (well-known directory) to be set by a
Terraform variable. The default configurations are installed during Packer
build time, and symlinked into the live configuration directory by the
provision script. Detect changes in the file contents so that we only upload
custom configuration files that have changed between Terraform runs
2020-09-18 11:27:24 -04:00
Tim Gross 990fcf7be4
e2e: documentation and minor tweaks to configs (#8912)
* remove outdated references to envchain in documentation
* add new host volume locations in userdata
* don't exit the entire script during provisioning, just return
2020-09-17 09:20:18 -04:00
Tim Gross d7a013b6f5
e2e: refactor CLI utils out of rescheduling test (#8905)
The CLI helpers in the rescheduling test were intended for shared use, but
until some other tests were written we didn't want to waste time making them
generic. This changeset refactors them and adds some new helpers associated
with the node drain tests (under separate PR).
2020-09-16 16:10:06 -04:00
Tim Gross bd889c82aa
e2e: constrain rescheduling test workloads to Linux (#8872)
The rescheduling test workloads were created before we had Windows targets in
the E2E nightly run. When these were recently ported to the e2e framework they
were missing the constraint to Linux machines.

Also added a little extra time to polling to avoid some flakiness on the first
run, and a minor readability adjustment to the job names.
2020-09-11 09:21:28 -04:00
Tim Gross 572ae37856
Merge pull request #8860 E2E: rescheduling tests 2020-09-10 13:43:55 -04:00
Tim Gross 294c7149a2 e2e: rescheduling tests
Ports the rescheduling tests (which aren't running in CI) into the current
test framework so that they're run on nightly, and exercises the new CLI
helpers.
2020-09-10 13:00:37 -04:00
Tim Gross 28e9bbbbf4 e2e: helper for sending CLI commands and parsing output
The E2E suite exercises the API, but not the CLI. This changeset adds a helper
function to send commands via a locally-built Nomad binary (which we'll need
to add to the E2E setup), and some helpers to parse the resulting structured
outputs in a way that tests can consume.
2020-09-10 13:00:32 -04:00
Michael Schurter 5f3a71d0b9 docs: update scripts to 0.12.4 2020-09-09 15:22:37 -07:00
James Rasell 76b03d3a2f
e2e: fix failure in running metrics test suite jobs.
When running the Fabio and Prometheus jobs for the metrics suite
it seems the outer directory is required in the call when
registering the job.

error: "e2e/input/fabio.nomad: no such file or directory"
2020-09-09 08:40:35 +02:00
Tim Gross f499b44101
e2e: move setup jobs for metrics test into that suite (#8842)
The fabio and prometheus workloads are specific to the metrics test and aren't
used by any other test suite.
2020-09-08 13:21:44 -04:00
Tim Gross a47b1c1081
e2e: move configurations into profile-specific directories (#8828)
This changeset stages upcoming E2E provisioning improvements work. It splits
the existing shared configuration directory into 3 profiles:

* "full-cluster": the set of configurations currently in use
* "dev-cluster": a simplified set of mostly existing configurations that
  weren't in use.
* "custom": an empty profile for developers to keep non-standard
  configurations during complex feature development.

The tooling to switch between profiles will be in a later changeset.

Also drops some unused configuration knobs from the provisioning scripts to
make the next stage of work easier.
2020-09-04 11:23:32 -04:00
Tim Gross 93c1093274
e2e: remove unused EBS volumes and depends_on (#8827)
Our provisioning process for E2E doesn't require the `depends_on` fields to be
set for client instances, so dropping that field allows all instances to be
started in parallel.

We don't use the extra EBS volumes (they aren't even mounted), so remove them
to reduce costs.
2020-09-04 10:25:59 -04:00
Tim Gross 0577b03479
e2e: minor rename and cleanup (#8824) 2020-09-04 08:51:22 -04:00
Tim Gross e6cdd8e0c0
e2e: consolidate cloud-specific Consul configs (#8823)
The `-recursor` flag in the Consul service unit files is specific to a given
cloud, but we already have cloud-specific configuration files. Consolidate all
the cloud-specific items into the config.
2020-09-04 08:51:15 -04:00
Tim Gross bc6ad011fe
e2e: Linux AMI setup cleanup (#8821)
As we add new Linux targets for E2E, the existing setup.sh script will be used
only for Ubuntu. Rather than have the service and config files echo'd from the
script, move them into files we upload so they can be reused.

Includes some general noise reduction in the setup.sh script and removal of
unused bits.
2020-09-03 16:30:58 -04:00
Jasmine Dahilig 71a694f39c
Merge pull request #8390 from hashicorp/lifecycle-poststart-hook
task lifecycle poststart hook
2020-08-31 13:53:24 -07:00
Jasmine Dahilig fbe0c89ab1 task lifecycle poststart: code review fixes 2020-08-31 13:22:41 -07:00
Tim Gross 3a382f599f
e2e: minor TF refactor to split out vars and outputs (#8752) 2020-08-26 17:00:36 -04:00
Tim Gross 8c8b91e7b9
e2e: move systemd unit files into Packer build (#8751) 2020-08-26 16:45:09 -04:00
Tim Gross 693a8a2613
e2e: fix platform path for installing for Linux from s3 (#8708) 2020-08-21 09:20:09 -04:00
Tim Gross b23150057a
E2E: move Nomad installation to script on remote hosts (#8706)
This changeset moves the installation of Nomad binaries out of the
provisioning framework and into scripts that are installed on the remote host
during AMI builds.

This provides a few advantages:

* The provisioning framework can be reduced in scope (with the goal of moving
  most of it into the Terraform stack entirely).
* The scripts can be arbitrarily complex if we don't have to stuff them into
  ssh commands, so it's easier to make them idempotent. In this changeset, the
  scripts check the version of the existing binary and don't re-download when
  using the `--nomad_sha` or `--nomad_version` flags.
* The scripts can be OS/distro specific, which helps in building new test
  targets.
2020-08-20 16:10:00 -04:00
Michael Schurter 86a31d0df6
Merge pull request #8701 from hashicorp/doc-e2e
docs: clarify e2e tests
2020-08-20 08:53:58 -07:00
Jasmine Dahilig a7b8adfe01 task lifecycle: e2e fix more alloc stop races 2020-08-20 08:49:58 -07:00
Jasmine Dahilig 681eb407db task lifecycle: make e2e service job test block until poststart task has started 2020-08-20 08:11:16 -07:00
Tim Gross 0fd4a05b2f
E2E AMI cleanup (#8697)
* move CNI install/podman config to build-time
* move DNS config to userdata
* consolidate apt updates for performance
2020-08-20 10:09:31 -04:00
Michael Schurter 72bd8f477c docs: clarify e2e tests
Just a smattering of attempted improvements as I read through this
again. Some of my goals:

- Tried to add more high level info to the intro to set the context
- Clarify the difference between *test* dev and *agent* dev workflows
- Add -timeout to provisioning step because cable Internet is lol
2020-08-19 20:32:31 -07:00
Michael Schurter 66bc07d01a test: deflake consul e2e tests
Modernize test patterns by removing gomega and avoiding the mock_driver.
2020-08-19 14:29:22 -07:00
Tim Gross 9a3caa49db
e2e: remove unused spark dependency (#8695) 2020-08-19 14:59:36 -04:00
Tim Gross a49732816c
migrate AMI builds to new account (#8674) 2020-08-19 08:20:59 -04:00
Tim Gross d810dab50b
migrate E2E test runs to new AWS account (#8676) 2020-08-18 14:24:34 -04:00
Jasmine Dahilig ee522ab587 task lifecycle: e2e tests 2020-08-18 10:49:50 -07:00
Drew Bailey 76d7d926a7
skip podman e2e 2020-08-14 09:02:56 -04:00
Tim Gross 09a97bd158
e2e: spread CSI controller plugins across multiple DCs (#8629)
Controller plugins that land on the same node will collide over their CSI
`mount_dir`, so give them enough room in our tests that they don't land on the
same host.

Also, version bump the EBS node plugins to match the controllers.
2020-08-10 16:41:39 -04:00
Tim Gross 12984ed1c9
e2e: CSI EBS test should expect 2 controllers (#8617) 2020-08-10 09:41:21 -04:00
Tim Gross fa6ec931f8
e2e: CSI EBS version bump to 0.6.0 (#8618) 2020-08-10 09:41:13 -04:00
Tim Gross 5dba653b43
csi/e2e: add 2nd controller for node drain testing (#8573) 2020-07-31 08:03:49 -04:00
Tim Gross 87f9bfaf1e
e2e/csi: update EFS plugin test to use v1.0 (#8562) 2020-07-30 08:41:48 -04:00
Tim Gross d0b03cad7c
e2e: give containers access to dnsmasq DNS (#8536)
By default, Docker containers get /etc/resolv.conf bound into the container
with the localhost entry stripped out. In order to resolve using the host's
dnsmasq, we need to make sure the container uses the docker0 IP as its
nameserver and that dnsmasq is listening on that port and forwarding to either
the AWS VPC DNS (so that we can query private resources like EFS) or to the
Consul DNS.
2020-07-24 14:09:18 -04:00
Lang Martin deb37c91b7
e2e/bin/run: run & update only attempt to contact linux servers (#8517) 2020-07-24 10:52:12 -04:00
Seth Hoenig c202d0f134
Merge pull request #8335 from hashicorp/f-cnative-host-e2e
e2e: add tests for connect native
2020-07-10 10:24:43 -05:00
Seth Hoenig ac8b51b611 e2e: connect jobID code golf 2020-07-10 10:24:13 -05:00
Drew Bailey 01b01f7cac
use latest podman release (#8403) 2020-07-09 09:28:53 -04:00
Seth Hoenig a9991e9ab9 e2e: add tests for connect native
Adds 2 tests around Connect Native. Both make use of the example connect native
services in https://github.com/hashicorp/nomad-connect-examples

One of them runs without Consul ACLs enabled, the other with.
2020-07-01 15:54:28 -05:00
Tim Gross 23be116da0
csi: add -force flag to volume deregister (#8295)
The `nomad volume deregister` command currently returns an error if the volume
has any claims, but in cases where the claims can't be dropped because of
plugin errors, providing a `-force` flag gives the operator an escape hatch.

If the volume has no allocations or if they are all terminal, this flag
deletes the volume from the state store, immediately and implicitly dropping
all claims without further CSI RPCs. Note that this will not also
unmount/detach the volume, which we'll make the responsibility of a separate
`nomad volume detach` command.
2020-07-01 12:17:51 -04:00
Drew Bailey 327843acfa
base podman e2e test and provisioning updates (#8104)
* initial setup for terrform to install podman task driver

podman

* Update e2e provisioning to support root podman

Excludes setup for rootless podman. updates source ami to ubuntu 18.04
Installs podman and configures podman varlink

base podman test

ensure client status running

revert terraform directory changes

* back out random go-discover go mod change

* include podman varlink docs

* address comments
2020-06-03 14:06:58 -04:00
Seth Hoenig 889e7ddd0c build: use hashicorp hclfmt
We have been using fatih/hclfmt which is long abandoned. Instead, switch
to HashiCorp's own hclfmt implementation. There are some trivial changes in
behavior around whitespace.
2020-05-24 18:31:57 -05:00
Tim Gross 932710ad7d
e2e: upgrade CNI to 0.8.6 (#7956) 2020-05-14 09:29:11 -04:00
Seth Hoenig 623c804046 e2e: upgrade consul in packer setup to 1.7.3 from 1.6.1
There have been a number of bug fixes and features particularly around
Connect that will help us in Nomad's e2e tests. Upgrade Consul in our
packer builder so e2e can make use of the new version.
2020-05-11 11:17:28 -06:00
Seth Hoenig aae8a8504e e2e: set an expose service check in connect e2e testcase
Make sure exposed checks work in e2e by setting an expose
check on the e2e connect test.
2020-05-07 14:40:03 -06:00
Tim Gross 139c65c436
e2e: csi test can purge target job (#7823) 2020-05-01 13:25:50 -04:00
Tim Gross 4935b304a0
e2e: add helper to Makefile for local file deployments (#7822) 2020-04-28 16:15:58 -04:00
Tim Gross ab3086a1f4
e2e: testing reliability (#7701)
* pin CSI plugin versions
* ensure failing CSI tests clean up
* allow NOMAD_SHA env var to override makefile
2020-04-13 10:25:24 -04:00
Mahmood Ali c8eddb9f6b fixup! e2e: add a convenient creation script 2020-04-09 11:04:26 -04:00
Mahmood Ali 8a4937d9ce e2e: add a convenient creation script
Add a convenience Makefile for creating e2e environment for manual
debugging.
2020-04-09 10:54:30 -04:00
Lang Martin c0dbcbef5f
e2e: csi: wait for volume write claims to be released before starting read jobs (#7641) 2020-04-07 07:40:44 -04:00
Tim Gross 50f807060a
e2e: csi tests can only run on linux (#7635) 2020-04-06 11:57:59 -04:00
Tim Gross 73dc2ad443 e2e/csi: add waiting for alloc stop 2020-04-06 10:15:55 -04:00
Tim Gross d81797ea33
e2e: improve test reliability for CSI (#7616)
This changeset:

* adds eval status to the error messages emitted when we have
  placement failure in tests. The implementation here isn't quite
  perfect but it's a lot better than "condition not met".
* enforces the ordering of teardown of the CSI test
* doesn't pass the purge flag to one of the two CSI tests, so that we
  exercise both code paths.
2020-04-03 15:52:58 -04:00
Tim Gross 4c51687cbf
e2e: remove gometa from e2eutils (#7610) 2020-04-03 10:22:22 -04:00
Tim Gross bde13dfc0c
e2e: have TF write-out HCL for CSI volume registration (#7599) 2020-04-02 12:16:43 -04:00
Seth Hoenig fc6b02c817 e2e: minimize Consul ACL policies used in e2e tests
Issue #7523 documents the Consul ACLs used in each Consul interface
used by Nomad. Minimize the policies used in e2e tests so that we
are setting a good example.
2020-03-30 12:53:40 -06:00
Tim Gross cd1c6173f4 csi: e2e tests for EBS and EFS plugins (#7343)
This changeset provides two basic e2e tests for CSI plugins targeting
common AWS use cases.

The EBS test launches the EBS plugin (controller + nodes) and registers
an EBS volume as a Nomad CSI volume. We deploy a job that writes to
the volume, stop that job, and reuse the volume for another job which
should be able to read the data written by the first job.

The EFS test launches the EFS plugin (nodes-only) and registers an EFS
volume as a Nomad CSI volume. We deploy a job that writes to the
volume, stop that job, and reuse the volume for another job which
should be able to read the data written by the first job.

The writer jobs mount the CSI volume at a location within the alloc
dir.
2020-03-23 13:59:18 -04:00
Mahmood Ali 857ddf7aaf e2e: use unique CSI token
Use a unique per-cluster efs creation token, as https://www.terraform.io/docs/providers/aws/r/efs_file_system.html#creation_token.

Using a static value prevents having multiple test clusters.

[ci skip]
2020-03-15 21:55:26 -04:00
Tim Gross 79222c36bf
e2e: add EBS and EFS volumes for testing CSI (#7266)
This changeset adds volumes but does not mount them to instances so
that we can test the mounting ("staging") via CSI plugins. The CSI
plugins themselves will be installed as Nomad jobs.

In order to ensure we can always mount the EFS volume, this changeset
pins the deployment of the cluster to a specific subnet. In future
work we should spread the cluster out among several AZs and test that
behavior explicitly.
2020-03-04 10:44:51 -05:00
Mahmood Ali f5bd51ec30 e2e: avoid parsing Args in pkg init
Golang 1.13 introduced a change in test flag parsing:

> testing
> ...
> Testing flags are now registered in the new Init function, which is invoked by the generated main function for the test. As a result, testing flags are now only registered when running a test binary, and packages that call flag.Parse during package initialization may cause tests to fail.

https://golang.org/doc/go1.13#testing

Here, we ensure that e2e framework parsing occur in TestMain, by only
initializing Framework at Run invocation.
2020-03-02 14:13:54 -05:00
Michael Schurter 2ab672c155 test: explicitly pass vars vs enclosing them 2020-02-14 11:10:33 -08:00
Michael Schurter aab1ad8c18 test: remove errgroup to take advantage of vet
go vet would have prevented the bug fixed in
6362e32161295fa959ebe46b93cea0ea1a9bdd72 but our use of errgroup
prevented that.

Rip out errgroup to take advantage of vet, and remove download limiting
now that we're downloading far fewer binaries overall.
2020-02-14 10:53:54 -08:00
Michael Schurter fb3e228af6 test: sort vault tests by version 2020-02-14 10:33:17 -08:00
Michael Schurter bc9e35aafb test: capture url to fix flaky test 2020-02-14 10:32:58 -08:00
Michael Schurter 32ecac58b6 test: only test latest Z of each X.Y.Z release 2020-02-14 08:41:45 -08:00
Michael Schurter 8c332a3757
Merge pull request #7102 from hashicorp/test-limits
Fix some race conditions and flaky tests
2020-02-13 10:19:11 -08:00
Michael Schurter 3170dfd452 test: simplify code 2020-02-07 15:50:53 -08:00
Tim Gross 0c6e164e8f
e2e: add --quiet flag to s3 copy to reduce log spam (#7085) 2020-02-06 09:24:20 -05:00
Seth Hoenig 351d32cd81
Merge pull request #7071 from hashicorp/b-e2e-cacls-wait-longer
e2e: wait 2m rather than 10s after disabling consul acls
2020-02-04 14:05:10 -06:00
Drew Bailey 7bee040e61
simplify job, better error 2020-02-04 13:59:39 -05:00
Drew Bailey 8b6de8f3d2
fix check 2020-02-04 12:16:20 -05:00
Drew Bailey b10c7cc94e
rm unused field 2020-02-04 12:02:01 -05:00
Drew Bailey a716d57ad7
clean up 2020-02-04 11:59:28 -05:00
Drew Bailey 75053a0d10
get test passing, new util func to wait for not pending 2020-02-04 11:56:37 -05:00
Drew Bailey 5117a22c30
add e2e test for system sched ineligible nodes 2020-02-04 11:56:33 -05:00
Seth Hoenig f4a66ebd28 e2e: wait 2m rather than 10s after disabling consul acls
Pretty sure Consul / Nomad clients are often not ready yet after
the ConsulACLs test disables ACLs, by the time the next test starts
running.

Running locally things tend to work, but in TeamCity this seems to
be a recurring problem. However, when running locally sometimes I do
see that the "show status" step after disabling ACLs, some nodes are
still initializing, suggesting we're right on the border of not waiting
long enough

    nomad node status
    ID        DC   Name              Class   Drain  Eligibility  Status
    0e4dfce2  dc1  EC2AMAZ-JB3NF9P   <none>  false  eligible     ready
    6b90aa06  dc2  ip-172-31-16-225  <none>  false  eligible     ready
    7068558a  dc2  ip-172-31-20-143  <none>  false  eligible     ready
    e0ae3c5c  dc1  ip-172-31-25-165  <none>  false  eligible     ready
    15b59ed6  dc1  ip-172-31-23-199  <none>  false  eligible     initializing

Going to try waiting a full 2 minutes after disabling ACLs, hopefully that
will help things Just Work. In the future, we should probably be parsing the
output of the status checks and actually confirming all nodes are ready.

Even better, maybe that's something shipyard will have built-in.
2020-02-04 10:51:03 -06:00
Tim Gross 0b48baf0ba
e2e: rename linux runner to avoid implicit build tag (#7070)
Go implicitly treats files ending with `_linux.go` as build tagged for
Linux only. This broke the e2e provisioning framework on macOS once we
tried importing it into the `e2e/consulacls` module.
2020-02-04 10:55:38 -05:00
Tim Gross 940110b2de
e2e: improve provisioning defaults and documentation (#7062)
This changeset improves the ergonomics of running the Nomad e2e test
provisioning process by defaulting to a blank `nomad_sha` in the
Terraform configuration. By default, a user will now need to pass in
one of the Nomad version flags. But they won't have to manually edit
the `provisioning.json` file for the common case of deploying a
released version of Nomad, and won't need to put dummy values for
`nomad_sha`.

Includes general documentation improvements.
2020-02-04 10:37:00 -05:00
Seth Hoenig 653c8fe9a5 e2e: turn no-ACLs connect tests back on
Also cleanup more missed debugging things >.>
2020-02-03 20:46:36 -06:00
Mahmood Ali 2424870937
Merge pull request #7055 from hashicorp/r-dev-tweaks-20200203
Grab bag of dev tweaks
2020-02-03 14:25:06 -05:00
Mahmood Ali 7171488e81 run "make hclfmt" 2020-02-03 12:15:53 -05:00
Seth Hoenig 057179edea e2e: remove leftover debug println statement 2020-02-03 11:15:38 -06:00
Seth Hoenig 9b20ca5b25 e2e: setup consul ACLs a little more correctly 2020-01-31 19:06:11 -06:00
Seth Hoenig 83c717a624 e2e: remove redundant extra API call for getting allocs 2020-01-31 19:06:07 -06:00
Seth Hoenig b212654b92 e2e: agent token was only being set for server0 2020-01-31 19:06:03 -06:00
Seth Hoenig f7a1e9cee3 e2e: use hclfmt on consul acls policy config files 2020-01-31 19:05:59 -06:00
Seth Hoenig e9e0d2e3fc e2e: uncomment test case that is not broken 2020-01-31 19:05:55 -06:00
Seth Hoenig df633ee45f e2e: do not use eventually when waiting for allocs
This test is causing panics. Unlike the other similar tests, this
one is using require.Eventually which is doing something bad, and
this change replaces it with a for-loop like the other tests.

Failure:

=== RUN   TestE2E/Connect
=== RUN   TestE2E/Connect/*connect.ConnectE2ETest
=== RUN   TestE2E/Connect/*connect.ConnectE2ETest/TestConnectDemo
=== RUN   TestE2E/Connect/*connect.ConnectE2ETest/TestMultiServiceConnect
=== RUN   TestE2E/Connect/*connect.ConnectClientStateE2ETest
panic: Fail in goroutine after TestE2E/Connect/*connect.ConnectE2ETest has completed

goroutine 38 [running]:
testing.(*common).Fail(0xc000656500)
	/opt/google/go/src/testing/testing.go:565 +0x11e
testing.(*common).Fail(0xc000656100)
	/opt/google/go/src/testing/testing.go:559 +0x96
testing.(*common).FailNow(0xc000656100)
	/opt/google/go/src/testing/testing.go:587 +0x2b
testing.(*common).Fatalf(0xc000656100, 0x1512f90, 0x10, 0xc000675f88, 0x1, 0x1)
	/opt/google/go/src/testing/testing.go:672 +0x91
github.com/hashicorp/nomad/e2e/connect.(*ConnectE2ETest).TestMultiServiceConnect.func1(0x0)
	/home/shoenig/go/src/github.com/hashicorp/nomad/e2e/connect/multi_service.go:72 +0x296
github.com/hashicorp/nomad/vendor/github.com/stretchr/testify/assert.Eventually.func1(0xc0004962a0, 0xc0002338f0)
	/home/shoenig/go/src/github.com/hashicorp/nomad/vendor/github.com/stretchr/testify/assert/assertions.go:1494 +0x27
created by github.com/hashicorp/nomad/vendor/github.com/stretchr/testify/assert.Eventually
	/home/shoenig/go/src/github.com/hashicorp/nomad/vendor/github.com/stretchr/testify/assert/assertions.go:1493 +0x272
FAIL	github.com/hashicorp/nomad/e2e	21.427s
2020-01-31 19:05:47 -06:00
Seth Hoenig 5e5fadbcdf e2e: remove forgotten unused field from new struct 2020-01-31 19:05:41 -06:00
Seth Hoenig fc498c2b96 e2e: e2e test for connect with consul acls
Provide script for managing Consul ACLs on a TF provisioned cluster for
e2e testing. Script can be used to 'enable' or 'disable' Consul ACLs,
and automatically takes care of the bootstrapping process if necessary.

The bootstrapping process takes a long time, so we may need to
extend the overall e2e timeout (20 minutes seems fine).

Introduces basic tests for Consul Connect with ACLs.
2020-01-31 19:05:36 -06:00
Seth Hoenig 93d347442f e2e: add a -suite flag to e2e.Framework
This change allows for providing the -suite=<Name> flag when
running the e2e framework. If set, only the matching e2e/Framework.TestSuite.Component
will be run, and all ther suites will be skipped.
2020-01-29 14:57:43 -06:00
Drew Bailey da4af9bef3
fix tests, update changelog 2020-01-29 13:55:39 -05:00
Tim Gross 7681f09ae4
e2e: packer builds should not be public (#6998) 2020-01-27 16:28:25 -05:00
Michael Schurter ed926a9d03
Merge pull request #6938 from hashicorp/e2e-vault
test: download Vault binaries for e2e test
2020-01-27 10:26:48 -08:00
Tim Gross 457e3ad5c6
e2e: document e2e provisioning process (#6976) 2020-01-22 16:55:17 -05:00
Tim Gross 29e1ed6b05
e2e: ensure group script check tests interpolation (#6972)
Fixes a bug introduced in 0aa58b9 where we're writing a test file to
a taskdir-interpolated location, which works when we `alloc exec` but
not in the jobspec for a group script check.

This changeset also makes the test safe to run multiple times by
namespacing the file with the alloc ID, which has the added bonus of
exercising our alloc interpolation code for group script checks.
2020-01-22 09:54:54 -05:00
Tim Gross 2edbdfc8be
e2e: update framework to allow deploying Nomad (#6969)
The e2e framework instantiates clients for Nomad/Consul but the
provisioning of the actual Nomad cluster is left to Terraform. The
Terraform provisioning process uses `remote-exec` to deploy specific
versions of Nomad so that we don't have to bake an AMI every time we
want to test a new version. But Terraform treats the resulting
instances as immutable, so we can't use the same tooling to update the
version of Nomad in-place. This is a prerequisite for upgrade testing.

This changeset extends the e2e framework to provide the option of
deploying Nomad (and, in the future, Consul/Vault) with specific
versions to running infrastructure. This initial implementation is
focused on deploying to a single cluster via `ssh` (because that's our
current need), but provides interfaces to hook the test run at the
start of the run, the start of each suite, or the start of a given
test case.

Terraform work includes:
* provides Terraform output that written to JSON used by the framework
  to configure provisioning via `terraform output provisioning`.
* provides Terraform output that can be used by test operators to
  configure their shell via `$(terraform output environment)`
* drops `remote-exec` provisioning steps from Terraform
* makes changes to the deployment scripts to ensure they can be run
  multiple times w/ different versions against the same host.
2020-01-22 08:48:52 -05:00
Tim Gross d6aac915a7
e2e: use valid jobspec for group check test (#6967)
Group service checks cannot interpolate task fields, because the task
fields are not available at the time the script check hook is created
for the group service. When f31482a was merged this e2e test began
failing because we are now correctly matching the script check ID to
the service ID, which revealed this jobspec was invalid.
2020-01-21 15:54:46 -05:00
Tim Gross 1e600d573d
e2e: improve reusability of provisioning scripts (#6942)
This changeset is part of the work to improve our E2E provisioning
process to allow our upgrade tests:

* Move more of the setup into the AMI image creation so it's a little
 more obvious to provisioning config authors which bits are essential
 to deploying a specific version of Nomad.

* Make the service file update do a systemd daemon-reload so that we
  can update an already-running cluster with the same script we use to
  deploy it initially.
2020-01-16 09:29:36 -05:00
Michael Schurter ffbfb60f40 test: restore e2e-test target and use -integration 2020-01-14 13:47:51 -08:00
Michael Schurter da4645e9a4 test: download Vault binaries for e2e test
Modernize Vault integration/e2e test a bit:

- Download from releases.hashicorp.com instead of using a hardcoded list
- Remove old unused make target e2e-test
- Use NOMAD_E2E env var instead of -integration flag
- Add a README

On my machine with ~250 Mbps internet it takes ~400s to download all
Vault binaries.
2020-01-14 11:02:02 -08:00
Nick Ethier 1f28633954
Merge pull request #6816 from hashicorp/b-multiple-envoy
connect: configure envoy to support multiple sidecars in the same alloc
2020-01-09 23:25:39 -05:00
Tim Gross b5bcfb533b
upgrade CNI plugins to 0.8.4 (#6921)
When multiple Connect-enabled task groups start on the same client
node, a race condition in the CNI plugins for creating iptables chains
causes one of the tasks to fail. We upstreamed a patch to CNI plugins
to make iptables chain creation idempotent.

This changeset updates end-to-end testing, development tooling, and
documentation to use 0.8.4 which includes our patch.
2020-01-09 10:57:07 -05:00
Tim Gross c11cc60674 commit a hclfmt to eliminate diffs after 'make dev' 2020-01-09 08:18:51 -05:00
Nick Ethier 7b931522f0 e2e: add test for multiple sevice sidecars in the same alloc 2020-01-06 12:48:35 -05:00
Tim Gross 4ba5691656
e2e: give metrics longer to settle (#6884)
Increase the shortened timeout after the first loop so that metrics
that take longer to come in aren't failing the test unnecessarily.

Move the check for empty alloc metrics into the loop so that if the
first values we get are empty we don't fail the test too early.
2019-12-20 10:39:35 -05:00
Tim Gross 9b2b4da3a4
e2e: run client/allocs metrics nightly tests vs Windows (#6850)
Adds Windows targets to the client/allocs metrics tests. Removes the
`allocstats` test, which covers less than these tests and is now
redundant.

Adds a firewall rule to our Windows instances so that the prometheus
server can scrape the Nomad HTTP API for metrics.
2019-12-16 08:34:17 -05:00
Tim Gross e439e927ed
e2e: run client/allocs metrics tests nightly (#6842)
Refactor the metrics end-to-end tests so they can be run with our e2e
test framework. Runs fabio/prometheus and a collection of jobs that
will cause metrics to be measured. We then query Prometheus to ensure
we're publishing those allocation metrics and some metrics from the
clients as well.

Includes adding a placeholder for running the same tests on Windows.
2019-12-12 12:45:16 -05:00
Seth Hoenig d81a091ccd
Merge pull request #6752 from hashicorp/docs-vault-token_period
docs: vault integration docs should reference new token_period field
2019-12-02 16:21:17 -05:00
Seth Hoenig 953e40c8ed docs: vault integration docs should reference new token_explicit_max_ttl field 2019-12-02 14:22:47 -06:00
Tim Gross 88cb95261b
e2e: add allocstats test for Windows (#6775)
Extends the BasicAllocStats test to include a test for Windows
clients, exercising stats via a powershell `raw_exec` job.

Adds `ListLinuxClientNodes` and `ListWindowsClientNodes` utils so that
we can scope tests to run only when Linux or Windows clients are
available. This prevents waiting on timeouts when running a subset of
the tests against a development cluster (vs our nightly test
cluster).
2019-11-26 08:05:42 -05:00
Mahmood Ali e626a145c6
Merge pull request #6713 from alrs/fix-e2e-cli-close-before-error
e2e/cli/command: close after error handling
2019-11-25 14:03:25 -05:00
Lars Lehtonen c9383ca17d
e2e/cli/command: Wait() after execution 2019-11-25 10:56:40 -08:00
Tim Gross c9d92f845f
e2e: add a Windows client to test runner (#6735)
* Adds a constraint to prevent tests from landing on Windows
* Improve Terraform output for mixed windows/linux clients
* Makes some Windows client config fixes from 0.10.2 testing
2019-11-25 13:31:00 -05:00
Tim Gross e012c2b5bf
Infrastructure for Windows e2e testing (#6584)
Includes:
* baseline Windows AMI
* initial pass at Terraform configurations
* OpenSSH for Windows

Using OpenSSH is a lot nicer for Nomad developers than winrm would be,
plus it lets us avoid passing around the Windows password in the
clear.

Note that now we're copying up all the provisioning scripts and
configs as a zipped bundle because TF's file provisioner dies in the
middle of pushing up multiple files (whereas `scp -r` works fine).

We're also running all the provisioning scripts inside the userdata by
polling for the zip file to show up (gross!). This is because
`remote-exec` provisioners are failing on Windows with the same symptoms as:

https://github.com/hashicorp/terraform/issues/17728

If we can't fix this, it'll prevent us from having multiple Windows
clients running until TF supports count interpolation in the
`template_file`, which is planned for a later 0.12 release.
2019-11-19 11:06:10 -05:00
Tim Gross 1210261fe2
hclfmt nomad jobspecs (#6724) 2019-11-19 10:36:41 -05:00
Drew Bailey 2befab6900
Merge pull request #6573 from hashicorp/update-cci-consul
updates default consul version to 1.6.1
2019-11-07 11:01:22 -05:00
Drew Bailey 1c2af019c6
update vagrant & packer consul versions 2019-11-07 10:13:14 -05:00
Drew Bailey 786989dbe3
New monitor pkg for shared monitor functionality
Adds new package that can be used by client and server RPC endpoints to
facilitate monitoring based off of a logger

clean up old code

small comment about write

rm old comment about minsize

rename to Monitor

Removes connection logic from monitor command

Keep connection logic in endpoints, use a channel to send results from
monitoring

use new multisink logger and interfaces

small test for dropped messages

update go-hclogger and update sink/intercept logger interfaces
2019-11-05 09:51:49 -05:00
Tim Gross 3e9ae481ce
e2e: refactor Consul configurations (#6559)
Ensure that we're reusing the base configuration between client and
servers without the possibility of drift. Reduce the amount of `sed`
mangling of the configuration file, and make recommended changes from
`shellcheck` for this section of the provisioning script.

Fixes some rebase errors on the Nomad config as well.
2019-10-28 09:27:40 -04:00
Tim Gross ba7e7413ef
e2e: refactor Nomad configuration (#6560)
Share base configuration for telemetry and consul. Have the server
configurations respect the `var.server_count` config. Make changes
recommended by `shellcheck` in the provisioning scripts for this section.

Switch to OS/arch-tagged release bundles on S3 for compatibility with
adding Windows builds in the near future.
2019-10-28 08:21:02 -04:00
Tim Gross 8be403f47b
e2e: refactor Vault configuration (#6561)
Match the configuration directory layout we're using for Consul and
other services. Make recommended changes from `shellcheck` for this
section of the provisioning script.
2019-10-25 15:29:01 -04:00
Tim Gross 87b3abddd3
e2e: use sockaddr for IP address configuration (#6548)
Update the Consul and Vault configs to take advantage of their
included `go-sockaddr` library for getting the IP addresses we need in
a portable way. This particularly avoids problems with "predictable"
interface names provided by systemd.

Also adds the `sockaddr` binary to the Packer build so we can use it
in our provisioning scripts.
2019-10-25 14:08:38 -04:00
Tim Gross efbd680d4e
e2e: split Packer build scripts from TF provisioning (#6542)
Make a clear split between Packer and Terraform provisioning steps:
the scripts in the `packer/linux` directory are run when we build the
AMI whereas the stuff in shared are run at Terraform provisioning time.

Merging all runtime provisioning scripts into a single script for each
of server/client solves the following:

* Userdata scripts can't take arguments, they can only be templated
  and that means we have to do TF escaping in bash/powershell scripts.
* TF provisioning scripts race with userdata scripts.
2019-10-25 08:08:24 -04:00
Tim Gross c648c4f998
e2e: upgrade terraform to 0.12.x (#6489) 2019-10-14 11:27:08 -04:00
Tim Gross 15e912ddd6
e2e: move remote-exec inline to script (#6488)
A failing script in a `remote-exec` provisioner's `inline` stanza
won't fail the provisioning step. This lets us continue on to execute
tests against potentially broken deployments, rather than letting us
know the provisioning itself failed.
2019-10-14 10:23:41 -04:00
Danielle Lancashire 199d24d6bf
chore: initial hclfmt 2019-10-11 14:00:05 +02:00
Lang Martin 0648402150
Merge pull request #6373 from hashicorp/b-raft-proto-upgrade
raft protocol defaults to version 2
2019-09-26 14:33:09 -04:00
Tim Gross d965a15490 driver/networking: don't recreate existing network namespaces 2019-09-25 14:58:17 -04:00
Tim Gross e86a476bbb failing test for #6310 2019-09-25 14:58:17 -04:00
Lang Martin 6e0ec6302b script e2e/upgrades: cluster upgrade scripts 2019-09-24 14:35:45 -04:00
Danielle 940bbcc639
Merge pull request #6342 from hashicorp/f-host-volume-e2e
Add Host Volumes E2E test
2019-09-18 12:59:32 -07:00
Tim Gross adde9acf57
e2e: test infra for client node restarts (#6313)
Add a test helper that restarts a specific client node running under
systemd using a `raw_exec` job.
2019-09-18 10:10:14 -04:00
Tim Gross 7061dcef4b
e2e: move consul status check helpers to e2eutil (#6314) 2019-09-18 08:18:19 -04:00
Danielle Lancashire 05d172ef2b
e2e: init host volumes test 2019-09-18 00:34:48 +02:00
Danielle Lancashire c50d7f2727
e2e: Add Host Volume Configuration 2019-09-17 20:06:50 +02:00
Tim Gross 55ee7a220b
e2e: fixes for race conditions in testing (#6300)
- In script checks, ensure we're running `Exec` against the new running
  allocation and not the earlier stopped one.
- In script checks, allow `Exec` calls to error due to lack of pty when
  we use the exec to kill the task.
- In `utils.go/RegisterAllocs`, force query for allocations to wait on
  wait index returned by registration call.
2019-09-10 13:45:16 -04:00