Commit Graph

198 Commits

Author SHA1 Message Date
Tim Gross 457e3ad5c6
e2e: document e2e provisioning process (#6976) 2020-01-22 16:55:17 -05:00
Tim Gross 29e1ed6b05
e2e: ensure group script check tests interpolation (#6972)
Fixes a bug introduced in 0aa58b9 where we're writing a test file to
a taskdir-interpolated location, which works when we `alloc exec` but
not in the jobspec for a group script check.

This changeset also makes the test safe to run multiple times by
namespacing the file with the alloc ID, which has the added bonus of
exercising our alloc interpolation code for group script checks.
2020-01-22 09:54:54 -05:00
Tim Gross 2edbdfc8be
e2e: update framework to allow deploying Nomad (#6969)
The e2e framework instantiates clients for Nomad/Consul but the
provisioning of the actual Nomad cluster is left to Terraform. The
Terraform provisioning process uses `remote-exec` to deploy specific
versions of Nomad so that we don't have to bake an AMI every time we
want to test a new version. But Terraform treats the resulting
instances as immutable, so we can't use the same tooling to update the
version of Nomad in-place. This is a prerequisite for upgrade testing.

This changeset extends the e2e framework to provide the option of
deploying Nomad (and, in the future, Consul/Vault) with specific
versions to running infrastructure. This initial implementation is
focused on deploying to a single cluster via `ssh` (because that's our
current need), but provides interfaces to hook the test run at the
start of the run, the start of each suite, or the start of a given
test case.

Terraform work includes:
* provides Terraform output that written to JSON used by the framework
  to configure provisioning via `terraform output provisioning`.
* provides Terraform output that can be used by test operators to
  configure their shell via `$(terraform output environment)`
* drops `remote-exec` provisioning steps from Terraform
* makes changes to the deployment scripts to ensure they can be run
  multiple times w/ different versions against the same host.
2020-01-22 08:48:52 -05:00
Tim Gross d6aac915a7
e2e: use valid jobspec for group check test (#6967)
Group service checks cannot interpolate task fields, because the task
fields are not available at the time the script check hook is created
for the group service. When f31482a was merged this e2e test began
failing because we are now correctly matching the script check ID to
the service ID, which revealed this jobspec was invalid.
2020-01-21 15:54:46 -05:00
Tim Gross 1e600d573d
e2e: improve reusability of provisioning scripts (#6942)
This changeset is part of the work to improve our E2E provisioning
process to allow our upgrade tests:

* Move more of the setup into the AMI image creation so it's a little
 more obvious to provisioning config authors which bits are essential
 to deploying a specific version of Nomad.

* Make the service file update do a systemd daemon-reload so that we
  can update an already-running cluster with the same script we use to
  deploy it initially.
2020-01-16 09:29:36 -05:00
Nick Ethier 1f28633954
Merge pull request #6816 from hashicorp/b-multiple-envoy
connect: configure envoy to support multiple sidecars in the same alloc
2020-01-09 23:25:39 -05:00
Tim Gross b5bcfb533b
upgrade CNI plugins to 0.8.4 (#6921)
When multiple Connect-enabled task groups start on the same client
node, a race condition in the CNI plugins for creating iptables chains
causes one of the tasks to fail. We upstreamed a patch to CNI plugins
to make iptables chain creation idempotent.

This changeset updates end-to-end testing, development tooling, and
documentation to use 0.8.4 which includes our patch.
2020-01-09 10:57:07 -05:00
Tim Gross c11cc60674 commit a hclfmt to eliminate diffs after 'make dev' 2020-01-09 08:18:51 -05:00
Nick Ethier 7b931522f0 e2e: add test for multiple sevice sidecars in the same alloc 2020-01-06 12:48:35 -05:00
Tim Gross 4ba5691656
e2e: give metrics longer to settle (#6884)
Increase the shortened timeout after the first loop so that metrics
that take longer to come in aren't failing the test unnecessarily.

Move the check for empty alloc metrics into the loop so that if the
first values we get are empty we don't fail the test too early.
2019-12-20 10:39:35 -05:00
Tim Gross 9b2b4da3a4
e2e: run client/allocs metrics nightly tests vs Windows (#6850)
Adds Windows targets to the client/allocs metrics tests. Removes the
`allocstats` test, which covers less than these tests and is now
redundant.

Adds a firewall rule to our Windows instances so that the prometheus
server can scrape the Nomad HTTP API for metrics.
2019-12-16 08:34:17 -05:00
Tim Gross e439e927ed
e2e: run client/allocs metrics tests nightly (#6842)
Refactor the metrics end-to-end tests so they can be run with our e2e
test framework. Runs fabio/prometheus and a collection of jobs that
will cause metrics to be measured. We then query Prometheus to ensure
we're publishing those allocation metrics and some metrics from the
clients as well.

Includes adding a placeholder for running the same tests on Windows.
2019-12-12 12:45:16 -05:00
Seth Hoenig d81a091ccd
Merge pull request #6752 from hashicorp/docs-vault-token_period
docs: vault integration docs should reference new token_period field
2019-12-02 16:21:17 -05:00
Seth Hoenig 953e40c8ed docs: vault integration docs should reference new token_explicit_max_ttl field 2019-12-02 14:22:47 -06:00
Tim Gross 88cb95261b
e2e: add allocstats test for Windows (#6775)
Extends the BasicAllocStats test to include a test for Windows
clients, exercising stats via a powershell `raw_exec` job.

Adds `ListLinuxClientNodes` and `ListWindowsClientNodes` utils so that
we can scope tests to run only when Linux or Windows clients are
available. This prevents waiting on timeouts when running a subset of
the tests against a development cluster (vs our nightly test
cluster).
2019-11-26 08:05:42 -05:00
Mahmood Ali e626a145c6
Merge pull request #6713 from alrs/fix-e2e-cli-close-before-error
e2e/cli/command: close after error handling
2019-11-25 14:03:25 -05:00
Lars Lehtonen c9383ca17d
e2e/cli/command: Wait() after execution 2019-11-25 10:56:40 -08:00
Tim Gross c9d92f845f
e2e: add a Windows client to test runner (#6735)
* Adds a constraint to prevent tests from landing on Windows
* Improve Terraform output for mixed windows/linux clients
* Makes some Windows client config fixes from 0.10.2 testing
2019-11-25 13:31:00 -05:00
Tim Gross e012c2b5bf
Infrastructure for Windows e2e testing (#6584)
Includes:
* baseline Windows AMI
* initial pass at Terraform configurations
* OpenSSH for Windows

Using OpenSSH is a lot nicer for Nomad developers than winrm would be,
plus it lets us avoid passing around the Windows password in the
clear.

Note that now we're copying up all the provisioning scripts and
configs as a zipped bundle because TF's file provisioner dies in the
middle of pushing up multiple files (whereas `scp -r` works fine).

We're also running all the provisioning scripts inside the userdata by
polling for the zip file to show up (gross!). This is because
`remote-exec` provisioners are failing on Windows with the same symptoms as:

https://github.com/hashicorp/terraform/issues/17728

If we can't fix this, it'll prevent us from having multiple Windows
clients running until TF supports count interpolation in the
`template_file`, which is planned for a later 0.12 release.
2019-11-19 11:06:10 -05:00
Tim Gross 1210261fe2
hclfmt nomad jobspecs (#6724) 2019-11-19 10:36:41 -05:00
Drew Bailey 2befab6900
Merge pull request #6573 from hashicorp/update-cci-consul
updates default consul version to 1.6.1
2019-11-07 11:01:22 -05:00
Drew Bailey 1c2af019c6
update vagrant & packer consul versions 2019-11-07 10:13:14 -05:00
Drew Bailey 786989dbe3
New monitor pkg for shared monitor functionality
Adds new package that can be used by client and server RPC endpoints to
facilitate monitoring based off of a logger

clean up old code

small comment about write

rm old comment about minsize

rename to Monitor

Removes connection logic from monitor command

Keep connection logic in endpoints, use a channel to send results from
monitoring

use new multisink logger and interfaces

small test for dropped messages

update go-hclogger and update sink/intercept logger interfaces
2019-11-05 09:51:49 -05:00
Tim Gross 3e9ae481ce
e2e: refactor Consul configurations (#6559)
Ensure that we're reusing the base configuration between client and
servers without the possibility of drift. Reduce the amount of `sed`
mangling of the configuration file, and make recommended changes from
`shellcheck` for this section of the provisioning script.

Fixes some rebase errors on the Nomad config as well.
2019-10-28 09:27:40 -04:00
Tim Gross ba7e7413ef
e2e: refactor Nomad configuration (#6560)
Share base configuration for telemetry and consul. Have the server
configurations respect the `var.server_count` config. Make changes
recommended by `shellcheck` in the provisioning scripts for this section.

Switch to OS/arch-tagged release bundles on S3 for compatibility with
adding Windows builds in the near future.
2019-10-28 08:21:02 -04:00
Tim Gross 8be403f47b
e2e: refactor Vault configuration (#6561)
Match the configuration directory layout we're using for Consul and
other services. Make recommended changes from `shellcheck` for this
section of the provisioning script.
2019-10-25 15:29:01 -04:00
Tim Gross 87b3abddd3
e2e: use sockaddr for IP address configuration (#6548)
Update the Consul and Vault configs to take advantage of their
included `go-sockaddr` library for getting the IP addresses we need in
a portable way. This particularly avoids problems with "predictable"
interface names provided by systemd.

Also adds the `sockaddr` binary to the Packer build so we can use it
in our provisioning scripts.
2019-10-25 14:08:38 -04:00
Tim Gross efbd680d4e
e2e: split Packer build scripts from TF provisioning (#6542)
Make a clear split between Packer and Terraform provisioning steps:
the scripts in the `packer/linux` directory are run when we build the
AMI whereas the stuff in shared are run at Terraform provisioning time.

Merging all runtime provisioning scripts into a single script for each
of server/client solves the following:

* Userdata scripts can't take arguments, they can only be templated
  and that means we have to do TF escaping in bash/powershell scripts.
* TF provisioning scripts race with userdata scripts.
2019-10-25 08:08:24 -04:00
Tim Gross c648c4f998
e2e: upgrade terraform to 0.12.x (#6489) 2019-10-14 11:27:08 -04:00
Tim Gross 15e912ddd6
e2e: move remote-exec inline to script (#6488)
A failing script in a `remote-exec` provisioner's `inline` stanza
won't fail the provisioning step. This lets us continue on to execute
tests against potentially broken deployments, rather than letting us
know the provisioning itself failed.
2019-10-14 10:23:41 -04:00
Danielle Lancashire 199d24d6bf
chore: initial hclfmt 2019-10-11 14:00:05 +02:00
Lang Martin 0648402150
Merge pull request #6373 from hashicorp/b-raft-proto-upgrade
raft protocol defaults to version 2
2019-09-26 14:33:09 -04:00
Tim Gross d965a15490 driver/networking: don't recreate existing network namespaces 2019-09-25 14:58:17 -04:00
Tim Gross e86a476bbb failing test for #6310 2019-09-25 14:58:17 -04:00
Lang Martin 6e0ec6302b script e2e/upgrades: cluster upgrade scripts 2019-09-24 14:35:45 -04:00
Danielle 940bbcc639
Merge pull request #6342 from hashicorp/f-host-volume-e2e
Add Host Volumes E2E test
2019-09-18 12:59:32 -07:00
Tim Gross adde9acf57
e2e: test infra for client node restarts (#6313)
Add a test helper that restarts a specific client node running under
systemd using a `raw_exec` job.
2019-09-18 10:10:14 -04:00
Tim Gross 7061dcef4b
e2e: move consul status check helpers to e2eutil (#6314) 2019-09-18 08:18:19 -04:00
Danielle Lancashire 05d172ef2b
e2e: init host volumes test 2019-09-18 00:34:48 +02:00
Danielle Lancashire c50d7f2727
e2e: Add Host Volume Configuration 2019-09-17 20:06:50 +02:00
Tim Gross 55ee7a220b
e2e: fixes for race conditions in testing (#6300)
- In script checks, ensure we're running `Exec` against the new running
  allocation and not the earlier stopped one.
- In script checks, allow `Exec` calls to error due to lack of pty when
  we use the exec to kill the task.
- In `utils.go/RegisterAllocs`, force query for allocations to wait on
  wait index returned by registration call.
2019-09-10 13:45:16 -04:00
Tim Gross 3469c50275
e2e: tag instances with origin (#6293)
When multiple developers are working on e2e testing, it helps to be
able to identify which infrastructure belongs to which Nomad SHA and
which developer. This adds tags to the EC2 instances.
2019-09-06 15:49:18 -04:00
Tim Gross ede48ae19c script checks: use cat instead of ls for exit code agreement 2019-09-06 11:17:00 -04:00
Tim Gross c9c612cc70 e2e: script check testing 2019-09-06 10:18:55 -04:00
Michael Schurter 228899c32f e2e: test demo job for connect 2019-09-04 12:40:08 -07:00
Tim Gross 7ee3333a2d e2e: filter default AMI by OS
Add an OS tag to Packer builds of our e2e test AMIs and then filters
by this in Terraform.
2019-08-30 16:51:13 -04:00
Danielle Lancashire d454dab39b
chore: Format hcl configurations 2019-07-20 16:55:07 +02:00
Michael Schurter a3fcb8fcca e2e: debug log level for everyone! 2019-07-18 06:55:27 -07:00
Michael Schurter ea68c930fe e2e: enable_debug=true for all agents
Enables the pprof http endpoint for debugging.
2019-07-17 15:20:45 -07:00
Preetha 0a2e21353f
Merge pull request #5912 from hashicorp/f-systemd-nofile
systemd: set a high but non-infinite fd limit
2019-07-11 12:31:12 -05:00