open-nomad

Commit Graph

Author	SHA1	Message	Date
Chris Baker	0a85d2bd24	Merge pull request #9089 from hashicorp/b-explicit-rune fix go 1.15 pickiness	2020-10-14 10:37:36 -05:00
Tim Gross	fe88003f29	e2e: eliminate race condition causing rescheduling test flake (#9085 ) The autorevert test checks for reverted allocations to be placed and running before checking the deployment status, but the deployment can be completed and marked "successful" before we check it for "running" status. Instead, just wait for it to be marked "successful" and assert we have the expected count of deployment statuses.	2020-10-14 11:35:30 -04:00
Tim Gross	76f1f5e5df	e2e: use AMI filter for Ubuntu packer image (#9086 ) Instead of hard-coding the base AMI for our Packer image for Ubuntu, use the latest from Canonical so that we always have their current kernel patches.	2020-10-14 11:22:33 -04:00
Chris Baker	d4bae840b2	fix go 1.15 pickiness	2020-10-14 15:19:54 +00:00
Nick Ethier	f5250499b9	e2e/networking: use correct dc (#9088 )	2020-10-14 11:14:09 -04:00
Tim Gross	115edb53a0	e2e: add flag to opt-in to creating EBS/EFS volumes (#9082 ) For everyday developer use, we don't need volumes for testing CSI. Providing a flag to opt-in speeds up deploying dev clusters and slightly reduces infra costs. Skip CSI test if missing volume specs.	2020-10-14 10:29:33 -04:00
Tim Gross	65282a7cf1	E2E: vault secrets (#9081 ) * rename vault API compatibility test for clarity * exercise vault secrets lease renewal	2020-10-14 08:43:28 -04:00
Nick Ethier	d45be0b5a6	client: add NetworkStatus to Allocation (#8657 )	2020-10-12 13:43:04 -04:00
Yoan Blanc	891accb89a	use allow/deny instead of the colored alternatives (#9019 ) Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2020-10-12 08:47:05 -04:00
Tim Gross	474c18102d	e2e: extend ConsulTemplate test and fix flakiness (#8997 ) Add service discovery integration to the existing consul-template E2E test, and verify both service and key updates force re-rendering. Fixes flakiness by using the longer default wait config we use elsewhere. Removes our last direct dependency on gomega.	2020-10-05 10:51:55 -04:00
Tim Gross	727277793b	e2e: bootstrap vault and provision Nomad with vault tokens (#9010 ) Provisions vault with the policies described in the Nomad Vault integration guide, and drops a configuration file for Nomad vault server configuration with its token. The vault root token is exposed to the E2E runner so that tests can write additional policies to vault.	2020-10-05 09:28:37 -04:00
Tim Gross	b6292528fe	e2e: tfvars.dev file must override default tfvars file (#9005 ) The `-var-file` flag for loading variables into Terraform overlays the default variables file if present. This means that variables that are set in the default variables file will take precedence if the overlay file does not have them set. Set `nomad_acls` and `nomad_enteprise` to `false` in the dev cluster.	2020-10-02 08:02:37 -04:00
Tim Gross	4bab91b81b	e2e: ensure tests are constrained to Linux (#8990 ) Until we have LCOW support in the E2E environment (which requires a Windows 2019 test target), we need to constrain E2E tests to the appropriate kernel	2020-09-30 09:43:30 -04:00
Tim Gross	e49410e97b	e2e: cleanup errors should use assert, not require (#8989 ) The E2E framework wraps testify's `require` so that by default we can stop tests on errors, but the cleanup functions should use `assert` so that we continue to try to cleanup the test environment even if there's a failure.	2020-09-30 09:00:37 -04:00
Tim Gross	fa1fa623f2	e2e: rework rescheduling progress deadline test (#8958 ) Eliminate sources of randomness in the progress deadline test and clarify the purpose of the test to check for progress deadline updates.	2020-09-29 11:02:16 -04:00
Tim Gross	6489c5f626	e2e: namespace support for CLI helpers (#8978 ) Required to support tests for namespaces and other ENT features.	2020-09-28 16:37:34 -04:00
Tim Gross	6bed4ec45b	e2e: ENT placeholder for namespace/quotas tests (#8973 )	2020-09-28 11:23:37 -04:00
Tim Gross	1311f32f1b	e2e: test for host volumes and Docker volumes (#8972 ) Exercises host volume and Docker volume functionality for the `exec` and `docker` task driver, particularly around mounting locations within the container and how this can be used with `template`.	2020-09-28 11:14:13 -04:00
Tim Gross	566dae7b19	e2e: add flag to bootstrap Nomad ACLs (#8961 ) Adds a `nomad_acls` flag to our Terraform stack that bootstraps Nomad ACLs via a `local-exec` provider. There's no way to set the `NOMAD_TOKEN` in the Nomad TF provider if we're bootstrapping in the same Terraform stack, so instead of using `resource.nomad_acl_token`, we also bootstrap a wide-open anonymous policy. The resulting management token is exported as an environment var with `$(terraform output environment)` and tests that want stricter ACLs will be able to write them using that token. This should also provide a basis to do similar work with Consul ACLs in the future.	2020-09-28 09:22:36 -04:00
Tim Gross	15d3f5ea7e	e2e: remove unused migrations test (#8955 ) The areas of the code this test exercised were merged in with the node drain tests.	2020-09-23 14:50:15 -04:00
Tim Gross	147b16243d	e2e: use more recent instance type (#8954 ) Newer EC2 instances are both cheaper and have generally better performance. The dnsmasq configuration had a hard-coded interface name, so in order to accommodate instances with more recent networking that results in so-called predictable interface names, the dnsmasq configuration needs to be replaced at runtime with userdata to select the default interface.	2020-09-23 14:27:52 -04:00
Tim Gross	1fc525ec1e	e2e: add flags for provisioning Nomad Enterprise (#8929 )	2020-09-23 10:39:04 -04:00
Tim Gross	9cbc604308	e2e: node drain tests (#8906 ) Exercise the `nomad node drain` features, driving them via the new CLI helpers.	2020-09-21 11:52:11 -04:00
Tim Gross	34093f7747	e2e: reschedule tests should check for non-zero rescheduled allocs (#8927 ) The conditional around some of the rescheduling tests was backwards, where we were waiting for allocations to be rescheduled but testing for a count of 0. The test was passing but flaky because if the check happened quickly enough before the scheduler rescheduled the allocations, it would pass.	2020-09-21 08:17:24 -04:00
Tim Gross	3da61545d5	make sure dev-cluster has the option to run windows config (#8928 )	2020-09-18 16:41:35 -04:00
Tim Gross	ea1f6408bf	e2e: remove unused framework provisioning code (#8908 )	2020-09-18 11:46:47 -04:00
Tim Gross	c413fa5e49	e2e: test script for Terraform logic (#8907 )	2020-09-18 11:46:40 -04:00
Tim Gross	9d37233eaf	e2e: provision cluster entirely through Terraform (#8748 ) Have Terraform run the target-specific `provision.sh`/`provision.ps1` script rather than the test runner code which needs to be customized for each distro. Use Terraform's detection of variable value changes so that we can re-run the provisioning without having to re-install Nomad on those specific hosts that need it changed. Allow the configuration "profile" (well-known directory) to be set by a Terraform variable. The default configurations are installed during Packer build time, and symlinked into the live configuration directory by the provision script. Detect changes in the file contents so that we only upload custom configuration files that have changed between Terraform runs	2020-09-18 11:27:24 -04:00
Tim Gross	990fcf7be4	e2e: documentation and minor tweaks to configs (#8912 ) * remove outdated references to envchain in documentation * add new host volume locations in userdata * don't exit the entire script during provisioning, just return	2020-09-17 09:20:18 -04:00
Tim Gross	d7a013b6f5	e2e: refactor CLI utils out of rescheduling test (#8905 ) The CLI helpers in the rescheduling test were intended for shared use, but until some other tests were written we didn't want to waste time making them generic. This changeset refactors them and adds some new helpers associated with the node drain tests (under separate PR).	2020-09-16 16:10:06 -04:00
Tim Gross	bd889c82aa	e2e: constrain rescheduling test workloads to Linux (#8872 ) The rescheduling test workloads were created before we had Windows targets in the E2E nightly run. When these were recently ported to the e2e framework they were missing the constraint to Linux machines. Also added a little extra time to polling to avoid some flakiness on the first run, and a minor readability adjustment to the job names.	2020-09-11 09:21:28 -04:00
Tim Gross	572ae37856	Merge pull request #8860 E2E: rescheduling tests	2020-09-10 13:43:55 -04:00
Tim Gross	294c7149a2	e2e: rescheduling tests Ports the rescheduling tests (which aren't running in CI) into the current test framework so that they're run on nightly, and exercises the new CLI helpers.	2020-09-10 13:00:37 -04:00
Tim Gross	28e9bbbbf4	e2e: helper for sending CLI commands and parsing output The E2E suite exercises the API, but not the CLI. This changeset adds a helper function to send commands via a locally-built Nomad binary (which we'll need to add to the E2E setup), and some helpers to parse the resulting structured outputs in a way that tests can consume.	2020-09-10 13:00:32 -04:00
Michael Schurter	5f3a71d0b9	docs: update scripts to 0.12.4	2020-09-09 15:22:37 -07:00
James Rasell	76b03d3a2f	e2e: fix failure in running metrics test suite jobs. When running the Fabio and Prometheus jobs for the metrics suite it seems the outer directory is required in the call when registering the job. error: "e2e/input/fabio.nomad: no such file or directory"	2020-09-09 08:40:35 +02:00
Tim Gross	f499b44101	e2e: move setup jobs for metrics test into that suite (#8842 ) The fabio and prometheus workloads are specific to the metrics test and aren't used by any other test suite.	2020-09-08 13:21:44 -04:00
Tim Gross	a47b1c1081	e2e: move configurations into profile-specific directories (#8828 ) This changeset stages upcoming E2E provisioning improvements work. It splits the existing shared configuration directory into 3 profiles: * "full-cluster": the set of configurations currently in use * "dev-cluster": a simplified set of mostly existing configurations that weren't in use. * "custom": an empty profile for developers to keep non-standard configurations during complex feature development. The tooling to switch between profiles will be in a later changeset. Also drops some unused configuration knobs from the provisioning scripts to make the next stage of work easier.	2020-09-04 11:23:32 -04:00
Tim Gross	93c1093274	e2e: remove unused EBS volumes and depends_on (#8827 ) Our provisioning process for E2E doesn't require the `depends_on` fields to be set for client instances, so dropping that field allows all instances to be started in parallel. We don't use the extra EBS volumes (they aren't even mounted), so remove them to reduce costs.	2020-09-04 10:25:59 -04:00
Tim Gross	0577b03479	e2e: minor rename and cleanup (#8824 )	2020-09-04 08:51:22 -04:00
Tim Gross	e6cdd8e0c0	e2e: consolidate cloud-specific Consul configs (#8823 ) The `-recursor` flag in the Consul service unit files is specific to a given cloud, but we already have cloud-specific configuration files. Consolidate all the cloud-specific items into the config.	2020-09-04 08:51:15 -04:00
Tim Gross	bc6ad011fe	e2e: Linux AMI setup cleanup (#8821 ) As we add new Linux targets for E2E, the existing setup.sh script will be used only for Ubuntu. Rather than have the service and config files echo'd from the script, move them into files we upload so they can be reused. Includes some general noise reduction in the setup.sh script and removal of unused bits.	2020-09-03 16:30:58 -04:00
Jasmine Dahilig	71a694f39c	Merge pull request #8390 from hashicorp/lifecycle-poststart-hook task lifecycle poststart hook	2020-08-31 13:53:24 -07:00
Jasmine Dahilig	fbe0c89ab1	task lifecycle poststart: code review fixes	2020-08-31 13:22:41 -07:00
Tim Gross	3a382f599f	e2e: minor TF refactor to split out vars and outputs (#8752 )	2020-08-26 17:00:36 -04:00
Tim Gross	8c8b91e7b9	e2e: move systemd unit files into Packer build (#8751 )	2020-08-26 16:45:09 -04:00
Tim Gross	693a8a2613	e2e: fix platform path for installing for Linux from s3 (#8708 )	2020-08-21 09:20:09 -04:00
Tim Gross	b23150057a	E2E: move Nomad installation to script on remote hosts (#8706 ) This changeset moves the installation of Nomad binaries out of the provisioning framework and into scripts that are installed on the remote host during AMI builds. This provides a few advantages: * The provisioning framework can be reduced in scope (with the goal of moving most of it into the Terraform stack entirely). * The scripts can be arbitrarily complex if we don't have to stuff them into ssh commands, so it's easier to make them idempotent. In this changeset, the scripts check the version of the existing binary and don't re-download when using the `--nomad_sha` or `--nomad_version` flags. * The scripts can be OS/distro specific, which helps in building new test targets.	2020-08-20 16:10:00 -04:00
Michael Schurter	86a31d0df6	Merge pull request #8701 from hashicorp/doc-e2e docs: clarify e2e tests	2020-08-20 08:53:58 -07:00
Jasmine Dahilig	a7b8adfe01	task lifecycle: e2e fix more alloc stop races	2020-08-20 08:49:58 -07:00
Jasmine Dahilig	681eb407db	task lifecycle: make e2e service job test block until poststart task has started	2020-08-20 08:11:16 -07:00
Tim Gross	0fd4a05b2f	E2E AMI cleanup (#8697 ) * move CNI install/podman config to build-time * move DNS config to userdata * consolidate apt updates for performance	2020-08-20 10:09:31 -04:00
Michael Schurter	72bd8f477c	docs: clarify e2e tests Just a smattering of attempted improvements as I read through this again. Some of my goals: - Tried to add more high level info to the intro to set the context - Clarify the difference between test dev and agent dev workflows - Add -timeout to provisioning step because cable Internet is lol	2020-08-19 20:32:31 -07:00
Michael Schurter	66bc07d01a	test: deflake consul e2e tests Modernize test patterns by removing gomega and avoiding the mock_driver.	2020-08-19 14:29:22 -07:00
Tim Gross	9a3caa49db	e2e: remove unused spark dependency (#8695 )	2020-08-19 14:59:36 -04:00
Tim Gross	a49732816c	migrate AMI builds to new account (#8674 )	2020-08-19 08:20:59 -04:00
Tim Gross	d810dab50b	migrate E2E test runs to new AWS account (#8676 )	2020-08-18 14:24:34 -04:00
Jasmine Dahilig	ee522ab587	task lifecycle: e2e tests	2020-08-18 10:49:50 -07:00
Drew Bailey	76d7d926a7	skip podman e2e	2020-08-14 09:02:56 -04:00
Tim Gross	09a97bd158	e2e: spread CSI controller plugins across multiple DCs (#8629 ) Controller plugins that land on the same node will collide over their CSI `mount_dir`, so give them enough room in our tests that they don't land on the same host. Also, version bump the EBS node plugins to match the controllers.	2020-08-10 16:41:39 -04:00
Tim Gross	12984ed1c9	e2e: CSI EBS test should expect 2 controllers (#8617 )	2020-08-10 09:41:21 -04:00
Tim Gross	fa6ec931f8	e2e: CSI EBS version bump to 0.6.0 (#8618 )	2020-08-10 09:41:13 -04:00
Tim Gross	5dba653b43	csi/e2e: add 2nd controller for node drain testing (#8573 )	2020-07-31 08:03:49 -04:00
Tim Gross	87f9bfaf1e	e2e/csi: update EFS plugin test to use v1.0 (#8562 )	2020-07-30 08:41:48 -04:00
Tim Gross	d0b03cad7c	e2e: give containers access to dnsmasq DNS (#8536 ) By default, Docker containers get /etc/resolv.conf bound into the container with the localhost entry stripped out. In order to resolve using the host's dnsmasq, we need to make sure the container uses the docker0 IP as its nameserver and that dnsmasq is listening on that port and forwarding to either the AWS VPC DNS (so that we can query private resources like EFS) or to the Consul DNS.	2020-07-24 14:09:18 -04:00
Lang Martin	deb37c91b7	e2e/bin/run: run & update only attempt to contact linux servers (#8517 )	2020-07-24 10:52:12 -04:00
Seth Hoenig	c202d0f134	Merge pull request #8335 from hashicorp/f-cnative-host-e2e e2e: add tests for connect native	2020-07-10 10:24:43 -05:00
Seth Hoenig	ac8b51b611	e2e: connect jobID code golf	2020-07-10 10:24:13 -05:00
Drew Bailey	01b01f7cac	use latest podman release (#8403 )	2020-07-09 09:28:53 -04:00
Seth Hoenig	a9991e9ab9	e2e: add tests for connect native Adds 2 tests around Connect Native. Both make use of the example connect native services in https://github.com/hashicorp/nomad-connect-examples One of them runs without Consul ACLs enabled, the other with.	2020-07-01 15:54:28 -05:00
Tim Gross	23be116da0	csi: add -force flag to volume deregister (#8295 ) The `nomad volume deregister` command currently returns an error if the volume has any claims, but in cases where the claims can't be dropped because of plugin errors, providing a `-force` flag gives the operator an escape hatch. If the volume has no allocations or if they are all terminal, this flag deletes the volume from the state store, immediately and implicitly dropping all claims without further CSI RPCs. Note that this will not also unmount/detach the volume, which we'll make the responsibility of a separate `nomad volume detach` command.	2020-07-01 12:17:51 -04:00
Drew Bailey	327843acfa	base podman e2e test and provisioning updates (#8104 ) * initial setup for terraform to install podman task driver podman * Update e2e provisioning to support root podman Excludes setup for rootless podman. updates source ami to ubuntu 18.04 Installs podman and configures podman varlink base podman test ensure client status running revert terraform directory changes * back out random go-discover go mod change * include podman varlink docs * address comments	2020-06-03 14:06:58 -04:00
Seth Hoenig	889e7ddd0c	build: use hashicorp hclfmt We have been using fatih/hclfmt which is long abandoned. Instead, switch to HashiCorp's own hclfmt implementation. There are some trivial changes in behavior around whitespace.	2020-05-24 18:31:57 -05:00
Tim Gross	932710ad7d	e2e: upgrade CNI to 0.8.6 (#7956 )	2020-05-14 09:29:11 -04:00
Seth Hoenig	623c804046	e2e: upgrade consul in packer setup to 1.7.3 from 1.6.1 There have been a number of bug fixes and features particularly around Connect that will help us in Nomad's e2e tests. Upgrade Consul in our packer builder so e2e can make use of the new version.	2020-05-11 11:17:28 -06:00
Seth Hoenig	aae8a8504e	e2e: set an expose service check in connect e2e testcase Make sure exposed checks work in e2e by setting an expose check on the e2e connect test.	2020-05-07 14:40:03 -06:00
Tim Gross	139c65c436	e2e: csi test can purge target job (#7823 )	2020-05-01 13:25:50 -04:00
Tim Gross	4935b304a0	e2e: add helper to Makefile for local file deployments (#7822 )	2020-04-28 16:15:58 -04:00
Tim Gross	ab3086a1f4	e2e: testing reliability (#7701 ) * pin CSI plugin versions * ensure failing CSI tests clean up * allow NOMAD_SHA env var to override makefile	2020-04-13 10:25:24 -04:00
Mahmood Ali	c8eddb9f6b	fixup! e2e: add a convenient creation script	2020-04-09 11:04:26 -04:00
Mahmood Ali	8a4937d9ce	e2e: add a convenient creation script Add a convenience Makefile for creating e2e environment for manual debugging.	2020-04-09 10:54:30 -04:00
Lang Martin	c0dbcbef5f	e2e: csi: wait for volume write claims to be released before starting read jobs (#7641 )	2020-04-07 07:40:44 -04:00
Tim Gross	50f807060a	e2e: csi tests can only run on linux (#7635 )	2020-04-06 11:57:59 -04:00
Tim Gross	73dc2ad443	e2e/csi: add waiting for alloc stop	2020-04-06 10:15:55 -04:00
Tim Gross	d81797ea33	e2e: improve test reliability for CSI (#7616 ) This changeset: * adds eval status to the error messages emitted when we have placement failure in tests. The implementation here isn't quite perfect but it's a lot better than "condition not met". * enforces the ordering of teardown of the CSI test * doesn't pass the purge flag to one of the two CSI tests, so that we exercise both code paths.	2020-04-03 15:52:58 -04:00
Tim Gross	4c51687cbf	e2e: remove gometa from e2eutils (#7610 )	2020-04-03 10:22:22 -04:00
Tim Gross	bde13dfc0c	e2e: have TF write-out HCL for CSI volume registration (#7599 )	2020-04-02 12:16:43 -04:00
Seth Hoenig	fc6b02c817	e2e: minimize Consul ACL policies used in e2e tests Issue #7523 documents the Consul ACLs used in each Consul interface used by Nomad. Minimize the policies used in e2e tests so that we are setting a good example.	2020-03-30 12:53:40 -06:00
Tim Gross	cd1c6173f4	csi: e2e tests for EBS and EFS plugins (#7343 ) This changeset provides two basic e2e tests for CSI plugins targeting common AWS use cases. The EBS test launches the EBS plugin (controller + nodes) and registers an EBS volume as a Nomad CSI volume. We deploy a job that writes to the volume, stop that job, and reuse the volume for another job which should be able to read the data written by the first job. The EFS test launches the EFS plugin (nodes-only) and registers an EFS volume as a Nomad CSI volume. We deploy a job that writes to the volume, stop that job, and reuse the volume for another job which should be able to read the data written by the first job. The writer jobs mount the CSI volume at a location within the alloc dir.	2020-03-23 13:59:18 -04:00
Mahmood Ali	857ddf7aaf	e2e: use unique CSI token Use a unique per-cluster efs creation token, as https://www.terraform.io/docs/providers/aws/r/efs_file_system.html#creation_token. Using a static value prevents having multiple test clusters. [ci skip]	2020-03-15 21:55:26 -04:00
Tim Gross	79222c36bf	e2e: add EBS and EFS volumes for testing CSI (#7266 ) This changeset adds volumes but does not mount them to instances so that we can test the mounting ("staging") via CSI plugins. The CSI plugins themselves will be installed as Nomad jobs. In order to ensure we can always mount the EFS volume, this changeset pins the deployment of the cluster to a specific subnet. In future work we should spread the cluster out among several AZs and test that behavior explicitly.	2020-03-04 10:44:51 -05:00
Mahmood Ali	f5bd51ec30	e2e: avoid parsing Args in pkg init Golang 1.13 introduced a change in test flag parsing: > testing > ... > Testing flags are now registered in the new Init function, which is invoked by the generated main function for the test. As a result, testing flags are now only registered when running a test binary, and packages that call flag.Parse during package initialization may cause tests to fail. https://golang.org/doc/go1.13#testing Here, we ensure that e2e framework parsing occurs in TestMain, by only initializing Framework at Run invocation.	2020-03-02 14:13:54 -05:00
Michael Schurter	2ab672c155	test: explicitly pass vars vs enclosing them	2020-02-14 11:10:33 -08:00
Michael Schurter	aab1ad8c18	test: remove errgroup to take advantage of vet go vet would have prevented the bug fixed in 6362e32161295fa959ebe46b93cea0ea1a9bdd72 but our use of errgroup prevented that. Rip out errgroup to take advantage of vet, and remove download limiting now that we're downloading far fewer binaries overall.	2020-02-14 10:53:54 -08:00
Michael Schurter	fb3e228af6	test: sort vault tests by version	2020-02-14 10:33:17 -08:00
Michael Schurter	bc9e35aafb	test: capture url to fix flaky test	2020-02-14 10:32:58 -08:00
Michael Schurter	32ecac58b6	test: only test latest Z of each X.Y.Z release	2020-02-14 08:41:45 -08:00
Michael Schurter	8c332a3757	Merge pull request #7102 from hashicorp/test-limits Fix some race conditions and flaky tests	2020-02-13 10:19:11 -08:00
Michael Schurter	3170dfd452	test: simplify code	2020-02-07 15:50:53 -08:00
Tim Gross	0c6e164e8f	e2e: add --quiet flag to s3 copy to reduce log spam (#7085 )	2020-02-06 09:24:20 -05:00
Seth Hoenig	351d32cd81	Merge pull request #7071 from hashicorp/b-e2e-cacls-wait-longer e2e: wait 2m rather than 10s after disabling consul acls	2020-02-04 14:05:10 -06:00
Drew Bailey	7bee040e61	simplify job, better error	2020-02-04 13:59:39 -05:00
Drew Bailey	8b6de8f3d2	fix check	2020-02-04 12:16:20 -05:00
Drew Bailey	b10c7cc94e	rm unused field	2020-02-04 12:02:01 -05:00
Drew Bailey	a716d57ad7	clean up	2020-02-04 11:59:28 -05:00
Drew Bailey	75053a0d10	get test passing, new util func to wait for not pending	2020-02-04 11:56:37 -05:00
Drew Bailey	5117a22c30	add e2e test for system sched ineligible nodes	2020-02-04 11:56:33 -05:00
Seth Hoenig	f4a66ebd28	e2e: wait 2m rather than 10s after disabling consul acls Pretty sure Consul / Nomad clients are often not ready yet after the ConsulACLs test disables ACLs, by the time the next test starts running. Running locally things tend to work, but in TeamCity this seems to be a recurring problem. However, when running locally sometimes I do see that the "show status" step after disabling ACLs, some nodes are still initializing, suggesting we're right on the border of not waiting long enough nomad node status ID DC Name Class Drain Eligibility Status 0e4dfce2 dc1 EC2AMAZ-JB3NF9P <none> false eligible ready 6b90aa06 dc2 ip-172-31-16-225 <none> false eligible ready 7068558a dc2 ip-172-31-20-143 <none> false eligible ready e0ae3c5c dc1 ip-172-31-25-165 <none> false eligible ready 15b59ed6 dc1 ip-172-31-23-199 <none> false eligible initializing Going to try waiting a full 2 minutes after disabling ACLs, hopefully that will help things Just Work. In the future, we should probably be parsing the output of the status checks and actually confirming all nodes are ready. Even better, maybe that's something shipyard will have built-in.	2020-02-04 10:51:03 -06:00
Tim Gross	0b48baf0ba	e2e: rename linux runner to avoid implicit build tag (#7070 ) Go implicitly treats files ending with `_linux.go` as build tagged for Linux only. This broke the e2e provisioning framework on macOS once we tried importing it into the `e2e/consulacls` module.	2020-02-04 10:55:38 -05:00
Tim Gross	940110b2de	e2e: improve provisioning defaults and documentation (#7062 ) This changeset improves the ergonomics of running the Nomad e2e test provisioning process by defaulting to a blank `nomad_sha` in the Terraform configuration. By default, a user will now need to pass in one of the Nomad version flags. But they won't have to manually edit the `provisioning.json` file for the common case of deploying a released version of Nomad, and won't need to put dummy values for `nomad_sha`. Includes general documentation improvements.	2020-02-04 10:37:00 -05:00
Seth Hoenig	653c8fe9a5	e2e: turn no-ACLs connect tests back on Also cleanup more missed debugging things >.>	2020-02-03 20:46:36 -06:00
Mahmood Ali	2424870937	Merge pull request #7055 from hashicorp/r-dev-tweaks-20200203 Grab bag of dev tweaks	2020-02-03 14:25:06 -05:00
Mahmood Ali	7171488e81	run "make hclfmt"	2020-02-03 12:15:53 -05:00
Seth Hoenig	057179edea	e2e: remove leftover debug println statement	2020-02-03 11:15:38 -06:00
Seth Hoenig	9b20ca5b25	e2e: setup consul ACLs a little more correctly	2020-01-31 19:06:11 -06:00
Seth Hoenig	83c717a624	e2e: remove redundant extra API call for getting allocs	2020-01-31 19:06:07 -06:00
Seth Hoenig	b212654b92	e2e: agent token was only being set for server0	2020-01-31 19:06:03 -06:00
Seth Hoenig	f7a1e9cee3	e2e: use hclfmt on consul acls policy config files	2020-01-31 19:05:59 -06:00
Seth Hoenig	e9e0d2e3fc	e2e: uncomment test case that is not broken	2020-01-31 19:05:55 -06:00
Seth Hoenig	df633ee45f	e2e: do not use eventually when waiting for allocs This test is causing panics. Unlike the other similar tests, this one is using require.Eventually which is doing something bad, and this change replaces it with a for-loop like the other tests. Failure: === RUN TestE2E/Connect === RUN TestE2E/Connect/connect.ConnectE2ETest === RUN TestE2E/Connect/connect.ConnectE2ETest/TestConnectDemo === RUN TestE2E/Connect/connect.ConnectE2ETest/TestMultiServiceConnect === RUN TestE2E/Connect/connect.ConnectClientStateE2ETest panic: Fail in goroutine after TestE2E/Connect/connect.ConnectE2ETest has completed goroutine 38 [running]: testing.(*common).Fail(0xc000656500) /opt/google/go/src/testing/testing.go:565 +0x11e testing.(*common).Fail(0xc000656100) /opt/google/go/src/testing/testing.go:559 +0x96 testing.(*common).FailNow(0xc000656100) /opt/google/go/src/testing/testing.go:587 +0x2b testing.(*common).Fatalf(0xc000656100, 0x1512f90, 0x10, 0xc000675f88, 0x1, 0x1) /opt/google/go/src/testing/testing.go:672 +0x91 github.com/hashicorp/nomad/e2e/connect.(*ConnectE2ETest).TestMultiServiceConnect.func1(0x0) /home/shoenig/go/src/github.com/hashicorp/nomad/e2e/connect/multi_service.go:72 +0x296 github.com/hashicorp/nomad/vendor/github.com/stretchr/testify/assert.Eventually.func1(0xc0004962a0, 0xc0002338f0) /home/shoenig/go/src/github.com/hashicorp/nomad/vendor/github.com/stretchr/testify/assert/assertions.go:1494 +0x27 created by github.com/hashicorp/nomad/vendor/github.com/stretchr/testify/assert.Eventually /home/shoenig/go/src/github.com/hashicorp/nomad/vendor/github.com/stretchr/testify/assert/assertions.go:1493 +0x272 FAIL github.com/hashicorp/nomad/e2e 21.427s	2020-01-31 19:05:47 -06:00
Seth Hoenig	5e5fadbcdf	e2e: remove forgotten unused field from new struct	2020-01-31 19:05:41 -06:00
Seth Hoenig	fc498c2b96	e2e: e2e test for connect with consul acls Provide script for managing Consul ACLs on a TF provisioned cluster for e2e testing. Script can be used to 'enable' or 'disable' Consul ACLs, and automatically takes care of the bootstrapping process if necessary. The bootstrapping process takes a long time, so we may need to extend the overall e2e timeout (20 minutes seems fine). Introduces basic tests for Consul Connect with ACLs.	2020-01-31 19:05:36 -06:00
Seth Hoenig	93d347442f	e2e: add a -suite flag to e2e.Framework This change allows for providing the -suite=<Name> flag when running the e2e framework. If set, only the matching e2e/Framework.TestSuite.Component will be run, and all other suites will be skipped.	2020-01-29 14:57:43 -06:00
Drew Bailey	da4af9bef3	fix tests, update changelog	2020-01-29 13:55:39 -05:00
Tim Gross	7681f09ae4	e2e: packer builds should not be public (#6998 )	2020-01-27 16:28:25 -05:00
Michael Schurter	ed926a9d03	Merge pull request #6938 from hashicorp/e2e-vault test: download Vault binaries for e2e test	2020-01-27 10:26:48 -08:00
Tim Gross	457e3ad5c6	e2e: document e2e provisioning process (#6976 )	2020-01-22 16:55:17 -05:00
Tim Gross	29e1ed6b05	e2e: ensure group script check tests interpolation (#6972 ) Fixes a bug introduced in 0aa58b9 where we're writing a test file to a taskdir-interpolated location, which works when we `alloc exec` but not in the jobspec for a group script check. This changeset also makes the test safe to run multiple times by namespacing the file with the alloc ID, which has the added bonus of exercising our alloc interpolation code for group script checks.	2020-01-22 09:54:54 -05:00
Tim Gross	2edbdfc8be	e2e: update framework to allow deploying Nomad (#6969 ) The e2e framework instantiates clients for Nomad/Consul but the provisioning of the actual Nomad cluster is left to Terraform. The Terraform provisioning process uses `remote-exec` to deploy specific versions of Nomad so that we don't have to bake an AMI every time we want to test a new version. But Terraform treats the resulting instances as immutable, so we can't use the same tooling to update the version of Nomad in-place. This is a prerequisite for upgrade testing. This changeset extends the e2e framework to provide the option of deploying Nomad (and, in the future, Consul/Vault) with specific versions to running infrastructure. This initial implementation is focused on deploying to a single cluster via `ssh` (because that's our current need), but provides interfaces to hook the test run at the start of the run, the start of each suite, or the start of a given test case. Terraform work includes: * provides Terraform output that is written to JSON used by the framework to configure provisioning via `terraform output provisioning`. * provides Terraform output that can be used by test operators to configure their shell via `$(terraform output environment)` * drops `remote-exec` provisioning steps from Terraform * makes changes to the deployment scripts to ensure they can be run multiple times w/ different versions against the same host.	2020-01-22 08:48:52 -05:00
Tim Gross	d6aac915a7	e2e: use valid jobspec for group check test (#6967 ) Group service checks cannot interpolate task fields, because the task fields are not available at the time the script check hook is created for the group service. When f31482a was merged this e2e test began failing because we are now correctly matching the script check ID to the service ID, which revealed this jobspec was invalid.	2020-01-21 15:54:46 -05:00
Tim Gross	1e600d573d	e2e: improve reusability of provisioning scripts (#6942 ) This changeset is part of the work to improve our E2E provisioning process to allow our upgrade tests: * Move more of the setup into the AMI image creation so it's a little more obvious to provisioning config authors which bits are essential to deploying a specific version of Nomad. * Make the service file update do a systemd daemon-reload so that we can update an already-running cluster with the same script we use to deploy it initially.	2020-01-16 09:29:36 -05:00
Michael Schurter	ffbfb60f40	test: restore e2e-test target and use -integration	2020-01-14 13:47:51 -08:00
Michael Schurter	da4645e9a4	test: download Vault binaries for e2e test Modernize Vault integration/e2e test a bit: - Download from releases.hashicorp.com instead of using a hardcoded list - Remove old unused make target e2e-test - Use NOMAD_E2E env var instead of -integration flag - Add a README On my machine with ~250 Mbps internet it takes ~400s to download all Vault binaries.	2020-01-14 11:02:02 -08:00
Nick Ethier	1f28633954	Merge pull request #6816 from hashicorp/b-multiple-envoy connect: configure envoy to support multiple sidecars in the same alloc	2020-01-09 23:25:39 -05:00
Tim Gross	b5bcfb533b	upgrade CNI plugins to 0.8.4 (#6921 ) When multiple Connect-enabled task groups start on the same client node, a race condition in the CNI plugins for creating iptables chains causes one of the tasks to fail. We upstreamed a patch to CNI plugins to make iptables chain creation idempotent. This changeset updates end-to-end testing, development tooling, and documentation to use 0.8.4 which includes our patch.	2020-01-09 10:57:07 -05:00
Tim Gross	c11cc60674	commit a hclfmt to eliminate diffs after 'make dev'	2020-01-09 08:18:51 -05:00
Nick Ethier	7b931522f0	e2e: add test for multiple service sidecars in the same alloc	2020-01-06 12:48:35 -05:00
Tim Gross	4ba5691656	e2e: give metrics longer to settle (#6884 ) Increase the shortened timeout after the first loop so that metrics that take longer to come in aren't failing the test unnecessarily. Move the check for empty alloc metrics into the loop so that if the first values we get are empty we don't fail the test too early.	2019-12-20 10:39:35 -05:00
Tim Gross	9b2b4da3a4	e2e: run client/allocs metrics nightly tests vs Windows (#6850 ) Adds Windows targets to the client/allocs metrics tests. Removes the `allocstats` test, which covers less than these tests and is now redundant. Adds a firewall rule to our Windows instances so that the prometheus server can scrape the Nomad HTTP API for metrics.	2019-12-16 08:34:17 -05:00
Tim Gross	e439e927ed	e2e: run client/allocs metrics tests nightly (#6842 ) Refactor the metrics end-to-end tests so they can be run with our e2e test framework. Runs fabio/prometheus and a collection of jobs that will cause metrics to be measured. We then query Prometheus to ensure we're publishing those allocation metrics and some metrics from the clients as well. Includes adding a placeholder for running the same tests on Windows.	2019-12-12 12:45:16 -05:00
Seth Hoenig	d81a091ccd	Merge pull request #6752 from hashicorp/docs-vault-token_period docs: vault integration docs should reference new token_period field	2019-12-02 16:21:17 -05:00
Seth Hoenig	953e40c8ed	docs: vault integration docs should reference new token_explicit_max_ttl field	2019-12-02 14:22:47 -06:00
Tim Gross	88cb95261b	e2e: add allocstats test for Windows (#6775 ) Extends the BasicAllocStats test to include a test for Windows clients, exercising stats via a powershell `raw_exec` job. Adds `ListLinuxClientNodes` and `ListWindowsClientNodes` utils so that we can scope tests to run only when Linux or Windows clients are available. This prevents waiting on timeouts when running a subset of the tests against a development cluster (vs our nightly test cluster).	2019-11-26 08:05:42 -05:00
Mahmood Ali	e626a145c6	Merge pull request #6713 from alrs/fix-e2e-cli-close-before-error e2e/cli/command: close after error handling	2019-11-25 14:03:25 -05:00
Lars Lehtonen	c9383ca17d	e2e/cli/command: Wait() after execution	2019-11-25 10:56:40 -08:00
Tim Gross	c9d92f845f	e2e: add a Windows client to test runner (#6735 ) * Adds a constraint to prevent tests from landing on Windows * Improve Terraform output for mixed windows/linux clients * Makes some Windows client config fixes from 0.10.2 testing	2019-11-25 13:31:00 -05:00
Tim Gross	e012c2b5bf	Infrastructure for Windows e2e testing (#6584 ) Includes: * baseline Windows AMI * initial pass at Terraform configurations * OpenSSH for Windows Using OpenSSH is a lot nicer for Nomad developers than winrm would be, plus it lets us avoid passing around the Windows password in the clear. Note that now we're copying up all the provisioning scripts and configs as a zipped bundle because TF's file provisioner dies in the middle of pushing up multiple files (whereas `scp -r` works fine). We're also running all the provisioning scripts inside the userdata by polling for the zip file to show up (gross!). This is because `remote-exec` provisioners are failing on Windows with the same symptoms as: https://github.com/hashicorp/terraform/issues/17728 If we can't fix this, it'll prevent us from having multiple Windows clients running until TF supports count interpolation in the `template_file`, which is planned for a later 0.12 release.	2019-11-19 11:06:10 -05:00
Tim Gross	1210261fe2	hclfmt nomad jobspecs (#6724 )	2019-11-19 10:36:41 -05:00
Drew Bailey	2befab6900	Merge pull request #6573 from hashicorp/update-cci-consul updates default consul version to 1.6.1	2019-11-07 11:01:22 -05:00
Drew Bailey	1c2af019c6	update vagrant & packer consul versions	2019-11-07 10:13:14 -05:00
Drew Bailey	786989dbe3	New monitor pkg for shared monitor functionality Adds new package that can be used by client and server RPC endpoints to facilitate monitoring based off of a logger * clean up old code * small comment about write * rm old comment about minsize * rename to Monitor * Removes connection logic from monitor command * Keep connection logic in endpoints, use a channel to send results from monitoring * use new multisink logger and interfaces * small test for dropped messages * update go-hclogger and update sink/intercept logger interfaces	2019-11-05 09:51:49 -05:00
Tim Gross	3e9ae481ce	e2e: refactor Consul configurations (#6559 ) Ensure that we're reusing the base configuration between client and servers without the possibility of drift. Reduce the amount of `sed` mangling of the configuration file, and make recommended changes from `shellcheck` for this section of the provisioning script. Fixes some rebase errors on the Nomad config as well.	2019-10-28 09:27:40 -04:00
Tim Gross	ba7e7413ef	e2e: refactor Nomad configuration (#6560 ) Share base configuration for telemetry and consul. Have the server configurations respect the `var.server_count` config. Make changes recommended by `shellcheck` in the provisioning scripts for this section. Switch to OS/arch-tagged release bundles on S3 for compatibility with adding Windows builds in the near future.	2019-10-28 08:21:02 -04:00
Tim Gross	8be403f47b	e2e: refactor Vault configuration (#6561 ) Match the configuration directory layout we're using for Consul and other services. Make recommended changes from `shellcheck` for this section of the provisioning script.	2019-10-25 15:29:01 -04:00
Tim Gross	87b3abddd3	e2e: use sockaddr for IP address configuration (#6548 ) Update the Consul and Vault configs to take advantage of their included `go-sockaddr` library for getting the IP addresses we need in a portable way. This particularly avoids problems with "predictable" interface names provided by systemd. Also adds the `sockaddr` binary to the Packer build so we can use it in our provisioning scripts.	2019-10-25 14:08:38 -04:00
Tim Gross	efbd680d4e	e2e: split Packer build scripts from TF provisioning (#6542 ) Make a clear split between Packer and Terraform provisioning steps: the scripts in the `packer/linux` directory are run when we build the AMI whereas the scripts in `shared` are run at Terraform provisioning time. Merging all runtime provisioning scripts into a single script for each of server/client solves the following: * Userdata scripts can't take arguments, they can only be templated and that means we have to do TF escaping in bash/powershell scripts. * TF provisioning scripts race with userdata scripts.	2019-10-25 08:08:24 -04:00
Tim Gross	c648c4f998	e2e: upgrade terraform to 0.12.x (#6489 )	2019-10-14 11:27:08 -04:00
Tim Gross	15e912ddd6	e2e: move remote-exec inline to script (#6488 ) A failing script in a `remote-exec` provisioner's `inline` stanza won't fail the provisioning step. This lets us continue on to execute tests against potentially broken deployments, rather than letting us know the provisioning itself failed.	2019-10-14 10:23:41 -04:00
Danielle Lancashire	199d24d6bf	chore: initial hclfmt	2019-10-11 14:00:05 +02:00
Lang Martin	0648402150	Merge pull request #6373 from hashicorp/b-raft-proto-upgrade raft protocol defaults to version 2	2019-09-26 14:33:09 -04:00
Tim Gross	d965a15490	driver/networking: don't recreate existing network namespaces	2019-09-25 14:58:17 -04:00
Tim Gross	e86a476bbb	failing test for #6310	2019-09-25 14:58:17 -04:00
Lang Martin	6e0ec6302b	script e2e/upgrades: cluster upgrade scripts	2019-09-24 14:35:45 -04:00
Danielle	940bbcc639	Merge pull request #6342 from hashicorp/f-host-volume-e2e Add Host Volumes E2E test	2019-09-18 12:59:32 -07:00
Tim Gross	adde9acf57	e2e: test infra for client node restarts (#6313 ) Add a test helper that restarts a specific client node running under systemd using a `raw_exec` job.	2019-09-18 10:10:14 -04:00
Tim Gross	7061dcef4b	e2e: move consul status check helpers to e2eutil (#6314 )	2019-09-18 08:18:19 -04:00
Danielle Lancashire	05d172ef2b	e2e: init host volumes test	2019-09-18 00:34:48 +02:00
Danielle Lancashire	c50d7f2727	e2e: Add Host Volume Configuration	2019-09-17 20:06:50 +02:00
Tim Gross	55ee7a220b	e2e: fixes for race conditions in testing (#6300 ) - In script checks, ensure we're running `Exec` against the new running allocation and not the earlier stopped one. - In script checks, allow `Exec` calls to error due to lack of pty when we use the exec to kill the task. - In `utils.go/RegisterAllocs`, force query for allocations to wait on wait index returned by registration call.	2019-09-10 13:45:16 -04:00
Tim Gross	3469c50275	e2e: tag instances with origin (#6293 ) When multiple developers are working on e2e testing, it helps to be able to identify which infrastructure belongs to which Nomad SHA and which developer. This adds tags to the EC2 instances.	2019-09-06 15:49:18 -04:00
Tim Gross	ede48ae19c	script checks: use cat instead of ls for exit code agreement	2019-09-06 11:17:00 -04:00
Tim Gross	c9c612cc70	e2e: script check testing	2019-09-06 10:18:55 -04:00
Michael Schurter	228899c32f	e2e: test demo job for connect	2019-09-04 12:40:08 -07:00
Tim Gross	7ee3333a2d	e2e: filter default AMI by OS Add an OS tag to Packer builds of our e2e test AMIs and then filter by this in Terraform.	2019-08-30 16:51:13 -04:00
Danielle Lancashire	d454dab39b	chore: Format hcl configurations	2019-07-20 16:55:07 +02:00
Michael Schurter	a3fcb8fcca	e2e: debug log level for everyone!	2019-07-18 06:55:27 -07:00
Michael Schurter	ea68c930fe	e2e: enable_debug=true for all agents Enables the pprof http endpoint for debugging.	2019-07-17 15:20:45 -07:00
Preetha	0a2e21353f	Merge pull request #5912 from hashicorp/f-systemd-nofile systemd: set a high but non-infinite fd limit	2019-07-11 12:31:12 -05:00
Preetha Appan	53397722f1	add module version constraint to e2e/terraform	2019-07-05 09:18:38 -05:00
Michael Schurter	803aa62b7a	systemd: set a high but non-infinite fd limit	2019-07-02 09:13:24 -07:00
Lang Martin	d15d09bcc1	e2e update shell scripts argument quoting	2019-06-04 15:52:32 -04:00
Lang Martin	071dccfcce	e2e/deployment DeploymentsForJob fail instead of nil, error passing	2019-06-04 14:31:42 -04:00
Lang Martin	fa09e5d5f4	e2e/deployment fail if the second deployment times out	2019-06-04 14:08:30 -04:00
Lang Martin	e61597a098	e2e bin/update and bin/run, README	2019-06-04 13:42:07 -04:00
Lang Martin	1635fa3c00	e2e/deployment find the second deployment, use its status	2019-06-04 13:41:52 -04:00
Lang Martin	e027b9001b	Update e2e/deployment/deployment.go Co-Authored-By: Mahmood Ali <mahmood@notnoop.com>	2019-05-22 12:34:57 -04:00
Lang Martin	7929ef28c7	e2e/deployment comment the job files for clarity	2019-05-22 12:34:57 -04:00
Lang Martin	fe69f89476	e2e add deployment to the list of e2e tests, minor fixes	2019-05-22 12:34:57 -04:00
Lang Martin	2a11d66258	e2e readme minor changes to command + env val templates and order	2019-05-22 12:34:57 -04:00
Lang Martin	97fd114535	e2e utils remove ineffectual assignment of allocs	2019-05-22 12:34:57 -04:00
Lang Martin	01276455bd	e2e README typo	2019-05-22 12:34:57 -04:00
Lang Martin	824d1366dd	e2e utils error format arg match	2019-05-22 12:32:08 -04:00
Lang Martin	09a6dc2054	new e2e deployment test	2019-05-22 12:32:08 -04:00
Lang Martin	d73606e54e	e2e util split new alloc and await placement, new WaitForDeployment	2019-05-22 12:32:08 -04:00
Preetha	2dcd4291f8	Merge pull request #5702 from hashicorp/f-filter-by-create-index Filter deployments by create index	2019-05-15 21:50:41 -05:00
Michael Schurter	2b7f398726	e2e: fix nomad service for systemd<230	2019-05-14 10:53:26 -07:00
Preetha Appan	07690d6f9e	Add flag similar to --all for allocs to be able to filter deployments by latest	2019-05-13 18:33:41 -05:00
Mahmood Ali	919827f2df	Merge pull request #5632 from hashicorp/f-nomad-exec-parts-01-base nomad exec part 1: plumbing and docker driver	2019-05-09 18:09:27 -04:00
Mahmood Ali	2a555a7e74	add e2e tests for nomad exec	2019-05-09 16:49:08 -04:00
Michael Schurter	a1c3ce36bc	Merge pull request #5647 from hashicorp/e2e-tf E2E Test Terraform/Packer Improvements	2019-05-06 15:42:52 -07:00

476 Commits