open-nomad

Author	SHA1	Message	Date
Tim Gross	31e72e93ff	E2E: fix flaky event stream test (#12548 ) This changeset fixes two sources of flakiness in the event stream test. First, the stream request gets the event closest to the index, not the exact match. Although events are written before raft entries they're written asynchronously, so it's possible to race and get a raft index from this query higher than the current head of the event buffer. Ensure the job is running before we try to get the index, so that we've given the event enough time to land in the buffer. Second, the assertion that the found index is greater than the start index is only true if the `PlanResult` event manages to land before we do the second registration. Although it should now with the first fix above, it's not a correct assertion for what we're testing.	2022-04-12 08:35:39 -04:00
Tim Gross	77ab8d92f1	E2E: oversubscription assertion needs to wait for stats (#12540 ) The oversubscription test expects an output that requires the client has polled the task for stats at least once. Wait long enough to ensure that we've polled the stats before failing the test.	2022-04-11 11:40:51 -04:00
Tim Gross	c9c3cbd878	E2E: test for nodes disconnected by netsplit (#12407 )	2022-04-11 11:34:27 -04:00
James Rasell	bc800a18d1	e2e: add initial service discovery tests. (#12512 ) Some tests may choose to deregister jobs to check Nomad cleanup logic; however, it is still possible for the test to fail and exit before this is hit. This therefore adds a cancellable cleanup func which can be deferred, using context to control whether it gets run or not.	2022-04-11 11:12:24 +02:00
James Rasell	dbf28a06c1	e2e: fix eventual consistency failure within consultemplate suite. (#12494 )	2022-04-07 17:03:10 +02:00
James Rasell	431c153cd9	client: add Nomad template service functionality to runner. (#12458 ) This change modifies the template task runner to utilise the new consul-template which includes Nomad service lookup template funcs. In order to provide security and auth to consul-template, we use a custom HTTP dialer which is passed to consul-template when setting up the runner. This method follows Vault implementation. Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2022-04-06 19:17:05 +02:00
Seth Hoenig	3ce4f52740	Merge pull request #12446 from shoenig/no-pkg-err cleanup: purge github.com/pkg/errors	2022-04-04 09:22:44 -05:00
Tim Gross	806a82dd0c	E2E: ensure that CSI EBS tests are isolated from each other (#12443 ) Tear down the volume-consuming job between subtests, rather than after all the tests are complete. For good measure, use a different ID for the volume-consuming job as well.	2022-04-04 09:44:55 -04:00
Seth Hoenig	9670adb6c6	cleanup: purge github.com/pkg/errors	2022-04-01 19:24:02 -05:00
Tim Gross	3030f954a2	E2E disconnected clients test refactor (#12402 ) * Wait longer for node to go down in disconnected clients test. The existing helper only waits 10s, but there's a jitter on heartbeats that we need to account for. Wait for 30s for node to go down to give us plenty of room * Port disconnected clients to stdlib-style test	2022-03-30 09:12:44 -04:00
Tim Gross	19703e3316	E2E: test exercising node drain behavior for CSI volumes (#12384 )	2022-03-29 11:19:23 -04:00
Tim Gross	5c7f2bad0b	E2E: namespace HCP vault and consul policies to avoid collisions (#12386 ) Concurrent E2E runs can collide when provisioning policies on HCP Consul and HCP Vault. Namespace these by the test run name, as we do for most everything else.	2022-03-25 16:05:59 -04:00
Tim Gross	3c15236fd5	E2E: move example test to use golangs stdlib test runner (#12383 ) Our E2E "framework" has a bunch of features around test discovery and standing up infra that were never completed or fully used, and we ended up building out a large test suite that ignored all that in lieu of Terraform-provided infrastructure for the last couple years. This changeset is a proposal (and demonstration) for gradually migrating our E2E tests off the framework code so that developers can write fairly ordinary golang stdlib testing tests.	2022-03-25 14:44:16 -04:00
Tim Gross	67b87e46f1	e2e: test for allocations replacement on disconnected clients (#12375 ) This test exercises the behavior of clients that become disconnected and have their allocations replaced. Future test cases will exercise the `max_client_disconnect` field on the job spec.	2022-03-25 12:26:43 -04:00
Tim Gross	e687a21da9	CSI: set plugin `CSI_ENDPOINT` env var only if unset by user (#12257 ) * Use unix:// prefix for CSI_ENDPOINT variable by default * Some plugins have strict validation over the format of the `CSI_ENDPOINT` variable, and unfortunately not all plugins agree. Allow the user to override the `CSI_ENDPOINT` to workaround those cases. * Update all demos and tests with CSI_ENDPOINT	2022-03-21 11:48:47 -04:00
Tim Gross	bd403f2f88	E2E: ensure `ConnectACLsE2ETest` has clean state before starting (#12334 ) The `ConnectACLsE2ETest` checks that the SI tokens have been properly cleaned up between tests, but following the change to use HCP the previous `Connect` test suite will often have SI tokens that haven't been cleaned up by the time this test suite runs. Wait for the SI tokens to be cleaned up at the start of the test to ensure we have a clean state.	2022-03-21 11:05:02 -04:00
Tim Gross	9f05d62338	E2E with HCP Consul/Vault (#12267 ) Use HCP Consul and HCP Vault for the Consul and Vault clusters used in E2E testing. This has the following benefits: * Without the need to support mTLS bootstrapping for Consul and Vault, we can simplify the mTLS configuration by leaning on Terraform instead of janky bash shell scripting. * Vault bootstrapping is no longer required, so we can eliminate even more janky shell scripting * Our E2E exercises HCP, which is important to us as an organization * With the reduction in configurability, we can simplify the Terraform configuration and drop the complicated `provision.sh`/`provision.ps1` scripts we were using previously. We can template Nomad configuration files and upload them with the `file` provisioner. * Packer builds for Linux and Windows become much simpler. tl;dr way less janky shell scripting!	2022-03-18 09:27:28 -04:00
Seth Hoenig	373d8f7241	ci: missing import for nomad09upgrade	2022-03-17 08:49:15 -05:00
Seth Hoenig	f87eb666c7	e2e: have e2e use ci.Parallel This is a followup to having tests run in serial in CI. The e2e package isn't in CI, but let's use the helper anyway so we can set up semgrep rules covering the entire repository.	2022-03-17 08:37:34 -05:00
Tim Gross	b94837a2b8	csi: add pagination args to `volume snapshot list` (#12193 ) The snapshot list API supports pagination as part of the CSI specification, but we didn't have it plumbed through to the command line.	2022-03-07 12:19:28 -05:00
Tim Gross	09a7612150	csi: volume snapshot list plugin option is required (#12197 ) The RPC for listing volume snapshots requires a plugin ID. Update the `volume snapshot list` command to find the specific plugin from the provided prefix.	2022-03-07 09:58:29 -05:00
Tim Gross	a07386c507	e2e: use context for executing external commands (#12185 ) If any E2E test hangs, it'll eventually time out and panic, causing all the remaining tests to fail. External commands should use a short context whenever possible so we can fail the test quickly and move on to the next test.	2022-03-04 08:55:36 -05:00
Tim Gross	5f30279cd2	e2e: `StopJob` should tolerate progress deadline expired (#12179 ) The `TestRescheduleProgressDeadlineFail` E2E test failed during test cleanup because the error message "progress deadline expired" that it emits when we stop the job does not match the one expected from monitoring the `job stop` command. Update the `StopJob` helper to tolerate this use case as well.	2022-03-04 08:55:22 -05:00
Tim Gross	4c4895e19c	e2e: configure prometheus for mTLS for `Metrics` suite (#12181 ) The `Metrics` suite uses prometheus to scrape Nomad metrics so that we're testing the full user experience of extracting metrics from Nomad. With the addition of mTLS, we need to make sure prometheus also has mTLS configuration because the metrics endpoint is protected. Update the Nomad client configuration and prometheus job to bind-mount the client's certs into the task so that the job can use these certs to scrape the server. This is a temporary solution that gets the job passing; we should give the job its own certificates (issued by Vault?) when we've done some of the infrastructure rework we'd like.	2022-03-04 08:55:06 -05:00
Tim Gross	b8b08fb32d	e2e: use UUID for CSI idempotency token (#12183 ) The AWS EBS plugin appears to use the name field of the volume as an idempotency token that persists across the entire AWS account, not just the plugin lifespan. Also fix the regex for the volume ID, which was originally taken from the job ID regex but isn't actually the same. This hasn't failed tests for us because we've always passed in the same volume ID.	2022-03-03 17:00:00 -05:00
Tim Gross	1502af3523	e2e: use `operator api` for Networking suite validation (#12180 ) With mTLS enabled, using `curl` in a bash script for validation involves having to configure arguments to `curl` based on whether or not the test infrastructure is using mTLS, whether ACLs are enabled, etc. Use the new `operator api` command instead to pick up the client configuration from the test environment automatically.	2022-03-03 15:17:29 -05:00
Tim Gross	f2a4ad0949	CSI: implement support for topology (#12129 )	2022-03-01 10:15:46 -05:00
James Rasell	adc3c44e29	e2e: moved missed volume test stop command to util helper.	2022-02-02 08:42:58 +01:00
James Rasell	0a50d9fd2a	e2e: account for new job stop CLI exit behaviour. PR #11550 changed the job stop exit behaviour when monitoring the deployment. When stopping a job, the deployment becomes cancelled and therefore the CLI now exits with status code 1 as it sees this as an error. This change adds a new utility e2e function that accounts for this behaviour.	2022-02-01 14:16:37 +01:00
Luiz Aoqui	3c8381bf85	e2e: enable Consul HTTPS port and always restart Nomad systemd unit	2022-01-18 16:56:26 -05:00
James Rasell	45f4689f9c	chore: fixup inconsistent method receiver names. (#11704 )	2021-12-20 11:44:21 +01:00
Tim Gross	ae04e540e6	hclfmt on some config files (#11611 )	2021-12-02 15:25:46 -05:00
Derek Strickland	8a5aa0cd8a	Fix Vault E2E TLS config (#11483 ) * Update e2e/terraform configuration for Vault and default to mtls=true	2021-12-02 12:20:09 -05:00
James Rasell	751c8217d1	core: allow setting and propagation of eval priority on job de/registration (#11532 ) This change modifies the Nomad job register and deregister RPCs to accept an updated option set which includes eval priority. This param is optional and overrides the use of the job priority to set the eval priority. In order to ensure all evaluations as a result of the request use the same eval priority, the priority is shared to the allocReconciler and deploymentWatcher. This creates a new distinction between eval priority and job priority. The Nomad agent HTTP API has been modified to allow setting the eval priority on job update and delete. To keep consistency with the current v1 API, job update accepts this as a payload param; job delete accepts this as a query param. Any user supplied value is validated within the agent HTTP handler removing the need to pass invalid requests to the server. The register and deregister opts functions now allow for setting the eval priority on requests. The change includes a small change to the DeregisterOpts function which handles nil opts. This brings the function in line with the RegisterOpts.	2021-11-23 09:23:31 +01:00
Luiz Aoqui	5d204c8ced	Revert "Return SchedulerConfig instead of SchedulerConfigResponse struct (#10799 )" (#11433 )	2021-11-02 17:42:52 -04:00
Charlie Voiselle	cb8e52b5df	Return SchedulerConfig instead of SchedulerConfigResponse struct (#10799 )	2021-10-13 21:23:13 -04:00
Mahmood Ali	4d90afb425	gofmt all the files mostly to handle build directives in 1.17.	2021-10-01 10:14:28 -04:00
James Rasell	645741cd94	Merge pull request #11194 from hashicorp/b-fix-e2e-acl-tls-provision e2e: fix provisioning when ACLs and TLS enabled.	2021-09-17 08:11:10 +02:00
James Rasell	30273d9256	e2e: fix provisioning when ACLs and TLS enabled; no nightly TLS.	2021-09-16 17:15:41 +02:00
James Rasell	0e926ef3fd	allow configuration of Docker hostnames in bridge mode (#11173 ) Add a new hostname string parameter to the network block which allows operators to specify the hostname of the network namespace. Changing this causes a destructive update to the allocation and it is omitted if empty from API responses. This parameter also supports interpolation. In order to have a hostname passed as a configuration param when creating an allocation network, the CreateNetwork func of the DriverNetworkManager interface needs to be updated. In order to minimize the disruption of future changes, rather than add another string func arg, the function now accepts a request struct along with the allocID param. The struct has the hostname as a field. The in-tree implementations of DriverNetworkManager.CreateNetwork have been modified to account for the function signature change. In updating for the change, the enhancement of adding hostnames to network namespaces has also been added to the Docker driver, whilst the default Linux manager does not currently implement it.	2021-09-16 08:13:09 +02:00
Luiz Aoqui	f30c024a40	e2e: use absolute path for mTLS env vars (#11126 )	2021-09-03 12:59:21 -04:00
James Rasell	6bd2acd5b3	Merge pull request #11098 from hashicorp/b-fixup-all-incorrect-docstrings chore: fix incorrect docstring formatting.	2021-08-31 09:46:18 +02:00
Mahmood Ali	fec0adbb0e	Support mTLS clusters for e2e testing (#11092 ) This allows us to spin up e2e clusters with mTLS configured for all HashiCorp services, i.e. Nomad, Consul, and Vault. Used it for testing #11089 . mTLS is disabled by default. I have not updated Windows provisioning scripts yet - Windows also lacks ACL support from before. I intend to follow up for them in another round.	2021-08-30 10:18:16 -04:00
James Rasell	b6813f1221	chore: fix incorrect docstring formatting.	2021-08-30 11:08:12 +02:00
James Rasell	73ab63cf68	test: update e2e and dev scripts to use cni plugins v1.0.0	2021-08-27 11:14:47 +02:00
Mahmood Ali	97966c7a71	e2e: Run system jobs on all datacenters (#11060 ) Target all e2e datacenters for system and sysbatch e2e tests. They require that the system jobs run on all linux clients. However, the jobs currently only target `dc1` datacenter, but the nightly e2e cluster has 4 clients spread in `dc1` and `dc2` datacenters, causing the tests to fail. I missed this problem in e2e dev cluster because it only used a single dc1 datacenter.	2021-08-17 11:01:47 -04:00
Mahmood Ali	28bc234e84	e2e: fix tests Use basic sleeps in busybox images. busybox are very light, and ping has permissions complications, and it may fail for network related issues.	2021-08-03 11:38:35 -04:00
Seth Hoenig	3371214431	core: implement system batch scheduler This PR implements a new "System Batch" scheduler type. Jobs can make use of this new scheduler by setting their type to 'sysbatch'. Like the name implies, sysbatch can be thought of as a hybrid between system and batch jobs - it is for running short lived jobs intended to run on every compatible node in the cluster. As with batch jobs, sysbatch jobs can also be periodic and/or parameterized dispatch jobs. A sysbatch job is considered complete when it has been run on all compatible nodes until reaching a terminal state (success or failed on retries). Feasibility and preemption are governed the same as with system jobs. In this PR, the update stanza is not yet supported. The update stanza is still limited in functionality for the underlying system scheduler, and is not useful yet for sysbatch jobs. Further work in #4740 will improve support for the update stanza and deployments. Closes #2527	2021-08-03 10:30:47 -04:00
Mahmood Ali	70f541287b	e2e: wait for allocs and deployments (#10967 ) As we moved to using `-detach` for registering jobs, we should wait until allocs and deployments are created before asserting their properties. Fixing `TestNodeDrainIgnoreSystem` and `TestRescheduleProgressDeadlineFail` tests as they seem particularly flaky, failing 9 and 7 times (respectively) in the last two weeks.	2021-07-29 10:52:04 -04:00
Mahmood Ali	a9bd176742	e2e: use -detach mode when registering jobs with cli (#10877 ) Pick up 15d39f0dee but for RegisterFromJobspec: > This PR changes the e2e helper thingy to set -detach option > when registering a job with the CLI instead of the API. This is > necessary for jobs which never become healthy, as the deployment > never finishes for failing jobs and the command never returns, > causing the test to timeout after 10 minutes. This case occurs in TestVaultSecrets	2021-07-09 09:25:44 -04:00
Seth Hoenig	80f4340b77	e2e: use -detach mode when registering jobs with cli This PR changes the e2e helper thingy to set -detach option when registering a job with the CLI instead of the API. This is necessary for jobs which never become healthy, as the deployment never finishes for failing jobs and the command never returns, causing the test to timeout after 10 minutes.	2021-06-18 12:18:40 -05:00
James Rasell	939b23936a	Merge pull request #10744 from hashicorp/b-remove-duplicate-imports chore: remove duplicate import statements	2021-06-11 16:42:34 +02:00
James Rasell	2898e5d379	e2e: remove duplicate import statements.	2021-06-11 09:37:23 +02:00
Michael Schurter	319650d481	e2e: use api.ipify.org ipv4.icanhazip.com returns ipv6 addresses	2021-06-07 15:12:42 -07:00
Mahmood Ali	5258ae480b	remove unused Spark security group rules	2021-06-04 11:49:43 -04:00
Mahmood Ali	b852dc5eb8	e2e: pass nomad_url variable	2021-06-04 10:32:51 -04:00
Mahmood Ali	71936e1b27	e2e: NOMAD_VERSION is not set when installing url	2021-06-04 10:31:37 -04:00
Mahmood Ali	d0768bb999	restrict ingress ip	2021-06-04 10:31:35 -04:00
Luiz Aoqui	139c5e8df9	e2e: fix terraform output environment command instruction (#10674 )	2021-06-01 10:10:12 -04:00
Mahmood Ali	d8de4e62bb	Merge pull request #10657 from hashicorp/b-alloc-exec-closing Handle `nomad exec` termination events in order	2021-05-25 14:50:58 -04:00
Mahmood Ali	0853d48927	e2e: Spin clusters with custom url binaries (#10656 ) Ease spinning up a cluster, where binaries are fetched from arbitrary urls. These could be CircleCI `build-binaries` job artifacts, or presigned S3 urls. Co-authored-by: Tim Gross <tgross@hashicorp.com>	2021-05-25 13:47:39 -04:00
Mahmood Ali	3b7c5ff46e	e2e: stop suppressing unexpected EOF errors	2021-05-24 13:35:08 -04:00
Tim Gross	709b92c5a8	e2e: update TF lockfile	2021-05-18 09:35:57 -04:00
Tim Gross	d4465f01ac	E2E: remove references to nomad_sha	2021-05-10 16:42:39 -04:00
Mahmood Ali	a33ec72dd7	e2e: enable memory oversubscription (#10557 ) Enable memory oversubscription for the oversubscription tests.	2021-05-10 14:33:47 -04:00
Michael Schurter	547a718ef6	Merge pull request #10248 from hashicorp/f-remotetask-2021 core: propagate remote task handles	2021-04-30 08:57:26 -07:00
Michael Schurter	982c65c0c7	comment out unused consts to make linter happy	2021-04-30 08:31:31 -07:00
Seth Hoenig	d54a606819	Merge pull request #10439 from hashicorp/pick-ent-acls-changes e2e: add e2e tests for consul namespaces on ent with acls	2021-04-28 08:30:08 -06:00
Tim Gross	79f81d617e	licensing: remove raft storage and sync This changeset is the OSS portion of the work to remove the raft storage and sync for Nomad Enterprise.	2021-04-28 10:28:23 -04:00
Michael Schurter	0eb5d5136f	e2e: use public_ip in packer	2021-04-27 15:07:03 -07:00
Michael Schurter	e62795798d	core: propagate remote task handles Add a new driver capability: RemoteTasks. When a task is run by a driver with RemoteTasks set, its TaskHandle will be propagated to the server in its allocation's TaskState. If the task is replaced due to a down node or draining, its TaskHandle will be propagated to its replacement allocation. This allows tasks to be scheduled in remote systems whose lifecycles are disconnected from the Nomad node's lifecycle. See https://github.com/hashicorp/nomad-driver-ecs for an example ECS remote task driver.	2021-04-27 15:07:03 -07:00
Seth Hoenig	09cd01a5f3	e2e: add e2e tests for consul namespaces on ent with acls This PR adds e2e tests for Consul Namespaces for Nomad Enterprise with Consul ACLs enabled. Needed to add support for Consul ACL tokens with `namespace` and `namespace_prefix` blocks, which Nomad parses and validates before tossing the token. These bits will need to be picked back to OSS.	2021-04-27 14:45:54 -06:00
Seth Hoenig	f258fc8270	Merge pull request #10401 from hashicorp/cp-cns-ent-test-fixes cherry-pick fixes from cns ent tests	2021-04-20 08:45:15 -06:00
Drew Bailey	d42f204a89	remove second deploy that did not have anything to do with the test (#10400 )	2021-04-20 08:44:44 -04:00
Seth Hoenig	509490e5d2	e2e: consul namespace tests from nomad ent (cherry-picked from ent without _ent things) This is part 2/4 of e2e tests for Consul Namespaces. Took a first pass at what the parameterized tests can look like, but only on the ENT side for this PR. Will continue to refactor in the next PRs. Also fixes 2 bugs: - Config Entries registered by Nomad Server on job registration were not getting Namespace set - Group level script checks were not getting Namespace set Those changes will need to be copied back to Nomad OSS. Nomad OSS + no ACLs (previously, needs refactor) Nomad ENT + no ACLs (this) Nomad OSS + ACLs (todo) Nomad ENT + ACLs (todo)	2021-04-19 15:35:31 -06:00
Seth Hoenig	25810b4cd6	e2e: set PORT on counter-api with host networking	2021-04-16 16:28:39 -06:00
Seth Hoenig	2d693127bb	e2e: minor tweaks from CR	2021-04-16 15:32:37 -06:00
Seth Hoenig	7f1191111d	e2e: add tests for consul namespaces from nomad oss This PR adds a set of tests to the Consul test suite for testing Nomad OSS's behavior around setting Consul Namespace on groups, which is to ignore the setting (as Consul Namespaces are currently an Enterprise feature). Tests are generally a reduced facsimile of existing tests, modified to check behavior of when group.consul.namespace is set and not set. Verification is oriented around what happens in Consul; the in-depth functional correctness of these features is left to the original tests. Nomad ENT will get its own version of these tests in `namespaces_ent.go`.	2021-04-16 15:32:37 -06:00
Tim Gross	dcc5268862	E2E/CSI: ensure jobs are stopped before checking claims are released During refactoring of the CSI jobs, the EBS test dropped stopping the jobs before checking that the claims were released.	2021-04-15 11:06:11 -04:00
Seth Hoenig	198e0d9f24	e2e: get consul ent in e2e packer builds Using Consul Enterprise is going to be necessary for testing Nomad's Consul Namespace integration in Nomad v1.1 in e2e.	2021-04-14 12:05:55 -06:00
Tim Gross	a13590fb37	e2e/csi: fix name of column used for snapshot create output parsing	2021-04-13 09:15:19 -04:00
Tim Gross	a84eca0136	E2E: remove broken Move-Item call during Windows provisioning The archive does not include the `pkg/windows_amd64` path and unpacking the archive happens in the installation directory.	2021-04-09 09:49:42 -04:00
Tim Gross	f4ccb360ef	E2E: use remote-exec via TF0.14.7+ The E2E provisioning used local-exec to call ssh in a for loop in a hacky workaround https://github.com/hashicorp/terraform/issues/25634, which prevented remote-exec from working on Windows. Move to a newer version of Terraform that fixes the remote-exec bug to make provisioning more reliable and observable. Note that Windows remote-exec needs to include the `powershell` call itself, unlike Unix-alike remote-exec.	2021-04-08 16:03:06 -04:00
Tim Gross	da89103c5c	E2E: extend CSI test to cover create and snapshot workflows Split the EBS and EFS tests out into their own test cases: * EBS exercises the Controller RPCs, including the create/snapshot workflow. * EFS exercises only the Node RPCs, and assumes we have an existing volume that gets registered, rather than created.	2021-04-08 12:55:36 -04:00
Yoan Blanc	ac0d5d8bd3	chore: bump golangci-lint from v1.24 to v1.39 Signed-off-by: Yoan Blanc <yoan@dosimple.ch>	2021-04-03 09:50:23 +02:00
Mahmood Ali	85502c1739	oversubscription: e2e tests!	2021-03-30 16:55:58 -04:00
Drew Bailey	7e78d4a607	e2e license smoke test (#10242 )	2021-03-26 13:21:47 -04:00
Mahmood Ali	dbc3850358	Merge pull request #10145 from hashicorp/b-periodic-init-status periodic: always reset periodic children status	2021-03-26 09:19:08 -04:00
Drew Bailey	64084f3209	e2e allow setting an enterprise license environment variable (#10233 ) * allow setting an enterprise license environment variable * update comment * address pr comments	2021-03-25 14:35:55 -04:00
Mahmood Ali	e643742a38	Add a test for parameterized summary counts	2021-03-25 11:27:09 -04:00
Tim Gross	46223e190e	E2E: bump AWS CSI driver versions	2021-03-24 14:17:38 -04:00
Tim Gross	0e774d40f5	E2E: CSI test should use expected unique-volume name	2021-03-23 08:34:17 -04:00
Tim Gross	fa25e048b2	CSI: unique volume per allocation Add a `PerAlloc` field to volume requests that directs the scheduler to test feasibility for volumes with a source ID that includes the allocation index suffix (ex. `[0]`), rather than the exact source ID. Read the `PerAlloc` field when making the volume claim at the client to determine if the allocation index suffix (ex. `[0]`) should be added to the volume source ID.	2021-03-18 15:35:11 -04:00
Charlie Voiselle	0473f35003	Fixup uses of `sanity` (#10187 ) * Fixup uses of `sanity` * Remove unnecessary comments. These checks are better explained by earlier comments about the context of the test. Per @tgross, moved the tests together to better reinforce the overall shared context. * Update nomad/fsm_test.go	2021-03-16 18:05:08 -04:00
Tim Gross	2a2e36690a	docs: swap master for main in Nomad repo	2021-03-08 14:26:31 -05:00
Mahmood Ali	ff8d67fae2	Merge pull request #9935 from hashicorp/e2e-segment-e2e-clusters e2e: segment e2e clusters	2021-03-01 09:23:21 -05:00
Drew Bailey	86d9e1ff90	Merge pull request #9955 from hashicorp/on-update-services Service and Check on_update configuration option (readiness checks)	2021-02-24 10:11:05 -05:00
Seth Hoenig	d2cd605995	dist: place systemd unit options correctly This PR places StartLimitIntervalSec and StartLimitBurst in the Unit section of systemd unit files, rather than the Service section. https://www.freedesktop.org/software/systemd/man/systemd.unit.html Fixes #10065	2021-02-22 19:23:00 -06:00
Drew Bailey	c152757d38	E2e/fix periodic (#10047 ) * fix periodic * update periodic to not use template nomad job inspect no longer returns an apiliststub so the required fields to query job summary are no longer there, parse cli output instead * rm tmp makefile entry * fix typo * revert makefile change	2021-02-18 12:21:53 -05:00
James Rasell	f95e45b80c	e2e: account for race condition in periodic dispatch test.	2021-02-11 11:08:48 +01:00
Seth Hoenig	7d6e81e9e4	Merge pull request #9990 from hashicorp/f-nsiso-task drivers/exec+java: Add task configuration to restore previous PID/IPC isolation behavior	2021-02-09 13:29:14 -06:00
Seth Hoenig	45e0e70a50	consul/connect: enable custom sidecars to use expose checks This PR enables jobs configured with a custom sidecar_task to make use of the `service.expose` feature for creating checks on services in the service mesh. Before we would check that sidecar_task had not been set (indicating that something other than envoy may be in use, which would not support envoy's expose feature). However Consul has not added support for anything other than envoy and probably never will, so having the restriction in place seems like an unnecessary hindrance. If Consul ever does support something other than Envoy, they will likely find a way to provide the expose feature anyway. Fixes #9854	2021-02-09 10:49:37 -06:00
Seth Hoenig	8ee9835923	drivers/exec+java: Add task configuration to restore previous PID/IPC isolation behavior This PR adds pid_mode and ipc_mode options to the exec and java task driver config options. By default these will defer to the default_pid_mode and default_ipc_mode agent plugin options created in #9969. Setting these values to "host" mode disables isolation for the task. Doing so is not recommended, but may be necessary to support legacy job configurations. Closes #9970	2021-02-08 14:26:35 -06:00
Drew Bailey	b5585882e4	address pr comments	2021-02-08 13:43:05 -05:00
Drew Bailey	b0cf3ffa54	on_update check_restart e2e	2021-02-08 10:49:25 -05:00
Drew Bailey	8507d54e3b	e2e test for on_update service checks check_restart not compatible with on_update=ignore reword caveat	2021-02-08 08:32:40 -05:00
Chris Baker	b1bb8a760e	e2e packer build: upgrade jdk to java 14	2021-02-02 17:33:48 +00:00
Mahmood Ali	45889f9f55	e2e: segment e2e clusters Ensure that the e2e clusters are isolated and never attempt to autojoin with another e2e cluster. This ensures that each cluster servers have a unique `ConsulAutoJoin`, to be used for discovery.	2021-02-01 08:04:21 -05:00
Chris Baker	ce68ee164b	Version 1.0.3 Merge tag 'v1.0.3' into post-release-1.0.3 Version 1.0.3	2021-01-29 19:30:08 +00:00
Chris Baker	2632b81124	lint some nomad HCL job specs	2021-01-28 12:03:19 +00:00
Chris Baker	2adf0f12d6	e2e: java driver isolation tests	2021-01-28 12:03:19 +00:00
Chris Baker	aa55df0413	additional e2e utils for multi-task allocs	2021-01-28 12:03:19 +00:00
Kris Hicks	d67b77f38e	Add a little comment	2021-01-28 12:03:19 +00:00
Kris Hicks	5cf972d2e7	Add test for alloc exec	2021-01-28 12:03:19 +00:00
Kris Hicks	2db8aa2a52	Add e2e test for raw exec	2021-01-28 12:03:19 +00:00
Kris Hicks	87188f04de	Add PID namespacing and e2e test	2021-01-28 12:03:19 +00:00
Mahmood Ali	c92bb342e1	e2e: skip node drain deadline/force tests	2021-01-27 08:42:16 -05:00
Mahmood Ali	b12e8912a9	e2e: use f.NoError instead of requires	2021-01-27 08:36:23 -05:00
Mahmood Ali	1ac8b32e08	e2e: Disable Connect tests The connect tests are very disruptive: restart consul/nomad agents with new tokens. The test seems particularly flaky, failing 32 times out of 73 in my sample. The tests are particularly problematic because they are disruptive and affect other tests. On failure, the nomad or consul agent on the client can get into a wedged state, so health/deployment info in subsequent tests may be wrong. In some cases, the node will be deemed as fail, and then the subsequent tests may fail when the node is deemed lost and the test allocations get migrated unexpectedly.	2021-01-26 10:01:14 -05:00
Mahmood Ali	36ce1e73eb	e2e: deflake nodedrain test The nodedrain deadline test asserts that all allocations are migrated by the deadline. However, when the deadline is short (e.g. 10s), the test may fail because of scheduler/client-propagation delays. In one failing test, it took ~15s from the RPC call to the moment the scheduler issued migration update, and then 3 seconds for the alloc to be stopped. Here, I increase the timeouts to avoid such false positives.	2021-01-26 10:01:14 -05:00
Mahmood Ali	cf8f6f07d7	e2e: vault increase timeout Increase the timeout for vaultsecrets. As the default interval is 0.1s, 10 retries mean it only retries for one second, a very short time for some waiting scenarios in the test (e.g. starting allocs, etc).	2021-01-26 10:01:14 -05:00
Mahmood Ali	94ad40907c	e2e: prefer testutil.WaitForResultRetries Prefer testutil.WaitForResultRetries, which emits more descriptive errors on failure. `require.Eventually` fails with an opaque "Condition never satisfied" error message.	2021-01-26 10:01:14 -05:00
Mahmood Ali	f3f8f15b7b	e2e: special case "Unexpected EOF" errors This is an attempt at deflaking the e2e exec tests, and a way to improve messages. e2e tests occasionally fail with "unexpected EOF" even though the exec output matches expectations. I suspect there is a race in handling EOF in server/http handling. Here, we special case this error and ensure we get all failures, to help debug the case better.	2021-01-26 10:01:14 -05:00
Mahmood Ali	925d9ce952	e2e: tweak failure messages Tweak the error messages for the flakiest tests, so that on test failure, we get more output	2021-01-26 09:16:48 -05:00
Mahmood Ali	6aa3dec6cc	e2e: use testify requires instead of t.Fatal testify requires offer better error messages that are easier to notice when seeing a wall of text in the builds.	2021-01-26 09:14:47 -05:00
Mahmood Ali	236b4055a7	e2e: deflake consul/CheckRestart test Ensure we pass the alloc ID to status. Otherwise, the test may fail if there is another spurious allocation running from another test.	2021-01-26 09:12:20 -05:00
Mahmood Ali	0aafd9af64	e2e: Fix build script and pass shellcheck	2021-01-26 09:11:37 -05:00
Mahmood Ali	4397eda209	Merge pull request #9798 from hashicorp/e2e-terraform-tweaks-20200113 This PR makes two ergonomics changes, meant to make e2e builds more reproducible and ease changes. ### AMI Management First, we pin the server AMIs to the commits associated with the build. No more using the latest AMI a developer built in a test branch, or accidentally using a stale AMI because we forgot to build one! Packer is to tag the AMI images with the commit sha used to generate the image, and then Terraform would look up only the AMIs associated with that sha. To minimize churn, we use the SHA associated with the latest Packer configurations, rather than SHA of all. This has a few benefits: reproducibility and avoiding accidental AMI changes and contamination of changes across branches. Also, the change is a stepping stone to an e2e pipeline that builds new AMIs automatically if Packer files changed. The downside is that new AMIs will be generated even for irrelevant changes (e.g. spelling, commits), but I suspect that's OK. Also, an engineer will be forced to build the AMI whenever they change Packer files while iterating on e2e scripts; this hasn't been an issue for me yet, and I'll be open to iterating on that later if it proves to be an issue. ### Config Files and Packer Second, this PR moves e2e config hcl management to Terraform instead of Packer. Currently, the config files live in `./terraform/config`, but they are baked into the servers by Packer and changes are ignored. This current behavior surprised me, as I spent a bit of time debugging why my config changes weren't applied. Having Terraform manage them would ease engineers' iteration. It also makes Packer management more consistent (Packer only works in `e2e/terraform/packer`) and eases the logic for AMI change detection. The config directory is very small (100KB), and having it as an upload step adds negligible time to `terraform apply`.	2021-01-25 13:20:28 -05:00
Mahmood Ali	39da228964	update readme about profiles and packer build	2021-01-25 11:40:26 -05:00
Seth Hoenig	8b05efcf88	consul/connect: Add support for Connect terminating gateways This PR implements Nomad built-in support for running Consul Connect terminating gateways. Such a gateway can be used by services running inside the service mesh to access "legacy" services running outside the service mesh while still making use of Consul's service identity based networking and ACL policies. https://www.consul.io/docs/connect/gateways/terminating-gateway These gateways are declared as part of a task group level service definition within the connect stanza. service { connect { gateway { proxy { // envoy proxy configuration } terminating { // terminating-gateway configuration entry } } } } Currently Envoy is the only supported gateway implementation in Consul. The gateway task can be customized by configuring the connect.sidecar_task block. When the gateway.terminating field is set, Nomad will write/update the Configuration Entry into Consul on job submission. Because CEs are global in scope and there may be more than one Nomad cluster communicating with Consul, there is an assumption that any terminating gateway defined in Nomad for a particular service will be the same among Nomad clusters. Gateways require Consul 1.8.0+, checked by a node constraint. Closes #9445	2021-01-25 10:36:04 -06:00
Tim Gross	0b49e3da12	e2e: added tests for check restart behavior	2021-01-22 10:55:40 -05:00
Drew Bailey	630babb886	prevent double job status update (#9768 ) * Prevent Job Statuses from being calculated twice https://github.com/hashicorp/nomad/pull/8435 introduced atomic eval insertion with job (de-)registration. This change removes a now obsolete guard which checked if the index was equal to the job.CreateIndex, which would empty the status. Now that the job registration eval insertion is atomic with the registration this check is no longer necessary to set the job statuses correctly. * test to ensure only single job event for job register * periodic e2e * separate job update summary step * fix updatejobstability to use copy instead of modified reference of job * update envoygatewaybindaddresses copy to prevent job diff on null vs empty * set ConsulGatewayBindAddress to empty map instead of nil fix nil assertions for empty map rm unnecessary guard	2021-01-22 09:18:17 -05:00
Mahmood Ali	9dcdafe4cf	e2e: show command output on failure When a command fails, it's nice to have the full output, as it contains diagnostic information. The status code isn't sufficient for debugging.	2021-01-21 10:32:16 -05:00
Mahmood Ali	923725bf3d	e2e: deflake TestVolumeMounts After submitting an update, the test ought to wait until the new allocations are placed. Previously, we'd use the original to-be-stopped allocations and the test fails when attempting to exec.	2021-01-21 10:28:41 -05:00
Mahmood Ali	95b7fc80b8	e2e deflake namespaces: only check namespace jobs Deflake namespace e2e test by only asserting on jobs related to the namespace tests. During our e2e tests, some left over jobs (e.g. prometheus) are left running while being shutdown and cause the test to fail.	2021-01-21 10:26:24 -05:00
Mahmood Ali	2e8bcac261	e2e: deflake events Handle streamCh channel being closed.	2021-01-21 10:25:42 -05:00
Seth Hoenig	991884e715	consul/connect: Enable running multiple ingress gateways per Nomad agent Connect ingress gateway services were being registered into Consul without an explicit deterministic service ID. Consul would generate one automatically, but then Nomad would have no way to register a second gateway on the same agent as it would not supply 'proxy-id' during envoy bootstrap. Set the ServiceID for gateways, and supply 'proxy-id' when doing envoy bootstrap. Fixes #9834	2021-01-19 12:58:36 -06:00
Mahmood Ali	76ce6306a4	add helper for building ami	2021-01-15 10:49:13 -05:00
Mahmood Ali	e51651c34a	set sha	2021-01-15 10:49:13 -05:00
Mahmood Ali	82637715cf	change ami naming	2021-01-15 10:49:12 -05:00
Mahmood Ali	0af1509a77	move config files to terraform	2021-01-15 10:49:12 -05:00
Seth Hoenig	536747f216	e2e: use jobspec2 Parse for parsing jobfile in e2e utils We directly parse job files in e2eutil, but currently using jobspec package. Instead, use the Parse method from the jobspec2 package so we can parse job files with new features.	2021-01-13 14:00:40 -06:00
James Rasell	d6cab8aa14	Merge pull request #9767 from hashicorp/f-e2e-job-scaling-suite e2e: add job scaling test suite.	2021-01-11 18:35:07 +01:00
Seth Hoenig	64a8b795f2	Merge pull request #9766 from hashicorp/f-bump-cni-plugins-version cni: bump CNI plugins version to v0.9.0	2021-01-11 09:59:43 -06:00
Tim Gross	f97505e384	e2e: remove deprecated terraform syntax Also bumps patch versions of some TF modules	2021-01-11 08:25:22 -05:00
James Rasell	4374d99071	e2e: add job scaling test suite.	2021-01-11 11:34:19 +01:00
Seth Hoenig	fc5f48d936	cni: bump CNI version to v0.9.0 https://github.com/containernetworking/plugins/releases/tag/v0.9.0 Also make the copy-paste install instructions work with arm64 for a better OOTB experience (AWS Graviton, Pi 4's).	2021-01-10 18:03:27 -06:00
James Rasell	108fa33393	Merge pull request #9747 from hashicorp/f-e2e-scaling-policy-suite e2e: add ScalingPolicies test suite with initial test case.	2021-01-08 10:51:48 +01:00
James Rasell	b087d68736	e2e: add ScalingPolicies test suite with initial test case.	2021-01-07 14:39:55 +01:00
James Rasell	02b9d9da87	e2e: move namespace tests into OSS.	2021-01-07 09:15:43 +01:00

621 commits