open-nomad

Commit Graph

Author	SHA1	Message	Date
Tim Gross	d62dd5b3fe	E2E: add debugging outputs for disconnected clients test (#12572 ) This test has a failure that's happening only occassionally and not very reproducibly. Print out the allocation status on test failure so that we can do some post-mortum debugging of the test on nightly.	2022-04-14 17:03:57 -04:00
Derek Strickland	3f871973f9	Update E2E terraform output command (#12561 )	2022-04-13 16:46:09 -04:00
Tim Gross	4078e6ea0e	scripts: fix interpreter for bash (#12549 ) Many of our scripts have a non-portable interpreter line for bash and use bash-specific variables like `BASH_SOURCE`. Update the interpreter line to be portable between various Linuxes and macOS without complaint from posix shell users.	2022-04-12 10:08:21 -04:00
Tim Gross	31e72e93ff	E2E: fix flaky event stream test (#12548 ) This changeset fixes two sources of flakiness in the event stream test. First, the stream request gets the event closest to the index, not the exact match. Although events are written before raft entries they're written asynchronously, so it's possible to race and get a raft index from this query higher than the current head of the event buffer. Ensure the job is running before we try to get the index, so that we've given the event enough time to land in the buffer. Second, the assertion that the found index is greater than the start index is only true if the `PlanResult` event manages to land before we do the second registration. Although it should now with the first fix above, it's not a correct assertion for what we're testing.	2022-04-12 08:35:39 -04:00
Tim Gross	77ab8d92f1	E2E: oversubscription assertion needs to wait for stats (#12540 ) The oversubscription test expects an output that requires the client has polled the task for stats at least once. Wait long enough to ensure that we've polled the stats before failing the test.	2022-04-11 11:40:51 -04:00
Tim Gross	c9c3cbd878	E2E: test for nodes disconnected by netsplit (#12407 )	2022-04-11 11:34:27 -04:00
James Rasell	bc800a18d1	e2e: add initial service discovery tests. (#12512 ) Some tests may chose to deregister jobs to check Nomad cleanup logic, however, it is still possible for the test to fail and exit before this is hit. This therefore adds a cancellable cleanup func which can be deferred, using context to control whether it gets run or not.	2022-04-11 11:12:24 +02:00
James Rasell	dbf28a06c1	e2e: fix eventual consistency failure within consultemplate suite. (#12494 )	2022-04-07 17:03:10 +02:00
James Rasell	431c153cd9	client: add Nomad template service functionality to runner. (#12458 ) This change modifies the template task runner to utilise the new consul-template which includes Nomad service lookup template funcs. In order to provide security and auth to consul-template, we use a custom HTTP dialer which is passed to consul-template when setting up the runner. This method follows Vault implementation. Co-authored-by: Michael Schurter <mschurter@hashicorp.com>	2022-04-06 19:17:05 +02:00
Seth Hoenig	3ce4f52740	Merge pull request #12446 from shoenig/no-pkg-err cleanup: purge github.com/pkg/errors	2022-04-04 09:22:44 -05:00
Tim Gross	806a82dd0c	E2E: ensure that CSI EBS tests are isolated from each other (#12443 ) Tear down the volume-consuming job between subtests, rather than after all the tests are complete. For good measure, use a different ID for the volume-consuming job as well.	2022-04-04 09:44:55 -04:00
Seth Hoenig	9670adb6c6	cleanup: purge github.com/pkg/errors	2022-04-01 19:24:02 -05:00
Tim Gross	3030f954a2	E2E disconnected clients test refactor (#12402 ) * Wait longer for node to go down in disconnected clients test. The existing helper only waits 10s, but there's a jitter on heartbeats that we need to account for. Wait for 30s for node to go down to give us plenty of room * Port disconnected clients to stdlib-style test	2022-03-30 09:12:44 -04:00
Tim Gross	19703e3316	E2E: test exercising node drain behavior for CSI volumes (#12384 )	2022-03-29 11:19:23 -04:00
Tim Gross	5c7f2bad0b	E2E: namespace HCP vault and consul policies to avoid collisions (#12386 ) Concurrent E2E runs can collide when provisioning policies on HCP Consul and HCP Vault. Namespace these by the test run name, as we do for most everything else.	2022-03-25 16:05:59 -04:00
Tim Gross	3c15236fd5	E2E: move example test to use golangs stdlib test runner (#12383 ) Our E2E "framework" has a bunch of features around test discovery and standing up infra that were never completed or fully used, and we ended up building out a large test suite that ignored all that in lieu of Terraform-provided infrastructure for the last couple years. This changeset is a proposal (and demonstration) for gradually migrating our E2E tests off the framework code so that developers can write fairly ordinary golang stdlib testing tests.	2022-03-25 14:44:16 -04:00
Tim Gross	67b87e46f1	e2e: test for allocations replacement on disconnected clients (#12375 ) This test exercises the behavior of clients that become disconnected and have their allocations replaced. Future test cases will exercise the `max_client_disconnect` field on the job spec.	2022-03-25 12:26:43 -04:00
Tim Gross	e687a21da9	CSI: set plugin `CSI_ENDPOINT` env var only if unset by user (#12257 ) * Use unix:// prefix for CSI_ENDPOINT variable by default * Some plugins have strict validation over the format of the `CSI_ENDPOINT` variable, and unfortunately not all plugins agree. Allow the user to override the `CSI_ENDPOINT` to workaround those cases. * Update all demos and tests with CSI_ENDPOINT	2022-03-21 11:48:47 -04:00
Tim Gross	bd403f2f88	E2E: ensure `ConnectACLsE2ETest` has clean state before starting (#12334 ) The `ConnectACLsE2ETest` checks that the SI tokens have been properly cleaned up between tests, but following the change to use HCP the previous `Connect` test suite will often have SI tokens that haven't been cleaned up by the time this test suite runs. Wait for the SI tokens to be cleaned up at the start of the test to ensure we have a clean state.	2022-03-21 11:05:02 -04:00
Tim Gross	9f05d62338	E2E with HCP Consul/Vault (#12267 ) Use HCP Consul and HCP Vault for the Consul and Vault clusters used in E2E testing. This has the following benefits: * Without the need to support mTLS bootstrapping for Consul and Vault, we can simplify the mTLS configuration by leaning on Terraform instead of janky bash shell scripting. * Vault bootstrapping is no longer required, so we can eliminate even more janky shell scripting * Our E2E exercises HCP, which is important to us as an organization * With the reduction in configurability, we can simplify the Terraform configuration and drop the complicated `provision.sh`/`provision.ps1` scripts we were using previously. We can template Nomad configuration files and upload them with the `file` provisioner. * Packer builds for Linux and Windows become much simpler. tl;dr way less janky shell scripting!	2022-03-18 09:27:28 -04:00
Seth Hoenig	373d8f7241	ci: missing import for nomad09upgrade	2022-03-17 08:49:15 -05:00
Seth Hoenig	f87eb666c7	e2e: have e2e use ci.Parallel This is a followup to having tests run in serial in CI. The e2e package isn't in CI, but lets use the helper anyway so we can setup semgrep rules covering the entire repository.	2022-03-17 08:37:34 -05:00
Tim Gross	b94837a2b8	csi: add pagination args to `volume snapshot list` (#12193 ) The snapshot list API supports pagination as part of the CSI specification, but we didn't have it plumbed through to the command line.	2022-03-07 12:19:28 -05:00
Tim Gross	09a7612150	csi: volume snapshot list plugin option is required (#12197 ) The RPC for listing volume snapshots requires a plugin ID. Update the `volume snapshot list` command to find the specific plugin from the provided prefix.	2022-03-07 09:58:29 -05:00
Tim Gross	a07386c507	e2e: use context for executing external commands (#12185 ) If any E2E test hangs, it'll eventually timeout and panic, causing the all the remaining tests to fail. External commands should use a short context whenever possible so we can fail the test quickly and move on to the next test.	2022-03-04 08:55:36 -05:00
Tim Gross	5f30279cd2	e2e: `StopJob` should tolerate progress deadline expired (#12179 ) The `TestRescheduleProgressDeadlineFail` E2E test failed during test cleanup because the error message "progress deadline expired" that it emits when we stop the job does not match the one expected from monitoring the `job stop` command. Update the `StopJob` helper to tolerate this use case as well.	2022-03-04 08:55:22 -05:00
Tim Gross	4c4895e19c	e2e: configure prometheus for mTLS for `Metrics` suite (#12181 ) The `Metrics` suite uses prometheus to scrape Nomad metrics so that we're testing the full user experience of extracting metrics from Nomad. With the addition of mTLS, we need to make sure prometheus also has mTLS configuration because the metrics endpoint is protected. Update the Nomad client configuration and prometheus job to bind-mount the client's certs into the task so that the job can use these certs to scrape the server. This is a temporary solution that gets the job passing; we should give the job its own certificates (issued by Vault?) when we've done some of the infrastructure rework we'd like.	2022-03-04 08:55:06 -05:00
Tim Gross	b8b08fb32d	e2e: use UUID for CSI idempotency token (#12183 ) The AWS EBS plugin appears to use the name field of the volume as an idempotency token that persists across the entire AWS account, not just the plugin lifespan. Also fix the regex for the volume ID, which was originally taken from the job ID regex but isn't actually the same. This hasn't failed tests for us because we've always passed in the same volume ID.	2022-03-03 17:00:00 -05:00
Tim Gross	1502af3523	e2e: use `operator api` for Networking suite validation (#12180 ) With mTLS enabled, using `curl` in a bash script for validation involves having to configure arguments to `curl` based on whether or not the test infrastructure is using mTLS, whether ACLs are enabled, etc. Use the new `operator api` command instead to pick up the client configuration from the test environment automatically.	2022-03-03 15:17:29 -05:00
Tim Gross	f2a4ad0949	CSI: implement support for topology (#12129 )	2022-03-01 10:15:46 -05:00
James Rasell	adc3c44e29	e2e: moved missed volume test stop command to util helper.	2022-02-02 08:42:58 +01:00
James Rasell	0a50d9fd2a	e2e: account for new job stop CLI exit behaviour. PR #11550 changed the job stop exit behaviour when monitoring the deployment. When stopping a job, the deployment becomes cancelled and therefore the CLI now exits with status code 1 as it see this as an error. This change adds a new utility e2e function that accounts for this behaviour.	2022-02-01 14:16:37 +01:00
Luiz Aoqui	3c8381bf85	e2e: enable Consul HTTPS port and always restart Nomad systemd unit	2022-01-18 16:56:26 -05:00
James Rasell	45f4689f9c	chore: fixup inconsistent method receiver names. (#11704 )	2021-12-20 11:44:21 +01:00
Tim Gross	ae04e540e6	hclfmt on some config files (#11611 )	2021-12-02 15:25:46 -05:00
Derek Strickland	8a5aa0cd8a	Fix Vault E2E TLS config (#11483 ) * Update e2e/terraform configuration for Vault and default to mtls=true	2021-12-02 12:20:09 -05:00
James Rasell	751c8217d1	core: allow setting and propagation of eval priority on job de/registration (#11532 ) This change modifies the Nomad job register and deregister RPCs to accept an updated option set which includes eval priority. This param is optional and override the use of the job priority to set the eval priority. In order to ensure all evaluations as a result of the request use the same eval priority, the priority is shared to the allocReconciler and deploymentWatcher. This creates a new distinction between eval priority and job priority. The Nomad agent HTTP API has been modified to allow setting the eval priority on job update and delete. To keep consistency with the current v1 API, job update accepts this as a payload param; job delete accepts this as a query param. Any user supplied value is validated within the agent HTTP handler removing the need to pass invalid requests to the server. The register and deregister opts functions now all for setting the eval priority on requests. The change includes a small change to the DeregisterOpts function which handles nil opts. This brings the function inline with the RegisterOpts.	2021-11-23 09:23:31 +01:00
Luiz Aoqui	5d204c8ced	Revert "Return SchedulerConfig instead of SchedulerConfigResponse struct (#10799 )" (#11433 )	2021-11-02 17:42:52 -04:00
Charlie Voiselle	cb8e52b5df	Return SchedulerConfig instead of SchedulerConfigResponse struct (#10799 )	2021-10-13 21:23:13 -04:00
Mahmood Ali	4d90afb425	gofmt all the files mostly to handle build directives in 1.17.	2021-10-01 10:14:28 -04:00
James Rasell	645741cd94	Merge pull request #11194 from hashicorp/b-fix-e2e-acl-tls-provision e2e: fix provisioning when ACLs and TLS enabled.	2021-09-17 08:11:10 +02:00
James Rasell	30273d9256	e2e: fix provisioning when ACLs and TLS enabled; no nightly TLS.	2021-09-16 17:15:41 +02:00
James Rasell	0e926ef3fd	allow configuration of Docker hostnames in bridge mode (#11173 ) Add a new hostname string parameter to the network block which allows operators to specify the hostname of the network namespace. Changing this causes a destructive update to the allocation and it is omitted if empty from API responses. This parameter also supports interpolation. In order to have a hostname passed as a configuration param when creating an allocation network, the CreateNetwork func of the DriverNetworkManager interface needs to be updated. In order to minimize the disruption of future changes, rather than add another string func arg, the function now accepts a request struct along with the allocID param. The struct has the hostname as a field. The in-tree implementations of DriverNetworkManager.CreateNetwork have been modified to account for the function signature change. In updating for the change, the enhancement of adding hostnames to network namespaces has also been added to the Docker driver, whilst the default Linux manager does not current implement it.	2021-09-16 08:13:09 +02:00
Luiz Aoqui	f30c024a40	e2e: use absolute path for mTLS env vars (#11126 )	2021-09-03 12:59:21 -04:00
James Rasell	6bd2acd5b3	Merge pull request #11098 from hashicorp/b-fixup-all-incorrect-docstrings chore: fix incorrect docstring formatting.	2021-08-31 09:46:18 +02:00
Mahmood Ali	fec0adbb0e	Support mTLS clusters for e2e testing (#11092 ) This allows us to spin up e2e clusters with mTLS configured for all HashiCorp services, i.e. Nomad, Consul, and Vault. Used it for testing #11089 . mTLS is disabled by default. I have not updated Windows provisioning scripts yet - Windows also lacks ACL support from before. I intend to follow up for them in another round.	2021-08-30 10:18:16 -04:00
James Rasell	b6813f1221	chore: fix incorrect docstring formatting.	2021-08-30 11:08:12 +02:00
James Rasell	73ab63cf68	test: update e2e and dev scripts to use cni plugins v1.0.0	2021-08-27 11:14:47 +02:00
Mahmood Ali	97966c7a71	e2e: Run system jobs on all datacenters (#11060 ) Target all e2e datacenters for system and sysbatch e2e tests. They require that the system jobs run on all linux clients. However, the jobs currenly only target `dc1` datacenter, but the nightly e2e cluster has 4 clients spread in `dc1` and `dc2` datacenters, causing the tests to fail. I missed this problem in e2e dev cluster because it only used a single dc1 datacenter.	2021-08-17 11:01:47 -04:00
Mahmood Ali	28bc234e84	e2e: fix tests Use basic sleeps in busybox images. busybox are very light, and ping has permissions complications, and it may fail for network related issues.	2021-08-03 11:38:35 -04:00

1 2 3 4 5 ...

524 Commits