Commit Graph

33 Commits

Author SHA1 Message Date
Tim Gross dc013b5267
E2E: set longer timeout for CSI plugin alloc start (#12732)
The CSI plugin allocations take a while to be marked healthy,
sometimes causing E2E test flakes during the setup phase of the
tests. There's nothing CSI specific about marking plugin allocs
healthy, as the plugin supervisor hook does all the fingerprinting in
the postrun hook (the prestart hook just makes a couple of empty
directories). The timeouts we're seeing may be because of where we're
pulling the images from; most our jobs pull from a CDN-backed public
registry whereas these are pulling from ECR. Set a 1min timeout for
these to make sure we have enough time to pull the image and start the
task.
2022-04-21 11:11:43 -04:00
Tim Gross 806a82dd0c
E2E: ensure that CSI EBS tests are isolated from each other (#12443)
Tear down the volume-consuming job between subtests, rather than after
all the tests are complete. For good measure, use a different ID for
the volume-consuming job as well.
2022-04-04 09:44:55 -04:00
Tim Gross 19703e3316
E2E: test exercising node drain behavior for CSI volumes (#12384) 2022-03-29 11:19:23 -04:00
Tim Gross e687a21da9
CSI: set plugin `CSI_ENDPOINT` env var only if unset by user (#12257)
* Use unix:// prefix for CSI_ENDPOINT variable by default
* Some plugins have strict validation over the format of the
  `CSI_ENDPOINT` variable, and unfortunately not all plugins
  agree. Allow the user to override the `CSI_ENDPOINT` to workaround
  those cases.
* Update all demos and tests with CSI_ENDPOINT
2022-03-21 11:48:47 -04:00
Tim Gross b94837a2b8
csi: add pagination args to `volume snapshot list` (#12193)
The snapshot list API supports pagination as part of the CSI
specification, but we didn't have it plumbed through to the command
line.
2022-03-07 12:19:28 -05:00
Tim Gross 09a7612150
csi: volume snapshot list plugin option is required (#12197)
The RPC for listing volume snapshots requires a plugin ID. Update the
`volume snapshot list` command to find the specific plugin from the
provided prefix.
2022-03-07 09:58:29 -05:00
Tim Gross a07386c507
e2e: use context for executing external commands (#12185)
If any E2E test hangs, it'll eventually timeout and panic, causing the
all the remaining tests to fail. External commands should use a short
context whenever possible so we can fail the test quickly and move on
to the next test.
2022-03-04 08:55:36 -05:00
Tim Gross b8b08fb32d
e2e: use UUID for CSI idempotency token (#12183)
The AWS EBS plugin appears to use the name field of the volume as an
idempotency token that persists across the entire AWS account, not
just the plugin lifespan.

Also fix the regex for the volume ID, which was originally taken from
the job ID regex but isn't actually the same. This hasn't failed tests
for us because we've always passed in the same volume ID.
2022-03-03 17:00:00 -05:00
Tim Gross f2a4ad0949
CSI: implement support for topology (#12129) 2022-03-01 10:15:46 -05:00
James Rasell 0a50d9fd2a
e2e: account for new job stop CLI exit behaviour.
PR #11550 changed the job stop exit behaviour when monitoring the
deployment. When stopping a job, the deployment becomes cancelled
and therefore the CLI now exits with status code 1 as it see this
as an error.

This change adds a new utility e2e function that accounts for this
behaviour.
2022-02-01 14:16:37 +01:00
Tim Gross dcc5268862 E2E/CSI: ensure jobs are stopped before checking claims are released
During refactoring of the CSI jobs, the EBS test dropped stopping the jobs
before checking that the claims were released.
2021-04-15 11:06:11 -04:00
Tim Gross a13590fb37 e2e/csi: fix name of column used for snapshot create output parsing 2021-04-13 09:15:19 -04:00
Tim Gross da89103c5c E2E: extend CSI test to cover create and snapshot workflows
Split the EBS and EFS tests out into their own test cases:
* EBS exercises the Controller RPCs, including the create/snapshot workflow.
* EFS exercises only the Node RPCs, and assumes we have an existing volume
that gets registered, rather than created.
2021-04-08 12:55:36 -04:00
Tim Gross 46223e190e E2E: bump AWS CSI driver versions 2021-03-24 14:17:38 -04:00
Tim Gross 0e774d40f5 E2E: CSI test should use expected unique-volume name 2021-03-23 08:34:17 -04:00
Tim Gross fa25e048b2
CSI: unique volume per allocation
Add a `PerAlloc` field to volume requests that directs the scheduler to test
feasibility for volumes with a source ID that includes the allocation index
suffix (ex. `[0]`), rather than the exact source ID.

Read the `PerAlloc` field when making the volume claim at the client to
determine if the allocation index suffix (ex. `[0]`) should be added to the
volume source ID.
2021-03-18 15:35:11 -04:00
Tim Gross 481f91034c
E2E: CSI driver provisioning (#9443)
* e2e/csi: wait longer for plugins to become healthy

Plugins are Docker containers, and as such sometimes we get delays in startup
due to pulling from the registry and this is a source of test flakiness. Give
the plugins a little longer to start up.

* e2e/csi: version bump for AWS EBS plugins
2020-11-25 09:05:22 -05:00
Tim Gross 7e4fd79eee
e2e: CSI test should detect un-deregisterable volumes (#9343)
Assert that deregistering a volume works without errors following a volume
reap. Use CLI helpers where feasible to exercise CSI command line. Dump plugin
allocation logs on deregistration failures for debugging purposes.
2020-11-13 09:31:21 -05:00
Tim Gross 115edb53a0
e2e: add flag to opt-in to creating EBS/EFS volumes (#9082)
For everyday developer use, we don't need volumes for testing CSI. Providing a
flag to opt-in speeds up deploying dev clusters and slightly reduces infra costs.

Skip CSI test if missing volume specs.
2020-10-14 10:29:33 -04:00
Tim Gross 09a97bd158
e2e: spread CSI controller plugins across multiple DCs (#8629)
Controller plugins that land on the same node will collide over their CSI
`mount_dir`, so give them enough room in our tests that they don't land on the
same host.

Also, version bump the EBS node plugins to match the controllers.
2020-08-10 16:41:39 -04:00
Tim Gross 12984ed1c9
e2e: CSI EBS test should expect 2 controllers (#8617) 2020-08-10 09:41:21 -04:00
Tim Gross fa6ec931f8
e2e: CSI EBS version bump to 0.6.0 (#8618) 2020-08-10 09:41:13 -04:00
Tim Gross 5dba653b43
csi/e2e: add 2nd controller for node drain testing (#8573) 2020-07-31 08:03:49 -04:00
Tim Gross 87f9bfaf1e
e2e/csi: update EFS plugin test to use v1.0 (#8562) 2020-07-30 08:41:48 -04:00
Tim Gross 23be116da0
csi: add -force flag to volume deregister (#8295)
The `nomad volume deregister` command currently returns an error if the volume
has any claims, but in cases where the claims can't be dropped because of
plugin errors, providing a `-force` flag gives the operator an escape hatch.

If the volume has no allocations or if they are all terminal, this flag
deletes the volume from the state store, immediately and implicitly dropping
all claims without further CSI RPCs. Note that this will not also
unmount/detach the volume, which we'll make the responsibility of a separate
`nomad volume detach` command.
2020-07-01 12:17:51 -04:00
Tim Gross 139c65c436
e2e: csi test can purge target job (#7823) 2020-05-01 13:25:50 -04:00
Tim Gross ab3086a1f4
e2e: testing reliability (#7701)
* pin CSI plugin versions
* ensure failing CSI tests clean up
* allow NOMAD_SHA env var to override makefile
2020-04-13 10:25:24 -04:00
Lang Martin c0dbcbef5f
e2e: csi: wait for volume write claims to be released before starting read jobs (#7641) 2020-04-07 07:40:44 -04:00
Tim Gross 50f807060a
e2e: csi tests can only run on linux (#7635) 2020-04-06 11:57:59 -04:00
Tim Gross 73dc2ad443 e2e/csi: add waiting for alloc stop 2020-04-06 10:15:55 -04:00
Tim Gross d81797ea33
e2e: improve test reliability for CSI (#7616)
This changeset:

* adds eval status to the error messages emitted when we have
  placement failure in tests. The implementation here isn't quite
  perfect but it's a lot better than "condition not met".
* enforces the ordering of teardown of the CSI test
* doesn't pass the purge flag to one of the two CSI tests, so that we
  exercise both code paths.
2020-04-03 15:52:58 -04:00
Tim Gross bde13dfc0c
e2e: have TF write-out HCL for CSI volume registration (#7599) 2020-04-02 12:16:43 -04:00
Tim Gross cd1c6173f4 csi: e2e tests for EBS and EFS plugins (#7343)
This changeset provides two basic e2e tests for CSI plugins targeting
common AWS use cases.

The EBS test launches the EBS plugin (controller + nodes) and registers
an EBS volume as a Nomad CSI volume. We deploy a job that writes to
the volume, stop that job, and reuse the volume for another job which
should be able to read the data written by the first job.

The EFS test launches the EFS plugin (nodes-only) and registers an EFS
volume as a Nomad CSI volume. We deploy a job that writes to the
volume, stop that job, and reuse the volume for another job which
should be able to read the data written by the first job.

The writer jobs mount the CSI volume at a location within the alloc
dir.
2020-03-23 13:59:18 -04:00