This PR fixes a flakey test where we did not wait on the check
status to actually become failing (go too fast and you just get
a pending check).
Instead add a helper for waiting on any check in the alloc to become
the state we are looking for.
This PR adds some NSD check status output to the CLI.
1. The 'nomad alloc status' command produces nsd check summary output (if present)
2. The 'nomad alloc checks' sub-command is added to produce complete nsd check output (if present)
This PR is the first of several for cleaning up warnings, and refactoring
bits of code in the command package. First pass is over acl_ files and
gets some helpers in place.
* test: use `T.TempDir` to create temporary test directory
This commit replaces `ioutil.TempDir` with `t.TempDir` in tests. The
directory created by `t.TempDir` is automatically removed when the test
and all its subtests complete.
Prior to this commit, temporary directory created using `ioutil.TempDir`
needs to be removed manually by calling `os.RemoveAll`, which is omitted
in some tests. The error handling boilerplate e.g.
defer func() {
if err := os.RemoveAll(dir); err != nil {
t.Fatal(err)
}
}
is also tedious, but `t.TempDir` handles this for us nicely.
Reference: https://pkg.go.dev/testing#T.TempDir
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
* test: fix TestLogmon_Start_restart on Windows
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
* test: fix failing TestConsul_Integration
t.TempDir fails to perform the cleanup properly because the folder is
still in use
testing.go:967: TempDir RemoveAll cleanup: unlinkat /tmp/TestConsul_Integration2837567823/002/191a6f1a-5371-cf7c-da38-220fe85d10e5/web/secrets: device or resource busy
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
CSI `CreateVolume` RPC is idempotent given that the topology,
capabilities, and parameters are unchanged. CSI volumes have many
user-defined fields that are immutable once set, and many fields that
are not user-settable.
Update the `Register` RPC so that updating a volume via the API merges
onto any existing volume without touching Nomad-controlled fields,
while validating it with the same strict requirements expected for
idempotent `CreateVolume` RPCs.
Also, clarify that this state store method is used for everything, not just
for the `Register` RPC.
The `nomad alloc status -verbose` command returns a 404 from CSI volumes
because the volume mount block in the task points back to the
`job.group.volume` block. So using the `Name` field to query is the "name" as
seen in the jobspec, and not the name of the volume that we need for querying.
Show both the job-specific name and the volume ID in the resulting output,
which clarifies the difference between the two fields and is more consistent
with the web UI.
* use msgtype in upsert node
adds message type to signature for upsert node, update tests, remove placeholder method
* UpsertAllocs msg type test setup
* use upsertallocs with msg type in signature
update test usage of delete node
delete placeholder msgtype method
* add msgtype to upsert evals signature, update test call sites with test setup msg type
handle snapshot upsert eval outside of FSM and ignore eval event
remove placeholder upsertevalsmsgtype
handle job plan rpc and prevent event creation for plan
msgtype cleanup upsertnodeevents
updatenodedrain msgtype
msg type 0 is a node registration event, so set the default to the ignore type
* fix named import
* fix signature ordering on upsertnode to match
Display task lifecycle info in `nomad alloc status <alloc_id>` output.
I chose to embed it in the Task header and only add it for tasks with
lifecycle info.
Also, I chose to order the tasks in the following order:
1. prestart non-sidecar tasks
2. prestart sidecar tasks
3. main tasks
The tasks are sorted lexicographically within each tier.
Sample output:
```
$ nomad alloc status 6ec0eb52
ID = 6ec0eb52-e6c8-665c-169c-113d6081309b
Eval ID = fb0caa98
Name = lifecycle.cache[0]
[...]
Task "init" (prestart) is "dead"
Task Resources
CPU Memory Disk Addresses
0/500 MHz 0 B/256 MiB 300 MiB
[...]
Task "some-sidecar" (prestart sidecar) is "running"
Task Resources
CPU Memory Disk Addresses
0/500 MHz 68 KiB/256 MiB 300 MiB
[...]
Task "redis" is "running"
Task Resources
CPU Memory Disk Addresses
10/500 MHz 984 KiB/256 MiB 300 MiB
[...]
```
* nomad/state/schema: use the namespace compound index
* scheduler/scheduler: CSIVolumeByID interface signature namespace
* scheduler/stack: SetJob on CSIVolumeChecker to capture namespace
* scheduler/feasible: pass the captured namespace to CSIVolumeByID
* nomad/state/state_store: use namespace in csi_volume index
* nomad/fsm: pass namespace to CSIVolumeDeregister & Claim
* nomad/core_sched: pass the namespace in volumeClaimReap
* nomad/node_endpoint_test: namespaces in Claim testing
* nomad/csi_endpoint: pass RequestNamespace to state.*
* nomad/csi_endpoint_test: appropriately failed test
* command/alloc_status_test: appropriately failed test
* node_endpoint_test: avoid notTheNamespace for the job
* scheduler/feasible_test: call SetJob to capture the namespace
* nomad/csi_endpoint: ACL check the req namespace, query by namespace
* nomad/state/state_store: remove deregister namespace check
* nomad/state/state_store: remove unused CSIVolumes
* scheduler/feasible: CSIVolumeChecker SetJob -> SetNamespace
* nomad/csi_endpoint: ACL check
* nomad/state/state_store_test: remove call to state.CSIVolumes
* nomad/core_sched_test: job namespace match so claim gc works
Adds a stanza for both Host Volumes and CSI Volumes to the the CLI
output for `nomad alloc status`. Mostly relies on information already
in the API structs, but in the case where there are CSI Volumes we
need to make extra API calls to get the volume status. To reduce
overhead, these extra calls are hidden behind the `-verbose` flag.