* `nomad operator keyring` was missing the general options section
* `nomad operator metrics` was missing a page in the docs entirely
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
The `nomad alloc logs` command does not remove terminal escape sequences for
color from the log outputs of a task. Clarify that the standard `-no-color`
flag, which does apply to Nomad's error responses from `nomad alloc logs`,
does not apply to the log output.
This is a first draft of HCLv2 docs - I added the details under hcl2 doc with some minimal info highlighting the hcl2 introductions.
As a longer term strategy, we'll want to mimic the Packer HCL docs structure that details all the blocks and allowed expressions/functions in greater details. However, given that the exact functions and templating syntax is still somewhat influx, I opt to push that to another time.
The terms task directory and allocation directory are used throughout the
documentation but these directories are not the same as the `NOMAD_TASK_DIR`
and `NOMAD_ALLOC_DIR` locations. This is confusing when trying to use the
`template` and `artifact` stanzas, especially when trying to use a destination
outside the Nomad-mounted directories for Docker and similar drivers.
This changeset introduces "allocation working directory" to mean the location
on disk where the various directories and artifacts are staged, and "task
working directory" for the task. Clarify how specific task drivers interact
with the task working directory.
When we try to prefix match the `nomad volume detach` node ID argument, the
node may have been already GC'd. The volume unpublish workflow gracefully
handles this case so that we can free the claim. So make a best effort to find
a node ID among the volume's claimed allocations, or otherwise just use the
node ID we've been given by the user as-is.
The CSI specification for `ValidateVolumeCapability` says that we shall
"reconcile successful capability-validation responses by comparing the
validated capabilities with those that it had originally requested" but leaves
the details of that reconcilation unspecified. This API is not implemented in
Kubernetes, so controller plugins don't have a real-world implementation to
verify their behavior against.
We have found that CSI plugins in the wild may return "successful" but
incomplete `VolumeCapability` responses, so we can't require that all
capabilities we expect have been validated, only that the ones that have been
validated match. This appears to violate the CSI specification but until
that's been resolved in upstream we have to loosen our validation
requirements. The tradeoff is that we're more likely to have runtime errors
during `NodeStageVolume` instead of at the time of volume registration.
The CSI specification allows only the `file-system` attachment mode to have
mount options. The `block-device` mode is left "intentionally empty, for now"
in the protocol. We should be validating against this problem, but our
documentation also had it backwards.
Also adds missing mount_options on group volume.
The soundness guarantees of the CSI specification leave a little to be desired
in our ability to provide a 100% reliable automated solution for managing
volumes. This changeset provides a new command to bridge this gap by providing
the operator the ability to intervene.
The command doesn't take an allocation ID so that the operator doesn't have to
keep track of alloc IDs that may have been GC'd. Handle this case in the
unpublish RPC by sending the client RPC for all the terminal/nil allocs on the
selected node.
Postrun hooks for allocation runners don't currently block the registration of
terminal health with the servers, which is what allows system jobs to be
drained. So draining nodes with jobs that claim CSI volumes requires the
`-ignore-system` job to ensure that the postrun hook for service jobs gets a
chance to execute.
Adds a `-global` flag for stopping multiregion jobs in all regions at
once. Warn the user if they attempt to stop a multiregion job in a single
region.