The `plugin status` command supports displaying CSI capabilities and topology
accessibility, but this was missing from the documentation. Extend the
`-verbose` example to show that info.
During unusual outage recovery scenarios on large clusters, a backlog of
millions of evaluations can appear. In these cases, the `eval delete` command can
put excessive load on the cluster by listing large sets of evals to extract the
IDs and then sending larges batches of IDs. Although the command's batch size
was carefully tuned, we still need to be JSON deserialize, re-serialize to
MessagePack, send the log entries through raft, and get the FSM applied.
To improve performance of this recovery case, move the batching process into the
RPC handler and the state store. The design here is a little weird, so let's
look a the failed options first:
* A naive solution here would be to just send the filter as the raft request and
let the FSM apply delete the whole set in a single operation. Benchmarking with
1M evals on a 3 node cluster demonstrated this can block the FSM apply for
several minutes, which puts the cluster at risk if there's a leadership
failover (the barrier write can't be made while this apply is in-flight).
* A less naive but still bad solution would be to have the RPC handler filter
and paginate, and then hand a list of IDs to the existing raft log
entry. Benchmarks showed this blocked the FSM apply for 20-30s at a time and
took roughly an hour to complete.
Instead, we're filtering and paginating in the RPC handler to find a page token,
and then passing both the filter and page token in the raft log. The FSM apply
recreates the paginator using the filter and page token to get roughly the same
page of evaluations, which it then deletes. The pagination process is fairly
cheap (only abut 5% of the total FSM apply time), so counter-intuitively this
rework ends up being much faster. A benchmark of 1M evaluations showed this
blocked the FSM apply for 20-30ms at a time (typical for normal operations) and
completes in less than 4 minutes.
Note that, as with the existing design, this delete is not consistent: a new
evaluation inserted "behind" the cursor of the pagination will fail to be
deleted.
These options are mutually exclusive but, since `-hcl2-strict` defaults
to `true` users had to explicitily set it to `false` when using `-hcl1`.
Also return `255` when job plan fails validation as this is the expected
code in this situation.
The ACL command docs are now found within a sub-dir like the
operator command docs. Updates to the ACL token commands to
accommodate token expiry have also been added.
The ACL API docs are now found within a sub-dir like the operator
API docs. The ACL docs now include the ACL roles endpoint as well
as updated ACL token endpoints for token expiration.
The configuration section is also updated to accommodate the new
ACL and server parameters for the new ACL features.
* allocrunner: handle lifecycle when all tasks die
When all tasks die the Coordinator must transition to its terminal
state, coordinatorStatePoststop, to unblock poststop tasks. Since this
could happen at any time (for example, a prestart task dies), all states
must be able to transition to this terminal state.
* allocrunner: implement different alloc restarts
Add a new alloc restart mode where all tasks are restarted, even if they
have already exited. Also unifies the alloc restart logic to use the
implementation that restarts tasks concurrently and ignores
ErrTaskNotRunning errors since those are expected when restarting the
allocation.
* allocrunner: allow tasks to run again
Prevent the task runner Run() method from exiting to allow a dead task
to run again. When the task runner is signaled to restart, the function
will jump back to the MAIN loop and run it again.
The task runner determines if a task needs to run again based on two new
task events that were added to differentiate between a request to
restart a specific task, the tasks that are currently running, or all
tasks that have already run.
* api/cli: add support for all tasks alloc restart
Implement the new -all-tasks alloc restart CLI flag and its API
counterpar, AllTasks. The client endpoint calls the appropriate restart
method from the allocrunner depending on the restart parameters used.
* test: fix tasklifecycle Coordinator test
* allocrunner: kill taskrunners if all tasks are dead
When all non-poststop tasks are dead we need to kill the taskrunners so
we don't leak their goroutines, which are blocked in the alloc restart
loop. This also ensures the allocrunner exits on its own.
* taskrunner: fix tests that waited on WaitCh
Now that "dead" tasks may run again, the taskrunner Run() method will
not return when the task finishes running, so tests must wait for the
task state to be "dead" instead of using the WaitCh, since it won't be
closed until the taskrunner is killed.
* tests: add tests for all tasks alloc restart
* changelog: add entry for #14127
* taskrunner: fix restore logic.
The first implementation of the task runner restore process relied on
server data (`tr.Alloc().TerminalStatus()`) which may not be available
to the client at the time of restore.
It also had the incorrect code path. When restoring a dead task the
driver handle always needs to be clear cleanly using `clearDriverHandle`
otherwise, after exiting the MAIN loop, the task may be killed by
`tr.handleKill`.
The fix is to store the state of the Run() loop in the task runner local
client state: if the task runner ever exits this loop cleanly (not with
a shutdown) it will never be able to run again. So if the Run() loops
starts with this local state flag set, it must exit early.
This local state flag is also being checked on task restart requests. If
the task is "dead" and its Run() loop is not active it will never be
able to run again.
* address code review requests
* apply more code review changes
* taskrunner: add different Restart modes
Using the task event to differentiate between the allocrunner restart
methods proved to be confusing for developers to understand how it all
worked.
So instead of relying on the event type, this commit separated the logic
of restarting an taskRunner into two methods:
- `Restart` will retain the current behaviour and only will only restart
the task if it's currently running.
- `ForceRestart` is the new method where a `dead` task is allowed to
restart if its `Run()` method is still active. Callers will need to
restart the allocRunner taskCoordinator to make sure it will allow the
task to run again.
* minor fixes
The original design for workload identities and ACLs allows for operators to
extend the automatic capabilities of a workload by using a specially-named
policy. This has shown to be potentially unsafe because of naming collisions, so
instead we'll allow operators to explicitly attach a policy to a workload
identity.
This changeset adds workload identity fields to ACL policy objects and threads
that all the way down to the command line. It also a new secondary index to the
ACL policy table on namespace and job so that claim resolution can efficiently
query for related policies.
Document the secure variables keyring commands, document the aliased
gossip keyring commands, and note that the old gossip keyring commands
are deprecated.
The sidebar navigation tree for the `operator` sub-sub commands is
getting cluttered and we have a new set of commands coming to support
secure variables keyring as well. Move these all under their own
subtrees.
Use the same output format when listing multiple evals in the `eval
list` command and when `eval status <prefix>` matches more than one
eval.
Include the eval namespace in all output formats and always include the
job ID in `eval status` since, even `node-update` evals are related to a
job.
Add Node ID to the evals table output to help differentiate
`node-update` evals.
Co-authored-by: James Rasell <jrasell@hashicorp.com>
* core: allow pause/un-pause of eval broker on region leader.
* agent: add ability to pause eval broker via scheduler config.
* cli: add operator scheduler commands to interact with config.
* api: add ability to pause eval broker via scheduler config
* e2e: add operator scheduler test for eval broker pause.
* docs: include new opertor scheduler CLI and pause eval API info.
In addition to jobs, there are other objects in Nomad that have a
specific format and can be provided to commands and API endpoints.
This commit creates a new menu section to hold the specification for
volumes and update the command pages to point to the new centralized
definition.
Redirecting the previous entries is not possible with `redirect.js`
because they are done server-side and URL fragments are not accessible
to detect a match. So we provide hidden anchors with a link to the new
page to guide users towards the new documentation.
Co-authored-by: Tim Gross <tgross@hashicorp.com>
The description of `mount_flags` provides incorrect example
of the accepted value format.
This fixes the issue by changing the example from a string
`ro,noatime` to a slice of strings `["ro", "noatime"]`.
Closes#12927Closes#12958
This PR updates the version of redis used in our examples from 3.2 to 7.
The old version is very not supported anymore, and we should be setting
a good example by using a supported version.
The long-form example job is now fixed so that the service stanza uses
nomad as the service discovery provider, and so now the job runs without
a requirement of having Consul running and configured.
In #12324 we made it so that plugins wait until the node drain is
complete, as we do for system jobs. But we neglected to mark the node
drain as complete once only plugins (or system jobs) remaining, which
means that the node drain is left in a draining state until the
`deadline` time expires. This was incorrectly documented as expected
behavior in #12324.
The capacity fields for `create volume` set bounds on the resulting
size of the volume, but the ultimate size of the volume will be
determined by the storage provider (between the min and max). Clarify
this in the documentation and provide a suggestion for how to set a
exact size.
* docs: update json jobs docs
Did you know that Nomad has not 1 but 2 JSON formats for jobs? 2½ if you
want to acknowledge that sometimes our JSON job representations have a
Job top-level wrapper and sometimes do not.
The 2½ formats are:
```
1. HCL JSON
2. Input API JSON (top-level Job field)
2.5. Output API JSON (lacks top-level Job field)
```
`#2` is what our docs consider our API JSON. `#2.5` seems to be an
accident of history we can't fix with breaking API compatibility.
`#1` is an even more interesting accident of history: the `jobspec2`
package automatically detects if the input to Parse is JSON and switches
to a JSON parser. This behavior is undocumented, the format is
unspecified, and there is no official HashiCorp tooling to produce this
JSON from HCL. The plot thickens when you discover popular third party
tools like hcl2json.com and https://github.com/tmccombs/hcl2json seem to
produce JSON that `nomad run` accepts!
Since we have no telemetry around whether or not anyone passes HCL JSON
to `nomad run`, and people don't file bugs around features that Just
Work, I'm choosing to leave that code path in place and *acknowledged
but not suggested* in documentation.
See https://github.com/hashicorp/hcl/issues/498 for a more comprehensive
discussion of what officially supporting HCL JSON in Nomad would look
like.
(I also added some of the missing fields to the (Input API flavor) JSON
Job documentation, but it still needs a lot of work to be
comprehensive.)
Co-authored-by: Tim Gross <tgross@hashicorp.com>
After a more detailed analysis of this feature, the approach taken in
PR #12449 was found to be not ideal due to poor UX (users are
responsible for setting the entity alias they would like to use) and
issues around jobs potentially masquerading itself as another Vault
entity.
* cli: add -json flag to support job commands
While the CLI has always supported running JSON jobs, its support has
been via HCLv2's JSON parsing. I have no idea what format it expects the
job to be in, but it's absolutely not in the same format as the API
expects.
So I ignored that and added a new -json flag to explicitly support *API*
style JSON jobspecs.
The jobspecs can even have the wrapping {"Job": {...}} envelope or not!
* docs: fix example for `nomad job validate`
We haven't been able to validate inside driver config stanzas ever since
the move to task driver plugins. 😭
The new `namespace apply` feature that allows for passing a namespace
specification file detects the difference between an empty namespace
and a namespace specification by checking if the file exists. For most
cases, the file will have an extension like `.hcl` and so there's
little danger that a user will apply a file spec when they intended to
apply a file name.
But because directory names typically don't include an extension,
you're much more likely to collide when trying to `namespace apply` by
name only, and then you get a confusing error message of the form:
Failed to read file: read $namespace: is a directory
Detect the case where the namespace name collides with a directory in
the current working directory, and skip trying to load the directory.