During incident response, operators may find that automated processes
elsewhere in the organization can be generating new workloads on Nomad
clusters that are unable to handle the workload. This changeset adds a
field to the `SchedulerConfiguration` API that causes all job
registration calls to be rejected unless the request has a management
ACL token.
This PR implements a new "System Batch" scheduler type. Jobs can
make use of this new scheduler by setting their type to 'sysbatch'.
Like the name implies, sysbatch can be thought of as a hybrid between
system and batch jobs - it is for running short lived jobs intended to
run on every compatible node in the cluster.
As with batch jobs, sysbatch jobs can also be periodic and/or parameterized
dispatch jobs. A sysbatch job is considered complete when it has been run
on all compatible nodes until reaching a terminal state (success or failed
on retries).
Feasibility and preemption are governed the same as with system jobs. In
this PR, the update stanza is not yet supported. The update stanza is sill
limited in functionality for the underlying system scheduler, and is
not useful yet for sysbatch jobs. Further work in #4740 will improve
support for the update stanza and deployments.
Closes#2527
Cluster operators want to have better control over memory
oversubscription and may want to enable/disable it based on their
experience.
This PR adds a scheduler configuration field to control memory
oversubscription. It's additional field that can be set in the [API via Scheduler Config](https://www.nomadproject.io/api-docs/operator/scheduler), or [the agent server config](https://www.nomadproject.io/docs/configuration/server#configuring-scheduler-config).
I opted to have the memory oversubscription be an opt-in, but happy to change it. To enable it, operators should call the API with:
```json
{
"MemoryOversubscriptionEnabled": true
}
```
If memory oversubscription is disabled, submitting jobs specifying `memory_max` will get a "Memory oversubscription is not
enabled" warnings, but the jobs will be accepted without them accessing
the additional memory.
The warning message is like:
```
$ nomad job run /tmp/j
Job Warnings:
1 warning(s):
* Memory oversubscription is not enabled; Task cache.redis memory_max value will be ignored
==> Monitoring evaluation "7c444157"
Evaluation triggered by job "example"
==> Monitoring evaluation "7c444157"
Evaluation within deployment: "9d826f13"
Allocation "aa5c3cad" created: node "9272088e", group "cache"
Evaluation status changed: "pending" -> "complete"
==> Evaluation "7c444157" finished with status "complete"
# then you can examine the Alloc AllocatedResources to validate whether the task is allowed to exceed memory:
$ nomad alloc status -json aa5c3cad | jq '.AllocatedResources.Tasks["redis"].Memory'
{
"MemoryMB": 256,
"MemoryMaxMB": 0
}
```
Ensure that `""` Scheduler Algorithm gets explicitly set to binpack on
upgrades or on API handling when user misses the value.
The scheduler already treats `""` value as binpack. This PR merely
ensures that the operator API returns the effective value.