3371214431
This PR implements a new "System Batch" scheduler type. Jobs can use this new scheduler by setting their type to 'sysbatch'. As the name implies, sysbatch can be thought of as a hybrid between system and batch jobs: it is for running short-lived jobs intended to run on every compatible node in the cluster. As with batch jobs, sysbatch jobs can also be periodic and/or parameterized dispatch jobs. A sysbatch job is considered complete when it has run on all compatible nodes and each run has reached a terminal state (success, or failure after retries). Feasibility and preemption are governed the same way as for system jobs. In this PR, the update stanza is not yet supported. The update stanza is still limited in functionality for the underlying system scheduler, and is not yet useful for sysbatch jobs. Further work in #4740 will improve support for the update stanza and deployments. Closes #2527
---
layout: docs
page_title: Schedulers
description: Learn about Nomad's various schedulers.
---

# Schedulers

Nomad has four scheduler types that you can use when creating a job:
`service`, `batch`, `system`, and `sysbatch`. This page describes the
differences between these schedulers.

## Service

The `service` scheduler is designed for scheduling long-lived services that
should never go down. As such, the `service` scheduler ranks a large portion
of the nodes that meet the job's constraints and selects the optimal node to
place a task group on. The `service` scheduler uses a best-fit scoring algorithm
influenced by Google's work on [Borg]. Ranking this larger set of candidate
nodes increases scheduling time but provides greater guarantees about the
optimality of a job placement, which is highly desirable for service
workloads.

Service jobs are intended to run until explicitly stopped by an operator. If a
service task exits, it is considered a failure and handled according to the job's
[restart] and [reschedule] stanzas.

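As a minimal sketch of how a service job might combine these stanzas, consider
the job file below. The job, group, and task names, the Docker image, and all
of the timing values are illustrative assumptions rather than recommended
settings.

```hcl
# Hypothetical service job; names, image, and values are illustrative.
job "web" {
  datacenters = ["dc1"]
  type        = "service"

  group "frontend" {
    count = 3

    # Restart a failed task in place on the same node.
    restart {
      attempts = 2
      interval = "30m"
      delay    = "15s"
      mode     = "fail"
    }

    # Once local restarts are exhausted, try placing the allocation on
    # another node.
    reschedule {
      delay          = "30s"
      delay_function = "exponential"
      max_delay      = "1h"
      unlimited      = true
    }

    task "server" {
      driver = "docker"

      config {
        image = "nginx:1.25"
      }
    }
  }
}
```

With `mode = "fail"`, a task that exceeds its restart attempts within the
interval is marked as failed instead of being restarted again, which is what
allows the `reschedule` stanza to take over.
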
## Batch

Batch jobs are much less sensitive to short-term performance fluctuations and
are short-lived, finishing in a few minutes to a few days. Although the `batch`
scheduler is very similar to the `service` scheduler, it makes certain
optimizations for the batch workload. The main distinction is that after finding
the set of nodes that meet the job's constraints, it uses the power of two
choices described in Berkeley's [Sparrow] scheduler to limit the number of nodes
that are ranked.

Batch jobs are intended to run until they exit successfully. Batch tasks that
exit with an error are handled according to the job's [restart] and [reschedule]
stanzas.

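A batch job differs from a service job mainly in its `type` and in how a
successful exit is treated. The sketch below assumes a hypothetical report
script at `/usr/local/bin/generate-report`; the names and retry values are
placeholders chosen for illustration.

```hcl
# Hypothetical batch job; the command path and values are illustrative.
job "nightly-report" {
  datacenters = ["dc1"]
  type        = "batch"

  group "report" {
    # Retry a task that exits with an error on the same node.
    restart {
      attempts = 3
      interval = "24h"
      delay    = "15s"
      mode     = "fail"
    }

    # If restarts are exhausted, allow one placement on another node.
    reschedule {
      attempts  = 1
      interval  = "24h"
      unlimited = false
    }

    task "generate" {
      driver = "exec"

      config {
        command = "/usr/local/bin/generate-report"
      }
    }
  }
}
```
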
## System

The `system` scheduler is used to register jobs that should be run on all
clients that meet the job's constraints. The `system` scheduler is also invoked
when clients join the cluster or transition into the ready state. This means
that all registered `system` jobs will be re-evaluated and their tasks will be
placed on the newly available nodes if the constraints are met.

This scheduler type is extremely useful for deploying and managing tasks that
should be present on every node in the cluster. Since these tasks are
managed by Nomad, they can take advantage of job updating,
service discovery, and more.

Since Nomad 0.9, the system scheduler will preempt eligible lower priority
tasks running on a node if there isn't enough capacity to place a system job.
See [preemption] for details on how tasks that get preempted are chosen.

System jobs are intended to run until explicitly stopped, either by an operator
or by [preemption]. If a system task exits, it is considered a failure and
handled according to the job's [restart] stanza; system jobs do not have
rescheduling.

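The following sketch shows what a system job restricted by a constraint could
look like. The job name, image, priority, and restart values are assumptions
made for this example; the constraint limits placement to Linux clients.

```hcl
# Hypothetical system job; names, image, and values are illustrative.
job "node-exporter" {
  datacenters = ["dc1"]
  type        = "system"

  # A higher priority makes other, lower priority tasks candidates for
  # preemption when a node lacks capacity.
  priority = 80

  # Only clients that satisfy this constraint receive an allocation.
  constraint {
    attribute = "${attr.kernel.name}"
    value     = "linux"
  }

  group "monitoring" {
    # System jobs have no reschedule stanza; failures are handled here.
    restart {
      attempts = 2
      interval = "30m"
      delay    = "15s"
      mode     = "delay"
    }

    task "exporter" {
      driver = "docker"

      config {
        image = "prom/node-exporter:v1.3.1"
      }
    }
  }
}
```

No `count` is set on the group, because placement is driven by the set of
eligible clients rather than by a fixed number of instances.
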
## System Batch

The `sysbatch` scheduler is used to register jobs that should be run to completion
on all clients that meet the job's constraints. The `sysbatch` scheduler will
schedule jobs similarly to the `system` scheduler, but like a `batch` job, once a
task exits successfully it is not restarted on that client.

This scheduler type is useful for issuing "one off" commands to be run on every
node in the cluster. Sysbatch jobs can also be created as [periodic] and [parameterized]
jobs. Since these tasks are managed by Nomad, they can take advantage of job
updating, service discovery, monitoring, and more.

The `sysbatch` scheduler will preempt lower priority tasks running on a node if there
is not enough capacity to place the job. See [preemption] for details on how tasks
that get preempted are chosen.

Sysbatch jobs are intended to run until they complete successfully, are explicitly
stopped by an operator, or are evicted through [preemption]. Sysbatch tasks that
exit with an error are handled according to the job's [restart] stanza.

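As a rough sketch, a periodic sysbatch job that runs a cleanup command on every
eligible client might look like the following. The job name, schedule, and the
`/usr/local/bin/prune-cache` command are hypothetical and exist only to
illustrate the shape of the job file.

```hcl
# Hypothetical periodic sysbatch job; names, schedule, and command path
# are illustrative.
job "cache-cleanup" {
  datacenters = ["dc1"]
  type        = "sysbatch"

  # Launch a new run of the job on every eligible client each day at 03:00.
  periodic {
    cron             = "0 3 * * *"
    prohibit_overlap = true
  }

  group "cleanup" {
    # A task that exits with an error is retried according to this stanza;
    # a task that exits successfully is not restarted on that client.
    restart {
      attempts = 1
      interval = "1h"
      delay    = "15s"
      mode     = "fail"
    }

    task "prune" {
      driver = "exec"

      config {
        command = "/usr/local/bin/prune-cache"
      }
    }
  }
}
```
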
[borg]: https://research.google.com/pubs/pub43438.html
[parameterized]: /docs/job-specification/parameterized
[periodic]: /docs/job-specification/periodic
[preemption]: /docs/internals/scheduling/preemption
[restart]: /docs/job-specification/restart
[reschedule]: /docs/job-specification/reschedule
[sparrow]: https://cs.stanford.edu/~matei/papers/2013/sosp_sparrow.pdf