---
layout: docs
page_title: Schedulers
description: Learn about Nomad's various schedulers.
---

# Schedulers

Nomad has three scheduler types that you can use when creating a job:
`service`, `batch`, and `system`. This page describes the differences between
them.

## Service

The `service` scheduler is designed for scheduling long-lived services that
should never go down. As such, the `service` scheduler ranks a large portion
of the nodes that meet the job's constraints and selects the optimal node to
place a task group on. The `service` scheduler uses a best-fit scoring algorithm
influenced by Google's work on [Borg]. Ranking this larger set of candidate
nodes increases scheduling time but provides greater guarantees about the
optimality of a job placement, which is highly desirable for service
workloads.

Service jobs are intended to run until explicitly stopped by an operator. If a
service task exits, it is considered a failure and handled according to the
job's [restart] and [reschedule] stanzas.
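
As a minimal sketch of how this looks in a job file, the example below sets
`type = "service"` and configures both stanzas. The job, group, and task names,
the datacenter, the Docker image, and the specific restart and reschedule
values are assumptions chosen for illustration, not recommendations.

```hcl
# Hypothetical service job; all names and values are illustrative.
job "web" {
  datacenters = ["dc1"]
  type        = "service"

  group "frontend" {
    count = 2

    # Restart a failed task in place, on the same node.
    restart {
      attempts = 2
      interval = "30m"
      delay    = "15s"
      mode     = "fail"
    }

    # Once local restarts are exhausted, try placing the allocation elsewhere.
    reschedule {
      delay          = "30s"
      delay_function = "exponential"
      max_delay      = "1h"
      unlimited      = true
    }

    task "server" {
      driver = "docker"

      config {
        image = "nginx:1.25"
      }
    }
  }
}
```

With `mode = "fail"`, the allocation is marked failed after the restart
attempts are used up, which is what hands control over to the `reschedule`
stanza.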

## Batch

Batch jobs are much less sensitive to short-term performance fluctuations and
are short-lived, finishing in a few minutes to a few days. Although the `batch`
scheduler is very similar to the `service` scheduler, it makes certain
optimizations for the batch workload. The main distinction is that, after
finding the set of nodes that meet the job's constraints, it uses the power of
two choices described in Berkeley's [Sparrow] scheduler to limit the number of
nodes that are ranked.

Batch jobs are intended to run until they exit successfully. Batch tasks that
exit with an error are handled according to the job's [restart] and [reschedule]
stanzas.
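
A batch job differs from a service job mainly in its `type`. The sketch below
is one possible shape for such a job; the job, group, and task names, the
command path, and the restart and reschedule values are hypothetical and only
meant to show where each setting goes.

```hcl
# Hypothetical batch job; all names, paths, and values are illustrative.
job "nightly-report" {
  datacenters = ["dc1"]
  type        = "batch"

  group "report" {
    # Retry in place if the task exits with an error.
    restart {
      attempts = 3
      interval = "24h"
      delay    = "15s"
      mode     = "fail"
    }

    # Allow a single reschedule to another node within the interval.
    reschedule {
      attempts  = 1
      interval  = "24h"
      unlimited = false
    }

    task "generate" {
      driver = "exec"

      config {
        command = "/usr/local/bin/generate-report"
      }
    }
  }
}
```

Because the task is expected to finish, a successful exit completes the
allocation instead of triggering a restart.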

## System

The `system` scheduler is used to register jobs that should run on all
clients that meet the job's constraints. The `system` scheduler is also invoked
when clients join the cluster or transition into the ready state. This means
that all registered `system` jobs will be re-evaluated and their tasks will be
placed on the newly available nodes if the constraints are met.

This scheduler type is extremely useful for deploying and managing tasks that
should be present on every node in the cluster. Since these tasks are
managed by Nomad, they can take advantage of job updating,
service discovery, and more.

Since Nomad 0.9, the system scheduler will preempt eligible lower-priority
tasks running on a node if there isn't enough capacity to place a system job.
See [preemption] for details on how preempted tasks are chosen.

System jobs are intended to run until stopped, either explicitly by an operator
or by [preemption]. If a system task exits, it is considered a failure and
handled according to the job's [restart] stanza; system jobs do not support
rescheduling.
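
To make this concrete, here is a minimal sketch of a system job that runs one
instance on every Linux client. The job, group, and task names, the image, the
constraint, and the `priority` value are assumptions for illustration. Note
that there is no `count` and no `reschedule` stanza.

```hcl
# Hypothetical system job; all names and values are illustrative.
job "log-shipper" {
  datacenters = ["dc1"]
  type        = "system"

  # Job priority is what preemption compares; 80 is an arbitrary example.
  priority = 80

  # Only clients that satisfy this constraint receive an allocation.
  constraint {
    attribute = "${attr.kernel.name}"
    value     = "linux"
  }

  group "shipper" {
    # System jobs rely on restart alone; there is no reschedule stanza.
    restart {
      attempts = 2
      interval = "30m"
      delay    = "15s"
      mode     = "delay"
    }

    task "agent" {
      driver = "docker"

      config {
        image = "example/log-shipper:1.0"
      }
    }
  }
}
```

Using `mode = "delay"` here keeps Nomad retrying on the same node after the
attempts are exhausted, which suits a task that has nowhere else to go.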

[borg]: https://research.google.com/pubs/pub43438.html
[sparrow]: https://cs.stanford.edu/~matei/papers/2013/sosp_sparrow.pdf
[preemption]: /docs/internals/scheduling/preemption
[restart]: /docs/job-specification/restart
[reschedule]: /docs/job-specification/reschedule