---
layout: docs
page_title: Scheduling
description: Learn about how scheduling works in Nomad.
---

# Scheduling in Nomad

[![Nomad Data Model][img-data-model]][img-data-model]

There are four primary "nouns" in Nomad: jobs, nodes, allocations, and
evaluations. Jobs are submitted by users and represent a _desired state_. A job
is a declarative description of tasks to run which are bounded by constraints
and require resources. Tasks can be scheduled on nodes in the cluster running
the Nomad client. The mapping of tasks in a job to clients is done using
allocations. An allocation is used to declare that a set of tasks in a job
should be run on a particular node. Scheduling is the process of determining
the appropriate allocations and is done as part of an evaluation.
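
The relationship between these nouns can be sketched as plain Go types. This is a minimal illustration that assumes hypothetical, heavily simplified field names; it is not Nomad's internal `structs` package, which carries many more fields.

```go
package main

import "fmt"

// Task is a unit of work with a resource ask (hypothetical, simplified).
type Task struct {
	Name     string
	CPUMHz   int
	MemoryMB int
}

// TaskGroup is a set of tasks that must run together on one node.
type TaskGroup struct {
	Name  string
	Count int // desired number of instances
	Tasks []Task
}

// Job is the user-submitted desired state.
type Job struct {
	ID         string
	Type       string // "service", "batch", or "system"
	Priority   int
	TaskGroups []TaskGroup
}

// Node is a client machine running the Nomad client.
type Node struct {
	ID       string
	CPUMHz   int
	MemoryMB int
}

// Allocation declares that one instance of a task group runs on one node.
type Allocation struct {
	ID        string
	JobID     string
	TaskGroup string
	NodeID    string
}

// Evaluation records that scheduling must run for a job, and why.
type Evaluation struct {
	ID          string
	JobID       string
	TriggeredBy string // illustrative values such as "job-register"
	Status      string // "pending" when created
}

func main() {
	job := Job{
		ID:       "web",
		Type:     "service",
		Priority: 50,
		TaskGroups: []TaskGroup{{
			Name:  "frontend",
			Count: 2,
			Tasks: []Task{{Name: "nginx", CPUMHz: 500, MemoryMB: 256}},
		}},
	}
	// Submitting the job creates a pending evaluation.
	eval := Evaluation{ID: "eval-1", JobID: job.ID, TriggeredBy: "job-register", Status: "pending"}
	// Scheduling yields one allocation per task group instance; node IDs are hard-coded here.
	for i := 0; i < job.TaskGroups[0].Count; i++ {
		alloc := Allocation{
			ID:        fmt.Sprintf("alloc-%d", i),
			JobID:     eval.JobID,
			TaskGroup: job.TaskGroups[0].Name,
			NodeID:    fmt.Sprintf("node-%d", i),
		}
		fmt.Println(alloc.ID, "->", alloc.NodeID)
	}
}
```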

An evaluation is created any time the external state, either desired or
emergent, changes. The desired state is based on jobs, meaning the desired
state changes if a new job is submitted, an existing job is updated, or a job
is deregistered. The emergent state is based on the client nodes, so it changes
when any client in the system fails, and Nomad must handle those failures. These events trigger the
creation of a new evaluation, as Nomad must _evaluate_ the state of the world
and reconcile it with the desired state.
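
A minimal sketch of how state-change events might map to new evaluations, assuming hypothetical event kinds (`job-register`, `job-deregister`, `node-update`) and treating a job update as a re-registration. A job change yields one evaluation for that job, while a node failure yields one evaluation per job that has allocations on the node.

```go
package main

import "fmt"

// Event is a hypothetical change to desired state (jobs) or emergent state (nodes).
type Event struct {
	Kind   string // "job-register", "job-deregister", or "node-update"
	JobID  string // set for job events
	NodeID string // set for node events
}

// Evaluation is a hypothetical, minimal evaluation record.
type Evaluation struct {
	JobID       string
	TriggeredBy string
	Status      string
}

// evalsFor returns the pending evaluations an event creates. A job change
// affects only that job; a node change affects every job with allocations
// on that node. jobsByNode maps a node ID to those job IDs.
func evalsFor(ev Event, jobsByNode map[string][]string) []Evaluation {
	switch ev.Kind {
	case "job-register", "job-deregister": // an update is modeled as a re-register
		return []Evaluation{{JobID: ev.JobID, TriggeredBy: ev.Kind, Status: "pending"}}
	case "node-update":
		var out []Evaluation
		for _, jobID := range jobsByNode[ev.NodeID] {
			out = append(out, Evaluation{JobID: jobID, TriggeredBy: ev.Kind, Status: "pending"})
		}
		return out
	}
	return nil
}

func main() {
	jobsByNode := map[string][]string{"node-1": {"web", "cache"}}
	fmt.Println(len(evalsFor(Event{Kind: "node-update", NodeID: "node-1"}, jobsByNode))) // 2
	fmt.Println(len(evalsFor(Event{Kind: "job-register", JobID: "api"}, jobsByNode)))    // 1
}
```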

This diagram shows the flow of an evaluation through Nomad:

[![Nomad Evaluation Flow][img-eval-flow]][img-eval-flow]

The lifecycle of an evaluation begins with an event causing the evaluation to
be created. Evaluations are created in the `pending` state and are enqueued
into the evaluation broker. There is a single evaluation broker which runs on
the leader server. The evaluation broker is used to manage the queue of pending
evaluations, provide priority ordering, and ensure at-least-once delivery.
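
The broker's two guarantees, priority ordering and at-least-once delivery, can be sketched with a heap plus a set of delivered-but-unacknowledged evaluations. This is a single-threaded illustration with hypothetical names; a real broker would also need locking and some way, such as a timeout, to redeliver evaluations whose worker disappears without acknowledging them.

```go
package main

import (
	"container/heap"
	"fmt"
)

// Eval is a hypothetical, minimal evaluation.
type Eval struct {
	ID       string
	Priority int
}

// evalHeap orders pending evaluations by priority, highest first.
type evalHeap []*Eval

func (h evalHeap) Len() int           { return len(h) }
func (h evalHeap) Less(i, j int) bool { return h[i].Priority > h[j].Priority }
func (h evalHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *evalHeap) Push(x any)        { *h = append(*h, x.(*Eval)) }
func (h *evalHeap) Pop() any {
	old := *h
	e := old[len(old)-1]
	*h = old[:len(old)-1]
	return e
}

// Broker keeps ready evaluations in priority order and tracks delivered but
// unacknowledged ones. A nack re-enqueues the evaluation, so every
// evaluation is delivered at least once.
type Broker struct {
	ready   evalHeap
	unacked map[string]*Eval
}

func NewBroker() *Broker { return &Broker{unacked: map[string]*Eval{}} }

// Enqueue adds a pending evaluation.
func (b *Broker) Enqueue(e *Eval) { heap.Push(&b.ready, e) }

// Dequeue hands the highest-priority evaluation to a worker.
func (b *Broker) Dequeue() (*Eval, bool) {
	if b.ready.Len() == 0 {
		return nil, false
	}
	e := heap.Pop(&b.ready).(*Eval)
	b.unacked[e.ID] = e
	return e, true
}

// Ack marks an evaluation as fully processed.
func (b *Broker) Ack(id string) { delete(b.unacked, id) }

// Nack returns an evaluation to the queue for redelivery.
func (b *Broker) Nack(id string) {
	if e, ok := b.unacked[id]; ok {
		delete(b.unacked, id)
		heap.Push(&b.ready, e)
	}
}

func main() {
	b := NewBroker()
	b.Enqueue(&Eval{ID: "low", Priority: 10})
	b.Enqueue(&Eval{ID: "high", Priority: 90})
	e, _ := b.Dequeue()
	fmt.Println(e.ID) // high
	b.Nack(e.ID)      // the worker failed, so the evaluation is redelivered
	e, _ = b.Dequeue()
	fmt.Println(e.ID) // high
	b.Ack(e.ID)
}
```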

Nomad servers run scheduling workers, defaulting to one per CPU core, which are
used to process evaluations. The workers dequeue evaluations from the broker,
and then invoke the appropriate scheduler as specified by the job. Nomad ships
with a `service` scheduler that optimizes for long-lived services, a `batch`
scheduler that is used for fast placement of batch jobs, a `system` scheduler
that is used to run jobs on every node, and a `core` scheduler which is used
for internal maintenance.
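
A sketch of the worker side, assuming a channel stands in for the evaluation broker and a hypothetical `Scheduler` interface with one implementation per job type. It starts one goroutine per CPU core via `runtime.NumCPU()` and dispatches each evaluation to the scheduler named by its job type; the map keys are illustrative.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// Scheduler is a hypothetical interface; each job type has its own implementation.
type Scheduler interface {
	Process(evalID string) string
}

type named string

func (n named) Process(evalID string) string { return string(n) + " scheduler handled " + evalID }

// schedulers maps a job type to the scheduler that handles it.
var schedulers = map[string]Scheduler{
	"service": named("service"),
	"batch":   named("batch"),
	"system":  named("system"),
	"core":    named("core"),
}

type work struct{ JobType, EvalID string }

// dispatch invokes the scheduler selected by the evaluation's job type.
func dispatch(w work) (string, error) {
	s, ok := schedulers[w.JobType]
	if !ok {
		return "", fmt.Errorf("no scheduler for job type %q", w.JobType)
	}
	return s.Process(w.EvalID), nil
}

func main() {
	queue := make(chan work) // stands in for the evaluation broker
	var wg sync.WaitGroup
	// Start one worker per CPU core, mirroring the default described above.
	for i := 0; i < runtime.NumCPU(); i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for w := range queue {
				out, err := dispatch(w)
				if err != nil {
					fmt.Println(err)
					continue
				}
				fmt.Println(out)
			}
		}()
	}
	for _, w := range []work{{"service", "e1"}, {"batch", "e2"}, {"system", "e3"}} {
		queue <- w
	}
	close(queue)
	wg.Wait()
}
```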

Schedulers are responsible for processing an evaluation and generating an
allocation _plan_. The plan is the set of allocations to evict, update, or
create. The specific logic used to generate a plan may vary by scheduler, but
generally the scheduler needs to first reconcile the desired state with the
real state to determine what must be done. New allocations need to be placed
and existing allocations may need to be updated, migrated, or stopped.
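
The reconcile step can be sketched as a diff between desired instance counts and existing allocations. This assumes a hypothetical `Plan` shape and treats any allocation created from an older job version as needing an update; migrations (for example, off a draining node) are omitted.

```go
package main

import "fmt"

// Alloc is a hypothetical existing allocation of a task group.
type Alloc struct {
	ID        string
	TaskGroup string
	Version   int // job version the allocation was created from
}

// Plan lists the changes the scheduler proposes (hypothetical shape).
type Plan struct {
	Place  map[string]int // task group -> number of new allocations
	Stop   []string       // allocation IDs to stop
	Update []string       // allocation IDs running an old job version
}

// reconcile compares desired counts per task group with existing allocations.
func reconcile(desired map[string]int, jobVersion int, existing []Alloc) Plan {
	p := Plan{Place: map[string]int{}}
	kept := map[string]int{}
	for _, a := range existing {
		want, ok := desired[a.TaskGroup]
		if !ok || kept[a.TaskGroup] >= want {
			p.Stop = append(p.Stop, a.ID) // group removed, or over its count
			continue
		}
		kept[a.TaskGroup]++
		if a.Version != jobVersion {
			p.Update = append(p.Update, a.ID)
		}
	}
	for tg, want := range desired {
		if missing := want - kept[tg]; missing > 0 {
			p.Place[tg] = missing
		}
	}
	return p
}

func main() {
	existing := []Alloc{{"a1", "web", 1}, {"a2", "web", 2}, {"a3", "old", 2}}
	p := reconcile(map[string]int{"web": 3}, 2, existing)
	fmt.Println(p.Place, p.Stop, p.Update) // map[web:1] [a3] [a1]
}
```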

Placing allocations is split into two distinct phases: feasibility checking and
ranking. In the first phase, the scheduler finds feasible nodes by filtering out
unhealthy nodes, nodes missing necessary drivers, and nodes failing the
specified constraints.
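
Feasibility checking is a filter over candidate nodes. The sketch below assumes a hypothetical `Node` type and models only equality constraints; the attribute name `kernel.name` is illustrative, and Nomad's constraint language supports more operators.

```go
package main

import "fmt"

// Node is a hypothetical, simplified view of a client node.
type Node struct {
	ID      string
	Healthy bool
	Drivers map[string]bool   // task drivers detected on the node
	Attrs   map[string]string // node attributes used by constraints
}

// Constraint requires an attribute to equal a value ("=" only in this sketch).
type Constraint struct{ Attr, Value string }

// feasible keeps nodes that are healthy, have every required driver, and
// satisfy every constraint.
func feasible(nodes []Node, drivers []string, cons []Constraint) []Node {
	var out []Node
next:
	for _, n := range nodes {
		if !n.Healthy {
			continue
		}
		for _, d := range drivers {
			if !n.Drivers[d] {
				continue next
			}
		}
		for _, c := range cons {
			if n.Attrs[c.Attr] != c.Value {
				continue next
			}
		}
		out = append(out, n)
	}
	return out
}

func main() {
	nodes := []Node{
		{"n1", true, map[string]bool{"docker": true}, map[string]string{"kernel.name": "linux"}},
		{"n2", false, map[string]bool{"docker": true}, map[string]string{"kernel.name": "linux"}},
		{"n3", true, map[string]bool{"exec": true}, map[string]string{"kernel.name": "linux"}},
		{"n4", true, map[string]bool{"docker": true}, map[string]string{"kernel.name": "windows"}},
	}
	for _, n := range feasible(nodes, []string{"docker"}, []Constraint{{"kernel.name", "linux"}}) {
		fmt.Println(n.ID) // only n1 passes all three checks
	}
}
```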

The second phase is ranking, where the scheduler scores feasible nodes to find
the best fit. Scoring is primarily based on bin packing, which is used to
optimize the resource utilization and density of applications, but is also
augmented by affinity and anti-affinity rules. Nomad automatically applies a job
anti-affinity rule which discourages colocating multiple instances of a task
group. The combination of this anti-affinity and bin packing optimizes for
density while reducing the probability of correlated failures.
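
A toy ranking function that combines bin packing with a job anti-affinity penalty. The scoring formula here (average CPU and memory utilization after placement, minus a fixed penalty per co-located instance of the task group) is an illustrative assumption, not Nomad's actual formula.

```go
package main

import "fmt"

// Candidate is a hypothetical feasible node with its current usage.
type Candidate struct {
	ID                 string
	CPUUsed, CPUTotal  int
	MemUsed, MemTotal  int
	SameGroupInstances int // instances of this task group already on the node
}

// score prefers fuller nodes (bin packing) and penalizes nodes already
// running this task group (job anti-affinity). It returns ok=false if the
// ask does not fit on the node.
func score(c Candidate, askCPU, askMem int, penalty float64) (float64, bool) {
	cpu, mem := c.CPUUsed+askCPU, c.MemUsed+askMem
	if cpu > c.CPUTotal || mem > c.MemTotal {
		return 0, false
	}
	binPack := (float64(cpu)/float64(c.CPUTotal) + float64(mem)/float64(c.MemTotal)) / 2
	return binPack - penalty*float64(c.SameGroupInstances), true
}

func main() {
	nodes := []Candidate{
		{"busy", 3000, 4000, 6000, 8192, 1}, // fullest, but already runs this group
		{"half", 2000, 4000, 4096, 8192, 0},
		{"idle", 0, 4000, 0, 8192, 0},
	}
	for _, n := range nodes {
		if s, ok := score(n, 500, 1024, 0.5); ok {
			fmt.Printf("%s %.3f\n", n.ID, s)
		}
	}
}
```

With a penalty of 0.5, `half` outranks `busy` even though `busy` would end up fuller, because `busy` already runs an instance of the group; `idle` scores lowest because bin packing favors denser nodes.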

Once the scheduler has ranked enough nodes, the highest ranking node is
selected and added to the allocation plan.
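
Because the scheduler only needs to rank "enough" nodes, selection can stop early. The sketch assumes a hypothetical `limit` parameter and scores candidates lazily in the order given, so nodes past the limit are never scored; the order in which candidates are visited (for example, shuffled) therefore affects the result.

```go
package main

import "fmt"

// selectBest scores at most limit nodes, in the order given, and returns the
// highest-scoring one. Stopping early trades placement quality for speed.
func selectBest(nodes []string, score func(string) float64, limit int) (string, bool) {
	best, bestScore, found := "", 0.0, false
	for i, n := range nodes {
		if i >= limit {
			break
		}
		if s := score(n); !found || s > bestScore {
			best, bestScore, found = n, s, true
		}
	}
	return best, found
}

func main() {
	scores := map[string]float64{"n1": 0.4, "n2": 0.7, "n3": 0.9}
	lookup := func(id string) float64 { return scores[id] }
	best, _ := selectBest([]string{"n1", "n2", "n3"}, lookup, 2)
	fmt.Println(best) // n2, because n3 is never examined with a limit of 2
}
```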

When planning is complete, the scheduler submits the plan to the leader which
adds the plan to the plan queue. The plan queue manages pending plans, provides
priority ordering, and allows Nomad to handle concurrency races. Multiple
schedulers run in parallel without locking or reservations, making Nomad
optimistically concurrent. As a result, schedulers might overlap work on the
same node and cause resource over-subscription. The plan queue allows the
leader to protect against this by rejecting a plan partially or completely.
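
The leader's conflict check can be sketched as re-validating each plan against current node capacity and committing only what still fits. This assumes hypothetical types, accepts or rejects each node's placements as a unit, and omits the plan queue's priority ordering.

```go
package main

import "fmt"

// NodeState is the leader's current view of a node's free capacity (hypothetical).
type NodeState struct{ FreeCPU, FreeMem int }

// Placement is one proposed allocation in a plan.
type Placement struct {
	NodeID     string
	CPU, MemMB int
}

// applyPlan commits placements on nodes that still have room for the plan's
// total demand and rejects the rest, so stale plans cannot over-subscribe.
func applyPlan(state map[string]*NodeState, plan []Placement) (committed, rejected []Placement) {
	// Sum the plan's demand per node, so each node is accepted or rejected as a unit.
	needCPU, needMem := map[string]int{}, map[string]int{}
	for _, p := range plan {
		needCPU[p.NodeID] += p.CPU
		needMem[p.NodeID] += p.MemMB
	}
	for _, p := range plan {
		n, ok := state[p.NodeID]
		if !ok || needCPU[p.NodeID] > n.FreeCPU || needMem[p.NodeID] > n.FreeMem {
			rejected = append(rejected, p)
			continue
		}
		committed = append(committed, p)
	}
	// Only committed placements consume capacity.
	for _, p := range committed {
		state[p.NodeID].FreeCPU -= p.CPU
		state[p.NodeID].FreeMem -= p.MemMB
	}
	return committed, rejected
}

func main() {
	state := map[string]*NodeState{"n1": {FreeCPU: 1000, FreeMem: 1024}, "n2": {FreeCPU: 1000, FreeMem: 1024}}
	// Two schedulers planned concurrently against the same snapshot.
	planA := []Placement{{"n1", 800, 512}}
	planB := []Placement{{"n1", 500, 256}, {"n2", 500, 256}}
	for _, plan := range [][]Placement{planA, planB} {
		ok, bad := applyPlan(state, plan)
		fmt.Println(len(ok), "committed,", len(bad), "rejected")
	}
}
```

Plan A commits in full; plan B, built from the same snapshot before plan A landed, has its `n1` placement rejected while its `n2` placement commits.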

As the leader processes plans, it creates allocations when there is no conflict
and otherwise informs the scheduler of a failure in the plan result. The plan
result provides feedback to the scheduler, allowing it to terminate or explore
alternate plans if the previous plan was partially or completely rejected.
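
On the scheduler side, the plan result drives a retry loop. This sketch assumes a hypothetical `PlanResult` that only counts placed and rejected allocations and a fixed attempt budget; a real scheduler would refresh its view of cluster state before re-planning.

```go
package main

import (
	"errors"
	"fmt"
)

// PlanResult reports how much of a submitted plan the leader accepted (hypothetical).
type PlanResult struct {
	Placed, Rejected int
}

// submitFunc stands in for sending a plan of `want` placements to the leader.
type submitFunc func(want int) PlanResult

// schedule re-plans the rejected remainder until everything is placed or
// the attempt budget runs out, then gives up.
func schedule(want, maxAttempts int, submit submitFunc) (int, error) {
	placed := 0
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		res := submit(want - placed)
		placed += res.Placed
		if res.Rejected == 0 {
			return placed, nil
		}
		// Partial or complete rejection: plan again for what is left.
	}
	return placed, errors.New("plan rejected too many times")
}

func main() {
	// Simulated leader: the first submission loses a race for one slot.
	calls := 0
	leader := func(want int) PlanResult {
		calls++
		if calls == 1 {
			return PlanResult{Placed: want - 1, Rejected: 1}
		}
		return PlanResult{Placed: want}
	}
	placed, err := schedule(3, 5, leader)
	fmt.Println(placed, err, calls) // 3 <nil> 2
}
```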

Once the scheduler has finished processing an evaluation, it updates the status
of the evaluation and acknowledges delivery with the evaluation broker. This
completes the lifecycle of an evaluation. Client nodes pick up any allocations
that were created, modified, or deleted as a result and start, update, or stop
the corresponding tasks.

[omega]: https://research.google.com/pubs/pub41684.html
[borg]: https://research.google.com/pubs/pub43438.html
[img-data-model]: /img/nomad-data-model.png
[img-eval-flow]: /img/nomad-evaluation-flow.png