---
layout: "docs"
page_title: "Scheduling"
sidebar_current: "docs-internals-scheduling"
description: |-
  Learn about how scheduling works in Nomad.
---

# Scheduling

Scheduling is a core function of Nomad. It is the process of assigning tasks
from jobs to client machines. This process must respect the constraints as declared
in the job, and optimize for resource utilization. This page documents the details
of how scheduling works in Nomad to help both users and developers
build a mental model. The design is heavily inspired by Google's
work on both [Omega: flexible, scalable schedulers for large compute clusters](https://research.google.com/pubs/pub41684.html)
and [Large-scale cluster management at Google with Borg](https://research.google.com/pubs/pub43438.html).

~> **Advanced Topic!** This page covers technical details
of Nomad. You do not need to understand these details to
effectively use Nomad. The details are documented here for
those who wish to learn about them without having to go
spelunking through the source code.

# Scheduling in Nomad

[![Nomad Data Model](/assets/images/nomad-data-model.png)](/assets/images/nomad-data-model.png)

There are four primary "nouns" in Nomad: jobs, nodes, allocations, and evaluations.
Jobs are submitted by users and represent a _desired state_. A job is a declarative description
of tasks to run, which are bounded by constraints and require resources. Tasks are scheduled on
nodes in the cluster that run the Nomad client. The mapping of tasks in a job to clients is done
using allocations. An allocation declares that a set of tasks in a job should be run
on a particular node. Scheduling is the process of determining the appropriate allocations and
is done as part of an evaluation.
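As a rough mental model, the four nouns can be written down as Go structs. This is a minimal sketch: every type, field, and the `Resources` helper below are hypothetical and far simpler than Nomad's real data structures.

```go
package main

// Hypothetical, heavily simplified versions of Nomad's four primary nouns.
// Field names are illustrative only; Nomad's real structs carry much more.

// Task is a unit of work that requires resources.
type Task struct {
	Name     string
	CPU      int // MHz
	MemoryMB int
}

// Job is a user's declarative desired state.
type Job struct {
	ID          string
	Type        string            // e.g. "service", "batch", "system"
	Constraints map[string]string // node attribute -> required value
	Tasks       []Task
}

// Node is a client machine running the Nomad client.
type Node struct {
	ID         string
	Attributes map[string]string
	CPU        int // total MHz
	MemoryMB   int // total MB
}

// Allocation declares that a set of a job's tasks should run on one node.
type Allocation struct {
	ID     string
	JobID  string
	NodeID string
	Tasks  []Task
}

// Evaluation is the unit of scheduling work that produces allocations.
type Evaluation struct {
	ID          string
	JobID       string
	TriggeredBy string // what changed, e.g. "job-register"
	Status      string // e.g. "pending", "complete"
}

// Resources returns the total resources an allocation asks of its node.
func (a Allocation) Resources() (cpu, memoryMB int) {
	for _, t := range a.Tasks {
		cpu += t.CPU
		memoryMB += t.MemoryMB
	}
	return cpu, memoryMB
}
```

Note that a job never names a node directly; only an allocation links the two, which is why scheduling is phrased as "determining the appropriate allocations".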

An evaluation is created any time the external state, either desired or emergent, changes. The desired
state is based on jobs, meaning the desired state changes if a new job is submitted, an
existing job is updated, or a job is deregistered. The emergent state is based on the client
nodes, so Nomad must also handle the failure of any client in the system. These events trigger
the creation of a new evaluation, as Nomad must _evaluate_ the state of the world and reconcile
it with the desired state.
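The two sources of change can be sketched as two small functions. This assumes hypothetical trigger labels and a `jobsByNode` index (node ID to the job IDs of its allocations); it only illustrates that a job change yields one evaluation, while a node failure yields one evaluation per affected job.

```go
package main

// Hypothetical trigger labels; Nomad's real values may differ.
const (
	TriggerJobRegister   = "job-register"
	TriggerJobUpdate     = "job-update"
	TriggerJobDeregister = "job-deregister"
	TriggerNodeFailure   = "node-failure"
)

// Evaluation is a pared-down, hypothetical evaluation record.
type Evaluation struct {
	JobID       string
	TriggeredBy string
	Status      string
}

// EvalForJobChange handles a change in desired state: submitting, updating,
// or deregistering a job yields one pending evaluation for that job.
func EvalForJobChange(jobID, trigger string) Evaluation {
	return Evaluation{JobID: jobID, TriggeredBy: trigger, Status: "pending"}
}

// EvalsForNodeFailure handles a change in emergent state: every job with an
// allocation on the failed node gets exactly one evaluation, even if it had
// several allocations there.
func EvalsForNodeFailure(nodeID string, jobsByNode map[string][]string) []Evaluation {
	seen := make(map[string]bool)
	var evals []Evaluation
	for _, jobID := range jobsByNode[nodeID] {
		if seen[jobID] {
			continue
		}
		seen[jobID] = true
		evals = append(evals, Evaluation{JobID: jobID, TriggeredBy: TriggerNodeFailure, Status: "pending"})
	}
	return evals
}
```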

This diagram shows the flow of an evaluation through Nomad:

[![Nomad Evaluation Flow](/assets/images/nomad-evaluation-flow.png)](/assets/images/nomad-evaluation-flow.png)

The lifecycle of an evaluation begins with an event causing the evaluation to be
created. Evaluations are created in the `pending` state and are enqueued into the
evaluation broker. There is a single evaluation broker, which runs on the leader server.
The evaluation broker manages the queue of pending evaluations, provides priority ordering,
and ensures at-least-once delivery.
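The broker's two guarantees, priority ordering and at-least-once delivery, can be illustrated with a priority queue plus a set of outstanding (unacknowledged) evaluations. This is a single-threaded sketch with hypothetical names (`Broker`, `Ack`, `Nack`); it omits the blocking dequeues, timeouts, and locking a real broker needs.

```go
package main

import (
	"container/heap"
	"errors"
)

// Evaluation is a hypothetical queued evaluation with a job priority.
type Evaluation struct {
	ID       string
	Priority int
}

// evalHeap orders evaluations so the highest priority is popped first.
type evalHeap []Evaluation

func (h evalHeap) Len() int           { return len(h) }
func (h evalHeap) Less(i, j int) bool { return h[i].Priority > h[j].Priority }
func (h evalHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *evalHeap) Push(x any)        { *h = append(*h, x.(Evaluation)) }
func (h *evalHeap) Pop() any {
	old := *h
	e := old[len(old)-1]
	*h = old[:len(old)-1]
	return e
}

// Broker is a hypothetical in-memory evaluation broker.
type Broker struct {
	ready   evalHeap
	unacked map[string]Evaluation // handed to a worker but not yet acknowledged
}

func NewBroker() *Broker { return &Broker{unacked: make(map[string]Evaluation)} }

// Enqueue adds a pending evaluation.
func (b *Broker) Enqueue(e Evaluation) { heap.Push(&b.ready, e) }

// Dequeue returns the highest-priority evaluation and tracks it until acked.
func (b *Broker) Dequeue() (Evaluation, bool) {
	if b.ready.Len() == 0 {
		return Evaluation{}, false
	}
	e := heap.Pop(&b.ready).(Evaluation)
	b.unacked[e.ID] = e
	return e, true
}

// Ack confirms processing; the evaluation will not be delivered again.
func (b *Broker) Ack(id string) error {
	if _, ok := b.unacked[id]; !ok {
		return errors.New("evaluation not outstanding")
	}
	delete(b.unacked, id)
	return nil
}

// Nack puts an outstanding evaluation back in the queue for redelivery.
func (b *Broker) Nack(id string) error {
	e, ok := b.unacked[id]
	if !ok {
		return errors.New("evaluation not outstanding")
	}
	delete(b.unacked, id)
	heap.Push(&b.ready, e)
	return nil
}
```

Because an evaluation leaves the broker only on `Ack`, a worker that fails (here, one that calls `Nack`) cannot lose it; the trade-off is that an evaluation may be processed more than once.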

Nomad servers run scheduling workers, defaulting to one per CPU core, which
process evaluations. The workers dequeue evaluations from the broker and then invoke
the appropriate scheduler as specified by the job. Nomad ships with a `service` scheduler
that optimizes for long-lived services, a `batch` scheduler that is used for fast placement
of batch jobs, a `system` scheduler that is used to run jobs on every node,
and a `core` scheduler that is used for internal maintenance.
Nomad can be extended to support custom schedulers as well.
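Dispatching by job type can be sketched as a map from type name to a scheduler implementation, drained by a pool of concurrent workers. The `Scheduler` interface, the `stubScheduler`, and the map keys are hypothetical stand-ins; adding a map entry is how this sketch models a custom scheduler.

```go
package main

import (
	"fmt"
	"runtime"
	"sort"
	"sync"
)

// Evaluation carries the type of the job it concerns (hypothetical shape).
type Evaluation struct {
	ID      string
	JobType string
}

// Scheduler turns an evaluation into a plan; the stub just reports its work.
type Scheduler interface {
	Process(eval Evaluation) (string, error)
}

type stubScheduler struct{ name string }

func (s stubScheduler) Process(eval Evaluation) (string, error) {
	return fmt.Sprintf("%s scheduler processed %s", s.name, eval.ID), nil
}

// schedulers maps a job type to its scheduler.
var schedulers = map[string]Scheduler{
	"service": stubScheduler{"service"},
	"batch":   stubScheduler{"batch"},
	"system":  stubScheduler{"system"},
	"core":    stubScheduler{"core"},
}

// Dispatch invokes the scheduler matching the evaluation's job type.
func Dispatch(eval Evaluation) (string, error) {
	s, ok := schedulers[eval.JobType]
	if !ok {
		return "", fmt.Errorf("no scheduler registered for job type %q", eval.JobType)
	}
	return s.Process(eval)
}

// DefaultWorkerCount mirrors the documented default of one worker per core.
func DefaultWorkerCount() int { return runtime.NumCPU() }

// RunWorkers drains evals with n concurrent workers and returns sorted results.
func RunWorkers(evals []Evaluation, n int) []string {
	if n < 1 {
		n = 1 // at least one worker, or nothing would ever be dequeued
	}
	queue := make(chan Evaluation)
	results := make(chan string, len(evals))
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for e := range queue {
				out, err := Dispatch(e)
				if err != nil {
					out = "error: " + err.Error()
				}
				results <- out
			}
		}()
	}
	for _, e := range evals {
		queue <- e
	}
	close(queue)
	wg.Wait()
	close(results)
	var out []string
	for r := range results {
		out = append(out, r)
	}
	sort.Strings(out) // worker completion order is nondeterministic
	return out
}
```

A caller would typically use `RunWorkers(evals, DefaultWorkerCount())`.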

Schedulers are responsible for processing an evaluation and generating an allocation _plan_.
The plan is the set of allocations to evict, update, or create. The specific logic used to
generate a plan may vary by scheduler, but generally the scheduler needs to first reconcile
the desired state with the real state to determine what must be done. New allocations need
to be placed and existing allocations may need to be updated, migrated, or stopped.
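The reconcile step can be sketched for a single group of identical tasks: compare the desired count and job version against the existing allocations. This is a simplified illustration with hypothetical types (`Allocation`, `Plan`, `Reconcile`); it leaves out migrations, which would require node state such as draining.

```go
package main

// Allocation is a hypothetical existing allocation tagged with the job
// version it was created from.
type Allocation struct {
	ID         string
	JobVersion int
}

// Plan is a hypothetical allocation plan.
type Plan struct {
	Place  int          // number of new allocations to create
	Update []Allocation // allocations running an outdated job version
	Stop   []Allocation // surplus allocations to evict
}

// Reconcile diffs desired state (count and version) against real state.
func Reconcile(desiredCount, jobVersion int, existing []Allocation) Plan {
	var p Plan
	for i, a := range existing {
		switch {
		case i >= desiredCount:
			p.Stop = append(p.Stop, a) // more allocations than desired
		case a.JobVersion != jobVersion:
			p.Update = append(p.Update, a) // kept, but must change
		}
	}
	if missing := desiredCount - len(existing); missing > 0 {
		p.Place = missing
	}
	return p
}
```

Only the `Place` count needs the feasibility and ranking phases described next; updates and stops act on allocations that already have a node.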

Placing allocations is split into two distinct phases: feasibility
checking and ranking. In the first phase, the scheduler finds feasible nodes
by filtering out unhealthy nodes, nodes missing the necessary drivers, and nodes
failing the specified constraints.
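Feasibility checking is a pure filter. The sketch below assumes a hypothetical `Node` shape and models constraints as simple attribute equality (real constraints support more operators); the attribute key `kernel.name` is illustrative.

```go
package main

// Node is a hypothetical client view used only for feasibility checks.
type Node struct {
	ID         string
	Healthy    bool
	Drivers    map[string]bool   // task drivers the node reports, e.g. "docker"
	Attributes map[string]string // e.g. "kernel.name" -> "linux"
}

// Constraint is a hypothetical equality constraint on a node attribute.
type Constraint struct {
	Attribute string
	Value     string
}

// FeasibleNodes keeps nodes that are healthy, have the task's driver, and
// satisfy every constraint. It does not score or order the survivors.
func FeasibleNodes(nodes []Node, driver string, constraints []Constraint) []Node {
	var out []Node
nodeLoop:
	for _, n := range nodes {
		if !n.Healthy || !n.Drivers[driver] {
			continue
		}
		for _, c := range constraints {
			if n.Attributes[c.Attribute] != c.Value {
				continue nodeLoop
			}
		}
		out = append(out, n)
	}
	return out
}
```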

The second phase is ranking, where the scheduler scores feasible nodes to find the best fit.
Scoring is primarily based on bin packing, which is used to optimize the resource utilization
and density of applications, but is also augmented by affinity and anti-affinity rules.
Once the scheduler has ranked enough nodes, the highest-ranking node is selected and
added to the allocation plan.
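Ranking can be illustrated with a simple bin-packing score: prefer the node that would be fullest after placement, then adjust with an affinity bonus. This is a sketch under stated assumptions: the averaged-utilization formula, the rack-based affinity, and the `limit` cut-off are hypothetical choices, not Nomad's actual scoring function.

```go
package main

// Node is a hypothetical client view with capacity and current usage.
type Node struct {
	ID            string
	Rack          string
	CPU, MemoryMB int // capacity
	UsedCPU       int
	UsedMemoryMB  int
}

// Ask is the resources a new allocation needs.
type Ask struct{ CPU, MemoryMB int }

// binPackScore is the mean fraction of CPU and memory in use after placing
// ask, so fuller nodes score higher. It returns -1 if ask does not fit.
func binPackScore(n Node, ask Ask) float64 {
	cpu := n.UsedCPU + ask.CPU
	mem := n.UsedMemoryMB + ask.MemoryMB
	if cpu > n.CPU || mem > n.MemoryMB {
		return -1
	}
	return (float64(cpu)/float64(n.CPU) + float64(mem)/float64(n.MemoryMB)) / 2
}

// BestNode scores at most limit nodes and returns the highest scorer. Nodes
// on affinityRack get weight added; a negative weight acts as anti-affinity.
func BestNode(nodes []Node, ask Ask, limit int, affinityRack string, weight float64) (Node, bool) {
	var best Node
	bestScore, found := 0.0, false
	for i, n := range nodes {
		if i >= limit {
			break // "ranked enough nodes": stop instead of scoring every node
		}
		score := binPackScore(n, ask)
		if score < 0 {
			continue
		}
		if n.Rack == affinityRack {
			score += weight
		}
		if !found || score > bestScore {
			best, bestScore, found = n, score, true
		}
	}
	return best, found
}
```

Favoring fuller nodes keeps other nodes empty for large future placements, which is the density goal bin packing serves; the affinity weight lets placement preferences override that when they matter more.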

When planning is complete, the scheduler submits the plan to the leader which adds
the plan to the plan queue. The plan queue manages pending plans, provides priority
ordering, and allows Nomad to handle concurrency races. Multiple schedulers run
in parallel without locking or reservations, making Nomad optimistically concurrent.
As a result, schedulers might overlap work on the same node and cause resource
over-subscription. The plan queue allows the leader to protect against this by
partially or completely rejecting a plan.
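The leader's protection against over-subscription amounts to re-checking each placement against its own up-to-date view before committing it. This sketch uses hypothetical types and checks placements one at a time, so a plan can be partially accepted; priority ordering of the queue is omitted.

```go
package main

// Placement is a hypothetical request to put resources on a node.
type Placement struct {
	NodeID   string
	CPU      int
	MemoryMB int
}

// NodeState is the leader's authoritative capacity and usage for a node.
type NodeState struct {
	CPU, MemoryMB         int
	UsedCPU, UsedMemoryMB int
}

// PlanResult tells the scheduler which placements were committed.
type PlanResult struct {
	Accepted []Placement
	Rejected []Placement
}

// ApplyPlan commits placements that still fit and rejects the rest. A
// scheduler planned against a possibly stale snapshot, so every placement is
// re-validated here.
func ApplyPlan(state map[string]*NodeState, plan []Placement) PlanResult {
	var r PlanResult
	for _, p := range plan {
		n, ok := state[p.NodeID]
		if !ok || n.UsedCPU+p.CPU > n.CPU || n.UsedMemoryMB+p.MemoryMB > n.MemoryMB {
			r.Rejected = append(r.Rejected, p)
			continue
		}
		n.UsedCPU += p.CPU
		n.UsedMemoryMB += p.MemoryMB
		r.Accepted = append(r.Accepted, p)
	}
	return r
}
```

If two schedulers both saw a node as empty and each planned 600 MHz on a 1000 MHz node, the leader commits whichever plan it processes first and rejects the conflicting placement in the second.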

As the leader processes plans, it creates allocations when there is no conflict
and otherwise informs the scheduler of a failure in the plan result. The plan result
provides feedback to the scheduler, allowing it to terminate or explore alternate plans
if the previous plan was partially or completely rejected.
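On the scheduler side, the plan result drives a retry loop: replan only what was rejected, and give up after a bounded number of attempts. This is a minimal sketch; the `Planner` and `Submitter` function types and the attempt limit are hypothetical.

```go
package main

import "errors"

// Placement is a hypothetical task-to-node assignment.
type Placement struct {
	Task   string
	NodeID string
}

// PlanResult reports which placements the leader refused.
type PlanResult struct {
	Rejected []Placement
}

// Planner builds placements for tasks from the scheduler's current state.
type Planner func(tasks []string) []Placement

// Submitter sends a plan to the leader and returns its verdict.
type Submitter func(plan []Placement) PlanResult

// ErrGaveUp is returned when the scheduler stops exploring alternate plans.
var ErrGaveUp = errors.New("plan still rejected after max attempts")

// ScheduleWithRetry submits a plan and, on partial or complete rejection,
// replans only the rejected tasks, up to maxAttempts submissions in total.
func ScheduleWithRetry(tasks []string, plan Planner, submit Submitter, maxAttempts int) error {
	pending := tasks
	for attempt := 0; attempt < maxAttempts; attempt++ {
		result := submit(plan(pending))
		if len(result.Rejected) == 0 {
			return nil
		}
		next := make([]string, 0, len(result.Rejected))
		for _, p := range result.Rejected {
			next = append(next, p.Task)
		}
		pending = next
	}
	return ErrGaveUp
}
```

In practice the planner would refresh its state snapshot before each attempt, so the second plan reflects the allocations that caused the first rejection.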

Once the scheduler has finished processing an evaluation, it updates the status of
the evaluation and acknowledges delivery with the evaluation broker. This completes
the lifecycle of an evaluation. Client nodes then pick up any allocations that were
created, modified, or deleted as a result, and start, update, or stop their tasks accordingly.
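The final step can be sketched as a small helper a worker calls when it is done. The `Acker` interface, the `"complete"` status string, and the `recordingBroker` stand-in are hypothetical; the point is that only a successful evaluation is acknowledged, while a failed one is handed back to the broker for redelivery.

```go
package main

// Acker is the part of a hypothetical broker a worker needs when it finishes.
type Acker interface {
	Ack(evalID string) error
	Nack(evalID string) error
}

// Evaluation is a pared-down, hypothetical evaluation record.
type Evaluation struct {
	ID     string
	Status string
}

// FinishEval records the outcome and settles delivery with the broker. On
// success the status becomes "complete" and the evaluation is acked; on
// failure it is nacked so the broker can redeliver it (at-least-once).
func FinishEval(b Acker, eval *Evaluation, succeeded bool) error {
	if !succeeded {
		return b.Nack(eval.ID)
	}
	eval.Status = "complete"
	return b.Ack(eval.ID)
}

// recordingBroker is an in-memory stand-in that just records calls.
type recordingBroker struct {
	acked, nacked []string
}

func (r *recordingBroker) Ack(id string) error  { r.acked = append(r.acked, id); return nil }
func (r *recordingBroker) Nack(id string) error { r.nacked = append(r.nacked, id); return nil }
```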