open-nomad/website/content/docs/concepts/architecture.mdx

---
layout: docs
page_title: Architecture
description: Learn about the internal architecture of Nomad.
---

# Architecture

Nomad is a complex system that has many different pieces. To help both users and developers of Nomad
build a mental model of how it works, this page documents the system architecture.

~> **Advanced Topic!** This page covers technical details
of Nomad. You do not need to understand these details to
effectively use Nomad. The details are documented here for
those who wish to learn about them without having to go
spelunking through the source code.

# Glossary

Before describing the architecture, we provide a glossary of terms to help
clarify what is being discussed:

#### Job

A Job is a specification provided by users that declares a workload for
Nomad. A Job is a form of _desired state_; the user is expressing that the
job should be running, but not where it should be run. The responsibility of
Nomad is to make sure the _actual state_ matches the user's desired state. A
Job is composed of one or more task groups.

#### Task Group

A Task Group is a set of tasks that must be run together. For example, a web
server may require that a log shipping co-process is always running as well. A
task group is the unit of scheduling, meaning the entire group must run on the
same client node and cannot be split.

#### Driver

A Driver represents the basic means of executing your **Tasks**.  Example
Drivers include Docker, QEMU, Java, and static binaries.

#### Task

A Task is the smallest unit of work in Nomad. Tasks are executed by drivers,
which allow Nomad to be flexible in the types of tasks it supports. Tasks
specify their driver, configuration for the driver, constraints, and resources
required.
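
The relationship between jobs, task groups, and tasks can be pictured as a small data model. The following is a minimal sketch, not Nomad's internal representation: the class names, the two resource fields, and the container image names are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Task:
    """Smallest unit of work; names the driver that executes it."""
    name: str
    driver: str                                   # e.g. "docker", "qemu", "java"
    config: Dict[str, str] = field(default_factory=dict)
    cpu_mhz: int = 100                            # illustrative resource dimensions
    memory_mb: int = 128


@dataclass
class TaskGroup:
    """Tasks that must run together on the same client node."""
    name: str
    tasks: List[Task]

    def resources(self) -> Tuple[int, int]:
        # The group is scheduled as a unit, so its ask is the sum of its tasks.
        return (sum(t.cpu_mhz for t in self.tasks),
                sum(t.memory_mb for t in self.tasks))


@dataclass
class Job:
    """Desired state: what should run, not where it should run."""
    name: str
    groups: List[TaskGroup]


# A web server with a log shipping co-process (hypothetical images).
web = Job("web", [TaskGroup("frontend", [
    Task("server", "docker", {"image": "nginx"}, cpu_mhz=500, memory_mb=256),
    Task("log-shipper", "docker", {"image": "fluentd"}, cpu_mhz=100, memory_mb=64),
])])

print(web.groups[0].resources())  # (600, 320)
```

Because the task group is the unit of scheduling, a client must have room for the summed resources of every task in the group.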

#### Client

A Nomad client is an agent configured to run and manage tasks using available
compute resources on a machine. The agent is responsible for registering with
the servers, watching for any work to be assigned, and executing tasks. The
Nomad agent is a long-lived process that interfaces with the servers.

#### Allocation

An Allocation is a mapping between a task group in a job and a client node. A
single job may have hundreds or thousands of task groups, meaning an
equivalent number of allocations must exist to map the work to client
machines. Allocations are created by the Nomad servers as part of scheduling
decisions made during an evaluation.

#### Evaluation

Evaluations are the mechanism by which Nomad makes scheduling decisions.  When
either the _desired state_ (jobs) or _actual state_ (clients) changes, Nomad
creates a new evaluation to determine if any actions must be taken. An
evaluation may result in changes to allocations if necessary.
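
To make the idea of reconciling desired and actual state concrete, here is a simplified sketch of what an evaluation might compute. It is not Nomad's scheduler: the `evaluate` function, the per-group desired count, and the dictionary layout of an allocation are all assumptions made for this example.

```python
from collections import Counter


def evaluate(desired, allocations, live_clients):
    """Compare desired state with actual state.

    desired:      task group name -> number of allocations wanted (assumed model)
    allocations:  list of {"id", "group", "client"} dicts
    live_clients: set of client names currently considered healthy
    Returns (place, stop): allocations to create per group, and IDs to stop.
    """
    # Allocations on lost clients no longer count toward actual state.
    running = [a for a in allocations if a["client"] in live_clients]

    seen = Counter()
    stop = []
    for alloc in running:
        group = alloc["group"]
        if seen[group] < desired.get(group, 0):
            seen[group] += 1
        else:
            stop.append(alloc["id"])      # not wanted, or surplus

    place = {g: want - seen[g] for g, want in desired.items() if want > seen[g]}
    return place, stop


desired = {"frontend": 3, "cache": 1}
allocations = [
    {"id": "a1", "group": "frontend", "client": "c1"},
    {"id": "a2", "group": "frontend", "client": "c2"},
    {"id": "a3", "group": "frontend", "client": "c3"},  # c3 has gone away
    {"id": "a4", "group": "batch", "client": "c1"},     # no longer desired
]

print(evaluate(desired, allocations, live_clients={"c1", "c2"}))
# ({'frontend': 1, 'cache': 1}, ['a4'])
```

The example shows both triggers described above: a change in desired state (the `batch` group was removed, `cache` was added) and a change in actual state (client `c3` was lost).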

#### Deployment

Deployments are the mechanism by which Nomad rolls out changes to cluster state
in a step-by-step fashion. Deployments are only available for Jobs with the type
`service`. When an Evaluation is processed, the scheduler creates only the
number of Allocations permitted by the [`update`][] block and the current state
of the cluster. The Deployment is used to monitor the health of those
Allocations and emit a new Evaluation for the next step of the update.
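
The step-by-step behavior can be sketched as a loop that replaces a bounded batch of allocations, checks their health, and only then moves on. This is an illustration under stated assumptions rather than the real deployment watcher: the `rolling_update` function and the `is_healthy` callback are hypothetical, the batch size is modeled on the `update` block's `max_parallel` parameter, and the status strings are illustrative.

```python
def rolling_update(total, max_parallel, is_healthy):
    """Replace `total` allocations in batches of at most `max_parallel`.

    is_healthy(index) stands in for the health monitoring a deployment
    performs. Returns (completed_batches, status).
    """
    updated = 0
    completed = []
    while updated < total:
        # One scheduling step: create only as many allocations as permitted.
        batch = list(range(updated, min(updated + max_parallel, total)))
        # Wait for the whole batch to become healthy before the next step.
        if not all(is_healthy(i) for i in batch):
            return completed, "failed"
        completed.append(batch)
        updated += len(batch)
    return completed, "successful"


print(rolling_update(5, 2, lambda i: True))
# ([[0, 1], [2, 3], [4]], 'successful')

print(rolling_update(5, 2, lambda i: i != 3))
# ([[0, 1]], 'failed')
```

In the second call the update halts after the first batch, leaving the remaining allocations on the previous version.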

#### Server

Nomad servers are the brains of the cluster. There is a cluster of servers per
region and they manage all jobs and clients, run evaluations, and create task
allocations.  The servers replicate data between each other and perform leader
election to ensure high availability. More information about latency
requirements for servers can be found in [Network
Topology](/nomad/docs/install/production/requirements#network-topology).

#### Regions

Nomad models infrastructure as regions and datacenters. A region will contain
one or more datacenters. A set of servers joined together will represent a
single region. Servers federate across regions to make Nomad globally aware.

#### Datacenters

Nomad models a datacenter as an abstract grouping of clients within a
region. Nomad clients are not required to be in the same datacenter as the
servers they are joined with, but do need to be in the same
region. Datacenters provide a way to express fault tolerance among jobs as
well as isolation of infrastructure.

#### Bin Packing

Bin Packing is the process of filling bins with items in a way that maximizes
the utilization of bins. This extends to Nomad, where the clients are "bins"
and the items are task groups. Nomad optimizes resources by efficiently bin
packing tasks onto client machines.
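
As a rough illustration of bin packing, the sketch below places a task group on the feasible client that would end up most fully utilized. The scoring formula here (average of CPU and memory utilization after placement) is an assumption for the example and is not Nomad's actual scoring function; the `best_fit` name and the client layout are likewise hypothetical.

```python
def best_fit(ask, clients):
    """Pick the client that is most utilized after placing `ask`.

    ask:     (cpu_mhz, memory_mb) required by the task group
    clients: name -> {"total": (cpu, mem), "used": (cpu, mem)}
    Returns the chosen client name, or None if nothing fits.
    """
    best, best_score = None, -1.0
    for name in sorted(clients):                 # sorted for deterministic ties
        total, used = clients[name]["total"], clients[name]["used"]
        cpu_after, mem_after = used[0] + ask[0], used[1] + ask[1]
        if cpu_after > total[0] or mem_after > total[1]:
            continue                             # would exhaust a dimension
        score = (cpu_after / total[0] + mem_after / total[1]) / 2
        if score > best_score:
            best, best_score = name, score
    return best


clients = {
    "a": {"total": (4000, 8192), "used": (3000, 6144)},
    "b": {"total": (4000, 8192), "used": (0, 0)},
    "c": {"total": (2000, 4096), "used": (1800, 4000)},
}

print(best_fit((500, 1024), clients))   # a
print(best_fit((2000, 4096), clients))  # b
print(best_fit((5000, 1), clients))     # None
```

Preferring the already busy client `a` for the small request keeps client `b` empty, so it remains available for the larger request that only it can hold.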

# High-Level Overview

Looking at only a single region, at a high level Nomad looks like this:

[![Regional Architecture](/img/nomad-architecture-region.png)](/img/nomad-architecture-region.png)

Within each region, we have both clients and servers. Servers are responsible for
accepting jobs from users, managing clients, and [computing task placements](/nomad/docs/concepts/scheduling/scheduling).
Each region may have clients from multiple datacenters, allowing a small number of servers
to handle very large clusters.

In some cases, for either availability or scalability, you may need to run multiple
regions. Nomad supports federating multiple regions together into a single cluster.
At a high level, this setup looks like this:

[![Global Architecture](/img/nomad-architecture-global.png)](/img/nomad-architecture-global.png)

Regions are fully independent from each other, and do not share jobs, clients, or
state. They are loosely coupled using a gossip protocol, which allows users to
submit jobs to any region or query the state of any region transparently. Requests
are forwarded to the appropriate server to be processed and the results returned.
Data is _not_ replicated between regions.

The servers in each region are all part of a single consensus group. This means
that they work together to elect a single leader which has extra duties. The leader
is responsible for processing all queries and transactions. Nomad is optimistically
concurrent, meaning all servers participate in making scheduling decisions in parallel.
The leader provides the additional coordination necessary to do this safely and
to ensure clients are not oversubscribed.
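
The optimistic approach can be pictured as follows: schedulers propose placements in parallel from possibly stale views of the cluster, and the leader checks each proposal against current capacity before committing it. This is a minimal sketch under assumptions; `apply_plans`, the single CPU dimension, and the tuple layout of a plan are invented for the example.

```python
def apply_plans(capacity, plans):
    """Serially validate plans that were computed in parallel.

    capacity: client name -> free CPU (MHz)
    plans:    list of (scheduler, client, cpu) in the order the leader sees them
    Returns (accepted, rejected, remaining_capacity).
    """
    free = dict(capacity)
    accepted, rejected = [], []
    for scheduler, client, cpu in plans:
        if free.get(client, 0) >= cpu:
            free[client] -= cpu            # commit the placement
            accepted.append(scheduler)
        else:
            rejected.append(scheduler)     # would oversubscribe the client
    return accepted, rejected, free


# Three schedulers all saw 1000 MHz free on c1 when they made their decisions.
plans = [("s1", "c1", 700), ("s2", "c1", 700), ("s3", "c1", 300)]
print(apply_plans({"c1": 1000}, plans))
# (['s1', 's3'], ['s2'], {'c1': 0})
```

The rejected scheduler `s2` made a decision that was valid for its snapshot but conflicts with `s1`; in this model it would need to retry against fresher state.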

Each region is expected to have either three or five servers. This strikes a balance
between availability in the case of failure and performance, as consensus gets
progressively slower as more servers are added. However, there is no limit to the number
of clients per region.
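
The trade-off behind three or five servers follows from majority-quorum arithmetic, assuming a consensus protocol that needs a majority of servers to agree. The helper names below are illustrative.

```python
def quorum(servers):
    """Smallest majority of a server group."""
    return servers // 2 + 1


def failure_tolerance(servers):
    """How many servers can fail while a majority remains."""
    return servers - quorum(servers)


for n in (1, 2, 3, 4, 5):
    print(n, quorum(n), failure_tolerance(n))
# 1 1 0
# 2 2 0
# 3 2 1
# 4 3 1
# 5 3 2
```

An even-sized group tolerates no more failures than the odd-sized group one smaller, which is why odd sizes are the usual choice: three servers survive one failure and five survive two.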

Clients are configured to communicate with their regional servers, using
remote procedure calls (RPC) to register themselves, send heartbeats for liveness,
wait for new allocations, and update the status of allocations. A client registers
with the servers to provide the resources available, attributes, and installed drivers.
Servers use this information for scheduling decisions and create allocations to assign
work to clients.
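
Heartbeat-based liveness can be sketched as a table of last-seen timestamps checked against a timeout. This is an illustration only: the `HeartbeatTracker` class and the fixed `ttl` are assumptions, and timestamps are passed in explicitly so the example stays deterministic.

```python
class HeartbeatTracker:
    """Tracks the last heartbeat time per client (hypothetical helper)."""

    def __init__(self, ttl):
        self.ttl = ttl          # seconds a client may stay silent
        self.last_seen = {}

    def heartbeat(self, client, now):
        # Registration and every later heartbeat refresh the timestamp.
        self.last_seen[client] = now

    def down(self, now):
        # Clients whose last heartbeat is older than the TTL are presumed lost.
        return sorted(c for c, t in self.last_seen.items() if now - t > self.ttl)


tracker = HeartbeatTracker(ttl=10)
tracker.heartbeat("c1", now=0)
tracker.heartbeat("c2", now=5)
tracker.heartbeat("c1", now=8)

print(tracker.down(now=16))  # ['c2']
```

A client that stops sending heartbeats is a change in actual state, so in this model the servers would then need to reschedule its work elsewhere.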

Users submit jobs to the servers with the Nomad CLI or API. A job represents
a desired state and provides the set of tasks that should be run. The servers are
responsible for scheduling the tasks, which is done by finding an optimal placement for
each task such that resource utilization is maximized while satisfying all constraints
specified by the job. Resource utilization is maximized by bin packing, in which
the scheduler tries to make use of all the resources of a machine without
exhausting any dimension. Job constraints can be used to ensure an application is
running in an appropriate environment. Constraints can be technical requirements based
on hardware features such as architecture and availability of GPUs, or software features
like operating system and kernel version, or they can be business constraints like
ensuring PCI-compliant workloads run on appropriate servers.
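
Constraint checking can be thought of as a filter applied before any placement scoring. The sketch below is a simplification that only supports equality matches; the attribute names (`arch`, `os`, `gpu`) and the `feasible` function are assumptions for illustration, not Nomad's attribute keys or API.

```python
def feasible(node, constraints):
    """Return True when every constraint matches the node's attributes."""
    return all(node.get(attr) == want for attr, want in constraints.items())


nodes = {
    "n1": {"arch": "amd64", "os": "linux", "gpu": True},
    "n2": {"arch": "arm64", "os": "linux", "gpu": False},
    "n3": {"arch": "amd64", "os": "windows", "gpu": False},
}

constraints = {"arch": "amd64", "os": "linux"}
candidates = [name for name, attrs in sorted(nodes.items())
              if feasible(attrs, constraints)]

print(candidates)  # ['n1']
```

Only the nodes that survive this filter would be considered for bin packing.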

# Getting in Depth

This has been a brief high-level overview of the architecture of Nomad. There
are more details available for each of the sub-systems. The [consensus protocol](/nomad/docs/concepts/consensus),
[gossip protocol](/nomad/docs/concepts/gossip), and [scheduler design](/nomad/docs/concepts/scheduling/scheduling)
are all documented in more detail.

For other details, consult the code, ask in IRC, or reach out to the mailing list.


[`update`]: /nomad/docs/job-specification/update