---
layout: docs
page_title: Architecture
description: Learn about the internal architecture of Nomad.
---
# Architecture
Nomad is a complex system that has many different pieces. To help both users and developers of Nomad
build a mental model of how it works, this page documents the system architecture.
~> **Advanced Topic!** This page covers technical details
of Nomad. You do not need to understand these details to
effectively use Nomad. The details are documented here for
those who wish to learn about them without having to go
spelunking through the source code.
# Glossary
Before describing the architecture, we provide a glossary of terms to help
clarify what is being discussed:
#### Job
A Job is a specification provided by users that declares a workload for
Nomad. A Job is a form of _desired state_; the user is expressing that the
job should be running, but not where it should be run. The responsibility of
Nomad is to make sure the _actual state_ matches the user desired state. A
Job is composed of one or more task groups.
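
For illustration, a minimal job specification might look like the following
sketch (the job name, image, and datacenter are placeholders):

```hcl
# The user declares *what* should run; Nomad decides *where* it runs.
job "example" {
  datacenters = ["dc1"]

  group "cache" {
    task "redis" {
      driver = "docker"

      config {
        image = "redis:7"
      }
    }
  }
}
```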
#### Task Group
A Task Group is a set of tasks that must be run together. For example, a web
server may require that a log shipping co-process is always running as well. A
task group is the unit of scheduling, meaning the entire group must run on the
same client node and cannot be split.
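
As a sketch, a group pairing a web server with a log shipping co-process could
look like the fragment below (task names and images are illustrative):

```hcl
# Both tasks are always placed together on one client node, never split.
group "web" {
  task "server" {
    driver = "docker"

    config {
      image = "nginx:1.27"
    }
  }

  task "log-shipper" {
    driver = "docker"

    config {
      image = "fluent/fluent-bit:3.0"
    }
  }
}
```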
#### Driver
A Driver represents the basic means of executing your **Tasks**. Example
Drivers include Docker, QEMU, Java, and static binaries.
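
Each task selects its driver with the `driver` field, and the shape of the
`config` block depends on the driver chosen. Two illustrative fragments (the
image and binary path are hypothetical):

```hcl
# Docker driver: the task is a container image.
task "api" {
  driver = "docker"

  config {
    image = "example/api:1.0"
  }
}

# Exec driver: the task is a binary run directly on the client.
task "worker" {
  driver = "exec"

  config {
    command = "/usr/local/bin/worker"
  }
}
```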
#### Task
A Task is the smallest unit of work in Nomad. Tasks are executed by drivers,
which allow Nomad to be flexible in the types of tasks it supports. Tasks
specify their driver, configuration for the driver, constraints, and resources
required.
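
A sketch of a task declaring all four pieces (the values are examples only):

```hcl
task "render" {
  # How the work is executed.
  driver = "docker"

  # Driver-specific configuration.
  config {
    image = "example/render:2.1"
  }

  # Where the task may be placed.
  constraint {
    attribute = "${attr.kernel.name}"
    value     = "linux"
  }

  # What the task consumes.
  resources {
    cpu    = 500 # MHz
    memory = 256 # MB
  }
}
```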
#### Client
A Nomad client is an agent configured to run and manage tasks using available
compute resources on a machine. The agent is responsible for registering with
the servers, watching for any work to be assigned, and executing tasks. The
Nomad agent is a long-lived process which interfaces with the servers.
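
A client is simply the Nomad agent started with client mode enabled. A minimal
configuration sketch (paths and addresses are placeholders):

```hcl
# client.hcl -- start with: nomad agent -config=client.hcl
datacenter = "dc1"
data_dir   = "/opt/nomad/data"

client {
  enabled = true

  # Servers this client registers with and heartbeats to.
  servers = ["10.0.0.10:4647"]
}
```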
#### Allocation
An Allocation is a mapping between a task group in a job and a client node. A
single job may have hundreds or thousands of task groups, meaning an
equivalent number of allocations must exist to map the work to client
machines. Allocations are created by the Nomad servers as part of scheduling
decisions made during an evaluation.
#### Evaluation
Evaluations are the mechanism by which Nomad makes scheduling decisions. When
either the _desired state_ (jobs) or _actual state_ (clients) changes, Nomad
creates a new evaluation to determine if any actions must be taken. An
evaluation may result in changes to allocations if necessary.
#### Server
Nomad servers are the brains of the cluster. There is a cluster of servers per
region and they manage all jobs and clients, run evaluations, and create task
allocations. The servers replicate data between each other and perform leader
election to ensure high availability. More information about latency
requirements for servers can be found in [Network
Topology](/docs/install/production/requirements#network-topology).
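
Server mode is enabled the same way, in the agent configuration. A sketch for
one member of a three-server cluster (the path is a placeholder):

```hcl
# server.hcl -- start with: nomad agent -config=server.hcl
data_dir = "/opt/nomad/data"

server {
  enabled = true

  # Wait for three servers to join before electing a leader.
  bootstrap_expect = 3
}
```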
#### Regions
Nomad models infrastructure as regions and datacenters. A region will contain
one or more datacenters. A set of servers joined together will represent a
single region. Servers federate across regions to make Nomad globally aware.
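
The region an agent belongs to is set in its configuration; agents default to
the `global` region when none is given. A sketch (names are illustrative):

```hcl
# Part of an agent configuration file.
region     = "eu-west"
datacenter = "eu-west-1a"
```

Federation is then established by joining servers across regions, for example
with `nomad server join` against a server in another region.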
#### Datacenters
Nomad models a datacenter as an abstract grouping of clients within a
region. Nomad clients are not required to be in the same datacenter as the
servers they are joined with, but do need to be in the same
region. Datacenters provide a way to express fault tolerance among jobs as
well as isolation of infrastructure.
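
A job opts into one or more datacenters through its `datacenters` list, which
is one way to spread work across failure domains (names are illustrative):

```hcl
job "frontend" {
  # Eligible for placement in either datacenter within the region.
  datacenters = ["us-east-1a", "us-east-1b"]

  group "web" {
    task "server" {
      driver = "docker"

      config {
        image = "nginx:1.27"
      }
    }
  }
}
```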
#### Bin Packing
Bin Packing is the process of filling bins with items in a way that maximizes
the utilization of bins. This extends to Nomad, where the clients are "bins"
and the items are task groups. Nomad optimizes resources by efficiently bin
packing tasks onto client machines.
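
In scheduling terms, a client's fingerprinted capacity is the bin, and each
task's `resources` block gives the item's size along several dimensions (the
numbers below are arbitrary):

```hcl
resources {
  cpu    = 500 # MHz of CPU consumed by the task
  memory = 256 # MB of RAM consumed by the task
}
```

A client with 4,000 MHz of CPU and 4,096 MB of RAM could hold at most eight
such tasks before exhausting the CPU dimension, even though only half of its
memory would be used at that point.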
# High-Level Overview
Looking at only a single region, at a high level Nomad looks like this:
[![Regional Architecture](/img/nomad-architecture-region.png)](/img/nomad-architecture-region.png)
Within each region, we have both clients and servers. Servers are responsible for
accepting jobs from users, managing clients, and [computing task placements](/docs/internals/scheduling/scheduling).
Each region may have clients from multiple datacenters, allowing a small number of servers
to handle very large clusters.
In some cases, for either availability or scalability, you may need to run multiple
regions. Nomad supports federating multiple regions together into a single cluster.
At a high level, this setup looks like this:
[![Global Architecture](/img/nomad-architecture-global.png)](/img/nomad-architecture-global.png)
Regions are fully independent from each other, and do not share jobs, clients, or
state. They are loosely coupled using a gossip protocol, which allows users to
submit jobs to any region or query the state of any region transparently. Requests
are forwarded to the appropriate server to be processed and the results returned.
Data is _not_ replicated between regions.
The servers in each region are all part of a single consensus group. This means
that they work together to elect a single leader which has extra duties. The leader
is responsible for processing all queries and transactions. Nomad is optimistically
concurrent, meaning all servers participate in making scheduling decisions in parallel.
The leader provides the additional coordination necessary to do this safely and
to ensure clients are not oversubscribed.
Each region is expected to have either three or five servers. This strikes a balance
between availability in the case of failure and performance, as consensus gets
progressively slower as more servers are added. However, there is no limit to the number
of clients per region.
Clients are configured to communicate with their regional servers using remote
procedure calls (RPC) to register themselves, send heartbeats for liveness,
wait for new allocations, and update the status of allocations. A client registers
with the servers to provide the resources available, attributes, and installed drivers.
Servers use this information for scheduling decisions and create allocations to assign
work to clients.
Users submit jobs to the servers using the Nomad CLI or API. A job represents
a desired state and provides the set of tasks that should be run. The servers are
responsible for scheduling the tasks, which is done by finding an optimal placement for
each task such that resource utilization is maximized while satisfying all constraints
specified by the job. Resource utilization is maximized by bin packing, in which
the scheduler tries to use all the resources of a machine without
exhausting any dimension. Job constraints can be used to ensure an application is
running in an appropriate environment. Constraints can be technical requirements based
on hardware features such as architecture and availability of GPUs, or software features
like operating system and kernel version, or they can be business constraints like
ensuring PCI-compliant workloads run on appropriate servers.
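
As a sketch, constraints are written against fingerprinted node attributes;
the attributes below are standard Nomad attributes, while the values are
examples:

```hcl
# Require a 64-bit x86 client.
constraint {
  attribute = "${attr.cpu.arch}"
  value     = "amd64"
}

# Require a minimum kernel version.
constraint {
  attribute = "${attr.kernel.version}"
  operator  = "version"
  value     = ">= 3.19"
}
```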
# Getting in Depth
This has been a brief high-level overview of the architecture of Nomad. There
are more details available for each of the sub-systems. The [consensus protocol](/docs/internals/consensus),
[gossip protocol](/docs/internals/gossip), and [scheduler design](/docs/internals/scheduling/scheduling)
are all documented in more detail.
For other details, consult the source code, ask in IRC, or reach out to the mailing list.