---
layout: "docs"
page_title: "Creating a Nomad Cluster"
sidebar_current: "docs-cluster-bootstrap"
description: |-
  Learn how to bootstrap a Nomad cluster.
---

# Creating a cluster

Nomad clusters in production comprise a few Nomad servers (an odd number,
preferably 3 or 5, but never an even number, to prevent split-brain), clients, and
optionally Consul servers and clients. Before we start discussing the specifics
of bootstrapping clusters, we should discuss the network topology. Nomad
models infrastructure as regions and datacenters. A Nomad region may contain
multiple datacenters. Nomad servers are all assigned to the same region, and
hence a region is a single scheduling domain in Nomad. Each cluster of Nomad
servers supports only one region; however, multiple regions can be stitched
together to allow a globally coherent view of an organization's resources.
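
As a concrete illustration of this topology, here is a minimal agent
configuration sketch; the region and datacenter names are hypothetical
placeholders:

```
# Hypothetical example: this agent belongs to the "us" region
# and runs in the "us-east-1a" datacenter.
region     = "us"
datacenter = "us-east-1a"
```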

## Consul Cluster

Bootstrapping a Nomad cluster becomes significantly easier if operators use
Consul. The network topology of a Consul cluster is slightly different than
Nomad's. Consul models infrastructure as datacenters, and each Consul
datacenter can have up to ~10,000 nodes. Multiple Consul datacenters can be
connected over the WAN so that clients can discover nodes in other datacenters.
We recommend running a Consul cluster in every Nomad datacenter and connecting
them over the WAN. Please refer to the Consul
[documentation](https://www.consul.io/docs/commands/join.html) to learn more
about bootstrapping Consul and connecting multiple Consul clusters over the WAN.
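
For reference, connecting two Consul datacenters comes down to joining their
servers via the WAN gossip pool; a sketch with a hypothetical server address:

```
consul join -wan 10.1.0.1
```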

Also, Nomad clusters can be significantly larger than Consul clusters, so
sharding the Consul clusters per ~10,000 nodes, organized into individual
datacenters, helps scale Consul as the Nomad clusters scale.

## Nomad Servers

Nomad servers are expected to have sub 10 millisecond network latencies between
them. Nomad servers can be spread across multiple datacenters to achieve high
availability, provided they have low-latency connections between them. For
example, on AWS every region comprises multiple zones which have very low
latency links between them, so every zone can be modeled as a Nomad datacenter,
and every zone can run a single Nomad server; these servers can then be
connected to form a quorum and a region. Nomad servers use Raft to replicate
state between them, and since Raft is highly consistent it needs a quorum of
servers to function; therefore we recommend running an odd number of Nomad
servers in a region. Usually running 3-5 servers in a region is recommended.
The cluster can withstand the failure of one server in a cluster of three
servers and two failures in a cluster of five servers. Adding more servers to
the quorum adds more time to replicate state, which decreases throughput, so we
don't recommend running more than seven servers in a region.
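
The relationship between cluster size and fault tolerance follows from Raft's
majority quorum (a quorum is floor(n/2) + 1 servers):

| Servers | Quorum size | Failures tolerated |
|---------|-------------|--------------------|
| 3       | 2           | 1                  |
| 5       | 3           | 2                  |
| 7       | 4           | 3                  |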

During the bootstrapping phase Nomad servers need to know the addresses of other
servers. Nomad will automatically bootstrap itself when the Consul service is
present, or can be manually joined by using the
[`-retry-join`](https://www.nomadproject.io/docs/agent/config.html#_retry_join)
CLI flag or the server
[`retry_join`](https://www.nomadproject.io/docs/agent/config.html#retry_join)
configuration option.
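
For example, a server stanza along the following lines keeps retrying the
listed peers until the join succeeds; the addresses (and the expectation of
three servers) are placeholders for your own deployment:

```
server {
  enabled          = true
  # Wait for three servers before electing a leader.
  bootstrap_expect = 3
  # Keep retrying these hypothetical peer addresses until joined.
  retry_join       = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
}
```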

## Nomad Clients

Nomad clients are organized into datacenters and need to be made aware of the
Nomad servers to communicate with. If Consul is present, Nomad clients will
automatically bootstrap themselves; otherwise they will need to be provided
with a static list of
[`servers`](https://www.nomadproject.io/docs/agent/config.html#servers) to find
the Nomad servers.

Operators can either place the addresses of the Nomad servers in the client
configuration or point the Nomad client at the Nomad server service in Consul.
Once a client establishes a connection with a Nomad server, the addresses of
any servers subsequently added to the cluster are propagated down to the
clients along with the heartbeat.

### Bootstrapping a Nomad cluster without Consul

At least one Nomad server's address (also known as the seed node) needs to be
known ahead of time, and a running agent can be joined to a cluster by running
the `server-join` CLI command.

For example, once a Nomad agent starts in server mode it can be joined to an
existing cluster with a server whose IP is known. Once the agent joins another
node in the cluster, it can discover the remaining nodes via the gossip
protocol.

```
nomad server-join -retry-join 10.0.0.1
```

The `-retry-join` parameter indicates that the agent should keep trying to join
the server even if the first attempt fails. This is essential when the other
address will only become available after some time, as nodes might take a
variable amount of time to boot up in a cluster.

On the client side, the addresses of the servers are expected to be specified
via the client configuration.

```
client {
  ...
  servers = ["10.10.11.2:4648", "10.10.11.3:4648", "10.10.11.4:4648"]
  ...
}
```

In the above example we specify three servers for the clients to connect to.
If servers are added or removed, clients learn about them via the heartbeat of
a server which is alive.
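
Putting the pieces together, a minimal but complete client agent configuration
might look like the following sketch; the datacenter name and server address
are hypothetical:

```
# Hypothetical client agent configuration.
datacenter = "dc1"

client {
  enabled = true
  servers = ["10.10.11.2:4648"]
}
```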

### Bootstrapping a Nomad cluster with Consul

Bootstrapping a Nomad cluster is significantly easier if Consul is used along
with Nomad. If a local Consul cluster is bootstrapped before Nomad, the
following configuration registers the Nomad agent with Consul, looks up the
addresses of the other Nomad servers, and joins them automatically.

```
{
  "server_service_name": "nomad",
  "server_auto_join": true,
  "client_service_name": "nomad-client",
  "client_auto_join": true
}
```

With the above configuration, the Nomad agent looks up the addresses of agents
in the `nomad` service in Consul and joins them automatically. In addition, if
the `auto_advertise` option is set, Nomad registers the agents with Consul
automatically too. By default, Nomad will automatically register the server and
client agents with Consul and try to auto-discover the servers if it can talk
to a local Consul agent on the same host.
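
One quick way to verify the registration is to query Consul's DNS interface
for the service names above; this sketch assumes Consul's default DNS port of
8600 on the local agent:

```
dig @127.0.0.1 -p 8600 nomad.service.consul
```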

Please refer to the [documentation](/jobspec/servicediscovery.html) for the
complete set of configuration options.

### Federating a cluster

Nomad clusters across multiple regions can be federated, and once they are
connected users can target the Nomad servers in any region from any other
region when submitting a job or querying any Nomad API.

Federating multiple Nomad clusters is as simple as joining a server to any
other remote server:

```
nomad server-join 10.10.11.8:4648
```

Servers across regions discover other servers in the cluster via the gossip
protocol, and hence it is enough to join one known server.
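
Once the regions are federated, a request can be directed at another region
from anywhere; for instance, assuming a hypothetical federated region named
`eu-west` and a job file `example.nomad`, the CLI's `-region` flag targets it
directly:

```
nomad run -region=eu-west example.nomad
```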