Remove todos

commit b1cd81e997 (parent 009bd88e2f)
@@ -1058,7 +1058,6 @@ func (s *Server) GetConfig() *Config {
 	return s.config
 }
 
-// TODO(alex) we need a outage guide
 // peersInfoContent is used to help operators understand what happened to the
 // peers.json file. This is written to a file called peers.info in the same
 // location.
@@ -1082,5 +1081,5 @@ creating the peers.json file, and that all servers receive the same
 configuration. Once the peers.json file is successfully ingested and applied, it
 will be deleted.
 
-Please see https://www.consul.io/docs/guides/outage.html for more information.
+Please see https://www.nomadproject.io/guides/outage.html for more information.
 `
@@ -16,9 +16,9 @@ as interacting with the Raft subsystem. This was added in Nomad 0.5.5.
 ~> Use this command with extreme caution, as improper use could lead to a Nomad
    outage and even loss of data.
 
-See the [Outage Recovery](TODO alexdadgar) guide for some examples of how
+See the [Outage Recovery](/guides/outage.html) guide for some examples of how
 this command is used. For an API to perform these operations programatically,
-please see the documentation for the [Operator](/docs/agent/http/operator.html)
+please see the documentation for the [Operator](/guides/outage.html)
 endpoint.
 
 ## Usage
@@ -15,7 +15,7 @@ as interacting with the Raft subsystem. This was added in Nomad 0.5.5
 ~> Use this interface with extreme caution, as improper use could lead to a
    Nomad outage and even loss of data.
 
-See the [Outage Recovery](/docs/guides/outage.html) guide for some examples of how
+See the [Outage Recovery](/guides/outage.html) guide for some examples of how
 these capabilities are used. For a CLI to perform these operations manually, please
 see the documentation for the [`nomad operator`](/docs/commands/operator-index.html)
 command.
@@ -0,0 +1,116 @@
---
layout: "guides"
page_title: "Automatically Bootstrapping a Nomad Cluster"
sidebar_current: "guides-cluster-automatic"
description: |-
  Learn how to automatically bootstrap a Nomad cluster using Consul. By having
  a Consul agent installed on each host, Nomad can automatically discover other
  clients and servers to bootstrap the cluster without operator involvement.
---

# Automatic Bootstrapping

To automatically bootstrap a Nomad cluster, we must leverage another HashiCorp
open source tool, [Consul](https://www.consul.io/). Bootstrapping Nomad is
easiest against an existing Consul cluster. The Nomad servers and clients
will become informed of each other's existence when the Consul agent is
installed and configured on each host. As an added benefit, integrating Consul
with Nomad provides service and health check registration for applications which
later run under Nomad.
Consul models infrastructure as datacenters, and multiple Consul datacenters can
be connected over the WAN so that clients can discover nodes in other
datacenters. Since Nomad regions can encapsulate many datacenters, we recommend
running a Consul cluster in every Nomad datacenter and connecting them over the
WAN. Please refer to the Consul guides for both
[bootstrapping](https://www.consul.io/docs/guides/bootstrapping.html) a single
datacenter and [connecting multiple Consul clusters over the
WAN](https://www.consul.io/docs/guides/datacenters.html).

If a Consul agent is installed on the host prior to Nomad starting, the Nomad
agent will register with Consul and discover other nodes.
For servers, we must inform the cluster how many servers we expect to have. This
is required to form the initial quorum, since Nomad is otherwise unaware of how
many peers to expect. For example, to form a region with three Nomad servers,
you would use the following Nomad configuration file:

```hcl
# /etc/nomad.d/server.hcl

server {
  enabled          = true
  bootstrap_expect = 3
}
```

This configuration would be saved to disk, and then the agent started:

```shell
$ nomad agent -config=/etc/nomad.d/server.hcl
```
A similar configuration is available for Nomad clients:

```hcl
# /etc/nomad.d/client.hcl

datacenter = "dc1"

client {
  enabled = true
}
```

The agent is started in a similar manner:

```shell
$ nomad agent -config=/etc/nomad.d/client.hcl
```

As you can see, the above configurations include no IP or DNS addresses between
the clients and servers. This is because Nomad detected the existence of Consul
and utilized its service discovery to form the cluster.
## Internals

~> This section discusses the internals of the Consul and Nomad integration at a
   very high level. Reading it is only recommended for those curious about the
   implementation.

As discussed in the previous section, Nomad merges multiple configuration files
together, so `-config` may be specified more than once:

```shell
$ nomad agent -config=base.hcl -config=server.hcl
```

In addition to merging configuration from the command line, Nomad also maintains
its own internal configurations (called "default configs"), which include sane
base defaults. One of those default configurations includes a "consul" block,
which specifies sane defaults for connecting to and integrating with Consul. In
essence, this configuration file resembles the following:
```hcl
# You do not need to add this to your configuration file. This is an example
# that is part of Nomad's internal default configuration for Consul integration.
consul {
  # The address of the Consul agent.
  address = "127.0.0.1:8500"

  # The service names to register the server and client with Consul.
  server_service_name = "nomad"
  client_service_name = "nomad-client"

  # Enables automatically registering the services.
  auto_advertise = true

  # Enables the server and client to bootstrap using Consul.
  server_auto_join = true
  client_auto_join = true
}
```

Please refer to the [`consul` block
documentation](/docs/agent/configuration/consul.html) for the complete set of
configuration options.
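Because user configuration is merged over these defaults, overriding a single attribute leaves the rest of the default `consul` block intact. As a sketch (the port 8501 here is purely illustrative), a host whose Consul agent listens on a non-default address only needs:

```hcl
# Hypothetical override: only the Consul address changes; the other
# attributes of the internal "consul" block are merged in unchanged.
consul {
  address = "127.0.0.1:8501"
}
```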
@@ -0,0 +1,24 @@
---
layout: "guides"
page_title: "Bootstrapping a Nomad Cluster"
sidebar_current: "guides-cluster-bootstrap"
description: |-
  Learn how to bootstrap a Nomad cluster.
---

# Bootstrapping a Nomad Cluster

Nomad models infrastructure into regions and datacenters. Servers reside at the
regional layer and manage all state and scheduling decisions for that region.
Regions contain multiple datacenters, and clients are registered to a single
datacenter (and thus a region that contains that datacenter). For more details on
the architecture of Nomad and how it models infrastructure, see the [architecture
page](/docs/internals/architecture.html).

There are two strategies for bootstrapping a Nomad cluster:

1. <a href="/guides/cluster/automatic.html">Automatic bootstrapping</a>
1. <a href="/guides/cluster/manual.html">Manual bootstrapping</a>

Please refer to the specific documentation links above or in the sidebar for
more detailed information about each strategy.
@@ -0,0 +1,28 @@
---
layout: "guides"
page_title: "Federating a Nomad Cluster"
sidebar_current: "guides-cluster-federation"
description: |-
  Learn how to join Nomad servers across multiple regions so users can submit
  jobs to any server in any region using global federation.
---

# Federating a Cluster

Because Nomad operates at a regional level, federation is part of Nomad core.
Federation enables users to submit jobs or interact with the HTTP API targeting
any region, from any server, even if that server resides in a different region.

Federating multiple Nomad clusters is as simple as joining servers. From any
server in one region, issue a join command to a server in a remote region:

```shell
$ nomad server-join 1.2.3.4:4648
```

Note that only one join command is required per region. Servers across regions
discover other servers in the cluster via the gossip protocol, so it is
enough to join just one known server.

If bootstrapped via Consul and the Consul clusters in the Nomad regions are
federated, then federation occurs automatically.
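For illustration, the server configurations behind such a join might look like the following sketch. The region and datacenter names are purely illustrative; `region` and `datacenter` are the standard top-level agent settings:

```hcl
# A server in the first region; servers in the remote region would use an
# equivalent file with their own region name, e.g. region = "eu".
region     = "us"
datacenter = "us-east-1"

server {
  enabled          = true
  bootstrap_expect = 3
}
```

With both regions up, a single `nomad server-join` from any server on either side federates the two regions.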
@@ -0,0 +1,65 @@
---
layout: "guides"
page_title: "Manually Bootstrapping a Nomad Cluster"
sidebar_current: "guides-cluster-manual"
description: |-
  Learn how to manually bootstrap a Nomad cluster using the server-join
  command. This section also discusses Nomad federation across multiple
  datacenters and regions.
---

# Manual Bootstrapping

Manually bootstrapping a Nomad cluster does not rely on additional tooling, but
it does require operator participation in the cluster formation process. When
bootstrapping, Nomad servers and clients must be started and informed of the
address of at least one Nomad server.

This creates a chicken-and-egg problem: one server must first be fully
bootstrapped and configured before the remaining servers and clients can join
the cluster. This requirement can add provisioning time as well as ordered
dependencies during provisioning.
First, we bootstrap a single Nomad server and capture its IP address. After we
have that node's IP address, we place the address in the configuration.

For Nomad servers, this configuration may look something like this:

```hcl
server {
  enabled          = true
  bootstrap_expect = 3

  # This is the IP address of the first server we provisioned
  retry_join = ["<known-address>:4648"]
}
```

Alternatively, the address can be supplied after the servers have all been
started by running the [`server-join` command](/docs/commands/server-join.html)
on the servers individually to cluster them. All servers can join just one
other server and then rely on the gossip protocol to discover the rest.

```shell
$ nomad server-join <known-address>
```
For Nomad clients, the configuration may look something like:

```hcl
client {
  enabled = true
  servers = ["<known-address>:4647"]
}
```

At this time, there is no equivalent of the <tt>server-join</tt> command for
Nomad clients.

The port corresponds to the RPC port. If no port is specified with the IP
address, the default RPC port of `4647` is assumed.

As servers are added or removed from the cluster, this information is pushed to
the client. This means only one server must be specified, because, after initial
contact, the full set of servers in the client's region is shared with the
client.
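Multiple known servers may also be listed, and as noted above the default RPC port is assumed when an address omits it. A sketch with placeholder addresses:

```hcl
client {
  enabled = true

  # Any one reachable server is enough for initial contact; the full server
  # set for the region is pushed to the client afterwards.
  servers = ["10.0.0.10:4647", "10.0.0.11", "10.0.0.12"]
}
```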
@@ -0,0 +1,71 @@
---
layout: "guides"
page_title: "Nomad Client and Server Requirements"
sidebar_current: "guides-cluster-requirements"
description: |-
  Learn about Nomad client and server requirements such as memory and CPU
  recommendations, network topologies, and more.
---

# Cluster Requirements

## Resources (RAM, CPU, etc.)

**Nomad servers** may need to be run on large machine instances. We suggest
having 8+ cores, 32 GB+ of memory, 80 GB+ of disk, and significant network
bandwidth. The core count and network recommendations are to ensure high
throughput, as Nomad relies heavily on network communication and the servers
manage all the nodes in the region while performing scheduling. The memory
and disk requirements are due to the fact that Nomad stores all state in memory
and will store two snapshots of this data onto disk. Thus disk should be at
least 2 times the memory available to the server when deploying a high load
cluster.
**Nomad clients** support reserving resources on the node that should not be
used by Nomad. This should be used to target a specific resource utilization per
node and to reserve resources for applications running outside of Nomad's
supervision, such as Consul and the operating system itself.

Please see the [reservation
configuration](/docs/agent/configuration/client.html#reserved) for more detail.
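As a sketch, such a reservation might look like the following client configuration; the values are illustrative and should be tuned to the node:

```hcl
client {
  enabled = true

  # Reserve capacity for the OS and the local Consul agent, so Nomad
  # never allocates it to tasks.
  reserved {
    cpu            = 500            # MHz
    memory         = 512            # MB
    disk           = 1024           # MB
    reserved_ports = "22,8500-8600" # ports Nomad should not hand out
  }
}
```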
## Network Topology

**Nomad servers** are expected to have sub-10-millisecond network latencies
between each other to ensure liveness and high-throughput scheduling. Nomad
servers can be spread across multiple datacenters to achieve high availability,
provided they have low-latency connections between them.

For example, on AWS every region comprises multiple zones which have very low
latency links between them, so every zone can be modeled as a Nomad datacenter,
and every zone can have a single Nomad server which could be connected to form a
quorum and a region.

Nomad servers use Raft for state replication, and Raft, being highly consistent,
needs a quorum of servers to function. We therefore recommend running an odd
number of Nomad servers in a region, usually three to five. The cluster can
withstand the failure of one server in a cluster of three servers and two
failures in a cluster of five servers. Adding more servers to the quorum adds
more time to replicate state and hence decreases throughput, so we do not
recommend having more than seven servers in a region.

**Nomad clients** do not have the same latency requirements as servers, since
they are not participating in Raft. Thus clients can have 100+ millisecond
latency to their servers. This allows having a set of Nomad servers that service
clients spread geographically over a continent, or even the world in the case of
a single "global" region with many datacenters.
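The failure-tolerance numbers above follow directly from Raft's quorum size, floor(n/2) + 1, for a cluster of n servers:

```
servers (n)   quorum   failures tolerated (n - quorum)
     3           2                  1
     5           3                  2
     7           4                  3
```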
## Ports Used

Nomad requires 3 different ports to work properly on servers and 2 on clients,
some on TCP, UDP, or both protocols. Below we document the requirements for each
port.

* HTTP API (Default 4646). This is used by clients and servers to serve the HTTP
  API. TCP only.

* RPC (Default 4647). This is used by servers and clients to communicate amongst
  each other. TCP only.

* Serf WAN (Default 4648). This is used by servers to gossip over the WAN to
  other servers. TCP and UDP.
|
Loading…
Reference in New Issue
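These defaults can be changed in the agent configuration. The following sketch simply restates the defaults using the top-level `ports` block; only ports that differ from the defaults need to be set:

```hcl
ports {
  http = 4646 # HTTP API (TCP)
  rpc  = 4647 # internal RPC (TCP)
  serf = 4648 # Serf WAN gossip (TCP and UDP)
}
```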