---
layout: docs
page_title: 'Upgrading'
sidebar_current: 'guides-upgrade'
description: |-
  Learn how to upgrade Nomad.
---

# Upgrading

Nomad is designed to be flexible and resilient when upgrading from one Nomad
version to the next. Upgrades should cause neither a Nomad nor a service
outage. However, there are some restrictions to be aware of before upgrading:

- Nomad strives to be backward compatible for at least one point release, so
  Nomad v0.10 hosts work with v0.9 hosts. Upgrading two point releases (e.g.,
  v0.8 to v0.10) may work but is untested and unsupported.

- Nomad does _not_ support downgrading at this time. Downgrading clients
  requires draining allocations and removing the [data directory][data_dir].
  Downgrading servers safely requires re-provisioning the cluster.

- New features are unlikely to work correctly until all nodes have been
  upgraded.

- Check the [version upgrade details page][upgrade-specific] for important
  changes and backward incompatibilities.

- When upgrading a Nomad client, if the restart takes longer than the
  [`heartbeat_grace`][heartbeat_grace] period (10s by default), all
  allocations on that node may be rescheduled.

Nomad supports upgrading in place or by rolling in new servers:

- In Place: The Nomad binary can be updated on existing hosts. Running
  allocations will continue running uninterrupted.

- Rolling: New hosts containing the new Nomad version may be added, followed by
  the removal of old hosts. The old nodes must be drained to migrate running
  allocations to the new nodes.

This guide describes both approaches.

## Upgrade Process

Once you have checked the [upgrade details for the new
version][upgrade-specific], the upgrade process is as simple as updating the
binary on each host and restarting the Nomad service.

At a high level we complete the following steps to upgrade Nomad:

- **Add the new version**
- **Check cluster health**
- **Remove the old version**
- **Check cluster health**
- **Upgrade clients**

### 1. Add the new version to the existing cluster

While it is possible to upgrade Nomad client nodes before servers, this guide
recommends upgrading servers first, as many new client features will not work
until the servers are upgraded.

In a [federated cluster](/nomad/tutorials/manage-clusters/federation),
new features are not guaranteed to work until all agents in a region and the
server nodes in the authoritative region are upgraded.

Whether you are replacing Nomad in place on existing systems or bringing up new
servers, you should make changes incrementally, verifying cluster health at each
step of the upgrade.

On a single server, install the new version of Nomad. You can do this by
joining a new server to the cluster or by replacing or upgrading the binary
locally and restarting the Nomad service.

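For example, on a systemd-managed Linux host, an in-place binary upgrade might
look like the following sketch. The version, download URL, install path, and
unit name are assumptions; adapt them to your environment.

```shell-session
$ # Fetch the new binary (version and URL are examples)
$ curl -sSL -o /tmp/nomad.zip \
    https://releases.hashicorp.com/nomad/1.6.0/nomad_1.6.0_linux_amd64.zip

$ # Stop the agent, swap the binary, and start it again
$ sudo systemctl stop nomad
$ sudo unzip -o /tmp/nomad.zip -d /usr/local/bin/
$ sudo systemctl start nomad

$ # Confirm the agent is back on the new version
$ nomad version
```
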
Note that if you have [`leave_on_terminate`][] or [`leave_on_interrupt`][] set,
you should ensure you're using the expected signal for your upgrade process. For
example, if you have `leave_on_terminate` set and you intend to update a
server in place, you should send `SIGINT` and not `SIGTERM` when shutting down
the server before restarting it.

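A minimal sketch of that shutdown, assuming the agent runs under a systemd unit
named `nomad` (an assumption; adjust for your process manager):

```shell-session
$ # Send SIGINT rather than SIGTERM so a server with leave_on_terminate set
$ # does not gracefully leave the cluster during the in-place restart
$ sudo systemctl kill --signal=SIGINT nomad

$ # After replacing the binary, start the agent again
$ sudo systemctl start nomad
```
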
### 2. Check cluster health

[Monitor the Nomad logs][monitor] on the remaining servers to check that the
new server has joined the cluster correctly.

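One way to do this is with the [`nomad monitor`][monitor] command, which streams
logs from the agent it connects to. A minimal sketch:

```shell-session
$ # On one of the remaining servers, stream agent logs while the new server joins
$ nomad monitor -log-level=DEBUG
```
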
Run `nomad agent-info` on the new servers and check that the `last_log_index`
is of a similar value to the other servers. This step ensures that changes have
been replicated to the new server.

```shell-session
ubuntu@nomad-server-10-1-1-4:~$ nomad agent-info
nomad
  bootstrap = false
  known_regions = 1
  leader = false
  server = true
raft
  applied_index = 53460
  commit_index = 53460
  fsm_pending = 0
  last_contact = 54.512216ms
  last_log_index = 53460
  last_log_term = 1
  last_snapshot_index = 49511
  last_snapshot_term = 1
  num_peers = 2
...
```

Continue the upgrade across the servers, making sure to upgrade a single Nomad
server at a time. You can check the state of the servers with
[`nomad server members`][server-members], and the state of the client nodes
with [`nomad node status`][node-status].

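A quick health check between server upgrades might look like the following
sketch:

```shell-session
$ # All servers should be alive, with exactly one leader
$ nomad server members

$ # All client nodes should still be ready
$ nomad node status
```
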
### 3. Remove the old versions from servers

If you are doing an in-place upgrade on existing servers, this step is not
necessary, as the version was changed in place.

If you are doing an upgrade by adding new servers and removing old servers
from the fleet, you need to ensure that the old server has left the fleet
safely.

1. Stop the service on the existing host.
2. On another server, run `nomad server members` and check the status. If
   the old server is now in the `left` state, you are safe to continue.
3. If the server is not in the `left` state, issue a `nomad server force-leave <server id>`
   to remove the server from the cluster (see the sketch after this list).

Monitor the logs of the other hosts in the Nomad cluster over this period.

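A minimal sketch of that check and removal, using an example server name:

```shell-session
$ # Confirm the stopped server shows as "left"; servers that did not leave
$ # cleanly show as "failed"
$ nomad server members

$ # If it is still listed as "failed", force it out of the gossip pool
$ nomad server force-leave nomad-server-10-1-1-4.global
```
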
### 4. Check cluster health

Use the same actions in step #2 above to confirm cluster health.

### 5. Upgrade clients

Following the successful upgrade of the servers, you can now update your
clients using a similar process to the one you used for the servers. You may
either upgrade clients in place or start new nodes on the new version. See the
[Workload Migration Guide](/nomad/tutorials/manage-clusters/node-drain) for
instructions on how to migrate running allocations from the old nodes to the
new nodes with the [`nomad node drain`](/nomad/docs/commands/node/drain) command.

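Draining an old client before decommissioning it might look like the following
sketch (the node ID is an example):

```shell-session
$ # Enable draining; -yes skips the confirmation prompt
$ nomad node drain -enable -yes 4d2ba53b

$ # Watch allocations migrate off the node
$ nomad node status 4d2ba53b
```
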
## Done

You are now running the latest Nomad version. You can verify that all
clients have joined by running `nomad node status` and checking that all of
the clients are in the `ready` state.

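Illustrative output (node IDs and names are examples; columns may vary slightly
between Nomad versions):

```shell-session
$ nomad node status
ID        DC   Name             Class   Drain  Eligibility  Status
f7476465  dc1  nomad-client-01  <none>  false  eligible     ready
a72dfba2  dc1  nomad-client-02  <none>  false  eligible     ready
```
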
## Upgrading to Nomad Enterprise

Before upgrading servers to Nomad Enterprise versions 1.6.0 and later,
you should validate your enterprise license with the
[`nomad license inspect` command](/nomad/docs/commands/license/inspect)
using the binary that you are upgrading to.
See the [licensing FAQ](/nomad/docs/enterprise/license/faq)
for more information.

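A minimal sketch of that pre-upgrade check, assuming the new binary has been
unpacked as `./nomad-new` and the license file lives at
`/etc/nomad.d/license.hclic` (both paths are hypothetical):

```shell-session
$ # Validate the existing license against the binary you plan to upgrade to
$ ./nomad-new license inspect /etc/nomad.d/license.hclic
```
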
After that, the process of upgrading to a Nomad Enterprise version is identical
to upgrading between versions of open source Nomad. Follow the same guidance
above and, as always, check the
[specific version details](/nomad/docs/upgrade/upgrade-specific) page before
starting the upgrade, as some version differences may require specific steps.

[data_dir]: /nomad/docs/configuration#data_dir
[heartbeat_grace]: /nomad/docs/configuration/server#heartbeat_grace
[monitor]: /nomad/docs/commands/monitor
[node-status]: /nomad/docs/commands/node/status
[server-members]: /nomad/docs/commands/server/members
[upgrade-specific]: /nomad/docs/upgrade/upgrade-specific

## Upgrading to Raft Protocol 3

This section provides details on upgrading to Raft protocol version 3. Raft
protocol version 3 requires all servers to be running Nomad 0.8.0 or newer.
Raft protocol version 2 will be removed in Nomad 1.4.0.

To see the version of the Raft protocol in use on each server, use the
`nomad operator raft list-peers` command.

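For example, the output might look like the following midway through an upgrade
(server names, IDs, and addresses are illustrative):

```shell-session
$ nomad operator raft list-peers
Node                   ID                                    Address        State     Voter  RaftProtocol
nomad-server-1.global  c23c06f6-2b57-6c85-9b4b-3ed27b0b29b5  10.1.1.4:4647  leader    true   3
nomad-server-2.global  10.1.1.5:4647                         10.1.1.5:4647  follower  true   2
nomad-server-3.global  10.1.1.6:4647                         10.1.1.6:4647  follower  true   2
```
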
Note that the format of `peers.json` used for outage recovery is
different when running with the latest Raft protocol. See [Manual
Recovery Using
peers.json](/nomad/tutorials/manage-clusters/outage-recovery#manual-recovery-using-peersjson)
for a description of the required format.

When using Raft protocol version 3, servers are identified by their
`node-id` instead of their IP address when Nomad makes changes to its
internal Raft quorum configuration. This means that once a cluster has
been upgraded with servers all running Raft protocol version 3, it
will no longer allow servers running any older Raft protocol versions
to be added.

### Upgrading a Production Cluster to Raft Version 3

For production Raft clusters with 3 or more members, the easiest way
to upgrade servers is to have each server leave the cluster, upgrade
its [`raft_protocol`] version in the `server` block (if upgrading to
a Nomad version lower than v1.3.0), and then add it back. Make sure the
new server joins successfully and that the cluster is stable before
rolling the upgrade forward to the next server. It's also possible to
stand up a new set of servers, and then slowly stand down each of the
older servers in a similar fashion.

For in-place Raft protocol upgrades, perform the following steps for each
server, leaving the leader until last to reduce the chance of leader
elections that will slow down the process (see the sketch after this list):

* Stop the server.
* Run `nomad server force-leave $server_name`.
* If the upgrade is for a Nomad version lower than v1.3.0, update the
  [`raft_protocol`] in the server's configuration file to `3`.
* Restart the server.
* Run `nomad operator raft list-peers` to verify that the
  `RaftProtocol` for the server is now `3`.
* On the server, run `nomad agent-info` and check that the
  `last_log_index` is of a similar value to the other servers. This
  step ensures that Raft is healthy and changes are replicating to the
  new server.

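A minimal sketch of that per-server sequence, assuming a systemd unit named
`nomad` and a server named `nomad-server-1.global` (both assumptions):

```shell-session
$ # Stop the Nomad service on the server being upgraded
$ sudo systemctl stop nomad

$ # From another server, remove the stopped server from the gossip pool
$ nomad server force-leave nomad-server-1.global

$ # If upgrading to a Nomad version lower than v1.3.0, set raft_protocol = 3
$ # in the server block of the configuration file before restarting.
$ sudo systemctl start nomad

$ # Verify the protocol version and replication on the restarted server
$ nomad operator raft list-peers
$ nomad agent-info
```
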
### Upgrading a Single Server Cluster to Raft Version 3

If you are running a single Nomad server, restarting it in place will
result in that server not being able to elect itself as a leader. To
avoid this, create a new [`peers.json`][peers-json] file before
restarting the server with the new configuration. If you have `jq`
installed, you can run the following script on the server's host to
write the correct `peers.json` file:

```
#!/usr/bin/env bash

NOMAD_DATA_DIR=$(nomad agent-info -json | jq -r '.config.DataDir')
NOMAD_ADDR=$(nomad agent-info -json | jq -r '.stats.nomad.leader_addr')
NODE_ID=$(cat "$NOMAD_DATA_DIR/server/node-id")

cat <<EOF > "$NOMAD_DATA_DIR/server/raft/peers.json"
[
  {
    "id": "$NODE_ID",
    "address": "$NOMAD_ADDR",
    "non_voter": false
  }
]
EOF
```

After running this script, if the upgrade is for a Nomad version lower
than v1.3.0, update the [`raft_protocol`] in the server's
configuration to `3` and restart the server.

[peers-json]: /nomad/tutorials/manage-clusters/outage-recovery#manual-recovery-using-peersjson
[`raft_protocol`]: /nomad/docs/configuration/server#raft_protocol
[`leave_on_interrupt`]: /nomad/docs/configuration#leave_on_interrupt
[`leave_on_terminate`]: /nomad/docs/configuration#leave_on_terminate