raft: default to protocol v3 (#11572)
Many of Nomad's Autopilot features require raft protocol version 3. Set the default raft protocol to 3, and improve the upgrade documentation.
This commit is contained in:
parent
5f48e18189
commit
7ad15b2b42
|
@ -0,0 +1,7 @@
|
|||
```release-note:improvement
|
||||
raft: The default raft protocol version is now 3.
|
||||
```
|
||||
|
||||
```release-note:deprecation
|
||||
Raft protocol version 2 is deprecated and will be removed in Nomad 1.4.0.
|
||||
```
|
|
@ -953,6 +953,7 @@ func DefaultConfig() *Config {
|
|||
Enabled: false,
|
||||
EnableEventBroker: helper.BoolToPtr(true),
|
||||
EventBufferSize: helper.IntToPtr(100),
|
||||
RaftProtocol: 3,
|
||||
StartJoin: []string{},
|
||||
ServerJoin: &ServerJoin{
|
||||
RetryJoin: []string{},
|
||||
|
|
|
@ -161,7 +161,7 @@ server {
|
|||
required as the agent internally knows the latest version, but may be useful
|
||||
in some upgrade scenarios.
|
||||
|
||||
- `raft_protocol` `(int: 2)` - Specifies the Raft protocol version to use when
|
||||
- `raft_protocol` `(int: 3)` - Specifies the Raft protocol version to use when
|
||||
communicating with other Nomad servers. This affects available Autopilot
|
||||
features and is typically not required as the agent internally knows the
|
||||
latest version, but may be useful in some upgrade scenarios.
|
||||
|
|
|
@ -153,3 +153,83 @@ differences may require specific steps.
|
|||
[node-status]: /docs/commands/node/status
|
||||
[server-members]: /docs/commands/server/members
|
||||
[upgrade-specific]: /docs/upgrade/upgrade-specific
|
||||
|
||||
## Upgrading to Raft Protocol 3
|
||||
|
||||
This section provides details on upgrading to Raft Protocol 3. Raft
|
||||
protocol version 3 requires Nomad running 0.8.0 or newer on all
|
||||
servers in order to work. Raft protocol version 2 will be removed in
|
||||
Nomad 1.4.0.
|
||||
|
||||
To see the version of the Raft protocol in use on each server, use the
|
||||
`nomad operator raft list-peers` command.
|
||||
|
||||
Note that the format of `peers.json` used for outage recovery is
|
||||
different when running with the latest Raft protocol. See [Manual
|
||||
Recovery Using
|
||||
peers.json](https://learn.hashicorp.com/tutorials/nomad/outage-recovery#manual-recovery-using-peersjson)
|
||||
for a description of the required format.
|
||||
|
||||
When using Raft protocol version 3, servers are identified by their
|
||||
`node-id` instead of their IP address when Nomad makes changes to its
|
||||
internal Raft quorum configuration. This means that once a cluster has
|
||||
been upgraded with servers all running Raft protocol version 3, it
|
||||
will no longer allow servers running any older Raft protocol versions
|
||||
to be added.
|
||||
|
||||
### Upgrading a Production Cluster to Raft Version 3
|
||||
|
||||
For production raft clusters with 3 or more members, the easiest way
|
||||
to upgrade servers is to have each server leave the cluster, upgrade
|
||||
its [`raft_protocol`] version in the `server` stanza, and then add it
|
||||
back. Make sure the new server joins successfully and that the cluster
|
||||
is stable before rolling the upgrade forward to the next server. It's
|
||||
also possible to stand up a new set of servers, and then slowly stand
|
||||
down each of the older servers in a similar fashion.
|
||||
|
||||
For in-place raft protocol upgrades, perform the following for each
|
||||
server, leaving the leader until last to reduce the chance of leader
|
||||
elections that will slow down the process:
|
||||
|
||||
* Stop the server
|
||||
* Run `nomad server force-leave $server_name`
|
||||
* Update the `raft_protocol` in the server's configuration file to 3.
|
||||
* Restart the server
|
||||
* Run `nomad operator raft list-peers` to verify that the `raft_vsn`
|
||||
for the server is now 3.
|
||||
* On the server, run `nomad agent-info` and check that the
|
||||
`last_log_index` is of a similar value to the other servers. This
|
||||
step ensures that raft is healthy and changes are replicating to the
|
||||
new server.
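
The last check in the list above can be scripted. The following is a minimal sketch, not part of Nomad: `index_close` and `check_replication` are hypothetical helper names, the tolerance of 100 entries is an arbitrary illustrative value, and the `.stats.raft.last_log_index` path in the `nomad agent-info -json` output is an assumption to verify against your own servers.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Nomad: succeeds when two last_log_index
# values are within max_lag entries of each other. The default of 100 is an
# arbitrary illustrative threshold.
index_close() {
  local a=$1 b=$2 max_lag=${3:-100} diff
  if [ "$a" -ge "$b" ]; then
    diff=$((a - b))
  else
    diff=$((b - a))
  fi
  [ "$diff" -le "$max_lag" ]
}

# Assumption: `nomad agent-info -json` reports the index at
# .stats.raft.last_log_index. Pass in the value observed on another server.
check_replication() {
  local peer_index=$1 local_index
  local_index=$(nomad agent-info -json | jq -r '.stats.raft.last_log_index')
  index_close "$local_index" "$peer_index"
}
```

Run on the upgraded server, `check_replication 48213` would succeed if that server's index is within 100 entries of 48213, a value read from one of the other servers.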
|
||||
|
||||
### Upgrading a Single Server Cluster to Raft Version 3
|
||||
|
||||
If you are running a single Nomad server, restarting it in-place will
|
||||
result in that server not being able to elect itself as a leader. To
|
||||
avoid this, create a new [`peers.json`][peers-json] file before
|
||||
restarting the server with the new configuration. If you have `jq`
|
||||
installed you can run the following script on the server's host to
|
||||
write the correct `peers.json` file:
|
||||
|
||||
```
|
||||
#!/usr/bin/env bash
|
||||
|
||||
NOMAD_DATA_DIR=$(nomad agent-info -json | jq -r '.config.DataDir')
|
||||
NOMAD_ADDR=$(nomad agent-info -json | jq -r '.stats.nomad.leader_addr')
|
||||
NODE_ID=$(cat "$NOMAD_DATA_DIR/server/node-id")
|
||||
|
||||
cat <<EOF > "$NOMAD_DATA_DIR/server/raft/peers.json"
|
||||
[
|
||||
{
|
||||
"id": "$NODE_ID",
|
||||
"address": "$NOMAD_ADDR",
|
||||
"non_voter": false
|
||||
}
|
||||
]
|
||||
EOF
|
||||
```
|
||||
|
||||
After running this script, update the `raft_protocol` in the server's
|
||||
configuration to 3 and restart the server.
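
Before restarting, it may be worth confirming that the generated file has the shape shown above. This is a sketch only: `peers_json_ok` is a hypothetical helper name, and it checks for a single entry with the `id`, `address`, and `non_voter` keys, not whether their values are correct.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only when the file holds a one-element JSON
# array whose entry is an object with the id, address, and non_voter keys.
# Invalid JSON or any other shape makes jq exit non-zero.
peers_json_ok() {
  jq -e 'type == "array" and length == 1
         and (.[0] | has("id") and has("address") and has("non_voter"))' \
    "$1" >/dev/null 2>&1
}
```

For example, `peers_json_ok "$NOMAD_DATA_DIR/server/raft/peers.json" || echo "peers.json looks malformed" >&2` prints a warning when the file is missing, is not valid JSON, or lacks one of the expected keys.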
|
||||
|
||||
[peers-json]: https://learn.hashicorp.com/tutorials/nomad/outage-recovery#manual-recovery-using-peersjson
|
||||
|
|
|
@ -13,6 +13,18 @@ upgrade. However, specific versions of Nomad may have more details provided for
|
|||
their upgrades as a result of new features or changed behavior. This page is
|
||||
used to document those details separately from the standard upgrade flow.
|
||||
|
||||
## Nomad 1.3.0
|
||||
|
||||
#### Raft Protocol Version 2 Deprecation
|
||||
|
||||
Raft protocol version 2 will be removed in the next major
|
||||
release of Nomad, 1.4.0.
|
||||
|
||||
In Nomad 1.3.0, the default raft protocol version has been updated to
|
||||
3. If the [`raft_protocol`] is not explicitly set, upgrading a
|
||||
server will automatically upgrade that server's raft protocol. See the
|
||||
[Upgrading to Raft Protocol 3] guide.
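
To find out whether a server pins the Raft protocol version before upgrading, one option is to search its configuration files. The sketch below assumes HCL configuration with the assignment on its own line. `has_explicit_raft_protocol` is a hypothetical helper name, and the check does not cover JSON configuration files.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if any of the given files sets raft_protocol
# explicitly. Assumes HCL syntax with `raft_protocol = N` on its own line.
has_explicit_raft_protocol() {
  # Refuse to run without files so grep never waits on stdin.
  [ "$#" -gt 0 ] || return 2
  grep -Eq '^[[:space:]]*raft_protocol[[:space:]]*=' "$@"
}
```

If the helper fails for every configuration file the agent loads, the server will pick up the new default of 3 when it is upgraded.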
|
||||
|
||||
## Nomad 1.2.4
|
||||
|
||||
#### `nomad eval status -json` deprecated
|
||||
|
@ -959,7 +971,7 @@ will be interpolated properly. Please see the
|
|||
### Raft Protocol Version Compatibility
|
||||
|
||||
When upgrading to Nomad 0.8.0 from a version lower than 0.7.0, users will need
|
||||
to set the [`raft_protocol`](/docs/configuration/server#raft_protocol) option in
|
||||
to set the [`raft_protocol`] option in
|
||||
their `server` stanza to 1 in order to maintain backwards compatibility with the
|
||||
old servers during the upgrade. After the servers have been migrated to version
|
||||
0.8.0, `raft_protocol` can be moved up to 2 and the servers restarted to match
|
||||
|
@ -997,40 +1009,6 @@ In order to enable all
|
|||
servers in a Nomad cluster must be running with Raft protocol version 3 or
|
||||
later.
|
||||
|
||||
#### Upgrading to Raft Protocol 3
|
||||
|
||||
This section provides details on upgrading to Raft Protocol 3 in Nomad 0.8 and
|
||||
higher. Raft protocol version 3 requires Nomad running 0.8.0 or newer on all
|
||||
servers in order to work. See [Raft Protocol Version
|
||||
Compatibility](/docs/upgrade/upgrade-specific#raft-protocol-version-compatibility)
|
||||
for more details. Also the format of `peers.json` used for outage recovery is
|
||||
different when running with the latest Raft protocol. See [Manual Recovery Using
|
||||
peers.json](https://learn.hashicorp.com/tutorials/nomad/outage-recovery#manual-recovery-using-peersjson)
|
||||
for a description of the required format.
|
||||
|
||||
Please note that the Raft protocol is different from Nomad's internal protocol
|
||||
as shown in commands like `nomad server members`. To see the version of the Raft
|
||||
protocol in use on each server, use the `nomad operator raft list-peers`
|
||||
command.
|
||||
|
||||
The easiest way to upgrade servers is to have each server leave the cluster,
|
||||
upgrade its `raft_protocol` version in the `server` stanza, and then add it
|
||||
back. Make sure the new server joins successfully and that the cluster is stable
|
||||
before rolling the upgrade forward to the next server. It's also possible to
|
||||
stand up a new set of servers, and then slowly stand down each of the older
|
||||
servers in a similar fashion.
|
||||
|
||||
When using Raft protocol version 3, servers are identified by their `node-id`
|
||||
instead of their IP address when Nomad makes changes to its internal Raft quorum
|
||||
configuration. This means that once a cluster has been upgraded with servers all
|
||||
running Raft protocol version 3, it will no longer allow servers running any
|
||||
older Raft protocol versions to be added. If running a single Nomad server,
|
||||
restarting it in-place will result in that server not being able to elect itself
|
||||
as a leader. To avoid this, either set the Raft protocol back to 2, or use
|
||||
[Manual Recovery Using
|
||||
peers.json](https://learn.hashicorp.com/tutorials/nomad/outage-recovery#manual-recovery-using-peersjson)
|
||||
to map the server to its node ID in the Raft quorum configuration.
|
||||
|
||||
### Node Draining Improvements
|
||||
|
||||
Node draining via the [`node drain`][drain-cli] command or the [drain
|
||||
|
@ -1224,6 +1202,8 @@ deleted and then Nomad 0.3.0 can be launched.
|
|||
[preemption]: /docs/internals/scheduling/preemption
|
||||
[proxy_concurrency]: /docs/job-specification/sidecar_task#proxy_concurrency
|
||||
[`sidecar_task.config`]: /docs/job-specification/sidecar_task#config
|
||||
[raft protocol version]: /docs/configuration/server#raft_protocol
|
||||
[`raft_protocol`]: /docs/configuration/server#raft_protocol
|
||||
[reserved]: /docs/configuration/client#reserved-parameters
|
||||
[task-config]: /docs/job-specification/task#config
|
||||
[tls-guide]: https://learn.hashicorp.com/tutorials/nomad/security-enable-tls
|
||||
|
@ -1248,3 +1228,4 @@ deleted and then Nomad 0.3.0 can be launched.
|
|||
[cap_add_exec]: /docs/drivers/exec#cap_add
|
||||
[cap_drop_exec]: /docs/drivers/exec#cap_drop
|
||||
[`log_file`]: /docs/configuration#log_file
|
||||
[Upgrading to Raft Protocol 3]: /docs/upgrade#upgrading-to-raft-protocol-3
|
||||
|
|