docs: add drain guide
commit 0a282e6c88 (parent 0b6fbb8e16)

@@ -0,0 +1,247 @@
---
layout: "guides"
page_title: "Decommissioning Nodes"
sidebar_current: "guides-decommissioning-nodes"
description: |-
  Decommissioning nodes is a normal part of cluster operations for a variety of
  reasons: server maintenance, operating system upgrades, etc. Nomad offers a
  number of parameters for controlling how running jobs are migrated off of
  draining nodes.
---

# Decommissioning Nomad Client Nodes

Decommissioning nodes is a normal part of cluster operations for a variety of
reasons: server maintenance, operating system upgrades, etc. Nomad offers a
number of parameters for controlling how running jobs are migrated off of
draining nodes.

## Configuring How Jobs are Migrated

In Nomad 0.8 a [`migrate`][migrate] stanza was added to jobs to allow control
over how allocations for a job are migrated off of a draining node. For
example, consider a job that runs a web service and has a Consul health check:

```hcl
job "webapp" {
  datacenters = ["dc1"]

  # Stop and migrate at most 2 allocations of this job at a time during node drains.
  migrate {
    max_parallel     = 2
    health_check     = "checks"
    min_healthy_time = "15s"
    healthy_deadline = "5m"
  }

  group "webapp" {
    count = 9

    task "webapp" {
      driver = "docker"
      config {
        image = "hashicorp/http-echo:0.2.3"
        args  = ["-text", "ok"]
        port_map {
          http = 5678
        }
      }

      resources {
        network {
          mbits = 10
          port "http" {}
        }
      }

      service {
        name = "webapp"
        port = "http"
        check {
          name     = "http-ok"
          type     = "http"
          path     = "/"
          interval = "10s"
          timeout  = "2s"
        }
      }
    }
  }
}
```

The above `migrate` stanza ensures that during a node drain only 2 allocations
of this job are stopped and migrated at a time.

When the job is run it may be placed on multiple nodes. In the following
example the 9 `webapp` allocations are spread across 2 nodes:

```text
$ nomad run webapp.nomad
==> Monitoring evaluation "5129bc74"
    Evaluation triggered by job "webapp"
    Allocation "5b4d6db5" created: node "46f1c6c4", group "webapp"
    Allocation "670a715f" created: node "f7476465", group "webapp"
    Allocation "78b6b393" created: node "46f1c6c4", group "webapp"
    Allocation "85743ff5" created: node "f7476465", group "webapp"
    Allocation "edf71a5d" created: node "f7476465", group "webapp"
    Allocation "56f770c0" created: node "46f1c6c4", group "webapp"
    Allocation "9a51a484" created: node "46f1c6c4", group "webapp"
    Allocation "f6f6e64c" created: node "f7476465", group "webapp"
    Allocation "fefe81d0" created: node "f7476465", group "webapp"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "5129bc74" finished with status "complete"
```

If one of those nodes needed to be decommissioned, perhaps because of a
hardware issue, then an operator would issue a node drain to migrate its
allocations off:

```text
$ nomad node drain -enable -yes 46f1
2018-04-11T23:41:56Z: Ctrl-C to stop monitoring: will not cancel the node drain
2018-04-11T23:41:56Z: Node "46f1c6c4-a0e5-21f6-fd5c-d76c3d84e806" drain strategy set
2018-04-11T23:41:57Z: Alloc "5b4d6db5-3fcb-eb7d-0415-23eefcd78b6a" marked for migration
2018-04-11T23:41:57Z: Alloc "56f770c0-f8aa-4565-086d-01faa977f82d" marked for migration
2018-04-11T23:41:57Z: Alloc "56f770c0-f8aa-4565-086d-01faa977f82d" draining
2018-04-11T23:41:57Z: Alloc "5b4d6db5-3fcb-eb7d-0415-23eefcd78b6a" draining
2018-04-11T23:42:03Z: Alloc "56f770c0-f8aa-4565-086d-01faa977f82d" status running -> complete
2018-04-11T23:42:03Z: Alloc "5b4d6db5-3fcb-eb7d-0415-23eefcd78b6a" status running -> complete
2018-04-11T23:42:22Z: Alloc "78b6b393-d29c-d8f8-e8e8-28931c0013ee" marked for migration
2018-04-11T23:42:22Z: Alloc "78b6b393-d29c-d8f8-e8e8-28931c0013ee" draining
2018-04-11T23:42:27Z: Alloc "78b6b393-d29c-d8f8-e8e8-28931c0013ee" status running -> complete
2018-04-11T23:42:29Z: Alloc "9a51a484-8c43-aa4e-d60a-46cfd1450780" marked for migration
2018-04-11T23:42:29Z: Alloc "9a51a484-8c43-aa4e-d60a-46cfd1450780" draining
2018-04-11T23:42:29Z: Node "46f1c6c4-a0e5-21f6-fd5c-d76c3d84e806" drain complete
2018-04-11T23:42:34Z: Alloc "9a51a484-8c43-aa4e-d60a-46cfd1450780" status running -> complete
2018-04-11T23:42:34Z: All allocations on node "46f1c6c4-a0e5-21f6-fd5c-d76c3d84e806" have stopped.
```

There are a couple of important events to notice in the output. First, only 2
allocations are migrated initially:

```
2018-04-11T23:41:57Z: Alloc "5b4d6db5-3fcb-eb7d-0415-23eefcd78b6a" marked for migration
2018-04-11T23:41:57Z: Alloc "56f770c0-f8aa-4565-086d-01faa977f82d" marked for migration
```

This is because `max_parallel = 2` in the job specification. The next
allocation on the draining node waits to be migrated:

```
2018-04-11T23:42:22Z: Alloc "78b6b393-d29c-d8f8-e8e8-28931c0013ee" marked for migration
```

Note that this occurs 25 seconds after the initial migrations. The 25 second
delay is because a replacement allocation took 10 seconds to become healthy
and then `min_healthy_time = "15s"` meant node draining waited an additional
15 seconds. If the replacement allocation had failed within that time, the
node drain would not have continued until a replacement could be successfully
made.

### Scheduling Eligibility

Now that the example drain has finished we can inspect the state of the drained
node:

```text
$ nomad node status
ID        DC   Name     Class   Drain  Eligibility  Status
46f1c6c4  dc1  nomad-5  <none>  false  ineligible   ready
96b52ad8  dc1  nomad-6  <none>  false  eligible     ready
f7476465  dc1  nomad-4  <none>  false  eligible     ready
```

While node `46f1` has `Drain = false`, notice that its `Eligibility =
ineligible`. Node scheduling eligibility is a new field in Nomad 0.8. When a
node is ineligible for scheduling the scheduler will not consider it for new
placements.

While draining, a node will always be ineligible for scheduling. Once draining
completes it will remain ineligible to prevent refilling a newly drained node.

However, by default canceling a drain with the `-disable` option will reset a
node to be eligible for scheduling. To cancel a drain while preserving the
node's ineligible status, use the `-keep-ineligible` option.
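
For example, to cancel the drain on node `46f1` while leaving it ineligible
for scheduling, an operator could run something along these lines (reusing the
shortened node ID from the example above):

```text
$ nomad node drain -disable -keep-ineligible -yes 46f1
```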

Scheduling eligibility can be toggled independently of node drains by using the
[`nomad node eligibility`][eligibility] command.
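
For instance, once a drained node has been repaired, it could be made eligible
for new placements again with a command along the lines of:

```text
$ nomad node eligibility -enable 46f1
```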

### Node Drain Deadline

Sometimes a drain is unable to proceed and complete normally. This could be
caused by there not being enough capacity in the cluster to replace the drained
allocations, or by replacement allocations failing to start successfully in a
timely fashion.

Operators may specify a deadline on node drains to prevent drains from getting
stuck. Once the deadline is reached, all remaining allocations on the node are
stopped regardless of `migrate` stanza parameters.

The default deadline is 1 hour and may be changed with the
[`-deadline`][deadline] command line option. The [`-force`][force] option is
like an instant deadline: all allocations are immediately stopped. The
[`-no-deadline`][no-deadline] option disables the deadline so a drain may
continue indefinitely.
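
As a rough sketch of how these options are passed (assuming a duration string
such as `30m` for `-deadline`, and reusing the node ID prefix from the earlier
example), a drain could be started with a custom deadline, an immediate
deadline, or no deadline at all:

```text
$ nomad node drain -enable -deadline 30m -yes 46f1
$ nomad node drain -enable -force -yes 46f1
$ nomad node drain -enable -no-deadline -yes 46f1
```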

Like all other drain parameters, a drain's deadline can be updated by making
subsequent `nomad node drain ...` calls with updated values.
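
For example, if a drain started with a 30 minute deadline turned out to need
more time, a follow-up call along these lines could extend it:

```text
$ nomad node drain -enable -deadline 2h -yes 46f1
```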

## Node Drains and Non-Service Jobs

So far we have only seen how draining works with service jobs. Both batch and
system jobs have different behaviors during node drains.

### Draining Batch Jobs

Node drains only migrate batch jobs once the drain's deadline has been reached.
For node drains without a deadline the drain will not complete until all batch
jobs on the node have completed (or failed).

The goal of this behavior is to avoid losing the progress a batch job has made
by forcing it to exit early.

### Keeping System Jobs Running

Node drains only stop system jobs once all other allocations have exited. This
way if a node is running a log shipping daemon or metrics collector as a system
job, it will continue to run as long as there are other services running.

The [`-ignore-system`][ignore-system] option leaves system jobs running even
after all other allocations have exited. This is useful when system jobs are
used to monitor Nomad itself or other system properties.
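
A drain that intentionally leaves system jobs in place might therefore look
something like this (again using the shortened node ID from the example above):

```text
$ nomad node drain -enable -ignore-system -yes 46f1
```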

## Draining Multiple Nodes

A common operation is to decommission an entire class of nodes at once. Prior
to Nomad 0.8 this was a problematic operation as the first node to begin
draining may migrate all of its allocations to the next node about to be
drained. In pathological cases this could repeat on each node to be drained and
cause allocations to be rescheduled repeatedly.

As of Nomad 0.8 an operator can avoid this churn by marking nodes ineligible
for scheduling before draining them using the [`nomad node
eligibility`][eligibility] command:

```text
$ nomad node eligibility -disable 46f1
Node "46f1c6c4-a0e5-21f6-fd5c-d76c3d84e806" scheduling eligibility set: ineligible for scheduling

$ nomad node eligibility -disable 96b5
Node "96b52ad8-e9ad-1084-c14f-0e11f10772e4" scheduling eligibility set: ineligible for scheduling

$ nomad node status
ID        DC   Name     Class   Drain  Eligibility  Status
46f1c6c4  dc1  nomad-5  <none>  false  ineligible   ready
96b52ad8  dc1  nomad-6  <none>  false  ineligible   ready
f7476465  dc1  nomad-4  <none>  false  eligible     ready
```

Now that both `nomad-5` and `nomad-6` are ineligible for scheduling, they can
be drained without risking placing allocations on an _about-to-be-drained_
node.
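
A sketch of the remaining steps, reusing the shortened node IDs from above,
could then simply drain each node in turn:

```text
$ nomad node drain -enable -yes 46f1
$ nomad node drain -enable -yes 96b5
```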

[deadline]: /docs/commands/node/drain.html#deadline
[eligibility]: /docs/commands/node/eligibility.html
[force]: /docs/commands/node/drain.html#force
[ignore-system]: /docs/commands/node/drain.html#ignore-system
[migrate]: /docs/job-specification/migrate.html
[no-deadline]: /docs/commands/node/drain.html#no-deadline

@@ -57,6 +57,10 @@
        </ul>
      </li>

      <li<%= sidebar_current("guides-decommissioning-nodes") %>>
        <a href="/guides/node-draining.html">Decommissioning Nodes</a>
      </li>

      <li<%= sidebar_current("guides-namespaces") %>>
        <a href="/guides/namespaces.html">Namespaces</a>
      </li>