Merge pull request #4085 from hashicorp/docs-node-drain
Initial Node drain docs
commit
3495df7da9
|
@ -1,9 +1,15 @@
|
|||
## 0.8 (Unreleased)
|
||||
|
||||
__BACKWARDS INCOMPATIBILITIES:__
|
||||
* cli: node drain now blocks until the drain completes and all allocations on
|
||||
the draining node have stopped. Use -detach for the old behavior.
|
||||
* discovery: Prevent absolute URLs in check paths. The documentation indicated
|
||||
that absolute URLs are not allowed, but it was not enforced. Absolute URLs
|
||||
in HTTP check paths will now fail to validate. [[GH-3685](https://github.com/hashicorp/nomad/issues/3685)]
|
||||
* drain: Draining a node no longer stops all allocations immediately: a new
|
||||
[migrate stanza](https://www.nomadproject.io/docs/job-specification/migrate.html)
|
||||
allows jobs to specify how quickly task groups can be drained. A `-force`
|
||||
option can be used to emulate the old drain behavior.
|
||||
* jobspec: The default values for restart policy have changed. Restart policy mode defaults to "fail" and the
|
||||
attempts/time interval values have been changed to enable faster server side rescheduling. See
|
||||
[restart stanza](https://www.nomadproject.io/docs/job-specification/restart.html) for more information.
|
||||
|
@ -21,6 +27,9 @@ IMPROVEMENTS:
|
|||
* core: Servers can now retry connecting to Vault to verify tokens without requiring a SIGHUP to do so [[GH-3957](https://github.com/hashicorp/nomad/issues/3957)]
|
||||
* core: Updated yamux library to pick up memory and CPU performance improvements [[GH-3980](https://github.com/hashicorp/nomad/issues/3980)]
|
||||
* core: Client stanza now supports overriding total memory [[GH-4052](https://github.com/hashicorp/nomad/issues/4052)]
|
||||
* core: Node draining is now able to migrate allocations in a controlled
|
||||
manner with parameters specified by the drain command and in job files using
|
||||
the migrate stanza [[GH-4010](https://github.com/hashicorp/nomad/issues/4010)]
|
||||
* acl: Increase token name limit from 64 characters to 256 [[GH-3888](https://github.com/hashicorp/nomad/issues/3888)]
|
||||
* cli: Node status and filesystem related commands do not require direct
|
||||
network access to the Nomad client nodes [[GH-3892](https://github.com/hashicorp/nomad/issues/3892)]
|
||||
|
|
|
@ -159,6 +159,35 @@ job "example" {
|
|||
canary = 0
|
||||
}
|
||||
|
||||
# The migrate stanza specifies the group's strategy for migrating off of
|
||||
# draining nodes. If omitted, a default migration strategy is applied.
|
||||
#
|
||||
# For more information on the "migrate" stanza, please see
|
||||
# the online documentation at:
|
||||
#
|
||||
# https://www.nomadproject.io/docs/job-specification/migrate.html
|
||||
#
|
||||
migrate {
|
||||
# Specifies the number of allocations that can be migrated at the same
|
||||
# time. This number must be less than the total count for the group as
|
||||
# (count - max_parallel) will be left running during migrations.
|
||||
max_parallel = 1
|
||||
|
||||
# Specifies the mechanism by which allocation health is determined. The
|
||||
# potential values are "checks" or "task_states".
|
||||
health_check = "checks"
|
||||
|
||||
# Specifies the minimum time the allocation must be in the healthy state
|
||||
# before it is marked as healthy and unblocks further allocations from being
|
||||
# migrated. This is specified using a label suffix like "30s" or "15m".
|
||||
min_healthy_time = "10s"
|
||||
|
||||
# Specifies the deadline by which the allocation must be marked as healthy,
|
||||
# after which the allocation is automatically transitioned to unhealthy. This
|
||||
# is specified using a label suffix like "2m" or "1h".
|
||||
healthy_deadline = "5m"
|
||||
}
|
||||
|
||||
# The "group" stanza defines a series of tasks that should be co-located on
|
||||
# the same Nomad client. Any task within a group will be placed on the same
|
||||
# client.
|
||||
|
|
|
@ -8,8 +8,8 @@ description: >
|
|||
|
||||
# Command: job deployments
|
||||
|
||||
The `job dispatch` command is used to display the deployments for a particular
|
||||
job.
|
||||
The `job deployments` command is used to display the deployments for a
|
||||
particular job.
|
||||
|
||||
## Usage
|
||||
|
||||
|
|
|
@ -18,9 +18,11 @@ Run `nomad node <subcommand> -h` for help on that subcommand. The following
|
|||
subcommands are available:
|
||||
|
||||
* [`node config`][config] - View or modify client configuration details
|
||||
* [`node drain`][drain] - Toggle drain mode on a given node
|
||||
* [`node drain`][drain] - Set drain mode on a given node
|
||||
* [`node eligibility`][eligibility] - Toggle scheduling eligibility on a given node
|
||||
* [`node status`][status] - Display status information about nodes
|
||||
|
||||
[config]: /docs/commands/node/config.html "View or modify client configuration details"
|
||||
[drain]: /docs/commands/node/drain.html "Toggle drain mode on a given node"
|
||||
[drain]: /docs/commands/node/drain.html "Set drain mode on a given node"
|
||||
[eligibility]: /docs/commands/node/eligibility.html "Toggle scheduling eligibility on a given node"
|
||||
[status]: /docs/commands/node/status.html "Display status information about nodes"
|
||||
|
|
|
@ -10,7 +10,20 @@ description: >
|
|||
|
||||
The `node drain` command is used to toggle drain mode on a given node. Drain
|
||||
mode prevents any new tasks from being allocated to the node, and begins
|
||||
migrating all existing allocations away.
|
||||
migrating all existing allocations away. Allocations will be migrated according
|
||||
to their [`migrate`][migrate] stanza until the drain's deadline is reached.
|
||||
|
||||
By default the `node drain` command blocks until a node is done draining and
|
||||
all allocations have terminated. Canceling the `node drain` command *will not*
|
||||
cancel the drain. Drains may be canceled by using the `-disable` parameter
|
||||
below.
|
||||
|
||||
When draining more than one node at a time, it is recommended you first disable
|
||||
[scheduling eligibility][eligibility] on all nodes that will be drained. For
|
||||
example, if you are decommissioning an entire class of nodes, first run `node
|
||||
eligibility -disable` on all of their node IDs, and then run `node drain
|
||||
-enable`. This will ensure allocations drained from the first node are not
|
||||
placed on another node about to be drained.
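
The two-pass order described above can be sketched as a small shell function. This is only an illustration, not an official helper: the `drain_nodes` name and the `NOMAD` override variable are hypothetical, and the function uses only the `node eligibility -disable`, `node drain -enable`, and `-yes` flags documented on these pages.

```shell
#!/bin/sh
# NOMAD is a hypothetical override so the function can be dry-run,
# for example with NOMAD=echo. It defaults to the real binary.
NOMAD="${NOMAD:-nomad}"

# drain_nodes is a hypothetical helper name, not part of the Nomad CLI.
drain_nodes() {
  # Pass 1: mark every node ineligible so that allocations drained later
  # are not placed on a node that is itself about to be drained.
  for node in "$@"; do
    "$NOMAD" node eligibility -disable "$node" || return 1
  done

  # Pass 2: drain the nodes one at a time. Each drain blocks until it
  # completes; -yes answers the confirmation prompt automatically.
  for node in "$@"; do
    "$NOMAD" node drain -enable -yes "$node" || return 1
  done
}
```

A call such as `drain_nodes 4d2ba53b f4e8a9e5` would then disable eligibility on both nodes before the first drain starts.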
|
||||
|
||||
The [node status](/docs/commands/node/status.html) command complements this
|
||||
nicely by providing the current drain status of a given node.
|
||||
|
@ -37,6 +50,19 @@ operation is desired.
|
|||
|
||||
* `-enable`: Enable node drain mode.
|
||||
* `-disable`: Disable node drain mode.
|
||||
* `-deadline`: Set the deadline by which all allocations must be moved off the
|
||||
node. Remaining allocations after the deadline are force removed from the
|
||||
node. Defaults to 1 hour.
|
||||
* `-detach`: Return immediately instead of entering monitor mode.
|
||||
* `-force`: Force remove allocations off the node immediately.
|
||||
* `-no-deadline`: No deadline allows the allocations to drain off the node
|
||||
without being force stopped after a certain deadline.
|
||||
* `-ignore-system`: Ignore system allows the drain to complete without stopping
|
||||
system job allocations. By default system jobs are stopped last.
|
||||
* `-keep-ineligible`: Keep ineligible will maintain the node's scheduling
|
||||
ineligibility even if the drain is being disabled. This is useful when an
|
||||
existing drain is being canceled but additional scheduling on the node is not
|
||||
desired.
|
||||
* `-self`: Drain the local node.
|
||||
* `-yes`: Automatic yes to prompts.
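
As a rough illustration of how these options combine, the sketch below starts a drain with an explicit deadline and returns immediately instead of monitoring. The `drain_with_deadline` name and the `NOMAD` override variable are hypothetical, and the duration format (for example `30m`) is an assumption that the flag accepts Go-style duration strings.

```shell
#!/bin/sh
# NOMAD is a hypothetical override (for example NOMAD=echo for a dry run).
NOMAD="${NOMAD:-nomad}"

# drain_with_deadline is a hypothetical helper name.
#   $1: node ID or prefix
#   $2: deadline, assumed to be a duration string such as "30m"
drain_with_deadline() {
  node="$1"
  deadline="$2"
  # -detach returns right away; the drain itself keeps running on the
  # servers and can be canceled later with `node drain -disable`.
  "$NOMAD" node drain -enable -yes -detach -deadline "$deadline" "$node"
}
```

For instance, `drain_with_deadline 4d2ba53b 30m` would ask for all allocations to be off the node within thirty minutes, under the assumptions above.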
|
||||
|
||||
|
@ -45,11 +71,46 @@ operation is desired.
|
|||
Enable drain mode on node with ID prefix "f4e8a9e5":
|
||||
|
||||
```
|
||||
$ nomad node drain -enable 4d2ba53b
|
||||
$ nomad node drain -enable f4e8a9e5
|
||||
Are you sure you want to enable drain mode for node "f4e8a9e5-30d8-3536-1e6f-cda5c869c35e"? [y/N] y
|
||||
2018-03-30T23:13:16Z: Ctrl-C to stop monitoring: will not cancel the node drain
|
||||
2018-03-30T23:13:16Z: Node "f4e8a9e5-30d8-3536-1e6f-cda5c869c35e" drain strategy set
|
||||
2018-03-30T23:13:17Z: Alloc "1877230b-64d3-a7dd-9c31-dc5ad3c93e9a" marked for migration
|
||||
2018-03-30T23:13:17Z: Alloc "1877230b-64d3-a7dd-9c31-dc5ad3c93e9a" draining
|
||||
2018-03-30T23:13:17Z: Alloc "1877230b-64d3-a7dd-9c31-dc5ad3c93e9a" status running -> complete
|
||||
2018-03-30T23:13:29Z: Alloc "3fce5308-818c-369e-0bb7-f61f0a1be9ed" marked for migration
|
||||
2018-03-30T23:13:29Z: Alloc "3fce5308-818c-369e-0bb7-f61f0a1be9ed" draining
|
||||
2018-03-30T23:13:30Z: Alloc "3fce5308-818c-369e-0bb7-f61f0a1be9ed" status running -> complete
|
||||
2018-03-30T23:13:41Z: Alloc "9a98c5aa-a719-2f34-ecfc-0e6268b5d537" marked for migration
|
||||
2018-03-30T23:13:41Z: Alloc "9a98c5aa-a719-2f34-ecfc-0e6268b5d537" draining
|
||||
2018-03-30T23:13:41Z: Node "f4e8a9e5-30d8-3536-1e6f-cda5c869c35e" drain complete
|
||||
2018-03-30T23:13:42Z: Alloc "9a98c5aa-a719-2f34-ecfc-0e6268b5d537" status running -> complete
|
||||
2018-03-30T23:13:42Z: All allocations on node "f4e8a9e5-30d8-3536-1e6f-cda5c869c35e" have stopped.
|
||||
```
|
||||
|
||||
Enable drain mode on the local node:
|
||||
|
||||
```
|
||||
$ nomad node drain -enable -self
|
||||
...
|
||||
```
|
||||
|
||||
Enable drain mode but do not stop system jobs:
|
||||
|
||||
```
|
||||
$ nomad node drain -enable -ignore-system 4d2ba53b
|
||||
...
|
||||
```
|
||||
|
||||
Disable drain mode but keep the node ineligible for scheduling. Useful for
|
||||
inspecting the current state of a misbehaving node without Nomad trying to
|
||||
start or migrate allocations:
|
||||
|
||||
```
|
||||
$ nomad node drain -disable -keep-ineligible 4d2ba53b
|
||||
...
|
||||
```
|
||||
|
||||
|
||||
[eligibility]: /docs/commands/node/eligibility.html
|
||||
[migrate]: /docs/job-specification/migrate.html
|
||||
|
|
|
@ -0,0 +1,71 @@
|
|||
---
|
||||
layout: "docs"
|
||||
page_title: "Commands: node eligibility"
|
||||
sidebar_current: "docs-commands-node-eligibility"
|
||||
description: >
|
||||
The node eligibility command is used to configure a node's scheduling
|
||||
eligibility.
|
||||
---
|
||||
|
||||
# Command: node eligibility
|
||||
|
||||
The `node eligibility` command is used to toggle scheduling eligibility for a
|
||||
given node. By default, nodes are eligible for scheduling, meaning they can
|
||||
receive placements and run new allocations. Nodes that have their scheduling
|
||||
eligibility disabled are ineligible for new placements.
|
||||
|
||||
The [`node drain`][drain] command automatically disables eligibility. Disabling
|
||||
a drain restores eligibility by default.
|
||||
|
||||
Disabling scheduling eligibility is useful when draining a set of nodes: first
|
||||
disable eligibility on each node that will be drained. Then drain each node.
|
||||
If you just drain each node, allocations may get rescheduled multiple times as
|
||||
they get placed on nodes about to be drained!
|
||||
|
||||
Disabling scheduling eligibility may also be useful when investigating poorly
|
||||
behaved nodes. It allows operators to investigate the current state of a node
|
||||
without the risk of additional work being assigned to it.
|
||||
|
||||
## Usage
|
||||
|
||||
```
|
||||
nomad node eligibility [options] <node>
|
||||
```
|
||||
|
||||
A `-self` flag can be used to toggle eligibility of the local node. If this is
|
||||
not supplied, a node ID or prefix must be provided. If there is an exact match,
|
||||
the eligibility will be adjusted for that node. Otherwise, a list of matching
|
||||
nodes and information will be displayed.
|
||||
|
||||
It is also required to pass one of `-enable` or `-disable`, depending on which
|
||||
operation is desired.
|
||||
|
||||
## General Options
|
||||
|
||||
<%= partial "docs/commands/_general_options" %>
|
||||
|
||||
## Eligibility Options
|
||||
|
||||
* `-enable`: Enable scheduling eligibility.
|
||||
* `-disable`: Disable scheduling eligibility.
|
||||
* `-self`: Set eligibility for the local node.
|
||||
* `-yes`: Automatic yes to prompts.
|
||||
|
||||
## Examples
|
||||
|
||||
Enable scheduling eligibility on node with ID prefix "574545c5":
|
||||
|
||||
```
|
||||
$ nomad node eligibility -enable 574545c5
|
||||
Node "574545c5-c2d7-e352-d505-5e2cb9fe169f" scheduling eligibility set: eligible for scheduling
|
||||
```
|
||||
|
||||
Disable scheduling eligibility on the local node:
|
||||
|
||||
```
|
||||
$ nomad node eligibility -disable -self
|
||||
Node "574545c5-c2d7-e352-d505-5e2cb9fe169f" scheduling eligibility set: ineligible for scheduling
|
||||
```
|
||||
|
||||
|
||||
[drain]: /docs/commands/node/drain.html
|
|
@ -0,0 +1,84 @@
|
|||
---
|
||||
layout: "docs"
|
||||
page_title: "migrate Stanza - Job Specification"
|
||||
sidebar_current: "docs-job-specification-migrate"
|
||||
description: |-
|
||||
The "migrate" stanza specifies the group's migrate strategy. The migrate
|
||||
strategy is used to control the job's behavior when it is being migrated off
|
||||
of a draining node.
|
||||
---
|
||||
|
||||
# `migrate` Stanza
|
||||
|
||||
<table class="table table-bordered table-striped">
|
||||
<tr>
|
||||
<th width="120">Placement</th>
|
||||
<td>
|
||||
<code>job -> **migrate**</code>
|
||||
</td>
|
||||
<td>
|
||||
<code>job -> group -> **migrate**</code>
|
||||
</td>
|
||||
</tr>
|
||||
</table>
|
||||
|
||||
The `migrate` stanza specifies the group's strategy for migrating off of
|
||||
[draining][drain] nodes. If omitted, a default migration strategy is applied.
|
||||
If specified at the job level, the configuration will apply to all groups
|
||||
within the job. Only service jobs with a count greater than 1 support migrate
|
||||
stanzas.
|
||||
|
||||
```hcl
|
||||
job "docs" {
|
||||
migrate {
|
||||
max_parallel = 1
|
||||
health_check = "checks"
|
||||
min_healthy_time = "10s"
|
||||
healthy_deadline = "5m"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
When one or more nodes are draining, only `max_parallel` allocations will be
|
||||
stopped at a time. Node draining will not continue until replacement
|
||||
allocations have been healthy for their `min_healthy_time` or
|
||||
`healthy_deadline` is reached.
|
||||
|
||||
Note that a node's drain [deadline][deadline] will override the `migrate`
|
||||
stanza for allocations on that node. The `migrate` stanza is for job authors to
|
||||
define how their services should be migrated, while the node drain deadline is
|
||||
for system operators to put hard limits on how long a drain may take.
|
||||
|
||||
## `migrate` Parameters
|
||||
|
||||
- `max_parallel` `(int: 1)` - Specifies the number of allocations that can be
|
||||
migrated at the same time. This number must be less than the total
|
||||
[`count`][count] for the group as `count - max_parallel` will be left running
|
||||
during migrations.
|
||||
|
||||
- `health_check` `(string: "checks")` - Specifies the mechanism by which
|
||||
allocation health is determined. The potential values are:
|
||||
|
||||
- "checks" - Specifies that the allocation should be considered healthy when
|
||||
all of its tasks are running and their associated [checks][checks] are
|
||||
healthy, and unhealthy if any of the tasks fail or not all checks become
|
||||
healthy. This is a superset of "task_states" mode.
|
||||
|
||||
- "task_states" - Specifies that the allocation should be considered healthy when
|
||||
all its tasks are running and unhealthy if tasks fail.
|
||||
|
||||
- `min_healthy_time` `(string: "10s")` - Specifies the minimum time the
|
||||
allocation must be in the healthy state before it is marked as healthy and
|
||||
unblocks further allocations from being migrated. This is specified using a
|
||||
label suffix like "30s" or "15m".
|
||||
|
||||
- `healthy_deadline` `(string: "5m")` - Specifies the deadline by which the
|
||||
allocation must be marked as healthy, after which the allocation is
|
||||
automatically transitioned to unhealthy. This is specified using a label
|
||||
suffix like "2m" or "1h".
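
The `max_parallel` rule can be checked with simple arithmetic. The sketch below is not Nomad's scheduler: `migration_plan` is a hypothetical helper, and the batch count assumes every allocation in the group sits on a draining node and that each batch becomes healthy before the next one starts.

```shell
#!/bin/sh
# migration_plan is a hypothetical helper that only does the arithmetic.
#   $1: group count
#   $2: max_parallel
migration_plan() {
  count="$1"
  max_parallel="$2"

  # max_parallel must be less than the group's count.
  if [ "$max_parallel" -ge "$count" ]; then
    echo "max_parallel must be less than count" >&2
    return 1
  fi

  # Allocations left running while one batch is being migrated.
  echo "running during migration: $((count - max_parallel))"

  # Batches needed if all allocations must move (ceiling division).
  echo "batches: $(( (count + max_parallel - 1) / max_parallel ))"
}
```

With `migration_plan 5 2`, three allocations stay running during each batch and three batches are needed under the stated assumptions.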
|
||||
|
||||
|
||||
[checks]: /docs/job-specification/service.html#check-parameters
|
||||
[count]: /docs/job-specification/group.html#count
|
||||
[drain]: /docs/commands/node/drain.html
|
||||
[deadline]: /docs/commands/node/drain.html#deadline
|
|
@ -53,6 +53,9 @@
|
|||
<li<%= sidebar_current("docs-job-specification-meta")%>>
|
||||
<a href="/docs/job-specification/meta.html">meta</a>
|
||||
</li>
|
||||
<li<%= sidebar_current("docs-job-specification-migrate")%>>
|
||||
<a href="/docs/job-specification/migrate.html">migrate</a>
|
||||
</li>
|
||||
<li<%= sidebar_current("docs-job-specification-network")%>>
|
||||
<a href="/docs/job-specification/network.html">network</a>
|
||||
</li>
|
||||
|
@ -324,6 +327,9 @@
|
|||
<li<%= sidebar_current("docs-commands-node-drain") %>>
|
||||
<a href="/docs/commands/node/drain.html">drain</a>
|
||||
</li>
|
||||
<li<%= sidebar_current("docs-commands-node-eligibility") %>>
|
||||
<a href="/docs/commands/node/eligibility.html">eligibility</a>
|
||||
</li>
|
||||
<li<%= sidebar_current("docs-commands-node-status") %>>
|
||||
<a href="/docs/commands/node/status.html">status</a>
|
||||
</li>
|
||||
|
|