Merge pull request #4085 from hashicorp/docs-node-drain

Initial Node drain docs
Michael Schurter 2018-03-30 16:34:49 -07:00 committed by GitHub
commit 3495df7da9
GPG Key ID: 4AEE18F83AFDEB23
8 changed files with 268 additions and 6 deletions

@@ -1,9 +1,15 @@
## 0.8 (Unreleased)
__BACKWARDS INCOMPATIBILITIES:__
* cli: `node drain` now blocks until the drain completes and all allocations on
the draining node have stopped. Use `-detach` for the old behavior.
* discovery: Prevent absolute URLs in check paths. The documentation indicated
that absolute URLs are not allowed, but it was not enforced. Absolute URLs
in HTTP check paths will now fail to validate. [[GH-3685](https://github.com/hashicorp/nomad/issues/3685)]
* drain: Draining a node no longer stops all allocations immediately: a new
[migrate stanza](https://www.nomadproject.io/docs/job-specification/migrate.html)
allows jobs to specify how quickly task groups can be drained. A `-force`
option can be used to emulate the old drain behavior.
* jobspec: The default values for restart policy have changed. Restart policy mode defaults to "fail" and the
attempts/time interval values have been changed to enable faster server side rescheduling. See
[restart stanza](https://www.nomadproject.io/docs/job-specification/restart.html) for more information.
@@ -21,6 +27,9 @@ IMPROVEMENTS:
* core: Servers can now retry connecting to Vault to verify tokens without requiring a SIGHUP to do so [[GH-3957](https://github.com/hashicorp/nomad/issues/3957)]
* core: Updated yamux library to pick up memory and CPU performance improvements [[GH-3980](https://github.com/hashicorp/nomad/issues/3980)]
* core: Client stanza now supports overriding total memory [[GH-4052](https://github.com/hashicorp/nomad/issues/4052)]
* core: Node draining is now able to migrate allocations in a controlled
manner with parameters specified by the drain command and in job files using
the migrate stanza [[GH-4010](https://github.com/hashicorp/nomad/issues/4010)]
* acl: Increase token name limit from 64 characters to 256 [[GH-3888](https://github.com/hashicorp/nomad/issues/3888)]
* cli: Node status and filesystem related commands do not require direct
network access to the Nomad client nodes [[GH-3892](https://github.com/hashicorp/nomad/issues/3892)]

@@ -159,6 +159,35 @@ job "example" {
canary = 0
}
# The migrate stanza specifies the group's strategy for migrating off of
# draining nodes. If omitted, a default migration strategy is applied.
#
# For more information on the "migrate" stanza, please see
# the online documentation at:
#
# https://www.nomadproject.io/docs/job-specification/migrate.html
#
migrate {
# Specifies the number of allocations that can be migrated at the same
# time. This number must be less than the total count for the group as
# (count - max_parallel) will be left running during migrations.
max_parallel = 1
# Specifies the mechanism by which allocation health is determined. The
# potential values are "checks" or "task_states".
health_check = "checks"
# Specifies the minimum time the allocation must be in the healthy state
# before it is marked as healthy and unblocks further allocations from being
# migrated. This is specified using a time unit suffix like "30s" or "15m".
min_healthy_time = "10s"
# Specifies the deadline by which the allocation must be marked as healthy;
# after the deadline passes, the allocation is automatically transitioned to
# unhealthy. This is specified using a time unit suffix like "2m" or "1h".
healthy_deadline = "5m"
}
# The "group" stanza defines a series of tasks that should be co-located on
# the same Nomad client. Any task within a group will be placed on the same
# client.

@@ -8,8 +8,8 @@ description: >
# Command: job deployments
The `job deployments` command is used to display the deployments for a
particular job.
## Usage

@@ -18,9 +18,11 @@ Run `nomad node <subcommand> -h` for help on that subcommand. The following
subcommands are available:
* [`node config`][config] - View or modify client configuration details
* [`node drain`][drain] - Set drain mode on a given node
* [`node eligibility`][eligibility] - Toggle scheduling eligibility on a given node
* [`node status`][status] - Display status information about nodes
[config]: /docs/commands/node/config.html "View or modify client configuration details"
[drain]: /docs/commands/node/drain.html "Set drain mode on a given node"
[eligibility]: /docs/commands/node/eligibility.html "Toggle scheduling eligibility on a given node"
[status]: /docs/commands/node/status.html "Display status information about nodes"

@@ -10,7 +10,20 @@ description: >
The `node drain` command is used to toggle drain mode on a given node. Drain
mode prevents any new tasks from being allocated to the node, and begins
migrating all existing allocations away. Allocations will be migrated according
to their [`migrate`][migrate] stanza until the drain's deadline is reached.
By default, the `node drain` command blocks until a node is done draining and
all allocations have terminated. Canceling the `node drain` command *will not*
cancel the drain. Drains may be canceled by using the `-disable` option
described below.
When draining more than one node at a time, it is recommended that you first
disable [scheduling eligibility][eligibility] on all nodes that will be drained.
For example, if you are decommissioning an entire class of nodes, first run
`node eligibility -disable` on all of their node IDs, and then run
`node drain -enable`. This ensures allocations drained from the first node are
not placed on another node that is about to be drained.
The [node status](/docs/commands/node/status.html) command complements this
nicely by providing the current drain status of a given node.
@@ -37,6 +50,19 @@ operation is desired.
* `-enable`: Enable node drain mode.
* `-disable`: Disable node drain mode.
* `-deadline`: Set the deadline by which all allocations must be moved off the
node. Remaining allocations after the deadline are force removed from the
node. Defaults to 1 hour.
* `-detach`: Return immediately instead of entering monitor mode.
* `-force`: Force remove allocations off the node immediately.
* `-no-deadline`: Do not set a deadline, so remaining allocations are never
force stopped and drain off the node only as their [`migrate`][migrate]
stanzas allow.
* `-ignore-system`: Allow the drain to complete without stopping system job
allocations. By default, system jobs are stopped last.
* `-keep-ineligible`: Keep the node ineligible for scheduling even when the
drain is disabled. This is useful when canceling an existing drain while
further scheduling on the node is not desired.
* `-self`: Drain the local node.
* `-yes`: Automatic yes to prompts.
@@ -45,11 +71,46 @@ operation is desired.
Enable drain mode on node with ID prefix "f4e8a9e5":
```
$ nomad node drain -enable f4e8a9e5
Are you sure you want to enable drain mode for node "f4e8a9e5-30d8-3536-1e6f-cda5c869c35e"? [y/N] y
2018-03-30T23:13:16Z: Ctrl-C to stop monitoring: will not cancel the node drain
2018-03-30T23:13:16Z: Node "f4e8a9e5-30d8-3536-1e6f-cda5c869c35e" drain strategy set
2018-03-30T23:13:17Z: Alloc "1877230b-64d3-a7dd-9c31-dc5ad3c93e9a" marked for migration
2018-03-30T23:13:17Z: Alloc "1877230b-64d3-a7dd-9c31-dc5ad3c93e9a" draining
2018-03-30T23:13:17Z: Alloc "1877230b-64d3-a7dd-9c31-dc5ad3c93e9a" status running -> complete
2018-03-30T23:13:29Z: Alloc "3fce5308-818c-369e-0bb7-f61f0a1be9ed" marked for migration
2018-03-30T23:13:29Z: Alloc "3fce5308-818c-369e-0bb7-f61f0a1be9ed" draining
2018-03-30T23:13:30Z: Alloc "3fce5308-818c-369e-0bb7-f61f0a1be9ed" status running -> complete
2018-03-30T23:13:41Z: Alloc "9a98c5aa-a719-2f34-ecfc-0e6268b5d537" marked for migration
2018-03-30T23:13:41Z: Alloc "9a98c5aa-a719-2f34-ecfc-0e6268b5d537" draining
2018-03-30T23:13:41Z: Node "f4e8a9e5-30d8-3536-1e6f-cda5c869c35e" drain complete
2018-03-30T23:13:42Z: Alloc "9a98c5aa-a719-2f34-ecfc-0e6268b5d537" status running -> complete
2018-03-30T23:13:42Z: All allocations on node "f4e8a9e5-30d8-3536-1e6f-cda5c869c35e" have stopped.
```
Enable drain mode on the local node:
```
$ nomad node drain -enable -self
...
```
Enable drain mode but do not stop system jobs:
```
$ nomad node drain -enable -ignore-system 4d2ba53b
...
```
Disable drain mode but keep the node ineligible for scheduling. Useful for
inspecting the current state of a misbehaving node without Nomad trying to
start or migrate allocations:
```
$ nomad node drain -disable -keep-ineligible 4d2ba53b
...
```
[eligibility]: /docs/commands/node/eligibility.html
[migrate]: /docs/job-specification/migrate.html

@@ -0,0 +1,71 @@
---
layout: "docs"
page_title: "Commands: node eligibility"
sidebar_current: "docs-commands-node-eligibility"
description: >
The node eligibility command is used to configure a node's scheduling
eligibility.
---
# Command: node eligibility
The `node eligibility` command is used to toggle scheduling eligibility for a
given node. By default, nodes are eligible for scheduling, meaning they can
receive placements and run new allocations. Nodes that have their scheduling
eligibility disabled are ineligible for new placements.
The [`node drain`][drain] command automatically disables eligibility. Disabling
a drain restores eligibility by default (see the drain command's
`-keep-ineligible` option).
Disabling scheduling eligibility is useful when draining a set of nodes: first
disable eligibility on each node that will be drained, then drain each node.
If you only drain each node, allocations may be rescheduled multiple times as
they are placed on nodes that are about to be drained.
Disabling scheduling eligibility may also be useful when investigating poorly
behaved nodes. It allows operators to investigate the current state of a node
without the risk of additional work being assigned to it.
## Usage
```
nomad node eligibility [options] <node>
```
A `-self` flag can be used to toggle eligibility of the local node. If this is
not supplied, a node ID or prefix must be provided. If there is an exact match,
the eligibility will be adjusted for that node. Otherwise, a list of matching
nodes and information will be displayed.
It is also required to pass one of `-enable` or `-disable`, depending on which
operation is desired.
## General Options
<%= partial "docs/commands/_general_options" %>
## Eligibility Options
* `-enable`: Enable scheduling eligibility.
* `-disable`: Disable scheduling eligibility.
* `-self`: Set eligibility for the local node.
* `-yes`: Automatic yes to prompts.
## Examples
Enable scheduling eligibility on node with ID prefix "574545c5":
```
$ nomad node eligibility -enable 574545c5
Node "574545c5-c2d7-e352-d505-5e2cb9fe169f" scheduling eligibility set: eligible for scheduling
```
Disable scheduling eligibility on the local node:
```
$ nomad node eligibility -disable -self
Node "574545c5-c2d7-e352-d505-5e2cb9fe169f" scheduling eligibility set: ineligible for scheduling
```
[drain]: /docs/commands/node/drain.html

@@ -0,0 +1,84 @@
---
layout: "docs"
page_title: "migrate Stanza - Job Specification"
sidebar_current: "docs-job-specification-migrate"
description: |-
The "migrate" stanza specifies the group's migrate strategy. The migrate
strategy is used to control the job's behavior when it is being migrated off
of a draining node.
---
# `migrate` Stanza
<table class="table table-bordered table-striped">
<tr>
<th width="120">Placement</th>
<td>
<code>job -> **migrate**</code>
</td>
<td>
<code>job -> group -> **migrate**</code>
</td>
</tr>
</table>
The `migrate` stanza specifies the group's strategy for migrating off of
[draining][drain] nodes. If omitted, a default migration strategy is applied.
If specified at the job level, the configuration will apply to all groups
within the job. Only service jobs with a count greater than 1 support migrate
stanzas.
```hcl
job "docs" {
migrate {
max_parallel = 1
health_check = "checks"
min_healthy_time = "10s"
healthy_deadline = "5m"
}
}
```
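Per the placement table, `migrate` can also be set inside a `group`. The sketch below sets a job-level default and overrides `max_parallel` for one group. The group names and counts are hypothetical, tasks are omitted so this is a fragment rather than a complete job, and it assumes that a group-level `migrate` takes precedence over the job-level one for the fields it sets.

```hcl
job "docs" {
  # Job-level default, applied to every group in the job.
  migrate {
    max_parallel     = 1
    min_healthy_time = "10s"
  }

  group "web" {
    count = 6

    # Group-level override (hypothetical values). Assumed to take precedence
    # over the job-level max_parallel for this group only.
    migrate {
      max_parallel = 2
    }

    # ... tasks omitted ...
  }

  group "cache" {
    count = 3

    # No migrate stanza here, so the job-level default applies.
    # ... tasks omitted ...
  }
}
```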
When one or more nodes are draining, only `max_parallel` allocations will be
stopped at a time. Node draining will not continue until replacement
allocations have been healthy for their `min_healthy_time` or
`healthy_deadline` is reached.
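As a worked example, consider the hypothetical group below with `count = 4` and `max_parallel = 2`: at most two of its allocations are stopped at once, leaving `count - max_parallel = 2` running. The remaining two are stopped only after the replacements have been healthy for 30 seconds, or once the 5 minute `healthy_deadline` is reached. The group name and durations are illustrative, and tasks are omitted.

```hcl
group "api" {
  count = 4

  migrate {
    # At most 2 of the 4 allocations are stopped at a time, so
    # count - max_parallel = 2 keep running during the drain.
    max_parallel = 2

    # Replacements must stay healthy for 30s before the next batch
    # is stopped...
    min_healthy_time = "30s"

    # ...unless 5m pass first, at which point the replacement is marked
    # unhealthy and draining continues.
    healthy_deadline = "5m"
  }

  # ... tasks omitted ...
}
```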
Note that a node's drain [deadline][deadline] will override the `migrate`
stanza for allocations on that node. The `migrate` stanza is for job authors to
define how their services should be migrated, while the node drain deadline is
for system operators to put hard limits on how long a drain may take.
## `migrate` Parameters
- `max_parallel` `(int: 1)` - Specifies the number of allocations that can be
migrated at the same time. This number must be less than the total
[`count`][count] for the group as `count - max_parallel` will be left running
during migrations.
- `health_check` `(string: "checks")` - Specifies the mechanism by which
allocation health is determined. The potential values are:
  - "checks" - Specifies that the allocation should be considered healthy when
    all of its tasks are running and their associated [checks][checks] are
    healthy, and unhealthy if any of the tasks fail or not all checks become
    healthy. This is a superset of "task_states" mode.
  - "task_states" - Specifies that the allocation should be considered healthy
    when all its tasks are running and unhealthy if tasks fail.
- `min_healthy_time` `(string: "10s")` - Specifies the minimum time the
allocation must be in the healthy state before it is marked as healthy and
unblocks further allocations from being migrated. This is specified using a
time unit suffix like "30s" or "15m".
- `healthy_deadline` `(string: "5m")` - Specifies the deadline by which the
allocation must be marked as healthy; after the deadline passes, the
allocation is automatically transitioned to unhealthy. This is specified using
a time unit suffix like "2m" or "1h".
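When a group's health should depend only on its tasks running, for example because its tasks define no service checks, `health_check` can be set to `"task_states"`. A minimal sketch; the group name, count, and durations are hypothetical and tasks are omitted:

```hcl
group "worker" {
  count = 3

  migrate {
    # Health is based only on task state: healthy once all tasks are
    # running, unhealthy if any task fails.
    health_check = "task_states"

    # max_parallel is left at its default of 1, so 2 allocations keep
    # running while each one is migrated.
    min_healthy_time = "15s"
    healthy_deadline = "2m"
  }

  # ... tasks omitted ...
}
```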
[checks]: /docs/job-specification/service.html#check-parameters
[count]: /docs/job-specification/group.html#count
[drain]: /docs/commands/node/drain.html
[deadline]: /docs/commands/node/drain.html#deadline

@@ -53,6 +53,9 @@
<li<%= sidebar_current("docs-job-specification-meta")%>>
<a href="/docs/job-specification/meta.html">meta</a>
</li>
<li<%= sidebar_current("docs-job-specification-migrate")%>>
<a href="/docs/job-specification/migrate.html">migrate</a>
</li>
<li<%= sidebar_current("docs-job-specification-network")%>>
<a href="/docs/job-specification/network.html">network</a>
</li>
@@ -324,6 +327,9 @@
<li<%= sidebar_current("docs-commands-node-drain") %>>
<a href="/docs/commands/node/drain.html">drain</a>
</li>
<li<%= sidebar_current("docs-commands-node-eligibility") %>>
<a href="/docs/commands/node/eligibility.html">eligibility</a>
</li>
<li<%= sidebar_current("docs-commands-node-status") %>>
<a href="/docs/commands/node/status.html">status</a>
</li>