Merge pull request #4085 from hashicorp/docs-node-drain
Initial Node drain docs
commit
3495df7da9
|
@ -1,9 +1,15 @@
|
||||||
## 0.8 (Unreleased)
|
## 0.8 (Unreleased)
|
||||||
|
|
||||||
__BACKWARDS INCOMPATIBILITIES:__
|
__BACKWARDS INCOMPATIBILITIES:__
|
||||||
|
* cli: node drain now blocks until the drain completes and all allocations on
|
||||||
|
the draining node have stopped. Use -detach for the old behavior.
|
||||||
* discovery: Prevent absolute URLs in check paths. The documentation indicated
|
* discovery: Prevent absolute URLs in check paths. The documentation indicated
|
||||||
that absolute URLs are not allowed, but it was not enforced. Absolute URLs
|
that absolute URLs are not allowed, but it was not enforced. Absolute URLs
|
||||||
in HTTP check paths will now fail to validate. [[GH-3685](https://github.com/hashicorp/nomad/issues/3685)]
|
in HTTP check paths will now fail to validate. [[GH-3685](https://github.com/hashicorp/nomad/issues/3685)]
|
||||||
|
* drain: Draining a node no longer stops all allocations immediately: a new
|
||||||
|
[migrate stanza](https://www.nomadproject.io/docs/job-specification/migrate.html)
|
||||||
|
allows jobs to specify how quickly task groups can be drained. A `-force`
|
||||||
|
option can be used to emulate the old drain behavior.
|
||||||
* jobspec: The default values for restart policy have changed. Restart policy mode defaults to "fail" and the
|
* jobspec: The default values for restart policy have changed. Restart policy mode defaults to "fail" and the
|
||||||
attempts/time interval values have been changed to enable faster server side rescheduling. See
|
attempts/time interval values have been changed to enable faster server side rescheduling. See
|
||||||
[restart stanza](https://www.nomadproject.io/docs/job-specification/restart.html) for more information.
|
[restart stanza](https://www.nomadproject.io/docs/job-specification/restart.html) for more information.
|
||||||
|
@ -21,6 +27,9 @@ IMPROVEMENTS:
|
||||||
* core: Servers can now retry connecting to Vault to verify tokens without requiring a SIGHUP to do so [[GH-3957](https://github.com/hashicorp/nomad/issues/3957)]
|
* core: Servers can now retry connecting to Vault to verify tokens without requiring a SIGHUP to do so [[GH-3957](https://github.com/hashicorp/nomad/issues/3957)]
|
||||||
* core: Updated yamux library to pick up memory and CPU performance improvements [[GH-3980](https://github.com/hashicorp/nomad/issues/3980)]
|
* core: Updated yamux library to pick up memory and CPU performance improvements [[GH-3980](https://github.com/hashicorp/nomad/issues/3980)]
|
||||||
* core: Client stanza now supports overriding total memory [[GH-4052](https://github.com/hashicorp/nomad/issues/4052)]
|
* core: Client stanza now supports overriding total memory [[GH-4052](https://github.com/hashicorp/nomad/issues/4052)]
|
||||||
|
* core: Node draining is now able to migrate allocations in a controlled
|
||||||
|
manner with parameters specified by the drain command and in job files using
|
||||||
|
the migrate stanza [[GH-4010](https://github.com/hashicorp/nomad/issues/4010)]
|
||||||
* acl: Increase token name limit from 64 characters to 256 [[GH-3888](https://github.com/hashicorp/nomad/issues/3888)]
|
* acl: Increase token name limit from 64 characters to 256 [[GH-3888](https://github.com/hashicorp/nomad/issues/3888)]
|
||||||
* cli: Node status and filesystem related commands do not require direct
|
* cli: Node status and filesystem related commands do not require direct
|
||||||
network access to the Nomad client nodes [[GH-3892](https://github.com/hashicorp/nomad/issues/3892)]
|
network access to the Nomad client nodes [[GH-3892](https://github.com/hashicorp/nomad/issues/3892)]
|
||||||
|
|
|
@ -159,6 +159,35 @@ job "example" {
|
||||||
canary = 0
|
canary = 0
|
||||||
}
|
}
|
||||||
|
|
||||||
|
# The migrate stanza specifies the group's strategy for migrating off of
|
||||||
|
# draining nodes. If omitted, a default migration strategy is applied.
|
||||||
|
#
|
||||||
|
# For more information on the "migrate" stanza, please see
|
||||||
|
# the online documentation at:
|
||||||
|
#
|
||||||
|
# https://www.nomadproject.io/docs/job-specification/migrate.html
|
||||||
|
#
|
||||||
|
migrate {
|
||||||
|
# Specifies the number of allocations that can be migrated at the same
|
||||||
|
# time. This number must be less than the total count for the group as
|
||||||
|
# (count - max_parallel) will be left running during migrations.
|
||||||
|
max_parallel = 1
|
||||||
|
|
||||||
|
# Specifies the mechanism by which allocation health is determined. The
|
||||||
|
# potential values are "checks" or "task_states".
|
||||||
|
health_check = "checks"
|
||||||
|
|
||||||
|
# Specifies the minimum time the allocation must be in the healthy state
|
||||||
|
# before it is marked as healthy and unblocks further allocations from being
|
||||||
|
# migrated. This is specified using a label suffix like "30s" or "15m".
|
||||||
|
min_healthy_time = "10s"
|
||||||
|
|
||||||
|
# Specifies the deadline in which the allocation must be marked as healthy
|
||||||
|
# after which the allocation is automatically transitioned to unhealthy. This
|
||||||
|
# is specified using a label suffix like "2m" or "1h".
|
||||||
|
healthy_deadline = "5m"
|
||||||
|
}
|
||||||
|
|
||||||
# The "group" stanza defines a series of tasks that should be co-located on
|
# The "group" stanza defines a series of tasks that should be co-located on
|
||||||
# the same Nomad client. Any task within a group will be placed on the same
|
# the same Nomad client. Any task within a group will be placed on the same
|
||||||
# client.
|
# client.
|
||||||
|
|
|
@ -8,8 +8,8 @@ description: >
|
||||||
|
|
||||||
# Command: job deployments
|
# Command: job deployments
|
||||||
|
|
||||||
The `job dispatch` command is used to display the deployments for a particular
|
The `job deployments` command is used to display the deployments for a
|
||||||
job.
|
particular job.
|
||||||
|
|
||||||
## Usage
|
## Usage
|
||||||
|
|
||||||
|
|
|
@ -18,9 +18,11 @@ Run `nomad node <subcommand> -h` for help on that subcommand. The following
|
||||||
subcommands are available:
|
subcommands are available:
|
||||||
|
|
||||||
* [`node config`][config] - View or modify client configuration details
|
* [`node config`][config] - View or modify client configuration details
|
||||||
* [`node drain`][drain] - Toggle drain mode on a given node
|
* [`node drain`][drain] - Set drain mode on a given node
|
||||||
|
* [`node eligibility`][eligibility] - Toggle scheduling eligibility on a given node
|
||||||
* [`node status`][status] - Display status information about nodes
|
* [`node status`][status] - Display status information about nodes
|
||||||
|
|
||||||
[config]: /docs/commands/node/config.html "View or modify client configuration details"
|
[config]: /docs/commands/node/config.html "View or modify client configuration details"
|
||||||
[drain]: /docs/commands/node/drain.html "Toggle drain mode on a given node"
|
[drain]: /docs/commands/node/drain.html "Set drain mode on a given node"
|
||||||
|
[eligibility]: /docs/commands/node/eligibility.html "Toggle scheduling eligibility on a given node"
|
||||||
[status]: /docs/commands/node/status.html "Display status information about nodes"
|
[status]: /docs/commands/node/status.html "Display status information about nodes"
|
||||||
|
|
|
@ -10,7 +10,20 @@ description: >
|
||||||
|
|
||||||
The `node drain` command is used to toggle drain mode on a given node. Drain
|
The `node drain` command is used to toggle drain mode on a given node. Drain
|
||||||
mode prevents any new tasks from being allocated to the node, and begins
|
mode prevents any new tasks from being allocated to the node, and begins
|
||||||
migrating all existing allocations away.
|
migrating all existing allocations away. Allocations will be migrated according
|
||||||
|
to their [`migrate`][migrate] stanza until the drain's deadline is reached.
|
||||||
|
|
||||||
|
By default the `node drain` command blocks until a node is done draining and
|
||||||
|
all allocations have terminated. Canceling the `node drain` command *will not*
|
||||||
|
cancel the drain. Drains may be canceled by using the `-disable` parameter
|
||||||
|
below.
|
||||||
|
|
||||||
|
When draining more than one node at a time, it is recommended you first disable
|
||||||
|
[scheduling eligibility][eligibility] on all nodes that will be drained. For
|
||||||
|
example if you are decommissioning an entire class of nodes, first run `node
|
||||||
|
eligibility -disable` on all of their node IDs, and then run `node drain
|
||||||
|
-enable`. This will ensure allocations drained from the first node are not
|
||||||
|
placed on another node about to be drained.
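The two-pass approach described above can be sketched as a small shell helper. This is an illustrative sketch, not part of this change: the `drain_nodes` function and the `NOMAD_BIN` variable are hypothetical names, and only the `node eligibility -disable`, `node drain -enable`, and `-yes` flags documented on these pages are used.

```shell
#!/bin/sh
# NOMAD_BIN is a hypothetical override (not a Nomad setting) so the helper can
# be dry-run, for example with NOMAD_BIN="echo nomad".
NOMAD_BIN="${NOMAD_BIN:-nomad}"

# drain_nodes NODE_ID...
# Pass 1 marks every node ineligible so that allocations drained from one node
# cannot land on another node in the set. Pass 2 drains the nodes one by one.
drain_nodes() {
  for node in "$@"; do
    $NOMAD_BIN node eligibility -disable "$node" || return 1
  done
  for node in "$@"; do
    # -yes answers the confirmation prompt. Without -detach, each drain
    # blocks until all allocations on that node have stopped.
    $NOMAD_BIN node drain -enable -yes "$node" || return 1
  done
}

# Example (node ID prefixes are placeholders):
#   drain_nodes 4d2ba53b f4e8a9e5
```

Running it with `NOMAD_BIN="echo nomad"` prints the commands instead of executing them, which is a cheap way to review the order before touching a cluster.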
|
||||||
|
|
||||||
The [node status](/docs/commands/node/status.html) command complements this
|
The [node status](/docs/commands/node/status.html) command complements this
|
||||||
nicely by providing the current drain status of a given node.
|
nicely by providing the current drain status of a given node.
|
||||||
|
@ -37,6 +50,19 @@ operation is desired.
|
||||||
|
|
||||||
* `-enable`: Enable node drain mode.
|
* `-enable`: Enable node drain mode.
|
||||||
* `-disable`: Disable node drain mode.
|
* `-disable`: Disable node drain mode.
|
||||||
|
* `-deadline`: Set the deadline by which all allocations must be moved off the
|
||||||
|
node. Remaining allocations after the deadline are force removed from the
|
||||||
|
node. Defaults to 1 hour.
|
||||||
|
* `-detach`: Return immediately instead of entering monitor mode.
|
||||||
|
* `-force`: Force remove allocations off the node immediately.
|
||||||
|
* `-no-deadline`: No deadline allows the allocations to drain off the node
|
||||||
|
without being force stopped after a certain deadline.
|
||||||
|
* `-ignore-system`: Ignore system allows the drain to complete without stopping
|
||||||
|
system job allocations. By default system jobs are stopped last.
|
||||||
|
* `-keep-ineligible`: Keep ineligible will maintain the node's scheduling
|
||||||
|
ineligibility even if the drain is being disabled. This is useful when an
|
||||||
|
existing drain is being cancelled but additional scheduling on the node is not
|
||||||
|
desired.
|
||||||
* `-self`: Drain the local node.
|
* `-self`: Drain the local node.
|
||||||
* `-yes`: Automatic yes to prompts.
|
* `-yes`: Automatic yes to prompts.
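As a rough illustration of how the deadline-related options above differ, the following sketch wraps them in one helper. The `drain_node` function and the `NOMAD_BIN` variable are hypothetical, and the duration format passed to `-deadline` (for example `30m`) is an assumption based on the "Defaults to 1 hour" wording; check `nomad node drain -h` for the accepted values.

```shell
#!/bin/sh
# NOMAD_BIN is a hypothetical override so the helper can be dry-run,
# for example with NOMAD_BIN="echo nomad".
NOMAD_BIN="${NOMAD_BIN:-nomad}"

# drain_node MODE NODE_ID [DEADLINE]
#   deadline     migrate per the migrate stanza, force-remove what is left
#                after DEADLINE (assumed duration string, default 1h)
#   no-deadline  migrate per the migrate stanza, never force-remove
#   force        remove all allocations immediately (old drain behavior)
drain_node() {
  mode=$1 node=$2 deadline=${3:-1h}
  case "$mode" in
    deadline)    $NOMAD_BIN node drain -enable -yes -deadline "$deadline" "$node" ;;
    no-deadline) $NOMAD_BIN node drain -enable -yes -no-deadline "$node" ;;
    force)       $NOMAD_BIN node drain -enable -yes -force "$node" ;;
    *) echo "unknown mode: $mode" >&2; return 2 ;;
  esac
}

# Example (node ID prefix is a placeholder):
#   drain_node deadline 4d2ba53b 30m
```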
|
||||||
|
|
||||||
|
@ -45,11 +71,46 @@ operation is desired.
|
||||||
Enable drain mode on node with ID prefix "4d2ba53b":
|
Enable drain mode on node with ID prefix "f4e8a9e5":
|
||||||
|
|
||||||
```
|
```
|
||||||
$ nomad node drain -enable 4d2ba53b
|
$ nomad node drain -enable f4e8a9e5
|
||||||
|
Are you sure you want to enable drain mode for node "f4e8a9e5-30d8-3536-1e6f-cda5c869c35e"? [y/N] y
|
||||||
|
2018-03-30T23:13:16Z: Ctrl-C to stop monitoring: will not cancel the node drain
|
||||||
|
2018-03-30T23:13:16Z: Node "f4e8a9e5-30d8-3536-1e6f-cda5c869c35e" drain strategy set
|
||||||
|
2018-03-30T23:13:17Z: Alloc "1877230b-64d3-a7dd-9c31-dc5ad3c93e9a" marked for migration
|
||||||
|
2018-03-30T23:13:17Z: Alloc "1877230b-64d3-a7dd-9c31-dc5ad3c93e9a" draining
|
||||||
|
2018-03-30T23:13:17Z: Alloc "1877230b-64d3-a7dd-9c31-dc5ad3c93e9a" status running -> complete
|
||||||
|
2018-03-30T23:13:29Z: Alloc "3fce5308-818c-369e-0bb7-f61f0a1be9ed" marked for migration
|
||||||
|
2018-03-30T23:13:29Z: Alloc "3fce5308-818c-369e-0bb7-f61f0a1be9ed" draining
|
||||||
|
2018-03-30T23:13:30Z: Alloc "3fce5308-818c-369e-0bb7-f61f0a1be9ed" status running -> complete
|
||||||
|
2018-03-30T23:13:41Z: Alloc "9a98c5aa-a719-2f34-ecfc-0e6268b5d537" marked for migration
|
||||||
|
2018-03-30T23:13:41Z: Alloc "9a98c5aa-a719-2f34-ecfc-0e6268b5d537" draining
|
||||||
|
2018-03-30T23:13:41Z: Node "f4e8a9e5-30d8-3536-1e6f-cda5c869c35e" drain complete
|
||||||
|
2018-03-30T23:13:42Z: Alloc "9a98c5aa-a719-2f34-ecfc-0e6268b5d537" status running -> complete
|
||||||
|
2018-03-30T23:13:42Z: All allocations on node "f4e8a9e5-30d8-3536-1e6f-cda5c869c35e" have stopped.
|
||||||
```
|
```
|
||||||
|
|
||||||
Enable drain mode on the local node:
|
Enable drain mode on the local node:
|
||||||
|
|
||||||
```
|
```
|
||||||
$ nomad node drain -enable -self
|
$ nomad node drain -enable -self
|
||||||
|
...
|
||||||
```
|
```
|
||||||
|
|
||||||
|
Enable drain mode but do not stop system jobs:
|
||||||
|
|
||||||
|
```
|
||||||
|
$ nomad node drain -enable -ignore-system 4d2ba53b
|
||||||
|
...
|
||||||
|
```
|
||||||
|
|
||||||
|
Disable drain mode but keep the node ineligible for scheduling. Useful for
|
||||||
|
inspecting the current state of a misbehaving node without Nomad trying to
|
||||||
|
start or migrate allocations:
|
||||||
|
|
||||||
|
```
|
||||||
|
$ nomad node drain -disable -keep-ineligible 4d2ba53b
|
||||||
|
...
|
||||||
|
```
|
||||||
|
|
||||||
|
|
||||||
|
[eligibility]: /docs/commands/node/eligibility.html
|
||||||
|
[migrate]: /docs/job-specification/migrate.html
|
||||||
|
|
|
@ -0,0 +1,71 @@
---
layout: "docs"
page_title: "Commands: node eligibility"
sidebar_current: "docs-commands-node-eligibility"
description: >
  The node eligibility command is used to configure a node's scheduling
  eligibility.
---

# Command: node eligibility

The `node eligibility` command is used to toggle scheduling eligibility for a
given node. By default, nodes are eligible for scheduling, meaning they can
receive placements and run new allocations. Nodes that have their scheduling
eligibility disabled are ineligible for new placements.

The [`node drain`][drain] command automatically disables eligibility. Disabling
a drain restores eligibility by default.

Disabling scheduling eligibility is useful when draining a set of nodes: first
disable eligibility on each node that will be drained. Then drain each node.
If you just drain each node, allocations may get rescheduled multiple times as
they get placed on nodes about to be drained!

Disabling scheduling eligibility may also be useful when investigating poorly
behaved nodes. It allows operators to investigate the current state of a node
without the risk of additional work being assigned to it.

## Usage

```
nomad node eligibility [options] <node>
```

A `-self` flag can be used to toggle eligibility of the local node. If this is
not supplied, a node ID or prefix must be provided. If there is an exact match,
the eligibility will be adjusted for that node. Otherwise, a list of matching
nodes and information will be displayed.

It is also required to pass one of `-enable` or `-disable`, depending on which
operation is desired.

## General Options

<%= partial "docs/commands/_general_options" %>

## Eligibility Options

* `-enable`: Enable scheduling eligibility.
* `-disable`: Disable scheduling eligibility.
* `-self`: Set eligibility for the local node.
* `-yes`: Automatic yes to prompts.

## Examples

Enable scheduling eligibility on node with ID prefix "574545c5":

```
$ nomad node eligibility -enable 574545c5
Node "574545c5-c2d7-e352-d505-5e2cb9fe169f" scheduling eligibility set: eligible for scheduling
```

Disable scheduling eligibility on the local node:

```
$ nomad node eligibility -disable -self
Node "574545c5-c2d7-e352-d505-5e2cb9fe169f" scheduling eligibility set: ineligible for scheduling
```


[drain]: /docs/commands/node/drain.html
@ -0,0 +1,84 @@
---
layout: "docs"
page_title: "migrate Stanza - Job Specification"
sidebar_current: "docs-job-specification-migrate"
description: |-
  The "migrate" stanza specifies the group's migrate strategy. The migrate
  strategy is used to control the job's behavior when it is being migrated off
  of a draining node.
---

# `migrate` Stanza

<table class="table table-bordered table-striped">
  <tr>
    <th width="120">Placement</th>
    <td>
      <code>job -> **migrate**</code>
    </td>
    <td>
      <code>job -> group -> **migrate**</code>
    </td>
  </tr>
</table>

The `migrate` stanza specifies the group's strategy for migrating off of
[draining][drain] nodes. If omitted, a default migration strategy is applied.
If specified at the job level, the configuration will apply to all groups
within the job. Only service jobs with a count greater than 1 support migrate
stanzas.

```hcl
job "docs" {
  migrate {
    max_parallel     = 1
    health_check     = "checks"
    min_healthy_time = "10s"
    healthy_deadline = "5m"
  }
}
```

When one or more nodes are draining, only `max_parallel` allocations will be
stopped at a time. Node draining will not continue until replacement
allocations have been healthy for their `min_healthy_time` or
`healthy_deadline` is reached.

Note that a node's drain [deadline][deadline] will override the `migrate`
stanza for allocations on that node. The `migrate` stanza is for job authors to
define how their services should be migrated, while the node drain deadline is
for system operators to put hard limits on how long a drain may take.

## `migrate` Parameters

- `max_parallel` `(int: 1)` - Specifies the number of allocations that can be
  migrated at the same time. This number must be less than the total
  [`count`][count] for the group as `count - max_parallel` will be left running
  during migrations.

- `health_check` `(string: "checks")` - Specifies the mechanism by which
  allocation health is determined. The potential values are:

  - "checks" - Specifies that the allocation should be considered healthy when
    all of its tasks are running and their associated [checks][checks] are
    healthy, and unhealthy if any of the tasks fail or not all checks become
    healthy. This is a superset of "task_states" mode.

  - "task_states" - Specifies that the allocation should be considered healthy when
    all its tasks are running and unhealthy if tasks fail.

- `min_healthy_time` `(string: "10s")` - Specifies the minimum time the
  allocation must be in the healthy state before it is marked as healthy and
  unblocks further allocations from being migrated. This is specified using a
  label suffix like "30s" or "15m".

- `healthy_deadline` `(string: "5m")` - Specifies the deadline in which the
  allocation must be marked as healthy after which the allocation is
  automatically transitioned to unhealthy. This is specified using a label
  suffix like "2m" or "1h".
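
The `max_parallel` constraint above comes down to simple arithmetic, sketched below. The `migrate_plan` helper is hypothetical and is not a Nomad command. The "waves" figure assumes a simplified model in which a single group migrates exactly `max_parallel` allocations per step; it is a way to reason about the parameters, not a description of the scheduler's implementation.

```shell
#!/bin/sh
# migrate_plan COUNT MAX_PARALLEL DRAINING
#   COUNT         the group's count
#   MAX_PARALLEL  the migrate stanza's max_parallel
#   DRAINING      allocations of this group on draining nodes
migrate_plan() {
  count=$1 max_parallel=$2 draining=$3
  # max_parallel must be less than count, per the parameter description.
  if [ "$max_parallel" -ge "$count" ]; then
    echo "max_parallel ($max_parallel) must be less than count ($count)" >&2
    return 1
  fi
  # Allocations left running while a migration step is in flight.
  left_running=$((count - max_parallel))
  # Simplified model: ceil(draining / max_parallel) steps to move everything.
  waves=$(( (draining + max_parallel - 1) / max_parallel ))
  echo "left_running=$left_running waves=$waves"
}

# Example: a group with count 3, the default max_parallel of 1, and all
# three allocations on draining nodes:
#   migrate_plan 3 1 3
```

Under this model each step after the first also waits for replacements to stay healthy for `min_healthy_time` (or for `healthy_deadline` to pass), so a larger `max_parallel` trades running capacity for a shorter drain.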


[checks]: /docs/job-specification/service.html#check-parameters
[count]: /docs/job-specification/group.html#count
[drain]: /docs/commands/node/drain.html
[deadline]: /docs/commands/node/drain.html#deadline
@ -53,6 +53,9 @@
|
||||||
<li<%= sidebar_current("docs-job-specification-meta")%>>
|
<li<%= sidebar_current("docs-job-specification-meta")%>>
|
||||||
<a href="/docs/job-specification/meta.html">meta</a>
|
<a href="/docs/job-specification/meta.html">meta</a>
|
||||||
</li>
|
</li>
|
||||||
|
<li<%= sidebar_current("docs-job-specification-migrate")%>>
|
||||||
|
<a href="/docs/job-specification/migrate.html">migrate</a>
|
||||||
|
</li>
|
||||||
<li<%= sidebar_current("docs-job-specification-network")%>>
|
<li<%= sidebar_current("docs-job-specification-network")%>>
|
||||||
<a href="/docs/job-specification/network.html">network</a>
|
<a href="/docs/job-specification/network.html">network</a>
|
||||||
</li>
|
</li>
|
||||||
|
@ -324,6 +327,9 @@
|
||||||
<li<%= sidebar_current("docs-commands-node-drain") %>>
|
<li<%= sidebar_current("docs-commands-node-drain") %>>
|
||||||
<a href="/docs/commands/node/drain.html">drain</a>
|
<a href="/docs/commands/node/drain.html">drain</a>
|
||||||
</li>
|
</li>
|
||||||
|
<li<%= sidebar_current("docs-commands-node-eligibility") %>>
|
||||||
|
<a href="/docs/commands/node/eligibility.html">eligibility</a>
|
||||||
|
</li>
|
||||||
<li<%= sidebar_current("docs-commands-node-status") %>>
|
<li<%= sidebar_current("docs-commands-node-status") %>>
|
||||||
<a href="/docs/commands/node/status.html">status</a>
|
<a href="/docs/commands/node/status.html">status</a>
|
||||||
</li>
|
</li>
|
||||||
|
|