Merge pull request #4085 from hashicorp/docs-node-drain

Initial Node drain docs
2018-03-30 16:34:49 -07:00 · 2018-03-30 16:34:49 -07:00 · 3495df7da9
parent 6871a068cb 1f1a20eaed
commit 3495df7da9
8 changed files with 268 additions and 6 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,9 +1,15 @@
 ## 0.8 (Unreleased)
 __BACKWARDS INCOMPATIBILITIES:__
 * cli: node drain now blocks until the drain completes and all allocations on
   the draining node have stopped. Use -detach for the old behavior.
 * discovery: Prevent absolute URLs in check paths. The documentation indicated
   that absolute URLs are not allowed, but it was not enforced. Absolute URLs
   in HTTP check paths will now fail to validate. [[GH-3685](https://github.com/hashicorp/nomad/issues/3685)]
 * drain: Draining a node no longer stops all allocations immediately: a new
   [migrate stanza](https://www.nomadproject.io/docs/job-specification/migrate.html)
   allows jobs to specify how quickly task groups can be drained. A `-force`
   option can be used to emulate the old drain behavior.
 * jobspec: The default values for restart policy have changed. Restart policy mode defaults to "fail" and the
   attempts/time interval values have been changed to enable faster server side rescheduling. See
   [restart stanza](https://www.nomadproject.io/docs/job-specification/restart.html) for more information.
@@ -21,6 +27,9 @@ IMPROVEMENTS:
 * core: Servers can now retry connecting to Vault to verify tokens without requiring a SIGHUP to do so [[GH-3957](https://github.com/hashicorp/nomad/issues/3957)]
 * core: Updated yamux library to pick up memory and CPU performance improvements [[GH-3980](https://github.com/hashicorp/nomad/issues/3980)]
 * core: Client stanza now supports overriding total memory [[GH-4052](https://github.com/hashicorp/nomad/issues/4052)]
 * core: Node draining is now able to migrate allocations in a controlled
   manner with parameters specified by the drain command and in job files using
   the migrate stanza [[GH-4010](https://github.com/hashicorp/nomad/issues/4010)]
 * acl: Increase token name limit from 64 characters to 256 [[GH-3888](https://github.com/hashicorp/nomad/issues/3888)]
 * cli: Node status and filesystem related commands do not require direct
   network access to the Nomad client nodes [[GH-3892](https://github.com/hashicorp/nomad/issues/3892)]
--- a/command/job_init.go
+++ b/command/job_init.go
@@ -159,6 +159,35 @@ job "example" {
    canary = 0
  }
  # The migrate stanza specifies the group's strategy for migrating off of
  # draining nodes. If omitted, a default migration strategy is applied.
  #
  # For more information on the "migrate" stanza, please see 
  # the online documentation at:
  #
  #     https://www.nomadproject.io/docs/job-specification/migrate.html
  #
  migrate {
    # Specifies the number of allocations that can be migrated at the same
    # time. This number must be less than the total count for the group as
    # (count - max_parallel) will be left running during migrations.
    max_parallel = 1
    # Specifies the mechanism by which allocation health is determined. The
    # potential values are "checks" or "task_states".
    health_check = "checks"
    # Specifies the minimum time the allocation must be in the healthy state
    # before it is marked as healthy and unblocks further allocations from being
    # migrated. This is specified using a label suffix like "30s" or "15m".
    min_healthy_time = "10s"
    # Specifies the deadline by which the allocation must be marked as healthy,
    # after which the allocation is automatically transitioned to unhealthy. This
    # is specified using a label suffix like "2m" or "1h".
    healthy_deadline = "5m"
  }
  # The "group" stanza defines a series of tasks that should be co-located on
  # the same Nomad client. Any task within a group will be placed on the same
  # client.
--- a/website/source/docs/commands/job/deployments.html.md.erb
+++ b/website/source/docs/commands/job/deployments.html.md.erb
@@ -8,8 +8,8 @@ description: >
 # Command: job deployments
-The `job dispatch` command is used to display the deployments for a particular
+The `job deployments` command is used to display the deployments for a
-job.
+particular job.
 ## Usage
--- a/website/source/docs/commands/node.html.md.erb
+++ b/website/source/docs/commands/node.html.md.erb
@@ -18,9 +18,11 @@ Run `nomad node <subcommand> -h` for help on that subcommand. The following
 subcommands are available:
 * [`node config`][config] - View or modify client configuration details
-* [`node drain`][drain] - Toggle drain mode on a given node
+* [`node drain`][drain] - Set drain mode on a given node
 * [`node eligibility`][eligibility] - Toggle scheduling eligibility on a given node
 * [`node status`][status] - Display status information about nodes
 [config]: /docs/commands/node/config.html "View or modify client configuration details"
-[drain]: /docs/commands/node/drain.html "Toggle drain mode on a given node"
+[drain]: /docs/commands/node/drain.html "Set drain mode on a given node"
 [eligibility]: /docs/commands/node/eligibility.html "Toggle scheduling eligibility on a given node"
 [status]: /docs/commands/node/status.html "Display status information about nodes"
--- a/website/source/docs/commands/node/drain.html.md.erb
+++ b/website/source/docs/commands/node/drain.html.md.erb
@@ -10,7 +10,20 @@ description: >
 The `node drain` command is used to toggle drain mode on a given node. Drain
 mode prevents any new tasks from being allocated to the node, and begins
-migrating all existing allocations away.
+migrating all existing allocations away. Allocations will be migrated according
 to their [`migrate`][migrate] stanza until the drain's deadline is reached.
 By default the `node drain` command blocks until a node is done draining and
 all allocations have terminated. Canceling the `node drain` command *will not*
 cancel the drain. Drains may be canceled by using the `-disable` parameter
 below.
 When draining more than one node at a time, it is recommended you first disable
 [scheduling eligibility][eligibility] on all nodes that will be drained. For
 example, if you are decommissioning an entire class of nodes, first run `node
 eligibility -disable` on all of their node IDs, and then run `node drain
 -enable`. This will ensure allocations drained from the first node are not
 placed on another node about to be drained.
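 The two-pass ordering described above could be scripted roughly as follows. This
 is a minimal sketch, not part of the command itself: the node ID prefixes are
 hypothetical, and the `NOMAD` variable and `drain_all` function are names
 introduced only for this illustration.
 ```shell
 #!/bin/sh
 # Hypothetical node ID prefixes; substitute the nodes being decommissioned.
 NODES="4d2ba53b f4e8a9e5"
 # Assumption: NOMAD names the binary to run; NOMAD=echo gives a dry run.
 NOMAD="${NOMAD:-nomad}"
 drain_all() {
   # Pass 1: mark every node ineligible so that migrated allocations are
   # not placed on a node that is about to be drained.
   for node in $NODES; do
     "$NOMAD" node eligibility -disable "$node" || return 1
   done
   # Pass 2: drain the nodes one at a time. Each call blocks until that
   # node's drain completes, and -yes skips the confirmation prompt.
   for node in $NODES; do
     "$NOMAD" node drain -enable -yes "$node" || return 1
   done
 }
 ```
 Calling `drain_all` runs both passes in order. Setting `NOMAD=echo` first
 prints the commands without contacting a cluster.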
 The [node status](/docs/commands/node/status.html) command complements this
 nicely by providing the current drain status of a given node.
@@ -37,6 +50,19 @@ operation is desired.
 * `-enable`: Enable node drain mode.
 * `-disable`: Disable node drain mode.
 * `-deadline`: Set the deadline by which all allocations must be moved off the
  node. Remaining allocations after the deadline are force removed from the
  node. Defaults to 1 hour.
 * `-detach`: Return immediately instead of entering monitor mode.
 * `-force`: Force remove allocations off the node immediately.
 * `-no-deadline`: No deadline allows the allocations to drain off the node
  without being force stopped after a certain deadline.
 * `-ignore-system`: Ignore system allows the drain to complete without stopping
  system job allocations. By default system jobs are stopped last.
 * `-keep-ineligible`: Keep ineligible will maintain the node's scheduling
  ineligibility even if the drain is being disabled. This is useful when an
  existing drain is being canceled but additional scheduling on the node is not
  desired.
 * `-self`: Drain the local node.
 * `-yes`: Automatic yes to prompts.
@@ -45,11 +71,46 @@ operation is desired.
 Enable drain mode on node with ID prefix "f4e8a9e5":
 ```
-$ nomad node drain -enable 4d2ba53b
+$ nomad node drain -enable f4e8a9e5
 Are you sure you want to enable drain mode for node "f4e8a9e5-30d8-3536-1e6f-cda5c869c35e"? [y/N] y
 2018-03-30T23:13:16Z: Ctrl-C to stop monitoring: will not cancel the node drain
 2018-03-30T23:13:16Z: Node "f4e8a9e5-30d8-3536-1e6f-cda5c869c35e" drain strategy set
 2018-03-30T23:13:17Z: Alloc "1877230b-64d3-a7dd-9c31-dc5ad3c93e9a" marked for migration
 2018-03-30T23:13:17Z: Alloc "1877230b-64d3-a7dd-9c31-dc5ad3c93e9a" draining
 2018-03-30T23:13:17Z: Alloc "1877230b-64d3-a7dd-9c31-dc5ad3c93e9a" status running -> complete
 2018-03-30T23:13:29Z: Alloc "3fce5308-818c-369e-0bb7-f61f0a1be9ed" marked for migration
 2018-03-30T23:13:29Z: Alloc "3fce5308-818c-369e-0bb7-f61f0a1be9ed" draining
 2018-03-30T23:13:30Z: Alloc "3fce5308-818c-369e-0bb7-f61f0a1be9ed" status running -> complete
 2018-03-30T23:13:41Z: Alloc "9a98c5aa-a719-2f34-ecfc-0e6268b5d537" marked for migration
 2018-03-30T23:13:41Z: Alloc "9a98c5aa-a719-2f34-ecfc-0e6268b5d537" draining
 2018-03-30T23:13:41Z: Node "f4e8a9e5-30d8-3536-1e6f-cda5c869c35e" drain complete
 2018-03-30T23:13:42Z: Alloc "9a98c5aa-a719-2f34-ecfc-0e6268b5d537" status running -> complete
 2018-03-30T23:13:42Z: All allocations on node "f4e8a9e5-30d8-3536-1e6f-cda5c869c35e" have stopped.
 ```
 Enable drain mode on the local node:
 ```
 $ nomad node drain -enable -self
 ...
 ```
 Enable drain mode but do not stop system jobs:
 ```
 $ nomad node drain -enable -ignore-system 4d2ba53b
 ...
 ```
 Disable drain mode but keep the node ineligible for scheduling. Useful for
 inspecting the current state of a misbehaving node without Nomad trying to
 start or migrate allocations:
 ```
 $ nomad node drain -disable -keep-ineligible 4d2ba53b
 ...
 ```
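 The `-deadline` and `-detach` options can be combined when a script should
 start a drain and return immediately. The sketch below is illustrative only:
 the `drain_with_deadline` function and the `NOMAD` variable are hypothetical
 names, and the "30m" value assumes the deadline accepts the same duration
 suffixes used elsewhere in the documentation.
 ```shell
 #!/bin/sh
 # Assumption: NOMAD names the binary to run; NOMAD=echo gives a dry run.
 NOMAD="${NOMAD:-nomad}"
 drain_with_deadline() {
   # $1: node ID or prefix
   # $2: deadline after which remaining allocations are force removed
   # -detach returns immediately instead of monitoring the drain.
   "$NOMAD" node drain -enable -deadline "$2" -detach -yes "$1"
 }
 ```
 For example, `drain_with_deadline 4d2ba53b 30m` would start a drain on that
 node with a 30 minute deadline and hand control back to the caller.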
 [eligibility]: /docs/commands/node/eligibility.html
 [migrate]: /docs/job-specification/migrate.html
--- a/website/source/docs/commands/node/eligibility.html.md.erb
+++ b/website/source/docs/commands/node/eligibility.html.md.erb
@@ -0,0 +1,71 @@
 ---
 layout: "docs"
 page_title: "Commands: node eligibility"
 sidebar_current: "docs-commands-node-eligibility"
 description: >
  The node eligibility command is used to configure a node's scheduling
  eligibility.
 ---
 # Command: node eligibility
 The `node eligibility` command is used to toggle scheduling eligibility for a
 given node. By default, nodes are eligible for scheduling, meaning they can
 receive placements and run new allocations. Nodes that have their scheduling
 eligibility disabled are ineligible for new placements.
 The [`node drain`][drain] command automatically disables eligibility. Disabling
 a drain restores eligibility by default.
 Disabling scheduling eligibility is useful when draining a set of nodes: first
 disable eligibility on each node that will be drained, then drain each node.
 If you just drain each node, allocations may get rescheduled multiple times as
 they get placed on nodes about to be drained!
 Disabling scheduling eligibility may also be useful when investigating poorly
 behaved nodes. It allows operators to investigate the current state of a node
 without the risk of additional work being assigned to it.
 ## Usage
 ```
 nomad node eligibility [options] <node>
 ```
 A `-self` flag can be used to toggle eligibility of the local node. If this is
 not supplied, a node ID or prefix must be provided. If there is an exact match,
 the eligibility will be adjusted for that node. Otherwise, a list of matching
 nodes and information will be displayed.
 It is also required to pass one of `-enable` or `-disable`, depending on which
 operation is desired.
 ## General Options
 <%= partial "docs/commands/_general_options" %>
 ## Eligibility Options
 * `-enable`: Enable scheduling eligibility.
 * `-disable`: Disable scheduling eligibility.
 * `-self`: Set eligibility for the local node.
 * `-yes`: Automatic yes to prompts.
 ## Examples
 Enable scheduling eligibility on node with ID prefix "574545c5":
 ```
 $ nomad node eligibility -enable 574545c5
 Node "574545c5-c2d7-e352-d505-5e2cb9fe169f" scheduling eligibility set: eligible for scheduling
 ```
 Disable scheduling eligibility on the local node:
 ```
 $ nomad node eligibility -disable -self
 Node "574545c5-c2d7-e352-d505-5e2cb9fe169f" scheduling eligibility set: ineligible for scheduling
 ```
 [drain]: /docs/commands/node/drain.html
--- a/website/source/docs/job-specification/migrate.html.md
+++ b/website/source/docs/job-specification/migrate.html.md
@@ -0,0 +1,84 @@
 ---
 layout: "docs"
 page_title: "migrate Stanza - Job Specification"
 sidebar_current: "docs-job-specification-migrate"
 description: |-
  The "migrate" stanza specifies the group's migrate strategy. The migrate
  strategy is used to control the job's behavior when it is being migrated off
  of a draining node.
 ---
 # `migrate` Stanza
 <table class="table table-bordered table-striped">
  <tr>
    <th width="120">Placement</th>
    <td>
      <code>job -> **migrate**</code>
    </td>
    <td>
      <code>job -> group -> **migrate**</code>
    </td>
  </tr>
 </table>
 The `migrate` stanza specifies the group's strategy for migrating off of
 [draining][drain] nodes. If omitted, a default migration strategy is applied.
 If specified at the job level, the configuration will apply to all groups
 within the job. Only service jobs with a count greater than 1 support migrate
 stanzas.
 ```hcl
 job "docs" {
  migrate {
    max_parallel     = 1
    health_check     = "checks"
    min_healthy_time = "10s"
    healthy_deadline = "5m"
  }
 }
 ```
 When one or more nodes are draining, only `max_parallel` allocations will be
 stopped at a time. Node draining will not continue until replacement
 allocations have been healthy for their `min_healthy_time` or their
 `healthy_deadline` is reached.
 Note that a node's drain [deadline][deadline] will override the `migrate`
 stanza for allocations on that node. The `migrate` stanza is for job authors to
 define how their services should be migrated, while the node drain deadline is
 for system operators to put hard limits on how long a drain may take.
 ## `migrate` Parameters
 - `max_parallel` `(int: 1)` - Specifies the number of allocations that can be
  migrated at the same time. This number must be less than the total
  [`count`][count] for the group as `count - max_parallel` will be left running
  during migrations.
 - `health_check` `(string: "checks")` - Specifies the mechanism by which
  allocation health is determined. The potential values are:
  - "checks" - Specifies that the allocation should be considered healthy when
    all of its tasks are running and their associated [checks][checks] are
    healthy, and unhealthy if any of the tasks fail or not all checks become
    healthy.  This is a superset of "task_states" mode.
  - "task_states" - Specifies that the allocation should be considered healthy when
    all its tasks are running and unhealthy if tasks fail.
 - `min_healthy_time` `(string: "10s")` - Specifies the minimum time the
  allocation must be in the healthy state before it is marked as healthy and
  unblocks further allocations from being migrated. This is specified using a
  label suffix like "30s" or "15m".
 - `healthy_deadline` `(string: "5m")` - Specifies the deadline by which the
  allocation must be marked as healthy, after which the allocation is
  automatically transitioned to unhealthy. This is specified using a label
  suffix like "2m" or "1h".
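 The relationship between `count` and `max_parallel` is simple arithmetic. As a
 rough sketch with hypothetical values (the `remaining_during_migration` helper
 is not part of Nomad), the number of allocations left running during a
 migration can be checked like this:
 ```shell
 #!/bin/sh
 # Hypothetical values; substitute the group's count and migrate.max_parallel.
 count=5
 max_parallel=2
 remaining_during_migration() {
   # $1: group count, $2: max_parallel
   # max_parallel must be less than count, per the parameter description.
   if [ "$2" -ge "$1" ]; then
     echo "max_parallel ($2) must be less than count ($1)" >&2
     return 1
   fi
   # count - max_parallel allocations are left running during migrations.
   echo $(( $1 - $2 ))
 }
 remaining_during_migration "$count" "$max_parallel"
 ```
 With a count of 5 and `max_parallel` of 2, three allocations stay running while
 two are migrated.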
 [checks]: /docs/job-specification/service.html#check-parameters
 [count]: /docs/job-specification/group.html#count
 [drain]: /docs/commands/node/drain.html
 [deadline]: /docs/commands/node/drain.html#deadline
--- a/website/source/layouts/docs.erb
+++ b/website/source/layouts/docs.erb
@@ -53,6 +53,9 @@
          <li<%= sidebar_current("docs-job-specification-meta")%>>
            <a href="/docs/job-specification/meta.html">meta</a>
          </li>
          <li<%= sidebar_current("docs-job-specification-migrate")%>>
            <a href="/docs/job-specification/migrate.html">migrate</a>
          </li>
          <li<%= sidebar_current("docs-job-specification-network")%>>
            <a href="/docs/job-specification/network.html">network</a>
          </li>
@@ -324,6 +327,9 @@
              <li<%= sidebar_current("docs-commands-node-drain") %>>
                <a href="/docs/commands/node/drain.html">drain</a>
              </li>
              <li<%= sidebar_current("docs-commands-node-eligibility") %>>
                <a href="/docs/commands/node/eligibility.html">eligibility</a>
              </li>
              <li<%= sidebar_current("docs-commands-node-status") %>>
                <a href="/docs/commands/node/status.html">status</a>
              </li>