diff --git a/website/pages/docs/commands/node/drain.mdx b/website/pages/docs/commands/node/drain.mdx
index 1b8e3cf43..697675935 100644
--- a/website/pages/docs/commands/node/drain.mdx
+++ b/website/pages/docs/commands/node/drain.mdx
@@ -18,10 +18,12 @@ all allocations have terminated. Canceling the `node drain` command _will not_
 cancel the drain. Drains may be canceled by using the `-disable` parameter
 below.
 
-When draining more than one node at a time, it is recommended you first disable
-[scheduling eligibility][eligibility] on all nodes that will be drained. For
-example if you are decommissioning an entire class of nodes, first run `node eligibility -disable` on all of their node IDs, and then run `node drain -enable`. This will ensure allocations drained from the first node are not
-placed on another node about to be drained.
+When draining more than one node at a time, it is recommended you first
+disable [scheduling eligibility][eligibility] on all nodes that will be
+drained. For example if you are decommissioning an entire class of nodes,
+first run `node eligibility -disable` on all of their node IDs, and then run
+`node drain -enable`. This will ensure allocations drained from the first node
+are not placed on another node about to be drained.
 
 The [node status] command compliments this nicely by providing the current
 drain status of a given node.
@@ -65,8 +67,10 @@ operation is desired.
 - `-no-deadline`: No deadline allows the allocations to drain off the node
   without being force stopped after a certain deadline.
 
-- `-ignore-system`: Ignore system allows the drain to complete without stopping
-  system job allocations. By default system jobs are stopped last.
+- `-ignore-system`: Ignore system allows the drain to complete without
+  stopping system job allocations. By default system jobs are stopped
+  last. You should always use this flag when draining a node running
+  [CSI node plugins][internals-csi].
 
 - `-keep-ineligible`: Keep ineligible will maintain the node's scheduling
   ineligibility even if the drain is being disabled. This is useful when an
@@ -135,3 +139,4 @@ $ nomad node drain -self -monitor
 [migrate]: /docs/job-specification/migrate
 [node status]: /docs/commands/node/status
 [workload migration guide]: https://learn.hashicorp.com/nomad/operating-nomad/node-draining
+[internals-csi]: /docs/internals/plugins/csi
diff --git a/website/pages/docs/internals/plugins/csi.mdx b/website/pages/docs/internals/plugins/csi.mdx
index e0c3c57ec..4ca451341 100644
--- a/website/pages/docs/internals/plugins/csi.mdx
+++ b/website/pages/docs/internals/plugins/csi.mdx
@@ -56,12 +56,12 @@ that perform both the controller and node roles in the same instance. Not
 every plugin provider has or needs a controller; that's specific to the
 provider implementation.
 
-You should almost always run node plugins as Nomad `system` jobs to
-ensure volume claims are released when a Nomad client is drained. Use
-constraints for the node plugin jobs based on the availability of
-volumes. For example, AWS EBS volumes are specific to particular
-availability zones with a region. Controller plugins can be run as
-`service` jobs.
+You should always run node plugins as Nomad `system` jobs and use the
+`-ignore-system` flag on the `nomad node drain` command to ensure that the
+node plugins are still running while the node is being drained. Use
+constraints for the node plugin jobs based on the availability of volumes. For
+example, AWS EBS volumes are specific to particular availability zones with a
+region. Controller plugins can be run as `service` jobs.
 
 Nomad exposes a Unix domain socket named `csi.sock` inside each CSI
 plugin task, and communicates over the gRPC protocol expected by the
@@ -111,17 +111,13 @@ client, and the node plugin mounts the volume to a staging area in the
 Nomad data directory. Nomad will bind-mount this staged directory into
 each task that mounts the volume.
 
-This cycle is reversed when a task that claims a volume becomes
-terminal. The client updates the server frequently about changes to
-allocations, including terminal state. When the server receives a
-terminal state for a job with volume claims, it creates a volume claim
-garbage collection (GC) evaluation to to handled by the core job
-scheduler. The GC job will send "detach" RPCs to the node plugin. The
-node plugin unmounts the bind-mount from the allocation and unmounts
-the volume from the plugin (if it's not in use by another task). The
-GC job will then send "unpublish" RPCs to the controller plugin (if
-any), and decrement the claim count for the volume. At this point the
-volume’s claim capacity has been freed up for scheduling.
+This cycle is reversed when a task that claims a volume becomes terminal. The
+client will send an "unpublish" RPC to the server, which will send "detach"
+RPCs to the node plugin. The node plugin unmounts the bind-mount from the
+allocation and unmounts the volume from the plugin (if it's not in use by
+another task). The server will then send "unpublish" RPCs to the controller
+plugin (if any), and decrement the claim count for the volume. At this point
+the volume’s claim capacity has been freed up for scheduling.
 
 [csi-spec]: https://github.com/container-storage-interface/spec
 [csi-drivers-list]: https://kubernetes-csi.github.io/docs/drivers.html
diff --git a/website/pages/docs/job-specification/csi_plugin.mdx b/website/pages/docs/job-specification/csi_plugin.mdx
index cff59997d..54d5a36b1 100644
--- a/website/pages/docs/job-specification/csi_plugin.mdx
+++ b/website/pages/docs/job-specification/csi_plugin.mdx
@@ -51,10 +51,11 @@ host. With the Docker task driver, you can use the `privileged = true`
 configuration, but no other default task drivers currently have this
 option.
 
-~> **Note:** During node drains, jobs that claim volumes should be
-moved before the `node` or `monolith` plugin for those
-volumes. Because [`system`][system] jobs are moved last during node drains, you
-should run `node` or `monolith` plugins as `system` jobs.
+~> **Note:** During node drains, jobs that claim volumes must be moved before
+the `node` or `monolith` plugin for those volumes. You should run `node` or
+`monolith` plugins as [`system`][system] jobs and use the `-ignore-system`
+flag on `nomad node drain` to ensure that the plugins are running while the
+node is being drained.
 
 ## `csi_plugin` Examples