open-nomad/website/content/docs/internals/plugins/task-drivers.mdx
Kevin Wang 3e6757f211
feat(website): extract /plugins /tools docs (#11584)
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
Co-authored-by: Mike Nomitch <mnomitch@hashicorp.com>
2021-12-09 14:25:18 -05:00

257 lines
12 KiB
Plaintext

---
layout: docs
page_title: Task Driver Plugins
description: Learn how to author a Nomad task driver plugin.
---
# Task Drivers
Task drivers in Nomad are the runtime components that execute workloads. For
a real world example of a Nomad task driver plugin implementation, see the [LXC
driver source][lxcdriver].
## Authoring Task Driver Plugins
Authoring a task driver (shortened to driver in this documentation) in Nomad
consists of implementing the [DriverPlugin][driverplugin] interface and adding
a main package to launch the plugin. A driver plugin is long-lived and its
lifetime is not bound to the Nomad client. This means that the Nomad client can
be restarted without restarting the driver. Nomad will ensure that one
instance of the driver is running, meaning if the driver crashes or otherwise
terminates, Nomad will launch another instance of it.
Drivers should maintain as little state as possible. State for a task is stored
by the Nomad client on task creation. This enables a pattern where the driver
can maintain an in-memory state of the running tasks, and if necessary the
Nomad client can recover tasks into the driver state.
The [driver plugin skeleton project][skeletonproject] exists to help bootstrap
the development of new driver plugins. It provides most of the boilerplate
necessary for a driver plugin, along with detailed comments.
## Task Driver Plugin API
The [base plugin][baseplugin] must be implemented in addition to the following
functions.
### `TaskConfigSchema() (*hclspec.Spec, error)`
This function returns the schema for the driver configuration of the task. For
more information on `hclspec.Spec` see the HCL section in the [base
plugin][baseplugin] documentation.
### `Capabilities() (*Capabilities, error)`
Capabilities define what features the driver implements. Example:
```go
type Capabilities struct {
// SendSignals marks the driver as being able to send signals
SendSignals bool
// Exec marks the driver as being able to execute arbitrary commands
// such as health checks. Used by the ScriptExecutor interface.
Exec bool
//FSIsolation indicates what kind of filesystem isolation the driver supports.
FSIsolation FSIsolation
//NetIsolationModes lists the set of isolation modes supported by the driver
NetIsolationModes []NetIsolationMode
// MustInitiateNetwork tells Nomad that the driver must create the network
// namespace and that the CreateNetwork and DestroyNetwork RPCs are implemented.
MustInitiateNetwork bool
// MountConfigs tells Nomad which mounting config options the driver supports.
MountConfigs MountConfigSupport
// RemoteTasks indicates this driver runs tasks on remote systems
// instead of locally. The Nomad client can use this information to
// adjust behavior such as propogating task handles between allocations
// to avoid downtime when a client is lost.
RemoteTasks bool
}
```
The file system isolation options are:
- `FSIsolationImage`: The task driver isolates tasks as machine images.
- `FSIsolationChroot`: The task driver isolates tasks with `chroot` or
`pivot_root`.
- `FSIsolationNone`: The task driver has no filesystem isolation.
The network isolation modes are:
- `NetIsolationModeHost`: The task driver supports disabling network isolation
and using the host network.
- `NetIsolationModeGroup`: The task driver supports using the task group
network namespace.
- `NetIsolationModeTask`: The task driver supports isolating the network to
just the task.
- `NetIsolationModeNone`: There is no network to isolate. This is used for
task that the client manages remotely.
#### Remote Task Drivers
[Remote Task Drivers][rtd] should set `RemoteTasks` to `true`. Remote Task
Drivers are task driver plugins that execute tasks on a different system than
the Nomad client. This means the tasks lifecycle is distinct from the Nomad
client's.
For task driver plugin authors there are 2 important new behaviors when
`RemoteTasks` is `true`:
1. The `TaskHandle` returned by `StartTask` will be propagated to replacement
allocations if the Nomad client is drained or down. Nomad will call
`RecoverTask` instead of `StartTask` for remote tasks in replacement
allocations when a `TaskHandle` has been propagated from the previous
allocation.
2. If the Nomad client managing a remote task is drained or if the allocation
was `lost`, the remote task is sent a special `DETACH` kill signal. This
indicates the plugin should stop managing the remote task, but _not_ stop
it.
These behaviors are meant to keep remote tasks running even when the Nomad
client managing them is shutdown. Remote tasks are stopped when the job is
explicitly stopped like traditional tasks.
### `Fingerprint(context.Context) (<-chan *Fingerprint, error)`
This function is called by the client when the plugin is started. It allows the
driver to indicate its health to the client. The channel returned should
immediately send an initial Fingerprint, then send periodic updates at an
interval that is appropriate for the driver until the context is canceled.
The fingerprint consists of a `HealthState` and `HealthDescription` to inform
the client about its health. Additionally an `Attributes` field is available
for the driver to add additional attributes to the client node. The fingerprint
`HealthState` can be one of three states.
- `HealthStateUndetected`: Indicates that the necessary dependencies for the
driver are not detected on the system. Ex. java runtime for the java driver
- `HealthStateUnhealthy`: Indicates that something is wrong with the driver
runtime. Ex. docker daemon stopped for the Docker driver
- `HealthStateHealthy`: All systems go
### `StartTask(*TaskConfig) (*TaskHandle, *DriverNetwork, error)`
This function takes a [`TaskConfig`][taskconfig] which includes all of the configuration
needed to launch the task. Additionally the driver configuration can be decoded
from the `TaskConfig` by calling `*TaskConfig.DecodeDriverConfig(t interface{})`
passing in a pointer to the driver specific configuration struct. The
`TaskConfig` includes an `ID` field which future operations on the task will be
referenced by.
Drivers return a [`*TaskHandle`][taskhandle] which contains
the required information for the driver to reattach to the running task in the
case of plugin crashes or restarts. Some of this required state
will be specific to the driver implementation, thus a `DriverState` field
exists to allow the driver to encode custom state into the struct. Helper
fields exist on the `TaskHandle` to `GetDriverState` and `SetDriverState`
removing the need for the driver to handle serialization.
A `*DriverNetwork` can optionally be returned to describe the network of the
task if it is modified by the driver. An example of this is in the Docker
driver where tasks can be attached to a specific Docker network.
If an error occurs, it is expected that the driver will cleanup any created
resources prior to returning the error.
#### Logging
Nomad handles all rotation and plumbing of task logs. In order for task stdout
and stderr to be received by Nomad, they must be written to the correct
location. Prior to starting the task through the driver, the Nomad client
creates FIFOs for stdout and stderr. These paths are given to the driver in the
`TaskConfig`. The [`fifo` package][fifopackage] can be used to support
cross platform writing to these paths.
#### TaskHandle Schema Versioning
A `Version` field is available on the TaskHandle struct to facilitate backwards
compatible recovery of tasks. This field is opaque to Nomad, but allows the
driver to handle recover tasks that were created by an older version of the
plugin.
### `RecoverTask(*TaskHandle) error`
When a driver is restarted it is not expected to persist any internal state to
disk. To support this, Nomad will attempt to recover a task that was
previously started if the driver does not recognize the task ID. During task
recovery, Nomad calls `RecoverTask` passing the `TaskHandle` that was
returned by the `StartTask` function. If no error was returned, it is
expected that the driver can now operate on the task by referencing the task
ID. If an error occurs, the Nomad client will mark the task as `lost`.
### `WaitTask(context.Context, id string) (<-chan *ExitResult, error)`
The `WaitTask` function is expected to return a channel that will send an
`*ExitResult` when the task exits or close the channel when the context is
canceled. It is also expected that calling `WaitTask` on an exited task will
immediately send an `*ExitResult` on the returned channel.
### `StopTask(taskID string, timeout time.Duration, signal string) error`
The `StopTask` function is expected to stop a running task by sending the given
signal to it. If the task does not stop during the given timeout, the driver
must forcefully kill the task.
`StopTask` does not clean up resources of the task or remove it from the
driver's internal state. A call to `WaitTask` after `StopTask` is valid and
should be handled.
### `DestroyTask(taskID string, force bool) error`
The `DestroyTask` function cleans up and removes a task that has terminated. If
force is set to true, the driver must destroy the task even if it is still
running. If `WaitTask` is called after `DestroyTask`, it should return
`drivers.ErrTaskNotFound` as no task state should exist after `DestroyTask` is
called.
### `InspectTask(taskID string) (*TaskStatus, error)`
The `InspectTask` function returns detailed status information for the
referenced `taskID`.
### `TaskStats(context.Context, id string, time.Duration) (<-chan *cstructs.TaskResourceUsage, error)`
The `TaskStats` function returns a channel which the driver should send stats
to at the given interval. The driver must send stats at the given interval
until the given context is canceled or the task terminates.
### `TaskEvents(context.Context) (<-chan *TaskEvent, error)`
The Nomad client publishes events associated with an allocation. The
`TaskEvents` function allows the driver to publish driver specific events about
tasks and the Nomad client will associate them with the correct allocation.
An `Eventer` utility is available in the
`github.com/hashicorp/nomad/drivers/shared/eventer` package implements an
event loop and publishing mechanism for use in the `TaskEvents` function.
### `SignalTask(taskID string, signal string) error`
> Optional - can be skipped by embedding `drivers.DriverSignalTaskNotSupported`
The `SignalTask` function is used by drivers which support sending OS signals
(`SIGHUP`, `SIGKILL`, `SIGUSR1` etc.) to the task. It is an optional function
and is listed as a capability in the driver `Capabilities` struct.
### `ExecTask(taskID string, cmd []string, timeout time.Duration) (*ExecTaskResult, error)`
> Optional - can be skipped by embedding `drivers.DriverExecTaskNotSupported`
The `ExecTask` function is used by the Nomad client to execute commands inside
the task execution context. For example, the Docker driver executes commands
inside the running container. `ExecTask` is called for Consul script checks.
[lxcdriver]: https://github.com/hashicorp/nomad-driver-lxc
[driverplugin]: https://github.com/hashicorp/nomad/blob/v0.9.0/plugins/drivers/driver.go#L39-L57
[skeletonproject]: https://github.com/hashicorp/nomad-skeleton-driver-plugin
[baseplugin]: /docs/internals/plugins/base
[taskconfig]: https://godoc.org/github.com/hashicorp/nomad/plugins/drivers#TaskConfig
[taskhandle]: https://godoc.org/github.com/hashicorp/nomad/plugins/drivers#TaskHandle
[fifopackage]: https://godoc.org/github.com/hashicorp/nomad/client/lib/fifo
[rtd]: /plugins/drivers/remote