3e6757f211
Co-authored-by: Luiz Aoqui <luiz@hashicorp.com> Co-authored-by: Mike Nomitch <mnomitch@hashicorp.com>
257 lines
12 KiB
Plaintext
257 lines
12 KiB
Plaintext
---
|
|
layout: docs
|
|
page_title: Task Driver Plugins
|
|
description: Learn how to author a Nomad task driver plugin.
|
|
---
|
|
|
|
# Task Drivers
|
|
|
|
Task drivers in Nomad are the runtime components that execute workloads. For
|
|
a real world example of a Nomad task driver plugin implementation, see the [LXC
|
|
driver source][lxcdriver].
|
|
|
|
## Authoring Task Driver Plugins
|
|
|
|
Authoring a task driver (shortened to driver in this documentation) in Nomad
|
|
consists of implementing the [DriverPlugin][driverplugin] interface and adding
|
|
a main package to launch the plugin. A driver plugin is long-lived and its
|
|
lifetime is not bound to the Nomad client. This means that the Nomad client can
|
|
be restarted without restarting the driver. Nomad will ensure that one
|
|
instance of the driver is running, meaning if the driver crashes or otherwise
|
|
terminates, Nomad will launch another instance of it.
|
|
|
|
Drivers should maintain as little state as possible. State for a task is stored
|
|
by the Nomad client on task creation. This enables a pattern where the driver
|
|
can maintain an in-memory state of the running tasks, and if necessary the
|
|
Nomad client can recover tasks into the driver state.
|
|
|
|
The [driver plugin skeleton project][skeletonproject] exists to help bootstrap
|
|
the development of new driver plugins. It provides most of the boilerplate
|
|
necessary for a driver plugin, along with detailed comments.
|
|
|
|
## Task Driver Plugin API
|
|
|
|
The [base plugin][baseplugin] must be implemented in addition to the following
|
|
functions.
|
|
|
|
### `TaskConfigSchema() (*hclspec.Spec, error)`
|
|
|
|
This function returns the schema for the driver configuration of the task. For
|
|
more information on `hclspec.Spec` see the HCL section in the [base
|
|
plugin][baseplugin] documentation.
|
|
|
|
### `Capabilities() (*Capabilities, error)`
|
|
|
|
Capabilities define what features the driver implements. Example:
|
|
|
|
```go
|
|
type Capabilities struct {
|
|
// SendSignals marks the driver as being able to send signals
|
|
SendSignals bool
|
|
|
|
// Exec marks the driver as being able to execute arbitrary commands
|
|
// such as health checks. Used by the ScriptExecutor interface.
|
|
Exec bool
|
|
|
|
//FSIsolation indicates what kind of filesystem isolation the driver supports.
|
|
FSIsolation FSIsolation
|
|
|
|
//NetIsolationModes lists the set of isolation modes supported by the driver
|
|
NetIsolationModes []NetIsolationMode
|
|
|
|
// MustInitiateNetwork tells Nomad that the driver must create the network
|
|
// namespace and that the CreateNetwork and DestroyNetwork RPCs are implemented.
|
|
MustInitiateNetwork bool
|
|
|
|
// MountConfigs tells Nomad which mounting config options the driver supports.
|
|
MountConfigs MountConfigSupport
|
|
|
|
// RemoteTasks indicates this driver runs tasks on remote systems
|
|
// instead of locally. The Nomad client can use this information to
|
|
// adjust behavior such as propogating task handles between allocations
|
|
// to avoid downtime when a client is lost.
|
|
RemoteTasks bool
|
|
}
|
|
```
|
|
|
|
The file system isolation options are:
|
|
|
|
- `FSIsolationImage`: The task driver isolates tasks as machine images.
|
|
- `FSIsolationChroot`: The task driver isolates tasks with `chroot` or
|
|
`pivot_root`.
|
|
- `FSIsolationNone`: The task driver has no filesystem isolation.
|
|
|
|
The network isolation modes are:
|
|
|
|
- `NetIsolationModeHost`: The task driver supports disabling network isolation
|
|
and using the host network.
|
|
- `NetIsolationModeGroup`: The task driver supports using the task group
|
|
network namespace.
|
|
- `NetIsolationModeTask`: The task driver supports isolating the network to
|
|
just the task.
|
|
- `NetIsolationModeNone`: There is no network to isolate. This is used for
|
|
task that the client manages remotely.
|
|
|
|
#### Remote Task Drivers
|
|
|
|
[Remote Task Drivers][rtd] should set `RemoteTasks` to `true`. Remote Task
|
|
Drivers are task driver plugins that execute tasks on a different system than
|
|
the Nomad client. This means the tasks lifecycle is distinct from the Nomad
|
|
client's.
|
|
|
|
For task driver plugin authors there are 2 important new behaviors when
|
|
`RemoteTasks` is `true`:
|
|
|
|
1. The `TaskHandle` returned by `StartTask` will be propagated to replacement
|
|
allocations if the Nomad client is drained or down. Nomad will call
|
|
`RecoverTask` instead of `StartTask` for remote tasks in replacement
|
|
allocations when a `TaskHandle` has been propagated from the previous
|
|
allocation.
|
|
2. If the Nomad client managing a remote task is drained or if the allocation
|
|
was `lost`, the remote task is sent a special `DETACH` kill signal. This
|
|
indicates the plugin should stop managing the remote task, but _not_ stop
|
|
it.
|
|
|
|
These behaviors are meant to keep remote tasks running even when the Nomad
|
|
client managing them is shutdown. Remote tasks are stopped when the job is
|
|
explicitly stopped like traditional tasks.
|
|
|
|
### `Fingerprint(context.Context) (<-chan *Fingerprint, error)`
|
|
|
|
This function is called by the client when the plugin is started. It allows the
|
|
driver to indicate its health to the client. The channel returned should
|
|
immediately send an initial Fingerprint, then send periodic updates at an
|
|
interval that is appropriate for the driver until the context is canceled.
|
|
|
|
The fingerprint consists of a `HealthState` and `HealthDescription` to inform
|
|
the client about its health. Additionally an `Attributes` field is available
|
|
for the driver to add additional attributes to the client node. The fingerprint
|
|
`HealthState` can be one of three states.
|
|
|
|
- `HealthStateUndetected`: Indicates that the necessary dependencies for the
|
|
driver are not detected on the system. Ex. java runtime for the java driver
|
|
- `HealthStateUnhealthy`: Indicates that something is wrong with the driver
|
|
runtime. Ex. docker daemon stopped for the Docker driver
|
|
- `HealthStateHealthy`: All systems go
|
|
|
|
### `StartTask(*TaskConfig) (*TaskHandle, *DriverNetwork, error)`
|
|
|
|
This function takes a [`TaskConfig`][taskconfig] which includes all of the configuration
|
|
needed to launch the task. Additionally the driver configuration can be decoded
|
|
from the `TaskConfig` by calling `*TaskConfig.DecodeDriverConfig(t interface{})`
|
|
passing in a pointer to the driver specific configuration struct. The
|
|
`TaskConfig` includes an `ID` field which future operations on the task will be
|
|
referenced by.
|
|
|
|
Drivers return a [`*TaskHandle`][taskhandle] which contains
|
|
the required information for the driver to reattach to the running task in the
|
|
case of plugin crashes or restarts. Some of this required state
|
|
will be specific to the driver implementation, thus a `DriverState` field
|
|
exists to allow the driver to encode custom state into the struct. Helper
|
|
fields exist on the `TaskHandle` to `GetDriverState` and `SetDriverState`
|
|
removing the need for the driver to handle serialization.
|
|
|
|
A `*DriverNetwork` can optionally be returned to describe the network of the
|
|
task if it is modified by the driver. An example of this is in the Docker
|
|
driver where tasks can be attached to a specific Docker network.
|
|
|
|
If an error occurs, it is expected that the driver will cleanup any created
|
|
resources prior to returning the error.
|
|
|
|
#### Logging
|
|
|
|
Nomad handles all rotation and plumbing of task logs. In order for task stdout
|
|
and stderr to be received by Nomad, they must be written to the correct
|
|
location. Prior to starting the task through the driver, the Nomad client
|
|
creates FIFOs for stdout and stderr. These paths are given to the driver in the
|
|
`TaskConfig`. The [`fifo` package][fifopackage] can be used to support
|
|
cross platform writing to these paths.
|
|
|
|
#### TaskHandle Schema Versioning
|
|
|
|
A `Version` field is available on the TaskHandle struct to facilitate backwards
|
|
compatible recovery of tasks. This field is opaque to Nomad, but allows the
|
|
driver to handle recover tasks that were created by an older version of the
|
|
plugin.
|
|
|
|
### `RecoverTask(*TaskHandle) error`
|
|
|
|
When a driver is restarted it is not expected to persist any internal state to
|
|
disk. To support this, Nomad will attempt to recover a task that was
|
|
previously started if the driver does not recognize the task ID. During task
|
|
recovery, Nomad calls `RecoverTask` passing the `TaskHandle` that was
|
|
returned by the `StartTask` function. If no error was returned, it is
|
|
expected that the driver can now operate on the task by referencing the task
|
|
ID. If an error occurs, the Nomad client will mark the task as `lost`.
|
|
|
|
### `WaitTask(context.Context, id string) (<-chan *ExitResult, error)`
|
|
|
|
The `WaitTask` function is expected to return a channel that will send an
|
|
`*ExitResult` when the task exits or close the channel when the context is
|
|
canceled. It is also expected that calling `WaitTask` on an exited task will
|
|
immediately send an `*ExitResult` on the returned channel.
|
|
|
|
### `StopTask(taskID string, timeout time.Duration, signal string) error`
|
|
|
|
The `StopTask` function is expected to stop a running task by sending the given
|
|
signal to it. If the task does not stop during the given timeout, the driver
|
|
must forcefully kill the task.
|
|
|
|
`StopTask` does not clean up resources of the task or remove it from the
|
|
driver's internal state. A call to `WaitTask` after `StopTask` is valid and
|
|
should be handled.
|
|
|
|
### `DestroyTask(taskID string, force bool) error`
|
|
|
|
The `DestroyTask` function cleans up and removes a task that has terminated. If
|
|
force is set to true, the driver must destroy the task even if it is still
|
|
running. If `WaitTask` is called after `DestroyTask`, it should return
|
|
`drivers.ErrTaskNotFound` as no task state should exist after `DestroyTask` is
|
|
called.
|
|
|
|
### `InspectTask(taskID string) (*TaskStatus, error)`
|
|
|
|
The `InspectTask` function returns detailed status information for the
|
|
referenced `taskID`.
|
|
|
|
### `TaskStats(context.Context, id string, time.Duration) (<-chan *cstructs.TaskResourceUsage, error)`
|
|
|
|
The `TaskStats` function returns a channel which the driver should send stats
|
|
to at the given interval. The driver must send stats at the given interval
|
|
until the given context is canceled or the task terminates.
|
|
|
|
### `TaskEvents(context.Context) (<-chan *TaskEvent, error)`
|
|
|
|
The Nomad client publishes events associated with an allocation. The
|
|
`TaskEvents` function allows the driver to publish driver specific events about
|
|
tasks and the Nomad client will associate them with the correct allocation.
|
|
|
|
An `Eventer` utility is available in the
|
|
`github.com/hashicorp/nomad/drivers/shared/eventer` package implements an
|
|
event loop and publishing mechanism for use in the `TaskEvents` function.
|
|
|
|
### `SignalTask(taskID string, signal string) error`
|
|
|
|
> Optional - can be skipped by embedding `drivers.DriverSignalTaskNotSupported`
|
|
|
|
The `SignalTask` function is used by drivers which support sending OS signals
|
|
(`SIGHUP`, `SIGKILL`, `SIGUSR1` etc.) to the task. It is an optional function
|
|
and is listed as a capability in the driver `Capabilities` struct.
|
|
|
|
### `ExecTask(taskID string, cmd []string, timeout time.Duration) (*ExecTaskResult, error)`
|
|
|
|
> Optional - can be skipped by embedding `drivers.DriverExecTaskNotSupported`
|
|
|
|
The `ExecTask` function is used by the Nomad client to execute commands inside
|
|
the task execution context. For example, the Docker driver executes commands
|
|
inside the running container. `ExecTask` is called for Consul script checks.
|
|
|
|
[lxcdriver]: https://github.com/hashicorp/nomad-driver-lxc
|
|
[driverplugin]: https://github.com/hashicorp/nomad/blob/v0.9.0/plugins/drivers/driver.go#L39-L57
|
|
[skeletonproject]: https://github.com/hashicorp/nomad-skeleton-driver-plugin
|
|
[baseplugin]: /docs/internals/plugins/base
|
|
[taskconfig]: https://godoc.org/github.com/hashicorp/nomad/plugins/drivers#TaskConfig
|
|
[taskhandle]: https://godoc.org/github.com/hashicorp/nomad/plugins/drivers#TaskHandle
|
|
[fifopackage]: https://godoc.org/github.com/hashicorp/nomad/client/lib/fifo
|
|
[rtd]: /plugins/drivers/remote
|