open-nomad/website/source/docs/internals/plugins/task-drivers.html.md

layout: docs
page_title: Task Driver Plugins
sidebar_current: docs-internals-plugins-task-drivers
description: Learn how to author a Nomad task driver plugin.

Task Drivers

Task drivers in Nomad are the runtime components that execute workloads. For a real-world example of a Nomad task driver plugin implementation, see the LXC driver source.

Authoring Task Driver Plugins

Authoring a task driver (shortened to driver in this documentation) in Nomad consists of implementing the DriverPlugin interface and adding a main package to launch the plugin. A driver plugin is long-lived and its lifetime is not bound to the Nomad client, so the Nomad client can be restarted without restarting the driver. Nomad ensures that one instance of the driver is running: if the driver crashes or otherwise terminates, Nomad launches another instance of it.

Drivers should maintain as little state as possible. State for a task is stored by the Nomad client on task creation. This enables a pattern where the driver can maintain an in-memory state of the running tasks, and if necessary the Nomad client can recover tasks into the driver state.
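The in-memory state pattern described above can be sketched as a small mutex-guarded map keyed by task ID. This is a minimal illustration, not Nomad's implementation: taskStore, taskState, and the PID field are hypothetical names chosen for the example.

```go
package main

import (
	"fmt"
	"sync"
)

// taskState is a hypothetical record of what a driver tracks per task.
type taskState struct {
	PID int
}

// taskStore is a hypothetical in-memory map of task ID to state. It is
// guarded by a mutex because driver methods may be called concurrently.
type taskStore struct {
	mu    sync.RWMutex
	tasks map[string]*taskState
}

func newTaskStore() *taskStore {
	return &taskStore{tasks: make(map[string]*taskState)}
}

func (s *taskStore) Set(id string, st *taskState) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.tasks[id] = st
}

func (s *taskStore) Get(id string) (*taskState, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	st, ok := s.tasks[id]
	return st, ok
}

func (s *taskStore) Delete(id string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.tasks, id)
}

func main() {
	store := newTaskStore()
	store.Set("task-1", &taskState{PID: 4242})
	if st, ok := store.Get("task-1"); ok {
		fmt.Println("tracking pid", st.PID)
	}
	// After a driver restart the store starts empty; the client repopulates
	// it by recovering tasks into the driver.
	store.Delete("task-1")
	_, ok := store.Get("task-1")
	fmt.Println("still tracked:", ok)
}
```

Because nothing is written to disk, a freshly started driver simply begins with an empty store.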

Task Driver Plugin API

The base plugin must be implemented in addition to the following functions.

TaskConfigSchema() (*hclspec.Spec, error)

This function returns the schema for the driver configuration of the task. For more information on hclspec.Spec see the HCL section in the base plugin documentation.

Capabilities() (*Capabilities, error)

Capabilities define what features the driver implements. Example:

Capabilities{
	// Does the driver support sending OS signals to the task?
	SendSignals: true,
	// Does the driver support executing a command within the task execution
	// environment?
	Exec: true,
	// What filesystem isolation is supported by the driver. Options include
	// FSIsolationImage, FSIsolationChroot, and FSIsolationNone
	FSIsolation: FSIsolationImage,
}

Fingerprint(context.Context) (<-chan *Fingerprint, error)

This function is called by the client when the plugin is started. It allows the driver to indicate its health to the client. The channel returned should immediately send an initial Fingerprint, then send periodic updates at an interval that is appropriate for the driver until the context is canceled.

The fingerprint consists of a HealthState and a HealthDescription that inform the client about the driver's health. An Attributes field also lets the driver add attributes to the client node. The fingerprint HealthState can be one of three states:

  • HealthStateUndetected: Indicates that the necessary dependencies for the driver are not detected on the system. For example, the Java runtime for the Java driver.
  • HealthStateUnhealthy: Indicates that something is wrong with the driver runtime. For example, the Docker daemon is stopped for the Docker driver.
  • HealthStateHealthy: All systems go.
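The fingerprinting contract (send an initial fingerprint immediately, then periodic updates, and stop when the context is canceled) might look roughly like the sketch below. The HealthState and Fingerprint types are simplified stand-ins defined locally for the example, and the check callback and collect helper are hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// HealthState and Fingerprint are simplified stand-ins for the types in
// Nomad's drivers package; the exact field set here is an assumption.
type HealthState string

const (
	HealthStateUndetected HealthState = "undetected"
	HealthStateUnhealthy  HealthState = "unhealthy"
	HealthStateHealthy    HealthState = "healthy"
)

type Fingerprint struct {
	Health            HealthState
	HealthDescription string
	Attributes        map[string]string
}

// fingerprint sends one Fingerprint immediately, then one per interval, and
// closes the channel once ctx is canceled. check is a hypothetical probe.
func fingerprint(ctx context.Context, interval time.Duration, check func() *Fingerprint) <-chan *Fingerprint {
	ch := make(chan *Fingerprint)
	go func() {
		defer close(ch)
		timer := time.NewTimer(0) // fires right away for the initial fingerprint
		defer timer.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-timer.C:
				timer.Reset(interval)
				select {
				case ch <- check():
				case <-ctx.Done():
					return
				}
			}
		}
	}()
	return ch
}

// collect reads the first n fingerprints, then cancels the context.
func collect(n int) []HealthState {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	ch := fingerprint(ctx, 10*time.Millisecond, func() *Fingerprint {
		return &Fingerprint{Health: HealthStateHealthy, HealthDescription: "ready"}
	})
	var out []HealthState
	for i := 0; i < n; i++ {
		out = append(out, (<-ch).Health)
	}
	return out
}

func main() {
	for i, h := range collect(3) {
		fmt.Printf("fingerprint %d: %s\n", i, h)
	}
}
```

The inner select keeps the goroutine from blocking forever on a send if the client stops reading after cancellation.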

StartTask(*TaskConfig) (*TaskHandle, *DriverNetwork, error)

This function takes a TaskConfig which includes all of the configuration needed to launch the task. The driver configuration can be decoded from the TaskConfig by calling *TaskConfig.DecodeDriverConfig(t interface{}), passing in a pointer to the driver-specific configuration struct. The TaskConfig includes an ID field that future operations on the task use to reference it.

Drivers return a *TaskHandle which contains the information the driver needs to reattach to the running task after a plugin crash or restart. Some of this state is specific to the driver implementation, so a DriverState field exists to allow the driver to encode custom state into the struct. The helper methods GetDriverState and SetDriverState on the TaskHandle remove the need for the driver to handle serialization.
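To illustrate how custom state round-trips through a handle, here is a self-contained sketch. The TaskHandle type is a local stand-in, JSON is used purely as an assumption for the example (the real helpers choose their own encoding), and taskState with its ContainerID and PID fields is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TaskHandle is a simplified stand-in for the drivers.TaskHandle struct.
type TaskHandle struct {
	Version     int
	DriverState []byte
}

// SetDriverState and GetDriverState mimic the helpers named above. JSON is
// an assumption made for this sketch, not necessarily Nomad's encoding.
func (h *TaskHandle) SetDriverState(v interface{}) error {
	b, err := json.Marshal(v)
	if err != nil {
		return err
	}
	h.DriverState = b
	return nil
}

func (h *TaskHandle) GetDriverState(v interface{}) error {
	return json.Unmarshal(h.DriverState, v)
}

// taskState is hypothetical driver-specific state needed to reattach.
type taskState struct {
	ContainerID string
	PID         int
}

// roundTrip stores state in a handle, as StartTask would, and reads it back,
// as RecoverTask would.
func roundTrip(in taskState) (taskState, error) {
	h := &TaskHandle{Version: 1}
	if err := h.SetDriverState(&in); err != nil {
		return taskState{}, err
	}
	var out taskState
	err := h.GetDriverState(&out)
	return out, err
}

func main() {
	out, err := roundTrip(taskState{ContainerID: "abc123", PID: 4242})
	fmt.Println(out.ContainerID, out.PID, err)
}
```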

A *DriverNetwork can optionally be returned to describe the network of the task if it is modified by the driver. An example of this is in the Docker driver where tasks can be attached to a specific Docker network.

If an error occurs, the driver is expected to clean up any resources it created before returning the error.
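One way to honor the clean-up expectation is a deferred function that releases whatever was created whenever the function returns a non-nil error. The sketch below assumes a hypothetical resources tracker and a launch callback standing in for the real work of starting a workload; the resource names are invented for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// errLaunch is a sample failure used by the demo below.
var errLaunch = errors.New("binary not found")

// resources records what a hypothetical driver has created for a task.
type resources struct {
	created []string
}

func (r *resources) create(name string) { r.created = append(r.created, name) }

// cleanup releases resources in reverse order of creation and reports
// what was released.
func (r *resources) cleanup() []string {
	var released []string
	for i := len(r.created) - 1; i >= 0; i-- {
		released = append(released, r.created[i])
	}
	r.created = nil
	return released
}

// startTask is a hypothetical StartTask body. launch stands in for whatever
// actually starts the workload. On error, everything created so far is
// released before the error reaches the caller.
func startTask(launch func() error) (released []string, err error) {
	res := &resources{}
	defer func() {
		if err != nil {
			released = res.cleanup()
		}
	}()
	res.create("task-dir-mount")
	res.create("network-namespace")
	if lerr := launch(); lerr != nil {
		return nil, fmt.Errorf("failed to launch task: %w", lerr)
	}
	return nil, nil
}

func main() {
	released, err := startTask(func() error { return errLaunch })
	fmt.Println("error:", err, "released:", released)

	released, err = startTask(func() error { return nil })
	fmt.Println("error:", err, "released:", released)
}
```

Using a named error return lets the deferred function see the final error regardless of which return statement produced it.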

Logging

Nomad handles all rotation and plumbing of task logs. In order for task stdout and stderr to be received by Nomad, they must be written to the correct location. Prior to starting the task through the driver, the Nomad client creates FIFOs for stdout and stderr. These paths are given to the driver in the TaskConfig. The fifo package can be used to support cross-platform writing to these paths.

TaskHandle Schema Versioning

A Version field is available on the TaskHandle struct to facilitate backwards-compatible recovery of tasks. This field is opaque to Nomad, but allows the driver to recover tasks that were created by an older version of the plugin.

RecoverTask(*TaskHandle) error

A driver is not expected to persist any internal state to disk across restarts. To support this, Nomad will attempt to recover a task that was previously started if the driver does not recognize the task ID. During task recovery, Nomad calls RecoverTask passing the TaskHandle that was returned by the StartTask function. If no error was returned, it is expected that the driver can now operate on the task by referencing the task ID. If an error occurs, the Nomad client will mark the task as lost.
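A recovery routine that also honors the handle's Version field could be sketched as follows. Everything here is illustrative: the TaskHandle stand-in, its TaskID and State fields, the version numbers, and the key names in each layout are assumptions, and a real driver would guard its task map with a lock.

```go
package main

import (
	"errors"
	"fmt"
)

// TaskHandle is a simplified stand-in; the TaskID and State fields are
// assumptions made for this sketch.
type TaskHandle struct {
	Version int
	TaskID  string
	State   map[string]string
}

// currentHandleVersion is the hypothetical version this plugin writes today.
const currentHandleVersion = 2

// driver tracks task ID to container ID in memory (hypothetical layout).
type driver struct {
	tasks map[string]string
}

// RecoverTask rebuilds in-memory state from a handle written by StartTask,
// accepting both the current and one older handle layout.
func (d *driver) RecoverTask(h *TaskHandle) error {
	if h == nil {
		return errors.New("handle cannot be nil")
	}
	if _, ok := d.tasks[h.TaskID]; ok {
		return nil // already tracked, nothing to recover
	}
	var containerID string
	switch h.Version {
	case 1:
		containerID = h.State["id"] // hypothetical older key name
	case currentHandleVersion:
		containerID = h.State["container_id"]
	default:
		return fmt.Errorf("unsupported handle version %d", h.Version)
	}
	if containerID == "" {
		return errors.New("handle does not contain a container ID")
	}
	d.tasks[h.TaskID] = containerID
	return nil
}

func main() {
	d := &driver{tasks: make(map[string]string)}
	old := &TaskHandle{Version: 1, TaskID: "task-1", State: map[string]string{"id": "c-123"}}
	fmt.Println(d.RecoverTask(old), d.tasks["task-1"])
	fmt.Println(d.RecoverTask(&TaskHandle{Version: 9, TaskID: "task-2"}))
}
```

Returning an error for an unknown version is the conservative choice, since the client then marks the task as lost rather than operating on misread state.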

WaitTask(context.Context, id string) (<-chan *ExitResult, error)

The WaitTask function is expected to return a channel that will send an *ExitResult when the task exits or close the channel when the context is canceled. It is also expected that calling WaitTask on an exited task will immediately send an *ExitResult on the returned channel.
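The two behaviors described above (deliver the result on exit, including for an already exited task, and close the channel on cancellation) can be sketched like this. The task type with its done channel, and the waitExited and waitCanceled helpers, are hypothetical; ExitResult is a local stand-in with a single field.

```go
package main

import (
	"context"
	"fmt"
)

// ExitResult is a simplified stand-in for the drivers type.
type ExitResult struct {
	ExitCode int
}

// task is a hypothetical handle: done is closed once the task has exited,
// and result is set before done is closed.
type task struct {
	done   chan struct{}
	result *ExitResult
}

func (t *task) exit(code int) {
	t.result = &ExitResult{ExitCode: code}
	close(t.done)
}

// waitTask returns a channel that receives the exit result, or is closed
// without a value if ctx is canceled first. For a task that has already
// exited, done is already closed, so the result is sent right away.
func waitTask(ctx context.Context, t *task) <-chan *ExitResult {
	ch := make(chan *ExitResult, 1) // buffered so the send never blocks
	go func() {
		defer close(ch)
		select {
		case <-t.done:
			ch <- t.result
		case <-ctx.Done():
		}
	}()
	return ch
}

// waitExited waits on a task that has already exited with the given code.
func waitExited(code int) int {
	t := &task{done: make(chan struct{})}
	t.exit(code)
	res := <-waitTask(context.Background(), t)
	return res.ExitCode
}

// waitCanceled reports whether the channel closes without a result when the
// context is canceled while the task is still running.
func waitCanceled() bool {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	t := &task{done: make(chan struct{})}
	_, ok := <-waitTask(ctx, t)
	return !ok
}

func main() {
	fmt.Println("exit code:", waitExited(3))
	fmt.Println("closed on cancel:", waitCanceled())
}
```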

StopTask(taskID string, timeout time.Duration, signal string) error

The StopTask function is expected to stop a running task by sending the given signal to it. If the task does not stop during the given timeout, the driver must forcefully kill the task.

StopTask does not clean up resources of the task or remove it from the driver's internal state. A call to WaitTask after StopTask is valid and should be handled.
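The signal, wait, then force-kill sequence can be sketched against a small interface. The process interface and fakeProcess type are hypothetical and exist only so the example runs without a real workload.

```go
package main

import (
	"fmt"
	"time"
)

// process is a hypothetical handle on a running task.
type process interface {
	Signal(name string) error
	Kill() error
	Done() <-chan struct{}
}

// stopTask sends the requested signal and waits up to timeout for the task
// to exit. If it does not, the task is forcefully killed. Task state is left
// in place so that a later wait on the task still works.
func stopTask(p process, timeout time.Duration, signal string) error {
	if err := p.Signal(signal); err != nil {
		return fmt.Errorf("failed to signal task: %w", err)
	}
	select {
	case <-p.Done():
		return nil
	case <-time.After(timeout):
		return p.Kill()
	}
}

// fakeProcess simulates a task that either exits on a signal or ignores it.
type fakeProcess struct {
	done          chan struct{}
	ignoresSignal bool
	killed        bool
}

func (f *fakeProcess) Signal(string) error {
	if !f.ignoresSignal {
		close(f.done)
	}
	return nil
}

func (f *fakeProcess) Kill() error {
	f.killed = true
	close(f.done)
	return nil
}

func (f *fakeProcess) Done() <-chan struct{} { return f.done }

// stopFake runs stopTask against a fake and reports whether a force kill
// was needed.
func stopFake(ignoresSignal bool) (killed bool, err error) {
	p := &fakeProcess{done: make(chan struct{}), ignoresSignal: ignoresSignal}
	err = stopTask(p, 20*time.Millisecond, "SIGTERM")
	return p.killed, err
}

func main() {
	killed, err := stopFake(false)
	fmt.Println("cooperative task, killed:", killed, "err:", err)
	killed, err = stopFake(true)
	fmt.Println("stubborn task, killed:", killed, "err:", err)
}
```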

DestroyTask(taskID string, force bool) error

The DestroyTask function cleans up and removes a task that has terminated. If force is set to true, the driver must destroy the task even if it is still running. If WaitTask is called after DestroyTask, it should return drivers.ErrTaskNotFound as no task state should exist after DestroyTask is called.
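The force semantics and the post-destroy lookup failure can be sketched as below. ErrTaskNotFound here is a local stand-in for drivers.ErrTaskNotFound, and the driver, task, and lookup names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrTaskNotFound is a local stand-in for drivers.ErrTaskNotFound.
var ErrTaskNotFound = errors.New("task not found")

// task and driver are hypothetical in-memory structures.
type task struct {
	running bool
}

type driver struct {
	tasks map[string]*task
}

// DestroyTask removes a terminated task. A running task is only destroyed
// when force is true.
func (d *driver) DestroyTask(id string, force bool) error {
	t, ok := d.tasks[id]
	if !ok {
		return ErrTaskNotFound
	}
	if t.running && !force {
		return errors.New("cannot destroy a running task without force")
	}
	if t.running {
		t.running = false // a real driver would kill the workload here
	}
	delete(d.tasks, id)
	return nil
}

// lookup is what a later WaitTask or InspectTask call would do first.
func (d *driver) lookup(id string) (*task, error) {
	t, ok := d.tasks[id]
	if !ok {
		return nil, ErrTaskNotFound
	}
	return t, nil
}

func main() {
	d := &driver{tasks: map[string]*task{"task-1": {running: true}}}
	fmt.Println(d.DestroyTask("task-1", false))
	fmt.Println(d.DestroyTask("task-1", true))
	_, err := d.lookup("task-1")
	fmt.Println(err)
}
```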

InspectTask(taskID string) (*TaskStatus, error)

The InspectTask function returns detailed status information for the referenced taskID.

TaskStats(context.Context, id string, time.Duration) (<-chan *cstructs.TaskResourceUsage, error)

The TaskStats function returns a channel which the driver should send stats to at the given interval. The driver must send stats at the given interval until the given context is canceled or the task terminates.
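A stats loop that stops on either cancellation or task exit might be sketched as follows. TaskResourceUsage is a reduced local stand-in, and the done channel, the sample callback, and the countUntilExit helper are assumptions introduced for the example.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// TaskResourceUsage is a reduced stand-in for cstructs.TaskResourceUsage.
type TaskResourceUsage struct {
	Timestamp int64
}

// taskStats emits one sample per interval until ctx is canceled or done is
// closed (the task exited), then closes the returned channel. sample is a
// hypothetical function that measures the task.
func taskStats(ctx context.Context, done <-chan struct{}, interval time.Duration, sample func() *TaskResourceUsage) <-chan *TaskResourceUsage {
	ch := make(chan *TaskResourceUsage)
	go func() {
		defer close(ch)
		ticker := time.NewTicker(interval)
		defer ticker.Stop()
		for {
			select {
			case <-ctx.Done():
				return
			case <-done:
				return
			case <-ticker.C:
			}
			select {
			case ch <- sample():
			case <-ctx.Done():
				return
			case <-done:
				return
			}
		}
	}()
	return ch
}

// countUntilExit reads samples and simulates the task exiting after n of
// them, returning how many samples were received in total.
func countUntilExit(n int) int {
	done := make(chan struct{})
	ch := taskStats(context.Background(), done, 5*time.Millisecond, func() *TaskResourceUsage {
		return &TaskResourceUsage{Timestamp: time.Now().UnixNano()}
	})
	count := 0
	for range ch {
		count++
		if count == n {
			close(done)
		}
	}
	return count
}

func main() {
	fmt.Println("samples received:", countUntilExit(3))
}
```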

TaskEvents(context.Context) (<-chan *TaskEvent, error)

The Nomad client publishes events associated with an allocation. The TaskEvents function allows the driver to publish driver specific events about tasks and the Nomad client will associate them with the correct allocation.

An Eventer utility, available in the github.com/hashicorp/nomad/drivers/shared/eventer package, implements an event loop and publishing mechanism for use in the TaskEvents function.

SignalTask(taskID string, signal string) error

Optional - can be skipped by embedding drivers.DriverSignalTaskNotSupported

The SignalTask function is used by drivers which support sending OS signals (SIGHUP, SIGKILL, SIGUSR1 etc.) to the task. It is an optional function and is listed as a capability in the driver Capabilities struct.
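Since SignalTask receives the signal as a string, a driver has to map that name to an OS signal before delivering it. The sketch below shows one possible lookup using only the standard library; the signals table and parseSignal function are hypothetical, and the table is deliberately limited to a few signals that Go's syscall package defines on common platforms.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"syscall"
)

// signals is a hypothetical lookup table from signal name to OS signal. A
// real driver would likely support more signals than this subset.
var signals = map[string]os.Signal{
	"SIGHUP":  syscall.SIGHUP,
	"SIGINT":  syscall.SIGINT,
	"SIGKILL": syscall.SIGKILL,
	"SIGTERM": syscall.SIGTERM,
}

// parseSignal resolves a case-insensitive signal name, returning an error
// for names that are not in the table.
func parseSignal(name string) (os.Signal, error) {
	sig, ok := signals[strings.ToUpper(name)]
	if !ok {
		return nil, fmt.Errorf("unsupported signal %q", name)
	}
	return sig, nil
}

func main() {
	for _, name := range []string{"SIGHUP", "sigterm", "SIGBOGUS"} {
		sig, err := parseSignal(name)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s resolves to %v\n", name, sig)
	}
}
```

Rejecting unknown names up front gives the caller a clear error instead of a failure deep inside the signal delivery path.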

ExecTask(taskID string, cmd []string, timeout time.Duration) (*ExecTaskResult, error)

Optional - can be skipped by embedding drivers.DriverExecTaskNotSupported

The ExecTask function is used by the Nomad client to execute commands inside the task execution context. For example, the Docker driver executes commands inside the running container. ExecTask is called for Consul script checks.