2019-03-04 14:04:53 +00:00
|
|
|
---
|
2020-02-06 23:45:31 +00:00
|
|
|
layout: docs
|
|
|
|
page_title: Device Plugins
|
|
|
|
description: Learn how to author a Nomad device plugin.
|
2019-03-04 14:04:53 +00:00
|
|
|
---
|
|
|
|
|
|
|
|
# Devices
|
|
|
|
|
2019-11-21 19:50:50 +00:00
|
|
|
Nomad has built-in support for scheduling compute resources such as CPU, memory,
|
|
|
|
and networking. Nomad device plugins are used to support scheduling tasks with
|
|
|
|
other devices, such as GPUs. They are responsible for fingerprinting these
|
|
|
|
devices and working with the Nomad client to make them available to assigned
|
|
|
|
tasks.
|
|
|
|
|
|
|
|
For a real world example of a Nomad device plugin implementation, see the [Nvidia
|
2021-03-09 19:33:35 +00:00
|
|
|
GPU plugin](https://github.com/hashicorp/nomad/tree/main/devices/gpu/nvidia).
|
2019-11-21 19:50:50 +00:00
|
|
|
|
|
|
|
## Authoring Device Plugins
|
|
|
|
|
|
|
|
Authoring a device plugin in Nomad consists of implementing the
|
2020-02-06 23:45:31 +00:00
|
|
|
[DevicePlugin][deviceplugin] interface alongside
|
2019-11-21 19:50:50 +00:00
|
|
|
a main package to launch the plugin.
|
|
|
|
|
2020-02-06 23:45:31 +00:00
|
|
|
The [device plugin skeleton project][skeletonproject] exists to help bootstrap
|
2019-11-21 19:50:50 +00:00
|
|
|
the development of new device plugins. It provides most of the boilerplate
|
|
|
|
necessary for a device plugin, along with detailed comments.
|
|
|
|
|
|
|
|
### Lifecycle and State
|
|
|
|
|
|
|
|
A device plugin is long-lived. Nomad will ensure that one instance of the plugin is
|
|
|
|
running. If the plugin crashes or otherwise terminates, Nomad will launch another
|
|
|
|
instance of it.
|
|
|
|
|
2020-02-10 19:20:59 +00:00
|
|
|
However, unlike [task drivers](/docs/internals/plugins/task-drivers), device plugins do not currently
|
2019-11-21 19:50:50 +00:00
|
|
|
have an interface for persisting state to the Nomad client. Instead, the device
|
|
|
|
plugin API emphasizes fingerprinting devices and reporting their status. After
|
|
|
|
helping to provision a task with a scheduled device, a device plugin does not
|
|
|
|
have any responsibility (or ability) to monitor the task.
|
|
|
|
|
|
|
|
## Device Plugin API
|
|
|
|
|
|
|
|
The [base plugin][baseplugin] must be implemented in addition to the following
|
|
|
|
functions.
|
|
|
|
|
|
|
|
### `Fingerprint(context.Context) (<-chan *FingerprintResponse, error)`
|
|
|
|
|
2020-02-06 23:45:31 +00:00
|
|
|
The `Fingerprint` [function][fingerprintfn] is called by the client when the plugin is started.
|
2019-11-21 19:50:50 +00:00
|
|
|
It allows the plugin to provide Nomad with a list of discovered devices, along with their
|
|
|
|
attributes, for the purpose of scheduling workloads using devices.
|
|
|
|
The channel returned should immediately send an initial
|
2020-02-06 23:45:31 +00:00
|
|
|
[`FingerprintResponse`][fingerprintresponse], then send periodic updates at
|
2019-11-21 19:50:50 +00:00
|
|
|
an appropriate interval until the context is canceled.
|
|
|
|
|
|
|
|
Each fingerprint response consists of either an error or a list of device groups.
|
|
|
|
A device group is a list of detected devices that are identical for the purpose of
|
|
|
|
scheduling; that is, they will have identical attributes.
|
|
|
|
|
|
|
|
### `Stats(context.Context, time.Duration) (<-chan *StatsResponse, error)`
|
|
|
|
|
2020-02-06 23:45:31 +00:00
|
|
|
The `Stats` [function][statsfn] returns a channel on which the plugin should
|
2019-11-21 19:50:50 +00:00
|
|
|
emit device statistics, at the specified interval, until either an error is
|
|
|
|
encountered or the specified context is cancelled. The `StatsReponse` object
|
|
|
|
allows [dimensioned][dimensioned] statistics to be returned for each device in a device group.
|
|
|
|
|
|
|
|
### `Reserve(deviceIDs []string) (*ContainerReservation, error)`
|
|
|
|
|
2020-02-06 23:45:31 +00:00
|
|
|
The `Reserve` [function][reservefn] accepts a list of device IDs and returns the information
|
2019-11-21 19:50:50 +00:00
|
|
|
necessary for the client to make those devices available to a task. Currently,
|
|
|
|
the `ContainerReservation` object allows the plugin to specify environment
|
|
|
|
variables for the task, as well as a list of host devices and files to be mounted
|
|
|
|
into the task's filesystem. Any orchestration required to prepare the device for
|
|
|
|
use should also be performed in this function.
|
|
|
|
|
2020-02-06 23:45:31 +00:00
|
|
|
[deviceplugin]: https://github.com/hashicorp/nomad/blob/v0.9.0/plugins/device/device.go#L20-L33
|
|
|
|
[baseplugin]: /docs/internals/plugins/base
|
|
|
|
[skeletonproject]: https://github.com/hashicorp/nomad-skeleton-device-plugin
|
|
|
|
[fingerprintresponse]: https://github.com/hashicorp/nomad/blob/v0.9.0/plugins/device/device.go#L37-L43
|
|
|
|
[fingerprintfn]: https://github.com/hashicorp/nomad-skeleton-device-plugin/blob/v0.1.0/device/device.go#L159-L165
|
|
|
|
[statsfn]: https://github.com/hashicorp/nomad-skeleton-device-plugin/blob/v0.1.0/device/device.go#L169-L176
|
|
|
|
[reservefn]: https://github.com/hashicorp/nomad-skeleton-device-plugin/blob/v0.1.0/device/device.go#L189-L245
|
2019-11-21 19:50:50 +00:00
|
|
|
[dimensioned]: https://github.com/hashicorp/nomad/blob/v0.9.0/plugins/shared/structs/stats.go#L33-L34
|