---
layout: docs
page_title: device Stanza - Job Specification
description: |-
  The "device" stanza is used to require a certain device be made available
  to the task.
---

# `device` Stanza

<Placement groups={['job', 'group', 'task', 'resources', 'device']} />

The `device` stanza is used to create both a scheduling and runtime requirement
that the given task has access to the specified devices. A device is a piece of
hardware attached to the node that may be made available to the task.
Examples are GPUs, FPGAs, and TPUs.

When a `device` stanza is added, Nomad will schedule the task onto a node that
contains the set of device(s) that meet the specified requirements. The `device` stanza
allows the operator to specify as little as just the type of device required,
such as `gpu`, all the way to specifying arbitrary constraints and affinities.
Once the scheduler has placed the allocation on a suitable node, the Nomad
Client will invoke the device plugin to retrieve information on how to mount the
device and what environment variables to expose. For more information on the
runtime environment, please consult the individual device plugin's documentation.

See the [device plugin's documentation][devices] for a list of supported devices.

```hcl
job "docs" {
  group "example" {
    task "server" {
      resources {
        device "nvidia/gpu" {
          count = 2

          constraint {
            attribute = "${device.attr.memory}"
            operator  = ">="
            value     = "2 GiB"
          }

          affinity {
            attribute = "${device.attr.memory}"
            operator  = ">="
            value     = "4 GiB"
            weight    = 75
          }
        }
      }
    }
  }
}
```

In the above example, the task requests two GPUs from the Nvidia vendor but
does not specify the model required. Instead, it places a hard constraint that
each device has at least 2 GiB of memory and expresses a preference for GPUs
that have at least 4 GiB. This example shows how expressive the `device` stanza
can be.

~> Device support is currently limited to Linux and to container-based drivers,
because these drivers can isolate devices to specific tasks.

## `device` Parameters

- `name` `(string: "")` - Specifies the device required. The following inputs
  are valid:

  - `<device_type>`: If a single value is given, it is assumed to be the device
    type, such as "gpu" or "fpga".

  - `<vendor>/<device_type>`: If two values are given separated by a `/`, the
    given device type will be selected, constraining on the provided vendor.
    Examples include "nvidia/gpu" or "amd/gpu".

  - `<vendor>/<device_type>/<model>`: If three values are given separated by a `/`, the
    given device type will be selected, constraining on the provided vendor and
    model name. Examples include "nvidia/gpu/1080ti" or "nvidia/gpu/2080ti".

- `count` `(int: 1)` - Specifies the number of instances of the given device
  that are required.

- `constraint` <code>([Constraint][]: nil)</code> - Constraints to restrict
  which devices are eligible. This can be provided multiple times to define
  additional constraints. See below for available attributes.

- `affinity` <code>([Affinity][]: nil)</code> - Affinity to specify a preference
  for which devices get selected. This can be provided multiple times to define
  additional affinities. See below for available attributes.

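These parameters can be combined in a single stanza. The following is a sketch
rather than a recommended configuration: it requests two GPUs by device type
only, narrows the candidates with two hard constraints, and ranks the remaining
devices with two affinities of different weights. The model name `2080ti` and
the memory thresholds are illustrative values; the attributes a device actually
exposes depend on its plugin.

```hcl
device "gpu" {
  # Request two devices of type "gpu"; no vendor or model in the name.
  count = 2

  # Hard requirement: only Nvidia devices are eligible.
  # The operator defaults to "=" when omitted.
  constraint {
    attribute = "${device.vendor}"
    value     = "nvidia"
  }

  # Hard requirement: each device must report at least 4 GiB of memory.
  constraint {
    attribute = "${device.attr.memory}"
    operator  = ">="
    value     = "4 GiB"
  }

  # Soft preference for an illustrative model name.
  affinity {
    attribute = "${device.model}"
    value     = "2080ti"
    weight    = 50
  }

  # Stronger soft preference for devices with at least 8 GiB of memory.
  affinity {
    attribute = "${device.attr.memory}"
    operator  = ">="
    value     = "8 GiB"
    weight    = 75
  }
}
```
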
## `device` Constraint and Affinity Attributes

The following attributes are available for use in a `constraint` or `affinity`:

<table>
  <thead>
    <tr>
      <th>Variable</th>
      <th>Description</th>
      <th>Example Value</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>
        <code>{'${device.ids}'}</code>
      </td>
      <td>Comma-separated list of the IDs of the devices in the group</td>
      <td>
        <code>9afa5da1-8f39-25a2-48dc-ba31fd7c0023,c248b547-fed7-4d67-ade5-73a27d280ac4</code>
      </td>
    </tr>
    <tr>
      <td>
        <code>{'${device.type}'}</code>
      </td>
      <td>The type of device</td>
      <td>
        <code>"gpu", "tpu", "fpga"</code>
      </td>
    </tr>
    <tr>
      <td>
        <code>{'${device.vendor}'}</code>
      </td>
      <td>The device's vendor</td>
      <td>
        <code>"amd", "nvidia", "intel"</code>
      </td>
    </tr>
    <tr>
      <td>
        <code>{'${device.model}'}</code>
      </td>
      <td>The device's model</td>
      <td>
        <code>"1080ti"</code>
      </td>
    </tr>
    <tr>
      <td>
        <code>
          ${'{'}device.attr.<property>{'}'}
        </code>
      </td>
      <td>Property of the device</td>
      <td>
        <code>{'${device.attr.memory} => 8 GiB'}</code>
      </td>
    </tr>
  </tbody>
</table>

The `device.attr.<property>` attributes vary by device. For the properties each
plugin exposes, see the individual [device plugin's documentation][devices].

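Any attribute from the table above can be combined with the operators
documented for [constraint][] and [affinity][] blocks. As a sketch, assuming the
`regexp` operator behaves for device attributes as it does for node attributes,
the following accepts either of two Nvidia models. The model names are
illustrative.

```hcl
device "gpu" {
  # Restrict to Nvidia devices using the vendor attribute.
  constraint {
    attribute = "${device.vendor}"
    value     = "nvidia"
  }

  # Accept either of two models by matching the model attribute
  # against an anchored regular expression.
  constraint {
    attribute = "${device.model}"
    operator  = "regexp"
    value     = "^(1080ti|2080ti)$"
  }
}
```
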
### Attribute Units and Conversions

Devices report their attributes with strict types and can also provide unit
information. For example, when a GPU reports its memory, it can report that it
is "4096 MiB". Because Nomad has the associated unit information, it can convert
between these units, so a constraint that requires more than "3.5 GiB" matches
this device.

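The constraint described above can be written as follows. This sketch assumes
the device reports a `memory` attribute in MiB, as in the example GPU; the
comment walks through the conversion Nomad performs.

```hcl
device "nvidia/gpu" {
  # A device reporting "4096 MiB" of memory satisfies this constraint:
  # 4096 MiB = 4096 / 1024 GiB = 4 GiB, which is greater than 3.5 GiB.
  constraint {
    attribute = "${device.attr.memory}"
    operator  = ">"
    value     = "3.5 GiB"
  }
}
```
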
The units Nomad supports are as follows:

<table>
  <thead>
    <tr>
      <th>Base Unit</th>
      <th>Values</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>
        <code>Byte</code>
      </td>
      <td>
        <code>
          <strong>Base 2</strong>: KiB, MiB, GiB, TiB, PiB, EiB
        </code>
        <br />
        <code>
          <strong>Base 10</strong>: kB, KB (equivalent to kB), MB, GB, TB, PB,
          EB
        </code>
      </td>
    </tr>
    <tr>
      <td>
        <code>Byte Rates</code>
      </td>
      <td>
        <code>
          <strong>Base 2</strong>: KiB/s, MiB/s, GiB/s, TiB/s, PiB/s, EiB/s
        </code>
        <br />
        <code>
          <strong>Base 10</strong>: kB/s, KB/s (equivalent to kB/s), MB/s, GB/s,
          TB/s, PB/s, EB/s
        </code>
      </td>
    </tr>
    <tr>
      <td>
        <code>Hertz</code>
      </td>
      <td>
        <code>MHz, GHz</code>
      </td>
    </tr>
    <tr>
      <td>
        <code>Watts</code>
      </td>
      <td>
        <code>mW, W, kW, MW, GW</code>
      </td>
    </tr>
  </tbody>
</table>

Conversion is only possible between units that share the same base unit. For
example, a value in MiB can be compared with one in GiB, but not with one in MHz.

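As a sketch for the Hertz base unit, the following constraint compares a clock
speed attribute against a value given in GHz. The attribute name `cores_clock`
and the assumption that it is reported in MHz are illustrative; check the
[device plugin's documentation][devices] for the attributes your devices
actually report.

```hcl
device "nvidia/gpu" {
  # Assumption: the plugin reports a "cores_clock" attribute in MHz.
  # Both values share the Hertz base unit, so "1.5 GHz" can be compared
  # with a reported value such as "1500 MHz" (1.5 GHz = 1500 MHz).
  constraint {
    attribute = "${device.attr.cores_clock}"
    operator  = ">="
    value     = "1.5 GHz"
  }
}
```
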
## `device` Examples

The following examples only show the `device` stanzas. Remember that the
`device` stanza is only valid in the placements listed above.

### Single Nvidia GPU

This example schedules a task with a single Nvidia GPU made available.

```hcl
device "nvidia/gpu" {}
```

### Multiple Nvidia GPUs

This example schedules a task with two Nvidia GPUs made available.

```hcl
device "nvidia/gpu" {
  count = 2
}
```

### Single Nvidia GPU with Specific Model

This example schedules a task with a single Nvidia GPU made available and uses
the name to specify the exact model to be used.

```hcl
device "nvidia/gpu/1080ti" {}
```

This is a simplification of the following:

```hcl
device "gpu" {
  count = 1

  constraint {
    attribute = "${device.vendor}"
    value     = "nvidia"
  }

  constraint {
    attribute = "${device.model}"
    value     = "1080ti"
  }
}
```

### Affinity with Unit Conversion

This example uses an affinity to tell the scheduler that it would prefer a GPU
with at least 1.5 GiB of memory. The following two forms are equivalent because
Nomad can convert between units.

Specified in `GiB`:

```hcl
device "nvidia/gpu" {
  affinity {
    attribute = "${device.attr.memory}"
    operator  = ">="
    value     = "1.5 GiB"
    weight    = 75
  }
}
```

Specified in `MiB`:

```hcl
device "nvidia/gpu" {
  affinity {
    attribute = "${device.attr.memory}"
    operator  = ">="
    # 1.5 GiB = 1.5 * 1024 MiB = 1536 MiB
    value     = "1536 MiB"
    weight    = 75
  }
}
```

### Affinity Towards Specific GPU Devices

This example uses an affinity to indicate a scheduling preference for specific
GPU devices, using their UUIDs as the selection criterion. Because devices are
fingerprinted as groups, `${device.ids}` evaluates to a comma-separated list of
the IDs in each group, and you may specify multiple IDs as a comma-separated
list. With the `set_contains` operator, the preference applies only to a group
that contains all of the listed IDs.

```hcl
device "nvidia/gpu" {
  affinity {
    attribute = "${device.ids}"
    operator  = "set_contains"
    value     = "9afa5da1-8f39-25a2-48dc-ba31fd7c0023,c248b547-fed7-4d67-ade5-73a27d280ac4"
  }
}
```

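The `${device.ids}` attribute can also be used in a `constraint` when pinning
to specific devices must be a hard requirement. The sketch below assumes the
`set_contains_any` operator described in the [constraint][] documentation, which
matches a group that contains at least one of the listed IDs rather than all of
them. The IDs are the example values from the affinity example.

```hcl
device "nvidia/gpu" {
  # Hard requirement: only place on a device group whose ID list includes
  # at least one of these (example) device IDs.
  constraint {
    attribute = "${device.ids}"
    operator  = "set_contains_any"
    value     = "9afa5da1-8f39-25a2-48dc-ba31fd7c0023,c248b547-fed7-4d67-ade5-73a27d280ac4"
  }
}
```
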
[affinity]: /docs/job-specification/affinity 'Nomad affinity Job Specification'
[constraint]: /docs/job-specification/constraint 'Nomad constraint Job Specification'
[devices]: /docs/devices 'Nomad Device Plugins'