open-nomad/website/content/docs/job-specification/device.mdx
Luiz Aoqui ed5fccc183
scheduler: allow using device ID as attribute (#15455)
Devices are fingerprinted as groups of similar devices. This prevented
specifying specific device by their ID in constraint and affinity rules.

This commit introduces the `${device.ids}` attribute that returns a
comma separated list of IDs that are part of the device group. Users can
then use the set operators to write rules.
2023-01-10 14:28:23 -05:00


---
layout: docs
page_title: device Stanza - Job Specification
description: |-
  The "device" stanza is used to require a certain device be made available
  to the task.
---
# `device` Stanza
<Placement groups={['job', 'group', 'task', 'resources', 'device']} />

The `device` stanza is used to create both a scheduling and runtime requirement
that the given task has access to the specified devices. A device is a hardware
device that is attached to the node and may be made available to the task.
Examples are GPUs, FPGAs, and TPUs.

When a `device` stanza is added, Nomad will schedule the task onto a node that
contains the set of device(s) that meet the specified requirements. The `device` stanza
allows the operator to specify as little as just the type of device required,
such as `gpu`, all the way to specifying arbitrary constraints and affinities.
Once the scheduler has placed the allocation on a suitable node, the Nomad
Client will invoke the device plugin to retrieve information on how to mount the
device and what environment variables to expose. For more information on the
runtime environment, please consult the individual device plugin's documentation.

See the [device plugin's documentation][devices] for a list of supported devices.
```hcl
job "docs" {
  group "example" {
    task "server" {
      resources {
        device "nvidia/gpu" {
          count = 2

          constraint {
            attribute = "${device.attr.memory}"
            operator  = ">="
            value     = "2 GiB"
          }

          affinity {
            attribute = "${device.attr.memory}"
            operator  = ">="
            value     = "4 GiB"
            weight    = 75
          }
        }
      }
    }
  }
}
```
In the above example, the task requests two GPUs from the Nvidia vendor
but does not specify a particular model. Instead, it places a hard
constraint that the device has at least 2 GiB of memory and states a preference
for GPUs that have at least 4 GiB. This example shows how expressive the
`device` stanza can be.

~> Device support is currently limited to Linux and container-based drivers,
because these can isolate devices to specific tasks.
## `device` Parameters
- `name` `(string: "")` - Specifies the device required. The following inputs
  are valid:

  - `<device_type>`: If a single value is given, it is assumed to be the device
    type, such as "gpu", or "fpga".

  - `<vendor>/<device_type>`: If two values are given separated by a `/`, the
    given device type will be selected, constraining on the provided vendor.
    Examples include "nvidia/gpu" or "amd/gpu".

  - `<vendor>/<device_type>/<model>`: If three values are given separated by a `/`, the
    given device type will be selected, constraining on the provided vendor, and
    model name. Examples include "nvidia/gpu/1080ti" or "nvidia/gpu/2080ti".

- `count` `(int: 1)` - Specifies the number of instances of the given device
  that are required.

- `constraint` <code>([Constraint][]: nil)</code> - Constraints to restrict
  which devices are eligible. This can be provided multiple times to define
  additional constraints. See below for available attributes.

- `affinity` <code>([Affinity][]: nil)</code> - Affinity to specify a preference
  for which devices get selected. This can be provided multiple times to define
  additional affinities. See below for available attributes.
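
As a rough sketch of how these parameters combine, the stanza below sets `count` and repeats `constraint` to bound the eligible devices from both sides. The 4 GiB and 16 GiB thresholds are illustrative only, and the example assumes the device plugin in use reports a `memory` attribute.

```hcl
device "nvidia/gpu" {
  # Two instances of the device are required.
  count = 2

  # Every constraint must hold for a device to be eligible.
  constraint {
    attribute = "${device.attr.memory}"
    operator  = ">="
    value     = "4 GiB"
  }

  # Illustrative upper bound, so very large GPUs are left for other jobs.
  constraint {
    attribute = "${device.attr.memory}"
    operator  = "<="
    value     = "16 GiB"
  }
}
```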
## `device` Constraint and Affinity Attributes
The set of attributes available for use in a `constraint` or `affinity` are as
follows:
<table>
<thead>
<tr>
<th>Variable</th>
<th>Description</th>
<th>Example Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>
<code>{'${device.ids}'}</code>
</td>
<td>Comma-separated list of device IDs in the group</td>
<td>
<code>9afa5da1-8f39-25a2-48dc-ba31fd7c0023,c248b547-fed7-4d67-ade5-73a27d280ac4</code>
</td>
</tr>
<tr>
<td>
<code>{'${device.type}'}</code>
</td>
<td>The type of device</td>
<td>
<code>"gpu", "tpu", "fpga"</code>
</td>
</tr>
<tr>
<td>
<code>{'${device.vendor}'}</code>
</td>
<td>The device's vendor</td>
<td>
<code>"amd", "nvidia", "intel"</code>
</td>
</tr>
<tr>
<td>
<code>{'${device.model}'}</code>
</td>
<td>The device's model</td>
<td>
<code>"1080ti"</code>
</td>
</tr>
<tr>
<td>
<code>
${'{'}device.attr.&lt;property&gt;{'}'}
</code>
</td>
<td>Property of the device</td>
<td>
<code>{'${device.attr.memory} => 8 GiB'}</code>
</td>
</tr>
</tbody>
</table>
For the set of attributes available, please see the individual [device plugin's
documentation][devices].
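
The built-in attributes can be mixed within one stanza. The following sketch asks for any GPU from one of two vendors and prefers a particular model. It assumes the `regexp` operator described in the [constraint][] documentation is available for device constraints in the Nomad version in use. The vendor and model values are taken from the example column above.

```hcl
device "gpu" {
  # Hard requirement: the vendor must match one of the two names.
  constraint {
    attribute = "${device.vendor}"
    operator  = "regexp"
    value     = "^(nvidia|amd)$"
  }

  # Soft preference: favor the 1080ti model when one is available.
  affinity {
    attribute = "${device.model}"
    value     = "1080ti"
    weight    = 50
  }
}
```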
### Attribute Units and Conversions
Devices report their attributes with strict types and can also provide unit
information. For example, when a GPU is reporting its memory, it can report that
it is "4096 MiB". Since Nomad has the associated unit information, a constraint
that requires greater than "3.5 GiB" can match since Nomad can convert between
these units.
The units Nomad supports are as follows:
<table>
<thead>
<tr>
<th>Base Unit</th>
<th>Values</th>
</tr>
</thead>
<tbody>
<tr>
<td>
<code>Byte</code>
</td>
<td>
<code>
<strong>Base 2</strong>: KiB, MiB, GiB, TiB, PiB, EiB
</code>
<br />
<code>
<strong>Base 10</strong>: kB, KB (equivalent to kB), MB, GB, TB, PB,
EB
</code>
</td>
</tr>
<tr>
<td>
<code>Byte Rates</code>
</td>
<td>
<code>
<strong>Base 2</strong>: KiB/s, MiB/s, GiB/s, TiB/s, PiB/s, EiB/s
</code>
<br />
<code>
<strong>Base 10</strong>: kB/s, KB/s (equivalent to kB/s), MB/s, GB/s,
TB/s, PB/s, EB/s
</code>
</td>
</tr>
<tr>
<td>
<code>Hertz</code>
</td>
<td>
<code>MHz, GHz</code>
</td>
</tr>
<tr>
<td>
<code>Watts</code>
</td>
<td>
<code>mW, W, kW, MW, GW</code>
</td>
</tr>
</tbody>
</table>
Conversion is only possible within the same base unit.
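
As an illustration of conversion within a single base unit, the sketch below compares a power attribute against a value written in kilowatts. The attribute name `power` is an assumption. Check the device plugin's documentation for the attributes it actually reports.

```hcl
device "nvidia/gpu" {
  # Assumes the plugin reports a `power` attribute in watts, such as "250 W".
  # "0.3 kW" is 300 W, so a device reporting 250 W would satisfy this rule.
  constraint {
    attribute = "${device.attr.power}"
    operator  = "<="
    value     = "0.3 kW"
  }
}
```

A value in a different base unit, such as `"300 MiB"`, could not be compared against this attribute because bytes and watts do not convert into each other.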
## `device` Examples
The following examples only show the `device` stanzas. Remember that the
`device` stanza is only valid in the placements listed above.
### Single Nvidia GPU
This example schedules a task with a single Nvidia GPU made available.
```hcl
device "nvidia/gpu" {}
```
### Multiple Nvidia GPUs
This example schedules a task with two Nvidia GPUs made available.
```hcl
device "nvidia/gpu" {
  count = 2
}
```
### Single Nvidia GPU with Specific Model
This example schedules a task with a single Nvidia GPU made available and uses
the name to specify the exact model to be used.
```hcl
device "nvidia/gpu/1080ti" {}
```
This is a simplification of the following:
```hcl
device "gpu" {
  count = 1

  constraint {
    attribute = "${device.vendor}"
    value     = "nvidia"
  }

  constraint {
    attribute = "${device.model}"
    value     = "1080ti"
  }
}
```
### Affinity with Unit Conversion
This example uses an affinity to tell the scheduler it would prefer if the GPU
had at least 1.5 GiB of memory. The following are both equivalent as Nomad can
do unit conversions.
Specified in `GiB`:
```hcl
device "nvidia/gpu" {
  affinity {
    attribute = "${device.attr.memory}"
    operator  = ">="
    value     = "1.5 GiB"
    weight    = 75
  }
}
```
Specified in `MiB`:
```hcl
device "nvidia/gpu" {
  affinity {
    attribute = "${device.attr.memory}"
    operator  = ">="
    value     = "1536 MiB"
    weight    = 75
  }
}
```
### Affinity Towards Specific GPU Devices
This example uses affinity to indicate scheduling preference towards specific
GPU devices, using their UUID as selection criteria. Since devices are
fingerprinted as a group, you may specify multiple IDs as a comma-separated
list.
```hcl
device "nvidia/gpu" {
  affinity {
    attribute = "${device.ids}"
    operator  = "set_contains"
    value     = "9afa5da1-8f39-25a2-48dc-ba31fd7c0023,c248b547-fed7-4d67-ade5-73a27d280ac4"
  }
}
```
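
With `set_contains`, a device group matches only when it contains every ID in the list. As a hedged variation, the sketch below uses `set_contains_any`, which matches when the group contains at least one of the listed IDs. It assumes that operator, as described in the [constraint][] documentation, is available in the Nomad version in use. The IDs are the same example values as above.

```hcl
device "nvidia/gpu" {
  affinity {
    attribute = "${device.ids}"
    # Matches a device group holding either of the listed IDs.
    operator  = "set_contains_any"
    value     = "9afa5da1-8f39-25a2-48dc-ba31fd7c0023,c248b547-fed7-4d67-ade5-73a27d280ac4"
    weight    = 50
  }
}
```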
[affinity]: /docs/job-specification/affinity 'Nomad affinity Job Specification'
[constraint]: /docs/job-specification/constraint 'Nomad constraint Job Specification'
[devices]: /docs/devices 'Nomad Device Plugins'