open-nomad/website/source/docs/job-specification/device.html.md

layout: docs
page_title: device Stanza - Job Specification
sidebar_current: docs-job-specification-device
description: The "device" stanza is used to require a certain device be made available to the task.

device Stanza

Placement job -> group -> task -> resources -> **device**

The device stanza is used to create both a scheduling and runtime requirement that the given task has access to the specified devices. A device is a hardware device that is attached to the node and may be made available to the task. Examples are GPUs, FPGAs, and TPUs.

When a device stanza is added, Nomad will schedule the task onto a node that contains the set of device(s) that meet the specified requirements. The device stanza allows the operator to specify as little as just the type of device required, such as gpu, all the way to specifying arbitrary constraints and affinities. Once the scheduler has placed the allocation on a suitable node, the Nomad Client will invoke the device plugin to retrieve information on how to mount the device and what environment variables to expose. For more information on the runtime environment, please consult the individual device plugin's documentation.

See the device plugin's documentation for a list of supported devices.

job "docs" {
  group "example" {
    task "server" {
      resources {
        device "nvidia/gpu" {
          count = 2

          constraint {
            attribute = "${device.attr.memory}"
            operator  = ">="
            value     = "2 GiB"
          }

          affinity {
            attribute = "${device.attr.memory}"
            operator  = ">="
            value     = "4 GiB"
            weight    = 75
          }
        }
      }
    }
  }
}

In the above example, the task requests two Nvidia GPUs without specifying a model. Instead, it places a hard constraint that each device has at least 2 GiB of memory and expresses a preference for GPUs that have at least 4 GiB. This example shows how expressive the device stanza can be.

~> Device support is currently limited to Linux and container-based drivers, because these are able to isolate devices to specific tasks.
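
As a minimal sketch of the restriction above, the fragment below pairs a device request with the docker driver, which is one container-based driver. The image name is a hypothetical placeholder and does not come from this page.

```hcl
task "server" {
  # Container-based driver, so the device can be isolated to this task.
  driver = "docker"

  config {
    # Hypothetical image name, used only for illustration.
    image = "example/gpu-app:latest"
  }

  resources {
    # Request a single Nvidia GPU of any model.
    device "nvidia/gpu" {}
  }
}
```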

device Parameters

  • name (string: "") - Specifies the device required. The following inputs are valid:

    • <device_type>: If a single value is given, it is assumed to be the device type, such as "gpu" or "fpga".

    • <vendor>/<device_type>: If two values are given separated by a /, the given device type will be selected, constraining on the provided vendor. Examples include "nvidia/gpu" or "amd/gpu".

    • <vendor>/<device_type>/<model>: If three values are given separated by a /, the given device type will be selected, constraining on the provided vendor and model name. Examples include "nvidia/gpu/1080ti" or "nvidia/gpu/2080ti".

  • count (int: 1) - Specifies the number of instances of the given device that are required.

  • constraint (Constraint: nil) - Constraints to restrict which devices are eligible. This can be provided multiple times to define additional constraints. See below for available attributes.

  • affinity (Affinity: nil) - Affinity to specify a preference for which devices get selected. This can be provided multiple times to define additional affinities. See below for available attributes.
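
The parameters above can be combined in a single stanza. The following is a sketch, not a recommended configuration: it names only the device type, repeats constraint to narrow the eligible devices, and adds one affinity. The memory size and model values are illustrative, and the blocks without an operator rely on the default equality comparison.

```hcl
device "gpu" {
  # One instance of the device is required (this is also the default).
  count = 1

  # Hard requirement: only devices from this vendor are eligible.
  constraint {
    attribute = "${device.vendor}"
    value     = "nvidia"
  }

  # Hard requirement: the device must report at least 8 GiB of memory.
  constraint {
    attribute = "${device.attr.memory}"
    operator  = ">="
    value     = "8 GiB"
  }

  # Soft preference: favor this model when more than one device qualifies.
  affinity {
    attribute = "${device.model}"
    value     = "2080ti"
    weight    = 50
  }
}
```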

device Constraint and Affinity Attributes

The set of attributes available for use in a constraint or affinity are as follows:

| Variable | Description | Example Value |
| --- | --- | --- |
| ${device.type} | The type of device | "gpu", "tpu", "fpga" |
| ${device.vendor} | The device's vendor | "amd", "nvidia", "intel" |
| ${device.model} | The device's model | "1080ti" |
| ${device.attr.<property>} | Property of the device | ${device.attr.memory} => 8 GiB |

For the set of attributes available, please see the individual device plugin's documentation.

Attribute Units and Conversions

Devices report their attributes with strict types and can also provide unit information. For example, when a GPU is reporting its memory, it can report that it is "4096 MiB". Because Nomad has the associated unit information, it can convert between units, so a constraint that requires more than "3.5 GiB" can match.

The units Nomad supports are as follows:

| Base Unit | Values |
| --- | --- |
| Byte | **Base 2**: KiB, MiB, GiB, TiB, PiB, EiB; **Base 10**: kB, KB (equivalent to kB), MB, GB, TB, PB, EB |
| Byte Rates | **Base 2**: KiB/s, MiB/s, GiB/s, TiB/s, PiB/s, EiB/s; **Base 10**: kB/s, KB/s (equivalent to kB/s), MB/s, GB/s, TB/s, PB/s, EB/s |
| Hertz | MHz, GHz |
| Watts | mW, W, kW, MW, GW |

Conversion is only possible within the same base unit.
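
To make the conversion concrete, the sketch below writes the "4096 MiB" versus "3.5 GiB" case from this section as a constraint. The arithmetic in the comments is a worked illustration of the comparison, not a description of how Nomad implements it.

```hcl
device "nvidia/gpu" {
  constraint {
    attribute = "${device.attr.memory}"
    operator  = ">"

    # 3.5 GiB = 3.5 * 1024 MiB = 3584 MiB.
    # A device reporting "4096 MiB" satisfies this, since 4096 > 3584.
    # MiB and GiB share the Byte base unit, so they can be compared.
    value = "3.5 GiB"
  }
}
```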

device Examples

The following examples only show the device stanzas. Remember that the device stanza is only valid in the placements listed above.

Single Nvidia GPU

This example schedules a task with a single Nvidia GPU made available.

device "nvidia/gpu" {}

Multiple Nvidia GPUs

This example schedules a task with two Nvidia GPUs made available.

device "nvidia/gpu" {
  count = 2
}

Single Nvidia GPU with Specific Model

This example schedules a task with a single Nvidia GPU made available and uses the name to specify the exact model to be used.

device "nvidia/gpu/1080ti" {}

This is a simplification of the following:

device "gpu" {
  count = 1

  constraint {
    attribute = "${device.vendor}"
    value     = "nvidia"
  }

  constraint {
    attribute = "${device.model}"
    value     = "1080ti"
  }
}

Affinity with Unit Conversion

This example uses an affinity to tell the scheduler that it would prefer a GPU with at least 1.5 GiB of memory. The following two stanzas are equivalent because Nomad can do unit conversions.

Specified in GiB:

device "nvidia/gpu" {
  affinity {
    attribute = "${device.attr.memory}"
    operator  = ">="
    value     = "1.5 GiB"
    weight    = 75
  }
}

Specified in MiB:

device "nvidia/gpu" {
  affinity {
    attribute = "${device.attr.memory}"
    operator  = ">="
    value     = "1536 MiB"
    weight    = 75
  }
}