open-nomad/devices/gpu/nvidia
oleksii.shyman e41fbf7577 Add support for docker runtimes
- docker fingerprint issues a docker api system info call to get the
  list of supported OCI runtimes.
  - OCI runtimes are reported as comma separated list of names
  - docker driver is aware of GPU runtime presence
  - docker driver throws an error when user tries to run container with
    GPU, when GPU runtime is not present
  - docker GPU runtime name is configurable
2019-01-15 11:34:47 -08:00
cmd nvidia package restructue + build non-linux 2018-10-05 13:56:04 -07:00
nvml nvidia package restructue + build non-linux 2018-10-05 13:56:04 -07:00
README.md Device manager 2018-11-07 10:43:15 -08:00
device.go Add support for docker runtimes 2019-01-15 11:34:47 -08:00
device_test.go Add support for docker runtimes 2019-01-15 11:34:47 -08:00
fingerprint.go Device manager 2018-11-07 10:43:15 -08:00
fingerprint_test.go Use Attribute in device fingerprinting 2018-10-13 11:43:06 -07:00
stats.go devices/nvidia: memory state as the summary stat 2018-12-10 12:18:24 -05:00
stats_test.go devices/nvidia: memory state as the summary stat 2018-12-10 12:18:24 -05:00

README.md

This package provides an implementation of the Nvidia device plugin.

Behavior

The Nvidia device plugin uses NVML bindings to query the available Nvidia devices and exposes them via the Fingerprint RPC. Individual GPUs can be excluded from fingerprinting by listing their UUIDs in the ignored_gpu_ids field. The plugin also sends statistics for the fingerprinted devices once every stats_period.
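The filtering and periodic fingerprinting described above can be sketched in Go as follows. This is a minimal, self-contained illustration under stated assumptions, not the plugin's actual code: the device type, the filterIgnored and fingerprintLoop helpers, and the hard-coded device list are hypothetical stand-ins for data the real plugin obtains through NVML.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// device is a hypothetical, simplified view of a GPU as reported by NVML.
type device struct {
	UUID string
	Name string
}

// filterIgnored drops every device whose UUID appears in ignoredIDs,
// mirroring the effect of the ignored_gpu_ids option.
func filterIgnored(devices []device, ignoredIDs []string) []device {
	ignored := make(map[string]struct{}, len(ignoredIDs))
	for _, id := range ignoredIDs {
		ignored[id] = struct{}{}
	}
	var out []device
	for _, d := range devices {
		if _, skip := ignored[d.UUID]; !skip {
			out = append(out, d)
		}
	}
	return out
}

// fingerprintLoop (hypothetical) runs discover immediately and then once per
// period, sending the filtered device list on out until ctx is cancelled.
func fingerprintLoop(ctx context.Context, period time.Duration,
	discover func() []device, ignoredIDs []string, out chan<- []device) {
	defer close(out)
	ticker := time.NewTicker(period)
	defer ticker.Stop()
	for {
		select {
		case out <- filterIgnored(discover(), ignoredIDs):
		case <-ctx.Done():
			return
		}
		// Wait for the next tick (the fingerprint_period) or cancellation.
		select {
		case <-ticker.C:
		case <-ctx.Done():
			return
		}
	}
}

func main() {
	// Hypothetical stand-in for an NVML device query.
	discover := func() []device {
		return []device{
			{UUID: "uuid1", Name: "GPU A"},
			{UUID: "uuid2", Name: "GPU B"},
			{UUID: "uuid3", Name: "GPU C"},
		}
	}
	ctx, cancel := context.WithCancel(context.Background())
	out := make(chan []device)
	go fingerprintLoop(ctx, 5*time.Second, discover, []string{"uuid1", "uuid2"}, out)
	first := <-out // the first pass runs without waiting for a tick
	cancel()
	for _, d := range first {
		fmt.Println(d.UUID, d.Name)
	}
}
```

Running it prints `uuid3 GPU C`, since the other two UUIDs are ignored. Periodic statistics reporting could follow the same ticker pattern, driven by stats_period instead of fingerprint_period.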

Config

The configuration should be passed via an HCL file that begins with a top-level config stanza:

config {
  ignored_gpu_ids = ["uuid1", "uuid2"]
  fingerprint_period = "5s"
}

The valid configuration options are:

  • ignored_gpu_ids (list(string): []): List of GPU UUIDs that should not be exposed to Nomad.
  • fingerprint_period (string: "5s"): The interval at which the fingerprint process is repeated to detect changes in the available devices.