Fix some docstring typos and fix noisy log message during client restarts.
A log for the common case where the plugin socket isn't ready yet
isn't actionable by the operator so having it at info is just noise.
Fixes a bug where we cpu is pigged at 100% due to collecting devices
statistics. The passed stats interval was ignored, and the default zero
value causes a very tight loop of stats collection.
FWIW, in my testing, it took 2.5-3ms to collect nvidia GPU stats, on a
`g2.2xlarge` ec2 instance.
The stats interval defaults to 1 second and is user configurable. I
believe this is too frequent as a default, and I may advocate for
reducing it to a value closer to 5s or 10s, but keeping it as is for
now.
Fixes https://github.com/hashicorp/nomad/issues/6057 .
The driver manager is modeled after the device manager and is started by the client.
It's responsible for handling driver lifecycle and reattachment state, as well as
processing the incomming fingerprint and task events from each driver. The mananger
exposes a method for registering event handlers for task events that is used by the
task runner to update the server when a task has been updated with an event.
Since driver fingerprinting has been implemented by the driver manager, it is no
longer needed in the fingerprint mananger and has been removed.
Introduce a device manager that manages the lifecycle of device plugins
on the client. It fingerprints, collects stats, and forwards Reserve
requests to the correct plugin. The manager, also handles device plugins
failing and validates their output.