This commit introduces a new set of endpoints to a Nomad Client:
ClientCSI.
ClientCSI is responsible for mediating requests from a Nomad Server to
a CSI Plugin running on a Nomad Client. It should only really be used to
make controller RPCs.
This changeset implements the initial registration and fingerprinting
of CSI Plugins as part of #5378. At a high level, it introduces the
following:
* A `csi_plugin` stanza as part of a Nomad task configuration, to
allow a task to expose that it is a plugin.
* A new task runner hook: `csi_plugin_supervisor`. This hook does two
things. When the `csi_plugin` stanza is detected, it will
automatically configure the plugin task to receive bidirectional
mounts to the CSI intermediary directory. At runtime, it will then
perform an initial heartbeat of the plugin and handle submitting it to
the new `dynamicplugins.Registry` for further use by the client, and
then run a lightweight heartbeat loop that will emit task events
when health changes.
* The `dynamicplugins.Registry` for handling plugins that run
as Nomad tasks, in contrast to the existing catalog that requires
`go-plugin` type plugins and to know the plugin configuration in
advance.
* The `csimanager` which fingerprints CSI plugins, in a similar way to
`drivermanager` and `devicemanager`. It currently only fingerprints
the NodeID from the plugin, and assumes that all plugins are
monolithic.
Missing features
* We do not use the live updates of the `dynamicplugin` registry in
the `csimanager` yet.
* We do not deregister the plugins from the client when they shutdown
yet, they just become indefinitely marked as unhealthy. This is
deliberate until we figure out how we should manage deploying new
versions of plugins/transitioning them.
allow oss to parse sink duration
clean up audit sink parsing
ent eventer config reload
fix typo
SetEnabled to eventer interface
client acl test
rm dead code
fix failing test
In service jobs, lifecycles non-sidecar task tweak health logic a bit:
they may terminate successfully without impacting alloc health, but fail
the alloc if they fail.
Sidecars should be treated just like a normal task.
Fixes a bug where an allocation is considered healthy if some of the
tasks are being restarted and as such, their checks aren't tracked by
consul agent client.
Here, we fix the immediate case by ensuring that an alloc is healthy
only if tasks are running and the registered checks at the time are
healthy.
Previously, health tracker tracked task "health" independently from
checks and leads to problems when a task restarts. Consider the
following series of events:
1. all tasks start running -> `tracker.tasksHealthy` is true
2. one task has unhealthy checks and get restarted
3. remaining checks are healthy -> `tracker.checksHealthy` is true
4. propagate health status now that `tracker.tasksHealthy` and
`tracker.checksHealthy`.
This change ensures that we accurately use the latest status of tasks
and checks regardless of their status changes.
Also, ensures that we only consider check health after tasks are
considered healthy, otherwise we risk trusting incomplete checks.
This approach accomodates task dependencies well. Service jobs can have
prestart short-lived tasks that will terminate before main process runs.
These dead tasks that complete successfully will not negate health
status.