This changeset adds a subsystem to run on the leader, similar to the
deployment watcher or node drainer. The `Watcher` performs a blocking
query on updates to the `CSIVolumes` table and triggers reaping of
volume claims.
This will avoid tying up scheduling workers by immediately sending
volume claim workloads into their own loop, rather than blocking the
scheduling workers in the core GC job doing things like talking to CSI
controllers
The volume watcher is enabled on leader step-up and disabled on leader
step-down.
The volume claim GC mechanism now makes an empty claim RPC for the
volume to trigger an index bump. That in turn unblocks the blocking
query in the volume watcher so it can assess which claims can be
released for a volume.
* nomad/state/state_store: error message copy/paste error
* nomad/structs/structs: add a VolumeEval to the JobDeregisterResponse
* nomad/job_endpoint: synchronously, volumeClaimReap on job Deregister
* nomad/core_sched: make volumeClaimReap available without a CoreSched
* nomad/job_endpoint: Deregister return early if the job is missing
* nomad/job_endpoint_test: job Deregistion is idempotent
* nomad/core_sched: conditionally ignore alloc status in volumeClaimReap
* nomad/job_endpoint: volumeClaimReap all allocations, even running
* nomad/core_sched_test: extra argument to collectClaimsToGCImpl
* nomad/job_endpoint: job deregistration is not idempotent
Enable configuration of HTTP and gRPC endpoints which should be exposed by
the Connect sidecar proxy. This changeset is the first "non-magical" pass
that lays the groundwork for enabling Consul service checks for tasks
running in a network namespace because they are Connect-enabled. The changes
here provide for full configuration of the
connect {
sidecar_service {
proxy {
expose {
paths = [{
path = <exposed endpoint>
protocol = <http or grpc>
local_path_port = <local endpoint port>
listener_port = <inbound mesh port>
}, ... ]
}
}
}
stanza. Everything from `expose` and below is new, and partially implements
the precedent set by Consul:
https://www.consul.io/docs/connect/registration/service-registration.html#expose-paths-configuration-reference
Combined with a task-group level network port-mapping in the form:
port "exposeExample" { to = -1 }
it is now possible to "punch a hole" through the network namespace
to a specific HTTP or gRPC path, with the anticipated use case of creating
Consul checks on Connect enabled services.
A future PR may introduce more automagic behavior, where we can do things like
1) auto-fill the 'expose.path.local_path_port' with the default value of the
'service.port' value for task-group level connect-enabled services.
2) automatically generate a port-mapping
3) enable an 'expose.checks' flag which automatically creates exposed endpoints
for every compatible consul service check (http/grpc checks on connect
enabled services).
* nomad/structs/structs: new NodeEventSubsystemCSI
* client/client: pass triggerNodeEvent in the CSIConfig
* client/pluginmanager/csimanager/instance: add eventer to instanceManager
* client/pluginmanager/csimanager/manager: pass triggerNodeEvent
* client/pluginmanager/csimanager/volume: node event on [un]mount
* nomad/structs/structs: use storage, not CSI
* client/pluginmanager/csimanager/volume: use storage, not CSI
* client/pluginmanager/csimanager/volume_test: eventer
* client/pluginmanager/csimanager/volume: event on error
* client/pluginmanager/csimanager/volume_test: check event on error
* command/node_status: remove an extra space in event detail format
* client/pluginmanager/csimanager/volume: use snake_case for details
* client/pluginmanager/csimanager/volume_test: snake_case details
- tg.Count defaults to tg.Scaling.Min if present (falls back on previous default of 1 if Scaling is absent)
- Validate() enforces tg.Scaling.Min <= tg.Count <= tg.Scaling.Max
modification in ApiScalingPolicyToStructs, api.TaskGroup.Validate so that defaults are handled for TaskGroup.Count and
* command/csi: csi, csi_plugin, csi_volume
* helper/funcs: move ExtraKeys from parse_config to UnusedKeys
* command/agent/config_parse: use helper.UnusedKeys
* api/csi: annotate CSIVolumes with hcl fields
* command/csi_plugin: add Synopsis
* command/csi_volume_register: use hcl.Decode style parsing
* command/csi_volume_list
* command/csi_volume_status: list format, cleanup
* command/csi_plugin_list
* command/csi_plugin_status
* command/csi_volume_deregister
* command/csi_volume: add Synopsis
* api/contexts/contexts: add csi search contexts to the constants
* command/commands: register csi commands
* api/csi: fix struct tag for linter
* command/csi_plugin_list: unused struct vars
* command/csi_plugin_status: unused struct vars
* command/csi_volume_list: unused struct vars
* api/csi: add allocs to CSIPlugin
* command/csi_plugin_status: format the allocs
* api/allocations: copy Allocation.Stub in from structs
* nomad/client_rpc: add some error context with Errorf
* api/csi: collapse read & write alloc maps to a stub list
* command/csi_volume_status: cleanup allocation display
* command/csi_volume_list: use Schedulable instead of Healthy
* command/csi_volume_status: use Schedulable instead of Healthy
* command/csi_volume_list: sprintf string
* command/csi: delete csi.go, csi_plugin.go
* command/plugin: refactor csi components to sub-command plugin status
* command/plugin: remove csi
* command/plugin_status: remove csi
* command/volume: remove csi
* command/volume_status: split out csi specific
* helper/funcs: add RemoveEqualFold
* command/agent/config_parse: use helper.RemoveEqualFold
* api/csi: do ,unusedKeys right
* command/volume: refactor csi components to `nomad volume`
* command/volume_register: split out csi specific
* command/commands: use the new top level commands
* command/volume_deregister: hardwired type csi for now
* command/volume_status: csiFormatVolumes rescued from volume_list
* command/plugin_status: avoid a panic on no args
* command/volume_status: avoid a panic on no args
* command/plugin_status: predictVolumeType
* command/volume_status: predictVolumeType
* nomad/csi_endpoint_test: move CreateTestPlugin to testing
* command/plugin_status_test: use CreateTestCSIPlugin
* nomad/structs/structs: add CSIPlugins and CSIVolumes search consts
* nomad/state/state_store: add CSIPlugins and CSIVolumesByIDPrefix
* nomad/search_endpoint: add CSIPlugins and CSIVolumes
* command/plugin_status: move the header to the csi specific
* command/volume_status: move the header to the csi specific
* nomad/state/state_store: CSIPluginByID prefix
* command/status: rename the search context to just Plugins/Volumes
* command/plugin,volume_status: test return ids now
* command/status: rename the search context to just Plugins/Volumes
* command/plugin_status: support -json and -t
* command/volume_status: support -json and -t
* command/plugin_status_csi: comments
* command/*_status: clean up text
* api/csi: fix stale comments
* command/volume: make deregister sound less fearsome
* command/plugin_status: set the id length
* command/plugin_status_csi: more compact plugin health
* command/volume: better error message, comment
When an alloc is marked terminal (and after node unstage/unpublish
have been called), the client syncs the terminal alloc state with the
server via `Node.UpdateAlloc RPC`.
For each job that has a terminal alloc, the `Node.UpdateAlloc` RPC
handler at the server will emit an eval for a new core job to garbage
collect CSI volume claims. When this eval is handled on the core
scheduler, it will call a `volumeReap` method to release the claims
for all terminal allocs on the job.
The volume reap will issue a `ControllerUnpublishVolume` RPC for any
node that has no alloc claiming the volume. Once this returns (or
is skipped), the volume reap will send a new `CSIVolume.Claim` RPC
that releases the volume claim for that allocation in the state store,
making it available for scheduling again.
This same `volumeReap` method will be called from the core job GC,
which gives us a second chance to reclaim volumes during GC if there
were controller RPC failures.
This changeset adds a new core job `CoreJobCSIVolumePublicationGC` to
the leader's loop for scheduling core job evals. Right now this is an
empty method body without even a config file stanza. Later changesets
will implement the logic of volume publication GC.
This changeset implements the initial registration and fingerprinting
of CSI Plugins as part of #5378. At a high level, it introduces the
following:
* A `csi_plugin` stanza as part of a Nomad task configuration, to
allow a task to expose that it is a plugin.
* A new task runner hook: `csi_plugin_supervisor`. This hook does two
things. When the `csi_plugin` stanza is detected, it will
automatically configure the plugin task to receive bidirectional
mounts to the CSI intermediary directory. At runtime, it will then
perform an initial heartbeat of the plugin and handle submitting it to
the new `dynamicplugins.Registry` for further use by the client, and
then run a lightweight heartbeat loop that will emit task events
when health changes.
* The `dynamicplugins.Registry` for handling plugins that run
as Nomad tasks, in contrast to the existing catalog that requires
`go-plugin` type plugins and to know the plugin configuration in
advance.
* The `csimanager` which fingerprints CSI plugins, in a similar way to
`drivermanager` and `devicemanager`. It currently only fingerprints
the NodeID from the plugin, and assumes that all plugins are
monolithic.
Missing features
* We do not use the live updates of the `dynamicplugin` registry in
the `csimanager` yet.
* We do not deregister the plugins from the client when they shutdown
yet, they just become indefinitely marked as unhealthy. This is
deliberate until we figure out how we should manage deploying new
versions of plugins/transitioning them.
Nomad jobs may be configured with a TaskGroup which contains a Service
definition that is Consul Connect enabled. These service definitions end
up establishing a Consul Connect Proxy Task (e.g. envoy, by default). In
the case where Consul ACLs are enabled, a Service Identity token is required
for these tasks to run & connect, etc. This changeset enables the Nomad Server
to recieve RPC requests for the derivation of SI tokens on behalf of instances
of Consul Connect using Tasks. Those tokens are then relayed back to the
requesting Client, which then injects the tokens in the secrets directory of
the Task.
When a job is configured with Consul Connect aware tasks (i.e. sidecar),
the Nomad Client should be able to request from Consul (through Nomad Server)
Service Identity tokens specific to those tasks.
Enable any Server to lookup the unique ClusterID. If one has not been
generated, and this node is the leader, generate a UUID and attempt to
apply it through raft.
The value is not yet used anywhere in this changeset, but is a prerequisite
for gh-6701.
This change provides an initial pass at setting up the configuration necessary to
enable use of Connect with Consul ACLs. Operators will be able to pass in a Consul
Token through `-consul-token` or `$CONSUL_TOKEN` in the `job run` and `job revert`
commands (similar to Vault tokens).
These values are not actually used yet in this changeset.