open-nomad

Author	SHA1	Message	Date
Seth Hoenig	2631659551	ci: swap ci parallelization for unconstrained gomaxprocs	2022-03-15 12:58:52 -05:00
Tim Gross	246db87a74	CSI: allow for concurrent plugin allocations (#12078 ) The dynamic plugin registry assumes that plugins are singletons, which matches the behavior of other Nomad plugins. But because dynamic plugins like CSI are implemented by allocations, we need to handle the possibility of multiple allocations for a given plugin type + ID, as well as behaviors around interleaved allocation starts and stops. Update the data structure for the dynamic registry so that more recent allocations take over as the instance manager singleton, but we still preserve the previous running allocations so that restores work without racing. Multiple allocations can run on a client for the same plugin, even if only during updates. Provide each plugin task a unique path for the control socket so that the tasks don't interfere with each other.	2022-02-23 15:23:07 -05:00
Tim Gross	27bb2da5ee	CSI: make gRPC client creation more robust (#12057 ) Nomad communicates with CSI plugin tasks via gRPC. The plugin supervisor hook uses this to ping the plugin for health checks which it emits as task events. After the first successful health check the plugin supervisor registers the plugin in the client's dynamic plugin registry, which in turn creates a CSI plugin manager instance that has its own gRPC client for fingerprinting the plugin and sending mount requests. If the plugin manager instance fails to connect to the plugin on its first attempt, it exits. The plugin supervisor hook is unaware that connection failed so long as its own pings continue to work. A transient failure during plugin startup may mislead the plugin supervisor hook into thinking the plugin is up (so there's no need to restart the allocation) but no fingerprinter is started. * Refactors the gRPC client to connect on first use. This provides the plugin manager instance the ability to retry the gRPC client connection until success. * Add a 30s timeout to the plugin supervisor so that we don't poll forever waiting for a plugin that will never come back up. Minor improvements: * The plugin supervisor hook creates a new gRPC client for every probe and then throws it away. Instead, reuse the client as we do for the plugin manager. * The gRPC client constructor has a 1 second timeout. Clarify that this timeout applies to the connection and not the rest of the client lifetime.	2022-02-15 16:57:29 -05:00
Tim Gross	66b4b28b1a	CSI: node unmount from the client before unpublish RPC (#11892 ) When an allocation stops, the `csi_hook` makes an unpublish RPC to the servers to unpublish via the CSI RPCs: first to the node plugins and then the controller plugins. The controller RPCs must happen after the node RPCs so that the node has had a chance to unmount the volume before the controller tries to detach the associated device. But the client has local access to the node plugins and can independently determine if it's safe to send unpublish RPCs to those plugins. This will allow the server to treat the node plugin as abandoned if a client is disconnected and `stop_on_client_disconnect` is set. This will let the server try to send unpublish RPCs to the controller plugins, under the assumption that the client will be trying to unmount the volume on its end first. Note that the CSI `NodeUnpublishVolume`/`NodeUnstageVolume` RPCs can return ignorable errors in the case where the volume has already been unmounted from the node. Handle all other errors by retrying until we get success so as to give operators the opportunity to reschedule a failed node plugin (ex. in the case where they accidentally drained a node without `-ignore-system`). Fan out the work for each volume into its own goroutine so that we can release a subset of volumes if only one is stuck.	2022-01-28 08:30:31 -05:00
Seth Hoenig	4650e97d29	deps: upgrade docker and runc This PR upgrades - docker dependency to the latest tagged release (v20.10.12) - runc dependency to the latest tagged release (v1.0.3) Docker does not abide by [semver](https://github.com/moby/moby/issues/39302), so it is marked +incompatible, and transitive dependencies are upgraded manually. Runc made three relevant breaking changes * cgroup manager .Set changed to accept Resources instead of Cgroup `3f65946756` * config.Device moved to devices.Device https://github.com/opencontainers/runc/pull/2679 * mountinfo.Mounted now returns an error if the specified path does not exist https://github.com/moby/sys/blob/mountinfo/v0.5.0/mountinfo/mountinfo.go#L16	2022-01-18 08:35:26 -06:00
Ryan Sundberg	d43c5f98a5	CSI: Include MountOptions in capabilities sent to CSI for all RPCs Include the VolumeCapability.MountVolume data in ControllerPublishVolume, CreateVolume, and ValidateVolumeCapabilities RPCs sent to the CSI controller. The previous behavior was to only include the MountVolume capability in the NodeStageVolume request, which on some CSI implementations would be rejected since the Volume was not originally provisioned with the specific mount capabilities requested.	2021-05-24 10:59:54 -04:00
Tim Gross	276633673d	CSI: use AccessMode/AttachmentMode from CSIVolumeClaim Registration of Nomad volumes previously allowed for a single volume capability (access mode + attachment mode pair). The recent `volume create` command requires that we pass a list of requested capabilities, but the existing workflow for claiming volumes and attaching them on the client assumed that the volume's single capability was correct and unchanging. Add `AccessMode` and `AttachmentMode` to `CSIVolumeClaim`, use these fields to set the initial claim value, and add backwards compatibility logic to handle the existing volumes that already have claims without these fields.	2021-04-07 11:24:09 -04:00
Tim Gross	0856483115	CSI: fingerprint detailed node capabilities In order to support new node RPCs, we need to fingerprint plugin capabilities in more detail. This changeset mirrors recent work to fingerprint controller capabilities, but is not yet in use by any Nomad RPC.	2021-04-01 16:00:58 -04:00
Tim Gross	9fc4cf1419	CSI: fingerprint detailed controller capabilities In order to support new controller RPCs, we need to fingerprint volume capabilities in more detail and perform controller RPCs only when the specific capability is present. This fixes a bug in Ceph support where the plugin can only support create/delete but we assume that it also supports attach/detach.	2021-03-31 16:37:09 -04:00
Tim Gross	d38008176e	CSI: create/delete/list volume RPCs This commit implements the RPC handlers on the client that talk to the CSI plugins on that client for the Create/Delete/List RPC.	2021-03-31 16:37:09 -04:00
Tim Gross	7d53ed88d6	csi: client RPCs should return wrapped errors for checking (#8605 ) When the client-side actions of a CSI client RPC succeed but we get disconnected during the RPC or we fail to checkpoint the claim state, we want to be able to retry the client RPC without getting blocked by the client-side state (ex. mount points) already having been cleaned up in previous calls.	2020-08-07 11:01:36 -04:00
Tim Gross	56c6dacd38	csi: NodePublish should not create target_path, only its parent dir (#8505 ) The NodePublish workflow currently creates the target path and its parent directory. However, the CSI specification says that the CO shall ensure the parent directory of the target path exists, and that the SP shall place the block device or mounted directory at the target path. Much of our testing has been with CSI plugins that are more forgiving, but our behavior breaks spec-compliant CSI plugins. This changeset ensures we only create the parent directory.	2020-07-23 15:52:22 -04:00
Tim Gross	3d38592fbb	csi: add VolumeContext to NodeStage/Publish RPCs (#8239 ) In #7957 we added support for passing a volume context to the controller RPCs. This is an opaque map that's created by `CreateVolume` or, in Nomad's case, in the volume registration spec. However, we missed passing this field to the `NodeStage` and `NodePublish` RPC, which prevents certain plugins (such as MooseFS) from making node RPCs.	2020-06-22 13:54:32 -04:00
Tim Gross	ba11aef5d9	csi: skip unit tests on unsupported platforms (#8033 ) Some of the unit tests for CSI require platform-specific APIs that aren't available on macOS. We can safely skip these tests.	2020-05-21 13:56:50 -04:00
Tim Gross	4f54a633a2	csi: refactor internal client field name to ExternalID (#7958 ) The CSI plugins RPCs require the use of the storage provider's volume ID, rather than the user-defined volume ID. Although changing the RPCs to use the field name `ExternalID` risks breaking backwards compatibility, we can use the `ExternalID` name internally for the client and only use `VolumeID` at the RPC boundaries.	2020-05-14 11:56:07 -04:00
Tim Gross	4374c1a837	csi: support Secrets parameter in CSI RPCs (#7923 ) CSI plugins can require credentials for some publishing and unpublishing workflow RPCs. Secrets are configured at the time of volume registration, stored in the volume struct, and then passed around as an opaque map by Nomad to the plugins.	2020-05-11 17:12:51 -04:00
Tim Gross	083b35d651	csi: checkpoint volume claim garbage collection (#7782 ) Adds a `CSIVolumeClaim` type to be tracked as current and past claims on a volume. Allows for a client RPC failure during node or controller detachment without having to keep the allocation around after the first garbage collection eval. This changeset lays groundwork for moving the actual detachment RPCs into a volume watching loop outside the GC eval.	2020-04-23 11:06:23 -04:00
Charlie Voiselle	c68c19f3cf	Use ExternalID in NodeStageVolume RPC (#7754 )	2020-04-20 17:13:46 -04:00
Tim Gross	5a3b45864d	csi: fix unpublish workflow ID mismatches The CSI plugins use the external volume ID for all operations, but the Client CSI RPCs use the Nomad volume ID (human-friendly) for the mount paths. Pass the External ID as an arg in the RPC call so that the unpublish workflows have it without calling back to the server to find the external ID. The controller CSI plugins need the CSI node ID (or in other words, the storage provider's view of node ID like the EC2 instance ID), not the Nomad node ID, to determine how to detach the external volume.	2020-04-06 10:15:55 -04:00
Tim Gross	f6b3d38eb8	CSI: move node unmount to server-driven RPCs (#7596 ) If a volume-claiming alloc stops and the CSI Node plugin that serves that alloc's volumes is missing, there's no way for the allocrunner hook to send the `NodeUnpublish` and `NodeUnstage` RPCs. This changeset addresses this issue with a redesign of the client-side for CSI. Rather than unmounting in the alloc runner hook, the alloc runner hook will simply exit. When the server gets the `Node.UpdateAlloc` for the terminal allocation that had a volume claim, it creates a volume claim GC job. This job will make client RPCs to a new node plugin RPC endpoint, and only once that succeeds, move on to making the client RPCs to the controller plugin. If the node plugin is unavailable, the GC job will fail and be requeued.	2020-04-02 16:04:56 -04:00
Lang Martin	8d4f39fba1	csi: add node events to report progress mounting and unmounting volumes (#7547 ) * nomad/structs/structs: new NodeEventSubsystemCSI * client/client: pass triggerNodeEvent in the CSIConfig * client/pluginmanager/csimanager/instance: add eventer to instanceManager * client/pluginmanager/csimanager/manager: pass triggerNodeEvent * client/pluginmanager/csimanager/volume: node event on [un]mount * nomad/structs/structs: use storage, not CSI * client/pluginmanager/csimanager/volume: use storage, not CSI * client/pluginmanager/csimanager/volume_test: eventer * client/pluginmanager/csimanager/volume: event on error * client/pluginmanager/csimanager/volume_test: check event on error * command/node_status: remove an extra space in event detail format * client/pluginmanager/csimanager/volume: use snake_case for details * client/pluginmanager/csimanager/volume_test: snake_case details	2020-03-31 17:13:52 -04:00
Tim Gross	14b4712f01	csi: annotate remaining missing cancellation contexts (#7552 )	2020-03-30 16:46:43 -04:00
Tim Gross	6ffd36c4e5	csi: add grpc retries to client controller RPCs (#7549 ) The CSI Specification defines various gRPC Errors and how they may be retried. After auditing all our CSI RPC calls in #6863, this changeset: * adds retries and backoffs to the RPCs where they were needed but not implemented * annotates those CSI RPCs that do not need retries so that we don't wonder whether it's been left off accidentally * adds a timeout and cancellation context to the `Probe` call, which didn't have one.	2020-03-30 16:26:03 -04:00
Lang Martin	e100444740	csi: add mount_options to volumes and volume requests (#7398 ) Add mount_options to both the volume definition on registration and to the volume block in the group where the volume is requested. If both are specified, the options provided in the request replace the options defined in the volume. They get passed to the NodePublishVolume, which causes the node plugin to actually mount the volume on the host. Individual tasks just mount bind into the host mounted volume (unchanged behavior). An operator can mount the same volume with different options by specifying it twice in the group context. closes #7007 * nomad/structs/volumes: add MountOptions to volume request * jobspec/test-fixtures/basic.hcl: add mount_options to volume block * jobspec/parse_test: add expected MountOptions * api/tasks: add mount_options * jobspec/parse_group: use hcl decode not mapstructure, mount_options * client/allocrunner/csi_hook: pass MountOptions through client/allocrunner/csi_hook: add a VolumeMountOptions client/allocrunner/csi_hook: drop Options client/allocrunner/csi_hook: use the structs options * client/pluginmanager/csimanager/interface: UsageOptions.MountOptions * client/pluginmanager/csimanager/volume: pass MountOptions in capabilities * plugins/csi/plugin: remove todo 7007 comment * nomad/structs/csi: MountOptions * api/csi: add options to the api for parsing, match structs * plugins/csi/plugin: move VolumeMountOptions to structs * api/csi: use specific type for mount_options * client/allocrunner/csi_hook: merge MountOptions here * rename CSIOptions to CSIMountOptions * client/allocrunner/csi_hook * client/pluginmanager/csimanager/volume * nomad/structs/csi * plugins/csi/fake/client: add PrevVolumeCapability * plugins/csi/plugin * client/pluginmanager/csimanager/volume_test: remove debugging * client/pluginmanager/csimanager/volume: fix odd merging logic * api: rename CSIOptions -> CSIMountOptions * nomad/csi_endpoint: remove a 7007 comment * command/alloc_status: show mount options in the volume list * nomad/structs/csi: include MountOptions in the volume stub * api/csi: add MountOptions to stub * command/volume_status_csi: clean up csiVolMountOption, add it * command/alloc_status: csiVolMountOption lives in volume_csi_status * command/node_status: display mount flags * nomad/structs/volumes: npe * plugins/csi/plugin: npe in ToCSIRepresentation * jobspec/parse_test: expand volume parse test cases * command/agent/job_endpoint: ApiTgToStructsTG needs MountOptions * command/volume_status_csi: copy paste error * jobspec/test-fixtures/basic: hclfmt * command/volume_status_csi: clean up csiVolMountOption	2020-03-23 13:59:25 -04:00
Tim Gross	32b94bf1a4	csi: stub fingerprint on instance manager shutdown (#7388 ) Run the plugin fingerprint one last time with a closed client during instance manager shutdown. This will return quickly and will give us a correctly-populated `PluginInfo` marked as unhealthy so the Nomad client can update the server about plugin health.	2020-03-23 13:59:25 -04:00
Tim Gross	5a0bcd39d1	csi: dynamically update plugin registration (#7386 ) Allow for faster updates to plugin status when allocations become terminal by listening for register/deregister events from the dynamic plugin registry (which in turn are triggered by the plugin supervisor hook). The deregistration function closures that we pass up to the CSI plugin manager don't properly close over the name and type of the registration, causing monolith-type plugins to deregister only one of their two plugins on alloc shutdown. Rebind plugin supervisor deregistration targets to fix that. Includes log message and comment improvements	2020-03-23 13:59:25 -04:00
Tim Gross	eda7be552c	csi: add dynamicplugins registry to client state store (#7330 ) In order to correctly fingerprint dynamic plugins on client restarts, we need to persist a handle to the plugin (that is, connection info) to the client state store. The dynamic registry will sync automatically to the client state whenever it receives a register/deregister call.	2020-03-23 13:58:30 -04:00
Lang Martin	6750c262a4	csi: use `ExternalID`, when set, to identify volumes for outside RPC calls (#7326 ) * nomad/structs/csi: new RemoteID() uses the ExternalID if set * nomad/csi_endpoint: pass RemoteID to volume request types * client/pluginmanager/csimanager/volume: pass RemoteID to NodePublishVolume	2020-03-23 13:58:30 -04:00
Tim Gross	de4ad6ca38	csi: add Provider field to CSI CLIs and APIs (#7285 ) Derive a provider name and version for plugins (and the volumes that use them) from the CSI identity API `GetPluginInfo`. Expose the vendor name as `Provider` in the API and CLI commands.	2020-03-23 13:58:30 -04:00
Lang Martin	a4784ef258	csi: add allocation context to fingerprinting results (#7133 ) * structs: CSIInfo include AllocID, CSIPlugins no Jobs * state_store: eliminate plugin Jobs, delete an empty plugin * nomad/structs/csi: detect empty plugins correctly * client/allocrunner/taskrunner/plugin_supervisor_hook: option AllocID * client/pluginmanager/csimanager/instance: allocID * client/pluginmanager/csimanager/fingerprint: set AllocID * client/node_updater: split controller and node plugins * api/csi: remove Jobs The CSI Plugin API will map plugins to allocations, which allows plugins to be defined by jobs in many configurations. In particular, multiple plugins can be defined in the same job, and multiple jobs can be used to define a single plugin. Because we now map the allocation context directly from the node, it's no longer necessary to track the jobs associated with a plugin directly. * nomad/csi_endpoint_test: CreateTestPlugin & register via fingerprint * client/dynamicplugins: lift AllocID into the struct from Options * api/csi_test: remove Jobs test * nomad/structs/csi: CSIPlugins has an array of allocs * nomad/state/state_store: implement CSIPluginDenormalize * nomad/state/state_store: CSIPluginDenormalize npe on missing alloc * nomad/csi_endpoint_test: defer deleteNodes for clarity * api/csi_test: disable this test awaiting mocks: https://github.com/hashicorp/nomad/issues/7123	2020-03-23 13:58:30 -04:00
Danielle Lancashire	6fc7f7779d	csimanager/volume: Update MountVolume docstring	2020-03-23 13:58:30 -04:00
Danielle Lancashire	511b7775a6	csi: Claim CSI Volumes during csi_hook.Prerun This commit is the initial implementation of claiming volumes from the server and passes through any publishContext information as appropriate. There's nothing too fancy here.	2020-03-23 13:58:30 -04:00
Danielle Lancashire	f79351915c	csi: Basic volume usage tracking	2020-03-23 13:58:30 -04:00
Danielle Lancashire	0203341033	csi: Add comment to UsageOptions.ToFS()	2020-03-23 13:58:30 -04:00
Danielle Lancashire	6b7ee96a88	csi: Move VolumeCapabilities helper to package	2020-03-23 13:58:30 -04:00
Danielle Lancashire	da4f6b60a2	csi: Pass through usage options to the csimanager The CSI Spec requires us to attach and stage volumes based on different types of usage information when it may affect how they are bound. Here we pass through some basic usage options in the CSI Hook (specifically the volume aliases ReadOnly field), and the attachment/access mode from the volume. We pass the attachment/access mode separately from the volume as it simplifies some handling and doesn't necessarily force every attachment to use the same mode should more be supported (i.e., if we let each `volume "foo" {}` specify an override in the future).	2020-03-23 13:58:30 -04:00
Danielle Lancashire	a62a90e03c	csi: Unpublish volumes during ar.Postrun This commit introduces initial support for unmounting csi volumes. It takes a relatively simplistic approach to performing NodeUnpublishVolume calls, optimising for cleaning up any leftover state rather than terminating early in the case of errors. This is because it happens during an allocation's shutdown flow and may not always have a corresponding call to `NodePublishVolume` that succeeded.	2020-03-23 13:58:30 -04:00
Danielle Lancashire	f77d3813d1	csi: Fix broken call to newVolumeManager	2020-03-23 13:58:29 -04:00
Danielle Lancashire	3bff9fefae	csi: Provide plugin-scoped paths during RPCs When providing paths to plugins, the path needs to be in the scope of the plugins container, rather than that of the host. Here we enable that by providing the mount point through the plugin registration and then use it when constructing request target paths.	2020-03-23 13:58:29 -04:00
Danielle Lancashire	94e87fbe9c	csimanager: Cleanup volumemanager setup	2020-03-23 13:58:29 -04:00
Danielle Lancashire	ee85c468c0	csimanager: Instantiate fingerprint manager's csiclient	2020-03-23 13:58:29 -04:00
Danielle Lancashire	bbf6a9c14b	volume_manager: cleanup of mount detection No functional changes, but makes ensure.*Dir follow a nicer return style.	2020-03-23 13:58:29 -04:00
Danielle Lancashire	80b7aa0a31	volume_manager: Add support for publishing volumes	2020-03-23 13:58:29 -04:00
Danielle Lancashire	e619ae5a42	volume_manager: Initial support for unstaging volumes	2020-03-23 13:58:29 -04:00
Danielle Lancashire	6e71baa77d	volume_manager: NodeStageVolume Support This commit introduces support for staging volumes when a plugin implements the STAGE_UNSTAGE_VOLUME capability. See the following for further reference material: `4731db0e0b/spec.md (nodestagevolume)`	2020-03-23 13:58:29 -04:00
Danielle Lancashire	f1ab38e845	volume_manager: Introduce helpers for staging This commit adds helpers that create and validate the staging directory for a given volume. It is currently missing usage options as the interfaces are not yet in place for those. The staging directory is only required when a volume has the STAGE_UNSTAGE Volume capability and has to live within the plugin root as the plugin needs to be able to create mounts inside it from within the container.	2020-03-23 13:58:29 -04:00
Lang Martin	33c55e609b	csi: pluginmanager use PluginID instead of Driver	2020-03-23 13:58:29 -04:00
Danielle Lancashire	1a10433b97	csi: Add VolumeManager (#6920 ) This changeset is some pre-requisite boilerplate that is required for introducing CSI volume management for client nodes. It extracts out fingerprinting logic from the csi instance manager. This change is to facilitate reusing the csimanager to also manage the node-local CSI functionality, as it is the easiest place for us to guarantee health checking and to provide additional visibility into the running operations through the fingerprinter mechanism and goroutine. It also introduces the VolumeMounter interface that will be used to manage staging/publishing and unstaging/unpublishing of volumes on the host.	2020-03-23 13:58:29 -04:00
Danielle Lancashire	de5d373001	csi: Setup gRPC Clients with a logger	2020-03-23 13:58:29 -04:00
Danielle Lancashire	57ae1d2cd6	csimanager: Fingerprint Node Service capabilities	2020-03-23 13:58:29 -04:00
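
Commit e100444740 says that when mount options are set on both the volume and the volume request, the request's options replace the volume's. A minimal Go sketch of that rule follows. It is not the code from the commit: the `mountOptions` type, its field names, and `mergeMountOptions` are hypothetical, and the per-field replacement (a field left unset on the request falls back to the volume's value) is an assumption about how the replacement is applied.

```go
package main

import "fmt"

// mountOptions is a hypothetical stand-in for the options carried by a
// volume definition or a volume request. Field names are assumptions.
type mountOptions struct {
	FSType     string
	MountFlags []string
}

// mergeMountOptions returns the options to use for a mount. Fields set on
// the request replace the volume's fields; fields left unset on the request
// fall back to the volume's values (assumed behavior). Neither input is
// modified.
func mergeMountOptions(vol, req *mountOptions) *mountOptions {
	out := &mountOptions{}
	if vol != nil {
		out.FSType = vol.FSType
		out.MountFlags = append([]string(nil), vol.MountFlags...)
	}
	if req == nil {
		return out
	}
	if req.FSType != "" {
		out.FSType = req.FSType
	}
	if req.MountFlags != nil {
		out.MountFlags = append([]string(nil), req.MountFlags...)
	}
	return out
}

func main() {
	vol := &mountOptions{FSType: "ext4", MountFlags: []string{"noatime"}}
	req := &mountOptions{MountFlags: []string{"ro"}}
	merged := mergeMountOptions(vol, req)
	fmt.Println(merged.FSType, merged.MountFlags) // prints "ext4 [ro]"
}
```

Copying the slices rather than sharing them keeps a later change to the merged result from leaking back into the stored volume definition.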

52 commits