If a volume-claiming alloc stops and the CSI Node plugin that serves
that alloc's volumes is missing, there's no way for the allocrunner
hook to send the `NodeUnpublish` and `NodeUnstage` RPCs.
This changeset addresses the issue with a redesign of the client side
of CSI. Rather than unmounting in the alloc runner hook, the hook will
simply exit. When the server gets the `Node.UpdateAlloc` for the
terminal allocation that had a volume claim, it creates a volume claim
GC job. This job will make client RPCs to a new node plugin RPC
endpoint and, only once that succeeds, move on to
making the client RPCs to the controller plugin. If the node plugin is
unavailable, the GC job will fail and be requeued.
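In sketch form, the ordering the GC job enforces looks roughly like this
(the types and function names below are illustrative assumptions, not
Nomad's actual identifiers):

type volumeClaim struct {
    VolumeID string
    NodeID   string
}

type pluginRPCs interface {
    NodeUnpublish(claim volumeClaim) error       // new node plugin RPC endpoint
    ControllerUnpublish(claim volumeClaim) error // existing controller RPCs
}

// gcVolumeClaim releases a claim: node detach first, controller detach only
// once that succeeds.
func gcVolumeClaim(rpc pluginRPCs, claim volumeClaim) error {
    if err := rpc.NodeUnpublish(claim); err != nil {
        // the node plugin is unavailable or the unmount failed;
        // fail here so the GC job is requeued
        return err
    }
    return rpc.ControllerUnpublish(claim)
}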
Enable configuration of HTTP and gRPC endpoints which should be exposed by
the Connect sidecar proxy. This changeset is the first "non-magical" pass
that lays the groundwork for enabling Consul service checks for tasks
running in a network namespace because they are Connect-enabled. The changes
here provide for full configuration of the
connect {
  sidecar_service {
    proxy {
      expose {
        paths = [{
          path            = <exposed endpoint>
          protocol        = <http or grpc>
          local_path_port = <local endpoint port>
          listener_port   = <inbound mesh port>
        }, ... ]
      }
    }
  }
}
stanza. Everything from `expose` and below is new, and partially implements
the precedent set by Consul:
https://www.consul.io/docs/connect/registration/service-registration.html#expose-paths-configuration-reference
Combined with a task-group level network port-mapping in the form:
port "exposeExample" { to = -1 }
it is now possible to "punch a hole" through the network namespace
to a specific HTTP or gRPC path, with the anticipated use case of creating
Consul checks on Connect enabled services.
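For orientation, the new configuration might be modeled in Go roughly as
follows; the type and field names here are assumptions for illustration,
not necessarily the exact identifiers in Nomad's api or structs packages:

// Illustrative only; not the actual Nomad types.
type ConsulExposePath struct {
    Path          string // exposed HTTP or gRPC endpoint, e.g. "/metrics"
    Protocol      string // "http" or "grpc"
    LocalPathPort int    // port the task listens on inside the network namespace
    ListenerPort  string // label of the inbound mesh port, e.g. "exposeExample"
}

type ConsulExposeConfig struct {
    Paths []*ConsulExposePath
}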
A future PR may introduce more automagic behavior, where we can do things like
1) auto-fill 'expose.path.local_path_port' with the default 'service.port'
value for task-group level Connect-enabled services.
2) automatically generate a port-mapping
3) enable an 'expose.checks' flag which automatically creates exposed endpoints
for every compatible Consul service check (HTTP/gRPC checks on Connect-enabled
services).
* nomad/structs/structs: new NodeEventSubsystemCSI
* client/client: pass triggerNodeEvent in the CSIConfig
* client/pluginmanager/csimanager/instance: add eventer to instanceManager
* client/pluginmanager/csimanager/manager: pass triggerNodeEvent
* client/pluginmanager/csimanager/volume: node event on [un]mount
* nomad/structs/structs: use storage, not CSI
* client/pluginmanager/csimanager/volume: use storage, not CSI
* client/pluginmanager/csimanager/volume_test: eventer
* client/pluginmanager/csimanager/volume: event on error
* client/pluginmanager/csimanager/volume_test: check event on error
* command/node_status: remove an extra space in event detail format
* client/pluginmanager/csimanager/volume: use snake_case for details
* client/pluginmanager/csimanager/volume_test: snake_case details
The CSI Specification defines various gRPC Errors and how they may be retried. After auditing all our CSI RPC calls in #6863, this changeset:
* adds retries and backoffs where they were needed but not implemented
* annotates the CSI RPCs that do not need retries, so that we don't wonder whether one was left off accidentally
* adds a timeout and cancellation context to the `Probe` call, which previously had neither
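The retry/backoff shape, sketched for a Probe-style call (the timings and
the retryable-code check below are illustrative, not Nomad's actual helper):

import (
    "context"
    "time"

    "google.golang.org/grpc/codes"
    "google.golang.org/grpc/status"
)

func callWithRetry(ctx context.Context, call func(context.Context) error) error {
    backoff := 50 * time.Millisecond
    for {
        // bound each attempt so a hung plugin can't block us forever
        attemptCtx, cancel := context.WithTimeout(ctx, 5*time.Second)
        err := call(attemptCtx)
        cancel()
        if err == nil {
            return nil
        }
        // retry only the gRPC codes the CSI spec marks as retryable
        if s, ok := status.FromError(err); !ok || s.Code() != codes.Unavailable {
            return err
        }
        select {
        case <-ctx.Done():
            return ctx.Err()
        case <-time.After(backoff):
            backoff *= 2
        }
    }
}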
The test inserts an alloc into the server state but expects the client to
start the alloc runner for it almost immediately.
Here, we add a retry loop to check that the client eventually starts all
expected alloc runners.
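The loop has roughly this shape (using testify's require.Eventually; the
alloc runner accessor is a hypothetical stand-in for whatever the test
actually inspects):

require.Eventually(t, func() bool {
    // numAllocRunners() is a hypothetical accessor standing in for however
    // the test counts the client's running alloc runners
    return client.numAllocRunners() == len(expectedAllocs)
}, 10*time.Second, 100*time.Millisecond,
    "client never started all expected alloc runners")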
Fix a regression where we accidentally started treating non-AWS
environments as AWS environments, resulting in bad networking settings.
Two factors are at play:
First, in [1], we accidentally switched the decisive AWS check from
`ami-id` to `instance-id`. This meant Nomad started treating more
environments as AWS; e.g. Hetzner implements `instance-id` but not
`ami-id`.
Second, some of these environments return empty values instead of
errors! Hetzner returns an empty 200 response for `local-ipv4`,
resulting in a bad networking configuration.
This change fixes the situation by restoring the check to `ami-id` and
ensuring that we only set network configuration when the IP address is
non-empty. It also trims whitespace from metadata responses to be more
defensive.
[1] https://github.com/hashicorp/nomad/pull/6779
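In sketch form (illustrative names, not the actual fingerprinter code), the
fix amounts to:

import "strings"

// isAWS checks ami-id, which only AWS implements, rather than instance-id,
// which other providers such as Hetzner also expose.
func isAWS(get func(key string) (string, error)) bool {
    v, err := get("ami-id")
    return err == nil && strings.TrimSpace(v) != ""
}

// setIfPresent only records a network attribute when the metadata value is
// non-empty after trimming whitespace (some providers return an empty 200).
func setIfPresent(attrs map[string]string, key, raw string) {
    if v := strings.TrimSpace(raw); v != "" {
        attrs[key] = v
    }
}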
Add mount_options to both the volume definition on registration and to the volume block in the group where the volume is requested. If both are specified, the options provided in the request replace the options defined in the volume. They get passed to NodePublishVolume, which causes the node plugin to actually mount the volume on the host.
Individual tasks just bind-mount into the host-mounted volume (unchanged behavior). An operator can mount the same volume with different options by specifying it twice in the group context.
closes #7007
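A rough sketch of the precedence rule described above; the struct mirrors
the shape of CSIMountOptions, but the field set and merge helper are
illustrative:

type CSIMountOptions struct {
    FSType     string
    MountFlags []string
}

// Options from the volume request in the job file replace options from the
// registered volume definition when both are set.
func mergeMountOptions(fromVolume, fromRequest *CSIMountOptions) *CSIMountOptions {
    out := &CSIMountOptions{}
    if fromVolume != nil {
        *out = *fromVolume
    }
    if fromRequest == nil {
        return out
    }
    if fromRequest.FSType != "" {
        out.FSType = fromRequest.FSType
    }
    if len(fromRequest.MountFlags) > 0 {
        out.MountFlags = fromRequest.MountFlags
    }
    return out
}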
* nomad/structs/volumes: add MountOptions to volume request
* jobspec/test-fixtures/basic.hcl: add mount_options to volume block
* jobspec/parse_test: add expected MountOptions
* api/tasks: add mount_options
* jobspec/parse_group: use hcl decode not mapstructure, mount_options
* client/allocrunner/csi_hook: pass MountOptions through
* client/allocrunner/csi_hook: add a VolumeMountOptions
* client/allocrunner/csi_hook: drop Options
* client/allocrunner/csi_hook: use the structs options
* client/pluginmanager/csimanager/interface: UsageOptions.MountOptions
* client/pluginmanager/csimanager/volume: pass MountOptions in capabilities
* plugins/csi/plugin: remove todo 7007 comment
* nomad/structs/csi: MountOptions
* api/csi: add options to the api for parsing, match structs
* plugins/csi/plugin: move VolumeMountOptions to structs
* api/csi: use specific type for mount_options
* client/allocrunner/csi_hook: merge MountOptions here
* rename CSIOptions to CSIMountOptions
* client/allocrunner/csi_hook
* client/pluginmanager/csimanager/volume
* nomad/structs/csi
* plugins/csi/fake/client: add PrevVolumeCapability
* plugins/csi/plugin
* client/pluginmanager/csimanager/volume_test: remove debugging
* client/pluginmanager/csimanager/volume: fix odd merging logic
* api: rename CSIOptions -> CSIMountOptions
* nomad/csi_endpoint: remove a 7007 comment
* command/alloc_status: show mount options in the volume list
* nomad/structs/csi: include MountOptions in the volume stub
* api/csi: add MountOptions to stub
* command/volume_status_csi: clean up csiVolMountOption, add it
* command/alloc_status: csiVolMountOption lives in volume_csi_status
* command/node_status: display mount flags
* nomad/structs/volumes: npe
* plugins/csi/plugin: npe in ToCSIRepresentation
* jobspec/parse_test: expand volume parse test cases
* command/agent/job_endpoint: ApiTgToStructsTG needs MountOptions
* command/volume_status_csi: copy paste error
* jobspec/test-fixtures/basic: hclfmt
* command/volume_status_csi: clean up csiVolMountOption
Run the plugin fingerprint one last time with a closed client during
instance manager shutdown. This will return quickly and will give us a
correctly-populated `PluginInfo` marked as unhealthy so the Nomad
client can update the server about plugin health.
Allow for faster updates to plugin status when allocations become
terminal by listening for register/deregister events from the dynamic
plugin registry (which in turn are triggered by the plugin supervisor
hook).
The deregistration function closures that we pass up to the CSI plugin
manager don't properly close over the name and type of the
registration, causing monolith-type plugins to deregister only one of
their two plugins on alloc shutdown. Rebind plugin supervisor
deregistration targets to fix that.
Includes log message and comment improvements
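The underlying issue is the classic Go loop-variable capture problem; in
sketch form (the registry and registration types here are illustrative):

// Buggy: every closure captures the same loop variable, so by the time the
// deregister functions run they all see the last registration. A monolith
// plugin (node + controller) therefore deregisters only one of its plugins.
for _, reg := range registrations {
    deregisterFns = append(deregisterFns, func() {
        registry.Deregister(reg.Type, reg.Name)
    })
}

// Fixed: rebind the values each closure needs to fresh per-iteration variables.
for _, reg := range registrations {
    pluginType, pluginName := reg.Type, reg.Name
    deregisterFns = append(deregisterFns, func() {
        registry.Deregister(pluginType, pluginName)
    })
}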
This changeset implements the remaining controller detach RPCs: server-to-client and client-to-controller. The tests also uncovered a bug in our RPC for claims which is fixed here; the volume claim RPC is used for both claiming and releasing a claim on a volume. We should only submit a controller publish RPC when the claim is new and not when it's being released.
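Sketched roughly, the fix gates the controller publish on the claim mode
(treat the constant and method names below as assumptions):

// A release should skip the controller publish entirely; only a new
// read/write claim needs one.
if req.Claim != structs.CSIVolumeClaimRelease {
    if err := v.srv.controllerPublishVolume(req, resp); err != nil {
        return err
    }
}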
In order to correctly fingerprint dynamic plugins on client restarts,
we need to persist a handle to the plugin (that is, connection info)
to the client state store.
The dynamic registry will sync automatically to the client state
whenever it receives a register/deregister call.
* nomad/structs/csi: new RemoteID() uses the ExternalID if set (see the sketch after this list)
* nomad/csi_endpoint: pass RemoteID to volume request types
* client/pluginmanager/csimanager/volume: pass RemoteID to NodePublishVolume
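The RemoteID() behavior from the first bullet, roughly (shown as a sketch,
though it is likely close to the real accessor):

// RemoteID returns the ID the storage provider knows the volume by: the
// user-supplied ExternalID when present, otherwise Nomad's own volume ID.
func (v *CSIVolume) RemoteID() string {
    if v.ExternalID != "" {
        return v.ExternalID
    }
    return v.ID
}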
Fix some docstring typos and fix a noisy log message during client restarts.
A log line for the common case where the plugin socket isn't ready yet
isn't actionable by the operator, so having it at info level is just noise.
* client/allocrunner/csi_hook: tag errors
* nomad/client_csi_endpoint: tag errors
* nomad/client_rpc: remove an unnecessary error tag
* nomad/state/state_store: ControllerRequired fix intent
We use ControllerRequired to indicate that a volume should use the
publish/unpublish workflow, rather than that it merely has a controller. We
need to check both RequiresControllerPlugin and SupportsAttachDetach
from the fingerprint to determine that (see the sketch after this list).
* nomad/csi_endpoint: tag errors
* nomad/csi_endpoint_test: longer error messages, mock fingerprints
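The ControllerRequired intent fix, sketched (the fingerprint field names
below are assumptions):

// A volume only needs the publish/unpublish workflow when the plugin
// requires a controller and that controller actually supports attach/detach.
plug.ControllerRequired = info.RequiresControllerPlugin &&
    info.ControllerInfo.SupportsAttachDetach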
Derive a provider name and version for plugins (and the volumes that
use them) from the CSI identity API `GetPluginInfo`. Expose the vendor
name as `Provider` in the API and CLI commands.
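Roughly, this comes from the CSI identity RPC (proto types from the
container-storage-interface spec package; the fingerprint field names here
are assumptions):

resp, err := identityClient.GetPluginInfo(ctx, &csipb.GetPluginInfoRequest{})
if err != nil {
    return err
}
fp.Provider = resp.GetName()                 // surfaced as Provider in API and CLI
fp.ProviderVersion = resp.GetVendorVersion() // vendor-reported version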
* structs: CSIInfo include AllocID, CSIPlugins no Jobs
* state_store: eliminate plugin Jobs, delete an empty plugin
* nomad/structs/csi: detect empty plugins correctly
* client/allocrunner/taskrunner/plugin_supervisor_hook: option AllocID
* client/pluginmanager/csimanager/instance: allocID
* client/pluginmanager/csimanager/fingerprint: set AllocID
* client/node_updater: split controller and node plugins
* api/csi: remove Jobs
The CSI Plugin API will map plugins to allocations, which allows
plugins to be defined by jobs in many configurations. In particular,
multiple plugins can be defined in the same job, and multiple jobs can
be used to define a single plugin.
Because we now map the allocation context directly from the node, it's
no longer necessary to track the jobs associated with a plugin
directly.
* nomad/csi_endpoint_test: CreateTestPlugin & register via fingerprint
* client/dynamicplugins: lift AllocID into the struct from Options
* api/csi_test: remove Jobs test
* nomad/structs/csi: CSIPlugins has an array of allocs
* nomad/state/state_store: implement CSIPluginDenormalize (see the sketch after this list)
* nomad/state/state_store: CSIPluginDenormalize npe on missing alloc
* nomad/csi_endpoint_test: defer deleteNodes for clarity
* api/csi_test: disable this test awaiting mocks:
https://github.com/hashicorp/nomad/issues/7123
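A sketch of the denormalize step from the bullets above, including the guard
for a missing alloc (the signature and field names are illustrative):

func (s *StateStore) CSIPluginDenormalize(ws memdb.WatchSet, plug *CSIPlugin) (*CSIPlugin, error) {
    for _, allocID := range plug.AllocIDs {
        alloc, err := s.AllocByID(ws, allocID)
        if err != nil {
            return nil, err
        }
        if alloc == nil {
            continue // the alloc has been GC'd; skip it rather than panic
        }
        plug.Allocations = append(plug.Allocations, alloc.Stub())
    }
    return plug, nil
}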
Currently the handling of CSINode RPCs does not correctly forward
RPCs to Nodes.
This commit fixes this by introducing a shim RPC
(nomad/client_csi_endpoint) that will correctly forward the request to
the owning node, or submit the RPC to the client.
In the process it also cleans up handling a little bit by adding the
`CSIControllerQuery` embedded struct for the required forwarding state.
Because `CSIControllerQuery` embeds the `PluginID` requirement, we could
also move node targeting into the shim RPC in the future if wanted.
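The embedded forwarding state amounts to something like this (the exact
field set is an assumption):

// CSIControllerQuery is embedded in controller RPC requests so the shim
// endpoint knows where to send them.
type CSIControllerQuery struct {
    // ControllerNodeID is the client node that should receive the forwarded RPC.
    ControllerNodeID string
    // PluginID identifies which controller plugin to target on that node.
    PluginID string
}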
CSI Plugins that manage devices need access not just to the CSI
directory but also to the devices under `/dev`.
This commit introduces a `/dev:/dev` mount to the container so that they
can manage those devices.
This commit is the initial implementation of claiming volumes from the
server and passes through any publishContext information as appropriate.
There's nothing too fancy here.
The CSI Spec requires us to attach and stage volumes based on different
types of usage information when it may affect how they are bound. Here
we pass through some basic usage options in the CSI Hook (specifically
the volume alias's ReadOnly field), and the attachment/access mode from
the volume. We pass the attachment/access mode separately from the
volume because it simplifies some handling and doesn't necessarily force
every attachment to use the same mode should more be supported (i.e., if
we let each `volume "foo" {}` block specify an override in the future).
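The usage options passed through the hook look roughly like this (field
names and types are assumptions for illustration):

type UsageOptions struct {
    ReadOnly       bool   // from the group's volume block
    AttachmentMode string // e.g. "file-system" or "block-device"
    AccessMode     string // e.g. "single-node-writer"
}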