open-nomad

Commit Graph

Author	SHA1	Message	Date
Tim Gross	60874ebe25	csi: Postrun hook should not change mode (#9323) The unpublish workflow requires that we know the mode (RW vs RO) if we want to unpublish the node. Update the hook and the Unpublish RPC so that we mark the claim for release in a new state but leave the mode alone. This fixes a bug where RO claims were failing node unpublish. The core job GC doesn't know the mode, but we don't need it for that workflow, so add a mode specifically for GC; the volumewatcher uses this as a sentinel to check whether claims (with their specific RW vs RO modes) need to be claimed.	2020-11-11 13:06:30 -05:00
Tim Gross	8459f1ead5	csi: prevent in-use plugin GC from blocking volume GC (#9141) During CSI plugin GC, we don't return an error if the volume is in use, because this is not an error condition. If we were to return an error during a `nomad system gc`, we would not continue on to GC volumes. But the check for the specific error message fails if the GC is performed on a worker rather than on the leader, due to RPC forwarding wrapping the error message. Use a less specific test so that we don't return an error.	2020-10-21 16:54:28 -04:00
Tim Gross	69f4f171e5	CSI: fix missing ACL tokens for leader-driven RPCs (#8607) The volumewatcher and GC job in the leader can't make CSI RPCs when ACLs are enabled without the leader ACL token being passed through.	2020-08-07 15:37:27 -04:00
Tim Gross	2854298089	csi: release claims via csi_hook postrun unpublish RPC (#8580) Add a Postrun hook to send the `CSIVolume.Unpublish` RPC to the server. This may forward client RPCs to the node plugins or to the controller plugins, depending on whether other allocations on this node have claims on this volume. By making clients responsible for running the `CSIVolume.Unpublish` RPC (and making the RPC available to a `nomad volume detach` command), the volumewatcher becomes only used by the core GC job and we no longer need async volume GC from job deregister and node update.	2020-08-06 14:51:46 -04:00
Tim Gross	3aa761b151	Periodic GC for volume claims (#7881) This changeset implements a periodic garbage collection of CSI volumes with missing allocations. This can happen in a scenario where a node update fails partially and the allocation updates are written to raft but the evaluations to GC the volumes are dropped. This feature will cover this edge case and ensure that upgrades from 0.11.0 and 0.11.1 get any stray claims cleaned up.	2020-05-11 08:20:50 -04:00
Tim Gross	801ebcfe8d	periodic GC for CSI plugins (#7878) This changeset implements a periodic garbage collection of unused CSI plugins. Plugins are self-cleaning when the last allocation for a plugin is stopped, but this feature will cover any missing edge cases and ensure that upgrades from 0.11.0 and 0.11.1 get any stray plugins cleaned up.	2020-05-06 16:49:12 -04:00
Tim Gross	a7a64443e1	csi: move volume claim release into volumewatcher (#7794) This changeset adds a subsystem to run on the leader, similar to the deployment watcher or node drainer. The `Watcher` performs a blocking query on updates to the `CSIVolumes` table and triggers reaping of volume claims. This will avoid tying up scheduling workers by immediately sending volume claim workloads into their own loop, rather than blocking the scheduling workers in the core GC job doing things like talking to CSI controllers. The volume watcher is enabled on leader step-up and disabled on leader step-down. The volume claim GC mechanism now makes an empty claim RPC for the volume to trigger an index bump. That in turn unblocks the blocking query in the volume watcher so it can assess which claims can be released for a volume.	2020-04-30 09:13:00 -04:00
Tim Gross	083b35d651	csi: checkpoint volume claim garbage collection (#7782) Adds a `CSIVolumeClaim` type to be tracked as current and past claims on a volume. Allows for a client RPC failure during node or controller detachment without having to keep the allocation around after the first garbage collection eval. This changeset lays groundwork for moving the actual detachment RPCs into a volume watching loop outside the GC eval.	2020-04-23 11:06:23 -04:00
Tim Gross	f37e986b1b	refactor: make nodeForControllerPlugin private to ClientCSI (#7688) The current design of `ClientCSI` RPC requires that callers in the server know about the free-standing `nodeForControllerPlugin` function. This makes it difficult to send `ClientCSI` RPC messages from subpackages of `nomad` and adds a bunch of boilerplate to every server-side caller of a controller RPC. This changeset makes it so that the `ClientCSI` RPCs will populate and validate the controller's client node ID if it hasn't been passed by the caller, centralizing the logic of picking and validating controller targets into the `nomad.ClientCSI` struct.	2020-04-10 16:47:21 -04:00
Tim Gross	027277a0d9	csi: make volume GC in job deregister safely async The `Job.Deregister` call will block on the client CSI controller RPCs while the alloc still exists on the Nomad client node. So we need to make the volume claim reaping async from the `Job.Deregister`. This allows `nomad job stop` to return immediately. In order to make this work, this changeset changes the volume GC so that the GC jobs are on a by-volume basis rather than a by-job basis; we won't have to query the (possibly deleted) job at the time of volume GC. We smuggle the volume ID and whether it's a purge into the GC eval ID the same way we smuggled the job ID previously.	2020-04-06 10:15:55 -04:00
Tim Gross	5a3b45864d	csi: fix unpublish workflow ID mismatches The CSI plugins use the external volume ID for all operations, but the Client CSI RPCs use the Nomad volume ID (human-friendly) for the mount paths. Pass the External ID as an arg in the RPC call so that the unpublish workflows have it without calling back to the server to find the external ID. The controller CSI plugins need the CSI node ID (or in other words, the storage provider's view of node ID like the EC2 instance ID), not the Nomad node ID, to determine how to detach the external volume.	2020-04-06 10:15:55 -04:00
Lang Martin	1750426d04	csi: run volume claim GC on `job stop -purge` (#7615) * nomad/state/state_store: error message copy/paste error * nomad/structs/structs: add a VolumeEval to the JobDeregisterResponse * nomad/job_endpoint: synchronously, volumeClaimReap on job Deregister * nomad/core_sched: make volumeClaimReap available without a CoreSched * nomad/job_endpoint: Deregister return early if the job is missing * nomad/job_endpoint_test: job Deregistration is idempotent * nomad/core_sched: conditionally ignore alloc status in volumeClaimReap * nomad/job_endpoint: volumeClaimReap all allocations, even running * nomad/core_sched_test: extra argument to collectClaimsToGCImpl * nomad/job_endpoint: job deregistration is not idempotent	2020-04-03 17:37:26 -04:00
Tim Gross	f6b3d38eb8	CSI: move node unmount to server-driven RPCs (#7596) If a volume-claiming alloc stops and the CSI Node plugin that serves that alloc's volumes is missing, there's no way for the allocrunner hook to send the `NodeUnpublish` and `NodeUnstage` RPCs. This changeset addresses this issue with a redesign of the client-side for CSI. Rather than unmounting in the alloc runner hook, the alloc runner hook will simply exit. When the server gets the `Node.UpdateAlloc` for the terminal allocation that had a volume claim, it creates a volume claim GC job. This job will make client RPCs to a new node plugin RPC endpoint, and only once that succeeds, move on to making the client RPCs to the controller plugin. If the node plugin is unavailable, the GC job will fail and be requeued.	2020-04-02 16:04:56 -04:00
Lang Martin	3621df1dbf	csi: volume ids are only unique per namespace (#7358) * nomad/state/schema: use the namespace compound index * scheduler/scheduler: CSIVolumeByID interface signature namespace * scheduler/stack: SetJob on CSIVolumeChecker to capture namespace * scheduler/feasible: pass the captured namespace to CSIVolumeByID * nomad/state/state_store: use namespace in csi_volume index * nomad/fsm: pass namespace to CSIVolumeDeregister & Claim * nomad/core_sched: pass the namespace in volumeClaimReap * nomad/node_endpoint_test: namespaces in Claim testing * nomad/csi_endpoint: pass RequestNamespace to state.* * nomad/csi_endpoint_test: appropriately failed test * command/alloc_status_test: appropriately failed test * node_endpoint_test: avoid notTheNamespace for the job * scheduler/feasible_test: call SetJob to capture the namespace * nomad/csi_endpoint: ACL check the req namespace, query by namespace * nomad/state/state_store: remove deregister namespace check * nomad/state/state_store: remove unused CSIVolumes * scheduler/feasible: CSIVolumeChecker SetJob -> SetNamespace * nomad/csi_endpoint: ACL check * nomad/state/state_store_test: remove call to state.CSIVolumes * nomad/core_sched_test: job namespace match so claim gc works	2020-03-23 13:59:25 -04:00
Tim Gross	8bc5641438	csi: volume claim garbage collection (#7125) When an alloc is marked terminal (and after node unstage/unpublish have been called), the client syncs the terminal alloc state with the server via the `Node.UpdateAlloc` RPC. For each job that has a terminal alloc, the `Node.UpdateAlloc` RPC handler at the server will emit an eval for a new core job to garbage collect CSI volume claims. When this eval is handled on the core scheduler, it will call a `volumeReap` method to release the claims for all terminal allocs on the job. The volume reap will issue a `ControllerUnpublishVolume` RPC for any node that has no alloc claiming the volume. Once this returns (or is skipped), the volume reap will send a new `CSIVolume.Claim` RPC that releases the volume claim for that allocation in the state store, making it available for scheduling again. This same `volumeReap` method will be called from the core job GC, which gives us a second chance to reclaim volumes during GC if there were controller RPC failures.	2020-03-23 13:58:30 -04:00
Tim Gross	8673ea5cba	csi: add empty CSI volume publication GC to scheduled core jobs (#7014) This changeset adds a new core job `CoreJobCSIVolumePublicationGC` to the leader's loop for scheduling core job evals. Right now this is an empty method body without even a config file stanza. Later changesets will implement the logic of volume publication GC.	2020-03-23 13:58:29 -04:00
Lang Martin	ee4848167c	core_sched add compat comment for later removal	2019-07-10 13:56:20 -04:00
Lang Martin	a95225d754	NodeDeregisterBatch -> NodeBatchDeregister match JobBatch pattern	2019-07-10 13:56:20 -04:00
Lang Martin	ad3549f906	core_sched use the new rpc names	2019-07-10 13:56:20 -04:00
Lang Martin	d22d9fb5b2	core_sched check ServersMeetMinimumVersion	2019-07-10 13:56:19 -04:00
Lang Martin	a4472e3d34	core_sched check ServersMeetMinimumVersion, send old node deregister	2019-07-10 13:56:19 -04:00
Lang Martin	d5ff2834ca	core_sched batch node deregistration requests	2019-07-10 13:56:19 -04:00
Mahmood Ali	6f077a73dc	Fix panic on failure Error expects an odd number of arguments, and panics otherwise.	2019-01-08 12:19:44 -05:00
Alex Dadgar	14a61ea3ea	Don't GC running but desired stop allocations This PR fixes an edge case where we could GC an allocation that was in a desired stop state but had not terminated yet. This can be hit if the client hasn't shut down the allocation yet or if the allocation is still shutting down (long kill_timeout). Fixes https://github.com/hashicorp/nomad/issues/4940	2018-12-05 13:01:12 -08:00
Preetha Appan	39072977d6	Use create index as trigger condition to gc old terminal allocs	2018-11-09 11:44:21 -06:00
Preetha Appan	e586817ce7	batch jobs GC removes terminal allocs if job modifyindex is older than running job	2018-11-01 00:05:31 -05:00
Alex Dadgar	3c19d01d7a	server	2018-09-15 16:23:13 -07:00
Preetha Appan	a9d63c0df3	Check allocation's desired state in GC eligibility logic in core scheduler	2018-05-21 13:28:31 -05:00
Preetha Appan	688fd9ee37	Update alloc GC eligibility logic to not rely on follow up evals	2018-04-11 13:58:02 -05:00
Preetha Appan	7040884002	Simplify and update allocation gc eligibility logic	2018-04-10 16:08:37 -05:00
Alex Dadgar	7545c0053e	job gc uses batch endpoint	2018-03-16 10:53:03 -07:00
Preetha Appan	8ecb6ca91b	Code review feedback and more test cases	2018-01-31 09:58:05 -06:00
Preetha Appan	28d2439810	Consider dead job status and modify unit test setup for correctness	2018-01-31 09:58:05 -06:00
Preetha Appan	4fd2691323	Use next alloc id being set, move outside structs package and other code review feedback	2018-01-31 09:58:05 -06:00
Preetha Appan	dd91a2f5be	Make garbage collection be aware of rescheduling info in allocations	2018-01-31 09:58:05 -06:00
Alex Dadgar	d3e119f4d0	thread leader token through core gc and test	2017-10-23 15:04:00 -07:00
Alex Dadgar	84d06f6abe	Sync namespace changes	2017-09-07 17:04:21 -07:00
Alex Dadgar	84c2f25e0a	Deployment GC ensures no alloc references	2017-07-17 14:09:59 -07:00
Alex Dadgar	e71e315950	Fix log line for gc'ing deployments	2017-07-13 15:07:25 -07:00
Alex Dadgar	b64185a3f1	Deployment GC This PR implements the garbage collector for deployments. Deployments will by default be garbage collected after 1 hour.	2017-07-07 12:05:57 -07:00
Alex Dadgar	34332af70e	GC and some fixes	2017-04-15 17:08:05 -07:00
Alex Dadgar	3825f7cf1f	Eval GC will collect allocs from stopped batch job This PR fixes a bug in which allocations from stopped batch jobs could not be garbage collected.	2017-03-11 15:48:57 -08:00
Alex Dadgar	b69b357c7f	Nomad builds	2017-02-07 20:31:23 -08:00
Alex Dadgar	7f9c6466d4	Disallow GC of parameterized jobs This PR makes it so parameterized jobs do not get garbage collected and adds a test.	2017-01-26 11:57:32 -08:00
Alex Dadgar	3f0a47f9e4	Disallow EvalGC to reap batch jobs evals/allocs and make JobGC only oneshot GCs everything	2016-06-27 22:54:03 -07:00
Diptanu Choudhury	0fe8746692	GC-ing dead batch jobs	2016-06-22 11:40:27 -07:00
Sean Chittenden	a658299235	Misc typos	2016-06-16 16:17:17 -07:00
Alex Dadgar	98bf249625	Partial GC allocations	2016-06-10 18:32:37 -07:00
Alex Dadgar	cc95d5d332	GC Nodes even if they have terminal allocations	2016-06-03 16:24:41 -07:00
Alex Dadgar	d94204554f	Merge pull request #1012 from hashicorp/f-partition-gc core: Limit GC size	2016-04-14 13:00:53 -07:00

66 Commits