open-nomad

Author	SHA1	Message	Date
Tim Gross	314458ebdb	csi: update volumewatcher to use unpublish RPC (#8579) This changeset updates `nomad/volumewatcher` to take advantage of the `CSIVolume.Unpublish` RPC. This lets us eliminate a bunch of code and associated tests. The raft batching code can be safely dropped, as the characteristic times of the CSI RPCs are on the order of seconds or even minutes, so batching up raft RPCs added complexity without any real world performance wins. Includes refactor w/ test cleanup and dead code elimination in volumewatcher	2020-08-06 14:31:18 -04:00
Tim Gross	3aa761b151	Periodic GC for volume claims (#7881) This changeset implements a periodic garbage collection of CSI volumes with missing allocations. This can happen in a scenario where a node update fails partially and the allocation updates are written to raft but the evaluations to GC the volumes are dropped. This feature will cover this edge case and ensure that upgrades from 0.11.0 and 0.11.1 get any stray claims cleaned up.	2020-05-11 08:20:50 -04:00
Tim Gross	801ebcfe8d	periodic GC for CSI plugins (#7878) This changeset implements a periodic garbage collection of unused CSI plugins. Plugins are self-cleaning when the last allocation for a plugin is stopped, but this feature will cover any missing edge cases and ensure that upgrades from 0.11.0 and 0.11.1 get any stray plugins cleaned up.	2020-05-06 16:49:12 -04:00
Tim Gross	a7a64443e1	csi: move volume claim release into volumewatcher (#7794) This changeset adds a subsystem to run on the leader, similar to the deployment watcher or node drainer. The `Watcher` performs a blocking query on updates to the `CSIVolumes` table and triggers reaping of volume claims. This will avoid tying up scheduling workers by immediately sending volume claim workloads into their own loop, rather than blocking the scheduling workers in the core GC job doing things like talking to CSI controllers. The volume watcher is enabled on leader step-up and disabled on leader step-down. The volume claim GC mechanism now makes an empty claim RPC for the volume to trigger an index bump. That in turn unblocks the blocking query in the volume watcher so it can assess which claims can be released for a volume.	2020-04-30 09:13:00 -04:00
Tim Gross	083b35d651	csi: checkpoint volume claim garbage collection (#7782) Adds a `CSIVolumeClaim` type to be tracked as current and past claims on a volume. Allows for a client RPC failure during node or controller detachment without having to keep the allocation around after the first garbage collection eval. This changeset lays groundwork for moving the actual detachment RPCs into a volume watching loop outside the GC eval.	2020-04-23 11:06:23 -04:00
Tim Gross	5a3b45864d	csi: fix unpublish workflow ID mismatches The CSI plugins uses the external volume ID for all operations, but the Client CSI RPCs uses the Nomad volume ID (human-friendly) for the mount paths. Pass the External ID as an arg in the RPC call so that the unpublish workflows have it without calling back to the server to find the external ID. The controller CSI plugins need the CSI node ID (or in other words, the storage provider's view of node ID like the EC2 instance ID), not the Nomad node ID, to determine how to detach the external volume.	2020-04-06 10:15:55 -04:00
Lang Martin	1750426d04	csi: run volume claim GC on `job stop -purge` (#7615) * nomad/state/state_store: error message copy/paste error * nomad/structs/structs: add a VolumeEval to the JobDeregisterResponse * nomad/job_endpoint: synchronously, volumeClaimReap on job Deregister * nomad/core_sched: make volumeClaimReap available without a CoreSched * nomad/job_endpoint: Deregister return early if the job is missing * nomad/job_endpoint_test: job Deregistion is idempotent * nomad/core_sched: conditionally ignore alloc status in volumeClaimReap * nomad/job_endpoint: volumeClaimReap all allocations, even running * nomad/core_sched_test: extra argument to collectClaimsToGCImpl * nomad/job_endpoint: job deregistration is not idempotent	2020-04-03 17:37:26 -04:00
Tim Gross	f6b3d38eb8	CSI: move node unmount to server-driven RPCs (#7596) If a volume-claiming alloc stops and the CSI Node plugin that serves that alloc's volumes is missing, there's no way for the allocrunner hook to send the `NodeUnpublish` and `NodeUnstage` RPCs. This changeset addresses this issue with a redesign of the client-side for CSI. Rather than unmounting in the alloc runner hook, the alloc runner hook will simply exit. When the server gets the `Node.UpdateAlloc` for the terminal allocation that had a volume claim, it creates a volume claim GC job. This job will made client RPCs to a new node plugin RPC endpoint, and only once that succeeds, move on to making the client RPCs to the controller plugin. If the node plugin is unavailable, the GC job will fail and be requeued.	2020-04-02 16:04:56 -04:00
Lang Martin	b596e67f47	csi: implement volume ACLs (#7339) * acl/policy: add the volume ACL policies * nomad/csi_endpoint: enforce ACLs for volume access * nomad/search_endpoint_oss: volume acls * acl/acl: add plugin read as a global policy * acl/policy: add PluginPolicy global cap type * nomad/csi_endpoint: check the global plugin ACL policy * nomad/mock/acl: PluginPolicy * nomad/csi_endpoint: fix list rebase * nomad/core_sched_test: new test since #7358 * nomad/csi_endpoint_test: use correct permissions for list * nomad/csi_endpoint: allowCSIMount keeps ACL checks together * nomad/job_endpoint: check mount permission for jobs * nomad/job_endpoint_test: need plugin read, too	2020-03-23 13:59:25 -04:00
Lang Martin	3621df1dbf	csi: volume ids are only unique per namespace (#7358) * nomad/state/schema: use the namespace compound index * scheduler/scheduler: CSIVolumeByID interface signature namespace * scheduler/stack: SetJob on CSIVolumeChecker to capture namespace * scheduler/feasible: pass the captured namespace to CSIVolumeByID * nomad/state/state_store: use namespace in csi_volume index * nomad/fsm: pass namespace to CSIVolumeDeregister & Claim * nomad/core_sched: pass the namespace in volumeClaimReap * nomad/node_endpoint_test: namespaces in Claim testing * nomad/csi_endpoint: pass RequestNamespace to state.* * nomad/csi_endpoint_test: appropriately failed test * command/alloc_status_test: appropriately failed test * node_endpoint_test: avoid notTheNamespace for the job * scheduler/feasible_test: call SetJob to capture the namespace * nomad/csi_endpoint: ACL check the req namespace, query by namespace * nomad/state/state_store: remove deregister namespace check * nomad/state/state_store: remove unused CSIVolumes * scheduler/feasible: CSIVolumeChecker SetJob -> SetNamespace * nomad/csi_endpoint: ACL check * nomad/state/state_store_test: remove call to state.CSIVolumes * nomad/core_sched_test: job namespace match so claim gc works	2020-03-23 13:59:25 -04:00
Tim Gross	22e9f679c3	csi: implement controller detach RPCs (#7356) This changeset implements the remaining controller detach RPCs: server-to-client and client-to-controller. The tests also uncovered a bug in our RPC for claims which is fixed here; the volume claim RPC is used for both claiming and releasing a claim on a volume. We should only submit a controller publish RPC when the claim is new and not when it's being released.	2020-03-23 13:59:25 -04:00
Tim Gross	8bc5641438	csi: volume claim garbage collection (#7125) When an alloc is marked terminal (and after node unstage/unpublish have been called), the client syncs the terminal alloc state with the server via `Node.UpdateAlloc RPC`. For each job that has a terminal alloc, the `Node.UpdateAlloc` RPC handler at the server will emit an eval for a new core job to garbage collect CSI volume claims. When this eval is handled on the core scheduler, it will call a `volumeReap` method to release the claims for all terminal allocs on the job. The volume reap will issue a `ControllerUnpublishVolume` RPC for any node that has no alloc claiming the volume. Once this returns (or is skipped), the volume reap will send a new `CSIVolume.Claim` RPC that releases the volume claim for that allocation in the state store, making it available for scheduling again. This same `volumeReap` method will be called from the core job GC, which gives us a second chance to reclaim volumes during GC if there were controller RPC failures.	2020-03-23 13:58:30 -04:00
Tim Gross	8673ea5cba	csi: add empty CSI volume publication GC to scheduled core jobs (#7014) This changeset adds a new core job `CoreJobCSIVolumePublicationGC` to the leader's loop for scheduling core job evals. Right now this is an empty method body without even a config file stanza. Later changesets will implement the logic of volume publication GC.	2020-03-23 13:58:29 -04:00
Seth Hoenig	f0c3dca49c	tests: swap lib/freeport for tweaked helper/freeport Copy the updated version of freeport (sdk/freeport), and tweak it for use in Nomad tests. This means staying below port 10000 to avoid conflicts with the lib/freeport that is still transitively used by the old version of consul that we vendor. Also provide implementations to find ephemeral ports of macOS and Windows environments. Ports acquired through freeport are supposed to be returned to freeport, which this change now also introduces. Many tests are modified to include calls to a cleanup function for Server objects. This should help quite a bit with some flakey tests, but not all of them. Our port problems will not go away completely until we upgrade our vendor version of consul. With Go modules, we'll probably do a 'replace' to swap out other copies of freeport with the one now in 'nomad/helper/freeport'.	2019-12-09 08:37:32 -06:00
Alex Dadgar	14a61ea3ea	Don't GC running but desired stop allocations This PR fixes an edge case where we could GC an allocation that was in a desired stop state but had not terminated yet. This can be hit if the client hasn't shutdown the allocation yet or if the allocation is still shutting down (long kill_timeout). Fixes https://github.com/hashicorp/nomad/issues/4940	2018-12-05 13:01:12 -08:00
Preetha Appan	39072977d6	Use create index as trigger condition to gc old terminal allocs	2018-11-09 11:44:21 -06:00
Preetha Appan	e586817ce7	batch jobs GC removes terminal allocs if job modifyindex is older than running job	2018-11-01 00:05:31 -05:00
Preetha Appan	a9d63c0df3	Check allocation's desired state in GC eligibility logic in core scheduler	2018-05-21 13:28:31 -05:00
Preetha	0b6fbb8e16	Merge pull request #4131 from hashicorp/b-rescheduling-fix-gc Update garbage collection logic to make sure allocs with pending evals are not GCed	2018-04-11 15:44:36 -05:00
Preetha Appan	1da4d88f3d	Make test descriptions better	2018-04-11 15:12:23 -05:00
Preetha Appan	688fd9ee37	Update alloc GC eligility logic to not rely on follow up evals	2018-04-11 13:58:02 -05:00
Charlie Voiselle	ba88f00ccb	Changed "til" to "until" Should be "till" or "until"; chose "until" because it is unambiguous as to meaning.	2018-04-11 12:36:28 -05:00
Preetha Appan	59cce1d620	Fix unit test for core scheduler GC	2018-04-10 17:12:06 -05:00
Preetha Appan	7040884002	Simplify and update allocation gc eligibility logic	2018-04-10 16:08:37 -05:00
Alex Dadgar	7545c0053e	job gc uses batch endpoint	2018-03-16 10:53:03 -07:00
Josh Soref	4e40338cfa	spelling: rescheduling	2018-03-11 18:40:32 +00:00
Josh Soref	bf05f146cd	spelling: deployment	2018-03-11 17:57:49 +00:00
Preetha Appan	eaedffc7f7	Fix go vet errors	2018-02-28 12:21:27 -06:00
Alex Dadgar	a6dfffa4fa	Add testing interfaces	2018-02-15 13:59:00 -08:00
Preetha Appan	8ecb6ca91b	Code review feedback and more test cases	2018-01-31 09:58:05 -06:00
Preetha Appan	28d2439810	Consider dead job status and modify unit test setup for correctness	2018-01-31 09:58:05 -06:00
Preetha Appan	4fd2691323	Use next alloc id being set, move outside structs package and other code review feedback	2018-01-31 09:58:05 -06:00
Preetha Appan	dd91a2f5be	Make garbage collection be aware of rescheduling info in allocations	2018-01-31 09:58:05 -06:00
Alex Dadgar	d3e119f4d0	thread leader token through core gc and test	2017-10-23 15:04:00 -07:00
Alex Dadgar	84d06f6abe	Sync namespace changes	2017-09-07 17:04:21 -07:00
Alex Dadgar	06eddf243c	parallel nomad tests	2017-07-25 17:39:36 -07:00
Alex Dadgar	84c2f25e0a	Deployment GC ensures no alloc references	2017-07-17 14:09:59 -07:00
Alex Dadgar	09dfa2fc10	Rename CreateDeployments and remove cancelling behavior in state_store	2017-07-07 12:10:04 -07:00
Alex Dadgar	b64185a3f1	Deployment GC This PR implements the garbage collector for deployments. Deployments will by default be garbage collected after 1 hour.	2017-07-07 12:05:57 -07:00
Alex Dadgar	34332af70e	GC and some fixes	2017-04-15 17:08:05 -07:00
Alex Dadgar	3825f7cf1f	Eval GC will collect allocs from stopped batch job This PR fixes a bug in which allocations from stopped batch jobs could not be garbage collected.	2017-03-11 15:48:57 -08:00
Alex Dadgar	04862ca10e	Tests compile	2017-02-07 21:30:57 -08:00
Alex Dadgar	7f9c6466d4	Disallow GC of parameterized jobs This PR makes it so parameterized jobs do not get garbage collected and adds a test.	2017-01-26 11:57:32 -08:00
Alex Dadgar	007a538515	Fix core scheduler tests	2016-08-11 14:36:22 -07:00
Alex Dadgar	e33bda76bf	test sched doesn't mark complete as lost + core_sched tests	2016-08-04 11:24:17 -07:00
Diptanu Choudhury	6193529040	Fixed more tests	2016-07-25 17:31:40 -07:00
Alex Dadgar	e26f826189	fix job gc tests	2016-07-25 14:56:23 -07:00
Alex Dadgar	0db55c1dce	Revert "Fix job gc tests" This reverts commit 4be50ac8c78b09d603d9680064391d449b268436.	2016-07-25 14:53:07 -07:00
Alex Dadgar	e61aa2484a	Fix job gc tests	2016-07-25 14:49:57 -07:00
Diptanu Choudhury	487c66b84d	Removing the queued state of Job Summary and alloc desired status false	2016-07-13 13:20:46 -06:00

78 commits