open-nomad

Commit Graph

Author	SHA1	Message	Date
Lang Martin	3621df1dbf	csi: volume ids are only unique per namespace (#7358) * nomad/state/schema: use the namespace compound index * scheduler/scheduler: CSIVolumeByID interface signature namespace * scheduler/stack: SetJob on CSIVolumeChecker to capture namespace * scheduler/feasible: pass the captured namespace to CSIVolumeByID * nomad/state/state_store: use namespace in csi_volume index * nomad/fsm: pass namespace to CSIVolumeDeregister & Claim * nomad/core_sched: pass the namespace in volumeClaimReap * nomad/node_endpoint_test: namespaces in Claim testing * nomad/csi_endpoint: pass RequestNamespace to state.* * nomad/csi_endpoint_test: appropriately failed test * command/alloc_status_test: appropriately failed test * node_endpoint_test: avoid notTheNamespace for the job * scheduler/feasible_test: call SetJob to capture the namespace * nomad/csi_endpoint: ACL check the req namespace, query by namespace * nomad/state/state_store: remove deregister namespace check * nomad/state/state_store: remove unused CSIVolumes * scheduler/feasible: CSIVolumeChecker SetJob -> SetNamespace * nomad/csi_endpoint: ACL check * nomad/state/state_store_test: remove call to state.CSIVolumes * nomad/core_sched_test: job namespace match so claim gc works	2020-03-23 13:59:25 -04:00
Tim Gross	8bc5641438	csi: volume claim garbage collection (#7125) When an alloc is marked terminal (and after node unstage/unpublish have been called), the client syncs the terminal alloc state with the server via `Node.UpdateAlloc RPC`. For each job that has a terminal alloc, the `Node.UpdateAlloc` RPC handler at the server will emit an eval for a new core job to garbage collect CSI volume claims. When this eval is handled on the core scheduler, it will call a `volumeReap` method to release the claims for all terminal allocs on the job. The volume reap will issue a `ControllerUnpublishVolume` RPC for any node that has no alloc claiming the volume. Once this returns (or is skipped), the volume reap will send a new `CSIVolume.Claim` RPC that releases the volume claim for that allocation in the state store, making it available for scheduling again. This same `volumeReap` method will be called from the core job GC, which gives us a second chance to reclaim volumes during GC if there were controller RPC failures.	2020-03-23 13:58:30 -04:00
Tim Gross	8673ea5cba	csi: add empty CSI volume publication GC to scheduled core jobs (#7014) This changeset adds a new core job `CoreJobCSIVolumePublicationGC` to the leader's loop for scheduling core job evals. Right now this is an empty method body without even a config file stanza. Later changesets will implement the logic of volume publication GC.	2020-03-23 13:58:29 -04:00
Lang Martin	ee4848167c	core_sched add compat comment for later removal	2019-07-10 13:56:20 -04:00
Lang Martin	a95225d754	NodeDeregisterBatch -> NodeBatchDeregister match JobBatch pattern	2019-07-10 13:56:20 -04:00
Lang Martin	ad3549f906	core_sched use the new rpc names	2019-07-10 13:56:20 -04:00
Lang Martin	d22d9fb5b2	core_sched check ServersMeetMinimumVersion	2019-07-10 13:56:19 -04:00
Lang Martin	a4472e3d34	core_sched check ServersMeetMinimumVersion, send old node deregister	2019-07-10 13:56:19 -04:00
Lang Martin	d5ff2834ca	core_sched batch node deregistration requests	2019-07-10 13:56:19 -04:00
Mahmood Ali	6f077a73dc	Fix panic on failure Error expects an odd number of arguments, and panics otherwise.	2019-01-08 12:19:44 -05:00
Alex Dadgar	14a61ea3ea	Don't GC running but desired stop allocations This PR fixes an edge case where we could GC an allocation that was in a desired stop state but had not terminated yet. This can be hit if the client hasn't shutdown the allocation yet or if the allocation is still shutting down (long kill_timeout). Fixes https://github.com/hashicorp/nomad/issues/4940	2018-12-05 13:01:12 -08:00
Preetha Appan	39072977d6	Use create index as trigger condition to gc old terminal allocs	2018-11-09 11:44:21 -06:00
Preetha Appan	e586817ce7	batch jobs GC removes terminal allocs if job modifyindex is older than running job	2018-11-01 00:05:31 -05:00
Alex Dadgar	3c19d01d7a	server	2018-09-15 16:23:13 -07:00
Preetha Appan	a9d63c0df3	Check allocation's desired state in GC eligibility logic in core scheduler	2018-05-21 13:28:31 -05:00
Preetha Appan	688fd9ee37	Update alloc GC eligility logic to not rely on follow up evals	2018-04-11 13:58:02 -05:00
Preetha Appan	7040884002	Simplify and update allocation gc eligibility logic	2018-04-10 16:08:37 -05:00
Alex Dadgar	7545c0053e	job gc uses batch endpoint	2018-03-16 10:53:03 -07:00
Preetha Appan	8ecb6ca91b	Code review feedback and more test cases	2018-01-31 09:58:05 -06:00
Preetha Appan	28d2439810	Consider dead job status and modify unit test setup for correctness	2018-01-31 09:58:05 -06:00
Preetha Appan	4fd2691323	Use next alloc id being set, move outside structs package and other code review feedback	2018-01-31 09:58:05 -06:00
Preetha Appan	dd91a2f5be	Make garbage collection be aware of rescheduling info in allocations	2018-01-31 09:58:05 -06:00
Alex Dadgar	d3e119f4d0	thread leader token through core gc and test	2017-10-23 15:04:00 -07:00
Alex Dadgar	84d06f6abe	Sync namespace changes	2017-09-07 17:04:21 -07:00
Alex Dadgar	84c2f25e0a	Deployment GC ensures no alloc references	2017-07-17 14:09:59 -07:00
Alex Dadgar	e71e315950	Fix log line for gc'ing deployments	2017-07-13 15:07:25 -07:00
Alex Dadgar	b64185a3f1	Deployment GC This PR implements the garbage collector for deployments. Deployments will by default be garbage collected after 1 hour.	2017-07-07 12:05:57 -07:00
Alex Dadgar	34332af70e	GC and some fixes	2017-04-15 17:08:05 -07:00
Alex Dadgar	3825f7cf1f	Eval GC will collect allocs from stopped batch job This PR fixes a bug in which allocations from stopped batch jobs could not be garbage collected.	2017-03-11 15:48:57 -08:00
Alex Dadgar	b69b357c7f	Nomad builds	2017-02-07 20:31:23 -08:00
Alex Dadgar	7f9c6466d4	Disallow GC of parameterized jobs This PR makes it so parameterized jobs do not get garbage collected and adds a test.	2017-01-26 11:57:32 -08:00
Alex Dadgar	3f0a47f9e4	Disallow EvalGC to reap batch jobs evals/allocs and make JobGC only oneshot GCs everything	2016-06-27 22:54:03 -07:00
Diptanu Choudhury	0fe8746692	GC-ing dead batch jobs	2016-06-22 11:40:27 -07:00
Sean Chittenden	a658299235	Misc typos	2016-06-16 16:17:17 -07:00
Alex Dadgar	98bf249625	Partial GC allocations	2016-06-10 18:32:37 -07:00
Alex Dadgar	cc95d5d332	GC Nodes even if they have terminal allocations	2016-06-03 16:24:41 -07:00
Alex Dadgar	d94204554f	Merge pull request #1012 from hashicorp/f-partition-gc core: Limit GC size	2016-04-14 13:00:53 -07:00
Alex Dadgar	b34ab80c93	Address comments	2016-04-14 11:41:04 -07:00
Alex Dadgar	034bae90bb	Revert "Remove client status from allocation TerminalStatus" This reverts commit 819e1e4b3967c7029ee8221144666ff460fdd7ed.	2016-04-08 14:22:06 -07:00
Alex Dadgar	ca938f205c	Force GC garbage collects nodes last and fix eval GC to cleanup deregistered batch jobs	2016-04-08 11:42:02 -07:00
Alex Dadgar	066d006868	Limit GC size	2016-03-30 15:17:13 -07:00
Alex Dadgar	b9a80f14f1	Limit garbage collection of batch jobs	2016-03-25 16:46:48 -07:00
Alex Dadgar	5fc83bd868	Dead->Complete	2016-03-25 12:56:54 -07:00
Alex Dadgar	09f63fd3c0	Remove client status from allocation TerminalStatus	2016-03-25 12:53:37 -07:00
Alex Dadgar	d42e0a7dfd	Add force node gc	2016-02-20 16:11:29 -08:00
Alex Dadgar	143972b6d9	Job GC endpoint	2016-02-20 15:50:41 -08:00
Alex Dadgar	7bec30c2b3	Small cleanup	2015-12-16 15:00:45 -08:00
Alex Dadgar	f1d88bdf86	Remove user-specifiable gc threshold	2015-12-16 15:00:45 -08:00
Alex Dadgar	2218a79815	Add garbage collection to jobs	2015-12-16 15:00:45 -08:00
Armon Dadgar	ad681be59c	nomad: adding node GC	2015-09-07 11:01:29 -07:00

53 Commits