open-nomad

Commit Graph

Author	SHA1	Message	Date
Nick Ethier	1e4ea699ad	fix test failures from rebase	2020-06-18 11:05:32 -07:00
Nick Ethier	4a44deaa5c	CNI Implementation (#7518 )	2020-06-18 11:05:29 -07:00
Nick Ethier	0bc0403cc3	Task DNS Options (#7661 ) Co-Authored-By: Tim Gross <tgross@hashicorp.com> Co-Authored-By: Seth Hoenig <shoenig@hashicorp.com>	2020-06-18 11:01:31 -07:00
Tim Gross	c14a75bfab	multiregion: use pending instead of paused The `paused` state is used as an operator safety mechanism, so that they can debug a deployment or halt one that's causing a wider failure. By using the `paused` state as the first state of a multiregion deployment, we risked resuming an intentionally operator-paused deployment because of activity in a peer region. This changeset replaces the use of the `paused` state with a `pending` state, and provides a `Deployment.Run` internal RPC to replace the use of the `Deployment.Pause` (resume) RPC we were using in `deploymentwatcher`.	2020-06-17 11:06:14 -04:00
Tim Gross	fd50b12ee2	multiregion: integrate with deploymentwatcher * `nextRegion` should take status parameter * thread Deployment/Job RPCs thru `nextRegion` * add `nextRegion` calls to `deploymentwatcher` * use a better description for paused for peer	2020-06-17 11:06:00 -04:00
Tim Gross	5c4d0a73f4	start all but first region deployment in paused state	2020-06-17 11:05:34 -04:00
Tim Gross	473a0f1d44	multiregion: unblock and cancel RPCs	2020-06-17 11:02:26 -04:00
Lang Martin	069840bef8	scheduler/reconcile: set FollowupEvalID on lost stop_after_client_disconnect (#8105 ) (#8138 ) * scheduler/reconcile: set FollowupEvalID on lost stop_after_client_disconnect * scheduler/reconcile: thread follupEvalIDs through to results.stop * scheduler/reconcile: comment typo * nomad/_test: correct arguments for plan.AppendStoppedAlloc * scheduler/reconcile: avoid nil, cleanup handleDelayed(Lost\|Reschedules)	2020-06-09 17:13:53 -04:00
Lang Martin	ac7c39d3d3	Delayed evaluations for `stop_after_client_disconnect` can cause unwanted extra followup evaluations around job garbage collection (#8099 ) * client/heartbeatstop: reversed time condition for startup grace * scheduler/generic_sched: use `delayInstead` to avoid a loop Without protecting the loop that creates followUpEvals, a delayed eval is allowed to create an immediate subsequent delayed eval. For both `stop_after_client_disconnect` and the `reschedule` block, a delayed eval should always produce some immediate result (running or blocked) and then only after the outcome of that eval produce a second delayed eval. * scheduler/reconcile: lostLater are different than delayedReschedules Just slightly. `lostLater` allocs should be used to create batched evaluations, but `handleDelayedReschedules` assumes that the allocations are in the untainted set. When it creates the in-place updates to those allocations at the end, it causes the allocation to be treated as running over in the planner, which causes the initial `stop_after_client_disconnect` evaluation to be retried by the worker.	2020-06-03 09:48:38 -04:00
Mahmood Ali	21c948f3d3	keep promotion score constants next to use	2020-05-27 15:13:19 -04:00
Mahmood Ali	d9792777d9	Open source Preemption code Nomad 0.12 OSS is to include preemption feature. This commit moves the private code for managing preemption to OSS repository.	2020-05-27 15:02:01 -04:00
Lang Martin	d3c4700cd3	server: stop after client disconnect (#7939 ) * jobspec, api: add stop_after_client_disconnect * nomad/state/state_store: error message typo * structs: alloc methods to support stop_after_client_disconnect 1. a global AllocStates to track status changes with timestamps. We need this to track the time at which the alloc became lost originally. 2. ShouldClientStop() and WaitClientStop() to actually do the math * scheduler/reconcile_util: delayByStopAfterClientDisconnect * scheduler/reconcile: use delayByStopAfterClientDisconnect * scheduler/util: updateNonTerminalAllocsToLost comments This was setup to only update allocs to lost if the DesiredStatus had already been set by the scheduler. It seems like the intention was to update the status from any non-terminal state, and not all lost allocs have been marked stop or evict by now * scheduler/testing: AssertEvalStatus just use require * scheduler/generic_sched: don't create a blocked eval if delayed * scheduler/generic_sched_test: several scheduling cases	2020-05-13 16:39:04 -04:00
Mahmood Ali	759eade78b	missed fixing one invocation	2020-05-01 13:38:46 -04:00
Mahmood Ali	b9e3cde865	tests and some clean up	2020-05-01 13:13:30 -04:00
Charlie Voiselle	d8e5e02398	Wiring algorithm to scheduler calls	2020-05-01 13:13:29 -04:00
Michael Schurter	c901d0e7dd	Merge branch 'master' into b-reserved-scoring	2020-04-30 14:48:14 -07:00
Mahmood Ali	9f005201e2	Ensure that alloc updates preserve device offers When an alloc is updated in-place, ensure that the allocated device are preserved and carried over to new alloc.	2020-04-21 08:57:15 -04:00
Mahmood Ali	2ff2745374	test for allocated devices on job in-update update When an alloc is updated in-place, test that the allocated devices are preserved in new alloc struct.	2020-04-21 08:56:05 -04:00
Michael Schurter	4c5a0cae35	core: fix node reservation scoring The BinPackIter accounted for node reservations twice when scoring nodes which could bias scores toward nodes with reservations. Pseudo-code for previous algorithm: ``` proposed = reservedResources + sum(allocsResources) available = nodeResources - reservedResources score = 1 - (proposed / available) ``` The node's reserved resources are added to the total resources used by allocations, and then the node's reserved resources are later substracted from the node's overall resources. The new algorithm is: ``` proposed = sum(allocResources) available = nodeResources - reservedResources score = 1 - (proposed / available) ``` The node's reserved resources are no longer added to the total resources used by allocations. My guess as to how this bug happened is that the resource utilization variable (`util`) is calculated and returned by the `AllocsFit` function which needs to take reserved resources into account as a basic feasibility check. To avoid re-calculating alloc resource usage (because there may be a large number of allocs), we reused `util` in the `ScoreFit` function. `ScoreFit` properly accounts for reserved resources by subtracting them from the node's overall resources. However since `util` _also_ took reserved resources into account the score would be incorrect. Prior to the fix the added test output: ``` Node: reserved Score: 1.0000 Node: reserved2 Score: 1.0000 Node: no-reserved Score: 0.9741 ``` The scores being 1.0 for both nodes with reserved resources is a good hint something is wrong as they should receive different scores. Upon further inspection the double accounting of reserved resources caused their scores to be >1.0 and clamped. After the fix the added test outputs: ``` Node: no-reserved Score: 0.9741 Node: reserved Score: 0.9480 Node: reserved2 Score: 0.8717 ```	2020-04-15 15:13:30 -07:00
Michael Schurter	4b475db408	core: fix comment on system stack This makes me do a double take every time I run into it, so what if we just changed it?	2020-04-09 15:19:11 -07:00
Tim Gross	161f9aedc3	scheduler: prevent a reported NPE for CSI (#7633 )	2020-04-06 09:42:27 -04:00
Lang Martin	e03c328792	csi: use node MaxVolumes during scheduling (#7565 ) * nomad/state/state_store: CSIVolumesByNodeID ignores namespace * scheduler/scheduler: add CSIVolumesByNodeID to the state interface * scheduler/feasible: check node MaxVolumes * nomad/csi_endpoint: no namespace inn CSIVolumesByNodeID anymore * nomad/state/state_store: avoid DenormalizeAllocationSlice * nomad/state/iterator: clean up SliceIterator Next * scheduler/feasible_test: block with MaxVolumes * nomad/state/state_store_test: fix args to CSIVolumesByNodeID	2020-03-31 17:16:47 -04:00
Chris Baker	179ab68258	wip: added job.scale rpc endpoint, needs explicit test (tested via http now)	2020-03-24 13:57:09 +00:00
Mahmood Ali	6ddf3d1742	Merge pull request #7414 from hashicorp/b-network-mode-change Detect network mode change	2020-03-24 09:46:40 -04:00
Lang Martin	d994990ef0	csi: the scheduler allows a job with a volume write claim to be updated (#7438 ) * nomad/structs/csi: split CanWrite into health, in use * scheduler/scheduler: expose AllocByID in the state interface * nomad/state/state_store_test * scheduler/stack: SetJobID on the matcher * scheduler/feasible: when a volume writer is in use, check if it's us * scheduler/feasible: remove SetJob * nomad/state/state_store: denormalize allocs before Claim * nomad/structs/csi: return errors on claim, with context * nomad/csi_endpoint_test: new alloc doesn't look like an update * nomad/state/state_store_test: change test reference to CanWrite	2020-03-23 21:21:04 -04:00
Tim Gross	d1f43a5fea	csi: improve error messages from scheduler (#7426 )	2020-03-23 13:59:25 -04:00
Lang Martin	3621df1dbf	csi: volume ids are only unique per namespace (#7358 ) * nomad/state/schema: use the namespace compound index * scheduler/scheduler: CSIVolumeByID interface signature namespace * scheduler/stack: SetJob on CSIVolumeChecker to capture namespace * scheduler/feasible: pass the captured namespace to CSIVolumeByID * nomad/state/state_store: use namespace in csi_volume index * nomad/fsm: pass namespace to CSIVolumeDeregister & Claim * nomad/core_sched: pass the namespace in volumeClaimReap * nomad/node_endpoint_test: namespaces in Claim testing * nomad/csi_endpoint: pass RequestNamespace to state.* * nomad/csi_endpoint_test: appropriately failed test * command/alloc_status_test: appropriately failed test * node_endpoint_test: avoid notTheNamespace for the job * scheduler/feasible_test: call SetJob to capture the namespace * nomad/csi_endpoint: ACL check the req namespace, query by namespace * nomad/state/state_store: remove deregister namespace check * nomad/state/state_store: remove unused CSIVolumes * scheduler/feasible: CSIVolumeChecker SetJob -> SetNamespace * nomad/csi_endpoint: ACL check * nomad/state/state_store_test: remove call to state.CSIVolumes * nomad/core_sched_test: job namespace match so claim gc works	2020-03-23 13:59:25 -04:00
Danielle Lancashire	e227f31584	sched/feasible: Return more detailed CSI Failure messages	2020-03-23 13:58:30 -04:00
Danielle Lancashire	a2e01c4369	sched/feasible: Validate CSIVolume's correctly Previously we were looking up plugins based on the Alias Name for a CSI Volume within the context of its task group. Here we first look up a volume based on its identifier and then validate the existence of the plugin based on its `PluginID`.	2020-03-23 13:58:30 -04:00
Danielle Lancashire	e56c677221	sched/feasible: CSI - Filter applicable volumes This commit filters the jobs volumes when setting them on the feasibility checker. This ensures that the rest of the checker does not have to worry about non-csi volumes.	2020-03-23 13:58:30 -04:00
Lang Martin	7b675f89ac	csi: fix index maintenance for CSIVolume and CSIPlugin tables (#7049 ) * state_store: csi volumes/plugins store the index in the txn * nomad: csi_endpoint_test require index checks need uint64() * nomad: other tests using int 0 not uint64(0) * structs: pass index into New, but not other struct methods * state_store: csi plugin indexes, use new struct interface * nomad: csi_endpoint_test check index/query meta (on explicit 0) * structs: NewCSIVolume takes an index arg now * scheduler/test: NewCSIVolume takes an index arg now	2020-03-23 13:58:29 -04:00
Lang Martin	a0a6766740	CSI: Scheduler knows about CSI constraints and availability (#6995 ) * structs: piggyback csi volumes on host volumes for job specs * state_store: CSIVolumeByID always includes plugins, matches usecase * scheduler/feasible: csi volume checker * scheduler/stack: add csi volumes * contributing: update rpc checklist * scheduler: add volumes to State interface * scheduler/feasible: introduce new checker collection tgAvailable * scheduler/stack: taskGroupCSIVolumes checker is transient * state_store CSIVolumeDenormalizePlugins comment clarity * structs: remote TODO comment in TaskGroup Validate * scheduler/feasible: CSIVolumeChecker hasPlugins improve comment * scheduler/feasible_test: set t.Parallel * Update nomad/state/state_store.go Co-Authored-By: Danielle <dani@hashicorp.com> * Update scheduler/feasible.go Co-Authored-By: Danielle <dani@hashicorp.com> * structs: lift ControllerRequired to each volume * state_store: store plug.ControllerRequired, use it for volume health * feasible: csi match fast path remove stale host volume copied logic * scheduler/feasible: improve comments Co-authored-by: Danielle <dani@builds.terrible.systems>	2020-03-23 13:58:29 -04:00
Jasmine Dahilig	81d051d7e8	fix bug in lifecycle scheduler test mocks	2020-03-21 17:52:51 -04:00
Jasmine Dahilig	0cc9212a54	add test cases for scheduler alloc placement with lifecycle resources	2020-03-21 17:52:47 -04:00
Jasmine Dahilig	3e4e8f2b02	add allocfit test for lifecycles	2020-03-21 17:52:46 -04:00
Mahmood Ali	b880607bad	update scheduler to account for hooks	2020-03-21 17:52:45 -04:00
Mahmood Ali	9568553d7e	Detect network mode change Mark job as updated if network mode changed.	2020-03-21 16:51:10 -04:00
Drew Bailey	6bd6c6638c	include pro tag in serveral oss.go files	2020-02-10 15:56:14 -05:00
Drew Bailey	9a65556211	add state store test to ensure PlacedCanaries is updated	2020-02-03 13:58:01 -05:00
Drew Bailey	f51a3d1f37	nomad state store must be modified through raft, rm local state change	2020-02-03 13:57:34 -05:00
Drew Bailey	1c046a74d8	comment for filtering reason	2020-02-03 09:02:09 -05:00
Drew Bailey	e71f132455	add test for node eligibility	2020-02-03 09:02:09 -05:00
Drew Bailey	6b492630dd	make diffSystemAllocsForNode aware of eligibility diffSystemAllocs -> diffSystemAllocsForNode, this function is only used for diffing system allocations, but lacked awareness of eligible nodes and the node ID that the allocation was going to be placed. This change now ignores a change if its existing allocation is on an ineligible node. For a new allocation, it also checks tainted and ineligible nodes in the same function instead of nil-ing out the diff after computation in diffSystemAllocs	2020-02-03 09:02:08 -05:00
Drew Bailey	e613a258da	ignore computed diffs if node is ineligible test flakey, add temp sleeps for debugging fix computed class	2020-02-03 09:02:08 -05:00
Drew Bailey	63ddda71e1	Return FailedTGAlloc metric instead of no node err If an existing system allocation is running and the node its running on is marked as ineligible, subsequent plan/applys return an RPC error instead of a more helpful plan result. This change logs the error, and appends a failedTGAlloc for the placement.	2020-01-22 10:07:15 -05:00
Drew Bailey	ef175c0b31	Update Evicted allocations to lost when lost If an alloc is being preempted and marked as evict, but the underlying node is lost before the migration takes place, the allocation currently stays as desired evict, status running forever, or until the node comes back online. This commit updates updateNonTerminalAllocsToLost to check for a destired status of Evict as well as Stop when updating allocations on tainted nodes. switch to table test for lost node cases	2020-01-07 13:34:18 -05:00
Preetha Appan	afff27b69b	More error->debug for logging in the bin packing iterator	2019-12-12 15:50:16 -06:00
Preetha Appan	3458b41290	Use debug logging for scheduler internals We currently log an error if preemption is unable to find a suitable set of allocations to preempt. This commit changes that to debug level since not finding preemptable allocations is not an error condition.	2019-12-12 12:05:29 -06:00
Michael Schurter	7655e0cee4	Merge pull request #6792 from hashicorp/b-propose-panic scheduler: fix panic when preempting and evicting allocs	2019-12-03 10:40:19 -08:00
Tim Gross	c50057bf1f	scheduler: fix job update placement on prev node penalized (#6781 ) Fixes #5856 When the scheduler looks for a placement for an allocation that's replacing another allocation, it's supposed to penalize the previous node if the allocation had been rescheduled or failed. But we're currently always penalizing the node, which leads to unnecessary migrations on job update. This commit leaves in place the existing behavior where if the previous alloc was itself rescheduled, its previous nodes are also penalized. This is conservative but the right behavior especially on larger clusters where a group of hosts might be having correlated trouble (like an AZ failure). Co-Authored-By: Michael Schurter <mschurter@hashicorp.com>	2019-12-03 06:14:49 -08:00

1 2 3 4 5 ...

676 Commits