open-nomad

Commit Graph

Author	SHA1	Message	Date
Tim Gross	3ceb5b36b1	csi: allow more than 1 writer claim for multi-writer mode (#9040) Fixes a bug where CSI volumes with the `MULTI_NODE_MULTI_WRITER` access mode were using the same logic as `MULTI_NODE_SINGLE_WRITER` to determine whether the volume had writer claims available for scheduling. Extends CSI claim endpoint test to exercise multi-reader and make sure `WriteFreeClaims` is exercised for multi-writer in feasibility test.	2020-10-07 10:43:23 -04:00
Seth Hoenig	f44a4f68ee	consul/connect: trigger update as necessary on connect changes This PR fixes a long standing bug where submitting jobs with changes to connect services would not trigger updates as expected. Previously, service blocks were not considered as sources of destructive updates since they could be synced with consul non-destructively. With Connect, task group services that have changes to their connect block or to the service port should be destructive, since the network plumbing of the alloc is going to need updating. Fixes #8596 #7991 Non-destructive half in #7192	2020-10-05 14:53:00 -05:00
Neil Mock	f749de8543	Fix multi-interface networking in the system scheduler (#8822)	2020-09-22 12:54:34 -04:00
Mahmood Ali	6a0dd8bc87	Merge pull request #8867 from hashicorp/b-canary-substitution scheduler: Revert requireCanary logic	2020-09-15 12:58:55 -05:00
Mahmood Ali	339617a836	Only ignore rescheduled allocations if they got stopped	2020-09-14 21:11:52 -04:00
Mahmood Ali	98de2d2278	add a test when .NextAllocation is set but alloc is still running	2020-09-14 17:12:53 -04:00
Mahmood Ali	fd54cfce6e	Revert the `requireCanary` check introduced in https://github.com/hashicorp/nomad/pull/8691/files#diff-1801138ac4d10f2064ba6f2e434ac9b4L430-R431 . The change was intended to fix a case where a canary alloc may fail to be rescheduled if all the other allocs fail as well (e.g. if all allocs happen to be placed on a node that died). However, it introduced some unintended side-effects. Reverting the change for now and will investigate further.	2020-09-10 14:59:02 -04:00
Mahmood Ali	c6e1d22697	test for rescheduling non-canaries	2020-09-10 14:59:02 -04:00
Mahmood Ali	8837c9a45d	Handle migration of non-deployment jobs This handles the case where a job went from no-deployment to deployment with canaries. Consider a case where a `max_parallel=0` job is submitted as version 0, then an update is submitted with `max_parallel=1, canary=1` as version 1. In this case, we will have 1 canary alloc, and all remaining allocs will be version 0. Until the deployment is promoted, we ought to replace the canaries with version 0 job (which isn't associated with a deployment).	2020-08-26 10:36:34 -04:00
Mahmood Ali	2438b90334	Update scheduler/reconcile.go Co-authored-by: Chris Baker <1675087+cgbaker@users.noreply.github.com>	2020-08-25 17:37:19 -04:00
Mahmood Ali	38b61b97d8	simplify canary check `(alloc.DeploymentStatus == nil || !alloc.DeploymentStatus.IsCanary())` and `!alloc.DeploymentStatus.IsCanary()` are equivalent.	2020-08-25 17:37:19 -04:00
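The equivalence claimed in 38b61b97d8 holds whenever `IsCanary` tolerates a nil receiver. A minimal sketch, assuming a nil-safe pointer method; `deploymentStatus`, `longForm`, and `shortForm` are simplified stand-ins for illustration, not Nomad's types:

```go
package main

import "fmt"

// deploymentStatus is a simplified, hypothetical stand-in for an alloc's
// deployment status.
type deploymentStatus struct {
	Canary bool
}

// IsCanary is safe to call on a nil receiver: a missing status is treated
// as "not a canary".
func (d *deploymentStatus) IsCanary() bool {
	if d == nil {
		return false
	}
	return d.Canary
}

// longForm spells out the nil check explicitly.
func longForm(d *deploymentStatus) bool { return d == nil || !d.IsCanary() }

// shortForm relies on the nil-safe method alone.
func shortForm(d *deploymentStatus) bool { return !d.IsCanary() }

func main() {
	cases := []*deploymentStatus{nil, {Canary: false}, {Canary: true}}
	for _, d := range cases {
		// Both forms agree for every case, including the nil status.
		fmt.Println(longForm(d), shortForm(d))
	}
}
```

If the method dereferenced its receiver without the nil guard, the short form would panic on a nil status and the explicit check would be required.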
Mahmood Ali	e4bb88dfcf	tweak stack job manipulation To address review comments	2020-08-25 17:37:19 -04:00
Mahmood Ali	def768728e	Have Plan.AppendAlloc accept the job	2020-08-25 17:22:09 -04:00
Mahmood Ali	8a342926b7	Respect alloc job version for lost/failed allocs This change fixes a bug where lost/failed allocations are replaced by allocations with the latest versions, even if the version hasn't been promoted yet. Now, when generating a plan for lost/failed allocations, the scheduler first checks if the current deployment is in Canary stage, and if so, it ensures that any lost/failed allocation is replaced with one running the latest promoted version instead.	2020-08-19 09:52:48 -04:00
Lars Lehtonen	fb7b2282b1	scheduler: label loops with nested switch statements for effective break (#8528)	2020-07-24 08:50:41 -04:00
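For context on fb7b2282b1: in Go, a bare `break` inside a `switch` leaves only the `switch`, not the enclosing `for`. A minimal sketch of the labeled form; `countUntilStop` and the event names are hypothetical and not taken from the scheduler:

```go
package main

import "fmt"

// countUntilStop counts events up to (not including) the first "stop".
func countUntilStop(events []string) int {
	n := 0
loop:
	for _, ev := range events {
		switch ev {
		case "stop":
			// A bare break here would only exit the switch and the loop
			// would keep running; the label exits the loop itself.
			break loop
		default:
			n++
		}
	}
	return n
}

func main() {
	fmt.Println(countUntilStop([]string{"place", "update", "stop", "migrate"}))
}
```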
Tim Gross	1ca2c4ec2c	scheduler: DesiredCanaries can be set on every pass safely The reconcile loop sets `DeploymentState.DesiredCanaries` only on the first pass through the loop and if the job is not paused/pending. In MRD, deployments will make one pass through the loop while "pending", and were not ever getting `DesiredCanaries` set. We can't set it in the initial `DeploymentState` constructor because the first pass through setting up canaries expects it's not there yet. However, this value is static for a given version of a job because it's coming from the update stanza, so it's safe to re-assign the value on subsequent passes.	2020-07-20 11:25:53 -04:00
Tim Gross	d3341a2019	refactor: make it clear where we're accessing dstate The field name `Deployment.TaskGroups` contains a map of `DeploymentState`, which makes it a little harder to follow state updates when combined with inconsistent naming conventions, particularly when we also have the state store or actual `TaskGroup`s in scope. This changeset changes all uses to `dstate` so as not to be confused with actual TaskGroups.	2020-07-20 11:25:53 -04:00
Tim Gross	fe5f5e35aa	mrd: reconcile should treat pending deployments as paused (#8446) If a job update includes a task group that has no changes, those allocations have their version bumped in-place. This ends up triggering an eval from `deploymentwatcher` when it verifies their health. Although this eval is a no-op, we were only treating pending deployments the same as paused when the deployment was a new MRD. This means that any eval after the initial one will kick off the deployment, and that caused pending deployments to "jump the queue" and run ahead of schedule, breaking MRD invariants and resulting in a state with all regions blocked. This behavior can be replicated even in the case of job updates with no in-place updates by patching `deploymentwatcher` to inject a spurious no-op eval. This changeset fixes the behavior by treating pending deployments the same as paused in all cases in the reconciler.	2020-07-16 13:00:08 -04:00
Tim Gross	bd457343de	MRD: all regions should start pending (#8433) Deployments should wait until kicked off by `Job.Register` so that we can assert that all regions have a scheduled deployment before starting any region. This changeset includes the OSS fixes to support the ENT work. `IsMultiregionStarter` has no more callers in OSS, so remove it here.	2020-07-14 10:57:37 -04:00
Nick Ethier	e0fb634309	ar: support opting into binding host ports to default network IP (#8321) * ar: support opting into binding host ports to default network IP * fix config plumbing * plumb node address into network resource * struct: only handle network resource upgrade path once	2020-07-06 18:51:46 -04:00
Tim Gross	31185325c9	reconcile should not overwrite unblocking state (#8349) Pre-0.12.0 beta, a deployment was considered "complete" if it was successful. But with MRD we have "blocked" and "unblocking" states as well. We did not consider the case where a concurrent alloc health status update triggers a `Compute` call on a deployment that's moved from "blocked" to "unblocking" (it's a small window), which caused an extra pass thru the `nextRegion` logic in `deploymentwatcher` and triggered an error when later transitioning to "successful". This changeset makes sure we don't overwrite that status.	2020-07-06 11:31:33 -04:00
Nick Ethier	89118016fc	command: correctly show host IP in ports output /w multi-host networks (#8289)	2020-06-25 15:16:01 -04:00
Nick Ethier	416efd83ee	scheduler: do network feasibility checking for system jobs (#8256)	2020-06-24 16:01:00 -04:00
Mahmood Ali	1c1fb5da0a	this is OSS	2020-06-22 10:28:45 -04:00
Michael Schurter	562704124d	Merge pull request #8208 from hashicorp/f-multi-network multi-interface network support	2020-06-19 15:46:48 -07:00
Tim Gross	d3ecb87984	multiregion: initial deploymentPaused must match start condition (#8215) In #8209 we fixed the `max_parallel` stanza for multiregion by introducing the `IsMultiregionStarter` check, but didn't apply it to the earlier place it's required. The result is that deployments start but don't place allocations.	2020-06-19 13:42:38 -04:00
Tim Gross	b654e1b8a4	multiregion: all regions start in running if no max_parallel (#8209) If `max_parallel` is not set, all regions should begin in a `running` state rather than a `pending` state. Otherwise the first region is set to `running` and then all the remaining regions once it enters `blocked`. That behavior is technically correct in that we have at most `max_parallel` regions running, but definitely not what a user expects.	2020-06-19 11:17:09 -04:00
Nick Ethier	f0559a8162	multi-interface network support	2020-06-19 09:42:10 -04:00
Nick Ethier	1e4ea699ad	fix test failures from rebase	2020-06-18 11:05:32 -07:00
Nick Ethier	4a44deaa5c	CNI Implementation (#7518)	2020-06-18 11:05:29 -07:00
Nick Ethier	0bc0403cc3	Task DNS Options (#7661) Co-Authored-By: Tim Gross <tgross@hashicorp.com> Co-Authored-By: Seth Hoenig <shoenig@hashicorp.com>	2020-06-18 11:01:31 -07:00
Tim Gross	c14a75bfab	multiregion: use pending instead of paused The `paused` state is used as an operator safety mechanism, so that they can debug a deployment or halt one that's causing a wider failure. By using the `paused` state as the first state of a multiregion deployment, we risked resuming an intentionally operator-paused deployment because of activity in a peer region. This changeset replaces the use of the `paused` state with a `pending` state, and provides a `Deployment.Run` internal RPC to replace the use of the `Deployment.Pause` (resume) RPC we were using in `deploymentwatcher`.	2020-06-17 11:06:14 -04:00
Tim Gross	fd50b12ee2	multiregion: integrate with deploymentwatcher * `nextRegion` should take status parameter * thread Deployment/Job RPCs thru `nextRegion` * add `nextRegion` calls to `deploymentwatcher` * use a better description for paused for peer	2020-06-17 11:06:00 -04:00
Tim Gross	5c4d0a73f4	start all but first region deployment in paused state	2020-06-17 11:05:34 -04:00
Tim Gross	473a0f1d44	multiregion: unblock and cancel RPCs	2020-06-17 11:02:26 -04:00
Lang Martin	069840bef8	scheduler/reconcile: set FollowupEvalID on lost stop_after_client_disconnect (#8105) (#8138) * scheduler/reconcile: set FollowupEvalID on lost stop_after_client_disconnect * scheduler/reconcile: thread follupEvalIDs through to results.stop * scheduler/reconcile: comment typo * nomad/_test: correct arguments for plan.AppendStoppedAlloc * scheduler/reconcile: avoid nil, cleanup handleDelayed(Lost|Reschedules)	2020-06-09 17:13:53 -04:00
Lang Martin	ac7c39d3d3	Delayed evaluations for `stop_after_client_disconnect` can cause unwanted extra followup evaluations around job garbage collection (#8099) * client/heartbeatstop: reversed time condition for startup grace * scheduler/generic_sched: use `delayInstead` to avoid a loop Without protecting the loop that creates followUpEvals, a delayed eval is allowed to create an immediate subsequent delayed eval. For both `stop_after_client_disconnect` and the `reschedule` block, a delayed eval should always produce some immediate result (running or blocked) and then only after the outcome of that eval produce a second delayed eval. * scheduler/reconcile: lostLater are different than delayedReschedules Just slightly. `lostLater` allocs should be used to create batched evaluations, but `handleDelayedReschedules` assumes that the allocations are in the untainted set. When it creates the in-place updates to those allocations at the end, it causes the allocation to be treated as running over in the planner, which causes the initial `stop_after_client_disconnect` evaluation to be retried by the worker.	2020-06-03 09:48:38 -04:00
Mahmood Ali	21c948f3d3	keep promotion score constants next to use	2020-05-27 15:13:19 -04:00
Mahmood Ali	d9792777d9	Open source Preemption code Nomad 0.12 OSS is to include preemption feature. This commit moves the private code for managing preemption to OSS repository.	2020-05-27 15:02:01 -04:00
Lang Martin	d3c4700cd3	server: stop after client disconnect (#7939) * jobspec, api: add stop_after_client_disconnect * nomad/state/state_store: error message typo * structs: alloc methods to support stop_after_client_disconnect 1. a global AllocStates to track status changes with timestamps. We need this to track the time at which the alloc became lost originally. 2. ShouldClientStop() and WaitClientStop() to actually do the math * scheduler/reconcile_util: delayByStopAfterClientDisconnect * scheduler/reconcile: use delayByStopAfterClientDisconnect * scheduler/util: updateNonTerminalAllocsToLost comments This was setup to only update allocs to lost if the DesiredStatus had already been set by the scheduler. It seems like the intention was to update the status from any non-terminal state, and not all lost allocs have been marked stop or evict by now * scheduler/testing: AssertEvalStatus just use require * scheduler/generic_sched: don't create a blocked eval if delayed * scheduler/generic_sched_test: several scheduling cases	2020-05-13 16:39:04 -04:00
Mahmood Ali	759eade78b	missed fixing one invocation	2020-05-01 13:38:46 -04:00
Mahmood Ali	b9e3cde865	tests and some clean up	2020-05-01 13:13:30 -04:00
Charlie Voiselle	d8e5e02398	Wiring algorithm to scheduler calls	2020-05-01 13:13:29 -04:00
Michael Schurter	c901d0e7dd	Merge branch 'master' into b-reserved-scoring	2020-04-30 14:48:14 -07:00
Mahmood Ali	9f005201e2	Ensure that alloc updates preserve device offers When an alloc is updated in-place, ensure that the allocated device are preserved and carried over to new alloc.	2020-04-21 08:57:15 -04:00
Mahmood Ali	2ff2745374	test for allocated devices on job in-update update When an alloc is updated in-place, test that the allocated devices are preserved in new alloc struct.	2020-04-21 08:56:05 -04:00
Michael Schurter	4c5a0cae35	core: fix node reservation scoring The BinPackIter accounted for node reservations twice when scoring nodes which could bias scores toward nodes with reservations. Pseudo-code for previous algorithm: ``` proposed = reservedResources + sum(allocsResources) available = nodeResources - reservedResources score = 1 - (proposed / available) ``` The node's reserved resources are added to the total resources used by allocations, and then the node's reserved resources are later subtracted from the node's overall resources. The new algorithm is: ``` proposed = sum(allocResources) available = nodeResources - reservedResources score = 1 - (proposed / available) ``` The node's reserved resources are no longer added to the total resources used by allocations. My guess as to how this bug happened is that the resource utilization variable (`util`) is calculated and returned by the `AllocsFit` function which needs to take reserved resources into account as a basic feasibility check. To avoid re-calculating alloc resource usage (because there may be a large number of allocs), we reused `util` in the `ScoreFit` function. `ScoreFit` properly accounts for reserved resources by subtracting them from the node's overall resources. However since `util` _also_ took reserved resources into account the score would be incorrect. Prior to the fix the added test output: ``` Node: reserved Score: 1.0000 Node: reserved2 Score: 1.0000 Node: no-reserved Score: 0.9741 ``` The scores being 1.0 for both nodes with reserved resources is a good hint something is wrong as they should receive different scores. Upon further inspection the double accounting of reserved resources caused their scores to be >1.0 and clamped. After the fix the added test outputs: ``` Node: no-reserved Score: 0.9741 Node: reserved Score: 0.9480 Node: reserved2 Score: 0.8717 ```	2020-04-15 15:13:30 -07:00
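The two pseudo-code variants in 4c5a0cae35 can be compared side by side. The sketch below transcribes them literally for a single scalar resource; it is not Nomad's `ScoreFit`, and `scoreBefore`, `scoreAfter`, and the sample numbers are made up for illustration:

```go
package main

import "fmt"

// sum adds up the resources used by each allocation.
func sum(xs []float64) float64 {
	total := 0.0
	for _, x := range xs {
		total += x
	}
	return total
}

// scoreBefore follows the "previous algorithm" pseudo-code: the reservation
// is added to the proposed usage and also subtracted from what is available,
// so it is counted twice.
func scoreBefore(node, reserved float64, allocs []float64) float64 {
	proposed := reserved + sum(allocs)
	available := node - reserved
	return 1 - proposed/available
}

// scoreAfter follows the "new algorithm" pseudo-code: the reservation only
// reduces what is available.
func scoreAfter(node, reserved float64, allocs []float64) float64 {
	proposed := sum(allocs)
	available := node - reserved
	return 1 - proposed/available
}

func main() {
	allocs := []float64{100, 300}
	// Hypothetical node: 1000 units total, 200 reserved.
	fmt.Println(scoreBefore(1000, 200, allocs)) // reservation counted twice
	fmt.Println(scoreAfter(1000, 200, allocs))  // reservation counted once
}
```

With these inputs the two formulas differ by exactly `reserved / available`, which is the size of the double-counted term.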
Michael Schurter	4b475db408	core: fix comment on system stack This makes me do a double take every time I run into it, so what if we just changed it?	2020-04-09 15:19:11 -07:00
Tim Gross	161f9aedc3	scheduler: prevent a reported NPE for CSI (#7633)	2020-04-06 09:42:27 -04:00
Lang Martin	e03c328792	csi: use node MaxVolumes during scheduling (#7565) * nomad/state/state_store: CSIVolumesByNodeID ignores namespace * scheduler/scheduler: add CSIVolumesByNodeID to the state interface * scheduler/feasible: check node MaxVolumes * nomad/csi_endpoint: no namespace in CSIVolumesByNodeID anymore * nomad/state/state_store: avoid DenormalizeAllocationSlice * nomad/state/iterator: clean up SliceIterator Next * scheduler/feasible_test: block with MaxVolumes * nomad/state/state_store_test: fix args to CSIVolumesByNodeID	2020-03-31 17:16:47 -04:00
Chris Baker	179ab68258	wip: added job.scale rpc endpoint, needs explicit test (tested via http now)	2020-03-24 13:57:09 +00:00
Mahmood Ali	6ddf3d1742	Merge pull request #7414 from hashicorp/b-network-mode-change Detect network mode change	2020-03-24 09:46:40 -04:00
Lang Martin	d994990ef0	csi: the scheduler allows a job with a volume write claim to be updated (#7438) * nomad/structs/csi: split CanWrite into health, in use * scheduler/scheduler: expose AllocByID in the state interface * nomad/state/state_store_test * scheduler/stack: SetJobID on the matcher * scheduler/feasible: when a volume writer is in use, check if it's us * scheduler/feasible: remove SetJob * nomad/state/state_store: denormalize allocs before Claim * nomad/structs/csi: return errors on claim, with context * nomad/csi_endpoint_test: new alloc doesn't look like an update * nomad/state/state_store_test: change test reference to CanWrite	2020-03-23 21:21:04 -04:00
Tim Gross	d1f43a5fea	csi: improve error messages from scheduler (#7426)	2020-03-23 13:59:25 -04:00
Lang Martin	3621df1dbf	csi: volume ids are only unique per namespace (#7358) * nomad/state/schema: use the namespace compound index * scheduler/scheduler: CSIVolumeByID interface signature namespace * scheduler/stack: SetJob on CSIVolumeChecker to capture namespace * scheduler/feasible: pass the captured namespace to CSIVolumeByID * nomad/state/state_store: use namespace in csi_volume index * nomad/fsm: pass namespace to CSIVolumeDeregister & Claim * nomad/core_sched: pass the namespace in volumeClaimReap * nomad/node_endpoint_test: namespaces in Claim testing * nomad/csi_endpoint: pass RequestNamespace to state.* * nomad/csi_endpoint_test: appropriately failed test * command/alloc_status_test: appropriately failed test * node_endpoint_test: avoid notTheNamespace for the job * scheduler/feasible_test: call SetJob to capture the namespace * nomad/csi_endpoint: ACL check the req namespace, query by namespace * nomad/state/state_store: remove deregister namespace check * nomad/state/state_store: remove unused CSIVolumes * scheduler/feasible: CSIVolumeChecker SetJob -> SetNamespace * nomad/csi_endpoint: ACL check * nomad/state/state_store_test: remove call to state.CSIVolumes * nomad/core_sched_test: job namespace match so claim gc works	2020-03-23 13:59:25 -04:00
Danielle Lancashire	e227f31584	sched/feasible: Return more detailed CSI Failure messages	2020-03-23 13:58:30 -04:00
Danielle Lancashire	a2e01c4369	sched/feasible: Validate CSIVolume's correctly Previously we were looking up plugins based on the Alias Name for a CSI Volume within the context of its task group. Here we first look up a volume based on its identifier and then validate the existence of the plugin based on its `PluginID`.	2020-03-23 13:58:30 -04:00
Danielle Lancashire	e56c677221	sched/feasible: CSI - Filter applicable volumes This commit filters the jobs volumes when setting them on the feasibility checker. This ensures that the rest of the checker does not have to worry about non-csi volumes.	2020-03-23 13:58:30 -04:00
Lang Martin	7b675f89ac	csi: fix index maintenance for CSIVolume and CSIPlugin tables (#7049) * state_store: csi volumes/plugins store the index in the txn * nomad: csi_endpoint_test require index checks need uint64() * nomad: other tests using int 0 not uint64(0) * structs: pass index into New, but not other struct methods * state_store: csi plugin indexes, use new struct interface * nomad: csi_endpoint_test check index/query meta (on explicit 0) * structs: NewCSIVolume takes an index arg now * scheduler/test: NewCSIVolume takes an index arg now	2020-03-23 13:58:29 -04:00
Lang Martin	a0a6766740	CSI: Scheduler knows about CSI constraints and availability (#6995) * structs: piggyback csi volumes on host volumes for job specs * state_store: CSIVolumeByID always includes plugins, matches usecase * scheduler/feasible: csi volume checker * scheduler/stack: add csi volumes * contributing: update rpc checklist * scheduler: add volumes to State interface * scheduler/feasible: introduce new checker collection tgAvailable * scheduler/stack: taskGroupCSIVolumes checker is transient * state_store CSIVolumeDenormalizePlugins comment clarity * structs: remote TODO comment in TaskGroup Validate * scheduler/feasible: CSIVolumeChecker hasPlugins improve comment * scheduler/feasible_test: set t.Parallel * Update nomad/state/state_store.go Co-Authored-By: Danielle <dani@hashicorp.com> * Update scheduler/feasible.go Co-Authored-By: Danielle <dani@hashicorp.com> * structs: lift ControllerRequired to each volume * state_store: store plug.ControllerRequired, use it for volume health * feasible: csi match fast path remove stale host volume copied logic * scheduler/feasible: improve comments Co-authored-by: Danielle <dani@builds.terrible.systems>	2020-03-23 13:58:29 -04:00
Jasmine Dahilig	81d051d7e8	fix bug in lifecycle scheduler test mocks	2020-03-21 17:52:51 -04:00
Jasmine Dahilig	0cc9212a54	add test cases for scheduler alloc placement with lifecycle resources	2020-03-21 17:52:47 -04:00
Jasmine Dahilig	3e4e8f2b02	add allocfit test for lifecycles	2020-03-21 17:52:46 -04:00
Mahmood Ali	b880607bad	update scheduler to account for hooks	2020-03-21 17:52:45 -04:00
Mahmood Ali	9568553d7e	Detect network mode change Mark job as updated if network mode changed.	2020-03-21 16:51:10 -04:00
Drew Bailey	6bd6c6638c	include pro tag in several oss.go files	2020-02-10 15:56:14 -05:00
Drew Bailey	9a65556211	add state store test to ensure PlacedCanaries is updated	2020-02-03 13:58:01 -05:00
Drew Bailey	f51a3d1f37	nomad state store must be modified through raft, rm local state change	2020-02-03 13:57:34 -05:00
Drew Bailey	1c046a74d8	comment for filtering reason	2020-02-03 09:02:09 -05:00
Drew Bailey	e71f132455	add test for node eligibility	2020-02-03 09:02:09 -05:00
Drew Bailey	6b492630dd	make diffSystemAllocsForNode aware of eligibility diffSystemAllocs -> diffSystemAllocsForNode, this function is only used for diffing system allocations, but lacked awareness of eligible nodes and the node ID that the allocation was going to be placed. This change now ignores a change if its existing allocation is on an ineligible node. For a new allocation, it also checks tainted and ineligible nodes in the same function instead of nil-ing out the diff after computation in diffSystemAllocs	2020-02-03 09:02:08 -05:00
Drew Bailey	e613a258da	ignore computed diffs if node is ineligible test flakey, add temp sleeps for debugging fix computed class	2020-02-03 09:02:08 -05:00
Drew Bailey	63ddda71e1	Return FailedTGAlloc metric instead of no node err If an existing system allocation is running and the node it's running on is marked as ineligible, subsequent plan/applys return an RPC error instead of a more helpful plan result. This change logs the error, and appends a failedTGAlloc for the placement.	2020-01-22 10:07:15 -05:00
Drew Bailey	ef175c0b31	Update Evicted allocations to lost when lost If an alloc is being preempted and marked as evict, but the underlying node is lost before the migration takes place, the allocation currently stays as desired evict, status running forever, or until the node comes back online. This commit updates updateNonTerminalAllocsToLost to check for a desired status of Evict as well as Stop when updating allocations on tainted nodes. switch to table test for lost node cases	2020-01-07 13:34:18 -05:00
Preetha Appan	afff27b69b	More error->debug for logging in the bin packing iterator	2019-12-12 15:50:16 -06:00
Preetha Appan	3458b41290	Use debug logging for scheduler internals We currently log an error if preemption is unable to find a suitable set of allocations to preempt. This commit changes that to debug level since not finding preemptable allocations is not an error condition.	2019-12-12 12:05:29 -06:00
Michael Schurter	7655e0cee4	Merge pull request #6792 from hashicorp/b-propose-panic scheduler: fix panic when preempting and evicting allocs	2019-12-03 10:40:19 -08:00
Tim Gross	c50057bf1f	scheduler: fix job update placement on prev node penalized (#6781) Fixes #5856 When the scheduler looks for a placement for an allocation that's replacing another allocation, it's supposed to penalize the previous node if the allocation had been rescheduled or failed. But we're currently always penalizing the node, which leads to unnecessary migrations on job update. This commit leaves in place the existing behavior where if the previous alloc was itself rescheduled, its previous nodes are also penalized. This is conservative but the right behavior especially on larger clusters where a group of hosts might be having correlated trouble (like an AZ failure). Co-Authored-By: Michael Schurter <mschurter@hashicorp.com>	2019-12-03 06:14:49 -08:00
Michael Schurter	0374069f82	scheduler: update tests with modern error helper	2019-12-02 20:25:52 -08:00
Michael Schurter	19a2ee71d3	scheduler: fix panic when preempting and evicting Fixes #6787 In ProposedAllocs the proposed alloc slice was being copied while its contents were not. Since RemoveAllocs nils elements of the proposed alloc slice and is called twice, it could panic on the second call when erroneously accessing a nil'd alloc. The fix is to not copy the proposed alloc slice and pass the slice returned by the 1st RemoveAllocs call to the 2nd call, thus maintaining the trimmed length.	2019-12-02 20:22:22 -08:00
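The hazard described in 19a2ee71d3 comes from an in-place removal that nils out the tail of the slice it is given. A minimal sketch, assuming a swap-with-tail removal; `alloc` and `removeAllocs` here are simplified, hypothetical stand-ins modeled on the commit description rather than the actual Nomad code:

```go
package main

import "fmt"

// alloc is a hypothetical, stripped-down allocation.
type alloc struct{ ID string }

// removeAllocs drops every alloc whose ID is in remove. It works in place:
// a removed slot is filled from the tail, the vacated tail slot is set to
// nil, and the shortened slice is returned.
func removeAllocs(allocs []*alloc, remove map[string]bool) []*alloc {
	n := len(allocs)
	for i := 0; i < n; i++ {
		if remove[allocs[i].ID] {
			allocs[i], allocs[n-1] = allocs[n-1], nil
			i-- // re-check the element that was swapped into slot i
			n--
		}
	}
	return allocs[:n]
}

func main() {
	proposed := []*alloc{{ID: "a"}, {ID: "b"}, {ID: "c"}}

	trimmed := removeAllocs(proposed, map[string]bool{"b": true})
	// proposed still has its original length, but its last slot is now nil.
	// Running removeAllocs over proposed again would dereference that nil.
	fmt.Println(len(proposed), proposed[2] == nil, len(trimmed))

	// Passing the returned slice to the second call keeps the trimmed length.
	final := removeAllocs(trimmed, map[string]bool{"a": true})
	fmt.Println(len(final), final[0].ID)
}
```

Chaining the return value, as the commit message describes, means the second pass never sees the nil'd slots.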
Michael Schurter	6f64e52d61	Merge pull request #6699 from hashicorp/f-semver-constraints Add new "semver" constraint	2019-11-19 12:18:43 -08:00
Drew Bailey	876618b5d2	Removes checking constraints for inplace update	2019-11-19 13:34:41 -05:00
Michael Schurter	796758b8a5	core: add semver constraint The existing version constraint uses logic optimized for package managers, not schedulers, when checking prereleases: - 1.3.0-beta1 will not satisfy ">= 0.6.1" - 1.7.0-rc1 will not satisfy ">= 1.6.0-beta1" This is due to package managers wishing to favor final releases over prereleases. In a scheduler versions more often represent the earliest release all required features/APIs are available in a system. Whether the constraint or the version being evaluated are prereleases has no impact on ordering. This commit adds a new constraint - `semver` - which will use Semver v2.0 ordering when evaluating constraints. Given the above examples: - 1.3.0-beta1 satisfies ">= 0.6.1" using `semver` - 1.7.0-rc1 satisfies ">= 1.6.0-beta1" using `semver` Since existing jobspecs may rely on the old behavior, a new constraint was added and the implicit Consul Connect and Vault constraints were updated to use it.	2019-11-19 08:40:19 -08:00
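The ordering that 796758b8a5 relies on can be sketched with a small comparison following Semantic Versioning 2.0.0 precedence. This is an illustrative, standard-library-only sketch, not the library Nomad uses; `version`, `parse`, `compare`, and `atLeast` are hypothetical names, and build metadata and strict input validation are left out:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// version holds MAJOR.MINOR.PATCH plus optional prerelease identifiers.
type version struct {
	core [3]int
	pre  []string
}

// parse accepts "MAJOR.MINOR.PATCH" or "MAJOR.MINOR.PATCH-prerelease".
func parse(s string) (version, error) {
	var v version
	core := s
	if i := strings.IndexByte(s, '-'); i >= 0 {
		core = s[:i]
		v.pre = strings.Split(s[i+1:], ".")
	}
	parts := strings.Split(core, ".")
	if len(parts) != 3 {
		return v, fmt.Errorf("want MAJOR.MINOR.PATCH, got %q", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return v, err
		}
		v.core[i] = n
	}
	return v, nil
}

func sign(n int) int {
	switch {
	case n < 0:
		return -1
	case n > 0:
		return 1
	}
	return 0
}

// comparePre orders one pair of prerelease identifiers: numeric identifiers
// compare numerically and sort before alphanumeric ones.
func comparePre(x, y string) int {
	xn, xerr := strconv.Atoi(x)
	yn, yerr := strconv.Atoi(y)
	switch {
	case xerr == nil && yerr == nil:
		return sign(xn - yn)
	case xerr == nil:
		return -1
	case yerr == nil:
		return 1
	}
	return strings.Compare(x, y)
}

// compare returns -1, 0, or 1. The core version is compared first, so a
// prerelease of a higher version still outranks a lower final release.
func compare(a, b version) int {
	for i := range a.core {
		if a.core[i] != b.core[i] {
			return sign(a.core[i] - b.core[i])
		}
	}
	switch {
	case len(a.pre) == 0 && len(b.pre) == 0:
		return 0
	case len(a.pre) == 0:
		return 1 // a final release outranks its own prereleases
	case len(b.pre) == 0:
		return -1
	}
	for i := 0; i < len(a.pre) && i < len(b.pre); i++ {
		if c := comparePre(a.pre[i], b.pre[i]); c != 0 {
			return c
		}
	}
	return sign(len(a.pre) - len(b.pre))
}

// atLeast reports whether v >= min under the ordering above.
func atLeast(v, min string) (bool, error) {
	a, err := parse(v)
	if err != nil {
		return false, err
	}
	b, err := parse(min)
	if err != nil {
		return false, err
	}
	return compare(a, b) >= 0, nil
}

func main() {
	fmt.Println(atLeast("1.3.0-beta1", "0.6.1"))
	fmt.Println(atLeast("1.7.0-rc1", "1.6.0-beta1"))
}
```

Both examples from the commit message are satisfied under this ordering, while a prerelease such as `1.0.0-rc1` still sorts below `1.0.0` itself.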
Drew Bailey	e44a66d7fc	DOCS: Spread stanza does not exist on task Fixes documentation inaccuracy for spread stanza placement. Spreads can only exist on the top level job struct or within a group. comment about nil assumption	2019-11-19 08:26:36 -05:00
Drew Bailey	07e3164bf9	Check for changes to affinity and constraints Adds checks for affinity and constraint changes when determining if we should update inplace. refactor to check all levels at once check for spread changes when checking inplace update	2019-11-19 08:26:34 -05:00
Chris Baker	e0105f817a	changed all tests to require from t.Fatalf	2019-11-07 22:39:47 +00:00
Chris Baker	95ae01a9f4	the scheduler checks whether task changes require a restart, this needed to be updated to consider devices	2019-11-07 17:51:15 +00:00
Michael Schurter	c6bbe85f42	core: fix panic when AllocatedResources is nil Fix for #6540	2019-10-28 14:38:21 -07:00
Danielle Lancashire	78b61de45f	config: Hoist volume.config.source into volume Currently, using a Volume in a job uses the following configuration: ``` volume "alias-name" { type = "volume-type" read_only = true config { source = "host_volume_name" } } ``` This commit migrates to the following: ``` volume "alias-name" { type = "volume-type" source = "host_volume_name" read_only = true } ``` The original design stemmed from uncertainty about the future of storage plugins and a wish to allow maximum flexibility. However, this causes a few issues, namely: - We frequently need to parse this configuration during submission, scheduling, and mounting - It complicates the configuration from an end user's perspective - It complicates the ability to do validation As we understand the problem space of CSI a little more, it has become clear that we won't need the `source` to be in config, as it will be used in the majority of cases: - Host Volumes: Always need a source - Preallocated CSI Volumes: Always needs a source from a volume or claim name - Dynamic Persistent CSI Volumes: Always needs a source to attach the volumes to for managing upgrades and to avoid dangling. - Dynamic Ephemeral CSI Volumes: Less thought out, but `source` will probably point to the plugin name, and a `config` block will allow you to pass meta to the plugin. Or will point to a pre-configured ephemeral config. *If implemented The new design simplifies this by merging the source into the volume stanza to solve the above issues with usability, performance, and error handling.	2019-09-13 04:37:59 +02:00
Preetha Appan	9accf60805	update comment	2019-09-05 18:43:30 -05:00
Preetha Appan	d21c708c4a	Fix inplace updates bug with group level networks During inplace updates, we should be using network information from the previous allocation being updated.	2019-09-05 18:37:24 -05:00
Jasmine Dahilig	4edebe389a	add default update stanza and max_parallel=0 disables deployments (#6191)	2019-09-02 10:30:09 -07:00
Mahmood Ali	3a1cb51539	schedulers: check all drivers on node When checking driver feasibility for an alloc with multiple drivers, we must check that all drivers are detected and healthy. Nomad 0.9 and 0.8 have a bug where we may check a single driver only, but which driver is dependent on map traversal order, which is unspecified in golang spec.	2019-08-29 09:03:31 -04:00
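The point of 3a1cb51539 is that a decision made inside a map iteration must cover every entry, because Go does not specify the order in which map entries are visited. A minimal sketch of an all-drivers check; `driversFeasible` and the driver names are hypothetical and do not mirror the actual checker:

```go
package main

import "fmt"

// driversFeasible reports whether every required driver is detected and
// healthy on a node. It returns early only on a failure, so the result does
// not depend on which map entry happens to be visited first.
func driversFeasible(required map[string]struct{}, healthy map[string]bool) bool {
	for name := range required {
		if !healthy[name] {
			return false
		}
	}
	return true
}

func main() {
	required := map[string]struct{}{"docker": {}, "exec": {}}

	// Only one of the two required drivers is healthy on this node.
	fmt.Println(driversFeasible(required, map[string]bool{"docker": true}))

	// Both drivers are healthy.
	fmt.Println(driversFeasible(required, map[string]bool{"docker": true, "exec": true}))
}
```

A version that returned the health of the first driver it encountered would accept or reject the same node depending on iteration order.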
Mahmood Ali	3da10b5cb3	scheduler: tests for multiple drivers in TG	2019-08-29 09:03:31 -04:00
Danielle Lancashire	3a5e48ad18	scheduler: Implicit constraint on readonly hostvol When a Client declares a volume is ReadOnly, we should only schedule it for requests for ReadOnly volumes. This change means that if a host exposes a readonly volume, we then validate that the group level requests for the volume are all read only for that host.	2019-08-21 20:57:05 +02:00
Danielle Lancashire	e132a30899	structs: Unify Volume and VolumeRequest	2019-08-12 15:39:08 +02:00
Danielle	fc53283489	Update scheduler/feasible.go Co-Authored-By: Mahmood Ali <mahmood@hashicorp.com>	2019-08-12 15:39:08 +02:00
Danielle Lancashire	073836ec67	scheduler: Add a feasibility checker for Host Vols	2019-08-12 15:39:08 +02:00
Preetha Appan	e6a496bac0	Code review feedback	2019-07-31 01:04:08 -04:00
Preetha Appan	99eca85206	Scheduler changes to support network at task group level Also includes unit tests for binpacker and preemption. The tests verify that network resources specified at the task group level are properly accounted for	2019-07-31 01:04:08 -04:00
Nick Ethier	7c9520b404	scheduler: fix disk constraints	2019-07-31 01:04:08 -04:00
Nick Ethier	09a4cfd8d7	fix failing tests	2019-07-31 01:04:07 -04:00
Nick Ethier	af66a35924	networking: Add new bridge networking mode implementation	2019-07-31 01:04:06 -04:00
Nick Ethier	15989bba8e	ar: cleanup lint errors	2019-07-31 01:03:18 -04:00
Nick Ethier	66c514a388	Add network lifecycle management Adds new Prerun and Postrun hooks to manage set up of network namespaces on linux. Work still needs to be done to make the code platform agnostic and support Docker style network initialization.	2019-07-31 01:03:17 -04:00
Lang Martin	8157a7b6f8	system_sched submits failed evals as blocked	2019-07-18 10:32:12 -04:00
Preetha Appan	3484f18984	Fix more tests	2019-06-26 16:30:53 -05:00
Preetha Appan	10e7d6df6d	Remove compat code associated with many previous versions of nomad This removes compat code for namespaces (0.7), Drain(0.8) and other older features from releases older than Nomad 0.7	2019-06-25 19:05:25 -05:00
Mahmood Ali	8d4f914be9	Merge pull request #5790 from hashicorp/b-reschedule-desired-state Mark rescheduled allocs as stopped.	2019-06-13 17:28:59 -04:00
Mahmood Ali	5e6327b6a1	Test behavior no reschedule for service/batch jobs	2019-06-13 16:41:19 -04:00
Mahmood Ali	faf643a375	Don't stop rescheduleLater allocations When an alloc is due to be rescheduleLater, it goes through the reconciler twice: once to be ignored with a follow up eval, and once again when processing the follow up eval where it appears as rescheduleNow. Here, we ignore them in the first run and mark them as stopped in the second iteration, rather than stop them twice.	2019-06-13 09:44:41 -04:00
Mahmood Ali	5dc404ecab	Only preempt for network when there is a network When examining preemption for networks, only consider allocs that have networks. Fixes https://github.com/hashicorp/nomad/issues/5793	2019-06-07 18:55:55 -04:00
Mahmood Ali	98575f5788	test: add tests for network devices and preemption	2019-06-07 18:55:02 -04:00
Mahmood Ali	fd8fb8c22b	Stop allocs to be rescheduled Currently, when an alloc fails and is rescheduled, the alloc desired state remains as "run" and the nomad client may not free the resources. Here, we ensure that an alloc is marked as stopped when it's rescheduled. Notice the Desired Status and Description before and after this change: Before: ``` mars-2:nomad notnoop$ nomad alloc status 02aba49e ID = 02aba49e Eval ID = bb9ed1d2 Name = example-reschedule.nodes[0] Node ID = 5853d547 Node Name = mars-2.local Job ID = example-reschedule Job Version = 0 Client Status = failed Client Description = Failed tasks Desired Status = run Desired Description = <none> Created = 10s ago Modified = 5s ago Replacement Alloc ID = d6bf872b Task "payload" is "dead" Task Resources CPU Memory Disk Addresses 0/100 MHz 24 MiB/300 MiB 300 MiB Task Events: Started At = 2019-06-06T21:12:45Z Finished At = 2019-06-06T21:12:50Z Total Restarts = 0 Last Restart = N/A Recent Events: Time Type Description 2019-06-06T17:12:50-04:00 Not Restarting Policy allows no restarts 2019-06-06T17:12:50-04:00 Terminated Exit Code: 1 2019-06-06T17:12:45-04:00 Started Task started by client 2019-06-06T17:12:45-04:00 Task Setup Building Task Directory 2019-06-06T17:12:45-04:00 Received Task received by client ``` After: ``` ID = 5001ccd1 Eval ID = 53507a02 Name = example-reschedule.nodes[0] Node ID = a3b04364 Node Name = mars-2.local Job ID = example-reschedule Job Version = 0 Client Status = failed Client Description = Failed tasks Desired Status = stop Desired Description = alloc was rescheduled because it failed Created = 13s ago Modified = 3s ago Replacement Alloc ID = 7ba7ac20 Task "payload" is "dead" Task Resources CPU Memory Disk Addresses 21/100 MHz 24 MiB/300 MiB 300 MiB Task Events: Started At = 2019-06-06T21:22:50Z Finished At = 2019-06-06T21:22:55Z Total Restarts = 0 Last Restart = N/A Recent Events: Time Type Description 2019-06-06T17:22:55-04:00 Not Restarting Policy allows no restarts 
2019-06-06T17:22:55-04:00 Terminated Exit Code: 1 2019-06-06T17:22:50-04:00 Started Task started by client 2019-06-06T17:22:50-04:00 Task Setup Building Task Directory 2019-06-06T17:22:50-04:00 Received Task received by client ```	2019-06-06 17:27:12 -04:00
Mahmood Ali	3eda42d027	tests: Migrated allocs aren't lost Fix `TestServiceSched_NodeDown` for checking that the migrated allocs are actually marked to be stopped. The boolean logic in test made it skip actually checking client status as long as desired status was stop. Here, we mark some jobs for migration while leaving others as running, and we check that lost flag is only set for non-migrated allocs.	2019-06-06 16:05:07 -04:00
Lang Martin	34230577df	describe a pending deployment with auto_promote accurately	2019-05-22 12:32:08 -04:00
Lang Martin	d462639cc9	sched reconcile copy AutoPromote to DeploymentState	2019-05-22 12:32:08 -04:00
Preetha Appan	374eee421f	Fix comment and assert score in test case	2019-05-15 12:35:57 -05:00
Nick Ethier	f0b9f8e37a	fix missing brace	2019-05-15 13:02:04 -04:00
Nick Ethier	0d851b5d11	scheduler: add check to prohibit returning inf during spread boost calculation	2019-05-15 13:00:24 -04:00
Lang Martin	29ea112586	system_sched & test cleanup comments	2019-05-01 12:25:26 -04:00
Lang Martin	c490dacf76	system_sched_test extend the test to check ineligible nodes	2019-05-01 12:25:26 -04:00
Lang Martin	c43bcbd35e	system_sched when a node is filtered, don't mark failure	2019-05-01 12:25:26 -04:00
Lang Martin	aecec5df1b	system_sched_test create partially constrained job	2019-05-01 12:25:26 -04:00
Arshneet Singh	d4e7a5c005	Add comments to functions, and use require instead of assert	2019-04-23 09:57:21 -07:00
Arshneet Singh	4cf4324b8f	Remove allowPlanOptimization from schedulers	2019-04-23 09:18:02 -07:00
Arshneet Singh	0dd4c109e8	Compat tags	2019-04-23 09:18:01 -07:00
Arshneet Singh	65f5fab131	Add tests for plan normalization	2019-04-23 09:18:01 -07:00
Arshneet Singh	b977748a4b	Add code for plan normalization	2019-04-23 09:18:01 -07:00
Danielle	198a838b61	Merge pull request #5512 from hashicorp/dani/f-alloc-stop alloc-lifecycle: nomad alloc stop	2019-04-23 13:05:08 +02:00
Danielle Lancashire	832f607433	allocs: Add nomad alloc stop This adds a `nomad alloc stop` command that can be used to stop and force migrate an allocation to a different node. This is built on top of the AllocUpdateDesiredTransitionRequest and explicitly limits the scope of access to that transition to expose it under the alloc-lifecycle ACL. The API returns the follow up eval that can be used as part of monitoring in the CLI or parsed and used in an external tool.	2019-04-23 12:50:23 +02:00
Preetha Appan	bcb5c8c70d	remove stray new line	2019-04-12 10:32:48 -05:00
Preetha Appan	8ddc076c1d	Refactor scheduler package to enable preemption for batch/service jobs	2019-04-10 20:24:01 -05:00
James Rasell	9470507cf4	Add NodeName to the alloc/job status outputs. Currently when operators need to log onto a machine where an alloc is running they will need to perform both an alloc/job status call and then a call to discover the node name from the node list. This updates both the job status and alloc status output to include the node name within the information to make operator use easier. Closes #2359 Closes #1180	2019-04-10 10:34:10 -05:00
Preetha Appan	da1ce9bcea	Fix bug where scoring metadata would be overridden during an inplace upgrade.	2019-03-12 23:36:46 -05:00
Alex Dadgar	41265d4d61	Change types of weights on spread/affinity	2019-01-30 12:20:38 -08:00
Nick Ethier	24cbf42798	scheduler: fix NPE when deployment is nil, but placement is a canary	2019-01-28 20:22:59 -06:00
Alex Dadgar	5198ff05c3	convert driver to device for device constraint/attributes	2019-01-23 10:58:45 -08:00
Alex Dadgar	4bdccab550	goimports	2019-01-22 15:44:31 -08:00
Preetha Appan	3b054d6135	Remove unnecessary usage of alloc.Resource	2019-01-10 16:36:47 -06:00
Mahmood Ali	0dfa93a3c1	appease linter	2019-01-08 10:58:49 -05:00
Alex Dadgar	8a35d7b1dd	Test recovery	2019-01-07 14:49:41 -08:00
Preetha	f406e66ab8	Merge pull request #4881 from hashicorp/f-device-preemption Device preemption	2018-12-11 18:34:19 -06:00
Preetha Appan	977a4a540d	Early continue after meeting needed count Also adds another optimization that filters out un-needed allocations as a final filtering step	2018-12-11 10:12:18 -06:00
Preetha Appan	f60c52c8ba	Score combinations of allocs from multiple devices for preemption	2018-12-07 18:35:47 -06:00
Alex Dadgar	1e3c3cb287	Deprecate IOPS IOPS have been modelled as a resource since Nomad 0.1 but has never actually been detected and there is no plan in the short term to add detection. This is because IOPS is a bit simplistic of a unit to define the performance requirements from the underlying storage system. In its current state it adds unnecessary confusion and can be removed without impacting any users. This PR leaves IOPS defined at the jobspec parsing level and in the api/ resources since these are the two public uses of the field. These should be considered deprecated and only exist to allow users to stop using them during the Nomad 0.9.x release. In the future, there should be no expectation that the field will exist.	2018-12-06 15:09:26 -08:00
Preetha Appan	63681fac0c	use structured logging everywhere consistently	2018-12-03 08:31:41 -06:00
Preetha Appan	766820def3	addresses some code clarity review comments	2018-11-27 11:02:06 -06:00
Mahmood Ali	96ffe044e7	Simplify map count update logic Co-Authored-By: preetapan <preetha@hashicorp.com>	2018-11-27 10:03:11 -06:00
Mahmood Ali	57b94c2d50	code review suggestion Co-Authored-By: preetapan <preetha@hashicorp.com>	2018-11-27 09:59:57 -06:00

804 Commits