open-nomad

Author	SHA1	Message	Date
Tim Gross	c50057bf1f	scheduler: fix job update placement on prev node penalized (#6781). Fixes #5856. When the scheduler looks for a placement for an allocation that's replacing another allocation, it's supposed to penalize the previous node if the allocation had been rescheduled or failed. But we're currently always penalizing the node, which leads to unnecessary migrations on job update. This commit leaves in place the existing behavior where if the previous alloc was itself rescheduled, its previous nodes are also penalized. This is conservative but the right behavior, especially on larger clusters where a group of hosts might be having correlated trouble (like an AZ failure). Co-Authored-By: Michael Schurter <mschurter@hashicorp.com>	2019-12-03 06:14:49 -08:00
Preetha Appan	d21c708c4a	Fix inplace updates bug with group level networks During inplace updates, we should be using network information from the previous allocation being updated.	2019-09-05 18:37:24 -05:00
Nick Ethier	7c9520b404	scheduler: fix disk constraints	2019-07-31 01:04:08 -04:00
Nick Ethier	09a4cfd8d7	fix failing tests	2019-07-31 01:04:07 -04:00
Nick Ethier	af66a35924	networking: Add new bridge networking mode implementation	2019-07-31 01:04:06 -04:00
Lang Martin	8157a7b6f8	system_sched submits failed evals as blocked	2019-07-18 10:32:12 -04:00
Mahmood Ali	fd8fb8c22b	Stop allocs to be rescheduled	2019-06-06 17:27:12 -04:00

Currently, when an alloc fails and is rescheduled, the alloc desired state remains as "run" and the nomad client may not free the resources.

Here, we ensure that an alloc is marked as stopped when it's rescheduled.

Notice the Desired Status and Description before and after this change:

Before:
```
mars-2:nomad notnoop$ nomad alloc status 02aba49e
ID                   = 02aba49e
Eval ID              = bb9ed1d2
Name                 = example-reschedule.nodes[0]
Node ID              = 5853d547
Node Name            = mars-2.local
Job ID               = example-reschedule
Job Version          = 0
Client Status        = failed
Client Description   = Failed tasks
Desired Status       = run
Desired Description  = <none>
Created              = 10s ago
Modified             = 5s ago
Replacement Alloc ID = d6bf872b

Task "payload" is "dead"
Task Resources
CPU        Memory          Disk     Addresses
0/100 MHz  24 MiB/300 MiB  300 MiB

Task Events:
Started At     = 2019-06-06T21:12:45Z
Finished At    = 2019-06-06T21:12:50Z
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                       Type            Description
2019-06-06T17:12:50-04:00  Not Restarting  Policy allows no restarts
2019-06-06T17:12:50-04:00  Terminated      Exit Code: 1
2019-06-06T17:12:45-04:00  Started         Task started by client
2019-06-06T17:12:45-04:00  Task Setup      Building Task Directory
2019-06-06T17:12:45-04:00  Received        Task received by client
```

After:
```
ID                   = 5001ccd1
Eval ID              = 53507a02
Name                 = example-reschedule.nodes[0]
Node ID              = a3b04364
Node Name            = mars-2.local
Job ID               = example-reschedule
Job Version          = 0
Client Status        = failed
Client Description   = Failed tasks
Desired Status       = stop
Desired Description  = alloc was rescheduled because it failed
Created              = 13s ago
Modified             = 3s ago
Replacement Alloc ID = 7ba7ac20

Task "payload" is "dead"
Task Resources
CPU         Memory          Disk     Addresses
21/100 MHz  24 MiB/300 MiB  300 MiB

Task Events:
Started At     = 2019-06-06T21:22:50Z
Finished At    = 2019-06-06T21:22:55Z
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                       Type            Description
2019-06-06T17:22:55-04:00  Not Restarting  Policy allows no restarts
2019-06-06T17:22:55-04:00  Terminated      Exit Code: 1
2019-06-06T17:22:50-04:00  Started         Task started by client
2019-06-06T17:22:50-04:00  Task Setup      Building Task Directory
2019-06-06T17:22:50-04:00  Received        Task received by client
```
Arshneet Singh	4cf4324b8f	Remove allowPlanOptimization from schedulers	2019-04-23 09:18:02 -07:00
Arshneet Singh	b977748a4b	Add code for plan normalization	2019-04-23 09:18:01 -07:00
Danielle	198a838b61	Merge pull request #5512 from hashicorp/dani/f-alloc-stop alloc-lifecycle: nomad alloc stop	2019-04-23 13:05:08 +02:00
Danielle Lancashire	832f607433	allocs: Add nomad alloc stop This adds a `nomad alloc stop` command that can be used to stop and force migrate an allocation to a different node. This is built on top of the AllocUpdateDesiredTransitionRequest and explicitly limits the scope of access to that transition to expose it under the alloc-lifecycle ACL. The API returns the follow up eval that can be used as part of monitoring in the CLI or parsed and used in an external tool.	2019-04-23 12:50:23 +02:00
Preetha Appan	bcb5c8c70d	remove stray new line	2019-04-12 10:32:48 -05:00
Preetha Appan	8ddc076c1d	Refactor scheduler package to enable preemption for batch/service jobs	2019-04-10 20:24:01 -05:00
James Rasell	9470507cf4	Add NodeName to the alloc/job status outputs. Currently when operators need to log onto a machine where an alloc is running they will need to perform both an alloc/job status call and then a call to discover the node name from the node list. This updates both the job status and alloc status output to include the node name within the information to make operator use easier. Closes #2359 Closes #1180	2019-04-10 10:34:10 -05:00
Nick Ethier	24cbf42798	scheduler: fix NPE when deployment is nil, but placement is a canary	2019-01-28 20:22:59 -06:00
Alex Dadgar	4bdccab550	goimports	2019-01-22 15:44:31 -08:00
Preetha Appan	0494a098ce	More style and readability fixes from review	2018-10-30 11:06:32 -05:00
Preetha Appan	cc295b90de	Implement preemption for system jobs. This commit implements an allocation selection algorithm for finding allocations to preempt. It currently special cases network resource asks from others (cpu/memory/disk/iops).	2018-10-30 11:06:32 -05:00
Alex Dadgar	a78cefec18	use int64	2018-10-16 15:34:32 -07:00
Preetha Appan	7c0d8c646c	Change CPU/Disk/MemoryMB to int everywhere in new resource structs	2018-10-16 16:21:42 -05:00
Alex Dadgar	bac5cb1e8b	Scheduler uses allocated resources	2018-10-02 17:08:25 -07:00
Preetha Appan	a10118c461	Add failed follow up to the list of allowed eval trigger reasons needs unit test	2018-09-25 10:49:55 -07:00
Alex Dadgar	6a21f9fe96	Unique TriggerBy for blocked evals Give blocked evals a unique triggerby reason to make debugging a chain of evaluations easier.	2018-09-24 14:47:49 -07:00
Alex Dadgar	3c19d01d7a	server	2018-09-15 16:23:13 -07:00
Preetha Appan	6ed527c636	Use heap to store top K scoring nodes. Scoring metadata is now aggregated by scorer type to make it easier to parse when reading it in the CLI.	2018-09-04 16:10:11 -05:00
Alex Dadgar	e1c239daae	Merge pull request #4414 from hashicorp/b-stop-summary Reset Queued allocs to zero when job stopped	2018-07-16 14:32:55 -07:00
Nick Ethier	6b6777359b	scheduler: fix missing err assignment	2018-07-11 14:27:10 -04:00
Nick Ethier	5f6def5b04	scheduler: better error handling	2018-07-05 11:00:03 -04:00
Nick Ethier	030e650e78	scheduler: fix nil pointer exception	2018-07-02 16:05:38 -04:00
Alex Dadgar	c3c79c408e	Reset Queued allocs to zero when job stopped When a job is stopped but not purged, we should set the Queued count to be zero.	2018-06-13 10:46:39 -07:00
Alex Dadgar	f95ab4ade8	Mark canaries on creation, and unmark on promotion	2018-05-07 14:50:01 -05:00
Preetha Appan	a569d34f25	Add custom status description for rescheduling follow up evals, and make unit test robust	2018-04-10 15:30:15 -05:00
Alex Dadgar	e5b5803265	Only mark allocs as part of deployment if deployment is active	2018-04-05 15:40:49 -07:00
Preetha Appan	00537c739b	Fixes edge cases around timing and task finish time being set more than once	2018-04-03 16:34:59 -05:00
Alex Dadgar	9d60e2cebf	Correct status desc on draining system allocs	2018-03-26 17:54:46 -07:00
Alex Dadgar	e106da84de	name and test	2018-03-26 11:06:21 -07:00
Alex Dadgar	e2a6e64fca	Don't create unnecessary deployments	2018-03-23 16:55:21 -07:00
Alex Dadgar	92b636dd32	Fix deadline handling	2018-03-21 16:51:44 -07:00
Michael Schurter	c0542474db	drain: initial drainv2 structs and impl	2018-03-21 16:49:48 -07:00
Preetha Appan	3e96c6c4e0	Address more code review feedback	2018-03-14 16:10:32 -05:00
Preetha Appan	9fed0d2103	Get reschedule policy from the alloc directly	2018-03-14 16:10:32 -05:00
Preetha Appan	e89bbf7289	Update comment about WaitTime	2018-03-14 16:10:32 -05:00
Preetha Appan	47e0280d96	More small review feedback	2018-03-14 16:10:32 -05:00
Preetha Appan	5373ade731	Scheduler and Reconciler changes to support delayed rescheduling	2018-03-14 16:10:32 -05:00
Preetha Appan	2ed4de7e7b	Track previous node id correctly, plus unit test	2018-01-31 09:58:05 -06:00
Preetha Appan	ea4a889e28	Address more code review feedback	2018-01-31 09:56:53 -06:00
Preetha Appan	bd89d2b39e	Make sure that reschedule trackers are not added for node drain replacements	2018-01-31 09:56:53 -06:00
Preetha Appan	21b7b79d5d	Add helper methods, use require and other code review feedback	2018-01-31 09:56:53 -06:00
Preetha Appan	d0f9d59abb	Reconcile with changes to structs for reschedule tracking	2018-01-31 09:56:53 -06:00
Preetha Appan	fbb1936dee	Fix some comments and lint warnings, remove unused method	2018-01-31 09:56:53 -06:00
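
The node-penalty rule described in commit c50057bf1f can be illustrated with a minimal Go sketch. This is not the scheduler's actual code: the `Alloc` type, its `PrevNodeIDs` field, and the `penaltyNodes` helper are hypothetical names chosen for illustration. The sketch assumes the previous node is penalized only when the previous alloc failed, and that nodes from earlier rescheduled attempts are always penalized.

```go
package main

import (
	"fmt"
	"sort"
)

// Alloc is a hypothetical, trimmed-down stand-in for an allocation record.
type Alloc struct {
	NodeID       string
	ClientStatus string   // e.g. "running" or "failed"
	PrevNodeIDs  []string // nodes used by earlier rescheduled attempts (assumed field)
}

// penaltyNodes returns the sorted node IDs that should be scored down when
// placing a replacement for prev. A plain job update (prev still healthy and
// never rescheduled) yields no penalties, so the replacement may stay put.
func penaltyNodes(prev *Alloc) []string {
	if prev == nil {
		return nil
	}
	set := map[string]struct{}{}
	// Penalize the previous node only if the previous alloc actually failed.
	if prev.ClientStatus == "failed" {
		set[prev.NodeID] = struct{}{}
	}
	// Conservative behavior kept by the commit: nodes from earlier
	// rescheduled attempts are penalized as well.
	for _, id := range prev.PrevNodeIDs {
		set[id] = struct{}{}
	}
	out := make([]string, 0, len(set))
	for id := range set {
		out = append(out, id)
	}
	sort.Strings(out)
	return out
}

func main() {
	updated := &Alloc{NodeID: "node-a", ClientStatus: "running"}
	failed := &Alloc{NodeID: "node-b", ClientStatus: "failed", PrevNodeIDs: []string{"node-c"}}
	fmt.Println(penaltyNodes(updated)) // prints []
	fmt.Println(penaltyNodes(failed))  // prints [node-b node-c]
}
```

Returning a sorted slice keeps the output deterministic; a real scheduler would more likely pass the set straight to its ranking step.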

167 commits