open-nomad

Commit Graph

Author	SHA1	Message	Date
Mahmood Ali	fd8fb8c22b	Stop allocs to be rescheduled Currently, when an alloc fails and is rescheduled, the alloc desired state remains as "run" and the nomad client may not free the resources. Here, we ensure that an alloc is marked as stopped when it's rescheduled. Notice the Desired Status and Description before and after this change: Before: ``` mars-2:nomad notnoop$ nomad alloc status 02aba49e ID = 02aba49e Eval ID = bb9ed1d2 Name = example-reschedule.nodes[0] Node ID = 5853d547 Node Name = mars-2.local Job ID = example-reschedule Job Version = 0 Client Status = failed Client Description = Failed tasks Desired Status = run Desired Description = <none> Created = 10s ago Modified = 5s ago Replacement Alloc ID = d6bf872b Task "payload" is "dead" Task Resources CPU Memory Disk Addresses 0/100 MHz 24 MiB/300 MiB 300 MiB Task Events: Started At = 2019-06-06T21:12:45Z Finished At = 2019-06-06T21:12:50Z Total Restarts = 0 Last Restart = N/A Recent Events: Time Type Description 2019-06-06T17:12:50-04:00 Not Restarting Policy allows no restarts 2019-06-06T17:12:50-04:00 Terminated Exit Code: 1 2019-06-06T17:12:45-04:00 Started Task started by client 2019-06-06T17:12:45-04:00 Task Setup Building Task Directory 2019-06-06T17:12:45-04:00 Received Task received by client ``` After: ``` ID = 5001ccd1 Eval ID = 53507a02 Name = example-reschedule.nodes[0] Node ID = a3b04364 Node Name = mars-2.local Job ID = example-reschedule Job Version = 0 Client Status = failed Client Description = Failed tasks Desired Status = stop Desired Description = alloc was rescheduled because it failed Created = 13s ago Modified = 3s ago Replacement Alloc ID = 7ba7ac20 Task "payload" is "dead" Task Resources CPU Memory Disk Addresses 21/100 MHz 24 MiB/300 MiB 300 MiB Task Events: Started At = 2019-06-06T21:22:50Z Finished At = 2019-06-06T21:22:55Z Total Restarts = 0 Last Restart = N/A Recent Events: Time Type Description 2019-06-06T17:22:55-04:00 Not Restarting Policy allows no restarts 2019-06-06T17:22:55-04:00 Terminated Exit Code: 1 2019-06-06T17:22:50-04:00 Started Task started by client 2019-06-06T17:22:50-04:00 Task Setup Building Task Directory 2019-06-06T17:22:50-04:00 Received Task received by client ```	2019-06-06 17:27:12 -04:00
Mahmood Ali	3eda42d027	tests: Migrated allocs aren't lost Fix `TestServiceSched_NodeDown` for checking that the migrated allocs are actually marked to be stopped. The boolean logic in test made it skip actually checking client status as long as desired status was stop. Here, we mark some jobs for migration while leaving others as running, and we check that lost flag is only set for non-migrated allocs.	2019-06-06 16:05:07 -04:00
Lang Martin	34230577df	describe a pending deployment with auto_promote accurately	2019-05-22 12:32:08 -04:00
Lang Martin	d462639cc9	sched reconcile copy AutoPromote to DeploymentState	2019-05-22 12:32:08 -04:00
Preetha Appan	374eee421f	Fix comment and assert score in test case	2019-05-15 12:35:57 -05:00
Nick Ethier	f0b9f8e37a	fix missing brace	2019-05-15 13:02:04 -04:00
Nick Ethier	0d851b5d11	scheduler: add check to prohibit returning inf during spread boost calculation	2019-05-15 13:00:24 -04:00
Lang Martin	29ea112586	system_sched & test cleanup comments	2019-05-01 12:25:26 -04:00
Lang Martin	c490dacf76	system_sched_test extend the test to check ineligible nodes	2019-05-01 12:25:26 -04:00
Lang Martin	c43bcbd35e	system_sched when a node is filtered, don't mark failure	2019-05-01 12:25:26 -04:00
Lang Martin	aecec5df1b	system_sched_test create partially constrained job	2019-05-01 12:25:26 -04:00
Arshneet Singh	d4e7a5c005	Add comments to functions, and use require instead of assert	2019-04-23 09:57:21 -07:00
Arshneet Singh	4cf4324b8f	Remove allowPlanOptimization from schedulers	2019-04-23 09:18:02 -07:00
Arshneet Singh	0dd4c109e8	Compat tags	2019-04-23 09:18:01 -07:00
Arshneet Singh	65f5fab131	Add tests for plan normalization	2019-04-23 09:18:01 -07:00
Arshneet Singh	b977748a4b	Add code for plan normalization	2019-04-23 09:18:01 -07:00
Danielle	198a838b61	Merge pull request #5512 from hashicorp/dani/f-alloc-stop alloc-lifecycle: nomad alloc stop	2019-04-23 13:05:08 +02:00
Danielle Lancashire	832f607433	allocs: Add nomad alloc stop This adds a `nomad alloc stop` command that can be used to stop and force migrate an allocation to a different node. This is built on top of the AllocUpdateDesiredTransitionRequest and explicitly limits the scope of access to that transition to expose it under the alloc-lifecycle ACL. The API returns the follow up eval that can be used as part of monitoring in the CLI or parsed and used in an external tool.	2019-04-23 12:50:23 +02:00
Preetha Appan	bcb5c8c70d	remove stray new line	2019-04-12 10:32:48 -05:00
Preetha Appan	8ddc076c1d	Refactor scheduler package to enable preemption for batch/service jobs	2019-04-10 20:24:01 -05:00
James Rasell	9470507cf4	Add NodeName to the alloc/job status outputs. Currently when operators need to log onto a machine where an alloc is running they will need to perform both an alloc/job status call and then a call to discover the node name from the node list. This updates both the job status and alloc status output to include the node name within the information to make operator use easier. Closes #2359 Closes #1180	2019-04-10 10:34:10 -05:00
Preetha Appan	da1ce9bcea	Fix bug where scoring metadata would be overridden during an inplace upgrade.	2019-03-12 23:36:46 -05:00
Alex Dadgar	41265d4d61	Change types of weights on spread/affinity	2019-01-30 12:20:38 -08:00
Nick Ethier	24cbf42798	scheduler: fix NPE when deployment is nil, but placement is a canary	2019-01-28 20:22:59 -06:00
Alex Dadgar	5198ff05c3	convert driver to device for device constraint/attributes	2019-01-23 10:58:45 -08:00
Alex Dadgar	4bdccab550	goimports	2019-01-22 15:44:31 -08:00
Preetha Appan	3b054d6135	Remove unnecessary usage of alloc.Resource	2019-01-10 16:36:47 -06:00
Mahmood Ali	0dfa93a3c1	appease linter	2019-01-08 10:58:49 -05:00
Alex Dadgar	8a35d7b1dd	Test recovery	2019-01-07 14:49:41 -08:00
Preetha	f406e66ab8	Merge pull request #4881 from hashicorp/f-device-preemption Device preemption	2018-12-11 18:34:19 -06:00
Preetha Appan	977a4a540d	Early continue after meeting needed count Also adds another optimization that filters out un-needed allocations as a final filtering step	2018-12-11 10:12:18 -06:00
Preetha Appan	f60c52c8ba	Score combinations of allocs from multiple devices for preemption	2018-12-07 18:35:47 -06:00
Alex Dadgar	1e3c3cb287	Deprecate IOPS IOPS have been modelled as a resource since Nomad 0.1 but has never actually been detected and there is no plan in the short term to add detection. This is because IOPS is a bit simplistic of a unit to define the performance requirements from the underlying storage system. In its current state it adds unnecessary confusion and can be removed without impacting any users. This PR leaves IOPS defined at the jobspec parsing level and in the api/ resources since these are the two public uses of the field. These should be considered deprecated and only exist to allow users to stop using them during the Nomad 0.9.x release. In the future, there should be no expectation that the field will exist.	2018-12-06 15:09:26 -08:00
Preetha Appan	63681fac0c	use structured logging everywhere consistently	2018-12-03 08:31:41 -06:00
Preetha Appan	766820def3	addresses some code clarity review comments	2018-11-27 11:02:06 -06:00
Mahmood Ali	96ffe044e7	Simplify map count update logic Co-Authored-By: preetapan <preetha@hashicorp.com>	2018-11-27 10:03:11 -06:00
Mahmood Ali	57b94c2d50	code review suggestion Co-Authored-By: preetapan <preetha@hashicorp.com>	2018-11-27 09:59:57 -06:00
Preetha Appan	86f416a984	Fix formatting	2018-11-16 20:45:52 -06:00
Preetha Appan	8efe6171e4	Fix preemption logic bug, need to group allocations by device first. This ensures that the set of allocations chosen for preemption all share the same device where ID is <vendor/type/device>	2018-11-16 20:32:10 -06:00
Danielle Tomlinson	9c72dafc95	scheduler: Add is_set/is_not_set constraints This adds constraints for asserting that a given attribute or value exists, or does not exist. This acts as a companion to =, or != operators, e.g: ```hcl constraint { attribute = "${attrs.type}" operator = "!=" value = "database" } constraint { attribute = "${attrs.type}" operator = "is_set" } ```	2018-11-15 11:00:32 -08:00
Preetha Appan	998968f57a	fix linting	2018-11-15 12:27:32 -06:00
Preetha Appan	e5de50fba8	Initial implementation of device preemption	2018-11-15 11:09:26 -06:00
Danielle Tomlinson	e5c641daa9	scheduler: Allow comparisons of nil values This commit allows the ConstraintChecker to test values that do not exist. This is useful when wanting to _exclude_ given nodes from executing a job, for example, if you wanted to give canary nodes an attribute, and not run critical services on them, you may specify something like the below, but not want to tag all other nodes with the inverse. ```hcl constraint { attribute = "${node.attr.canary}" operator = "!=" value = "1" } ``` This also requires all constraint checkers to allow for nil target values, as they will no longer be short circuited by resolving a target.	2018-11-13 13:36:51 -08:00
Alex Dadgar	08dc2ea702	Merge pull request #4867 from hashicorp/b-deployment-progress-deadline Blocked evaluation fixes	2018-11-13 10:29:03 -08:00
Preetha Appan	de890b9d5c	blank line	2018-11-12 15:50:14 -06:00
Preetha Appan	20af09a1ef	Fix logic bug in tracking sum of matched affinity weights We need to track the sum of matching weights per device, but only change the final return value if it's the highest scoring choice	2018-11-12 15:06:45 -06:00
Preetha Appan	285b9b6001	Normalize scores correctly	2018-11-08 17:01:58 -06:00
Preetha Appan	f20f2ca8e9	Fixes device scheduling unit tests Also changes the logic for score when there is more than one task requesting a device. Since inter task affinities are already normalized, we take the average of the scores across tasks.	2018-11-08 10:31:19 -06:00
Alex Dadgar	dbb05357bc	fix test	2018-11-07 11:59:24 -08:00
Alex Dadgar	a7ca737fb6	review comments	2018-11-07 11:31:52 -08:00
Alex Dadgar	36abd3a3d8	review comments	2018-11-07 10:33:22 -08:00
Alex Dadgar	e3cbb2c82e	allocs fit checks if devices get oversubscribed	2018-11-07 10:33:22 -08:00
Alex Dadgar	4f9b3ede87	Split device accounter and allocator	2018-11-07 10:32:03 -08:00
Alex Dadgar	6fa893c801	affinities	2018-11-07 10:32:03 -08:00
Alex Dadgar	feb83a2be3	assign devices	2018-11-07 10:32:03 -08:00
Alex Dadgar	6d8bb3a7bd	Duplicate blocked evals cancelling improved The old logic for cancelling duplicate blocked evaluations by job id had the issue where the newer evaluation could have additional node classes that it is (in)eligible for that we would not capture. This could make it such that cluster state could change such that the job would make progress but no evaluation was unblocked.	2018-11-07 10:08:23 -08:00
Preetha Appan	a6b714b81c	update preemption tests to use new node resource structs also includes a fix to remove unnecessary subtraction of network mbits	2018-11-02 17:59:53 -05:00
Preetha	b2b52b1ada	Merge pull request #4794 from hashicorp/f-preemption-systemjobs Preemption for system jobs	2018-11-02 16:28:06 -05:00
Preetha Appan	56de32f363	Address more minor code review feedback	2018-11-02 16:26:34 -05:00
Preetha Appan	253a351532	Fix test setup	2018-11-02 16:06:25 -05:00
Preetha Appan	fba24e5a8a	dereference safely	2018-11-02 15:58:59 -05:00
Preetha Appan	d061678df7	Fix static port preemption to be device aware	2018-11-02 13:07:24 -05:00
Preetha Appan	4182444937	Handle static port preemption when there are multiple devices Also added test case	2018-11-02 09:09:50 -05:00
Preetha Appan	fd60e66f86	Plumb alloc resource cache in a few more places. also removed now unused method	2018-11-01 16:44:43 -05:00
Preetha Appan	78d635edca	More review comments	2018-11-01 16:36:11 -05:00
Preetha Appan	6e1023ba08	Cleaner way to exit early, and fixed a couple more places reading from alloc.Resources	2018-11-01 16:15:58 -05:00
Preetha Appan	b4dd26247f	review comments	2018-11-01 12:01:59 -05:00
Preetha Appan	d03201adf8	Fix formatting of allocation score metrics	2018-10-30 12:03:23 -05:00
Preetha Appan	f1c3eb2792	Introduce interface with multiple implementations for resource distance	2018-10-30 11:06:32 -05:00
Preetha Appan	047af5141e	refactor preemption code to use method receivers and setters for common fields	2018-10-30 11:06:32 -05:00
Preetha Appan	1a5421f5d7	more minor cleanup	2018-10-30 11:06:32 -05:00
Preetha Appan	0494a098ce	More style and readability fixes from review	2018-10-30 11:06:32 -05:00
Preetha Appan	3910ba9bbd	Preempted allocations should be removed from proposed allocations	2018-10-30 11:06:32 -05:00
Preetha Appan	9dd76d83dc	comments	2018-10-30 11:06:32 -05:00
Preetha Appan	e6234e3cc5	fix end to end scheduler test to use new resource structs correctly	2018-10-30 11:06:32 -05:00
Preetha Appan	8807c25b11	Modify preemption code to use new style of resource structs	2018-10-30 11:06:32 -05:00
Preetha Appan	c1c1c230e4	Make preemption config a struct to allow for enabling based on scheduler type	2018-10-30 11:06:32 -05:00
Preetha Appan	25a047267f	Use scheduler config from state store to enable/disable preemption	2018-10-30 11:06:32 -05:00
Preetha Appan	1805032e69	Fix linting and better comments	2018-10-30 11:06:32 -05:00
Preetha Appan	cc295b90de	Implement preemption for system jobs. This commit implements an allocation selection algorithm for finding allocations to preempt. It currently special cases network resource asks from others (cpu/memory/disk/iops).	2018-10-30 11:06:32 -05:00
Preetha Appan	22aee7294e	Merge branch 'f-fix-resource-type' of github.com:hashicorp/nomad into f-fix-resource-type	2018-10-16 18:30:12 -05:00
Preetha Appan	53c3f8151b	fix linting	2018-10-16 18:29:49 -05:00
Alex Dadgar	a78cefec18	use int64	2018-10-16 15:34:32 -07:00
Preetha Appan	7c0d8c646c	Change CPU/Disk/MemoryMB to int everywhere in new resource structs	2018-10-16 16:21:42 -05:00
Alex Dadgar	f5a76d8411	review comments	2018-10-15 15:31:13 -07:00
Alex Dadgar	7ecd65109a	Check constraints on devices	2018-10-14 13:35:47 -07:00
Alex Dadgar	5284554fcc	rework device checker	2018-10-13 16:47:53 -07:00
Alex Dadgar	1089e13b14	add to stack	2018-10-13 12:27:49 -07:00
Alex Dadgar	9b5aaac410	Device feasibility checker	2018-10-13 12:27:49 -07:00
Preetha Appan	1574e898af	Fix bug in reconciler where terminal allocs on a job already stopped were unnecessarily updated	2018-10-08 21:03:49 -05:00
Alex Dadgar	01f8e5b95f	renames	2018-10-04 14:57:25 -07:00
Alex Dadgar	52f9cd7637	fixing tests	2018-10-04 14:26:19 -07:00
Alex Dadgar	bac5cb1e8b	Scheduler uses allocated resources	2018-10-02 17:08:25 -07:00
Preetha Appan	a10118c461	Add failed follow up to the list of allowed eval trigger reasons needs unit test	2018-09-25 10:49:55 -07:00
Alex Dadgar	6a21f9fe96	Unique TriggerBy for blocked evals Give blocked evals a unique triggerby reason to make debugging a chain of evaluations easier.	2018-09-24 14:47:49 -07:00
Alex Dadgar	3c19d01d7a	server	2018-09-15 16:23:13 -07:00
Alex Dadgar	3ba62efd5e	Failed/paused deployments do not block migrations This PR changes behavior of the scheduler such that a task group with a deployment that is failed or paused will not cause the scheduler to skip migrations. The reason for this change is that it causes a bad UX when draining nodes with allocations that are part of a failed/paused deployment. These operations should not be coupled in any way and this remedies that. Prior behavior was still correct, but required either jobs to transition to a healthy state or for the node to hit its drain deadline.	2018-09-10 15:28:45 -07:00
Alex Dadgar	cc92cd92cd	Merge pull request #4642 from hashicorp/b-vet Fix vet errors and use newer go version in travis	2018-09-04 17:04:02 -07:00
Alex Dadgar	c6576ddac1	Fix make check errors	2018-09-04 16:03:52 -07:00
Preetha Appan	751c0eb5a5	code review feedback	2018-09-04 16:10:11 -05:00
Preetha Appan	9bc0962527	Track top k nodes by norm score rather than top k nodes per scorer	2018-09-04 16:10:11 -05:00
Preetha Appan	6ed527c636	Use heap to store top K scoring nodes. Scoring metadata is now aggregated by scorer type to make it easier to parse when reading it in the CLI.	2018-09-04 16:10:11 -05:00
Preetha Appan	65cf4373b3	fix linting error	2018-09-04 16:10:11 -05:00
Preetha Appan	dd5fe6373f	Fix scoring logic for uneven spread to incorporate current alloc count Also addressed other small code review comments	2018-09-04 16:10:11 -05:00
Preetha Appan	e72c0fe527	more cleanup	2018-09-04 16:10:11 -05:00
Preetha Appan	4c624424e6	added some unit tests for -1 spread score	2018-09-04 16:10:11 -05:00
Preetha Appan	92d37acc2a	comment and formatting cleanup	2018-09-04 16:10:11 -05:00
Preetha Appan	7b0a27cad6	fix scoring algorithm when min count == current count	2018-09-04 16:10:11 -05:00
Preetha Appan	bad075f640	Remove hardcoded boosts for even spread. instead, calculate them based on delta between current and minimum value	2018-09-04 16:10:11 -05:00
Preetha Appan	c56873ff37	Implement support for even spread across datacenters, with unit test	2018-09-04 16:10:11 -05:00
Preetha Appan	d091c00dd3	Support implicit spread target to account for remaining desired counts	2018-09-04 16:10:11 -05:00
Preetha Appan	33779abe5f	fix comments	2018-09-04 16:10:11 -05:00
Preetha Appan	5812f906c8	Allow empty spread targets, and validate target percentages.	2018-09-04 16:10:11 -05:00
Preetha Appan	55f276c189	Include spreads configured at job level when precomputing weights/desired counts.	2018-09-04 16:10:11 -05:00
Preetha Appan	fbd0004707	Fix warnings	2018-09-04 16:10:11 -05:00
Preetha Appan	db0d95b09c	Implement spread iterator that scores according to percentage of desired count in each target. Added this as a new step in the stack and some unit tests	2018-09-04 16:10:11 -05:00
Preetha Appan	eccf128c5c	Some minor changes from code review	2018-09-04 16:10:11 -05:00
Preetha Appan	038ed52877	Fix after rename to ConstraintSetContainsAny	2018-09-04 16:10:11 -05:00
Preetha Appan	3a39db3902	Fix linting	2018-09-04 16:10:11 -05:00
Preetha Appan	d5cd2bbddb	Remove unnecessary reset	2018-09-04 16:10:11 -05:00
Preetha Appan	dccb693221	test for setcontainsany, and treat set_contains same as set_contains_all	2018-09-04 16:10:11 -05:00
Preetha Appan	70bfd0c0cb	Address some review feedback	2018-09-04 16:10:11 -05:00
Preetha Appan	8685593ec0	Back out changes to propertyset that were not necessary for affinities	2018-09-04 16:10:11 -05:00
Preetha Appan	5eacd6ada4	Implement affinity support in generic scheduler	2018-09-04 16:10:11 -05:00
Alex Dadgar	e1c239daae	Merge pull request #4414 from hashicorp/b-stop-summary Reset Queued allocs to zero when job stopped	2018-07-16 14:32:55 -07:00
Nick Ethier	6b6777359b	scheduler: fix missing err assignment	2018-07-11 14:27:10 -04:00
Nick Ethier	5f6def5b04	scheduler: better error handling	2018-07-05 11:00:03 -04:00
Nick Ethier	030e650e78	scheduler: fix nil pointer exception	2018-07-02 16:05:38 -04:00
Alex Dadgar	300b1a7a15	Tests only use testlog package logger	2018-06-13 15:40:56 -07:00
Alex Dadgar	c3c79c408e	Reset Queued allocs to zero when job stopped When a job is stopped but not purged, we should set the Queued count to be zero.	2018-06-13 10:46:39 -07:00
Preetha Appan	b64788043e	make test create index clearer	2018-06-05 17:29:59 -05:00
Preetha Appan	3e264dcb79	Fix reconciler bug with deployment not being created if job create index is different This fixes an issue where if a job is purged and resubmitted Nomad does not create a new deployment. Adds unit test that failed before this fix	2018-06-05 13:58:53 -05:00
Preetha Appan	f8a23bc54a	fix test comment	2018-05-09 16:01:34 -05:00
Preetha Appan	ef531b0f34	Add unit tests for forced rescheduling	2018-05-09 11:30:42 -05:00
Preetha Appan	c1b92c284e	Work in progress - force rescheduling of failed allocs	2018-05-08 17:26:57 -05:00
Alex Dadgar	555d14fd92	Add test	2018-05-07 14:55:01 -05:00
Preetha Appan	cf44670d56	Make sure that task group has a deployment state before using it	2018-05-07 14:55:01 -05:00
Alex Dadgar	c6478d9469	clarify comment	2018-05-07 14:55:01 -05:00
Alex Dadgar	768fec8505	Allow healthy canary deployment to skip progress deadline	2018-05-07 14:55:01 -05:00
Alex Dadgar	8626c1b94a	Reschedule when we have canaries properly	2018-05-07 14:55:01 -05:00
Alex Dadgar	8dee3ab068	canary reschedule test	2018-05-07 14:50:01 -05:00
Alex Dadgar	deb93dc7b7	Test for rescheduling when there are canaries	2018-05-07 14:50:01 -05:00
Alex Dadgar	550f5e31f8	Allow canary count greater than desired	2018-05-07 14:50:01 -05:00
Alex Dadgar	f95ab4ade8	Mark canaries on creation, and unmark on promotion	2018-05-07 14:50:01 -05:00
Preetha Appan	5329900f6d	Only use DesiredTransition.Reschedule in reconciler when it's an active deployment	2018-05-07 14:50:01 -05:00
Alex Dadgar	e7444c3873	Add test where deployment is marked as complete when done even with failed allocs	2018-05-07 14:50:01 -05:00
Alex Dadgar	57969b4ee0	fix reconcile tests	2018-05-07 14:50:01 -05:00
Alex Dadgar	5547974f35	Only reschedule allowed deployment allocs	2018-05-07 14:50:01 -05:00
Alex Dadgar	fcf4f582d0	small review feedback fixes	2018-05-07 14:50:01 -05:00
Alex Dadgar	1336002255	Progress deadline in deployment state	2018-05-07 14:50:01 -05:00
Alex Dadgar	ee50789c22	Initial implementation	2018-05-07 14:50:01 -05:00
Preetha Appan	a569d34f25	Add custom status description for rescheduling follow up evals, and make unit test robust	2018-04-10 15:30:15 -05:00
Alex Dadgar	e5b5803265	Only mark allocs as part of deployment if deployment is active	2018-04-05 15:40:49 -07:00
Preetha Appan	7e17bc231f	remove unnecessary check and other fixes from code review	2018-04-04 07:35:20 -05:00
Preetha Appan	00537c739b	Fixes edge cases around timing and task finish time being set more than once	2018-04-03 16:34:59 -05:00
Alex Dadgar	3aa4ee9d75	Fix lost handling of not actually down nodes	2018-03-30 14:17:41 -07:00
Preetha Appan	d87e528059	rename skip->ignore and improve comment formatting	2018-03-29 15:11:10 -05:00
Preetha Appan	38a7614776	Refactored for readability, pair programmed with @dadgar	2018-03-29 13:28:37 -05:00
Preetha Appan	5090fefe96	Filter out allocs with DesiredState = stop, and unit tests	2018-03-29 09:28:52 -05:00
Alex Dadgar	b18f789020	Unmark drain when nodes hit their deadline and only batch/system left and add all job type integration test	2018-03-28 17:25:58 -07:00
Preetha Appan	d2899728fd	Fix linting	2018-03-28 12:26:28 -05:00
Alex Dadgar	9d60e2cebf	Correct status desc on draining system allocs	2018-03-26 17:54:46 -07:00
Preetha Appan	33e170c15d	s/linear/constant/g	2018-03-26 14:45:09 -05:00
Preetha	5668c3c38e	Merge pull request #4037 from hashicorp/b-fix-terminal-filtering-service-allocs Fix edge case in reconciler	2018-03-26 13:14:51 -05:00
Preetha Appan	1b9e413a1a	one field per line in struct definition	2018-03-26 13:13:21 -05:00
Alex Dadgar	e106da84de	name and test	2018-03-26 11:06:21 -07:00
Alex Dadgar	e2a6e64fca	Don't create unnecessary deployments	2018-03-23 16:55:21 -07:00
Preetha Appan	cbfd69ce7a	Fix edge case in reconciler where service jobs with ClientstatusComplete were not replaced	2018-03-23 18:41:00 -05:00
Alex Dadgar	3b72dd94ba	Do not mark an allocation as an inplace update if specification hasn't changed	2018-03-23 14:36:05 -07:00
Michael Schurter	cb61a4bdc7	Fix linting errors	2018-03-21 16:51:45 -07:00
Alex Dadgar	92b636dd32	Fix deadline handling	2018-03-21 16:51:44 -07:00
Michael Schurter	9263cc2ed7	scheduler: migrate non-terminal migrating allocs filterByTainted node should always migrate non-terminal migrating allocs	2018-03-21 16:49:48 -07:00
Michael Schurter	d1ec65d765	switch to new raft DesiredTransition message	2018-03-21 16:49:48 -07:00
Alex Dadgar	db4a634072	RPC, FSM, State Store for marking DesiredTransition fix build tag	2018-03-21 16:49:48 -07:00
Michael Schurter	c0542474db	drain: initial drainv2 structs and impl	2018-03-21 16:49:48 -07:00
Chelsea Holland Komlo	329605b7cc	fix up scheduling test	2018-03-21 15:54:03 -04:00
Chelsea Holland Komlo	60f12d206f	improve comments; update watchDriver	2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo	d92703617c	simplify logic bump log level	2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo	d8f68e5ef8	fix up codereview feedback	2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo	c7fd0bd8a1	fix up scheduler mocks	2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo	3aa726baab	fix scheduler driver name; create node structs file	2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo	3cba95e8a7	allow nomad to schedule based on the status of a client driver health check Slight updates for go style	2018-03-21 15:15:25 -04:00
Preetha Appan	56e60e5840	Fix linting warning	2018-03-14 16:12:22 -05:00
Preetha Appan	9a5e6edf1f	Rename DelayCeiling to MaxDelay	2018-03-14 16:10:32 -05:00
Preetha Appan	3e96c6c4e0	Address more code review feedback	2018-03-14 16:10:32 -05:00
Preetha Appan	9fed0d2103	Get reschedule policy from the alloc directly	2018-03-14 16:10:32 -05:00
Preetha Appan	e89bbf7289	Update comment about WaitTime	2018-03-14 16:10:32 -05:00
Preetha Appan	e2656ef546	Cleaner handling of batched evals	2018-03-14 16:10:32 -05:00
Preetha Appan	47e0280d96	More small review feedback	2018-03-14 16:10:32 -05:00
Preetha Appan	2ba976dec8	Remove unnecessary check against 5 second window for determining immediate scheduling eligibility	2018-03-14 16:10:32 -05:00
Preetha Appan	5373ade731	Scheduler and Reconciler changes to support delayed rescheduling	2018-03-14 16:10:32 -05:00
Josh Soref	e0f6a33fe5	spelling: system	2018-03-11 19:01:19 +00:00
Josh Soref	a89e1b8395	spelling: strategy	2018-03-11 18:58:19 +00:00
Josh Soref	f8eb766fb5	spelling: reschedulable	2018-03-11 18:48:12 +00:00
Josh Soref	ed8db9992e	spelling: feasibility	2018-03-11 18:07:09 +00:00
Josh Soref	bf9283c606	spelling: corresponding	2018-03-11 17:51:41 +00:00
Josh Soref	ca4ceb0e5c	spelling: commits	2018-03-11 17:47:45 +00:00
Preetha Appan	7b6ba7a1f4	Fixes bug in reconciler where previously rescheduled allocs are rescheduled again. Simplified logic and added test case to catch this.	2018-02-20 12:07:56 -06:00
Preetha Appan	7c57303dd2	Clarify comment	2018-02-05 16:37:07 -06:00
Preetha Appan	d48c411692	Reconciler should consider failed allocs when marking deployment as failed.	2018-02-02 19:40:25 -06:00
Preetha Appan	a1237d627a	code review feedback	2018-01-31 09:58:05 -06:00
Preetha Appan	5ad892026a	Add a field to track the next allocation during a replacement	2018-01-31 09:58:05 -06:00
Preetha Appan	2ed4de7e7b	Track previous node id correctly, plus unit test	2018-01-31 09:58:05 -06:00
Preetha Appan	dd4917c2f0	Add more clarification in comment	2018-01-31 09:58:05 -06:00
Preetha Appan	09bef7d1ce	Preallocate slice for skipped nodes	2018-01-31 09:58:05 -06:00
Preetha Appan	237beb49ae	Better score threshold	2018-01-31 09:58:05 -06:00
Preetha Appan	fa18c0def4	Add one more unit test	2018-01-31 09:58:05 -06:00
Preetha Appan	a75540cec6	Limit iterator uses a score threshold and a maxSkip value to be able to skip lower scoring nodes	2018-01-31 09:58:05 -06:00
Preetha Appan	b6268a5fab	Beef up unit test for rescheduling batch jobs	2018-01-31 09:56:53 -06:00
Preetha Appan	ea4a889e28	Address more code review feedback	2018-01-31 09:56:53 -06:00
Preetha Appan	bd89d2b39e	Make sure that reschedule trackers are not added for node drain replacements	2018-01-31 09:56:53 -06:00
Preetha Appan	a662b38801	Improve reconciler unit tests	2018-01-31 09:56:53 -06:00
Preetha Appan	fee4ccf154	Prevent side effect modification of select options when preferred nodes are set	2018-01-31 09:56:53 -06:00
Preetha Appan	21b7b79d5d	Add helper methods, use require and other code review feedback	2018-01-31 09:56:53 -06:00
Preetha Appan	d0f9d59abb	Reconcile with changes to structs for reschedule tracking	2018-01-31 09:56:53 -06:00
Preetha Appan	fbb1936dee	Fix some comments and lint warnings, remove unused method	2018-01-31 09:56:53 -06:00
Preetha Appan	031c566ada	Reschedule previous allocs and track their reschedule attempts	2018-01-31 09:56:53 -06:00
Preetha Appan	fd2fbefa4c	Add a field to track the next allocation during a replacement	2018-01-24 17:55:05 -06:00
Alex Dadgar	6dda0ebaed	gofmt	2018-01-04 14:45:15 -08:00
Alex Dadgar	2f561609b7	Fix detection of successful batch allocations This PR restores older behavior of detecting successful batch allocations (04d86ffd1006fde9dfb2ca8c1237fe60b995b0e3). This has the side effect that we correctly filter desired status stop but not successful batch allocations and create their replacements.	2018-01-04 14:20:32 -08:00
Preetha	1712b03705	Merge branch 'master' into 0.8	2018-01-03 16:06:38 -06:00
Preetha Appan	51bd0b59c7	Return an error if evaluation doesn't exist in state store at plan apply time.	2017-12-18 14:55:36 -06:00
Preetha Appan	3c36abfe14	Update eval modify index as part of plan apply.	2017-12-18 10:03:55 -06:00
Preetha Appan	3b4d7ac2a3	Fix some typos	2017-12-14 13:29:27 -06:00
Michael Schurter	45494f7304	Fix port labels on mock Alloc/Job/Node	2017-12-08 14:50:06 -08:00
Alex Dadgar	44240ce440	Merge pull request #3375 from hashicorp/b-batch Allow batch jobs to be rerun if purged	2017-10-13 17:11:45 -07:00
Alex Dadgar	c1cc51dbee	sync	2017-10-13 14:36:02 -07:00
Alex Dadgar	746cd7403f	Allow batch jobs to be rerun if purged This PR allows batch jobs to be rerun if they have been purged.	2017-10-13 12:40:37 -07:00
Michael Schurter	a66c53d45a	Remove `structs` import from `api` Goes a step further and removes structs import from api's tests as well by moving GenerateUUID to its own package.	2017-09-29 10:36:08 -07:00
Alex Dadgar	4173834231	Enable more linters	2017-09-26 15:26:33 -07:00
Alex Dadgar	3904bde9a3	Fix batch handling of complete allocs/node drains This PR fixes: * An issue in which a node-drain that contains a complete batch alloc would cause a replacement * An issue in which allocations with the same name during a scale down/stop event wouldn't be properly stopped. * An issue in which batch allocations from previous job versions may not have been stopped properly. Fixes https://github.com/hashicorp/nomad/issues/3210	2017-09-14 15:08:57 -07:00
Alex Dadgar	84d06f6abe	Sync namespace changes	2017-09-07 17:04:21 -07:00
Alex Dadgar	0aef02a4f9	fix test	2017-08-21 14:07:54 -07:00
Alex Dadgar	27256ebcc6	Placing allocs counts towards placement limit This PR makes placing new allocations count towards the limit. We do not restrict how many new placements are made by the limit but we still count towards the limit. This has the nice effect that if you have a group with count = 5 and max_parallel = 1 but only 3 allocs exist for it and a change is made, you will create 2 more at the new version but not destroy one, taking you down to two running as you would have previously. Fixes https://github.com/hashicorp/nomad/issues/3053	2017-08-21 12:41:19 -07:00
Alex Dadgar	2453f13fc5	fixes	2017-08-15 12:27:05 -07:00
Alex Dadgar	0570e09feb	Fix panic occurring from improper bitmap size This PR fixes an alignment calculation when determining the bitmap size. Fixes https://github.com/hashicorp/nomad/issues/3008	2017-08-12 15:37:02 -07:00
Luke Farnell	f0ced87b95	fixed all spelling mistakes for goreport	2017-08-07 17:13:05 -04:00
Alex Dadgar	7b13c0d702	Lost allocs replaced even if deployment failed This PR allows the scheduler to replace lost allocations even if the job has a failed or paused deployment. The prior behavior was confusing to users. Fixes https://github.com/hashicorp/nomad/issues/2958	2017-08-03 17:42:14 -07:00
Alex Dadgar	7d2b84ab01	Review fixes	2017-08-01 14:18:52 -07:00
Alex Dadgar	2650bb1d12	Distinct Property supports arbitrary limit This PR enhances the distinct_property constraint such that a limit can be specified in the RTarget/value parameter. This allows constraints such as: ``` constraint { distinct_property = "${meta.rack}" value = "2" } ``` This restricts any given rack from running more than 2 allocations from the task group. Fixes https://github.com/hashicorp/nomad/issues/1146	2017-07-31 16:52:13 -07:00
Alex Dadgar	4f69355a66	Fix incorrect destructive update with distinct_property constraint This PR fixes an issue in which an update to a task group with a distinct property constraint would result in an incorrect destructive update.	2017-07-31 11:17:35 -07:00
Michael Schurter	5f1f91a46c	Use go-testing-interface instead of testing This drops the testings stdlib pkg from our dependencies. Saves a whopping 46kb on our binary (was really hoping for more of a win there), but also avoids potential ugliness with how testing sets flags.	2017-07-25 15:35:19 -07:00
Alex Dadgar	492239d3ee	Improve multiple group handling in a deployment This PR resolves a bug in which a job with multiple task groups would create new deployment objects each, thus clearing out all other task groups deployment state.	2017-07-25 11:27:47 -07:00
Alex Dadgar	184bfd4836	Better comment	2017-07-20 12:31:08 -07:00
Alex Dadgar	248315a2d9	Handle destructive changes before placements This PR updates the generic scheduler to handle destructive changes before handling placements. This is important because the destructive change may be due to a lowering of resources. If this is the case, the handling of the destructive changes first may make it possible for the placement to happen. To reason about this imagine there is one node with CPU = 500. If the group originally had: * `count = 1` * `cpu = 400` And then the job was updated such that the group had: * `count = 4` * `cpu = 120` If the original alloc isn't discounted first, nothing would be able to place.	2017-07-20 12:24:27 -07:00
Alex Dadgar	ce265e0aff	Update full node test to test more advanced case	2017-07-20 12:23:40 -07:00
Alex Dadgar	a9ec1d6ca7	Fix update limit calculation to avoid panic This PR fixes the rolling update limit calculation to avoid a panic when there are more allocations for a deployment that haven't determined their health than the max_parallel count of the task group. Fixes https://github.com/hashicorp/nomad/issues/2820	2017-07-19 11:11:47 -07:00
Alex Dadgar	22e84d00ab	Fix deep copy of driver config	2017-07-17 17:53:21 -07:00
Alex Dadgar	641e178416	Stop before trying to place	2017-07-17 17:18:12 -07:00
Alex Dadgar	66a90326e1	Treat destructive updates atomically	2017-07-16 10:35:38 -07:00

791 Commits