If an alloc is being preempted and is marked as evict, but the underlying
node is lost before the migration takes place, the allocation currently
remains with a desired status of evict and a client status of running
indefinitely, or until the node comes back online.
This commit updates updateNonTerminalAllocsToLost to check for a
desired status of Evict as well as Stop when updating allocations on
tainted nodes.
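A minimal standalone sketch of the intended check follows; the types and
status strings are simplified stand-ins rather than Nomad's actual structs:
```
package main

import "fmt"

// Simplified stand-in for the real allocation type.
type alloc struct {
	ID            string
	NodeID        string
	DesiredStatus string // "run", "stop", or "evict"
	ClientStatus  string // "pending", "running", "complete", ...
}

// markLostOnTaintedNodes models the fixed behavior: allocations whose desired
// status is either "stop" or "evict" (not just "stop"), but which are still
// pending/running on a lost node, get transitioned to lost.
func markLostOnTaintedNodes(allocs []alloc, lostNodes map[string]bool) []alloc {
	var lost []alloc
	for _, a := range allocs {
		if !lostNodes[a.NodeID] {
			continue
		}
		desiredStopOrEvict := a.DesiredStatus == "stop" || a.DesiredStatus == "evict"
		stillNonTerminal := a.ClientStatus == "running" || a.ClientStatus == "pending"
		if desiredStopOrEvict && stillNonTerminal {
			a.ClientStatus = "lost"
			lost = append(lost, a)
		}
	}
	return lost
}

func main() {
	allocs := []alloc{
		{ID: "a1", NodeID: "n1", DesiredStatus: "evict", ClientStatus: "running"},
		{ID: "a2", NodeID: "n2", DesiredStatus: "stop", ClientStatus: "running"},
	}
	fmt.Println(markLostOnTaintedNodes(allocs, map[string]bool{"n1": true}))
}
```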
Switch to a table test for the lost node cases.
Fixes #5856
When the scheduler looks for a placement for an allocation that's
replacing another allocation, it's supposed to penalize the previous
node only if that allocation had been rescheduled or had failed. But we
currently always penalize the node, which leads to unnecessary
migrations on job updates.
This commit leaves in place the existing behavior where, if the
previous alloc was itself rescheduled, its previous nodes are also
penalized. This is conservative, but it is the right behavior,
especially on larger clusters where a group of hosts might be having
correlated trouble (like an AZ failure).
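As a rough illustration of the intended decision (made-up types and field
names, not the scheduler's real ones):
```
package main

import "fmt"

// prevAlloc is a simplified stand-in for the allocation being replaced.
type prevAlloc struct {
	NodeID        string
	ClientStatus  string   // e.g. "failed", "complete", "running"
	PreviousNodes []string // nodes from earlier reschedule attempts, if any
}

// penalizedNodes models the fix: the previous node is only penalized when the
// previous allocation actually failed or had itself been rescheduled, instead
// of unconditionally (which forced needless migrations on plain job updates).
func penalizedNodes(prev *prevAlloc) []string {
	if prev == nil {
		return nil
	}
	wasRescheduled := len(prev.PreviousNodes) > 0
	if prev.ClientStatus != "failed" && !wasRescheduled {
		// Healthy alloc being replaced by a job update: don't penalize its node.
		return nil
	}
	// Conservative behavior kept from before: penalize the previous node and,
	// if the alloc had already been rescheduled, its earlier nodes as well.
	return append([]string{prev.NodeID}, prev.PreviousNodes...)
}

func main() {
	updated := &prevAlloc{NodeID: "n1", ClientStatus: "running"}
	failed := &prevAlloc{NodeID: "n2", ClientStatus: "failed", PreviousNodes: []string{"n1"}}
	fmt.Println(penalizedNodes(updated)) // []
	fmt.Println(penalizedNodes(failed))  // [n2 n1]
}
```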
Co-Authored-By: Michael Schurter <mschurter@hashicorp.com>
Currently, when an alloc fails and is rescheduled, the alloc's desired
state remains "run" and the Nomad client may not free its resources.
Here, we ensure that an alloc is marked as stopped when it's
rescheduled.
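Conceptually the change is small; a hypothetical sketch with simplified field
names:
```
package main

import "fmt"

// Simplified stand-in for an allocation record.
type alloc struct {
	ID                 string
	DesiredStatus      string
	DesiredDescription string
	ClientStatus       string
}

// markRescheduled models the fix: when a failed alloc gets a replacement, its
// desired status is flipped to "stop" (with an explanatory description) so
// the client knows it can free the alloc's resources.
func markRescheduled(a *alloc) {
	a.DesiredStatus = "stop"
	a.DesiredDescription = "alloc was rescheduled because it failed"
}

func main() {
	a := &alloc{ID: "02aba49e", DesiredStatus: "run", ClientStatus: "failed"}
	markRescheduled(a)
	fmt.Printf("%+v\n", a)
}
```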
Notice the Desired Status and Description before and after this change:
Before:
```
mars-2:nomad notnoop$ nomad alloc status 02aba49e
ID = 02aba49e
Eval ID = bb9ed1d2
Name = example-reschedule.nodes[0]
Node ID = 5853d547
Node Name = mars-2.local
Job ID = example-reschedule
Job Version = 0
Client Status = failed
Client Description = Failed tasks
Desired Status = run
Desired Description = <none>
Created = 10s ago
Modified = 5s ago
Replacement Alloc ID = d6bf872b
Task "payload" is "dead"
Task Resources
CPU        Memory           Disk     Addresses
0/100 MHz  24 MiB/300 MiB   300 MiB
Task Events:
Started At = 2019-06-06T21:12:45Z
Finished At = 2019-06-06T21:12:50Z
Total Restarts = 0
Last Restart = N/A
Recent Events:
Time Type Description
2019-06-06T17:12:50-04:00 Not Restarting Policy allows no restarts
2019-06-06T17:12:50-04:00 Terminated Exit Code: 1
2019-06-06T17:12:45-04:00 Started Task started by client
2019-06-06T17:12:45-04:00 Task Setup Building Task Directory
2019-06-06T17:12:45-04:00 Received Task received by client
```
After:
```
ID = 5001ccd1
Eval ID = 53507a02
Name = example-reschedule.nodes[0]
Node ID = a3b04364
Node Name = mars-2.local
Job ID = example-reschedule
Job Version = 0
Client Status = failed
Client Description = Failed tasks
Desired Status = stop
Desired Description = alloc was rescheduled because it failed
Created = 13s ago
Modified = 3s ago
Replacement Alloc ID = 7ba7ac20
Task "payload" is "dead"
Task Resources
CPU         Memory           Disk     Addresses
21/100 MHz  24 MiB/300 MiB   300 MiB
Task Events:
Started At = 2019-06-06T21:22:50Z
Finished At = 2019-06-06T21:22:55Z
Total Restarts = 0
Last Restart = N/A
Recent Events:
Time Type Description
2019-06-06T17:22:55-04:00 Not Restarting Policy allows no restarts
2019-06-06T17:22:55-04:00 Terminated Exit Code: 1
2019-06-06T17:22:50-04:00 Started Task started by client
2019-06-06T17:22:50-04:00 Task Setup Building Task Directory
2019-06-06T17:22:50-04:00 Received Task received by client
```
Fix `TestServiceSched_NodeDown` so it checks that the migrated allocs
are actually marked to be stopped.
The boolean logic in the test made it skip checking the client status
whenever the desired status was stop.
Here, we mark some jobs for migration while leaving others as running,
and we check that the lost flag is only set for non-migrated allocs.
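A hypothetical illustration of the short-circuit the old assertion allowed,
next to the corrected shape of the check:
```
package main

import "fmt"

func main() {
	// A non-migrated alloc on the downed node: it should have been marked lost.
	desiredStatus := "stop"
	clientStatus := "running"

	// Buggy assertion shape: the || short-circuits as soon as the desired
	// status is "stop", so the client status is never actually verified.
	buggyOK := desiredStatus == "stop" || clientStatus == "lost"

	// Fixed shape: migrated allocs only need a desired status of "stop";
	// non-migrated allocs must additionally be marked lost on the client.
	migrated := false
	fixedOK := desiredStatus == "stop"
	if !migrated {
		fixedOK = fixedOK && clientStatus == "lost"
	}

	fmt.Println(buggyOK, fixedOK) // true false: the buggy check hides the failure
}
```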
This PR restores the older behavior of detecting successful batch
allocations (04d86ffd1006fde9dfb2ca8c1237fe60b995b0e3). A side effect is
that we now correctly identify allocations whose desired status is stop
but which are not successful batch allocations, and create their
replacements.
This PR fixes:
* An issue in which a node drain containing a complete batch alloc would cause a replacement.
* An issue in which allocations with the same name wouldn't be properly stopped during a scale down/stop event.
* An issue in which batch allocations from previous job versions may not have been stopped properly.
Fixes https://github.com/hashicorp/nomad/issues/3210
This PR enhances the distinct_property constraint such that a limit can
be specified in the RTarget/value parameter. This allows constraints
such as:
```
constraint {
  distinct_property = "${meta.rack}"
  value             = "2"
}
```
This restricts any given rack from running more than 2 allocations from
the task group.
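As a hypothetical sketch of the counting such a limit implies (simplified,
not Nomad's actual feasibility-checking code):
```
package main

import (
	"fmt"
	"strconv"
)

// nodeFeasible models the check implied by the constraint above: a node is
// only feasible if the number of existing allocations from this task group
// whose nodes share the candidate node's property value is below the limit.
func nodeFeasible(candidateValue string, existingValues []string, limit string) bool {
	limitN, err := strconv.Atoi(limit) // the RTarget/value field carries the limit, e.g. "2"
	if err != nil || limitN < 1 {
		return false
	}
	used := 0
	for _, v := range existingValues {
		if v == candidateValue {
			used++
		}
	}
	return used < limitN
}

func main() {
	// Two allocs already on rack "r1": a third placement on "r1" is rejected,
	// while a node on rack "r2" is still feasible.
	existing := []string{"r1", "r1", "r2"}
	fmt.Println(nodeFeasible("r1", existing, "2")) // false
	fmt.Println(nodeFeasible("r2", existing, "2")) // true
}
```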
Fixes https://github.com/hashicorp/nomad/issues/1146