580 Commits

Author SHA1 Message Date
Preetha Appan d061678df7
Fix static port preemption to be device aware 2018-11-02 13:07:24 -05:00
Preetha Appan 4182444937
Handle static port preemption when there are multiple devices
Also added test case
2018-11-02 09:09:50 -05:00
Preetha Appan fd60e66f86
Plumb alloc resource cache in a few more places.
also removed now unused method
2018-11-01 16:44:43 -05:00
Preetha Appan 78d635edca
More review comments 2018-11-01 16:36:11 -05:00
Preetha Appan 6e1023ba08
Cleaner way to exit early, and fixed a couple more places reading from alloc.Resources 2018-11-01 16:15:58 -05:00
Preetha Appan b4dd26247f
review comments 2018-11-01 12:01:59 -05:00
Preetha Appan d03201adf8
Fix formatting of allocation score metrics 2018-10-30 12:03:23 -05:00
Preetha Appan f1c3eb2792
Introduce interface with multiple implementations for resource distance 2018-10-30 11:06:32 -05:00
Preetha Appan 047af5141e
refactor preemption code to use method receivers and setters for common fields 2018-10-30 11:06:32 -05:00
Preetha Appan 1a5421f5d7
more minor cleanup 2018-10-30 11:06:32 -05:00
Preetha Appan 0494a098ce
More style and readability fixes from review 2018-10-30 11:06:32 -05:00
Preetha Appan 3910ba9bbd
Preempted allocations should be removed from proposed allocations 2018-10-30 11:06:32 -05:00
Preetha Appan 9dd76d83dc
comments 2018-10-30 11:06:32 -05:00
Preetha Appan e6234e3cc5
fix end to end scheduler test to use new resource structs correctly 2018-10-30 11:06:32 -05:00
Preetha Appan 8807c25b11
Modify preemption code to use new style of resource structs 2018-10-30 11:06:32 -05:00
Preetha Appan c1c1c230e4
Make preemption config a struct to allow for enabling based on scheduler type 2018-10-30 11:06:32 -05:00
Preetha Appan 25a047267f
Use scheduler config from state store to enable/disable preemption 2018-10-30 11:06:32 -05:00
Preetha Appan 1805032e69
Fix linting and better comments 2018-10-30 11:06:32 -05:00
Preetha Appan cc295b90de
Implement preemption for system jobs.
This commit implements an allocation selection algorithm for finding
allocations to preempt. It currently special cases network resource asks
from others (cpu/memory/disk/iops).
2018-10-30 11:06:32 -05:00
Preetha Appan 22aee7294e
Merge branch 'f-fix-resource-type' of github.com:hashicorp/nomad into f-fix-resource-type 2018-10-16 18:30:12 -05:00
Preetha Appan 53c3f8151b
fix linting 2018-10-16 18:29:49 -05:00
Alex Dadgar a78cefec18 use int64 2018-10-16 15:34:32 -07:00
Preetha Appan 7c0d8c646c
Change CPU/Disk/MemoryMB to int everywhere in new resource structs 2018-10-16 16:21:42 -05:00
Alex Dadgar f5a76d8411 review comments 2018-10-15 15:31:13 -07:00
Alex Dadgar 7ecd65109a Check constraints on devices 2018-10-14 13:35:47 -07:00
Alex Dadgar 5284554fcc rework device checker 2018-10-13 16:47:53 -07:00
Alex Dadgar 1089e13b14 add to stack 2018-10-13 12:27:49 -07:00
Alex Dadgar 9b5aaac410 Device feasibility checker 2018-10-13 12:27:49 -07:00
Preetha Appan 1574e898af
Fix bug in reconciler where terminal allocs on a job already stopped were unnecessarily updated 2018-10-08 21:03:49 -05:00
Alex Dadgar 01f8e5b95f renames 2018-10-04 14:57:25 -07:00
Alex Dadgar 52f9cd7637 fixing tests 2018-10-04 14:26:19 -07:00
Alex Dadgar bac5cb1e8b Scheduler uses allocated resources 2018-10-02 17:08:25 -07:00
Preetha Appan a10118c461 Add failed follow up to the list of allowed eval trigger reasons
needs unit test
2018-09-25 10:49:55 -07:00
Alex Dadgar 6a21f9fe96 Unique TriggerBy for blocked evals
Give blocked evals a unique triggerby reason to make debugging a chain
of evaluations easier.
2018-09-24 14:47:49 -07:00
Alex Dadgar 3c19d01d7a server 2018-09-15 16:23:13 -07:00
Alex Dadgar 3ba62efd5e Failed/paused deployments do not block migrations
This PR changes behavior of the scheduler such that a task group with a
deployment that is failed or paused will not cause the scheduler to skip
migrations.

The reason for this change is that it causes a bad UX when draining
nodes with allocations that are part of a failed/paused deployment.
These operations should not be coupled in any way and this remedies
that.

Prior behavior was still correct, but required either jobs to
transition to a healthy state or for the node to hit its drain
deadline.
2018-09-10 15:28:45 -07:00
Alex Dadgar cc92cd92cd
Merge pull request #4642 from hashicorp/b-vet
Fix vet errors and use newer go version in travis
2018-09-04 17:04:02 -07:00
Alex Dadgar c6576ddac1 Fix make check errors 2018-09-04 16:03:52 -07:00
Preetha Appan 751c0eb5a5
code review feedback 2018-09-04 16:10:11 -05:00
Preetha Appan 9bc0962527
Track top k nodes by norm score rather than top k nodes per scorer 2018-09-04 16:10:11 -05:00
Preetha Appan 6ed527c636
Use heap to store top K scoring nodes.
Scoring metadata is now aggregated by scorer type to make it easier
to parse when reading it in the CLI.
2018-09-04 16:10:11 -05:00
Preetha Appan 65cf4373b3
fix linting error 2018-09-04 16:10:11 -05:00
Preetha Appan dd5fe6373f
Fix scoring logic for uneven spread to incorporate current alloc count
Also addressed other small code review comments
2018-09-04 16:10:11 -05:00
Preetha Appan e72c0fe527
more cleanup 2018-09-04 16:10:11 -05:00
Preetha Appan 4c624424e6
added some unit tests for -1 spread score 2018-09-04 16:10:11 -05:00
Preetha Appan 92d37acc2a
comment and formatting cleanup 2018-09-04 16:10:11 -05:00
Preetha Appan 7b0a27cad6
fix scoring algorithm when min count == current count 2018-09-04 16:10:11 -05:00
Preetha Appan bad075f640
Remove hardcoded boosts for even spread.
instead, calculate them based on delta between current and minimum value
2018-09-04 16:10:11 -05:00
Preetha Appan c56873ff37
Implement support for even spread across datacenters, with unit test 2018-09-04 16:10:11 -05:00
Preetha Appan d091c00dd3
Support implicit spread target to account for remaining desired counts 2018-09-04 16:10:11 -05:00
Preetha Appan 33779abe5f
fix comments 2018-09-04 16:10:11 -05:00
Preetha Appan 5812f906c8
Allow empty spread targets, and validate target percentages. 2018-09-04 16:10:11 -05:00
Preetha Appan 55f276c189
Include spreads configured at job level when precomputing weights/desired counts. 2018-09-04 16:10:11 -05:00
Preetha Appan fbd0004707
Fix warnings 2018-09-04 16:10:11 -05:00
Preetha Appan db0d95b09c
Implement spread iterator that scores according to percentage of desired count in each target.
Added this as a new step in the stack and some unit tests
2018-09-04 16:10:11 -05:00
Preetha Appan eccf128c5c
Some minor changes from code review 2018-09-04 16:10:11 -05:00
Preetha Appan 038ed52877
Fix after rename to ConstraintSetContainsAny 2018-09-04 16:10:11 -05:00
Preetha Appan 3a39db3902
Fix linting 2018-09-04 16:10:11 -05:00
Preetha Appan d5cd2bbddb
Remove unnecessary reset 2018-09-04 16:10:11 -05:00
Preetha Appan dccb693221
test for setcontainsany, and treat set_contains same as set_contains_all 2018-09-04 16:10:11 -05:00
Preetha Appan 70bfd0c0cb
Address some review feedback 2018-09-04 16:10:11 -05:00
Preetha Appan 8685593ec0
Back out changes to propertyset that were not necessary for affinities 2018-09-04 16:10:11 -05:00
Preetha Appan 5eacd6ada4
Implement affinity support in generic scheduler 2018-09-04 16:10:11 -05:00
Alex Dadgar e1c239daae
Merge pull request #4414 from hashicorp/b-stop-summary
Reset Queued allocs to zero when job stopped
2018-07-16 14:32:55 -07:00
Nick Ethier 6b6777359b
scheduler: fix missing err assignment 2018-07-11 14:27:10 -04:00
Nick Ethier 5f6def5b04
scheduler: better error handling 2018-07-05 11:00:03 -04:00
Nick Ethier 030e650e78
scheduler: fix nil pointer exception 2018-07-02 16:05:38 -04:00
Alex Dadgar 300b1a7a15 Tests only use testlog package logger 2018-06-13 15:40:56 -07:00
Alex Dadgar c3c79c408e Reset Queued allocs to zero when job stopped
When a job is stopped but not purged, we should set the Queued count to
be zero.
2018-06-13 10:46:39 -07:00
Preetha Appan b64788043e
make test create index clearer 2018-06-05 17:29:59 -05:00
Preetha Appan 3e264dcb79
Fix reconciler bug with deployment not being created if job create index is different
This fixes an issue where if a job is purged and resubmitted Nomad does not create
a new deployment. Adds unit test that failed before this fix
2018-06-05 13:58:53 -05:00
Preetha Appan f8a23bc54a
fix test comment 2018-05-09 16:01:34 -05:00
Preetha Appan ef531b0f34
Add unit tests for forced rescheduling 2018-05-09 11:30:42 -05:00
Preetha Appan c1b92c284e
Work in progress - force rescheduling of failed allocs 2018-05-08 17:26:57 -05:00
Alex Dadgar 555d14fd92
Add test 2018-05-07 14:55:01 -05:00
Preetha Appan cf44670d56
Make sure that task group has a deployment state before using it 2018-05-07 14:55:01 -05:00
Alex Dadgar c6478d9469
clarify comment 2018-05-07 14:55:01 -05:00
Alex Dadgar 768fec8505
Allow healthy canary deployment to skip progress deadline 2018-05-07 14:55:01 -05:00
Alex Dadgar 8626c1b94a
Reschedule when we have canaries properly 2018-05-07 14:55:01 -05:00
Alex Dadgar 8dee3ab068
canary reschedule test 2018-05-07 14:50:01 -05:00
Alex Dadgar deb93dc7b7
Test for rescheduling when there are canaries 2018-05-07 14:50:01 -05:00
Alex Dadgar 550f5e31f8
Allow canary count greater than desired 2018-05-07 14:50:01 -05:00
Alex Dadgar f95ab4ade8
Mark canaries on creation, and unmark on promotion 2018-05-07 14:50:01 -05:00
Preetha Appan 5329900f6d
Only use DesiredTransition.Reschedule in reconciler when it's an active deployment 2018-05-07 14:50:01 -05:00
Alex Dadgar e7444c3873
Add test where deployment is marked as complete when done even with failed allocs 2018-05-07 14:50:01 -05:00
Alex Dadgar 57969b4ee0
fix reconcile tests 2018-05-07 14:50:01 -05:00
Alex Dadgar 5547974f35
Only reschedule allowed deployment allocs 2018-05-07 14:50:01 -05:00
Alex Dadgar fcf4f582d0
small review feedback fixes 2018-05-07 14:50:01 -05:00
Alex Dadgar 1336002255
Progress deadline in deployment state 2018-05-07 14:50:01 -05:00
Alex Dadgar ee50789c22
Initial implementation 2018-05-07 14:50:01 -05:00
Preetha Appan a569d34f25
Add custom status description for rescheduling follow up evals, and make unit test robust 2018-04-10 15:30:15 -05:00
Alex Dadgar e5b5803265 Only mark allocs as part of deployment if deployment is active 2018-04-05 15:40:49 -07:00
Preetha Appan 7e17bc231f
remove unnecessary check and other fixes from code review 2018-04-04 07:35:20 -05:00
Preetha Appan 00537c739b
Fixes edge cases around timing and task finish time being set more than once 2018-04-03 16:34:59 -05:00
Alex Dadgar 3aa4ee9d75 Fix lost handling of not actually down nodes 2018-03-30 14:17:41 -07:00
Preetha Appan d87e528059
rename skip->ignore and improve comment formatting 2018-03-29 15:11:10 -05:00
Preetha Appan 38a7614776
Refactored for readability, pair programmed with @dadgar 2018-03-29 13:28:37 -05:00
Preetha Appan 5090fefe96
Filter out allocs with DesiredState = stop, and unit tests 2018-03-29 09:28:52 -05:00
Alex Dadgar b18f789020 Unmark drain when nodes hit their deadline and only batch/system left and add all job type integration test 2018-03-28 17:25:58 -07:00
Preetha Appan d2899728fd
Fix linting 2018-03-28 12:26:28 -05:00