545 commits

Author SHA1 Message Date
Alex Dadgar 3ba62efd5e Failed/paused deployments do not block migrations
This PR changes behavior of the scheduler such that a task group with a
deployment that is failed or paused will not cause the scheduler to skip
migrations.

The reason for this change is that it causes a bad UX when draining
nodes with allocations that are part of a failed/paused deployment.
These operations should not be coupled in any way and this remedies
that.

Prior behavior was still correct, but required either jobs to
transition to a healthy state or for the node to hit its drain
deadline.
2018-09-10 15:28:45 -07:00
Alex Dadgar cc92cd92cd
Merge pull request #4642 from hashicorp/b-vet
Fix vet errors and use newer go version in travis
2018-09-04 17:04:02 -07:00
Alex Dadgar c6576ddac1 Fix make check errors 2018-09-04 16:03:52 -07:00
Preetha Appan 751c0eb5a5
code review feedback 2018-09-04 16:10:11 -05:00
Preetha Appan 9bc0962527
Track top k nodes by norm score rather than top k nodes per scorer 2018-09-04 16:10:11 -05:00
Preetha Appan 6ed527c636
Use heap to store top K scoring nodes.
Scoring metadata is now aggregated by scorer type to make it easier
to parse when reading it in the CLI.
2018-09-04 16:10:11 -05:00
Preetha Appan 65cf4373b3
fix linting error 2018-09-04 16:10:11 -05:00
Preetha Appan dd5fe6373f
Fix scoring logic for uneven spread to incorporate current alloc count
Also addressed other small code review comments
2018-09-04 16:10:11 -05:00
Preetha Appan e72c0fe527
more cleanup 2018-09-04 16:10:11 -05:00
Preetha Appan 4c624424e6
added some unit tests for -1 spread score 2018-09-04 16:10:11 -05:00
Preetha Appan 92d37acc2a
comment and formatting cleanup 2018-09-04 16:10:11 -05:00
Preetha Appan 7b0a27cad6
fix scoring algorithm when min count == current count 2018-09-04 16:10:11 -05:00
Preetha Appan bad075f640
Remove hardcoded boosts for even spread.
instead, calculate them based on delta between current and minimum value
2018-09-04 16:10:11 -05:00
Preetha Appan c56873ff37
Implement support for even spread across datacenters, with unit test 2018-09-04 16:10:11 -05:00
Preetha Appan d091c00dd3
Support implicit spread target to account for remaining desired counts 2018-09-04 16:10:11 -05:00
Preetha Appan 33779abe5f
fix comments 2018-09-04 16:10:11 -05:00
Preetha Appan 5812f906c8
Allow empty spread targets, and validate target percentages. 2018-09-04 16:10:11 -05:00
Preetha Appan 55f276c189
Include spreads configured at job level when precomputing weights/desired counts. 2018-09-04 16:10:11 -05:00
Preetha Appan fbd0004707
Fix warnings 2018-09-04 16:10:11 -05:00
Preetha Appan db0d95b09c
Implement spread iterator that scores according to percentage of desired count in each target.
Added this as a new step in the stack and some unit tests
2018-09-04 16:10:11 -05:00
Preetha Appan eccf128c5c
Some minor changes from code review 2018-09-04 16:10:11 -05:00
Preetha Appan 038ed52877
Fix after rename to ConstraintSetContainsAny 2018-09-04 16:10:11 -05:00
Preetha Appan 3a39db3902
Fix linting 2018-09-04 16:10:11 -05:00
Preetha Appan d5cd2bbddb
Remove unnecessary reset 2018-09-04 16:10:11 -05:00
Preetha Appan dccb693221
test for setcontainsany, and treat set_contains same as set_contains_all 2018-09-04 16:10:11 -05:00
Preetha Appan 70bfd0c0cb
Address some review feedback 2018-09-04 16:10:11 -05:00
Preetha Appan 8685593ec0
Back out changes to propertyset that were not necessary for affinities 2018-09-04 16:10:11 -05:00
Preetha Appan 5eacd6ada4
Implement affinity support in generic scheduler 2018-09-04 16:10:11 -05:00
Alex Dadgar e1c239daae
Merge pull request #4414 from hashicorp/b-stop-summary
Reset Queued allocs to zero when job stopped
2018-07-16 14:32:55 -07:00
Nick Ethier 6b6777359b
scheduler: fix missing err assignment 2018-07-11 14:27:10 -04:00
Nick Ethier 5f6def5b04
scheduler: better error handling 2018-07-05 11:00:03 -04:00
Nick Ethier 030e650e78
scheduler: fix nil pointer exception 2018-07-02 16:05:38 -04:00
Alex Dadgar 300b1a7a15 Tests only use testlog package logger 2018-06-13 15:40:56 -07:00
Alex Dadgar c3c79c408e Reset Queued allocs to zero when job stopped
When a job is stopped but not purged, we should set the Queued count to
be zero.
2018-06-13 10:46:39 -07:00
Preetha Appan b64788043e
make test create index clearer 2018-06-05 17:29:59 -05:00
Preetha Appan 3e264dcb79
Fix reconciler bug with deployment not being created if job create index is different
This fixes an issue where if a job is purged and resubmitted Nomad does not create
a new deployment. Adds unit test that failed before this fix
2018-06-05 13:58:53 -05:00
Preetha Appan f8a23bc54a
fix test comment 2018-05-09 16:01:34 -05:00
Preetha Appan ef531b0f34
Add unit tests for forced rescheduling 2018-05-09 11:30:42 -05:00
Preetha Appan c1b92c284e
Work in progress - force rescheduling of failed allocs 2018-05-08 17:26:57 -05:00
Alex Dadgar 555d14fd92
Add test 2018-05-07 14:55:01 -05:00
Preetha Appan cf44670d56
Make sure that task group has a deployment state before using it 2018-05-07 14:55:01 -05:00
Alex Dadgar c6478d9469
clarify comment 2018-05-07 14:55:01 -05:00
Alex Dadgar 768fec8505
Allow healthy canary deployment to skip progress deadline 2018-05-07 14:55:01 -05:00
Alex Dadgar 8626c1b94a
Reschedule when we have canaries properly 2018-05-07 14:55:01 -05:00
Alex Dadgar 8dee3ab068
canary reschedule test 2018-05-07 14:50:01 -05:00
Alex Dadgar deb93dc7b7
Test for rescheduling when there are canaries 2018-05-07 14:50:01 -05:00
Alex Dadgar 550f5e31f8
Allow canary count greater than desired 2018-05-07 14:50:01 -05:00
Alex Dadgar f95ab4ade8
Mark canaries on creation, and unmark on promotion 2018-05-07 14:50:01 -05:00
Preetha Appan 5329900f6d
Only use DesiredTransition.Reschedule in reconciler when it's an active deployment 2018-05-07 14:50:01 -05:00
Alex Dadgar e7444c3873
Add test where deployment is marked as complete when done even with failed allocs 2018-05-07 14:50:01 -05:00
Alex Dadgar 57969b4ee0
fix reconcile tests 2018-05-07 14:50:01 -05:00
Alex Dadgar 5547974f35
Only reschedule allowed deployment allocs 2018-05-07 14:50:01 -05:00
Alex Dadgar fcf4f582d0
small review feedback fixes 2018-05-07 14:50:01 -05:00
Alex Dadgar 1336002255
Progress deadline in deployment state 2018-05-07 14:50:01 -05:00
Alex Dadgar ee50789c22
Initial implementation 2018-05-07 14:50:01 -05:00
Preetha Appan a569d34f25
Add custom status description for rescheduling follow up evals, and make unit test robust 2018-04-10 15:30:15 -05:00
Alex Dadgar e5b5803265 Only mark allocs as part of deployment if deployment is active 2018-04-05 15:40:49 -07:00
Preetha Appan 7e17bc231f
remove unnecessary check and other fixes from code review 2018-04-04 07:35:20 -05:00
Preetha Appan 00537c739b
Fixes edge cases around timing and task finish time being set more than once 2018-04-03 16:34:59 -05:00
Alex Dadgar 3aa4ee9d75 Fix lost handling of not actually down nodes 2018-03-30 14:17:41 -07:00
Preetha Appan d87e528059
rename skip->ignore and improve comment formatting 2018-03-29 15:11:10 -05:00
Preetha Appan 38a7614776
Refactored for readability, pair programmed with @dadgar 2018-03-29 13:28:37 -05:00
Preetha Appan 5090fefe96
Filter out allocs with DesiredState = stop, and unit tests 2018-03-29 09:28:52 -05:00
Alex Dadgar b18f789020 Unmark drain when nodes hit their deadline and only batch/system left and add all job type integration test 2018-03-28 17:25:58 -07:00
Preetha Appan d2899728fd
Fix linting 2018-03-28 12:26:28 -05:00
Alex Dadgar 9d60e2cebf Correct status desc on draining system allocs 2018-03-26 17:54:46 -07:00
Preetha Appan 33e170c15d
s/linear/constant/g 2018-03-26 14:45:09 -05:00
Preetha 5668c3c38e
Merge pull request #4037 from hashicorp/b-fix-terminal-filtering-service-allocs
Fix edge case in reconciler
2018-03-26 13:14:51 -05:00
Preetha Appan 1b9e413a1a
one field per line in struct definition 2018-03-26 13:13:21 -05:00
Alex Dadgar e106da84de name and test 2018-03-26 11:06:21 -07:00
Alex Dadgar e2a6e64fca Don't create unnecessary deployments 2018-03-23 16:55:21 -07:00
Preetha Appan cbfd69ce7a
Fix edge case in reconciler where service jobs with ClientstatusComplete were not replaced 2018-03-23 18:41:00 -05:00
Alex Dadgar 3b72dd94ba Do not mark an allocation as an inplace update if specification hasn't changed 2018-03-23 14:36:05 -07:00
Michael Schurter cb61a4bdc7 Fix linting errors 2018-03-21 16:51:45 -07:00
Alex Dadgar 92b636dd32 Fix deadline handling 2018-03-21 16:51:44 -07:00
Michael Schurter 9263cc2ed7 scheduler: migrate non-terminal migrating allocs
filterByTainted node should always migrate non-terminal migrating allocs
2018-03-21 16:49:48 -07:00
Michael Schurter d1ec65d765 switch to new raft DesiredTransition message 2018-03-21 16:49:48 -07:00
Alex Dadgar db4a634072 RPC, FSM, State Store for marking DesiredTransistion
fix build tag
2018-03-21 16:49:48 -07:00
Michael Schurter c0542474db drain: initial drainv2 structs and impl 2018-03-21 16:49:48 -07:00
Chelsea Holland Komlo 329605b7cc fix up scheduling test 2018-03-21 15:54:03 -04:00
Chelsea Holland Komlo 60f12d206f improve comments; update watchDriver 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo d92703617c simplify logic
bump log level
2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo d8f68e5ef8 fix up codereview feedback 2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo c7fd0bd8a1 fix up scheduler mocks 2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo 3aa726baab fix scheduler driver name; create node structs file 2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo 3cba95e8a7 allow nomad to schedule based on the status of a client driver health check
Slight updates for go style
2018-03-21 15:15:25 -04:00
Preetha Appan 56e60e5840
Fix linting warning 2018-03-14 16:12:22 -05:00
Preetha Appan 9a5e6edf1f
Rename DelayCeiling to MaxDelay 2018-03-14 16:10:32 -05:00
Preetha Appan 3e96c6c4e0
Address more code review feedback 2018-03-14 16:10:32 -05:00
Preetha Appan 9fed0d2103
Get reschedule policy from the alloc directly 2018-03-14 16:10:32 -05:00
Preetha Appan e89bbf7289
Update comment about WaitTime 2018-03-14 16:10:32 -05:00
Preetha Appan e2656ef546
Cleaner handling of batched evals 2018-03-14 16:10:32 -05:00
Preetha Appan 47e0280d96
More small review feedback 2018-03-14 16:10:32 -05:00
Preetha Appan 2ba976dec8
Remove unnecessary check against 5 second window for determining immediate scheduling eligibility 2018-03-14 16:10:32 -05:00
Preetha Appan 5373ade731
Scheduler and Reconciler changes to support delayed rescheduling 2018-03-14 16:10:32 -05:00
Josh Soref e0f6a33fe5 spelling: system 2018-03-11 19:01:19 +00:00
Josh Soref a89e1b8395 spelling: strategy 2018-03-11 18:58:19 +00:00
Josh Soref f8eb766fb5 spelling: reschedulable 2018-03-11 18:48:12 +00:00
Josh Soref ed8db9992e spelling: feasibility 2018-03-11 18:07:09 +00:00
Josh Soref bf9283c606 spelling: corresponding 2018-03-11 17:51:41 +00:00