Alex Dadgar
a7ca737fb6
review comments
2018-11-07 11:31:52 -08:00
Alex Dadgar
36abd3a3d8
review comments
2018-11-07 10:33:22 -08:00
Alex Dadgar
e3cbb2c82e
allocs fit checks if devices get oversubscribed
2018-11-07 10:33:22 -08:00
Alex Dadgar
4f9b3ede87
Split device accounter and allocator
2018-11-07 10:32:03 -08:00
Alex Dadgar
6fa893c801
affinities
2018-11-07 10:32:03 -08:00
Alex Dadgar
feb83a2be3
assign devices
2018-11-07 10:32:03 -08:00
Alex Dadgar
6d8bb3a7bd
Duplicate blocked evals cancelling improved
...
The old logic for cancelling duplicate blocked evaluations by job ID had
a flaw: the newer evaluation could have additional node classes that it
is (in)eligible for, and those were not captured. As a result, cluster
state could change in a way that would let the job make progress, yet no
evaluation would be unblocked.
2018-11-07 10:08:23 -08:00
Preetha Appan
a6b714b81c
update preemption tests to use new node resource structs
...
also includes a fix to remove unnecessary subtraction of network mbits
2018-11-02 17:59:53 -05:00
Preetha
b2b52b1ada
Merge pull request #4794 from hashicorp/f-preemption-systemjobs
...
Preemption for system jobs
2018-11-02 16:28:06 -05:00
Preetha Appan
56de32f363
Address more minor code review feedback
2018-11-02 16:26:34 -05:00
Preetha Appan
253a351532
Fix test setup
2018-11-02 16:06:25 -05:00
Preetha Appan
fba24e5a8a
dereference safely
2018-11-02 15:58:59 -05:00
Preetha Appan
d061678df7
Fix static port preemption to be device aware
2018-11-02 13:07:24 -05:00
Preetha Appan
4182444937
Handle static port preemption when there are multiple devices
...
Also added test case
2018-11-02 09:09:50 -05:00
Preetha Appan
fd60e66f86
Plumb alloc resource cache in a few more places.
...
also removed now unused method
2018-11-01 16:44:43 -05:00
Preetha Appan
78d635edca
More review comments
2018-11-01 16:36:11 -05:00
Preetha Appan
6e1023ba08
Cleaner way to exit early, and fixed a couple more places reading from alloc.Resources
2018-11-01 16:15:58 -05:00
Preetha Appan
b4dd26247f
review comments
2018-11-01 12:01:59 -05:00
Preetha Appan
d03201adf8
Fix formatting of allocation score metrics
2018-10-30 12:03:23 -05:00
Preetha Appan
f1c3eb2792
Introduce interface with multiple implementations for resource distance
2018-10-30 11:06:32 -05:00
Preetha Appan
047af5141e
refactor preemption code to use method receivers and setters for common fields
2018-10-30 11:06:32 -05:00
Preetha Appan
1a5421f5d7
more minor cleanup
2018-10-30 11:06:32 -05:00
Preetha Appan
0494a098ce
More style and readability fixes from review
2018-10-30 11:06:32 -05:00
Preetha Appan
3910ba9bbd
Preempted allocations should be removed from proposed allocations
2018-10-30 11:06:32 -05:00
Preetha Appan
9dd76d83dc
comments
2018-10-30 11:06:32 -05:00
Preetha Appan
e6234e3cc5
fix end to end scheduler test to use new resource structs correctly
2018-10-30 11:06:32 -05:00
Preetha Appan
8807c25b11
Modify preemption code to use new style of resource structs
2018-10-30 11:06:32 -05:00
Preetha Appan
c1c1c230e4
Make preemption config a struct to allow for enabling based on scheduler type
2018-10-30 11:06:32 -05:00
Preetha Appan
25a047267f
Use scheduler config from state store to enable/disable preemption
2018-10-30 11:06:32 -05:00
Preetha Appan
1805032e69
Fix linting and better comments
2018-10-30 11:06:32 -05:00
Preetha Appan
cc295b90de
Implement preemption for system jobs.
...
This commit implements an allocation selection algorithm for finding
allocations to preempt. It currently handles network resource asks as a
special case, separately from the other resources (cpu/memory/disk/iops).
2018-10-30 11:06:32 -05:00
Preetha Appan
22aee7294e
Merge branch 'f-fix-resource-type' of github.com:hashicorp/nomad into f-fix-resource-type
2018-10-16 18:30:12 -05:00
Preetha Appan
53c3f8151b
fix linting
2018-10-16 18:29:49 -05:00
Alex Dadgar
a78cefec18
use int64
2018-10-16 15:34:32 -07:00
Preetha Appan
7c0d8c646c
Change CPU/Disk/MemoryMB to int everywhere in new resource structs
2018-10-16 16:21:42 -05:00
Alex Dadgar
f5a76d8411
review comments
2018-10-15 15:31:13 -07:00
Alex Dadgar
7ecd65109a
Check constraints on devices
2018-10-14 13:35:47 -07:00
Alex Dadgar
5284554fcc
rework device checker
2018-10-13 16:47:53 -07:00
Alex Dadgar
1089e13b14
add to stack
2018-10-13 12:27:49 -07:00
Alex Dadgar
9b5aaac410
Device feasibility checker
2018-10-13 12:27:49 -07:00
Preetha Appan
1574e898af
Fix bug in reconciler where terminal allocs on a job already stopped were unnecessarily updated
2018-10-08 21:03:49 -05:00
Alex Dadgar
01f8e5b95f
renames
2018-10-04 14:57:25 -07:00
Alex Dadgar
52f9cd7637
fixing tests
2018-10-04 14:26:19 -07:00
Alex Dadgar
bac5cb1e8b
Scheduler uses allocated resources
2018-10-02 17:08:25 -07:00
Preetha Appan
a10118c461
Add failed follow up to the list of allowed eval trigger reasons
...
needs unit test
2018-09-25 10:49:55 -07:00
Alex Dadgar
6a21f9fe96
Unique TriggerBy for blocked evals
...
Give blocked evals a unique triggerby reason to make debugging a chain
of evaluations easier.
2018-09-24 14:47:49 -07:00
Alex Dadgar
3c19d01d7a
server
2018-09-15 16:23:13 -07:00
Alex Dadgar
3ba62efd5e
Failed/paused deployments do not block migrations
...
This PR changes behavior of the scheduler such that a task group with a
deployment that is failed or paused will not cause the scheduler to skip
migrations.
The reason for this change is that it causes a bad UX when draining
nodes with allocations that are part of a failed/paused deployment.
These operations should not be coupled in any way and this remedies
that.
Prior behavior was still correct, but required either jobs to
transition to a healthy state or for the node to hit its drain
deadline.
2018-09-10 15:28:45 -07:00
Alex Dadgar
cc92cd92cd
Merge pull request #4642 from hashicorp/b-vet
...
Fix vet errors and use newer go version in travis
2018-09-04 17:04:02 -07:00
Alex Dadgar
c6576ddac1
Fix make check errors
2018-09-04 16:03:52 -07:00
Preetha Appan
751c0eb5a5
code review feedback
2018-09-04 16:10:11 -05:00
Preetha Appan
9bc0962527
Track top k nodes by norm score rather than top k nodes per scorer
2018-09-04 16:10:11 -05:00
Preetha Appan
6ed527c636
Use heap to store top K scoring nodes.
...
Scoring metadata is now aggregated by scorer type to make it easier
to parse when reading it in the CLI.
2018-09-04 16:10:11 -05:00
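As an illustration of the heap-based top-K tracking named in commit 6ed527c636, here is a minimal sketch built on Go's `container/heap`. The `nodeScore` type and the `topK` function are hypothetical names invented for this example and are not taken from the Nomad source; the sketch only shows the general technique of keeping a bounded min-heap so the lowest kept score can be evicted cheaply.

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
)

// nodeScore is a hypothetical record pairing a node ID with its
// normalized score.
type nodeScore struct {
	NodeID string
	Score  float64
}

// scoreHeap is a min-heap ordered by Score, so the lowest of the kept
// entries is always at the root.
type scoreHeap []nodeScore

func (h scoreHeap) Len() int            { return len(h) }
func (h scoreHeap) Less(i, j int) bool  { return h[i].Score < h[j].Score }
func (h scoreHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *scoreHeap) Push(x interface{}) { *h = append(*h, x.(nodeScore)) }
func (h *scoreHeap) Pop() interface{} {
	old := *h
	n := len(old)
	item := old[n-1]
	*h = old[:n-1]
	return item
}

// topK returns the k highest-scoring entries, best first.
func topK(scores []nodeScore, k int) []nodeScore {
	h := &scoreHeap{}
	for _, s := range scores {
		heap.Push(h, s)
		if h.Len() > k {
			heap.Pop(h) // evict the current lowest score
		}
	}
	out := append([]nodeScore(nil), (*h)...)
	sort.Slice(out, func(i, j int) bool { return out[i].Score > out[j].Score })
	return out
}

func main() {
	scores := []nodeScore{{"a", 0.2}, {"b", 0.9}, {"c", 0.5}, {"d", 0.7}}
	fmt.Println(topK(scores, 2))
}
```

Keeping the heap at size K means each new score costs O(log K) rather than requiring a full sort of every scored node.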
Preetha Appan
65cf4373b3
fix linting error
2018-09-04 16:10:11 -05:00
Preetha Appan
dd5fe6373f
Fix scoring logic for uneven spread to incorporate current alloc count
...
Also addressed other small code review comments
2018-09-04 16:10:11 -05:00
Preetha Appan
e72c0fe527
more cleanup
2018-09-04 16:10:11 -05:00
Preetha Appan
4c624424e6
added some unit tests for -1 spread score
2018-09-04 16:10:11 -05:00
Preetha Appan
92d37acc2a
comment and formatting cleanup
2018-09-04 16:10:11 -05:00
Preetha Appan
7b0a27cad6
fix scoring algorithm when min count == current count
2018-09-04 16:10:11 -05:00
Preetha Appan
bad075f640
Remove hardcoded boosts for even spread.
...
instead, calculate them based on delta between current and minimum value
2018-09-04 16:10:11 -05:00
Preetha Appan
c56873ff37
Implement support for even spread across datacenters, with unit test
2018-09-04 16:10:11 -05:00
Preetha Appan
d091c00dd3
Support implicit spread target to account for remaining desired counts
2018-09-04 16:10:11 -05:00
Preetha Appan
33779abe5f
fix comments
2018-09-04 16:10:11 -05:00
Preetha Appan
5812f906c8
Allow empty spread targets, and validate target percentages.
2018-09-04 16:10:11 -05:00
Preetha Appan
55f276c189
Include spreads configured at job level when precomputing weights/desired counts.
2018-09-04 16:10:11 -05:00
Preetha Appan
fbd0004707
Fix warnings
2018-09-04 16:10:11 -05:00
Preetha Appan
db0d95b09c
Implement spread iterator that scores according to percentage of desired count in each target.
...
Added this as a new step in the stack and some unit tests
2018-09-04 16:10:11 -05:00
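To make the idea in commit db0d95b09c concrete (scoring by the percentage of the desired count in each target), here is a small sketch. The formula, the function names, and the -1 penalty for a target with no desired allocations are all assumptions chosen for illustration; the commit message does not spell out the exact calculation.

```go
package main

import "fmt"

// desiredCount converts a target percentage into a desired number of
// allocations for a group with the given total count.
func desiredCount(percent, total int) float64 {
	return float64(percent) * float64(total) / 100
}

// spreadScore is positive while a target is below its desired count,
// zero when it is exactly full, and negative once it is over.
// Assumption: a target with no desired allocations gets a flat -1.
func spreadScore(desired float64, used int) float64 {
	if desired == 0 {
		return -1
	}
	return (desired - float64(used)) / desired
}

func main() {
	// Hypothetical spread: 60% of 10 allocations should land in dc1.
	desired := desiredCount(60, 10)
	for _, used := range []int{0, 3, 6, 9} {
		fmt.Printf("used=%d score=%.2f\n", used, spreadScore(desired, used))
	}
}
```

A score that shrinks as a target fills up nudges later placements toward targets that are still short of their share.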
Preetha Appan
eccf128c5c
Some minor changes from code review
2018-09-04 16:10:11 -05:00
Preetha Appan
038ed52877
Fix after rename to ConstraintSetContainsAny
2018-09-04 16:10:11 -05:00
Preetha Appan
3a39db3902
Fix linting
2018-09-04 16:10:11 -05:00
Preetha Appan
d5cd2bbddb
Remove unnecessary reset
2018-09-04 16:10:11 -05:00
Preetha Appan
dccb693221
test for setcontainsany, and treat set_contains same as set_contains_all
2018-09-04 16:10:11 -05:00
Preetha Appan
70bfd0c0cb
Address some review feedback
2018-09-04 16:10:11 -05:00
Preetha Appan
8685593ec0
Back out changes to propertyset that were not necessary for affinities
2018-09-04 16:10:11 -05:00
Preetha Appan
5eacd6ada4
Implement affinity support in generic scheduler
2018-09-04 16:10:11 -05:00
Alex Dadgar
e1c239daae
Merge pull request #4414 from hashicorp/b-stop-summary
...
Reset Queued allocs to zero when job stopped
2018-07-16 14:32:55 -07:00
Nick Ethier
6b6777359b
scheduler: fix missing err assignment
2018-07-11 14:27:10 -04:00
Nick Ethier
5f6def5b04
scheduler: better error handling
2018-07-05 11:00:03 -04:00
Nick Ethier
030e650e78
scheduler: fix nil pointer exception
2018-07-02 16:05:38 -04:00
Alex Dadgar
300b1a7a15
Tests only use testlog package logger
2018-06-13 15:40:56 -07:00
Alex Dadgar
c3c79c408e
Reset Queued allocs to zero when job stopped
...
When a job is stopped but not purged, we should set the Queued count to
be zero.
2018-06-13 10:46:39 -07:00
Preetha Appan
b64788043e
make test create index clearer
2018-06-05 17:29:59 -05:00
Preetha Appan
3e264dcb79
Fix reconciler bug with deployment not being created if job create index is different
...
This fixes an issue where if a job is purged and resubmitted Nomad does not create
a new deployment. Adds unit test that failed before this fix
2018-06-05 13:58:53 -05:00
Preetha Appan
f8a23bc54a
fix test comment
2018-05-09 16:01:34 -05:00
Preetha Appan
ef531b0f34
Add unit tests for forced rescheduling
2018-05-09 11:30:42 -05:00
Preetha Appan
c1b92c284e
Work in progress - force rescheduling of failed allocs
2018-05-08 17:26:57 -05:00
Alex Dadgar
555d14fd92
Add test
2018-05-07 14:55:01 -05:00
Preetha Appan
cf44670d56
Make sure that task group has a deployment state before using it
2018-05-07 14:55:01 -05:00
Alex Dadgar
c6478d9469
clarify comment
2018-05-07 14:55:01 -05:00
Alex Dadgar
768fec8505
Allow healthy canary deployment to skip progress deadline
2018-05-07 14:55:01 -05:00
Alex Dadgar
8626c1b94a
Reschedule when we have canaries properly
2018-05-07 14:55:01 -05:00
Alex Dadgar
8dee3ab068
canary reschedule test
2018-05-07 14:50:01 -05:00
Alex Dadgar
deb93dc7b7
Test for rescheduling when there are canaries
2018-05-07 14:50:01 -05:00
Alex Dadgar
550f5e31f8
Allow canary count greater than desired
2018-05-07 14:50:01 -05:00
Alex Dadgar
f95ab4ade8
Mark canaries on creation, and unmark on promotion
2018-05-07 14:50:01 -05:00
Preetha Appan
5329900f6d
Only use DesiredTransition.Reschedule in reconciler when it's an active deployment
2018-05-07 14:50:01 -05:00
Alex Dadgar
e7444c3873
Add test where deployment is marked as complete when done even with failed allocs
2018-05-07 14:50:01 -05:00
Alex Dadgar
57969b4ee0
fix reconcile tests
2018-05-07 14:50:01 -05:00
Alex Dadgar
5547974f35
Only reschedule allowed deployment allocs
2018-05-07 14:50:01 -05:00
Alex Dadgar
fcf4f582d0
small review feedback fixes
2018-05-07 14:50:01 -05:00
Alex Dadgar
1336002255
Progress deadline in deployment state
2018-05-07 14:50:01 -05:00
Alex Dadgar
ee50789c22
Initial implementation
2018-05-07 14:50:01 -05:00
Preetha Appan
a569d34f25
Add custom status description for rescheduling follow up evals, and make unit test robust
2018-04-10 15:30:15 -05:00
Alex Dadgar
e5b5803265
Only mark allocs as part of deployment if deployment is active
2018-04-05 15:40:49 -07:00
Preetha Appan
7e17bc231f
remove unnecessary check and other fixes from code review
2018-04-04 07:35:20 -05:00
Preetha Appan
00537c739b
Fixes edge cases around timing and task finish time being set more than once
2018-04-03 16:34:59 -05:00
Alex Dadgar
3aa4ee9d75
Fix lost handling of not actually down nodes
2018-03-30 14:17:41 -07:00
Preetha Appan
d87e528059
rename skip->ignore and improve comment formatting
2018-03-29 15:11:10 -05:00
Preetha Appan
38a7614776
Refactored for readability, pair programmed with @dadgar
2018-03-29 13:28:37 -05:00
Preetha Appan
5090fefe96
Filter out allocs with DesiredState = stop, and unit tests
2018-03-29 09:28:52 -05:00
Alex Dadgar
b18f789020
Unmark drain when nodes hit their deadline and only batch/system left and add all job type integration test
2018-03-28 17:25:58 -07:00
Preetha Appan
d2899728fd
Fix linting
2018-03-28 12:26:28 -05:00
Alex Dadgar
9d60e2cebf
Correct status desc on draining system allocs
2018-03-26 17:54:46 -07:00
Preetha Appan
33e170c15d
s/linear/constant/g
2018-03-26 14:45:09 -05:00
Preetha
5668c3c38e
Merge pull request #4037 from hashicorp/b-fix-terminal-filtering-service-allocs
...
Fix edge case in reconciler
2018-03-26 13:14:51 -05:00
Preetha Appan
1b9e413a1a
one field per line in struct definition
2018-03-26 13:13:21 -05:00
Alex Dadgar
e106da84de
name and test
2018-03-26 11:06:21 -07:00
Alex Dadgar
e2a6e64fca
Don't create unnecessary deployments
2018-03-23 16:55:21 -07:00
Preetha Appan
cbfd69ce7a
Fix edge case in reconciler where service jobs with ClientstatusComplete were not replaced
2018-03-23 18:41:00 -05:00
Alex Dadgar
3b72dd94ba
Do not mark an allocation as an inplace update if specification hasn't changed
2018-03-23 14:36:05 -07:00
Michael Schurter
cb61a4bdc7
Fix linting errors
2018-03-21 16:51:45 -07:00
Alex Dadgar
92b636dd32
Fix deadline handling
2018-03-21 16:51:44 -07:00
Michael Schurter
9263cc2ed7
scheduler: migrate non-terminal migrating allocs
...
filterByTainted node should always migrate non-terminal migrating allocs
2018-03-21 16:49:48 -07:00
Michael Schurter
d1ec65d765
switch to new raft DesiredTransition message
2018-03-21 16:49:48 -07:00
Alex Dadgar
db4a634072
RPC, FSM, State Store for marking DesiredTransition
...
fix build tag
2018-03-21 16:49:48 -07:00
Michael Schurter
c0542474db
drain: initial drainv2 structs and impl
2018-03-21 16:49:48 -07:00
Chelsea Holland Komlo
329605b7cc
fix up scheduling test
2018-03-21 15:54:03 -04:00
Chelsea Holland Komlo
60f12d206f
improve comments; update watchDriver
2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo
d92703617c
simplify logic
...
bump log level
2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo
d8f68e5ef8
fix up codereview feedback
2018-03-21 15:15:26 -04:00
Chelsea Holland Komlo
c7fd0bd8a1
fix up scheduler mocks
2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo
3aa726baab
fix scheduler driver name; create node structs file
2018-03-21 15:15:25 -04:00
Chelsea Holland Komlo
3cba95e8a7
allow nomad to schedule based on the status of a client driver health check
...
Slight updates for go style
2018-03-21 15:15:25 -04:00
Preetha Appan
56e60e5840
Fix linting warning
2018-03-14 16:12:22 -05:00
Preetha Appan
9a5e6edf1f
Rename DelayCeiling to MaxDelay
2018-03-14 16:10:32 -05:00
Preetha Appan
3e96c6c4e0
Address more code review feedback
2018-03-14 16:10:32 -05:00
Preetha Appan
9fed0d2103
Get reschedule policy from the alloc directly
2018-03-14 16:10:32 -05:00
Preetha Appan
e89bbf7289
Update comment about WaitTime
2018-03-14 16:10:32 -05:00
Preetha Appan
e2656ef546
Cleaner handling of batched evals
2018-03-14 16:10:32 -05:00
Preetha Appan
47e0280d96
More small review feedback
2018-03-14 16:10:32 -05:00
Preetha Appan
2ba976dec8
Remove unnecessary check against 5 second window for determining immediate scheduling eligibility
2018-03-14 16:10:32 -05:00
Preetha Appan
5373ade731
Scheduler and Reconciler changes to support delayed rescheduling
2018-03-14 16:10:32 -05:00
Josh Soref
e0f6a33fe5
spelling: system
2018-03-11 19:01:19 +00:00
Josh Soref
a89e1b8395
spelling: strategy
2018-03-11 18:58:19 +00:00
Josh Soref
f8eb766fb5
spelling: reschedulable
2018-03-11 18:48:12 +00:00
Josh Soref
ed8db9992e
spelling: feasibility
2018-03-11 18:07:09 +00:00
Josh Soref
bf9283c606
spelling: corresponding
2018-03-11 17:51:41 +00:00
Josh Soref
ca4ceb0e5c
spelling: commits
2018-03-11 17:47:45 +00:00
Preetha Appan
7b6ba7a1f4
Fixes bug in reconciler where previously rescheduled allocs are rescheduled again. Simplified logic and added test case to catch this.
2018-02-20 12:07:56 -06:00
Preetha Appan
7c57303dd2
Clarify comment
2018-02-05 16:37:07 -06:00
Preetha Appan
d48c411692
Reconciler should consider failed allocs when marking deployment as failed.
2018-02-02 19:40:25 -06:00
Preetha Appan
a1237d627a
code review feedback
2018-01-31 09:58:05 -06:00
Preetha Appan
5ad892026a
Add a field to track the next allocation during a replacement
2018-01-31 09:58:05 -06:00
Preetha Appan
2ed4de7e7b
Track previous node id correctly, plus unit test
2018-01-31 09:58:05 -06:00
Preetha Appan
dd4917c2f0
Add more clarification in comment
2018-01-31 09:58:05 -06:00
Preetha Appan
09bef7d1ce
Preallocate slice for skipped nodes
2018-01-31 09:58:05 -06:00
Preetha Appan
237beb49ae
Better score threshold
2018-01-31 09:58:05 -06:00
Preetha Appan
fa18c0def4
Add one more unit test
2018-01-31 09:58:05 -06:00
Preetha Appan
a75540cec6
Limit iterator uses a score threshold and a maxSkip value to be able to skip lower scoring nodes
2018-01-31 09:58:05 -06:00
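The limit iterator change in commit a75540cec6 can be pictured with the sketch below. It works on a plain slice of scores rather than Nomad's iterator types, and the backfill step (falling back to skipped entries when too few better ones turn up) is an assumption of this example rather than something the commit message states. The name `limitWithSkip` is hypothetical.

```go
package main

import "fmt"

// limitWithSkip returns up to limit scores. Scores at or below threshold
// are set aside, at most maxSkip of them, in the hope that better options
// follow. Assumption: skipped scores are used as a fallback if the input
// runs out before limit is reached.
func limitWithSkip(scores []float64, limit int, threshold float64, maxSkip int) []float64 {
	var picked, skipped []float64
	for _, s := range scores {
		if len(picked) == limit {
			break
		}
		if s <= threshold && len(skipped) < maxSkip {
			skipped = append(skipped, s)
			continue
		}
		picked = append(picked, s)
	}
	// Backfill from the skipped entries when not enough were picked.
	for _, s := range skipped {
		if len(picked) == limit {
			break
		}
		picked = append(picked, s)
	}
	return picked
}

func main() {
	scores := []float64{-1, 0.5, -1, 0.8}
	fmt.Println(limitWithSkip(scores, 2, 0, 2))
}
```

Bounding the number of skips keeps the amount of extra work per placement small even when many nodes score poorly.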
Preetha Appan
b6268a5fab
Beef up unit test for rescheduling batch jobs
2018-01-31 09:56:53 -06:00
Preetha Appan
ea4a889e28
Address more code review feedback
2018-01-31 09:56:53 -06:00
Preetha Appan
bd89d2b39e
Make sure that reschedule trackers are not added for node drain replacements
2018-01-31 09:56:53 -06:00
Preetha Appan
a662b38801
Improve reconciler unit tests
2018-01-31 09:56:53 -06:00
Preetha Appan
fee4ccf154
Prevent side effect modification of select options when preferred nodes are set
2018-01-31 09:56:53 -06:00
Preetha Appan
21b7b79d5d
Add helper methods, use require and other code review feedback
2018-01-31 09:56:53 -06:00
Preetha Appan
d0f9d59abb
Reconcile with changes to structs for reschedule tracking
2018-01-31 09:56:53 -06:00
Preetha Appan
fbb1936dee
Fix some comments and lint warnings, remove unused method
2018-01-31 09:56:53 -06:00
Preetha Appan
031c566ada
Reschedule previous allocs and track their reschedule attempts
2018-01-31 09:56:53 -06:00
Preetha Appan
fd2fbefa4c
Add a field to track the next allocation during a replacement
2018-01-24 17:55:05 -06:00
Alex Dadgar
6dda0ebaed
gofmt
2018-01-04 14:45:15 -08:00
Alex Dadgar
2f561609b7
Fix detection of successful batch allocations
...
This PR restores older behavior of detecting successful batch
allocations (04d86ffd1006fde9dfb2ca8c1237fe60b995b0e3). This has the
side effect that we correctly filter desired status stop but not
successful batch allocations and create their replacements.
2018-01-04 14:20:32 -08:00
Preetha
1712b03705
Merge branch 'master' into 0.8
2018-01-03 16:06:38 -06:00
Preetha Appan
51bd0b59c7
Return an error if evaluation doesn't exist in state store at plan apply time.
2017-12-18 14:55:36 -06:00
Preetha Appan
3c36abfe14
Update eval modify index as part of plan apply.
2017-12-18 10:03:55 -06:00
Preetha Appan
3b4d7ac2a3
Fix some typos
2017-12-14 13:29:27 -06:00
Michael Schurter
45494f7304
Fix port labels on mock Alloc/Job/Node
2017-12-08 14:50:06 -08:00
Alex Dadgar
44240ce440
Merge pull request #3375 from hashicorp/b-batch
...
Allow batch jobs to be rerun if purged
2017-10-13 17:11:45 -07:00
Alex Dadgar
c1cc51dbee
sync
2017-10-13 14:36:02 -07:00
Alex Dadgar
746cd7403f
Allow batch jobs to be rerun if purged
...
This PR allows batch jobs to be rerun if they have been purged.
2017-10-13 12:40:37 -07:00
Michael Schurter
a66c53d45a
Remove `structs` import from `api`
...
Goes a step further and removes structs import from api's tests as well
by moving GenerateUUID to its own package.
2017-09-29 10:36:08 -07:00
Alex Dadgar
4173834231
Enable more linters
2017-09-26 15:26:33 -07:00
Alex Dadgar
3904bde9a3
Fix batch handling of complete allocs/node drains
...
This PR fixes:
* An issue in which a node-drain that contains a complete batch alloc
would cause a replacement
* An issue in which allocations with the same name during a scale
down/stop event wouldn't be properly stopped.
* An issue in which batch allocations from previous job versions may not
have been stopped properly.
Fixes https://github.com/hashicorp/nomad/issues/3210
2017-09-14 15:08:57 -07:00
Alex Dadgar
84d06f6abe
Sync namespace changes
2017-09-07 17:04:21 -07:00
Alex Dadgar
0aef02a4f9
fix test
2017-08-21 14:07:54 -07:00
Alex Dadgar
27256ebcc6
Placing allocs counts towards placement limit
...
This PR makes placing new allocations count towards the limit. We do not
restrict how many new placements are made by the limit but we still
count towards the limit. This has the nice effect that if you have a
group with count = 5 and max_parallel = 1 but only 3 allocs exist for it
and a change is made, you will create 2 more at the new version but not
destroy one, taking you down to two running as you would have
previously.
Fixes https://github.com/hashicorp/nomad/issues/3053
2017-08-21 12:41:19 -07:00
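The arithmetic in commit 27256ebcc6 can be checked with a short sketch. It is a deliberately simplified model: every existing allocation is assumed to need a destructive update, and `planUpdates` is a hypothetical name, not a function from the scheduler.

```go
package main

import "fmt"

// planUpdates returns how many new allocations are placed and how many
// existing ones are destructively updated in one pass. Placements are
// never capped by maxParallel, but they do consume the limit.
func planUpdates(count, existing, maxParallel int) (place, destructive int) {
	place = count - existing
	if place < 0 {
		place = 0
	}
	limit := maxParallel - place
	if limit < 0 {
		limit = 0
	}
	destructive = limit
	if destructive > existing {
		destructive = existing
	}
	return place, destructive
}

func main() {
	// The case from the commit message: count = 5, max_parallel = 1,
	// and only 3 allocations exist.
	place, destructive := planUpdates(5, 3, 1)
	fmt.Println("place:", place, "destructive:", destructive)
}
```

With these inputs the two missing allocations are created at the new version and none of the three running allocations is destroyed in the same pass.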
Alex Dadgar
2453f13fc5
fixes
2017-08-15 12:27:05 -07:00
Alex Dadgar
0570e09feb
Fix panic occurring from improper bitmap size
...
This PR fixes an alignment calculation when determining the bitmap
size.
Fixes https://github.com/hashicorp/nomad/issues/3008
2017-08-12 15:37:02 -07:00
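For readers unfamiliar with the kind of alignment calculation commit 0570e09feb refers to, here is a minimal sketch. It assumes the bitmap is stored in whole bytes and therefore wants a size that is a multiple of 8; that requirement and the name `alignedBitmapSize` are assumptions of this example, not details given in the commit message.

```go
package main

import "fmt"

// alignedBitmapSize rounds n up to the next multiple of 8 so that the
// bitmap occupies a whole number of bytes.
func alignedBitmapSize(n uint) uint {
	if rem := n % 8; rem != 0 {
		n += 8 - rem
	}
	return n
}

func main() {
	for _, n := range []uint{0, 1, 8, 9, 17} {
		fmt.Println(n, "->", alignedBitmapSize(n))
	}
}
```

An off-by-one in this rounding is enough to make a later index fall outside the bitmap, which is the sort of failure that surfaces as a panic.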
Luke Farnell
f0ced87b95
fixed all spelling mistakes for goreport
2017-08-07 17:13:05 -04:00
Alex Dadgar
7b13c0d702
Lost allocs replaced even if deployment failed
...
This PR allows the scheduler to replace lost allocations even if the job
has a failed or paused deployment. The prior behavior was confusing to
users.
Fixes https://github.com/hashicorp/nomad/issues/2958
2017-08-03 17:42:14 -07:00
Alex Dadgar
7d2b84ab01
Review fixes
2017-08-01 14:18:52 -07:00
Alex Dadgar
2650bb1d12
Distinct Property supports arbitrary limit
...
This PR enhances the distinct_property constraint such that a limit can
be specified in the RTarget/value parameter. This allows constraints
such as:
```
constraint {
distinct_property = "${meta.rack}"
value = "2"
}
```
This restricts any given rack from running more than 2 allocations from
the task group.
Fixes https://github.com/hashicorp/nomad/issues/1146
2017-07-31 16:52:13 -07:00
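The limit check described in commit 2650bb1d12 can be sketched as follows. The helper names are hypothetical, and treating an empty value as a limit of 1 is an assumption made here to mirror the original "distinct" meaning; only the idea of parsing the limit from the constraint's value comes from the commit message.

```go
package main

import (
	"fmt"
	"strconv"
)

// parseLimit converts the constraint's value string into a limit.
// Assumption: an empty value means one allocation per property value.
func parseLimit(value string) (int, error) {
	if value == "" {
		return 1, nil
	}
	n, err := strconv.Atoi(value)
	if err != nil || n < 1 {
		return 0, fmt.Errorf("invalid distinct_property limit %q", value)
	}
	return n, nil
}

// canPlace reports whether another allocation may land on a node whose
// property (for example its rack) has the given value.
func canPlace(used map[string]int, propertyValue string, limit int) bool {
	return used[propertyValue] < limit
}

func main() {
	limit, err := parseLimit("2")
	if err != nil {
		panic(err)
	}
	// Hypothetical usage counts: rack r1 already runs two allocations.
	used := map[string]int{"r1": 2, "r2": 1}
	fmt.Println(canPlace(used, "r1", limit), canPlace(used, "r2", limit))
}
```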
Alex Dadgar
4f69355a66
Fix incorrect destructive update with distinct_property constraint
...
This PR fixes an issue in which an update to a task group with a
distinct property constraint would result in an incorrect destructive
update.
2017-07-31 11:17:35 -07:00
Michael Schurter
5f1f91a46c
Use go-testing-interface instead of testing
...
This drops the testing stdlib pkg from our dependencies. Saves a
whopping 46kb on our binary (was really hoping for more of a win there),
but also avoids potential ugliness with how testing sets flags.
2017-07-25 15:35:19 -07:00
Alex Dadgar
492239d3ee
Improve multiple group handling in a deployment
...
This PR resolves a bug in which a job with multiple task groups would
create new deployment objects each, thus clearing out all other task
groups deployment state.
2017-07-25 11:27:47 -07:00
Alex Dadgar
184bfd4836
Better comment
2017-07-20 12:31:08 -07:00
Alex Dadgar
248315a2d9
Handle destructive changes before placements
...
This PR updates the generic scheduler to handle destructive changes
before handling placements. This is important because the destructive
change may be due to a lowering of resources. If this is the case, the
handling of the destructive changes first may make it possible for the
placement to happen.
To reason about this imagine there is one node with CPU = 500.
If the group originally had:
* `count = 1`
* `cpu = 400`
And then the job was updated such that the group had:
* `count = 4`
* `cpu = 120`
If the original alloc isn't discounted first, nothing would be able to
place.
2017-07-20 12:24:27 -07:00
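The example in commit 248315a2d9 can be worked through in a few lines. This sketch only models CPU on a single node; `placeable` is a hypothetical helper written for the illustration.

```go
package main

import "fmt"

// placeable returns how many allocations asking for askCPU fit in the
// CPU left on a node after subtracting usedCPU.
func placeable(nodeCPU, usedCPU, askCPU int) int {
	free := nodeCPU - usedCPU
	if free < 0 || askCPU <= 0 {
		return 0
	}
	return free / askCPU
}

func main() {
	// Original alloc (cpu = 400) still counted: 100 CPU free.
	fmt.Println("without discounting:", placeable(500, 400, 120))
	// Destructive change handled first: the full 500 CPU is free.
	fmt.Println("with discounting:", placeable(500, 0, 120))
}
```

With the old allocation still counted, the 100 free CPU cannot hold a single 120 CPU allocation; once it is discounted, all four new allocations (4 x 120 = 480) fit within 500.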
Alex Dadgar
ce265e0aff
Update full node test to test more advanced case
2017-07-20 12:23:40 -07:00
Alex Dadgar
a9ec1d6ca7
Fix update limit calculation to avoid panic
...
This PR fixes the rolling update limit calculation to avoid a panic when
there are more allocations for a deployment that haven't determined
their health than the max_parallel count of the task group.
Fixes https://github.com/hashicorp/nomad/issues/2820
2017-07-19 11:11:47 -07:00
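A sketch of the failure mode behind commit a9ec1d6ca7: if the remaining update limit is computed by subtraction and then used as a slice bound, it must be clamped, or a negative value will panic at runtime. The function names and the slice-based framing are assumptions of this example, since the commit message does not show the code.

```go
package main

import "fmt"

// updateLimit returns how many more allocations may be updated, never
// going below zero even when more allocations are still waiting on a
// health determination than max_parallel allows.
func updateLimit(maxParallel, undetermined int) int {
	limit := maxParallel - undetermined
	if limit < 0 {
		return 0
	}
	return limit
}

// nextBatch picks the allocations to update in this pass.
func nextBatch(candidates []string, maxParallel, undetermined int) []string {
	limit := updateLimit(maxParallel, undetermined)
	if limit > len(candidates) {
		limit = len(candidates)
	}
	return candidates[:limit]
}

func main() {
	candidates := []string{"alloc-a", "alloc-b"}
	// Three allocations have undetermined health but max_parallel is 1.
	fmt.Println(len(nextBatch(candidates, 1, 3)))
}
```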
Alex Dadgar
22e84d00ab
Fix deep copy of driver config
2017-07-17 17:53:21 -07:00
Alex Dadgar
641e178416
Stop before trying to place
2017-07-17 17:18:12 -07:00
Alex Dadgar
66a90326e1
Treat destructive updates atomically
2017-07-16 10:35:38 -07:00
Alex Dadgar
f86760db3c
Basic logs
2017-07-07 16:49:08 -07:00
Alex Dadgar
20005f925a
Rolling node drains using max_parallel and stagger
...
This PR adds rolling node drains done at max_parallel and stagger of the
update spec. This brings it in line with the old behavior.
2017-07-07 12:12:48 -07:00
Alex Dadgar
3a29b38108
Status description shows requiring promotion
2017-07-07 12:12:48 -07:00
Alex Dadgar
9f016606aa
Fix some tests, eval monitor shows deployment id and deployment cancels based on version
2017-07-07 12:12:48 -07:00
Alex Dadgar
9aa1f2fea2
Respond to comments
2017-07-07 12:10:04 -07:00
Alex Dadgar
454083ba1b
Remove canary
2017-07-07 12:10:04 -07:00
Alex Dadgar
d352d85bb9
Test scheduler's handling of canaries/inplace updates
2017-07-07 12:10:04 -07:00
Alex Dadgar
83c60483f2
Test marking as complete
2017-07-07 12:10:04 -07:00
Alex Dadgar
477c713df5
Plan apply handles canaries and success is set via update
2017-07-07 12:10:04 -07:00
Alex Dadgar
1e8b5e75a5
Fix handling of failed job
2017-07-07 12:10:04 -07:00
Alex Dadgar
e229d3650b
Attach eval id
2017-07-07 12:10:04 -07:00
Alex Dadgar
af1935e1e1
Mark complete
2017-07-07 12:10:04 -07:00
Alex Dadgar
8424a3b380
Change canary handling
2017-07-07 12:10:04 -07:00
Alex Dadgar
c10d7ab871
Remove promoted bit from allocation
2017-07-07 12:10:04 -07:00
Alex Dadgar
09dfa2fc10
Rename CreateDeployments and remove cancelling behavior in state_store
2017-07-07 12:10:04 -07:00
Alex Dadgar
067ed86a47
Client watches for allocation health using task state and Consul checks
...
This PR adds watching of allocation health at the client. The client can
watch for health based on the tasks running on time and also based on
the consul checks passing.
2017-07-07 12:10:04 -07:00
Alex Dadgar
e7034691ea
deployment status
2017-07-07 12:07:07 -07:00
Alex Dadgar
d04877d23c
initial impl
2017-07-07 12:03:11 -07:00
Alex Dadgar
27a6e6b6d1
update description of the alloc update factory function
2017-07-07 12:03:11 -07:00
Alex Dadgar
ce2319be9b
cleanup limit detection
2017-07-07 12:03:11 -07:00
Alex Dadgar
b2573b01f9
Fix canary handling
2017-07-07 12:03:11 -07:00
Alex Dadgar
7952240d69
Deployment tests
2017-07-07 12:03:11 -07:00
Alex Dadgar
ce55559f12
Non-Canary/Deployment Tests
2017-07-07 12:03:11 -07:00
Alex Dadgar
d111dd5c10
Pull out in-place updating into a passed in function; reduce inputs to reconciler
2017-07-07 12:03:11 -07:00
Alex Dadgar
c77944ed29
assign names
2017-07-07 12:03:11 -07:00
Alex Dadgar
ecacd44888
handle batch filtering
2017-07-07 12:03:11 -07:00
Alex Dadgar
4c123500ee
Remove old
2017-07-07 12:03:11 -07:00
Alex Dadgar
270e26c600
Populate desired state per tg
2017-07-07 12:03:11 -07:00
Alex Dadgar
23dcd175ef
Show canaries on plan
2017-07-07 12:03:11 -07:00
Alex Dadgar
cf5baba808
handle annotations
2017-07-07 12:03:11 -07:00
Alex Dadgar
a46f7c3eb8
Todos
2017-07-07 12:03:11 -07:00
Alex Dadgar
00d962b8b5
Some comments and cleanup
2017-07-07 12:03:11 -07:00
Alex Dadgar
994ad285b7
Split reconcile file
2017-07-07 12:03:11 -07:00
Alex Dadgar
07b1c3e5db
Only upsert a job if the spec changes and push deployment creation into reconciler
2017-07-07 12:03:11 -07:00
Alex Dadgar
0d42b5d421
initial reconciler
2017-07-07 12:01:17 -07:00
Alex Dadgar
b3f4db0930
cancel deployments
2017-07-07 12:01:17 -07:00
Alex Dadgar
8169590d76
Fix tests
2017-05-01 13:54:26 -07:00
Alex Dadgar
5a2449d236
Respond to review comments
2017-04-19 10:54:03 -07:00
Alex Dadgar
3145086a42
non-purge deregisters
2017-04-15 17:08:05 -07:00
Alex Dadgar
2c31d4036b
Skip inplace update on terminal batch allocation
...
This PR skips adding an inplace update to a successfully terminal batch
job to the plan. This avoids extra data in the plan and avoids
triggering updates on all clients that have the terminal allocation.
This is matching behavior of the service scheduler.
/cc @armon for review
2017-03-11 17:19:22 -08:00
Alex Dadgar
bb12ff69a6
Fix in-place update
2017-03-09 22:03:10 -08:00
Alex Dadgar
601cbd7784
Feedback addressed
2017-03-09 21:36:27 -08:00
Alex Dadgar
b65d248dee
Fix filtering issue and add a test that would catch it
2017-03-09 16:20:39 -08:00
Alex Dadgar
7945e4564c
Refactor
2017-03-09 15:26:46 -08:00
Alex Dadgar
60c42f745a
Split distinct property and host iterator and add iterator to system stack
2017-03-08 19:00:10 -08:00
Alex Dadgar
319b24081f
cleanup
2017-03-08 17:57:31 -08:00
Alex Dadgar
a439bf709d
Property Set
2017-03-08 17:50:40 -08:00
Alex Dadgar
d83a8fe9f2
Unoptimized implementation + testing
2017-03-07 14:48:54 -08:00
Alex Dadgar
87d971a6b8
Double the anti-affinity for placing same task group on node
2017-03-06 11:52:53 -08:00
Alex Dadgar
5be806a3df
Fix vet script and fix vet problems
...
This PR fixes our vet script and fixes all the missed vet changes.
It also fixes pointers being printed in `nomad stop <job>` and `nomad
node-status <node>`.
2017-02-27 16:00:19 -08:00
Alex Dadgar
04862ca10e
Tests compile
2017-02-07 21:30:57 -08:00
Alex Dadgar
b69b357c7f
Nomad builds
2017-02-07 20:31:23 -08:00
Alex Dadgar
302a0cf382
Fix adjust test
2017-01-08 14:14:35 -08:00
Alex Dadgar
2c838a80f6
Detect newly created allocations properly
2017-01-08 13:55:03 -08:00
Alex Dadgar
8d5f0fea69
Merge pull request #2128 from hashicorp/f-dispatch
...
Nomad Constructor Jobs and Dispatch
2017-01-06 05:22:49 +08:00
Diptanu Choudhury
9cdd576720
Updated changelog and fixed tests
2016-12-20 11:32:17 -08:00
Alex Dadgar
a1dd78c24b
Scheduler combines meta from job > group > task
2016-12-15 17:08:38 -08:00
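One way to picture the layered merge named in commit a1dd78c24b is the sketch below. The precedence shown (a more specific level overrides a less specific one, so task wins over group, and group over job) is an assumption of this example; the subject line alone does not say which side wins. `combineMeta` is a hypothetical name.

```go
package main

import "fmt"

// combineMeta merges meta maps in order from least to most specific.
// Assumption: later layers override earlier ones on key conflicts.
func combineMeta(job, group, task map[string]string) map[string]string {
	out := make(map[string]string)
	for _, layer := range []map[string]string{job, group, task} {
		for k, v := range layer {
			out[k] = v
		}
	}
	return out
}

func main() {
	job := map[string]string{"owner": "platform", "env": "prod"}
	group := map[string]string{"env": "staging"}
	task := map[string]string{"tier": "web"}
	fmt.Println(combineMeta(job, group, task))
}
```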
Diptanu Choudhury
5191b4d33a
Making the status command return the allocs of currently registered job
2016-11-24 16:31:30 +01:00
Alex Dadgar
a1d08c2aba
Add scheduler version enforcement
2016-10-26 14:52:48 -07:00
Alex Dadgar
989827e402
Add set contains
2016-10-19 13:06:28 -07:00
Alex Dadgar
36cfe6e89e
Large refactor of task runner and Vault token rehandling
2016-10-18 11:24:20 -07:00
Ben Barnard
83f647ed84
Replace "the the" with "the" in documentation and comments
2016-10-11 15:31:40 -04:00
Diptanu Choudhury
dae7f88118
Not setting a drained node as preferred node ( #1740 )
2016-09-23 21:15:50 -07:00
Diptanu Choudhury
45afc0b4e1
Added logic to ensure scheduler knows job defn has been updated when ephemeral disks has been updated ( #1725 )
2016-09-21 14:00:02 -07:00
Alex Dadgar
bc500a536c
tasks updated
2016-09-21 11:31:09 -07:00
Diptanu Choudhury
36edabb487
Fixed the logic of calculating queued allocation in sys sched ( #1724 )
2016-09-20 12:05:19 -07:00
Alex Dadgar
683380c25c
Merge pull request #1715 from hashicorp/b-dead-system-nodes
...
Fix bug where dead nodes weren't properly handled by system scheduler
2016-09-19 11:49:44 -07:00
Alex Dadgar
47551e93b4
Fix bug in which dead nodes weren't being properly handled by system scheduler
2016-09-19 11:49:27 -07:00
Diptanu Choudhury
1b3c5e98c8
Renaming LocalDisk to EphemeralDisk ( #1710 )
...
Renaming LocalDisk to EphemeralDisk
2016-09-14 15:43:42 -07:00
Diptanu Choudhury
d94bb45ad3
Added some more comments
2016-08-31 14:06:31 -07:00
Diptanu Choudhury
52e9946da9
Implemented SetPrefferingNodes in stack
2016-08-30 16:17:50 -07:00
Diptanu Choudhury
bfee7b30a3
Introducing shared resources in alloc
2016-08-29 13:49:25 -07:00
Diptanu Choudhury
13497913f9
Ensuring resources are re-calculated properly in fsm
2016-08-26 20:13:11 -07:00
Diptanu Choudhury
e79cb67391
Changing implementation of AllocsFit
2016-08-26 17:28:29 -05:00
Diptanu Choudhury
3447658bba
Added scheduler tests to ensure disk constraints are honored
2016-08-25 15:31:56 -05:00
Diptanu Choudhury
ffaf6c6299
Fixed some tests
2016-08-25 13:56:39 -05:00
Diptanu Choudhury
ec73c768f1
Making the scheduler use LocalDisk instead of Resources.DiskMB
2016-08-25 12:27:42 -05:00
Diptanu Choudhury
c1a455983d
Added the chained alloc for system scheduler
2016-08-16 10:49:45 -07:00
Diptanu Choudhury
1de89776d7
Marking an allocation chained if we are creating this to replace an old one
2016-08-15 17:52:41 -07:00
Alex Dadgar
64f7eff612
Plan on system scheduler doesn't count nodes who don't meet constraints
2016-08-11 15:26:25 -07:00
Diptanu Choudhury
23fcb9f5c9
Ensuring system sched doesn't increment queued count when nodes are filtered
2016-08-10 14:33:13 -07:00
Diptanu Choudhury
13bab5b1ad
Added scheduler tests
2016-08-09 14:52:25 -07:00
Diptanu Choudhury
ab94c8eed9
Marking allocations which are not terminal and are on down nodes as lost
2016-08-09 13:11:58 -07:00
Alex Dadgar
e33bda76bf
test sched doesn't mark complete as lost + core_sched tests
2016-08-04 11:24:17 -07:00
Alex Dadgar
ac3328e812
Make scheduler mark allocations as lost
2016-08-03 15:57:46 -07:00
Alex Dadgar
3a9f3a31bc
KillTimeout can be modified in place
2016-08-01 20:19:12 -07:00
Alex Dadgar
e661c09898
fix filter logic
2016-07-28 15:57:56 -07:00
Alex Dadgar
ddbd9261c1
Merge pull request #1471 from hashicorp/b-handle-old-batch-allocs
...
filterCompleteAllocs filters replaced batch allocs
2016-07-28 14:31:19 -07:00
Diptanu Choudhury
eb08405467
Updated tests and added logic to system sched
2016-07-28 14:02:50 -07:00
Diptanu Choudhury
2e84d246f9
fixed a comment
2016-07-28 12:22:44 -07:00
Diptanu Choudhury
48eda99dd9
Setting the queued count as zero if there is nothing to place
2016-07-28 12:13:35 -07:00
Diptanu Choudhury
4a8636cb61
Added a test
2016-07-27 17:49:53 -07:00
Alex Dadgar
c132952ba2
filterCompleteAllocs filters replaced batch allocs
2016-07-27 11:54:55 -07:00
Diptanu Choudhury
d1a6bdb4ba
Making the queued allocations bind late
2016-07-25 22:11:11 -07:00
Diptanu Choudhury
d1682e052a
Added a test for adjustQueuedAllocations
2016-07-25 17:31:40 -07:00
Diptanu Choudhury
51cb201a09
Initializing the queued allocations late
2016-07-25 17:26:38 -07:00
Diptanu Choudhury
09aa867cc2
Added a test to ensure we record the queued allocations correctly when the plan made partial progress
2016-07-25 17:26:38 -07:00
Diptanu Choudhury
8f0d2a2775
Fixed some more tests
2016-07-25 17:26:38 -07:00
Diptanu Choudhury
4a17d8e6d6
Added a test to ensure failed batch allocations are being added to the number of queued allocations
2016-07-25 17:26:38 -07:00
Diptanu Choudhury
39bcfcd1c6
Added a test to ensure system scheduler records the correct number of queued allocations
2016-07-25 17:26:38 -07:00
Diptanu Choudhury
cce5f483ae
Added some more tests
2016-07-25 17:26:38 -07:00
Diptanu Choudhury
dabb83063b
Review comments
2016-07-25 17:26:38 -07:00
Diptanu Choudhury
50842b88c7
Fixed some bugs
2016-07-25 17:26:38 -07:00
Diptanu Choudhury
804ef1e932
Not setting the desired and client status of an allocation during in-place updates
2016-07-25 17:26:38 -07:00
Diptanu Choudhury
a64785417d
Fixed the logic for decrementing the count of queued based on plan result
2016-07-25 17:26:38 -07:00
Diptanu Choudhury
1cc0bc392b
Setting the number of queued allocations per task group
2016-07-25 17:26:38 -07:00
Diptanu Choudhury
487c66b84d
Removing the queued state of Job Summary and alloc desired status false
2016-07-13 13:20:46 -06:00
Alex Dadgar
e90529afc9
test for max plan
2016-06-20 17:56:49 -07:00
Alex Dadgar
67c0816726
Handle max plans
2016-06-20 17:43:02 -07:00
Sean Chittenden
a658299235
Misc typos
2016-06-16 16:17:17 -07:00
Alex Dadgar
d44c4761f6
track failed allocations properly
2016-06-15 12:58:19 -07:00
Alex Dadgar
8e231fa382
Rename ConsulService back to Service
2016-06-12 16:36:49 -07:00
Sean Chittenden
2f036231e5
Merge pull request #1201 from hashicorp/f-dyn-server-list
...
Dynamic Server Lists/Client Bootstrapping via consul.
2016-06-11 18:58:25 -04:00
Alex Dadgar
b064b392fc
Only unblock if missed class was added after eval snapshot index
2016-06-10 15:24:06 -07:00
Sean Chittenden
95c9d1a63e
Per-comment, remove structs.Allocation's Services attribute.
...
Nuke PopulateServiceIDs() now that it's also no longer needed.
2016-06-10 15:54:39 -04:00
Sean Chittenden
7956eb0c80
Rename structs.Task's `Service` attribute to `ConsulService`
2016-06-10 15:54:39 -04:00
Sean Chittenden
4973ec32bb
Rename structs.Services to structs.ConsulServices
2016-06-10 15:54:39 -04:00
Alex Dadgar
57770de1fc
Add eval-status and remove eval-monitor
2016-05-27 11:50:15 -07:00
Alex Dadgar
fb8d79a908
Blocked evals don't store TG alloc metrics
2016-05-27 11:26:14 -07:00
Alex Dadgar
6a236872b4
address comment
2016-05-25 10:30:47 -07:00
Alex Dadgar
3fd51ecece
Periodically unblock failed evaluations
2016-05-24 20:10:56 -07:00
Alex Dadgar
18d9e89065
Reuse the same evaluation and reblock it until there is no more work to do
2016-05-24 20:10:56 -07:00
Alex Dadgar
3cbb89c61e
Merge pull request #1188 from hashicorp/f-no-failed-allocs
...
Failed Allocation Metrics stored in Evaluation
2016-05-24 20:06:28 -07:00
Alex Dadgar
958d677248
comment
2016-05-24 18:18:10 -07:00
Alex Dadgar
fcc57fbc66
rename SpawnedBlockedEval and simplify map safety check
2016-05-24 18:12:59 -07:00
Alex Dadgar
7167b93ba9
Add test to verify drain doesn't restart successful batch and add to ignore list
2016-05-24 17:47:03 -07:00
Alex Dadgar
b5ad18a7ea
Dont restart successfully finished batch allocations
2016-05-24 17:23:18 -07:00
Alex Dadgar
1feb57b047
Evals track blocked evals they create
2016-05-19 13:09:52 -07:00
Alex Dadgar
8f5f12ae81
Scheduler no longer produces failed allocations; failed alloc metrics stored in evaluation
2016-05-18 18:11:40 -07:00
Alex Dadgar
117b926e2b
inplaceUpdate returns the allocs that were updated in-place
2016-05-17 15:37:37 -07:00
Alex Dadgar
a5ab96d40e
Merge pull request #1168 from hashicorp/f-plan-endpoint
...
Job.Plan endpoint
2016-05-16 13:15:40 -07:00
Alex Dadgar
a231f6f998
Switch to using the harness
2016-05-16 12:49:18 -07:00
Sean Chittenden
dc28ab0cb5
Speling police
2016-05-15 09:41:34 -07:00
Alex Dadgar
bed4cb7a9f
Fixes
2016-05-13 11:53:11 -07:00
Alex Dadgar
7a44ec5ccc
Remove plan from the response
2016-05-12 11:29:38 -07:00
Alex Dadgar
6d69e39966
Test task group update annotations
2016-05-11 16:31:50 -07:00
Alex Dadgar
81f0286dd8
Merge branch 'master' into f-plan-endpoint
2016-05-11 15:39:36 -07:00
Alex Dadgar
24bfaa70ac
Fix switching diff structures
2016-05-11 15:36:28 -07:00
Alex Dadgar
8b45e2c474
Check if network asks have changed when checking task updates
2016-05-05 21:32:01 -07:00
Alex Dadgar
ab0b57a9a1
Initial plan endpoint implementation - WIP
2016-05-05 11:21:58 -07:00
Alex Dadgar
ff0dd9b81c
Task is not eligible for update if User, Meta, or Resources change
2016-04-25 17:20:25 -07:00
Alex Dadgar
733156c016
vendor
2016-04-19 17:12:44 -07:00
Alex Dadgar
7dc1a525cb
more debug
2016-04-19 16:55:27 -07:00
Alex Dadgar
76e493dc16
base debugging
2016-04-19 16:33:25 -07:00
Alex Dadgar
1a31e5e137
Fix drained/batch allocations from continually migrating
2016-04-12 16:14:32 -07:00
Alex Dadgar
f021c1a7b0
filtering failed batch allocs
2016-04-11 12:51:53 -07:00
Alex Dadgar
034bae90bb
Revert "Remove client status from allocation TerminalStatus"
...
This reverts commit 819e1e4b3967c7029ee8221144666ff460fdd7ed.
2016-04-08 14:22:06 -07:00
Alex Dadgar
09f63fd3c0
Remove client status from allocation TerminalStatus
2016-03-25 12:53:37 -07:00
Alex Dadgar
b80e61a66c
Merge pull request #975 from hashicorp/f-rename-complete-alloc
...
Successful allocations are marked as complete instead of dead
2016-03-25 10:35:11 -07:00