Mahmood Ali
3eda42d027
tests: Migrated allocs aren't lost
...
Fix `TestServiceSched_NodeDown` so that it checks that the migrated allocs
are actually marked to be stopped.
The boolean logic in the test made it skip checking the client status
as long as the desired status was stop.
Here, we mark some jobs for migration while leaving others as running,
and we check that lost flag is only set for non-migrated allocs.
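A minimal sketch of the stricter check, for illustration only. It assumes a hypothetical, simplified `alloc` type with a `Migrate` flag rather than Nomad's real structs and test helpers. The key point is that the desired-status check no longer short-circuits the client-status check: every alloc must be stopped, migrated allocs must not be lost, and non-migrated allocs must be lost.

```go
package main

import "fmt"

// alloc is a hypothetical, simplified stand-in for an allocation; the real
// test works with Nomad's own allocation structs.
type alloc struct {
	ID            string
	DesiredStatus string
	ClientStatus  string
	Migrate       bool // true if the alloc was marked for migration
}

const (
	desiredStop = "stop"
	clientLost  = "lost"
)

// nodeDownViolations checks every alloc on the downed node: all must be
// marked to stop, and only allocs NOT marked for migration may be lost.
// The desired-status check never short-circuits the client-status check.
func nodeDownViolations(allocs []alloc) []string {
	var errs []string
	for _, a := range allocs {
		if a.DesiredStatus != desiredStop {
			errs = append(errs, fmt.Sprintf("%s: desired status %q, want %q", a.ID, a.DesiredStatus, desiredStop))
		}
		lost := a.ClientStatus == clientLost
		switch {
		case a.Migrate && lost:
			errs = append(errs, fmt.Sprintf("%s: migrated alloc marked lost", a.ID))
		case !a.Migrate && !lost:
			errs = append(errs, fmt.Sprintf("%s: non-migrated alloc not marked lost", a.ID))
		}
	}
	return errs
}

func main() {
	allocs := []alloc{
		{ID: "a1", DesiredStatus: desiredStop, ClientStatus: clientLost},
		{ID: "a2", DesiredStatus: desiredStop, ClientStatus: "running", Migrate: true},
	}
	fmt.Println(nodeDownViolations(allocs))
}
```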
2019-06-06 16:05:07 -04:00
Lang Martin
34230577df
describe a pending deployment with auto_promote accurately
2019-05-22 12:32:08 -04:00
Lang Martin
d462639cc9
sched reconcile copy AutoPromote to DeploymentState
2019-05-22 12:32:08 -04:00
Preetha Appan
374eee421f
Fix comment and assert score in test case
2019-05-15 12:35:57 -05:00
Nick Ethier
f0b9f8e37a
fix missing brace
2019-05-15 13:02:04 -04:00
Nick Ethier
0d851b5d11
scheduler: add check to prohibit returning inf during spread boost calculation
2019-05-15 13:00:24 -04:00
Lang Martin
29ea112586
system_sched & test cleanup comments
2019-05-01 12:25:26 -04:00
Lang Martin
c490dacf76
system_sched_test extend the test to check ineligible nodes
2019-05-01 12:25:26 -04:00
Lang Martin
c43bcbd35e
system_sched when a node is filtered, don't mark failure
2019-05-01 12:25:26 -04:00
Lang Martin
aecec5df1b
system_sched_test create partially constrained job
2019-05-01 12:25:26 -04:00
Arshneet Singh
d4e7a5c005
Add comments to functions, and use require instead of assert
2019-04-23 09:57:21 -07:00
Arshneet Singh
4cf4324b8f
Remove allowPlanOptimization from schedulers
2019-04-23 09:18:02 -07:00
Arshneet Singh
0dd4c109e8
Compat tags
2019-04-23 09:18:01 -07:00
Arshneet Singh
65f5fab131
Add tests for plan normalization
2019-04-23 09:18:01 -07:00
Arshneet Singh
b977748a4b
Add code for plan normalization
2019-04-23 09:18:01 -07:00
Danielle
198a838b61
Merge pull request #5512 from hashicorp/dani/f-alloc-stop
...
alloc-lifecycle: nomad alloc stop
2019-04-23 13:05:08 +02:00
Danielle Lancashire
832f607433
allocs: Add nomad alloc stop
...
This adds a `nomad alloc stop` command that can be used to stop and
force migrate an allocation to a different node.
This is built on top of the AllocUpdateDesiredTransitionRequest and
explicitly limits the scope of access to that transition to expose it
under the alloc-lifecycle ACL.
The API returns the follow-up eval that can be used as part of
monitoring in the CLI or parsed and used in an external tool.
2019-04-23 12:50:23 +02:00
Preetha Appan
bcb5c8c70d
remove stray new line
2019-04-12 10:32:48 -05:00
Preetha Appan
8ddc076c1d
Refactor scheduler package to enable preemption for batch/service jobs
2019-04-10 20:24:01 -05:00
James Rasell
9470507cf4
Add NodeName to the alloc/job status outputs.
...
Currently, when operators need to log onto a machine where an alloc
is running, they need to perform both an alloc/job status
call and a call to discover the node name from the node list.
This updates both the job status and alloc status output to include
the node name within the information to make operator use easier.
Closes #2359
Closes #1180
2019-04-10 10:34:10 -05:00
Preetha Appan
da1ce9bcea
Fix bug where scoring metadata would be overridden during an inplace upgrade.
2019-03-12 23:36:46 -05:00
Alex Dadgar
41265d4d61
Change types of weights on spread/affinity
2019-01-30 12:20:38 -08:00
Nick Ethier
24cbf42798
scheduler: fix NPE when deployment is nil, but placement is a canary
2019-01-28 20:22:59 -06:00
Alex Dadgar
5198ff05c3
convert driver to device for device constraint/attributes
2019-01-23 10:58:45 -08:00
Alex Dadgar
4bdccab550
goimports
2019-01-22 15:44:31 -08:00
Preetha Appan
3b054d6135
Remove unnecessary usage of alloc.Resource
2019-01-10 16:36:47 -06:00
Mahmood Ali
0dfa93a3c1
appease linter
2019-01-08 10:58:49 -05:00
Alex Dadgar
8a35d7b1dd
Test recovery
2019-01-07 14:49:41 -08:00
Preetha
f406e66ab8
Merge pull request #4881 from hashicorp/f-device-preemption
...
Device preemption
2018-12-11 18:34:19 -06:00
Preetha Appan
977a4a540d
Early continue after meeting needed count
...
Also adds another optimization that filters out unneeded allocations
as a final filtering step
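A rough sketch of the two steps described above: stop once the needed amount is met, then drop picks that turn out to be unnecessary. The `candidate` type, `selectForPreemption`, and the single scalar resource amount are hypothetical simplifications; the real preemption code tracks multiple resource dimensions.

```go
package main

import "fmt"

// candidate is a hypothetical preemptible allocation with a single scalar
// resource amount (e.g. device instances it holds).
type candidate struct {
	ID     string
	Amount int
}

// selectForPreemption greedily takes candidates, assumed to be pre-sorted in
// preferred preemption order, and stops as soon as the needed amount is met.
// A final pass then drops any picked allocation that is not needed to keep
// the total at or above `needed`. It returns nil if the need cannot be met.
func selectForPreemption(cands []candidate, needed int) []candidate {
	var picked []candidate
	total := 0
	for _, c := range cands {
		if total >= needed {
			break // early exit: the need is already met
		}
		picked = append(picked, c)
		total += c.Amount
	}
	if total < needed {
		return nil
	}
	// Final filtering step: walk backwards so removals don't disturb the
	// indices still to be visited.
	for i := len(picked) - 1; i >= 0; i-- {
		if total-picked[i].Amount >= needed {
			total -= picked[i].Amount
			picked = append(picked[:i], picked[i+1:]...)
		}
	}
	return picked
}

func main() {
	cands := []candidate{{"a", 1}, {"b", 2}, {"c", 8}}
	fmt.Println(selectForPreemption(cands, 8))
}
```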
2018-12-11 10:12:18 -06:00
Preetha Appan
f60c52c8ba
Score combinations of allocs from multiple devices for preemption
2018-12-07 18:35:47 -06:00
Alex Dadgar
1e3c3cb287
Deprecate IOPS
...
IOPS have been modelled as a resource since Nomad 0.1 but have never
actually been detected and there is no plan in the short term to add
detection. This is because IOPS is a bit simplistic of a unit to define
the performance requirements from the underlying storage system. In its
current state it adds unnecessary confusion and can be removed without
impacting any users. This PR leaves IOPS defined at the jobspec parsing
level and in the api/ resources since these are the two public uses of
the field. These should be considered deprecated and only exist to allow
users to stop using them during the Nomad 0.9.x release. In the future,
there should be no expectation that the field will exist.
2018-12-06 15:09:26 -08:00
Preetha Appan
63681fac0c
use structured logging everywhere consistently
2018-12-03 08:31:41 -06:00
Preetha Appan
766820def3
addresses some code clarity review comments
2018-11-27 11:02:06 -06:00
Mahmood Ali
96ffe044e7
Simplify map count update logic
...
Co-Authored-By: preetapan <preetha@hashicorp.com>
2018-11-27 10:03:11 -06:00
Mahmood Ali
57b94c2d50
code review suggestion
...
Co-Authored-By: preetapan <preetha@hashicorp.com>
2018-11-27 09:59:57 -06:00
Preetha Appan
86f416a984
Fix formatting
2018-11-16 20:45:52 -06:00
Preetha Appan
8efe6171e4
Fix preemption logic bug, need to group allocations by device first.
...
This ensures that the set of allocations chosen for preemption all share
the same device where ID is <vendor/type/device>
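A minimal sketch of grouping allocations by device before choosing a preemption set, assuming a hypothetical `deviceAlloc` record. The `<vendor/type/device>` key format comes from the message above; everything else is illustrative.

```go
package main

import "fmt"

// deviceAlloc is a hypothetical record of an allocation using one device.
type deviceAlloc struct {
	AllocID string
	Vendor  string
	Type    string
	Name    string
}

// deviceID builds the <vendor/type/device> key.
func (d deviceAlloc) deviceID() string {
	return d.Vendor + "/" + d.Type + "/" + d.Name
}

// groupByDevice buckets allocation IDs by device ID so that a preemption
// candidate set can be chosen from a single bucket instead of mixing devices.
func groupByDevice(allocs []deviceAlloc) map[string][]string {
	groups := make(map[string][]string)
	for _, a := range allocs {
		id := a.deviceID()
		groups[id] = append(groups[id], a.AllocID)
	}
	return groups
}

func main() {
	allocs := []deviceAlloc{
		{"a1", "nvidia", "gpu", "1080ti"},
		{"a2", "nvidia", "gpu", "2080ti"},
		{"a3", "nvidia", "gpu", "1080ti"},
	}
	fmt.Println(groupByDevice(allocs))
}
```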
2018-11-16 20:32:10 -06:00
Danielle Tomlinson
9c72dafc95
scheduler: Add is_set/is_not_set constraints
...
This adds constraints for asserting that a given attribute or value
exists, or does not exist. This acts as a companion to the = and !=
operators, e.g.:
```hcl
constraint {
  attribute = "${attrs.type}"
  operator  = "!="
  value     = "database"
}

constraint {
  attribute = "${attrs.type}"
  operator  = "is_set"
}
```
2018-11-15 11:00:32 -08:00
Preetha Appan
998968f57a
fix linting
2018-11-15 12:27:32 -06:00
Preetha Appan
e5de50fba8
Initial implementation of device preemption
2018-11-15 11:09:26 -06:00
Danielle Tomlinson
e5c641daa9
scheduler: Allow comparisons of nil values
...
This commit allows the ConstraintChecker to test values that do not exist.
This is useful when you want to _exclude_ given nodes from executing a
job. For example, if you give canary nodes an attribute and do not want
to run critical services on them, you may specify something like the
constraint below without having to tag all other nodes with the inverse.
```hcl
constraint {
  attribute = "${node.attr.canary}"
  operator  = "!="
  value     = "1"
}
```
This also requires all constraint checkers to allow for nil target
values, as they will no longer be short circuited by resolving a target.
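The effect of allowing missing values, together with the `is_set`/`is_not_set` operators from the commit above, can be sketched as a small operator switch. This is a hypothetical, simplified `checkConstraint` over string values covering only a few operators, not Nomad's actual constraint checker; `lFound` stands in for whether the attribute resolved on the node.

```go
package main

import "fmt"

// checkConstraint is a hypothetical sketch (not Nomad's ConstraintChecker)
// covering a few operators. lFound reports whether the node has the
// attribute at all; lVal is only meaningful when lFound is true.
func checkConstraint(operand, lVal string, lFound bool, rVal string) bool {
	switch operand {
	case "is_set":
		return lFound
	case "is_not_set":
		return !lFound
	case "=", "==", "is":
		// A missing attribute can never equal the target value.
		return lFound && lVal == rVal
	case "!=", "not":
		// A missing attribute counts as "not equal", so nodes without the
		// canary attribute still satisfy the constraint shown above.
		return !lFound || lVal != rVal
	default:
		return false // unsupported in this sketch
	}
}

func main() {
	fmt.Println(checkConstraint("!=", "", false, "1")) // node without the canary attribute
	fmt.Println(checkConstraint("!=", "1", true, "1")) // canary node
}
```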
2018-11-13 13:36:51 -08:00
Alex Dadgar
08dc2ea702
Merge pull request #4867 from hashicorp/b-deployment-progress-deadline
...
Blocked evaluation fixes
2018-11-13 10:29:03 -08:00
Preetha Appan
de890b9d5c
blank line
2018-11-12 15:50:14 -06:00
Preetha Appan
20af09a1ef
Fix logic bug in tracking sum of matched affinity weights
...
We need to track the sum of matching weights per device, but only
change the final return value if it's the highest-scoring choice
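A small sketch of the corrected bookkeeping, assuming a hypothetical `matches` map from device name to the weights of the affinities that device satisfies. The running sum is reset per device, and the returned choice only changes when a device beats the best total seen so far.

```go
package main

import (
	"fmt"
	"math"
)

// bestDeviceAffinity is a hypothetical sketch: matches[d] holds the weights
// of the affinities that device d satisfies. Each device gets its own
// running sum, and the returned choice only changes when a device's sum
// beats the best seen so far.
func bestDeviceAffinity(devices []string, matches map[string][]float64) (string, float64) {
	best, bestSum := "", math.Inf(-1)
	for _, d := range devices {
		sum := 0.0 // reset for every device
		for _, w := range matches[d] {
			sum += w
		}
		if sum > bestSum {
			best, bestSum = d, sum
		}
	}
	return best, bestSum
}

func main() {
	devices := []string{"gpu0", "gpu1"}
	matches := map[string][]float64{"gpu0": {50, -20}, "gpu1": {40}}
	fmt.Println(bestDeviceAffinity(devices, matches))
}
```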
2018-11-12 15:06:45 -06:00
Preetha Appan
285b9b6001
Normalize scores correctly
2018-11-08 17:01:58 -06:00
Preetha Appan
f20f2ca8e9
Fixes device scheduling unit tests
...
Also changes the logic for score when there is more than one task
requesting a device. Since inter task affinities are already normalized,
we take the average of the scores across tasks.
2018-11-08 10:31:19 -06:00
Alex Dadgar
dbb05357bc
fix test
2018-11-07 11:59:24 -08:00
Alex Dadgar
a7ca737fb6
review comments
2018-11-07 11:31:52 -08:00
Alex Dadgar
36abd3a3d8
review comments
2018-11-07 10:33:22 -08:00
Alex Dadgar
e3cbb2c82e
allocs fit checks if devices get oversubscribed
2018-11-07 10:33:22 -08:00
Alex Dadgar
4f9b3ede87
Split device accounter and allocator
2018-11-07 10:32:03 -08:00
Alex Dadgar
6fa893c801
affinities
2018-11-07 10:32:03 -08:00
Alex Dadgar
feb83a2be3
assign devices
2018-11-07 10:32:03 -08:00
Alex Dadgar
6d8bb3a7bd
Duplicate blocked evals cancelling improved
...
The old logic for cancelling duplicate blocked evaluations by job ID had
the issue that the newer evaluation could have additional node classes
that it is (in)eligible for, which we would not capture. As a result, the
cluster state could change in a way that would let the job make progress
while no evaluation was unblocked.
2018-11-07 10:08:23 -08:00
Preetha Appan
a6b714b81c
update preemption tests to use new node resource structs
...
also includes a fix to remove unnecessary subtraction of network mbits
2018-11-02 17:59:53 -05:00
Preetha
b2b52b1ada
Merge pull request #4794 from hashicorp/f-preemption-systemjobs
...
Preemption for system jobs
2018-11-02 16:28:06 -05:00
Preetha Appan
56de32f363
Address more minor code review feedback
2018-11-02 16:26:34 -05:00
Preetha Appan
253a351532
Fix test setup
2018-11-02 16:06:25 -05:00
Preetha Appan
fba24e5a8a
dereference safely
2018-11-02 15:58:59 -05:00
Preetha Appan
d061678df7
Fix static port preemption to be device aware
2018-11-02 13:07:24 -05:00
Preetha Appan
4182444937
Handle static port preemption when there are multiple devices
...
Also added test case
2018-11-02 09:09:50 -05:00
Preetha Appan
fd60e66f86
Plumb alloc resource cache in a few more places.
...
also removed now unused method
2018-11-01 16:44:43 -05:00
Preetha Appan
78d635edca
More review comments
2018-11-01 16:36:11 -05:00
Preetha Appan
6e1023ba08
Cleaner way to exit early, and fixed a couple more places reading from alloc.Resources
2018-11-01 16:15:58 -05:00
Preetha Appan
b4dd26247f
review comments
2018-11-01 12:01:59 -05:00
Preetha Appan
d03201adf8
Fix formatting of allocation score metrics
2018-10-30 12:03:23 -05:00
Preetha Appan
f1c3eb2792
Introduce interface with multiple implementations for resource distance
2018-10-30 11:06:32 -05:00
Preetha Appan
047af5141e
refactor preemption code to use method receivers and setters for common fields
2018-10-30 11:06:32 -05:00
Preetha Appan
1a5421f5d7
more minor cleanup
2018-10-30 11:06:32 -05:00
Preetha Appan
0494a098ce
More style and readability fixes from review
2018-10-30 11:06:32 -05:00
Preetha Appan
3910ba9bbd
Preempted allocations should be removed from proposed allocations
2018-10-30 11:06:32 -05:00
Preetha Appan
9dd76d83dc
comments
2018-10-30 11:06:32 -05:00
Preetha Appan
e6234e3cc5
fix end to end scheduler test to use new resource structs correctly
2018-10-30 11:06:32 -05:00
Preetha Appan
8807c25b11
Modify preemption code to use new style of resource structs
2018-10-30 11:06:32 -05:00
Preetha Appan
c1c1c230e4
Make preemption config a struct to allow for enabling based on scheduler type
2018-10-30 11:06:32 -05:00
Preetha Appan
25a047267f
Use scheduler config from state store to enable/disable preemption
2018-10-30 11:06:32 -05:00
Preetha Appan
1805032e69
Fix linting and better comments
2018-10-30 11:06:32 -05:00
Preetha Appan
cc295b90de
Implement preemption for system jobs.
...
This commit implements an allocation selection algorithm for finding
allocations to preempt. It currently special-cases network resource asks,
handling them separately from the others (cpu/memory/disk/iops).
2018-10-30 11:06:32 -05:00
Preetha Appan
22aee7294e
Merge branch 'f-fix-resource-type' of github.com:hashicorp/nomad into f-fix-resource-type
2018-10-16 18:30:12 -05:00
Preetha Appan
53c3f8151b
fix linting
2018-10-16 18:29:49 -05:00
Alex Dadgar
a78cefec18
use int64
2018-10-16 15:34:32 -07:00
Preetha Appan
7c0d8c646c
Change CPU/Disk/MemoryMB to int everywhere in new resource structs
2018-10-16 16:21:42 -05:00
Alex Dadgar
f5a76d8411
review comments
2018-10-15 15:31:13 -07:00
Alex Dadgar
7ecd65109a
Check constraints on devices
2018-10-14 13:35:47 -07:00
Alex Dadgar
5284554fcc
rework device checker
2018-10-13 16:47:53 -07:00
Alex Dadgar
1089e13b14
add to stack
2018-10-13 12:27:49 -07:00
Alex Dadgar
9b5aaac410
Device feasibility checker
2018-10-13 12:27:49 -07:00
Preetha Appan
1574e898af
Fix bug in reconciler where terminal allocs on a job already stopped were unnecessarily updated
2018-10-08 21:03:49 -05:00
Alex Dadgar
01f8e5b95f
renames
2018-10-04 14:57:25 -07:00
Alex Dadgar
52f9cd7637
fixing tests
2018-10-04 14:26:19 -07:00
Alex Dadgar
bac5cb1e8b
Scheduler uses allocated resources
2018-10-02 17:08:25 -07:00
Preetha Appan
a10118c461
Add failed follow up to the list of allowed eval trigger reasons
...
needs unit test
2018-09-25 10:49:55 -07:00
Alex Dadgar
6a21f9fe96
Unique TriggerBy for blocked evals
...
Give blocked evals a unique TriggerBy reason to make debugging a chain
of evaluations easier.
2018-09-24 14:47:49 -07:00
Alex Dadgar
3c19d01d7a
server
2018-09-15 16:23:13 -07:00
Alex Dadgar
3ba62efd5e
Failed/paused deployments do not block migrations
...
This PR changes behavior of the scheduler such that a task group with a
deployment that is failed or paused will not cause the scheduler to skip
migrations.
The reason for this change is that the previous behavior caused a bad UX
when draining nodes with allocations that are part of a failed/paused
deployment. These operations should not be coupled in any way and this
remedies that.
Prior behavior was still correct, but required either jobs to
transition to a healthy state or the node to hit its drain
deadline.
2018-09-10 15:28:45 -07:00
Alex Dadgar
cc92cd92cd
Merge pull request #4642 from hashicorp/b-vet
...
Fix vet errors and use newer go version in travis
2018-09-04 17:04:02 -07:00
Alex Dadgar
c6576ddac1
Fix make check errors
2018-09-04 16:03:52 -07:00
Preetha Appan
751c0eb5a5
code review feedback
2018-09-04 16:10:11 -05:00
Preetha Appan
9bc0962527
Track top k nodes by norm score rather than top k nodes per scorer
2018-09-04 16:10:11 -05:00
Preetha Appan
6ed527c636
Use heap to store top K scoring nodes.
...
Scoring metadata is now aggregated by scorer type to make it easier
to parse when reading it in the CLI.
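Keeping only the K best nodes maps naturally onto a min-heap whose root is the weakest node retained so far. The sketch below uses Go's standard `container/heap`; `scoredNode` and `topK` are hypothetical names, and the scores stand in for normalized node scores.

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
)

// scoredNode is a hypothetical node with its normalized score.
type scoredNode struct {
	Name  string
	Score float64
}

// minHeap keeps the lowest score at the root so the weakest of the current
// top K can be evicted in O(log K).
type minHeap []scoredNode

func (h minHeap) Len() int            { return len(h) }
func (h minHeap) Less(i, j int) bool  { return h[i].Score < h[j].Score }
func (h minHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x interface{}) { *h = append(*h, x.(scoredNode)) }
func (h *minHeap) Pop() interface{} {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// topK returns the k highest-scoring nodes, best first.
func topK(nodes []scoredNode, k int) []scoredNode {
	h := &minHeap{}
	for _, n := range nodes {
		if h.Len() < k {
			heap.Push(h, n)
		} else if k > 0 && n.Score > (*h)[0].Score {
			// Replace the weakest retained node and restore heap order.
			(*h)[0] = n
			heap.Fix(h, 0)
		}
	}
	out := append([]scoredNode(nil), (*h)...)
	sort.Slice(out, func(i, j int) bool { return out[i].Score > out[j].Score })
	return out
}

func main() {
	nodes := []scoredNode{{"a", 0.1}, {"b", 0.9}, {"c", 0.5}, {"d", 0.7}}
	fmt.Println(topK(nodes, 2))
}
```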
2018-09-04 16:10:11 -05:00
Preetha Appan
65cf4373b3
fix linting error
2018-09-04 16:10:11 -05:00
Preetha Appan
dd5fe6373f
Fix scoring logic for uneven spread to incorporate current alloc count
...
Also addressed other small code review comments
2018-09-04 16:10:11 -05:00
Preetha Appan
e72c0fe527
more cleanup
2018-09-04 16:10:11 -05:00
Preetha Appan
4c624424e6
added some unit tests for -1 spread score
2018-09-04 16:10:11 -05:00
Preetha Appan
92d37acc2a
comment and formatting cleanup
2018-09-04 16:10:11 -05:00
Preetha Appan
7b0a27cad6
fix scoring algorithm when min count == current count
2018-09-04 16:10:11 -05:00
Preetha Appan
bad075f640
Remove hardcoded boosts for even spread.
...
instead, calculate them based on delta between current and minimum value
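One way to compute a boost from the delta between current and minimum counts, as a hypothetical sketch; the formula below is an assumption, not necessarily the one Nomad uses. Each branch divides by a value known to be nonzero, which is the concern behind the later commit in this log that prohibits returning inf during spread boost calculation.

```go
package main

import "fmt"

// evenSpreadBoost is a hypothetical sketch of a delta-based boost for even
// spread. current is the alloc count for the candidate node's attribute
// value; minCount and maxCount are the smallest and largest counts across
// all values. Every division has a nonzero denominator, so the result is
// never Inf or NaN.
func evenSpreadBoost(current, minCount, maxCount float64) float64 {
	switch {
	case current > minCount:
		// Penalize values that are ahead of the least-loaded one.
		// current > minCount >= 0, so current is nonzero.
		return (minCount - current) / current
	case minCount == maxCount:
		// All values are equally loaded: no preference either way.
		return 0
	default:
		// current == minCount < maxCount: reward the least-loaded value in
		// proportion to its gap from the most-loaded one (maxCount > 0).
		return (maxCount - current) / maxCount
	}
}

func main() {
	fmt.Println(evenSpreadBoost(2, 1, 2), evenSpreadBoost(1, 1, 2), evenSpreadBoost(0, 0, 0))
}
```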
2018-09-04 16:10:11 -05:00
Preetha Appan
c56873ff37
Implement support for even spread across datacenters, with unit test
2018-09-04 16:10:11 -05:00
Preetha Appan
d091c00dd3
Support implicit spread target to account for remaining desired counts
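A sketch of an implicit target, under the assumption that whatever percentage the explicit targets leave unclaimed is assigned to one catch-all target. The `desiredCounts` function and the `"*"` target name are hypothetical.

```go
package main

import "fmt"

// desiredCounts is a hypothetical sketch: targets maps each explicit spread
// target to its percentage, and count is the task group count. Any
// percentage the explicit targets leave unclaimed is assigned to an implicit
// catch-all target, named "*" here purely for illustration.
func desiredCounts(targets map[string]float64, count int) map[string]float64 {
	out := make(map[string]float64, len(targets)+1)
	sum := 0.0
	for name, pct := range targets {
		out[name] = pct / 100 * float64(count)
		sum += pct
	}
	if sum < 100 {
		out["*"] = (100 - sum) / 100 * float64(count)
	}
	return out
}

func main() {
	fmt.Println(desiredCounts(map[string]float64{"dc1": 50, "dc2": 25}, 8))
}
```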
2018-09-04 16:10:11 -05:00
Preetha Appan
33779abe5f
fix comments
2018-09-04 16:10:11 -05:00
Preetha Appan
5812f906c8
Allow empty spread targets, and validate target percentages.
2018-09-04 16:10:11 -05:00
Preetha Appan
55f276c189
Include spreads configured at job level when precomputing weights/desired counts.
2018-09-04 16:10:11 -05:00
Preetha Appan
fbd0004707
Fix warnings
2018-09-04 16:10:11 -05:00
Preetha Appan
db0d95b09c
Implement spread iterator that scores according to percentage of desired count in each target.
...
Added this as a new step in the stack and some unit tests
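A hypothetical sketch of scoring a target by how far its current count is from its desired share of the group count. The linear formula and the clamp to -1 are assumptions for illustration; a related commit in this log adds unit tests for a -1 spread score.

```go
package main

import "fmt"

// spreadScore is a hypothetical sketch of scoring one spread target. percent
// is the target's share of the group (0-100), count is the task group count,
// and current is how many allocs already sit in the target.
func spreadScore(percent float64, count, current int) float64 {
	desired := percent / 100 * float64(count)
	if desired == 0 {
		// A target that should receive nothing gets the maximum penalty.
		return -1
	}
	// 1 when the target is empty, 0 at its desired count, negative beyond it.
	score := (desired - float64(current)) / desired
	if score < -1 {
		score = -1 // clamp heavily over-subscribed targets
	}
	return score
}

func main() {
	fmt.Println(spreadScore(50, 10, 0), spreadScore(50, 10, 5), spreadScore(50, 10, 20))
}
```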
2018-09-04 16:10:11 -05:00
Preetha Appan
eccf128c5c
Some minor changes from code review
2018-09-04 16:10:11 -05:00
Preetha Appan
038ed52877
Fix after rename to ConstraintSetContainsAny
2018-09-04 16:10:11 -05:00
Preetha Appan
3a39db3902
Fix linting
2018-09-04 16:10:11 -05:00
Preetha Appan
d5cd2bbddb
Remove unnecessary reset
2018-09-04 16:10:11 -05:00
Preetha Appan
dccb693221
test for setcontainsany, and treat set_contains the same as set_contains_all
2018-09-04 16:10:11 -05:00
Preetha Appan
70bfd0c0cb
Address some review feedback
2018-09-04 16:10:11 -05:00
Preetha Appan
8685593ec0
Back out changes to propertyset that were not necessary for affinities
2018-09-04 16:10:11 -05:00
Preetha Appan
5eacd6ada4
Implement affinity support in generic scheduler
2018-09-04 16:10:11 -05:00
Alex Dadgar
e1c239daae
Merge pull request #4414 from hashicorp/b-stop-summary
...
Reset Queued allocs to zero when job stopped
2018-07-16 14:32:55 -07:00
Nick Ethier
6b6777359b
scheduler: fix missing err assignment
2018-07-11 14:27:10 -04:00
Nick Ethier
5f6def5b04
scheduler: better error handling
2018-07-05 11:00:03 -04:00
Nick Ethier
030e650e78
scheduler: fix nil pointer exception
2018-07-02 16:05:38 -04:00
Alex Dadgar
300b1a7a15
Tests only use testlog package logger
2018-06-13 15:40:56 -07:00
Alex Dadgar
c3c79c408e
Reset Queued allocs to zero when job stopped
...
When a job is stopped but not purged, we should set the Queued count to
be zero.
2018-06-13 10:46:39 -07:00
Preetha Appan
b64788043e
make test create index clearer
2018-06-05 17:29:59 -05:00
Preetha Appan
3e264dcb79
Fix reconciler bug with deployment not being created if job create index is different
...
This fixes an issue where, if a job is purged and resubmitted, Nomad does not create
a new deployment. Adds unit test that failed before this fix
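A minimal sketch of the check, assuming (as a simplification) that a deployment records the create index of the job it was created for; the `job`/`deployment` types and `deploymentForJob` are hypothetical names. A purged and resubmitted job gets a new create index, so an older deployment no longer matches and the reconciler is free to create a new one.

```go
package main

import "fmt"

// Hypothetical, trimmed-down types; only the fields needed for the check.
type job struct {
	ID          string
	CreateIndex uint64
}

type deployment struct {
	JobID          string
	JobCreateIndex uint64
}

// deploymentForJob returns d only if it belongs to this incarnation of the
// job. A purged and resubmitted job gets a new CreateIndex, so a deployment
// left over from the old job is ignored and a fresh one can be created.
func deploymentForJob(j job, d *deployment) *deployment {
	if d == nil || d.JobID != j.ID || d.JobCreateIndex != j.CreateIndex {
		return nil
	}
	return d
}

func main() {
	old := &deployment{JobID: "web", JobCreateIndex: 5}
	resubmitted := job{ID: "web", CreateIndex: 10}
	fmt.Println(deploymentForJob(resubmitted, old) == nil)
}
```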
2018-06-05 13:58:53 -05:00
Preetha Appan
f8a23bc54a
fix test comment
2018-05-09 16:01:34 -05:00
Preetha Appan
ef531b0f34
Add unit tests for forced rescheduling
2018-05-09 11:30:42 -05:00
Preetha Appan
c1b92c284e
Work in progress - force rescheduling of failed allocs
2018-05-08 17:26:57 -05:00
Alex Dadgar
555d14fd92
Add test
2018-05-07 14:55:01 -05:00
Preetha Appan
cf44670d56
Make sure that task group has a deployment state before using it
2018-05-07 14:55:01 -05:00
Alex Dadgar
c6478d9469
clarify comment
2018-05-07 14:55:01 -05:00
Alex Dadgar
768fec8505
Allow healthy canary deployment to skip progress deadline
2018-05-07 14:55:01 -05:00
Alex Dadgar
8626c1b94a
Reschedule when we have canaries properly
2018-05-07 14:55:01 -05:00
Alex Dadgar
8dee3ab068
canary reschedule test
2018-05-07 14:50:01 -05:00
Alex Dadgar
deb93dc7b7
Test for rescheduling when there are canaries
2018-05-07 14:50:01 -05:00
Alex Dadgar
550f5e31f8
Allow canary count greater than desired
2018-05-07 14:50:01 -05:00
Alex Dadgar
f95ab4ade8
Mark canaries on creation, and unmark on promotion
2018-05-07 14:50:01 -05:00
Preetha Appan
5329900f6d
Only use DesiredTransition.Reschedule in reconciler when it's an active deployment
2018-05-07 14:50:01 -05:00
Alex Dadgar
e7444c3873
Add test where deployment is marked as complete when done even with failed allocs
2018-05-07 14:50:01 -05:00
Alex Dadgar
57969b4ee0
fix reconcile tests
2018-05-07 14:50:01 -05:00
Alex Dadgar
5547974f35
Only reschedule allowed deployment allocs
2018-05-07 14:50:01 -05:00
Alex Dadgar
fcf4f582d0
small review feedback fixes
2018-05-07 14:50:01 -05:00
Alex Dadgar
1336002255
Progress deadline in deployment state
2018-05-07 14:50:01 -05:00
Alex Dadgar
ee50789c22
Initial implementation
2018-05-07 14:50:01 -05:00