Commit graph

451 commits

Author SHA1 Message Date
Preetha Appan 5373ade731
Scheduler and Reconciler changes to support delayed rescheduling 2018-03-14 16:10:32 -05:00
Josh Soref e0f6a33fe5 spelling: system 2018-03-11 19:01:19 +00:00
Josh Soref a89e1b8395 spelling: strategy 2018-03-11 18:58:19 +00:00
Josh Soref f8eb766fb5 spelling: reschedulable 2018-03-11 18:48:12 +00:00
Josh Soref ed8db9992e spelling: feasibility 2018-03-11 18:07:09 +00:00
Josh Soref bf9283c606 spelling: corresponding 2018-03-11 17:51:41 +00:00
Josh Soref ca4ceb0e5c spelling: commits 2018-03-11 17:47:45 +00:00
Preetha Appan 7b6ba7a1f4
Fixes bug in reconciler where previously rescheduled allocs are rescheduled again. Simplified logic and added test case to catch this. 2018-02-20 12:07:56 -06:00
Preetha Appan 7c57303dd2
Clarify comment 2018-02-05 16:37:07 -06:00
Preetha Appan d48c411692
Reconciler should consider failed allocs when marking deployment as failed. 2018-02-02 19:40:25 -06:00
Preetha Appan a1237d627a
code review feedback 2018-01-31 09:58:05 -06:00
Preetha Appan 5ad892026a
Add a field to track the next allocation during a replacement 2018-01-31 09:58:05 -06:00
Preetha Appan 2ed4de7e7b
Track previous node id correctly, plus unit test 2018-01-31 09:58:05 -06:00
Preetha Appan dd4917c2f0
Add more clarification in comment 2018-01-31 09:58:05 -06:00
Preetha Appan 09bef7d1ce
Preallocate slice for skipped nodes 2018-01-31 09:58:05 -06:00
Preetha Appan 237beb49ae
Better score threshold 2018-01-31 09:58:05 -06:00
Preetha Appan fa18c0def4
Add one more unit test 2018-01-31 09:58:05 -06:00
Preetha Appan a75540cec6
Limit iterator uses a score threshold and a maxSkip value to be able to skip lower scoring nodes 2018-01-31 09:58:05 -06:00
Preetha Appan b6268a5fab
Beef up unit test for rescheduling batch jobs 2018-01-31 09:56:53 -06:00
Preetha Appan ea4a889e28
Address more code review feedback 2018-01-31 09:56:53 -06:00
Preetha Appan bd89d2b39e
Make sure that reschedule trackers are not added for node drain replacements 2018-01-31 09:56:53 -06:00
Preetha Appan a662b38801
Improve reconciler unit tests 2018-01-31 09:56:53 -06:00
Preetha Appan fee4ccf154
Prevent side effect modification of select options when preferred nodes are set 2018-01-31 09:56:53 -06:00
Preetha Appan 21b7b79d5d
Add helper methods, use require and other code review feedback 2018-01-31 09:56:53 -06:00
Preetha Appan d0f9d59abb
Reconile with changes to structs for reschedule tracking 2018-01-31 09:56:53 -06:00
Preetha Appan fbb1936dee
Fix some comments and lint warnings, remove unused method 2018-01-31 09:56:53 -06:00
Preetha Appan 031c566ada
Reschedule previous allocs and track their reschedule attempts 2018-01-31 09:56:53 -06:00
Preetha Appan fd2fbefa4c
Add a field to track the next allocation during a replacement 2018-01-24 17:55:05 -06:00
Alex Dadgar 6dda0ebaed gofmt 2018-01-04 14:45:15 -08:00
Alex Dadgar 2f561609b7 Fix detection of successful batch allocations
This PR restores older behavior of detecting successful batch
allocations (04d86ffd1006fde9dfb2ca8c1237fe60b995b0e3). This has the
side effect that we correctly filter desired status stop but not
successful batch allocations and create their replacements.
2018-01-04 14:20:32 -08:00
Preetha 1712b03705
Merge branch 'master' into 0.8 2018-01-03 16:06:38 -06:00
Preetha Appan 51bd0b59c7
Return an error if evaluation doesn't exist in state store at plan apply time. 2017-12-18 14:55:36 -06:00
Preetha Appan 3c36abfe14
Update eval modify index as part of plan apply. 2017-12-18 10:03:55 -06:00
Preetha Appan 3b4d7ac2a3
Fix some typos 2017-12-14 13:29:27 -06:00
Michael Schurter 45494f7304 Fix port labels on mock Alloc/Job/Node 2017-12-08 14:50:06 -08:00
Alex Dadgar 44240ce440 Merge pull request #3375 from hashicorp/b-batch
Allow batch jobs to be rerun if purged
2017-10-13 17:11:45 -07:00
Alex Dadgar c1cc51dbee sync 2017-10-13 14:36:02 -07:00
Alex Dadgar 746cd7403f Allow batch jobs to be rerun if purged
This PR allows batch jobs to be rerun if they have been purged.
2017-10-13 12:40:37 -07:00
Michael Schurter a66c53d45a Remove structs import from api
Goes a step further and removes structs import from api's tests as well
by moving GenerateUUID to its own package.
2017-09-29 10:36:08 -07:00
Alex Dadgar 4173834231 Enable more linters 2017-09-26 15:26:33 -07:00
Alex Dadgar 3904bde9a3 Fix batch handling of complete allocs/node drains
This PR fixes:
* An issue in which a node-drain that contains a complete batch alloc
would cause a replacement
* An issue in which allocations with the same name during a scale
down/stop event wouldn't be properly stopped.
* An issue in which batch allocations from previous job versions may not
have been stopped properly.

Fixes https://github.com/hashicorp/nomad/issues/3210
2017-09-14 15:08:57 -07:00
Alex Dadgar 84d06f6abe Sync namespace changes 2017-09-07 17:04:21 -07:00
Alex Dadgar 0aef02a4f9 fix test 2017-08-21 14:07:54 -07:00
Alex Dadgar 27256ebcc6 Placing allocs counts towards placement limit
This PR makes placing new allocations count towards the limit. We do not
restrict how many new placements are made by the limit but we still
count towards the limit. This has the nice affect that if you have a
group with count = 5 and max_parallel = 1 but only 3 allocs exist for it
and a change is made, you will create 2 more at the new version but not
destroy one, taking you down to two running as you would have
previously.

Fixes https://github.com/hashicorp/nomad/issues/3053
2017-08-21 12:41:19 -07:00
Alex Dadgar 2453f13fc5 fixes 2017-08-15 12:27:05 -07:00
Alex Dadgar 0570e09feb Fix panic occuring from improper bitmap size
This PR fixes an allignment calculation when determining the bitmap
size.

Fixes https://github.com/hashicorp/nomad/issues/3008
2017-08-12 15:37:02 -07:00
Luke Farnell f0ced87b95 fixed all spelling mistakes for goreport 2017-08-07 17:13:05 -04:00
Alex Dadgar 7b13c0d702 Lost allocs replaced even if deployment failed
This PR allows the scheduler to replace lost allocations even if the job
has a failed or paused deployment. The prior behavior was confusing to
users.

Fixes https://github.com/hashicorp/nomad/issues/2958
2017-08-03 17:42:14 -07:00
Alex Dadgar 7d2b84ab01 Review fixes 2017-08-01 14:18:52 -07:00
Alex Dadgar 2650bb1d12 Distinct Property supports arbitrary limit
This PR enhances the distinct_property constraint such that a limit can
be specified in the RTarget/value parameter. This allows constraints
such as:

```
constraint {
  distinct_property = "${meta.rack}"
  value = "2"
}
```

This restricts any given rack from running more than 2 allocations from
the task group.

Fixes https://github.com/hashicorp/nomad/issues/1146
2017-07-31 16:52:13 -07:00
Alex Dadgar 4f69355a66 Fix incorrect destructive update with distinct_property constraint
This PR fixes an issue in which an update to a task group with a
distinct property constraint would result in an incorrect destructive
update.
2017-07-31 11:17:35 -07:00
Michael Schurter 5f1f91a46c Use go-testing-interface instead of testing
This drops the testings stdlib pkg from our dependencies. Saves a
whopping 46kb on our binary (was really hoping for more of a win there),
but also avoids potential ugliness with how testing sets flags.
2017-07-25 15:35:19 -07:00
Alex Dadgar 492239d3ee Improve multiple group handling in a deployment
This PR resolves a bug in which a job with multiple task groups would
create new deployment objects each, thus clearing out all other task
groups deployment state.
2017-07-25 11:27:47 -07:00
Alex Dadgar 184bfd4836 Better comment 2017-07-20 12:31:08 -07:00
Alex Dadgar 248315a2d9 Handle destructive changes before placements
This PR updates the generic scheduler to handle destructive changes
before handling placements. This is important because the destructive
change may be due to a lowering of resources. If this is the case, the
handling of the destructive changes first may make it possible for the
placement to happen.

To reason about this imagine there is one node with CPU = 500.

If the group originally had:
* `count = 1`
* `cpu = 400`

And then the job was updated such that the group had:
* `count = 4`
* `cpu = 120`

If the original alloc isn't discounted first, nothing would be able to
place.
2017-07-20 12:24:27 -07:00
Alex Dadgar ce265e0aff Update full node test to test more advanced case 2017-07-20 12:23:40 -07:00
Alex Dadgar a9ec1d6ca7 Fix update limit calculation to avoid panic
This PR fixes the rolling update limit calculation to avoid a panic when
there are more allocations for a deployment that haven't determined
their health than the max_parallel count of the task group.

Fixes https://github.com/hashicorp/nomad/issues/2820
2017-07-19 11:11:47 -07:00
Alex Dadgar 22e84d00ab Fix deep copy of driver config 2017-07-17 17:53:21 -07:00
Alex Dadgar 641e178416 Stop before trying to place 2017-07-17 17:18:12 -07:00
Alex Dadgar 66a90326e1 Treat destructive updates atomically 2017-07-16 10:35:38 -07:00
Alex Dadgar f86760db3c Basic logs 2017-07-07 16:49:08 -07:00
Alex Dadgar 20005f925a Rolling node drains using max_parallel and stagger
This PR adds rolling node drains done at max_parallel and stagger of the
update spec. It brings it inline with old behavior.
2017-07-07 12:12:48 -07:00
Alex Dadgar 3a29b38108 Status description shows requiring promotion 2017-07-07 12:12:48 -07:00
Alex Dadgar 9f016606aa Fix some tests, eval monitor shows deployment id and deployment cancels based on version 2017-07-07 12:12:48 -07:00
Alex Dadgar 9aa1f2fea2 Respond to comments 2017-07-07 12:10:04 -07:00
Alex Dadgar 454083ba1b Remove canary 2017-07-07 12:10:04 -07:00
Alex Dadgar d352d85bb9 Test scheduler's handling of canaries/inplace updates 2017-07-07 12:10:04 -07:00
Alex Dadgar 83c60483f2 Test marking as complete 2017-07-07 12:10:04 -07:00
Alex Dadgar 477c713df5 Plan apply handles canaries and success is set via update 2017-07-07 12:10:04 -07:00
Alex Dadgar 1e8b5e75a5 Fix handling of failed job 2017-07-07 12:10:04 -07:00
Alex Dadgar e229d3650b Attach eval id 2017-07-07 12:10:04 -07:00
Alex Dadgar af1935e1e1 Mark complete 2017-07-07 12:10:04 -07:00
Alex Dadgar 8424a3b380 Change canary handling 2017-07-07 12:10:04 -07:00
Alex Dadgar c10d7ab871 Remove promoted bit from allocation 2017-07-07 12:10:04 -07:00
Alex Dadgar 09dfa2fc10 Rename CreateDeployments and remove cancelling behavior in state_store 2017-07-07 12:10:04 -07:00
Alex Dadgar 067ed86a47 Client watches for allocation health using task state and Consul checks
This PR adds watching of allocation health at the client. The client can
watch for health based on the tasks running on time and also based on
the consul checks passing.
2017-07-07 12:10:04 -07:00
Alex Dadgar e7034691ea deployment status 2017-07-07 12:07:07 -07:00
Alex Dadgar d04877d23c initial impl 2017-07-07 12:03:11 -07:00
Alex Dadgar 27a6e6b6d1 update description of the alloc update factory function 2017-07-07 12:03:11 -07:00
Alex Dadgar ce2319be9b cleanup limit detection 2017-07-07 12:03:11 -07:00
Alex Dadgar b2573b01f9 Fix canary handling 2017-07-07 12:03:11 -07:00
Alex Dadgar 7952240d69 Deployment tests 2017-07-07 12:03:11 -07:00
Alex Dadgar ce55559f12 Non-Canary/Deployment Tests 2017-07-07 12:03:11 -07:00
Alex Dadgar d111dd5c10 Pull out in-place updating into a passed in function; reduce inputs to reconciler 2017-07-07 12:03:11 -07:00
Alex Dadgar c77944ed29 assign names 2017-07-07 12:03:11 -07:00
Alex Dadgar ecacd44888 handle batch filtering 2017-07-07 12:03:11 -07:00
Alex Dadgar 4c123500ee Remove old 2017-07-07 12:03:11 -07:00
Alex Dadgar 270e26c600 Populate desired state per tg 2017-07-07 12:03:11 -07:00
Alex Dadgar 23dcd175ef Show canaries on plan 2017-07-07 12:03:11 -07:00
Alex Dadgar cf5baba808 handle annotations 2017-07-07 12:03:11 -07:00
Alex Dadgar a46f7c3eb8 Todos 2017-07-07 12:03:11 -07:00
Alex Dadgar 00d962b8b5 Some comments and cleanup 2017-07-07 12:03:11 -07:00
Alex Dadgar 994ad285b7 Split reconcile file 2017-07-07 12:03:11 -07:00
Alex Dadgar 07b1c3e5db Only upsert a job if the spec changes and push deployment creation into reconciler 2017-07-07 12:03:11 -07:00
Alex Dadgar 0d42b5d421 initial reconciler 2017-07-07 12:01:17 -07:00
Alex Dadgar b3f4db0930 cancel deployments 2017-07-07 12:01:17 -07:00
Alex Dadgar 8169590d76 Fix tests 2017-05-01 13:54:26 -07:00
Alex Dadgar 5a2449d236 Respond to review comments 2017-04-19 10:54:03 -07:00
Alex Dadgar 3145086a42 non-purge deregisters 2017-04-15 17:08:05 -07:00
Alex Dadgar 2c31d4036b Skip inplace update on terminal batch allocation
This PR skips adding an inplace update to a successfully terminal batch
job to the plan. This avoids extra data in the plan and avoids
triggering updates on all clients that have the terminal allocation.
This is matching behavior of the service scheduler.

/cc @armon for review
2017-03-11 17:19:22 -08:00