Commit graph

704 commits

Author SHA1 Message Date
Preetha Appan e2656ef546
Cleaner handling of batched evals 2018-03-14 16:10:32 -05:00
Preetha Appan 47e0280d96
More small review feedback 2018-03-14 16:10:32 -05:00
Preetha Appan 2ba976dec8
Remove unnecessary check against 5 second window for determining immediate scheduling eligibility 2018-03-14 16:10:32 -05:00
Preetha Appan 5373ade731
Scheduler and Reconciler changes to support delayed rescheduling 2018-03-14 16:10:32 -05:00
Josh Soref e0f6a33fe5 spelling: system 2018-03-11 19:01:19 +00:00
Josh Soref a89e1b8395 spelling: strategy 2018-03-11 18:58:19 +00:00
Josh Soref f8eb766fb5 spelling: reschedulable 2018-03-11 18:48:12 +00:00
Josh Soref ed8db9992e spelling: feasibility 2018-03-11 18:07:09 +00:00
Josh Soref bf9283c606 spelling: corresponding 2018-03-11 17:51:41 +00:00
Josh Soref ca4ceb0e5c spelling: commits 2018-03-11 17:47:45 +00:00
Preetha Appan 7b6ba7a1f4
Fixes bug in reconciler where previously rescheduled allocs are rescheduled again. Simplified logic and added test case to catch this. 2018-02-20 12:07:56 -06:00
Preetha Appan 7c57303dd2
Clarify comment 2018-02-05 16:37:07 -06:00
Preetha Appan d48c411692
Reconciler should consider failed allocs when marking deployment as failed. 2018-02-02 19:40:25 -06:00
Preetha Appan a1237d627a
code review feedback 2018-01-31 09:58:05 -06:00
Preetha Appan 5ad892026a
Add a field to track the next allocation during a replacement 2018-01-31 09:58:05 -06:00
Preetha Appan 2ed4de7e7b
Track previous node id correctly, plus unit test 2018-01-31 09:58:05 -06:00
Preetha Appan dd4917c2f0
Add more clarification in comment 2018-01-31 09:58:05 -06:00
Preetha Appan 09bef7d1ce
Preallocate slice for skipped nodes 2018-01-31 09:58:05 -06:00
Preetha Appan 237beb49ae
Better score threshold 2018-01-31 09:58:05 -06:00
Preetha Appan fa18c0def4
Add one more unit test 2018-01-31 09:58:05 -06:00
Preetha Appan a75540cec6
Limit iterator uses a score threshold and a maxSkip value to be able to skip lower scoring nodes 2018-01-31 09:58:05 -06:00
Preetha Appan b6268a5fab
Beef up unit test for rescheduling batch jobs 2018-01-31 09:56:53 -06:00
Preetha Appan ea4a889e28
Address more code review feedback 2018-01-31 09:56:53 -06:00
Preetha Appan bd89d2b39e
Make sure that reschedule trackers are not added for node drain replacements 2018-01-31 09:56:53 -06:00
Preetha Appan a662b38801
Improve reconciler unit tests 2018-01-31 09:56:53 -06:00
Preetha Appan fee4ccf154
Prevent side effect modification of select options when preferred nodes are set 2018-01-31 09:56:53 -06:00
Preetha Appan 21b7b79d5d
Add helper methods, use require and other code review feedback 2018-01-31 09:56:53 -06:00
Preetha Appan d0f9d59abb
Reconile with changes to structs for reschedule tracking 2018-01-31 09:56:53 -06:00
Preetha Appan fbb1936dee
Fix some comments and lint warnings, remove unused method 2018-01-31 09:56:53 -06:00
Preetha Appan 031c566ada
Reschedule previous allocs and track their reschedule attempts 2018-01-31 09:56:53 -06:00
Preetha Appan fd2fbefa4c
Add a field to track the next allocation during a replacement 2018-01-24 17:55:05 -06:00
Alex Dadgar 6dda0ebaed gofmt 2018-01-04 14:45:15 -08:00
Alex Dadgar 2f561609b7 Fix detection of successful batch allocations
This PR restores older behavior of detecting successful batch
allocations (04d86ffd1006fde9dfb2ca8c1237fe60b995b0e3). This has the
side effect that we correctly filter desired status stop but not
successful batch allocations and create their replacements.
2018-01-04 14:20:32 -08:00
Preetha 1712b03705
Merge branch 'master' into 0.8 2018-01-03 16:06:38 -06:00
Preetha Appan 51bd0b59c7
Return an error if evaluation doesn't exist in state store at plan apply time. 2017-12-18 14:55:36 -06:00
Preetha Appan 3c36abfe14
Update eval modify index as part of plan apply. 2017-12-18 10:03:55 -06:00
Preetha Appan 3b4d7ac2a3
Fix some typos 2017-12-14 13:29:27 -06:00
Michael Schurter 45494f7304 Fix port labels on mock Alloc/Job/Node 2017-12-08 14:50:06 -08:00
Alex Dadgar 44240ce440 Merge pull request #3375 from hashicorp/b-batch
Allow batch jobs to be rerun if purged
2017-10-13 17:11:45 -07:00
Alex Dadgar c1cc51dbee sync 2017-10-13 14:36:02 -07:00
Alex Dadgar 746cd7403f Allow batch jobs to be rerun if purged
This PR allows batch jobs to be rerun if they have been purged.
2017-10-13 12:40:37 -07:00
Michael Schurter a66c53d45a Remove structs import from api
Goes a step further and removes structs import from api's tests as well
by moving GenerateUUID to its own package.
2017-09-29 10:36:08 -07:00
Alex Dadgar 4173834231 Enable more linters 2017-09-26 15:26:33 -07:00
Alex Dadgar 3904bde9a3 Fix batch handling of complete allocs/node drains
This PR fixes:
* An issue in which a node-drain that contains a complete batch alloc
would cause a replacement
* An issue in which allocations with the same name during a scale
down/stop event wouldn't be properly stopped.
* An issue in which batch allocations from previous job versions may not
have been stopped properly.

Fixes https://github.com/hashicorp/nomad/issues/3210
2017-09-14 15:08:57 -07:00
Alex Dadgar 84d06f6abe Sync namespace changes 2017-09-07 17:04:21 -07:00
Alex Dadgar 0aef02a4f9 fix test 2017-08-21 14:07:54 -07:00
Alex Dadgar 27256ebcc6 Placing allocs counts towards placement limit
This PR makes placing new allocations count towards the limit. We do not
restrict how many new placements are made by the limit but we still
count towards the limit. This has the nice affect that if you have a
group with count = 5 and max_parallel = 1 but only 3 allocs exist for it
and a change is made, you will create 2 more at the new version but not
destroy one, taking you down to two running as you would have
previously.

Fixes https://github.com/hashicorp/nomad/issues/3053
2017-08-21 12:41:19 -07:00
Alex Dadgar 2453f13fc5 fixes 2017-08-15 12:27:05 -07:00
Alex Dadgar 0570e09feb Fix panic occuring from improper bitmap size
This PR fixes an allignment calculation when determining the bitmap
size.

Fixes https://github.com/hashicorp/nomad/issues/3008
2017-08-12 15:37:02 -07:00
Luke Farnell f0ced87b95 fixed all spelling mistakes for goreport 2017-08-07 17:13:05 -04:00
Alex Dadgar 7b13c0d702 Lost allocs replaced even if deployment failed
This PR allows the scheduler to replace lost allocations even if the job
has a failed or paused deployment. The prior behavior was confusing to
users.

Fixes https://github.com/hashicorp/nomad/issues/2958
2017-08-03 17:42:14 -07:00
Alex Dadgar 7d2b84ab01 Review fixes 2017-08-01 14:18:52 -07:00
Alex Dadgar 2650bb1d12 Distinct Property supports arbitrary limit
This PR enhances the distinct_property constraint such that a limit can
be specified in the RTarget/value parameter. This allows constraints
such as:

```
constraint {
  distinct_property = "${meta.rack}"
  value = "2"
}
```

This restricts any given rack from running more than 2 allocations from
the task group.

Fixes https://github.com/hashicorp/nomad/issues/1146
2017-07-31 16:52:13 -07:00
Alex Dadgar 4f69355a66 Fix incorrect destructive update with distinct_property constraint
This PR fixes an issue in which an update to a task group with a
distinct property constraint would result in an incorrect destructive
update.
2017-07-31 11:17:35 -07:00
Michael Schurter 5f1f91a46c Use go-testing-interface instead of testing
This drops the testings stdlib pkg from our dependencies. Saves a
whopping 46kb on our binary (was really hoping for more of a win there),
but also avoids potential ugliness with how testing sets flags.
2017-07-25 15:35:19 -07:00
Alex Dadgar 492239d3ee Improve multiple group handling in a deployment
This PR resolves a bug in which a job with multiple task groups would
create new deployment objects each, thus clearing out all other task
groups deployment state.
2017-07-25 11:27:47 -07:00
Alex Dadgar 184bfd4836 Better comment 2017-07-20 12:31:08 -07:00
Alex Dadgar 248315a2d9 Handle destructive changes before placements
This PR updates the generic scheduler to handle destructive changes
before handling placements. This is important because the destructive
change may be due to a lowering of resources. If this is the case, the
handling of the destructive changes first may make it possible for the
placement to happen.

To reason about this imagine there is one node with CPU = 500.

If the group originally had:
* `count = 1`
* `cpu = 400`

And then the job was updated such that the group had:
* `count = 4`
* `cpu = 120`

If the original alloc isn't discounted first, nothing would be able to
place.
2017-07-20 12:24:27 -07:00
Alex Dadgar ce265e0aff Update full node test to test more advanced case 2017-07-20 12:23:40 -07:00
Alex Dadgar a9ec1d6ca7 Fix update limit calculation to avoid panic
This PR fixes the rolling update limit calculation to avoid a panic when
there are more allocations for a deployment that haven't determined
their health than the max_parallel count of the task group.

Fixes https://github.com/hashicorp/nomad/issues/2820
2017-07-19 11:11:47 -07:00
Alex Dadgar 22e84d00ab Fix deep copy of driver config 2017-07-17 17:53:21 -07:00
Alex Dadgar 641e178416 Stop before trying to place 2017-07-17 17:18:12 -07:00
Alex Dadgar 66a90326e1 Treat destructive updates atomically 2017-07-16 10:35:38 -07:00
Alex Dadgar f86760db3c Basic logs 2017-07-07 16:49:08 -07:00
Alex Dadgar 20005f925a Rolling node drains using max_parallel and stagger
This PR adds rolling node drains done at max_parallel and stagger of the
update spec. It brings it inline with old behavior.
2017-07-07 12:12:48 -07:00
Alex Dadgar 3a29b38108 Status description shows requiring promotion 2017-07-07 12:12:48 -07:00
Alex Dadgar 9f016606aa Fix some tests, eval monitor shows deployment id and deployment cancels based on version 2017-07-07 12:12:48 -07:00
Alex Dadgar 9aa1f2fea2 Respond to comments 2017-07-07 12:10:04 -07:00
Alex Dadgar 454083ba1b Remove canary 2017-07-07 12:10:04 -07:00
Alex Dadgar d352d85bb9 Test scheduler's handling of canaries/inplace updates 2017-07-07 12:10:04 -07:00
Alex Dadgar 83c60483f2 Test marking as complete 2017-07-07 12:10:04 -07:00
Alex Dadgar 477c713df5 Plan apply handles canaries and success is set via update 2017-07-07 12:10:04 -07:00
Alex Dadgar 1e8b5e75a5 Fix handling of failed job 2017-07-07 12:10:04 -07:00
Alex Dadgar e229d3650b Attach eval id 2017-07-07 12:10:04 -07:00
Alex Dadgar af1935e1e1 Mark complete 2017-07-07 12:10:04 -07:00
Alex Dadgar 8424a3b380 Change canary handling 2017-07-07 12:10:04 -07:00
Alex Dadgar c10d7ab871 Remove promoted bit from allocation 2017-07-07 12:10:04 -07:00
Alex Dadgar 09dfa2fc10 Rename CreateDeployments and remove cancelling behavior in state_store 2017-07-07 12:10:04 -07:00
Alex Dadgar 067ed86a47 Client watches for allocation health using task state and Consul checks
This PR adds watching of allocation health at the client. The client can
watch for health based on the tasks running on time and also based on
the consul checks passing.
2017-07-07 12:10:04 -07:00
Alex Dadgar e7034691ea deployment status 2017-07-07 12:07:07 -07:00
Alex Dadgar d04877d23c initial impl 2017-07-07 12:03:11 -07:00
Alex Dadgar 27a6e6b6d1 update description of the alloc update factory function 2017-07-07 12:03:11 -07:00
Alex Dadgar ce2319be9b cleanup limit detection 2017-07-07 12:03:11 -07:00
Alex Dadgar b2573b01f9 Fix canary handling 2017-07-07 12:03:11 -07:00
Alex Dadgar 7952240d69 Deployment tests 2017-07-07 12:03:11 -07:00
Alex Dadgar ce55559f12 Non-Canary/Deployment Tests 2017-07-07 12:03:11 -07:00
Alex Dadgar d111dd5c10 Pull out in-place updating into a passed in function; reduce inputs to reconciler 2017-07-07 12:03:11 -07:00
Alex Dadgar c77944ed29 assign names 2017-07-07 12:03:11 -07:00
Alex Dadgar ecacd44888 handle batch filtering 2017-07-07 12:03:11 -07:00
Alex Dadgar 4c123500ee Remove old 2017-07-07 12:03:11 -07:00
Alex Dadgar 270e26c600 Populate desired state per tg 2017-07-07 12:03:11 -07:00
Alex Dadgar 23dcd175ef Show canaries on plan 2017-07-07 12:03:11 -07:00
Alex Dadgar cf5baba808 handle annotations 2017-07-07 12:03:11 -07:00
Alex Dadgar a46f7c3eb8 Todos 2017-07-07 12:03:11 -07:00
Alex Dadgar 00d962b8b5 Some comments and cleanup 2017-07-07 12:03:11 -07:00
Alex Dadgar 994ad285b7 Split reconcile file 2017-07-07 12:03:11 -07:00
Alex Dadgar 07b1c3e5db Only upsert a job if the spec changes and push deployment creation into reconciler 2017-07-07 12:03:11 -07:00
Alex Dadgar 0d42b5d421 initial reconciler 2017-07-07 12:01:17 -07:00
Alex Dadgar b3f4db0930 cancel deployments 2017-07-07 12:01:17 -07:00
Alex Dadgar 8169590d76 Fix tests 2017-05-01 13:54:26 -07:00
Alex Dadgar 5a2449d236 Respond to review comments 2017-04-19 10:54:03 -07:00
Alex Dadgar 3145086a42 non-purge deregisters 2017-04-15 17:08:05 -07:00
Alex Dadgar 2c31d4036b Skip inplace update on terminal batch allocation
This PR skips adding an inplace update to a successfully terminal batch
job to the plan. This avoids extra data in the plan and avoids
triggering updates on all clients that have the terminal allocation.
This is matching behavior of the service scheduler.

/cc @armon for review
2017-03-11 17:19:22 -08:00
Alex Dadgar bb12ff69a6 Fix in-place update 2017-03-09 22:03:10 -08:00
Alex Dadgar 601cbd7784 Feedback addressed 2017-03-09 21:36:27 -08:00
Alex Dadgar b65d248dee Fix filtering issue and add a test that would catch it 2017-03-09 16:20:39 -08:00
Alex Dadgar 7945e4564c Refactor 2017-03-09 15:26:46 -08:00
Alex Dadgar 60c42f745a Split distinct property and host iterator and add iterator to system stack 2017-03-08 19:00:10 -08:00
Alex Dadgar 319b24081f cleanup 2017-03-08 17:57:31 -08:00
Alex Dadgar a439bf709d Property Set 2017-03-08 17:50:40 -08:00
Alex Dadgar d83a8fe9f2 Unoptimized implementation + testing 2017-03-07 14:48:54 -08:00
Alex Dadgar 87d971a6b8 Double the anti-affinity for placing same task group on node 2017-03-06 11:52:53 -08:00
Alex Dadgar 5be806a3df Fix vet script and fix vet problems
This PR fixes our vet script and fixes all the missed vet changes.

It also fixes pointers being printed in `nomad stop <job>` and `nomad
node-status <node>`.
2017-02-27 16:00:19 -08:00
Alex Dadgar 04862ca10e Tests compile 2017-02-07 21:30:57 -08:00
Alex Dadgar b69b357c7f Nomad builds 2017-02-07 20:31:23 -08:00
Alex Dadgar 302a0cf382 Fix adjust test 2017-01-08 14:14:35 -08:00
Alex Dadgar 2c838a80f6 Detect newly created allocation's properly 2017-01-08 13:55:03 -08:00
Alex Dadgar 8d5f0fea69 Merge pull request #2128 from hashicorp/f-dispatch
Nomad Constructor Jobs and Dispatch
2017-01-06 05:22:49 +08:00
Diptanu Choudhury 9cdd576720 Updated changelog and fixed tests 2016-12-20 11:32:17 -08:00
Alex Dadgar a1dd78c24b Scheduler combines meta from job > group > task 2016-12-15 17:08:38 -08:00
Diptanu Choudhury 5191b4d33a Making the status command return the allocs of currently registered job 2016-11-24 16:31:30 +01:00
Alex Dadgar a1d08c2aba Add scheduler version enforcement 2016-10-26 14:52:48 -07:00
Alex Dadgar 989827e402 Add set contains 2016-10-19 13:06:28 -07:00
Alex Dadgar 36cfe6e89e Large refactor of task runner and Vault token rehandling 2016-10-18 11:24:20 -07:00
Ben Barnard 83f647ed84 Replace "the the" with "the" in documentation and comments 2016-10-11 15:31:40 -04:00
Diptanu Choudhury dae7f88118 Not setting a drained node as preferred node (#1740) 2016-09-23 21:15:50 -07:00
Diptanu Choudhury 45afc0b4e1 Added logic to ensure scheduler knows job defn has been updated when ephemeral disks has been updated (#1725) 2016-09-21 14:00:02 -07:00
Alex Dadgar bc500a536c tasks updated 2016-09-21 11:31:09 -07:00
Diptanu Choudhury 36edabb487 Fixed the logic of calculating queued allocation in sys sched (#1724) 2016-09-20 12:05:19 -07:00
Alex Dadgar 683380c25c Merge pull request #1715 from hashicorp/b-dead-system-nodes
Fix bug where dead nodes weren't properly handled by system scheduler
2016-09-19 11:49:44 -07:00
Alex Dadgar 47551e93b4 Fix bug in which dead nodes weren't being properly handled by system scheduler 2016-09-19 11:49:27 -07:00
Diptanu Choudhury 1b3c5e98c8 Renaming LocalDisk to EphemeralDisk (#1710)
Renaming LocalDisk to EphemeralDisk
2016-09-14 15:43:42 -07:00
Diptanu Choudhury d94bb45ad3 Added some more comments 2016-08-31 14:06:31 -07:00
Diptanu Choudhury 52e9946da9 Implemented SetPrefferingNodes in stack 2016-08-30 16:17:50 -07:00
Diptanu Choudhury bfee7b30a3 Introducing shared resources in alloc 2016-08-29 13:49:25 -07:00
Diptanu Choudhury 13497913f9 Ensuring resources are re-calculated properly in fsm 2016-08-26 20:13:11 -07:00
Diptanu Choudhury e79cb67391 Changing implementation of AllocsFit 2016-08-26 17:28:29 -05:00
Diptanu Choudhury 3447658bba Added scheduler tests to ensure disk constraints are honored 2016-08-25 15:31:56 -05:00
Diptanu Choudhury ffaf6c6299 Fixed some tests 2016-08-25 13:56:39 -05:00
Diptanu Choudhury ec73c768f1 Making the scheduler use LocalDisk instead of Resources.DiskMB 2016-08-25 12:27:42 -05:00
Diptanu Choudhury c1a455983d Added the chained alloc for system scheduler 2016-08-16 10:49:45 -07:00
Diptanu Choudhury 1de89776d7 Marking an allocation chained if we are creating this to replace an old one 2016-08-15 17:52:41 -07:00
Alex Dadgar 64f7eff612 Plan on system scheduler doesn't count nodes who don't meet constraints 2016-08-11 15:26:25 -07:00
Diptanu Choudhury 23fcb9f5c9 Ensuring system sched doesn't increment queued count when nodes are filtered 2016-08-10 14:33:13 -07:00
Diptanu Choudhury 13bab5b1ad Added scheduler tests 2016-08-09 14:52:25 -07:00
Diptanu Choudhury ab94c8eed9 Marking allocations which are not terminal and are on down nodes as lost 2016-08-09 13:11:58 -07:00
Alex Dadgar e33bda76bf test sched doesn't mark complete as lost + core_sched tests 2016-08-04 11:24:17 -07:00
Alex Dadgar ac3328e812 Make scheduler mark allocations as lost 2016-08-03 15:57:46 -07:00
Alex Dadgar 3a9f3a31bc KillTimeout can be modified in place 2016-08-01 20:19:12 -07:00
Alex Dadgar e661c09898 fix filter logic 2016-07-28 15:57:56 -07:00
Alex Dadgar ddbd9261c1 Merge pull request #1471 from hashicorp/b-handle-old-batch-allocs
filterCompleteAllocs filters replaced batch allocs
2016-07-28 14:31:19 -07:00
Diptanu Choudhury eb08405467 Updated tests and added logic to system sched 2016-07-28 14:02:50 -07:00
Diptanu Choudhury 2e84d246f9 fixed a comment 2016-07-28 12:22:44 -07:00
Diptanu Choudhury 48eda99dd9 Setting the queued count as zero if there is nothing to place 2016-07-28 12:13:35 -07:00
Diptanu Choudhury 4a8636cb61 Added a test 2016-07-27 17:49:53 -07:00
Alex Dadgar c132952ba2 filterCompleteAllocs filters replaced batch allocs 2016-07-27 11:54:55 -07:00
Diptanu Choudhury d1a6bdb4ba Making the queued allocations bind late 2016-07-25 22:11:11 -07:00
Diptanu Choudhury d1682e052a Added a test for adjustQueuedAllocations 2016-07-25 17:31:40 -07:00
Diptanu Choudhury 51cb201a09 Initializing the queued allocations late 2016-07-25 17:26:38 -07:00
Diptanu Choudhury 09aa867cc2 Added a test to ensure we record the queued allocations correctly when the plan made partial progress 2016-07-25 17:26:38 -07:00
Diptanu Choudhury 8f0d2a2775 Fixed some more tests 2016-07-25 17:26:38 -07:00
Diptanu Choudhury 4a17d8e6d6 Added a test to ensure failed batch allocations are being added to the number of queued allocations 2016-07-25 17:26:38 -07:00
Diptanu Choudhury 39bcfcd1c6 Added a test to ensure system scheduler records the correct number of queued allocations 2016-07-25 17:26:38 -07:00
Diptanu Choudhury cce5f483ae Added some more tests 2016-07-25 17:26:38 -07:00
Diptanu Choudhury dabb83063b Review comments 2016-07-25 17:26:38 -07:00
Diptanu Choudhury 50842b88c7 Fixed some bugs 2016-07-25 17:26:38 -07:00
Diptanu Choudhury 804ef1e932 Not setting the desired and client status of an allocation during in-place updates 2016-07-25 17:26:38 -07:00
Diptanu Choudhury a64785417d Fixed the logic for decrementing the count of queued based on plan result 2016-07-25 17:26:38 -07:00
Diptanu Choudhury 1cc0bc392b Setting the number of queued allocations per task group 2016-07-25 17:26:38 -07:00
Diptanu Choudhury 487c66b84d Removing the queued state of Job Summary and alloc desired status false 2016-07-13 13:20:46 -06:00
Alex Dadgar e90529afc9 test for max plan 2016-06-20 17:56:49 -07:00
Alex Dadgar 67c0816726 Handle max plans 2016-06-20 17:43:02 -07:00
Sean Chittenden a658299235
Misc typos 2016-06-16 16:17:17 -07:00
Alex Dadgar d44c4761f6 track failed allocations properly 2016-06-15 12:58:19 -07:00
Alex Dadgar 8e231fa382 Rename ConsulService back to Service 2016-06-12 16:36:49 -07:00
Sean Chittenden 2f036231e5 Merge pull request #1201 from hashicorp/f-dyn-server-list
Dynamic Server Lists/Client Bootstrapping via consul.
2016-06-11 18:58:25 -04:00
Alex Dadgar b064b392fc Only unblock if missed class was added after eval snapshot index 2016-06-10 15:24:06 -07:00
Sean Chittenden 95c9d1a63e
Per-comment, remove structs.Allocation's Services attribute.
Nuke PopulateServiceIDs() now that it's also no longer needed.
2016-06-10 15:54:39 -04:00
Sean Chittenden 7956eb0c80
Rename structs.Task's Service attribute to ConsulService 2016-06-10 15:54:39 -04:00
Sean Chittenden 4973ec32bb
Rename structs.Services to structs.ConsulServices 2016-06-10 15:54:39 -04:00
Alex Dadgar 57770de1fc Add eval-status and remove eval-monitor 2016-05-27 11:50:15 -07:00
Alex Dadgar fb8d79a908 Blocked evals don't store TG alloc metrics 2016-05-27 11:26:14 -07:00
Alex Dadgar 6a236872b4 address comment 2016-05-25 10:30:47 -07:00
Alex Dadgar 3fd51ecece Periodically unblock failed evaluations 2016-05-24 20:10:56 -07:00
Alex Dadgar 18d9e89065 Reuse the same evaluation and reblock it until there is no more work to do 2016-05-24 20:10:56 -07:00
Alex Dadgar 3cbb89c61e Merge pull request #1188 from hashicorp/f-no-failed-allocs
Failed Allocation Metrics stored in Evaluation
2016-05-24 20:06:28 -07:00
Alex Dadgar 958d677248 comment 2016-05-24 18:18:10 -07:00
Alex Dadgar fcc57fbc66 rename SpawnedBlockedEval and simplify map safety check 2016-05-24 18:12:59 -07:00
Alex Dadgar 7167b93ba9 Add test to verify drain doesn't restart successful batch and add to ignore list 2016-05-24 17:47:03 -07:00
Alex Dadgar b5ad18a7ea Dont restart successfully finished batch allocations 2016-05-24 17:23:18 -07:00
Alex Dadgar 1feb57b047 Evals track blocked evals they create 2016-05-19 13:09:52 -07:00
Alex Dadgar 8f5f12ae81 Scheduler no longer produces failed allocations; failed alloc metrics stored in evaluation 2016-05-18 18:11:40 -07:00
Alex Dadgar 117b926e2b inplaceUpdate returns the allocs that were updated in-place 2016-05-17 15:37:37 -07:00
Alex Dadgar a5ab96d40e Merge pull request #1168 from hashicorp/f-plan-endpoint
Job.Plan endpoint
2016-05-16 13:15:40 -07:00
Alex Dadgar a231f6f998 Switch to using the harness 2016-05-16 12:49:18 -07:00
Sean Chittenden dc28ab0cb5
Speling police 2016-05-15 09:41:34 -07:00
Alex Dadgar bed4cb7a9f Fixes 2016-05-13 11:53:11 -07:00
Alex Dadgar 7a44ec5ccc Remove plan from the response 2016-05-12 11:29:38 -07:00
Alex Dadgar 6d69e39966 Test task group update annotations 2016-05-11 16:31:50 -07:00
Alex Dadgar 81f0286dd8 Merge branch 'master' into f-plan-endpoint 2016-05-11 15:39:36 -07:00
Alex Dadgar 24bfaa70ac Fix switching diff structures 2016-05-11 15:36:28 -07:00
Alex Dadgar 8b45e2c474 Check if network asks have changed when checking task updates 2016-05-05 21:32:01 -07:00
Alex Dadgar ab0b57a9a1 Initial plan endpoint implementation - WIP 2016-05-05 11:21:58 -07:00
Alex Dadgar ff0dd9b81c Task is not eligible for update if User, Meta, or Resources change 2016-04-25 17:20:25 -07:00
Alex Dadgar 733156c016 vendor 2016-04-19 17:12:44 -07:00
Alex Dadgar 7dc1a525cb more debug 2016-04-19 16:55:27 -07:00
Alex Dadgar 76e493dc16 base debugging 2016-04-19 16:33:25 -07:00
Alex Dadgar 1a31e5e137 Fix drained/batch allocations from continually migrating 2016-04-12 16:14:32 -07:00
Alex Dadgar f021c1a7b0 filtering failed batch allocs 2016-04-11 12:51:53 -07:00
Alex Dadgar 034bae90bb Revert "Remove client status from allocation TerminalStatus"
This reverts commit 819e1e4b3967c7029ee8221144666ff460fdd7ed.
2016-04-08 14:22:06 -07:00
Alex Dadgar 09f63fd3c0 Remove client status from allocation TerminalStatus 2016-03-25 12:53:37 -07:00
Alex Dadgar b80e61a66c Merge pull request #975 from hashicorp/f-rename-complete-alloc
Successful allocations are marked as complete instead of dead
2016-03-25 10:35:11 -07:00
Diptanu Choudhury 3f0580f204 Added a note about backward compatibility 2016-03-23 19:08:07 -07:00
Alex Dadgar 94522e7bed Successful allocations are marked as complete instead of dead 2016-03-23 18:08:19 -07:00
Alex Dadgar 2de9299cab ProposedAllocs dedups in-place updated allocations 2016-03-21 18:09:32 -07:00
Alex Dadgar f6e220b987 unit-test demonstrating broken behavior 2016-03-21 16:28:47 -07:00
Alex Dadgar 914207a5c2 Allow count zero 2016-03-17 11:02:59 -07:00
Alex Dadgar 7843ed1218 evict and replace when the artifacts of a task change 2016-03-15 19:32:49 -07:00
Alex Dadgar ad92e50a24 Avoid serializes Allocation.Resources 2016-03-01 14:09:25 -08:00
Alex Dadgar 6a368a9f67 Merge pull request #848 from hashicorp/b-server-panic
core: store the job on allocations that are from stopped jobs
2016-02-24 17:27:22 -08:00
Alex Dadgar a2b56a5cff Generic Scheduler handles periodic eval type 2016-02-24 16:20:33 -08:00
Alex Dadgar a9d410dbee Store the job on allocations that are from stopped jobs 2016-02-24 14:50:59 -08:00
Alex Dadgar fa8e2d31ee Revert "err logs in worker and scheduler"
This reverts commit 7befc586521b70eb84013bff367310e4cfa45c27.
2016-02-22 22:23:57 -08:00
Alex Dadgar f48eabe753 err logs in worker and scheduler 2016-02-22 14:47:59 -08:00
Alex Dadgar f092c7ca15 format 2016-02-22 13:24:26 -08:00
Armon Dadgar 2b7bdfee37 nomad: add a sanity check guard 2016-02-22 12:15:40 -08:00
Alex Dadgar e42720c2f5 Fix progressMade in scheduler 2016-02-22 10:38:04 -08:00
Armon Dadgar 87447efa61 schedule: deduplicate the jobs 2016-02-21 11:32:56 -08:00
Armon Dadgar 0dbd4c46c9 nomad: make PopulateServiceIDs more efficient 2016-02-21 11:15:00 -08:00
Alex Dadgar 821b9c13db Merge pull request #823 from hashicorp/f-bitmap
Switch port collision checking to use bitmap instead of map
2016-02-20 16:02:48 -08:00
Armon Dadgar 9784bb7285 nomad: cache bitmaps to avoid GC pressure 2016-02-20 12:18:22 -08:00
Armon Dadgar 35741fcedd scheduler: Use AllocsByNodeTerminal to avoid filtering 2016-02-20 11:29:15 -08:00
Alex Dadgar d1011c9668 Fixes 2016-02-19 15:49:32 -08:00
Alex Dadgar 80345a2953 resolveConstraintTargets checks for bracket syntax 2016-02-16 10:03:04 -08:00
Alex Dadgar f6e0349d3b go vet 2016-02-12 16:08:58 -08:00
Alex Dadgar 8e6544333e Only set eligibility if the eval hasn't escaped 2016-02-11 09:45:27 -08:00
Alex Dadgar a47d5260c5 Reset retry count if progress is made and fail by creating a blocked eval 2016-02-09 21:24:47 -08:00
Alex Dadgar 5018f5dd1e Only interpret vars wrapped in braces 2016-02-04 17:26:46 -08:00
Alex Dadgar 25cb7fc03d Fix computed class when the job has multiple task groups 2016-02-03 21:22:18 -08:00
Alex Dadgar 4e527b26b0 test 2016-02-03 14:15:02 -08:00
Alex Dadgar d930d488b5 Fix node drain 2016-02-03 12:00:43 -08:00
Alex Dadgar c7821f13d7 Only replace batch allocations that have failed 2016-02-02 17:40:32 -08:00
Alex Dadgar 36df3aaac7 Remove running, system scheduler, and fix tg overriding eligibility 2016-01-31 20:56:52 -08:00
Alex Dadgar 151fe5ed88 Make computed node class a string and add versioning 2016-01-31 18:04:45 -08:00
Alex Dadgar 9045d7e989 Schedulers create blocked eval if there are failed allocations 2016-01-31 18:04:45 -08:00
Alex Dadgar 9a8871249d EvalEligibility unit tests and simplify escaped constraint tracking 2016-01-26 17:34:41 -08:00
Alex Dadgar 9dc22532e5 Respond to comments 2016-01-26 16:43:42 -08:00
Alex Dadgar 0d55fb2bdd Add benchmark 2016-01-26 15:16:43 -08:00
Alex Dadgar 1bd9bece62 Change the unique namespace on the node 2016-01-26 15:16:43 -08:00
Alex Dadgar 2b7d42bf9b FeasibilityWrapper uses computed node class eligibility to call feasibility checks minimally 2016-01-26 15:16:43 -08:00
Alex Dadgar 5d23025df8 EvalEligibility in context 2016-01-26 15:16:43 -08:00
Ivo Verberk 91a9f2c4ce Shorten CLI identifiers
* Truncate all UUID identifiers to eight characters by default
* Refactor the node identifier to an auto-generated UUID
* Created and updated tests and mocks
2016-01-14 21:57:43 +01:00
Alex Dadgar 41efdcb1c3 Add JobModifyIndex 2016-01-12 09:50:33 -08:00
Alex Dadgar 561f9634ba Fix counts 2016-01-04 14:33:10 -08:00
Alex Dadgar c0721e45f6 Fix bug, add tests, and cli output 2016-01-04 14:23:06 -08:00
Alex Dadgar 36752b9ed4 Store the available nodes in the alloc metric 2016-01-04 12:07:33 -08:00
Alex Dadgar d6aa36b417 Merge pull request #618 from hashicorp/f-node-class-constraint
Add node class to constraints
2015-12-28 13:27:38 -08:00
Alex Dadgar 0b29c2046d Test ebug log 2015-12-23 19:44:42 -08:00
Alex Dadgar 5e71751a1d Add node class to constraints 2015-12-21 17:15:34 -08:00
Alex Dadgar 92823b71a8 merge 2015-12-16 15:01:15 -08:00
Alex Dadgar 2218a79815 Add garbage collection to jobs 2015-12-16 15:00:45 -08:00
Diptanu Choudhury 0cc1275782 Added a test to make sure services no longer present are being removed 2015-12-15 10:43:56 -08:00
Diptanu Choudhury ba5561cae0 Making sure existing ids for services are not re-generated 2015-12-15 09:14:32 -08:00
Diptanu Choudhury 7a8acd32e4 Populating service ids only if allocations can be placed for system jobs 2015-12-15 08:38:18 -08:00
Diptanu Choudhury ddaf74fb65 Added a test to prove services are removed from the map in Alloc if they are removed from the Tasks 2015-12-15 08:35:26 -08:00
Diptanu Choudhury b7f556fabc Changed some comments 2015-12-14 18:05:58 -08:00
Diptanu Choudhury 1c76715358 Re-initializing the service map for in place updates 2015-12-14 17:06:58 -08:00
Diptanu Choudhury 2eb03e1d23 Renamed serviceId to serviceID 2015-12-14 15:57:56 -08:00
Diptanu Choudhury 76486d71e2 Making the allocs hold service ids 2015-12-14 15:08:35 -08:00
Chris Hines 5f0c30b926 Skip unreliable time measurement assertions on Windows. 2015-12-09 14:55:57 -05:00
Alex Dadgar 9801db55b3 Remove unnecessary copy 2015-11-23 16:36:12 -08:00
Alex Dadgar 9b99eeeec4 Remove shared reference to network resources across allocs 2015-11-23 16:32:30 -08:00
Chris Bednarski 9f40143684 Merge branch 'master' into f-port-labels 2015-11-16 16:02:38 -08:00
Alex Dadgar bdf7497f1b Initialize task state in allocation sent by scheduler 2015-11-16 15:14:21 -08:00
Diptanu Choudhury babc68adfb Fixing the scheduler tests 2015-11-16 13:10:57 -08:00
Alex Dadgar 3cdbfc010f Remove weight and hard/soft fields from constraint 2015-10-27 14:31:14 -07:00
Alex Dadgar c7f904ff31 Merge pull request #321 from hashicorp/f-unique-constraint
Add "distinctHost" constraint
2015-10-26 14:18:57 -07:00
Alex Dadgar 1784387e1d Rename Dynamic -> ProposedAllocConstraintIterator 2015-10-26 14:12:54 -07:00
Alex Dadgar a9135b92b2 Cleanup DynamicConstraintIterator 2015-10-26 14:01:32 -07:00
Alex Dadgar fd9c2baf02 Constants for constraints and renaming to use undescore instead of camel 2015-10-26 13:47:56 -07:00
Alex Dadgar 2b2b6c321a Check for environment variable updates for tasks 2015-10-23 14:52:06 -07:00
Alex Dadgar be50fe6254 Fix markdown and log messages 2015-10-23 09:56:48 -07:00
Alex Dadgar ecc4f98f3a Change "unique" to "distinctHosts" 2015-10-22 17:40:41 -07:00
Alex Dadgar 861a65288c Fix test and simplify some boolean logic/fix metrics counting 2015-10-22 16:45:03 -07:00
Alex Dadgar 783b0b5aee Add dynamic constraint to generic_scheduler 2015-10-22 15:09:03 -07:00
Alex Dadgar 910dcc49fb DynamicConstraintIterator that implements the unique constraint 2015-10-22 14:31:12 -07:00
Gregory Man 400363079c Make go vet happy 2015-10-21 15:47:36 +03:00
Alex Dadgar d9b78ffdfe Remove base nodes from stack constructors 2015-10-16 17:05:23 -07:00
Alex Dadgar 1a1febba4f Unit tests for the refactor scheduler methods 2015-10-16 16:35:55 -07:00
Alex Dadgar 1ec921a3c2 Refactor task group constraint logic in generic/system stack 2015-10-16 14:00:51 -07:00
Alex Dadgar ab9acb9edf diffResult stores values not pointers 2015-10-16 11:43:09 -07:00
Alex Dadgar 406e135ce8 Add negative test to DriverIterator, increase system scheduler attempts, and fix evictAndPlace status message 2015-10-16 11:36:26 -07:00
Alex Dadgar 70c39bd5a4 Add diffSystemAlloc which gives richer information which node to place a system allocation 2015-10-15 13:14:44 -07:00
Alex Dadgar 65fd28d7d1 Refactor shared code between schedulers 2015-10-14 18:39:44 -07:00
Alex Dadgar 692efe513d Use valid driver values in test 2015-10-14 18:39:44 -07:00
Alex Dadgar 494244ed06 System scheduler and system stack 2015-10-14 18:39:44 -07:00
Armon Dadgar 931a113cb5 scheduler: adding regexp and version constraint cache 2015-10-12 20:15:07 -07:00
Armon Dadgar 81c08ba66e scheduler: support lexical ordering constraints 2015-10-11 15:57:06 -04:00
Armon Dadgar 751fa61b3e scheduler: adding regexp constraints 2015-10-11 15:35:13 -04:00
Armon Dadgar 86ca7c59a1 scheduler: adding version constraint logic 2015-10-11 15:12:39 -04:00
Ivo Verberk c6e1b13b51 Fix vet warnings 2015-10-07 12:26:58 +02:00
Armon Dadgar 32f4e9e401 scheduler: tasks updated should only check if number of dynamic ports is different 2015-10-04 15:53:02 -04:00
Armon Dadgar b213462cb4 Change CPU from float64 to int 2015-09-23 11:14:32 -07:00
Armon Dadgar d442a49fde scheduler: job anti-affinity score should record as negative 2015-09-22 22:24:07 -07:00
Armon Dadgar 90a82da0fd scheduler: do not skip job anti-affinity 2015-09-22 22:20:07 -07:00
Armon Dadgar a23c8bf713 scheduler: Allow rolling update, assign eval first.Fixes #91 2015-09-22 21:45:25 -07:00
Chris Bednarski 168c959497 Added named ports 2015-09-22 13:59:16 -07:00
Armon Dadgar ee2fd15cc2 scheduler: ignore allocations in terminal state 2015-09-17 21:25:55 -07:00
Armon Dadgar 38471c81a0 scheduler: pass failure reason to ExhaustedNode 2015-09-13 18:38:26 -07:00
Armon Dadgar cece4473b8 scheduler: in-place update should preserve network offer 2015-09-13 17:06:34 -07:00
Armon Dadgar fcd367e657 scheduler: track dimension of exhaustion 2015-09-13 16:48:01 -07:00
Armon Dadgar f71527dadf schedule: avoid in-place update of task if network resources are different 2015-09-13 16:41:53 -07:00
Armon Dadgar cf0a546fdc scheduler: expose reason network offer failed 2015-09-13 16:41:32 -07:00
Armon Dadgar c6f5a8e029 scheduler: thread through the TaskResources 2015-09-13 15:20:50 -07:00
Armon Dadgar cb3aa5d4db nomad: update for new AllocsFit API 2015-09-13 14:57:58 -07:00
Armon Dadgar aed6bd8a41 scheduler: use the new network index 2015-09-13 14:37:09 -07:00
Armon Dadgar e901b0a1ca nomad: moving network index 2015-09-13 14:35:28 -07:00
Armon Dadgar 625308661a scheduler: binpacker makes network offers 2015-09-13 14:31:32 -07:00
Armon Dadgar b0eb463823 scheduler: expose AddReserved, add test 2015-09-13 14:31:01 -07:00
Armon Dadgar 5c8f1c0fa5 scheduler: adding helper library for network assignments 2015-09-12 19:34:46 -07:00
Armon Dadgar 0e38f0e914 scheduler: refactor shared logic 2015-09-12 14:44:40 -07:00
Armon Dadgar 4333b7370b scheduler: recompute scan limit on SetNodes 2015-09-11 12:03:41 -07:00
Armon Dadgar e804567324 scheduler: Adding SetLimit to LimitIterator 2015-09-11 12:01:22 -07:00
Armon Dadgar ea0795995d Use a single implementation of GenerateUUID 2015-09-07 15:23:03 -07:00
Armon Dadgar b0ef497f92 scheduler: use update strategy for rolling updates 2015-09-07 15:17:39 -07:00
Armon Dadgar e7c0bb5e1e scheduler: Adding CreateEval to Planner 2015-09-07 14:26:29 -07:00
Armon Dadgar c8bcb292a0 scheduler: support in-place allocation updates 2015-09-07 12:27:12 -07:00
Armon Dadgar c2eff48412 scheduler: util method to diff task groups 2015-09-07 12:25:23 -07:00
Armon Dadgar f8ee848fee scheduler: share context and stack 2015-09-07 11:34:59 -07:00
Armon Dadgar 4a348cc0da scheduler: allow updating the base nodes 2015-09-07 11:30:13 -07:00
Armon Dadgar 5b45705165 scheduler: allow StaticIterator to update base set 2015-09-07 11:26:16 -07:00
Armon Dadgar f1a93b0aa7 scheduler: pull node shuffle into util 2015-09-07 11:23:38 -07:00
Armon Dadgar 8bedd3769c nomad: unifying the state store API 2015-09-06 20:56:38 -07:00
Armon Dadgar 1a5579384a nomad: cleanup API descrepencies 2015-09-06 20:47:42 -07:00
Armon Dadgar 5832b2f147 nomad: adding drain as node property 2015-09-06 19:47:02 -07:00
Chris Bednarski 96cb220ff4 Update references to "os" to use "kernel.name"
This brings test code and mocks up to date with the fingerprinter. This was a slightly larger change than I anticipated, but I think it's good for two reasons:

1. More semanitcally correct. `os.name` is something like "Windows 10 Pro" or "Ubuntu", while `kernel.name` is "windows" or "linux". `os.version` and `kernel.version` match these semantics.
2. `kernel.name` is much easier to grep for than `os`, which is helpful because oracle can't help us with strings.
2015-08-28 01:30:47 -07:00
Armon Dadgar 2ee6947844 scheduler: updating for new APIs 2015-08-25 17:06:06 -07:00
Armon Dadgar b77e4ff343 scheduler: update tests to filter terminal allocs 2015-08-23 16:30:57 -07:00
Armon Dadgar f04e2b81ba nomad: adding evicted state for allocs 2015-08-22 18:30:49 -07:00
Armon Dadgar 383a6aa76d scheduler: adding job anti-affinity to the generic stack 2015-08-16 10:37:11 -07:00
Armon Dadgar 6d60c4c623 scheduler: adding JobAntiAffinityIterator 2015-08-16 10:32:25 -07:00
Armon Dadgar f4e9ef8d1e scheduler: move proposed alloc logic to Context 2015-08-16 10:28:58 -07:00
Armon Dadgar 0dff5a77a2 scheduler: coalesce failures by task group 2015-08-16 10:03:21 -07:00
Armon Dadgar 107da3a174 scheduler: track sub-scores 2015-08-16 09:57:30 -07:00
Armon Dadgar 1fd148d97d nomad: fixing vet errors 2015-08-15 16:10:10 -07:00
Armon Dadgar 088cc37487 scheduler: update status and test retry limit 2015-08-15 14:47:13 -07:00
Armon Dadgar cae67b7f60 nomad: expose UpdateEval as a planner 2015-08-15 14:25:00 -07:00
Armon Dadgar 9fd8049e16 scheduler: create allocs for failed placements 2015-08-15 13:40:13 -07:00
Armon Dadgar da90c453ce nomad: adding index on EvalID and Status to alloc 2015-08-15 13:27:42 -07:00
Armon Dadgar 3f16a21658 nomad: remove NodesByDatacenterStatus 2015-08-15 13:11:42 -07:00