Commit graph

163 commits

Author SHA1 Message Date
Drew Bailey e613a258da
ignore computed diffs if node is ineligible
test flakey, add temp sleeps for debugging

fix computed class
2020-02-03 09:02:08 -05:00
Drew Bailey ef175c0b31
Update Evicted allocations to lost when lost
If an alloc is being preempted and marked as evict, but the underlying
node is lost before the migration takes place, the allocation currently
stays as desired evict, status running forever, or until the node comes
back online.

This commit updates updateNonTerminalAllocsToLost to check for a
desired status of Evict as well as Stop when updating allocations on
tainted nodes.

switch to table test for lost node cases
2020-01-07 13:34:18 -05:00
Drew Bailey 876618b5d2
Removes checking constraints for inplace update 2019-11-19 13:34:41 -05:00
Drew Bailey e44a66d7fc
DOCS: Spread stanza does not exist on task
Fixes documentation inaccuracy for spread stanza placement. Spreads can
only exist on the top level job struct or within a group.

comment about nil assumption
2019-11-19 08:26:36 -05:00
Drew Bailey 07e3164bf9
Check for changes to affinity and constraints
Adds checks for affinity and constraint changes when determining if we
should update inplace.

refactor to check all levels at once

check for spread changes when checking inplace update
2019-11-19 08:26:34 -05:00
Chris Baker 95ae01a9f4 the scheduler checks whether task changes require a restart, this needed
to be updated to consider devices
2019-11-07 17:51:15 +00:00
Michael Schurter c6bbe85f42 core: fix panic when AllocatedResources is nil
Fix for #6540
2019-10-28 14:38:21 -07:00
Preetha Appan 9accf60805
update comment 2019-09-05 18:43:30 -05:00
Preetha Appan d21c708c4a
Fix inplace updates bug with group level networks
During inplace updates, we should be using network information
from the previous allocation being updated.
2019-09-05 18:37:24 -05:00
Nick Ethier 7c9520b404
scheduler: fix disk constraints 2019-07-31 01:04:08 -04:00
Nick Ethier 09a4cfd8d7
fix failing tests 2019-07-31 01:04:07 -04:00
Nick Ethier af66a35924
networking: Add new bridge networking mode implementation 2019-07-31 01:04:06 -04:00
Nick Ethier 15989bba8e
ar: cleanup lint errors 2019-07-31 01:03:18 -04:00
Nick Ethier 66c514a388
Add network lifecycle management
Adds new Prerun and Postrun hooks to manage setup of network namespaces
on Linux. Work still needs to be done to make the code platform agnostic and
support Docker-style network initialization.
2019-07-31 01:03:17 -04:00
Arshneet Singh b977748a4b Add code for plan normalization 2019-04-23 09:18:01 -07:00
Preetha Appan da1ce9bcea
Fix bug where scoring metadata would be overridden during an inplace upgrade. 2019-03-12 23:36:46 -05:00
Alex Dadgar 1e3c3cb287 Deprecate IOPS
IOPS has been modelled as a resource since Nomad 0.1 but has never
actually been detected and there is no plan in the short term to add
detection. This is because IOPS is a bit simplistic of a unit to define
the performance requirements from the underlying storage system. In its
current state it adds unnecessary confusion and can be removed without
impacting any users. This PR leaves IOPS defined at the jobspec parsing
level and in the api/ resources since these are the two public uses of
the field. These should be considered deprecated and only exist to allow
users to stop using them during the Nomad 0.9.x release. In the future,
there should be no expectation that the field will exist.
2018-12-06 15:09:26 -08:00
Alex Dadgar a78cefec18 use int64 2018-10-16 15:34:32 -07:00
Preetha Appan 7c0d8c646c
Change CPU/Disk/MemoryMB to int everywhere in new resource structs 2018-10-16 16:21:42 -05:00
Alex Dadgar 52f9cd7637 fixing tests 2018-10-04 14:26:19 -07:00
Alex Dadgar bac5cb1e8b Scheduler uses allocated resources 2018-10-02 17:08:25 -07:00
Alex Dadgar 3c19d01d7a server 2018-09-15 16:23:13 -07:00
Alex Dadgar 3aa4ee9d75 Fix lost handling of not actually down nodes 2018-03-30 14:17:41 -07:00
Alex Dadgar b18f789020 Unmark drain when nodes hit their deadline and only batch/system left and add all job type integration test 2018-03-28 17:25:58 -07:00
Alex Dadgar 9d60e2cebf Correct status desc on draining system allocs 2018-03-26 17:54:46 -07:00
Michael Schurter d1ec65d765 switch to new raft DesiredTransition message 2018-03-21 16:49:48 -07:00
Alex Dadgar db4a634072 RPC, FSM, State Store for marking DesiredTransition
fix build tag
2018-03-21 16:49:48 -07:00
Michael Schurter c0542474db drain: initial drainv2 structs and impl 2018-03-21 16:49:48 -07:00
Josh Soref ed8db9992e spelling: feasibility 2018-03-11 18:07:09 +00:00
Preetha Appan fbb1936dee
Fix some comments and lint warnings, remove unused method 2018-01-31 09:56:53 -06:00
Preetha Appan 031c566ada
Reschedule previous allocs and track their reschedule attempts 2018-01-31 09:56:53 -06:00
Alex Dadgar 4173834231 Enable more linters 2017-09-26 15:26:33 -07:00
Alex Dadgar 641e178416 Stop before trying to place 2017-07-17 17:18:12 -07:00
Alex Dadgar 454083ba1b Remove canary 2017-07-07 12:10:04 -07:00
Alex Dadgar 477c713df5 Plan apply handles canaries and success is set via update 2017-07-07 12:10:04 -07:00
Alex Dadgar e229d3650b Attach eval id 2017-07-07 12:10:04 -07:00
Alex Dadgar 27a6e6b6d1 update description of the alloc update factory function 2017-07-07 12:03:11 -07:00
Alex Dadgar ce55559f12 Non-Canary/Deployment Tests 2017-07-07 12:03:11 -07:00
Alex Dadgar d111dd5c10 Pull out in-place updating into a passed in function; reduce inputs to reconciler 2017-07-07 12:03:11 -07:00
Alex Dadgar 5a2449d236 Respond to review comments 2017-04-19 10:54:03 -07:00
Alex Dadgar 3145086a42 non-purge deregisters 2017-04-15 17:08:05 -07:00
Alex Dadgar 2c31d4036b Skip inplace update on terminal batch allocation
This PR skips adding an inplace update to a successfully terminal batch
job to the plan. This avoids extra data in the plan and avoids
triggering updates on all clients that have the terminal allocation.
This is matching behavior of the service scheduler.

/cc @armon for review
2017-03-11 17:19:22 -08:00
Alex Dadgar a439bf709d Property Set 2017-03-08 17:50:40 -08:00
Alex Dadgar b69b357c7f Nomad builds 2017-02-07 20:31:23 -08:00
Alex Dadgar 2c838a80f6 Detect newly created allocations properly 2017-01-08 13:55:03 -08:00
Alex Dadgar a1dd78c24b Scheduler combines meta from job > group > task 2016-12-15 17:08:38 -08:00
Alex Dadgar 36cfe6e89e Large refactor of task runner and Vault token rehandling 2016-10-18 11:24:20 -07:00
Ben Barnard 83f647ed84 Replace "the the" with "the" in documentation and comments 2016-10-11 15:31:40 -04:00
Diptanu Choudhury 45afc0b4e1 Added logic to ensure scheduler knows job definition has been updated when ephemeral disk has been updated (#1725) 2016-09-21 14:00:02 -07:00
Alex Dadgar bc500a536c tasks updated 2016-09-21 11:31:09 -07:00
Alex Dadgar 683380c25c Merge pull request #1715 from hashicorp/b-dead-system-nodes
Fix bug where dead nodes weren't properly handled by system scheduler
2016-09-19 11:49:44 -07:00
Alex Dadgar 47551e93b4 Fix bug in which dead nodes weren't being properly handled by system scheduler 2016-09-19 11:49:27 -07:00
Diptanu Choudhury 1b3c5e98c8 Renaming LocalDisk to EphemeralDisk (#1710)
2016-09-14 15:43:42 -07:00
Diptanu Choudhury d94bb45ad3 Added some more comments 2016-08-31 14:06:31 -07:00
Diptanu Choudhury 52e9946da9 Implemented SetPrefferingNodes in stack 2016-08-30 16:17:50 -07:00
Diptanu Choudhury ec73c768f1 Making the scheduler use LocalDisk instead of Resources.DiskMB 2016-08-25 12:27:42 -05:00
Diptanu Choudhury ab94c8eed9 Marking allocations which are not terminal and are on down nodes as lost 2016-08-09 13:11:58 -07:00
Alex Dadgar ac3328e812 Make scheduler mark allocations as lost 2016-08-03 15:57:46 -07:00
Diptanu Choudhury 8f0d2a2775 Fixed some more tests 2016-07-25 17:26:38 -07:00
Diptanu Choudhury 804ef1e932 Not setting the desired and client status of an allocation during in-place updates 2016-07-25 17:26:38 -07:00
Diptanu Choudhury 1cc0bc392b Setting the number of queued allocations per task group 2016-07-25 17:26:38 -07:00
Sean Chittenden a658299235
Misc typos 2016-06-16 16:17:17 -07:00
Sean Chittenden 95c9d1a63e
Per-comment, remove structs.Allocation's Services attribute.
Nuke PopulateServiceIDs() now that it's also no longer needed.
2016-06-10 15:54:39 -04:00
Alex Dadgar fb8d79a908 Blocked evals don't store TG alloc metrics 2016-05-27 11:26:14 -07:00
Alex Dadgar 3cbb89c61e Merge pull request #1188 from hashicorp/f-no-failed-allocs
Failed Allocation Metrics stored in Evaluation
2016-05-24 20:06:28 -07:00
Alex Dadgar 958d677248 comment 2016-05-24 18:18:10 -07:00
Alex Dadgar fcc57fbc66 rename SpawnedBlockedEval and simplify map safety check 2016-05-24 18:12:59 -07:00
Alex Dadgar 7167b93ba9 Add test to verify drain doesn't restart successful batch and add to ignore list 2016-05-24 17:47:03 -07:00
Alex Dadgar b5ad18a7ea Don't restart successfully finished batch allocations 2016-05-24 17:23:18 -07:00
Alex Dadgar 1feb57b047 Evals track blocked evals they create 2016-05-19 13:09:52 -07:00
Alex Dadgar 117b926e2b inplaceUpdate returns the allocs that were updated in-place 2016-05-17 15:37:37 -07:00
Alex Dadgar bed4cb7a9f Fixes 2016-05-13 11:53:11 -07:00
Alex Dadgar 81f0286dd8 Merge branch 'master' into f-plan-endpoint 2016-05-11 15:39:36 -07:00
Alex Dadgar 24bfaa70ac Fix switching diff structures 2016-05-11 15:36:28 -07:00
Alex Dadgar 8b45e2c474 Check if network asks have changed when checking task updates 2016-05-05 21:32:01 -07:00
Alex Dadgar ab0b57a9a1 Initial plan endpoint implementation - WIP 2016-05-05 11:21:58 -07:00
Alex Dadgar ff0dd9b81c Task is not eligible for update if User, Meta, or Resources change 2016-04-25 17:20:25 -07:00
Alex Dadgar 7843ed1218 evict and replace when the artifacts of a task change 2016-03-15 19:32:49 -07:00
Alex Dadgar ad92e50a24 Avoid serializing Allocation.Resources 2016-03-01 14:09:25 -08:00
Alex Dadgar e42720c2f5 Fix progressMade in scheduler 2016-02-22 10:38:04 -08:00
Armon Dadgar 87447efa61 schedule: deduplicate the jobs 2016-02-21 11:32:56 -08:00
Armon Dadgar 0dbd4c46c9 nomad: make PopulateServiceIDs more efficient 2016-02-21 11:15:00 -08:00
Alex Dadgar a47d5260c5 Reset retry count if progress is made and fail by creating a blocked eval 2016-02-09 21:24:47 -08:00
Alex Dadgar 41efdcb1c3 Add JobModifyIndex 2016-01-12 09:50:33 -08:00
Alex Dadgar 36752b9ed4 Store the available nodes in the alloc metric 2016-01-04 12:07:33 -08:00
Diptanu Choudhury 1c76715358 Re-initializing the service map for in place updates 2015-12-14 17:06:58 -08:00
Alex Dadgar bdf7497f1b Initialize task state in allocation sent by scheduler 2015-11-16 15:14:21 -08:00
Alex Dadgar 2b2b6c321a Check for environment variable updates for tasks 2015-10-23 14:52:06 -07:00
Alex Dadgar 1a1febba4f Unit tests for the refactored scheduler methods 2015-10-16 16:35:55 -07:00
Alex Dadgar 1ec921a3c2 Refactor task group constraint logic in generic/system stack 2015-10-16 14:00:51 -07:00
Alex Dadgar ab9acb9edf diffResult stores values not pointers 2015-10-16 11:43:09 -07:00
Alex Dadgar 70c39bd5a4 Add diffSystemAlloc which gives richer information about which node to place a system allocation on 2015-10-15 13:14:44 -07:00
Alex Dadgar 65fd28d7d1 Refactor shared code between schedulers 2015-10-14 18:39:44 -07:00
Alex Dadgar 494244ed06 System scheduler and system stack 2015-10-14 18:39:44 -07:00
Armon Dadgar 32f4e9e401 scheduler: tasks updated should only check if number of dynamic ports is different 2015-10-04 15:53:02 -04:00
Armon Dadgar f71527dadf schedule: avoid in-place update of task if network resources are different 2015-09-13 16:41:53 -07:00
Armon Dadgar c2eff48412 scheduler: util method to diff task groups 2015-09-07 12:25:23 -07:00
Armon Dadgar f1a93b0aa7 scheduler: pull node shuffle into util 2015-09-07 11:23:38 -07:00
Armon Dadgar 8bedd3769c nomad: unifying the state store API 2015-09-06 20:56:38 -07:00
Armon Dadgar 5832b2f147 nomad: adding drain as node property 2015-09-06 19:47:02 -07:00