Commit graph

10931 commits

Author SHA1 Message Date
Michael Schurter 1b7ac447e9 alloc_runner: watch health for deployed batch jobs 2018-03-21 16:51:45 -07:00
Michael Schurter 1cc012966b api: fix tests to expect default migrate strategy 2018-03-21 16:51:45 -07:00
Michael Schurter cb61a4bdc7 Fix linting errors 2018-03-21 16:51:45 -07:00
Michael Schurter 33488ee2f0 rpcapi: remove; unused 2018-03-21 16:51:45 -07:00
Alex Dadgar 640ebdaef6 fix race in drain integration tests 2018-03-21 16:51:45 -07:00
Michael Schurter c401d5a098 Refactor assertOps into a helper func 2018-03-21 16:51:45 -07:00
Michael Schurter 187b0e1a48 Remove debug prints 2018-03-21 16:51:45 -07:00
Michael Schurter f67eca48ac Deregister garbage collected jobs 2018-03-21 16:51:45 -07:00
Michael Schurter 922842546c JobNs -> NamespacedID
Also drop the New func as it's easy to swap the order of arguments since
they're both strings.
2018-03-21 16:51:45 -07:00
Michael Schurter 8dc7d9fb6a drainer: RegisterJob -> RegisterJobs
Test job watcher
2018-03-21 16:51:45 -07:00
Michael Schurter 3116897099 Fix deadline heap triggering
Chan must be buffered to avoid skipping triggering altogether

Also made timing in a test a bit more lenient
2018-03-21 16:51:45 -07:00
Michael Schurter be7c759867 Improve drain log messages
Also delay "node complete" after the node has been marked complete to
capture a few more alloc events. There are other ways to implement this
that could trade off correctness for responsiveness as technically a
node is considered drained when all of its allocs have been marked to
stop and not when they've actually stopped (which may not happen for a
long time).
2018-03-21 16:51:45 -07:00
Michael Schurter 5eebd53223 Monitor node drains until completion in CLI
allow -detach like other commands
2018-03-21 16:51:45 -07:00
Michael Schurter 2832853bfa Add DesiredTransition.ShouldMigrate to api pkg 2018-03-21 16:51:45 -07:00
Michael Schurter 3907766b6d Fix node eligibility test 2018-03-21 16:51:45 -07:00
Alex Dadgar 9d23c965da fix comment 2018-03-21 16:51:45 -07:00
Alex Dadgar fb4badf1bc sharding 2018-03-21 16:51:44 -07:00
Alex Dadgar 2d91b9dfba Batch drain update 2018-03-21 16:51:44 -07:00
Alex Dadgar 92b636dd32 Fix deadline handling 2018-03-21 16:51:44 -07:00
Michael Schurter 9898edfa90 Switch to drainerv2 impl 2018-03-21 16:51:44 -07:00
Alex Dadgar 7b2bad8c5e Toggle Drain allows resetting eligibility
This PR allows marking a node as eligible for scheduling while toggling
drain. By default the `nomad node drain -disable` commmand will mark it
as eligible but the drainer will maintain in-eligibility.
2018-03-21 16:51:44 -07:00
Alex Dadgar ad80e655cc code review 2018-03-21 16:51:44 -07:00
Alex Dadgar 11f9fe4960 spelling fixes 2018-03-21 16:51:44 -07:00
Alex Dadgar bc7385812d Comments 2018-03-21 16:51:44 -07:00
Alex Dadgar e87c677a42 handle empty node case 2018-03-21 16:51:44 -07:00
Alex Dadgar 405dab2253 integration test and basic fixes 2018-03-21 16:51:44 -07:00
Alex Dadgar e63bcb474d Drainer 2018-03-21 16:51:44 -07:00
Alex Dadgar 4754366640 job watcher 2018-03-21 16:51:44 -07:00
Alex Dadgar 504bfabb4d Node's being untracked or having updated deadlines, updates the deadliner 2018-03-21 16:51:44 -07:00
Alex Dadgar 66eaaa6a4d node watcher 2018-03-21 16:51:44 -07:00
Alex Dadgar 527ac0b39d drain heap 2018-03-21 16:51:44 -07:00
Alex Dadgar 2d4c193a0a Initial design 2018-03-21 16:51:44 -07:00
Alex Dadgar 33ca319080 System test runs on mac 2018-03-21 16:51:44 -07:00
Alex Dadgar f8d4a3a9e6 Fix file names 2018-03-21 16:51:44 -07:00
Alex Dadgar 02019f216a Correct defaulting 2018-03-21 16:51:44 -07:00
Michael Schurter 32a7649359 refactor main drainloop into 2 more methods 2018-03-21 16:51:44 -07:00
Michael Schurter 5e52f84bb7 drainer: refactor newStopAllocs, applyMigrations 2018-03-21 16:51:44 -07:00
Michael Schurter 62960ed7bd client: don't monitor health of non-service jobs
Also fix system job draining; won't work without deadline fixes
2018-03-21 16:51:44 -07:00
Alex Dadgar a37329189a Improve DeadlineTime helper 2018-03-21 16:51:44 -07:00
Michael Schurter b7c993f0e5 drainer: convert fsm errors to go errors 2018-03-21 16:51:44 -07:00
Michael Schurter ab0de41884 drainer: factor job & node watchers out of drainer.go 2018-03-21 16:51:44 -07:00
Michael Schurter 5922aef623 Restart every time SetEnabled(true) is called 2018-03-21 16:51:44 -07:00
Michael Schurter 959d447d38 Remove unused context 2018-03-21 16:51:44 -07:00
Michael Schurter 8b41e9b2e1 drainer: drainer should shutdown with server 2018-03-21 16:51:44 -07:00
Michael Schurter 0a17076ad2 refactor drainer into a subpkg 2018-03-21 16:51:44 -07:00
Alex Dadgar 78c7c36e65 code review 2018-03-21 16:51:44 -07:00
Alex Dadgar ac4975cef4 Small refactor and cleanups 2018-03-21 16:51:44 -07:00
Alex Dadgar 93871c18f8 Fix retaining the drain 2018-03-21 16:51:44 -07:00
Alex Dadgar 010a6b8ca5 Unblock evals once eligible 2018-03-21 16:51:44 -07:00
Alex Dadgar d47c68f764 Add eligibility to node view 2018-03-21 16:51:44 -07:00