This change fixes a bug where lost/failed allocations are replaced by
allocations with the latest versions, even if the version hasn't been
promoted yet.
Now, when generating a plan for lost/failed allocations, the scheduler
first checks if the current deployment is in Canary stage, and if so, it
ensures that any lost/failed allocations is replaced one with the latest
promoted version instead.
* scheduler/reconcile: set FollowupEvalID on lost stop_after_client_disconnect
* scheduler/reconcile: thread follupEvalIDs through to results.stop
* scheduler/reconcile: comment typo
* nomad/_test: correct arguments for plan.AppendStoppedAlloc
* scheduler/reconcile: avoid nil, cleanup handleDelayed(Lost|Reschedules)
If an existing system allocation is running and the node its running on
is marked as ineligible, subsequent plan/applys return an RPC error
instead of a more helpful plan result.
This change logs the error, and appends a failedTGAlloc for the
placement.
Also includes unit tests for binpacker and preemption.
The tests verify that network resources specified at the
task group level are properly accounted for
This adds a `nomad alloc stop` command that can be used to stop and
force migrate an allocation to a different node.
This is built on top of the AllocUpdateDesiredTransitionRequest and
explicitly limits the scope of access to that transition to expose it
under the alloc-lifecycle ACL.
The API returns the follow up eval that can be used as part of
monitoring in the CLI or parsed and used in an external tool.
Currently when operators need to log onto a machine where an alloc
is running they will need to perform both an alloc/job status
call and then a call to discover the node name from the node list.
This updates both the job status and alloc status output to include
the node name within the information to make operator use easier.
Closes#2359
Cloess #1180
This commit implements an allocation selection algorithm for finding
allocations to preempt. It currently special cases network resource asks
from others (cpu/memory/disk/iops).
This PR adds watching of allocation health at the client. The client can
watch for health based on the tasks running on time and also based on
the consul checks passing.