Currently, when an alloc fails and is rescheduled, the alloc desired
state remains as "run" and the nomad client may not free the resources.
Here, we ensure that an alloc is marked as stopped when it's
rescheduled.
Notice the Desired Status and Description before and after this change:
Before:
```
mars-2:nomad notnoop$ nomad alloc status 02aba49e
ID = 02aba49e
Eval ID = bb9ed1d2
Name = example-reschedule.nodes[0]
Node ID = 5853d547
Node Name = mars-2.local
Job ID = example-reschedule
Job Version = 0
Client Status = failed
Client Description = Failed tasks
Desired Status = run
Desired Description = <none>
Created = 10s ago
Modified = 5s ago
Replacement Alloc ID = d6bf872b
Task "payload" is "dead"
Task Resources
CPU Memory Disk Addresses
0/100 MHz 24 MiB/300 MiB 300 MiB
Task Events:
Started At = 2019-06-06T21:12:45Z
Finished At = 2019-06-06T21:12:50Z
Total Restarts = 0
Last Restart = N/A
Recent Events:
Time Type Description
2019-06-06T17:12:50-04:00 Not Restarting Policy allows no restarts
2019-06-06T17:12:50-04:00 Terminated Exit Code: 1
2019-06-06T17:12:45-04:00 Started Task started by client
2019-06-06T17:12:45-04:00 Task Setup Building Task Directory
2019-06-06T17:12:45-04:00 Received Task received by client
```
After:
```
ID = 5001ccd1
Eval ID = 53507a02
Name = example-reschedule.nodes[0]
Node ID = a3b04364
Node Name = mars-2.local
Job ID = example-reschedule
Job Version = 0
Client Status = failed
Client Description = Failed tasks
Desired Status = stop
Desired Description = alloc was rescheduled because it failed
Created = 13s ago
Modified = 3s ago
Replacement Alloc ID = 7ba7ac20
Task "payload" is "dead"
Task Resources
CPU Memory Disk Addresses
21/100 MHz 24 MiB/300 MiB 300 MiB
Task Events:
Started At = 2019-06-06T21:22:50Z
Finished At = 2019-06-06T21:22:55Z
Total Restarts = 0
Last Restart = N/A
Recent Events:
Time Type Description
2019-06-06T17:22:55-04:00 Not Restarting Policy allows no restarts
2019-06-06T17:22:55-04:00 Terminated Exit Code: 1
2019-06-06T17:22:50-04:00 Started Task started by client
2019-06-06T17:22:50-04:00 Task Setup Building Task Directory
2019-06-06T17:22:50-04:00 Received Task received by client
```
This adds a `nomad alloc stop` command that can be used to stop and
force migrate an allocation to a different node.
This is built on top of the AllocUpdateDesiredTransitionRequest and
explicitly limits the scope of access to that transition to expose it
under the alloc-lifecycle ACL.
The API returns the follow up eval that can be used as part of
monitoring in the CLI or parsed and used in an external tool.
Currently when operators need to log onto a machine where an alloc
is running they will need to perform both an alloc/job status
call and then a call to discover the node name from the node list.
This updates both the job status and alloc status output to include
the node name within the information to make operator use easier.
Closes#2359
Cloess #1180
This commit implements an allocation selection algorithm for finding
allocations to preempt. It currently special cases network resource asks
from others (cpu/memory/disk/iops).
This PR fixes:
* An issue in which a node-drain that contains a complete batch alloc
would cause a replacement
* An issue in which allocations with the same name during a scale
down/stop event wouldn't be properly stopped.
* An issue in which batch allocations from previous job versions may not
have been stopped properly.
Fixes https://github.com/hashicorp/nomad/issues/3210