Some operators use very long group/task `shutdown_delay` settings to
safely drain network connections to their workloads after service
deregistration. But during incident response, they may want to cause
that drain to be skipped so they can quickly shed load.
Provide a `-no-shutdown-delay` flag on the `nomad alloc stop` and
`nomad job stop` commands that bypasses the delay. This sets a new
desired transition state on the affected allocations that the
allocation/task runner will identify during pre-kill on the client.
Note (as documented here) that using this flag will almost always
result in failed inbound network connections for workloads as the
tasks will exit before clients receive updated service discovery
information and won't be gracefully drained.
state store: call-out to generic update of job recommendations from job update method
recommendations API work, and http endpoint errors for OSS
support for scaling polices in task block of job spec
add query filters for ScalingPolicy list endpoint
command: nomad scaling policy list: added -job and -type
Fixes#9017
The ?resources=true query parameter includes resources in the object
stub listings. Specifically:
- For `/v1/nodes?resources=true` both the `NodeResources` and
`ReservedResources` field are included.
- For `/v1/allocations?resources=true` the `AllocatedResources` field is
included.
The ?task_states=false query parameter removes TaskStates from
/v1/allocations responses. (By default TaskStates are included.)
Fix a bug where a millicious user can access or manipulate an alloc in a
namespace they don't have access to. The allocation endpoints perform
ACL checks against the request namespace, not the allocation namespace,
and performs the allocation lookup independently from namespaces.
Here, we check that the requested can access the alloc namespace
regardless of the declared request namespace.
Ideally, we'd enforce that the declared request namespace matches
the actual allocation namespace. Unfortunately, we haven't documented
alloc endpoints as namespaced functions; we suspect starting to enforce
this will be very disruptive and inappropriate for a nomad point
release. As such, we maintain current behavior that doesn't require
passing the proper namespace in request. A future major release may
start enforcing checking declared namespace.
This adds a `nomad alloc stop` command that can be used to stop and
force migrate an allocation to a different node.
This is built on top of the AllocUpdateDesiredTransitionRequest and
explicitly limits the scope of access to that transition to expose it
under the alloc-lifecycle ACL.
The API returns the follow up eval that can be used as part of
monitoring in the CLI or parsed and used in an external tool.
The ForceLeaveRequest endpoint may only be called on servers, but the
code was using a Client to resolve tokens. This would cause a panic when
an agent wasn't both a Server and a Client.
This PR makes GetAllocs use a blocking query as well as adding a sanity
check to the clients watchAllocation code to ensure it gets the correct
allocations.
This PR fixes https://github.com/hashicorp/nomad/issues/2119 and
https://github.com/hashicorp/nomad/issues/2153.
The issue was that the client was talking to two different servers, one
to check which allocations to pull and the other to pull those
allocations. However the latter call was not with a blocking query and
thus the client would not retreive the allocations it requested.
The logging has been improved to make the problem more clear as well.
* Reverted changes to get methods
* Added prefix query parameter
* Updated node status to use prefix based searching
* Fixed tests
* Removed truncation logic
* Use go-memdb prefix indexer for lookups
* Add Job lookups
* Update state store with new ByIDPrefix get methods
* Call new methods when exact lookup fails or is not applicable