* core: allow pause/un-pause of eval broker on region leader.
* agent: add ability to pause eval broker via scheduler config.
* cli: add operator scheduler commands to interact with config.
* api: add ability to pause eval broker via scheduler config
* e2e: add operator scheduler test for eval broker pause.
* docs: include new opertor scheduler CLI and pause eval API info.
When calculating a services namespace for registration, the code
assumed the first task within the task array would include a
service block. This is incorrect as it is possible only a latter
task within the array contains a service definition.
This change fixes the logic, so we correctly search for a service
definition before identifying the namespace.
This PR adds the 'choose' query parameter to the '/v1/service/<service>' endpoint.
The value of 'choose' is in the form '<number>|<key>', number is the number
of desired services and key is a value unique but consistent to the requester
(e.g. allocID).
Folks aren't really expected to use this API directly, but rather through consul-template
which will soon be getting a new helper function making use of this query parameter.
Example,
curl 'localhost:4646/v1/service/redis?choose=2|abc123'
Note: consul-templte v0.29.1 includes the necessary nomadServices functionality.
The plan applier has to get a snapshot with a minimum index for the
plan it's working on in order to ensure consistency. Under heavy raft
loads, we can exceed the timeout. When this happens, we hit a bug
where the plan applier blocks waiting on the `indexCh` forever, and
all schedulers will block in `Plan.Submit`.
Closing the `indexCh` when the `asyncPlanWait` is done with it will
prevent the deadlock without impacting correctness of the previous
snapshot index.
This changeset includes the a PoC failing test that works by injecting
a large timeout into the state store. We need to turn this into a test
we can run normally without breaking the state store before we can
merge this PR.
Increase `snapshotMinIndex` timeout to 10s.
This timeout creates backpressure where any concurrent `Plan.Submit`
RPCs will block waiting for results. This sheds load across all
servers and gives raft some CPU to catch up, because schedulers won't
dequeue more work while waiting. Increase it to 10s based on
observations of large production clusters.
The changelog entry for #13340 indicated it was an improvement. But on
discussion, it was determined that this was a workaround for a
regression. Update the changelog to make this clear.
If the node has been GC'd or is down, we can't send it a node
unpublish. The CSI spec requires that we don't send the controller
unpublish before the node unpublish, but in the case where a node is
gone we can't know the final fate of the node unpublish step.
The `csi_hook` on the client will unpublish if the allocation has
stopped and if the host is terminated there's no mount for the volume
anyways. So we'll now assume that the node has unpublished at its
end. If it hasn't, any controller unpublish will potentially hang or
error and need to be retried.
As a performance optimization in the scheduler, feasibility checks
that apply to an entire class are only checked once for all nodes of
that class. Other feasibility checks are "available" checks because
they rely on more ephemeral characteristics and don't contribute to
the hash for the node class. This currently includes only CSI.
We have a separate fast path for "available" checks when the node has
already been marked eligible on the basis of class. This fast path has
a bug where it returns early rather than continuing the loop. This
causes the entire task group to be rejected.
Fix the bug by not returning early in the fast path and instead jump
to the top of the loop like all the other code paths in this method.
Includes a new test exercising topology at whole-scheduler level and a
fix for an existing test that should've caught this previously.
This PR update to Go 1.18.2. Also update the versions of hclfmt
and go-hclogfmt which includes newer dependencies necessary for dealing
with go1.18.
The hcl v2 branch is now 'nomad-v2.9.1+tweaks2', to include a fix for
newer macOS versions: 8927e75e82
Fix numerous go-getter security issues:
- Add timeouts to http, git, and hg operations to prevent DoS
- Add size limit to http to prevent resource exhaustion
- Disable following symlinks in both artifacts and `job run`
- Stop performing initial HEAD request to avoid file corruption on
retries and DoS opportunities.
**Approach**
Since Nomad has no ability to differentiate a DoS-via-large-artifact vs
a legitimate workload, all of the new limits are configurable at the
client agent level.
The max size of HTTP downloads is also exposed as a node attribute so
that if some workloads have large artifacts they can specify a high
limit in their jobspecs.
In the future all of this plumbing could be extended to enable/disable
specific getters or artifact downloading entirely on a per-node basis.
* LastIndexOf and always append a namespace on job links
* Confirmed the volume equivalent and simplified idWIthNamespace logic
* Changelog added
* PR comments addressed
* Drop the redirect for the time being
* Tests updated to reflect namespace on links
* Task detail test default namespace link for test