Update github.com/aws/aws-sdk-go and github.com/hashicorp/go-discover to
pick up support for EC2 Instance Metadata Service v2 (IMDSv2) changes.
Follow up to https://github.com/hashicorp/go-discover/pull/128.
Fixes #5856
When the scheduler looks for a placement for an allocation that's
replacing another allocation, it's supposed to penalize the previous
node if the allocation had been rescheduled or failed. But we're
currently always penalizing the node, which leads to unnecessary
migrations on job update.
This commit leaves in place the existing behavior where if the
previous alloc was itself rescheduled, its previous nodes are also
penalized. This is conservative, but it is the right behavior, especially on
larger clusters where a group of hosts might be having correlated
trouble (like an AZ failure).
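
As a rough illustration of the rule described above, here is a minimal
sketch. The `Alloc` type, its fields, and `penaltyNodes` are hypothetical
names made up for this example; they are not the scheduler's real types.

```go
package main

import "fmt"

// Alloc is a hypothetical, pared-down stand-in for an allocation record.
type Alloc struct {
	NodeID      string
	Failed      bool     // the allocation failed
	Rescheduled bool     // the allocation was itself a reschedule
	PrevNodes   []string // nodes used by earlier reschedule attempts
}

// penaltyNodes returns the node IDs to penalize when placing a
// replacement for prev. A healthy, never-rescheduled allocation yields an
// empty set, so a plain job update does not push it off its node.
func penaltyNodes(prev *Alloc) map[string]struct{} {
	penalty := make(map[string]struct{})
	if prev == nil {
		return penalty
	}
	if prev.Failed || prev.Rescheduled {
		penalty[prev.NodeID] = struct{}{}
	}
	if prev.Rescheduled {
		// Conservative part: also penalize the nodes of earlier attempts.
		for _, id := range prev.PrevNodes {
			penalty[id] = struct{}{}
		}
	}
	return penalty
}

func main() {
	healthy := &Alloc{NodeID: "n1"}
	retried := &Alloc{NodeID: "n3", Rescheduled: true, PrevNodes: []string{"n1", "n2"}}
	fmt.Println("healthy update penalizes", len(penaltyNodes(healthy)), "nodes")
	fmt.Println("rescheduled alloc penalizes", len(penaltyNodes(retried)), "nodes")
}
```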
Co-Authored-By: Michael Schurter <mschurter@hashicorp.com>
Fixes #6787
In ProposedAllocs the proposed alloc slice was being copied while its
contents were not. Since RemoveAllocs nils elements of the proposed
alloc slice and is called twice, it could panic on the second call when
erroneously accessing a nil'd alloc.
The fix is to not copy the proposed alloc slice and pass the slice
returned by the 1st RemoveAllocs call to the 2nd call, thus maintaining
the trimmed length.
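
The failure mode above can be reproduced in isolation. This is a sketch
only: `removeAllocs` is a simplified, hypothetical stand-in written for this
example and is not the real `RemoveAllocs`. It shows why a second call must
receive the trimmed slice returned by the first.

```go
package main

import "fmt"

type Alloc struct{ ID string }

// removeAllocs drops the entries whose IDs are in remove by swapping them
// to the tail, nils the vacated slots, and returns the trimmed slice.
func removeAllocs(allocs []*Alloc, remove map[string]bool) []*Alloc {
	n := len(allocs)
	for i := 0; i < n; i++ {
		if remove[allocs[i].ID] {
			allocs[i], allocs[n-1] = allocs[n-1], nil
			i--
			n--
		}
	}
	return allocs[:n]
}

// staleLengthPanics calls removeAllocs twice on the same full-length
// slice, ignoring the trimmed result, and reports whether that panicked.
func staleLengthPanics() (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	proposed := []*Alloc{{ID: "a"}, {ID: "b"}, {ID: "c"}}
	removeAllocs(proposed, map[string]bool{"a": true})
	removeAllocs(proposed, map[string]bool{"b": true}) // walks into a nil'd slot
	return false
}

// chained passes the result of the first call to the second, so the
// second call never sees the nil'd tail.
func chained() []*Alloc {
	proposed := []*Alloc{{ID: "a"}, {ID: "b"}, {ID: "c"}}
	proposed = removeAllocs(proposed, map[string]bool{"a": true})
	proposed = removeAllocs(proposed, map[string]bool{"b": true})
	return proposed
}

func main() {
	fmt.Println("stale length panics:", staleLengthPanics())
	fmt.Println("chained result length:", len(chained()))
}
```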
Add an RPC timeout for logmon. In
https://github.com/hashicorp/nomad/issues/6461#issuecomment-559747758 ,
`logmonClient.Stop` locked up and indefinitely blocked the task runner
destroy operation.
This is an incremental improvement. We still need to follow up to
understand how we got to that state, and the full impact of locked-up
Stop and its link to pending allocations on restart.
Some code cleanup:
* Use a field for setting EC2 metadata instead of env-vars in testing,
but keep the environment variables for backward compatibility
* Update tests to use testify
* fix plugin launcher SetConfig msgpack params
The plugin launcher tool was passing the wrong byte array into
`SetConfig`, resulting in msgpack decoding errors. This was fixed in
a949050 (#6725) but accidentally reverted in 6aff18d (#6590).
Co-Authored-By: Chris Baker <1675087+cgbaker@users.noreply.github.com>
Extends the BasicAllocStats test to include a test for Windows
clients, exercising stats via a powershell `raw_exec` job.
Adds `ListLinuxClientNodes` and `ListWindowsClientNodes` utils so that
we can scope tests to run only when Linux or Windows clients are
available. This prevents waiting on timeouts when running a subset of
the tests against a development cluster (vs our nightly test
cluster).
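
The scoping idea can be sketched as follows. The `Node` type and
`nodesWithOS` are hypothetical and exist only for this example; the real
`ListLinuxClientNodes` and `ListWindowsClientNodes` helpers query the
cluster and may look quite different.

```go
package main

import "fmt"

// Node is a hypothetical, minimal view of a client node.
type Node struct {
	ID string
	OS string // e.g. "linux" or "windows"
}

// nodesWithOS returns the IDs of the nodes whose OS matches want.
func nodesWithOS(nodes []Node, want string) []string {
	var ids []string
	for _, n := range nodes {
		if n.OS == want {
			ids = append(ids, n.ID)
		}
	}
	return ids
}

func main() {
	// A development cluster with Linux clients only.
	cluster := []Node{{ID: "n1", OS: "linux"}, {ID: "n2", OS: "linux"}}

	// Skip up front rather than submitting a job that can never be
	// placed and then waiting for the test to time out.
	if len(nodesWithOS(cluster, "windows")) == 0 {
		fmt.Println("no Windows clients available, skipping Windows stats test")
	}
	fmt.Println("linux nodes:", nodesWithOS(cluster, "linux"))
}
```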