When multiple Connect-enabled task groups start on the same client
node, a race condition in the CNI plugins for creating iptables chains
causes one of the tasks to fail. We upstreamed a patch to CNI plugins
to make iptables chain creation idempotent.
This changeset updates end-to-end testing, development tooling, and
documentation to use 0.8.4 which includes our patch.
Now that alloc.Canonicalize() is called in all alloc sources in the
client (i.e. on state restore and RPC fetching), we no longer need to
check alloc.TaskResources.
alloc.AllocatedResources is always non-nil through alloc runner.
Though, early on, we check for alloc validity, so NewTaskRunner and
TaskEnv must still check. `TestClient_AddAllocError` test validates
that behavior.
This commit ensures that Alloc.AllocatedResources is properly populated
when read from persistence stores (namely Raft and client state store).
The alloc struct may have been written previously by an arbitrary old
version that may only populate Alloc.TaskResources.
In 0.10.2 (specifically 387b016) we added interpolation to group
service blocks and centralized the logic for task environment
interpolation. This wasn't also added to script checks, which caused a
regression where the IDs for script checks for services w/
interpolated fields (ex. the service name) didn't match the service ID
that was registered with Consul.
This changeset calls the same taskenv interpolation logic during
`script_check` configuration, and adds tests to reduce the risk of
future regressions by comparing the IDs of service hook and the check hook.
When parsing a config file which had the consul.timeout param set,
Nomad was reporting an error causing startup to fail. This seems
to be caused by the HCL decoder interpreting the timeout type as
an int rather than a string. This is caused by the struct
TimeoutHCL param having a hcl key of timeout alongside a Timeout
struct param of type time.Duration (int). Ensuring the decoder
ignores the Timeout struct param ensure the decoder runs
correctly.
If an alloc is being preempted and marked as evict, but the underlying
node is lost before the migration takes place, the allocation currently
stays as desired evict, status running forever, or until the node comes
back online.
This commit updates updateNonTerminalAllocsToLost to check for a
destired status of Evict as well as Stop when updating allocations on
tainted nodes.
switch to table test for lost node cases
Increase the shortened timeout after the first loop so that metrics
that take longer to come in aren't failing the test unnecessarily.
Move the check for empty alloc metrics into the loop so that if the
first values we get are empty we don't fail the test too early.