This PR fixes a long-standing bug where submitting jobs with changes
to Connect services would not trigger updates as expected. Previously,
service blocks were not considered sources of destructive updates
since they could be synced with Consul non-destructively. With Connect,
task group services that have changes to their connect block or to
the service port should be destructive, since the network plumbing of
the alloc is going to need updating.
Fixes #8596 #7991
Non-destructive half in #7192
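A minimal sketch of the destructiveness rule described above, assuming heavily simplified, hypothetical `GroupService` and `ConnectConfig` types (the real Nomad structs differ). Same-named services that use Connect are compared on their connect block and port label. Tag-only changes remain non-destructive. As an assumption, plain non-Connect services are left to the existing in-place Consul sync.

```go
package main

import "reflect"

// ConnectConfig is a hypothetical, simplified stand-in for a service's
// connect block.
type ConnectConfig struct {
	Native         bool
	SidecarService string // hypothetical sidecar identifier
}

// GroupService is a hypothetical, simplified task group service.
type GroupService struct {
	Name      string
	PortLabel string
	Tags      []string
	Connect   *ConnectConfig
}

// connectServiceUpdateDestructive reports whether moving from oldSvcs to
// newSvcs requires replacing the allocation. Only services that use
// Connect on either side are considered; for those, a changed connect
// block or port label means the alloc's network plumbing must change.
func connectServiceUpdateDestructive(oldSvcs, newSvcs []GroupService) bool {
	byName := make(map[string]GroupService, len(oldSvcs))
	for _, s := range oldSvcs {
		byName[s.Name] = s
	}
	for _, n := range newSvcs {
		o, ok := byName[n.Name]
		if !ok {
			continue // assumption: added services are handled elsewhere
		}
		if o.Connect == nil && n.Connect == nil {
			continue // plain services can still be synced with Consul in place
		}
		// reflect.DeepEqual follows pointers, so equal connect blocks
		// stored behind different pointers compare as equal.
		if o.PortLabel != n.PortLabel || !reflect.DeepEqual(o.Connect, n.Connect) {
			return true
		}
	}
	return false
}
```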
This change fixes a bug where lost/failed allocations are replaced by
allocations with the latest versions, even if the version hasn't been
promoted yet.
Now, when generating a plan for lost/failed allocations, the scheduler
first checks whether the current deployment is in the canary stage, and
if so, it ensures that any lost/failed allocation is replaced with one
running the latest promoted version instead.
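The version choice above can be sketched as follows. The `Job` and `Deployment` types and the `replacementJob` helper are hypothetical simplifications, and "canary stage" is modelled, as an assumption, as a deployment for the latest version that has canaries and is not yet promoted.

```go
package main

// Job is a hypothetical, minimal job version record.
type Job struct {
	Version uint64
}

// Deployment is a hypothetical, minimal view of the current deployment.
type Deployment struct {
	JobVersion  uint64
	HasCanaries bool // assumption: the deployment places canaries
	Promoted    bool
}

// replacementJob picks the job version used to replace a lost or failed
// allocation. While the deployment for the latest version is still in its
// canary stage, the last promoted version is used instead of the newest.
func replacementJob(latest, lastPromoted *Job, d *Deployment) *Job {
	inCanaryStage := d != nil && d.HasCanaries && !d.Promoted &&
		d.JobVersion == latest.Version
	if inCanaryStage && lastPromoted != nil {
		return lastPromoted
	}
	return latest
}
```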
* scheduler/reconcile: set FollowupEvalID on lost stop_after_client_disconnect
* scheduler/reconcile: thread followupEvalIDs through to results.stop
* scheduler/reconcile: comment typo
* nomad/_test: correct arguments for plan.AppendStoppedAlloc
* scheduler/reconcile: avoid nil, cleanup handleDelayed(Lost|Reschedules)
* jobspec, api: add stop_after_client_disconnect
* nomad/state/state_store: error message typo
* structs: alloc methods to support stop_after_client_disconnect
  1. A global AllocStates to track status changes with timestamps. We
     need this to track the time at which the alloc originally became
     lost.
  2. ShouldClientStop() and WaitClientStop() to do the actual math.
* scheduler/reconcile_util: delayByStopAfterClientDisconnect
* scheduler/reconcile: use delayByStopAfterClientDisconnect
* scheduler/util: updateNonTerminalAllocsToLost comments
This was set up to only update allocs to lost if the DesiredStatus had
already been set by the scheduler. It seems like the intention was to
update the status from any non-terminal state, and not all lost allocs
have been marked stop or evict by this point.
* scheduler/testing: AssertEvalStatus just use require
* scheduler/generic_sched: don't create a blocked eval if delayed
* scheduler/generic_sched_test: several scheduling cases
diffSystemAllocs -> diffSystemAllocsForNode. This function is only used
for diffing system allocations, but it lacked awareness of eligible
nodes and of the node ID on which the allocation was going to be placed.
This change now ignores a change if its existing allocation is on an
ineligible node. For a new allocation, it also checks tainted and
ineligible nodes in the same function instead of nil-ing out the diff
after computation in diffSystemAllocs.
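The per-node diff described above can be sketched like this. The `diffResult` and `existingAlloc` types are hypothetical and far simpler than Nomad's; the sketch only shows the two rules from the text: updates to existing allocs on ineligible nodes are ignored, and new placements are skipped on tainted or ineligible nodes.

```go
package main

// diffResult is a hypothetical, pared-down diff for a single node.
type diffResult struct {
	place  []string // task groups to place on this node
	update []string // existing allocs that need an update
	ignore []string // existing allocs left untouched
}

// existingAlloc is hypothetical: a task group name plus whether the
// alloc runs an outdated job version.
type existingAlloc struct {
	TaskGroup string
	Outdated  bool
}

// diffSystemAllocsForNode diffs required task groups against the allocs
// already on one node. Knowing the node ID lets it consult eligibility
// and taint directly rather than filtering the diff afterwards.
func diffSystemAllocsForNode(nodeID string, required []string,
	existing []existingAlloc, eligible, tainted map[string]bool) diffResult {
	var d diffResult
	have := make(map[string]bool, len(existing))
	for _, a := range existing {
		have[a.TaskGroup] = true
		switch {
		case !eligible[nodeID]:
			// Ignore changes to allocs on ineligible nodes.
			d.ignore = append(d.ignore, a.TaskGroup)
		case a.Outdated:
			d.update = append(d.update, a.TaskGroup)
		default:
			d.ignore = append(d.ignore, a.TaskGroup)
		}
	}
	for _, tg := range required {
		if have[tg] {
			continue
		}
		// New allocs only go to eligible, untainted nodes.
		if eligible[nodeID] && !tainted[nodeID] {
			d.place = append(d.place, tg)
		}
	}
	return d
}
```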
If an alloc is being preempted and marked as evict, but the underlying
node is lost before the migration takes place, the allocation currently
stays at desired status evict and client status running indefinitely, or
until the node comes back online.
This commit updates updateNonTerminalAllocsToLost to check for a
desired status of Evict as well as Stop when updating allocations on
tainted nodes.
switch to table test for lost node cases
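A minimal sketch of the check described in this commit, using a hypothetical `markLostAllocs` helper loosely modelled on updateNonTerminalAllocsToLost and hypothetical status constants. Allocations on lost nodes whose desired status is stop or evict, and whose client status is not yet terminal, are marked lost.

```go
package main

// Hypothetical status constants; Nomad defines its own equivalents.
const (
	DesiredRun     = "run"
	DesiredStop    = "stop"
	DesiredEvict   = "evict"
	ClientPending  = "pending"
	ClientRunning  = "running"
	ClientComplete = "complete"
	ClientFailed   = "failed"
	ClientLost     = "lost"
)

// Alloc is a hypothetical, minimal allocation.
type Alloc struct {
	NodeID        string
	DesiredStatus string
	ClientStatus  string
}

func clientTerminal(s string) bool {
	return s == ClientComplete || s == ClientFailed || s == ClientLost
}

// markLostAllocs marks non-terminal allocs on lost nodes as lost when the
// scheduler has asked them to stop or (new in this change) to evict.
func markLostAllocs(allocs []*Alloc, lostNodes map[string]bool) {
	for _, a := range allocs {
		if !lostNodes[a.NodeID] {
			continue
		}
		stopping := a.DesiredStatus == DesiredStop || a.DesiredStatus == DesiredEvict
		if stopping && !clientTerminal(a.ClientStatus) {
			a.ClientStatus = ClientLost
		}
	}
}
```

In the spirit of the table test mentioned above, each case pairs a desired and client status with the expected result.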
Fixes documentation inaccuracy for spread stanza placement. Spreads can
only exist on the top level job struct or within a group.
comment about nil assumption
Adds checks for affinity and constraint changes when determining if we
should update in place.
refactor to check all levels at once
check for spread changes when checking for an in-place update
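The checks above can be sketched as one function covering every level. All types here are hypothetical simplifications, and the `same` helper is an illustrative choice so that a nil and an empty slice do not register as a change. Consistent with the spread placement note above, spreads are compared only at the job and group levels.

```go
package main

import "reflect"

type Constraint struct{ LTarget, RTarget, Operand string }

type Affinity struct {
	LTarget, RTarget, Operand string
	Weight                    int8
}

type Spread struct {
	Attribute string
	Weight    int8
}

type Task struct {
	Name        string
	Constraints []*Constraint
	Affinities  []*Affinity
}

type TaskGroup struct {
	Constraints []*Constraint
	Affinities  []*Affinity
	Spreads     []*Spread
	Tasks       []*Task
}

type Job struct {
	Constraints []*Constraint
	Affinities  []*Affinity
	Spreads     []*Spread
}

// same treats nil and empty slices as equal; reflect.DeepEqual does not.
func same[T any](a, b []T) bool {
	if len(a) == 0 && len(b) == 0 {
		return true
	}
	return reflect.DeepEqual(a, b)
}

// placementChanged reports whether any constraint, affinity, or spread
// affecting the group changed at the job, group, or task level. Any such
// change rules out an in-place update.
func placementChanged(oldJob, newJob *Job, oldTG, newTG *TaskGroup) bool {
	if !same(oldJob.Constraints, newJob.Constraints) ||
		!same(oldJob.Affinities, newJob.Affinities) ||
		!same(oldJob.Spreads, newJob.Spreads) {
		return true
	}
	if !same(oldTG.Constraints, newTG.Constraints) ||
		!same(oldTG.Affinities, newTG.Affinities) ||
		!same(oldTG.Spreads, newTG.Spreads) {
		return true
	}
	oldTasks := make(map[string]*Task, len(oldTG.Tasks))
	for _, t := range oldTG.Tasks {
		oldTasks[t.Name] = t
	}
	for _, nt := range newTG.Tasks {
		ot, ok := oldTasks[nt.Name]
		if !ok {
			continue // assumption: task set changes are handled elsewhere
		}
		if !same(ot.Constraints, nt.Constraints) || !same(ot.Affinities, nt.Affinities) {
			return true
		}
	}
	return false
}
```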
Adds new Prerun and Postrun hooks to manage the setup of network namespaces
on Linux. Work still needs to be done to make the code platform-agnostic and
support Docker-style network initialization.
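A minimal sketch of the hook lifecycle, assuming a hypothetical `NetworkManager` interface that hides the platform-specific namespace work (which is also one way to approach the platform-agnostic goal mentioned above). The namespace is created before any task starts and torn down after all tasks exit. The in-memory `fakeManager` and its `/var/run/netns/` paths (the iproute2 naming convention) are illustrative only and do not touch the kernel.

```go
package main

import "fmt"

// NetworkManager is a hypothetical abstraction over platform-specific
// network namespace setup.
type NetworkManager interface {
	CreateNamespace(allocID string) (path string, err error)
	DestroyNamespace(path string) error
}

// networkHook is a hypothetical alloc-level hook owning the namespace.
type networkHook struct {
	allocID string
	manager NetworkManager
	nsPath  string
}

// Prerun creates the namespace before any task starts so every task in
// the group can join it.
func (h *networkHook) Prerun() error {
	path, err := h.manager.CreateNamespace(h.allocID)
	if err != nil {
		return fmt.Errorf("failed to create network namespace: %w", err)
	}
	h.nsPath = path
	return nil
}

// Postrun tears the namespace down after all tasks exit. It is safe to
// call more than once.
func (h *networkHook) Postrun() error {
	if h.nsPath == "" {
		return nil
	}
	if err := h.manager.DestroyNamespace(h.nsPath); err != nil {
		return fmt.Errorf("failed to destroy network namespace: %w", err)
	}
	h.nsPath = ""
	return nil
}

// fakeManager records calls instead of creating real namespaces.
type fakeManager struct {
	created, destroyed []string
}

func (f *fakeManager) CreateNamespace(allocID string) (string, error) {
	p := "/var/run/netns/" + allocID
	f.created = append(f.created, p)
	return p, nil
}

func (f *fakeManager) DestroyNamespace(path string) error {
	f.destroyed = append(f.destroyed, path)
	return nil
}
```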
IOPS has been modelled as a resource since Nomad 0.1 but has never
actually been detected, and there is no short-term plan to add
detection. This is because IOPS is too simplistic a unit to describe
the performance requirements of the underlying storage system. In its
current state it adds unnecessary confusion and can be removed without
impacting any users. This PR leaves IOPS defined at the jobspec parsing
level and in the api/ resources since these are the two public uses of
the field. They should be considered deprecated and exist only to give
users time to stop using them during the Nomad 0.9.x release. In the
future, there should be no expectation that the field will exist.
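A minimal sketch of the deprecation approach described above, assuming hypothetical `APIResources` (public) and `Resources` (internal) types and a hypothetical `toInternal` conversion. IOPS is still accepted on the public side so existing jobs keep parsing, but it is dropped during conversion and never reaches the scheduler.

```go
package main

// APIResources is a hypothetical, trimmed public resources struct.
type APIResources struct {
	CPU      *int
	MemoryMB *int
	// Deprecated: IOPS is accepted so existing jobspecs still parse,
	// but it is ignored and may be removed in a future release.
	IOPS *int
}

// Resources is a hypothetical internal struct with no IOPS field.
type Resources struct {
	CPU      int
	MemoryMB int
}

// toInternal converts public resources to the internal form, silently
// dropping the deprecated IOPS value.
func toInternal(r *APIResources) Resources {
	var out Resources
	if r.CPU != nil {
		out.CPU = *r.CPU
	}
	if r.MemoryMB != nil {
		out.MemoryMB = *r.MemoryMB
	}
	// r.IOPS is intentionally not copied.
	return out
}
```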