If a job update includes a task group that has no changes, those allocations
have their version bumped in-place. The ends up triggering an eval from
`deploymentwatcher` when it verifies their health. Although this eval is a
no-op, we were only treating pending deployments the same as paused when
the deployment was a new MRD. This means that any eval after the initial one
will kick off the deployment, and that caused pending deployments to "jump
the queue" and run ahead of schedule, breaking MRD invariants and resulting in
a state with all regions blocked.
This behavior can be replicated even in the case of job updates with no
in-place updates by patching `deploymentwatcher` to inject a spurious no-op
eval. This changeset fixes the behavior by treating pending deployments the
same as paused in all cases in the reconciler.
Go 1.14.4 contains two CVEs which are fixed in 1.14.5:
- [CVE-2020-15586](https://golang.org/issue/34902)
- [CVE-2020-14039](https://golang.org/issue/39360)
Upon consideration with HashiCorp security these CVEs are considered low
severity for Nomad and no new security fix binary will be released.
This fixes a bug where jobs may get "stuck" unprocessed that
dispropotionately affect periodic jobs around leadership transitions.
When registering a job, the job registration and the eval to process it
get applied to raft as two separate transactions; if the job
registration succeeds but eval application fails, the job may remain
unprocessed. Operators may detect such failure, when submitting a job
update and get a 500 error code, and they could retry; periodic jobs
failures are more likely to go unnoticed, and no further periodic
invocations will be processed until an operator force evaluation.
This fixes the issue by ensuring that the job registration and eval
application get persisted and processed atomically in the same raft log
entry.
Also, applies the same change to ensure atomicity in job deregistration.
Backward Compatibility
We must maintain compatibility in two scenarios: mixed clusters where a
leader can handle atomic updates but followers cannot, and a recent
cluster processes old log entries from legacy or mixed cluster mode.
To handle this constraints: ensure that the leader continue to emit the
Evaluation log entry until all servers have upgraded; also, when
processing raft logs, the servers honor evaluations found in both spots,
the Eval in job (de-)registration and the eval update entries.
When an updated server sees mix-mode behavior where an eval is inserted
into the raft log twice, it ignores the second instance.
I made one compromise in consistency in the mixed-mode scenario: servers
may disagree on the eval.CreateIndex value: the leader and updated
servers will report the job registration index while old servers will
report the index of the eval update log entry. This discripency doesn't
seem to be material - it's the eval.JobModifyIndex that matters.
Deployments should wait until kicked off by `Job.Register` so that we can
assert that all regions have a scheduled deployment before starting any
region. This changeset includes the OSS fixes to support the ENT work.
`IsMultiregionStarter` has no more callers in OSS, so remove it here.
It's supposed to be possible for a region not to have `datacenters` set so
that it can use the job's `datacenters` field. This requires that operators
use the same DC name across multiple regions, but that's the default client
configuration.
This updates the Ember edition setting to Octane, which I removed from #8319
because it required the template-only Glimmer components setting to be turned
on, which this does. These changes to templates accommodate that setting.
Before docker, the only default was `SIGINT` for `kill_signal`. The
docker driver however defaults to `SIGTERM`, and we should document
as such.
Fixes#7140
This changes fixes a syntax error in the autoscaling apm plugin
docs as well as updates the scaling stanza doc. The stazna wording
implied its use was only for external autoscalers, whereas it also
is used by the UI.
This includes fixes for newer template lint rules that came along with
updating that dependency, which was necessary to be able to use
the no-curly-component-invocation rule. It also updates some curly
invocations that I missed in #8075.