202 lines
6.2 KiB
Markdown
202 lines
6.2 KiB
Markdown
|
# Architecture: Evaluation Status
|
||
|
|
||
|
The [Scheduling in Nomad][] internals documentation covers the path that an
|
||
|
evaluation takes through the leader, worker, and plan applier. But it doesn't
|
||
|
cover in any detail the various `Evaluation.Status` values, or where the
|
||
|
`PreviousEval`, `NextEval`, or `BlockedEval` ID pointers are set.
|
||
|
|
||
|
The state diagram below describes the transitions between `Status` values as
|
||
|
solid arrows. The dashed arrows represent when a new evaluation is created. The
|
||
|
parenthetical labels on those arrows are the `TriggeredBy` field for the new
|
||
|
evaluation.
|
||
|
|
||
|
The status values are:
|
||
|
|
||
|
* `pending` evaluations either are queued to be scheduled, are still being
|
||
|
processed in the scheduler, or are being applied by the plan applier and not
|
||
|
yet acknowledged.
|
||
|
* `failed` evaluations have failed to be applied by the plan applier (or are
|
||
|
somehow invalid in the scheduler; this is always a bug)
|
||
|
* `blocked` evaluations are created when an eval has failed too many attempts to
|
||
|
have its plan applied by the leader, or when a plan can only be partially
|
||
|
applied and there are still more allocations to create.
|
||
|
* `complete` means the plan was applied successfully (at least partially).
|
||
|
* `canceled` means the evaluation was superseded by state changes like a new
|
||
|
version of the job.
|
||
|
|
||
|
|
||
|
```mermaid
|
||
|
flowchart LR
|
||
|
|
||
|
event((Cluster\nEvent))
|
||
|
|
||
|
pending([pending])
|
||
|
blocked([blocked])
|
||
|
complete([complete])
|
||
|
failed([failed])
|
||
|
canceled([canceled])
|
||
|
|
||
|
%% style classes
|
||
|
classDef status fill:#d5f6ea,stroke-width:4px,stroke:#1d9467
|
||
|
classDef other fill:#d5f6ea,stroke:#1d9467
|
||
|
class event other;
|
||
|
class pending,blocked,complete,failed,canceled status;
|
||
|
|
||
|
event -. "job-register
|
||
|
job-deregister
|
||
|
periodic-job
|
||
|
node-update
|
||
|
node-drain
|
||
|
alloc-stop
|
||
|
scheduled
|
||
|
alloc-failure
|
||
|
job-scaling" .-> pending
|
||
|
|
||
|
pending -. "new eval\n(rolling-update)" .-> pending
|
||
|
pending -. "new eval\n(preemption)" .-> pending
|
||
|
|
||
|
pending -. "new eval\n(max-plan-attempts)" .-> blocked
|
||
|
pending -- if plan submitted --> complete
|
||
|
pending -- if invalid --> failed
|
||
|
pending -- if no-op --> canceled
|
||
|
|
||
|
failed -- if retried --> blocked
|
||
|
failed -- if retried --> complete
|
||
|
|
||
|
blocked -- if no-op --> canceled
|
||
|
blocked -- if plan submitted --> complete
|
||
|
|
||
|
complete -. "new eval\n(deployment-watcher)" .-> pending
|
||
|
complete -. "new eval\n(queued-allocs)" .-> blocked
|
||
|
|
||
|
failed -. "new eval\n(failed-follow-up)" .-> pending
|
||
|
```
|
||
|
|
||
|
But it's hard to get a full picture of the evaluation lifecycle purely from the
|
||
|
`Status` fields, because evaluations have several "quasi-statuses" which aren't
|
||
|
represented as explicit statuses in the state store:
|
||
|
|
||
|
* `scheduling` is the status where an eval is being processed by the scheduler
|
||
|
worker.
|
||
|
* `applying` is the status where the resulting plan for the eval is being
|
||
|
applied in the plan applier on the leader.
|
||
|
* `delayed` is an enqueued eval that will be dequeued some time in the future.
|
||
|
* `deleted` is when an eval is removed from the state store entirely.
|
||
|
|
||
|
By adding these statuses to the diagram (the dashed nodes), you can see where
|
||
|
the same `Status` transition might result in different `PreviousEval`,
|
||
|
`NextEval`, or `BlockedEval` set. You can also see where the "chain" of
|
||
|
evaluations is broken when new evals are created for preemptions or by the
|
||
|
deployment watcher.
|
||
|
|
||
|
|
||
|
```mermaid
|
||
|
flowchart LR
|
||
|
|
||
|
event((Cluster\nEvent))
|
||
|
|
||
|
%% statuss
|
||
|
pending([pending])
|
||
|
blocked([blocked])
|
||
|
complete([complete])
|
||
|
failed([failed])
|
||
|
canceled([canceled])
|
||
|
|
||
|
%% quasi-statuss
|
||
|
deleted([deleted])
|
||
|
delayed([delayed])
|
||
|
scheduling([scheduling])
|
||
|
applying([applying])
|
||
|
|
||
|
%% style classes
|
||
|
classDef status fill:#d5f6ea,stroke-width:4px,stroke:#1d9467
|
||
|
classDef quasistatus fill:#d5f6ea,stroke-dasharray: 5 5,stroke:#1d9467
|
||
|
classDef other fill:#d5f6ea,stroke:#1d9467
|
||
|
|
||
|
class event other;
|
||
|
class pending,blocked,complete,failed,canceled status;
|
||
|
class deleted,delayed,scheduling,applying quasistatus;
|
||
|
|
||
|
event -- "job-register
|
||
|
job-deregister
|
||
|
periodic-job
|
||
|
node-update
|
||
|
node-drain
|
||
|
alloc-stop
|
||
|
scheduled
|
||
|
alloc-failure
|
||
|
job-scaling" --> pending
|
||
|
|
||
|
pending -- dequeued --> scheduling
|
||
|
pending -- if delayed --> delayed
|
||
|
delayed -- dequeued --> scheduling
|
||
|
|
||
|
scheduling -. "not all allocs placed
|
||
|
new eval created by scheduler
|
||
|
trigger queued-allocs
|
||
|
new has .PreviousEval = old.ID
|
||
|
old has .BlockedEval = new.ID" .-> blocked
|
||
|
|
||
|
scheduling -. "failed to plan
|
||
|
new eval created by scheduler
|
||
|
trigger: max-plan-attempts
|
||
|
new has .PreviousEval = old.ID
|
||
|
old has .BlockedEval = new.ID" .-> blocked
|
||
|
|
||
|
scheduling -- "not all allocs placed
|
||
|
reuse already-blocked eval" --> blocked
|
||
|
|
||
|
blocked -- "unblocked by
|
||
|
external state changes" --> scheduling
|
||
|
|
||
|
scheduling -- allocs placed --> complete
|
||
|
|
||
|
scheduling -- "wrong eval type or
|
||
|
max retries exceeded
|
||
|
on plan submit" --> failed
|
||
|
|
||
|
scheduling -- "canceled by
|
||
|
job update/stop" --> canceled
|
||
|
|
||
|
failed -- retry --> scheduling
|
||
|
|
||
|
scheduling -. "new eval from rolling update (system jobs)
|
||
|
created by scheduler
|
||
|
trigger: rolling-update
|
||
|
new has .PreviousEval = old.ID
|
||
|
old has .NextEval = new.ID" .-> pending
|
||
|
|
||
|
scheduling -- submit --> applying
|
||
|
applying -- failed --> scheduling
|
||
|
|
||
|
applying -. "new eval for preempted allocs
|
||
|
created by plan applier
|
||
|
trigger: preemption
|
||
|
new has .PreviousEval = unset!
|
||
|
old has .BlockedEval = unset!" .-> pending
|
||
|
|
||
|
complete -. "new eval from deployments (service jobs)
|
||
|
created by deploymentwatcher
|
||
|
trigger: deployment-watcher
|
||
|
new has .PreviousEval = unset!
|
||
|
old has .NextEval = unset!" .-> pending
|
||
|
|
||
|
failed -- "new eval
|
||
|
trigger: failed-follow-up
|
||
|
new has .PreviousEval = old.ID
|
||
|
old has .NextEval = new.ID" --> pending
|
||
|
|
||
|
pending -- "undeliverable evals
|
||
|
reaped by leader" --> failed
|
||
|
|
||
|
blocked -- "duplicate blocked evals
|
||
|
reaped by leader" --> canceled
|
||
|
|
||
|
canceled -- garbage\ncollection --> deleted
|
||
|
failed -- garbage\ncollection --> deleted
|
||
|
complete -- garbage\ncollection --> deleted
|
||
|
```
|
||
|
|
||
|
|
||
|
[Scheduling in Nomad]: https://www.nomadproject.io/docs/internals/scheduling/scheduling
|