open-nomad/contributing/architecture-eval-states.md
Tim Gross 2a6e8be6ba
internals documentation with diagrams (#14750)
This changeset adds new architecture internals documents to the contributing
guide. These are intentionally here and not on the public-facing website because
the material is not required for operators and includes a lot of diagrams that
we can cheaply maintain with mermaid syntax but would involve art assets to have
up on the main site that would become quickly out of date as code changes happen
and be extremely expensive to maintain. However, these should be suitable to use
as points of conversation with expert end users.

Included:
* A description of Evaluation triggers and expected counts, with examples.
* A description of Evaluation states and implicit states. This is taken from an
  internal document in our team wiki.
* A description of how writing the State Store works. This is taken from a
  diagram I put together a few months ago for internal education purposes.
* A description of Evaluation lifecycle, from registration to running
  Allocations. This is mostly lifted from @lgfa29's amazing mega-diagram, but
  broken into digestible chunks and without multi-region deployments, which I'd
  like to cover in a future doc.

Also includes adding Deployments to our public-facing glossary.

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
Co-authored-by: Seth Hoenig <shoenig@duck.com>
2022-10-03 14:06:41 -04:00


# Architecture: Evaluation Status

The [Scheduling in Nomad][] internals documentation covers the path that an
evaluation takes through the leader, worker, and plan applier. But it doesn't
cover in any detail the various `Evaluation.Status` values, or where the
`PreviousEval`, `NextEval`, or `BlockedEval` ID pointers are set.

The state diagram below describes the transitions between `Status` values as
solid arrows. The dashed arrows represent when a new evaluation is created. The
parenthetical labels on those arrows are the `TriggeredBy` field for the new
evaluation.

The status values are:

* `pending` evaluations are queued to be scheduled, are still being processed
  in the scheduler, or are being applied by the plan applier and have not yet
  been acknowledged.
* `failed` evaluations have failed to be applied by the plan applier, or are
  somehow invalid in the scheduler (the latter is always a bug).
* `blocked` evaluations are created when an eval has failed too many attempts to
  have its plan applied by the leader, or when a plan can only be partially
  applied and there are still more allocations to create.
* `complete` means the plan was applied successfully (at least partially).
* `canceled` means the evaluation was superseded by state changes, such as a new
  version of the job.
```mermaid
flowchart LR
event((Cluster\nEvent))
pending([pending])
blocked([blocked])
complete([complete])
failed([failed])
canceled([canceled])
%% style classes
classDef status fill:#d5f6ea,stroke-width:4px,stroke:#1d9467
classDef other fill:#d5f6ea,stroke:#1d9467
class event other;
class pending,blocked,complete,failed,canceled status;
event -. "job-register
job-deregister
periodic-job
node-update
node-drain
alloc-stop
scheduled
alloc-failure
job-scaling" .-> pending
pending -. "new eval\n(rolling-update)" .-> pending
pending -. "new eval\n(preemption)" .-> pending
pending -. "new eval\n(max-plan-attempts)" .-> blocked
pending -- if plan submitted --> complete
pending -- if invalid --> failed
pending -- if no-op --> canceled
failed -- if retried --> blocked
failed -- if retried --> complete
blocked -- if no-op --> canceled
blocked -- if plan submitted --> complete
complete -. "new eval\n(deployment-watcher)" .-> pending
complete -. "new eval\n(queued-allocs)" .-> blocked
failed -. "new eval\n(failed-follow-up)" .-> pending
```
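The solid arrows in the diagram above can be read as a transition table. As a rough sketch, the Go program below encodes them that way, for example so a debugging tool could flag an unexpected `Status` change. This is a hypothetical illustration, not code from Nomad: `allowedTransitions` and `CanTransition` are names invented for this example. The dashed arrows are deliberately left out, because they create a new evaluation instead of changing the status of an existing one.

```go
package main

import "fmt"

// Status values as stored in Evaluation.Status (string values taken from the
// diagram above).
const (
	StatusPending  = "pending"
	StatusBlocked  = "blocked"
	StatusComplete = "complete"
	StatusFailed   = "failed"
	StatusCanceled = "canceled"
)

// allowedTransitions is a hypothetical encoding of the solid arrows only.
// Dashed arrows create a *new* evaluation and are not status changes.
var allowedTransitions = map[string][]string{
	StatusPending: {StatusComplete, StatusFailed, StatusCanceled},
	StatusFailed:  {StatusBlocked, StatusComplete},
	StatusBlocked: {StatusCanceled, StatusComplete},
	// complete and canceled have no outgoing solid arrows in the diagram.
}

// CanTransition reports whether the diagram shows a solid arrow from one
// status to another.
func CanTransition(from, to string) bool {
	for _, next := range allowedTransitions[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(CanTransition(StatusPending, StatusComplete)) // true
	fmt.Println(CanTransition(StatusComplete, StatusPending)) // false: a new eval is created instead
	fmt.Println(CanTransition(StatusFailed, StatusBlocked))   // true
}
```

In this table `complete` and `canceled` are dead ends: any further work on the job shows up as a new evaluation, not as a status change on the old one.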
But it's hard to get a full picture of the evaluation lifecycle purely from the
`Status` field, because evaluations have several "quasi-statuses" that aren't
represented as explicit statuses in the state store:

* `scheduling` is the state in which an eval is being processed by a scheduler
  worker.
* `applying` is the state in which the eval's resulting plan is being applied by
  the plan applier on the leader.
* `delayed` is an enqueued eval that won't be dequeued until some time in the
  future.
* `deleted` is an eval that has been removed from the state store entirely (by
  garbage collection).

By adding these quasi-statuses to the diagram (the dashed nodes), you can see
where the same `Status` transition might result in different `PreviousEval`,
`NextEval`, or `BlockedEval` pointers being set. You can also see where the
"chain" of evaluations is broken when new evals are created for preemptions or
by the deployment watcher.
```mermaid
flowchart LR
event((Cluster\nEvent))
%% statuses
pending([pending])
blocked([blocked])
complete([complete])
failed([failed])
canceled([canceled])
%% quasi-statuses
deleted([deleted])
delayed([delayed])
scheduling([scheduling])
applying([applying])
%% style classes
classDef status fill:#d5f6ea,stroke-width:4px,stroke:#1d9467
classDef quasistatus fill:#d5f6ea,stroke-dasharray: 5 5,stroke:#1d9467
classDef other fill:#d5f6ea,stroke:#1d9467
class event other;
class pending,blocked,complete,failed,canceled status;
class deleted,delayed,scheduling,applying quasistatus;
event -- "job-register
job-deregister
periodic-job
node-update
node-drain
alloc-stop
scheduled
alloc-failure
job-scaling" --> pending
pending -- dequeued --> scheduling
pending -- if delayed --> delayed
delayed -- dequeued --> scheduling
scheduling -. "not all allocs placed
new eval created by scheduler
trigger: queued-allocs
new has .PreviousEval = old.ID
old has .BlockedEval = new.ID" .-> blocked
scheduling -. "failed to plan
new eval created by scheduler
trigger: max-plan-attempts
new has .PreviousEval = old.ID
old has .BlockedEval = new.ID" .-> blocked
scheduling -- "not all allocs placed
reuse already-blocked eval" --> blocked
blocked -- "unblocked by
external state changes" --> scheduling
scheduling -- allocs placed --> complete
scheduling -- "wrong eval type or
max retries exceeded
on plan submit" --> failed
scheduling -- "canceled by
job update/stop" --> canceled
failed -- retry --> scheduling
scheduling -. "new eval from rolling update (system jobs)
created by scheduler
trigger: rolling-update
new has .PreviousEval = old.ID
old has .NextEval = new.ID" .-> pending
scheduling -- submit --> applying
applying -- failed --> scheduling
applying -. "new eval for preempted allocs
created by plan applier
trigger: preemption
new has .PreviousEval = unset!
old has .BlockedEval = unset!" .-> pending
complete -. "new eval from deployments (service jobs)
created by deploymentwatcher
trigger: deployment-watcher
new has .PreviousEval = unset!
old has .NextEval = unset!" .-> pending
failed -. "new eval
trigger: failed-follow-up
new has .PreviousEval = old.ID
old has .NextEval = new.ID" .-> pending
pending -- "undeliverable evals
reaped by leader" --> failed
blocked -- "duplicate blocked evals
reaped by leader" --> canceled
canceled -- garbage\ncollection --> deleted
failed -- garbage\ncollection --> deleted
complete -- garbage\ncollection --> deleted
```
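The pointer rules on the dashed arrows can be summarized in code as well. The following Go program is a minimal, hypothetical model: the `Eval` struct and the `link` and `chain` functions are invented for this example and are not Nomad's actual `Evaluation` type or helpers. In Nomad these pointers are set in different components (see the "created by" labels above), not in one function. The sketch shows which pointers each `TriggeredBy` value sets, and how walking `PreviousEval` backwards stops at an eval created for preemption or by the deployment watcher.

```go
package main

import "fmt"

// Eval is a pared-down, hypothetical stand-in for an evaluation, keeping only
// the fields discussed in this document.
type Eval struct {
	ID           string
	TriggeredBy  string
	PreviousEval string // ID of the eval this one follows, if linked
	NextEval     string // set on the old eval for rolling-update and failed-follow-up
	BlockedEval  string // set on the old eval for queued-allocs and max-plan-attempts
}

// link sets the ID pointers between an existing eval and a new one based on
// the new eval's trigger, following the labels on the diagram's dashed arrows.
func link(old, next *Eval) {
	switch next.TriggeredBy {
	case "queued-allocs", "max-plan-attempts":
		next.PreviousEval = old.ID
		old.BlockedEval = next.ID
	case "rolling-update", "failed-follow-up":
		next.PreviousEval = old.ID
		old.NextEval = next.ID
	case "preemption", "deployment-watcher":
		// No pointers are set: the chain is broken here.
	}
}

// chain walks PreviousEval pointers back from id and returns the IDs visited,
// newest first. It stops at an unset pointer, an unknown ID, or a cycle.
func chain(evals map[string]*Eval, id string) []string {
	var ids []string
	seen := map[string]bool{}
	for id != "" && !seen[id] {
		e, ok := evals[id]
		if !ok {
			break // unknown ID, e.g. the eval was garbage collected (deleted)
		}
		seen[id] = true
		ids = append(ids, e.ID)
		id = e.PreviousEval
	}
	return ids
}

func main() {
	// a is registered; the scheduler can't place every alloc, so it creates
	// the blocked eval b.
	a := &Eval{ID: "a", TriggeredBy: "job-register"}
	b := &Eval{ID: "b", TriggeredBy: "queued-allocs"}
	link(a, b)

	// Applying b's plan later preempts another job's allocs, so the plan
	// applier creates c for that job.
	c := &Eval{ID: "c", TriggeredBy: "preemption"}
	link(b, c)

	evals := map[string]*Eval{"a": a, "b": b, "c": c}
	fmt.Println(chain(evals, "b")) // [b a]
	fmt.Println(chain(evals, "c")) // [c]
	fmt.Println(a.BlockedEval)     // b
}
```

The sketch doesn't model the case where the scheduler reuses an already-blocked eval instead of creating a new one; in that case no new pointers are needed.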
[Scheduling in Nomad]: https://www.nomadproject.io/docs/internals/scheduling/scheduling