open-nomad/contributing/architecture-eval-states.md

# Architecture: Evaluation Status

The [Scheduling in Nomad][] internals documentation covers the path that an
evaluation takes through the leader, worker, and plan applier. But it doesn't
cover in any detail the various `Evaluation.Status` values, or where the
`PreviousEval`, `NextEval`, or `BlockedEval` ID pointers are set.

The state diagram below describes the transitions between `Status` values as
solid arrows. The dashed arrows represent when a new evaluation is created. The
parenthetical labels on those arrows are the `TriggeredBy` field for the new
evaluation.

The status values are:

* `pending` evaluations either are queued to be scheduled, are still being
  processed in the scheduler, or are being applied by the plan applier and not
  yet acknowledged.
* `failed` evaluations have failed to be applied by the plan applier (or are
  somehow invalid in the scheduler; this is always a bug)
* `blocked` evaluations are created when an eval has failed too many attempts to
  have its plan applied by the leader, or when a plan can only be partially
  applied and there are still more allocations to create.
* `complete` means the plan was applied successfully (at least partially).
* `canceled` means the evaluation was superseded by state changes like a new
  version of the job.


```mermaid
flowchart LR

    event((Cluster\nEvent))

    pending([pending])
    blocked([blocked])
    complete([complete])
    failed([failed])
    canceled([canceled])

    %% style classes
    classDef status fill:#d5f6ea,stroke-width:4px,stroke:#1d9467
    classDef other fill:#d5f6ea,stroke:#1d9467
    class event other;
    class pending,blocked,complete,failed,canceled status;

    event -. "job-register
      job-deregister
      periodic-job
      node-update
      node-drain
      alloc-stop
      scheduled
      alloc-failure
      job-scaling" .-> pending

    pending -. "new eval\n(rolling-update)" .-> pending
    pending -. "new eval\n(preemption)" .-> pending

    pending -. "new eval\n(max-plan-attempts)" .-> blocked
    pending -- if plan submitted --> complete
    pending -- if invalid --> failed
    pending -- if no-op --> canceled

    failed -- if retried --> blocked
    failed -- if retried --> complete

    blocked -- if no-op --> canceled
    blocked -- if plan submitted --> complete

    complete -. "new eval\n(deployment-watcher)" .-> pending
    complete -. "new eval\n(queued-allocs)" .-> blocked

    failed -. "new eval\n(failed-follow-up)" .-> pending
```

But it's hard to get a full picture of the evaluation lifecycle purely from the
`Status` fields, because evaluations have several "quasi-statuses" which aren't
represented as explicit statuses in the state store:

* `scheduling` is the status where an eval is being processed by the scheduler
  worker.
* `applying` is the status where the resulting plan for the eval is being
  applied in the plan applier on the leader.
* `delayed` is an enqueued eval that will be dequeued some time in the future.
* `deleted` is when an eval is removed from the state store entirely.

By adding these statuses to the diagram (the dashed nodes), you can see where
the same `Status` transition might result in different `PreviousEval`,
`NextEval`, or `BlockedEval` set. You can also see where the "chain" of
evaluations is broken when new evals are created for preemptions or by the
deployment watcher.


```mermaid
flowchart LR

    event((Cluster\nEvent))

    %% statuss
    pending([pending])
    blocked([blocked])
    complete([complete])
    failed([failed])
    canceled([canceled])

    %% quasi-statuss
    deleted([deleted])
    delayed([delayed])
    scheduling([scheduling])
    applying([applying])

    %% style classes
    classDef status fill:#d5f6ea,stroke-width:4px,stroke:#1d9467
    classDef quasistatus fill:#d5f6ea,stroke-dasharray: 5 5,stroke:#1d9467
    classDef other fill:#d5f6ea,stroke:#1d9467

    class event other;
    class pending,blocked,complete,failed,canceled status;
    class deleted,delayed,scheduling,applying quasistatus;

    event -- "job-register
      job-deregister
      periodic-job
      node-update
      node-drain
      alloc-stop
      scheduled
      alloc-failure
      job-scaling" --> pending

    pending -- dequeued --> scheduling
    pending -- if delayed --> delayed
    delayed -- dequeued --> scheduling

    scheduling -. "not all allocs placed
      new eval created by scheduler
      trigger queued-allocs
      new has .PreviousEval = old.ID
      old has .BlockedEval = new.ID" .-> blocked

    scheduling -. "failed to plan
      new eval created by scheduler
      trigger: max-plan-attempts
      new has .PreviousEval = old.ID
      old has .BlockedEval = new.ID" .-> blocked

    scheduling -- "not all allocs placed
      reuse already-blocked eval" --> blocked

    blocked -- "unblocked by
      external state changes" --> scheduling

    scheduling -- allocs placed --> complete

    scheduling -- "wrong eval type or
      max retries exceeded
      on plan submit" --> failed

    scheduling -- "canceled by
      job update/stop" --> canceled

    failed -- retry --> scheduling

    scheduling -. "new eval from rolling update (system jobs)
      created by scheduler
      trigger: rolling-update
      new has .PreviousEval = old.ID
      old has .NextEval = new.ID" .-> pending

    scheduling -- submit --> applying
    applying -- failed --> scheduling

    applying -. "new eval for preempted allocs
      created by plan applier
      trigger: preemption
      new has .PreviousEval = unset!
      old has .BlockedEval = unset!" .-> pending

    complete -. "new eval from deployments (service jobs)
      created by deploymentwatcher
      trigger: deployment-watcher
      new has .PreviousEval = unset!
      old has .NextEval = unset!" .-> pending

    failed -- "new eval
      trigger: failed-follow-up
      new has .PreviousEval = old.ID
      old has .NextEval = new.ID" --> pending

    pending -- "undeliverable evals
      reaped by leader" --> failed

    blocked -- "duplicate blocked evals
      reaped by leader" --> canceled

    canceled -- garbage\ncollection --> deleted
    failed -- garbage\ncollection --> deleted
    complete -- garbage\ncollection --> deleted
```


[Scheduling in Nomad]: https://www.nomadproject.io/docs/internals/scheduling/scheduling