Tim Gross 2a6e8be6ba
internals documentation with diagrams (#14750)
This changeset adds new architecture internals documents to the contributing
guide. These are intentionally here and not on the public-facing website because
the material is not required for operators and includes a lot of diagrams that
we can cheaply maintain with mermaid syntax but would involve art assets to have
up on the main site that would become quickly out of date as code changes happen
and be extremely expensive to maintain. However, these should be suitable to use
as points of conversation with expert end users.

Included:
* A description of Evaluation triggers and expected counts, with examples.
* A description of Evaluation states and implicit states. This is taken from an
  internal document in our team wiki.
* A description of how writing the State Store works. This is taken from a
  diagram I put together a few months ago for internal education purposes.
* A description of Evaluation lifecycle, from registration to running
  Allocations. This is mostly lifted from @lgfa29's amazing mega-diagram, but
  broken into digestible chunks and without multi-region deployments, which I'd
  like to cover in a future doc.

Also includes adding Deployments to our public-facing glossary.

Co-authored-by: Luiz Aoqui <luiz@hashicorp.com>
Co-authored-by: Michael Schurter <mschurter@hashicorp.com>
Co-authored-by: Seth Hoenig <shoenig@duck.com>
2022-10-03 14:06:41 -04:00


# Architecture: Nomad State Store

Nomad server state is an in-memory state store backed by raft. All writes to state are serialized into MessagePack and written as raft log entries. The raft logs are replicated from the leader to the followers. The leader considers a write committed once a quorum (a majority) of servers has persisted the log entry. The leader then applies the entry to its own in-memory state (the "FSM") before the RPC returns, and each follower applies the entry to its FSM once it learns the entry has been committed.
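The quorum rule can be made concrete with a minimal sketch. The helper below, `commitIndex`, is hypothetical and is not hashicorp/raft's code: given the highest log index each server (leader included) has persisted, it returns the highest index stored on a majority of servers. Real Raft additionally requires that entry to belong to the leader's current term, which this sketch leaves out.

```go
package main

import "sort"

// commitIndex returns the highest log index that a majority of servers
// have persisted. persisted holds one value per server, leader included.
// Hypothetical helper: it omits Raft's "current term" check.
func commitIndex(persisted []uint64) uint64 {
	if len(persisted) == 0 {
		return 0
	}
	sorted := append([]uint64(nil), persisted...)
	// Sort descending so that sorted[i] is persisted on at least i+1 servers.
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] > sorted[j] })
	quorum := len(sorted)/2 + 1
	return sorted[quorum-1]
}
```

For example, with three servers that have persisted up to indexes 10, 8, and 5, entries through index 8 exist on two of the three servers, so `commitIndex` returns 8 and those entries survive the loss of any single server.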

This architecture has a few implications:

  • The fsm.Apply functions must be deterministic over their inputs for a given state. You must never generate random IDs or assign wall-clock timestamps in the state store. These values must be passed in as parameters from the RPC handler.

    ```go
    // Incorrect: generating a timestamp in the state store is not deterministic.
    func (s *StateStore) UpsertObject(/* ... */) {
        // ...
        obj.CreateTime = time.Now()
        // ...
    }

    // Correct: non-deterministic values are passed in as inputs.
    func (s *StateStore) UpsertObject(/* ..., */ timestamp time.Time) {
        // ...
        obj.CreateTime = timestamp
        // ...
    }
    ```
    
  • Every object you read from the state store must be copied before it is mutated. The state store hands back the stored object itself, so mutating it changes server state outside the raft workflow. The result can be inconsistent state across servers, broken transactions, or even server panics.

    ```go
    // Incorrect: the job returned by the state store is mutated in place.
    job, err := state.JobByID(ws, namespace, id)
    if err != nil {
        return err
    }
    job.Status = structs.JobStatusRunning

    // Correct: only a copy of the job is mutated.
    job, err := state.JobByID(ws, namespace, id)
    if err != nil {
        return err
    }
    updateJob := job.Copy()
    updateJob.Status = structs.JobStatusRunning
    ```
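To see why the copy matters, the runnable sketch below uses a hypothetical, trimmed-down `Job` type and a toy map-backed `store` in place of structs.Job and go-memdb. Like go-memdb, the toy store returns pointers to the objects it holds, so writing through a returned pointer changes stored state directly. Note that `Copy` must be deep: a plain struct copy would still share the `Meta` map with the stored object.

```go
package main

// Job is a hypothetical stand-in for structs.Job.
type Job struct {
	ID     string
	Status string
	Meta   map[string]string
}

// Copy returns a deep copy, including the Meta map, so that mutations to
// the copy never reach the original.
func (j *Job) Copy() *Job {
	if j == nil {
		return nil
	}
	c := *j // copies scalar fields; Meta would still be shared
	if j.Meta != nil {
		c.Meta = make(map[string]string, len(j.Meta))
		for k, v := range j.Meta {
			c.Meta[k] = v
		}
	}
	return &c
}

// store is a toy state store. Like go-memdb, it returns the pointer it
// holds rather than a copy.
type store struct {
	jobs map[string]*Job
}

func (s *store) JobByID(id string) *Job { return s.jobs[id] }
```

In a real RPC handler, the mutated copy would then be submitted through raft so that every server applies the same change.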
    

Adding new objects to the state store should be done as part of adding new RPC endpoints. See the RPC Endpoint Checklist.
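Each new object usually also needs a new structs.MessageType, because writes reach the FSM as raft log entries submitted with raftApply(MessageType, entry). The sketch below is a simplified illustration of that pattern, not Nomad's fsm.go: it assumes each entry is a one-byte message type followed by the encoded request, uses JSON in place of MessagePack to stay dependency-free, and all `MyThing` names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MessageType tags each log entry so the FSM knows how to decode it.
// These values are hypothetical, not Nomad's.
type MessageType uint8

const (
	MyThingUpsertRequestType MessageType = iota
	MyThingDeleteRequestType
)

type MyThingUpsertRequest struct{ ID, Name string }
type MyThingDeleteRequest struct{ ID string }

// encode prepends the message type byte to the encoded request.
// JSON stands in for MessagePack in this sketch.
func encode(t MessageType, msg any) ([]byte, error) {
	payload, err := json.Marshal(msg)
	if err != nil {
		return nil, err
	}
	return append([]byte{byte(t)}, payload...), nil
}

// fsm is a toy state machine holding MyThing names by ID.
type fsm struct{ things map[string]string }

// apply reads the type byte and dispatches to the matching handler.
func (f *fsm) apply(data []byte) error {
	if len(data) == 0 {
		return fmt.Errorf("empty log entry")
	}
	switch MessageType(data[0]) {
	case MyThingUpsertRequestType:
		var req MyThingUpsertRequest
		if err := json.Unmarshal(data[1:], &req); err != nil {
			return err
		}
		f.things[req.ID] = req.Name
	case MyThingDeleteRequestType:
		var req MyThingDeleteRequest
		if err := json.Unmarshal(data[1:], &req); err != nil {
			return err
		}
		delete(f.things, req.ID)
	default:
		return fmt.Errorf("unknown message type %d", data[0])
	}
	return nil
}
```

Because the type byte is part of every log entry, a server running older code that sees an unknown type can fail loudly rather than silently misdecode the entry.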

```mermaid
flowchart TD

    %% entities

    ext(("API\nclient"))
    any("Any node
      (client or server)")
    follower(Follower)

    rpcLeader("RPC handler (on leader)")

    writes("writes go through raft
        raftApply(MessageType, entry) in nomad/rpc.go
        structs.MessageType in nomad/structs/structs.go
        go generate ./... for nomad/msgtypes.go")
    click writes href "https://github.com/hashicorp/nomad/tree/main/nomad" _blank

    reads("reads go directly to state store
        Typical state_store.go funcs to implement:

        state.GetMyThingByID
        state.GetMyThingByPrefix
        state.ListMyThing
        state.UpsertMyThing
        state.DeleteMyThing")
    click reads href "https://github.com/hashicorp/nomad/tree/main/nomad/state" _blank

    raft("hashicorp/raft")

    bolt("boltdb")

    fsm("Application-specific
      Finite State Machine (FSM)
      (aka State Store)")
    click fsm href "https://github.com/hashicorp/nomad/tree/main/nomad/fsm.go" _blank

    memdb("hashicorp/go-memdb")

    %% style classes
    classDef leader fill:#d5f6ea,stroke-width:4px,stroke:#1d9467
    classDef other fill:#d5f6ea,stroke:#1d9467
    class any,follower other;
    class rpcLeader,raft,bolt,fsm,memdb leader;

    %% flows

    ext -- HTTP request --> any

    any -- "RPC request
      to connected server
      (follower or leader)" --> follower

    follower -- "(1) srv.Forward (to leader)" --> rpcLeader

    raft -- "(3) replicate to a
      quorum of followers,
      wait for them to persist it" --> follower

    rpcLeader --> reads
    reads --> memdb

    rpcLeader --> writes
    writes -- "(2)" --> raft

    raft -- "(4) write log to disk" --> bolt
    raft -- "(5) fsm.Apply
      nomad/fsm.go" --> fsm

    fsm -- "(6) txn.Insert" --> memdb

    bolt <-- "Snapshot Persist: nomad/fsm.go
    Snapshot Restore: nomad/fsm.go" --> memdb


    %% notes

    note1("Typical structs to implement
        for RPC handlers:

        structs.MyThing
          .Diff()
          .Copy()
          .Merge()
        structs.MyThingUpsertRequest
        structs.MyThingUpsertResponse
        structs.MyThingGetRequest
        structs.MyThingGetResponse
        structs.MyThingListRequest
        structs.MyThingListResponse
        structs.MyThingDeleteRequest
        structs.MyThingDeleteResponse

        Don't forget to register your new RPC
        in nomad/server.go!")

    note1 -.- rpcLeader
```