This changeset adds new architecture internals documents to the contributing guide. These are intentionally here and not on the public-facing website because the material is not required for operators and includes a lot of diagrams that we can cheaply maintain with mermaid syntax but would involve art assets to have up on the main site that would become quickly out of date as code changes happen and be extremely expensive to maintain. However, these should be suitable to use as points of conversation with expert end users. Included: * A description of Evaluation triggers and expected counts, with examples. * A description of Evaluation states and implicit states. This is taken from an internal document in our team wiki. * A description of how writing the State Store works. This is taken from a diagram I put together a few months ago for internal education purposes. * A description of Evaluation lifecycle, from registration to running Allocations. This is mostly lifted from @lgfa29's amazing mega-diagram, but broken into digestible chunks and without multi-region deployments, which I'd like to cover in a future doc. Also includes adding Deployments to our public-facing glossary. Co-authored-by: Luiz Aoqui <luiz@hashicorp.com> Co-authored-by: Michael Schurter <mschurter@hashicorp.com> Co-authored-by: Seth Hoenig <shoenig@duck.com>
# Architecture: Nomad State Store

Nomad server state is an in-memory state store backed by raft. All writes to
state are serialized into MessagePack and written as raft log entries. The raft
log is replicated from the leader to the followers. Once a quorum (a majority of
servers) has persisted a log entry, the leader considers the write committed,
and each server then applies the committed entry to its in-memory state
("FSM").

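To make the commit rule concrete, here is a minimal sketch of how a leader can find the highest committed log index. This is not Nomad or hashicorp/raft code, and the `committedIndex` helper is hypothetical. The sketch sorts the highest persisted index of every voting server (leader included) and takes the value that a quorum has reached. Real raft adds further rules; for example, a leader only commits entries from its own term by counting replicas.

```go
package main

import (
	"fmt"
	"sort"
)

// committedIndex returns the highest log index persisted by a majority of
// voting servers. matchIndex holds, for each voting server (leader
// included), the highest log index known to be persisted on that server.
// Hypothetical helper for illustration only.
func committedIndex(matchIndex []uint64) uint64 {
	if len(matchIndex) == 0 {
		return 0
	}
	sorted := append([]uint64(nil), matchIndex...)
	// Sort descending so that sorted[k] is persisted on at least k+1 servers.
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] > sorted[j] })
	quorum := len(sorted)/2 + 1
	return sorted[quorum-1]
}

func main() {
	// Leader has persisted up to 7; followers up to 7, 5, 4, and 2.
	fmt.Println(committedIndex([]uint64{7, 7, 5, 4, 2})) // 5
}
```

With five voting servers the quorum is three, so index 5 is committed because three servers (those at 7, 7, and 5) have persisted it, even though two followers lag behind.
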
This architecture has a few implications:

* The `fsm.Apply` functions must be deterministic over their inputs for a given
  state. You can never generate random IDs or assign wall-clock timestamps in
  the state store. These values must be provided as parameters from the RPC
  handler.

  ```go
  // Incorrect: generating a timestamp in the state store is not deterministic.
  func (s *StateStore) UpsertObject(...) {
      // ...
      obj.CreateTime = time.Now()
      // ...
  }

  // Correct: non-deterministic values should be passed as inputs.
  func (s *StateStore) UpsertObject(..., timestamp time.Time) {
      // ...
      obj.CreateTime = timestamp
      // ...
  }
  ```

* Every object you read from the state store must be copied before it can be
  mutated, because mutating the object modifies it outside the raft
  workflow. This can leave servers with inconsistent state, break
  transactions, or even cause server panics.

  ```go
  // Incorrect: job is mutated without copying.
  job, err := state.JobByID(ws, namespace, id)
  job.Status = structs.JobStatusRunning

  // Correct: only the job copy is mutated.
  job, err := state.JobByID(ws, namespace, id)
  updateJob := job.Copy()
  updateJob.Status = structs.JobStatusRunning
  ```

Adding new objects to the state store should be done as part of adding new RPC
endpoints. See the [RPC Endpoint Checklist][].

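The write path in the diagram below goes through `raftApply(MessageType, entry)` and is applied by `fsm.Apply` in `nomad/fsm.go`, so a new object usually also means a new message type and a new case in the FSM. The following sketch shows that dispatch pattern under stated assumptions: each log entry is a single type byte followed by the encoded request, `encoding/json` stands in for the MessagePack encoding Nomad uses, and every type, constant, and function name here is hypothetical rather than Nomad's actual code.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// MessageType identifies which kind of write a log entry carries.
// The constants below are hypothetical.
type MessageType uint8

const (
	MyThingUpsertRequestType MessageType = iota
	MyThingDeleteRequestType
)

type MyThing struct {
	ID   string
	Name string
}

type MyThingUpsertRequest struct{ Thing MyThing }
type MyThingDeleteRequest struct{ ID string }

// encode prefixes the serialized request with its message type byte.
func encode(t MessageType, msg any) ([]byte, error) {
	body, err := json.Marshal(msg)
	if err != nil {
		return nil, err
	}
	return append([]byte{byte(t)}, body...), nil
}

// FSM is a toy state machine holding MyThing objects in a map.
type FSM struct{ things map[string]MyThing }

// Apply reads the type byte and dispatches to the matching handler.
func (f *FSM) Apply(entry []byte) error {
	if len(entry) == 0 {
		return errors.New("empty log entry")
	}
	body := entry[1:]
	switch MessageType(entry[0]) {
	case MyThingUpsertRequestType:
		var req MyThingUpsertRequest
		if err := json.Unmarshal(body, &req); err != nil {
			return err
		}
		f.things[req.Thing.ID] = req.Thing
	case MyThingDeleteRequestType:
		var req MyThingDeleteRequest
		if err := json.Unmarshal(body, &req); err != nil {
			return err
		}
		delete(f.things, req.ID)
	default:
		return fmt.Errorf("unknown message type %d", entry[0])
	}
	return nil
}

func main() {
	fsm := &FSM{things: map[string]MyThing{}}
	up, _ := encode(MyThingUpsertRequestType, MyThingUpsertRequest{Thing: MyThing{ID: "t1", Name: "example"}})
	del, _ := encode(MyThingDeleteRequestType, MyThingDeleteRequest{ID: "t1"})
	for _, entry := range [][]byte{up, del} {
		if err := fsm.Apply(entry); err != nil {
			fmt.Println("apply failed:", err)
		}
	}
	fmt.Println(len(fsm.things)) // 0
}
```
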
```mermaid
flowchart TD

%% entities

ext(("API\nclient"))
any("Any node
(client or server)")
follower(Follower)

rpcLeader("RPC handler (on leader)")

writes("writes go thru raft
raftApply(MessageType, entry) in nomad/rpc.go
structs.MessageType in nomad/structs/structs.go
go generate ./... for nomad/msgtypes.go")
click writes href "https://github.com/hashicorp/nomad/tree/main/nomad" _blank

reads("reads go directly to state store
Typical state_store.go funcs to implement:

state.GetMyThingByID
state.GetMyThingByPrefix
state.ListMyThing
state.UpsertMyThing
state.DeleteMyThing")
click reads href "https://github.com/hashicorp/nomad/tree/main/nomad/state" _blank

raft("hashicorp/raft")

bolt("boltdb")

fsm("Application-specific
Finite State Machine (FSM)
(aka State Store)")
click fsm href "https://github.com/hashicorp/nomad/tree/main/nomad/fsm.go" _blank

memdb("hashicorp/go-memdb")

%% style classes
classDef leader fill:#d5f6ea,stroke-width:4px,stroke:#1d9467
classDef other fill:#d5f6ea,stroke:#1d9467
class any,follower other;
class rpcLeader,raft,bolt,fsm,memdb leader;

%% flows

ext -- HTTP request --> any

any -- "RPC request
to connected server
(follower or leader)" --> follower

follower -- "(1) srv.Forward (to leader)" --> rpcLeader

raft -- "(3) replicate to a
quorum of followers
wait on their fsm.Apply" --> follower

rpcLeader --> reads
reads --> memdb

rpcLeader --> writes
writes -- "(2)" --> raft

raft -- "(4) write log to disk" --> bolt
raft -- "(5) fsm.Apply
nomad/fsm.go" --> fsm

fsm -- "(6) txn.Insert" --> memdb

bolt <-- "Snapshot Persist: nomad/fsm.go
Snapshot Restore: nomad/fsm.go" --> memdb

%% notes

note1("Typical structs to implement
for RPC handlers:

structs.MyThing
.Diff()
.Copy()
.Merge()
structs.MyThingUpsertRequest
structs.MyThingUpsertResponse
structs.MyThingGetRequest
structs.MyThingGetResponse
structs.MyThingListRequest
structs.MyThingListResponse
structs.MyThingDeleteRequest
structs.MyThingDeleteResponse

Don't forget to register your new RPC
in nomad/server.go!")

note1 -.- rpcLeader
```

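The state store functions listed in the diagram's `reads` node can be sketched as follows. This is an illustration only: it assumes a map guarded by a `sync.RWMutex` in place of go-memdb's transactions and indexes, a hypothetical `MyThing` type, and a `ModifyIndex` stamped from the raft index passed in by the caller (a deterministic input, as required above).

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// MyThing is the hypothetical object from the diagram's notes.
type MyThing struct {
	ID          string
	Name        string
	ModifyIndex uint64
}

func (t *MyThing) Copy() *MyThing {
	if t == nil {
		return nil
	}
	c := *t
	return &c
}

// StateStore is a stand-in for the go-memdb backed state store.
type StateStore struct {
	mu     sync.RWMutex
	things map[string]*MyThing
}

func NewStateStore() *StateStore {
	return &StateStore{things: map[string]*MyThing{}}
}

// UpsertMyThing stores a copy of thing stamped with the raft index it was
// applied at. The index comes from the log entry, so the write is deterministic.
func (s *StateStore) UpsertMyThing(index uint64, thing *MyThing) {
	if thing == nil {
		return
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	stored := thing.Copy()
	stored.ModifyIndex = index
	s.things[stored.ID] = stored
}

func (s *StateStore) DeleteMyThing(id string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.things, id)
}

// GetMyThingByID returns the stored object; callers must Copy before mutating.
func (s *StateStore) GetMyThingByID(id string) *MyThing {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.things[id]
}

// GetMyThingByPrefix returns objects whose ID starts with prefix, sorted by ID.
func (s *StateStore) GetMyThingByPrefix(prefix string) []*MyThing {
	s.mu.RLock()
	defer s.mu.RUnlock()
	var out []*MyThing
	for id, t := range s.things {
		if strings.HasPrefix(id, prefix) {
			out = append(out, t)
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].ID < out[j].ID })
	return out
}

// ListMyThing returns every object, sorted by ID.
func (s *StateStore) ListMyThing() []*MyThing { return s.GetMyThingByPrefix("") }

func main() {
	s := NewStateStore()
	s.UpsertMyThing(10, &MyThing{ID: "abc-1", Name: "first"})
	s.UpsertMyThing(11, &MyThing{ID: "abc-2", Name: "second"})
	s.UpsertMyThing(12, &MyThing{ID: "xyz-1", Name: "third"})
	s.DeleteMyThing("abc-2")

	fmt.Println(len(s.GetMyThingByPrefix("abc")), len(s.ListMyThing())) // 1 2
	fmt.Println(s.GetMyThingByID("xyz-1").ModifyIndex)                 // 12
}
```

Because `UpsertMyThing` stores a copy and `GetMyThingByID` hands out the stored pointer, callers follow the same copy-before-mutate rule as with `state.JobByID`.
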
[RPC Endpoint Checklist]: https://github.com/hashicorp/nomad/blob/main/contributing/checklist-rpc-endpoint.md