Various Boltdb/Raft Documentation Updates (#11793)

* Documenting the new raft_boltdb configuration options
* Add documentation around new boltdb metrics.
* Correct documentation for the consul.raft.fsm.apply metric
Matt Keeler 2021-12-09 16:18:59 -05:00 committed by GitHub
parent bb992667de
commit 431de5e3dd
2 changed files with 34 additions and 1 deletions

@@ -1770,6 +1770,16 @@ bind_addr = "{{ GetPrivateInterfaces | include \"network\" \"10.0.0.0/8\" | attr
- `protocol` ((#protocol)) Equivalent to the [`-protocol` command-line
flag](#_protocol).
- `raft_boltdb` ((#raft_boltdb)) This is a nested object that allows configuring
  options for Raft's BoltDB-based log store.
  - `NoFreelistSync` ((#NoFreelistSync)) Setting this to `true` disables
    syncing the BoltDB freelist to disk within the raft.db file. Not syncing
    the freelist reduces the disk I/O required for write operations, at the
    expense of potentially longer startup time, because the db must be
    scanned to discover where the free space resides within the file.
- `raft_protocol` ((#raft_protocol)) Equivalent to the [`-raft-protocol`
command-line flag](#_raft_protocol).
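
As an illustration of the `raft_boltdb` option described above, a server agent configuration might contain a fragment like the following. This is a minimal sketch, assuming an HCL agent configuration file; only the `raft_boltdb` and `NoFreelistSync` names come from the documentation, and whether the trade-off is worthwhile depends on the workload.

```hcl
# Illustrative fragment of a Consul server agent configuration.
raft_boltdb {
  # Skip syncing the BoltDB freelist into raft.db on each write.
  # Trade-off: less disk I/O per write, potentially slower startup
  # while the db is scanned to rediscover free space.
  NoFreelistSync = true
}
```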

@@ -342,11 +342,34 @@ These metrics are used to monitor the health of the Consul servers.
| `consul.raft.applied_index` | Represents the raft applied index. | index | gauge |
| `consul.raft.apply` | Counts the number of Raft transactions occurring over the interval, which is a general indicator of the write load on the Consul servers. | raft transactions / interval | counter |
| `consul.raft.barrier` | Counts the number of times the agent has started the barrier, i.e., the number of times it has issued a blocking call to ensure that all pending queued operations have been applied to the agent's FSM. | blocks / interval | counter |
| `consul.raft.boltdb.freelistBytes` | Represents the number of bytes necessary to encode the freelist metadata. When `raft_boltdb.NoFreelistSync` is set to `false` these metadata bytes must also be written to disk for each committed log. | bytes | gauge |
| `consul.raft.boltdb.freePageBytes` | Represents the number of bytes of free space within the raft.db file. | bytes | gauge |
| `consul.raft.boltdb.getLog` | Measures the amount of time spent reading logs from the db. | ms | timer |
| `consul.raft.boltdb.logBatchSize` | Measures the total size in bytes of logs being written to the db in a single batch. | bytes | sample |
| `consul.raft.boltdb.logsPerBatch` | Measures the number of logs being written per batch to the db. | logs | sample |
| `consul.raft.boltdb.logSize` | Measures the size of logs being written to the db. | bytes | sample |
| `consul.raft.boltdb.numFreePages` | Represents the number of free pages within the raft.db file. | pages | gauge |
| `consul.raft.boltdb.numPendingPages` | Represents the number of pending pages within the raft.db that will soon become free. | pages | gauge |
| `consul.raft.boltdb.openReadTxn` | Represents the number of open read transactions against the db. | transactions | gauge |
| `consul.raft.boltdb.totalReadTxn` | Represents the total number of started read transactions against the db. | transactions | gauge |
| `consul.raft.boltdb.storeLogs` | Measures the amount of time spent writing logs to the db. | ms | timer |
| `consul.raft.boltdb.txstats.cursorCount` | Counts the number of cursors created since Consul was started. | cursors | counter |
| `consul.raft.boltdb.txstats.nodeCount` | Counts the number of node allocations within the db since Consul was started. | allocations | counter |
| `consul.raft.boltdb.txstats.nodeDeref` | Counts the number of node dereferences in the db since Consul was started. | dereferences | counter |
| `consul.raft.boltdb.txstats.pageAlloc` | Represents the number of bytes allocated within the db since Consul was started. Note that this does not take into account space having been freed and reused. In that case, the value of this metric will still increase. | bytes | gauge |
| `consul.raft.boltdb.txstats.pageCount` | Represents the number of pages allocated since Consul was started. Note that this does not take into account space having been freed and reused. In that case, the value of this metric will still increase. | pages | gauge |
| `consul.raft.boltdb.txstats.rebalance` | Counts the number of node rebalances performed in the db since Consul was started. | rebalances | counter |
| `consul.raft.boltdb.txstats.rebalanceTime` | Measures the time spent rebalancing nodes in the db. | ms | timer |
| `consul.raft.boltdb.txstats.spill` | Counts the number of nodes spilled in the db since Consul was started. | spills | counter |
| `consul.raft.boltdb.txstats.spillTime` | Measures the time spent spilling nodes in the db. | ms | timer |
| `consul.raft.boltdb.txstats.split` | Counts the number of nodes split in the db since Consul was started. | splits | counter |
| `consul.raft.boltdb.txstats.write` | Counts the number of writes to the db since Consul was started. | writes | counter |
| `consul.raft.boltdb.txstats.writeTime` | Measures the amount of time spent performing writes to the db. | ms | timer |
| `consul.raft.commitNumLogs` | Measures the count of logs processed for application to the FSM in a single batch. | logs | gauge |
| `consul.raft.commitTime` | Measures the time it takes to commit a new entry to the Raft log on the leader. | ms | timer |
| `consul.raft.fsm.lastRestoreDuration` | Measures the time taken to restore the FSM from a snapshot on an agent restart or from the leader calling installSnapshot. This is a gauge that holds its value, since most servers only restore during restarts, which are typically infrequent. | ms | gauge |
| `consul.raft.fsm.snapshot` | Measures the time taken by the FSM to record the current state for the snapshot. | ms | timer |
| `consul.raft.fsm.apply` | The number of logs committed since the last interval. | commit logs / interval | counter |
| `consul.raft.fsm.apply` | Measures the time to apply a log to the FSM. | ms | timer |
| `consul.raft.fsm.enqueue` | Measures the amount of time to enqueue a batch of logs for the FSM to apply. | ms | timer |
| `consul.raft.fsm.restore` | Measures the time taken by the FSM to restore its state from a snapshot. | ms | timer |
| `consul.raft.last_index` | Represents the raft last index. | index | gauge |
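
To make the BoltDB gauges in the table above easier to inspect together, the following Python sketch filters them out of a metrics payload. It assumes a payload shaped like the JSON returned by the agent's `/v1/agent/metrics` endpoint (a top-level `Gauges` list of objects with `Name` and `Value`), and it assumes metric names appear exactly as listed in the table, which may not hold if a hostname is included in gauge names. The helper name `boltdb_gauges` and all sample values are hypothetical and made up for illustration.

```python
import json

# Prefix shared by the BoltDB metrics listed in the table.
BOLTDB_PREFIX = "consul.raft.boltdb."


def boltdb_gauges(payload):
    """Return {metric name: value} for every BoltDB gauge in the payload.

    `payload` is assumed to be a dict with a "Gauges" list whose entries
    carry "Name" and "Value" keys (hypothetical helper, not a Consul API).
    """
    return {
        gauge["Name"]: gauge["Value"]
        for gauge in payload.get("Gauges", [])
        if gauge["Name"].startswith(BOLTDB_PREFIX)
    }


# Made-up sample payload; the values are illustrative, not real measurements.
SAMPLE = json.loads("""
{
  "Gauges": [
    {"Name": "consul.raft.applied_index", "Value": 1024, "Labels": {}},
    {"Name": "consul.raft.boltdb.freelistBytes", "Value": 3672, "Labels": {}},
    {"Name": "consul.raft.boltdb.freePageBytes", "Value": 1871872, "Labels": {}},
    {"Name": "consul.raft.boltdb.numFreePages", "Value": 457, "Labels": {}}
  ]
}
""")

gauges = boltdb_gauges(SAMPLE)
for name in sorted(gauges):
    print(f"{name} = {gauges[name]}")
```

In practice the payload would come from an HTTP request to a running agent rather than an embedded string. Watching `freelistBytes` alongside the write timers is one way to judge whether enabling `raft_boltdb.NoFreelistSync` is likely to reduce write I/O.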