Small changes to gossip related telemetry docs (#9846)

Update gossip related telemetry docs to include correct descriptions, and added missing metrics
This commit is contained in:
Preetha 2021-03-11 14:21:32 -06:00 committed by GitHub
parent 92e35ed005
commit b3f1cafed3
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
1 changed files with 5 additions and 4 deletions

View File

@ -303,10 +303,10 @@ These metrics give insight into the health of the cluster as a whole.
| `consul.memberlist.udp.sent/received` | This metric measures the total number of bytes sent/received by an agent through the UDP protocol. | bytes sent or bytes received / interval | counter |
| `consul.memberlist.tcp.connect` | This metric counts the number of times an agent has initiated a push/pull sync with an other agent. | push/pull initiated / interval | counter |
| `consul.memberlist.tcp.sent` | This metric measures the total number of bytes sent by an agent through the TCP protocol | bytes sent / interval | counter |
| `consul.memberlist.gossip` | This metric gives the number of gossips (messages) broadcasted to a set of randomly selected nodes. | messages / Interval | counter |
| `consul.memberlist.msg_alive` | This metric counts the number of alive agents, that the agent has mapped out so far, based on the message information given by the network layer. | nodes / Interval | counter |
| `consul.memberlist.msg_dead` | This metric gives the number of dead agents, that the agent has mapped out so far, based on the message information given by the network layer. | nodes / Interval | counter |
| `consul.memberlist.msg_suspect` | This metric gives the number of suspect nodes, that the agent has mapped out so far, based on the message information given by the network layer. | nodes / Interval | counter |
| `consul.memberlist.gossip` | This metric measures the time taken for gossip messages to be broadcasted to a set of randomly selected nodes. | ms | timer |
| `consul.memberlist.msg_alive` | This metric counts the number of alive messages, that the agent has processed so far, based on the message information given by the network layer. | messages / Interval | counter |
| `consul.memberlist.msg_dead` | This metric gives the number of dead messages, that the agent has processed so far, based on the message information given by the network layer. | messages / Interval | counter |
| `consul.memberlist.msg_suspect` | This metric gives the number of suspect messages, that the agent has processed so far, based on the message information given by the network layer. | messages / Interval | counter |
| `consul.memberlist.probeNode` | This metric measures the time taken to perform a single round of failure detection on a select agent. | nodes / Interval | counter |
| `consul.memberlist.pushPullNode` | This metric measures the number of agents that have exchanged state with this agent. | nodes / Interval | counter |
| `consul.serf.member.failed` | This increments when an agent is marked dead. This can be an indicator of overloaded agents, network problems, or configuration errors where agents cannot connect to each other on the [required ports](/docs/agent/options#ports). | failures / interval | counter |
@ -314,6 +314,7 @@ These metrics give insight into the health of the cluster as a whole.
| `consul.serf.member.join` | This increments when an agent joins the cluster. If an agent flapped or failed this counter also increments when it re-joins. | joins / interval | counter |
| `consul.serf.member.left` | This increments when an agent leaves the cluster. | leaves / interval | counter |
| `consul.serf.events` | This increments when an agent processes an [event](/commands/event). Consul uses events internally so there may be additional events showing in telemetry. There are also a per-event counters emitted as `consul.serf.events.`. | events / interval | counter |
| `consul.serf.msgs.sent` | This metric is sample of the number of bytes of messages broadcast to the cluster. In a given time interval, the sum of this metric is the total number of bytes sent and the count is the number of messages sent. | message bytes / interval | counter |
| `consul.autopilot.failure_tolerance` | This tracks the number of voting servers that the cluster can lose while continuing to function. | servers | gauge |
| `consul.autopilot.healthy` | This tracks the overall health of the local server cluster. If all servers are considered healthy by Autopilot, this will be set to 1. If any are unhealthy, this will be 0. | boolean | gauge |
| `consul.session_ttl.active` | This tracks the active number of sessions being tracked. | sessions | gauge |