From 8b20491a79112227b9b37685c403a49d99be0748 Mon Sep 17 00:00:00 2001 From: Matt Keeler Date: Fri, 23 Apr 2021 16:05:00 -0400 Subject: [PATCH] Update changelog and add telemetry docs (#10107) --- .changelog/10073.txt | 3 + website/content/docs/agent/telemetry.mdx | 208 ++++++++++++----------- 2 files changed, 113 insertions(+), 98 deletions(-) create mode 100644 .changelog/10073.txt diff --git a/.changelog/10073.txt b/.changelog/10073.txt new file mode 100644 index 000000000..9b9d9327a --- /dev/null +++ b/.changelog/10073.txt @@ -0,0 +1,3 @@ +```release-note:improvement +telemetry: Add new metrics for status of secondary datacenter replication. +``` diff --git a/website/content/docs/agent/telemetry.mdx b/website/content/docs/agent/telemetry.mdx index 139acd68b..241e50d1c 100644 --- a/website/content/docs/agent/telemetry.mdx +++ b/website/content/docs/agent/telemetry.mdx @@ -203,104 +203,116 @@ This is a full list of metrics emitted by Consul. These metrics are used to monitor the health of the Consul servers. -| Metric | Description | Unit | Type | -| -------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------- | ------- | -| `consul.acl.apply` | Measures the time it takes to complete an update to the ACL store. | ms | timer | -| `consul.acl.resolveTokenLegacy` | Measures the time it takes to resolve an ACL token using the legacy ACL system. | ms | timer | -| `consul.acl.ResolveToken` | Measures the time it takes to resolve an ACL token. | ms | timer | -| `consul.acl.ResolveTokenToIdentity` | Measures the time it takes to resolve an ACL token to an Identity. | ms | timer | -| `consul.acl.token.cache_hit` | Increments if Consul is able to resolve a token's identity, or a legacy token, from the cache. | cache read op | counter | -| `consul.acl.token.cache_miss` | Increments if Consul cannot resolve a token's identity, or a legacy token, from the cache. | cache read op | counter | -| `consul.cache.bypass` | Counts how many times a request bypassed the cache because no cache-key was provided. | counter | counter | -| `consul.cache.fetch_success` | Counts the number of successful fetches by the cache. | counter | counter | -| `consul.cache.fetch_error` | Counts the number of failed fetches by the cache. | counter | counter | -| `consul.cache.evict_expired` | Counts the number of expired entries that are evicted. | counter | counter | -| `consul.raft.fsm.snapshot` | Measures the time taken by the FSM to record the current state for the snapshot. | ms | timer | -| `consul.raft.fsm.apply` | The number of logs committed since the last interval. | commit logs / interval | counter | -| `consul.raft.commitNumLogs` | Measures the count of logs processed for application to the FSM in a single batch. | logs | gauge | -| `consul.raft.fsm.enqueue` | Measures the amount of time to enqueue a batch of logs for the FSM to apply. | ms | timer | -| `consul.raft.fsm.restore` | Measures the time taken by the FSM to restore its state from a snapshot. | ms | timer | -| `consul.raft.snapshot.create` | Measures the time taken to initialize the snapshot process. | ms | timer | -| `consul.raft.snapshot.persist` | Measures the time taken to dump the current snapshot taken by the Consul agent to the disk. | ms | timer | -| `consul.raft.snapshot.takeSnapshot` | Measures the total time involved in taking the current snapshot (creating one and persisting it) by the Consul agent. | ms | timer | -| `consul.raft.replication.heartbeat` | Measures the time taken to invoke appendEntries on a peer, so that it doesn’t timeout on a periodic basis. | ms | timer | -| `consul.serf.snapshot.appendLine` | Measures the time taken by the Consul agent to append an entry into the existing log. | ms | timer | -| `consul.serf.snapshot.compact` | Measures the time taken by the Consul agent to compact a log. This operation occurs only when the snapshot becomes large enough to justify the compaction . | ms | timer | -| `consul.raft.applied_index` | Represents the raft applied index. | index | gauge | -| `consul.raft.last_index` | Represents the raft applied index. | index | gauge | -| `consul.raft.state.leader` | Increments whenever a Consul server becomes a leader. If there are frequent leadership changes this may be indication that the servers are overloaded and aren't meeting the soft real-time requirements for Raft, or that there are networking problems between the servers. | leadership transitions / interval | counter | -| `consul.raft.state.candidate` | Increments whenever a Consul server starts an election. If this increments without a leadership change occurring it could indicate that a single server is overloaded or is experiencing network connectivity issues. | election attempts / interval | counter | -| `consul.raft.apply` | Counts the number of Raft transactions occurring over the interval, which is a general indicator of the write load on the Consul servers. | raft transactions / interval | counter | -| `consul.raft.barrier` | Counts the number of times the agent has started the barrier i.e the number of times it has issued a blocking call, to ensure that the agent has all the pending operations that were queued, to be applied to the agent's FSM. | blocks / interval | counter | -| `consul.raft.verify_leader` | Counts the number of times an agent checks whether it is still the leader or not | checks / interval | Counter | -| `consul.raft.restore` | Counts the number of times the restore operation has been performed by the agent. Here, restore refers to the action of raft consuming an external snapshot to restore its state. | operation invoked / interval | counter | -| `consul.raft.commitTime` | Measures the time it takes to commit a new entry to the Raft log on the leader. | ms | timer | -| `consul.raft.leader.dispatchLog` | Measures the time it takes for the leader to write log entries to disk. | ms | timer | -| `consul.raft.leader.dispatchNumLogs` | Measures the number of logs committed to disk in a batch. | logs | gauge | -| `consul.raft.replication.appendEntries` | Measures the time it takes to replicate log entries to followers. This is a general indicator of the load pressure on the Consul servers, as well as the performance of the communication between the servers. | ms | timer | -| `consul.raft.state.follower` | Counts the number of times an agent has entered the follower mode. This happens when a new agent joins the cluster or after the end of a leader election. | follower state entered / interval | counter | -| `consul.raft.transistion.heartbeat_timeout` | The number of times an agent has transitioned to the Candidate state, after receive no heartbeat messages from the last known leader. | timeouts / interval | counter | -| `consul.raft.restoreUserSnapshot` | Measures the time taken by the agent to restore the FSM state from a user's snapshot | ms | timer | -| `consul.raft.rpc.processHeartBeat` | Measures the time taken to process a heartbeat request. | ms | timer | -| `consul.raft.rpc.appendEntries` | Measures the time taken to process an append entries RPC call from an agent. | ms | timer | -| `consul.raft.rpc.appendEntries.storeLogs` | Measures the time taken to add any outstanding logs for an agent, since the last appendEntries was invoked | ms | timer | -| `consul.raft.rpc.appendEntries.processLogs` | Measures the time taken to process the outstanding log entries of an agent. | ms | timer | -| `consul.raft.rpc.requestVote` | Measures the time taken to process the request vote RPC call. | ms | timer | -| `consul.raft.rpc.installSnapshot` | Measures the time taken to process the installSnapshot RPC call. This metric should only be seen on agents which are currently in the follower state. | ms | timer | -| `consul.raft.replication.appendEntries.rpc` | Measures the time taken by the append entries RFC, to replicate the log entries of a leader agent onto its follower agent(s) | ms | timer | -| `consul.raft.replication.appendEntries.logs` | Measures the number of logs replicated to an agent, to bring it up to speed with the leader's logs. | logs appended/ interval | counter | -| `consul.raft.leader.lastContact` | This will only be emitted by the Raft leader and measures the time since the leader was last able to contact the follower nodes when checking its leader lease. It can be used as a measure for how stable the Raft timing is and how close the leader is to timing out its lease.The lease timeout is 500 ms times the [`raft_multiplier` configuration](/docs/agent/options#raft_multiplier), so this telemetry value should not be getting close to that configured value, otherwise the Raft timing is marginal and might need to be tuned, or more powerful servers might be needed. See the [Server Performance](/docs/install/performance) guide for more details. | ms | timer | -| `consul.rpc.accept_conn` | Increments when a server accepts an RPC connection. | connections | counter | -| `consul.catalog.register` | Measures the time it takes to complete a catalog register operation. | ms | timer | -| `consul.catalog.deregister` | Measures the time it takes to complete a catalog deregister operation. | ms | timer | -| `consul.fsm.register` | Measures the time it takes to apply a catalog register operation to the FSM. | ms | timer | -| `consul.fsm.deregister` | Measures the time it takes to apply a catalog deregister operation to the FSM. | ms | timer | -| `consul.fsm.acl.` | Measures the time it takes to apply the given ACL operation to the FSM. | ms | timer | -| `consul.fsm.session.` | Measures the time it takes to apply the given session operation to the FSM. | ms | timer | -| `consul.fsm.kvs.` | Measures the time it takes to apply the given KV operation to the FSM. | ms | timer | -| `consul.fsm.tombstone.` | Measures the time it takes to apply the given tombstone operation to the FSM. | ms | timer | -| `consul.fsm.coordinate.batch-update` | Measures the time it takes to apply the given batch coordinate update to the FSM. | ms | timer | -| `consul.fsm.prepared-query.` | Measures the time it takes to apply the given prepared query update operation to the FSM. | ms | timer | -| `consul.fsm.txn` | Measures the time it takes to apply the given transaction update to the FSM. | ms | timer | -| `consul.fsm.autopilot` | Measures the time it takes to apply the given autopilot update to the FSM. | ms | timer | -| `consul.fsm.persist` | Measures the time it takes to persist the FSM to a raft snapshot. | ms | timer | -| `consul.fsm.intention` | Measures the time it takes to apply an intention operation to the state store. | ms | timer | -| `consul.fsm.ca` | Measures the time it takes to apply CA configuration operations to the FSM. | ms | timer | -| `consul.fsm.ca.leaf` | Measures the time it takes to apply an operation while signing a leaf certificate. | ms | timer | -| `consul.fsm.acl.token` | Measures the time it takes to apply an ACL token operation to the FSM. | ms | timer | -| `consul.fsm.acl.policy` | Measures the time it takes to apply an ACL policy operation to the FSM. | ms | timer | -| `consul.fsm.acl.bindingrule` | Measures the time it takes to apply an ACL binding rule operation to the FSM. | ms | timer | -| `consul.fsm.acl.authmethod` | Measures the time it takes to apply an ACL authmethod operation to the FSM. | ms | timer | -| `consul.fsm.system_metadata` | Measures the time it takes to apply a system metadata operation to the FSM. | ms | timer | -| `consul.kvs.apply` | Measures the time it takes to complete an update to the KV store. | ms | timer | -| `consul.leader.barrier` | Measures the time spent waiting for the raft barrier upon gaining leadership. | ms | timer | -| `consul.leader.reconcile` | Measures the time spent updating the raft store from the serf member information. | ms | timer | -| `consul.leader.reconcileMember` | Measures the time spent updating the raft store for a single serf member's information. | ms | timer | -| `consul.leader.reapTombstones` | Measures the time spent clearing tombstones. | ms | timer | -| `consul.prepared-query.apply` | Measures the time it takes to apply a prepared query update. | ms | timer | -| `consul.prepared-query.explain` | Measures the time it takes to process a prepared query explain request. | ms | timer | -| `consul.prepared-query.execute` | Measures the time it takes to process a prepared query execute request. | ms | timer | -| `consul.prepared-query.execute_remote` | Measures the time it takes to process a prepared query execute request that was forwarded to another datacenter. | ms | timer | -| `consul.rpc.raft_handoff` | Increments when a server accepts a Raft-related RPC connection. | connections | counter | -| `consul.rpc.request_error` | Increments when a server returns an error from an RPC request. | errors | counter | -| `consul.rpc.request` | Increments when a server receives a Consul-related RPC request. | requests | counter | -| `consul.rpc.query` | Increments when a server receives a new blocking RPC request, indicating the rate of new blocking query calls. See consul.rpc.queries_blocking for the current number of in-flight blocking RPC calls. This metric changed in 1.7.0 to only increment on the the start of a query. The rate of queries will appear lower, but is more accurate. | queries | counter | -| `consul.rpc.queries_blocking` | The current number of in-flight blocking queries the server is handling. | queries | gauge | -| `consul.rpc.cross-dc` | Increments when a server sends a (potentially blocking) cross datacenter RPC query. | queries | counter | -| `consul.rpc.consistentRead` | Measures the time spent confirming that a consistent read can be performed. | ms | timer | -| `consul.session.apply` | Measures the time spent applying a session update. | ms | timer | -| `consul.session.renew` | Measures the time spent renewing a session. | ms | timer | -| `consul.session_ttl.invalidate` | Measures the time spent invalidating an expired session. | ms | timer | -| `consul.txn.apply` | Measures the time spent applying a transaction operation. | ms | timer | -| `consul.txn.read` | Measures the time spent returning a read transaction. | ms | timer | -| `consul.grpc.client.request.count` | Counts the number of gRPC requests made by the client agent to a Consul server. | requests | counter | -| `consul.grpc.client.connection.count` | Counts the number of new gRPC connections opened by the client agent to a Consul server. | connections | counter | -| `consul.grpc.client.connections` | Measures the number of active gRPC connections open from the client agent to any Consul servers. | connections | gauge | -| `consul.grpc.server.request.count` | Counts the number of gRPC requests received by the server. | requests | counter | -| `consul.grpc.server.connection.count` | Counts the number of new gRPC connections received by the server. | connections | counter | -| `consul.grpc.server.connections` | Measures the number of active gRPC connections open on the server. | connections | gauge | -| `consul.grpc.server.stream.count` | Counts the number of new gRPC streams received by the server. | streams | counter | -| `consul.grpc.server.streams` | Measures the number of active gRPC streams handled by the server. | streams | guage | +| Metric | Description | Unit | Type | +| --------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------- | ------- | +| `consul.acl.apply` | Measures the time it takes to complete an update to the ACL store. | ms | timer | +| `consul.acl.resolveTokenLegacy` | Measures the time it takes to resolve an ACL token using the legacy ACL system. | ms | timer | +| `consul.acl.ResolveToken` | Measures the time it takes to resolve an ACL token. | ms | timer | +| `consul.acl.ResolveTokenToIdentity` | Measures the time it takes to resolve an ACL token to an Identity. | ms | timer | +| `consul.acl.token.cache_hit` | Increments if Consul is able to resolve a token's identity, or a legacy token, from the cache. | cache read op | counter | +| `consul.acl.token.cache_miss` | Increments if Consul cannot resolve a token's identity, or a legacy token, from the cache. | cache read op | counter | +| `consul.cache.bypass` | Counts how many times a request bypassed the cache because no cache-key was provided. | counter | counter | +| `consul.cache.fetch_success` | Counts the number of successful fetches by the cache. | counter | counter | +| `consul.cache.fetch_error` | Counts the number of failed fetches by the cache. | counter | counter | +| `consul.cache.evict_expired` | Counts the number of expired entries that are evicted. | counter | counter | +| `consul.raft.fsm.snapshot` | Measures the time taken by the FSM to record the current state for the snapshot. | ms | timer | +| `consul.raft.fsm.apply` | The number of logs committed since the last interval. | commit logs / interval | counter | +| `consul.raft.commitNumLogs` | Measures the count of logs processed for application to the FSM in a single batch. | logs | gauge | +| `consul.raft.fsm.enqueue` | Measures the amount of time to enqueue a batch of logs for the FSM to apply. | ms | timer | +| `consul.raft.fsm.restore` | Measures the time taken by the FSM to restore its state from a snapshot. | ms | timer | +| `consul.raft.snapshot.create` | Measures the time taken to initialize the snapshot process. | ms | timer | +| `consul.raft.snapshot.persist` | Measures the time taken to dump the current snapshot taken by the Consul agent to the disk. | ms | timer | +| `consul.raft.snapshot.takeSnapshot` | Measures the total time involved in taking the current snapshot (creating one and persisting it) by the Consul agent. | ms | timer | +| `consul.raft.replication.heartbeat` | Measures the time taken to invoke appendEntries on a peer, so that it doesn’t timeout on a periodic basis. | ms | timer | +| `consul.serf.snapshot.appendLine` | Measures the time taken by the Consul agent to append an entry into the existing log. | ms | timer | +| `consul.serf.snapshot.compact` | Measures the time taken by the Consul agent to compact a log. This operation occurs only when the snapshot becomes large enough to justify the compaction . | ms | timer | +| `consul.raft.applied_index` | Represents the raft applied index. | index | gauge | +| `consul.raft.last_index` | Represents the raft applied index. | index | gauge | +| `consul.raft.state.leader` | Increments whenever a Consul server becomes a leader. If there are frequent leadership changes this may be indication that the servers are overloaded and aren't meeting the soft real-time requirements for Raft, or that there are networking problems between the servers. | leadership transitions / interval | counter | +| `consul.raft.state.candidate` | Increments whenever a Consul server starts an election. If this increments without a leadership change occurring it could indicate that a single server is overloaded or is experiencing network connectivity issues. | election attempts / interval | counter | +| `consul.raft.apply` | Counts the number of Raft transactions occurring over the interval, which is a general indicator of the write load on the Consul servers. | raft transactions / interval | counter | +| `consul.raft.barrier` | Counts the number of times the agent has started the barrier i.e the number of times it has issued a blocking call, to ensure that the agent has all the pending operations that were queued, to be applied to the agent's FSM. | blocks / interval | counter | +| `consul.raft.verify_leader` | Counts the number of times an agent checks whether it is still the leader or not | checks / interval | Counter | +| `consul.raft.restore` | Counts the number of times the restore operation has been performed by the agent. Here, restore refers to the action of raft consuming an external snapshot to restore its state. | operation invoked / interval | counter | +| `consul.raft.commitTime` | Measures the time it takes to commit a new entry to the Raft log on the leader. | ms | timer | +| `consul.raft.leader.dispatchLog` | Measures the time it takes for the leader to write log entries to disk. | ms | timer | +| `consul.raft.leader.dispatchNumLogs` | Measures the number of logs committed to disk in a batch. | logs | gauge | +| `consul.raft.replication.appendEntries` | Measures the time it takes to replicate log entries to followers. This is a general indicator of the load pressure on the Consul servers, as well as the performance of the communication between the servers. | ms | timer | +| `consul.raft.state.follower` | Counts the number of times an agent has entered the follower mode. This happens when a new agent joins the cluster or after the end of a leader election. | follower state entered / interval | counter | +| `consul.raft.transistion.heartbeat_timeout` | The number of times an agent has transitioned to the Candidate state, after receive no heartbeat messages from the last known leader. | timeouts / interval | counter | +| `consul.raft.restoreUserSnapshot` | Measures the time taken by the agent to restore the FSM state from a user's snapshot | ms | timer | +| `consul.raft.rpc.processHeartBeat` | Measures the time taken to process a heartbeat request. | ms | timer | +| `consul.raft.rpc.appendEntries` | Measures the time taken to process an append entries RPC call from an agent. | ms | timer | +| `consul.raft.rpc.appendEntries.storeLogs` | Measures the time taken to add any outstanding logs for an agent, since the last appendEntries was invoked | ms | timer | +| `consul.raft.rpc.appendEntries.processLogs` | Measures the time taken to process the outstanding log entries of an agent. | ms | timer | +| `consul.raft.rpc.requestVote` | Measures the time taken to process the request vote RPC call. | ms | timer | +| `consul.raft.rpc.installSnapshot` | Measures the time taken to process the installSnapshot RPC call. This metric should only be seen on agents which are currently in the follower state. | ms | timer | +| `consul.raft.replication.appendEntries.rpc` | Measures the time taken by the append entries RFC, to replicate the log entries of a leader agent onto its follower agent(s) | ms | timer | +| `consul.raft.replication.appendEntries.logs` | Measures the number of logs replicated to an agent, to bring it up to speed with the leader's logs. | logs appended/ interval | counter | +| `consul.raft.leader.lastContact` | This will only be emitted by the Raft leader and measures the time since the leader was last able to contact the follower nodes when checking its leader lease. It can be used as a measure for how stable the Raft timing is and how close the leader is to timing out its lease.The lease timeout is 500 ms times the [`raft_multiplier` configuration](/docs/agent/options#raft_multiplier), so this telemetry value should not be getting close to that configured value, otherwise the Raft timing is marginal and might need to be tuned, or more powerful servers might be needed. See the [Server Performance](/docs/install/performance) guide for more details. | ms | timer | +| `consul.rpc.accept_conn` | Increments when a server accepts an RPC connection. | connections | counter | +| `consul.catalog.register` | Measures the time it takes to complete a catalog register operation. | ms | timer | +| `consul.catalog.deregister` | Measures the time it takes to complete a catalog deregister operation. | ms | timer | +| `consul.fsm.register` | Measures the time it takes to apply a catalog register operation to the FSM. | ms | timer | +| `consul.fsm.deregister` | Measures the time it takes to apply a catalog deregister operation to the FSM. | ms | timer | +| `consul.fsm.acl.` | Measures the time it takes to apply the given ACL operation to the FSM. | ms | timer | +| `consul.fsm.session.` | Measures the time it takes to apply the given session operation to the FSM. | ms | timer | +| `consul.fsm.kvs.` | Measures the time it takes to apply the given KV operation to the FSM. | ms | timer | +| `consul.fsm.tombstone.` | Measures the time it takes to apply the given tombstone operation to the FSM. | ms | timer | +| `consul.fsm.coordinate.batch-update` | Measures the time it takes to apply the given batch coordinate update to the FSM. | ms | timer | +| `consul.fsm.prepared-query.` | Measures the time it takes to apply the given prepared query update operation to the FSM. | ms | timer | +| `consul.fsm.txn` | Measures the time it takes to apply the given transaction update to the FSM. | ms | timer | +| `consul.fsm.autopilot` | Measures the time it takes to apply the given autopilot update to the FSM. | ms | timer | +| `consul.fsm.persist` | Measures the time it takes to persist the FSM to a raft snapshot. | ms | timer | +| `consul.fsm.intention` | Measures the time it takes to apply an intention operation to the state store. | ms | timer | +| `consul.fsm.ca` | Measures the time it takes to apply CA configuration operations to the FSM. | ms | timer | +| `consul.fsm.ca.leaf` | Measures the time it takes to apply an operation while signing a leaf certificate. | ms | timer | +| `consul.fsm.acl.token` | Measures the time it takes to apply an ACL token operation to the FSM. | ms | timer | +| `consul.fsm.acl.policy` | Measures the time it takes to apply an ACL policy operation to the FSM. | ms | timer | +| `consul.fsm.acl.bindingrule` | Measures the time it takes to apply an ACL binding rule operation to the FSM. | ms | timer | +| `consul.fsm.acl.authmethod` | Measures the time it takes to apply an ACL authmethod operation to the FSM. | ms | timer | +| `consul.fsm.system_metadata` | Measures the time it takes to apply a system metadata operation to the FSM. | ms | timer | +| `consul.kvs.apply` | Measures the time it takes to complete an update to the KV store. | ms | timer | +| `consul.leader.barrier` | Measures the time spent waiting for the raft barrier upon gaining leadership. | ms | timer | +| `consul.leader.reconcile` | Measures the time spent updating the raft store from the serf member information. | ms | timer | +| `consul.leader.reconcileMember` | Measures the time spent updating the raft store for a single serf member's information. | ms | timer | +| `consul.leader.reapTombstones` | Measures the time spent clearing tombstones. | ms | timer | +| `consul.leader.replication.acl-policies.status` | This will only be emitted by the leader in a secondary datacenter. The value will be a 1 if the last round of ACL policy replication was successful or 0 if there was an error. | healthy | gauge | +| `consul.leader.replication.acl-policies.index` | This will only be emitted by the leader in a secondary datacenter. Increments to the index of ACL policies in the primary datacenter that have been successfully replicated. | index | gauge | +| `consul.leader.replication.acl-roles.status` | This will only be emitted by the leader in a secondary datacenter. The value will be a 1 if the last round of ACL role replication was successful or 0 if there was an error. | healthy | gauge | +| `consul.leader.replication.acl-roles.index` | This will only be emitted by the leader in a secondary datacenter. Increments to the index of ACL roles in the primary datacenter that have been successfully replicated. | index | gauge | +| `consul.leader.replication.acl-tokens.status` | This will only be emitted by the leader in a secondary datacenter. The value will be a 1 if the last round of ACL token replication was successful or 0 if there was an error. | healthy | gauge | +| `consul.leader.replication.acl-tokens.index` | This will only be emitted by the leader in a secondary datacenter. Increments to the index of ACL tokens in the primary datacenter that have been successfully replicated. | index | gauge | +| `consul.leader.replication.config-entries.status` | This will only be emitted by the leader in a secondary datacenter. The value will be a 1 if the last round of config entry replication was successful or 0 if there was an error. | healthy | gauge | +| `consul.leader.replication.config-entries.index` | This will only be emitted by the leader in a secondary datacenter. Increments to the index of config entries in the primary datacenter that have been successfully replicated. | index | gauge | +| `consul.leader.replication.federation-state.status` | This will only be emitted by the leader in a secondary datacenter. The value will be a 1 if the last round of federation state replication was successful or 0 if there was an error. | healthy | gauge | +| `consul.leader.replication.federation-state.index` | This will only be emitted by the leader in a secondary datacenter. Increments to the index of federation states in the primary datacenter that have been successfully replicated. | index | gauge | +| `consul.leader.replication.namespaces.status` | This will only be emitted by the leader in a secondary datacenter. The value will be a 1 if the last round of namespace replication was successful or 0 if there was an error. | healthy | gauge | +| `consul.leader.replication.namespaces.index` | This will only be emitted by the leader in a secondary datacenter. Increments to the index of namespaces in the primary datacenter that have been successfully replicated. | index | gauge | +| `consul.prepared-query.apply` | Measures the time it takes to apply a prepared query update. | ms | timer | +| `consul.prepared-query.explain` | Measures the time it takes to process a prepared query explain request. | ms | timer | +| `consul.prepared-query.execute` | Measures the time it takes to process a prepared query execute request. | ms | timer | +| `consul.prepared-query.execute_remote` | Measures the time it takes to process a prepared query execute request that was forwarded to another datacenter. | ms | timer | +| `consul.rpc.raft_handoff` | Increments when a server accepts a Raft-related RPC connection. | connections | counter | +| `consul.rpc.request_error` | Increments when a server returns an error from an RPC request. | errors | counter | +| `consul.rpc.request` | Increments when a server receives a Consul-related RPC request. | requests | counter | +| `consul.rpc.query` | Increments when a server receives a new blocking RPC request, indicating the rate of new blocking query calls. See consul.rpc.queries_blocking for the current number of in-flight blocking RPC calls. This metric changed in 1.7.0 to only increment on the the start of a query. The rate of queries will appear lower, but is more accurate. | queries | counter | +| `consul.rpc.queries_blocking` | The current number of in-flight blocking queries the server is handling. | queries | gauge | +| `consul.rpc.cross-dc` | Increments when a server sends a (potentially blocking) cross datacenter RPC query. | queries | counter | +| `consul.rpc.consistentRead` | Measures the time spent confirming that a consistent read can be performed. | ms | timer | +| `consul.session.apply` | Measures the time spent applying a session update. | ms | timer | +| `consul.session.renew` | Measures the time spent renewing a session. | ms | timer | +| `consul.session_ttl.invalidate` | Measures the time spent invalidating an expired session. | ms | timer | +| `consul.txn.apply` | Measures the time spent applying a transaction operation. | ms | timer | +| `consul.txn.read` | Measures the time spent returning a read transaction. | ms | timer | +| `consul.grpc.client.request.count` | Counts the number of gRPC requests made by the client agent to a Consul server. | requests | counter | +| `consul.grpc.client.connection.count` | Counts the number of new gRPC connections opened by the client agent to a Consul server. | connections | counter | +| `consul.grpc.client.connections` | Measures the number of active gRPC connections open from the client agent to any Consul servers. | connections | gauge | +| `consul.grpc.server.request.count` | Counts the number of gRPC requests received by the server. | requests | counter | +| `consul.grpc.server.connection.count` | Counts the number of new gRPC connections received by the server. | connections | counter | +| `consul.grpc.server.connections` | Measures the number of active gRPC connections open on the server. | connections | gauge | +| `consul.grpc.server.stream.count` | Counts the number of new gRPC streams received by the server. | streams | counter | +| `consul.grpc.server.streams` | Measures the number of active gRPC streams handled by the server. | streams | guage | ## Cluster Health