open-vault/website/pages/docs/internals/telemetry.mdx

419 lines
65 KiB
Plaintext
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

---
layout: docs
page_title: Telemetry
sidebar_title: Telemetry
description: Learn about the telemetry data available in Vault.
---
# Telemetry
The Vault server process collects various runtime metrics about the performance of different libraries and subsystems. These metrics are aggregated on a ten second interval and are retained for one minute.
To view the raw data, you must send a signal to the Vault process: on Unix-style operating systems, this is `USR1` while on Windows it is `BREAK`. When the Vault process receives this signal it will dump the current telemetry information to the process's `stderr`.
This telemetry information can be used for debugging or otherwise getting a better view of what Vault is doing.
Telemetry information can also be streamed directly from Vault to a range of metrics aggregation solutions as described in the [telemetry Stanza documentation][telemetry-stanza].
The following is an example telemetry dump snippet:
```text
[2017-12-19 20:37:50 +0000 UTC][G] 'vault.7f320e57f9fe.expire.num_leases': 5100.000
[2017-12-19 20:37:50 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.num_goroutines': 39.000
[2017-12-19 20:37:50 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.sys_bytes': 222746880.000
[2017-12-19 20:37:50 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.malloc_count': 109189192.000
[2017-12-19 20:37:50 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.free_count': 108408240.000
[2017-12-19 20:37:50 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.heap_objects': 780953.000
[2017-12-19 20:37:50 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.total_gc_runs': 232.000
[2017-12-19 20:37:50 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.alloc_bytes': 72954392.000
[2017-12-19 20:37:50 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.total_gc_pause_ns': 150293024.000
[2017-12-19 20:37:50 +0000 UTC][S] 'vault.merkle.flushDirty': Count: 100 Min: 0.008 Mean: 0.027 Max: 0.183 Stddev: 0.024 Sum: 2.681 LastUpdated: 2017-12-19 20:37:59.848733035 +0000 UTC m=+10463.692105920
[2017-12-19 20:37:50 +0000 UTC][S] 'vault.merkle.saveCheckpoint': Count: 4 Min: 0.021 Mean: 0.054 Max: 0.110 Stddev: 0.039 Sum: 0.217 LastUpdated: 2017-12-19 20:37:57.048458148 +0000 UTC m=+10460.891835029
[2017-12-19 20:38:00 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.alloc_bytes': 73326136.000
[2017-12-19 20:38:00 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.sys_bytes': 222746880.000
[2017-12-19 20:38:00 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.malloc_count': 109195904.000
[2017-12-19 20:38:00 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.free_count': 108409568.000
[2017-12-19 20:38:00 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.heap_objects': 786342.000
[2017-12-19 20:38:00 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.total_gc_pause_ns': 150293024.000
[2017-12-19 20:38:00 +0000 UTC][G] 'vault.7f320e57f9fe.expire.num_leases': 5100.000
[2017-12-19 20:38:00 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.num_goroutines': 39.000
[2017-12-19 20:38:00 +0000 UTC][G] 'vault.7f320e57f9fe.runtime.total_gc_runs': 232.000
[2017-12-19 20:38:00 +0000 UTC][S] 'vault.route.rollback.consul-': Count: 1 Sum: 0.013 LastUpdated: 2017-12-19 20:38:01.968471579 +0000 UTC m=+10465.811842067
[2017-12-19 20:38:00 +0000 UTC][S] 'vault.rollback.attempt.consul-': Count: 1 Sum: 0.073 LastUpdated: 2017-12-19 20:38:01.968502743 +0000 UTC m=+10465.811873131
[2017-12-19 20:38:00 +0000 UTC][S] 'vault.rollback.attempt.pki-': Count: 1 Sum: 0.070 LastUpdated: 2017-12-19 20:38:01.96867005 +0000 UTC m=+10465.812041936
[2017-12-19 20:38:00 +0000 UTC][S] 'vault.route.rollback.auth-app-id-': Count: 1 Sum: 0.012 LastUpdated: 2017-12-19 20:38:01.969146401 +0000 UTC m=+10465.812516689
[2017-12-19 20:38:00 +0000 UTC][S] 'vault.rollback.attempt.identity-': Count: 1 Sum: 0.063 LastUpdated: 2017-12-19 20:38:01.968029888 +0000 UTC m=+10465.811400276
[2017-12-19 20:38:00 +0000 UTC][S] 'vault.rollback.attempt.database-': Count: 1 Sum: 0.066 LastUpdated: 2017-12-19 20:38:01.969394215 +0000 UTC m=+10465.812764603
[2017-12-19 20:38:00 +0000 UTC][S] 'vault.barrier.get': Count: 16 Min: 0.010 Mean: 0.015 Max: 0.031 Stddev: 0.005 Sum: 0.237 LastUpdated: 2017-12-19 20:38:01.983268118 +0000 UTC m=+10465.826637008
[2017-12-19 20:38:00 +0000 UTC][S] 'vault.merkle.flushDirty': Count: 100 Min: 0.006 Mean: 0.024 Max: 0.098 Stddev: 0.019 Sum: 2.386 LastUpdated: 2017-12-19 20:38:09.848158309 +0000 UTC m=+10473.691527099
```
You'll note that log entries are prefixed with the metric type as follows:
- **[C]** is a counter
- **[G]** is a gauge
- **[S]** is a summary
The following sections describe available Vault metrics. The metrics interval can be assumed to be 10 seconds when manually triggering metrics output using the above described signals.
## Audit Metrics
These metrics relate to auditing.
| Metric | Description | Unit | Type |
| :--------------------------------- | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :------- | :------ |
| `vault.audit.log_request` | Duration of time taken by all audit log requests across all audit log devices | ms | summary |
| `vault.audit.log_response` | Duration of time taken by audit log responses across all audit log devices | ms | summary |
| `vault.audit.log_request_failure` | Number of audit log request failures. **NOTE**: This is a particularly important metric. Any non-zero value here indicates that there was a failure to make an audit log request to any of the configured audit log devices; **when Vault cannot log to any of the configured audit log devices it ceases all user operations**, and you should begin troubleshooting the audit log devices immediately if this metric continually increases. | failures | counter |
| `vault.audit.log_response_failure` | Number of audit log response failures. **NOTE**: This is a particularly important metric. Any non-zero value here indicates that there was a failure to receive a response to a request made to one of the configured audit log devices; **when Vault cannot log to any of the configured audit log devices it ceases all user operations**, and you should begin troubleshooting the audit log devices immediately if this metric continually increases. | failures | counter |
**NOTE:** In addition, there are audit metrics for each enabled audit device represented as `vault.audit.<type>.log_request`. For example, if a file audit device is enabled, its metrics would be `vault.audit.file.log_request` and `vault.audit.file.log_response` .
## Core Metrics
These metrics represent operational aspects of the running Vault instance.
| Metric | Description | Unit | Type |
| :----------------------------------- | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :--- | :------ |
| `vault.barrier.delete` | Duration of time taken by DELETE operations at the barrier | ms | summary |
| `vault.barrier.get` | Duration of time taken by GET operations at the barrier | ms | summary |
| `vault.barrier.put` | Duration of time taken by PUT operations at the barrier | ms | summary |
| `vault.barrier.list` | Duration of time taken by LIST operations at the barrier | ms | summary |
| `vault.core.check_token` | Duration of time taken by token checks handled by Vault core | ms | summary |
| `vault.core.fetch_acl_and_token` | Duration of time taken by ACL and corresponding token entry fetches handled by Vault core | ms | summary |
| `vault.core.handle_request` | Duration of time taken by requests handled by Vault core | ms | summary |
| `vault.core.handle_login_request` | Duration of time taken by login requests handled by Vault core | ms | summary |
| `vault.core.leadership_setup_failed` | Duration of time taken by cluster leadership setup failures which have occurred in a highly available Vault cluster. This should be monitored and alerted on for overall cluster leadership status. | ms | summary |
| `vault.core.leadership_lost` | Duration of time taken by cluster leadership losses which have occurred in a highly available Vault cluster. This should be monitored and alerted on for overall cluster leadership status. | ms | summary |
| `vault.core.post_unseal` | Duration of time taken by post-unseal operations handled by Vault core | ms | gauge |
| `vault.core.pre_seal` | Duration of time taken by pre-seal operations | ms | gauge |
| `vault.core.seal-with-request` | Duration of time taken by requested seal operations | ms | gauge |
| `vault.core.seal` | Duration of time taken by seal operations | ms | gauge |
| `vault.core.seal-internal` | Duration of time taken by internal seal operations | ms | gauge |
| `vault.core.step_down` | Duration of time taken by cluster leadership step downs. This should be monitored and alerted on for overall cluster leadership status. | ms | summary |
| `vault.core.unseal` | Duration of time taken by unseal operations | ms | summary |
## Runtime Metrics
These metrics represent runtime aspects of the running Vault instance.
| Metric | Description | Unit | Type |
| :-------------------------------- | :----------------------------------------------------------------------------------------------------------------------------------------------------------- | :--------- | :------ |
| `vault.runtime.alloc_bytes` | Number of bytes allocated by the Vault process. This could burst from time to time, but should return to a steady state value. | bytes | gauge |
| `vault.runtime.free_count` | Number of freed objects | objects | gauge |
| `vault.runtime.heap_objects` | Number of objects on the heap. This is a good general memory pressure indicator worth establishing a baseline and thresholds for alerting. | objects | gauge |
| `vault.runtime.malloc_count` | Cumulative count of allocated heap objects | objects | gauge |
| `vault.runtime.num_goroutines` | Number of goroutines. This serves as a general system load indicator worth establishing a baseline and thresholds for alerting. | goroutines | gauge |
| `vault.runtime.sys_bytes` | Number of bytes allocated to Vault. This includes what is being used by Vault's heap and what has been reclaimed but not given back to the operating system. | bytes | gauge |
| `vault.runtime.total_gc_pause_ns` | The total garbage collector pause time since Vault was last started | ns | gauge |
| `vault.runtime.gc_pause_ns` | Total duration of the last garbage collection run | ns | sample |
| `vault.runtime.total_gc_runs` | Total number of garbage collection runs since Vault was last started | operations | gauge |
## Policy and Token Metrics
These metrics relate to policies and tokens.
| Metric | Description | Unit | Type |
| :---------------------------------------- | :-------------------------------------------------------------------------- | :----- | :------ |
| `vault.expire.fetch-lease-times` | Time taken to fetch lease times | ms | summary |
| `vault.expire.fetch-lease-times-by-token` | Time taken to fetch lease times by token | ms | summary |
| `vault.expire.num_leases` | Number of all leases which are eligible for eventual expiry | leases | gauge |
| `vault.expire.revoke` | Time taken to revoke a token | ms | summary |
| `vault.expire.revoke-force` | Time taken to forcibly revoke a token | ms | summary |
| `vault.expire.revoke-prefix` | Time taken to revoke tokens on a prefix | ms | summary |
| `vault.expire.revoke-by-token` | Time taken to revoke all secrets issued with a given token | ms | summary |
| `vault.expire.renew` | Time taken to renew a lease | ms | summary |
| `vault.expire.renew-token` | Time taken to renew a token which does not need to invoke a logical backend | ms | summary |
| `vault.expire.register` | Time taken for register operations | ms | summary |
These operations take a request and response with an associated lease and register a lease entry with lease ID
| Metric | Description | Unit | Type |
| :--------------------------- | :-------------------------------------------------------------------------------------------- | :---- | :------ |
| `vault.expire.register-auth` | Time taken for register authentication operations which create lease entries without lease ID | ms | summary |
| `vault.policy.get_policy` | Time taken to get a policy | ms | summary |
| `vault.policy.list_policies` | Time taken to list policies | ms | summary |
| `vault.policy.delete_policy` | Time taken to delete a policy | ms | summary |
| `vault.policy.set_policy` | Time taken to set a policy | ms | summary |
| `vault.token.create` | The time taken to create a token | ms | summary |
| `vault.token.create_root` | Number of created root tokens. Does not decrease on revocation. | token | counter |
| `vault.token.createAccessor` | The time taken to create a token accessor | ms | summary |
| `vault.token.lookup` | The time taken to look up a token | ms | summary |
| `vault.token.revoke` | Time taken to revoke a token | ms | summary |
| `vault.token.revoke-tree` | Time taken to revoke a token tree | ms | summary |
| `vault.token.store` | Time taken to store an updated token entry without writing to the secondary index | ms | summary |
## Auth Methods Metrics
These metrics relate to supported authentication methods.
| Metric | Description | Unit | Type |
| :---------------------------------- | :------------------------------------------------------------------------------------------------------------ | :--- | :------ |
| `vault.rollback.attempt.auth-token` | Time taken to perform a rollback operation for the [token auth method][token-auth-backend] | ms | summary |
| `vault.rollback.attempt.auth-ldap` | Time taken to perform a rollback operation for the [LDAP auth method][ldap-auth-backend] | ms | summary |
| `vault.rollback.attempt.cubbyhole` | Time taken to perform a rollback operation for the [Cubbyhole secret backend][cubbyhole-secrets-engine] | ms | summary |
| `vault.rollback.attempt.secret` | Time taken to perform a rollback operation for the [K/V secret backend][kv-secrets-engine] | ms | summary |
| `vault.rollback.attempt.sys` | Time taken to perform a rollback operation for the system backend | ms | summary |
| `vault.route.rollback.auth-ldap` | Time taken to perform a route rollback operation for the [LDAP auth method][ldap-auth-backend] | ms | summary |
| `vault.route.rollback.auth-token` | Time taken to perform a route rollback operation for the [token auth method][token-auth-backend] | ms | summary |
| `vault.route.rollback.cubbyhole` | Time taken to perform a route rollback operation for the [Cubbyhole secret backend][cubbyhole-secrets-engine] | ms | summary |
| `vault.route.rollback.secret` | Time taken to perform a route rollback operation for the [K/V secret backend][kv-secrets-engine] | ms | summary |
| `vault.route.rollback.sys` | Time taken to perform a route rollback operation for the system backend | ms | summary |
## Merkle Tree and Write Ahead Log Metrics
These metrics relate to internal operations on Merkle Trees and Write Ahead Logs (WAL)
| Metric | Description | Unit | Type |
| :---------------------------- | :-------------------------------------------------------------------------- | :--- | :------ |
| `vault.merkle_flushdirty` | Time taken to flush any dirty pages to cold storage | ms | summary |
| `vault.merkle_savecheckpoint` | Time taken to save the checkpoint | ms | summary |
| `vault.wal_deletewals` | Time taken to delete a Write Ahead Log (WAL) | ms | summary |
| `vault.wal_gc_deleted` | Number of Write Ahead Logs (WAL) deleted during each garbage collection run | WAL | counter |
| `vault.wal_gc_total` | Total Number of Write Ahead Logs (WAL) on disk | WAL | counter |
| `vault.wal_loadWAL` | Time taken to load a Write Ahead Log (WAL) | ms | summary |
| `vault.wal_persistwals` | Time taken to persist a Write Ahead Log (WAL) | ms | summary |
| `vault.wal_flushready` | Time taken to flush a ready Write Ahead Log (WAL) to storage | ms | summary |
## Replication Metrics
These metrics relate to [Vault Enterprise Replication](/docs/enterprise/replication). The following metrics are not available in telemetry unless replication is in an unhealthy state: `replication.fetchRemoteKeys`, `replication.merkleDiff`, and `replication.merkleSync`.
| Metric | Description | Unit | Type |
| :------------------------------------------------------ | :----------------------------------------------------------------------------------------------------------------------------------------- | :-------------- | :------ |
| `logshipper.streamWALs.missing_guard` | Number of incidences where the starting Merkle Tree index used to begin streaming WAL entries is not matched/found | missing guards | counter |
| `logshipper.streamWALs.guard_found` | Number of incidences where the starting Merkle Tree index used to begin streaming WAL entries is matched/found | found guards | counter |
| `replication.fetchRemoteKeys` | Time taken to fetch keys from a remote cluster participating in replication prior to Merkle Tree based delta generation | ms | summary |
| `replication.merkleDiff` | Time taken to perform a Merkle Tree based delta generation between the clusters participating in replication | ms | summary |
| `replication.merkleSync` | Time taken to perform a Merkle Tree based synchronization using the last delta generated between the clusters participating in replication | ms | summary |
| `replication.merkle.commit_index` | The last committed index in the Merkle Tree. | sequence number | gauge |
| `replication.wal.last_wal` | The index of the last WAL | sequence number | gauge |
| `replication.wal.last_dr_wal` | The index of the last DR WAL | sequence number | gauge |
| `replication.wal.last_performance_wal` | The index of the last Performance WAL | sequence number | gauge |
| `replication.fsm.last_remote_wal` | The index of the last remote WAL | sequence number | gauge |
| `vault.replication.wal.gc` |Time taken to complete one run of the WAL garbage collection process | ms | summary |
| `replication.rpc.server.auth_request` | Duration of time taken by auth request | ms | summary |
| `replication.rpc.server.bootstrap_request` | Duration of time taken by bootstrap request | ms | summary |
| `replication.rpc.server.conflicting_pages_request` | Duration of time taken by conflicting pages request | ms | summary |
| `replication.rpc.server.echo` | Duration of time taken by echo | ms | summary |
| `replication.rpc.server.forwarding_request` | Duration of time taken by forwarding request | ms | summary |
| `replication.rpc.server.guard_hash_request` | Duration of time taken by guard hash request | ms | summary |
| `replication.rpc.server.persist_alias_request` | Duration of time taken by persist alias request | ms | summary |
| `replication.rpc.server.persist_persona_request` | Duration of time taken by persist persona request | ms | summary |
| `replication.rpc.server.stream_wals_request` | Duration of time taken by stream wals request | ms | summary |
| `replication.rpc.server.sub_page_hashes_request` | Duration of time taken by sub page hashes request | ms | summary |
| `replication.rpc.server.sync_counter_request` | Duration of time taken by sync counter request | ms | summary |
| `replication.rpc.server.upsert_group_request` | Duration of time taken by upsert group request | ms | summary |
| `replication.rpc.client.conflicting_pages` | Duration of time taken by client conflicting pages request | ms | summary |
| `replication.rpc.client.fetch_keys` | Duration of time taken by client fetch keys request | ms | summary |
| `replication.rpc.client.forward` | Duration of time taken by client forward request | ms | summary |
| `replication.rpc.client.guard_hash` | Duration of time taken by client guard hash request | ms | summary |
| `replication.rpc.client.persist_alias` | Duration of time taken by | ms | summary |
| `replication.rpc.client.register_auth` | Duration of time taken by client register auth request | ms | summary |
| `replication.rpc.client.register_lease` | Duration of time taken by client register lease request | ms | summary |
| `replication.rpc.client.stream_wals` | Duration of time taken by client s | ms | summary |
| `replication.rpc.client.sub_page_hashes` | Duration of time taken by client sub page hashes request | ms | summary |
| `replication.rpc.client.sync_counter` | Duration of time taken by client sync counter request | ms | summary |
| `replication.rpc.client.upsert_group` | Duration of time taken by client upstert group request | ms | summary |
| `replication.rpc.client.wrap_in_cubbyhole` | Duration of time taken by client wrap in cubbyhole request | ms | summary |
| `replication.rpc.dr.server.echo` | Duration of time taken by DR echo request | ms | summary |
| `replication.rpc.dr.server.fetch_keys_request` | Duration of time taken by DR fetch keys request | ms | summary |
| `replication.rpc.standby.server.echo` | Duration of time taken by standby echo request | ms | summary |
| `replication.rpc.standby.server.register_auth_request` | Duration of time taken by standby register auth request | ms | summary |
| `replication.rpc.standby.server.register_lease_request` | Duration of time taken by standby register lease request | ms | summary |
| `replication.rpc.standby.server.wrap_token_request` | Duration of time taken by standby wrap token request | ms | summary |
## Secrets Engines Metrics
These metrics relate to the supported [secrets engines][secrets-engines].
| Metric | Description | Unit | Type |
| :--------------------------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :----- | :------ |
| `database.Initialize` | Time taken to initialize a database secret engine across all database secrets engines | ms | summary |
| `database.<name>.Initialize` | Time taken to initialize a database secret engine for the named database secrets engine `<name>`, for example: `database.postgresql-prod.Initialize` | ms | summary |
| `database.Initialize.error` | Number of database secrets engine initialization operation errors across all database secrets engines | errors | counter |
| `database.<name>.Initialize.error` | Number of database secrets engine initialization operation errors for the named database secrets engine `<name>`, for example: `database.postgresql-prod.Initialize.error` | errors | counter |
| `database.Close` | Time taken to close a database secret engine across all database secrets engines | ms | summary |
| `database.<name>.Close` | Time taken to close a database secret engine for the named database secrets engine `<name>`, for example: `database.postgresql-prod.Close` | ms | summary |
| `database.Close.error` | Number of database secrets engine close operation errors across all database secrets engines | errors | counter |
| `database.<name>.Close.error` | Number of database secrets engine close operation errors for the named database secrets engine `<name>`, for example: `database.postgresql-prod.Close.error` | errors | counter |
| `database.CreateUser` | Time taken to create a user across all database secrets engines | ms | summary |
| `database.<name>.CreateUser` | Time taken to create a user for the named database secrets engine `<name>` | ms | summary |
| `database.CreateUser.error` | Number of user creation operation errors across all database secrets engines | errors | counter |
| `database.<name>.CreateUser.error` | Number of user creation operation errors for the named database secrets engine `<name>`, for example: `database.postgresql-prod.CreateUser.error` | errors | counter |
| `database.RenewUser` | Time taken to renew a user across all database secrets engines | ms | summary |
| `database.<name>.RenewUser` | Time taken to renew a user for the named database secrets engine `<name>`, for example: `database.postgresql-prod.RenewUser` | ms | summary |
| `database.RenewUser.error` | Number of user renewal operation errors across all database secrets engines | errors | counter |
| `database.<name>.RenewUser.error` | Number of user renewal operations for the named database secrets engine `<name>`, for example: `database.postgresql-prod.RenewUser.error` | errors | counter |
| `database.RevokeUser` | Time taken to revoke a user across all database secrets engines | ms | summary |
| `database.<name>.RevokeUser` | Time taken to revoke a user for the named database secrets engine `<name>`, for example: `database.postgresql-prod.RevokeUser` | ms | summary |
| `database.RevokeUser.error` | Number of user revocation operation errors across all database secrets engines | errors | counter |
| `database.<name>.RevokeUser.error` | Number of user revocation operations for the named database secrets engine `<name>`, for example: `database.postgresql-prod.RevokeUser.error` | errors | counter |
## Storage Backend Metrics
These metrics relate to the supported [storage backends][storage-backends].
| Metric | Description | Unit | Type |
| :-------------------------- | :--------------------------------------------------------------------------------------------------------------------- | :--- | :------ |
| `vault.azure.put` | Duration of a PUT operation against the [Azure storage backend][azure-storage-backend] | ms | summary |
| `vault.azure.get` | Duration of a GET operation against the [Azure storage backend][azure-storage-backend] | ms | summary |
| `vault.azure.delete` | Duration of a DELETE operation against the [Azure storage backend][azure-storage-backend] | ms | summary |
| `vault.azure.list` | Duration of a LIST operation against the [Azure storage backend][azure-storage-backend] | ms | summary |
| `vault.cassandra.put` | Duration of a PUT operation against the [Cassandra storage backend][cassandra-storage-backend] | ms | summary |
| `vault.cassandra.get` | Duration of a GET operation against the [Cassandra storage backend][cassandra-storage-backend] | ms | summary |
| `vault.cassandra.delete` | Duration of a DELETE operation against the [Cassandra storage backend][cassandra-storage-backend] | ms | summary |
| `vault.cassandra.list` | Duration of a LIST operation against the [Cassandra storage backend][cassandra-storage-backend] | ms | summary |
| `vault.cockroachdb.put` | Duration of a PUT operation against the [CockroachDB storage backend][cockroachdb-storage-backend] | ms | summary |
| `vault.cockroachdb.get` | Duration of a GET operation against the [CockroachDB storage backend][cockroachdb-storage-backend] | ms | summary |
| `vault.cockroachdb.delete` | Duration of a DELETE operation against the [CockroachDB storage backend][cockroachdb-storage-backend] | ms | summary |
| `vault.cockroachdb.list` | Duration of a LIST operation against the [CockroachDB storage backend][cockroachdb-storage-backend] | ms | summary |
| `vault.consul.put` | Duration of a PUT operation against the [Consul storage backend][consul-storage-backend] | ms | summary |
| `vault.consul.get` | Duration of a GET operation against the [Consul storage backend][consul-storage-backend] | ms | summary |
| `vault.consul.delete` | Duration of a DELETE operation against the [Consul storage backend][consul-storage-backend] | ms | summary |
| `vault.consul.list` | Duration of a LIST operation against the [Consul storage backend][consul-storage-backend] | ms | summary |
| `vault.couchdb.put` | Duration of a PUT operation against the [CouchDB storage backend][couchdb-storage-backend] | ms | summary |
| `vault.couchdb.get` | Duration of a GET operation against the [CouchDB storage backend][couchdb-storage-backend] | ms | summary |
| `vault.couchdb.delete` | Duration of a DELETE operation against the [CouchDB storage backend][couchdb-storage-backend] | ms | summary |
| `vault.couchdb.list` | Duration of a LIST operation against the [CouchDB storage backend][couchdb-storage-backend] | ms | summary |
| `vault.dynamodb.put` | Duration of a PUT operation against the [DynamoDB storage backend][dynamodb-storage-backend] | ms | summary |
| `vault.dynamodb.get` | Duration of a GET operation against the [DynamoDB storage backend][dynamodb-storage-backend] | ms | summary |
| `vault.dynamodb.delete` | Duration of a DELETE operation against the [DynamoDB storage backend][dynamodb-storage-backend] | ms | summary |
| `vault.dynamodb.list` | Duration of a LIST operation against the [DynamoDB storage backend][dynamodb-storage-backend] | ms | summary |
| `vault.etcd.put` | Duration of a PUT operation against the [etcd storage backend][etcd-storage-backend] | ms | summary |
| `vault.etcd.get` | Duration of a GET operation against the [etcd storage backend][etcd-storage-backend] | ms | summary |
| `vault.etcd.delete` | Duration of a DELETE operation against the [etcd storage backend][etcd-storage-backend] | ms | summary |
| `vault.etcd.list` | Duration of a LIST operation against the [etcd storage backend][etcd-storage-backend] | ms | summary |
| `vault.gcs.put` | Duration of a PUT operation against the [Google Cloud Storage storage backend][gcs-storage-backend] | ms | summary |
| `vault.gcs.get` | Duration of a GET operation against the [Google Cloud Storage storage backend][gcs-storage-backend] | ms | summary |
| `vault.gcs.delete` | Duration of a DELETE operation against the [Google Cloud Storage storage backend][gcs-storage-backend] | ms | summary |
| `vault.gcs.list` | Duration of a LIST operation against the [Google Cloud Storage storage backend][gcs-storage-backend] | ms | summary |
| `vault.gcs.lock.unlock` | Duration of an UNLOCK operation against the [Google Cloud Storage storage backend][gcs-storage-backend] in HA mode | ms | summary |
| `vault.gcs.lock.lock` | Duration of a LOCK operation against the [Google Cloud Storage storage backend][gcs-storage-backend] in HA mode | ms | summary |
| `vault.gcs.lock.value` | Duration of a VALUE operation against the [Google Cloud Storage storage backend][gcs-storage-backend] in HA mode | ms | summary |
| `vault.mssql.put` | Duration of a PUT operation against the [MS-SQL storage backend][mssql-storage-backend] | ms | summary |
| `vault.mssql.get` | Duration of a GET operation against the [MS-SQL storage backend][mssql-storage-backend] | ms | summary |
| `vault.mssql.delete` | Duration of a DELETE operation against the [MS-SQL storage backend][mssql-storage-backend] | ms | summary |
| `vault.mssql.list` | Duration of a LIST operation against the [MS-SQL storage backend][mssql-storage-backend] | ms | summary |
| `vault.mysql.put` | Duration of a PUT operation against the [MySQL storage backend][mysql-storage-backend] | ms | summary |
| `vault.mysql.get` | Duration of a GET operation against the [MySQL storage backend][mysql-storage-backend] | ms | summary |
| `vault.mysql.delete` | Duration of a DELETE operation against the [MySQL storage backend][mysql-storage-backend] | ms | summary |
| `vault.mysql.list` | Duration of a LIST operation against the [MySQL storage backend][mysql-storage-backend] | ms | summary |
| `vault.postgres.put` | Duration of a PUT operation against the [PostgreSQL storage backend][postgresql-storage-backend] | ms | summary |
| `vault.postgres.get` | Duration of a GET operation against the [PostgreSQL storage backend][postgresql-storage-backend] | ms | summary |
| `vault.postgres.delete` | Duration of a DELETE operation against the [PostgreSQL storage backend][postgresql-storage-backend] | ms | summary |
| `vault.postgres.list` | Duration of a LIST operation against the [PostgreSQL storage backend][postgresql-storage-backend] | ms | summary |
| `vault.s3.put` | Duration of a PUT operation against the [Amazon S3 storage backend][s3-storage-backend] | ms | summary |
| `vault.s3.get` | Duration of a GET operation against the [Amazon S3 storage backend][s3-storage-backend] | ms | summary |
| `vault.s3.delete` | Duration of a DELETE operation against the [Amazon S3 storage backend][s3-storage-backend] | ms | summary |
| `vault.s3.list` | Duration of a LIST operation against the [Amazon S3 storage backend][s3-storage-backend] | ms | summary |
| `vault.spanner.put` | Duration of a PUT operation against the [Google Cloud Spanner storage backend][spanner-storage-backend] | ms | summary |
| `vault.spanner.get` | Duration of a GET operation against the [Google Cloud Spanner storage backend][spanner-storage-backend] | ms | summary |
| `vault.spanner.delete` | Duration of a DELETE operation against the [Google Cloud Spanner storage backend][spanner-storage-backend] | ms | summary |
| `vault.spanner.list` | Duration of a LIST operation against the [Google Cloud Spanner storage backend][spanner-storage-backend] | ms | summary |
| `vault.spanner.lock.unlock` | Duration of an UNLOCK operation against the [Google Cloud Spanner storage backend][spanner-storage-backend] in HA mode | ms | summary |
| `vault.spanner.lock.lock` | Duration of a LOCK operation against the [Google Cloud Spanner storage backend][spanner-storage-backend] in HA mode | ms | summary |
| `vault.spanner.lock.value` | Duration of a VALUE operation against the [Google Cloud Spanner storage backend][gcs-storage-backend] in HA mode | ms | summary |
| `vault.swift.put` | Duration of a PUT operation against the [Swift storage backend][swift-storage-backend] | ms | summary |
| `vault.swift.get` | Duration of a GET operation against the [Swift storage backend][swift-storage-backend] | ms | summary |
| `vault.swift.delete` | Duration of a DELETE operation against the [Swift storage backend][swift-storage-backend] | ms | summary |
| `vault.swift.list` | Duration of a LIST operation against the [Swift storage backend][swift-storage-backend] | ms | summary |
| `vault.zookeeper.put` | Duration of a PUT operation against the [ZooKeeper storage backend][zookeeper-storage-backend] | ms | summary |
| `vault.zookeeper.get` | Duration of a GET operation against the [ZooKeeper storage backend][zookeeper-storage-backend] | ms | summary |
| `vault.zookeeper.delete` | Duration of a DELETE operation against the [ZooKeeper storage backend][zookeeper-storage-backend] | ms | summary |
| `vault.zookeeper.list` | Duration of a LIST operation against the [ZooKeeper storage backend][zookeeper-storage-backend] | ms | summary |
## Integrated Raft Storage Health
These metrics relate to raft based [integrated storage][integrated-storage].
| Metric | Description | Unit | Type |
| :---------------------------------------------- | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | :-------------------------------- | :------ |
| `vault.raft.apply` | Number of Raft transactions occurring over the interval, which is a general indicator of the write load on the Raft servers. | raft transactions / interval | counter |
| `vault.raft.barrier` | Number of times the node has started the barrier i.e the number of times it has issued a blocking call, to ensure that the node has all the pending operations that were queued, to be applied to the node's FSM. | blocks / interval | counter |
| `vault.raft.candidate.electSelf` | Time to request for a vote from a peer. | ms | summary |
| `vault.raft.commitNumLogs` | Number of logs processed for application to the FSM in a single batch. | logs | gauge |
| `vault.raft.commitTime` | Time to commit a new entry to the Raft log on the leader. | ms | timer |
| `vault.raft.compactLogs` | Time to trim the logs that are no longer needed. | ms | summary |
| `vault.raft.delete` | Time to delete file from raft's underlying storage. | ms | summary |
| `vault.raft.delete_prefix` | Time to delete files under a prefix from raft's underlying storage. | ms | summary |
| `vault.raft.fsm.apply` | Number of logs committed since the last interval. | commit logs / interval | summary |
| `vault.raft.fsm.applyBatch` | Time to apply batch of logs. | ms | summary |
| `vault.raft.fsm.applyBatchNum` | Number of logs applied in batch. | ms | summary |
| `vault.raft.fsm.enqueue` | Time to enqueue a batch of logs for the FSM to apply. | ms | timer |
| `vault.raft.fsm.restore` | Time taken by the FSM to restore its state from a snapshot. | ms | summary |
| `vault.raft.fsm.snapshot` | Time taken by the FSM to record the current state for the snapshot. | ms | summary |
| `vault.raft.fsm.store_config` | Time to store the configuration. | ms | summary |
| `vault.raft.get` | Time to retrieve file from raft's underlying storage. | ms | summary |
| `vault.raft.leader.dispatchLog` | Time for the leader to write log entries to disk. | ms | timer |
| `vault.raft.leader.dispatchNumLogs` | Number of logs committed to disk in a batch. | logs | gauge |
| `vault.raft.list` | Time to retrieve list of keys from raft's underlying storage. | ms | summary |
| `vault.raft.put` | Time to persist key in raft's underlying storage. | ms | summary |
| `vault.raft.replication.appendEntries.log` | Number of logs replicated to a node, to bring it up to speed with the leader's logs. | logs appended / interval | counter |
| `vault.raft.replication.appendEntries.rpc` | Time taken by the append entries RFC, to replicate the log entries of a leader node onto its follower node(s). | ms | timer |
| `vault.raft.replication.heartbeat` | Time taken to invoke appendEntries on a peer, so that it doesnt timeout on a periodic basis. | ms | timer |
| `vault.raft.replication.installSnapshot` | Time taken to process the installSnapshot RPC call. This metric should only be seen on nodes which are currently in the follower state. | ms | timer |
| `vault.raft.restore` | Number of times the restore operation has been performed by the node. Here, restore refers to the action of raft consuming an external snapshot to restore its state. | operation invoked / interval | counter |
| `vault.raft.restoreUserSnapshot` | Time taken by the node to restore the FSM state from a user's snapshot. | ms | timer |
| `vault.raft.rpc.appendEntries` | Time taken to process an append entries RPC call from a node. | ms | timer |
| `vault.raft.rpc.appendEntries.processLogs` | Time taken to process the outstanding log entries of a node. | ms | timer |
| `vault.raft.rpc.appendEntries.storeLogs` | Time taken to add any outstanding logs for a node, since the last appendEntries was invoked. | ms | timer |
| `vault.raft.rpc.installSnapshot` | Time taken to process the installSnapshot RPC call. This metric should only be seen on nodes which are currently in the follower state. | ms | timer |
| `vault.raft.rpc.processHeartbeat` | Time taken to process a heartbeat request. | ms | timer |
| `vault.raft.rpc.requestVote` | Time taken to complete requestVote RPC call. | ms | summary |
| `vault.raft.snapshot.create` | Time taken to initialize the snapshot process. | ms | timer |
| `vault.raft.snapshot.persist` | Time taken to dump the current snapshot taken by the node to the disk. | ms | timer |
| `vault.raft.snapshot.takeSnapshot` | Total time involved in taking the current snapshot (creating one and persisting it) by the node. | ms | timer |
| `vault.raft.state.follower` | Number of times node has entered the follower mode. This happens when a new node joins the cluster or after the end of a leader election. | follower state entered / interval | counter |
| `vault.raft.transition.heartbeat_timeout` | Number of times node has transitioned to the Candidate state, after receive no heartbeat messages from the last known leader. | timeouts / interval | counter |
| `vault.raft.transition.leader_lease_timeout` | Number of times quorum of nodes were not able to be contacted. | contact failures | counter |
| `vault.raft.verify_leader` | Number of times node checks whether it is still the leader or not. | checks / interval | counter |
| `vault.raft-storage.delete` | Time to insert log entry to delete path. | ms | timer |
| `vault.raft-storage.get` | Time to retrieve value for path from FSM. | ms | timer |
| `vault.raft-storage.put` | Time to insert log entry to persist path. | ms | timer |
| `vault.raft-storage.list` | Time to list all entries under the prefix from the FSM. | ms | timer |
| `vault.raft-storage.transaction` | Time to insert operations into a single log. | ms | timer |
| `vault.raft-storage.entry_size` | The total size of a Raft entry during log application in bytes. | bytes | sample |
## Integrated Raft Storage Leadership Changes
| Metric | Description | Unit | Type |
| :---------------------------------------------- | :------------------------------------------------------------------------------------------------------------ | :--- | :------ |
| `vault.raft.leader.lastContact` | Measures the time since the leader was last able to contact the follower nodes when checking its leader lease | ms | summary |
| `vault.raft.state.candidate` | Increments whenever raft server starts an election | Elections | counter |
| `vault.raft.state.leader` | Increments whenever raft server becomes a leader | Leaders | counter |
**Why they're important**: Normally, your raft cluster should have a stable
leader. If there are frequent elections or leadership changes, it would likely
indicate network issues between the raft nodes, or that the raft servers
themselves are unable to keep up with the load.
**What to look for**: For a healthy cluster, you're looking for a lastContact
lower than 200ms, leader > 0 and candidate == 0. Deviations from this might
indicate flapping leadership.
[secrets-engines]: /docs/secrets
[storage-backends]: /docs/configuration/storage
[telemetry-stanza]: /docs/configuration/telemetry
[cubbyhole-secrets-engine]: /docs/secrets/cubbyhole
[kv-secrets-engine]: /docs/secrets/kv
[ldap-auth-backend]: /docs/auth/ldap
[token-auth-backend]: /docs/auth/token
[azure-storage-backend]: /docs/configuration/storage/azure
[cassandra-storage-backend]: /docs/configuration/storage/cassandra
[cockroachdb-storage-backend]: /docs/configuration/storage/cockroachdb
[consul-storage-backend]: /docs/configuration/storage/consul
[couchdb-storage-backend]: /docs/configuration/storage/couchdb
[dynamodb-storage-backend]: /docs/configuration/storage/dynamodb
[etcd-storage-backend]: /docs/configuration/storage/etcd
[gcs-storage-backend]: /docs/configuration/storage/google-cloud-storage
[spanner-storage-backend]: /docs/configuration/storage/google-cloud-spanner
[mssql-storage-backend]: /docs/configuration/storage/mssql
[mysql-storage-backend]: /docs/configuration/storage/mysql
[postgresql-storage-backend]: /docs/configuration/storage/postgresql
[s3-storage-backend]: /docs/configuration/storage/s3
[swift-storage-backend]: /docs/configuration/storage/swift
[zookeeper-storage-backend]: /docs/configuration/storage/zookeeper
[integrated-storage]: /docs/configuration/storage/raft