The Vault server process collects various runtime metrics about the performance of different libraries and subsystems. These metrics are aggregated on a ten-second interval and retained for one minute in memory. Telemetry from Vault must be stored in metrics aggregation software to monitor Vault and collect durable metrics.
To view the raw data, you must send a signal to the Vault process: on Unix-style operating systems, this is `USR1`, while on Windows, it is `BREAK`. When the Vault process receives this signal, it will dump the current telemetry information to the process's `stderr`.
Telemetry information can also be streamed directly from Vault to a range of metrics aggregation solutions as described in the [telemetry Stanza][telemetry-stanza] documentation.
- **[C]** is a counter. Counters are cumulative metrics that are incremented when some event occurs, and resets at the end of reporting intervals. Vault retains counters and other metrics for one minute in-memory, so an [aggregation solution][telemetry-stanza] must be configured to see accurate and persistent counters over time.
- **[S]** is a summary. Summaries provide sample observations of values. Vault commonly uses summaries for measuring the timing duration of discrete events in the reporting interval.
The following sections describe the available Vault metrics. The metrics interval are approximately 10 seconds when manually triggering metrics output using the above-described signals. Some high-cardinality gauges, like `vault.kv.secret.count`, are emitted every 10 minutes, or at an interval configured in the `telemetry` stanza.
Some Vault metrics come with additional [labels](#metric-labels) describing the measurement in more detail, such as the namespace in which an operation takes place or the auth method used to create a token. This additional information is incorporated into the metrics name in the in-memory telemetry or other telemetry engines that do not support labels. The metric name in the table below is followed by a list of labels supported, in the order in which they appear, if flattened.
| `vault.audit.log_request_failure` | Number of audit log request failures. **NOTE**: This is a crucial metric. A non-zero value here indicates that there was a failure to send an audit log request to a configured audit log devices occured. If Vault cannot log into a configured audit log device, it ceases all user operations. When this metric increases regularly, it is suggested to troubleshoot the audit log devices immediately. | failures | counter |
| `vault.audit.log_response_failure` | Number of audit log response failures. **NOTE**: This is a crucial metric. A non-zero value here indicates that there was a failure to receive a response to a request made to one of the configured audit log devices occured. When Vault cannot log to a configured audit log devices, it ceases all user operations. Troubleshooting the audit log devices is suggested when a consistent value of this metric is evaluated. | failures | counter |
**NOTE:** In addition, there are audit metrics for each enabled audit device represented as `vault.audit.<type>.log_request`. For example, if a file audit device is enabled, its metrics would be `vault.audit.file.log_request` and `vault.audit.file.log_response` .
## Core Metrics
These metrics represent operational aspects of the running Vault instance.
| `vault.core.leadership_setup_failed` | Duration of time taken by cluster leadership setup failures which have occurred in a highly available Vault cluster. This should be monitored and alerted on for overall cluster leadership status. | ms | summary |
| `vault.core.leadership_lost` | The total duration that a HA cluster node maintained leadership as reported at the last time of loss. Any count greater than zero means that a leadership change has occurred. Continuing changes or reports of low value could be a cause for monitoring alerts as they would typically imply ongoing flapping of leadership that may rotate between nodes. | ms | summary |
| `vault.core.license.expiration_time_epoch` | Time as epoch (seconds since Jan 1 1970) at which license will expire. | seconds | gauge |
| `vault.core.mount_table.num_entries` | Number of mounts in a particular mount table. This metric is labeled by table type (auth or logical) and whether or not the table is replicated (local or not) | objects | gauge |
| `vault.core.mount_table.size` | Size of a particular mount table. This metric is labeled by table type (auth or logical) and whether or not the table is replicated (local or not) | bytes | gauge |
| `vault.core.step_down` | Duration of time taken by cluster leadership step downs. This should be monitored, and alerts set for overall cluster leadership status. | ms | summary |
| `vault.rollback.attempt.<mountpoint>` | Time taken to perform a rollback operation on the given mount point. The mount point name has its forward slashes `/` replaced by `-`. For example, a rollback operation on the `auth/token` backend is reported as `vault.rollback.attempt.auth-token-`. | ms | summary |
| `vault.route.create.<mountpoint>` | Time taken to dispatch a create operation to a backend, and for that backend to process it. The mount point name has its forward slashes `/` replaced by `-`. For example, a create operation to `ns1/secret/` would have corresponding metric `vault.route.create.ns1-secret-`. The number of samples of this metric, and the corresponding ones for other operations below, indicates how many operations were performed per mount point. | ms | summary |
| `vault.route.delete.<mountpoint>` | Time taken to dispatch a delete operation to a backend, and for that backend to process it. | ms | summary |
| `vault.route.list.<mountpoint>` | Time taken to dispatch a list operation to a backend, and for that backend to process it. | ms | summary |
| `vault.route.read.<mountpoint>` | Time taken to dispatch a read operation to a backend, and for that backend to process it. | ms | summary |
| `vault.route.rollback.<mountpoint>` | Time taken to dispatch a rollback operation to a backend, and for that backend to process it. Rollback operations are automatically scheduled to clean up partial errors. | ms | summary |
## Runtime Metrics
These metrics collect information from Vault's Go runtime, such as memory usage information.
| `vault.runtime.alloc_bytes` | Number of bytes allocated by the Vault process. The number of bytes may peak from time to time, but should return to a steady state value. | bytes | gauge |
| `vault.runtime.free_count` | Number of freed objects | objects | gauge |
| `vault.runtime.heap_objects` | Number of objects on the heap. This is a good general memory pressure indicator worth establishing a baseline and thresholds for alerting. | objects | gauge |
| `vault.runtime.num_goroutines` | Number of go routines. This serves as a general system load indicator worth establishing a baseline and thresholds for alerting. | go routines | gauge |
| `vault.runtime.sys_bytes` | Number of bytes allocated to Vault. This includes what is being used by Vault's heap and what has been reclaimed but not given back to the operating system. | bytes | gauge |
| `vault.runtime.total_gc_pause_ns` | The total garbage collector pause time since Vault was last started | ns | gauge |
| `vault.expire.leases.by_expiration` (cluster,gauge,expiring,namespace) | The number of leases set to expire, grouped by a time interval. This specific time interval and the total number of time intervals are configurable via `lease_metrics_epsilon` and `num_lease_metrics_buckets` in the telemetry stanza of a vault server configuration. The default values for these are `1hr` and `168` respectively, so the metric will report the number of leases that will expire each hour from the current time to a week from the present time. You can additionally group lease expiration by namespace by setting `add_lease_metrics_namespace_labels` to `true` in the config file (default is `false`). | leases | gauge |
| `vault.identity.num_entities` | The number of identity entities stored in Vault | entities | gauge |
| `vault.identity.entity.active.monthly` (cluster, namespace) | The number of distinct entities that created a token during the past month, per namespace. Only available if client count is enabled. Reported at the start of each month. | entities | gauge |
| `vault.identity.entity.active.partial_month` (cluster) | The total number of distinct entities that has created a token during the current month. Only available if client count is enabled. Reported periodically within each month. | entities | gauge |
| `vault.identity.entity.active.reporting_period` (cluster, namespace) | The client count default reporting period defines the number of distinct entities that created a token in the past N months, as defined by the client count default reporting period. Only available if client count is enabled. Reported at the start of each month. | entities | gauge |
| `vault.identity.entity.alias.count` (cluster, namespace, auth_method, mount_point) | The number of identity entities aliases stored in Vault, grouped by the auth mount that created them. This gauge is computed every 10 minutes. | aliases | gauge |
| `vault.identity.entity.count` (cluster, namespace) | The number of identity entities stored in Vault, grouped by namespace. | entities | gauge |
| `vault.identity.entity.creation` (cluster, namespace, auth_method, mount_point) | The number of identity entities created, grouped by the auth mount that created them. | entities | counter |
| `vault.identity.upsert_entity_txn` | Time taken to insert a new or modified entity into the in-memory database, and persist it to storage. | ms | summary |
| `vault.identity.upsert_group_txn` | Time taken to insert a new or modified group into the in-memory database, and persist it to storage. This operation is performed on group membership changes. | ms | summary |
| `vault.token.count` (cluster, namespace) | Number of service tokens available for use; counts all un-expired and un-revoked tokens in Vault's token store. This measurement is performed every 10 minutes. | token | gauge |
| `vault.token.count.by_auth` (cluster, namespace, auth_method) | Number of service tokens that were created by a particular auth method. | tokens | gauge |
| `vault.token.count.by_policy` (cluster, namespace, policy) | Number of service tokens that have a particular policy attached. If a token has more than one policy, it is counted in each policy gauge. | tokens | gauge |
| `vault.token.count.by_ttl` (cluster, namespace, creation_ttl) | Number of service tokens, grouped by the TTL range they were assigned at creation. | tokens | gauge |
| `vault.token.create` | The time taken to create a token | ms | summary |
| `vault.token.create_root` | Number of created root tokens. Does not decrease on revocation. | tokens | counter |
| `vault.token.createAccessor` | The time taken to create a token accessor | ms | summary |
| `vault.token.creation` (cluster, namespace, auth_method, mount_point, creation_ttl, token_type) | Number of service or batch tokens created. | tokens | counter |
| `vault.token.lookup` | The time taken to look up a token | ms | summary |
| `vault.token.revoke` | Time taken to revoke a token | ms | summary |
| `vault.token.revoke-tree` | Time taken to revoke a token tree | ms | summary |
| `vault.token.store` | Time taken to store an updated token entry without writing to the secondary index | ms | summary |
## Resource Quota Metrics
These metrics relate to rate limit and lease count quotas. Each metric comes with a label "name" identifying the specific quota.
These metrics relate to [Vault Enterprise Replication](/docs/enterprise/replication). The following metrics are not available in telemetry unless replication is in an unhealthy state: `replication.fetchRemoteKeys`, `replication.merkleDiff`, and `replication.merkleSync`.
| `vault.logshipper.streamWALs.missing_guard` | Number of incidences where the starting Merkle Tree index used to begin streaming WAL entries is not matched/found | missing guards | counter |
| `vault.logshipper.streamWALs.guard_found` | Number of incidences where the starting Merkle Tree index used to begin streaming WAL entries is matched/found | found guards | counter |
| `vault.logshipper.streamWALs.scanned_entries` | Number of entries scanned in the buffer before the right one was found. | scanned entries | summary |
| `vault.logshipper.buffer.length` | Current length of the log shipper buffer | buffer entries | gauge |
| `vault.logshipper.buffer.size` | Current size in bytes of the log shipper buffer | bytes | gauge |
| `vault.logshipper.buffer.max_length` | Maximum length of the log shipper buffer | buffer entries | gauge |
| `vault.logshipper.buffer.max_size` | Maximum size in bytes of the log shipper buffer | bytes | gauge |
| `vault.replication.fetchRemoteKeys` | Time taken to fetch keys from a remote cluster participating in replication prior to Merkle Tree based delta generation | ms | summary |
| `vault.replication.merkleDiff` | Time taken to perform a Merkle Tree based delta generation between the clusters participating in replication | ms | summary |
| `vault.replication.merkleSync` | Time taken to perform a Merkle Tree based synchronization using the last delta generated between the clusters participating in replication | ms | summary |
| `vault.replication.merkle.commit_index` | The last committed index in the Merkle Tree. | sequence number | gauge |
| `vault.replication.wal.last_wal` | The index of the last WAL | sequence number | gauge |
| `vault.replication.wal.last_dr_wal` | The index of the last DR WAL | sequence number | gauge |
| `vault.replication.wal.last_performance_wal` | The index of the last Performance WAL | sequence number | gauge |
| `vault.replication.fsm.last_remote_wal` | The index of the last remote WAL | sequence number | gauge |
| `vault.replication.wal.gc` | Time taken to complete one run of the WAL garbage collection process | ms | summary |
| `vault.replication.rpc.server.auth_request` | Duration of time taken by auth request | ms | summary |
| `vault.replication.rpc.server.bootstrap_request` | Duration of time taken by bootstrap request | ms | summary |
| `vault.replication.rpc.server.conflicting_pages_request` | Duration of time taken by conflicting pages request | ms | summary |
| `vault.replication.rpc.server.echo` | Duration of time taken by echo | ms | summary |
| `database.Initialize` | Time taken to initialize a database secret engine across all database secrets engines | ms | summary |
| `database.<name>.Initialize` | Time taken to initialize a database secret engine for the named database secrets engine `<name>`, for example: `database.postgresql-prod.Initialize` | ms | summary |
| `database.Initialize.error` | Number of database secrets engine initialization operation errors across all database secrets engines | errors | counter |
| `database.<name>.Initialize.error` | Number of database secrets engine initialization operation errors for the named database secrets engine `<name>`, for example: `database.postgresql-prod.Initialize.error` | errors | counter |
| `database.Close` | Time taken to close a database secret engine across all database secrets engines | ms | summary |
| `database.<name>.Close` | Time taken to close a database secret engine for the named database secrets engine `<name>`, for example: `database.postgresql-prod.Close` | ms | summary |
| `database.Close.error` | Number of database secrets engine close operation errors across all database secrets engines | errors | counter |
| `database.<name>.Close.error` | Number of database secrets engine close operation errors for the named database secrets engine `<name>`, for example: `database.postgresql-prod.Close.error` | errors | counter |
| `database.CreateUser` | Time taken to create a user across all database secrets engines | ms | summary |
| `database.<name>.CreateUser` | Time taken to create a user for the named database secrets engine `<name>` | ms | summary |
| `database.CreateUser.error` | Number of user creation operation errors across all database secrets engines | errors | counter |
| `database.<name>.CreateUser.error` | Number of user creation operation errors for the named database secrets engine `<name>`, for example: `database.postgresql-prod.CreateUser.error` | errors | counter |
| `database.RenewUser` | Time taken to renew a user across all database secrets engines | ms | summary |
| `database.<name>.RenewUser` | Time taken to renew a user for the named database secrets engine `<name>`, for example: `database.postgresql-prod.RenewUser` | ms | summary |
| `database.RenewUser.error` | Number of user renewal operation errors across all database secrets engines | errors | counter |
| `database.<name>.RenewUser.error` | Number of user renewal operations for the named database secrets engine `<name>`, for example: `database.postgresql-prod.RenewUser.error` | errors | counter |
| `database.RevokeUser` | Time taken to revoke a user across all database secrets engines | ms | summary |
| `database.<name>.RevokeUser` | Time taken to revoke a user for the named database secrets engine `<name>`, for example: `database.postgresql-prod.RevokeUser` | ms | summary |
| `database.RevokeUser.error` | Number of user revocation operation errors across all database secrets engines | errors | counter |
| `database.<name>.RevokeUser.error` | Number of user revocation operations for the named database secrets engine `<name>`, for example: `database.postgresql-prod.RevokeUser.error` | errors | counter |
| `secrets.pki.tidy.cert_store_current_entry` | The index of the current entry in the certificate store being verified by the tidy operation | entry index | gauge |
| `secrets.pki.tidy.cert_store_deleted_count` | Number of entries deleted from the certificate store | entry | counter |
| `secrets.pki.tidy.cert_store_total_entries` | Number of entries in the certificate store to verify during the tidy operation | entry | gauge |
| `secrets.pki.tidy.cert_store_total_entries_remaining` | Number of entries in the certificate store that are left after the tidy operation (checked but not removed). | entry | gauge |
| `secrets.pki.tidy.duration` | Duration of time taken by the PKI tidy operation | ms | summary |
| `secrets.pki.tidy.failure` | Number of times the PKI tidy operation has not completed due to errors | operations | counter |
| `secrets.pki.tidy.revoked_cert_current_entry` | The index of the current revoked certificate entry in the certificate store being verified by the tidy operation | entry index | gauge |
| `secrets.pki.tidy.revoked_cert_deleted_count` | Number of entries deleted from the certificate store for revoked certificates | entry | counter |
| `secrets.pki.tidy.revoked_cert_total_entries` | Number of entries in the certificate store for revoked certificates to verify during the tidy operation | entry | gauge |
| `secrets.pki.tidy.revoked_cert_total_entries_remaining` | Number of entries in the certificate store for revoked certificates that are left after the tidy operation (checked but not removed). | entry | gauge |
| `secrets.pki.tidy.revoked_cert_total_entries_incorrect_issuers` | Number of entries in the certificate store which had incorrect issuer information (total). | entry | gauge |
| `secrets.pki.tidy.revoked_cert_total_entries_fixed_issuers` | Number of entries in the certificate store which had incorrect issuer information that was fixed during this tidy operation. | entry | gauge |
| `secrets.pki.tidy.start_time_epoch` | Start time (as seconds since Jan 1 1970) when the PKI tidy operation is active, 0 otherwise | seconds | gauge |
| `vault.secret.kv.count` (cluster, namespace, mount_point) | Number of entries in each key-value secret engine. | paths | gauge |
| `vault.secret.lease.creation` (cluster, namespace, secret_engine, mount_point, creation_ttl) | Counts the number of leases created by secret engines. | leases | counter |
| `vault.azure.put` | Duration of a PUT operation against the [Azure storage backend][azure-storage-backend] | ms | summary |
| `vault.azure.get` | Duration of a GET operation against the [Azure storage backend][azure-storage-backend] | ms | summary |
| `vault.azure.delete` | Duration of a DELETE operation against the [Azure storage backend][azure-storage-backend] | ms | summary |
| `vault.azure.list` | Duration of a LIST operation against the [Azure storage backend][azure-storage-backend] | ms | summary |
| `vault.cassandra.put` | Duration of a PUT operation against the [Cassandra storage backend][cassandra-storage-backend] | ms | summary |
| `vault.cassandra.get` | Duration of a GET operation against the [Cassandra storage backend][cassandra-storage-backend] | ms | summary |
| `vault.cassandra.delete` | Duration of a DELETE operation against the [Cassandra storage backend][cassandra-storage-backend] | ms | summary |
| `vault.cassandra.list` | Duration of a LIST operation against the [Cassandra storage backend][cassandra-storage-backend] | ms | summary |
| `vault.cockroachdb.put` | Duration of a PUT operation against the [CockroachDB storage backend][cockroachdb-storage-backend] | ms | summary |
| `vault.cockroachdb.get` | Duration of a GET operation against the [CockroachDB storage backend][cockroachdb-storage-backend] | ms | summary |
| `vault.cockroachdb.delete` | Duration of a DELETE operation against the [CockroachDB storage backend][cockroachdb-storage-backend] | ms | summary |
| `vault.cockroachdb.list` | Duration of a LIST operation against the [CockroachDB storage backend][cockroachdb-storage-backend] | ms | summary |
| `vault.consul.put` | Duration of a PUT operation against the [Consul storage backend][consul-storage-backend] | ms | summary |
| `vault.consul.transaction` | Duration of a Txn operation against the [Consul storage backend][consul-storage-backend] | ms | summary |
| `vault.consul.get` | Duration of a GET operation against the [Consul storage backend][consul-storage-backend] | ms | summary |
| `vault.consul.delete` | Duration of a DELETE operation against the [Consul storage backend][consul-storage-backend] | ms | summary |
| `vault.consul.list` | Duration of a LIST operation against the [Consul storage backend][consul-storage-backend] | ms | summary |
| `vault.couchdb.put` | Duration of a PUT operation against the [CouchDB storage backend][couchdb-storage-backend] | ms | summary |
| `vault.couchdb.get` | Duration of a GET operation against the [CouchDB storage backend][couchdb-storage-backend] | ms | summary |
| `vault.couchdb.delete` | Duration of a DELETE operation against the [CouchDB storage backend][couchdb-storage-backend] | ms | summary |
| `vault.couchdb.list` | Duration of a LIST operation against the [CouchDB storage backend][couchdb-storage-backend] | ms | summary |
| `vault.dynamodb.put` | Duration of a PUT operation against the [DynamoDB storage backend][dynamodb-storage-backend] | ms | summary |
| `vault.dynamodb.get` | Duration of a GET operation against the [DynamoDB storage backend][dynamodb-storage-backend] | ms | summary |
| `vault.dynamodb.delete` | Duration of a DELETE operation against the [DynamoDB storage backend][dynamodb-storage-backend] | ms | summary |
| `vault.dynamodb.list` | Duration of a LIST operation against the [DynamoDB storage backend][dynamodb-storage-backend] | ms | summary |
| `vault.etcd.put` | Duration of a PUT operation against the [etcd storage backend][etcd-storage-backend] | ms | summary |
| `vault.etcd.get` | Duration of a GET operation against the [etcd storage backend][etcd-storage-backend] | ms | summary |
| `vault.etcd.delete` | Duration of a DELETE operation against the [etcd storage backend][etcd-storage-backend] | ms | summary |
| `vault.etcd.list` | Duration of a LIST operation against the [etcd storage backend][etcd-storage-backend] | ms | summary |
| `vault.gcs.put` | Duration of a PUT operation against the [Google Cloud Storage storage backend][gcs-storage-backend] | ms | summary |
| `vault.gcs.get` | Duration of a GET operation against the [Google Cloud Storage storage backend][gcs-storage-backend] | ms | summary |
| `vault.gcs.delete` | Duration of a DELETE operation against the [Google Cloud Storage storage backend][gcs-storage-backend] | ms | summary |
| `vault.gcs.list` | Duration of a LIST operation against the [Google Cloud Storage storage backend][gcs-storage-backend] | ms | summary |
| `vault.gcs.lock.unlock` | Duration of an UNLOCK operation against the [Google Cloud Storage storage backend][gcs-storage-backend] in HA mode | ms | summary |
| `vault.gcs.lock.lock` | Duration of a LOCK operation against the [Google Cloud Storage storage backend][gcs-storage-backend] in HA mode | ms | summary |
| `vault.gcs.lock.value` | Duration of a VALUE operation against the [Google Cloud Storage storage backend][gcs-storage-backend] in HA mode | ms | summary |
| `vault.mssql.put` | Duration of a PUT operation against the [MS-SQL storage backend][mssql-storage-backend] | ms | summary |
| `vault.mssql.get` | Duration of a GET operation against the [MS-SQL storage backend][mssql-storage-backend] | ms | summary |
| `vault.mssql.delete` | Duration of a DELETE operation against the [MS-SQL storage backend][mssql-storage-backend] | ms | summary |
| `vault.mssql.list` | Duration of a LIST operation against the [MS-SQL storage backend][mssql-storage-backend] | ms | summary |
| `vault.mysql.put` | Duration of a PUT operation against the [MySQL storage backend][mysql-storage-backend] | ms | summary |
| `vault.mysql.get` | Duration of a GET operation against the [MySQL storage backend][mysql-storage-backend] | ms | summary |
| `vault.mysql.delete` | Duration of a DELETE operation against the [MySQL storage backend][mysql-storage-backend] | ms | summary |
| `vault.mysql.list` | Duration of a LIST operation against the [MySQL storage backend][mysql-storage-backend] | ms | summary |
| `vault.postgres.put` | Duration of a PUT operation against the [PostgreSQL storage backend][postgresql-storage-backend] | ms | summary |
| `vault.postgres.get` | Duration of a GET operation against the [PostgreSQL storage backend][postgresql-storage-backend] | ms | summary |
| `vault.postgres.delete` | Duration of a DELETE operation against the [PostgreSQL storage backend][postgresql-storage-backend] | ms | summary |
| `vault.postgres.list` | Duration of a LIST operation against the [PostgreSQL storage backend][postgresql-storage-backend] | ms | summary |
| `vault.s3.put` | Duration of a PUT operation against the [Amazon S3 storage backend][s3-storage-backend] | ms | summary |
| `vault.s3.get` | Duration of a GET operation against the [Amazon S3 storage backend][s3-storage-backend] | ms | summary |
| `vault.s3.delete` | Duration of a DELETE operation against the [Amazon S3 storage backend][s3-storage-backend] | ms | summary |
| `vault.s3.list` | Duration of a LIST operation against the [Amazon S3 storage backend][s3-storage-backend] | ms | summary |
| `vault.spanner.put` | Duration of a PUT operation against the [Google Cloud Spanner storage backend][spanner-storage-backend] | ms | summary |
| `vault.spanner.get` | Duration of a GET operation against the [Google Cloud Spanner storage backend][spanner-storage-backend] | ms | summary |
| `vault.spanner.delete` | Duration of a DELETE operation against the [Google Cloud Spanner storage backend][spanner-storage-backend] | ms | summary |
| `vault.spanner.list` | Duration of a LIST operation against the [Google Cloud Spanner storage backend][spanner-storage-backend] | ms | summary |
| `vault.spanner.lock.unlock` | Duration of an UNLOCK operation against the [Google Cloud Spanner storage backend][spanner-storage-backend] in HA mode | ms | summary |
| `vault.spanner.lock.lock` | Duration of a LOCK operation against the [Google Cloud Spanner storage backend][spanner-storage-backend] in HA mode | ms | summary |
| `vault.spanner.lock.value` | Duration of a VALUE operation against the [Google Cloud Spanner storage backend][gcs-storage-backend] in HA mode | ms | summary |
| `vault.swift.put` | Duration of a PUT operation against the [Swift storage backend][swift-storage-backend] | ms | summary |
| `vault.swift.get` | Duration of a GET operation against the [Swift storage backend][swift-storage-backend] | ms | summary |
| `vault.swift.delete` | Duration of a DELETE operation against the [Swift storage backend][swift-storage-backend] | ms | summary |
| `vault.swift.list` | Duration of a LIST operation against the [Swift storage backend][swift-storage-backend] | ms | summary |
| `vault.zookeeper.put` | Duration of a PUT operation against the [ZooKeeper storage backend][zookeeper-storage-backend] | ms | summary |
| `vault.zookeeper.get` | Duration of a GET operation against the [ZooKeeper storage backend][zookeeper-storage-backend] | ms | summary |
| `vault.zookeeper.delete` | Duration of a DELETE operation against the [ZooKeeper storage backend][zookeeper-storage-backend] | ms | summary |
| `vault.zookeeper.list` | Duration of a LIST operation against the [ZooKeeper storage backend][zookeeper-storage-backend] | ms | summary |
| `vault.raft.apply` | Number of Raft transactions occurring over the interval, which is a general indicator of the write load on the Raft servers. | raft transactions / interval | counter |
| `vault.raft.barrier` | Number of times the node has started the barrier i.e the number of times it has issued a blocking call, to ensure that the node has all the pending operations that were queued, to be applied to the node's FSM. | blocks / interval | counter |
| `vault.raft.candidate.electSelf` | Time to request for a vote from a peer. | ms | summary |
| `vault.raft.commitNumLogs` | Number of logs processed for application to the FSM in a single batch. | logs | gauge |
| `vault.raft.commitTime` | Time to commit a new entry to the Raft log on the leader. | ms | timer |
| `vault.raft.compactLogs` | Time to trim the logs that are no longer needed. | ms | summary |
| `vault.raft.delete` | Time to delete file from raft's underlying storage. | ms | summary |
| `vault.raft.delete_prefix` | Time to delete files under a prefix from raft's underlying storage. | ms | summary |
| `vault.raft.fsm.apply` | Number of logs committed since the last interval. | commit logs / interval | summary |
| `vault.raft.fsm.applyBatch` | Time to apply batch of logs. | ms | summary |
| `vault.raft.fsm.applyBatchNum` | Number of logs applied in batch. | ms | summary |
| `vault.raft.fsm.enqueue` | Time to enqueue a batch of logs for the FSM to apply. | ms | timer |
| `vault.raft.fsm.restore` | Time taken by the FSM to restore its state from a snapshot. | ms | summary |
| `vault.raft.fsm.snapshot` | Time taken by the FSM to record the current state for the snapshot. | ms | summary |
| `vault.raft.fsm.store_config` | Time to store the configuration. | ms | summary |
| `vault.raft.get` | Time to retrieve file from raft's underlying storage. | ms | summary |
| `vault.raft.leader.dispatchLog` | Time for the leader to write log entries to disk. | ms | timer |
| `vault.raft.leader.dispatchNumLogs` | Number of logs committed to disk in a batch. | logs | gauge |
| `vault.raft.list` | Time to retrieve list of keys from raft's underlying storage. | ms | summary |
| `vault.raft.peers` | Number of peers in the raft cluster configuration. | peers | gauge |
| `vault.raft.put` | Time to persist key in raft's underlying storage. | ms | summary |
| `vault.raft.replication.appendEntries.log` | Number of logs replicated to a node, to bring it up to speed with the leader's logs. | logs appended / interval | counter |
| `vault.raft.replication.appendEntries.rpc` | Time taken by the append entries RFC, to replicate the log entries of a leader node onto its follower node(s). | ms | timer |
| `vault.raft.replication.heartbeat` | Time taken to invoke appendEntries on a peer, so that it doesn’t timeout on a periodic basis. | ms | timer |
| `vault.raft.replication.installSnapshot` | Time taken to process the installSnapshot RPC call. This metric should only be seen on nodes which are currently in the follower state. | ms | timer |
| `vault.raft.restore` | Number of times the restore operation has been performed by the node. Here, restore refers to the action of raft consuming an external snapshot to restore its state. | operation invoked / interval | counter |
| `vault.raft.restoreUserSnapshot` | Time taken by the node to restore the FSM state from a user's snapshot. | ms | timer |
| `vault.raft.rpc.appendEntries` | Time taken to process an append entries RPC call from a node. | ms | timer |
| `vault.raft.rpc.appendEntries.processLogs` | Time taken to process the outstanding log entries of a node. | ms | timer |
| `vault.raft.rpc.appendEntries.storeLogs` | Time taken to add any outstanding logs for a node, since the last appendEntries was invoked. | ms | timer |
| `vault.raft.rpc.installSnapshot` | Time taken to process the installSnapshot RPC call. This metric should only be seen on nodes which are currently in the follower state. | ms | timer |
| `vault.raft.rpc.processHeartbeat` | Time taken to process a heartbeat request. | ms | timer |
| `vault.raft.rpc.requestVote` | Time taken to complete requestVote RPC call. | ms | summary |
| `vault.raft.snapshot.create` | Time taken to initialize the snapshot process. | ms | timer |
| `vault.raft.snapshot.persist` | Time taken to dump the current snapshot taken by the node to the disk. | ms | timer |
| `vault.raft.snapshot.takeSnapshot` | Total time involved in taking the current snapshot (creating one and persisting it) by the node. | ms | timer |
| `vault.raft.state.follower` | Number of times node has entered the follower mode. This happens when a new node joins the cluster or after the end of a leader election. | follower state entered / interval | counter |
| `vault.raft.transition.heartbeat_timeout` | Number of times node has transitioned to the Candidate state, after receiving no heartbeat messages from the last known leader. | timeouts / interval | counter |
| `vault.autopilot.failure_tolerance` | How many nodes can be lost while maintaining quorum, i.e., number of healthy nodes in excess of quorum | nodes | gauge |
| `vault.raft.leader.lastContact` | Measures the time since the leader was last able to contact the follower nodes when checking its leader lease | ms | summary |
| `vault.raft.state.candidate` | Increments whenever raft server starts an election | Elections | counter |
| `vault.raft.state.leader` | Increments whenever raft server becomes a leader | Leaders | counter |
**Why are they vital?**: If frequent elections or leadership changes occur, it would likely indicate network issues between the raft nodes or the raft servers cannot keep up with the load.
| `vault.autosnapshots.total.snapshot.size` | For storage_type=local, space on disk used by saved snapshots | bytes | gauge |
| `vault.autosnapshots.percent.maxspace.used` | For storage_type=local, percent used of maximum allocated space | percentage | gauge |
| `vault.autosnapshots.save.errors` | Increments whenever an error occurs trying to save a snapshot | n/a | counter |
| `vault.autosnapshots.save.duration` | Measures the time taken saving a snapshot | ms | summary |
| `vault.autosnapshots.last.success.time` | Epoch time (seconds since 1970/01/01) of last successful snapshot save | n/a | gauge |
| `vault.autosnapshots.snapshot.size` | Measures the size in bytes of snapshots | bytes | summary |
| `vault.autosnapshots.rotate.duration` | Measures the time taken to rotate (i.e. delete) old snapshots to satisfy configured retention | ms | summary |
| `vault.autosnapshots.snapshots.in.storage` | Number of snapshots in storage | n/a | gauge |
| `auth_method` | Authorization engine type . | `userpass` |
| `cluster` | The cluster name from which the metric originated; set in the configuration file, or automatically generated when a cluster is create | `vault-cluster-d54ad07` |
| `creation_ttl` | Time-to-live value assigned to a token or lease at creation. This value is rounded up to the next-highest bucket; the available buckets are `1m`, `10m`, `20m`, `1h`, `2h`, `1d`, `2d`, `7d`, and `30d`. Any longer TTL is assigned the value `+Inf`. | `7d` |
| `mount_point` | Path at which an auth method or secret engine is mounted. | `auth/userpass/` |
| `namespace` | A namespace path, or `root` for the root namespace | `ns1` |
| `policy` | A single named policy | `default` |
| `secret_engine` | The [secret engine][secrets-engine] type. | `aws` |
| `token_type` | Identifies whether the token is a batch token or a service token. | `service` |