---
layout: docs
page_title: Metrics Reference
description: Learn about the different metrics available in Nomad.
---

# Metrics Reference

The Nomad agent collects various runtime metrics about the performance of
different libraries and subsystems. These metrics are aggregated on a ten-second
interval and are retained for one minute.

This data can be accessed through an HTTP endpoint or by sending a signal to
the Nomad process. Over HTTP, it is available at `/v1/metrics`. See
[Metrics](/api-docs/metrics) for more information.

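The following is a minimal Python sketch of reading this endpoint. It assumes a local agent at the default address `http://127.0.0.1:4646` with ACLs and TLS disabled, and that the JSON response has a top-level `Gauges` list whose entries carry `Name` and `Value` fields. `fetch_metrics` and `gauges_by_name` are hypothetical helper names, not part of any Nomad SDK.

```python
import json
import urllib.request

# Assumption: default local agent address, no ACL token and no TLS.
NOMAD_ADDR = "http://127.0.0.1:4646"


def fetch_metrics(addr=NOMAD_ADDR):
    """Fetch the JSON metrics summary from the agent's /v1/metrics endpoint."""
    with urllib.request.urlopen(f"{addr}/v1/metrics", timeout=5) as resp:
        return json.load(resp)


def gauges_by_name(summary):
    """Map each gauge name to its value, assuming a top-level "Gauges" list.

    Tagged gauges that share a name but differ by labels collapse to the
    last entry seen; inspect their labels when that distinction matters.
    """
    return {g["Name"]: g["Value"] for g in summary.get("Gauges", [])}
```

For example, `gauges_by_name(fetch_metrics()).get("nomad.runtime.num_goroutines")` would return the agent's current goroutine count under these assumptions.
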
To view this data by sending a signal to the Nomad process, send `USR1` on
Unix or `BREAK` on Windows. Once Nomad receives the signal, it dumps the
current telemetry information to the agent's `stderr`.

This telemetry information can be used for debugging or otherwise getting a
better view of what Nomad is doing.

Telemetry information can be streamed to both [statsite](https://github.com/armon/statsite)
and statsd by providing the appropriate configuration options.

To configure the telemetry output, see the [agent
configuration](/docs/configuration/telemetry).

Below is sample output of a telemetry dump:

```text
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.nomad.broker.total_blocked': 0.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.nomad.plan.queue_depth': 0.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.malloc_count': 7568.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.total_gc_runs': 8.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.nomad.broker.total_ready': 0.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.num_goroutines': 56.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.sys_bytes': 3999992.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.heap_objects': 4135.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.nomad.heartbeat.active': 1.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.nomad.broker.total_unacked': 0.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.nomad.broker.total_waiting': 0.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.alloc_bytes': 634056.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.free_count': 3433.000
[2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.total_gc_pause_ns': 6572135.000
[2015-09-17 16:59:40 -0700 PDT][C] 'nomad.memberlist.msg.alive': Count: 1 Sum: 1.000
[2015-09-17 16:59:40 -0700 PDT][C] 'nomad.serf.member.join': Count: 1 Sum: 1.000
[2015-09-17 16:59:40 -0700 PDT][C] 'nomad.raft.barrier': Count: 1 Sum: 1.000
[2015-09-17 16:59:40 -0700 PDT][C] 'nomad.raft.apply': Count: 1 Sum: 1.000
[2015-09-17 16:59:40 -0700 PDT][C] 'nomad.nomad.rpc.query': Count: 2 Sum: 2.000
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.serf.queue.Query': Count: 6 Sum: 0.000
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.nomad.fsm.register_node': Count: 1 Sum: 1.296
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.serf.queue.Intent': Count: 6 Sum: 0.000
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.runtime.gc_pause_ns': Count: 8 Min: 126492.000 Mean: 821516.875 Max: 3126670.000 Stddev: 1139250.294 Sum: 6572135.000
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.raft.leader.dispatchLog': Count: 3 Min: 0.007 Mean: 0.018 Max: 0.039 Stddev: 0.018 Sum: 0.054
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.nomad.leader.reconcileMember': Count: 1 Sum: 0.007
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.nomad.leader.reconcile': Count: 1 Sum: 0.025
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.raft.fsm.apply': Count: 1 Sum: 1.306
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.nomad.client.get_allocs': Count: 1 Sum: 0.110
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.nomad.worker.dequeue_eval': Count: 29 Min: 0.003 Mean: 363.426 Max: 503.377 Stddev: 228.126 Sum: 10539.354
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.serf.queue.Event': Count: 6 Sum: 0.000
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.raft.commitTime': Count: 3 Min: 0.013 Mean: 0.037 Max: 0.079 Stddev: 0.037 Sum: 0.110
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.nomad.leader.barrier': Count: 1 Sum: 0.071
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.nomad.client.register': Count: 1 Sum: 1.626
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.nomad.eval.dequeue': Count: 21 Min: 500.610 Mean: 501.753 Max: 503.361 Stddev: 1.030 Sum: 10536.813
[2015-09-17 16:59:40 -0700 PDT][S] 'nomad.memberlist.gossip': Count: 12 Min: 0.009 Mean: 0.017 Max: 0.025 Stddev: 0.005 Sum: 0.204
```

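To post-process a dump captured from `stderr`, a small parser is enough. The Python sketch below assumes every line has the shape shown in the sample above: a bracketed timestamp, a bracketed one-letter type (`G` for gauge, `C` for counter, `S` for sample), a quoted metric name, and then either a single value or `Key: value` pairs. Lines that do not match are skipped, and the function names are illustrative only.

```python
import re

# One dump line, e.g.
# [2015-09-17 16:59:40 -0700 PDT][G] 'nomad.runtime.num_goroutines': 56.000
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[(?P<kind>[A-Z])\] '(?P<name>[^']+)': (?P<rest>.+)$"
)
# "Key: value" pairs used by counter and sample lines, e.g. "Count: 3 Sum: 0.110".
FIELD_RE = re.compile(r"(\w+): (-?\d+(?:\.\d+)?)")


def parse_dump_line(line):
    """Parse one dump line into a dict, or return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    kind, rest = m.group("kind"), m.group("rest")
    if kind == "G":
        # Gauges print a single value.
        fields = {"Value": float(rest)}
    else:
        # Counters and samples print named aggregates.
        fields = {key: float(value) for key, value in FIELD_RE.findall(rest)}
    return {"timestamp": m.group("ts"), "kind": kind,
            "name": m.group("name"), "fields": fields}


def parse_dump(text):
    """Parse every recognizable line of a dump, skipping anything else."""
    parsed = (parse_dump_line(line) for line in text.splitlines())
    return [entry for entry in parsed if entry is not None]
```

For instance, `parse_dump(open("nomad-stderr.log").read())` (a hypothetical file holding the agent's redirected `stderr`) would return one dict per metric line.
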
### Metric Types

| Type    | Description                                                                                                          | Quantiles |
| ------- | -------------------------------------------------------------------------------------------------------------------- | --------- |
| Gauge   | Gauge types report an absolute number at the end of the aggregation interval                                         | false     |
| Counter | Counts are incremented and flushed at the end of the aggregation interval and then are reset to zero                 | true      |
| Timer   | Timers measure the time to complete a task and will include quantiles, means, standard deviation, etc. per interval. | true      |

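To make the `Count`, `Min`, `Mean`, `Max`, `Stddev`, and `Sum` fields in the dump concrete, the Python sketch below aggregates one interval's raw samples into those fields. It does not compute quantiles. Dividing by n-1 (the sample standard deviation) is an assumption, but it is consistent with the sample output above: `nomad.raft.leader.dispatchLog` reports Count 3, Min 0.007, Max 0.039, and Sum 0.054, which implies a middle sample of roughly 0.008, and the n-1 formula then gives about 0.018, matching the printed `Stddev` (dividing by n would give about 0.015).

```python
import math


def summarize(samples):
    """Aggregate one interval's raw samples into dump-style summary fields."""
    if not samples:
        raise ValueError("summarize() needs at least one sample")
    n = len(samples)
    total = sum(samples)
    mean = total / n
    # Sample standard deviation (divide by n - 1); a single sample reports 0.
    if n > 1:
        stddev = math.sqrt(sum((x - mean) ** 2 for x in samples) / (n - 1))
    else:
        stddev = 0.0
    return {
        "Count": n,
        "Min": min(samples),
        "Mean": mean,
        "Max": max(samples),
        "Stddev": stddev,
        "Sum": total,
    }
```

For example, `summarize([1.0, 2.0, 3.0])` yields a mean of 2.0 and a standard deviation of 1.0.
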
### Tagged Metrics

Nomad emits metrics in a tagged format. Each metric can carry more than one
tag, which makes it possible to match datapoints on a tag, such as a
particular datacenter, and return all metrics with that tag. Nomad also
supports labels for namespaces.

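Matching on tags can be sketched in a few lines of Python. The sketch assumes each metric is a dict with a `Labels` mapping of label name to value, which is an assumed shape for the JSON returned by `/v1/metrics`. `filter_by_labels` is a hypothetical helper, and `GAUGES` is invented sample data.

```python
def filter_by_labels(metrics, **wanted):
    """Keep metrics whose labels contain every requested name/value pair."""
    return [
        m for m in metrics
        if all(m.get("Labels", {}).get(name) == value
               for name, value in wanted.items())
    ]


# Invented sample data in the assumed shape.
GAUGES = [
    {"Name": "nomad.client.allocated.cpu", "Value": 500.0,
     "Labels": {"datacenter": "dc1", "node_class": "web"}},
    {"Name": "nomad.client.allocated.cpu", "Value": 250.0,
     "Labels": {"datacenter": "dc2", "node_class": "web"}},
    {"Name": "nomad.client.uptime", "Value": 3600.0,
     "Labels": {"datacenter": "dc1"}},
]
```

Here `filter_by_labels(GAUGES, datacenter="dc1")` returns the first and third entries, while adding `node_class="web"` narrows the result to the first entry only.
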
## Key Metrics

The metrics in the table below are the most important metrics for monitoring
the overall health of a Nomad cluster.

When telemetry is being streamed to statsite or statsd, `interval` in the
table below is their flush interval. Otherwise, the interval can be assumed
to be 10 seconds when retrieving metrics using the signals described above.

| Metric                                       | Description                                                                                                                                                                                                      | Unit                           | Type    |
| -------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------ | ------- |
| `nomad.runtime.alloc_bytes`                  | Memory utilization                                                                                                                                                                                               | # of bytes                     | Gauge   |
| `nomad.runtime.heap_objects`                 | Number of objects on the heap. General memory pressure indicator                                                                                                                                                 | # of heap objects              | Gauge   |
| `nomad.runtime.num_goroutines`               | Number of goroutines and general load pressure indicator                                                                                                                                                         | # of goroutines                | Gauge   |
| `nomad.nomad.broker.total_blocked`           | Evaluations that are blocked until an existing evaluation for the same job completes                                                                                                                             | # of evaluations               | Gauge   |
| `nomad.nomad.broker.total_ready`             | Number of evaluations ready to be processed                                                                                                                                                                      | # of evaluations               | Gauge   |
| `nomad.nomad.broker.total_unacked`           | Evaluations dispatched for processing but incomplete                                                                                                                                                             | # of evaluations               | Gauge   |
| `nomad.nomad.heartbeat.active`               | Number of active heartbeat timers. Each timer represents a Nomad Client connection                                                                                                                               | # of heartbeat timers          | Gauge   |
| `nomad.nomad.heartbeat.invalidate`           | The length of time it takes to invalidate a Nomad Client due to failed heartbeats                                                                                                                                | ms / Heartbeat Invalidation    | Timer   |
| `nomad.nomad.plan.evaluate`                  | Time to validate a scheduler Plan. Higher values cause lower scheduling throughput. Similar to `nomad.nomad.plan.submit` but does not include RPC time or time in the Plan Queue                                | ms / Plan Evaluation           | Timer   |
| `nomad.nomad.plan.node_rejected`             | Number of times a node has had a plan rejected. A node with a high rate of rejections may have an underlying issue causing it to be unschedulable. Refer to [this link][s_port_plan_failure] for more information | # of rejected plans            | Counter |
| `nomad.nomad.plan.queue_depth`               | Number of scheduler Plans waiting to be evaluated                                                                                                                                                                | # of plans                     | Gauge   |
| `nomad.nomad.plan.submit`                    | Time to submit a scheduler Plan. Higher values cause lower scheduling throughput                                                                                                                                 | ms / Plan Submit               | Timer   |
| `nomad.nomad.rpc.query`                      | Number of RPC queries                                                                                                                                                                                            | RPC Queries / `interval`       | Counter |
| `nomad.nomad.rpc.request_error`              | Number of RPC requests being handled that result in an error                                                                                                                                                     | RPC Errors / `interval`        | Counter |
| `nomad.nomad.rpc.request`                    | Number of RPC requests being handled                                                                                                                                                                             | RPC Requests / `interval`      | Counter |
| `nomad.nomad.worker.invoke_scheduler.<type>` | Time to run the scheduler of the given type                                                                                                                                                                      | ms / Scheduler Run             | Timer   |
| `nomad.nomad.worker.wait_for_index`          | Time waiting for Raft log replication from leader. High delays result in lower scheduling throughput                                                                                                             | ms / Raft Index Wait           | Timer   |
| `nomad.raft.apply`                           | Number of Raft transactions                                                                                                                                                                                      | Raft transactions / `interval` | Counter |
| `nomad.raft.leader.lastContact`              | Time since last contact to leader. General indicator of Raft latency                                                                                                                                             | ms / Leader Contact            | Timer   |
| `nomad.raft.replication.appendEntries`       | Raft transaction commit time                                                                                                                                                                                     | ms / Raft Log Append           | Timer   |
| `nomad.license.expiration_time_epoch`        | Time as epoch (seconds since Jan 1 1970) at which license will expire                                                                                                                                            | Seconds                        | Gauge   |

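Because the counters above are reported per `interval`, dividing by the interval length gives a per-second rate. In the sample dump, `nomad.nomad.rpc.query` shows `Count: 2`, which over the default 10 second interval is 0.2 queries per second. The Python sketch below performs that conversion and also turns `nomad.license.expiration_time_epoch` into days remaining. Both helper names are hypothetical, and the 10 second default is only appropriate when reading metrics through the signal dump rather than a statsite or statsd flush interval.

```python
import time

SECONDS_PER_DAY = 86400


def per_second(count, interval_seconds=10.0):
    """Convert a counter value reported per interval into a per-second rate."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return count / interval_seconds


def days_until_license_expiry(expiration_epoch, now=None):
    """Days from `now` (default: the current time) until the license expires.

    A negative result means the license has already expired.
    """
    now = time.time() if now is None else now
    return (expiration_epoch - now) / SECONDS_PER_DAY
```
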
## Client Metrics

The Nomad client emits metrics related to the resource usage of the allocations
and tasks running on it and the node itself. Operators have to explicitly turn
on publishing host and allocation metrics. Publishing allocation and host
metrics can be turned on by setting the values of `publish_allocation_metrics`
and `publish_node_metrics` to `true`.

By default the collection interval is 1 second, but it can be changed by
setting the `collection_interval` key in the `telemetry` configuration block.

Please see the [agent configuration](/docs/configuration/telemetry)
page for more details.

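When agent configuration is generated by tooling, these settings can be emitted as a JSON configuration file instead of HCL. The Python sketch below assumes that the agent is given a JSON-format config file and that the `telemetry` block maps to a JSON object of the same name, with `collection_interval` taking a duration string such as `"1s"`. The function names and the `telemetry.json` file name are hypothetical.

```python
import json


def telemetry_config(collection_interval="1s", allocation_metrics=True,
                     node_metrics=True):
    """Build an agent config fragment enabling allocation and host metrics."""
    return {
        "telemetry": {
            "collection_interval": collection_interval,
            "publish_allocation_metrics": allocation_metrics,
            "publish_node_metrics": node_metrics,
        }
    }


def write_config(path, config):
    """Write the fragment as pretty-printed JSON, e.g. to telemetry.json."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)
```

The resulting file could then be passed to the agent with `-config` alongside the rest of its configuration.
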
As of Nomad 0.9, Nomad will emit additional labels for [parameterized](/docs/job-specification/parameterized) and
[periodic](/docs/job-specification/periodic) jobs. Nomad
emits the parent job ID as a new label `parent_id`. Also, the labels `dispatch_id`
and `periodic_id` are emitted, containing the ID of the specific invocation of the
parameterized or periodic job respectively. For example, a dispatched job with the ID
`myjob/dispatch-1312323423423` will have the following labels:

| Label       | Value                          |
| ----------- | ------------------------------ |
| job         | `myjob/dispatch-1312323423423` |
| parent_id   | `myjob`                        |
| dispatch_id | `1312323423423`                |

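The mapping in the table can be reproduced with a short Python sketch. It is inferred from the example above rather than taken from Nomad's source: it assumes the invocation suffix follows the last `/` in the job ID and starts with `dispatch-` or `periodic-`. `invocation_labels` is a hypothetical helper.

```python
from __future__ import annotations


def invocation_labels(job_id: str) -> dict[str, str]:
    """Derive job, parent_id, and dispatch_id/periodic_id labels from a job ID."""
    labels = {"job": job_id}
    parent, sep, child = job_id.rpartition("/")
    if not sep:
        # Not a child job: only the job label applies.
        return labels
    for prefix, label in (("dispatch-", "dispatch_id"), ("periodic-", "periodic_id")):
        if child.startswith(prefix):
            labels["parent_id"] = parent
            labels[label] = child[len(prefix):]
            break
    return labels
```

Applied to `myjob/dispatch-1312323423423`, it yields the three labels shown in the table.
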
## Host Metrics

Nomad emits the following [tagged metrics][tagged-metrics]:

| Metric                                  | Description                                                                         | Unit       | Type  | Labels                                                                                |
| --------------------------------------- | ----------------------------------------------------------------------------------- | ---------- | ----- | ------------------------------------------------------------------------------------- |
| `nomad.client.allocated.cpu`            | Total amount of CPU shares the scheduler has allocated to tasks                     | MHz        | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status       |
| `nomad.client.allocated.memory`         | Total amount of memory the scheduler has allocated to tasks                         | Megabytes  | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status       |
| `nomad.client.allocated.disk`           | Total amount of disk space the scheduler has allocated to tasks                     | Megabytes  | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status       |
| `nomad.client.allocations.blocked`      | Number of allocations blocked                                                       | Integer    | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status       |
| `nomad.client.allocations.migrating`    | Number of allocations migrating                                                     | Integer    | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status       |
| `nomad.client.allocations.pending`      | Number of allocations pending                                                       | Integer    | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status       |
| `nomad.client.allocations.running`      | Number of allocations running                                                       | Integer    | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status       |
| `nomad.client.allocations.start`        | Number of allocations starting                                                      | Integer    | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status       |
| `nomad.client.allocations.terminal`     | Number of allocations terminal                                                      | Integer    | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status       |
| `nomad.client.allocs.oom_killed`        | Number of allocations OOM killed                                                    | Integer    | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status       |
| `nomad.client.host.cpu.idle`            | CPU utilization in idle state                                                       | Percentage | Gauge | cpu, datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status  |
| `nomad.client.host.cpu.system`          | CPU utilization in system space                                                     | Percentage | Gauge | cpu, datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status  |
| `nomad.client.host.cpu.total`           | Total CPU utilization                                                               | Percentage | Gauge | cpu, datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status  |
| `nomad.client.host.cpu.user`            | CPU utilization in user space                                                       | Percentage | Gauge | cpu, datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status  |
| `nomad.client.host.disk.available`      | Amount of space which is available                                                  | Bytes      | Gauge | datacenter, disk, host, node_class, node_id, node_scheduling_eligibility, node_status |
| `nomad.client.host.disk.inodes_percent` | Percentage of inodes in use                                                         | Percentage | Gauge | datacenter, disk, host, node_class, node_id, node_scheduling_eligibility, node_status |
| `nomad.client.host.disk.size`           | Total size of the device                                                            | Bytes      | Gauge | datacenter, disk, host, node_class, node_id, node_scheduling_eligibility, node_status |
| `nomad.client.host.disk.used_percent`   | Percentage of disk space used                                                       | Percentage | Gauge | datacenter, disk, host, node_class, node_id, node_scheduling_eligibility, node_status |
| `nomad.client.host.disk.used`           | Amount of space which has been used                                                 | Bytes      | Gauge | datacenter, disk, host, node_class, node_id, node_scheduling_eligibility, node_status |
| `nomad.client.host.memory.available`    | Total amount of memory available to processes which includes free and cached memory | Bytes      | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status       |
| `nomad.client.host.memory.free`         | Amount of memory which is free                                                      | Bytes      | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status       |
| `nomad.client.host.memory.total`        | Total amount of physical memory on the node                                         | Bytes      | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status       |
| `nomad.client.host.memory.used`         | Amount of memory used by processes                                                  | Bytes      | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status       |
| `nomad.client.unallocated.cpu`          | Total amount of CPU shares free for the scheduler to allocate to tasks              | MHz        | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status       |
| `nomad.client.unallocated.disk`         | Total amount of disk space free for the scheduler to allocate to tasks              | Megabytes  | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status       |
| `nomad.client.unallocated.memory`       | Total amount of memory free for the scheduler to allocate to tasks                  | Megabytes  | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status       |
| `nomad.client.uptime`                   | Uptime of the host running the Nomad client                                         | Seconds    | Gauge | datacenter, host, node_class, node_id, node_scheduling_eligibility, node_status       |

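A node's `nomad.client.allocated.cpu` and `nomad.client.unallocated.cpu` gauges can be combined into a utilization ratio. The Python sketch below assumes gauges shaped as dicts with `Name`, `Value`, and a `Labels` mapping that includes `node_id`, and it treats allocated plus unallocated CPU as the node's schedulable capacity. That interpretation and the helper name `cpu_allocation_by_node` are assumptions.

```python
from collections import defaultdict


def cpu_allocation_by_node(gauges):
    """Return {node_id: allocated CPU / (allocated + unallocated CPU)}."""
    allocated = defaultdict(float)
    unallocated = defaultdict(float)
    for g in gauges:
        node = g.get("Labels", {}).get("node_id")
        if node is None:
            continue
        if g["Name"] == "nomad.client.allocated.cpu":
            allocated[node] += g["Value"]
        elif g["Name"] == "nomad.client.unallocated.cpu":
            unallocated[node] += g["Value"]
    fractions = {}
    for node in set(allocated) | set(unallocated):
        total = allocated[node] + unallocated[node]
        if total > 0:  # Skip nodes that report no schedulable CPU at all.
            fractions[node] = allocated[node] / total
    return fractions
```

A node reporting 1500 MHz allocated and 500 MHz unallocated would come out at 0.75.
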
## Allocation Metrics

The following metrics are emitted for each allocation if allocation metrics
are enabled. The available allocation metrics depend on the task driver; not
all task drivers can provide all metrics.

| Metric                                        | Description                                                       | Unit        | Type  | Labels                                           |
| --------------------------------------------- | ----------------------------------------------------------------- | ----------- | ----- | ------------------------------------------------ |
| `nomad.client.allocs.cpu.allocated`           | Total CPU resources allocated by the task across all cores        | MHz         | Gauge | alloc_id, host, job, namespace, task, task_group |
| `nomad.client.allocs.cpu.system`              | Total CPU resources consumed by the task in system space          | Percentage  | Gauge | alloc_id, host, job, namespace, task, task_group |
| `nomad.client.allocs.cpu.throttled_periods`   | Total number of CPU periods that the task was throttled           | Integer     | Gauge | alloc_id, host, job, namespace, task, task_group |
| `nomad.client.allocs.cpu.throttled_time`      | Total time that the task was throttled                            | Nanoseconds | Gauge | alloc_id, host, job, namespace, task, task_group |
| `nomad.client.allocs.cpu.total_percent`       | Total CPU resources consumed by the task across all cores         | Percentage  | Gauge | alloc_id, host, job, namespace, task, task_group |
| `nomad.client.allocs.cpu.total_ticks`         | CPU ticks consumed by the process in the last collection interval | Integer     | Gauge | alloc_id, host, job, namespace, task, task_group |
| `nomad.client.allocs.cpu.user`                | Total CPU resources consumed by the task in the user space        | Percentage  | Gauge | alloc_id, host, job, namespace, task, task_group |
| `nomad.client.allocs.memory.allocated`        | Amount of memory allocated by the task                            | Bytes       | Gauge | alloc_id, host, job, namespace, task, task_group |
| `nomad.client.allocs.memory.cache`            | Amount of memory cached by the task                               | Bytes       | Gauge | alloc_id, host, job, namespace, task, task_group |
| `nomad.client.allocs.memory.kernel_max_usage` | Maximum amount of memory ever used by the kernel for this task    | Bytes       | Gauge | alloc_id, host, job, namespace, task, task_group |
| `nomad.client.allocs.memory.kernel_usage`     | Amount of memory used by the kernel for this task                 | Bytes       | Gauge | alloc_id, host, job, namespace, task, task_group |
| `nomad.client.allocs.memory.max_usage`        | Maximum amount of memory ever used by the task                    | Bytes       | Gauge | alloc_id, host, job, namespace, task, task_group |
| `nomad.client.allocs.memory.rss`              | Amount of RSS memory consumed by the task                         | Bytes       | Gauge | alloc_id, host, job, namespace, task, task_group |
| `nomad.client.allocs.memory.swap`             | Amount of memory swapped by the task                              | Bytes       | Gauge | alloc_id, host, job, namespace, task, task_group |
| `nomad.client.allocs.memory.usage`            | Total amount of memory used by the task                           | Bytes       | Gauge | alloc_id, host, job, namespace, task, task_group |

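Comparing `nomad.client.allocs.memory.usage` with `nomad.client.allocs.memory.allocated` highlights tasks running close to their memory allocation. The Python sketch below assumes gauges shaped as dicts with `Name`, `Value`, and a `Labels` mapping containing `alloc_id` and `task`. Because metric availability depends on the task driver, tasks missing either gauge are skipped. The 0.9 threshold and the helper name are arbitrary choices for illustration.

```python
def tasks_near_memory_limit(gauges, threshold=0.9):
    """Return {(alloc_id, task): usage / allocated} for tasks at or above threshold."""
    usage, allocated = {}, {}
    for g in gauges:
        labels = g.get("Labels", {})
        key = (labels.get("alloc_id"), labels.get("task"))
        if g["Name"] == "nomad.client.allocs.memory.usage":
            usage[key] = g["Value"]
        elif g["Name"] == "nomad.client.allocs.memory.allocated":
            allocated[key] = g["Value"]
    flagged = {}
    for key, used in usage.items():
        limit = allocated.get(key)
        # Skip tasks with no reported allocation (or an allocation of 0).
        if limit and used / limit >= threshold:
            flagged[key] = used / limit
    return flagged
```
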
## Job Summary Metrics

Job summary metrics are emitted by the Nomad leader server.

| Metric                             | Description                              | Unit    | Type  | Labels                           |
| ---------------------------------- | ---------------------------------------- | ------- | ----- | -------------------------------- |
| `nomad.nomad.job_summary.complete` | Number of complete allocations for a job | Integer | Gauge | host, job, namespace, task_group |
| `nomad.nomad.job_summary.failed`   | Number of failed allocations for a job   | Integer | Gauge | host, job, namespace, task_group |
| `nomad.nomad.job_summary.lost`     | Number of lost allocations for a job     | Integer | Gauge | host, job, namespace, task_group |
| `nomad.nomad.job_summary.unknown`  | Number of unknown allocations for a job  | Integer | Gauge | host, job, namespace, task_group |
| `nomad.nomad.job_summary.queued`   | Number of queued allocations for a job   | Integer | Gauge | host, job, namespace, task_group |
| `nomad.nomad.job_summary.running`  | Number of running allocations for a job  | Integer | Gauge | host, job, namespace, task_group |
| `nomad.nomad.job_summary.starting` | Number of starting allocations for a job | Integer | Gauge | host, job, namespace, task_group |

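Since each allocation state is a separate gauge, it is often convenient to pivot them into one row per task group. The Python sketch below assumes gauges shaped as dicts with `Name`, `Value`, and a `Labels` mapping containing `namespace`, `job`, and `task_group`, and it takes the state from the metric-name suffix. `job_summary_table` is a hypothetical helper.

```python
from collections import defaultdict

PREFIX = "nomad.nomad.job_summary."


def job_summary_table(gauges):
    """Pivot job_summary gauges into {(namespace, job, task_group): {state: count}}."""
    table = defaultdict(dict)
    for g in gauges:
        name = g["Name"]
        if not name.startswith(PREFIX):
            continue
        labels = g.get("Labels", {})
        key = (labels.get("namespace"), labels.get("job"), labels.get("task_group"))
        # The suffix is the state, e.g. "running" or "failed".
        table[key][name[len(PREFIX):]] = int(g["Value"])
    return dict(table)
```
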
## Job Status Metrics

Job status metrics are emitted by the Nomad leader server.

| Metric                           | Description            | Unit    | Type  | Labels |
| -------------------------------- | ---------------------- | ------- | ----- | ------ |
| `nomad.nomad.job_status.dead`    | Number of dead jobs    | Integer | Gauge | host   |
| `nomad.nomad.job_status.pending` | Number of pending jobs | Integer | Gauge | host   |
| `nomad.nomad.job_status.running` | Number of running jobs | Integer | Gauge | host   |

## Server Metrics
|
|
|
|
The following table includes metrics for overall cluster health in addition to
|
|
those listed in [Key Metrics](#key-metrics) above.
|
|
|
|
| Metric | Description | Unit | Type | Labels |
|
|
|------------------------------------------------------|--------------------------------------------------------------------------------|----------------------|---------|---------------------------------------------------------|
|
|
| `nomad.memberlist.gossip` | Time elapsed to broadcast gossip messages | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.acl.bootstrap` | Time elapsed for `ACL.Bootstrap` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.acl.delete_policies` | Time elapsed for `ACL.DeletePolicies` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.acl.delete_tokens` | Time elapsed for `ACL.DeleteTokens` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.acl.get_policies` | Time elapsed for `ACL.GetPolicies` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.acl.get_policy` | Time elapsed for `ACL.GetPolicy` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.acl.get_token` | Time elapsed for `ACL.GetToken` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.acl.get_tokens` | Time elapsed for `ACL.GetTokens` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.acl.list_policies` | Time elapsed for `ACL.ListPolicies` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.acl.list_tokens` | Time elapsed for `ACL.ListTokens` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.acl.resolve_token` | Time elapsed for `ACL.ResolveToken` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.acl.upsert_policies` | Time elapsed for `ACL.UpsertPolicies` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.acl.upsert_tokens` | Time elapsed for `ACL.UpsertTokens` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.alloc.exec` | Time elapsed to establish alloc exec | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.alloc.get_alloc` | Time elapsed for `Alloc.GetAlloc` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.alloc.get_allocs` | Time elapsed for `Alloc.GetAllocs` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.alloc.list` | Time elapsed for `Alloc.List` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.alloc.stop` | Time elapsed for `Alloc.Stop` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.alloc.update_desired_transition` | Time elapsed for `Alloc.UpdateDesiredTransition` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.blocked_evals.cpu` | Amount of CPU shares requested by blocked evals | Integer | Gauge | datacenter, host, node_class |
|
|
| `nomad.nomad.blocked_evals.memory` | Amount of memory requested by blocked evals | Integer | Gauge | datacenter, host, node_class |
|
|
| `nomad.nomad.blocked_evals.job.cpu` | Amount of CPU shares requested by blocked evals of a job | Integer | Gauge | host, job, namespace |
|
|
| `nomad.nomad.blocked_evals.job.memory` | Amount of memory requested by blocked evals of a job | Integer | Gauge | host, job, namespace |
|
|
| `nomad.nomad.blocked_evals.total_blocked` | Count of evals in the blocked state | Integer | Gauge | host |
|
|
| `nomad.nomad.blocked_evals.total_escaped` | Count of evals that have escaped computed node classes | Integer | Gauge | host |
|
|
| `nomad.nomad.blocked_evals.total_quota_limit` | Count of blocked evals due to quota limits | Integer | Gauge | host |
|
|
| `nomad.nomad.broker.batch_ready` | Count of batch evals ready to be scheduled | Integer | Gauge | host |
|
|
| `nomad.nomad.broker.batch_unacked` | Count of unacknowledged batch evals | Integer | Gauge | host |
|
|
| `nomad.nomad.broker.eval_waiting` | Time elapsed with evaluation waiting to be enqueued | Nanoseconds | Gauge | eval_id, job, namespace |
|
|
| `nomad.nomad.broker.service_ready` | Count of service evals ready to be scheduled | Integer | Gauge | host |
|
|
| `nomad.nomad.broker.service_unacked` | Count of unacknowledged service evals | Integer | Gauge | host |
|
|
| `nomad.nomad.broker.system_ready` | Count of system evals ready to be scheduled | Integer | Gauge | host |
|
|
| `nomad.nomad.broker.system_unacked` | Count of unacknowledged system evals | Integer | Gauge | host |
|
|
| `nomad.nomad.broker.total_ready` | Count of evals in the ready state | Integer | Gauge | host |
|
|
| `nomad.nomad.broker.total_waiting` | Count of evals waiting to be enqueued | Integer | Gauge | host |
|
|
| `nomad.nomad.client.batch_deregister` | Time elapsed for `Node.BatchDeregister` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.client.deregister` | Time elapsed for `Node.Deregister` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.client.derive_si_token` | Time elapsed for `Node.DeriveSIToken` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.client.derive_vault_token` | Time elapsed for `Node.DeriveVaultToken` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.client.emit_events` | Time elapsed for `Node.EmitEvents` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.client.evaluate` | Time elapsed for `Node.Evaluate` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.client.get_allocs` | Time elapsed for `Node.GetAllocs` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.client.get_client_allocs` | Time elapsed for `Node.GetClientAllocs` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.client.get_node` | Time elapsed for `Node.GetNode` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.client.list` | Time elapsed for `Node.List` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.client.register` | Time elapsed for `Node.Register` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.client.stats` | Time elapsed for `Client.Stats` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.client.update_alloc` | Time elapsed for `Node.UpdateAlloc` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.client.update_drain` | Time elapsed for `Node.UpdateDrain` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.client.update_eligibility` | Time elapsed for `Node.UpdateEligibility` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.client.update_status` | Time elapsed for `Node.UpdateStatus` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.client_allocations.garbage_collect_all` | Time elapsed for `ClientAllocations.GarbageCollectAll` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.client_allocations.garbage_collect` | Time elapsed for `ClientAllocations.GarbageCollect` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.client_allocations.restart` | Time elapsed for `ClientAllocations.Restart` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.client_allocations.signal` | Time elapsed for `ClientAllocations.Signal` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.client_allocations.stats` | Time elapsed for `ClientAllocations.Stats` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.client_csi_controller.attach_volume` | Time elapsed for `Controller.AttachVolume` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.client_csi_controller.detach_volume` | Time elapsed for `Controller.DetachVolume` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.client_csi_controller.validate_volume` | Time elapsed for `Controller.ValidateVolume` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.client_csi_node.detach_volume` | Time elapsed for `Node.DetachVolume` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.deployment.allocations` | Time elapsed for `Deployment.Allocations` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.deployment.cancel` | Time elapsed for `Deployment.Cancel` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.deployment.fail` | Time elapsed for `Deployment.Fail` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.deployment.get_deployment` | Time elapsed for `Deployment.GetDeployment` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.deployment.list` | Time elapsed for `Deployment.List` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.deployment.pause` | Time elapsed for `Deployment.Pause` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.deployment.promote` | Time elapsed for `Deployment.Promote` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.deployment.reap` | Time elapsed for `Deployment.Reap` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.deployment.run` | Time elapsed for `Deployment.Run` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.deployment.set_alloc_health` | Time elapsed for `Deployment.SetAllocHealth` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.deployment.unblock` | Time elapsed for `Deployment.Unblock` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.eval.ack` | Time elapsed for `Eval.Ack` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.eval.allocations` | Time elapsed for `Eval.Allocations` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.eval.create` | Time elapsed for `Eval.Create` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.eval.dequeue` | Time elapsed for `Eval.Dequeue` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.eval.get_eval` | Time elapsed for `Eval.GetEval` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.eval.list` | Time elapsed for `Eval.List` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.eval.nack` | Time elapsed for `Eval.Nack` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.eval.reap` | Time elapsed for `Eval.Reap` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.eval.reblock` | Time elapsed for `Eval.Reblock` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.eval.update` | Time elapsed for `Eval.Update` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.file_system.list` | Time elapsed for `FileSystem.List` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.file_system.logs` | Time elapsed to establish `FileSystem.Logs` RPC | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.file_system.stat` | Time elapsed for `FileSystem.Stat` RPC call | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.file_system.stream` | Time elapsed to establish `FileSystem.Stream` RPC | Nanoseconds | Summary | host |
|
|
| `nomad.nomad.fsm.alloc_client_update` | Time elapsed to apply `AllocClientUpdate` raft entry | Nanoseconds | Summary | host |
| `nomad.nomad.fsm.alloc_update_desired_transition` | Time elapsed to apply `AllocUpdateDesiredTransition` raft entry | Nanoseconds | Summary | host |
| `nomad.nomad.fsm.alloc_update` | Time elapsed to apply `AllocUpdate` raft entry | Nanoseconds | Summary | host |
| `nomad.nomad.fsm.apply_acl_policy_delete` | Time elapsed to apply `ApplyACLPolicyDelete` raft entry | Nanoseconds | Summary | host |
| `nomad.nomad.fsm.apply_acl_policy_upsert` | Time elapsed to apply `ApplyACLPolicyUpsert` raft entry | Nanoseconds | Summary | host |
| `nomad.nomad.fsm.apply_acl_token_bootstrap` | Time elapsed to apply `ApplyACLTokenBootstrap` raft entry | Nanoseconds | Summary | host |
| `nomad.nomad.fsm.apply_acl_token_delete` | Time elapsed to apply `ApplyACLTokenDelete` raft entry | Nanoseconds | Summary | host |
| `nomad.nomad.fsm.apply_acl_token_upsert` | Time elapsed to apply `ApplyACLTokenUpsert` raft entry | Nanoseconds | Summary | host |
| `nomad.nomad.fsm.apply_csi_plugin_delete` | Time elapsed to apply `ApplyCSIPluginDelete` raft entry | Nanoseconds | Summary | host |
| `nomad.nomad.fsm.apply_csi_volume_batch_claim` | Time elapsed to apply `ApplyCSIVolumeBatchClaim` raft entry | Nanoseconds | Summary | host |
| `nomad.nomad.fsm.apply_csi_volume_claim` | Time elapsed to apply `ApplyCSIVolumeClaim` raft entry | Nanoseconds | Summary | host |
| `nomad.nomad.fsm.apply_csi_volume_deregister` | Time elapsed to apply `ApplyCSIVolumeDeregister` raft entry | Nanoseconds | Summary | host |
| `nomad.nomad.fsm.apply_csi_volume_register` | Time elapsed to apply `ApplyCSIVolumeRegister` raft entry | Nanoseconds | Summary | host |
| `nomad.nomad.fsm.apply_deployment_alloc_health` | Time elapsed to apply `ApplyDeploymentAllocHealth` raft entry | Nanoseconds | Summary | host |
| `nomad.nomad.fsm.apply_deployment_delete` | Time elapsed to apply `ApplyDeploymentDelete` raft entry | Nanoseconds | Summary | host |
| `nomad.nomad.fsm.apply_deployment_promotion` | Time elapsed to apply `ApplyDeploymentPromotion` raft entry | Nanoseconds | Summary | host |
| `nomad.nomad.fsm.apply_deployment_status_update` | Time elapsed to apply `ApplyDeploymentStatusUpdate` raft entry | Nanoseconds | Summary | host |
| `nomad.nomad.fsm.apply_job_stability` | Time elapsed to apply `ApplyJobStability` raft entry | Nanoseconds | Summary | host |
| `nomad.nomad.fsm.apply_namespace_delete` | Time elapsed to apply `ApplyNamespaceDelete` raft entry | Nanoseconds | Summary | host |
| `nomad.nomad.fsm.apply_namespace_upsert` | Time elapsed to apply `ApplyNamespaceUpsert` raft entry | Nanoseconds | Summary | host |
| `nomad.nomad.fsm.apply_plan_results` | Time elapsed to apply `ApplyPlanResults` raft entry | Nanoseconds | Summary | host |
| `nomad.nomad.fsm.apply_scheduler_config` | Time elapsed to apply `ApplySchedulerConfig` raft entry | Nanoseconds | Summary | host |
| `nomad.nomad.fsm.autopilot` | Time elapsed to apply `Autopilot` raft entry | Nanoseconds | Summary | host |
| `nomad.nomad.fsm.batch_deregister_job` | Time elapsed to apply `BatchDeregisterJob` raft entry | Nanoseconds | Summary | host |
| `nomad.nomad.fsm.batch_deregister_node` | Time elapsed to apply `BatchDeregisterNode` raft entry | Nanoseconds | Summary | host |
| `nomad.nomad.fsm.batch_node_drain_update` | Time elapsed to apply `BatchNodeDrainUpdate` raft entry | Nanoseconds | Summary | host |
| `nomad.nomad.fsm.cluster_meta` | Time elapsed to apply `ClusterMeta` raft entry | Nanoseconds | Summary | host |
| `nomad.nomad.fsm.delete_eval` | Time elapsed to apply `DeleteEval` raft entry | Nanoseconds | Summary | host |
| `nomad.nomad.fsm.deregister_job` | Time elapsed to apply `DeregisterJob` raft entry | Nanoseconds | Summary | host |
| `nomad.nomad.fsm.deregister_node` | Time elapsed to apply `DeregisterNode` raft entry | Nanoseconds | Summary | host |
| `nomad.nomad.fsm.deregister_si_accessor` | Time elapsed to apply `DeregisterSITokenAccessor` raft entry | Nanoseconds | Summary | host |
| `nomad.nomad.fsm.deregister_vault_accessor` | Time elapsed to apply `DeregisterVaultAccessor` raft entry | Nanoseconds | Summary | host |
| `nomad.nomad.fsm.node_drain_update` | Time elapsed to apply `NodeDrainUpdate` raft entry | Nanoseconds | Summary | host |
| `nomad.nomad.fsm.node_eligibility_update` | Time elapsed to apply `NodeEligibilityUpdate` raft entry | Nanoseconds | Summary | host |
| `nomad.nomad.fsm.node_status_update` | Time elapsed to apply `NodeStatusUpdate` raft entry | Nanoseconds | Summary | host |
| `nomad.nomad.fsm.persist` | Time elapsed to persist an FSM snapshot | Nanoseconds | Summary | host |
| `nomad.nomad.fsm.register_job` | Time elapsed to apply `RegisterJob` raft entry | Nanoseconds | Summary | host |
| `nomad.nomad.fsm.register_node` | Time elapsed to apply `RegisterNode` raft entry | Nanoseconds | Summary | host |
| `nomad.nomad.fsm.update_eval` | Time elapsed to apply `UpdateEval` raft entry | Nanoseconds | Summary | host |
| `nomad.nomad.fsm.upsert_node_events` | Time elapsed to apply `UpsertNodeEvents` raft entry | Nanoseconds | Summary | host |
| `nomad.nomad.fsm.upsert_scaling_event` | Time elapsed to apply `UpsertScalingEvent` raft entry | Nanoseconds | Summary | host |
| `nomad.nomad.fsm.upsert_si_accessor` | Time elapsed to apply `UpsertSITokenAccessors` raft entry | Nanoseconds | Summary | host |
| `nomad.nomad.fsm.upsert_vault_accessor` | Time elapsed to apply `UpsertVaultAccessor` raft entry | Nanoseconds | Summary | host |
| `nomad.nomad.job.allocations` | Time elapsed for `Job.Allocations` RPC call | Nanoseconds | Summary | host |
| `nomad.nomad.job.batch_deregister` | Time elapsed for `Job.BatchDeregister` RPC call | Nanoseconds | Summary | host |
| `nomad.nomad.job.deployments` | Time elapsed for `Job.Deployments` RPC call | Nanoseconds | Summary | host |
| `nomad.nomad.job.deregister` | Time elapsed for `Job.Deregister` RPC call | Nanoseconds | Summary | host |
| `nomad.nomad.job.dispatch` | Time elapsed for `Job.Dispatch` RPC call | Nanoseconds | Summary | host |
| `nomad.nomad.job.evaluate` | Time elapsed for `Job.Evaluate` RPC call | Nanoseconds | Summary | host |
| `nomad.nomad.job.evaluations` | Time elapsed for `Job.Evaluations` RPC call | Nanoseconds | Summary | host |
| `nomad.nomad.job.get_job_versions` | Time elapsed for `Job.GetJobVersions` RPC call | Nanoseconds | Summary | host |
| `nomad.nomad.job.get_job` | Time elapsed for `Job.GetJob` RPC call | Nanoseconds | Summary | host |
| `nomad.nomad.job.latest_deployment` | Time elapsed for `Job.LatestDeployment` RPC call | Nanoseconds | Summary | host |
| `nomad.nomad.job.list` | Time elapsed for `Job.List` RPC call | Nanoseconds | Summary | host |
| `nomad.nomad.job.plan` | Time elapsed for `Job.Plan` RPC call | Nanoseconds | Summary | host |
| `nomad.nomad.job.register` | Time elapsed for `Job.Register` RPC call | Nanoseconds | Summary | host |
| `nomad.nomad.job.revert` | Time elapsed for `Job.Revert` RPC call | Nanoseconds | Summary | host |
| `nomad.nomad.job.scale_status` | Time elapsed for `Job.ScaleStatus` RPC call | Nanoseconds | Summary | host |
| `nomad.nomad.job.scale` | Time elapsed for `Job.Scale` RPC call | Nanoseconds | Summary | host |
| `nomad.nomad.job.stable` | Time elapsed for `Job.Stable` RPC call | Nanoseconds | Summary | host |
| `nomad.nomad.job.validate` | Time elapsed for `Job.Validate` RPC call | Nanoseconds | Summary | host |
| `nomad.nomad.job_summary.get_job_summary` | Time elapsed for `JobSummary.GetJobSummary` RPC call | Nanoseconds | Summary | host |
| `nomad.nomad.leader.barrier` | Time elapsed to establish a raft barrier during leader transition | Nanoseconds | Summary | host |
| `nomad.nomad.leader.reconcileMember` | Time elapsed to reconcile a serf peer with the state store | Nanoseconds | Summary | host |
| `nomad.nomad.leader.reconcile` | Time elapsed to reconcile all serf peers with the state store | Nanoseconds | Summary | host |
| `nomad.nomad.namespace.delete_namespaces` | Time elapsed for `Namespace.DeleteNamespaces` RPC call | Nanoseconds | Summary | host |
| `nomad.nomad.namespace.get_namespace` | Time elapsed for `Namespace.GetNamespace` RPC call | Nanoseconds | Summary | host |
| `nomad.nomad.namespace.get_namespaces` | Time elapsed for `Namespace.GetNamespaces` RPC call | Nanoseconds | Summary | host |
| `nomad.nomad.namespace.list_namespace` | Time elapsed for `Namespace.ListNamespaces` RPC call | Nanoseconds | Summary | host |
| `nomad.nomad.namespace.upsert_namespaces` | Time elapsed for `Namespace.UpsertNamespaces` RPC call | Nanoseconds | Summary | host |
| `nomad.nomad.periodic.force` | Time elapsed for `Periodic.Force` RPC call | Nanoseconds | Summary | host |
| `nomad.nomad.plan.apply` | Time elapsed to apply a plan | Nanoseconds | Summary | host |
| `nomad.nomad.plan.evaluate` | Time elapsed to evaluate a plan | Nanoseconds | Summary | host |
| `nomad.nomad.plan.node_rejected` | Number of times a node has had a plan rejected | Integer | Counter | host, node_id |
| `nomad.nomad.plan.queue_depth` | Count of plans in the plan queue | Integer | Gauge | host |
| `nomad.nomad.plan.submit` | Time elapsed for `Plan.Submit` RPC call | Nanoseconds | Summary | host |
| `nomad.nomad.plan.wait_for_index` | Time elapsed while the planner waits for the raft index of the plan to be processed | Nanoseconds | Summary | host |
| `nomad.nomad.plugin.delete` | Time elapsed for `CSIPlugin.Delete` RPC call | Nanoseconds | Summary | host |
| `nomad.nomad.plugin.get` | Time elapsed for `CSIPlugin.Get` RPC call | Nanoseconds | Summary | host |
| `nomad.nomad.plugin.list` | Time elapsed for `CSIPlugin.List` RPC call | Nanoseconds | Summary | host |
| `nomad.nomad.scaling.get_policy` | Time elapsed for `Scaling.GetPolicy` RPC call | Nanoseconds | Summary | host |
| `nomad.nomad.scaling.list_policies` | Time elapsed for `Scaling.ListPolicies` RPC call | Nanoseconds | Summary | host |
| `nomad.nomad.search.prefix_search` | Time elapsed for `Search.PrefixSearch` RPC call | Nanoseconds | Summary | host |
| `nomad.nomad.vault.create_token` | Time elapsed to create Vault token | Nanoseconds | Gauge | host |
| `nomad.nomad.vault.distributed_tokens_revoked` | Count of revoked tokens | Integer | Gauge | host |
| `nomad.nomad.vault.lookup_token` | Time elapsed to look up Vault token | Nanoseconds | Gauge | host |
| `nomad.nomad.vault.renew_failed` | Count of failed attempts to renew Vault token | Integer | Gauge | host |
| `nomad.nomad.vault.renew` | Time elapsed to renew Vault token | Nanoseconds | Gauge | host |
| `nomad.nomad.vault.revoke_tokens` | Time elapsed to revoke Vault tokens | Nanoseconds | Gauge | host |
| `nomad.nomad.vault.token_ttl` | Time to live for Vault token | Integer | Gauge | host |
| `nomad.nomad.vault.undistributed_tokens_abandoned` | Count of abandoned tokens | Integer | Gauge | host |
| `nomad.nomad.volume.claim` | Time elapsed for `CSIVolume.Claim` RPC call | Nanoseconds | Summary | host |
| `nomad.nomad.volume.deregister` | Time elapsed for `CSIVolume.Deregister` RPC call | Nanoseconds | Summary | host |
| `nomad.nomad.volume.get` | Time elapsed for `CSIVolume.Get` RPC call | Nanoseconds | Summary | host |
| `nomad.nomad.volume.list` | Time elapsed for `CSIVolume.List` RPC call | Nanoseconds | Summary | host |
| `nomad.nomad.volume.register` | Time elapsed for `CSIVolume.Register` RPC call | Nanoseconds | Summary | host |
| `nomad.nomad.volume.unpublish` | Time elapsed for `CSIVolume.Unpublish` RPC call | Nanoseconds | Summary | host |
| `nomad.nomad.worker.create_eval` | Time elapsed for worker to create an eval | Nanoseconds | Summary | host |
| `nomad.nomad.worker.dequeue_eval` | Time elapsed for worker to dequeue an eval | Nanoseconds | Summary | host |
| `nomad.nomad.worker.invoke_scheduler_service` | Time elapsed for worker to invoke the scheduler | Nanoseconds | Summary | host |
| `nomad.nomad.worker.send_ack` | Time elapsed for worker to send acknowledgement | Nanoseconds | Summary | host |
| `nomad.nomad.worker.submit_plan` | Time elapsed for worker to submit plan | Nanoseconds | Summary | host |
| `nomad.nomad.worker.update_eval` | Time elapsed for worker to submit updated eval | Nanoseconds | Summary | host |
| `nomad.nomad.worker.wait_for_index` | Time elapsed while the worker waits for the raft index of the eval to be processed | Nanoseconds | Summary | host |
| `nomad.raft.appliedIndex` | Current index applied to FSM | Integer | Gauge | host |
| `nomad.raft.barrier` | Count of blocking raft API calls | Integer | Counter | host |
| `nomad.raft.commitNumLogs` | Count of logs enqueued | Integer | Gauge | host |
| `nomad.raft.commitTime` | Time elapsed to commit writes | Nanoseconds | Summary | host |
| `nomad.raft.fsm.apply` | Time elapsed to apply write to FSM | Nanoseconds | Summary | host |
| `nomad.raft.fsm.enqueue` | Time elapsed to enqueue write to FSM | Nanoseconds | Summary | host |
| `nomad.raft.lastIndex` | Most recent index seen | Integer | Gauge | host |
| `nomad.raft.leader.dispatchLog` | Time elapsed to write log, mark in flight, and start replication | Nanoseconds | Summary | host |
| `nomad.raft.leader.dispatchNumLogs` | Count of logs dispatched | Integer | Gauge | host |
| `nomad.raft.replication.appendEntries` | Raft transaction commit time | ms / Raft Log Append | Timer | |
| `nomad.raft.state.candidate` | Count of transitions into the candidate state | Integer | Counter | host |
| `nomad.raft.state.follower` | Count of transitions into the follower state | Integer | Counter | host |
| `nomad.raft.state.leader` | Count of transitions into the leader state | Integer | Counter | host |
| `nomad.raft.transition.heartbeat_timeout` | Count of heartbeat timeouts after which an election was started | Integer | Counter | host |
| `nomad.raft.transition.leader_lease_timeout` | Count of times the leader stepped down after losing quorum | Integer | Counter | host |
| `nomad.runtime.free_count` | Count of objects freed from the heap by the Go runtime GC | Integer | Gauge | host |
| `nomad.runtime.gc_pause_ns` | Go runtime GC pause times | Nanoseconds | Summary | host |
| `nomad.runtime.sys_bytes` | Total bytes of memory obtained from the OS by the Go runtime | # of bytes | Gauge | host |
| `nomad.runtime.total_gc_pause_ns` | Total elapsed Go runtime GC pause time | Nanoseconds | Gauge | host |
| `nomad.runtime.total_gc_runs` | Count of Go runtime GC runs | Integer | Gauge | host |
| `nomad.serf.queue.Event` | Count of memberlist events received | Integer | Summary | host |
| `nomad.serf.queue.Intent` | Count of memberlist changes | Integer | Summary | host |
| `nomad.serf.queue.Query` | Count of memberlist queries | Integer | Summary | host |
| `nomad.scheduler.allocs.rescheduled.attempted` | Count of attempts to reschedule an allocation | Integer | Gauge | alloc_id, job, namespace, task_group |
| `nomad.scheduler.allocs.rescheduled.limit` | Maximum number of attempts to reschedule an allocation | Integer | Gauge | alloc_id, job, namespace, task_group |
| `nomad.scheduler.allocs.rescheduled.wait_until` | Time until which a rescheduled allocation will be delayed | Float | Gauge | alloc_id, job, namespace, task_group, follow_up_eval_id |
| `nomad.state.snapshotIndex` | Current snapshot index | Integer | Gauge | host |
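
Labeled metrics such as `nomad.nomad.plan.node_rejected` or the `nomad.scheduler.allocs.rescheduled.*` family are easier to inspect programmatically than by reading a telemetry dump. The Python sketch below assumes that the JSON returned by the agent's `/v1/metrics` endpoint contains a `Gauges` list whose entries carry `Name`, `Value`, and a `Labels` map, and that the agent listens on the default HTTP address without TLS or ACLs. The `fetch_metrics` and `find_gauge` helpers and the `SAMPLE` payload are hypothetical illustrations, not part of Nomad.

```python
import json
import urllib.request
from typing import Optional


def fetch_metrics(addr: str = "http://127.0.0.1:4646") -> dict:
    """Fetch the agent's JSON metrics summary.

    Assumes a local agent on the default HTTP port with no TLS or ACL token.
    """
    with urllib.request.urlopen(f"{addr}/v1/metrics") as resp:
        return json.load(resp)


def find_gauge(summary: dict, name: str, **labels: str) -> Optional[float]:
    """Return the value of the first gauge called `name` whose labels
    contain every key/value pair in `labels`, or None if nothing matches."""
    for gauge in summary.get("Gauges") or []:
        if gauge.get("Name") != name:
            continue
        found = gauge.get("Labels") or {}
        if all(found.get(key) == value for key, value in labels.items()):
            return gauge.get("Value")
    return None


# Hypothetical, hand-written payload in the assumed /v1/metrics shape.
SAMPLE = {
    "Gauges": [
        {"Name": "nomad.nomad.plan.queue_depth", "Value": 0,
         "Labels": {"host": "server-1"}},
        {"Name": "nomad.scheduler.allocs.rescheduled.wait_until",
         "Value": 1700000000.0,
         "Labels": {"alloc_id": "a1", "job": "web",
                    "namespace": "default", "task_group": "app"}},
    ],
}

print(find_gauge(SAMPLE, "nomad.nomad.plan.queue_depth"))  # prints 0
print(find_gauge(SAMPLE, "nomad.scheduler.allocs.rescheduled.wait_until", job="web"))
```

Against a live agent, pass `fetch_metrics()` instead of `SAMPLE`. Because aggregated values are retained for only about one minute, a helper like this sees recent intervals only; forward telemetry to an external sink for long-term history.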
## Raft BoltDB Metrics

Raft database metrics are emitted by the `raft-boltdb` library.

| Metric | Description | Unit | Type |
| ----------------------------------------- | ----------------------------------------- | ----------- | ------- |
| `nomad.raft.boltdb.numFreePages` | Number of free pages | Integer | Gauge |
| `nomad.raft.boltdb.numPendingPages` | Number of pending pages | Integer | Gauge |
| `nomad.raft.boltdb.freePageBytes` | Number of free page bytes | Integer | Gauge |
| `nomad.raft.boltdb.freelistBytes` | Number of freelist bytes | Integer | Gauge |
| `nomad.raft.boltdb.totalReadTxn` | Count of total read transactions | Integer | Counter |
| `nomad.raft.boltdb.openReadTxn` | Number of currently open read transactions | Integer | Gauge |
| `nomad.raft.boltdb.txstats.pageCount` | Number of page allocations | Integer | Gauge |
| `nomad.raft.boltdb.txstats.pageAlloc` | Number of bytes allocated for pages | Integer | Gauge |
| `nomad.raft.boltdb.txstats.cursorCount` | Count of total database cursors | Integer | Counter |
| `nomad.raft.boltdb.txstats.nodeCount` | Count of total database nodes | Integer | Counter |
| `nomad.raft.boltdb.txstats.nodeDeref` | Count of total database node dereferences | Integer | Counter |
| `nomad.raft.boltdb.txstats.rebalance` | Count of total rebalance operations | Integer | Counter |
| `nomad.raft.boltdb.txstats.rebalanceTime` | Sample of rebalance operation times | Nanoseconds | Summary |
| `nomad.raft.boltdb.txstats.split` | Count of total split operations | Integer | Counter |
| `nomad.raft.boltdb.txstats.spill` | Count of total spill operations | Integer | Counter |
| `nomad.raft.boltdb.txstats.spillTime` | Sample of spill operation times | Nanoseconds | Summary |
| `nomad.raft.boltdb.txstats.write` | Count of total write operations | Integer | Counter |
| `nomad.raft.boltdb.txstats.writeTime` | Sample of write operation times | Nanoseconds | Summary |

[tagged-metrics]: /docs/telemetry/metrics#tagged-metrics
[s_port_plan_failure]: /s/port-plan-failure
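
The Raft BoltDB table mixes gauges, counters, and summaries, and each type is reported with different fields. The following Python sketch assumes the `/v1/metrics` JSON shape (`Gauges` entries carrying `Value`; `Counters` and `Samples` entries carrying aggregate fields such as `Count`, `Sum`, and `Mean`) and flattens every metric under a prefix such as `nomad.raft.boltdb.` into one dictionary. Reading `Sum` for counters and `Mean` for samples is a choice made for this sketch, not something Nomad prescribes; `collect_prefix` and `SAMPLE` are hypothetical names.

```python
from typing import Dict, Optional

# Field read from each section of the assumed JSON summary. Using Sum for
# counters and Mean for samples is this sketch's choice.
FIELDS = (("Gauges", "Value"), ("Counters", "Sum"), ("Samples", "Mean"))


def collect_prefix(summary: dict, prefix: str) -> Dict[str, Optional[float]]:
    """Flatten every gauge, counter, and sample whose name starts with
    `prefix` into a {name: value} dict."""
    out: Dict[str, Optional[float]] = {}
    for section, field in FIELDS:
        for metric in summary.get(section) or []:
            name = metric.get("Name", "")
            if name.startswith(prefix):
                out[name] = metric.get(field)
    return out


# Hypothetical payload; the numbers are invented for illustration.
SAMPLE = {
    "Gauges": [
        {"Name": "nomad.raft.boltdb.numFreePages", "Value": 12, "Labels": {}},
        {"Name": "nomad.raft.appliedIndex", "Value": 4200, "Labels": {}},
    ],
    "Counters": [
        {"Name": "nomad.raft.boltdb.txstats.write", "Count": 3, "Sum": 9.0,
         "Mean": 3.0, "Labels": {}},
    ],
    "Samples": [
        {"Name": "nomad.raft.boltdb.txstats.writeTime", "Count": 3, "Sum": 1.5,
         "Mean": 0.5, "Labels": {}},
    ],
}

boltdb = collect_prefix(SAMPLE, "nomad.raft.boltdb.")
for name in sorted(boltdb):
    print(name, boltdb[name])
```

Counter and sample values describe a single ten-second aggregation interval, so compare successive intervals rather than treating one value as a lifetime total.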