Commit graph

12 commits

Author SHA1 Message Date
Derek Strickland d7f44448e1 disconnected clients: Observability plumbing (#12141)
* Add disconnects/reconnect to log output and emit reschedule metrics

* TaskGroupSummary: Add Unknown, update StateStore logic, add to metrics
2022-04-05 17:12:23 -04:00
Seth Hoenig de95998faa core: switch to go.etc.io/bbolt
This PR swaps the underlying BoltDB implementation from boltdb/bolt
to go.etc.io/bbolt.

In addition, the Server has a new configuration option for disabling
NoFreelistSync on the underlying database.

Freelist option: https://github.com/etcd-io/bbolt/blob/master/db.go#L81
Consul equivelent PR: https://github.com/hashicorp/consul/pull/11720
2022-02-23 14:26:41 -06:00
Luiz Aoqui 626e633b41
docs: add nomad.plan.node_rejected metric (#11860) 2022-01-18 13:47:20 -05:00
Tim Gross 32f150d469
docs: new scheduler metrics (#11790)
* Fixed name of `nomad.scheduler.allocs.reschedule` metric
* Added new metrics to metrics reference documentation
* Expanded definitions of "waiting" metrics
* Changelog entry for #10236 and #10237
2022-01-07 09:51:15 -05:00
Tim Gross 348f482c94
docs: improve docs for troubleshooting and monitoring scheduler (#11623)
This changeset adds more specific recommendations as to what metrics
to monitor, and what resources should be examined during incident
response.

It also renames the "Telemetry" section to "Monitoring Nomad" to
surface the material better and distinguish it from the "Metric
Reference".

Co-authored-by: Charlie Voiselle <464492+angrycub@users.noreply.github.com>
2021-12-07 15:52:13 -05:00
James Rasell d44e5620dd
docs: add license expiry metric to metrics website doc. 2021-12-07 10:31:51 +00:00
kfenech1 26a0158ead
docs: nomad.client.unallocated.memory is in Megabytes not bytes (#11468) 2021-11-08 11:05:11 -05:00
Michael Schurter fff95b0697
docs: improve wait_for_index metrics description (#10717)
Old description of `{plan,worker}.wait_for_index` described the metric
in terms of waiting for a snapshot which has two problems:

1. "Snapshot" is an overloaded term in Nomad and operators can't be
   expected to know which use we're referring to here.
2. The most important thing about the metric is what we're waiting *on*
   before taking a snapshot: the raft index of the object to be
   processed (plan or eval).

The new description tries to cram all of that context into the tiny
space provided.

See #5791 for details about the `wait_for_index` mechanism in general.
2021-06-09 08:53:06 -04:00
Luiz Aoqui f1b9055d21
Add metrics for blocked eval resources (#10454)
* add metrics for blocked eval resources

* docs: add new blocked_evals metrics

* fix to call `pruneStats` instead of `stats.prune` directly
2021-04-29 15:03:45 -04:00
Bryce Kalow a6ca40fa4e
feat(website): migrates to new nav data format (#10264) 2021-03-31 08:43:17 -05:00
Tim Gross cf052cfee5 docs: add metrics from raft leadership transitions 2021-01-27 11:50:11 -05:00
Jeff Escalante eaaafd9dd4
implement mdx remote 2021-01-05 19:02:39 -05:00