- Moved federation docs to the bottom since *everyone* is potentially
affected by the other sections on the page, but only users of
federation are affected by it.
- Added section on the plan for node rejected bug since it is fairly
easy to diagnose and removing affected nodes is a fairly reliable
workaround.
- Mention 5s cliff for wait_for_index.
- Remove the lie that we do not have job status metrics! How old was
that?!
- Reinforce the importance of monitoring basic system resources
This changeset adds more specific recommendations as to what metrics
to monitor, and what resources should be examined during incident
response.
It also renames the "Telemetry" section to "Monitoring Nomad" to
surface the material better and distinguish it from the "Metric
Reference".
Co-authored-by: Charlie Voiselle <464492+angrycub@users.noreply.github.com>
2021-12-07 15:52:13 -05:00
Renamed from website/content/docs/operations/telemetry.mdx (Browse further)