The shortlink /s/port-plan-failure is logged when a plan for a node is
rejected to help users debug and mitigate repeated `plan for node
rejected` failures.
The current link to #9506 is... less than useful. It is not clear to
users what steps they should take to either fix their cluster or
contribute to the issue.
While .../monitoring-nomad#progess isn't as comprehensive as it could
be, it's a much more gentle introduction to the class of bug than the
original issue.
Use the new filtering and pagination capabilities of the `Eval.List`
RPC to provide filtering and pagination at the command line.
Also includes note that `nomad eval status -json` is deprecated and
will be replaced with a single evaluation view in a future version of
Nomad.
This changeset adds more specific recommendations as to what metrics
to monitor, and what resources should be examined during incident
response.
It also renames the "Telemetry" section to "Monitoring Nomad" to
surface the material better and distinguish it from the "Metric
Reference".
Co-authored-by: Charlie Voiselle <464492+angrycub@users.noreply.github.com>
Create a new "Operating Nomad" section of the docs where we can put reference
material for operators that doesn't quite fit in either configuration file /
command line documentation or a step-by-step Learn Guide. Pre-populate this
with the existing telemetry docs and some links out to the Learn Guide
sections.