Commit Graph

15 Commits

Author SHA1 Message Date
John Eikenberry 30d3a087dc
log warning about certificate expiring sooner and with more details
The old setting of 24 hours was not enough time to deal with an expiring certificates. This change ups it to 28 days OR 40% of the full cert duration, whichever is shorter. It also adds details to the log message to indicate which certificate it is logging about and a suggested action.
2023-04-07 20:38:07 +00:00
Ronald dd0e8eec14
copyright headers for agent folder (#16704)
* copyright headers for agent folder

* Ignore test data files

* fix proto files and remove headers in agent/uiserver folder

* ignore deep-copy files
2023-03-28 14:39:22 -04:00
Daniel Nephin 9ec7e07db4 ca: use the new leaf signing lookup func in leader metrics 2022-01-06 16:55:49 -05:00
Daniel Nephin 0a19d7fd76 agent: move agent tls metric monitor to a more appropriate place
And add a test for it
2021-10-27 16:26:09 -04:00
Daniel Nephin 1b2144c982 telemetry: set cert expiry metrics to NaN on start
So that followers do not report 0, which would make alerting difficult.
2021-10-27 15:19:25 -04:00
Daniel Nephin a7fcf14c5c telemetry: fix cert expiry metrics by removing labels
These labels should be set by whatever process scrapes Consul (for
prometheus), or by the agent that receives them (for datadog/statsd).

We need to remove them here because the labels are part of the "metric
key", so we'd have to pre-declare the metrics with the labels. We could
do that, but that is extra work for labels that should be added from
elsewhere.

Also renames the closure to be more descriptive.
2021-10-27 15:19:25 -04:00
Daniel Nephin 4300daa2e6 telemetry: only emit leader cert expiry metrics on the servers 2021-10-27 15:19:25 -04:00
Daniel Nephin 9de725c17d telemetry: prevent stale values from cert monitors
Prometheus scrapes metrics from each process, so when leadership transfers to a different node
the previous leader would still be reporting the old cached value.

By setting NaN, I believe we should zero-out the value, so that prometheus should only consider the
value from the new leader.
2021-10-27 15:19:25 -04:00
Daniel Nephin 616cc9b6f8 telemetry: improve cert expiry metrics
Emit the metric immediately so that after restarting an agent, the new expiry time will be
emitted. This is particularly important when this metric is being monitored, because we want
the alert to resovle itself immediately.

Also fixed a bug that was exposed in one of these metrics. The CARoot can be nil, so we have
to handle that case.
2021-10-27 15:19:25 -04:00
Daniel Nephin 210a850353 telemetry: add log message when certs are about to expire 2021-08-04 14:18:59 -04:00
Daniel Nephin 13aa7b70d5 telemetry: fix a couple bugs in cert expiry metrics
1. do not emit the metric if Query fails
2. properly check for PrimaryUsersIntermediate, the logic was inverted

Also improve the logging by including the metric name in the log message
2021-08-04 13:51:44 -04:00
Daniel Nephin 1673b3a68c telemetry: add a metric for agent TLS cert expiry 2021-08-04 13:51:44 -04:00
Dhia Ayachi e5dbf5e55b
Add ca certificate metrics (#10504)
* add intermediate ca metric routine

* add Gauge config for intermediate cert

* Stop metrics routine when stopping leader

* add changelog entry

* updage changelog

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>

* use variables instead of a map

* go imports sort

* Add metrics for primary and secondary ca

* start metrics routine in the right DC

* add telemetry documentation

* update docs

* extract expiry fetching in a func

* merge metrics for primary and secondary into signing ca metric

Co-authored-by: Daniel Nephin <dnephin@hashicorp.com>
2021-07-07 09:41:01 -04:00
Daniel Nephin e36800cefa Update metric name
and handle the case where there is no active root CA.
2021-06-14 17:01:16 -04:00
Daniel Nephin 548796ae13 connect: emit a metric for the number of seconds until root CA expiration 2021-06-14 16:57:01 -04:00