open-nomad/client/allochealth
Seth Hoenig d2d8ebbeba
consul: correctly interpret missing consul checks as unhealthy (#15822)
* consul: correctly understand missing consul checks as unhealthy

This PR fixes a bug where Nomad assumed any registered Checks would exist
in the service registration coming back from Consul. In some cases, the
Consul may be slow in processing the check registration, and the response
object would not contain checks. Nomad would then scan the empty response
looking for Checks with failing health status, finding none, and then
marking a task/alloc as healthy.

In reality, we must always use Nomad's view of what checks should exist as
the source of truth, and compare that with the response Consul gives us,
making sure they match, before scanning the Consul response for failing
check statuses.

Fixes #15536

* consul: minor CR refactor using maps not sets

* consul: observe transition from healthy to unhealthy checks

* consul: spell healthy correctly
2023-01-19 14:01:12 -06:00
..
tracker.go consul: correctly interpret missing consul checks as unhealthy (#15822) 2023-01-19 14:01:12 -06:00
tracker_test.go consul: correctly interpret missing consul checks as unhealthy (#15822) 2023-01-19 14:01:12 -06:00