Commit Graph

27 Commits

Author SHA1 Message Date
hashicorp-copywrite[bot] 005636afa0 [COMPLIANCE] Add Copyright and License Headers 2023-04-10 15:36:59 +00:00
Luiz Aoqui 7305a374e3
allocrunner: fix health check monitoring for Consul services (#16402)
Services must be interpolated to replace runtime variables before they
can be compared against the values returned by Consul.
2023-03-10 14:43:31 -05:00
Seth Hoenig d2d8ebbeba
consul: correctly interpret missing consul checks as unhealthy (#15822)
* consul: correctly understand missing consul checks as unhealthy

This PR fixes a bug where Nomad assumed any registered Checks would exist
in the service registration coming back from Consul. In some cases, the
Consul may be slow in processing the check registration, and the response
object would not contain checks. Nomad would then scan the empty response
looking for Checks with failing health status, finding none, and then
marking a task/alloc as healthy.

In reality, we must always use Nomad's view of what checks should exist as
the source of truth, and compare that with the response Consul gives us,
making sure they match, before scanning the Consul response for failing
check statuses.

Fixes #15536

* consul: minor CR refactor using maps not sets

* consul: observe transition from healthy to unhealthy checks

* consul: spell healthy correctly
2023-01-19 14:01:12 -06:00
Michael Schurter b2d22aef65
2 small data race fixes in logmon and check tests (#14538)
* logmon: fix data race around oldestLogFileIdx

* checks: fix 2 data races in tests

* logmon: move & rename lock to logically group
2022-09-13 12:54:06 -07:00
Seth Hoenig b3ea68948b build: run gofmt on all go source files
Go 1.19 will forecefully format all your doc strings. To get this
out of the way, here is one big commit with all the changes gofmt
wants to make.
2022-08-16 11:14:11 -05:00
Seth Hoenig 606e3ebdd4 client: updates from pr feedback 2022-07-21 09:54:27 -05:00
Seth Hoenig 297d386bdc client: add support for checks in nomad services
This PR adds support for specifying checks in services registered to
the built-in nomad service provider.

Currently only HTTP and TCP checks are supported, though more types
could be added later.
2022-07-12 17:09:50 -05:00
Radek Simko 9cc71d6665
client/allochealth: add healthy_deadline as context to error messages (#13214) 2022-06-06 10:11:08 -04:00
James Rasell a646333263
Merge branch 'main' into f-1.3-boogie-nights 2022-03-23 09:41:25 +01:00
Seth Hoenig 2631659551 ci: swap ci parallelization for unconstrained gomaxprocs 2022-03-15 12:58:52 -05:00
James Rasell 7cd28a6fb6
client: refactor common service registration objects from Consul.
This commit performs refactoring to pull out common service
registration objects into a new `client/serviceregistration`
package. This new package will form the base point for all
client specific service registration functionality.

The Consul specific implementation is not moved as it also
includes non-service registration implementations; this reduces
the blast radius of the changes as well.
2022-03-15 09:38:30 +01:00
Samantha 54f8c04c91
Fix health checking for ephemeral poststart tasks (#11945)
Update the logic in the Nomad client's alloc health tracker which
erroneously marks existing healthy allocations with dead poststart ephemeral
tasks as unhealthy even if they were already successful during a previous
deployment.
2022-02-02 16:29:49 -05:00
Drew Bailey 8507d54e3b
e2e test for on_update service checks
check_restart not compatible with on_update=ignore

reword caveat
2021-02-08 08:32:40 -05:00
Drew Bailey 82f971f289
OnUpdate configuration for services and checks
Allow for readiness type checks by configuring nomad to ignore warnings
or errors reported by a service check. This allows the deployment to
progress and while Consul handles introducing the sercive into a
resource pool once the check passes.
2021-02-08 08:32:40 -05:00
Tim Gross d55e3e2018 lifecycle: successful prestart tasks should not fail deployments
In 492d62d we prevented poststop tasks from contributing to allocation health
status, which fixed a bug where poststop tasks would prevent a deployment from
ever being marked successful. The patch introduced a regression where prestart
tasks that complete are causing the allocation to be marked unhealthy. This
changeset restores the previous behavior for prestart tasks.
2021-01-13 11:40:21 -05:00
Drew Bailey 03a9541822
ignore poststop task in alloc health tracker (#9548), fixes #9361
* investigating where to ignore poststop task in alloc health tracker

* ignore poststop when setting latest start time for allocation

* clean up logic

* lifecycle: isolate mocks for poststop deployment test

* lifecycle: update comments in tracker

Co-authored-by: Jasmine Dahilig <jasmine@dahilig.com>
2021-01-12 10:03:48 -08:00
Mahmood Ali 0ece631e60 allochealth: Fix when check health preceeds task health
Fix a bug where if the alloc check becomes healthy before the task health, the
alloc may never be considered healthy.
2020-05-13 07:44:39 -04:00
Mahmood Ali 934c5e8ff0 tests: tests for health check sequencing
Add a failing tests to show that if an alloc checks is marked healthy before the
alloc tasks start up, the alloc may be forever considered unhealthy.
2020-05-13 07:43:00 -04:00
Mahmood Ali fa1244f8c5 health tracker: account for group service checks 2020-03-22 12:38:37 -04:00
Mahmood Ali d61140dcac health check account for task lifecycle
In service jobs, lifecycles non-sidecar task tweak health logic a bit:
they may terminate successfully without impacting alloc health, but fail
the alloc if they fail.

Sidecars should be treated just like a normal task.
2020-03-22 12:37:40 -04:00
Mahmood Ali 07a30580ac health: fail health if any task is pending
Fixes a bug where an allocation is considered healthy if some of the
tasks are being restarted and as such, their checks aren't tracked by
consul agent client.

Here, we fix the immediate case by ensuring that an alloc is healthy
only if tasks are running and the registered checks at the time are
healthy.

Previously, health tracker tracked task "health" independently from
checks and leads to problems when a task restarts.  Consider the
following series of events:

1. all tasks start running -> `tracker.tasksHealthy` is true
2. one task has unhealthy checks and get restarted
3. remaining checks are healthy -> `tracker.checksHealthy` is true
4. propagate health status now that `tracker.tasksHealthy` and
`tracker.checksHealthy`.

This change ensures that we accurately use the latest status of tasks
and checks regardless of their status changes.

Also, ensures that we only consider check health after tasks are
considered healthy, otherwise we risk trusting incomplete checks.

This approach accomodates task dependencies well.  Service jobs can have
prestart short-lived tasks that will terminate before main process runs.
These dead tasks that complete successfully will not negate health
status.
2020-03-22 11:13:41 -04:00
Nick Ethier bd454a4c6f
client: improve group service stanza interpolation and check_re… (#6586)
* client: improve group service stanza interpolation and check_restart support

Interpolation can now be done on group service stanzas. Note that some task runtime specific information
that was previously available when the service was registered poststart of a task is no longer available.

The check_restart stanza for checks defined on group services will now properly restart the allocation upon
check failures if configured.
2019-11-18 13:04:01 -05:00
Michael Schurter fb487358fb
connect: add group.service stanza support 2019-07-31 01:04:05 -04:00
Michael Schurter 159042a1a3 client: fix setting alloc unhealthy at deadline
During the 0.9 client refactor the code to fail a deployment when the
deadline was reached was broken. This restores and tests that behavior.
2019-02-19 07:44:14 -08:00
Michael Schurter 4e7ea460e8 test: port some pre-0.9 DeploymentHealth tests
Skipping a failing one as I need to move to some other work and don't
want to leave this work orphaned on my machine.
2019-01-14 09:56:53 -08:00
Michael Schurter 944ea6d38b client: emit last sent alloc to new listeners
Fixes a deadlock where the allocwatcher would block forever waiting for
an update from a terminal alloc.

Made the broadcaster easier to debug as well.
2018-11-27 14:06:08 -08:00
Michael Schurter 4136e59f79 arv2: implement alloc health watching
Also remove initial alloc from broadcaster as it just caused useless
extra processing.
2018-10-16 16:53:30 -07:00