---
layout: docs
page_title: check_restart Stanza - Job Specification
sidebar_title: check_restart
description: |-
  The "check_restart" stanza instructs Nomad when to restart tasks with
  unhealthy service checks.
---
# `check_restart` Stanza

<Placement
  groups={[
    ['job', 'group', 'task', 'service', 'check_restart'],
    ['job', 'group', 'task', 'service', 'check', 'check_restart'],
  ]}
/>

~> The `check_restart` stanza in Nomad is only supported for task networks,
*not* group networks. Please follow [#9176][gh-9176] to be notified when
this is fixed.

As of Nomad 0.7, the `check_restart` stanza instructs Nomad when to restart
tasks with unhealthy service checks. When a health check in Consul has failed
the number of consecutive times set by `limit` in a `check_restart` stanza, the
task is restarted according to the task group's [`restart` policy][restart_stanza].
The `check_restart` settings apply to [`check`s][check_stanza], but may also be
placed on [`service`s][service_stanza] to apply to all checks on a service.
If `check_restart` is set on both the check and the service, the stanzas are
merged, with the check values taking precedence.
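The fragment below is a sketch of that merge behavior. It places a `check_restart` stanza on a service so that it applies to both of the service's checks, and overrides only `limit` on the second check. Assuming that values a check does not set are inherited from the service-level stanza, the HTTP check would restart the task after a single failure with a 90 second grace period, while the TCP check would use both service-level values. The task, port, check names, and `/health` path are hypothetical, and the task's driver and network configuration are omitted.

```hcl
task "web" {
  service {
    port = "http"

    # Applies to every check on this service unless a check overrides it.
    check_restart {
      limit = 3
      grace = "90s"
    }

    check {
      name     = "web-tcp" # hypothetical name
      type     = "tcp"
      port     = "http"
      interval = "10s"
      timeout  = "2s"
    }

    check {
      name     = "web-health" # hypothetical name
      type     = "http"
      port     = "http"
      path     = "/health" # hypothetical endpoint
      interval = "10s"
      timeout  = "2s"

      # Merged with the service-level stanza: this check's limit wins,
      # and grace ("90s") is assumed to be inherited from the service.
      check_restart {
        limit = 1
      }
    }
  }
}
```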
```hcl
job "mysql" {
  group "mysqld" {
    restart {
      attempts = 3
      delay    = "10s"
      interval = "10m"
      mode     = "fail"
    }

    task "server" {
      service {
        tags = ["leader", "mysql"]

        port = "db"

        check {
          type     = "tcp"
          port     = "db"
          interval = "10s"
          timeout  = "2s"
        }

        check {
          type     = "script"
          name     = "check_table"
          command  = "/usr/local/bin/check_mysql_table_status"
          args     = ["--verbose"]
          interval = "60s"
          timeout  = "5s"

          check_restart {
            limit           = 3
            grace           = "90s"
            ignore_warnings = false
          }
        }
      }
    }
  }
}
```
- `limit` `(int: 0)` - Restart the task when a health check has failed `limit`
  times. For example, `1` causes a restart on the first failure. The default,
  `0`, disables health-check-based restarts. Failures must be consecutive. A
  single passing check will reset the count, so flapping services may not be
  restarted.

- `grace` `(string: "1s")` - Duration to wait after a task starts or restarts
  before checking its health.

- `ignore_warnings` `(bool: false)` - By default, checks with either a
  `critical` or a `warning` status are considered unhealthy. Setting
  `ignore_warnings = true` treats a `warning` status like `passing`, so it will
  not trigger a restart.
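As an illustration of how these parameters combine, the stanza below would restart the task on the first `critical` result while tolerating `warning` statuses. The values are illustrative only, not recommendations.

```hcl
check_restart {
  # Restart after a single failed check; 0 would disable check-based restarts.
  limit = 1

  # Wait 30s after the task starts or restarts before checking its health.
  grace = "30s"

  # Treat "warning" like "passing"; only "critical" counts as a failure.
  ignore_warnings = true
}
```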
## Example Behavior

The `mysql` example above results in the following behavior:
```hcl
check_restart {
  # ...
  grace = "90s"
  # ...
}
```
When the `server` task first starts and is registered in Consul, its health
will not be checked for 90 seconds. This gives the server time to start up.
```hcl
check_restart {
  limit = 3
  # ...
}
```
After the grace period, if the script check fails, it has 180 seconds
(`60s interval * 3 limit`) to pass before a restart is triggered. Once a restart
is triggered, the task group's [`restart` policy][restart_stanza] takes control:
```hcl
restart {
  # ...
  delay = "10s"
  # ...
}
```
The [`restart` stanza][restart_stanza] controls the restart behavior of the
task. In this case, Nomad stops the task and then waits 10 seconds before
starting it again.

Once the task restarts, Nomad waits for the `grace` period again before it
starts checking the task's health.
```hcl
restart {
  attempts = 3
  # ...
  interval = "10m"
  mode     = "fail"
}
```
If the check continues to fail, the task will be restarted up to `attempts`
times within an `interval`. If the `restart` attempts are exhausted within the
`interval`, the `mode` controls what happens next. In this case, the task fails
and is not restarted again. See the [`restart` stanza][restart_stanza] for
details.

[check_stanza]: /docs/job-specification/service#check-parameters 'check stanza'
[gh-9176]: https://github.com/hashicorp/nomad/issues/9176
[restart_stanza]: /docs/job-specification/restart 'restart stanza'
[service_stanza]: /docs/job-specification/service 'service stanza'