---
layout: docs
page_title: check_restart Block - Job Specification
description: |-
  The "check_restart" block instructs Nomad when to restart tasks with
  unhealthy service checks.
---
# `check_restart` Block
<Placement
  groups={[
    ['job', 'group', 'task', 'service', 'check_restart'],
    ['job', 'group', 'task', 'service', 'check', 'check_restart'],
  ]}
/>

The `check_restart` block instructs Nomad when to restart tasks with unhealthy
service checks. When a health check in Nomad or Consul has been unhealthy for the `limit`
specified in a `check_restart` block, the task is restarted according to the task group's
[`restart` policy][restart_block]. The `check_restart` settings apply to
[`check`s][check_block], but may also be placed on [`service`s][service_block]
to apply to all checks on a service. If `check_restart` is set on both the check
and service, the blocks are merged with the check values taking precedence.
```hcl
job "mysql" {
  group "mysqld" {

    restart {
      attempts = 3
      delay    = "10s"
      interval = "10m"
      mode     = "fail"
    }

    task "server" {

      service {
        tags = ["leader", "mysql"]

        port = "db"

        check {
          type     = "tcp"
          port     = "db"
          interval = "10s"
          timeout  = "2s"
        }

        check {
          type     = "script"
          name     = "check_table"
          command  = "/usr/local/bin/check_mysql_table_status"
          args     = ["--verbose"]
          interval = "60s"
          timeout  = "5s"

          check_restart {
            limit           = 3
            grace           = "90s"
            ignore_warnings = false
          }
        }
      }
    }
  }
}
```
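The example above sets `check_restart` on a single check. As a minimal sketch of the service-level placement and the merge rule described earlier, the fragment below puts one `check_restart` on the service and overrides only `limit` on one check. The service name, port label, and health path are hypothetical and chosen for illustration only.

```hcl
service {
  name = "example-api" # hypothetical service name
  port = "http"        # assumes a port labeled "http" exists in the group

  # Service-level block: applies to every check on this service.
  check_restart {
    limit = 3
    grace = "90s"
  }

  # No check-level block, so this check uses the service-level values.
  check {
    type     = "tcp"
    port     = "http"
    interval = "10s"
    timeout  = "2s"
  }

  check {
    type     = "http"
    path     = "/health" # hypothetical endpoint
    interval = "30s"
    timeout  = "5s"

    # Check-level block: values set here take precedence over the
    # service-level block when the two are merged.
    check_restart {
      limit = 5
    }
  }
}
```

Under the merge rule, the HTTP check in this sketch would be expected to use `limit = 5` from its own block while keeping the `grace` value set on the service.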
- `limit` `(int: 0)` - Restart the task when a health check has failed `limit`
  times. For example, `1` causes a restart on the first failure. The default,
  `0`, disables health check based restarts. Failures must be consecutive. A
  single passing check resets the count, so flapping services may not be
  restarted.

- `grace` `(string: "1s")` - Duration to wait after a task starts or restarts
  before checking its health.

- `ignore_warnings` `(bool: false)` - By default, checks with either a `critical`
  or a `warning` status are considered unhealthy. Setting `ignore_warnings = true`
  treats a `warning` status like `passing`, so it will not trigger a restart. Only
  available in the Consul service provider.
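
As a sketch of how these parameters combine, the fragment below shows a check that tolerates `warning` results and restarts the task only after two consecutive `critical` results. The check type, path, and timings are illustrative assumptions, and the fragment assumes the service uses the Consul provider, since `ignore_warnings` is only available there.

```hcl
check {
  type     = "http"
  path     = "/health" # hypothetical endpoint
  interval = "10s"
  timeout  = "2s"

  check_restart {
    limit           = 2     # restart after two consecutive failures
    grace           = "30s" # do not evaluate health for 30s after (re)start
    ignore_warnings = true  # a "warning" status is treated like "passing"
  }
}
```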
## Example Behavior
The example `mysql` job above behaves as follows:
```hcl
check_restart {
  # ...
  grace = "90s"
  # ...
}
```
When the `server` task first starts and is registered in Consul, its health
will not be checked for 90 seconds. This gives the server time to start up.
```hcl
check_restart {
  limit = 3
  # ...
}
```
After the grace period, if the script check fails, it has 180 seconds (`60s interval * 3 limit`)
to pass before a restart is triggered. Once a restart is triggered, the task group's
[`restart` policy][restart_block] takes control:
```hcl
restart {
  # ...
  delay = "10s"
  # ...
}
```
The [`restart` block][restart_block] controls the restart behavior of the
task. In this case, it will stop the task and then wait 10 seconds before
starting it again.

Once the task restarts, Nomad waits the `grace` period again before starting to
check the task's health.
```hcl
restart {
  attempts = 3
  # ...
  interval = "10m"
  mode     = "fail"
}
```
If the check continues to fail, the task will be restarted up to `attempts`
times within an `interval`. If the `restart` attempts are reached within the
`interval`, then the `mode` controls the behavior. In this case, the task would fail
and not be restarted again. See the [`restart` block][restart_block] for
details.

[check_block]: /nomad/docs/job-specification/service#check-parameters 'check block'
[gh-9176]: https://github.com/hashicorp/nomad/issues/9176
[restart_block]: /nomad/docs/job-specification/restart 'restart block'
[service_block]: /nomad/docs/job-specification/service 'service block'