69ced2a2bd
This PR removes the assertion around when the 'task' field of a check may be set. Starting in Nomad 1.4 we automatically set the task field on all checks in support of the NSD checks feature. This is causing validation problems elsewhere, e.g. when a group service using the Consul provider sets 'task' it will fail validation that worked previously. The assertion of leaving 'task' unset was only about making sure job submitters weren't expecting some behavior, but in practice is causing bugs now that we need the task field for more than it was originally added for. We can simply update the docs, noting when the task field set by job submitters actually has value.
442 lines
15 KiB
Plaintext
---
layout: docs
page_title: check Block - Job Specification
description: |-
  The "check" block declares a service check definition for a service registered into the Nomad or Consul service provider.
---

# `check` Stanza

<Placement
  groups={[
    ['job', 'group', 'service', 'check'],
    ['job', 'group', 'task', 'service', 'check'],
  ]}
/>

The `check` block instructs Nomad to register a check associated with a [service][service]
into the Nomad or Consul service provider.

```hcl
job "example" {
  datacenters = ["dc1"]

  group "cache" {
    network {
      port "db" { to = 6379 }
    }

    service {
      provider = "nomad"
      name     = "redis"
      port     = "db"
      check {
        name     = "redis_probe"
        type     = "tcp"
        interval = "10s"
        timeout  = "1s"
      }
    }

    task "redis" {
      driver = "docker"
      config {
        image = "redis:7"
        ports = ["db"]
      }
    }
  }
}
```

### `check` Parameters

- `address_mode` `(string: "host")` - Same as `address_mode` on `service`.
  Unlike services, checks do not have an `auto` address mode as there's no way
  for Nomad to know which is the best address to use for checks. Consul needs
  access to the address for any HTTP or TCP checks. See
  [below for details.](#using-driver-address-mode) Unlike `port`, this setting
  is _not_ inherited from the `service`.
  If the service `address` is set and the check `address_mode` is not set, the
  service `address` value will be used for the check address.

- `args` `(array<string>: [])` - Specifies additional arguments to the
  `command`. This only applies to script-based health checks.

- `check_restart` - See [`check_restart` stanza][check_restart_stanza].

- `command` `(string: <varies>)` - Specifies the command to run for performing
  the health check. The script must exit: 0 for passing, 1 for warning, or any
  other value for a failing health check. This is required for script-based
  health checks. Only supported in the Consul service provider.

  ~> **Caveat:** The command must be the path to the command on disk, and no
  shell exists by default. That means operators like `||` or `&&` are not
  available. Additionally, all arguments must be supplied via the `args`
  parameter. To achieve the behavior of shell operators, specify the command
  as a shell, like `/bin/bash` and then use `args` to run the check.

- `grpc_service` `(string: <optional>)` - What service, if any, to specify in
  the gRPC health check. gRPC health checks require Consul 1.0.5 or later.

- `grpc_use_tls` `(bool: false)` - Use TLS to perform a gRPC health check. May
  be used with `tls_skip_verify` to use TLS but skip certificate verification.

- `initial_status` `(string: <enum>)` - Specifies the starting status of the
  service. Valid options are `passing`, `warning`, and `critical`. Omitting
  this field (or submitting an empty string) will result in the Consul default
  behavior, which is `critical`. Only supported in the Consul service provider.
  In the Nomad service provider, the initial status of a check is `pending`
  until Nomad produces an initial check status result.

- `success_before_passing` `(int: 0)` - The number of consecutive successful checks
  required before Consul will transition the service status to [`passing`][consul_passfail].
  Only supported in the Consul service provider.

- `failures_before_critical` `(int: 0)` - The number of consecutive failing checks
  required before Consul will transition the service status to [`critical`][consul_passfail].
  Only supported in the Consul service provider.

- `interval` `(string: <required>)` - Specifies the frequency of the health checks
  that Consul or Nomad service provider will perform. This is specified using a label
  suffix like "30s" or "1h". This must be greater than or equal to "1s".

- `method` `(string: "GET")` - Specifies the HTTP method to use for HTTP
  checks. Must be a valid HTTP method.

- `body` `(string: "")` - Specifies the HTTP body to use for HTTP checks.

- `name` `(string: "service: <name> check")` - Specifies the name of the health
  check. If the name is not specified Nomad generates one based on the service name.

- `path` `(string: <varies>)` - Specifies the path of the HTTP endpoint which
  will be queried to observe the health of a service. Nomad will automatically
  add the IP of the service and the port, so this is just the relative URL to
  the health check endpoint. This is required for http-based health checks.

- `expose` `(bool: false)` - Specifies whether an [Expose Path](/docs/job-specification/expose#path-parameters)
  should be automatically generated for this check. Only compatible with
  Connect-enabled task-group services using the default Connect proxy. If set, check
  [`type`][type] must be `http` or `grpc`, and check `name` must be set.
  Only supported in the Consul service provider.

- `port` `(string: <varies>)` - Specifies the label of the port on which the
  check will be performed. Note this is the _label_ of the port and not the port
  number unless `address_mode = driver`. The port label must match one defined
  in the [`network`][network] stanza. If a port value was declared on the
  `service`, this will inherit from that value if not supplied. If supplied,
  this value takes precedence over the `service.port` value. This is useful for
  services which operate on multiple ports. `grpc`, `http`, and `tcp` checks
  require a port while `script` checks do not. Checks will use the host IP and
  ports by default. In Nomad 0.7.1 or later numeric ports may be used if
  `address_mode="driver"` is set on the check.

- `protocol` `(string: "http")` - Specifies the protocol for the http-based
  health checks. Valid options are `http` and `https`.

- `task` `(string: "")` - Specifies the task associated with this
  check. Scripts are executed within the task's environment, and
  `check_restart` stanzas will apply to the specified task. Inherits
  the [`service.task`][service_task] value if not set. Must be unset
  or equivalent to `service.task` in task-level services.

- `timeout` `(string: <required>)` - Specifies how long to wait for a health check
  query to succeed. This is specified using a label suffix like "30s" or "1h". This
  must be greater than or equal to "1s".

  ~> **Caveat:** Script checks use the task driver to execute in the task's
  environment. For task drivers with namespace isolation such as `docker` or
  `exec`, setting up the context for the script check may take an unexpectedly
  long amount of time (a full second or two), especially on busy hosts. The
  timeout configuration must allow for both this setup and the execution of
  the script. Operators should use long timeouts (5 or more seconds) for script
  checks, and monitor telemetry for
  `client.allocrunner.taskrunner.tasklet_timeout`.

- `type` `(string: <required>)` - This indicates the check types supported by
  Nomad. For Consul service checks, valid options are `grpc`, `http`, `script`,
  and `tcp`. For Nomad service checks, valid options are `http` and `tcp`.

- `tls_skip_verify` `(bool: false)` - Skip verifying TLS certificates for HTTPS
  checks. Only supported in the Consul service provider.

- `on_update` `(string: "require_healthy")` - Specifies how checks should be
  evaluated when determining deployment health (including a job's initial
  deployment). This allows job submitters to define certain checks as readiness
  checks, progressing a deployment even if the Service's checks are not yet
  healthy. Checks inherit the Service's value by default. The check status is
  not altered in the service provider and is only used to determine the check's
  health during an update.

  - `require_healthy` - In order for Nomad to consider the check healthy during
    an update it must report as healthy.

  - `ignore_warnings` - If a Service Check reports as warning, Nomad will treat
    the check as healthy. The Check will still be in a warning state in Consul.

  - `ignore` - Any status will be treated as healthy.

  ~> **Caveat:** `on_update` is only compatible with certain
  [`check_restart`][check_restart_stanza] configurations. `on_update = "ignore_warnings"` requires that `check_restart.ignore_warnings = true`.
  `check_restart` can however specify `ignore_warnings = true` with `on_update = "require_healthy"`. If `on_update` is set to `ignore`, `check_restart` must
  be omitted entirely.

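As a rough sketch of the compatibility rule in the `on_update` caveat, the following check pairs `on_update = "ignore_warnings"` with a `check_restart` block that also sets `ignore_warnings = true`. The service name, port label, check name, path, and the `limit` and `grace` values are illustrative assumptions, not recommendations.

```hcl
service {
  name = "web"  # hypothetical service name
  port = "http" # hypothetical port label

  check {
    name      = "web_health" # hypothetical check name
    type      = "http"
    path      = "/health"    # hypothetical endpoint
    interval  = "10s"
    timeout   = "2s"
    on_update = "ignore_warnings"

    check_restart {
      limit           = 3     # illustrative value
      grace           = "30s" # illustrative value
      ignore_warnings = true  # needed alongside on_update = "ignore_warnings"
    }
  }
}
```

If `on_update` were set to `ignore` instead, the `check_restart` block would have to be removed entirely, per the caveat.
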
#### `header` Stanza

HTTP checks may include a `header` stanza to set HTTP headers. The `header`
stanza parameters have lists of strings as values. Multiple values will cause
the header to be set multiple times, once for each value.

```hcl
service {
  # ...
  check {
    type     = "http"
    port     = "lb"
    path     = "/_healthz"
    interval = "5s"
    timeout  = "2s"
    header {
      Authorization = ["Basic ZWxhc3RpYzpjaGFuZ2VtZQ=="]
    }
  }
}
```

### HTTP Health Check

This example shows a service with an HTTP health check. This will query the
service on the IP and port registered with Nomad at `/_healthz` every 5 seconds,
giving the service a maximum of 2 seconds to return a response, and include an
Authorization header. Any non-2xx code is considered a failure.

```hcl
service {
  check {
    type     = "http"
    port     = "lb"
    path     = "/_healthz"
    interval = "5s"
    timeout  = "2s"
    header {
      Authorization = ["Basic ZWxhc3RpYzpjaGFuZ2VtZQ=="]
    }
  }
}
```

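When using the Consul service provider, the `success_before_passing` and `failures_before_critical` parameters can be added to an HTTP check like the one in this section. The following is a minimal sketch; the service name and the threshold values are assumptions chosen only for illustration.

```hcl
service {
  name     = "lb-web" # hypothetical service name
  port     = "lb"
  provider = "consul" # these thresholds are only supported with Consul

  check {
    type     = "http"
    path     = "/_healthz"
    interval = "5s"
    timeout  = "2s"

    # Illustrative thresholds: consecutive results required before
    # Consul transitions the status to passing or critical.
    success_before_passing   = 3
    failures_before_critical = 2
  }
}
```
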
### Multiple Health Checks

This example shows a service with multiple health checks defined. All health
checks must be passing in order for the service to register as healthy.

```hcl
service {
  check {
    name     = "HTTP Check"
    type     = "http"
    port     = "lb"
    path     = "/_healthz"
    interval = "5s"
    timeout  = "2s"
  }

  check {
    name     = "HTTPS Check"
    type     = "http"
    protocol = "https"
    port     = "lb"
    path     = "/_healthz"
    interval = "5s"
    timeout  = "2s"
    method   = "POST"
  }

  check {
    name      = "Postgres Check"
    type      = "script"
    command   = "/usr/local/bin/pg-tools"
    args      = ["verify", "database", "prod", "up"]
    interval  = "5s"
    timeout   = "2s"
    on_update = "ignore_warnings"
  }
}
```

### gRPC Health Check

gRPC health checks use the same host and port behavior as `http` and `tcp`
checks, but gRPC checks also have an optional gRPC service to health check. Not
all gRPC applications require a service to health check.

```hcl
service {
  check {
    type            = "grpc"
    port            = "rpc"
    interval        = "5s"
    timeout         = "2s"
    grpc_service    = "example.Service"
    grpc_use_tls    = true
    tls_skip_verify = true
  }
}
```

In this example Consul would health check the `example.Service` service on the
`rpc` port defined in the task's [network resources][network] stanza. See
[Using Driver Address Mode](#using-driver-address-mode) for details on address
selection.

### Script Checks with Shells

Note that script checks run inside the task. If your task is a Docker container,
the script will run inside the Docker container. If your task is running in a
chroot, it will run in the chroot. Please keep this in mind when authoring check
scripts.

This example shows a service with a script check that is evaluated and interpolated in a shell; it
tests whether a file is present at the path stored in the `HEALTH_CHECK_FILE` environment variable:

```hcl
service {
  check {
    type    = "script"
    command = "/bin/bash"
    args    = ["-c", "test -f ${HEALTH_CHECK_FILE}"]
  }
}
```

Using `/bin/bash` (or another shell) is required here to interpolate the `${HEALTH_CHECK_FILE}` value.

The following examples of `command` fields **will not work**:

```hcl
# invalid because command is not a path
check {
  type    = "script"
  command = "test -f /tmp/file.txt"
}

# invalid because path will not be interpolated
check {
  type    = "script"
  command = "/bin/test"
  args    = ["-f", "${HEALTH_CHECK_FILE}"]
}
```

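Because script checks run inside a task, a script check on a group-level service can name that task through the `task` parameter (or inherit it from `service.task`). The sketch below assumes a group with a task named `redis` and a port labelled `db`; the service name, check name, and script arguments are likewise illustrative, and it assumes `/bin/sh` and `redis-cli` are available inside the task's container. The 5 second timeout follows the guidance in the `timeout` caveat.

```hcl
group "cache" {
  network {
    port "db" { to = 6379 }
  }

  service {
    name     = "redis"  # hypothetical service name
    port     = "db"
    provider = "consul" # script checks are only supported with Consul

    check {
      name     = "redis_ping" # hypothetical check name
      type     = "script"
      task     = "redis"      # run the script in this task's environment
      command  = "/bin/sh"    # assumes a shell exists in the task image
      args     = ["-c", "redis-cli ping"]
      interval = "10s"
      timeout  = "5s"         # long timeout to allow for script setup
    }
  }

  task "redis" {
    driver = "docker"
    config {
      image = "redis:7"
      ports = ["db"]
    }
  }
}
```
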
### Healthiness vs. Readiness Checks

Multiple checks for a service can be composed to create healthiness and readiness
checks by configuring [`on_update`][on_update] for the check.

```hcl
service {
  # This is a healthiness check that will be used to verify the service
  # is responsive to tcp connections and behaving as expected.
  check {
    name     = "connection_tcp"
    type     = "tcp"
    port     = 6379
    interval = "10s"
    timeout  = "2s"
  }

  # This is a readiness check that is used to verify that, for example, the
  # application has elected a leader by making a request to its /leader endpoint.
  # Failures of this check are ignored during deployments.
  check {
    name      = "leader_elected"
    type      = "http"
    path      = "/leader"
    interval  = "10s"
    timeout   = "2s"
    on_update = "ignore_warnings"
  }
}
```

For checks registered into the Nomad service provider, the status information will
indicate `Mode = readiness` for readiness checks and `Mode = healthiness` for health
checks.

### Check status on CLI

For checks registered into the Nomad service provider, the status information of
checks can be viewed per-allocation. The `alloc status` command now includes
summary information for Nomad service checks.

```
➜ nomad alloc status <allocation-id>
```

```
Nomad Service Checks:
Service   Task     Name          Mode         Status
database  task     db_tcp_probe  readiness    success
web       (group)  healthz       healthiness  failure
web       (group)  index-page    healthiness  success
```

The `alloc checks` command can be used for viewing complete check status information
for all checks in an allocation.

```
➜ nomad alloc checks <allocation-id>
```

```
Status of 3 Nomad Service Checks

ID         = d8651d93a50b9e28375a7beb9418c418
Name       = db_tcp_probe
Group      = example.group[0]
Task       = task
Service    = database
Status     = success
Mode       = readiness
Timestamp  = 2022-08-22T10:41:23-05:00
Output     = nomad: tcp ok

ID         = 0413b61bda7014f02671675d7e146373
Name       = index-page
Group      = example.group[0]
Task       = (group)
Service    = web
Status     = success
StatusCode = 200
Mode       = healthiness
Timestamp  = 2022-08-22T10:41:23-05:00
Output     = nomad: http ok

ID         = c3cce3f0c97975f84bbf39bdd50deaea
Name       = healthz
Group      = example.group[0]
Task       = (group)
Service    = web
Status     = failure
Mode       = healthiness
Timestamp  = 2022-08-22T10:41:23-05:00
Output     = nomad: Get "http://:9999/": dial tcp :9999: connect: connection refused
```

---

<sup>
  <small>1</small>
</sup>
<small>
  {' '}
  Script checks are not supported for the QEMU driver since the Nomad client
  does not have access to the file system of a task for that driver.
</small>

[check_restart_stanza]: /docs/job-specification/check_restart
[consul_passfail]: https://www.consul.io/docs/discovery/checks#success-failures-before-passing-critical
[network]: /docs/job-specification/network 'Nomad network Job Specification'
[service]: /docs/job-specification/service
[service_task]: /docs/job-specification/service#task-1
[on_update]: /docs/job-specification/service#on_update