---
layout: docs
page_title: Monitor Services - Check Definitions
description: >-
  One of the primary roles of the agent is management of system- and
  application-level health checks. A health check is considered to be
  application-level if it is associated with a service. A check is defined in a
  configuration file or added at runtime over the HTTP interface.
---

# Checks

One of the primary roles of the agent is management of system-level and application-level health
checks. A health check is considered to be application-level if it is associated with a
service. If not associated with a service, the check monitors the health of the entire node.
Review the [health checks tutorial](https://learn.hashicorp.com/tutorials/consul/service-registration-health-checks) for a more complete example of how to use health check capabilities in Consul.

A check is defined in a configuration file or added at runtime over the HTTP interface. Checks
created via the HTTP interface persist with that node.

There are several different kinds of checks:

- `Script + Interval` - These checks depend on invoking an external application
  that performs the health check, exits with an appropriate exit code, and potentially
  generates some output. A script is paired with an invocation interval (e.g.
  every 30 seconds). This is similar to the Nagios plugin system. The output of
  a script check is limited to 4KB. Output larger than this will be truncated.
  By default, script checks are configured with a timeout of 30 seconds.
  To set a custom timeout, specify the `timeout` field in the check definition.
  When the timeout is reached on Windows,
  Consul will wait for any child processes spawned by the script to finish. For any
  other system, Consul will attempt to force-kill the script and any child processes
  it has spawned once the timeout has passed.
  In Consul 0.9.0 and later, script checks are not enabled by default. To use them,
  set one of the following options:

  - [`enable_local_script_checks`](/docs/agent/config/cli-flags#_enable_local_script_checks):
    enable script checks defined in local config files. Script checks defined via the HTTP
    API will not be allowed.
  - [`enable_script_checks`](/docs/agent/config/cli-flags#_enable_script_checks): enable
    script checks regardless of how they are defined.

  ~> **Security Warning:** Enabling script checks in some configurations may
  introduce a remote execution vulnerability which is known to be targeted by
  malware. We strongly recommend `enable_local_script_checks` instead. See [this
  blog post](https://www.hashicorp.com/blog/protecting-consul-from-rce-risk-in-specific-configurations)
  for more details.

- `HTTP + Interval` - These checks make an HTTP `GET` request to the specified URL,
  waiting the specified `interval` amount of time between requests (e.g. 30 seconds).
  The status of the service depends on the HTTP response code: any `2xx` code is
  considered passing, a `429 Too Many Requests` is a warning, and anything else is
  a failure. This type of check
  should be preferred over a script that uses `curl` or another external process
  to check a simple HTTP operation. By default, HTTP checks are `GET` requests
  unless the `method` field specifies a different method. Additional header
  fields can be set through the `header` field which is a map of lists of
  strings, e.g. `{"x-foo": ["bar", "baz"]}`. By default, HTTP checks are
  configured with a request timeout of 10 seconds.

  To set a custom HTTP check timeout, specify the `timeout` field in the
  check definition. The output of the
  check is limited to roughly 4KB. Responses larger than this will be truncated.
  HTTP checks also support TLS. By default, a valid TLS certificate is expected.
  Certificate verification can be turned off by setting the `tls_skip_verify`
  field to `true` in the check definition. When using TLS, the SNI will be set
  automatically from the URL if it uses a hostname (as opposed to an IP address);
  the value can be overridden by setting `tls_server_name`.

  Consul follows HTTP redirects by default. Set the `disable_redirects` field to
  `true` to disable redirects.

- `TCP + Interval` - These checks make a TCP connection attempt to the specified
  IP/hostname and port, waiting `interval` amount of time between attempts
  (e.g. 30 seconds). If no hostname
  is specified, it defaults to "localhost". The status of the service depends on
  whether the connection attempt is successful (i.e. the port is currently
  accepting connections). If the connection is accepted, the status is
  `passing`, otherwise the status is `critical`. In the case of a hostname that
  resolves to both IPv4 and IPv6 addresses, an attempt will be made to both
  addresses, and the first successful connection attempt will result in a
  successful check. This type of check should be preferred over a script that
  uses `netcat` or another external process to check a simple socket operation.
  By default, TCP checks are configured with a request timeout of 10 seconds.
  To set a custom TCP check timeout, specify the
  `timeout` field in the check definition.

- `UDP + Interval` - These checks direct the client to periodically send UDP datagrams
  to the specified IP/hostname and port. The duration specified in the `interval` field sets the amount of time
  between attempts, such as `30s` to indicate 30 seconds. The check is logged as healthy if any response from the UDP server is received. Any other result sets the status to `critical`.
  The default timeout for UDP checks is `10s`, but you can configure a custom UDP check timeout value by specifying the
  `timeout` field in the check definition. If a read times out, the check is still considered healthy.

- `Time to Live (TTL)` ((#ttl)) - These checks retain their last known state
  for a given TTL. The state of the check must be updated periodically over the HTTP
  interface. If an external system fails to update the status within a given TTL,
  the check is set to the critical state. This mechanism, conceptually similar to a
  dead man's switch, relies on the application to directly report its health. For
  example, a healthy app can periodically `PUT` a status update to the HTTP endpoint;
  if the app fails, the TTL will expire and the health check enters a critical state.
  The endpoints used to update health information for a given check are: [pass](/api-docs/agent/check#ttl-check-pass),
  [warn](/api-docs/agent/check#ttl-check-warn), [fail](/api-docs/agent/check#ttl-check-fail),
  and [update](/api-docs/agent/check#ttl-check-update). TTL checks also persist their
  last known status to disk. This allows the Consul agent to restore the last known
  status of the check across restarts. Persisted check status is valid through the
  end of the TTL from the time of the last check.

- `Docker + Interval` - These checks depend on invoking an external application which
  is packaged within a Docker Container. The application is triggered within the running
  container via the Docker Exec API. We expect that the Consul agent user has access
  to either the Docker HTTP API or the Unix socket. Consul uses `$DOCKER_HOST` to
  determine the Docker API endpoint. The application is expected to run, perform a health
  check of the service running inside the container, and exit with an appropriate exit code.
  The check should be paired with an invocation interval. The shell on which the check
  has to be performed is configurable, which makes it possible to run containers which
  have different shells on the same host. Check output for Docker is limited to
  4KB. Any output larger than this will be truncated. In Consul 0.9.0 and later, the agent
  must be configured with [`enable_script_checks`](/docs/agent/config/cli-flags#_enable_script_checks)
  set to `true` in order to enable Docker health checks.

- `gRPC + Interval` - These checks are intended for applications that support the standard
  [gRPC health checking protocol](https://github.com/grpc/grpc/blob/master/doc/health-checking.md).
  The state of the check will be updated by probing the configured endpoint, waiting `interval`
  amount of time between probes (e.g. 30 seconds). By default, gRPC checks are configured
  with a timeout of 10 seconds.
  To set a custom timeout, specify the `timeout` field in
  the check definition. gRPC checks will default to not using TLS, but TLS can be enabled by
  setting `grpc_use_tls` in the check definition. If TLS is enabled, then by default, a valid
  TLS certificate is expected. Certificate verification can be turned off by setting the
  `tls_skip_verify` field to `true` in the check definition.
  To check on a specific service instead of the whole gRPC server, add the service identifier after the `gRPC` check's endpoint in the following format: `/:service_identifier`.

- `H2ping + Interval` - These checks test an endpoint that uses HTTP/2
  by connecting to the endpoint and sending a ping frame. TLS is assumed to be configured by default.
  To disable TLS and use h2c, set `h2ping_use_tls` to `false`. If the ping is successful
  within a specified timeout, then the check is updated as passing.
  The timeout defaults to 10 seconds, but is configurable using the `timeout` field. If TLS is enabled, a valid
  certificate is required, unless `tls_skip_verify` is set to `true`.
  The check will be run on the interval specified by the `interval` field.

- `Alias` - These checks alias the health state of another registered
  node or service. The state of the check will be updated asynchronously, but is
  nearly instant. For aliased services on the same agent, the local state is monitored
  and no additional network resources are consumed. For other services and nodes,
  the check maintains a blocking query over the agent's connection with a current
  server and allows stale requests. If there are any errors in watching the aliased
  node or service, the check state will be critical. For the blocking query, the
  check will use the ACL token set on the service or check definition or otherwise
  will fall back to the default ACL token set with the agent (`acl_token`).

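
The HTTP response-code rules described in the list above can be sketched as a small function. This is an illustrative Python model, not Consul's implementation: the function name `classify_http_status` is hypothetical, and it maps the "failure" outcome to the `critical` status used elsewhere on this page.

```python
def classify_http_status(code: int) -> str:
    """Map an HTTP response code to a check status (illustrative model only)."""
    if 200 <= code <= 299:
        # Any 2xx response counts as passing.
        return "passing"
    if code == 429:
        # 429 Too Many Requests is reported as a warning.
        return "warning"
    # Every other response code is treated as a failure.
    return "critical"
```

For example, `classify_http_status(204)` yields `"passing"` while `classify_http_status(503)` yields `"critical"`.
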
## Check Definition

A script check:

<CodeTabs heading="Script Check">

```hcl
check = {
  id = "mem-util"
  name = "Memory utilization"
  args = ["/usr/local/bin/check_mem.py", "-limit", "256MB"]
  interval = "10s"
  timeout = "1s"
}
```

```json
{
  "check": {
    "id": "mem-util",
    "name": "Memory utilization",
    "args": ["/usr/local/bin/check_mem.py", "-limit", "256MB"],
    "interval": "10s",
    "timeout": "1s"
  }
}
```

</CodeTabs>

An HTTP check:

<CodeTabs heading="HTTP Check">

```hcl
check = {
  id = "api"
  name = "HTTP API on port 5000"
  http = "https://localhost:5000/health"
  tls_server_name = ""
  tls_skip_verify = false
  method = "POST"
  header = {
    Content-Type = ["application/json"]
  }
  body = "{\"method\":\"health\"}"
  disable_redirects = true
  interval = "10s"
  timeout = "1s"
}
```

```json
{
  "check": {
    "id": "api",
    "name": "HTTP API on port 5000",
    "http": "https://localhost:5000/health",
    "tls_server_name": "",
    "tls_skip_verify": false,
    "method": "POST",
    "header": { "Content-Type": ["application/json"] },
    "body": "{\"method\":\"health\"}",
    "disable_redirects": true,
    "interval": "10s",
    "timeout": "1s"
  }
}
```

</CodeTabs>

A TCP check:

<CodeTabs heading="TCP Check">

```hcl
check = {
  id = "ssh"
  name = "SSH TCP on port 22"
  tcp = "localhost:22"
  interval = "10s"
  timeout = "1s"
}
```

```json
{
  "check": {
    "id": "ssh",
    "name": "SSH TCP on port 22",
    "tcp": "localhost:22",
    "interval": "10s",
    "timeout": "1s"
  }
}
```

</CodeTabs>

A UDP check:

<CodeTabs heading="UDP Check">

```hcl
check = {
  id = "dns"
  name = "DNS UDP on port 53"
  udp = "localhost:53"
  interval = "10s"
  timeout = "1s"
}
```

```json
{
  "check": {
    "id": "dns",
    "name": "DNS UDP on port 53",
    "udp": "localhost:53",
    "interval": "10s",
    "timeout": "1s"
  }
}
```

</CodeTabs>

A TTL check:

<CodeTabs heading="TTL Check">

```hcl
check = {
  id = "web-app"
  name = "Web App Status"
  notes = "Web app does a curl internally every 10 seconds"
  ttl = "30s"
}
```

```json
{
  "check": {
    "id": "web-app",
    "name": "Web App Status",
    "notes": "Web app does a curl internally every 10 seconds",
    "ttl": "30s"
  }
}
```

</CodeTabs>

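
An application can keep a TTL check such as `web-app` alive by periodically calling the agent's TTL endpoints. The following Python sketch builds a `PUT` request for the `pass`, `warn`, or `fail` endpoint. It assumes the agent listens on the default local HTTP address `http://127.0.0.1:8500` and that no ACL token is required; the helper names `build_ttl_request` and `send_heartbeat` are hypothetical.

```python
import urllib.parse
import urllib.request

# Assumption: the local agent uses the default HTTP address.
AGENT_ADDR = "http://127.0.0.1:8500"


def build_ttl_request(check_id: str, state: str = "pass",
                      agent: str = AGENT_ADDR) -> urllib.request.Request:
    """Build a PUT request that sets a TTL check to pass, warn, or fail."""
    if state not in ("pass", "warn", "fail"):
        raise ValueError(f"unsupported TTL state: {state!r}")
    # Escape the check ID so it is safe to place in the URL path.
    quoted_id = urllib.parse.quote(check_id, safe="")
    url = f"{agent}/v1/agent/check/{state}/{quoted_id}"
    return urllib.request.Request(url, method="PUT")


def send_heartbeat(check_id: str, state: str = "pass") -> int:
    """Send the update to the agent and return the HTTP status code."""
    request = build_ttl_request(check_id, state)
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

With a `ttl` of `30s`, calling `send_heartbeat("web-app")` at an interval comfortably shorter than 30 seconds keeps the check from expiring into the critical state.
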
A Docker check:

<CodeTabs heading="Docker Check">

```hcl
check = {
  id = "mem-util"
  name = "Memory utilization"
  docker_container_id = "f972c95ebf0e"
  shell = "/bin/bash"
  args = ["/usr/local/bin/check_mem.py"]
  interval = "10s"
}
```

```json
{
  "check": {
    "id": "mem-util",
    "name": "Memory utilization",
    "docker_container_id": "f972c95ebf0e",
    "shell": "/bin/bash",
    "args": ["/usr/local/bin/check_mem.py"],
    "interval": "10s"
  }
}
```

</CodeTabs>

A gRPC check for the whole application:

<CodeTabs heading="gRPC Check">

```hcl
check = {
  id = "mem-util"
  name = "Service health status"
  grpc = "127.0.0.1:12345"
  grpc_use_tls = true
  interval = "10s"
}
```

```json
{
  "check": {
    "id": "mem-util",
    "name": "Service health status",
    "grpc": "127.0.0.1:12345",
    "grpc_use_tls": true,
    "interval": "10s"
  }
}
```

</CodeTabs>

A gRPC check for the specific `my_service` service:

<CodeTabs heading="gRPC Specific Service Check">

```hcl
check = {
  id = "mem-util"
  name = "Service health status"
  grpc = "127.0.0.1:12345/my_service"
  grpc_use_tls = true
  interval = "10s"
}
```

```json
{
  "check": {
    "id": "mem-util",
    "name": "Service health status",
    "grpc": "127.0.0.1:12345/my_service",
    "grpc_use_tls": true,
    "interval": "10s"
  }
}
```

</CodeTabs>

An h2ping check:

<CodeTabs heading="H2ping Check">

```hcl
check = {
  id = "h2ping-check"
  name = "h2ping"
  h2ping = "localhost:22222"
  interval = "10s"
  h2ping_use_tls = false
}
```

```json
{
  "check": {
    "id": "h2ping-check",
    "name": "h2ping",
    "h2ping": "localhost:22222",
    "interval": "10s",
    "h2ping_use_tls": false
  }
}
```

</CodeTabs>

An alias check for a local service:

<CodeTabs heading="Alias Check">

```hcl
check = {
  id = "web-alias"
  alias_service = "web"
}
```

```json
{
  "check": {
    "id": "web-alias",
    "alias_service": "web"
  }
}
```

</CodeTabs>

~> Configuration info: The alias check configuration expects the alias check to be
registered on the same agent as the service it aliases. If the service is
not registered with the same agent, `"alias_node": "<node_id>"` must also be
specified. When using `alias_node`, if no service is specified, the check will
alias the health of the node. If a service is specified, the check will alias
the specified service on this particular node.

Each type of definition must include a `name` and may optionally provide an
`id` and `notes` field. The `id` must be unique per _agent_; otherwise, only the
last defined check with that `id` will be registered. If the `id` is not set
and the check is embedded within a service definition, a unique check ID is
generated. Otherwise, `id` will be set to `name`. If names might conflict,
unique IDs should be provided.

The `notes` field is opaque to Consul but can be used to provide a human-readable
description of the current state of the check. Similarly, an external process
updating a TTL check via the HTTP interface can set the `notes` value.

Checks may also contain a `token` field to provide an ACL token. This token is
used for any interaction with the catalog for the check, including
[anti-entropy syncs](/docs/architecture/anti-entropy) and deregistration.
For Alias checks, this token is used if a remote blocking query is necessary
to watch the state of the aliased node or service.

Script, TCP, UDP, HTTP, Docker, and gRPC checks must include an `interval` field. This
field is parsed by Go's `time` package, and has the following
[formatting specification](https://golang.org/pkg/time/#ParseDuration):

> A duration string is a possibly signed sequence of decimal numbers, each with
> optional fraction and a unit suffix, such as "300ms", "-1.5h" or "2h45m".
> Valid time units are "ns", "us" (or "µs"), "ms", "s", "m", "h".

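
For readers who are not familiar with Go's duration strings, the following Python sketch approximates the format quoted above. It is a simplified model and not a port of `time.ParseDuration`: it covers only the units listed in the quote and, for example, rejects a bare `0`, which Go accepts. The function name `parse_duration` is hypothetical.

```python
import re

# Seconds per unit, for the units named in the quoted specification.
_UNIT_SECONDS = {
    "ns": 1e-9, "us": 1e-6, "µs": 1e-6, "ms": 1e-3,
    "s": 1.0, "m": 60.0, "h": 3600.0,
}
# One segment: a decimal number with optional fraction, then a unit suffix.
_SEGMENT = re.compile(r"(\d+(?:\.\d*)?|\.\d+)(ns|us|µs|ms|s|m|h)")


def parse_duration(text: str) -> float:
    """Return the duration in seconds, or raise ValueError if malformed."""
    sign = 1.0
    rest = text
    if rest and rest[0] in "+-":
        sign = -1.0 if rest[0] == "-" else 1.0
        rest = rest[1:]
    if not rest:
        raise ValueError(f"invalid duration: {text!r}")
    total = 0.0
    pos = 0
    while pos < len(rest):
        match = _SEGMENT.match(rest, pos)
        if match is None:
            raise ValueError(f"invalid duration: {text!r}")
        total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    return sign * total
```

Under this model, `parse_duration("2h45m")` evaluates to 9900 seconds and `parse_duration("10s")` to 10 seconds, while a value without a unit such as `"10"` raises `ValueError`.
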
In Consul 0.7 and later, checks that are associated with a service may also contain
an optional `deregister_critical_service_after` field, which is a timeout in the
same Go time format as `interval` and `ttl`. If a check is in the critical state
for more than this configured value, then its associated service (and all of its
associated checks) will automatically be deregistered. The minimum timeout is 1
minute, and the process that reaps critical services runs every 30 seconds, so it
may take slightly longer than the configured timeout to trigger the deregistration.
This should generally be configured with a timeout that's much, much longer than
any expected recoverable outage for the given service.

To configure a check, either provide it as a `-config-file` option to the
agent or place it inside the `-config-dir` of the agent. The file must
end in a ".json" or ".hcl" extension to be loaded by Consul. Check definitions
can also be updated by sending a `SIGHUP` to the agent. Alternatively, the
check can be registered dynamically using the [HTTP API](/api).

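
As a rough illustration of dynamic registration, the Python sketch below builds a `PUT` request for the agent's `/v1/agent/check/register` endpoint. It assumes the default local agent address `http://127.0.0.1:8500` and no ACL token, and it uses the capitalized field names (`ID`, `Name`, `Notes`, `TTL`) shown in the agent API reference; the helper name `build_register_request` is hypothetical.

```python
import json
import urllib.request


def build_register_request(check: dict,
                           agent: str = "http://127.0.0.1:8500") -> urllib.request.Request:
    """Build a PUT request that registers a check with the local agent."""
    body = json.dumps(check).encode("utf-8")
    return urllib.request.Request(
        f"{agent}/v1/agent/check/register",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# The TTL check from the examples above, expressed as an API payload.
ttl_check = {
    "ID": "web-app",
    "Name": "Web App Status",
    "Notes": "Web app does a curl internally every 10 seconds",
    "TTL": "30s",
}
register_request = build_register_request(ttl_check)
# To send it: urllib.request.urlopen(register_request, timeout=5)
```
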
## Check Scripts
A check script is generally free to do anything to determine the status
of the check. The only limitations placed are that the exit codes must obey
this convention:

- Exit code 0 - Check is passing
- Exit code 1 - Check is warning
- Any other code - Check is failing

This is the only convention that Consul depends on. Any output of the script
will be captured and stored in the `output` field.
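
The exit-code convention above can be illustrated with a small disk-usage check written in Python. This is a hedged sketch only: the thresholds, the checked path, and the function names `disk_status` and `main` are assumptions chosen for the example, not something Consul prescribes.

```python
import shutil

# Exit codes that follow the convention described above.
PASSING, WARNING, CRITICAL = 0, 1, 2


def disk_status(used_fraction: float,
                warn_at: float = 0.80, crit_at: float = 0.95) -> int:
    """Translate a disk-usage fraction into a check exit code."""
    if used_fraction >= crit_at:
        return CRITICAL
    if used_fraction >= warn_at:
        return WARNING
    return PASSING


def main(path: str = "/") -> int:
    """Measure usage of `path`, print a summary, and return the exit code."""
    usage = shutil.disk_usage(path)
    used = usage.used / usage.total
    # Anything printed here is what the agent would capture as check output.
    print(f"{path}: {used:.1%} used")
    return disk_status(used)

# When saved as an executable check script, end the file with:
#     raise SystemExit(main())
```

Such a script could then be referenced from the `args` field of a script check definition.
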

In Consul 0.9.0 and later, the agent must be configured with
[`enable_script_checks`](/docs/agent/config/cli-flags#_enable_script_checks) set to `true`
in order to enable script checks.

## Initial Health Check Status
By default, when checks are registered against a Consul agent, the state is set
immediately to "critical". This is useful to prevent services from being
registered as "passing" and entering the service pool before they are confirmed
to be healthy. In certain cases, it may be desirable to specify the initial
state of a health check. This can be done by specifying the `status` field in a
health check definition, like so:

<CodeTabs heading="Status Field Example">

```hcl
check = {
  id = "mem"
  args = ["/bin/check_mem", "-limit", "256MB"]
  interval = "10s"
  status = "passing"
}
```

```json
{
  "check": {
    "id": "mem",
    "args": ["/bin/check_mem", "-limit", "256MB"],
    "interval": "10s",
    "status": "passing"
  }
}
```

</CodeTabs>

The above check definition would cause the new "mem" check to be
registered with its initial state set to "passing".

## Service-bound checks

Health checks may optionally be bound to a specific service. This ensures
that the status of the health check will only affect the health status of the
given service instead of the entire node. Service-bound health checks may be
provided by adding a `service_id` field to a check configuration:

<CodeTabs heading="Status Field Example">

```hcl
check = {
  id = "web-app"
  name = "Web App Status"
  service_id = "web-app"
  ttl = "30s"
}
```

```json
{
  "check": {
    "id": "web-app",
    "name": "Web App Status",
    "service_id": "web-app",
    "ttl": "30s"
  }
}
```

</CodeTabs>

In the above configuration, if the web-app health check begins failing, it will
only affect the availability of the web-app service. All other services
provided by the node will remain unchanged.

## Agent Certificates for TLS Checks

The [enable_agent_tls_for_checks](/docs/agent/config/config-files#enable_agent_tls_for_checks)
agent configuration option can be used to have HTTP or gRPC health checks
use the agent's credentials when configured for TLS.

## Multiple Check Definitions

Multiple check definitions can be defined using the `checks` (plural)
key in your configuration file.

<CodeTabs heading="Multiple Checks Example">

```hcl
checks = [
  {
    id = "chk1"
    name = "mem"
    args = ["/bin/check_mem", "-limit", "256MB"]
    interval = "5s"
  },
  {
    id = "chk2"
    name = "/health"
    http = "http://localhost:5000/health"
    interval = "15s"
  },
  {
    id = "chk3"
    name = "cpu"
    args = ["/bin/check_cpu"]
    interval = "10s"
  },
  ...
]
```

```json
{
  "checks": [
    {
      "id": "chk1",
      "name": "mem",
      "args": ["/bin/check_mem", "-limit", "256MB"],
      "interval": "5s"
    },
    {
      "id": "chk2",
      "name": "/health",
      "http": "http://localhost:5000/health",
      "interval": "15s"
    },
    {
      "id": "chk3",
      "name": "cpu",
      "args": ["/bin/check_cpu"],
      "interval": "10s"
    },
    ...
  ]
}
```

</CodeTabs>

## Success/Failures before passing/warning/critical

To prevent flapping health checks, and limit the load they cause on the cluster,
a health check may be configured to become passing/warning/critical only after a
specified number of consecutive checks return passing/critical.
The status will not transition states until the configured threshold is reached.

- `success_before_passing` - Number of consecutive successful results required
  before check status transitions to passing. Defaults to `0`. Added in Consul 1.7.0.
- `failures_before_warning` - Number of consecutive unsuccessful results required
  before check status transitions to warning. Defaults to the same value as that of
  `failures_before_critical` to maintain the expected behavior of not changing the
  status of service checks to `warning` before `critical` unless configured to do so.
  Values higher than `failures_before_critical` are invalid. Added in Consul 1.11.0.
- `failures_before_critical` - Number of consecutive unsuccessful results required
  before check status transitions to critical. Defaults to `0`. Added in Consul 1.7.0.

This feature is available for HTTP, TCP, gRPC, Docker & Monitor checks.
By default, both passing and critical thresholds will be set to 0 so the check
status will always reflect the last check result.

<CodeTabs heading="Flapping Prevention Example">

```hcl
checks = [
  {
    name = "HTTP TCP on port 80"
    tcp = "localhost:80"
    interval = "10s"
    timeout = "1s"
    success_before_passing = 3
    failures_before_warning = 1
    failures_before_critical = 3
  }
]
```

```json
{
  "checks": [
    {
      "name": "HTTP TCP on port 80",
      "tcp": "localhost:80",
      "interval": "10s",
      "timeout": "1s",
      "success_before_passing": 3,
      "failures_before_warning": 1,
      "failures_before_critical": 3
    }
  ]
}
```

</CodeTabs>