diff --git a/website/content/api-docs/agent/check.mdx b/website/content/api-docs/agent/check.mdx
index eafbb17c4..785fbce8b 100644
--- a/website/content/api-docs/agent/check.mdx
+++ b/website/content/api-docs/agent/check.mdx
@@ -6,7 +6,10 @@ description: The /agent/check endpoints interact with checks on the local agent
# Check - Agent HTTP API
-The `/agent/check` endpoints interact with checks on the local agent in Consul.
+Consul's health check capabilities are described in the
+[health checks overview](/docs/discovery/checks).
+The `/agent/check` endpoints interact with health checks
+managed by the local agent in Consul.
These should not be confused with checks in the catalog.
## List Checks
@@ -418,6 +421,10 @@ $ curl \
This endpoint is used with a TTL type check to set the status of the check to
`critical` and to reset the TTL clock.
+If you want to manually mark a service as unhealthy,
+use [maintenance mode](/api-docs/agent#enable-maintenance-mode)
+instead of defining a TTL health check and using this endpoint.
+
| Method | Path | Produces |
| ------ | ----------------------------- | ------------------ |
| `PUT` | `/agent/check/fail/:check_id` | `application/json` |
@@ -456,6 +463,10 @@ $ curl \
This endpoint is used with a TTL type check to set the status of the check and
to reset the TTL clock.
+If you want to manually mark a service as unhealthy,
+use [maintenance mode](/api-docs/agent#enable-maintenance-mode)
+instead of defining a TTL health check and using this endpoint.
+
| Method | Path | Produces |
| ------ | ------------------------------- | ------------------ |
| `PUT` | `/agent/check/update/:check_id` | `application/json` |
diff --git a/website/content/api-docs/health.mdx b/website/content/api-docs/health.mdx
index 898c8ffe4..cad74bbad 100644
--- a/website/content/api-docs/health.mdx
+++ b/website/content/api-docs/health.mdx
@@ -14,6 +14,9 @@ optional health checking mechanisms. Additionally, some of the query results
from the health endpoints are filtered while the catalog endpoints provide the
raw entries.
+To modify health check registration or information,
+use the [`/agent/check`](/api-docs/agent/check) endpoints.
+
## List Checks for Node
This endpoint returns the checks specific to the node provided on the path.
diff --git a/website/content/docs/discovery/checks.mdx b/website/content/docs/discovery/checks.mdx
index 5a2149579..1b4c4faf4 100644
--- a/website/content/docs/discovery/checks.mdx
+++ b/website/content/docs/discovery/checks.mdx
@@ -13,144 +13,72 @@ description: >-
One of the primary roles of the agent is management of system-level and application-level health
checks. A health check is considered to be application-level if it is associated with a
service. If not associated with a service, the check monitors the health of the entire node.
-Review the [health checks tutorial](https://learn.hashicorp.com/tutorials/consul/service-registration-health-checks) to get a more complete example on how to leverage health check capabilities in Consul.
-A check is defined in a configuration file or added at runtime over the HTTP interface. Checks
-created via the HTTP interface persist with that node.
+Review the [service health checks tutorial](https://learn.hashicorp.com/tutorials/consul/service-registration-health-checks)
+to get a more complete example on how to leverage health check capabilities in Consul.
-There are several different kinds of checks:
+## Registering a health check
-- Script + Interval - These checks depend on invoking an external application
- that performs the health check, exits with an appropriate exit code, and potentially
- generates some output. A script is paired with an invocation interval (e.g.
- every 30 seconds). This is similar to the Nagios plugin system. The output of
- a script check is limited to 4KB. Output larger than this will be truncated.
- By default, Script checks will be configured with a timeout equal to 30 seconds.
- It is possible to configure a custom Script check timeout value by specifying the
- `timeout` field in the check definition. When the timeout is reached on Windows,
- Consul will wait for any child processes spawned by the script to finish. For any
- other system, Consul will attempt to force-kill the script and any child processes
- it has spawned once the timeout has passed.
- In Consul 0.9.0 and later, script checks are not enabled by default. To use them you
- can either use :
+There are three ways to register a service with health checks:
- - [`enable_local_script_checks`](/docs/agent/config/cli-flags#_enable_local_script_checks):
- enable script checks defined in local config files. Script checks defined via the HTTP
- API will not be allowed.
- - [`enable_script_checks`](/docs/agent/config/cli-flags#_enable_script_checks): enable
- script checks regardless of how they are defined.
+1. Start or reload a Consul agent with a service definition file in the
+ [agent's configuration directory](/docs/agent#configuring-consul-agents).
+1. Call the
+ [`/agent/service/register`](/api-docs/agent/service#register-service)
+ HTTP API endpoint to register the service.
+1. Use the
+ [`consul services register`](/commands/services/register)
+ CLI command to register the service.
- ~> **Security Warning:** Enabling script checks in some configurations may
- introduce a remote execution vulnerability which is known to be targeted by
- malware. We strongly recommend `enable_local_script_checks` instead. See [this
- blog post](https://www.hashicorp.com/blog/protecting-consul-from-rce-risk-in-specific-configurations)
- for more details.
+When a service is registered using the HTTP API endpoint or CLI command,
+the checks persist in the Consul data folder across Consul agent restarts.
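
The following Python sketch illustrates the HTTP API option. It builds a request to the agent's `/v1/agent/service/register` endpoint but does not send it. It assumes an agent listening on the default HTTP address `http://127.0.0.1:8500`. The service name `web`, port `8080`, and `/health` URL are hypothetical values.

```python
import json
import urllib.request

# Assumption: a local Consul agent on the default HTTP API address.
CONSUL_ADDR = "http://127.0.0.1:8500"


def register_service_request(service: dict, agent: str = CONSUL_ADDR) -> urllib.request.Request:
    """Build a PUT request that registers `service` (and its check) with the agent."""
    return urllib.request.Request(
        f"{agent}/v1/agent/service/register",
        data=json.dumps(service).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical service with one HTTP health check. HTTP API request bodies
# use PascalCase keys such as "Interval" rather than "interval".
service = {
    "Name": "web",
    "Port": 8080,
    "Check": {
        "HTTP": "http://localhost:8080/health",
        "Interval": "10s",
        "Timeout": "1s",
    },
}

req = register_service_request(service)
# Sending the request requires a running agent:
# urllib.request.urlopen(req)
```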
-- `HTTP + Interval` - These checks make an HTTP `GET` request to the specified URL,
- waiting the specified `interval` amount of time between requests (eg. 30 seconds).
- The status of the service depends on the HTTP response code: any `2xx` code is
- considered passing, a `429 Too ManyRequests` is a warning, and anything else is
- a failure. This type of check
- should be preferred over a script that uses `curl` or another external process
- to check a simple HTTP operation. By default, HTTP checks are `GET` requests
- unless the `method` field specifies a different method. Additional header
- fields can be set through the `header` field which is a map of lists of
- strings, e.g. `{"x-foo": ["bar", "baz"]}`. By default, HTTP checks will be
- configured with a request timeout equal to 10 seconds.
+## Types of checks
- It is possible to configure a custom HTTP check timeout value by
- specifying the `timeout` field in the check definition. The output of the
- check is limited to roughly 4KB. Responses larger than this will be truncated.
- HTTP checks also support TLS. By default, a valid TLS certificate is expected.
- Certificate verification can be turned off by setting the `tls_skip_verify`
- field to `true` in the check definition. When using TLS, the SNI will be set
- automatically from the URL if it uses a hostname (as opposed to an IP address);
- the value can be overridden by setting `tls_server_name`.
+This section describes the available types of health checks you can use to
+automatically monitor the health of a service instance or node.
- Consul follows HTTP redirects by default. Set the `disable_redirects` field to
- `true` to disable redirects.
+-> **To manually mark a service unhealthy:** Use the maintenance mode
+ [CLI command](/commands/maint) or
+ [HTTP API endpoint](/api-docs/agent#enable-maintenance-mode)
+ to temporarily remove one or all service instances on a node
+ from service discovery DNS and HTTP API query results.
-- `TCP + Interval` - These checks make a TCP connection attempt to the specified
- IP/hostname and port, waiting `interval` amount of time between attempts
- (e.g. 30 seconds). If no hostname
- is specified, it defaults to "localhost". The status of the service depends on
- whether the connection attempt is successful (ie - the port is currently
- accepting connections). If the connection is accepted, the status is
- `success`, otherwise the status is `critical`. In the case of a hostname that
- resolves to both IPv4 and IPv6 addresses, an attempt will be made to both
- addresses, and the first successful connection attempt will result in a
- successful check. This type of check should be preferred over a script that
- uses `netcat` or another external process to check a simple socket operation.
- By default, TCP checks will be configured with a request timeout of 10 seconds.
- It is possible to configure a custom TCP check timeout value by specifying the
- `timeout` field in the check definition.
+### Script check ((#script-interval))
-- `UDP + Interval` - These checks direct the client to periodically send UDP datagrams
- to the specified IP/hostname and port. The duration specified in the `interval` field sets the amount of time
- between attempts, such as `30s` to indicate 30 seconds. The check is logged as healthy if any response from the UDP server is received. Any other result sets the status to `critical`.
- The default interval for, UDP checks is `10s`, but you can configure a custom UDP check timeout value by specifying the
- `timeout` field in the check definition. If any timeout on read exists, the check is still considered healthy.
+Script checks periodically invoke an external application that performs the health check,
+exits with an appropriate exit code, and potentially generates some output.
+The specified `interval` determines the time between check invocations.
+The output of a script check is limited to 4KB.
+Larger outputs are truncated.
-- `Time to Live (TTL)` ((#ttl)) - These checks retain their last known state
- for a given TTL. The state of the check must be updated periodically over the HTTP
- interface. If an external system fails to update the status within a given TTL,
- the check is set to the failed state. This mechanism, conceptually similar to a
- dead man's switch, relies on the application to directly report its health. For
- example, a healthy app can periodically `PUT` a status update to the HTTP endpoint;
- if the app fails, the TTL will expire and the health check enters a critical state.
- The endpoints used to update health information for a given check are: [pass](/api-docs/agent/check#ttl-check-pass),
- [warn](/api-docs/agent/check#ttl-check-warn), [fail](/api-docs/agent/check#ttl-check-fail),
- and [update](/api-docs/agent/check#ttl-check-update). TTL checks also persist their
- last known status to disk. This allows the Consul agent to restore the last known
- status of the check across restarts. Persisted check status is valid through the
- end of the TTL from the time of the last check.
+By default, script checks are configured with a timeout equal to 30 seconds.
+To configure a custom script check timeout value,
+specify the `timeout` field in the check definition.
+After reaching the timeout on a Windows system,
+Consul waits for any child processes spawned by the script to finish.
+After reaching the timeout on other systems,
+Consul attempts to force-kill the script and any child processes it spawned.
-- `Docker + Interval` - These checks depend on invoking an external application which
- is packaged within a Docker Container. The application is triggered within the running
- container via the Docker Exec API. We expect that the Consul agent user has access
- to either the Docker HTTP API or the unix socket. Consul uses `$DOCKER_HOST` to
- determine the Docker API endpoint. The application is expected to run, perform a health
- check of the service running inside the container, and exit with an appropriate exit code.
- The check should be paired with an invocation interval. The shell on which the check
- has to be performed is configurable which makes it possible to run containers which
- have different shells on the same host. Check output for Docker is limited to
- 4KB. Any output larger than this will be truncated. In Consul 0.9.0 and later, the agent
- must be configured with [`enable_script_checks`](/docs/agent/config/cli-flags#_enable_script_checks)
- set to `true` in order to enable Docker health checks.
+Script checks are not enabled by default.
+To enable a Consul agent to perform script checks,
+use one of the following agent configuration options:
-- `gRPC + Interval` - These checks are intended for applications that support the standard
- [gRPC health checking protocol](https://github.com/grpc/grpc/blob/master/doc/health-checking.md).
- The state of the check will be updated by probing the configured endpoint, waiting `interval`
- amount of time between probes (eg. 30 seconds). By default, gRPC checks will be configured
- with a default timeout of 10 seconds.
- It is possible to configure a custom timeout value by specifying the `timeout` field in
- the check definition. gRPC checks will default to not using TLS, but TLS can be enabled by
- setting `grpc_use_tls` in the check definition. If TLS is enabled, then by default, a valid
- TLS certificate is expected. Certificate verification can be turned off by setting the
- `tls_skip_verify` field to `true` in the check definition.
- To check on a specific service instead of the whole gRPC server, add the service identifier after the `gRPC` check's endpoint in the following format `/:service_identifier`.
+- [`enable_local_script_checks`](/docs/agent/config/cli-flags#_enable_local_script_checks):
+ Enable script checks defined in local config files.
+ Script checks registered using the HTTP API are not allowed.
+- [`enable_script_checks`](/docs/agent/config/cli-flags#_enable_script_checks):
+ Enable script checks no matter how they are registered.
-- `H2ping + Interval` - These checks test an endpoint that uses http2
- by connecting to the endpoint and sending a ping frame. TLS is assumed to be configured by default.
- To disable TLS and use h2c, set `h2ping_use_tls` to `false`. If the ping is successful
- within a specified timeout, then the check is updated as passing.
- The timeout defaults to 10 seconds, but is configurable using the `timeout` field. If TLS is enabled a valid
- certificate is required, unless `tls_skip_verify` is set to `true`.
- The check will be run on the interval specified by the `interval` field.
+ ~> **Security Warning:**
+ Enabling non-local script checks in some configurations may introduce
+ a remote execution vulnerability known to be targeted by malware.
+ We strongly recommend `enable_local_script_checks` instead.
+ For more information, refer to
+ [this blog post](https://www.hashicorp.com/blog/protecting-consul-from-rce-risk-in-specific-configurations).
-- `Alias` - These checks alias the health state of another registered
- node or service. The state of the check will be updated asynchronously, but is
- nearly instant. For aliased services on the same agent, the local state is monitored
- and no additional network resources are consumed. For other services and nodes,
- the check maintains a blocking query over the agent's connection with a current
- server and allows stale requests. If there are any errors in watching the aliased
- node or service, the check state will be critical. For the blocking query, the
- check will use the ACL token set on the service or check definition or otherwise
- will fall back to the default ACL token set with the agent (`acl_token`).
-
-## Check Definition
-
-A script check:
+The following service definition file snippet is an example
+of a script check definition:
@@ -162,7 +90,6 @@ check = {
interval = "10s"
timeout = "1s"
}
-
```
```json
@@ -179,7 +106,47 @@ check = {
-A HTTP check:
+#### Check script conventions
+
+A check script's exit code is used to determine the health check status:
+
+- Exit code 0 - Check is passing
+- Exit code 1 - Check is warning
+- Any other code - Check is failing
+
+Any output of the script is captured and made available in the
+`Output` field of checks included in HTTP API responses,
+as shown in the example response for the [local service health endpoint](/api-docs/agent/service#by-name-json).
+
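The following minimal Python sketch shows how the exit-code convention could map a script's result to a check status. It is not Consul's implementation, and the function names are hypothetical. To mirror the documented 4KB limit, it combines stdout and stderr and truncates them. Reporting a timed-out script as `critical` is an assumption. Note that `subprocess.run` kills only the direct child process on timeout, not any processes the script spawned.

```python
import subprocess

MAX_OUTPUT_BYTES = 4 * 1024  # documented output limit for script checks


def status_from_exit_code(code: int) -> str:
    """Map a check script's exit code to a check status."""
    if code == 0:
        return "passing"
    if code == 1:
        return "warning"
    # Any other code. On POSIX, a negative code means the process was killed by a signal.
    return "critical"


def run_script_check(args: list, timeout: float = 30.0) -> tuple:
    """Run a check script once and return (status, output)."""
    try:
        proc = subprocess.run(args, capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # Assumption: a timed-out script is reported as critical.
        return "critical", "script timed out"
    # Keep at most 4KB of combined output; larger outputs are truncated.
    raw = (proc.stdout + proc.stderr)[:MAX_OUTPUT_BYTES]
    return status_from_exit_code(proc.returncode), raw.decode("utf-8", errors="replace")
```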
+### HTTP check ((#http-interval))
+
+HTTP checks periodically make an HTTP request to the specified URL,
+waiting the specified `interval` amount of time between requests.
+By default, HTTP checks send `GET` requests
+unless the `method` field specifies a different method.
+The status of the service depends on the HTTP response code: any `2xx` code is
+considered passing, a `429 Too Many Requests` is a warning, and anything else is
+a failure. Prefer this type of check over a script that uses `curl` or another
+external process to check a simple HTTP operation. Additional request
+headers can be set through the `header` field, which is a map of lists of
+strings, such as `{"x-foo": ["bar", "baz"]}`.
+
+By default, HTTP checks are configured with a request timeout equal to 10 seconds.
+To configure a custom HTTP check timeout value,
+specify the `timeout` field in the check definition.
+The output of an HTTP check is limited to approximately 4KB.
+Larger outputs are truncated.
+HTTP checks also support TLS. By default, a valid TLS certificate is expected.
+Certificate verification can be turned off by setting the `tls_skip_verify`
+field to `true` in the check definition. When using TLS, the SNI is implicitly
+determined from the URL if it uses a hostname instead of an IP address.
+You can explicitly set the SNI value by setting `tls_server_name`.
+
+Consul follows HTTP redirects by default.
+To disable redirects, set the `disable_redirects` field to `true`.
+
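The following Python sketch approximates the status-code rules and defaults described above using the standard library's `urllib`. It is not Consul's Go implementation. The function names are hypothetical and TLS options are omitted. Because `urllib.request.Request` keeps only one value per header name, the sketch joins multiple values for a header with commas. Like Consul's default behavior, `urlopen` follows redirects.

```python
import urllib.error
import urllib.request

MAX_OUTPUT_BYTES = 4 * 1024  # output is limited to roughly 4KB


def status_from_http_code(code: int) -> str:
    """Map an HTTP response code to a check status."""
    if 200 <= code < 300:
        return "passing"
    if code == 429:  # Too Many Requests
        return "warning"
    return "critical"


def run_http_check(url: str, method: str = "GET", header: dict = None, timeout: float = 10.0) -> tuple:
    """Send one check request and return (status, truncated body)."""
    req = urllib.request.Request(url, method=method)
    for name, values in (header or {}).items():
        # `header` is a map of lists, such as {"x-foo": ["bar", "baz"]}.
        req.add_header(name, ", ".join(values))
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return status_from_http_code(resp.status), resp.read(MAX_OUTPUT_BYTES)
    except urllib.error.HTTPError as err:
        # Non-2xx responses are raised as HTTPError; err.code holds the status.
        return status_from_http_code(err.code), err.read(MAX_OUTPUT_BYTES)
    except OSError:
        # Connection refused, DNS failure, or timeout (URLError is an OSError).
        return "critical", b""
```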
+The following service definition file snippet is an example
+of an HTTP check definition:
@@ -220,7 +187,23 @@ check = {
-A TCP check:
+### TCP check ((#tcp-interval))
+
+TCP checks periodically make a TCP connection attempt to the specified IP/hostname and port, waiting `interval` amount of time between attempts.
+If no hostname is specified, it defaults to "localhost".
+The health check status is `passing` if the target host accepts the connection attempt;
+otherwise, the status is `critical`. If a hostname
+resolves to both IPv4 and IPv6 addresses, an attempt is made to both
+addresses, and the first successful connection attempt results in a
+successful check. Prefer this type of check over a script that
+uses `netcat` or another external process to check a simple socket operation.
+
+By default, TCP checks are configured with a request timeout equal to 10 seconds.
+To configure a custom TCP check timeout value,
+specify the `timeout` field in the check definition.
+
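The following minimal Python sketch implements the TCP check logic described above. It uses the standard library's `socket.create_connection`, which tries each address resolved for the hostname (IPv4 and IPv6) in turn and returns the first connection that succeeds. Consul's own dialer may try addresses differently. The function name is hypothetical.

```python
import socket


def run_tcp_check(host: str, port: int, timeout: float = 10.0) -> str:
    """Return "passing" if host:port accepts a TCP connection, else "critical"."""
    host = host or "localhost"  # an empty hostname defaults to localhost
    try:
        # Tries every resolved address in order; the first success wins.
        with socket.create_connection((host, port), timeout=timeout):
            return "passing"
    except OSError:  # refused, unreachable, unresolvable, or timed out
        return "critical"
```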
+The following service definition file snippet is an example
+of a TCP check definition:
@@ -232,7 +215,6 @@ check = {
interval = "10s"
timeout = "1s"
}
-
```
```json
@@ -249,7 +231,21 @@ check = {
-A UDP check:
+### UDP check ((#udp-interval))
+
+UDP checks periodically direct the Consul agent to send UDP datagrams
+to the specified IP/hostname and port,
+waiting `interval` amount of time between attempts.
+The check status is set to `passing` if any response is received from the targeted UDP server.
+Any other result sets the status to `critical`.
+
+By default, UDP checks are configured with a request timeout equal to 10 seconds.
+To configure a custom UDP check timeout value,
+specify the `timeout` field in the check definition.
+However, if reading the response times out, the check is still considered healthy.
+
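The following Python sketch approximates the UDP check behavior described above. The payload it sends is a hypothetical placeholder, because this page does not specify what Consul sends. Connecting the UDP socket lets the operating system report ICMP "port unreachable" replies as errors, which the sketch treats as `critical`. A read timeout counts as healthy, matching the text above.

```python
import socket


def run_udp_check(host: str, port: int, timeout: float = 10.0, payload: bytes = b"ping") -> str:
    """Send one datagram and classify the result as "passing" or "critical"."""
    try:
        family, type_, proto, _, addr = socket.getaddrinfo(host, port, type=socket.SOCK_DGRAM)[0]
        with socket.socket(family, type_, proto) as sock:
            sock.settimeout(timeout)
            # A connected UDP socket surfaces ICMP errors as exceptions.
            sock.connect(addr)
            sock.send(payload)  # hypothetical payload
            sock.recv(1024)
            return "passing"  # any response counts as healthy
    except socket.timeout:
        # A read timeout is still considered healthy.
        return "passing"
    except OSError:
        # Resolution failure, refused port, or other socket error.
        return "critical"
```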
+The following service definition file snippet is an example
+of a UDP check definition:
@@ -261,7 +257,6 @@ check = {
interval = "10s"
timeout = "1s"
}
-
```
```json
@@ -278,7 +273,32 @@ check = {
-A TTL check:
+### Time to live (TTL) check ((#ttl))
+
+TTL checks retain their last known state for the specified `ttl` duration.
+If the `ttl` duration elapses before a new check update
+is provided over the HTTP interface,
+the check is set to `critical` state.
+
+This mechanism relies on the application to directly report its health.
+For example, a healthy app can periodically `PUT` a status update to the HTTP endpoint.
+Then, if the app is disrupted and unable to perform this update
+before the TTL expires, the health check enters the `critical` state.
+The endpoints used to update health information for a given check are: [pass](/api-docs/agent/check#ttl-check-pass),
+[warn](/api-docs/agent/check#ttl-check-warn), [fail](/api-docs/agent/check#ttl-check-fail),
+and [update](/api-docs/agent/check#ttl-check-update). TTL checks also persist their
+last known status to disk. This persistence allows the Consul agent to restore the last known
+status of the check across agent restarts. Persisted check status is valid through the
+end of the TTL from the time of the last check.
+
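The following Python sketch models the TTL heartbeat behavior. The `TTLCheck` class is a hypothetical in-memory model, not Consul code. Each update records a status and resets the TTL clock, and a check that is not updated before the TTL elapses reads as `critical`. The helper builds, but does not send, a request to the agent's `/v1/agent/check/pass/:check_id` endpoint with an optional `note`. It assumes the default agent address `http://127.0.0.1:8500`, and the check ID `web-ttl` is a placeholder.

```python
import time
import urllib.request
from urllib.parse import quote, urlencode


class TTLCheck:
    """Hypothetical in-memory model of a TTL check's state."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.status = "critical"  # checks start critical unless configured otherwise
        self.deadline = clock() + ttl_seconds

    def update(self, status: str) -> None:
        """Like the pass/warn/fail/update endpoints: set status, reset the TTL clock."""
        self.status = status
        self.deadline = self.clock() + self.ttl

    def current_status(self) -> str:
        # Once the TTL elapses without an update, the check is critical.
        if self.clock() > self.deadline:
            return "critical"
        return self.status


def ttl_pass_request(check_id: str, note: str = "", agent: str = "http://127.0.0.1:8500") -> urllib.request.Request:
    """Build a PUT to /v1/agent/check/pass/:check_id, optionally with a note."""
    url = f"{agent}/v1/agent/check/pass/{quote(check_id, safe='')}"
    if note:
        url += "?" + urlencode({"note": note})
    return urllib.request.Request(url, method="PUT")
```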
+To manually mark a service unhealthy,
+it is far more convenient to use the maintenance mode
+[CLI command](/commands/maint) or
+[HTTP API endpoint](/api-docs/agent#enable-maintenance-mode)
+rather than a TTL health check with arbitrarily high `ttl`.
+
+The following service definition file snippet is an example
+of a TTL check definition:
@@ -304,7 +324,24 @@ check = {
-A Docker check:
+### Docker check ((#docker-interval))
+
+Docker checks periodically invoke an external application that
+is packaged within a Docker container. The application is triggered within the running
+container through the Docker Exec API. The Consul agent user must have access
+to either the Docker HTTP API or the Unix socket. Consul uses `$DOCKER_HOST` to
+determine the Docker API endpoint. The application is expected to run, perform a health
+check of the service running inside the container, and exit with an appropriate exit code.
+The check should be paired with an invocation interval. The shell used to run the check
+is configurable, making it possible to run containers that
+have different shells on the same host.
+The output of a Docker check is limited to 4KB.
+Larger outputs are truncated.
+The agent must be configured with [`enable_script_checks`](/docs/agent/config/cli-flags#_enable_script_checks)
+set to `true` in order to enable Docker health checks.
+
+The following service definition file snippet is an example
+of a Docker check definition:
@@ -334,7 +371,26 @@ check = {
-A gRPC check for the whole application:
+### gRPC check ((#grpc-interval))
+
+gRPC checks are intended for applications that support the standard
+[gRPC health checking protocol](https://github.com/grpc/grpc/blob/master/doc/health-checking.md).
+The check state is updated by periodically probing the configured endpoint,
+waiting `interval` amount of time between attempts.
+
+By default, gRPC checks are configured with a timeout equal to 10 seconds.
+To configure a custom gRPC check timeout value,
+specify the `timeout` field in the check definition.
+
+gRPC checks default to not using TLS.
+To enable TLS, set `grpc_use_tls` to `true` in the check definition.
+If TLS is enabled, then by default, a valid TLS certificate is expected.
+Certificate verification can be turned off by setting the
+`tls_skip_verify` field to `true` in the check definition.
+To check on a specific service instead of the whole gRPC server,
+append the service identifier to the endpoint in the `grpc` field,
+using the format `/:service_identifier`.
+
+The following service definition file snippet is an example
+of a gRPC check for a whole application:
@@ -362,7 +418,8 @@ check = {
-A gRPC check for the specific `my_service` service:
+The following service definition file snippet is an example
+of a gRPC check for the specific `my_service` service:
@@ -390,7 +447,23 @@ check = {
-A h2ping check:
+### H2ping check ((#h2ping-interval))
+
+H2ping checks test an endpoint that uses HTTP/2 by connecting to the endpoint
+and sending a ping frame, waiting `interval` amount of time between attempts.
+If the ping is successful within a specified timeout,
+then the check status is set to `passing`.
+
+By default, h2ping checks are configured with a request timeout equal to 10 seconds.
+To configure a custom h2ping check timeout value,
+specify the `timeout` field in the check definition.
+
+TLS is enabled by default.
+To disable TLS and use h2c, set `h2ping_use_tls` to `false`.
+If TLS is not disabled, a valid certificate is required unless `tls_skip_verify` is set to `true`.
+
+The following service definition file snippet is an example
+of an h2ping check definition:
@@ -418,7 +491,29 @@ check = {
-An alias check for a local service:
+### Alias check
+
+Alias checks mirror the health state of another registered
+node or service. The state of the check updates asynchronously, but is
+nearly instant. For aliased services on the same agent, the local state is monitored
+and no additional network resources are consumed. For other services and nodes,
+the check maintains a blocking query over the agent's connection with a current
+server and allows stale requests. If there are any errors in watching the aliased
+node or service, the check state is set to `critical`.
+For the blocking query, the check uses the ACL token set on the service or check definition.
+If no ACL token is set in the service or check definition,
+the blocking query uses the agent's default ACL token
+([`acl.tokens.default`](/docs/agent/config/config-files#acl_tokens_default)).
+
+~> **Configuration info**: The alias check configuration expects the alias to be
+registered on the same agent as the one you are aliasing. If the service is
+not registered with the same agent, `"alias_node": ""` must also be
+specified. When using `alias_node`, if no service is specified, the check will
+alias the health of the node. If a service is specified, the check will alias
+the specified service on this particular node.
+
+The following service definition file snippet is an example
+of an alias check for a local service:
@@ -440,72 +535,137 @@ check = {
-~> Configuration info: The alias check configuration expects the alias to be
-registered on the same agent as the one you are aliasing. If the service is
-not registered with the same agent, `"alias_node": ""` must also be
-specified. When using `alias_node`, if no service is specified, the check will
-alias the health of the node. If a service is specified, the check will alias
-the specified service on this particular node.
+## Check definition
-Each type of definition must include a `name` and may optionally provide an
-`id` and `notes` field. The `id` must be unique per _agent_ otherwise only the
-last defined check with that `id` will be registered. If the `id` is not set
-and the check is embedded within a service definition a unique check id is
-generated. Otherwise, `id` will be set to `name`. If names might conflict,
-unique IDs should be provided.
+This section covers some of the most common options for check definitions.
+For a complete list of all check options, refer to the
+[Register Check HTTP API endpoint documentation](/api-docs/agent/check#json-request-body-schema).
-The `notes` field is opaque to Consul but can be used to provide a human-readable
-description of the current state of the check. Similarly, an external process
-updating a TTL check via the HTTP interface can set the `notes` value.
+-> **Casing for check options:**
+ The correct casing for an option depends on whether the check is defined in
+ a service definition file or an HTTP API JSON request body.
+ For example, the option `deregister_critical_service_after` in a service
+ definition file is instead named `DeregisterCriticalServiceAfter` in an
+ HTTP API JSON request body.
-Checks may also contain a `token` field to provide an ACL token. This token is
-used for any interaction with the catalog for the check, including
-[anti-entropy syncs](/docs/architecture/anti-entropy) and deregistration.
-For Alias checks, this token is used if a remote blocking query is necessary
-to watch the state of the aliased node or service.
+### General options
-Script, TCP, UDP, HTTP, Docker, and gRPC checks must include an `interval` field. This
-field is parsed by Go's `time` package, and has the following
-[formatting specification](https://golang.org/pkg/time/#ParseDuration):
+- `name` `(string: <required>)` - Specifies the name of the check.
-> A duration string is a possibly signed sequence of decimal numbers, each with
-> optional fraction and a unit suffix, such as "300ms", "-1.5h" or "2h45m".
-> Valid time units are "ns", "us" (or "µs"), "ms", "s", "m", "h".
+- `id` `(string: "")` - Specifies a unique ID for this check on this node.
+
+ If unspecified, Consul sets the check ID as follows:
+ - If the check definition is embedded within a service definition file,
+ a unique check ID is auto-generated.
+ - Otherwise, the `id` is set to the value of `name`.
+
+ If names might conflict, you must provide unique IDs to avoid
+ overwriting existing checks with the same ID on this node.
-In Consul 0.7 and later, checks that are associated with a service may also contain
-an optional `deregister_critical_service_after` field, which is a timeout in the
-same Go time format as `interval` and `ttl`. If a check is in the critical state
-for more than this configured value, then its associated service (and all of its
-associated checks) will automatically be deregistered. The minimum timeout is 1
-minute, and the process that reaps critical services runs every 30 seconds, so it
-may take slightly longer than the configured timeout to trigger the deregistration.
-This should generally be configured with a timeout that's much, much longer than
-any expected recoverable outage for the given service.
+- `interval` `(string: )` - Specifies
+ the frequency at which to run this check.
+ Required for all check types except TTL and alias checks.
-To configure a check, either provide it as a `-config-file` option to the
-agent or place it inside the `-config-dir` of the agent. The file must
-end in a ".json" or ".hcl" extension to be loaded by Consul. Check definitions
-can also be updated by sending a `SIGHUP` to the agent. Alternatively, the
-check can be registered dynamically using the [HTTP API](/api).
+ The value is parsed by Go's `time` package, and has the following
+ [formatting specification](https://golang.org/pkg/time/#ParseDuration):
-## Check Scripts
+ > A duration string is a possibly signed sequence of decimal numbers, each with
+ > optional fraction and a unit suffix, such as "300ms", "-1.5h" or "2h45m".
+ > Valid time units are "ns", "us" (or "µs"), "ms", "s", "m", "h".
-A check script is generally free to do anything to determine the status
-of the check. The only limitations placed are that the exit codes must obey
-this convention:
+- `service_id` `(string: )` - Specifies
+ the ID of a service instance to associate this check with.
+ That service instance must be on this node.
+ If not specified, this check is treated as a node-level check.
+ For more information, refer to the
+ [service-bound checks](#service-bound-checks) section.
-- Exit code 0 - Check is passing
-- Exit code 1 - Check is warning
-- Any other code - Check is failing
+- `status` `(string: "")` - Specifies the initial status of the health check as
+ "critical" (default), "warning", or "passing". For more details, refer to
+ the [initial health check status](#initial-health-check-status) section.
+
+ -> **Health defaults to critical:** If the health status is not initially specified,
+ it defaults to "critical" to protect against including a service
+ in discovery results before it is ready.
-This is the only convention that Consul depends on. Any output of the script
-will be captured and stored in the `output` field.
+- `deregister_critical_service_after` `(string: "")` - If specified,
+ the associated service and all its checks are deregistered
+ after this check is in the critical state for more than the specified value.
+ The value has the same formatting specification as the [`interval`](#interval) field.
-In Consul 0.9.0 and later, the agent must be configured with
-[`enable_script_checks`](/docs/agent/config/cli-flags#_enable_script_checks) set to `true`
-in order to enable script checks.
+ The minimum timeout is 1 minute,
+ and the process that reaps critical services runs every 30 seconds,
+ so it may take slightly longer than the configured timeout to trigger the deregistration.
+ This field should generally be configured with a timeout that's significantly longer than
+ any expected recoverable outage for the given service.
-## Initial Health Check Status
+- `notes` `(string: "")` - Provides a human-readable description of the check.
+ This field is opaque to Consul, so you can use it in whatever way is useful.
+ For example, it could describe the current state of the check.
+
+- `token` `(string: "")` - Specifies an ACL token used for any interaction
+ with the catalog for the check, including
+ [anti-entropy syncs](/docs/architecture/anti-entropy) and deregistration.
+
+ For alias checks, this token is used if a remote blocking query is necessary to watch the state of the aliased node or service.
+
+#### Success/failures before passing/warning/critical
+
+To prevent flapping health checks and limit the load they cause on the cluster,
+a health check may be configured to become passing, warning, or critical only after a
+specified number of consecutive check results are successful or unsuccessful.
+The status does not transition until the configured threshold is reached.
+
+- `success_before_passing` - Number of consecutive successful results required
+ before check status transitions to passing. Defaults to `0`. Added in Consul 1.7.0.
+
+- `failures_before_warning` - Number of consecutive unsuccessful results required
+ before check status transitions to warning. Defaults to the same value as that of
+ `failures_before_critical` to maintain the expected behavior of not changing the
+ status of service checks to `warning` before `critical` unless configured to do so.
+ Values higher than `failures_before_critical` are invalid. Added in Consul 1.11.0.
+
+- `failures_before_critical` - Number of consecutive unsuccessful results required
+ before check status transitions to critical. Defaults to `0`. Added in Consul 1.7.0.
+
+This feature is available for all check types except TTL and alias checks.
+By default, both passing and critical thresholds are set to 0 so the check
+status always reflects the last check result.
+
+
+
+```hcl
+checks = [
+ {
+ name = "HTTP TCP on port 80"
+ tcp = "localhost:80"
+ interval = "10s"
+ timeout = "1s"
+ success_before_passing = 3
+ failures_before_warning = 1
+ failures_before_critical = 3
+ }
+]
+```
+
+```json
+{
+ "checks": [
+ {
+ "name": "HTTP TCP on port 80",
+ "tcp": "localhost:80",
+ "interval": "10s",
+ "timeout": "1s",
+ "success_before_passing": 3,
+ "failures_before_warning": 1,
+ "failures_before_critical": 3
+ }
+ ]
+}
+```
+
+
+
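To make the threshold semantics concrete, here is a hypothetical Go model of a single check that uses the example values above (`success_before_passing = 3`, `failures_before_warning = 1`, `failures_before_critical = 3`). The `thresholds` and `checkState` types are invented for illustration. Consul's own status handling may differ in details, such as how a `warning` result reported by the check itself is counted.

```go
package main

import "fmt"

// thresholds mirrors the three documented fields (hypothetical type).
type thresholds struct {
	successBeforePassing   int
	failuresBeforeWarning  int
	failuresBeforeCritical int
}

// checkState is a hypothetical model of how consecutive results gate status changes.
type checkState struct {
	t         thresholds
	status    string
	successes int
	failures  int
}

// newCheckState starts in "critical", matching the default initial status.
func newCheckState(t thresholds) *checkState {
	return &checkState{t: t, status: "critical"}
}

// record applies one check result and returns the resulting status.
func (c *checkState) record(ok bool) string {
	if ok {
		c.successes++
		c.failures = 0 // a success breaks any failure streak
		if c.successes >= c.t.successBeforePassing {
			c.status = "passing"
		}
		return c.status
	}
	c.failures++
	c.successes = 0 // a failure breaks any success streak
	switch {
	case c.failures >= c.t.failuresBeforeCritical:
		c.status = "critical"
	case c.failures >= c.t.failuresBeforeWarning:
		c.status = "warning"
	}
	return c.status
}

func main() {
	c := newCheckState(thresholds{successBeforePassing: 3, failuresBeforeWarning: 1, failuresBeforeCritical: 3})
	for _, ok := range []bool{true, true, true, false, false, false} {
		fmt.Println(ok, "->", c.record(ok))
	}
}
```

With these values, three consecutive successes move the check from its initial `critical` status to `passing`. The first failure moves it to `warning`, and the third consecutive failure moves it to `critical`. With all thresholds left at `0`, every result is applied immediately.
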
+## Initial health check status
By default, when checks are registered against a Consul agent, the state is set
immediately to "critical". This is useful to prevent services from being
@@ -576,13 +736,13 @@ In the above configuration, if the web-app health check begins failing, it will
only affect the availability of the web-app service. All other services
provided by the node will remain unchanged.
-## Agent Certificates for TLS Checks
+## Agent certificates for TLS checks
The [enable_agent_tls_for_checks](/docs/agent/config/config-files#enable_agent_tls_for_checks)
agent configuration option can be utilized to have HTTP or gRPC health checks
to use the agent's credentials when configured for TLS.
-## Multiple Check Definitions
+## Multiple check definitions
Multiple check definitions can be defined using the `checks` (plural)
key in your configuration file.
@@ -640,58 +800,3 @@ checks = [
```
-
-## Success/Failures before passing/warning/critical
-
-To prevent flapping health checks, and limit the load they cause on the cluster,
-a health check may be configured to become passing/warning/critical only after a
-specified number of consecutive checks return passing/critical.
-The status will not transition states until the configured threshold is reached.
-
-- `success_before_passing` - Number of consecutive successful results required
- before check status transitions to passing. Defaults to `0`. Added in Consul 1.7.0.
-- `failures_before_warning` - Number of consecutive unsuccessful results required
- before check status transitions to warning. Defaults to the same value as that of
- `failures_before_critical` to maintain the expected behavior of not changing the
- status of service checks to `warning` before `critical` unless configured to do so.
- Values higher than `failures_before_critical` are invalid. Added in Consul 1.11.0.
-- `failures_before_critical` - Number of consecutive unsuccessful results required
- before check status transitions to critical. Defaults to `0`. Added in Consul 1.7.0.
-
-This feature is available for HTTP, TCP, gRPC, Docker & Monitor checks.
-By default, both passing and critical thresholds will be set to 0 so the check
-status will always reflect the last check result.
-
-
-
-```hcl
-checks = [
- {
- name = "HTTP TCP on port 80"
- tcp = "localhost:80"
- interval = "10s"
- timeout = "1s"
- success_before_passing = 3
- failures_before_warning = 1
- failures_before_critical = 3
- }
-]
-```
-
-```json
-{
- "checks": [
- {
- "name": "HTTP TCP on port 80",
- "tcp": "localhost:80",
- "interval": "10s",
- "timeout": "1s",
- "success_before_passing": 3,
- "failures_before_warning": 1,
- "failures_before_critical": 3
- }
- ]
-}
-```
-
-