diff --git a/website/content/docs/discovery/checks.mdx b/website/content/docs/discovery/checks.mdx index c683f4c8f..a1391c2df 100644 --- a/website/content/docs/discovery/checks.mdx +++ b/website/content/docs/discovery/checks.mdx @@ -9,150 +9,45 @@ description: >- One of the primary roles of the agent is management of system-level and application-level health checks. A health check is considered to be application-level if it is associated with a -service. If not associated with a service, the check monitors the health of the entire node. +service. If not associated with a service, the check monitors the health of the entire node. + Review the [health checks tutorial](https://learn.hashicorp.com/tutorials/consul/service-registration-health-checks) to get a more complete example on how to leverage health check capabilities in Consul. A check is defined in a configuration file or added at runtime over the HTTP interface. Checks created via the HTTP interface persist with that node. -There are several different kinds of checks: +There are several types of checks: -- Script + Interval - These checks depend on invoking an external application - that performs the health check, exits with an appropriate exit code, and potentially - generates some output. A script is paired with an invocation interval (e.g. - every 30 seconds). This is similar to the Nagios plugin system. The output of - a script check is limited to 4KB. Output larger than this will be truncated. - By default, Script checks will be configured with a timeout equal to 30 seconds. - It is possible to configure a custom Script check timeout value by specifying the - `timeout` field in the check definition. When the timeout is reached on Windows, - Consul will wait for any child processes spawned by the script to finish. For any - other system, Consul will attempt to force-kill the script and any child processes - it has spawned once the timeout has passed. 
- In Consul 0.9.0 and later, script checks are not enabled by default. To use them you - can either use : +- [`Script + Interval`](#script-check) - These checks invoke an external application + that performs the health check. - - [`enable_local_script_checks`](/docs/agent/config/cli-flags#_enable_local_script_checks): - enable script checks defined in local config files. Script checks defined via the HTTP - API will not be allowed. - - [`enable_script_checks`](/docs/agent/config/cli-flags#_enable_script_checks): enable - script checks regardless of how they are defined. +- [`HTTP + Interval`](#http-check) - These checks make an HTTP `GET` request to the specified URL + in the health check definition. + +- [`TCP + Interval`](#tcp-check) - These checks attempt a TCP connection to the specified + address and port in the health check definition. - ~> **Security Warning:** Enabling script checks in some configurations may - introduce a remote execution vulnerability which is known to be targeted by - malware. We strongly recommend `enable_local_script_checks` instead. See [this - blog post](https://www.hashicorp.com/blog/protecting-consul-from-rce-risk-in-specific-configurations) - for more details. +- [`UDP + Interval`](#udp-check) - These checks direct the client to periodically send UDP datagrams + to the specified address and port in the health check definition. + +- [`OSService + Interval`](#osservice-check) - These checks periodically direct the Consul agent to monitor + the health of a service running on the host operating system. -- `HTTP + Interval` - These checks make an HTTP `GET` request to the specified URL, - waiting the specified `interval` amount of time between requests (eg. 30 seconds). - The status of the service depends on the HTTP response code: any `2xx` code is - considered passing, a `429 Too ManyRequests` is a warning, and anything else is - a failure. 
This type of check - should be preferred over a script that uses `curl` or another external process - to check a simple HTTP operation. By default, HTTP checks are `GET` requests - unless the `method` field specifies a different method. Additional header - fields can be set through the `header` field which is a map of lists of - strings, e.g. `{"x-foo": ["bar", "baz"]}`. By default, HTTP checks will be - configured with a request timeout equal to 10 seconds. +- [`Time to Live (TTL)`](#time-to-live-ttl-check) - These checks retain their last known state for a given TTL and must be updated over the HTTP interface before the TTL elapses. + +- [`Docker + Interval`](#docker-check) - These checks invoke an external application that + is packaged within a Docker container. - It is possible to configure a custom HTTP check timeout value by - specifying the `timeout` field in the check definition. The output of the - check is limited to roughly 4KB. Responses larger than this will be truncated. - HTTP checks also support TLS. By default, a valid TLS certificate is expected. - Certificate verification can be turned off by setting the `tls_skip_verify` - field to `true` in the check definition. When using TLS, the SNI will be set - automatically from the URL if it uses a hostname (as opposed to an IP address); - the value can be overridden by setting `tls_server_name`. - - Consul follows HTTP redirects by default. Set the `disable_redirects` field to - `true` to disable redirects. - -- `TCP + Interval` - These checks make a TCP connection attempt to the specified - IP/hostname and port, waiting `interval` amount of time between attempts - (e.g. 30 seconds). If no hostname - is specified, it defaults to "localhost". The status of the service depends on - whether the connection attempt is successful (ie - the port is currently - accepting connections). If the connection is accepted, the status is - `success`, otherwise the status is `critical`. 
In the case of a hostname that - resolves to both IPv4 and IPv6 addresses, an attempt will be made to both - addresses, and the first successful connection attempt will result in a - successful check. This type of check should be preferred over a script that - uses `netcat` or another external process to check a simple socket operation. - By default, TCP checks will be configured with a request timeout of 10 seconds. - It is possible to configure a custom TCP check timeout value by specifying the - `timeout` field in the check definition. - -- `UDP + Interval` - These checks direct the client to periodically send UDP datagrams - to the specified IP/hostname and port. The duration specified in the `interval` field sets the amount of time - between attempts, such as `30s` to indicate 30 seconds. The check is logged as healthy if any response from the UDP server is received. Any other result sets the status to `critical`. - For UDP checks, the default value for the `timeout` field is `10s`, but you can configure a custom timeout by specifying the - `timeout` field in the check definition. If any timeout on read exists, the check is still considered healthy. - -- `Time to Live (TTL)` ((#ttl)) - These checks retain their last known state - for a given TTL. The state of the check must be updated periodically over the HTTP - interface. If an external system fails to update the status within a given TTL, - the check is set to the failed state. This mechanism, conceptually similar to a - dead man's switch, relies on the application to directly report its health. For - example, a healthy app can periodically `PUT` a status update to the HTTP endpoint; - if the app fails, the TTL will expire and the health check enters a critical state. 
- The endpoints used to update health information for a given check are: [pass](/api-docs/agent/check#ttl-check-pass), - [warn](/api-docs/agent/check#ttl-check-warn), [fail](/api-docs/agent/check#ttl-check-fail), - and [update](/api-docs/agent/check#ttl-check-update). TTL checks also persist their - last known status to disk. This allows the Consul agent to restore the last known - status of the check across restarts. Persisted check status is valid through the - end of the TTL from the time of the last check. - -- `Docker + Interval` - These checks depend on invoking an external application which - is packaged within a Docker Container. The application is triggered within the running - container via the Docker Exec API. We expect that the Consul agent user has access - to either the Docker HTTP API or the unix socket. Consul uses `$DOCKER_HOST` to - determine the Docker API endpoint. The application is expected to run, perform a health - check of the service running inside the container, and exit with an appropriate exit code. - The check should be paired with an invocation interval. The shell on which the check - has to be performed is configurable which makes it possible to run containers which - have different shells on the same host. Check output for Docker is limited to - 4KB. Any output larger than this will be truncated. In Consul 0.9.0 and later, the agent - must be configured with [`enable_script_checks`](/docs/agent/config/cli-flags#_enable_script_checks) - set to `true` in order to enable Docker health checks. - -- `gRPC + Interval` - These checks are intended for applications that support the standard +- [`gRPC + Interval`](#grpc-check) - These checks are intended for applications that support the standard [gRPC health checking protocol](https://github.com/grpc/grpc/blob/master/doc/health-checking.md). - The state of the check will be updated by probing the configured endpoint, waiting `interval` - amount of time between probes (eg. 30 seconds). 
By default, gRPC checks will be configured - with a default timeout of 10 seconds. - It is possible to configure a custom timeout value by specifying the `timeout` field in - the check definition. gRPC checks will default to not using TLS, but TLS can be enabled by - setting `grpc_use_tls` in the check definition. If TLS is enabled, then by default, a valid - TLS certificate is expected. Certificate verification can be turned off by setting the - `tls_skip_verify` field to `true` in the check definition. - To check on a specific service instead of the whole gRPC server, add the service identifier after the `gRPC` check's endpoint in the following format `/:service_identifier`. -- `H2ping + Interval` - These checks test an endpoint that uses http2 - by connecting to the endpoint and sending a ping frame. TLS is assumed to be configured by default. - To disable TLS and use h2c, set `h2ping_use_tls` to `false`. If the ping is successful - within a specified timeout, then the check is updated as passing. - The timeout defaults to 10 seconds, but is configurable using the `timeout` field. If TLS is enabled a valid - certificate is required, unless `tls_skip_verify` is set to `true`. - The check will be run on the interval specified by the `interval` field. +- [`H2ping + Interval`](#h2ping-check) - These checks test an endpoint that uses HTTP/2 + by connecting to the endpoint and sending a ping frame. -- `Alias` - These checks alias the health state of another registered - node or service. The state of the check will be updated asynchronously, but is - nearly instant. For aliased services on the same agent, the local state is monitored - and no additional network resources are consumed. For other services and nodes, - the check maintains a blocking query over the agent's connection with a current - server and allows stale requests. If there are any errors in watching the aliased - node or service, the check state will be critical. 
For the blocking query, the - check will use the ACL token set on the service or check definition or otherwise - will fall back to the default ACL token set with the agent (`acl_token`). +- [`Alias`](#alias-check) - These checks alias the health state of another registered + node or service. -## Check Definition - -A script check: -======= - -Review the [service health checks tutorial](https://learn.hashicorp.com/tutorials/consul/service-registration-health-checks) -to get a more complete example on how to leverage health check capabilities in Consul. ## Registering a health check @@ -181,7 +76,7 @@ automatically monitor the health of a service instance or node. to temporarily remove one or all service instances on a node from service discovery DNS and HTTP API query results. -### Script check ((#script-interval)) +### Script check Script checks periodically invoke an external application that performs the health check, exits with an appropriate exit code, and potentially generates some output. @@ -255,7 +150,7 @@ Any output of the script is captured and made available in the `Output` field of checks included in HTTP API responses, as in this example from the [local service health endpoint](/api-docs/agent/service#by-name-json). -### HTTP check ((#http-interval)) +### HTTP check HTTP checks periodically make an HTTP `GET` request to the specified URL, waiting the specified `interval` amount of time between requests. @@ -324,7 +219,7 @@ check = { -### TCP check ((#tcp-interval)) +### TCP check TCP checks periodically make a TCP connection attempt to the specified IP/hostname and port, waiting `interval` amount of time between attempts. If no hostname is specified, it defaults to "localhost". 
@@ -368,7 +263,7 @@ check = { -### UDP check ((#udp-interval)) +### UDP check UDP checks periodically direct the Consul agent to send UDP datagrams to the specified IP/hostname and port, @@ -416,7 +311,8 @@ OSService checks periodically direct the Consul agent to monitor the health of a the host operating system as either a Windows service (Windows) or a SystemD service (Unix). The check is logged as `healthy` if the service is running. If it is stopped or not running, the status is `critical`. All other results set -the status to `warning`, which indicates that the check is not reliable because an issue is preventing the check from determining the health of the service. +the status to `warning`, which indicates that the check is not reliable because +an issue is preventing the check from determining the health of the service. The following service definition file snippet is an example of an OSService check definition: @@ -447,9 +343,10 @@ check = { -### Time to live (TTL) check ((#ttl)) +### Time to live (TTL) check TTL checks retain their last known state for the specified `ttl` duration. +The state of the check must be updated periodically over the HTTP interface. If the `ttl` duration elapses before a new check update is provided over the HTTP interface, the check is set to `critical` state. @@ -498,7 +395,7 @@ check = { -### Docker check ((#docker-interval)) +### Docker check These checks depend on periodically invoking an external application that is packaged within a Docker Container. The application is triggered within the running @@ -511,8 +408,21 @@ has to be performed is configurable, making it possible to run containers which have different shells on the same host. The output of a Docker check is limited to 4KB. Larger outputs are truncated. -The agent must be configured with [`enable_script_checks`](/docs/agent/config/cli-flags#_enable_script_checks) -set to `true` in order to enable Docker health checks. + +Docker checks are not enabled by default. 
+To enable a Consul agent to perform Docker checks, +use one of the following agent configuration options: + +- [`enable_local_script_checks`](/docs/agent/config/cli-flags#_enable_local_script_checks): + Enable script checks defined in local config files. + Script checks registered using the HTTP API are not allowed. + +- [`enable_script_checks`](/docs/agent/config/cli-flags#_enable_script_checks): + Enable script checks no matter how they are registered. + + !> **Security Warning:** + We recommend using `enable_local_script_checks` instead of `enable_script_checks` in production + environments, as remote script checks are more vulnerable to malware attacks. Learn more about how [script checks can be exploited](https://www.hashicorp.com/blog/protecting-consul-from-rce-risk-in-specific-configurations#how-script-checks-can-be-exploited). The following service definition file snippet is an example of a Docker check definition: @@ -545,7 +455,7 @@ check = { -### gRPC check ((##grpc-interval)) +### gRPC check gRPC checks are intended for applications that support the standard [gRPC health checking protocol](https://github.com/grpc/grpc/blob/master/doc/health-checking.md). @@ -561,10 +471,10 @@ To enable TLS, set `grpc_use_tls` in the check definition. If TLS is enabled, then by default, a valid TLS certificate is expected. Certificate verification can be turned off by setting the `tls_skip_verify` field to `true` in the check definition. -To check on a specific service instead of the whole gRPC server, add the service identifier after the `gRPC` check's endpoint in the following format `/:service_identifier`. +To check on a specific service instead of the whole gRPC server, +add the service identifier after the `gRPC` check's endpoint. 
-The following service definition file snippet is an example -of a gRPC check for a whole application: +The following example shows a gRPC check for a whole application: @@ -592,8 +502,7 @@ check = { -The following service definition file snippet is an example -of a gRPC check for the specific `my_service` service +The following example shows a gRPC check for the specific `my_service` service: @@ -621,7 +530,7 @@ check = { -### H2ping check ((#h2ping-interval)) +### H2ping check H2ping checks test an endpoint that uses http2 by connecting to the endpoint and sending a ping frame, waiting `interval` amount of time between attempts.