Major updates and reorganizing of checks.mdx (#15806)
* Major updates and reorganizing of checks.mdx * Update checks.mdx Additional suggestion for clarity around gRPC `:/service-identifier` example Signed-off-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com> * Update website/content/docs/discovery/checks.mdx Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com> * Update website/content/docs/discovery/checks.mdx Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com> * Update website/content/docs/discovery/checks.mdx Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com> * Update website/content/docs/discovery/checks.mdx Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com> * Update website/content/docs/discovery/checks.mdx Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com> * Update website/content/docs/discovery/checks.mdx Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com> * Update website/content/docs/discovery/checks.mdx Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com> * Update website/content/docs/discovery/checks.mdx Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com> * Update website/content/docs/discovery/checks.mdx Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com> * Update website/content/docs/discovery/checks.mdx Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com> * Update website/content/docs/discovery/checks.mdx Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com> * Update website/content/docs/discovery/checks.mdx Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com> Signed-off-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com> Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com> Co-authored-by: trujillo-adam <47586768+trujillo-adam@users.noreply.github.com>
parent acfc7452e9
commit ec3ebec505
|
@ -9,150 +9,45 @@ description: >-
|
||||||
|
|
||||||
One of the primary roles of the agent is management of system-level and application-level health
|
One of the primary roles of the agent is management of system-level and application-level health
|
||||||
checks. A health check is considered to be application-level if it is associated with a
|
checks. A health check is considered to be application-level if it is associated with a
|
||||||
service. If not associated with a service, the check monitors the health of the entire node.
|
service. If not associated with a service, the check monitors the health of the entire node.
|
||||||
|
|
||||||
Review the [health checks tutorial](https://learn.hashicorp.com/tutorials/consul/service-registration-health-checks)
|
Review the [health checks tutorial](https://learn.hashicorp.com/tutorials/consul/service-registration-health-checks)
|
||||||
to get a more complete example on how to leverage health check capabilities in Consul.
|
to get a more complete example on how to leverage health check capabilities in Consul.
|
||||||
|
|
||||||
A check is defined in a configuration file or added at runtime over the HTTP interface. Checks
|
A check is defined in a configuration file or added at runtime over the HTTP interface. Checks
|
||||||
created via the HTTP interface persist with that node.
|
created via the HTTP interface persist with that node.
|
||||||
|
|
||||||
There are several different kinds of checks:
|
There are several types of checks:
|
||||||
|
|
||||||
- Script + Interval - These checks depend on invoking an external application
|
- [`Script + Interval`](#script-check) - These checks invoke an external application
|
||||||
that performs the health check, exits with an appropriate exit code, and potentially
|
that performs the health check.
|
||||||
generates some output. A script is paired with an invocation interval (e.g.
|
|
||||||
every 30 seconds). This is similar to the Nagios plugin system. The output of
|
|
||||||
a script check is limited to 4KB. Output larger than this will be truncated.
|
|
||||||
By default, Script checks will be configured with a timeout equal to 30 seconds.
|
|
||||||
It is possible to configure a custom Script check timeout value by specifying the
|
|
||||||
`timeout` field in the check definition. When the timeout is reached on Windows,
|
|
||||||
Consul will wait for any child processes spawned by the script to finish. For any
|
|
||||||
other system, Consul will attempt to force-kill the script and any child processes
|
|
||||||
it has spawned once the timeout has passed.
|
|
||||||
In Consul 0.9.0 and later, script checks are not enabled by default. To use them you
|
|
||||||
can either use :
|
|
||||||
|
|
||||||
- [`enable_local_script_checks`](/docs/agent/config/cli-flags#_enable_local_script_checks):
|
- [`HTTP + Interval`](#http-check) - These checks make an HTTP `GET` request to the specified URL
|
||||||
enable script checks defined in local config files. Script checks defined via the HTTP
|
in the health check definition.
|
||||||
API will not be allowed.
|
|
||||||
- [`enable_script_checks`](/docs/agent/config/cli-flags#_enable_script_checks): enable
|
- [`TCP + Interval`](#tcp-check) - These checks attempt a TCP connection to the specified
|
||||||
script checks regardless of how they are defined.
|
address and port in the health check definition.
|
||||||
|
|
||||||
~> **Security Warning:** Enabling script checks in some configurations may
|
- [`UDP + Interval`](#udp-check) - These checks direct the client to periodically send UDP datagrams
|
||||||
introduce a remote execution vulnerability which is known to be targeted by
|
to the specified address and port in the health check definition.
|
||||||
malware. We strongly recommend `enable_local_script_checks` instead. See [this
|
|
||||||
blog post](https://www.hashicorp.com/blog/protecting-consul-from-rce-risk-in-specific-configurations)
|
- [`OSService + Interval`](#osservice-check) - These checks periodically direct the Consul agent to monitor
|
||||||
for more details.
|
the health of a service running on the host operating system.
|
||||||
|
|
||||||
- `HTTP + Interval` - These checks make an HTTP `GET` request to the specified URL,
|
- [`Time to Live (TTL)`](#time-to-live-ttl-check) - These checks retain their last known state for a given TTL and must be updated periodically over the HTTP interface.
|
||||||
waiting the specified `interval` amount of time between requests (eg. 30 seconds).
|
|
||||||
The status of the service depends on the HTTP response code: any `2xx` code is
|
- [`Docker + Interval`](#docker-check) - These checks invoke an external application that
|
||||||
considered passing, a `429 Too Many Requests` is a warning, and anything else is
|
is packaged within a Docker container.
|
||||||
a failure. This type of check
|
|
||||||
should be preferred over a script that uses `curl` or another external process
|
|
||||||
to check a simple HTTP operation. By default, HTTP checks are `GET` requests
|
|
||||||
unless the `method` field specifies a different method. Additional header
|
|
||||||
fields can be set through the `header` field which is a map of lists of
|
|
||||||
strings, e.g. `{"x-foo": ["bar", "baz"]}`. By default, HTTP checks will be
|
|
||||||
configured with a request timeout equal to 10 seconds.
|
|
||||||
|
|
||||||
It is possible to configure a custom HTTP check timeout value by
|
- [`gRPC + Interval`](#grpc-check) - These checks are intended for applications that support the standard
|
||||||
specifying the `timeout` field in the check definition. The output of the
|
|
||||||
check is limited to roughly 4KB. Responses larger than this will be truncated.
|
|
||||||
HTTP checks also support TLS. By default, a valid TLS certificate is expected.
|
|
||||||
Certificate verification can be turned off by setting the `tls_skip_verify`
|
|
||||||
field to `true` in the check definition. When using TLS, the SNI will be set
|
|
||||||
automatically from the URL if it uses a hostname (as opposed to an IP address);
|
|
||||||
the value can be overridden by setting `tls_server_name`.
|
|
||||||
|
|
||||||
Consul follows HTTP redirects by default. Set the `disable_redirects` field to
|
|
||||||
`true` to disable redirects.
|
|
||||||
|
|
||||||
- `TCP + Interval` - These checks make a TCP connection attempt to the specified
|
|
||||||
IP/hostname and port, waiting `interval` amount of time between attempts
|
|
||||||
(e.g. 30 seconds). If no hostname
|
|
||||||
is specified, it defaults to "localhost". The status of the service depends on
|
|
||||||
whether the connection attempt is successful (ie - the port is currently
|
|
||||||
accepting connections). If the connection is accepted, the status is
|
|
||||||
`success`, otherwise the status is `critical`. In the case of a hostname that
|
|
||||||
resolves to both IPv4 and IPv6 addresses, an attempt will be made to both
|
|
||||||
addresses, and the first successful connection attempt will result in a
|
|
||||||
successful check. This type of check should be preferred over a script that
|
|
||||||
uses `netcat` or another external process to check a simple socket operation.
|
|
||||||
By default, TCP checks will be configured with a request timeout of 10 seconds.
|
|
||||||
It is possible to configure a custom TCP check timeout value by specifying the
|
|
||||||
`timeout` field in the check definition.
|
|
||||||
|
|
||||||
- `UDP + Interval` - These checks direct the client to periodically send UDP datagrams
|
|
||||||
to the specified IP/hostname and port. The duration specified in the `interval` field sets the amount of time
|
|
||||||
between attempts, such as `30s` to indicate 30 seconds. The check is logged as healthy if any response from the UDP server is received. Any other result sets the status to `critical`.
|
|
||||||
For UDP checks, the default value for the `timeout` field is `10s`, but you can configure a custom timeout by specifying the
|
|
||||||
`timeout` field in the check definition. If any timeout on read exists, the check is still considered healthy.
|
|
||||||
|
|
||||||
- `Time to Live (TTL)` ((#ttl)) - These checks retain their last known state
|
|
||||||
for a given TTL. The state of the check must be updated periodically over the HTTP
|
|
||||||
interface. If an external system fails to update the status within a given TTL,
|
|
||||||
the check is set to the failed state. This mechanism, conceptually similar to a
|
|
||||||
dead man's switch, relies on the application to directly report its health. For
|
|
||||||
example, a healthy app can periodically `PUT` a status update to the HTTP endpoint;
|
|
||||||
if the app fails, the TTL will expire and the health check enters a critical state.
|
|
||||||
The endpoints used to update health information for a given check are: [pass](/api-docs/agent/check#ttl-check-pass),
|
|
||||||
[warn](/api-docs/agent/check#ttl-check-warn), [fail](/api-docs/agent/check#ttl-check-fail),
|
|
||||||
and [update](/api-docs/agent/check#ttl-check-update). TTL checks also persist their
|
|
||||||
last known status to disk. This allows the Consul agent to restore the last known
|
|
||||||
status of the check across restarts. Persisted check status is valid through the
|
|
||||||
end of the TTL from the time of the last check.
|
|
||||||
|
|
||||||
- `Docker + Interval` - These checks depend on invoking an external application which
|
|
||||||
is packaged within a Docker Container. The application is triggered within the running
|
|
||||||
container via the Docker Exec API. We expect that the Consul agent user has access
|
|
||||||
to either the Docker HTTP API or the unix socket. Consul uses `$DOCKER_HOST` to
|
|
||||||
determine the Docker API endpoint. The application is expected to run, perform a health
|
|
||||||
check of the service running inside the container, and exit with an appropriate exit code.
|
|
||||||
The check should be paired with an invocation interval. The shell on which the check
|
|
||||||
has to be performed is configurable which makes it possible to run containers which
|
|
||||||
have different shells on the same host. Check output for Docker is limited to
|
|
||||||
4KB. Any output larger than this will be truncated. In Consul 0.9.0 and later, the agent
|
|
||||||
must be configured with [`enable_script_checks`](/docs/agent/config/cli-flags#_enable_script_checks)
|
|
||||||
set to `true` in order to enable Docker health checks.
|
|
||||||
|
|
||||||
- `gRPC + Interval` - These checks are intended for applications that support the standard
|
|
||||||
[gRPC health checking protocol](https://github.com/grpc/grpc/blob/master/doc/health-checking.md).
|
[gRPC health checking protocol](https://github.com/grpc/grpc/blob/master/doc/health-checking.md).
|
||||||
The state of the check will be updated by probing the configured endpoint, waiting `interval`
|
|
||||||
amount of time between probes (eg. 30 seconds). By default, gRPC checks will be configured
|
|
||||||
with a default timeout of 10 seconds.
|
|
||||||
It is possible to configure a custom timeout value by specifying the `timeout` field in
|
|
||||||
the check definition. gRPC checks will default to not using TLS, but TLS can be enabled by
|
|
||||||
setting `grpc_use_tls` in the check definition. If TLS is enabled, then by default, a valid
|
|
||||||
TLS certificate is expected. Certificate verification can be turned off by setting the
|
|
||||||
`tls_skip_verify` field to `true` in the check definition.
|
|
||||||
To check on a specific service instead of the whole gRPC server, add the service identifier after the `gRPC` check's endpoint in the following format `/:service_identifier`.
|
|
||||||
|
|
||||||
- `H2ping + Interval` - These checks test an endpoint that uses http2
|
- [`H2ping + Interval`](#h2ping-check) - These checks test an endpoint that uses HTTP/2
|
||||||
by connecting to the endpoint and sending a ping frame. TLS is assumed to be configured by default.
|
by connecting to the endpoint and sending a ping frame.
|
||||||
To disable TLS and use h2c, set `h2ping_use_tls` to `false`. If the ping is successful
|
|
||||||
within a specified timeout, then the check is updated as passing.
|
|
||||||
The timeout defaults to 10 seconds, but is configurable using the `timeout` field. If TLS is enabled a valid
|
|
||||||
certificate is required, unless `tls_skip_verify` is set to `true`.
|
|
||||||
The check will be run on the interval specified by the `interval` field.
|
|
||||||
|
|
||||||
- `Alias` - These checks alias the health state of another registered
|
- [`Alias`](#alias-check) - These checks alias the health state of another registered
|
||||||
node or service. The state of the check will be updated asynchronously, but is
|
node or service.
|
||||||
nearly instant. For aliased services on the same agent, the local state is monitored
|
|
||||||
and no additional network resources are consumed. For other services and nodes,
|
|
||||||
the check maintains a blocking query over the agent's connection with a current
|
|
||||||
server and allows stale requests. If there are any errors in watching the aliased
|
|
||||||
node or service, the check state will be critical. For the blocking query, the
|
|
||||||
check will use the ACL token set on the service or check definition or otherwise
|
|
||||||
will fall back to the default ACL token set with the agent (`acl_token`).
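The `HTTP + Interval` entry in the list above maps response codes to check states: any `2xx` is passing, `429` is a warning, and everything else is a failure. The following is a minimal sketch of that rule only, not Consul's implementation. The function name is hypothetical, and the returned state strings are an assumption chosen to match the state names used elsewhere on this page.

```python
def http_check_status(status_code: int) -> str:
    """Map an HTTP response code to a check state (illustrative only)."""
    # Any 2xx response counts as passing.
    if 200 <= status_code <= 299:
        return "passing"
    # 429 Too Many Requests is treated as a warning.
    if status_code == 429:
        return "warning"
    # Every other response code is a failure.
    return "critical"
```

Under this rule a `204` response passes, a `429` warns, and a `302` that is not followed, a `404`, or a `500` all count as failures.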
|
|
||||||
|
|
||||||
## Check Definition
|
|
||||||
|
|
||||||
A script check:
|
|
||||||
=======
|
|
||||||
|
|
||||||
Review the [service health checks tutorial](https://learn.hashicorp.com/tutorials/consul/service-registration-health-checks)
|
|
||||||
to get a more complete example on how to leverage health check capabilities in Consul.
|
|
||||||
|
|
||||||
## Registering a health check
|
## Registering a health check
|
||||||
|
|
||||||
|
@ -181,7 +76,7 @@ automatically monitor the health of a service instance or node.
|
||||||
to temporarily remove one or all service instances on a node
|
to temporarily remove one or all service instances on a node
|
||||||
from service discovery DNS and HTTP API query results.
|
from service discovery DNS and HTTP API query results.
|
||||||
|
|
||||||
### Script check ((#script-interval))
|
### Script check
|
||||||
|
|
||||||
Script checks periodically invoke an external application that performs the health check,
|
Script checks periodically invoke an external application that performs the health check,
|
||||||
exits with an appropriate exit code, and potentially generates some output.
|
exits with an appropriate exit code, and potentially generates some output.
|
||||||
|
@ -255,7 +150,7 @@ Any output of the script is captured and made available in the
|
||||||
`Output` field of checks included in HTTP API responses,
|
`Output` field of checks included in HTTP API responses,
|
||||||
as in this example from the [local service health endpoint](/api-docs/agent/service#by-name-json).
|
as in this example from the [local service health endpoint](/api-docs/agent/service#by-name-json).
|
||||||
|
|
||||||
### HTTP check ((#http-interval))
|
### HTTP check
|
||||||
|
|
||||||
HTTP checks periodically make an HTTP `GET` request to the specified URL,
|
HTTP checks periodically make an HTTP `GET` request to the specified URL,
|
||||||
waiting the specified `interval` amount of time between requests.
|
waiting the specified `interval` amount of time between requests.
|
||||||
|
@ -324,7 +219,7 @@ check = {
|
||||||
|
|
||||||
</CodeTabs>
|
</CodeTabs>
|
||||||
|
|
||||||
### TCP check ((#tcp-interval))
|
### TCP check
|
||||||
|
|
||||||
TCP checks periodically make a TCP connection attempt to the specified IP/hostname and port, waiting `interval` amount of time between attempts.
|
TCP checks periodically make a TCP connection attempt to the specified IP/hostname and port, waiting `interval` amount of time between attempts.
|
||||||
If no hostname is specified, it defaults to "localhost".
|
If no hostname is specified, it defaults to "localhost".
|
||||||
|
@ -368,7 +263,7 @@ check = {
|
||||||
|
|
||||||
</CodeTabs>
|
</CodeTabs>
|
||||||
|
|
||||||
### UDP check ((#udp-interval))
|
### UDP check
|
||||||
|
|
||||||
UDP checks periodically direct the Consul agent to send UDP datagrams
|
UDP checks periodically direct the Consul agent to send UDP datagrams
|
||||||
to the specified IP/hostname and port,
|
to the specified IP/hostname and port,
|
||||||
|
@ -416,7 +311,8 @@ OSService checks periodically direct the Consul agent to monitor the health of a
|
||||||
the host operating system as either a Windows service (Windows) or a SystemD service (Unix).
|
the host operating system as either a Windows service (Windows) or a SystemD service (Unix).
|
||||||
The check is logged as `healthy` if the service is running.
|
The check is logged as `healthy` if the service is running.
|
||||||
If it is stopped or not running, the status is `critical`. All other results set
|
If it is stopped or not running, the status is `critical`. All other results set
|
||||||
the status to `warning`, which indicates that the check is not reliable because an issue is preventing the check from determining the health of the service.
|
the status to `warning`, which indicates that the check is not reliable because
|
||||||
|
an issue is preventing the check from determining the health of the service.
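The status rules for OSService checks described above can be sketched as a small mapping. This is an illustration under assumptions, not the agent's code: the function name and the input strings (`"running"`, `"stopped"`, `"not running"`) are hypothetical stand-ins for whatever the operating system's service manager reports.

```python
def os_service_check_status(service_state: str) -> str:
    """Map a reported OS service state to a check status (illustrative only)."""
    # A running service is logged as healthy.
    if service_state == "running":
        return "healthy"
    # A stopped or not-running service is critical.
    if service_state in ("stopped", "not running"):
        return "critical"
    # Any other result means the check could not determine health reliably.
    return "warning"
```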
|
||||||
|
|
||||||
The following service definition file snippet is an example
|
The following service definition file snippet is an example
|
||||||
of an OSService check definition:
|
of an OSService check definition:
|
||||||
|
@ -447,9 +343,10 @@ check = {
|
||||||
|
|
||||||
</CodeTabs>
|
</CodeTabs>
|
||||||
|
|
||||||
### Time to live (TTL) check ((#ttl))
|
### Time to live (TTL) check
|
||||||
|
|
||||||
TTL checks retain their last known state for the specified `ttl` duration.
|
TTL checks retain their last known state for the specified `ttl` duration.
|
||||||
|
The state of the check updates periodically over the HTTP interface.
|
||||||
If the `ttl` duration elapses before a new check update
|
If the `ttl` duration elapses before a new check update
|
||||||
is provided over the HTTP interface,
|
is provided over the HTTP interface,
|
||||||
the check is set to `critical` state.
|
the check is set to `critical` state.
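The TTL behavior described above can be modeled with a small in-memory class. This is a hedged sketch, not how the agent is implemented: the class name, method names, and the injectable `clock` parameter are hypothetical, and treating a check that has never been updated as `critical` is an assumption made for the example.

```python
import time


class TTLCheck:
    """Toy model of a TTL check: keep the last reported state until the TTL elapses."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock          # injectable so the example can be tested without sleeping
        self._status = "critical"    # assumption: no update received yet means critical
        self._last_update = None

    def update(self, status: str) -> None:
        """Record a status update, as an application would over the HTTP interface."""
        self._status = status
        self._last_update = self._clock()

    def status(self) -> str:
        """Return the last known state, or critical once the TTL has elapsed."""
        if self._last_update is None:
            return "critical"
        if self._clock() - self._last_update >= self.ttl_seconds:
            return "critical"
        return self._status
```

A healthy application would call `update("passing")` more often than once per `ttl_seconds`. If it stops reporting, `status()` returns `critical` without any further input.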
|
||||||
|
@ -498,7 +395,7 @@ check = {
|
||||||
|
|
||||||
</CodeTabs>
|
</CodeTabs>
|
||||||
|
|
||||||
### Docker check ((#docker-interval))
|
### Docker check
|
||||||
|
|
||||||
These checks depend on periodically invoking an external application that
|
These checks depend on periodically invoking an external application that
|
||||||
is packaged within a Docker Container. The application is triggered within the running
|
is packaged within a Docker Container. The application is triggered within the running
|
||||||
|
@ -511,8 +408,21 @@ has to be performed is configurable, making it possible to run containers which
|
||||||
have different shells on the same host.
|
have different shells on the same host.
|
||||||
The output of a Docker check is limited to 4KB.
|
The output of a Docker check is limited to 4KB.
|
||||||
Larger outputs are truncated.
|
Larger outputs are truncated.
|
||||||
The agent must be configured with [`enable_script_checks`](/docs/agent/config/cli-flags#_enable_script_checks)
|
|
||||||
set to `true` in order to enable Docker health checks.
|
Docker checks are not enabled by default.
|
||||||
|
To enable a Consul agent to perform Docker checks,
|
||||||
|
use one of the following agent configuration options:
|
||||||
|
|
||||||
|
- [`enable_local_script_checks`](/docs/agent/config/cli-flags#_enable_local_script_checks):
|
||||||
|
Enable script checks defined in local config files.
|
||||||
|
Script checks registered using the HTTP API are not allowed.
|
||||||
|
|
||||||
|
- [`enable_script_checks`](/docs/agent/config/cli-flags#_enable_script_checks):
|
||||||
|
Enable script checks no matter how they are registered.
|
||||||
|
|
||||||
|
!> **Security Warning:**
|
||||||
|
We recommend using `enable_local_script_checks` instead of `enable_script_checks` in production
|
||||||
|
environments, as remote script checks are more vulnerable to malware attacks. Learn more about how [script checks can be exploited](https://www.hashicorp.com/blog/protecting-consul-from-rce-risk-in-specific-configurations#how-script-checks-can-be-exploited).
|
||||||
|
|
||||||
The following service definition file snippet is an example
|
The following service definition file snippet is an example
|
||||||
of a Docker check definition:
|
of a Docker check definition:
|
||||||
|
@ -545,7 +455,7 @@ check = {
|
||||||
|
|
||||||
</CodeTabs>
|
</CodeTabs>
|
||||||
|
|
||||||
### gRPC check ((##grpc-interval))
|
### gRPC check
|
||||||
|
|
||||||
gRPC checks are intended for applications that support the standard
|
gRPC checks are intended for applications that support the standard
|
||||||
[gRPC health checking protocol](https://github.com/grpc/grpc/blob/master/doc/health-checking.md).
|
[gRPC health checking protocol](https://github.com/grpc/grpc/blob/master/doc/health-checking.md).
|
||||||
|
@ -561,10 +471,10 @@ To enable TLS, set `grpc_use_tls` in the check definition.
|
||||||
If TLS is enabled, then by default, a valid TLS certificate is expected.
|
If TLS is enabled, then by default, a valid TLS certificate is expected.
|
||||||
Certificate verification can be turned off by setting the
|
Certificate verification can be turned off by setting the
|
||||||
`tls_skip_verify` field to `true` in the check definition.
|
`tls_skip_verify` field to `true` in the check definition.
|
||||||
To check on a specific service instead of the whole gRPC server, add the service identifier after the `gRPC` check's endpoint in the following format `/:service_identifier`.
|
To check on a specific service instead of the whole gRPC server,
|
||||||
|
add the service identifier after the `gRPC` check's endpoint.
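To make the endpoint format concrete, the sketch below splits a gRPC check target into its address and optional service identifier. It assumes the identifier follows the address after a single `/`. The function name and the sample address `127.0.0.1:12345` are hypothetical, and the parsing is an illustration rather than Consul's own logic.

```python
def split_grpc_target(target: str) -> tuple[str, str]:
    """Split 'host:port[/service]' into (address, service) (illustrative only)."""
    # partition() splits on the first "/" and returns empty strings when it is absent.
    address, separator, service = target.partition("/")
    # An empty service identifier stands for a check on the whole gRPC server.
    return address, service if separator else ""
```

With this helper, `split_grpc_target("127.0.0.1:12345")` yields an empty service identifier, while `split_grpc_target("127.0.0.1:12345/my_service")` yields `my_service`.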
|
||||||
|
|
||||||
The following service definition file snippet is an example
|
The following example shows a gRPC check for a whole application:
|
||||||
of a gRPC check for a whole application:
|
|
||||||
|
|
||||||
<CodeTabs heading="gRPC Check">
|
<CodeTabs heading="gRPC Check">
|
||||||
|
|
||||||
|
@ -592,8 +502,7 @@ check = {
|
||||||
|
|
||||||
</CodeTabs>
|
</CodeTabs>
|
||||||
|
|
||||||
The following service definition file snippet is an example
|
The following example shows a gRPC check for the specific `my_service` service:
|
||||||
of a gRPC check for the specific `my_service` service
|
|
||||||
|
|
||||||
<CodeTabs heading="gRPC Specific Service Check">
|
<CodeTabs heading="gRPC Specific Service Check">
|
||||||
|
|
||||||
|
@ -621,7 +530,7 @@ check = {
|
||||||
|
|
||||||
</CodeTabs>
|
</CodeTabs>
|
||||||
|
|
||||||
### H2ping check ((#h2ping-interval))
|
### H2ping check
|
||||||
|
|
||||||
H2ping checks test an endpoint that uses http2 by connecting to the endpoint
|
H2ping checks test an endpoint that uses http2 by connecting to the endpoint
|
||||||
and sending a ping frame, waiting `interval` amount of time between attempts.
|
and sending a ping frame, waiting `interval` amount of time between attempts.
|
||||||
|
|