docs: improve health check related docs

Includes:
- Improved scannability and organization of checks overview
- Checks overview includes more guidance on
  - How to register a health check
  - The options available for a health check definition
- Contextual cross-references to maintenance mode
2022-07-27 14:03:06 -07:00 · 2022-07-27 14:03:06 -07:00 · 99df4df057
parent f6a163f239
commit 99df4df057
3 changed files with 365 additions and 246 deletions
--- a/website/content/api-docs/agent/check.mdx
+++ b/website/content/api-docs/agent/check.mdx
@@ -6,7 +6,10 @@ description: The /agent/check endpoints interact with checks on the local agent

 # Check - Agent HTTP API

-The `/agent/check` endpoints interact with checks on the local agent in Consul.
+Consul's health check capabilities are described in the
+[health checks overview](/docs/discovery/checks).
+The `/agent/check` endpoints interact with health checks
+managed by the local agent in Consul.
 These should not be confused with checks in the catalog.

 ## List Checks
@@ -418,6 +421,10 @@ $ curl \
 This endpoint is used with a TTL type check to set the status of the check to
 `critical` and to reset the TTL clock.

+If you want to manually mark a service as unhealthy,
+use [maintenance mode](/api-docs/agent#enable-maintenance-mode)
+instead of defining a TTL health check and using this endpoint.
+
 | Method | Path                          | Produces           |
 | ------ | ----------------------------- | ------------------ |
 | `PUT`  | `/agent/check/fail/:check_id` | `application/json` |
@@ -456,6 +463,10 @@ $ curl \
 This endpoint is used with a TTL type check to set the status of the check and
 to reset the TTL clock.

+If you want to manually mark a service as unhealthy,
+use [maintenance mode](/api-docs/agent#enable-maintenance-mode)
+instead of defining a TTL health check and using this endpoint.
+
 | Method | Path                            | Produces           |
 | ------ | ------------------------------- | ------------------ |
 | `PUT`  | `/agent/check/update/:check_id` | `application/json` |
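
For context on the update endpoint touched by the hunk above, here is a minimal Python sketch of how an application might build the heartbeat request for a TTL check. It is not part of this commit. The agent address `http://127.0.0.1:8500`, the check ID `web-ttl`, and the helper name `build_ttl_update` are assumptions made for illustration. The request is only constructed, not sent.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a local Consul agent on the default HTTP address.
AGENT_ADDR = "http://127.0.0.1:8500"


def build_ttl_update(check_id: str, status: str, output: str = "") -> urllib.request.Request:
    """Build (but do not send) a PUT to /v1/agent/check/update/:check_id."""
    if status not in ("passing", "warning", "critical"):
        raise ValueError(f"unsupported status: {status!r}")
    # Percent-encode the check ID so it is safe to place in the URL path.
    quoted_id = urllib.parse.quote(check_id, safe="")
    url = f"{AGENT_ADDR}/v1/agent/check/update/{quoted_id}"
    # The HTTP API request body uses capitalized field names.
    body = json.dumps({"Status": status, "Output": output}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # "web-ttl" is a hypothetical check ID used only for this example.
    req = build_ttl_update("web-ttl", "passing", "heartbeat ok")
    print(req.get_method(), req.full_url)
    # To send it to a running agent, uncomment:
    # urllib.request.urlopen(req, timeout=5)
```

Calling this on a timer shorter than the check's `ttl` keeps the check out of the `critical` state. As the added text notes, maintenance mode is the better fit when the goal is only to mark a service unhealthy by hand.
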
--- a/website/content/api-docs/health.mdx
+++ b/website/content/api-docs/health.mdx
@@ -14,6 +14,9 @@ optional health checking mechanisms. Additionally, some of the query results
 from the health endpoints are filtered while the catalog endpoints provide the
 raw entries.

+To modify health check registration or information,
+use the [`/agent/check`](/api-docs/agent/check) endpoints.
+
 ## List Checks for Node

 This endpoint returns the checks specific to the node provided on the path.
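
The split described in the hunk above (read through `/health`, modify through `/agent/check`) can be sketched in Python as follows. This is illustrative only and not part of the commit. The agent address, the node name `node-1`, the check name `web-ttl`, and both helper names are assumptions. Neither request is sent.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a local Consul agent on the default HTTP address.
AGENT_ADDR = "http://127.0.0.1:8500"


def build_list_node_checks(node: str) -> urllib.request.Request:
    """Read side: GET /v1/health/node/:node lists the checks for a node."""
    quoted = urllib.parse.quote(node, safe="")
    return urllib.request.Request(f"{AGENT_ADDR}/v1/health/node/{quoted}", method="GET")


def build_register_ttl_check(name: str, ttl: str, service_id: str = "") -> urllib.request.Request:
    """Write side: PUT /v1/agent/check/register registers a check on the local agent."""
    payload = {"Name": name, "TTL": ttl}
    if service_id:
        # Binding the check to a service makes it a service-level check.
        payload["ServiceID"] = service_id
    return urllib.request.Request(
        f"{AGENT_ADDR}/v1/agent/check/register",
        data=json.dumps(payload).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # "node-1" and "web-ttl" are hypothetical names used only for this example.
    read_req = build_list_node_checks("node-1")
    write_req = build_register_ttl_check("web-ttl", "30s")
    print(read_req.get_method(), read_req.full_url)
    print(write_req.get_method(), write_req.full_url)
```

The request body uses the capitalized HTTP API field names (`Name`, `TTL`, `ServiceID`) rather than the snake_case names used in service definition files.
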
--- a/website/content/docs/discovery/checks.mdx
+++ b/website/content/docs/discovery/checks.mdx
@@ -13,144 +13,72 @@ description: >-
 One of the primary roles of the agent is management of system-level and application-level health
 checks. A health check is considered to be application-level if it is associated with a
 service. If not associated with a service, the check monitors the health of the entire node.
-Review the [health checks tutorial](https://learn.hashicorp.com/tutorials/consul/service-registration-health-checks) to get a more complete example on how to leverage health check capabilities in Consul.

-A check is defined in a configuration file or added at runtime over the HTTP interface. Checks
-created via the HTTP interface persist with that node.
+Review the [service health checks tutorial](https://learn.hashicorp.com/tutorials/consul/service-registration-health-checks)
+to get a more complete example on how to leverage health check capabilities in Consul.

-There are several different kinds of checks:
+## Registering a health check

- Script + Interval - These checks depend on invoking an external application
-  that performs the health check, exits with an appropriate exit code, and potentially
-  generates some output. A script is paired with an invocation interval (e.g.
-  every 30 seconds). This is similar to the Nagios plugin system. The output of
-  a script check is limited to 4KB. Output larger than this will be truncated.
-  By default, Script checks will be configured with a timeout equal to 30 seconds.
-  It is possible to configure a custom Script check timeout value by specifying the
-  `timeout` field in the check definition. When the timeout is reached on Windows,
-  Consul will wait for any child processes spawned by the script to finish. For any
-  other system, Consul will attempt to force-kill the script and any child processes
-  it has spawned once the timeout has passed.
-  In Consul 0.9.0 and later, script checks are not enabled by default. To use them you
-  can either use :
+There are three ways to register a service with health checks:

-  - [`enable_local_script_checks`](/docs/agent/config/cli-flags#_enable_local_script_checks):
-    enable script checks defined in local config files. Script checks defined via the HTTP
-    API will not be allowed.
-  - [`enable_script_checks`](/docs/agent/config/cli-flags#_enable_script_checks): enable
-    script checks regardless of how they are defined.
+1. Start or reload a Consul agent with a service definition file in the
+   [agent's configuration directory](/docs/agent#configuring-consul-agents).
+1. Call the
+   [`/agent/service/register`](/api-docs/agent/service#register-service)
+   HTTP API endpoint to register the service.
+1. Use the
+   [`consul services register`](/commands/services/register)
+   CLI command to register the service.

-  ~> **Security Warning:** Enabling script checks in some configurations may
-  introduce a remote execution vulnerability which is known to be targeted by
-  malware. We strongly recommend `enable_local_script_checks` instead. See [this
-  blog post](https://www.hashicorp.com/blog/protecting-consul-from-rce-risk-in-specific-configurations)
-  for more details.
+When a service is registered using the HTTP API endpoint or CLI command,
+the checks persist in the Consul data folder across Consul agent restarts.

- `HTTP + Interval` - These checks make an HTTP `GET` request to the specified URL,
-  waiting the specified `interval` amount of time between requests (eg. 30 seconds).
-  The status of the service depends on the HTTP response code: any `2xx` code is
-  considered passing, a `429 Too ManyRequests` is a warning, and anything else is
-  a failure. This type of check
-  should be preferred over a script that uses `curl` or another external process
-  to check a simple HTTP operation. By default, HTTP checks are `GET` requests
-  unless the `method` field specifies a different method. Additional header
-  fields can be set through the `header` field which is a map of lists of
-  strings, e.g. `{"x-foo": ["bar", "baz"]}`. By default, HTTP checks will be
-  configured with a request timeout equal to 10 seconds.
+## Types of checks

-  It is possible to configure a custom HTTP check timeout value by
-  specifying the `timeout` field in the check definition. The output of the
-  check is limited to roughly 4KB. Responses larger than this will be truncated.
-  HTTP checks also support TLS. By default, a valid TLS certificate is expected.
-  Certificate verification can be turned off by setting the `tls_skip_verify`
-  field to `true` in the check definition. When using TLS, the SNI will be set
-  automatically from the URL if it uses a hostname (as opposed to an IP address);
-  the value can be overridden by setting `tls_server_name`.
+This section describes the available types of health checks you can use to
+automatically monitor the health of a service instance or node.

-  Consul follows HTTP redirects by default. Set the `disable_redirects` field to
-  `true` to disable redirects.
+-> **To manually mark a service unhealthy:** Use the maintenance mode
+  [CLI command](/commands/maint) or
+  [HTTP API endpoint](/api-docs/agent#enable-maintenance-mode)
+  to temporarily remove one or all service instances on a node
+  from service discovery DNS and HTTP API query results.

- `TCP + Interval` - These checks make a TCP connection attempt to the specified
-  IP/hostname and port, waiting `interval` amount of time between attempts
-  (e.g. 30 seconds). If no hostname
-  is specified, it defaults to "localhost". The status of the service depends on
-  whether the connection attempt is successful (ie - the port is currently
-  accepting connections). If the connection is accepted, the status is
-  `success`, otherwise the status is `critical`. In the case of a hostname that
-  resolves to both IPv4 and IPv6 addresses, an attempt will be made to both
-  addresses, and the first successful connection attempt will result in a
-  successful check. This type of check should be preferred over a script that
-  uses `netcat` or another external process to check a simple socket operation.
-  By default, TCP checks will be configured with a request timeout of 10 seconds.
-  It is possible to configure a custom TCP check timeout value by specifying the
-  `timeout` field in the check definition.
+### Script check ((#script-interval))

- `UDP + Interval` - These checks direct the client to periodically send UDP datagrams
-  to the specified IP/hostname and port. The duration specified in the `interval` field sets the amount of time 
-  between attempts, such as `30s` to indicate 30 seconds. The check is logged as healthy if any response from the UDP server is received. Any other result sets the status to `critical`.
-  The default interval for, UDP checks is `10s`, but you can configure a custom UDP check timeout value by specifying the
-  `timeout` field in the check definition. If any timeout on read exists, the check is still considered healthy.
+Script checks periodically invoke an external application that performs the health check,
+exits with an appropriate exit code, and potentially generates some output.
+The specified `interval` determines the time between check invocations.
+The output of a script check is limited to 4KB.
+Larger outputs are truncated.

- `Time to Live (TTL)` ((#ttl)) - These checks retain their last known state
-  for a given TTL. The state of the check must be updated periodically over the HTTP
-  interface. If an external system fails to update the status within a given TTL,
-  the check is set to the failed state. This mechanism, conceptually similar to a
-  dead man's switch, relies on the application to directly report its health. For
-  example, a healthy app can periodically `PUT` a status update to the HTTP endpoint;
-  if the app fails, the TTL will expire and the health check enters a critical state.
-  The endpoints used to update health information for a given check are: [pass](/api-docs/agent/check#ttl-check-pass),
-  [warn](/api-docs/agent/check#ttl-check-warn), [fail](/api-docs/agent/check#ttl-check-fail),
-  and [update](/api-docs/agent/check#ttl-check-update). TTL checks also persist their
-  last known status to disk. This allows the Consul agent to restore the last known
-  status of the check across restarts. Persisted check status is valid through the
-  end of the TTL from the time of the last check.
+By default, script checks are configured with a timeout equal to 30 seconds.
+To configure a custom script check timeout value,
+specify the `timeout` field in the check definition.
+After reaching the timeout on a Windows system,
+Consul waits for any child processes spawned by the script to finish.
+After reaching the timeout on other systems,
+Consul attempts to force-kill the script and any child processes it spawned.

- `Docker + Interval` - These checks depend on invoking an external application which
-  is packaged within a Docker Container. The application is triggered within the running
-  container via the Docker Exec API. We expect that the Consul agent user has access
-  to either the Docker HTTP API or the unix socket. Consul uses `$DOCKER_HOST` to
-  determine the Docker API endpoint. The application is expected to run, perform a health
-  check of the service running inside the container, and exit with an appropriate exit code.
-  The check should be paired with an invocation interval. The shell on which the check
-  has to be performed is configurable which makes it possible to run containers which
-  have different shells on the same host. Check output for Docker is limited to
-  4KB. Any output larger than this will be truncated. In Consul 0.9.0 and later, the agent
-  must be configured with [`enable_script_checks`](/docs/agent/config/cli-flags#_enable_script_checks)
-  set to `true` in order to enable Docker health checks.
+Script checks are not enabled by default.
+To enable a Consul agent to perform script checks,
+use one of the following agent configuration options:

- `gRPC + Interval` - These checks are intended for applications that support the standard
-  [gRPC health checking protocol](https://github.com/grpc/grpc/blob/master/doc/health-checking.md).
-  The state of the check will be updated by probing the configured endpoint, waiting `interval`
-  amount of time between probes (eg. 30 seconds). By default, gRPC checks will be configured
-  with a default timeout of 10 seconds.
-  It is possible to configure a custom timeout value by specifying the `timeout` field in
-  the check definition. gRPC checks will default to not using TLS, but TLS can be enabled by
-  setting `grpc_use_tls` in the check definition. If TLS is enabled, then by default, a valid
-  TLS certificate is expected. Certificate verification can be turned off by setting the
-  `tls_skip_verify` field to `true` in the check definition.
-  To check on a specific service instead of the whole gRPC server, add the service identifier after the `gRPC` check's endpoint in the following format `/:service_identifier`.
+- [`enable_local_script_checks`](/docs/agent/config/cli-flags#_enable_local_script_checks):
+  Enable script checks defined in local config files.
+  Script checks registered using the HTTP API are not allowed.
+- [`enable_script_checks`](/docs/agent/config/cli-flags#_enable_script_checks):
+  Enable script checks no matter how they are registered.

- `H2ping + Interval` - These checks test an endpoint that uses http2
-  by connecting to the endpoint and sending a ping frame. TLS is assumed to be configured by default.
-  To disable TLS and use h2c, set `h2ping_use_tls` to `false`. If the ping is successful
-  within a specified timeout, then the check is updated as passing.
-  The timeout defaults to 10 seconds, but is configurable using the `timeout` field. If TLS is enabled a valid
-  certificate is required, unless `tls_skip_verify` is set to `true`.
-  The check will be run on the interval specified by the `interval` field.
+  ~> **Security Warning:**
+  Enabling non-local script checks in some configurations may introduce
+  a remote execution vulnerability known to be targeted by malware.
+  We strongly recommend `enable_local_script_checks` instead.
+  For more information, refer to
+  [this blog post](https://www.hashicorp.com/blog/protecting-consul-from-rce-risk-in-specific-configurations).

- `Alias` - These checks alias the health state of another registered
-  node or service. The state of the check will be updated asynchronously, but is
-  nearly instant. For aliased services on the same agent, the local state is monitored
-  and no additional network resources are consumed. For other services and nodes,
-  the check maintains a blocking query over the agent's connection with a current
-  server and allows stale requests. If there are any errors in watching the aliased
-  node or service, the check state will be critical. For the blocking query, the
-  check will use the ACL token set on the service or check definition or otherwise
-  will fall back to the default ACL token set with the agent (`acl_token`).
-
-## Check Definition
-
-A script check:
+The following service definition file snippet is an example
+of a script check definition:

 <CodeTabs heading="Script Check">

@@ -162,7 +90,6 @@ check = {
  interval = "10s"
  timeout = "1s"
 }
-
 ```

 ```json
@@ -179,7 +106,47 @@ check = {

 </CodeTabs>

-A HTTP check:
+#### Check script conventions
+
+A check script's exit code is used to determine the health check status:
+
+- Exit code 0 - Check is passing
+- Exit code 1 - Check is warning
+- Any other code - Check is failing
+
+Any output of the script is captured and made available in the
+`Output` field of checks included in HTTP API responses,
+as in this example from the [local service health endpoint](/api-docs/agent/service#by-name-json).
+
+### HTTP check ((#http-interval))
+
+HTTP checks periodically make an HTTP `GET` request to the specified URL,
+waiting the specified `interval` amount of time between requests.
+The status of the service depends on the HTTP response code: any `2xx` code is
+considered passing, a `429 Too Many Requests` is a warning, and anything else is
+a failure. This type of check
+should be preferred over a script that uses `curl` or another external process
+to check a simple HTTP operation. By default, HTTP checks are `GET` requests
+unless the `method` field specifies a different method. Additional request
+headers can be set through the `header` field which is a map of lists of
+strings, such as `{"x-foo": ["bar", "baz"]}`.
+
+By default, HTTP checks are configured with a request timeout equal to 10 seconds.
+To configure a custom HTTP check timeout value,
+specify the `timeout` field in the check definition.
+The output of an HTTP check is limited to approximately 4KB.
+Larger outputs are truncated.
+HTTP checks also support TLS. By default, a valid TLS certificate is expected.
+Certificate verification can be turned off by setting the `tls_skip_verify`
+field to `true` in the check definition. When using TLS, the SNI is implicitly
+determined from the URL if it uses a hostname instead of an IP address.
+You can explicitly set the SNI value by setting `tls_server_name`.
+
+Consul follows HTTP redirects by default.
+To disable redirects, set the `disable_redirects` field to `true`.
+
+The following service definition file snippet is an example
+of an HTTP check definition:

 <CodeTabs heading="HTTP Check">

@@ -220,7 +187,23 @@ check = {

 </CodeTabs>

-A TCP check:
+### TCP check ((#tcp-interval))
+
+TCP checks periodically make a TCP connection attempt to the specified IP/hostname and port, waiting `interval` amount of time between attempts.
+If no hostname is specified, it defaults to "localhost".
+The health check status is `success` if the target host accepts the connection attempt,
+otherwise the status is `critical`. In the case of a hostname that
+resolves to both IPv4 and IPv6 addresses, an attempt is made to both
+addresses, and the first successful connection attempt results in a
+successful check. This type of check should be preferred over a script that
+uses `netcat` or another external process to check a simple socket operation.
+
+By default, TCP checks are configured with a request timeout equal to 10 seconds.
+To configure a custom TCP check timeout value,
+specify the `timeout` field in the check definition.
+
+The following service definition file snippet is an example
+of a TCP check definition:

 <CodeTabs heading="TCP Check">

@@ -232,7 +215,6 @@ check = {
  interval = "10s"
  timeout = "1s"
 }
-
 ```

 ```json
@@ -249,7 +231,21 @@ check = {

 </CodeTabs>

-A UDP check:
+### UDP check ((#udp-interval))
+
+UDP checks periodically direct the Consul agent to send UDP datagrams
+to the specified IP/hostname and port,
+waiting `interval` amount of time between attempts.
+The check status is set to `success` if any response is received from the targeted UDP server.
+Any other result sets the status to `critical`.
+
+By default, UDP checks are configured with a request timeout equal to 10 seconds.
+To configure a custom UDP check timeout value,
+specify the `timeout` field in the check definition.
+If a read times out, the check is still considered healthy.
+
+The following service definition file snippet is an example
+of a UDP check definition:

 <CodeTabs heading="UDP Check">

@@ -261,7 +257,6 @@ check = {
  interval = "10s"
  timeout = "1s"
 }
-
 ```

 ```json
@@ -278,7 +273,32 @@ check = {

 </CodeTabs>

-A TTL check:
+### Time to live (TTL) check ((#ttl))
+
+TTL checks retain their last known state for the specified `ttl` duration.
+If the `ttl` duration elapses before a new check update
+is provided over the HTTP interface,
+the check is set to `critical` state.
+
+This mechanism relies on the application to directly report its health.
+For example, a healthy app can periodically `PUT` a status update to the HTTP endpoint.
+Then, if the app is disrupted and unable to perform this update
+before the TTL expires, the health check enters the `critical` state.
+The endpoints used to update health information for a given check are: [pass](/api-docs/agent/check#ttl-check-pass),
+[warn](/api-docs/agent/check#ttl-check-warn), [fail](/api-docs/agent/check#ttl-check-fail),
+and [update](/api-docs/agent/check#ttl-check-update). TTL checks also persist their
+last known status to disk. This persistence allows the Consul agent to restore the last known
+status of the check across agent restarts. Persisted check status is valid through the
+end of the TTL from the time of the last check.
+
+To manually mark a service unhealthy,
+it is far more convenient to use the maintenance mode
+[CLI command](/commands/maint) or
+[HTTP API endpoint](/api-docs/agent#enable-maintenance-mode)
+rather than a TTL health check with arbitrarily high `ttl`.
+
+The following service definition file snippet is an example
+of a TTL check definition:

 <CodeTabs heading="TTL Check">

@@ -304,7 +324,24 @@ check = {

 </CodeTabs>

-A Docker check:
+### Docker check ((#docker-interval))
+
+These checks depend on periodically invoking an external application that
+is packaged within a Docker Container. The application is triggered within the running
+container through the Docker Exec API. We expect that the Consul agent user has access
+to either the Docker HTTP API or the unix socket. Consul uses `$DOCKER_HOST` to
+determine the Docker API endpoint. The application is expected to run, perform a health
+check of the service running inside the container, and exit with an appropriate exit code.
+The check should be paired with an invocation interval. The shell on which the check
+has to be performed is configurable, making it possible to run containers which
+have different shells on the same host.
+The output of a Docker check is limited to 4KB.
+Larger outputs are truncated.
+The agent must be configured with [`enable_script_checks`](/docs/agent/config/cli-flags#_enable_script_checks)
+set to `true` in order to enable Docker health checks.
+
+The following service definition file snippet is an example
+of a Docker check definition:

 <CodeTabs heading="Docker Check">

@@ -334,7 +371,26 @@ check = {

 </CodeTabs>

-A gRPC check for the whole application:
+### gRPC check ((#grpc-interval))
+
+gRPC checks are intended for applications that support the standard
+[gRPC health checking protocol](https://github.com/grpc/grpc/blob/master/doc/health-checking.md).
+The state of the check will be updated by periodically probing the configured endpoint,
+waiting `interval` amount of time between attempts.
+
+By default, gRPC checks are configured with a timeout equal to 10 seconds.
+To configure a custom gRPC check timeout value,
+specify the `timeout` field in the check definition.
+
+gRPC checks default to not using TLS.
+To enable TLS, set `grpc_use_tls` in the check definition.
+If TLS is enabled, then by default, a valid TLS certificate is expected.
+Certificate verification can be turned off by setting the
+`tls_skip_verify` field to `true` in the check definition.
+To check on a specific service instead of the whole gRPC server, add the service identifier after the `gRPC` check's endpoint in the following format `/:service_identifier`.
+
+The following service definition file snippet is an example
+of a gRPC check for a whole application:

 <CodeTabs heading="gRPC Check">

@@ -362,7 +418,8 @@ check = {

 </CodeTabs>

-A gRPC check for the specific `my_service` service:
+The following service definition file snippet is an example
+of a gRPC check for the specific `my_service` service:

 <CodeTabs heading="gRPC Specific Service Check">

@@ -390,7 +447,23 @@ check = {

 </CodeTabs>

-A h2ping check:
+### H2ping check ((#h2ping-interval))
+
+H2ping checks test an endpoint that uses HTTP/2 by connecting to the endpoint
+and sending a ping frame, waiting `interval` amount of time between attempts.
+If the ping is successful within a specified timeout,
+then the check status is set to `success`.
+
+By default, h2ping checks are configured with a request timeout equal to 10 seconds.
+To configure a custom h2ping check timeout value,
+specify the `timeout` field in the check definition.
+
+TLS is enabled by default.
+To disable TLS and use h2c, set `h2ping_use_tls` to `false`.
+If TLS is not disabled, a valid certificate is required unless `tls_skip_verify` is set to `true`.
+
+The following service definition file snippet is an example
+of an h2ping check definition:

 <CodeTabs heading="H2ping Check">

@@ -418,7 +491,29 @@ check = {

 </CodeTabs>

-An alias check for a local service:
+### Alias check
+
+These checks alias the health state of another registered
+node or service. The state of the check updates asynchronously, but is
+nearly instant. For aliased services on the same agent, the local state is monitored
+and no additional network resources are consumed. For other services and nodes,
+the check maintains a blocking query over the agent's connection with a current
+server and allows stale requests. If there are any errors in watching the aliased
+node or service, the check state is set to `critical`.
+For the blocking query, the check uses the ACL token set on the service or check definition.
+If no ACL token is set in the service or check definition,
+the blocking query uses the agent's default ACL token
+([`acl.tokens.default`](/docs/agent/config/config-files#acl_tokens_default)).
+
+~> **Configuration info**: The alias check configuration expects the alias to be
+registered on the same agent as the one you are aliasing. If the service is
+not registered with the same agent, `"alias_node": "<node_id>"` must also be
+specified. When using `alias_node`, if no service is specified, the check will
+alias the health of the node. If a service is specified, the check will alias
+the specified service on this particular node.
+
+The following service definition file snippet is an example
+of an alias check for a local service:

 <CodeTabs heading="Alias Check">

@@ -440,72 +535,137 @@ check = {

 </CodeTabs>

-~> Configuration info: The alias check configuration expects the alias to be
-registered on the same agent as the one you are aliasing. If the service is
-not registered with the same agent, `"alias_node": "<node_id>"` must also be
-specified. When using `alias_node`, if no service is specified, the check will
-alias the health of the node. If a service is specified, the check will alias
-the specified service on this particular node.
+## Check definition

-Each type of definition must include a `name` and may optionally provide an
-`id` and `notes` field. The `id` must be unique per _agent_ otherwise only the
-last defined check with that `id` will be registered. If the `id` is not set
-and the check is embedded within a service definition a unique check id is
-generated. Otherwise, `id` will be set to `name`. If names might conflict,
-unique IDs should be provided.
+This section covers some of the most common options for check definitions.
+For a complete list of all check options, refer to the
+[Register Check HTTP API endpoint documentation](/api-docs/agent/check#json-request-body-schema).

-The `notes` field is opaque to Consul but can be used to provide a human-readable
-description of the current state of the check. Similarly, an external process
-updating a TTL check via the HTTP interface can set the `notes` value.
+-> **Casing for check options:**
+   The correct casing for an option depends on whether the check is defined in
+   a service definition file or an HTTP API JSON request body.
+   For example, the option `deregister_critical_service_after` in a service
+   definition file is instead named `DeregisterCriticalServiceAfter` in an
+   HTTP API JSON request body.

-Checks may also contain a `token` field to provide an ACL token. This token is
-used for any interaction with the catalog for the check, including
-[anti-entropy syncs](/docs/architecture/anti-entropy) and deregistration.
-For Alias checks, this token is used if a remote blocking query is necessary
-to watch the state of the aliased node or service.
+#### General options

-Script, TCP, UDP, HTTP, Docker, and gRPC checks must include an `interval` field. This
-field is parsed by Go's `time` package, and has the following
-[formatting specification](https://golang.org/pkg/time/#ParseDuration):
+- `name` `(string: <required>)` - Specifies the name of the check.

-> A duration string is a possibly signed sequence of decimal numbers, each with
-> optional fraction and a unit suffix, such as "300ms", "-1.5h" or "2h45m".
-> Valid time units are "ns", "us" (or "µs"), "ms", "s", "m", "h".
+- `id` `(string: "")` - Specifies a unique ID for this check on this node.
  
-In Consul 0.7 and later, checks that are associated with a service may also contain
-an optional `deregister_critical_service_after` field, which is a timeout in the
-same Go time format as `interval` and `ttl`. If a check is in the critical state
-for more than this configured value, then its associated service (and all of its
-associated checks) will automatically be deregistered. The minimum timeout is 1
-minute, and the process that reaps critical services runs every 30 seconds, so it
-may take slightly longer than the configured timeout to trigger the deregistration.
-This should generally be configured with a timeout that's much, much longer than
-any expected recoverable outage for the given service.
+  If unspecified, Consul sets the check ID as follows:
+  - If the check definition is embedded within a service definition file,
+    a unique check ID is auto-generated.
+  - Otherwise, the `id` is set to the value of `name`.
+    If names might conflict, you must provide unique IDs to avoid
+    overwriting existing checks with the same ID on this node.

-To configure a check, either provide it as a `-config-file` option to the
-agent or place it inside the `-config-dir` of the agent. The file must
-end in a ".json" or ".hcl" extension to be loaded by Consul. Check definitions
-can also be updated by sending a `SIGHUP` to the agent. Alternatively, the
-check can be registered dynamically using the [HTTP API](/api).
+- `interval` `(string: <required for interval-based checks>)` - Specifies
+  the frequency at which to run this check.
+  Required for all check types except TTL and alias checks.

-## Check Scripts
+  The value is parsed by Go's `time` package, and has the following
+  [formatting specification](https://golang.org/pkg/time/#ParseDuration):

-A check script is generally free to do anything to determine the status
-of the check. The only limitations placed are that the exit codes must obey
-this convention:
+  > A duration string is a possibly signed sequence of decimal numbers, each with
+  > optional fraction and a unit suffix, such as "300ms", "-1.5h" or "2h45m".
+  > Valid time units are "ns", "us" (or "µs"), "ms", "s", "m", "h".

- Exit code 0 - Check is passing
- Exit code 1 - Check is warning
- Any other code - Check is failing
+- `service_id` `(string: <required for service health checks>)` - Specifies
+  the ID of a service instance to associate this check with.
+  That service instance must be on this node.
+  If not specified, this check is treated as a node-level check.
+  For more information, refer to the
+  [service-bound checks](#service-bound-checks) section.

-This is the only convention that Consul depends on. Any output of the script
-will be captured and stored in the `output` field.
+- `status` `(string: "")` - Specifies the initial status of the health check as
+  "critical" (default), "warning", or "passing". For more details, refer to
+  the [initial health check status](#initial-health-check-status) section.
  
-In Consul 0.9.0 and later, the agent must be configured with
-[`enable_script_checks`](/docs/agent/config/cli-flags#_enable_script_checks) set to `true`
-in order to enable script checks.
+  -> **Health defaults to critical:** If the health status is not initially specified,
+     it defaults to "critical" to protect against including a service
+     in discovery results before it is ready.

-## Initial Health Check Status
+- `deregister_critical_service_after` `(string: "")` - If specified,
+  the associated service and all its checks are deregistered
+  after this check is in the critical state for more than the specified value.
+  The value has the same formatting specification as the [`interval`](#interval) field.
+
+  The minimum timeout is 1 minute,
+  and the process that reaps critical services runs every 30 seconds,
+  so it may take slightly longer than the configured timeout to trigger the deregistration.
+  This field should generally be configured with a timeout that's significantly longer than
+  any expected recoverable outage for the given service.
+
+- `notes` `(string: "")` - Provides a human-readable description of the check.
+  Consul does not interpret this field, so you can use it in whatever way is useful to you.
+  For example, it could be used to describe the current state of the check.
+
+- `token` `(string: "")` - Specifies an ACL token used for any interaction
+  with the catalog for the check, including
+  [anti-entropy syncs](/docs/architecture/anti-entropy) and deregistration.
+
+  For alias checks, this token is used if a remote blocking query is necessary to watch the state of the aliased node or service.
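+
+As a minimal sketch of how these options fit together, the following
+service-bound check combines several of the fields described above.
+The check type (`http`), the URL, the IDs, and the durations are
+illustrative assumptions, not recommended values.
+
+```hcl
+check = {
+  # Hypothetical ID; it must be unique among the checks on this node.
+  id   = "web-app-health"
+  name = "Web app HTTP health"
+
+  # Assumes a service instance with the ID "web-app" is registered on this node.
+  service_id = "web-app"
+
+  # Hypothetical endpoint, polled every 10 seconds.
+  http     = "http://localhost:8080/health"
+  interval = "10s"
+  timeout  = "1s"
+
+  # Omit `status` to keep the default initial state of "critical".
+  status = "passing"
+
+  # Deregister the service after 90 minutes in the critical state.
+  deregister_critical_service_after = "90m"
+
+  notes = "Polls the local /health endpoint of the web app."
+}
+```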
+
+#### Success/failures before passing/warning/critical
+
+To prevent flapping health checks and limit the load they cause on the cluster,
+a health check may be configured to become passing/warning/critical only after a
+specified number of consecutive checks return as passing/critical.
+The status does not transition states until the configured threshold is reached.
+
+- `success_before_passing` - Number of consecutive successful results required
+  before check status transitions to passing. Defaults to `0`. Added in Consul 1.7.0.
+
+- `failures_before_warning` - Number of consecutive unsuccessful results required
+  before check status transitions to warning. Defaults to the same value as that of
+  `failures_before_critical` to maintain the expected behavior of not changing the
+  status of service checks to `warning` before `critical` unless configured to do so.
+  Values higher than `failures_before_critical` are invalid. Added in Consul 1.11.0.
+
+- `failures_before_critical` - Number of consecutive unsuccessful results required
+  before check status transitions to critical. Defaults to `0`. Added in Consul 1.7.0.
+
+This feature is available for all check types except TTL and alias checks.
+By default, both passing and critical thresholds are set to 0 so the check
+status always reflects the last check result.
+
+<CodeTabs heading="Flapping Prevention Example">
+
+```hcl
+checks = [
+  {
+    name = "HTTP TCP on port 80"
+    tcp = "localhost:80"
+    interval = "10s"
+    timeout  = "1s"
+    success_before_passing = 3
+    failures_before_warning = 1
+    failures_before_critical = 3
+  }
+]
+```
+
+```json
+{
+  "checks": [
+    {
+      "name": "HTTP TCP on port 80",
+      "tcp": "localhost:80",
+      "interval": "10s",
+      "timeout": "1s",
+      "success_before_passing": 3,
+      "failures_before_warning": 1,
+      "failures_before_critical": 3
+    }
+  ]
+}
+```
+
+</CodeTabs>
+
+## Initial health check status

 By default, when checks are registered against a Consul agent, the state is set
 immediately to "critical". This is useful to prevent services from being
@ -576,13 +736,13 @@ In the above configuration, if the web-app health check begins failing, it will
 only affect the availability of the web-app service. All other services
 provided by the node will remain unchanged.

-## Agent Certificates for TLS Checks
+## Agent certificates for TLS checks

 The [enable_agent_tls_for_checks](/docs/agent/config/config-files#enable_agent_tls_for_checks)
 agent configuration option can be used to make HTTP or gRPC health checks
 use the agent's credentials when configured for TLS.

-## Multiple Check Definitions
+## Multiple check definitions

 Multiple check definitions can be defined using the `checks` (plural)
 key in your configuration file.
@ -640,58 +800,3 @@ checks = [
 ```

 </CodeTabs>
-
-## Success/Failures before passing/warning/critical
-
-To prevent flapping health checks, and limit the load they cause on the cluster,
-a health check may be configured to become passing/warning/critical only after a
-specified number of consecutive checks return passing/critical.
-The status will not transition states until the configured threshold is reached.
-
- `success_before_passing` - Number of consecutive successful results required
-  before check status transitions to passing. Defaults to `0`. Added in Consul 1.7.0.
- `failures_before_warning` - Number of consecutive unsuccessful results required
-  before check status transitions to warning. Defaults to the same value as that of
-  `failures_before_critical` to maintain the expected behavior of not changing the
-  status of service checks to `warning` before `critical` unless configured to do so.
-  Values higher than `failures_before_critical` are invalid. Added in Consul 1.11.0.
- `failures_before_critical` - Number of consecutive unsuccessful results required
-  before check status transitions to critical. Defaults to `0`. Added in Consul 1.7.0.
-
-This feature is available for HTTP, TCP, gRPC, Docker & Monitor checks.
-By default, both passing and critical thresholds will be set to 0 so the check
-status will always reflect the last check result.
-
-<CodeTabs heading="Flapping Prevention Example">
-
-```hcl
-checks = [
-  {
-    name = "HTTP TCP on port 80"
-    tcp = "localhost:80"
-    interval = "10s"
-    timeout  = "1s"
-    success_before_passing =  3
-    failures_before_warning =  1
-    failures_before_critical =  3
-  }
-]
-```
-
-```json
-{
-  "checks": [
-    {
-      "name": "HTTP TCP on port 80",
-      "tcp": "localhost:80",
-      "interval": "10s",
-      "timeout": "1s",
-      "success_before_passing": 3,
-      "failures_before_warning": 1,
-      "failures_before_critical": 3
-    }
-  ]
-}
-```
-
-</CodeTabs>