docs: write a lot of words about heartbeats (#14679)

* docs: write a lot of words about heartbeats

Alternative to #14670

* Apply suggestions from code review

Co-authored-by: Tim Gross <tgross@hashicorp.com>

* use descriptive title for link

* rework example of high failover ttl

Co-authored-by: Tim Gross <tgross@hashicorp.com>
Michael Schurter 2022-09-26 14:43:34 -07:00 committed by GitHub
parent 7235d9988b
commit fb8739d926
1 changed files with 105 additions and 26 deletions


@ -118,38 +118,25 @@ server {
   example section](#configuring-scheduler-config) for more details
   `default_scheduler_config` was introduced in Nomad 0.10.4.
-- `heartbeat_grace` `(string: "10s")` - Specifies the additional time given as a
-  grace period beyond the heartbeat TTL of nodes to account for network and
-  processing delays as well as clock skew. This is specified using a label
-  suffix like "30s" or "1h".
+- `heartbeat_grace` `(string: "10s")` - Specifies the additional time given
+  beyond the heartbeat TTL of Clients to account for network and processing
+  delays and clock skew. This is specified using a label suffix like "30s" or
+  "1h". See [Client Heartbeats](#client-heartbeats) below for details.
-- `license_path` `(string: "")` - Specifies the path to load a Nomad Enterprise
-  license from. This must be an absolute path (`/opt/nomad/license.hclic`). The
-  license can also be set by setting `NOMAD_LICENSE_PATH` or by setting
-  `NOMAD_LICENSE` as the entire license value. `license_path` has the highest
-  precedence, followed by `NOMAD_LICENSE` and then `NOMAD_LICENSE_PATH`.
-- `min_heartbeat_ttl` `(string: "10s")` - Specifies the minimum time between
-  node heartbeats. This is used as a floor to prevent excessive updates. This is
-  specified using a label suffix like "30s" or "1h". Lowering the minimum TTL is
-  a tradeoff as it lowers failure detection time of nodes at the tradeoff of
-  false positives and increased load on the leader.
+- `min_heartbeat_ttl` `(string: "10s")` - Specifies the minimum time between
+  Client heartbeats. This is used as a floor to prevent excessive updates. This
+  is specified using a label suffix like "30s" or "1h". See [Client
+  Heartbeats](#client-heartbeats) below for details.
-- `failover_heartbeat_ttl` `(string: "5m")` - Specifies the TTL applied to
-  heartbeats after a new leader is elected, since we no longer know the status
-  of all the heartbeats. This is specified using a label suffix like "30s" or
-  "1h".
-  ~> Lowering the `failover_heartbeat_ttl` is a tradeoff as it lowers failure
-  detection time of nodes at the tradeoff of false positives. False positives
-  could cause all clients to stop their allocations if a leadership transition
-  lasts longer than `heartbeat_grace + failover_heartbeat_ttl`.
+- `failover_heartbeat_ttl` `(string: "5m")` - The time by which all Clients
+  must heartbeat after a Server leader election. This is specified using a label
+  suffix like "30s" or "1h". See [Client Heartbeats](#client-heartbeats) below
+  for details.
 - `max_heartbeats_per_second` `(float: 50.0)` - Specifies the maximum target
   rate of heartbeats being processed per second. This allows the TTL to be
-  increased to meet the target rate. Increasing the maximum heartbeats per
-  second is a tradeoff as it lowers failure detection time of nodes at the
-  tradeoff of false positives and increased load on the leader.
+  increased to meet the target rate. See [Client
+  Heartbeats](#client-heartbeats) below for details.
 - `non_voting_server` `(bool: false)` - (Enterprise-only) Specifies whether
   this server will act as a non-voting member of the cluster to help provide
@ -160,6 +147,12 @@ server {
   disallow this server from making any scheduling decisions. This defaults to
   the number of CPU cores.
+- `license_path` `(string: "")` - Specifies the path to load a Nomad Enterprise
+  license from. This must be an absolute path (`/opt/nomad/license.hclic`). The
+  license can also be set by setting `NOMAD_LICENSE_PATH` or by setting
+  `NOMAD_LICENSE` as the entire license value. `license_path` has the highest
+  precedence, followed by `NOMAD_LICENSE` and then `NOMAD_LICENSE_PATH`.
 - `plan_rejection_tracker` <code>([PlanRejectionTracker](#plan_rejection_tracker-parameters))</code> -
   Configuration for the plan rejection tracker that the Nomad leader uses to
   track the history of plan rejections.
@ -369,6 +362,90 @@ server {
} }
``` ```
## Client Heartbeats ((#client-heartbeats))

~> This is an advanced topic. It is most beneficial to clusters over 1,000
nodes or with unreliable networks or nodes (e.g., some edge deployments).

Nomad Clients periodically heartbeat to Nomad Servers to confirm they are
operating as expected. Nomad Clients that do not heartbeat in the specified
amount of time are considered `down` and their allocations are marked as `lost`
or `disconnected` (if [`max_client_disconnect`][max_client_disconnect] is set)
and rescheduled.

The various heartbeat-related parameters allow you to tune the following
tradeoffs:

- The longer the heartbeat period, the longer a `down` Client's workload will
  take to be rescheduled.
- The shorter the heartbeat period, the more likely it is that transient
  network issues, leader elections, and other temporary issues will cause a
  perfectly functional Client and its workloads to be marked as `down` and the
  work rescheduled.

While Nomad Clients can connect to any Server, all heartbeats are forwarded to
the leader for processing. Since this heartbeat processing consumes resources,
Nomad adjusts the rate at which Clients heartbeat based on cluster size. The
goal is to try to keep the resource cost of processing heartbeats constant
regardless of cluster size.

The base formula for determining how often a Client must heartbeat is:
```
<number of Clients> / <max_heartbeats_per_second>
```
Other factors modify this base TTL:

- A random amount is added so that each Client's TTL falls between `1x` and
  `2x` the base TTL. This prevents the [thundering herd][herd] problem, where a
  large number of Clients attempt to heartbeat at exactly the same time.
- [`min_heartbeat_ttl`](#min_heartbeat_ttl) is used as the lower bound for the
  base TTL to prevent small clusters from sending excessive heartbeats.
- [`heartbeat_grace`](#heartbeat_grace) is the amount of _extra_ time the
  leader will wait for a heartbeat beyond the TTL given to the Client.
- After a leader election, all Clients are given up to `failover_heartbeat_ttl`
  to successfully heartbeat. This gives Clients time to discover a functioning
  Server in case they were directly connected to a leader that crashed.

For example, given the default values for heartbeat parameters, clusters of
different sizes will use the following TTLs for heartbeats. Note that the
`Server TTL` simply adds the `heartbeat_grace` parameter to the TTL Clients are
given.

| Clients | Client TTL | Server TTL | Safe after elections |
| ------- | ----------- | ----------- | -------------------- |
| 10 | 10s - 20s | 20s - 30s | yes |
| 100 | 10s - 20s | 20s - 30s | yes |
| 1000 | 20s - 40s | 30s - 50s | yes |
| 5000 | 100s - 200s | 110s - 210s | yes |
| 10000 | 200s - 400s | 210s - 410s | NO (see below) |
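
The rows of this table can be reproduced with a short Python sketch. It relies only on the rules described in this section (the base formula, the `min_heartbeat_ttl` floor, a random factor of up to `2x`, and `heartbeat_grace` added on the Server side). The function names are hypothetical, all values are in seconds, and this is an illustration rather than Nomad's actual implementation.

```python
# Illustrative sketch only: function names are hypothetical and the defaults
# mirror the documented defaults (50 heartbeats/s, 10s minimum TTL, 10s grace).

def client_ttl_range(num_clients, max_heartbeats_per_second=50.0,
                     min_heartbeat_ttl=10.0):
    """Return the (shortest, longest) TTL in seconds a Client may be given."""
    # Base TTL from the formula above, floored at min_heartbeat_ttl.
    base = max(num_clients / max_heartbeats_per_second, min_heartbeat_ttl)
    # The random factor stretches the TTL to at most twice the base.
    return base, 2 * base


def server_ttl_range(num_clients, heartbeat_grace=10.0, **kwargs):
    """Return the (shortest, longest) time in seconds the leader waits."""
    low, high = client_ttl_range(num_clients, **kwargs)
    # The Server TTL is the Client TTL plus the grace period.
    return low + heartbeat_grace, high + heartbeat_grace


for n in (10, 100, 1000, 5000, 10000):
    print(n, client_ttl_range(n), server_ttl_range(n))
```

Passing different values for `max_heartbeats_per_second` or `min_heartbeat_ttl` shows how tuning those parameters shifts the ranges for a given cluster size.
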

Regardless of cluster size, all Clients have a Server TTL of
`failover_heartbeat_ttl` after a leader election. `failover_heartbeat_ttl`
should always be larger than the maximum Client TTL for your cluster size in
order to prevent marking live Clients as `down`.
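
As a rough check of that rule, the following Python sketch compares the maximum Client TTL against `failover_heartbeat_ttl`. It assumes the maximum Client TTL is twice the base TTL, as the table suggests; `safe_after_election` is a hypothetical helper, not part of Nomad, and all values are in seconds.

```python
# Illustrative sketch only: safe_after_election is a hypothetical helper.
# Defaults mirror the documented defaults (5m failover TTL, 50 heartbeats/s,
# 10s minimum TTL).

def safe_after_election(num_clients, failover_heartbeat_ttl=300.0,
                        max_heartbeats_per_second=50.0,
                        min_heartbeat_ttl=10.0):
    """Return True if failover_heartbeat_ttl exceeds the maximum Client TTL."""
    base = max(num_clients / max_heartbeats_per_second, min_heartbeat_ttl)
    max_client_ttl = 2 * base  # upper end of the Client TTL range
    return failover_heartbeat_ttl > max_client_ttl


print(safe_after_election(5000))   # maximum Client TTL is 200s, under 5m
print(safe_after_election(10000))  # maximum Client TTL is 400s, over 5m
```
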

For clusters over 5000 Clients, you should increase `failover_heartbeat_ttl`
using the following formula:
```
(2 * (<number of Clients> / <max_heartbeats_per_second>)) + (10 * <min_heartbeat_ttl>)
# For example with 6000 Clients:
(2 * (6000 / 50)) + (10 * 10) = 340s (5m40s)
```

This ensures Clients have some additional time to fail over even if they were
told to heartbeat after the maximum interval.
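
The formula can be evaluated for any cluster size with a small Python sketch. The helper names below are hypothetical, values are in seconds, and the result is a starting point to weigh against your tolerance for detection delays, not a value Nomad computes for you.

```python
# Illustrative sketch only: both helpers are hypothetical, not Nomad APIs.

def recommended_failover_ttl(num_clients, max_heartbeats_per_second=50.0,
                             min_heartbeat_ttl=10.0):
    """Evaluate the formula above and return the result in seconds."""
    return (2 * (num_clients / max_heartbeats_per_second)
            + 10 * min_heartbeat_ttl)


def format_duration(seconds):
    """Render whole seconds in the "5m40s" style used by Nomad durations."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes}m{secs}s"


ttl = recommended_failover_ttl(6000)
print(ttl, format_duration(ttl))  # matches the 6000 Client example above
```
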

The actual value used should take into consideration how much tolerance your
system has for a delay in noticing crashed Clients. For example, a
`failover_heartbeat_ttl` of 30 minutes may give even the slowest Clients in the
largest clusters ample time to heartbeat after an election. However, if the
election was due to a datacenter-wide failure affecting Clients, it will be 30
minutes before Nomad recognizes that they are `down` and reschedules their
work.
[encryption]: https://learn.hashicorp.com/tutorials/nomad/security-gossip-encryption 'Nomad Encryption Overview'
[server-join]: /docs/configuration/server_join 'Server Join'
[update-scheduler-config]: /api-docs/operator/scheduler#update-scheduler-configuration 'Scheduler Config'
@ -378,3 +455,5 @@ server {
[`nomad operator keygen`]: /docs/commands/operator/keygen
[search]: /docs/configuration/search
[encryption key]: /docs/operations/key-management
[max_client_disconnect]: /docs/job-specification/group#max-client-disconnect
[herd]: https://en.wikipedia.org/wiki/Thundering_herd_problem