57 lines
3.1 KiB
Plaintext
57 lines
3.1 KiB
Plaintext
---
|
|
layout: docs
|
|
page_title: Gossip Protocol | Serf
|
|
description: >-
|
|
Consul agents manage membership in datacenters and WAN federations using the Serf protocol. Learn about the differences between LAN and WAN gossip pools and how `serfHealth` affects health checks.
|
|
---
|
|
|
|
# Gossip Protocol
|
|
|
|
Consul uses a [gossip protocol](https://en.wikipedia.org/wiki/Gossip_protocol)
|
|
to manage membership and broadcast messages to the cluster. The protocol, membership management, and message broadcasting is provided
|
|
through the [Serf library](https://www.serf.io/). The gossip protocol
|
|
used by Serf is based on a modified version of the
|
|
[SWIM (Scalable Weakly-consistent Infection-style Process Group Membership)](https://www.cs.cornell.edu/projects/Quicksilver/public_pdfs/SWIM.pdf) protocol.
|
|
Refer to the [Serf documentation](https://www.serf.io/docs/internals/gossip.html) for additional information about the gossip protocol.
|
|
|
|
## Gossip in Consul
|
|
|
|
Consul uses a LAN gossip pool and a WAN gossip pool to perform different functions. The pools
|
|
are able to perform their functions by leveraging an embedded [Serf](https://www.serf.io/)
|
|
library. The library is abstracted and masked by Consul to simplify the user experience,
|
|
but developers may find it useful to understand how the library is leveraged.
|
|
|
|
### LAN Gossip Pool
|
|
|
|
Each datacenter that Consul operates in has a LAN gossip pool containing all members
|
|
of the datacenter (clients _and_ servers). Membership information provided by the
|
|
LAN pool allows clients to automatically discover servers, reducing the amount of
|
|
configuration needed. Failure detection is also distributed and shared by the entire cluster,
|
|
instead of concentrated on a few servers. Lastly, the gossip pool allows for fast and
|
|
reliable event broadcasts.
|
|
|
|
### WAN Gossip Pool
|
|
|
|
The WAN pool is globally unique. All servers should participate in the WAN pool,
|
|
regardless of datacenter. Membership information provided by the WAN pool allows
|
|
servers to perform cross-datacenter requests. The integrated failure detection
|
|
allows Consul to gracefully handle loss of connectivity--whether the loss is for
|
|
an entire datacenter, or a single server in a remote datacenter.
|
|
|
|
## Lifeguard Enhancements ((#lifeguard))
|
|
|
|
SWIM assumes that the local node is healthy, meaning that soft real-time packet
|
|
processing is possible. The assumption may be violated, however, if the local node
|
|
experiences CPU or network exhaustion. In these cases, the `serfHealth` check status
|
|
can flap. This can result in false monitoring alarms, additional telemetry noise, and
|
|
CPU and network resources being wasted as they attempt to diagnose non-existent failures.
|
|
|
|
Lifeguard completely resolves this issue with novel enhancements to SWIM.
|
|
|
|
For more details about Lifeguard, please see the
|
|
[Making Gossip More Robust with Lifeguard](https://www.hashicorp.com/blog/making-gossip-more-robust-with-lifeguard/)
|
|
blog post, which provides a high level overview of the HashiCorp Research paper
|
|
[Lifeguard : SWIM-ing with Situational Awareness](https://arxiv.org/abs/1707.00788). The
|
|
[Serf gossip protocol guide](https://www.serf.io/docs/internals/gossip.html#lifeguard)
|
|
also provides some lower-level details about the gossip protocol and Lifeguard.
|