Updating Stopping Agent Section (#8016)
Fixes #6935 to clarify agent behavior.
This commit is contained in:
parent
e8be0533df
commit
ebdc3a6b9e
|
@ -15,23 +15,28 @@ information, registers services, runs checks, responds to queries,
|
|||
and more. The agent must run on every node that is part of a Consul cluster.
|
||||
|
||||
Any agent may run in one of two modes: client or server. A server
|
||||
node takes on the additional responsibility of being part of the [consensus quorum](/docs/internals/consensus).
|
||||
node takes on the additional responsibility of being part of the
|
||||
[consensus quorum](/docs/internals/consensus).
|
||||
These nodes take part in Raft and provide strong consistency and availability in
|
||||
the case of failure. The higher burden on the server nodes means that usually they
|
||||
should be run on dedicated instances -- they are more resource intensive than a client
|
||||
node. Client nodes make up the majority of the cluster, and they are very lightweight
|
||||
as they interface with the server nodes for most operations and maintain very little state
|
||||
of their own.
|
||||
the case of failure. The higher burden on the server nodes means that usually
|
||||
they should be run on dedicated instances -- they are more resource intensive
|
||||
than a client node. Client nodes make up the majority of the cluster, and they
|
||||
are very lightweight as they interface with the server nodes for most
|
||||
operations and maintain very little state of their own.
|
||||
|
||||
## Running an Agent
|
||||
|
||||
The agent is started with the [`consul agent`](/docs/commands/agent) command. This
|
||||
command blocks, running forever or until told to quit. You can test a local agent by following the [Getting Started guides](https://learn.hashicorp.com/consul/getting-started/install?utm_source=consul.io&utm_medium=docs).
|
||||
The agent is started with the [`consul agent`](/docs/commands/agent) command.
|
||||
This command blocks, running forever or until told to quit. You can test a
|
||||
local agent by following the
|
||||
[Getting Started guides](https://learn.hashicorp.com/consul/getting-started/install?utm_source=consul.io&utm_medium=docs).
|
||||
|
||||
The agent command takes a variety
|
||||
of [`configuration options`](/docs/agent/options#command-line-options), but most have sane defaults.
|
||||
The agent command takes a variety of
|
||||
[`configuration options`](/docs/agent/options#command-line-options), but most
|
||||
have sane defaults.
|
||||
|
||||
When running [`consul agent`](/docs/commands/agent), you should see output similar to this:
|
||||
When running [`consul agent`](/docs/commands/agent), you should see output
|
||||
similar to this:
|
||||
|
||||
```shell-session
|
||||
$ consul agent -data-dir=/tmp/consul
|
||||
|
@ -49,33 +54,38 @@ $ consul agent -data-dir=/tmp/consul
|
|||
...
|
||||
```
|
||||
|
||||
There are several important messages that [`consul agent`](/docs/commands/agent) outputs:
|
||||
There are several important messages that
|
||||
[`consul agent`](/docs/commands/agent) outputs:
|
||||
|
||||
- **Node name**: This is a unique name for the agent. By default, this
|
||||
is the hostname of the machine, but you may customize it using the
|
||||
[`-node`](/docs/agent/options#_node) flag.
|
||||
|
||||
- **Datacenter**: This is the datacenter in which the agent is configured to run.
|
||||
Consul has first-class support for multiple datacenters; however, to work efficiently,
|
||||
each node must be configured to report its datacenter. The [`-datacenter`](/docs/agent/options#_datacenter)
|
||||
flag can be used to set the datacenter. For single-DC configurations, the agent
|
||||
will default to "dc1".
|
||||
- **Datacenter**: This is the datacenter in which the agent is configured to
|
||||
run.
|
||||
Consul has first-class support for multiple datacenters; however, to work
|
||||
efficiently, each node must be configured to report its datacenter. The
|
||||
[`-datacenter`](/docs/agent/options#_datacenter) flag can be used to set the
|
||||
datacenter. For single-DC configurations, the agent will default to "dc1".
|
||||
|
||||
- **Server**: This indicates whether the agent is running in server or client mode.
|
||||
- **Server**: This indicates whether the agent is running in server or client
|
||||
mode.
|
||||
Server nodes have the extra burden of participating in the consensus quorum,
|
||||
storing cluster state, and handling queries. Additionally, a server may be
|
||||
in ["bootstrap"](/docs/agent/options#_bootstrap_expect) mode. Multiple servers
|
||||
cannot be in bootstrap mode as that would put the cluster in an inconsistent state.
|
||||
cannot be in bootstrap mode as that would put the cluster in an inconsistent
|
||||
state.
|
||||
|
||||
- **Client Addr**: This is the address used for client interfaces to the agent.
|
||||
This includes the ports for the HTTP and DNS interfaces. By default, this binds only
|
||||
to localhost. If you change this address or port, you'll have to specify a `-http-addr`
|
||||
whenever you run commands such as [`consul members`](/docs/commands/members) to
|
||||
indicate how to reach the agent. Other applications can also use the HTTP address and port
|
||||
This includes the ports for the HTTP and DNS interfaces. By default, this
|
||||
binds only to localhost. If you change this address or port, you'll have to
|
||||
specify a `-http-addr` whenever you run commands such as
|
||||
[`consul members`](/docs/commands/members) to indicate how to reach the
|
||||
agent. Other applications can also use the HTTP address and port
|
||||
[to control Consul](/api).
|
||||
|
||||
- **Cluster Addr**: This is the address and set of ports used for communication between
|
||||
Consul agents in a cluster. Not all Consul agents in a cluster have to
|
||||
- **Cluster Addr**: This is the address and set of ports used for communication
|
||||
between Consul agents in a cluster. Not all Consul agents in a cluster have to
|
||||
use the same port, but this address **MUST** be reachable by all other nodes.
|
||||
|
||||
When running under `systemd` on Linux, Consul notifies systemd by sending
|
||||
|
@ -85,44 +95,62 @@ service definition file has to have `Type=notify` set.
|
|||
|
||||
## Stopping an Agent
|
||||
|
||||
An agent can be stopped in two ways: gracefully or forcefully. To gracefully
|
||||
halt an agent, send the process an interrupt signal (usually
|
||||
`Ctrl-C` from a terminal or running `kill -INT consul_pid` ). When gracefully exiting, the agent first notifies
|
||||
the cluster it intends to leave the cluster. This way, other cluster members
|
||||
notify the cluster that the node has _left_.
|
||||
An agent can be stopped in two ways: gracefully or forcefully. Servers and
|
||||
Clients both behave differently depending on the leave that is performed. There
|
||||
are two potential states a process can be in after a system signal is sent:
|
||||
_left_ and _failed_.
|
||||
|
||||
Alternatively, you can force kill the agent by sending it a kill signal.
|
||||
When force killed, the agent ends immediately. The rest of the cluster will
|
||||
eventually (usually within seconds) detect that the node has died and
|
||||
notify the cluster that the node has _failed_.
|
||||
To gracefully halt an agent, send the process an _interrupt signal_ (usually
|
||||
`Ctrl-C` from a terminal, or running `kill -INT consul_pid` ). For more
|
||||
information on different signals sent by the `kill` command, see
|
||||
[here](https://www.linux.org/threads/kill-signals-and-commands-revised.11625/)
|
||||
|
||||
It is especially important that a server node be allowed to leave gracefully
|
||||
so that there will be a minimal impact on availability as the server leaves
|
||||
the consensus quorum.
|
||||
When a Client is gracefully exited, the agent first notifies the cluster it
|
||||
intends to leave the cluster. This way, other cluster members notify the
|
||||
cluster that the node has _left_.
|
||||
|
||||
When a Server is gracefully exited, the server will not be marked as _left_.
|
||||
This is to minimally impact the consensus quorum. Instead, the Server will be
|
||||
marked as _failed_. To remove a server from the cluster, the
|
||||
[`force-leave`](/docs/commands/force-leave) command is used. Using
|
||||
`force-leave` will put the server instance in a _left_ state so long as the
|
||||
Server agent is not alive.
|
||||
|
||||
Alternatively, you can forcibly stop an agent by sending it a
|
||||
`kill -KILL consul_pid` signal. This will stop any agent immediately. The rest
|
||||
of the cluster will eventually (usually within seconds) detect that the node has
|
||||
died and notify the cluster that the node has _failed_.
|
||||
|
||||
For client agents, the difference between a node _failing_ and a node _leaving_
|
||||
may not be important for your use case. For example, for a web server and load
|
||||
balancer setup, both result in the same outcome: the web node is removed
|
||||
from the load balancer pool.
|
||||
|
||||
The [`skip_leave_on_interrupt`](/docs/agent/options#skip_leave_on_interrupt) and
|
||||
[`leave_on_terminate`](/docs/agent/options#leave_on_terminate) configuration
|
||||
options allow you to adjust this behavior.
|
||||
|
||||
## Lifecycle
|
||||
|
||||
Every agent in the Consul cluster goes through a lifecycle. Understanding
|
||||
this lifecycle is useful for building a mental model of an agent's interactions
|
||||
with a cluster and how the cluster treats a node.
|
||||
|
||||
When an agent is first started, it does not know about any other node in the cluster.
|
||||
When an agent is first started, it does not know about any other node in the
|
||||
cluster.
|
||||
To discover its peers, it must _join_ the cluster. This is done with the
|
||||
[`join`](/docs/commands/join)
|
||||
command or by providing the proper configuration to auto-join on start. Once a node
|
||||
joins, this information is gossiped to the entire cluster, meaning all nodes will
|
||||
eventually be aware of each other. If the agent is a server, existing servers will
|
||||
begin replicating to the new node.
|
||||
command or by providing the proper configuration to auto-join on start. Once a
|
||||
node joins, this information is gossiped to the entire cluster, meaning all
|
||||
nodes will eventually be aware of each other. If the agent is a server,
|
||||
existing servers will begin replicating to the new node.
|
||||
|
||||
In the case of a network failure, some nodes may be unreachable by other nodes.
|
||||
In this case, unreachable nodes are marked as _failed_. It is impossible to distinguish
|
||||
between a network failure and an agent crash, so both cases are handled the same.
|
||||
Once a node is marked as failed, this information is updated in the service catalog.
|
||||
In this case, unreachable nodes are marked as _failed_. It is impossible to
|
||||
distinguish between a network failure and an agent crash, so both cases are
|
||||
handled the same.
|
||||
Once a node is marked as failed, this information is updated in the service
|
||||
catalog.
|
||||
|
||||
-> **Note:** There is some nuance here since this update is only possible if the servers can still [form a quorum](/docs/internals/consensus). Once the network recovers or a crashed agent restarts the cluster will repair itself and unmark a node as failed. The health check in the catalog will also be updated to reflect this.
|
||||
|
||||
|
|
Loading…
Reference in New Issue