Merge pull request #2796 from hashicorp/f-autopilot-guide

Add autopilot guide to the docs
Kyle Havlovitz 2017-03-10 15:32:27 -08:00 committed by GitHub
commit 29c4044e08
5 changed files with 156 additions and 0 deletions

@ -48,6 +48,7 @@ func (s *Server) autopilotLoop() {
_, autopilotConf, err := state.AutopilotConfig()
if err != nil {
s.logger.Printf("[ERR] consul: error retrieving autopilot config: %s", err)
break
}
if err := s.autopilotPolicy.PromoteNonVoters(autopilotConf); err != nil {

@ -558,6 +558,7 @@ Consul will not enable TLS for the HTTP API unless the `https` port has been ass
* <a name="autopilot"></a><a href="#autopilot">`autopilot`</a> Added in Consul 0.8, this object
allows a number of sub-keys to be set which can configure operator-friendly settings for Consul servers.
For more information about Autopilot, see the [Autopilot Guide](/docs/guides/autopilot.html).
<br><br>
The following sub-keys are available:

@ -0,0 +1,116 @@
---
layout: "docs"
page_title: "Autopilot"
sidebar_current: "docs-guides-autopilot"
description: |-
This guide covers how to configure and use Autopilot features.
---
# Autopilot
Autopilot is a set of new features added in Consul 0.8 to allow for automatic,
operator-friendly management of Consul servers. It includes cleanup of dead
servers, monitoring the state of the Raft cluster, and stable server introduction.
To enable Autopilot features (with the exception of dead server cleanup),
the [`raft_protocol`](/docs/agent/options.html#_raft_protocol) setting in
the Agent configuration must be set to 3 or higher on all servers. In Consul
0.8 this setting defaults to 2; in Consul 0.9 it will default to 3. For more
information, see the [Version Upgrade section](/docs/upgrade-specific.html#raft_protocol)
on Raft Protocol versions.
## Configuration
The configuration of Autopilot is loaded by the leader from the agent's
[`autopilot`](/docs/agent/options.html#autopilot) settings when initially
bootstrapping the cluster. After bootstrapping, the configuration can
be viewed or modified either via the
[`operator autopilot`](/docs/commands/operator/autopilot.html) subcommand or the
[`/v1/operator/autopilot/configuration`](/docs/agent/http/operator.html#autopilot-configuration)
HTTP endpoint:
```
$ consul operator autopilot get-config
CleanupDeadServers = true
LastContactThreshold = 200ms
MaxTrailingLogs = 250
ServerStabilizationTime = 10s
$ consul operator autopilot set-config -cleanup-dead-servers=false
Configuration updated!
$ consul operator autopilot get-config
CleanupDeadServers = false
LastContactThreshold = 200ms
MaxTrailingLogs = 250
ServerStabilizationTime = 10s
```
## Dead Server Cleanup
Dead servers will periodically be cleaned up and removed from the Raft peer
set, to prevent them from interfering with the quorum size and leader elections.
This cleanup will also happen whenever a new server is successfully added to the
cluster.
Prior to Autopilot, it would take 72 hours for dead servers to be automatically reaped,
or operators had to script a `consul force-leave`. If another server failure occurred,
it could jeopardize the quorum, even if the failed Consul server had been automatically
replaced. Autopilot helps prevent these kinds of outages by quickly removing failed
servers as soon as a replacement Consul server comes online.
This option can be disabled by running `consul operator autopilot set-config`
with the `-cleanup-dead-servers=false` option.
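To see why a lingering dead server matters, consider the quorum arithmetic. The sketch below assumes the standard Raft majority rule (a quorum is more than half of the peer set); the helper names `quorum` and `failureTolerance` are illustrative and are not part of Consul.

```go
package main

import "fmt"

// quorum returns the majority required among the given number of Raft peers.
func quorum(peers int) int {
	return peers/2 + 1
}

// failureTolerance returns how many more live servers can be lost
// before the cluster drops below quorum (never negative).
func failureTolerance(peers, alive int) int {
	t := alive - quorum(peers)
	if t < 0 {
		return 0
	}
	return t
}

func main() {
	// A 3-server cluster loses one server and a replacement joins.
	// Without cleanup the peer set holds 4 entries, 3 of them alive.
	fmt.Println(quorum(4), failureTolerance(4, 3)) // prints "3 0"

	// Once the dead server is removed the peer set is back to 3.
	fmt.Println(quorum(3), failureTolerance(3, 3)) // prints "2 1"
}
```

With the dead peer still counted, a single further failure would cost the cluster its quorum even though three servers are running; after cleanup the cluster can again tolerate one failure.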
## Server Health Checking
An internal health check runs on the leader to track the stability of servers.
<br>A server is considered healthy if:
- It has a SerfHealth status of 'Alive'
- The time since its last contact with the current leader is below
`LastContactThreshold`
- Its latest Raft term matches the leader's term
- The number of Raft log entries it trails the leader by does not exceed
`MaxTrailingLogs`
The status of these health checks can be viewed through the
[`/v1/operator/autopilot/health`](/docs/agent/http/operator.html#autopilot-health)
HTTP endpoint, with a top-level `Healthy` field indicating the overall status of the cluster:
```
$ curl localhost:8500/v1/operator/autopilot/health
{
"Healthy": true,
"FailureTolerance": 0,
"Servers": [
{
"ID": "e349749b-3303-3ddf-959c-b5885a0e1f6e",
"Name": "node1",
"SerfStatus": "alive",
"LastContact": "0s",
"LastTerm": 3,
"LastIndex": 23,
"Healthy": true,
"StableSince": "2017-03-10T22:01:14Z"
},
{
"ID": "099061c7-ea74-42d5-be04-a0ad74caaaf5",
"Name": "node2",
"SerfStatus": "alive",
"LastContact": "53.279635ms",
"LastTerm": 3,
"LastIndex": 23,
"Healthy": true,
"StableSince": "2017-03-10T22:03:26Z"
}
]
}
```
## Stable Server Introduction
When a new server is added to the cluster, there is a waiting period where it
must be healthy and stable for a certain amount of time before being promoted
to a full, voting member. This can be configured via the `ServerStabilizationTime`
setting.

@ -33,6 +33,40 @@ and update any scripts that passed a custom `-rpc-addr` to the following command
* `monitor`
* `reload`
#### <a name="raft_protocol"></a><a href="#raft_protocol">Raft Protocol Version Compatibility</a>
When upgrading to Consul 0.8.0 from a version lower than 0.7.0, users will need to
set the [`-raft-protocol`](/docs/agent/options.html#_raft_protocol) option to 1 in
order to maintain backwards compatibility with the old servers during the upgrade.
After the servers have been migrated to version 0.8.0, `-raft-protocol` can be moved
up to 2 and the servers restarted to match the default.
The Raft protocol must be stepped up in this way; only adjacent version numbers are
compatible (for example, version 1 cannot talk to version 3). Here is a table of the
Raft Protocol versions supported by each Consul version:
<table class="table table-bordered table-striped">
<tr>
<th>Version</th>
<th>Supported Raft Protocols</th>
</tr>
<tr>
<td>0.6 and earlier</td>
<td>0</td>
</tr>
<tr>
<td>0.7</td>
<td>1</td>
</tr>
<tr>
<td>0.8</td>
<td>1, 2, 3</td>
</tr>
</table>
In order to enable all [Autopilot](/docs/guides/autopilot.html) features, all servers
in a Consul cluster must be running with Raft protocol version 3 or later.
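The stepping rule can be sketched as follows. This is an illustration of the adjacency constraint described above, with hypothetical helper names `compatible` and `upgradeSteps`; it assumes each step corresponds to setting `-raft-protocol` on every server and restarting before moving to the next value.

```go
package main

import "fmt"

// compatible reports whether two Raft protocol versions can interoperate,
// using the rule that only adjacent (or equal) versions are compatible.
func compatible(a, b int) bool {
	d := a - b
	if d < 0 {
		d = -d
	}
	return d <= 1
}

// upgradeSteps lists the protocol versions to move through, one at a time,
// to get from one version to a higher one. It returns nil if no step is needed.
func upgradeSteps(from, to int) []int {
	var steps []int
	for v := from + 1; v <= to; v++ {
		steps = append(steps, v)
	}
	return steps
}

func main() {
	fmt.Println(compatible(1, 3))   // prints "false"
	fmt.Println(compatible(2, 3))   // prints "true"
	fmt.Println(upgradeSteps(1, 3)) // prints "[2 3]"
}
```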
## Consul 0.7.1
#### Child Process Reaping

@ -296,6 +296,10 @@
<a href="/docs/guides/servers.html">Adding/Removing Servers</a>
</li>
<li<%= sidebar_current("docs-guides-autopilot") %>>
<a href="/docs/guides/autopilot.html">Autopilot</a>
</li>
<li<%= sidebar_current("docs-guides-bootstrapping") %>>
<a href="/docs/guides/bootstrapping.html">Bootstrapping</a>
</li>